Writing a simple Kudu Java API Program

This post shows how to use Cloudera’s new Kudu service through its Java API in a Maven-based Java program.

Kudu ships with a ready-to-use Java client library, and the library jars are also published on Cloudera’s Maven repositories. To use the client in your Maven project, add the following to the appropriate sections of your pom.xml:

The Repository:

    <repositories>
      <repository>
        <id>cdh.repo</id>
        <name>Cloudera Repositories</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
      </repository>
    </repositories>

The Library:

    <dependencies>
        <dependency>
          <groupId>org.kududb</groupId>
          <artifactId>kudu-client</artifactId>
          <version>0.5.0</version>
        </dependency>
    </dependencies>

To start off, create a KuduClient object that points at your Kudu master (KuduClientBuilder is a nested class of KuduClient, in the org.kududb.client package):

KuduClient kuduClient =
   new KuduClientBuilder("kudu-master-hostname").build();
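
The string passed to the builder is the master address. As far as I know it can also be a comma-separated list of host:port pairs when the cluster runs more than one master. Below is a minimal sketch of assembling such a string, assuming Kudu's default master RPC port of 7051 for hosts given without a port. MasterAddresses and join are hypothetical helper names for illustration, not part of the Kudu API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Kudu client library).
final class MasterAddresses {
    // Assumption: 7051 is the default Kudu master RPC port.
    static final int DEFAULT_MASTER_PORT = 7051;

    private MasterAddresses() {}

    // Builds a comma-separated "host:port" list, appending the default
    // port to any entry that does not already carry one.
    static String join(List<String> hosts) {
        List<String> parts = new ArrayList<String>();
        for (String host : hosts) {
            String trimmed = host.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank entries
            }
            // Simplification: any ':' is treated as "port already present".
            parts.add(trimmed.contains(":")
                    ? trimmed
                    : trimmed + ":" + DEFAULT_MASTER_PORT);
        }
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("no master addresses given");
        }
        return String.join(",", parts);
    }
}
```

The resulting string can then be handed to the KuduClientBuilder constructor in place of the single hostname used above.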

To create a table, first build a schema and then create the table through the client. Say we have a simple schema for a table “users” with the columns “username” (string, key) and “age” (8-bit signed int, value). We can create it as shown below, using ColumnSchema and Schema objects together with the previously created kuduClient object:

ColumnSchema usernameCol =
    new ColumnSchemaBuilder("username", Type.STRING)
    .key(true).build();
ColumnSchema ageCol =
    new ColumnSchemaBuilder("age", Type.INT8)
    .build();

List<ColumnSchema> columns = new ArrayList<ColumnSchema>();
columns.add(usernameCol);
columns.add(ageCol);

Schema schema = new Schema(columns);
String tableName = "users";

// Create the table only if it does not exist yet
if (!kuduClient.tableExists(tableName)) {
    kuduClient.createTable(tableName, schema);
}
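
Since “age” is declared as Type.INT8, any value written to it has to fit in the signed 8-bit range (-128 to 127), which is the range of a Java byte. Here is a minimal sketch of a range check that could run before a value is handed to the Kudu client. The Int8Values class and its toInt8 method are hypothetical helpers for illustration, not part of the Kudu API:

```java
// Hypothetical helper (not part of the Kudu client library).
final class Int8Values {
    private Int8Values() {}

    // Converts an int to a byte, rejecting values that would not fit
    // in a signed 8-bit column instead of silently wrapping around.
    static byte toInt8(int value) {
        if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
            throw new IllegalArgumentException(
                "value " + value + " does not fit in an INT8 column");
        }
        return (byte) value;
    }
}
```

For example, Int8Values.toInt8(25) returns a byte holding 25, while Int8Values.toInt8(200) throws rather than wrapping around to a negative number as a plain (byte) cast would.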

Data operations (such as inserts, updates or deletes) in Kudu are done within sessions. Continuing the example, the code below creates a KuduSession and opens a KuduTable for the table “users”, then builds an insert and applies it through the session:

KuduSession session = kuduClient.newSession();
KuduTable table = kuduClient.openTable(tableName);

Insert insert = table.newInsert();
insert.getRow().addString("username", "harshj");
// "age" is an INT8 column, so it is written as a byte
insert.getRow().addByte("age", (byte) 25);

session.apply(insert);

Likewise, you can update existing rows:

Update update = table.newUpdate();
update.getRow().addString("username", "harshj");
// Change from 25 previously written ("age" is INT8, so use a byte)
update.getRow().addByte("age", (byte) 26);

session.apply(update);

Or even delete them by key column (“username”):

Delete delete = table.newDelete();
delete.getRow().addString("username", "harshj");

session.apply(delete);

Reading rows can be done via the KuduScanner class. The example below fetches only the key column, for every row in the table:

List<String> columnNames = new ArrayList<String>();
columnNames.add("username");

// The scanner builder takes the opened KuduTable, not the table name
KuduScanner scanner =
    kuduClient.newScannerBuilder(table)
    .setProjectedColumnNames(columnNames)
    .build();

while (scanner.hasMoreRows()) {
    for (RowResult row : scanner.nextRows()) {
        System.out.println(row.getString("username"));
    }
}

A fully runnable example can also be found in Kudu’s kudu-examples repository.