Checking if your JRE has the Unlimited-strength Policy Files in place

To use 256-bit encryption algorithms within Java, the Oracle JDK or JRE requires you to replace the US_export_policy.jar and local_policy.jar files under your $JAVA_HOME/jre/lib/security/ directory with a variant of these files called the ‘JCE Unlimited Strength Jurisdiction Policy Files’, available from Oracle’s download pages.
These jars, commonly referred to as the ‘Unlimited JCE files’, control which cryptographic algorithms and key lengths your code is allowed to use.

An installation with the vanilla (default, non-unlimited) files shows the output below when the policy jar is inspected. Note specifically the explicit listing of the allowed algorithms with their key-size limits, and the absence of any 256-bit grants:

$ unzip -c $JAVA_HOME/jre/lib/security/local_policy.jar default_local.policy 
…
// Some countries have import limits on crypto strength.
// This policy file is worldwide importable.
grant {
    permission javax.crypto.CryptoPermission "DES", 64;
    permission javax.crypto.CryptoPermission "DESede", *;
    permission javax.crypto.CryptoPermission "RC2", 128, 
          "javax.crypto.spec.RC2ParameterSpec", 128;
    permission javax.crypto.CryptoPermission "RC4", 128;
    permission javax.crypto.CryptoPermission "RC5", 128, 
          "javax.crypto.spec.RC5ParameterSpec", *, 12, *;
    permission javax.crypto.CryptoPermission "RSA", *;
    permission javax.crypto.CryptoPermission *, 128;
};

An installation where the same files have been replaced with the unlimited variants would instead show the below. Note specifically the class name of the granted permission:

$ unzip -c $JAVA_HOME/jre/lib/security/local_policy.jar default_local.policy 
…
// Country-specific policy file for countries
// with no limits on crypto strength.
grant {
    // There is no restriction to any algorithms.
    permission javax.crypto.CryptoAllPermission; 
};
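
You can also perform this check programmatically from within Java. The snippet below is a minimal sketch that queries the maximum permitted AES key length via javax.crypto.Cipher; with the default policy files it prints 128, while with the unlimited files in place it prints Integer.MAX_VALUE:

import javax.crypto.Cipher;

public class JcePolicyCheck {
    public static void main(String[] args) throws Exception {
        // Prints 128 with the default (limited) policy files, and
        // Integer.MAX_VALUE (2147483647) with the unlimited files installed.
        int maxAesKeyLength = Cipher.getMaxAllowedKeyLength("AES");
        System.out.println("Max allowed AES key length: " + maxAesKeyLength);
        System.out.println("Unlimited policy in place: " + (maxAesKeyLength >= 256));
    }
}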

Writing a simple Kudu Java API Program

This post is about using Cloudera’s new Kudu service, via its Java API, in a Maven-based Java program.

Kudu ships with a ready-to-use Java client library, and the library jars are also published to Cloudera’s Maven repositories. To use it in your Maven project, add the below to the appropriate sections of your pom.xml:

The Repository:

    <repositories>
      <repository>
        <id>cdh.repo</id>
        <name>Cloudera Repositories</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
      </repository>
    </repositories>

The Library:

    <dependencies>
        <dependency>
          <groupId>org.kududb</groupId>
          <artifactId>kudu-client</artifactId>
          <version>0.5.0</version>
        </dependency>
    </dependencies>

To start off, you will need to create a KuduClient object, like so:

KuduClient kuduClient =
   new KuduClientBuilder("kudu-master-hostname").build();
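
The snippets in this post are fragments rather than a complete program. Assuming the 0.5.0 (org.kududb) client artifact above, the classes used below would be imported roughly as follows; KuduClientBuilder and ColumnSchemaBuilder are nested builder classes, and the exact package layout should be verified against the client version you pull in:

import java.util.ArrayList;
import java.util.List;

import org.kududb.ColumnSchema;
import org.kududb.ColumnSchema.ColumnSchemaBuilder;
import org.kududb.Schema;
import org.kududb.Type;
import org.kududb.client.Delete;
import org.kududb.client.Insert;
import org.kududb.client.KuduClient;
import org.kududb.client.KuduClient.KuduClientBuilder;
import org.kududb.client.KuduScanner;
import org.kududb.client.KuduSession;
import org.kududb.client.KuduTable;
import org.kududb.client.RowResult;
import org.kududb.client.Update;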

To create a table, a schema needs to be built first and then created via the client. Say we have a simple schema for a table “users”, with columns “username” (string, key) and “age” (8-bit signed int, value); we can then create it as shown below, using ColumnSchema and Schema objects together with the previously created kuduClient object:

ColumnSchema usernameCol =
    new ColumnSchemaBuilder("username", Type.STRING)
    .key(true).build();
ColumnSchema ageCol =
    new ColumnSchemaBuilder("age", Type.INT8)
    .build();

List<ColumnSchema> columns = new ArrayList<ColumnSchema>();
columns.add(usernameCol);
columns.add(ageCol);

Schema schema = new Schema(columns);
String tableName = "users";

if ( ! kuduClient.tableExists(tableName) ) {
    kuduClient.createTable(tableName, schema);
}

Data work (such as inserts, updates or deletes) in Kudu is done within sessions. Continuing the example, the code below creates a KuduSession and opens a KuduTable handle for the “users” table, then builds an Insert row and applies it through the session:

KuduSession session = kuduClient.newSession();
KuduTable table = kuduClient.openTable(tableName);

Insert insert = table.newInsert();
insert.getRow().addString("username", "harshj");
insert.getRow().addInt("age", 25);

session.apply(insert);
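
By default, each apply() call above is flushed synchronously to the tablet server. For bulk writes the client also offers buffered flush modes; the sketch below is based on the SessionConfiguration.FlushMode API found in the client versions I have used, so treat the exact class and enum names as assumptions and check the javadoc of your client version:

// Assumption: SessionConfiguration.FlushMode as exposed by the Kudu Java
// client; operations are buffered locally and sent in batches on flush().
session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH);
// ... apply() a batch of inserts/updates here ...
session.flush();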

Likewise, you can update existing rows:

Update update = table.newUpdate();
update.getRow().addString("username", "harshj");
// Change from 25 previously written
update.getRow().addInt("age", 26);

session.apply(update);

Or even delete them by key column (“username”):

Delete delete = table.newDelete();
delete.getRow().addString("username", "harshj");

session.apply(delete);

Reading rows back can be done via the KuduScanner class, built over the KuduTable handle opened earlier. The below example scans the entire table, fetching only the key column:

List<String> columnNames = new ArrayList<String>();
columnNames.add("username");

KuduScanner scanner =
    kuduClient.newScannerBuilder(table)
    .setProjectedColumnNames(columnNames)
    .build();

while (scanner.hasMoreRows()) {
    for (RowResult row : scanner.nextRows()) {
        System.out.println(row.getString("username"));
    }
}
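
Finally, once you are done, close the session and shut down the client so that any pending operations are flushed and client resources (connections, worker threads) are released. A minimal sketch, assuming the close() and shutdown() methods present in the client versions I have used:

// Close the session first so buffered operations get flushed,
// then shut the client down to release its resources.
session.close();
kuduClient.shutdown();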

A fully runnable example can also be found in Kudu’s kudu-examples repository.

Purging away unnecessary local Maven repository jars

As someone who works with a lot of Java (and other JVM-related) projects, I often notice that my $HOME/.m2 local Maven cache grows quite big over time. This is a natural result of running many mvn install and equivalent commands, which build and install the project jars locally (some large projects such as Apache Hadoop end up requiring the use of the local repository, rather than built targets, to resolve their inter-dependencies).

More often than not, when working on such projects, the version is set to a SNAPSHOT-styled one, for example spark-streaming-flume_2.10/1.5.0-SNAPSHOT, and these artifacts can be deleted once you are no longer working on that project. Here’s the simple command I use to erase them:

find $HOME/.m2 -type d | grep SNAPSHOT\$ | xargs rm -rf

Trusting self-signed SSL certificates in Cloudera Hue

Using self-signed certificates is an easy process, and it’s very useful in local test environments, something we often rely on at my work.

However, it has its cons when the breadth of the ecosystem weighs in (software written in languages other than Java, such as Python, C++, etc.), as the support and configuration needed for trusting self-signed certificates varies from platform to platform.

Cloudera Hue runs on Python (and uses PyOpenSSL plus the Python Requests library). While running Hue itself over a self-signed certificate file is as easy as configuring it, making it trust the other services it talks to (such as HDFS or YARN) gets difficult when those services also utilise self-signed certificates. You may often run into the following error, or similar forms of it, within Hue’s File Browser and other parts, for example:

Processing exception: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed: Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hue/apps/jobbrowser/src/jobbrowser/views.py", line 119, in jobs
    raise ex
RestException: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

While not something you’d want to do on a production-grade cluster, you can still get Hue to work in such an environment by applying a small change to the Hue process environment. If you use Cloudera Manager, you could add the below to the CM -> Hue -> Configuration -> Hue Server Environment Advanced Configuration Snippet (Safety Valve) field, then save and restart the Hue service:

REQUESTS_CA_BUNDLE=/opt/certificates/host-certs.pem

This utilises the Python Requests library’s support for using a custom CA certificate bundle instead of the default one (which PyOpenSSL appears to self-bundle, rather than reuse the system default certs). You only need to make sure that the PEM file pointed to carries all the necessary host certificates (and chain certificates) to allow Hue to talk to every applicable service in the cluster.

Hat tip to my most excellent colleague, Chris Conner, for pointing me to this feature.