My Flink program should do a Cassandra lookup for each input record and, based on the results, do some further processing.
But I'm currently stuck at reading data from Cassandra. This is the code snippet I've come up with so far.
ClusterBuilder secureCassandraSinkClusterBuilder = new ClusterBuilder() {
    @Override
    protected Cluster buildCluster(Cluster.Builder builder) {
        return builder.addContactPoints(props.getCassandraClusterUrlAll().split(","))
                .withPort(props.getCassandraPort())
                .withAuthProvider(new DseGSSAPIAuthProvider("HTTP"))
                .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();
    }
};
for (int i = 1; i < 5; i++) {
    // A new input format (and therefore a new connection) is created on every iteration.
    CassandraInputFormat<Tuple2<String, String>> cassandraInputFormat =
            new CassandraInputFormat<>("select * from test where id='hello" + i + "'", secureCassandraSinkClusterBuilder);
    cassandraInputFormat.configure(null);
    cassandraInputFormat.open(null);
    Tuple2<String, String> out = new Tuple2<>();
    cassandraInputFormat.nextRecord(out);
    System.out.println(out);
}
The issue is that each lookup takes nearly 10 seconds; in other words, this for loop takes 50 seconds to execute.
How do I speed up this operation? Alternatively, is there any other way of looking up Cassandra in Flink?
I came up with a solution that is fairly fast at querying Cassandra with streaming data. It may be of use to someone with the same issue.
First, Cassandra can be queried with as little code as this:
Session session = secureCassandraSinkClusterBuilder.getCluster().connect();
ResultSet resultSet = session.execute("SELECT * FROM TABLE");
The problem with this is that creating a Session is a very time-expensive operation, and something that should be done once per keyspace. Create the Session once and reuse it for all read queries.
Now, since Session is not Java Serializable, it cannot be passed as an argument to Flink operators like Map or ProcessFunction. There are a few ways of solving this: you can use a RichFunction and initialize the Session in its open method, or use a singleton. I will use the second solution.
Make a singleton class as follows, where we create the Session.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import org.apache.flink.streaming.connectors.cassandra.ClusterBuilder;

public class CassandraSessionSingleton {
    private static CassandraSessionSingleton cassandraSessionSingleton = null;
    public Session session;

    private CassandraSessionSingleton(ClusterBuilder clusterBuilder) {
        Cluster cluster = clusterBuilder.getCluster();
        session = cluster.connect();
    }

    // synchronized: parallel subtasks in the same JVM may call this at the same time
    public static synchronized CassandraSessionSingleton getInstance(ClusterBuilder clusterBuilder) {
        if (cassandraSessionSingleton == null) {
            cassandraSessionSingleton = new CassandraSessionSingleton(clusterBuilder);
        }
        return cassandraSessionSingleton;
    }
}
You can then make use of this session for all future queries. Here I'm using a ProcessFunction to make queries, as an example.
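import com.datastax.driver.core.ResultSet;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.connectors.cassandra.ClusterBuilder;
import org.apache.flink.util.Collector;

public class SomeProcessFunction extends ProcessFunction<Object, ResultSet> {
    private final ClusterBuilder secureCassandraSinkClusterBuilder;

    // Constructor
    public SomeProcessFunction(ClusterBuilder secureCassandraSinkClusterBuilder) {
        this.secureCassandraSinkClusterBuilder = secureCassandraSinkClusterBuilder;
    }

    @Override
    public void processElement(Object obj, Context ctx, Collector<ResultSet> out) throws Exception {
        ResultSet resultSet = CassandraLookUp.cassandraLookUp("SELECT * FROM TEST", secureCassandraSinkClusterBuilder);
        // processElement returns void, so results are emitted through the Collector.
        // In practice you would likely extract the rows you need here and emit those instead.
        out.collect(resultSet);
    }
}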
Note that you can pass the ClusterBuilder to the ProcessFunction because it is Serializable. Now for the cassandraLookUp method, where we execute the query.
public class CassandraLookUp {
    public static ResultSet cassandraLookUp(String query, ClusterBuilder clusterBuilder) {
        // Reuses the shared Session; it is only created on the first call.
        CassandraSessionSingleton cassandraSessionSingleton = CassandraSessionSingleton.getInstance(clusterBuilder);
        Session session = cassandraSessionSingleton.session;
        ResultSet resultSet = session.execute(query);
        return resultSet;
    }
}
The singleton object is created only the first time a query is run; after that, the same object is reused, so there is no delay in subsequent lookups.
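The create-once behaviour can be checked in isolation with a small sketch that needs neither Flink nor Cassandra. ExpensiveSession and the CREATED counter are hypothetical stand-ins for the real Session and exist only for this illustration. The sketch also assumes that several operator subtasks may share one JVM and call getInstance concurrently, which is why the accessor is synchronized here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SingletonReuseSketch {

    // Counts how many times the "expensive" resource was constructed.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for the costly Session.
    static final class ExpensiveSession {
        ExpensiveSession() {
            CREATED.incrementAndGet();
        }
    }

    private static ExpensiveSession instance;

    // synchronized so that two threads cannot both observe null and build twice.
    static synchronized ExpensiveSession getInstance() {
        if (instance == null) {
            instance = new ExpensiveSession();
        }
        return instance;
    }

    // Runs many concurrent "lookups" and reports how many sessions were built.
    static int sessionsCreatedAfter(int threads, int lookupsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < lookupsPerThread; i++) {
                    getInstance();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("sessions created: " + sessionsCreatedAfter(8, 1000));
    }
}
```

However many threads and lookups are used, the counter should stay at 1, which matches the observation that only the first query pays the connection cost.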