I have read a lot about the global state store: it does not create a changelog topic for restore; instead it uses the source topic as the restore topic.
I create a custom key and store the data in a global state store, but after a restart the data is gone, because on restore the global store reads directly from the source topic and bypasses the processor.
My input topic has data like this:
{
"id": "user-12345",
"user_client": [
"clientid-1",
"clientid-2"
]
}
I am maintaining two kinds of entries in the state store, as shown in the example below.
A workaround I have seen is to create a custom changelog topic and send keyed data to it; that topic then acts as the source topic for the global state store.
But in my scenario I have to write two kinds of records into the state store. What is the best way to do that?
Example Scenario:
Record1: {
"id": "user-1",
"user_client": [
"clientid-1",
"clientid-2"
]
}
Record2: {
"id": "user-2",
"user_client": [
"clientid-1",
"clientid-3"
]
}
The global state store should contain:
id -> full JSON record, plus:
clientid-1: ["user-1", "user-2"]
clientid-2: ["user-1"]
clientid-3: ["user-2"]
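To make the expected mapping concrete, here is a plain-Java sketch (not Kafka Streams code; `InvertedIndexSketch` and `invert` are hypothetical names) that builds the client-to-user index from the two records above:

```java
import java.util.*;

public class InvertedIndexSketch {
    // invert user_id -> client_ids into client_id -> user_ids
    static Map<String, Set<String>> invert(Map<String, List<String>> records) {
        Map<String, Set<String>> clientToUsers = new TreeMap<>();
        records.forEach((userId, clientIds) ->
                clientIds.forEach(clientId ->
                        clientToUsers.computeIfAbsent(clientId, k -> new TreeSet<>()).add(userId)));
        return clientToUsers;
    }

    public static void main(String[] args) {
        Map<String, List<String>> records = new LinkedHashMap<>();
        records.put("user-1", List.of("clientid-1", "clientid-2"));
        records.put("user-2", List.of("clientid-1", "clientid-3"));
        System.out.println(invert(records));
        // prints {clientid-1=[user-1, user-2], clientid-2=[user-1], clientid-3=[user-2]}
    }
}
```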
How do I handle the restore case for this scenario in the global state store?
One approach is to maintain a changelog topic (with cleanup.policy=compact) for the GlobalKTable; let's call it user_client_global_ktable_changelog.
For simplicity, let's say we deserialize your messages into the Java classes below (you could also just use a HashMap or JsonNode):
//initial message format
public class UserClients {
    String id;
    Set<String> userClient;
    // getters and setters omitted
}
//message format when the key is a client id
public class ClientUsers {
    String clientId;
    Set<String> userIds = new HashSet<>(); // initialize so the aggregator can add to it safely
    // getters and setters omitted
}
//your initial topic
KStream<String, UserClients> userClientKStream = streamsBuilder.stream("un_keyed_topic");

//re-key the initial message to user_id -> {initial_message_payload}
userClientKStream
        .map((defaultNullKey, userClients) -> KeyValue.pair(userClients.getId(), userClients))
        .to("user_client_global_ktable_changelog"); //please provide appropriate serdes

userClientKStream
        //emit one client_id -> user_id pair per client; the key change triggers a
        //repartition before groupByKey (an internal -repartition topic is created)
        .flatMap((defaultNullKey, userClients) -> userClients.getUserClient().stream()
                .map(clientId -> KeyValue.pair(clientId, userClients.getId()))
                .collect(Collectors.toList()))
        //maintain the current aggregated set of user_ids for each client_id
        .groupByKey()
        .aggregate(ClientUsers::new, (clientId, userId, clientUsers) -> {
            clientUsers.getUserIds().add(userId);
            return clientUsers;
        }, Materialized.as("client_with_aggregated_user_ids"))
        .toStream()
        .to("user_client_global_ktable_changelog"); //please provide appropriate serdes
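The net effect of the two branches can be simulated in plain Java (no Kafka dependency; `ChangelogSimulation` and `changelog` are illustrative names): after compaction, the changelog topic holds the latest value per key, with user-keyed entries from the first branch and client-keyed aggregates from the second:

```java
import java.util.*;

public class ChangelogSimulation {
    // simulate both topology branches writing keyed messages into the compacted changelog
    static Map<String, Object> changelog(Map<String, List<String>> records) {
        Map<String, Object> topic = new TreeMap<>(); // compacted: latest value per key
        // branch 1: user_id -> original payload (here, its client id list)
        records.forEach((userId, clientIds) -> topic.put(userId, clientIds));
        // branch 2: client_id -> aggregated set of user_ids
        records.forEach((userId, clientIds) -> clientIds.forEach(clientId -> {
            Set<String> users = (Set<String>) topic.computeIfAbsent(clientId, k -> new TreeSet<String>());
            users.add(userId);
        }));
        return topic;
    }

    public static void main(String[] args) {
        Map<String, List<String>> records = new LinkedHashMap<>();
        records.put("user-1", List.of("clientid-1", "clientid-2"));
        records.put("user-2", List.of("clientid-1", "clientid-3"));
        System.out.println(changelog(records));
    }
}
```

Because the GlobalKTable is restored directly from this topic, both key spaces survive a restart.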
Example of aggregating user_ids in the local state:
//re-keyed client-based message
clientid-1:user-1
//current aggregate for "clientid-1"
"clientid-1" -> { "user_ids": ["user-1"] }
//re-keyed client-based message
clientid-1:user-2
//current aggregate for "clientid-1"
"clientid-1" -> { "user_ids": ["user-1", "user-2"] }
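The trace above can be reproduced with just the initializer/aggregator pair in plain Java (`AggregatorTrace` is a hypothetical stand-in; a plain `Set<String>` replaces `ClientUsers` here):

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiFunction;
import java.util.function.Supplier;

public class AggregatorTrace {
    // stand-in for the Streams initializer (ClientUsers::new)
    static final Supplier<Set<String>> INITIALIZER = TreeSet::new;
    // stand-in for the aggregator lambda: add the user id to the running set
    static final BiFunction<Set<String>, String, Set<String>> AGGREGATOR = (agg, userId) -> {
        agg.add(userId); // same effect as clientUsers.getUserIds().add(userId)
        return agg;
    };

    public static void main(String[] args) {
        Set<String> agg = INITIALIZER.get();
        agg = AGGREGATOR.apply(agg, "user-1");
        System.out.println(agg); // prints [user-1]
        agg = AGGREGATOR.apply(agg, "user-2");
        System.out.println(agg); // prints [user-1, user-2]
    }
}
```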
Actually, you could use the changelog topic of the local state store directly as the changelog for the GlobalKTable (the topic named your_application-client_with_aggregated_user_ids-changelog) if you make one change: adjust the state to hold both the user-keyed and the client-keyed message payloads.