
Cassandra + Pig with wide columns

I am currently working on a recommender application, and I am using Cassandra with Hadoop and Pig for map/reduce jobs. To take advantage of the properties of column names, our team has decided to store data using valueless columns and aggregate column names. For example, all hits for a specific piece of content are stored in a column family with a single row, and each column is a hit for that content, using the following structure:

rowkey = 'single_row' {
    id_content:hit_date, -
    .
    .
    .
}

With this schema we get wide rows instead of skinny ones. The question is: how do I need to manipulate the data in Pig in order to store it in Cassandra with this schema?

I'm not sure from your comment if you're using composite columns, or whether you're just concatenating id_content and hit_date.

For normal (i.e. non-composite) columns, the schema is:

(key, {(col_name, col_value), ...})
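To make that shape concrete for the layout in the question, here is a small sketch in plain Python (not Pig Latin) that builds one tuple of this form from (id_content, hit_date) pairs. The function name to_wide_row, the sample hits, and the empty string used as the placeholder column value are all assumptions for illustration; the Pig relation you store would need to carry tuples of the same shape.

```python
def to_wide_row(row_key, hits):
    """Return (key, [(col_name, col_value), ...]) from (id_content, hit_date) pairs.

    The column name is the concatenation id_content:hit_date; the value is an
    empty placeholder (an assumption, standing in for a valueless column).
    """
    columns = [(f"{id_content}:{hit_date}", "") for id_content, hit_date in hits]
    return (row_key, columns)


# Hypothetical sample data: two hits on the same content.
hits = [("content42", "2012-01-05"), ("content42", "2012-01-06")]
row = to_wide_row("single_row", hits)
print(row)
```

The point of the sketch is only the nesting: one key, followed by a bag of two-field (name, value) tuples, with every hit becoming one entry in the bag.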

In the case of composite columns, I believe the schema is the following:

(key, {((col_name_part_1, col_name_part_2), col_value), ...})
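If that composite layout is right, the only difference is that the column name becomes a nested tuple of its parts rather than a single string. A minimal sketch in plain Python (not Pig Latin), where the function name to_composite_row, the sample hits, and the empty placeholder value are assumptions for illustration:

```python
def to_composite_row(row_key, hits):
    """Return (key, [((name_part_1, name_part_2), col_value), ...]).

    Each column name is kept as a two-part tuple (id_content, hit_date)
    instead of being concatenated; the value is an empty placeholder
    (an assumption, standing in for a valueless column).
    """
    columns = [((id_content, hit_date), "") for id_content, hit_date in hits]
    return (row_key, columns)


# Hypothetical sample data: two hits on the same content.
hits = [("content42", "2012-01-05"), ("content42", "2012-01-06")]
composite_row = to_composite_row("single_row", hits)
print(composite_row)
```

Whether CassandraStorage accepts exactly this nesting depends on the version in use, so treat it as an illustration of the shape described above rather than a confirmed format.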

This assessment (for composite columns) is based on reading the patch submitted on https://issues.apache.org/jira/browse/CASSANDRA-3684
