
Apache Pig on Spark

I am using Hadoop 2.2.0, Cassandra 2.0.6, Pig 0.12, and Spark 1.0.1. I read data from Cassandra in Pig through the CassandraStorage handler and run analytic operations on it. I know Spark accepts Hadoop input formats, so I want to pass the data read by my Pig query to Spark. How can I do that? Any suggestions?

You can store the data in HDFS from Pig and then read it from Spark, since Spark can read directly from HDFS. If you want to refer to fields by name in Spark rather than by index (as you do with aliases in Pig), you can create a case class to give the fields names.
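As a rough sketch of that approach, the Scala snippet below parses lines as Pig's PigStorage writes them (one record per line, tab-delimited by default) into a case class. The `Event` class, its three fields, the `PigOutput` helper, and the `/tmp/pig_out` path are all hypothetical names chosen for illustration; adapt them to the schema your Pig script actually stores.

```scala
// Hypothetical record layout: field names and types are assumptions,
// standing in for the aliases used in the Pig script.
case class Event(id: String, category: String, amount: Double)

object PigOutput {
  // Parse one tab-delimited line written by PigStorage.
  // Returns None for rows with the wrong field count or a non-numeric amount.
  def parseLine(line: String): Option[Event] =
    line.split("\t", -1) match {
      case Array(id, category, amount) =>
        scala.util.Try(Event(id, category, amount.toDouble)).toOption
      case _ => None
    }
}

// In spark-shell, where a SparkContext `sc` is already defined, and assuming
// the Pig script ended with something like
//   STORE result INTO '/tmp/pig_out' USING PigStorage();
// the output directory could be loaded and queried by field name:
//   val events = sc.textFile("hdfs:///tmp/pig_out").flatMap(PigOutput.parseLine(_))
//   events.filter(_.amount > 100.0).map(_.category).countByValue()
```

Keeping the parsing in a plain function means it can be checked without a cluster, and malformed rows are dropped rather than failing the whole job.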
