Apache Pig on Spark

Posted 2014-08-16 05:41:04 | Tags: hadoop, cassandra, apache-pig, apache-spark

I am using Hadoop 2.2.0, Cassandra 2.0.6, Pig 0.12, and Spark 1.0.1. I read data from Cassandra in Pig using the CassandraStorage handler and run analytic operations on it. I know Spark accepts Hadoop input formats, so I want to pass the data read by my Pig query to Spark. How can I do that? Any suggestions?

1 answer

You can store the data in HDFS from Pig and then read it from Spark, since Spark can read files from HDFS directly. If you want to refer to fields by name in Spark rather than by index (the way you use aliases in Pig), you can create a case class to name them.
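As a rough sketch of the case class idea, assuming the Pig relation was stored with PigStorage's default tab delimiter and has two fields: the names `UserCount`, `user`, `count`, and `PigOutput` are hypothetical and stand in for whatever aliases your Pig script uses.

```scala
// Hypothetical record type: the field names stand in for the Pig aliases.
case class UserCount(user: String, count: Long)

object PigOutput {
  // Parse one tab-delimited line (tab is PigStorage's default delimiter).
  // Returns None when the line does not have exactly two fields or when
  // the second field is not a valid Long.
  def parseLine(line: String): Option[UserCount] =
    line.split("\t", -1) match {
      case Array(user, count) =>
        try Some(UserCount(user, count.trim.toLong))
        catch { case _: NumberFormatException => None }
      case _ => None
    }
}
```

In a Spark job you could then apply it to the stored output with something like `sc.textFile("hdfs:///path/to/pig_output").map(PigOutput.parseLine)` (the path is a placeholder), which gives records whose fields are accessed as `user` and `count` instead of by position.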