
Spark, Cassandra, streaming, Python, error, database, Kafka

I'm trying to save my streaming data from Spark to Cassandra. Spark is connected to Kafka and that part works fine, but saving to Cassandra is driving me crazy. I'm using Spark 2.0.2, Kafka 0.10 and Cassandra 2.23.

This is how I'm submitting to Spark:

spark-submit --verbose --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 --jars /tmp/pyspark-cassandra-0.3.5.jar --driver-class-path /tmp/pyspark-cassandra-0.3.5.jar --py-files /tmp/pyspark-cassandra-0.3.5.jar --conf spark.cassandra.connection.host=localhost /tmp/direct_kafka_wordcount5.py localhost:9092 testing

This is my code. It is just a small modification of the Spark examples; it works, but I can't save the data to Cassandra.

This is what I'm trying to do, but with just the count result: http://rustyrazorblade.com/2015/05/spark-streaming-with-python-and-kafka/

from __future__ import print_function
import sys
import os
import time
import pyspark_cassandra
import pyspark_cassandra.streaming
from pyspark_cassandra import CassandraSparkContext
import urllib
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql import SQLContext
from pyspark.sql import Row
from pyspark.sql.types import IntegerType
from pyspark.sql.functions import udf
from pyspark.sql.functions import from_unixtime, unix_timestamp, min, max
from pyspark.sql.types import FloatType
from pyspark.sql.functions import explode
from pyspark.sql.functions import split
if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: direct_kafka_wordcount.py <broker_list> <topic>", file=sys.stderr)
        exit(-1)
    sc = SparkContext(appName="PythonStreamingDirectKafkaWordCount")
    ssc = StreamingContext(sc, 1)
    sqlContext = SQLContext(sc)
    brokers, topic = sys.argv[1:]
    kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})
    lines = kvs.map(lambda x: x[1])
    counts=lines.count()
    counts.saveToCassandra("spark", "count")
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()

I got this error:

Traceback (most recent call last):
  File "/tmp/direct_kafka_wordcount5.py", line 88, in <module>
    counts.saveToCassandra("spark", "count")

pyspark-cassandra stopped being updated a while ago, and the latest version only supports up to Spark 1.6: https://github.com/TargetHolding/pyspark-cassandra
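One possible route on Spark 2.x, not taken from the question, is the DataStax spark-cassandra-connector used through the DataFrame writer. The sketch below assumes that connector is on the classpath (for example via --packages, in a version matching your Spark and Scala build) and that the target keyspace and table already exist. The helper names save_to_cassandra and cassandra_write_options are hypothetical.

```python
def cassandra_write_options(keyspace, table):
    """Options passed to the connector's DataFrame source."""
    return {"keyspace": keyspace, "table": table}


def save_to_cassandra(df, keyspace, table):
    """Append a Spark DataFrame to an existing Cassandra table.

    Assumption: the spark-cassandra-connector is available to the driver
    and executors, and the DataFrame's column names match the table's.
    """
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(**cassandra_write_options(keyspace, table))
       .mode("append")
       .save())
```

In a streaming job this would typically be called from foreachRDD, after converting each batch's RDD to a DataFrame.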

Additionally:

counts = lines.count()  # Returns data to the driver (not an RDD)

counts is now an integer. This means the function saveToCassandra doesn't apply, since that is a function for RDDs.
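One caveat worth checking against the PySpark documentation: RDD.count() returns an integer to the driver, but DStream.count(), which is what lines.count() calls in the question's code, returns a new DStream whose RDDs each hold a single number. Either way, what is being saved is a bare count with no named columns. A minimal sketch of reshaping it into rows follows, assuming a hypothetical table with columns batch_time and count; the helper names are illustrative only.

```python
def count_to_row(batch_time, n):
    """Wrap a bare count in a dict so that it has named columns.

    The column names batch_time and count are assumptions about the table.
    """
    return {"batch_time": batch_time, "count": n}


def counts_as_rows(lines):
    """Turn a DStream of messages into a DStream of one-row dicts per batch.

    DStream.count() yields a DStream whose RDDs each hold one number;
    transform() called with a two-argument function also receives the
    batch time, which serves here as the row key.
    """
    return lines.count().transform(
        lambda batch_time, rdd: rdd.map(lambda n: count_to_row(batch_time, n))
    )
```

The resulting stream carries dict rows rather than plain numbers, which is the shape a row-oriented writer expects.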

