Read data in Pyspark from Kinesis

I am trying to read data from Kinesis in PySpark using KinesisUtils.createStream, but I get the following error:


  Spark Streaming's Kinesis libraries not found in class path. Try one of the following.

  1. Include the Kinesis library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl:2.4.4 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kinesis-asl-assembly, Version = 2.4.4.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kinesis-asl-assembly.jar> ...

________________________________________________________________________________________________


Traceback (most recent call last):
  File "/Users/ahmad.muhammad/Desktop/kinesis-reader.py", line 8, in <module>
    kinesisStream = KinesisUtils.createStream(ssc,'Ahmad-Kineses','twitter-stream','https://kinesis.us-east-1.amazonaws.com/','us-east-1',InitialPositionInStream.TRIM_HORIZON,20)
  File "/Users/Ahmad.Muhammad/opt/apache-spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 84, in createStream
TypeError: 'JavaPackage' object is not callable

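Assuming you are running PySpark on your local machine, you can pass the Kinesis package to Spark through the PYSPARK_SUBMIT_ARGS environment variable. The package version should match your Spark installation (2.4.4 here; the default 2.4.4 download is built against Scala 2.11, hence the _2.11 suffix). In your terminal, run:

export PYSPARK_SUBMIT_ARGS="--master local[2] --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.4 pyspark-shell"

Note that the shell does not allow spaces around the = sign, and the value has to be quoted because it contains spaces. You can also set the same variable from your code, as long as that happens before the SparkContext is created.

Hopefully this will solve your problem.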
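
If you would rather set the variable from the script itself, something along these lines might work. This is only a sketch: build_submit_args and create_kinesis_stream are helper names made up for illustration, the Scala suffix 2.11 is an assumption based on the default Spark 2.4.4 download, and the stream arguments are copied from the traceback in the question.

```python
import os


def build_submit_args(spark_version="2.4.4", scala_version="2.11", master="local[2]"):
    """Build the PYSPARK_SUBMIT_ARGS value (hypothetical helper, not part of PySpark)."""
    # Assumption: the artifact's Scala suffix and version match the local Spark build.
    package = "org.apache.spark:spark-streaming-kinesis-asl_{}:{}".format(
        scala_version, spark_version
    )
    return "--master {} --packages {} pyspark-shell".format(master, package)


# Set this before any SparkContext is created; PySpark reads it when it
# launches the JVM, so setting it later has no effect.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args()


def create_kinesis_stream():
    """Create the stream from the question (hypothetical wrapper, needs pyspark and AWS access)."""
    # Imported lazily so the environment variable above is already in place.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

    sc = SparkContext(appName="kinesis-reader")
    ssc = StreamingContext(sc, 10)  # 10-second batches, an arbitrary choice
    stream = KinesisUtils.createStream(
        ssc,
        "Ahmad-Kineses",                            # Kinesis app name from the question
        "twitter-stream",                           # stream name from the question
        "https://kinesis.us-east-1.amazonaws.com/",
        "us-east-1",
        InitialPositionInStream.TRIM_HORIZON,
        20,                                         # checkpoint interval in seconds
    )
    return ssc, stream
```

Calling create_kinesis_stream() and then ssc.start() should behave like the original script, provided the package download succeeds and AWS credentials are available.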
