Read data in Pyspark from Kinesis

I am trying to read data from Kinesis in PySpark using KinesisUtils.createStream, but I get the following error:


  Spark Streaming's Kinesis libraries not found in class path. Try one of the following.

  1. Include the Kinesis library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl:2.4.4 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kinesis-asl-assembly, Version = 2.4.4.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kinesis-asl-assembly.jar> ...

________________________________________________________________________________________________


Traceback (most recent call last):
  File "/Users/ahmad.muhammad/Desktop/kinesis-reader.py", line 8, in <module>
    kinesisStream = KinesisUtils.createStream(ssc,'Ahmad-Kineses','twitter-stream','https://kinesis.us-east-1.amazonaws.com/','us-east-1',InitialPositionInStream.TRIM_HORIZON,20)
  File "/Users/Ahmad.Muhammad/opt/apache-spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 84, in createStream
TypeError: 'JavaPackage' object is not callable

Assuming you are running PySpark on a local machine, you can supply the Kinesis package through the PYSPARK_SUBMIT_ARGS environment variable, either from your code or from the shell. In your terminal, try:

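export PYSPARK_SUBMIT_ARGS="--master local[2] --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.4 pyspark-shell"

Note that the shell does not allow spaces around the = sign, and the value must be quoted because it contains spaces. The artifact name carries the Scala version suffix (_2.11), and its version should match your Spark installation (2.4.4 in the traceback above). Hopefully this will solve your problem.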
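The answer also mentions setting the variable from code rather than the terminal. A minimal sketch of that approach follows. It assumes Spark 2.4.4 built against Scala 2.11 (inferred from the traceback), and the helper name kinesis_submit_args is hypothetical, introduced here only for illustration.

```python
import os

# Assumed versions: taken from the spark-2.4.4-bin-hadoop2.7 path in the
# traceback; adjust both to match your own installation.
SCALA_VERSION = "2.11"
SPARK_VERSION = "2.4.4"


def kinesis_submit_args(scala_version, spark_version, master="local[2]"):
    """Build a PYSPARK_SUBMIT_ARGS value that pulls in the Kinesis connector.

    Hypothetical helper: it only formats the string; Spark itself resolves
    the package when the JVM starts.
    """
    package = "org.apache.spark:spark-streaming-kinesis-asl_{}:{}".format(
        scala_version, spark_version
    )
    return "--master {} --packages {} pyspark-shell".format(master, package)


# Set the variable before the first SparkContext is created: PySpark reads
# it when it launches the JVM, so setting it later has no effect.
os.environ["PYSPARK_SUBMIT_ARGS"] = kinesis_submit_args(SCALA_VERSION, SPARK_VERSION)
```

With this at the very top of kinesis-reader.py, before any SparkContext or StreamingContext is constructed, the existing KinesisUtils.createStream call should be able to find the connector classes. Building the string from the two version values keeps the Scala suffix and the Spark version in one place, so a mismatch between them is easier to spot.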
