
Problem sending data with kafka producer in Python (Jupyter Notebook)

I'm trying to build a Big Data analysis using Kafka, Python and Twitter. I have a stream of tweets from which I only take the hashtags. My problem is with the Kafka producer for Python: I can't send the data I want to the topic I created, because I don't see any option to send the content of a variable with the producer.

In https://kafka-python.readthedocs.io/en/master/usage.html I can only see the option to send an exact string with b'some_string'. But I want to send the hashtags I take from the Twitter stream. I don't know much about Python, so excuse me if the solution is obvious.

Imports:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json
import tweepy
from tweepy import OAuthHandler
from tweepy import Stream
import kafka
from kafka import SimpleProducer, KafkaClient
from kafka import KafkaProducer

Streaming Context:

ssc = StreamingContext(sc,60)

Keys:

consumer_key="consumer_key"
consumer_secret="consumer_secret"
access_token="access_token"
access_token_secret="access_token_secret"

Tweepy:

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

Producer:

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

Code:

class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        for hashtag in status.entities['hashtags']:
            prueba = b'hashtag["text"]'
            producer.send('topic', prueba)
            return True
    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_data disconnects the stream
            return False

StreamListener:

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = api.auth, listener=MyStreamListener())

Tweet Stream:

myStream.filter(track=['some_text'])

The thing is, the producer only sends the literal string assigned to prueba, that is, the text hashtag["text"] itself. I don't want to send that exact text; I want to send the content of the variable.

Thanks in advance.

How about passing the variable itself, e.g. producer.send('topic', hashtag["text"])? The b'...' prefix creates a bytes literal, so b'hashtag["text"]' is sent character for character instead of being evaluated. You will also need to make sure to encode the data to raw bytes, which is what Kafka stores. If the value is a simple string, you could do producer.send('topic', hashtag["text"].encode('utf-8')). If it is a dict or a more complex data structure, you may need to use json.dumps before encoding to bytes. Hope this helps!
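As a minimal sketch of the encoding step, assuming each entry of status.entities['hashtags'] is a dict with a "text" key (as the question's code implies), something like the following could work. The helper names hashtag_to_bytes, entity_to_json_bytes and send_hashtags are hypothetical and not part of kafka-python; the producer argument is assumed to be any object with a send(topic, value) method, such as the KafkaProducer created in the question.

```python
import json


def hashtag_to_bytes(hashtag):
    """Return the hashtag text as UTF-8 bytes.

    Assumes `hashtag` is a dict with a "text" key, e.g.
    {"text": "kafka", "indices": [0, 6]}.
    """
    return hashtag["text"].encode("utf-8")


def entity_to_json_bytes(entity):
    """Serialize a dict (or other JSON-compatible value) to UTF-8 bytes."""
    return json.dumps(entity).encode("utf-8")


def send_hashtags(producer, topic, hashtags):
    """Send every hashtag in the list, one message per hashtag.

    `producer` is assumed to expose send(topic, value), like the
    KafkaProducer instance from the question.
    """
    for hashtag in hashtags:
        producer.send(topic, hashtag_to_bytes(hashtag))
```

Inside on_status this could be called as send_hashtags(producer, 'topic', status.entities['hashtags']). Note that the loop in the question returns True after the first hashtag, so only one hashtag per tweet would be sent; the sketch above iterates over all of them.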
