
Need help Piping Python twitter script to NLP Bash script (Sed, grep etc…)

Hello, I'm very new to programming; I started only a few weeks ago. I would greatly appreciate some help. Thanks in advance!

  1. My Python script (stream_tweets.py) streams 200-300 tweets per minute from Twitter. The script is located at '/home/computer/Twitter/examples/stream_tweets.py'.

  2. I have an NLP (Natural Language Processing) bash script that analyzes sentences and prints the results to the terminal. The NLP script (corenlp.sh) is located at '/home/computer/Standford/corenlp.sh'.

  3. If I create a new bash script, how do I pipe the tweets into the NLP script? What would this script look like?

  4. My Python tweet script (stream_tweets.py) needs to output the text in UTF-8. How do I change the script to do so?

  5. The NLP script takes a while to load. If tweets are pouring into it before it has finished loading, will that affect my script? If so, what can I do about it, and how?

  6. Take a look at the stream_tweets.py script:

      from TwitterAPI import TwitterAPI

      TRACK_TERM = 'keyword1,keyword2,keyword3'

      CONSUMER_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
      CONSUMER_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
      ACCESS_TOKEN_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
      ACCESS_TOKEN_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

      api = TwitterAPI(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)

      r = api.request('statuses/filter', {'track': TRACK_TERM})

      for item in r:
          print(item['text'] if 'text' in item else item)

  7. Take a look at the NLP script (corenlp.sh):

      OS=`uname`
      # Macs (BSD) don't support readlink -e
      if [ "$OS" == "Darwin" ]; then
        scriptdir=`dirname $0`
      else
        scriptpath=$(readlink -e "$0") || scriptpath=$0
        scriptdir=$(dirname "$scriptpath")
      fi
      echo java -mx3g -cp \"$scriptdir/*\" edu.stanford.nlp.pipeline.StanfordCoreNLP $*
      java -mx3g -cp "$scriptdir/*" edu.stanford.nlp.pipeline.StanfordCoreNLP $*

On a Linux shell, you pipe the output of one command into another command as its input like this:

$ program_a | program_b

In your case, that would be:

$ python /home/computer/Twitter/examples/stream_tweets.py | /home/computer/Standford/corenlp.sh

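For this to work, the last command in corenlp.sh, the one that invokes the Java StanfordCoreNLP program, has to read its input from the pipe, that is, from standard input (/dev/stdin). A command started inside a shell script normally inherits the script's standard input, so this may already be the case. To make it explicit, change the last line to: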

java -mx3g -cp "$scriptdir/*" edu.stanford.nlp.pipeline.StanfordCoreNLP $* < /dev/stdin
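If you would rather start the pipeline from Python than from a new bash script, the same `program_a | program_b` wiring can be sketched with the standard `subprocess` module. This is only a sketch: `run_pipeline` is a hypothetical helper name, and the two stand-in commands below are placeholders for your own `python stream_tweets.py` and `corenlp.sh` command lines.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd):
    """Run the equivalent of `producer_cmd | consumer_cmd`; return the consumer's stdout as bytes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe's read end so that only the consumer holds it.
    # If the consumer exits early, the producer then sees a broken pipe.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    # Stand-in commands (assumptions): replace them with your real command lines.
    producer_cmd = [sys.executable, "-c", "print('hello'); print('world')"]
    consumer_cmd = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    sys.stdout.buffer.write(run_pipeline(producer_cmd, consumer_cmd))
```

Note that `communicate()` collects the consumer's output until it exits, which suits a finite run. For an endless tweet stream you would more likely leave the consumer's stdout attached to the terminal instead of capturing it.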

To make your Python script print UTF-8 encoded text, change the loop at the end of the script to:

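import sys
for item in r:
    # Messages without a 'text' key are not strings, so convert them before encoding.
    text = item['text'] if 'text' in item else str(item)
    # sys.stdout.buffer only accepts bytes, so the newline must be bytes as well.
    sys.stdout.buffer.write(text.encode('utf-8') + b'\n')
    # Flush after each tweet so it reaches the pipe right away.
    sys.stdout.buffer.flush()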
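As an alternative to encoding every line by hand, a text stream can do the UTF-8 encoding for you. The sketch below assumes Python 3.7 or newer for `sys.stdout.reconfigure`. The helper name `write_item` and the sample items are made up for illustration, and collapsing embedded newlines is an optional choice so that each tweet stays on one line.

```python
import sys


def write_item(item, out):
    """Write one streamed item to the text stream `out` as a single line."""
    text = item["text"] if isinstance(item, dict) and "text" in item else str(item)
    # Tweets may contain line breaks; join the pieces so one tweet is one line.
    out.write(" ".join(text.splitlines()) + "\n")


if __name__ == "__main__":
    # Switch the existing stdout to UTF-8 and flush after every newline,
    # so a program reading from the pipe sees each tweet promptly.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", line_buffering=True)
    # Hypothetical sample data standing in for the items yielded by `r`.
    for item in [{"text": "caf\u00e9 \u2615"}, {"limit": {"track": 1}}]:
        write_item(item, sys.stdout)
```

In your script, the loop would then become `for item in r: write_item(item, sys.stdout)`. Setting the environment variable `PYTHONIOENCODING=utf-8` before starting the script is another way to get UTF-8 output without touching the code.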

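I don't think the Java program's startup time will be a problem. The shell starts both programs at the same time, and whatever the Python script writes waits in the pipe buffer until the Java program reads it. If the buffer fills up, the Python script simply blocks on its next write until the reader has drained some of it, so nothing is lost in the pipe.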
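You can check this blocking behaviour yourself with a small experiment. The sketch below is not your real setup: the child process is a stand-in for the slow-starting NLP program that sleeps first and then counts the lines it receives, and `feed_slow_reader` is a hypothetical helper name.

```python
import subprocess
import sys
import time

# Stand-in for the slow-starting reader: sleep, then count the lines on stdin.
SLOW_READER = (
    "import sys, time\n"
    "time.sleep(float(sys.argv[1]))\n"
    "print(sum(1 for _ in sys.stdin.buffer))\n"
)


def feed_slow_reader(n_lines, startup_delay):
    """Write n_lines to a late-starting reader; return (lines_received, seconds_spent_writing)."""
    reader = subprocess.Popen(
        [sys.executable, "-c", SLOW_READER, str(startup_delay)],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    line = b"x" * 99 + b"\n"  # 100 bytes per line
    start = time.monotonic()
    for _ in range(n_lines):
        # Blocks whenever the pipe buffer is full and the reader is not reading yet.
        reader.stdin.write(line)
    reader.stdin.close()  # flush what is left and signal end of input
    elapsed = time.monotonic() - start
    received = int(reader.stdout.read())
    reader.wait()
    return received, elapsed


if __name__ == "__main__":
    received, elapsed = feed_slow_reader(n_lines=5000, startup_delay=0.5)
    print(f"reader got {received} lines; writer spent {elapsed:.2f}s writing")
```

The writer sends about 500 KB, which is more than a typical pipe buffer holds, so it should spend roughly the reader's startup delay waiting, and the reader should still report every line.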
