Hello, I'm very new to programming; I started only a few weeks ago. I would greatly appreciate some help. Thanks in advance!
My Python script (stream_tweets.py) streams 200-300 tweets per minute from Twitter. The script is located at '/home/computer/Twitter/examples/stream_tweets.py'
I have an NLP (Natural Language Processing) bash script that analyzes sentences and prints the results to the terminal. The NLP script (corenlp.sh) is located at '/home/computer/Standford/corenlp.sh'
If I create a new bash script, how do I pipe the tweets into the NLP script? What would this script look like?
My Python tweet script (stream_tweets.py) needs to output the text in UTF-8 format. How do I change the script to do so?
The NLP script takes a while to load. If tweets start pouring in before it has finished loading, will that affect my script? If so, what can I do about it, and how?
Here is the stream_tweets.py script:
from TwitterAPI import TwitterAPI

TRACK_TERM = 'keyword1,keyword2,keyword3'

CONSUMER_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
CONSUMER_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
ACCESS_TOKEN_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
ACCESS_TOKEN_SECRET = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

api = TwitterAPI(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)
r = api.request('statuses/filter', {'track': TRACK_TERM})

for item in r:
    print(item['text'] if 'text' in item else item)
Take a look at the NLP script
OS=`uname`
# Macs (BSD) don't support readlink -e
if [ "$OS" == "Darwin" ]; then
  scriptdir=`dirname $0`
else
  scriptpath=$(readlink -e "$0") || scriptpath=$0
  scriptdir=$(dirname "$scriptpath")
fi
echo java -mx3g -cp \"$scriptdir/*\" edu.stanford.nlp.pipeline.StanfordCoreNLP $*
java -mx3g -cp "$scriptdir/*" edu.stanford.nlp.pipeline.StanfordCoreNLP $*
On a Linux shell, you pipe the output of one command into the input of another command like this:
$ program_a | program_b
So in your case this would look like this:
$ python /home/computer/Twitter/examples/stream_tweets.py | /home/computer/Standford/corenlp.sh
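A command started by a script normally inherits the script's standard input, so this may already work unchanged. If it does not, you might need to change corenlp.sh so that the last command, which invokes the Java StanfordCoreNLP program, explicitly reads its input from the pipe (/dev/stdin). In that case, change the last line to: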
java -mx3g -cp "$scriptdir/*" edu.stanford.nlp.pipeline.StanfordCoreNLP $* < /dev/stdin
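To make your Python script print UTF-8 encoded strings, change the end of the script to the following (this uses sys.stdout.buffer, which exists in Python 3):
import sys
for item in r:
    # Non-tweet messages have no 'text' key, so fall back to their string form.
    text = item['text'] if 'text' in item else str(item)
    # Write bytes directly so the output is UTF-8 regardless of the locale.
    sys.stdout.buffer.write(text.encode('utf-8') + b'\n')
    # Flush so each tweet reaches the pipe immediately instead of sitting in a buffer.
    sys.stdout.buffer.flush()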
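A slightly more defensive version of this output step could be sketched as below. The helper name write_utf8_line is hypothetical, and two choices in it are assumptions rather than anything the original scripts require: non-tweet messages are serialized as JSON, and embedded newlines are collapsed so that every tweet occupies exactly one line of input for the NLP program.

```python
import io
import json
import sys


def write_utf8_line(item, out=None):
    """Write one streamed item as a single UTF-8 encoded line.

    `item` is either a tweet dict with a 'text' key or some other message.
    `out` is any binary stream; it defaults to sys.stdout.buffer.
    """
    out = out if out is not None else sys.stdout.buffer
    if isinstance(item, dict) and 'text' in item:
        # Collapse embedded newlines/tabs so each tweet stays on one line.
        text = ' '.join(item['text'].split())
    else:
        # Assumption: serialize non-tweet messages as JSON instead of dropping them.
        text = json.dumps(item, ensure_ascii=False)
    out.write(text.encode('utf-8') + b'\n')
    # Flush so the line reaches the pipe right away.
    out.flush()


if __name__ == '__main__':
    # Demo with an in-memory stream standing in for the real tweet source.
    demo = io.BytesIO()
    write_utf8_line({'text': 'caf\u00e9 time\nsecond line'}, demo)
    write_utf8_line({'limit': {'track': 5}}, demo)
    sys.stdout.buffer.write(demo.getvalue())
```

In stream_tweets.py the loop body would then shrink to a single call, write_utf8_line(item).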
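I don't think it will be a problem if the Java program takes some time to start. The shell creates the pipe before either program runs, so the Python script can write into it right away. Once the pipe buffer is full, the script will simply block on writing until the Java program starts reading and empties the buffer.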
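If you want to convince yourself of this without starting CoreNLP, the following sketch may help. It uses a hypothetical stand-in reader (a tiny Python child process that sleeps before it touches its input) in place of the Java program, and feed_slow_reader is an illustrative name, not part of either script. The parent writes more data than a typical pipe buffer holds, so it is expected to block partway through and resume once the reader wakes up.

```python
import subprocess
import sys

# Hypothetical stand-in for the slow-starting NLP program: it sleeps before
# reading stdin, then prints how many lines it received.
SLOW_READER = (
    "import sys, time\n"
    "time.sleep(0.5)\n"
    "print(sum(1 for _ in sys.stdin.buffer))\n"
)


def feed_slow_reader(n_lines):
    """Write n_lines to a reader that is not ready yet; return the count it saw."""
    with subprocess.Popen(
        [sys.executable, "-c", SLOW_READER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    ) as proc:
        for i in range(n_lines):
            # write() blocks whenever the pipe buffer is full, then continues
            # as soon as the reader has consumed some data.
            proc.stdin.write(("tweet number %d\n" % i).encode("utf-8"))
        proc.stdin.close()  # end of input, so the reader can finish counting
        received = int(proc.stdout.read())
    return received


if __name__ == "__main__":
    print(feed_slow_reader(20000))
```

The reader should report the same number of lines that were written, which is the point: a slow start delays the writer but does not drop data.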