The task is to read tweets and separate them into groups by the hour they were created (Month-Day-Year-Hour). The tweets from each hour are to be stored in a separate file in a folder, with the file name "Mon-Day-Year-Hour.txt".
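One way to build that file name, as a sketch: parse the `created_at` string with the same format string the sorting code below uses, then format it back with `strftime`. This assumes "Mon" means the three-letter month abbreviation (`%b`) and that day and hour are zero-padded; `hour_filename` is a hypothetical helper name, not part of any library.

```python
import datetime

# Twitter's created_at format, e.g. "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S +0000 %Y'


def hour_filename(created_at):
    """Return a "Mon-Day-Year-Hour.txt" name for a tweet's created_at string.

    Assumes "Mon" is the abbreviated month name (%b); day and hour are zero-padded.
    """
    created = datetime.datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    return created.strftime('%b-%d-%Y-%H') + '.txt'


print(hour_filename("Wed Oct 10 20:19:24 +0000 2018"))  # Oct-10-2018-20.txt
```

Note that `%a` and `%b` depend on the current locale; in the default C locale they are English abbreviations.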
I am new to Python; I only started coding in it a couple of days ago, for a class. Right now I have loaded the tweets from the file into a list and sorted the list by creation time. I have looked into the itertools.groupby() function, but I'm not sure how to use it correctly, or how to apply it to this problem.
Here's a bit of what I have so far:
import json
tweets = []  # one dict per tweet
with open("CrimeReport.txt", "r") as f:  # one JSON object per line
    for line in f:
        tweet = json.loads(line)
        tweets.append(tweet)
Sorted tweets:
import datetime
sorted_tweets = sorted(
    tweets,
    key=lambda item: datetime.datetime.strptime(item['created_at'], '%a %b %d %H:%M:%S +0000 %Y'))
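Because `sorted_tweets` is already in time order, all tweets from the same hour sit next to each other, which is exactly the situation `itertools.groupby()` expects. A minimal sketch of grouping by hour and writing one file per group, assuming the output folder is called `hourly_tweets` (a made-up name) and that each tweet should be written back as one JSON object per line, like the input file; `hour_key` and `write_hourly_files` are hypothetical helpers.

```python
import datetime
import json
import os
from itertools import groupby

TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S +0000 %Y'


def hour_key(tweet):
    """Group key: the tweet's creation time truncated to the hour, as "Mon-Day-Year-Hour"."""
    created = datetime.datetime.strptime(tweet['created_at'], TWITTER_TIME_FORMAT)
    return created.strftime('%b-%d-%Y-%H')


def write_hourly_files(sorted_tweets, folder='hourly_tweets'):
    """Write each hour's tweets to folder/Mon-Day-Year-Hour.txt, one JSON object per line.

    Assumes sorted_tweets is already sorted by created_at, so groupby sees each
    hour as a single consecutive run. Returns the list of file paths written.
    """
    os.makedirs(folder, exist_ok=True)
    paths = []
    for key, hour_tweets in groupby(sorted_tweets, key=hour_key):
        path = os.path.join(folder, key + '.txt')
        with open(path, 'w') as out:
            for tweet in hour_tweets:
                out.write(json.dumps(tweet) + '\n')
        paths.append(path)
    return paths
```

Usage would be `write_hourly_files(sorted_tweets)`. The sorting step matters here: on unsorted input the same hour could come up in more than one run, and opening the file with `'w'` would overwrite the earlier batch (opening with `'a'` instead would append).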
I apologize for the poor formatting.
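from itertools import groupby

v = [1, 2, 3, 4, 5, 6]  # example data; replace with your own list
dic = {}
# x % 2 is the key function: 1 for odd numbers, 0 for even ones
for key, value in groupby(v, lambda x: x % 2):
    # groupby only merges *consecutive* items with the same key,
    # so a key can come up more than once; accumulate its items here
    dic.setdefault(key, []).extend(value)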
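groupby takes your data and a key function that returns each item's group id (here x % 2, i.e. odd or even). Because groupby only groups consecutive items that share a key, iterating over it and adding each group to a dictionary gives you the data completely grouped, even if it is not sorted. But if you have a lot of data, the dictionary approach may not be fast enough.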
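To see why the answer accumulates into a dictionary instead of using the `groupby()` output directly: `groupby()` starts a new group every time the key changes, so on unsorted input the same key can show up more than once. A small sketch with made-up numbers and the same `x % 2` key:

```python
from itertools import groupby

numbers = [1, 3, 2, 5, 4]  # illustrative data, deliberately not sorted by key


def parity(x):
    """Key function: 1 for odd numbers, 0 for even ones."""
    return x % 2


# groupby yields one (key, run) pair per consecutive run, so keys repeat
runs = [(key, list(run)) for key, run in groupby(numbers, parity)]
print(runs)  # [(1, [1, 3]), (0, [2]), (1, [5]), (0, [4])]

# Merging the runs in a dict gives one entry per key
merged = {}
for key, run in groupby(numbers, parity):
    merged.setdefault(key, []).extend(run)
print(merged)  # {1: [1, 3, 5], 0: [2, 4]}

# Sorting by the same key first makes every key appear in a single run
print([key for key, _ in groupby(sorted(numbers, key=parity), parity)])  # [0, 1]
```

This is also why, in the question, sorting the tweets by time first lets `groupby()` be used on its own: tweets from the same hour already form a single run.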