I have a file with a million tweets. The first tweet occurred at 2013-04-15 20:17:18 UTC. I want to add a minsSince column to each later tweet row that holds the minutes elapsed since that first tweet.
I have found help with datetime here, and with converting time here, but when I put the two together I don't get the right times. It could be something to do with the UTC string at the end of each published_at value.
The error it throws is:
tweets['minsSince'] = tweets.apply(timesince,axis=1)
...
TypeError: ('string indices must be integers, not str', u'occurred at index 0')
Thanks for any help.
#Import stuff
from datetime import datetime
import time
import pandas as pd
from pandas import DataFrame

#Read the csv file
tweets = pd.read_csv('BostonTWEETS.csv')
tweets.head()

#The first tweet's published_at time
starttime = datetime (2013, 04, 15, 20, 17, 18)

#Run through the document and calculate the minutes since the first tweet
def timesince(row):
    minsSince = int()
    tweetTime = row['published_at']
    ts = time.strftime('%Y-%m-%d %H:%M:%S', time.strptime(tweetTime['published_at'], '%Y-%m-%d %H:%M:%S %UTC'))
    timediff = (tweetTime - starttime)
    minsSince.append("timediff")
    return ",".join(minsSince)

tweets['minsSince'] = tweets.apply(timesince,axis=1)
df = DataFrame(tweets)
print(df)
Sample csv file of first 5 rows.
#Import stuff
from datetime import datetime
import time
import pandas as pd
from pandas import DataFrame

#Read the csv file
tweets = pd.read_csv('sample.csv')
tweets.head()

#The first tweet's published_at time
starttime = tweets.published_at.values[0]
starttime = datetime.strptime(starttime, '%Y-%m-%d %H:%M:%S UTC')

#Run through the document and calculate the minutes since the first tweet
def timesince(row):
    ts = datetime.strptime(row, '%Y-%m-%d %H:%M:%S UTC')
    timediff = (ts - starttime)
    timediff = divmod(timediff.days * 86400 + timediff.seconds, 60)
    return timediff[0]

tweets['minSince'] = 0
tweets.minSince = tweets.published_at.map(timesince)
df = DataFrame(tweets)
print(df)
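For reference, here is a small standard-library sketch of the two things that appear to go wrong in the question's version. The sample string is just the first tweet's timestamp from the question. The names stamp, parsed, bad_format_failed and string_index_failed are made up for illustration.

```python
from datetime import datetime

# The first tweet's timestamp, as quoted in the question.
stamp = "2013-04-15 20:17:18 UTC"

# Writing "UTC" out in the format string makes strptime match it literally.
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S UTC")

# "%UTC" is read as the directive %U (week number of the year) followed by
# the literal "TC", so it does not match the text "UTC".
try:
    datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %UTC")
    bad_format_failed = False
except ValueError:
    bad_format_failed = True

# row['published_at'] is already a string, so indexing it again with a
# string key raises the TypeError reported in the question.
try:
    stamp["published_at"]
    string_index_failed = False
except TypeError:
    string_index_failed = True

print(parsed, bad_format_failed, string_index_failed)
```

So the value has to be parsed once, directly from the string, with UTC as plain text in the format.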
I hope this is what you are looking for.
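With a million rows, a vectorized version may be noticeably faster than calling a Python function per row. The following is only a sketch: the small DataFrame stands in for the real CSV, its timestamps after the first are invented, and it assumes every published_at value has the same "... UTC" layout.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; only the published_at column is assumed.
tweets = pd.DataFrame({
    "published_at": [
        "2013-04-15 20:17:18 UTC",
        "2013-04-15 20:18:30 UTC",
        "2013-04-15 21:17:18 UTC",
        "2013-04-16 20:17:18 UTC",
    ]
})

# Parse the whole column in one call; " UTC" is matched as literal text.
published = pd.to_datetime(tweets["published_at"], format="%Y-%m-%d %H:%M:%S UTC")

# Subtract the first timestamp, then floor the difference to whole minutes.
elapsed = published - published.iloc[0]
tweets["minsSince"] = (elapsed.dt.total_seconds() // 60).astype(int)

print(tweets)
```

This takes the first row as the start time, as in the code above. If the file is not sorted by time, published.min() could be used in place of published.iloc[0].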