I have this data structure:
word1 [('date', freq), ('date', freq), ...]
word2 [('date', freq), ('date', freq), ...]
and so on. To analyze the time series, I want to create a DataFrame. I can't figure out the best way to do it, as I'm quite new to Python (and I apologize for that). Should I use:
classmethod DataFrame.from_dict(data, orient='index', dtype=None)
There are many possible ways to start, but assuming words is structured as
words
Out[203]:
[[('2000-01-01', 1), ('2000-01-02', 5)],
 [('2000-01-01', 2), ('2000-01-02', 4)]]
the following is a natural starting point.
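import pandas as pd

# Start with an empty frame; rows are added one at a time below.
df = pd.DataFrame(index=range(0), columns=['date', 'word', 'freq'])
i = 0
for j, word in enumerate(words):  # j is the word's position in the list
    for d, f in word:             # d is the date string, f the frequency
        df.loc[i] = [d, j, f]
        i += 1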
Assigning to df.loc[i] with a label i that does not exist yet appends a new row. If you know the total number of entries from the start, you could change index=range(0) to a range of the correct length. Next steps would probably be
df.date = pd.to_datetime(df.date)
df.set_index(['date', 'word'], drop=True)
                freq
date       word
2000-01-01 0       1
2000-01-02 0       5
2000-01-01 1       2
2000-01-02 1       4
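If the data is keyed by word, as in the question, the same long table can probably be built in a single call instead of row by row. The following is only a sketch: the dict named data and its sample values are assumptions made up for illustration, as are the names records, long_df and wide_df. It flattens the dict into (word, date, freq) records and then pivots to one column per word.

```python
import pandas as pd

# Hypothetical sample data in the shape described in the question:
# each word maps to a list of (date, freq) tuples.
data = {
    "word1": [("2000-01-01", 1), ("2000-01-02", 5)],
    "word2": [("2000-01-01", 2), ("2000-01-02", 4)],
}

# Flatten to one record per (word, date) pair, then build the frame in one call.
records = [
    (word, date, freq)
    for word, series in data.items()
    for date, freq in series
]
long_df = pd.DataFrame(records, columns=["word", "date", "freq"])
long_df["date"] = pd.to_datetime(long_df["date"])

# Reshape to one column per word, indexed by date.
wide_df = long_df.pivot(index="date", columns="word", values="freq")
```

The wide layout tends to be convenient for time series work, since each word becomes its own column on a shared date index. Note that pivot raises an error if the same (date, word) pair appears more than once, so duplicated entries would need to be aggregated first.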