How to convert each element in a frequency column into a new dataframe row?
So, I have this dataframe with an ID column, a TEXT column, and a TOKEN column holding the 3 most frequent words in TEXT together with their counts:
ID TEXT TOKEN
sentence1 Emma Woodhouse , handsome , clever , and rich ... [(emma, 2), (woodhouse, 2), (handsome, 1)]
sentence2 She was the youngest of the two daughters of a... [(youngest, 1), (two, 1), (daughters, 2)]
sentence3 Her mother had died too long ago for her to ha... [(mother, 2), (died, 1), (long, 1)]
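To make the answers below reproducible, here is a minimal sketch that rebuilds the sample frame from the display above. Two assumptions: the TEXT strings are copied exactly as truncated in the display (the full sentences are not shown), and TOKEN holds Python lists of (word, count) tuples, which is what the bracketed display suggests. If TOKEN actually holds strings, explode will not split them.

```python
import pandas as pd

# Sample frame rebuilt from the question's display.
# Assumption: TEXT is truncated ("...") exactly as shown there.
# Assumption: TOKEN holds lists of (word, count) tuples, not strings.
df = pd.DataFrame({
    'ID': ['sentence1', 'sentence2', 'sentence3'],
    'TEXT': [
        'Emma Woodhouse , handsome , clever , and rich ...',
        'She was the youngest of the two daughters of a...',
        'Her mother had died too long ago for her to ha...',
    ],
    'TOKEN': [
        [('emma', 2), ('woodhouse', 2), ('handsome', 1)],
        [('youngest', 1), ('two', 1), ('daughters', 2)],
        [('mother', 2), ('died', 1), ('long', 1)],
    ],
})
print(df)
```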
I want to turn each element of the TOKEN column into its own row in a new dataframe. I have tried many ways, but I am not able to get the token elements out of their column. The expected output would look like this:
WORD FREQ ID TEXT
emma 2 sentence1 Emma Woodhouse , handsome , clever , and rich ...
woodhouse 2 sentence1 Emma Woodhouse , handsome , clever , and rich ...
handsome 1 sentence1 Emma Woodhouse , handsome , clever , and rich ...
youngest 1 sentence2 She was the youngest of the two daughters of a...
two 1 sentence2 She was the youngest of the two daughters of a...
daughters 2 sentence2 She was the youngest of the two daughters of a...
I am beginning to think that what I am looking for is not possible... can you help me? Thanks!
Let us explode the TOKEN column, expand it into a new dataframe, then join it back with the original dataframe:
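import pandas as pd
s = df.explode('TOKEN', ignore_index=True)  # one (word, freq) tuple per row
# pop TOKEN off s, split its tuples into WORD/FREQ, then join the remaining ID/TEXT columns
pd.DataFrame([*s.pop('TOKEN')], columns=['WORD', 'FREQ']).join(s)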
WORD FREQ ID TEXT
0 emma 2 sentence1 Emma Woodhouse , handsome , clever , and rich ...
1 woodhouse 2 sentence1 Emma Woodhouse , handsome , clever , and rich ...
2 handsome 1 sentence1 Emma Woodhouse , handsome , clever , and rich ...
3 youngest 1 sentence2 She was the youngest of the two daughters of a...
4 two 1 sentence2 She was the youngest of the two daughters of a...
5 daughters 2 sentence2 She was the youngest of the two daughters of a...
6 mother 2 sentence3 Her mother had died too long ago for her to ha...
7 died 1 sentence3 Her mother had died too long ago for her to ha...
8 long 1 sentence3 Her mother had died too long ago for her to ha...
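A self-contained, runnable version of the explode/pop/join approach above, as a sketch. It assumes the sample data copied from the question (TEXT truncated as displayed) and that TOKEN holds lists of (word, count) tuples; the `ignore_index` argument of `DataFrame.explode` needs pandas 1.1 or later.

```python
import pandas as pd

# Sample data copied from the question (TEXT truncated as displayed).
# Assumption: TOKEN holds lists of (word, count) tuples.
df = pd.DataFrame({
    'ID': ['sentence1', 'sentence2', 'sentence3'],
    'TEXT': ['Emma Woodhouse , handsome , clever , and rich ...',
             'She was the youngest of the two daughters of a...',
             'Her mother had died too long ago for her to ha...'],
    'TOKEN': [[('emma', 2), ('woodhouse', 2), ('handsome', 1)],
              [('youngest', 1), ('two', 1), ('daughters', 2)],
              [('mother', 2), ('died', 1), ('long', 1)]],
})

# One row per (word, count) tuple; ignore_index gives a fresh 0..n-1 index
s = df.explode('TOKEN', ignore_index=True)
# pop removes TOKEN from s and returns it, so s keeps only ID and TEXT
words = pd.DataFrame([*s.pop('TOKEN')], columns=['WORD', 'FREQ'])
# Both frames share the same 0..n-1 index, so join lines them up row by row
result = words.join(s)
print(result)
```

Because `pop` mutates `s` in place, the TOKEN column does not reappear in the joined result.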
You can explode the TOKEN column, transform the result into a dataframe with the desired column names, and finally join it with the original dataframe column-wise:
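import pandas as pd
pd.concat(
    # explode gives one (word, freq) tuple per row; pd.Series splits it into columns 0 and 1
    [df.TOKEN.explode().transform(pd.Series)
       .rename(columns={0: 'WORD', 1: 'FREQ'}),
     # remaining ID/TEXT columns, aligned on the original row labels
     df.drop(columns="TOKEN")],
    axis=1)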
OUTPUT
WORD FREQ ID TEXT
0 emma 2 sentence1 Emma Woodhouse , handsome , clever ,...
0 woodhouse 2 sentence1 Emma Woodhouse , handsome , clever ,...
0 handsome 1 sentence1 Emma Woodhouse , handsome , clever ,...
1 youngest 1 sentence2 She was the youngest of the two daug...
1 two 1 sentence2 She was the youngest of the two daug...
1 daughters 2 sentence2 She was the youngest of the two daug...
2 mother 2 sentence3 Her mother had died too long ago for...
2 died 1 sentence3 Her mother had died too long ago for...
2 long 1 sentence3 Her mother had died too long ago for...
If you need a fresh 0..n-1 index, you can call reset_index(drop=True) at the end.
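A self-contained variation of the second approach, as a sketch rather than the answer's exact code. It makes two swaps: the exploded tuples are turned into WORD/FREQ columns with `pd.DataFrame(..., index=...)` instead of `transform(pd.Series)`, and the pieces are aligned with `DataFrame.join` instead of `pd.concat`. The index is then reset as described above. It assumes the sample data from the question (TEXT truncated as displayed) and that TOKEN holds lists of (word, count) tuples.

```python
import pandas as pd

# Sample data copied from the question (TEXT truncated as displayed).
# Assumption: TOKEN holds lists of (word, count) tuples.
df = pd.DataFrame({
    'ID': ['sentence1', 'sentence2', 'sentence3'],
    'TEXT': ['Emma Woodhouse , handsome , clever , and rich ...',
             'She was the youngest of the two daughters of a...',
             'Her mother had died too long ago for her to ha...'],
    'TOKEN': [[('emma', 2), ('woodhouse', 2), ('handsome', 1)],
              [('youngest', 1), ('two', 1), ('daughters', 2)],
              [('mother', 2), ('died', 1), ('long', 1)]],
})

# explode keeps the original row labels (0, 0, 0, 1, 1, 1, 2, 2, 2)
exploded = df['TOKEN'].explode()
# Split each (word, count) tuple into two columns, keeping those labels
words = pd.DataFrame(exploded.tolist(), index=exploded.index,
                     columns=['WORD', 'FREQ'])
# Many-to-one join on the index copies ID and TEXT onto every word row;
# reset_index(drop=True) then replaces the repeated labels with 0..n-1
result = words.join(df.drop(columns='TOKEN')).reset_index(drop=True)
print(result)
```

Building the WORD/FREQ frame from a plain list of tuples avoids creating one Series per row, which is typically cheaper than `pd.Series` applied element-wise on large inputs.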