How to convert each element of a frequency column into a new dataframe row?

Question

So, I have this dataframe with an ID column, a TEXT column, and a TOKEN column that holds the 3 most frequent words in TEXT, each paired with its count:

ID           TEXT                                               TOKEN
sentence1    Emma Woodhouse , handsome , clever , and rich ...  [(emma, 2), (woodhouse, 2), (handsome, 1)]
sentence2    She was the youngest of the two daughters of a...  [(youngest, 1), (two, 1), (daughters, 2)]
sentence3    Her mother had died too long ago for her to ha...  [(mother, 2), (died, 1), (long, 1)]

I want to turn each element of the TOKEN column into its own row in a new dataframe. I have tried many ways, but I can't get the token elements out of their column. The expected output would look like this:

WORD         FREQ    ID           TEXT                                
emma         2       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
woodhouse    2       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
handsome     1       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
youngest     1       sentence2    She was the youngest of the two daughters of a... 
two          1       sentence2    She was the youngest of the two daughters of a...
daughters    2       sentence2    She was the youngest of the two daughters of a...

I am beginning to think that what I'm looking for isn't possible... can you help me? Thanks!

Answer 1

Let us explode the TOKEN column, expand its tuples into a new dataframe, then join it back with the original dataframe:

import pandas as pd
s = df.explode('TOKEN', ignore_index=True)  # one row per (word, freq) tuple
pd.DataFrame([*s.pop('TOKEN')], columns=['WORD', 'FREQ']).join(s)  # split tuples, join ID/TEXT back

        WORD  FREQ         ID                                               TEXT
0       emma     2  sentence1  Emma Woodhouse , handsome , clever , and rich ...
1  woodhouse     2  sentence1  Emma Woodhouse , handsome , clever , and rich ...
2   handsome     1  sentence1  Emma Woodhouse , handsome , clever , and rich ...
3   youngest     1  sentence2  She was the youngest of the two daughters of a...
4        two     1  sentence2  She was the youngest of the two daughters of a...
5  daughters     2  sentence2  She was the youngest of the two daughters of a...
6     mother     2  sentence3  Her mother had died too long ago for her to ha...
7       died     1  sentence3  Her mother had died too long ago for her to ha...
8       long     1  sentence3  Her mother had died too long ago for her to ha...
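A self-contained sketch of this answer, assuming TOKEN holds Python lists of (word, count) tuples as in the question; the TEXT strings are copied in their truncated form from the printout. The `ignore_index` argument of `DataFrame.explode` should need pandas 1.1 or later.

```python
import pandas as pd

# Sample input from the question (TEXT kept truncated, as printed there)
df = pd.DataFrame({
    'ID': ['sentence1', 'sentence2', 'sentence3'],
    'TEXT': [
        'Emma Woodhouse , handsome , clever , and rich ...',
        'She was the youngest of the two daughters of a...',
        'Her mother had died too long ago for her to ha...',
    ],
    'TOKEN': [
        [('emma', 2), ('woodhouse', 2), ('handsome', 1)],
        [('youngest', 1), ('two', 1), ('daughters', 2)],
        [('mother', 2), ('died', 1), ('long', 1)],
    ],
})

# One row per (word, freq) tuple; ignore_index renumbers the rows 0..n-1
s = df.explode('TOKEN', ignore_index=True)

# pop() removes TOKEN from s and returns it; each tuple becomes a WORD/FREQ row,
# and join() attaches the remaining ID/TEXT columns on the shared 0..n-1 index
result = pd.DataFrame([*s.pop('TOKEN')], columns=['WORD', 'FREQ']).join(s)
print(result)
```

Because both frames share the same fresh 0..n-1 index after `ignore_index=True`, the join is a plain row-by-row alignment.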

Answer 2

You can explode the TOKEN column, transform it into a dataframe with the desired column names, and finally join it with the original dataframe column-wise:

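import pandas as pd

pd.concat(
    [df.TOKEN.explode()                   # one row per tuple; original index repeated
     .transform(pd.Series)                # split each (word, freq) tuple into columns 0 and 1
     .rename(columns={0: 'WORD', 1: 'FREQ'}),
     df.drop(columns='TOKEN')],           # ID/TEXT, aligned on the shared index
    axis=1)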

OUTPUT

        WORD  FREQ         ID                                     TEXT
0       emma     2  sentence1  Emma Woodhouse , handsome , clever ,...
0  woodhouse     2  sentence1  Emma Woodhouse , handsome , clever ,...
0   handsome     1  sentence1  Emma Woodhouse , handsome , clever ,...
1   youngest     1  sentence2  She was the youngest of the two daug...
1        two     1  sentence2  She was the youngest of the two daug...
1  daughters     2  sentence2  She was the youngest of the two daug...
2     mother     2  sentence3  Her mother had died too long ago for...
2       died     1  sentence3  Her mother had died too long ago for...
2       long     1  sentence3  Her mother had died too long ago for...

You can reset the index at the end (for example with `reset_index(drop=True)`) if you need to.
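A self-contained sketch of the same concat-based idea, with one swapped-in step flagged: it builds the WORD/FREQ frame with `pd.DataFrame(list_of_tuples, index=...)` instead of `transform(pd.Series)`, which avoids constructing one Series per token. It assumes the same input as the question (TEXT truncated as printed) and ends with `reset_index(drop=True)` to get a clean 0..n-1 index.

```python
import pandas as pd

# Sample input from the question (TEXT kept truncated, as printed there)
df = pd.DataFrame({
    'ID': ['sentence1', 'sentence2', 'sentence3'],
    'TEXT': [
        'Emma Woodhouse , handsome , clever , and rich ...',
        'She was the youngest of the two daughters of a...',
        'Her mother had died too long ago for her to ha...',
    ],
    'TOKEN': [
        [('emma', 2), ('woodhouse', 2), ('handsome', 1)],
        [('youngest', 1), ('two', 1), ('daughters', 2)],
        [('mother', 2), ('died', 1), ('long', 1)],
    ],
})

# explode() keeps the original row label, so a sentence's tokens share one index value
tokens = df['TOKEN'].explode()

# Build WORD/FREQ from the tuples, reusing the exploded (repeated) index
words = pd.DataFrame(tokens.tolist(), index=tokens.index, columns=['WORD', 'FREQ'])

# concat(axis=1) aligns on the index, so ID/TEXT repeat for every token;
# reset_index(drop=True) then discards the repeated labels
result = pd.concat([words, df.drop(columns='TOKEN')], axis=1).reset_index(drop=True)
print(result)
```

Keeping the exploded index on `words` is what lets `pd.concat` line each token up with its source sentence; an empty TOKEN list would explode to NaN and break the `tolist()` step, so this sketch assumes every row has at least one token.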


2 answers

Answer 1: score 1, accepted, 2022-07-11 12:37:43

Answer 2: score 1, 2022-07-11 12:53:53
