
How to convert each element in a frequency column into a new dataframe row?

I have a dataframe with an ID column, a TEXT column, and a TOKEN column that holds the 3 most frequent words in the TEXT:

ID           TEXT                                               TOKEN
sentence1    Emma Woodhouse , handsome , clever , and rich ...  [(emma, 2), (woodhouse, 2), (handsome, 1)]
sentence2    She was the youngest of the two daughters of a...  [(youngest, 1), (two, 1), (daughters, 2)]
sentence3    Her mother had died too long ago for her to ha...  [(mother, 2), (died, 1), (long, 1)]

I want to convert each element of each row in the TOKEN column into a new row of a new dataframe. I have tried many approaches, but I have not been able to get the token elements out of their column. The expected output would look like this:

WORD         FREQ    ID           TEXT                                
emma         2       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
woodhouse    2       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
handsome     1       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
youngest     1       sentence2    She was the youngest of the two daughters of a... 
two          1       sentence2    She was the youngest of the two daughters of a...
daughters    2       sentence2    She was the youngest of the two daughters of a...

I am beginning to think that what I am looking for is not possible. Can you help me? Thanks!

Explode the TOKEN column, expand its tuples into a new dataframe, then join that back with the remaining columns of the original dataframe:

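# One row per (word, freq) tuple; ignore_index=True gives a fresh 0..n-1 index
s = df.explode('TOKEN', ignore_index=True)
# pop() removes TOKEN from s; the unpacked tuples become WORD and FREQ, joined on the index
pd.DataFrame([*s.pop('TOKEN')], columns=['WORD', 'FREQ']).join(s)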

        WORD  FREQ         ID                                               TEXT
0       emma     2  sentence1  Emma Woodhouse , handsome , clever , and rich ...
1  woodhouse     2  sentence1  Emma Woodhouse , handsome , clever , and rich ...
2   handsome     1  sentence1  Emma Woodhouse , handsome , clever , and rich ...
3   youngest     1  sentence2  She was the youngest of the two daughters of a...
4        two     1  sentence2  She was the youngest of the two daughters of a...
5  daughters     2  sentence2  She was the youngest of the two daughters of a...
6     mother     2  sentence3  Her mother had died too long ago for her to ha...
7       died     1  sentence3  Her mother had died too long ago for her to ha...
8       long     1  sentence3  Her mother had died too long ago for her to ha...
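
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same explode-and-join approach. The sample frame is rebuilt by hand from the table in the question (the TEXT values stay truncated exactly as posted), and the variable name `result` is only illustrative. It assumes pandas 1.1 or later, since `ignore_index` was added to `explode` in that release.

```python
import pandas as pd

# Sample frame mirroring the question; TEXT values are truncated as in the post.
df = pd.DataFrame({
    "ID": ["sentence1", "sentence2", "sentence3"],
    "TEXT": [
        "Emma Woodhouse , handsome , clever , and rich ...",
        "She was the youngest of the two daughters of a...",
        "Her mother had died too long ago for her to ha...",
    ],
    "TOKEN": [
        [("emma", 2), ("woodhouse", 2), ("handsome", 1)],
        [("youngest", 1), ("two", 1), ("daughters", 2)],
        [("mother", 2), ("died", 1), ("long", 1)],
    ],
})

# One row per tuple, with a fresh 0..n-1 index.
s = df.explode("TOKEN", ignore_index=True)

# pop() removes TOKEN from s and returns it; unpacking the tuples
# yields the WORD and FREQ columns, which align with s on the index.
result = pd.DataFrame([*s.pop("TOKEN")], columns=["WORD", "FREQ"]).join(s)
print(result)
```

Because `s` already has a unique index after `ignore_index=True`, the plain index-based `join` lines the rows up one to one.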

You can explode the TOKEN column, transform the result into a dataframe with the desired column names, and finally join it column-wise with the original dataframe:

pd.concat(
    [df.TOKEN.explode().transform(pd.Series)
       .rename(columns={0: 'WORD', 1: 'FREQ'}),
     df.drop(columns="TOKEN")],
    axis=1)

OUTPUT

        WORD  FREQ         ID                                     TEXT
0       emma     2  sentence1  Emma Woodhouse , handsome , clever ,...
0  woodhouse     2  sentence1  Emma Woodhouse , handsome , clever ,...
0   handsome     1  sentence1  Emma Woodhouse , handsome , clever ,...
1   youngest     1  sentence2  She was the youngest of the two daug...
1        two     1  sentence2  She was the youngest of the two daug...
1  daughters     2  sentence2  She was the youngest of the two daug...
2     mother     2  sentence3  Her mother had died too long ago for...
2       died     1  sentence3  Her mother had died too long ago for...
2       long     1  sentence3  Her mother had died too long ago for...

You can reset the index afterwards if you need to, for example with reset_index(drop=True).
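
To illustrate the index reset, here is a small sketch. It is a variant rather than the exact code above: it builds the WORD/FREQ frame directly from the exploded tuples and uses `join` in place of `concat`. The two-row sample frame and the names `tokens`, `pairs`, and `result` are assumptions made for the example.

```python
import pandas as pd

# Two-row subset of the question's data; TEXT is truncated as in the post.
df = pd.DataFrame({
    "ID": ["sentence1", "sentence2"],
    "TEXT": [
        "Emma Woodhouse , handsome , clever , and rich ...",
        "She was the youngest of the two daughters of a...",
    ],
    "TOKEN": [
        [("emma", 2), ("woodhouse", 2), ("handsome", 1)],
        [("youngest", 1), ("two", 1), ("daughters", 2)],
    ],
})

# Without ignore_index, explode keeps the source row label,
# so the index repeats: 0, 0, 0, 1, 1, 1.
tokens = df["TOKEN"].explode()

# Split each (word, freq) tuple into two columns, keeping the repeated index.
pairs = pd.DataFrame(tokens.tolist(), index=tokens.index, columns=["WORD", "FREQ"])

# Join the remaining columns on the repeated index, then renumber the rows.
result = pairs.join(df.drop(columns="TOKEN")).reset_index(drop=True)
print(result)
```

Passing `drop=True` discards the old repeated labels instead of keeping them as an extra column.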
