Counting comma-separated strings in a dataframe in new columns

Question

I have the following df:

import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Sara', 'Paul', 'Guest'], 'Interaction': ['share,like,share,like,like,like', 'love,like,share,like,love,like', 'share,like,share,like,like,like,share,like,share,like,like,hug', 'share,like,care,like,like,like']})

Name    Interaction
0   John    share,like,share,like,like,like
1   Sara    love,like,share,like,love,like
2   Paul    share,like,share,like,like,like,share,like,sha...
3   Guest   share,like,care,like,like,like

I would like to create a third column that counts each individual interaction as an int.

What I did:

df['likes'] = df[df['Interaction'] == 'like'].groupby('Name')['Interaction'].transform(lambda x: x[x.str.contains('like')].count())

I did the same for share, care, etc., but it does not work!

Name    Interaction                                           likes     shares
0   John    share,like,share,like,like,like                     NaN     NaN
1   Sara    love,like,share,like,love,like                      NaN     NaN
2   Paul    share,like,share,like,like,like,share,like,sha...   NaN     NaN
3   Guest   share,like,care,like,like,like                      NaN     NaN

How can I count each interaction as an int and then find the total per row in a final column?

Answer 1

You can split the string on ',', explode it, and use value_counts:

df.join(df['Interaction'].str.split(',')
          .explode()
          .groupby(level=0).value_counts()
          .unstack(fill_value=0))

Output:

    Name                                        Interaction  care  hug  like  love  share
0   John                    share,like,share,like,like,like     0    0     4     0      2
1   Sara                     love,like,share,like,love,like     0    0     3     2      1
2   Paul  share,like,share,like,like,like,share,like,sha...     0    1     7     0      4
3  Guest                     share,like,care,like,like,like     1    0     4     0      1
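
The question also asks for a per-row total in a final column. A minimal sketch, assuming the same df as in the question; the names counts, result and Total are illustrative and do not come from the original post:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sara', 'Paul', 'Guest'],
    'Interaction': [
        'share,like,share,like,like,like',
        'love,like,share,like,love,like',
        'share,like,share,like,like,like,share,like,share,like,like,hug',
        'share,like,care,like,like,like',
    ],
})

# Per-row counts of each interaction, one column per distinct word.
counts = (df['Interaction'].str.split(',')
            .explode()
            .groupby(level=0).value_counts()
            .unstack(fill_value=0))

# 'Total' is a hypothetical column name: the sum of all counts in the row.
result = df.join(counts).assign(Total=counts.sum(axis=1))
print(result.drop(columns='Interaction'))
```

Keeping the counts in their own frame before joining makes the row sum straightforward, because it only sees the numeric count columns.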

Answer 2

First, str.split the column on the comma with expand=True to get a dataframe, stack it to get a series, and use str.get_dummies, which creates one indicator column per distinct word with a 1 where that word occurs. Then sum on level=0 to go back to the original shape, and join the result to the original dataframe.

df = df.join( df['Interaction'].str.split(',', expand=True)
                .stack()
                .str.get_dummies()
                .groupby(level=0).sum()  # sum(level=0) was removed in pandas 2.0
            )
print(df)
    Name                                        Interaction  care  hug  like  \
0   John                    share,like,share,like,like,like     0    0     4   
1   Sara                     love,like,share,like,love,like     0    0     3   
2   Paul  share,like,share,like,like,like,share,like,sha...     0    1     7   
3  Guest                     share,like,care,like,like,like     1    0     4   

   love  share  
0     0      2  
1     2      1  
2     0      4  
3     0      1
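
To see what each step contributes, here is a small sketch on a hypothetical two-row series (all variable names are illustrative). It uses groupby(level=0).sum() for the last step, which should behave the same as summing on level 0:

```python
import pandas as pd

s = pd.Series(['share,like,like', 'love,like'])

# One column per token position; shorter rows are padded with missing values.
wide = s.str.split(',', expand=True)

# One token per line, indexed by (original row, position).
# Missing padding contributes nothing to the final counts.
tokens = wide.stack()

# One 0/1 indicator column per distinct word, still one line per token.
dummies = tokens.str.get_dummies()

# Summing over the original row label (index level 0) turns indicators into counts.
counts = dummies.groupby(level=0).sum()
print(counts)
```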

Answer 3

Let us use pd.crosstab:

s = df.Interaction.str.split(',').explode()
df = df.join(pd.crosstab(s.index,s))
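
For reference, a self-contained sketch of the same idea on a small hypothetical frame (the names table and out are illustrative). pd.crosstab counts how often each pair of row label and word occurs, which is the per-row count table:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sara'],
                   'Interaction': ['share,like,like', 'love,like']})

# One token per line; the index repeats the original row label.
s = df['Interaction'].str.split(',').explode()

# Rows: original row labels, columns: distinct words, values: occurrence counts.
table = pd.crosstab(s.index, s)

out = df.join(table)
print(out)
```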

Answer 4

Hej, good answer Quang.

df.join(df['Interaction'].str.split(',')
          .explode()
          .groupby(level=0).value_counts()
          .unstack(fill_value=0))

How can we name the columns as we like? For example, Int_care, Int_love, etc., so that each new column can be related back to its parent column.
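
One possible way to do this, sketched under the assumption that a fixed prefix such as Int_ is all that is needed: DataFrame.add_prefix renames every count column before the join. The small frame below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sara'],
                   'Interaction': ['share,like,like', 'love,like']})

counts = (df['Interaction'].str.split(',')
            .explode()
            .groupby(level=0).value_counts()
            .unstack(fill_value=0)
            .add_prefix('Int_'))   # like -> Int_like, love -> Int_love, ...

out = df.join(counts)
print(out.columns.tolist())
```

For arbitrary names rather than a shared prefix, counts.rename(columns=...) with a dict or a function would be the analogous step.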

I am not able to comment on your answer as I do not have enough points.

Thanks.


Answer 1
3 2021-05-20 03:11:51

Answer 2
3 Accepted 2021-05-20 03:14:13

Answer 3
1 2021-05-20 03:23:52

Answer 4
-2 2021-12-30 10:38:50
