Counting comma-separated strings in a DataFrame in a new column
I have the following df:
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Sara', 'Paul', 'Guest'], 'Interaction': ['share,like,share,like,like,like', 'love,like,share,like,love,like', 'share,like,share,like,like,like,share,like,share,like,like,hug','share,like,care,like,like,like']})
Name Interaction
0 John share,like,share,like,like,like
1 Sara love,like,share,like,love,like
2 Paul share,like,share,like,like,like,share,like,sha...
3 Guest share,like,care,like,like,like
I would like to create a third column that counts the number of each individual interaction as an int.
What I did:
df['likes'] = df[df['Interaction'] == 'like'].groupby('Name')['Interaction'].transform(lambda x: x[x.str.contains('like')].count())
I did the same for share, care, etc., but it does not work:
Name Interaction likes shares
0 John share,like,share,like,like,like NaN NaN
1 Sara love,like,share,like,love,like NaN NaN
2 Paul share,like,share,like,like,like,share,like,sha... NaN NaN
3 Guest share,like,care,like,like,like NaN NaN
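A likely reason for the NaN columns: df['Interaction'] == 'like' compares the whole string with 'like', and no row's string is exactly 'like', so the filtered frame is empty and the assignment aligns to nothing. The sketch below (an illustration, not taken from the answers) checks that and then counts exact tokens per row by splitting on the comma first; the column names likes and shares simply mirror the question's output.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sara', 'Paul', 'Guest'],
    'Interaction': ['share,like,share,like,like,like',
                    'love,like,share,like,love,like',
                    'share,like,share,like,like,like,share,like,share,like,like,hug',
                    'share,like,care,like,like,like'],
})

# The original filter tests whether the *entire* string equals 'like';
# no row matches, so the filtered frame is empty and the new column is all NaN.
print((df['Interaction'] == 'like').any())  # False

# Split each string into a list of tokens, then count exact matches per row.
tokens = df['Interaction'].str.split(',')
df['likes'] = tokens.apply(lambda xs: xs.count('like'))
df['shares'] = tokens.apply(lambda xs: xs.count('share'))
print(df[['Name', 'likes', 'shares']])
```

This works one word at a time; the answers below build a column for every distinct word in one pass.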
How can I count each interaction as an int, and then find the total per row in a final column?
You can split the string by ",", explode it and use value_counts:
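df.join(df['Interaction'].str.split(',')      # one list of tokens per row
        .explode()                             # one row per token, original index repeated
        .groupby(level=0).value_counts()       # count each token within each original row
        .unstack(fill_value=0))                # tokens become columns, absent tokens get 0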
Output:
Name Interaction care hug like love share
0 John share,like,share,like,like,like 0 0 4 0 2
1 Sara love,like,share,like,love,like 0 0 3 2 1
2 Paul share,like,share,like,like,like,share,like,sha... 0 1 7 0 4
3 Guest share,like,care,like,like,like 1 0 4 0 1
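The question also asks for a per-row total in a final column, which this answer does not show. A minimal sketch, assuming the total is simply the sum of the per-interaction counts (the column name total is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sara', 'Paul', 'Guest'],
    'Interaction': ['share,like,share,like,like,like',
                    'love,like,share,like,love,like',
                    'share,like,share,like,like,like,share,like,share,like,like,hug',
                    'share,like,care,like,like,like'],
})

# Same counting pipeline as the answer: split, explode, count per row, widen.
counts = (df['Interaction'].str.split(',')
          .explode()
          .groupby(level=0).value_counts()
          .unstack(fill_value=0))

# Row-wise sum of the per-interaction counts gives the total per row.
counts['total'] = counts.sum(axis=1)
df = df.join(counts)
print(df[['Name', 'total']])
```

For non-empty strings the same total is also df['Interaction'].str.count(',') + 1, which avoids building the count columns if only the total is needed.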
First, str.split the column on the comma and expand the result into a DataFrame, then stack it to get a Series. Use str.get_dummies on that Series: it creates a column for each distinct word and puts a 1 in the column matching each value. Finally, sum over index level 0 (the original row) to get back to the original shape, and join the result to the original DataFrame:
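df = df.join(df['Interaction'].str.split(',', expand=True)  # one column per token position
             .stack()                   # long Series indexed by (row, position)
             .str.get_dummies()         # one 0/1 indicator column per distinct word
             .groupby(level=0).sum()    # add indicators back up per original row; replaces .sum(level=0), removed in pandas 2.0
             )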
print(df)
Name Interaction care hug like love share
0 John share,like,share,like,like,like 0 0 4 0 2
1 Sara love,like,share,like,love,like 0 0 3 2 1
2 Paul share,like,share,like,like,like,share,like,sha... 0 1 7 0 4
3 Guest share,like,care,like,like,like 1 0 4 0 1
Let us use pd.crosstab:
s = df.Interaction.str.split(',').explode()
df = df.join(pd.crosstab(s.index,s))
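pd.crosstab tabulates each exploded token against the row label it came from, so it should give the same counts as the value_counts answer. The sketch below checks that on the question's data; it compares labels and values rather than full frames because the two results can carry different axis names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sara', 'Paul', 'Guest'],
    'Interaction': ['share,like,share,like,like,like',
                    'love,like,share,like,love,like',
                    'share,like,share,like,like,like,share,like,share,like,like,hug',
                    'share,like,care,like,like,like'],
})

# One token per row; the original row label is repeated for each token.
s = df['Interaction'].str.split(',').explode()

via_value_counts = s.groupby(level=0).value_counts().unstack(fill_value=0)
# Rows are the original row labels, columns are the distinct tokens.
via_crosstab = pd.crosstab(s.index, s)

# Axis *names* may differ between the two results, so compare labels and values only.
same = (list(via_value_counts.columns) == list(via_crosstab.columns)
        and list(via_value_counts.index) == list(via_crosstab.index)
        and (via_value_counts.to_numpy() == via_crosstab.to_numpy()).all())
print(same)  # True
```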
Hej, good answer, Quang.
df.join(df['Interaction'].str.split(',')
.explode()
.groupby(level=0).value_counts()
.unstack(fill_value=0))
How can we name the columns as we like? For example, Int_care, Int_love, etc., so that they can be related back to the parent column.
I am not able to comment on your answer, as I do not have enough reputation points.
Thanks.
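One way to get names like Int_care is to prefix the count columns before joining them. The sketch below uses DataFrame.add_prefix on the value_counts result; the prefix Int_ comes from the question, and this is one option rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sara', 'Paul', 'Guest'],
    'Interaction': ['share,like,share,like,like,like',
                    'love,like,share,like,love,like',
                    'share,like,share,like,like,like,share,like,share,like,like,hug',
                    'share,like,care,like,like,like'],
})

counts = (df['Interaction'].str.split(',')
          .explode()
          .groupby(level=0).value_counts()
          .unstack(fill_value=0)
          .add_prefix('Int_'))  # care -> Int_care, like -> Int_like, ...
df = df.join(counts)
print(df.columns.tolist())
# ['Name', 'Interaction', 'Int_care', 'Int_hug', 'Int_like', 'Int_love', 'Int_share']
```

If the new names need more than a prefix, .rename(columns=...) with a dict or a function can be used in place of add_prefix.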