Append number of times a string occurs in Pandas dataframe to another column
I'd like to create an extra column on this dataframe:
Index Value
0 22,88,22,24
1 24,24
2 22,24
3 11,22,24,12,24,24,22,24
4 22
So that the number of times a value occurs is stored in a new column:
Index Value 22 Count
0 22,88,22,24 2
1 24,24 0
2 22,24 1
3 11,22,24,12,24,24,22,24 2
4 22 1
I'd like to repeat this process for a number of different values within the Value column.
My minimal Python knowledge is telling me something like:
df['22 Count'] = df['Value'].count('22')
I've tried this and a few other versions but I must be missing something.
If you want to count only one value, use str.count:
df['22 Count'] = df['Value'].str.count('22')
print (df)
Value 22 Count
Index
0 22,88,22,24 2
1 24,24 0
2 22,24 1
3 11,22,24,12,24,24,22,24 2
4 22 1
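One caveat with str.count: the pattern is treated as a regular expression and matches substrings, so '22' would also be counted inside a value such as '122' or '222'. Below is a minimal sketch of one way to count only whole comma-separated values, assuming the values never contain commas themselves. The sample data and the names demo, naive and exact are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data that includes values containing '22' as a substring
demo = pd.DataFrame({'Value': ['22,88,22,24', '122,22', '222']})

# Plain substring count: also matches the '22' inside '122' and '222'
naive = demo['Value'].str.count('22')

# Only match '22' when it is not touching any non-comma character on either side
exact = demo['Value'].str.count(r'(?<![^,])22(?![^,])')

print(naive.tolist())
print(exact.tolist())
```

If the data only ever contains two-digit values, as in the question, both versions should give the same result.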
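To count every value at once, with one column per value:
import pandas as pd
from collections import Counter
# Split each row on commas, count the tokens, and expand the counts into columns
df1 = df['Value'].apply(lambda x: pd.Series(Counter(x.split(',')))).fillna(0).astype(int)
Or: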
df1 = pd.DataFrame([Counter(x.split(',')) for x in df['Value']]).fillna(0).astype(int)
Or:
from sklearn.feature_extraction.text import CountVectorizer
countvec = CountVectorizer()
counts = countvec.fit_transform(df['Value'].str.replace(',', ' '))
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
df1 = pd.DataFrame(counts.toarray(), columns=countvec.get_feature_names_out())
print (df1)
11 12 22 24 88
0 0 0 2 1 1
1 0 0 0 2 0
2 0 0 1 1 0
3 1 1 2 4 0
4 0 0 1 0 0
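A pandas-only alternative, not taken from the answer above, is to explode the split values into one row per value and cross-tabulate them. This is only a sketch under the assumption that df has the default integer index; the names long, row and value are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Value': ['22,88,22,24', '24,24', '22,24',
                             '11,22,24,12,24,24,22,24', '22']})

# One row per individual value, keeping the original row label in a column
long = df['Value'].str.split(',').explode().reset_index()
long.columns = ['row', 'value']

# Count how often each value occurs in each original row
df1 = pd.crosstab(long['row'], long['value'])
print(df1)
```

The result has one row per original row and one column per distinct value, like the df1 shown above.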
Finally, if you need to add the counts to the original DataFrame:
df = df.join(df1.add_suffix(' Count'))
print (df)
Value 11 Count 12 Count 22 Count 24 Count \
Index
0 22,88,22,24 0 0 2 1
1 24,24 0 0 0 2
2 22,24 0 0 1 1
3 11,22,24,12,24,24,22,24 1 1 2 4
4 22 0 0 1 0
88 Count
Index
0 1
1 0
2 0
3 0
4 0
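If only a handful of values are of interest, as the question suggests, it may be simpler to loop over just those values instead of building a column for every value. A minimal sketch, where the list wanted is a made-up example of the values to count:

```python
import pandas as pd

df = pd.DataFrame({'Value': ['22,88,22,24', '24,24', '22,24',
                             '11,22,24,12,24,24,22,24', '22']})

wanted = ['22', '24']  # hypothetical list of values to count

# Split once, then count each wanted value in the resulting lists
tokens = df['Value'].str.split(',')
for v in wanted:
    df[f'{v} Count'] = tokens.apply(lambda parts, v=v: parts.count(v))

print(df)
```

Because the strings are split into whole values first, '22' is not counted inside longer values such as '122'.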
You are close, but your syntax treats a Series as if it were a list (Series.count counts non-null entries, not occurrences of a value). Instead, you can use the count method after conversion to list:
from operator import methodcaller
df['22_Count'] = df['Value'].str.split(',').apply(methodcaller('count', '22'))
print(df)
Index Value 22_Count
0 0 22,88,22,24 2
1 1 24,24 0
2 2 22,24 1
3 3 11,22,24,12,24,24,22,24 2
4 4 22 1
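To illustrate the difference between the two count methods, here is a small sketch with made-up data. The names s, non_null and first_row are only for illustration:

```python
import pandas as pd

s = pd.Series(['22,88,22,24', '24,24', None])

# Series.count() takes no value to look for; it counts the non-null entries
non_null = s.count()

# list.count(x) counts occurrences of x, so split the string into a list first
first_row = s.iloc[0].split(',').count('22')

print(non_null, first_row)
```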
Use the methods shown by @jezrael.