Python: In a DataFrame, how do I loop through all strings of one column, check whether they appear in another column, and count them?
I've got a DataFrame and want to loop through all cells in column c2 and count how many times each entire string appears in another column, c1, if it exists. Then print the results.
Example df:
id  c1              c2
0   luke skywalker  han solo
1   leia organa     r2d2
2   darth vader     finn
3   han solo        the emporer
4   han solo        c3po
5   finn            leia organa
6   r2d2            darth vader
Example printed result:
han solo 2
r2d2 1
finn 1
the emporer 0
c3po 0
leia organa 1
darth vader 1
I'm using Jupyter Notebook with Python and pandas. Thanks!
You can use some NumPy magic. Use count and broadcasting to compare each combination.
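import pandas as pd
from numpy.char import count  # numpy.char is the public namespace for these string functions

c1 = df.c1.values.astype(str)
c2 = df.c2.values.astype(str)

# c2[:, None] broadcasts against c1, giving one row of counts per c2 value
pd.Series(
    count(c1, c2[:, None]).sum(1),
    c2
)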
han solo 2
r2d2 1
finn 1
the emporer 0
c3po 0
leia organa 1
darth vader 1
dtype: int64
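One caveat: `count` counts substring occurrences, so a c2 value such as `han` would also be counted inside `han solo`. If only whole-string matches should count, as the question asks, the same broadcasting idea can be used with `==` instead. A minimal sketch, assuming the example frame from the question (the names `matches` and `exact_counts` are only illustrative):

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "c1": ["luke skywalker", "leia organa", "darth vader",
           "han solo", "han solo", "finn", "r2d2"],
    "c2": ["han solo", "r2d2", "finn", "the emporer",
           "c3po", "leia organa", "darth vader"],
})

c1 = df["c1"].values.astype(str)
c2 = df["c2"].values.astype(str)

# c2[:, None] has shape (n, 1) and c1 has shape (n,), so the comparison
# broadcasts to an (n, n) boolean matrix: row i marks where c2[i] equals c1[j]
matches = c2[:, None] == c1

# Summing each row gives the number of whole-string matches per c2 value
exact_counts = pd.Series(matches.sum(axis=1), index=c2)
print(exact_counts)
```

Note that this builds an n-by-n matrix, so it is probably best kept for small or medium-sized columns.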
You can convert c1 to a category dtype whose categories are the c2 values, then use value_counts:
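# astype('category', categories=...) was removed from pandas; pass a CategoricalDtype instead
df.c1.astype(pd.CategoricalDtype(categories=df.c2.tolist())).value_counts(sort=False)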
Out[572]:
han solo 2
r2d2 1
finn 1
the emporer 0
c3po 0
leia organa 1
darth vader 1
Name: c1, dtype: int64
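The categorical approach works because `value_counts` on a categorical Series reports every declared category, including those that never occur, while c1 values that are not among the categories become NaN and are dropped. A minimal sketch of that behaviour, assuming the example frame from the question and that the c2 values are unique (categories must not contain duplicates); the names `as_cat` and `counts` are only illustrative:

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "c1": ["luke skywalker", "leia organa", "darth vader",
           "han solo", "han solo", "finn", "r2d2"],
    "c2": ["han solo", "r2d2", "finn", "the emporer",
           "c3po", "leia organa", "darth vader"],
})

# Declare the c2 values as the only valid categories (assumes they are unique)
c2_dtype = pd.CategoricalDtype(categories=df["c2"].tolist())

# Values of c1 outside the categories ("luke skywalker") become NaN
as_cat = df["c1"].astype(c2_dtype)

# Zero-count categories are kept; NaN is dropped by default
counts = as_cat.value_counts(sort=False)
print(counts)
```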
Or you can do:
pd.crosstab(df.c2,df.c1).sum().reindex(df.c2,fill_value=0)
Out[592]:
c2
han solo 2
r2d2 1
finn 1
the emporer 0
c3po 0
leia organa 1
darth vader 1
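In the crosstab line, summing over the rows yields one count per distinct c1 value, and `reindex` then looks those counts up in c2 order, filling in 0 for values that never occur in c1. The same result can be sketched without the intermediate table by calling `value_counts` on c1 directly. This is an alternative formulation rather than what the answer above uses, and it assumes the example frame from the question:

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "c1": ["luke skywalker", "leia organa", "darth vader",
           "han solo", "han solo", "finn", "r2d2"],
    "c2": ["han solo", "r2d2", "finn", "the emporer",
           "c3po", "leia organa", "darth vader"],
})

# Count how often each distinct string occurs in c1
c1_counts = df["c1"].value_counts()

# Look up each c2 value in those counts; missing values get 0
result = c1_counts.reindex(df["c2"], fill_value=0)
print(result)
```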
# Series.count() counts non-null values, not occurrences, so compare against each c2 value instead
df['c3'] = [(df['c1'] == n).sum() for n in df['c2']]