I have a DataFrame and want to loop through every cell in column c2
and count how many times each entire string appears in another column, c1
(0 if it never appears). Then print the results.
Example df:
id  c1              c2
0   luke skywalker  han solo
1   leia organa     r2d2
2   darth vader     finn
3   han solo        the emporer
4   han solo        c3po
5   finn            leia organa
6   r2d2            darth vader
Example printed result:
han solo 2
r2d2 1
finn 1
the emporer 0
c3po 0
leia organa 1
darth vader 1
I'm using Jupyter notebook with python and pandas. Thanks!
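You can use some NumPy magic: use count
with broadcasting to compare every c2 value against every c1 value.
Note that count tallies substring occurrences, so a c2 value such as "han" would also match "han solo".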
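import numpy as np
import pandas as pd

# np.char.count is the public name for numpy.core.defchararray.count
c1 = df.c1.values.astype(str)
c2 = df.c2.values.astype(str)

# c2[:, None] broadcasts against c1, giving one row of counts per c2 value
pd.Series(
np.char.count(c1, c2[:, None]).sum(1),
c2
)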
han solo 2
r2d2 1
finn 1
the emporer 0
c3po 0
leia organa 1
darth vader 1
dtype: int64
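If you need whole-string matches only, as the question asks, a minimal sketch is to broadcast an equality comparison instead of a substring count. The DataFrame below just rebuilds the question's example; the names exact and substring_hits are mine, not from the answer above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "c1": ["luke skywalker", "leia organa", "darth vader",
           "han solo", "han solo", "finn", "r2d2"],
    "c2": ["han solo", "r2d2", "finn", "the emporer",
           "c3po", "leia organa", "darth vader"],
})

c1 = df["c1"].to_numpy().astype(str)
c2 = df["c2"].to_numpy().astype(str)

# c2[:, None] has shape (7, 1) and c1 has shape (7,), so == broadcasts
# to a (7, 7) boolean grid; summing each row counts exact matches.
exact = pd.Series((c2[:, None] == c1).sum(axis=1), index=c2)
print(exact)

# Illustration of the difference with a hypothetical search term "han":
# substring counting finds it inside "han solo", equality does not.
substring_hits = int(np.char.count(c1, "han").sum())
exact_hits = int((c1 == "han").sum())
print(substring_hits, exact_hits)
```

On this data both approaches agree, because no c2 value is a fragment of a longer c1 value.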
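You can convert c1 to a categorical whose categories are the values of c2
(these must be unique) and then use value_counts. Recent pandas versions no longer accept categories= in astype, so pass a CategoricalDtype instead:
df.c1.astype(pd.CategoricalDtype(categories=df.c2.tolist())).value_counts(sort=False)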
Out[572]:
han solo 2
r2d2 1
finn 1
the emporer 0
c3po 0
leia organa 1
darth vader 1
Name: c1, dtype: int64
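A related sketch that skips the categorical conversion: count the values of c1 once, then reindex that tally by c2 and fill the missing names with 0. The DataFrame is the question's example rebuilt by hand, and the variable name counts is my own.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "c1": ["luke skywalker", "leia organa", "darth vader",
           "han solo", "han solo", "finn", "r2d2"],
    "c2": ["han solo", "r2d2", "finn", "the emporer",
           "c3po", "leia organa", "darth vader"],
})

# value_counts() tallies each distinct string in c1;
# reindex() looks up every c2 value in that tally, in c2's order,
# and fill_value=0 covers c2 values that never occur in c1.
counts = df["c1"].value_counts().reindex(df["c2"], fill_value=0)
print(counts)
```

Because value_counts always produces a unique index, this should also work when c2 itself contains repeated values.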
Or you can use crosstab:
pd.crosstab(df.c2, df.c1).sum().reindex(df.c2, fill_value=0)
Out[592]:
c2
han solo 2
r2d2 1
finn 1
the emporer 0
c3po 0
leia organa 1
darth vader 1
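Or, with a list comprehension that stores the counts in a new column c3 (Series.count does not take a value to look for, so compare and sum instead):
df['c3'] = [(df['c1'] == n).sum() for n in df['c2']]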
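If you only need the printed tally, a plain-Python sketch with collections.Counter also works. It assumes the two columns have been pulled out as lists (for example with df['c1'].tolist()); the lists below are typed in from the question's example, and tally and results are illustrative names.

```python
from collections import Counter

# Stand-ins for df['c1'].tolist() and df['c2'].tolist()
c1 = ["luke skywalker", "leia organa", "darth vader",
      "han solo", "han solo", "finn", "r2d2"]
c2 = ["han solo", "r2d2", "finn", "the emporer",
      "c3po", "leia organa", "darth vader"]

# Count every whole string in c1 once; a Counter returns 0 for missing keys
tally = Counter(c1)

# Look up each c2 value in the tally, keeping c2's order
results = [(name, tally[name]) for name in c2]

for name, n in results:
    print(name, n)
```

This does a single pass over c1 instead of rescanning it for every c2 value, which may matter on larger frames.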