Python Pandas: Counting the frequency of a specific value in each row of a DataFrame
How to get the frequency of a specific value in each row of a pandas DataFrame
I have this pandas DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        ['yes', 'no', np.nan],
        ['no', 'yes', 'no'],
        [np.nan, 'yes', 'yes'],
        ['no', 'no', 'no']
    ],
    index=pd.Index(['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4'], name='ID'),
    columns=['class1', 'class2', 'class3']
)
print(df)
Out:
      class1 class2 class3
ID
xyz_1    yes     no    NaN
xyz_2     no    yes     no
xyz_3    NaN    yes    yes
xyz_4     no     no     no
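For each row, I want to count how often 'yes', 'no' and NaN occur across the class columns, and get a new DataFrame like this:
ID     yes  no  nan
xyz_1    1   1    1
xyz_2    1   2    0
xyz_3    2   0    1
xyz_4    0   3    0
I looked at this question, but I don't want the sum, I want the count.
Any ideas?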
Use pd.get_dummies, but set dummy_na to True:
# Note: groupby(..., axis=1) is deprecated in recent pandas releases.
pd.get_dummies(
    df, prefix='', prefix_sep='', dummy_na=True
).groupby(level=0, axis=1).sum()  # Sum the dummy columns that share a label.
nan no yes
ID
xyz_1 1 1 1
xyz_2 0 2 1
xyz_3 1 0 2
xyz_4 0 3 0
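If `groupby(..., axis=1)` warns or fails on your pandas version, the same grouping can probably be done by transposing first. This is only a sketch of that variant, not the answer's original code; the names `dummies` and `counts` are mine, and `dtype=int` is passed so the dummy columns are integers rather than booleans.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        ['yes', 'no', np.nan],
        ['no', 'yes', 'no'],
        [np.nan, 'yes', 'yes'],
        ['no', 'no', 'no'],
    ],
    index=pd.Index(['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4'], name='ID'),
    columns=['class1', 'class2', 'class3'],
)

# With an empty prefix and separator, every class column contributes
# dummy columns with the same labels: 'no', 'yes' and 'nan'.
dummies = pd.get_dummies(df, prefix='', prefix_sep='', dummy_na=True, dtype=int)

# Transpose so the duplicated labels form the row index, sum the rows that
# share a label, then transpose back to one row per ID.
counts = dummies.T.groupby(level=0).sum().T
print(counts)
```

The columns come out in sorted order (nan, no, yes); use `counts[['yes', 'no', 'nan']]` if you need the order from the question.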
You can try melt + crosstab:
newdf = df.reset_index().melt('ID')  # 'ID' is the index in the question, so make it a column first
pd.crosstab(newdf.ID, newdf.value.fillna('NaN'))
Out[8]:
value NaN no yes
ID
xyz_1 1 1 1
xyz_2 0 2 1
xyz_3 1 0 2
xyz_4 0 3 0
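For reference, here is the melt + crosstab idea as a self-contained sketch that also puts the columns in the order asked for in the question. It assumes 'ID' is the index, as in the question's DataFrame; the names `long_df` and `table` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        ['yes', 'no', np.nan],
        ['no', 'yes', 'no'],
        [np.nan, 'yes', 'yes'],
        ['no', 'no', 'no'],
    ],
    index=pd.Index(['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4'], name='ID'),
    columns=['class1', 'class2', 'class3'],
)

# Long format: one row per (ID, class column) pair.
long_df = df.reset_index().melt('ID')

# crosstab skips missing values, so replace NaN with a placeholder string first.
table = pd.crosstab(long_df['ID'], long_df['value'].fillna('NaN'))

# Reorder the columns to match the layout requested in the question.
table = table.reindex(columns=['yes', 'no', 'NaN'], fill_value=0)
print(table)
```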
# df = df.set_index('ID')  # Uncomment only if 'ID' is a column rather than the index
df2 = pd.get_dummies(df, dummy_na=True)  # columns like class1_no, class1_yes, class1_nan
df['no'] = df2[df2.columns[df2.columns.str.endswith('no')]].sum(axis=1)
df['yes'] = df2[df2.columns[df2.columns.str.endswith('yes')]].sum(axis=1)
df['nan'] = df2[df2.columns[df2.columns.str.endswith('nan')]].sum(axis=1)
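The three near-identical lines above can be folded into a loop that builds a separate result frame instead of adding columns to `df`. This is a rough sketch under the same assumptions ('ID' is already the index, default `get_dummies` column names such as `class1_no`); `values` and `counts` are names I made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        ['yes', 'no', np.nan],
        ['no', 'yes', 'no'],
        [np.nan, 'yes', 'yes'],
        ['no', 'no', 'no'],
    ],
    index=pd.Index(['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4'], name='ID'),
    columns=['class1', 'class2', 'class3'],
)

# One indicator column per (class column, value) pair, e.g. 'class1_no'.
dummies = pd.get_dummies(df, dummy_na=True, dtype=int)

# For each value, sum the indicator columns whose name ends with '_<value>'.
values = ['yes', 'no', 'nan']
counts = pd.DataFrame({
    value: dummies.loc[:, dummies.columns.str.endswith(f'_{value}')].sum(axis=1)
    for value in values
})
print(counts)
```

Matching on `'_' + value` rather than the bare value makes it a little less likely that an unrelated column name is picked up by accident.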