
How to get the frequency of a specific value in each row of pandas dataframe

I have this pandas DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        ['yes', 'no', np.nan],
        ['no', 'yes', 'no'],
        [np.nan, 'yes', 'yes'],
        ['no', 'no', 'no']
    ],
    index=pd.Index(['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4'], name='ID'),
    columns=['class1', 'class2', 'class3']
)

print(df)
Out:

      class1 class2 class3
ID                        
xyz_1    yes     no    NaN
xyz_2     no    yes     no
xyz_3    NaN    yes    yes
xyz_4     no     no     no

I want to count how often 'yes' and 'no' (and NaN) occur in the class columns of each row, and get a new DataFrame that looks like this:

    ID         yes     no       nan
xyz_1          1       1        1
xyz_2          1       2        0
xyz_3          2       0        1
xyz_4          0       3        0

I looked at this question, but I don't want the sum, I want the counts.

Any ideas?

Use pd.get_dummies, but set dummy_na to True so that NaN gets its own indicator column:

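# groupby(..., axis=1) is deprecated since pandas 2.1, so group on the transpose instead.
pd.get_dummies(
    df, prefix='', prefix_sep='', dummy_na=True, dtype=int
).T.groupby(level=0).sum().T  # Sum the indicator columns that share a label.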

       nan  no  yes
ID                 
xyz_1    1   1    1
xyz_2    0   2    1
xyz_3    1   0    2
xyz_4    0   3    0 
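
As a cross-check on the counts above, here is a minimal sketch that does not use get_dummies at all. It is a different technique (plain element-wise comparisons summed per row), and the name `counts` is only illustrative. It assumes the set of labels is known in advance, which holds for the 'yes'/'no'/NaN data in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        ['yes', 'no', np.nan],
        ['no', 'yes', 'no'],
        [np.nan, 'yes', 'yes'],
        ['no', 'no', 'no'],
    ],
    index=pd.Index(['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4'], name='ID'),
    columns=['class1', 'class2', 'class3'],
)

# Each comparison yields a boolean frame; summing along axis=1 counts
# the True values in every row.
counts = pd.DataFrame({
    'yes': df.eq('yes').sum(axis=1),
    'no': df.eq('no').sum(axis=1),
    'nan': df.isna().sum(axis=1),
})
print(counts)
```

The columns come out in the order requested in the question (yes, no, nan) because the dictionary keys are written in that order.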

You can also use melt + crosstab:

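# 'ID' is the index in the question's frame, so turn it into a column before melting.
newdf = df.reset_index().melt('ID')

pd.crosstab(newdf['ID'], newdf['value'].fillna('NaN'))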
Out[8]: 
value  NaN  no  yes
ID                 
xyz_1    1   1    1
xyz_2    0   2    1
xyz_3    1   0    2
xyz_4    0   3    0
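
The melt + crosstab idea can be sketched end to end as follows. The variable names `long_df` and `counts` are illustrative, and the final `reindex` step is an addition that is only needed if the columns should appear in the yes/no/nan order shown in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        ['yes', 'no', np.nan],
        ['no', 'yes', 'no'],
        [np.nan, 'yes', 'yes'],
        ['no', 'no', 'no'],
    ],
    index=pd.Index(['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4'], name='ID'),
    columns=['class1', 'class2', 'class3'],
)

# Long format: one row per (ID, class column) pair, with the cell in 'value'.
long_df = df.reset_index().melt(id_vars='ID')

# crosstab ignores missing values, so replace NaN with a string label first.
counts = pd.crosstab(long_df['ID'], long_df['value'].fillna('nan'))

# Optional: put the columns in the order used in the question.
counts = counts.reindex(columns=['yes', 'no', 'nan'], fill_value=0)
print(counts)
```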

Using pd.get_dummies and summing the indicator columns by suffix:

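# df = df.set_index('ID')  # Run this line only if 'ID' is a column, not the index

# One indicator column per (class column, value) pair, e.g. 'class1_yes'.
df2 = pd.get_dummies(df, dummy_na=True)

# Sum, per row, all indicator columns that end with the same value.
df['no']  = df2[df2.columns[df2.columns.str.endswith('no')]].sum(axis=1)
df['yes'] = df2[df2.columns[df2.columns.str.endswith('yes')]].sum(axis=1)
df['nan'] = df2[df2.columns[df2.columns.str.endswith('nan')]].sum(axis=1)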
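
If the counts should live in a separate frame instead of being appended to df, the same suffix-matching idea can be written as a loop. This is only a sketch: `dummies`, `counts`, and the hard-coded label list are assumptions made for illustration, and it relies on the default `class1_yes`-style column names that get_dummies produces.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[
        ['yes', 'no', np.nan],
        ['no', 'yes', 'no'],
        [np.nan, 'yes', 'yes'],
        ['no', 'no', 'no'],
    ],
    index=pd.Index(['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4'], name='ID'),
    columns=['class1', 'class2', 'class3'],
)

# Indicator columns such as 'class1_no', 'class1_yes', 'class1_nan'.
dummies = pd.get_dummies(df, dummy_na=True, dtype=int)

# For each label, select the columns ending in '_<label>' and add them up per row.
counts = pd.DataFrame({
    label: dummies.filter(regex=f'_{label}$').sum(axis=1)
    for label in ['yes', 'no', 'nan']
})
print(counts)
```

Building a new frame leaves the original df untouched, which avoids mixing the raw class columns with the derived counts.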


 