Count occurrences of unique values in a pandas dataframe across multiple columns
I have the following dataframe in pandas:
df = pd.DataFrame({'a' : ['hello', 'world', 'great', 'hello'], 'b' : ['world', None, 'hello', 'world'], 'c' : [None, 'hello', 'great', None]})
I would like to count how often each unique value of column 'a' occurs in every row, across all columns (including column 'a' itself), and save the counts into new columns named after those values, such as 'hello_count', 'world_count' and so on. Hence the end result would be something like
df = pd.DataFrame({'a' : ['hello', 'world', 'great', 'hello'], 'b' : ['world', None, 'hello', 'world'], 'c' : [None, 'hello', 'great', None], 'hello_count' : [1,1,1,1], 'world_count' : [1,1,0,1], 'great_count' : [0,0,2,0]})
I tried
df['a', 'b', 'a'].groupby('a').agg(['count])
but that did not work. Any help is really appreciated.
Let's use pd.get_dummies and groupby (df1 below is the dataframe from the question):
(df1.assign(**pd.get_dummies(df1)
              .pipe(lambda x: x.groupby(x.columns.str[2:], axis=1)
                               .sum())))
Output:
       a      b      c  great  hello  world
0  hello  world   None      0      1      1
1  world   None  hello      0      1      1
2  great  hello  great      2      1      0
3  hello  world   None      0      1      1
Here is the above solution step by step.
df_gd = pd.get_dummies(df1)
print(df_gd)
   a_great  a_hello  a_world  b_hello  b_world  c_great  c_hello
0        0        1        0        0        1        0        0
1        0        0        1        0        0        0        1
2        1        0        0        1        0        1        0
3        0        1        0        0        1        0        0
df_gb = df_gd.groupby(df_gd.columns.str[2:], axis=1).sum()
print(df_gb)
   great  hello  world
0      0      1      1
1      0      1      1
2      2      1      0
3      0      1      1
df_out = df1.join(df_gb)
print(df_out)
Output:
       a      b      c  great  hello  world
0  hello  world   None      0      1      1
1  world   None  hello      0      1      1
2  great  hello  great      2      1      0
3  hello  world   None      0      1      1
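As far as I know, recent pandas releases deprecate the axis=1 argument of groupby, and the output above lacks the '_count' suffix the question asked for. The following is a minimal sketch of the same get_dummies idea under those assumptions: it groups on the transposed frame instead of using axis=1, and the names dummies, counts and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['hello', 'world', 'great', 'hello'],
                   'b': ['world', None, 'hello', 'world'],
                   'c': [None, 'hello', 'great', None]})

# One indicator column per (column, value) pair. With an empty prefix the
# indicator columns are named after the value alone, so names repeat.
dummies = pd.get_dummies(df, prefix='', prefix_sep='', dtype=int)

# Sum the indicator columns that share a name: transpose, group the
# (now duplicated) index labels, sum, and transpose back.
counts = dummies.T.groupby(level=0).sum().T

# Keep only the values that occur in column 'a' and apply the requested naming.
counts = counts[df['a'].unique()].add_suffix('_count')

result = df.join(counts)
print(result)
```

Selecting counts[df['a'].unique()] also drops any value that appears only in 'b' or 'c', which matches the question's wording but is not needed for this particular data.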
Using df.apply in a loop simplifies the job. For each unique value in column 'a', every row is checked for how many of its elements equal that string:
for ss in df.a.unique():
    df[ss+"_count"] = df.apply(lambda row: sum(map(lambda x: x==ss, row)), axis=1)
print(df)
Output:
       a      b      c  hello_count  world_count  great_count
0  hello  world   None            1            1            0
1  world   None  hello            1            1            0
2  great  hello  great            1            0            2
3  hello  world   None            1            1            0
You can create a dictionary d_unique = {} and store the number of unique values of each column in it, keyed by column name. Consider a dataframe named data_rnr:
d_unique={}
for col in data_rnr.columns:
    print(data_rnr[col].name)
    print(len(data_rnr[col].unique()))
    d_unique[data_rnr[col].name]=len(data_rnr[col].unique())
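The loop above should be equivalent to a single call to DataFrame.nunique. The sketch below uses the question's data as a stand-in for data_rnr, which is an assumption since the answer does not show that frame. Passing dropna=False makes missing values count as a distinct value, mirroring len(unique()).

```python
import pandas as pd

# Stand-in for data_rnr (assumption: any dataframe works here).
data_rnr = pd.DataFrame({'a': ['hello', 'world', 'great', 'hello'],
                         'b': ['world', None, 'hello', 'world'],
                         'c': [None, 'hello', 'great', None]})

# Number of distinct values per column, with missing values counted as one value.
d_unique = data_rnr.nunique(dropna=False).to_dict()

# The same counts with missing values ignored (the pandas default).
d_unique_no_na = data_rnr.nunique().to_dict()

print(d_unique)
print(d_unique_no_na)
```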