pandas groupby on multiple columns
I have a data set that contains state codes and their statuses.
code status
1 AZ a
2 CA b
3 KS c
4 MO c
5 NY d
6 AZ d
7 MO a
8 MO b
9 MN b
10 NV a
11 NV e
12 MO f
13 NY a
14 NY a
15 NY b
I want to filter this data set down to the codes that have an a status and count how many such rows each code has. Example output will be:
code status
1 AZ a
2 MO a
3 NY a
AZ = 1, MO = 1, NY = 2
I used df.groupby("code").loc[df.status == 'a'] but didn't have any luck. Any help appreciated!
Let's filter the dataframe for status a first, then group by code and count.
df[df.status == 'a'].groupby('code').size()
Output:
code
AZ 1
MO 1
NV 1
NY 2
dtype: int64
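To run this end to end, here is a minimal sketch that rebuilds the 15 rows from the question and applies the same filter-then-count step. The variable name `counts` is my own choice, and the frame uses the default 0-based index rather than the 1-based one shown in the question.

```python
import pandas as pd

# The 15 rows from the question, in the same order.
df = pd.DataFrame({
    "code": ["AZ", "CA", "KS", "MO", "NY", "AZ", "MO", "MO",
             "MN", "NV", "NV", "MO", "NY", "NY", "NY"],
    "status": ["a", "b", "c", "c", "d", "d", "a", "b",
               "b", "a", "e", "f", "a", "a", "b"],
})

# The boolean mask keeps only rows whose status is "a";
# groupby("code").size() then counts the remaining rows per code.
counts = df[df["status"] == "a"].groupby("code").size()
print(counts)
```

The filter has to come before the groupby because the mask indexes the original dataframe, not the grouped object.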
I've recreated your dataset:
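import pandas as pd

# First list holds the codes, second list the statuses, in row order.
data = [["AZ","CA", "KS","MO","NY","AZ","MO","MO","MN","NV","NV","MO","NY","NY" ,"NY"],
["a","b","c","c","d","d","a","b","b","a","e","f","a","a","b"]]
df = pd.DataFrame(data)
df = df.T  # transpose so each (code, status) pair becomes one row
df.columns = ["code","status" ]
# Keep only status "a", then count rows per (code, status) pair.
df[df["status"] == "a"].groupby(["code", "status"]).size()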
gives
code status
AZ a 1
MO a 1
NV a 1
NY a 2
dtype: int64
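Grouping on both columns returns a Series indexed by (code, status). If you would rather have a flat table closer to the layout in the question, one possible sketch is to call `reset_index` on the result. The column name `count` and the variable `table` are my own choices, not something from the question.

```python
import pandas as pd

# Same 15 rows as in the question.
df = pd.DataFrame({
    "code": ["AZ", "CA", "KS", "MO", "NY", "AZ", "MO", "MO",
             "MN", "NV", "NV", "MO", "NY", "NY", "NY"],
    "status": ["a", "b", "c", "c", "d", "d", "a", "b",
               "b", "a", "e", "f", "a", "a", "b"],
})

table = (
    df[df["status"] == "a"]        # keep only rows with status "a"
    .groupby(["code", "status"])   # group on both columns
    .size()                        # number of rows in each group
    .reset_index(name="count")     # turn the index levels back into columns
)
print(table)
```

NV shows up in the result because row 10 of the question's data has status a, even though the example output in the question leaves it out.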