Pandas: per-group totals for all rows, for selected rows, and the percentage of the total

Question

Let us say I have a pandas DataFrame called mydf, i.e.:

import pandas as pd

mydf = pd.DataFrame({
    'type':['A','A','A', 'B','B','B', 'C'], 
    'state':['NY','CA','NY', 'NY','CA','CA', 'WY'], 
    'date':['2018-01-02','2018-01-04','2018-02-06', 
            '2018-01-01','2018-01-24','2018-02-10','2018-01-24']
})

Out[28]: 
         date state type
0  2018-01-02    NY    A
1  2018-01-04    CA    A
2  2018-02-06    NY    A
3  2018-01-01    NY    B
4  2018-01-24    CA    B
5  2018-02-10    CA    B
6  2018-01-24    WY    C

I want a table that counts the number of records per state and date (year-month only, not the daily date), once for records of type A only and once for all records (types A, B, C), and then gives the percentage of type A records within each group relative to that group's total.

I.e., the final output would be another pandas DataFrame with the following columns and values:

date_ym  state  total_count  total_type_A  percentage
20181    CA     2            1             50
20181    NY     2            1             50
20181    WY     1            0             0
20182    CA     1            0             0
20182    NY     1            1             100

I could create two tables, merge them, and then compute the percentage, but I was looking for a simpler one-liner...

Answer 1

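First, convert the dates to year-month strings. The date column holds strings, so it has to be parsed with pd.to_datetime before the .dt accessor can be used:

mydf["date"] = pd.to_datetime(mydf["date"]).dt.strftime("%Y%m")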

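Then use groupby.agg on the type column:

def total_type_A(x):
    # number of rows in the group whose type is "A"
    return sum(x == "A")

def percentage(x):
    # share of type "A" rows in the group, scaled to 0-100
    return 100 * sum(x == "A") / len(x)

# len gives the total row count per (date, state) group
mydf.groupby(["date", "state"])["type"].agg([len, total_type_A, percentage])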
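For reference, here is a minimal end-to-end sketch of this approach. It assumes the date column starts out as strings, as in the question, and scales the percentage to 0-100 to match the requested output. The variable name result and the rename of the len column to total_count are illustrative additions, not part of the original answer.

```python
import pandas as pd

mydf = pd.DataFrame({
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'state': ['NY', 'CA', 'NY', 'NY', 'CA', 'CA', 'WY'],
    'date': ['2018-01-02', '2018-01-04', '2018-02-06',
             '2018-01-01', '2018-01-24', '2018-02-10', '2018-01-24'],
})

# Parse the strings as datetimes, then keep only year and month (e.g. "201801").
mydf["date"] = pd.to_datetime(mydf["date"]).dt.strftime("%Y%m")

def total_type_A(x):
    # Count the rows of the group whose type is "A".
    return (x == "A").sum()

def percentage(x):
    # Share of type "A" rows in the group, scaled to 0-100.
    return 100 * (x == "A").sum() / len(x)

result = (
    mydf.groupby(["date", "state"])["type"]
        .agg([len, total_type_A, percentage])
        .rename(columns={"len": "total_count"})  # illustrative rename
        .reset_index()
)
print(result)
```

Note that "%Y%m" zero-pads the month, so the keys read 201801 and 201802 rather than the 20181 and 20182 shown in the question.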

Answer 2

Another alternative is to write a function that returns a Series with your desired columns and apply it to each group.

Full example:

import pandas as pd

df = pd.DataFrame({
    'type':['A','A','A', 'B','B','B', 'C'], 
    'state':['NY','CA','NY', 'NY','CA','CA', 'WY'], 
    'date':['2018-01-02','2018-01-04','2018-02-06', 
            '2018-01-01','2018-01-24','2018-02-10','2018-01-24']
})

df['date_ym'] = pd.to_datetime(df['date']).dt.strftime('%Y%#m')  # '%#m' (unpadded month) is for Windows; use '%-m' on Linux

def func(x):
    cnt = len(x)
    cnt_A = sum(x == 'A')
    return pd.Series({
        'total_count': cnt,
        'total_type_A': cnt_A,
        'percentage': cnt_A/cnt*100
    })

df = df.groupby(['date_ym','state'])['type'].apply(func).unstack().reset_index()

print(df)

Returns:

  date_ym state  total_count  total_type_A  percentage
0   20181    CA          2.0           1.0        50.0
1   20181    NY          2.0           1.0        50.0
2   20181    WY          1.0           0.0         0.0
3   20182    CA          1.0           0.0         0.0
4   20182    NY          1.0           1.0       100.0
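The counts above come back as floats because apply followed by unstack passes all three values through a single float Series. A possible alternative sketch, assuming pandas 0.25 or later for named aggregation: flag the type A rows in a helper column, then aggregate with size and sum. Building date_ym from the year and month also avoids the platform-specific '%#m' / '%-m' format codes. The names dates, is_A, and out are illustrative and do not come from the answers.

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'state': ['NY', 'CA', 'NY', 'NY', 'CA', 'CA', 'WY'],
    'date': ['2018-01-02', '2018-01-04', '2018-02-06',
             '2018-01-01', '2018-01-24', '2018-02-10', '2018-01-24'],
})

dates = pd.to_datetime(df['date'])
# Year followed by the unpadded month (e.g. "20181"), as in the question.
df['date_ym'] = dates.dt.year.astype(str) + dates.dt.month.astype(str)
# Helper column (hypothetical name): True where the row is of type A.
df['is_A'] = df['type'] == 'A'

out = (
    df.groupby(['date_ym', 'state'])
      .agg(total_count=('type', 'size'),     # all rows in the group
           total_type_A=('is_A', 'sum'))     # rows of type A in the group
      .reset_index()
)
out['percentage'] = out['total_type_A'] / out['total_count'] * 100
print(out)
```

One caveat with the unpadded format requested in the question: the keys sort as strings, so a zero-padded month such as '%Y%m' may be the safer choice when the data spans months 10 to 12.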


Answer 1: 2 votes, posted 2018-06-14 21:16:42

Answer 2: 2 votes, accepted, posted 2018-06-14 21:32:11
