Pandas groupby / pivot on multiple columns by date

Question

I'm trying to get the following output from this df. It was constructed from a Django query, which was converted to a df:

import pandas as pd

# Message is the app's Django model
messages = Message.objects.all()
df = pd.DataFrame.from_records(messages.values())

+---+-----------------+------------+---------------------+
|   |    date_time    | error_desc |        text         |
+---+-----------------+------------+---------------------+
| 0 | 3/31/2019 12:35 | Error msg  | Hello there         |
| 1 | 3/31/2019 12:35 |            | Nothing really here |
| 2 | 4/1/2019 12:35  | Error msg  | What if I told you  |
| 3 | 4/1/2019 12:35  |            | Yes                 |
| 4 | 4/1/2019 12:35  | Error Msg  | Maybe               |
| 5 | 4/2/2019 12:35  |            | Sure I could        |
| 6 | 4/2/2019 12:35  |            | Hello again         |
+---+-----------------+------------+---------------------+
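To reproduce this without Django, the sample rows can be typed in directly. A minimal sketch, assuming the blank error_desc cells are missing values (NaN) rather than empty strings; if the Django field stores empty strings instead, any notnull()-based check would count every row as an error.

```python
import numpy as np
import pandas as pd

# Rows transcribed from the table above; blank error_desc cells are
# assumed to be NaN, not empty strings.
df = pd.DataFrame({
    'date_time': pd.to_datetime([
        '3/31/2019 12:35', '3/31/2019 12:35',
        '4/1/2019 12:35', '4/1/2019 12:35', '4/1/2019 12:35',
        '4/2/2019 12:35', '4/2/2019 12:35',
    ], format='%m/%d/%Y %H:%M'),
    'error_desc': ['Error msg', np.nan, 'Error msg', np.nan,
                   'Error Msg', np.nan, np.nan],
    'text': ['Hello there', 'Nothing really here', 'What if I told you',
             'Yes', 'Maybe', 'Sure I could', 'Hello again'],
})
```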

Desired output:

+-----------+-------------+--------+-----------------------------+--------------+
|   date    | Total count | Errors | Greeting (start with hello) | errors/total |
+-----------+-------------+--------+-----------------------------+--------------+
| 3/31/2019 |           2 |      1 |                           1 | 50%          |
| 4/1/2019  |           3 |      2 |                           0 | 66.67%       |
| 4/2/2019  |           2 |      0 |                           1 | 0%           |
+-----------+-------------+--------+-----------------------------+--------------+

I can partially get there with the following code, but it seems a roundabout way of doing it. I mark each row 'Yes'/'No' depending on whether it meets each condition, and then run a groupby.

import numpy as np
import pandas as pd

df['date'] = df['date_time'].dt.date
# flag each row 'Yes'/'No' for each condition
df['greeting'] = np.where(df["text"].str.lower().str.startswith('hello'), "Yes", "No")
df['error'] = np.where(df["error_desc"].notnull(), "Yes", "No")

# parentheses let the method chain span several lines
(df.set_index("date")
    .groupby(level="date")
    .apply(lambda g: g.apply(pd.Series.value_counts))
    .unstack(level=1)
    .fillna(0))

This produces the counts, but spread across multiple Yes/No columns.

I could do some manipulation after this point, but is there a more efficient way to produce the output I'm after?
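One way to get the whole table in a single pass is sketched below, assuming blank error_desc cells are NaN. It builds boolean flags instead of 'Yes'/'No' strings (summing a boolean column counts its True values) and then uses groupby with named aggregation, available since pandas 0.25. The output column names (total_count, errors, greetings) are illustrative choices, not part of the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank error_desc cells assumed to be NaN.
df = pd.DataFrame({
    'date_time': pd.to_datetime(['2019-03-31 12:35', '2019-03-31 12:35',
                                 '2019-04-01 12:35', '2019-04-01 12:35',
                                 '2019-04-01 12:35', '2019-04-02 12:35',
                                 '2019-04-02 12:35']),
    'error_desc': ['Error msg', np.nan, 'Error msg', np.nan,
                   'Error Msg', np.nan, np.nan],
    'text': ['Hello there', 'Nothing really here', 'What if I told you',
             'Yes', 'Maybe', 'Sure I could', 'Hello again'],
})

# Boolean flags: True/False sum directly, so no 'Yes'/'No' strings are needed.
flags = df.assign(
    date=df['date_time'].dt.date,
    error=df['error_desc'].notna(),
    greeting=df['text'].str.lower().str.startswith('hello'),
)

# Named aggregation: each keyword is an output column built from
# a (source column, aggregation function) pair.
summary = flags.groupby('date').agg(
    total_count=('text', 'size'),
    errors=('error', 'sum'),
    greetings=('greeting', 'sum'),
)
summary['errors/total'] = summary['errors'] / summary['total_count']
print(summary)
```

The ratio is left as a fraction (0.5, 0.666...); multiply by 100 or format it as text only at the end, once no further arithmetic is needed.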

Answer 1

You can use a lambda that computes several columns per group at once:

# uses the 'date', 'error' and 'greeting' columns built in the question
summary = df.groupby('date').apply(lambda x:
                                   pd.Series({'total_count': len(x),
                                              'error_count': (x['error'] == 'Yes').sum(),
                                              'hello_count': (x['greeting'] == 'Yes').sum()}))

To calculate the ratio:

summary['errors/total'] = summary['error_count'] / summary['total_count']
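The same per-group function can also compute the ratio itself. A minimal sketch, assuming a small hand-built frame whose 'date', 'error' and 'greeting' columns mirror the ones the question creates (values transcribed from the sample table); summarize is a hypothetical helper name. Selecting only the needed columns before apply avoids passing the grouping column into the function.

```python
import pandas as pd

# Hypothetical frame mirroring the question's 'date', 'error', 'greeting' columns.
df = pd.DataFrame({
    'date': ['3/31/2019'] * 2 + ['4/1/2019'] * 3 + ['4/2/2019'] * 2,
    'error': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'No'],
    'greeting': ['Yes', 'No', 'No', 'No', 'No', 'No', 'Yes'],
})

def summarize(x):
    # Each key of the returned Series becomes a column of the result.
    total = len(x)
    errors = (x['error'] == 'Yes').sum()
    return pd.Series({'total_count': total,
                      'error_count': errors,
                      'hello_count': (x['greeting'] == 'Yes').sum(),
                      'errors/total': errors / total})

# sort=False keeps the dates in order of appearance (they are strings here).
summary = df.groupby('date', sort=False)[['error', 'greeting']].apply(summarize)
print(summary)
```

Because each returned Series mixes integer counts with a float ratio, pandas upcasts every column to float; use astype(int) on the count columns if integers are needed.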

Answer 2

Here is what I tried, which gave me the output you wanted:

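import pandas as pd

# reduce the timestamps to plain dates so rows group by day
df['date_time'] = pd.to_datetime(df['date_time']).dt.date
df1 = pd.DataFrame()
df1['total count'] = df['date_time'].groupby(df['date_time']).count()
# count() skips NaN, so this counts rows that have an error_desc
df1['errors'] = df['error_desc'].groupby(df['date_time']).count()
# rows whose text starts with "hello" (case-insensitive)
df1['Greeting'] = df['text'].groupby(df['date_time']).apply(lambda x: x[x.str.lower().str.startswith('hello')].count())
# error percentage, rounded to 2 decimal places
df1['errors/total'] = round(df1['errors'] / df1['total count'] * 100, 2)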
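This leaves errors/total as a number (e.g. 66.67). If the column should display as in the desired output (66.67%), one option is to format it as text at the very end. A sketch using a hand-built df1 whose values are copied from the desired output; after formatting, the column holds strings, so do any further arithmetic first.

```python
import pandas as pd

# Hand-built stand-in for df1, values copied from the desired output.
df1 = pd.DataFrame({'errors/total': [50.0, 66.67, 0.0]},
                   index=['3/31/2019', '4/1/2019', '4/2/2019'])

# '{:g}' drops trailing zeros, so 50.0 -> '50' and 66.67 stays '66.67';
# a '%' sign is then appended to each value.
df1['errors/total'] = df1['errors/total'].map('{:g}%'.format)
print(df1)
```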

2 answers

Answer 1: score 0, accepted, 2019-04-01 21:59:55
Answer 2: score 0, 2019-04-01 22:26:53
