
Pandas histogram of number of occurrences of other columns after groupby

I have a dataframe:

df = Batch_ID           DateTime            Code A1 A2
      ABC.      '2019-01-02 17:03:41.000'   230  2. 4 
      ABC.      '2019-01-02 17:03:41.000'   230  1. 5 
      ABC.      '2019-01-02 17:03:42.000'   231  1. 4 
      ABC.      '2019-01-02 17:03:48.000'   232  2. 7 
      ABC.      '2019-01-02 17:04:41.000'   230  2. 9 
      ABB.      '2019-01-02 17:04:41.000'   235  5. 4 
      ABB.      '2019-01-02 17:04:45.000'   236  2. 0 

I need to plot a histogram of the "number of different codes per <Batch_ID, minute>". Note that a 'Code' value may occur multiple times within a group, but it should be counted only once (i.e. count the unique codes).

So in this case some entries will be:

<ABC, 2019-01-02 17:03> : 3
<ABC, 2019-01-02 17:04> : 1
<ABB, 2019-01-02 17:04> : 2

How can it be done?

Try this, using pd.Grouper on a datetime-dtype column:

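import pandas as pd

# Read the table copied from the question; columns are separated by 2+ spaces
df = pd.read_clipboard(sep=r'\s\s+')

# Strip the surrounding quotes, then parse the strings as datetimes
df['DateTime'] = pd.to_datetime(df['DateTime'].str.strip("'"))

# nunique() counts each distinct Code once per (Batch_ID, minute) group
df.groupby(['Batch_ID', pd.Grouper(key='DateTime', freq='min')])['Code'].nunique().rename('Count').reset_index()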

Output:

  Batch_ID            DateTime  Count
0     ABB. 2019-01-02 17:04:00      2
1     ABC. 2019-01-02 17:03:00      3
2     ABC. 2019-01-02 17:04:00      1
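
The question also asks for the histogram itself, so here is a minimal sketch of one way to get from the per-group counts to a plot. It rebuilds the sample rows by hand rather than reading the clipboard (Batch_ID is written without the trailing dot, as in the question's expected entries), and the names per_group, hist_table and plot_code_histogram are made up for this illustration. It assumes matplotlib is installed.

```python
import pandas as pd

# Sample rows from the question (A1/A2 omitted; they are not needed here).
df = pd.DataFrame({
    "Batch_ID": ["ABC", "ABC", "ABC", "ABC", "ABC", "ABB", "ABB"],
    "DateTime": pd.to_datetime([
        "2019-01-02 17:03:41", "2019-01-02 17:03:41", "2019-01-02 17:03:42",
        "2019-01-02 17:03:48", "2019-01-02 17:04:41", "2019-01-02 17:04:41",
        "2019-01-02 17:04:45",
    ]),
    "Code": [230, 230, 231, 232, 230, 235, 236],
})

# Distinct codes per (Batch_ID, minute); nunique() ignores repeated codes.
per_group = (
    df.groupby(["Batch_ID", pd.Grouper(key="DateTime", freq="min")])["Code"]
      .nunique()
)

# Tabular form of the histogram: how many groups have 1, 2, 3, ... distinct codes.
hist_table = per_group.value_counts().sort_index()


def plot_code_histogram(counts):
    """Draw a histogram of the counts with one bin centred on each integer."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    fig, ax = plt.subplots()
    # Bin edges at 0.5, 1.5, 2.5, ... so each integer count gets its own bar.
    bins = [b - 0.5 for b in range(1, int(counts.max()) + 2)]
    ax.hist(counts, bins=bins, rwidth=0.8)
    ax.set_xlabel("Distinct codes per (Batch_ID, minute)")
    ax.set_ylabel("Number of groups")
    return ax
```

Calling plot_code_histogram(per_group) and then matplotlib's plt.show() should display the chart; hist_table holds the same information as a table if a plot is not needed.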
