How to resample dataframe with counts into new column and aggregate column into list

I have a DataFrame with measurements of the following form:

                           label
2015-01-17 20:58:00.740000    cc
2015-01-19 04:36:00.740000    xy
2015-01-19 09:48:00.740000    ab
2015-01-19 09:52:00.740000    ab
2015-01-20 11:45:00.740000    ab

I want to resample it by day, create a new column with the counts, and aggregate the labels into a list, so that I get the following result:

           counts label
2015-01-17    1   [cc]
2015-01-18    0   []
2015-01-19    3   [ab, xy]
2015-01-20    1   [ab]

I'm new to pandas and don't know how to do this. I have read that a DataFrame supports lists as column values. I can count the rows per day with DataFrame.resample(), and with sum I can concatenate the labels into one string, but that is not sufficient to produce the result.

I generated the data with:

from datetime import datetime, timedelta

from pandas import DataFrame
from random import randint, choice

n = 5
rnd_time = lambda: datetime.now() + timedelta(days=randint(0, 3), hours=randint(0, 24))
rnd_label = lambda: choice(['ab', 'cc', 'xyz'])

gen_times = [rnd_time() for _ in range(n)]
gen_labels = [rnd_label() for _ in range(n)]

df = DataFrame({'label': gen_labels}, index=gen_times)

So how can one produce the desired outcome?

Thank you in advance.

You can do:

>>> # Note: resample(how=...) is the old pandas API; `how` was deprecated in 0.18 and later removed.
>>> df['counts'] = df.groupby(level=0).transform('count')
>>> df.resample('D', how={'counts': lambda x: x[0] if len(x) else 0,
...                       'label' : lambda x: list(set(x))})
            counts     label
2015-01-17       1      [cc]
2015-01-18       0        []
2015-01-19       3  [xy, ab]
2015-01-20       1      [ab]
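
On recent pandas versions, where resample() returns a groupby-like object, roughly the same result can be sketched as follows. This is a minimal sketch rather than the answerer's own code: the sample data is copied from the question, the names daily and result are only illustrative, and sorted() is an assumption added so the label order is stable between runs.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"label": ["cc", "xy", "ab", "ab", "ab"]},
    index=pd.to_datetime([
        "2015-01-17 20:58:00.740",
        "2015-01-19 04:36:00.740",
        "2015-01-19 09:48:00.740",
        "2015-01-19 09:52:00.740",
        "2015-01-20 11:45:00.740",
    ]),
)

# One resampler over daily bins; days without any rows still get a bin.
daily = df["label"].resample("D")

result = pd.DataFrame({
    # Number of rows per day (0 for empty days).
    "counts": daily.count(),
    # Unique labels per day as a list; sorted only to make the order stable.
    "label": daily.agg(lambda s: sorted(set(s))),
})
print(result)
```

Here counts is the number of rows that fall on each calendar day, so it does not depend on a precomputed helper column.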

EDIT: If the order of the elements is important, replace list(set(x)) with list(OrderedDict.fromkeys(x)), where OrderedDict is imported from collections.
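
To illustrate the difference, here is a small sketch using the three labels from 2015-01-19 in timestamp order. The variable names are only illustrative. On Python 3.7 and later a plain dict.fromkeys(x) keeps insertion order as well, so it should work the same way.

```python
from collections import OrderedDict

# Labels for 2015-01-19 from the question, in timestamp order.
labels = ["xy", "ab", "ab"]

# set() removes duplicates but makes no promise about the resulting order.
unordered = list(set(labels))

# OrderedDict.fromkeys() keeps the first occurrence of each label, in input order.
ordered = list(OrderedDict.fromkeys(labels))
print(ordered)  # ['xy', 'ab']
```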
