Splitting time series data into groups based on state changes in columns of a Python pandas DataFrame

Question

I need to group some data in a pandas DataFrame, but the standard grouping method does not quite work the way I need it to. Each change in "loc" and/or each change in "name" must be treated as the start of a separate group.

Example:

x = pd.DataFrame([['john','abc',1],['john','abc',2],['john','abc',3],['john','xyz',4],['john','xyz',5],['john','abc',6],['john','abc',7],['matt','abc',8]])
x.columns = ['name','loc','time']

name    loc  time
john    abc  1
john    abc  2
john    abc  3
john    xyz  4
john    xyz  5
john    abc  6
john    abc  7
matt    abc  8

I need to group these values so that the resulting data is

name    loc  first last
john    abc  1     3
john    xyz  4     5
john    abc  6     7
matt    abc  8     8

The default grouping function (correctly) groups all rows that share the same loc and name values, so we are left with only 3 groups (john / abc is 1 group). Does anybody know how the grouping can be forced to work the way I require?

I'm able to generate the required table using a for loop (iterrows), but if there is a nice, Pythonic pandas way to do the same thing, I would love to know.

Thank you in advance.

Matt
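For illustration, the behaviour described above can be reproduced with a short sketch. The variable name `plain` is only a label chosen for this example; the data is the frame from the question.

```python
import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5],
     ['john', 'abc', 6], ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)

# A plain groupby on (name, loc) ignores row order, so both runs of
# john/abc (times 1-3 and 6-7) end up in the same group.
plain = x.groupby(['name', 'loc'])['time'].agg(['first', 'last']).reset_index()
print(plain)
```

This yields three rows rather than the four wanted, with john / abc spanning times 1 to 7.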

Answer 1

This is not really a job for groupby, because the order of the rows matters. Instead, compare consecutive rows using shift.

In [37]: cols = ['name', 'loc']

In [38]: change = (x[cols] != x[cols].shift()).any(axis=1)

In [39]: groups = x[change].copy()

In [40]: groups.columns = ['name', 'loc', 'first']

In [41]: groups['last'] = (groups['first'].shift(-1) - 1).fillna(len(x)).astype(int)

In [42]: groups
Out[42]:
   name  loc  first  last
0  john  abc      1     3
3  john  xyz      4     5
5  john  abc      6     7
7  matt  abc      8     8

[4 rows x 4 columns]
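The computation of last above takes the next group's first value minus one, so it relies on time being consecutive integers that end at len(x). A possible variation, sketched here as an assumption rather than as part of the original answer, turns the same change flags into a run counter with cumsum and then reads first and last directly from the time column. The names `run_id` and `result` are hypothetical.

```python
import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5],
     ['john', 'abc', 6], ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)

cols = ['name', 'loc']

# True wherever a row differs from the previous row in name or loc
# (the first row compares against NaN and is therefore True).
change = (x[cols] != x[cols].shift()).any(axis=1)

# Cumulative sum of the flags gives each consecutive run its own id.
run_id = change.cumsum()

# Group by run id, so order-dependent runs stay separate.
result = (
    x.groupby(run_id)
     .agg(name=('name', 'first'), loc=('loc', 'first'),
          first=('time', 'first'), last=('time', 'last'))
     .reset_index(drop=True)
)
print(result)
```

Because first and last are taken from the actual time values in each run, this sketch should also cope with gaps or non-integer timestamps, provided the rows are already sorted in time order.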

Answer 2

You can pass a function to groupby:

import pandas as pd

x = pd.DataFrame([['john','abc',1],['john','abc',2],['john','abc',3],['john','xyz',4],['john','xyz',5],['john','abc',6],['john','abc',7],['matt','abc',8]])
x.columns = ['name','loc','time']

last_group = None
c = 0
def f(y):
    # groupby passes each index label to f; this relies on the
    # labels being visited in row order
    global c, last_group
    g = x.loc[y, 'name'], x.loc[y, 'loc']
    if last_group != g:
        # a change in name or loc starts a new group
        c += 1
        last_group = g
    return c

print(x.groupby(f).head())

Answer 1
0 2014-01-16 15:47:00

Answer 2
0 2014-01-16 16:19:31