Slice pandas dataframe in groups of consecutive values

I have a dataframe containing sections of consecutive values that eventually "skip" (that is, increase by more than 1). I would like to split the dataframe at those skips, similar to what the groupby function does (the alphabetic index is just for illustration):

    A
a   1
b   2
c   3
d   6
e   7
f   8
g   11
h   12
i   13

# would return

a   1
b   2
c   3
-----
d   6
e   7
f   8
-----
g   11
h   12
i   13

A variant slightly improved for speed:

# assumes: import numpy as np
for k, g in df.groupby(df['A'] - np.arange(df.shape[0])):
    print(g)
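To see why this key works, here is a minimal sketch. It rebuilds the example frame from the question as `df` (the construction itself is my assumption, the question only shows the printed frame). Inside a run of consecutive values, the value minus the row position is constant, so every run ends up with its own group key.

```python
import numpy as np
import pandas as pd

# Example frame from the question, rebuilt here so the snippet runs on its own.
df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]}, index=list("abcdefghi"))

# Value minus row position stays constant inside a run of consecutive values.
key = df["A"] - np.arange(len(df))
print(key.tolist())  # [1, 1, 1, 3, 3, 3, 5, 5, 5]

# One sub-frame per distinct key.
groups = [g for _, g in df.groupby(key)]
for g in groups:
    print(g)
```

Note that this relies on the values being increasing, as in the question. If the column can also jump downwards, two separate runs may end up with the same key (for example `[1, 2, 0, 4]` gives keys `[1, 1, -2, 1]`) and groupby would merge them.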

My two cents, just for the fun of it.

In [15]:

for grp, val in df.groupby((df.diff()-1).fillna(0).cumsum().A):
    print(val)
   A
a  1
b  2
c  3
   A
d  6
e  7
f  8
    A
g  11
h  12
i  13
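A closely related way to build the group key, sketched here as an alternative and not as what the answer above does: flag every row whose step from the previous row is not exactly 1, then take the cumulative sum of those flags. The frame is again rebuilt from the question, and `run_id` and `pieces` are just names I picked for illustration.

```python
import pandas as pd

# Example frame from the question, rebuilt here so the snippet runs on its own.
df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]}, index=list("abcdefghi"))

# A new run starts wherever the step from the previous row is not exactly 1.
# The first row's diff is NaN, which also compares as "not equal to 1".
run_id = df["A"].diff().ne(1).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]

# One sub-frame per run, in order of appearance.
pieces = [g for _, g in df.groupby(run_id)]
for piece in pieces:
    print(piece)
```

Because this key only ever increases, two different runs can never share it, even if the column sometimes jumps downwards.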

We can use shift to check whether the difference between consecutive rows is larger than 1, and then construct a list of tuple pairs of the required index labels:

In [128]:
# labels of the rows where the value jumps by more than 1; the first row's label has to be added as well
index_list = [df.iloc[0].name] + list(df[(df.value - df.value.shift()) > 1].index)
index_list
Out[128]:
['a', 'd', 'g']

Next we have to construct a list of tuple pairs for the ranges we are interested in. Note that label-based slicing in pandas includes both the start and the end label, so for the end of each range we have to find the label of the row just before the next start:

In [170]:

final_range=[]
for i in range(len(index_list)):
    # handle last range value
    if i == len(index_list) -1:
        final_range.append((index_list[i], df.iloc[-1].name ))
    else:
        final_range.append( (index_list[i], df.iloc[ np.searchsorted(df.index, df.loc[index_list[i + 1]].name) -1].name))

final_range

Out[170]:
[('a', 'c'), ('d', 'f'), ('g', 'i')]

I use numpy's searchsorted to find the integer position at which the label would be inserted into the index, and then subtract 1 from it to get the previous row's index label.

In [171]:
# now print
for r in final_range:
    print(df[r[0]:r[1]])
       value
index       
a          1
b          2
c          3
       value
index       
d          6
e          7
f          8
       value
index       
g         11
h         12
i         13
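For comparison, the same `(start, end)` label pairs can probably be obtained without the explicit loop and searchsorted. This is only a sketch under assumptions: the frame is rebuilt to match the printed output above (column `value`, index named `index`), the index labels are unique, and `run_id` and `labels` are names I introduced for illustration.

```python
import pandas as pd

# Frame rebuilt to match the output printed above.
df = pd.DataFrame(
    {"value": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
    index=pd.Index(list("abcdefghi"), name="index"),
)

# Number the runs: a new run starts wherever the step is not exactly 1.
run_id = df["value"].diff().ne(1).cumsum()

# Take the first and last index label of every run.
labels = df.index.to_series()
starts = labels.groupby(run_id).first()
ends = labels.groupby(run_id).last()
final_range = list(zip(starts, ends))
print(final_range)  # [('a', 'c'), ('d', 'f'), ('g', 'i')]

# Label-based slicing with .loc includes both endpoints.
for start, end in final_range:
    print(df.loc[start:end])
```

Unlike the searchsorted version, this does not need the index to be sorted, only unique.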
