
Slice pandas dataframe in groups of consecutive values

I have a dataframe containing runs of consecutive values that occasionally "skip" (that is, increase by more than 1). I would like to split the dataframe at each skip, similar to what the groupby function does (the alphabetic index is just for illustration):

    A
a   1
b   2
c   3
d   6
e   7
f   8
g   11
h   12
i   13

# would return

a   1
b   2
c   3
-----
d   6
e   7
f   8
-----
g   11
h   12
i   13
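
For anyone who wants to try the answers, the example frame can be rebuilt roughly like this. It is a minimal sketch that assumes a single integer column named A and a letter index, as in the listing above.

```python
import pandas as pd

# Rebuild the example frame from the question; the letter index is only for show.
df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]}, index=list("abcdefghi"))
print(df)
```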

A variant that is slightly improved for speed:

# requires: import numpy as np
for k, g in df.groupby(df['A'] - np.arange(df.shape[0])):
    print(g)
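
Why this works: inside a consecutive run the value and the row position both go up by 1, so their difference stays constant, and a skip changes it. A small sketch of the intermediate key, assuming the example frame from the question and values that only ever increase (if the values can go back down, two separate runs might end up with the same key):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]}, index=list("abcdefghi"))

# Within a consecutive run, value and row position grow in lockstep,
# so value - position is constant; a skip changes that constant.
key = df["A"] - np.arange(len(df))
print(key.tolist())

# Group on the key and collect the pieces.
groups = [g for _, g in df.groupby(key)]
for g in groups:
    print(g)
```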

My two cents just for the fun of it.

In [15]:

for grp, val in df.groupby((df.diff()-1).fillna(0).cumsum().A):
    print(val)
   A
a  1
b  2
c  3
   A
d  6
e  7
f  8
    A
g  11
h  12
i  13
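
To see what the grouping key looks like, here is the same idea broken into steps, again assuming the example frame from the question. The second key (`run_id`, a name made up for this sketch) is a variant that is not part of the answer above: it flags every row whose difference from the previous row is not 1 and counts the flags.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]}, index=list("abcdefghi"))

# Step by step: diff is 1 inside a run, so (diff - 1) is 0 there and
# positive at a skip; the running sum therefore only changes at skips.
step_key = (df["A"].diff() - 1).fillna(0).cumsum()
print(step_key.tolist())

# Variant (not from the answer above): flag rows whose diff is not 1
# and count the flags, which labels the runs 1, 2, 3, ...
run_id = df["A"].diff().ne(1).cumsum()
print(run_id.tolist())

runs = [g["A"].tolist() for _, g in df.groupby(run_id)]
print(runs)
```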

We can use shift to check whether the difference between consecutive rows is larger than 1, and then construct a list of tuple pairs of the required index labels (in this answer the column is named value and the index is named index):

In [128]:
# index labels where the value difference is larger than 1; the first row's label has to be added as well
index_list = [df.iloc[0].name] + list(df[(df.value - df.value.shift()) > 1].index)
index_list
Out[128]:
['a', 'd', 'g']

We then have to construct a list of tuple pairs for the ranges we are interested in. Note that label-based slicing in pandas includes both the start and the end label, so for the end of each range we need the label of the row just before the next range starts:

In [170]:

final_range=[]
for i in range(len(index_list)):
    # handle last range value
    if i == len(index_list) -1:
        final_range.append((index_list[i], df.iloc[-1].name ))
    else:
        final_range.append( (index_list[i], df.iloc[ np.searchsorted(df.index, df.loc[index_list[i + 1]].name) -1].name))

final_range

Out[170]:
[('a', 'c'), ('d', 'f'), ('g', 'i')]

I use numpy's searchsorted to find the integer position at which the next range's start label would be inserted, and then subtract 1 from it to get the label of the previous row. Note that searchsorted assumes the index is sorted.

In [171]:
# now print
for r in final_range:
    print(df[r[0]:r[1]])
       value
index       
a          1
b          2
c          3
       value
index       
d          6
e          7
f          8
       value
index       
g         11
h         12
i         13
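
If the index is not sorted, the same ranges can be built from integer positions instead of searchsorted. This is only a sketch of that alternative, not the code from the answer above; the names `jumps`, `starts`, `ends` and `pieces` are made up for the example, and the frame is assumed to look like the one printed above.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=pd.Index(list("abcdefghi"), name="index"))

# Integer positions where a new run starts: row 0 plus every row
# whose value jumps by more than 1 relative to the previous row.
jumps = (df["value"].diff() > 1).to_numpy()
starts = [0] + [pos for pos, is_jump in enumerate(jumps) if is_jump]

# Each run ends one row before the next start; the last run ends at the last row.
ends = [s - 1 for s in starts[1:]] + [len(df) - 1]

final_range = [(df.index[s], df.index[e]) for s, e in zip(starts, ends)]
print(final_range)

# iloc slicing excludes the end position, hence the + 1.
pieces = [df.iloc[s:e + 1] for s, e in zip(starts, ends)]
for piece in pieces:
    print(piece)
```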


 