Slicing a pandas DataFrame into groups of consecutive values

Question

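I have a DataFrame that contains sections of consecutive values that eventually "skip" (i.e., increase by more than 1). I would like to split the DataFrame at those skips, similar to what the groupby function does (the alphabetic index is for show only):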

    A
a   1
b   2
c   3
d   6
e   7
f   8
g   11
h   12
i   13

# would return

a   1
b   2
c   3
-----
d   6
e   7
f   8
-----
g   11
h   12
i   13

Answer 1

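A slight improvement on the other answer, for speed...

import numpy as np
# within a run of consecutive values, value minus row position is constant
for k, g in df.groupby(df['A'] - np.arange(df.shape[0])):
    print(g)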
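
To see why this key works, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (the construction of `df` and the names `key` and `groups` are assumptions for illustration) and inspects the grouping key before using it.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (construction assumed for illustration)
df = pd.DataFrame({'A': [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=list('abcdefghi'))

# Value minus row position stays constant within each consecutive run
key = df['A'] - np.arange(df.shape[0])

# Collect each run as its own sub-frame, ordered by key
groups = [g for _, g in df.groupby(key)]

for g in groups:
    print(g)
```

Note that this relies on the values increasing by exactly 1 inside each run; data with repeated or decreasing values would likely need a different key.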

Answer 2

My two cents, just for the fun of it.

In [15]:

for grp, val in df.groupby((df.diff() - 1).fillna(0).cumsum().A):
    print(val)
   A
a  1
b  2
c  3
   A
d  6
e  7
f  8
    A
g  11
h  12
i  13
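
The same diff-then-cumsum idea can be written so that the group key is a plain run counter. This is a sketch of a variant, not the code from this answer; the names `starts`, `run_id` and `runs` are assumptions for illustration.

```python
import pandas as pd

# Sample frame from the question (construction assumed for illustration)
df = pd.DataFrame({'A': [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=list('abcdefghi'))

# True wherever the step from the previous row is not exactly 1;
# the first row's diff is NaN, so it also counts as the start of a run
starts = df['A'].diff().ne(1)

# A running count of run starts labels every row with its run number
run_id = starts.cumsum()

# Map each run number to the values it contains
runs = {k: g['A'].tolist() for k, g in df.groupby(run_id)}
print(runs)
```

Unlike the `(diff - 1).cumsum()` key, which yields 0, 2, 4 for this data, the counter yields 1, 2, 3, which may be easier to read when the key is kept as a column.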

Answer 3

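We can use shift to check whether the difference between consecutive rows is larger than 1, and then construct a list of tuple pairs of the required index labels: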

In [128]:
# labels of the rows where the value jumps by more than 1 from the previous row; the first row's label has to be added as well
index_list = [df.iloc[0].name] + list(df[(df.value - df.value.shift()) > 1].index)
index_list
Out[128]:
['a', 'd', 'g']

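We have to construct a list of tuple pairs for the ranges we are interested in. Note that label-based slicing in pandas includes both the beg and end index values, so for each range we have to find the label of the row just before the next range's starting label: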

In [170]:

final_range=[]
for i in range(len(index_list)):
    # handle last range value
    if i == len(index_list) -1:
        final_range.append((index_list[i], df.iloc[-1].name ))
    else:
        final_range.append( (index_list[i], df.iloc[ np.searchsorted(df.index, df.loc[index_list[i + 1]].name) -1].name))

final_range

Out[170]:
[('a', 'c'), ('d', 'f'), ('g', 'i')]

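I use numpy's searchsorted to find the integer position at which our label could be inserted into the index, then subtract 1 from it to get the index label of the previous row.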
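
A small illustration of that lookup in isolation, assuming the index is sorted (which searchsorted requires); the names `idx`, `pos` and `prev_label` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sorted index labels, as in the sample frame (assumed for illustration)
idx = pd.Index(list('abcdefghi'))

# Integer position at which 'd' would be inserted to keep the index sorted
pos = np.searchsorted(idx, 'd')

# One position earlier is the last row of the preceding range
prev_label = idx[pos - 1]
print(pos, prev_label)
```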

In [171]:
# now print
for r in final_range:
    print(df[r[0]:r[1]])
       value
index       
a          1
b          2
c          3
       value
index       
d          6
e          7
f          8
       value
index       
g         11
h         12
i         13
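
For comparison, the same (beg, end) pairs can be derived from integer positions alone, which avoids the searchsorted lookup and does not need a sorted index. This is a sketch of a variant rather than the code above; the frame construction and the names `starts`, `ends` and `pieces` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame with a 'value' column and an index named 'index' (assumed)
df = pd.DataFrame({'value': [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=pd.Index(list('abcdefghi'), name='index'))

# Integer positions where the value jumps by more than 1, plus row 0
jumps = ((df['value'] - df['value'].shift()) > 1).to_numpy()
starts = np.concatenate(([0], np.flatnonzero(jumps)))

# Each range ends right before the next start; the last ends at the final row
ends = np.concatenate((starts[1:] - 1, [len(df) - 1]))

# Translate positions back to labels; .loc slicing includes both endpoints
final_range = [(df.index[s], df.index[e]) for s, e in zip(starts, ends)]
pieces = [df.loc[beg:end] for beg, end in final_range]

for piece in pieces:
    print(piece)
```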
