Create ranges from sequential values while maintaining other columns in pandas

Question

I'm trying to find a way to consolidate sequential (consecutive) numbers into ranges, grouped by another column.

I've tried pynumparser and itertools, but I'm not clever enough to implement them to get the results I'm looking for. Looking for some assistance and/or ideas. Thank you!

Input:

| test_var   |   F1 |
|------------|------|
| ABC        |    1 |
| ABC        |    2 |
| DEF        |    3 |
| ABC        |    4 |
| ABC        |    5 |
| GHI        |    1 |
| GHI        |    2 |
| ABC        |    6 |

Goal output:

F1_range is supposed to represent the min and max of each set of sequential values per test_var. A single test_var may have several such sets.

A simple example is "GHI". For F1 there is only one set of sequential values, 1-2.

A more complicated example is "ABC": it has two sets of sequential values, 1-2 and 4-6.
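Since itertools was mentioned: a common trick for spotting runs of consecutive integers is that, within a run of sorted values, `value - position` stays constant, so `itertools.groupby` on that key splits the values exactly where a gap appears. A minimal sketch for a plain list such as ABC's F1 values, assuming integer values with a step of 1; duplicates are dropped, and `consecutive_ranges` is a hypothetical helper name:

```python
from itertools import groupby


def consecutive_ranges(values):
    """Collapse integers into 'low-high' strings, one per run of consecutive values."""
    ordered = sorted(set(values))  # sort and drop duplicates
    ranges = []
    # Inside a run, value - position is constant, so the key changes at every gap.
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in run]
        low, high = run[0], run[-1]
        ranges.append(f"{low}-{high}" if low < high else str(low))
    return ranges


print(consecutive_ranges([1, 2, 4, 5, 6]))  # ['1-2', '4-6']
print(consecutive_ranges([1, 2]))           # ['1-2']
```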

| test_var   |   F1 | F1_range   |
|------------|------|------------|
| ABC        |    1 | 1-2        |
| ABC        |    2 | 1-2        |
| DEF        |    3 | 3          |
| ABC        |    4 | 4-6        |
| ABC        |    5 | 4-6        |
| GHI        |    1 | 1-2        |
| GHI        |    2 | 1-2        |
| ABC        |    6 | 4-6        |

Sample input data:

import pandas as pd

df = pd.DataFrame(data={'test_var': {0: 'ABC',
  1: 'ABC',
  2: 'DEF',
  3: 'ABC',
  4: 'ABC',
  5: 'GHI',
  6: 'GHI',
  7: 'ABC'},
 'F1': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 1, 6: 2, 7: 6}})
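Note that in the goal output ABC's 4-6 set spans rows 3, 4 and 7, which are not adjacent in the frame, so the runs are defined by consecutive F1 values within each test_var rather than by neighbouring rows. One possible pandas sketch for that goal, assuming F1 holds integers with a step of exactly 1: sort so that each test_var's values become neighbours, start a new run whenever test_var changes or F1 jumps by anything other than 1, then broadcast each run's min and max back to its rows with `groupby(...).transform`.

```python
import pandas as pd

df = pd.DataFrame({
    'test_var': ['ABC', 'ABC', 'DEF', 'ABC', 'ABC', 'GHI', 'GHI', 'ABC'],
    'F1': [1, 2, 3, 4, 5, 1, 2, 6],
})

# Sort within each test_var so that consecutive values become neighbours.
ordered = df.sort_values(['test_var', 'F1'])

# A new run starts when test_var changes or F1 does not increase by exactly 1.
new_run = (ordered['test_var'] != ordered['test_var'].shift()) | (ordered['F1'].diff() != 1)
run_id = new_run.cumsum()

low = ordered.groupby(run_id)['F1'].transform('min')
high = ordered.groupby(run_id)['F1'].transform('max')
# Single-value runs keep just the value; longer runs become 'low-high'.
ranges = low.astype(str).where(low == high, low.astype(str) + '-' + high.astype(str))

# Assigning a Series aligns on the index, so the original row order is kept.
df['F1_range'] = ranges
print(df)
```

Because the final assignment aligns on the index labels, the sort never has to be undone; any other columns in the frame stay untouched.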

Answer 1

How to group equal neighbors along a column

Test data

import pandas as pd

df = pd.DataFrame({
    'test_var': ['ABC', 'ABC', 'DEF', 'ABC', 'ABC', 'ABC', 'GHI', 'GHI'],
    'F1': [1, 2, 3, 4, 6, 5, 1, 2],
    'F2': [10, 11, 1, 13, 16, 14, 2, 1]
})

We assume that the index is an ordinary RangeIndex starting at 0 with step 1.
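If the frame has been filtered, sorted or concatenated beforehand, that assumption may not hold. A common way to restore it is `DataFrame.reset_index(drop=True)`, which discards the old labels. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'test_var': ['ABC', 'DEF', 'ABC'], 'F1': [1, 3, 2]})

# Filtering keeps the original labels, so the index is no longer 0, 1, 2, ...
subset = df[df['test_var'] == 'ABC']
print(subset.index.tolist())  # [0, 2]

# drop=True discards the old labels instead of turning them into a column,
# and the frame gets a fresh RangeIndex.
subset = subset.reset_index(drop=True)
print(type(subset.index).__name__, subset.index.tolist())  # RangeIndex [0, 1]
```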

Main steps

Find the indexes where the value in test_var differs from its previous neighbor.
Split the row labels at those indexes with numpy.split, so that each piece holds one run of equal neighbors.
For each piece, join the min and max values of every column of interest into a "low-high" string.

import numpy as np

columns = ['F1', 'F2']
ranges = [f'{name}_range' for name in columns]
for rng in ranges:
    df[rng] = ''  # placeholder columns, filled in block by block below

test_var = df['test_var'].values
# True where test_var differs from the row above; row 0 never marks a change
changed = np.zeros(len(df), dtype=bool)
changed[1:] = test_var[1:] != test_var[:-1]
# One array of row labels per run of equal neighbors
groups = np.split(df.index.to_numpy(), np.flatnonzero(changed))
sep = '-'

def get_range(index, column):
    data = df.loc[index, column]
    low, high = data.min(), data.max()
    return f'{low}{sep}{high}' if low < high else str(low)

for gr in groups:
    for col, rng in zip(columns, ranges):
        df.loc[gr, rng] = get_range(gr, col)

Score: 0 · accepted · 2022-05-22 23:02:25
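The same group-equal-neighbors idea can also be written without an explicit split, using the common shift/cumsum idiom: comparing test_var with its shifted self marks every change, and the cumulative sum turns those marks into a block number; `groupby(...).transform` then broadcasts each block's min and max back to its rows. A sketch on the answer's test data, assuming the same F1 and F2 columns; it should produce the same F1_range and F2_range values as the loop in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'test_var': ['ABC', 'ABC', 'DEF', 'ABC', 'ABC', 'ABC', 'GHI', 'GHI'],
    'F1': [1, 2, 3, 4, 6, 5, 1, 2],
    'F2': [10, 11, 1, 13, 16, 14, 2, 1],
})

# Each time test_var differs from the row above, a new block number starts.
block = (df['test_var'] != df['test_var'].shift()).cumsum()

for col in ['F1', 'F2']:
    grouped = df.groupby(block)[col]
    low = grouped.transform('min').astype(str)
    high = grouped.transform('max').astype(str)
    # Single-value blocks keep just the value; others become 'low-high'.
    df[f'{col}_range'] = low.where(low == high, low + '-' + high)

print(df)
```

Like the answer's loop, this groups adjacent rows only. On the question's own sample data it would therefore give the last ABC row (F1 = 6) a range of "6" and rows 3-4 a range of "4-5", because GHI rows separate them, rather than the "4-6" shown in the goal output.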

How to group equal neighbors along a column如何沿列对相等的邻居进行分组

Test data测试数据

Main steps主要步骤

Output输出

从顺序值创建范围，同时维护 pandas 中的其他列

问题描述

Input:输入：

Goal output:目标输出：

Sample input data:样本输入数据：

1 个解决方案

解决方案1 0 已采纳 2022-05-22 23:02:25

How to group equal neighbors along a column如何沿列对相等的邻居进行分组

Test data测试数据

Main steps主要步骤

Output输出

解决方案1
0 已采纳 2022-05-22 23:02:25