Creating ranges from sequential values, while maintaining other columns in pandas

I'm trying to find a way to consolidate sequential (consecutive?) numbers into a range, grouped by another column.

I've tried pynumparser and itertools, but I'm not clever enough to implement them to get the results I'm looking for. Looking for some assistance and/or ideas. Thank you!

Input:

| test_var   |   F1 |
|------------|------|
| ABC        |    1 |
| ABC        |    2 |
| DEF        |    3 |
| ABC        |    4 |
| ABC        |    5 |
| GHI        |    1 |
| GHI        |    2 |
| ABC        |    6 |

Goal output:

F1_range is supposed to represent the min and max of each set of sequential values per test_var. There may be several such sets for one test_var.

A simple example is "GHI". For F1 there is only 1 set of sequential values, 1-2.

A more complicated example is "ABC", which has 2 sets of sequential values: 1-2 and 4-6.

| test_var   |   F1 | F1_range   |
|------------|------|------------|
| ABC        |    1 | 1-2        |
| ABC        |    2 | 1-2        |
| DEF        |    3 | 3          |
| ABC        |    4 | 4-6        |
| ABC        |    5 | 4-6        |
| GHI        |    1 | 1-2        |
| GHI        |    2 | 1-2        |
| ABC        |    6 | 4-6        |

Sample input data:

import pandas as pd

df = pd.DataFrame(data={'test_var': {0: 'ABC',
  1: 'ABC',
  2: 'DEF',
  3: 'ABC',
  4: 'ABC',
  5: 'GHI',
  6: 'GHI',
  7: 'ABC'},
 'F1': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 1, 6: 2, 7: 6}})
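The definition of F1_range given above can be made concrete with a short pandas sketch on this sample data. It is only one possible reading of the goal, not a method taken from the thread: it assumes that F1 values are unique within each test_var and that a "set" is a run in which each value is exactly 1 larger than the previous one. The names ordered, step, run_id and labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'test_var': ['ABC', 'ABC', 'DEF', 'ABC', 'ABC', 'GHI', 'GHI', 'ABC'],
    'F1': [1, 2, 3, 4, 5, 1, 2, 6],
})

# Sort so that the values of each test_var appear in ascending order.
ordered = df.sort_values(['test_var', 'F1'])

# A new run starts at the first row of each test_var (diff is NaN there)
# or wherever F1 does not increase by exactly 1.
step = ordered.groupby('test_var')['F1'].diff()
run_id = (step != 1).cumsum()

# Min and max of every run, broadcast back to the rows of that run.
runs = ordered.groupby(run_id)['F1']
low, high = runs.transform('min'), runs.transform('max')
labels = (low.astype(str) + '-' + high.astype(str)).where(low < high, low.astype(str))

# Assignment aligns on the index, so the rows of df keep their original order.
df['F1_range'] = labels
print(df)
```

Because the runs are found after sorting, the ABC rows holding 4, 5 and 6 end up in the same run even though they are not neighbors in the original row order.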

How to group equal neighbors along a column

Test data

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'test_var': ['ABC', 'ABC', 'DEF', 'ABC', 'ABC', 'ABC', 'GHI', 'GHI'],
    'F1': [1, 2, 3, 4, 6, 5, 1, 2],
    'F2': [10, 11, 1, 13, 16, 14, 2, 1]
})

We assume that the index is an ordinary RangeIndex starting from 0 with step 1.

Main steps
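The index assumption matters because the splitting step below passes index labels where NumPy expects row positions, and the two coincide only for a default RangeIndex. If a frame arrives with other labels, a minimal sketch of restoring that index looks like this (the frame and its labels 10, 20, 30 are hypothetical):

```python
import pandas as pd

# Hypothetical frame whose labels are not 0, 1, 2, ...
df = pd.DataFrame(
    {'test_var': ['ABC', 'DEF', 'DEF'], 'F1': [1, 3, 4]},
    index=[10, 20, 30],
)

# Replace the labels with a fresh RangeIndex so that labels equal positions.
df = df.reset_index(drop=True)
print(df.index)
```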

  1. Find the indexes where the value in test_var differs from that of the previous row.
  2. Split the data vertically at those indexes with numpy.vsplit.
  3. Join the min/max values of each column of interest within each group from the previous split.
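
columns = ['F1', 'F2']
ranges = [f'{name}_range' for name in columns]
df[ranges] = ''

# Mark every row whose test_var differs from the row directly above it.
test_var = df['test_var'].values
changed = np.zeros(len(df), dtype=bool)  # np.bool was removed in NumPy 1.24; use the builtin bool
changed[1:] = test_var[1:] != test_var[:-1]

# Split the frame at the marked rows (labels equal positions for a default RangeIndex).
groups = np.vsplit(df, df.index[changed])
sep = '-'

def get_range(index, column):
    # Format the min and max of the given rows as 'low-high', or 'low' if they coincide.
    data = df.loc[index, column]
    low, high = min(data), max(data)
    return f'{low}{sep}{high}' if low < high else str(low)

for gr in groups:
    for col, rng in zip(columns, ranges):
        df.loc[gr.index, rng] = get_range(gr.index, col)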

Output

[Image: the resulting DataFrame]
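The same neighbor grouping can also be sketched with pandas alone. This may be useful because calling np.vsplit on a DataFrame relies on NumPy dispatching to DataFrame methods internally, which can emit deprecation warnings on recent pandas versions. The sketch below is an alternative under that assumption, not the code from the answer; block_id and add_ranges are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'test_var': ['ABC', 'ABC', 'DEF', 'ABC', 'ABC', 'ABC', 'GHI', 'GHI'],
    'F1': [1, 2, 3, 4, 6, 5, 1, 2],
    'F2': [10, 11, 1, 13, 16, 14, 2, 1],
})

def add_ranges(frame, columns, sep='-'):
    """Add a '<col>_range' column per entry of columns, computed over
    blocks of neighboring rows that share the same test_var."""
    # A new block starts wherever test_var differs from the row above;
    # the running sum of those flags gives every block its own id.
    block_id = (frame['test_var'] != frame['test_var'].shift()).cumsum()
    for col in columns:
        blocks = frame.groupby(block_id)[col]
        low, high = blocks.transform('min'), blocks.transform('max')
        both = low.astype(str) + sep + high.astype(str)
        # Keep 'low-high' where the block spans several values, else just 'low'.
        frame[f'{col}_range'] = both.where(low < high, low.astype(str))
    return frame

df = add_ranges(df, ['F1', 'F2'])
print(df)
```

Since the blocks are built from shift and cumsum rather than from index labels, this variant does not depend on the index being a default RangeIndex. Like the loop above, it only merges rows that are neighbors, so rows of the same test_var that are separated by other rows form separate blocks.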
