Find a value within a fixed range in one column for every unique value of another column in a pandas DataFrame

Question

I have a data frame like this:

df
col1      col2
 1        50000
 1        2000
 2        51000
 3        100
 3        5000
 3        50500
 4        200
 4        51500
 5        49000

I want to identify the col2 values, within a plus or minus 10 percent range, that occur for every unique value of col1.

The final output should look like this:

col1        col2
  1         50000
  2         51000
  3         50500
  4         51500
  5         49000

If values other than those around 50000 are present and also fall within a plus or minus 10 percent range for every col1 value, include them alongside the values around 50000.
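To pin down the rule above, here is a minimal pure-Python sketch. It assumes that a col2 value x qualifies when every col1 group contains at least one col2 value in [x - 0.1x, x + 0.1x], and that "add those" means keeping such values in the output. The function name `qualifies` and the extra cluster of values around 3000 are hypothetical, added only for illustration.

```python
# Hypothetical extension of the sample data: a second cluster around 3000
# appears in every col1 group, so under the stated rule it is kept as well.
rows = [(1, 50000), (1, 2000), (1, 3000),
        (2, 51000), (2, 3100),
        (3, 100), (3, 5000), (3, 50500), (3, 2900),
        (4, 200), (4, 51500), (4, 3050),
        (5, 49000), (5, 2950)]


def qualifies(x, groups, tol=0.1):
    """Return True if every group has at least one value within +-tol*x of x."""
    lo, hi = x - tol * x, x + tol * x
    return all(any(lo <= v <= hi for v in vals) for vals in groups.values())


# Collect the col2 values of each col1 group.
groups = {}
for key, val in rows:
    groups.setdefault(key, []).append(val)

# Keep the (col1, col2) rows whose col2 value satisfies the rule.
result = [(key, val) for key, val in rows if qualifies(val, groups)]
print(result)
```

Both the cluster around 50000 and the hypothetical cluster around 3000 survive, while 2000, 100, 5000 and 200 are dropped because they have no counterpart in every group.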

What is the most efficient way to do this with pandas/Python?

Answer 1

Use a list comprehension to loop over all unique values of col2, keep the rows within ±10% of each value with Series.between and boolean indexing, and check whether those rows cover every group by comparing their col1 values with the set of all col1 values. Finally, filter with Series.isin:

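import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3, 3, 3, 4, 4, 5],
                   'col2': [50000, 2000, 51000, 100, 5000, 50500, 200, 51500, 49000]})

# set of all unique col1 values that a candidate has to cover
s = set(df['col1'])
print(s)
{1, 2, 3, 4, 5}

# keep each unique col2 value x whose +-10% window contains rows from every col1 group
a = [x for x in df['col2'].unique()
     if set(df.loc[df['col2'].between(x - x * .1, x + x * .1), 'col1']) == s]
print(a)
[50000, 51000, 50500, 51500, 49000]

# keep only the rows whose col2 is one of the qualifying values
df = df[df['col2'].isin(a)]
print(df)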
   col1   col2
0     1  50000
2     2  51000
5     3  50500
7     4  51500
8     5  49000
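The list comprehension calls Series.between once per unique col2 value, so it scans the frame repeatedly. As an alternative that is not part of the original answer, the same check can be sketched with NumPy broadcasting: build an n x n boolean matrix recording which rows fall inside each row's ±10% window, then reduce it per col1 group with a matrix product. The function name `common_within_tolerance` and its parameters are hypothetical. This trades the Python-level loop for O(n²) memory, and like the answer it assumes positive col2 values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3, 3, 3, 4, 4, 5],
                   'col2': [50000, 2000, 51000, 100, 5000, 50500, 200, 51500, 49000]})


def common_within_tolerance(df, key='col1', value='col2', tol=0.1):
    """Keep rows whose value has a counterpart within +-tol in every key group.

    Builds an n x n boolean matrix, so memory grows quadratically with the
    number of rows. Assumes positive values and no missing keys.
    """
    vals = df[value].to_numpy(dtype=float)
    # codes[j] is the integer group index of row j
    codes, uniques = pd.factorize(df[key])
    # in_window[i, j]: row j's value lies in [vals[i] - tol*vals[i], vals[i] + tol*vals[i]]
    lo = (vals - tol * vals)[:, None]
    hi = (vals + tol * vals)[:, None]
    in_window = (vals[None, :] >= lo) & (vals[None, :] <= hi)
    # onehot[j, g]: row j belongs to group g
    onehot = codes[:, None] == np.arange(len(uniques))[None, :]
    # hits[i, g]: at least one row of group g lies inside row i's window
    hits = (in_window.astype(np.int64) @ onehot.astype(np.int64)) > 0
    # a row qualifies when its window touches every group
    return df[hits.all(axis=1)]


print(common_within_tolerance(df))
```

On the sample data this should select the same rows as the answer (index 0, 2, 5, 7, 8). For very large frames the quadratic matrix may not fit in memory, in which case the loop-based answer is the safer choice.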

Answer 1 (accepted, 2019-05-21 06:50:36)