Find a fixed value from a column within a range for each unique value of another column in a pandas DataFrame
I have a data frame like this:
df
col1 col2
1 50000
1 2000
2 51000
3 100
3 5000
3 50500
4 200
4 51500
5 49000
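To reproduce the example, the sample frame above can be built like this (a minimal setup sketch; the column names and values are copied from the table):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": [1, 1, 2, 3, 3, 3, 4, 4, 5],
    "col2": [50000, 2000, 51000, 100, 5000, 50500, 200, 51500, 49000],
})
print(df)
```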
I want to identify the col2 values that lie within plus/minus 10 percent of each other and occur for every unique col1 value.
The final output should look like:
col1 col2
1 50000
2 51000
3 50500
4 51500
5 49000
If values other than those around 50000 are also present for every col1 value and fall within a plus/minus 10 percent range of each other, include them in the output along with the values around 50000.
How can I do this in pandas/Python in the most efficient way?
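To make the rule concrete, here is a plain-Python sketch of the selection criterion as I read it: a row is kept if, for every col1 group, at least one col2 value of that group lies in `[x - 0.1x, x + 0.1x]`, where `x` is the row's col2 value. The extended data (an extra value near 2000 in every group) and the helper name `rows_to_keep` are hypothetical, made up only for illustration.

```python
# Hypothetical extended sample (not from the question): every col1 group
# also contains a value near 2000, so those rows should be kept too.
rows = [
    (1, 50000), (1, 2000),
    (2, 51000), (2, 2050),
    (3, 100), (3, 5000), (3, 50500), (3, 1980),
    (4, 200), (4, 51500), (4, 2020),
    (5, 49000), (5, 1990),
]


def rows_to_keep(rows, tol=0.1):
    """Hypothetical helper: return (col1, col2) pairs whose col2 has a
    +/- tol match in every col1 group."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    kept = []
    for key, value in rows:
        lo, hi = value - value * tol, value + value * tol
        # every group needs at least one value inside [lo, hi]
        if all(any(lo <= v <= hi for v in vals) for vals in groups.values()):
            kept.append((key, value))
    return kept


print(rows_to_keep(rows))
```

With this data, both the rows around 50000 and the rows around 2000 survive, while 100, 5000 and 200 are dropped because group 1 has nothing near them.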
Use a list comprehension to loop over all unique values of col2, keep the rows within +-10% of each value with Series.between and boolean indexing, and check whether the col1 values of those rows form the same set as all values of col1. Finally, filter by Series.isin:
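# set of all unique col1 values
s = set(df['col1'])
print (s)
{1, 2, 3, 4, 5}
# keep each unique col2 value x whose +-10% window covers every col1 group
a = [x for x in df['col2'].unique()
     if set(df.loc[df['col2'].between(x - x * .1, x + x * .1), 'col1']) == s]
print (a)
[50000, 51000, 50500, 51500, 49000]
# keep only the rows whose col2 value passed the check
df = df[df['col2'].isin(a)]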
print (df)
col1 col2
0 1 50000
2 2 51000
5 3 50500
7 4 51500
8 5 49000
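The list comprehension above re-filters the whole column once per unique col2 value. If that loop becomes a bottleneck, one possible alternative (a sketch, not the answer's method) is to compare every value against every window at once with NumPy broadcasting. The function name `filter_common_values` and its `key`, `value` and `tol` parameters are hypothetical. One deliberate difference: the window uses `abs()`, so it stays well-ordered for negative values, where `between(x - x*.1, x + x*.1)` would be empty.

```python
import numpy as np
import pandas as pd


def filter_common_values(df, key="col1", value="col2", tol=0.1):
    """Hypothetical helper: keep rows whose `value` has a +/- tol match
    in every `key` group.

    Builds an (n_rows x n_rows) boolean matrix, so memory grows
    quadratically with the number of rows.
    """
    vals = df[value].to_numpy(dtype=float)
    # integer group id per row (assumes no missing keys)
    codes, uniques = pd.factorize(df[key])
    # window bounds; abs() keeps the window ordered for negative values
    lo = vals - np.abs(vals) * tol
    hi = vals + np.abs(vals) * tol
    # in_range[i, j]: row j's value lies inside row i's window (inclusive)
    in_range = (vals[None, :] >= lo[:, None]) & (vals[None, :] <= hi[:, None])
    # one_hot[j, g]: row j belongs to group g
    one_hot = codes[:, None] == np.arange(len(uniques))[None, :]
    # hits[i, g]: number of rows of group g inside row i's window
    hits = in_range.astype(np.int64) @ one_hot.astype(np.int64)
    # keep row i only if every group has at least one hit
    return df[(hits > 0).all(axis=1)]


df = pd.DataFrame({
    "col1": [1, 1, 2, 3, 3, 3, 4, 4, 5],
    "col2": [50000, 2000, 51000, 100, 5000, 50500, 200, 51500, 49000],
})
print(filter_common_values(df))
```

This trades the Python-level loop for an n x n matrix, so it suits frames of moderate size; for very large frames the memory cost may outweigh the speedup.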