
How to filter a df based on the values of another df in python

I have two DataFrames with data in them, and the index of each is a set of dates:

import numpy as np
import pandas as pd

dfcarry = {'EUR3m3m': [1.5, 0.6, 1.7, 1.5, -1.2],
           'EUR6m3m': [2.0, 1.2, 1.3, 0.6, -1.7],
         'EUR6m3m3m': [1.3, 1.0, -1.4, 0.5, np.nan]}
# five values per column, so the index needs five dates
dfcarry = pd.DataFrame(dfcarry, index=['26-09-2016','25-09-2016','24-09-2016','23-09-2016','22-09-2016'])

and

dfflags = {'EUR3m3m': [1, 0, 1, 1, -1],
           'EUR6m3m': [1, 1, 1, 0, -1],
         'EUR6m3m3m': [1, 1, -1, 0, 0]}
# five values per column, so the index needs five dates
dfflags = pd.DataFrame(dfflags, index=['26-09-2016','25-09-2016','24-09-2016','23-09-2016','22-09-2016'])

Now, what I want to do is limit the absolute value of the sum of the flags to 1, so for any given date I cannot have more than 2 flags in the same direction, i.e. a 1 cancels out a -1:

if abs(dfflags.loc['26-09-2016'].sum()) > 1:  # dates are row labels, so use .loc
    ...  # then convert one of the flags to zero

Then, the flag I want to get rid of is the one whose corresponding abs(carry) number is the smallest. If we have too many 1's, we get rid of the 1 where the carry number is lowest. If we have too many -1's in flags, we get rid of a -1 where the carry number is highest (least negative).

In my carry df, I have some NaNs (on purpose).

How do I do this?

So to be clear, for the first date, '26-09-2016', the expected output is to keep a 1 at EUR6m3m, because I keep the 1 corresponding to the highest absolute value of carry (so I keep carry=2.0, and get rid of carry=1.5 and 1.3).

The expected output overall is:

dfflags = {'EUR3m3m': [0, 0, 1, 1, 0],
         'EUR6m3m':   [1, 1, 1, 0, -1],
         'EUR6m3m3m': [0, 0, -1, 0, 0]}

Thanks

EDIT: this now takes into account the absolute sum of the flags, as coldsteel pointed out.

n = 2  # maximum number of flags allowed to equal 1
df = abs(dfflags) * dfcarry
for col in df.columns:  # for each column, set the flags
    if sum(dfflags[col]) > n:
        n_new = n + 2 * len(dfflags[dfflags[col] == -1])
        threshold = min(df[col].nlargest(n=n_new))
        df[col] = np.where(df[col] >= threshold, 1, 0)
    else:
        df[col] = abs(dfflags[col])
# apply the 'filter df' to dfflags, only keeping the top n 1's in each column
dfflags = dfflags[df == 1].fillna(0)

Result

            EUR3m3m  EUR6m3m  EUR6m3m3m
26-09-2016      0.0      1.0        0.0
25-09-2016      0.0      0.0        1.0
24-09-2016      1.0      1.0       -1.0
23-09-2016      1.0      0.0        0.0
22-09-2016      0.0      0.0        0.0

Open question: should carry be sorted by its values as given, or by absolute value?
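For comparison, here is a minimal row-wise sketch of the rule as the question states it (one check per date rather than per column). It is only an illustration under a few assumptions: the function name cap_net_flags and its limit parameter are made up for this example, the fifth date '22-09-2016' is taken from the result table because the question lists only four index labels, and a flag whose carry is NaN is treated as the weakest and dropped first. On the open question, this sketch ranks by the signed carry in the direction of the surplus, which is one possible reading of "lowest carry for 1's, highest carry for -1's".

```python
import numpy as np
import pandas as pd

# Fifth date is an assumption taken from the result table above.
dates = ['26-09-2016', '25-09-2016', '24-09-2016', '23-09-2016', '22-09-2016']
dfcarry = pd.DataFrame({'EUR3m3m': [1.5, 0.6, 1.7, 1.5, -1.2],
                        'EUR6m3m': [2.0, 1.2, 1.3, 0.6, -1.7],
                        'EUR6m3m3m': [1.3, 1.0, -1.4, 0.5, np.nan]}, index=dates)
dfflags = pd.DataFrame({'EUR3m3m': [1, 0, 1, 1, -1],
                        'EUR6m3m': [1, 1, 1, 0, -1],
                        'EUR6m3m3m': [1, 1, -1, 0, 0]}, index=dates)


def cap_net_flags(flags, carry, limit=1):
    """Hypothetical helper: zero surplus flags in one row until abs(sum) <= limit."""
    flags = flags.copy()
    net = int(flags.sum())
    if abs(net) <= limit:
        return flags
    sign = 1 if net > 0 else -1
    # Only flags pointing in the surplus direction are candidates for removal.
    # Multiplying carry by the sign puts the weakest first in both cases:
    # lowest carry for surplus 1's, highest (least negative) for surplus -1's.
    # Assumption: a NaN carry counts as weakest and is removed first.
    strength = (carry[flags == sign] * sign).sort_values(na_position='first')
    to_drop = strength.index[:abs(net) - limit]
    flags.loc[to_drop] = 0
    return flags


# Apply the rule date by date; row.name is the date label of the current row.
result = dfflags.apply(lambda row: cap_net_flags(row, dfcarry.loc[row.name]), axis=1)
print(result)
```

With the sample data, every date ends up with an absolute flag sum of at most 1. If ranking by absolute carry is preferred instead, replacing `carry[flags == sign] * sign` with `carry[flags == sign].abs()` would be the corresponding change.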
