How to filter a df based on the values of another df in python
I have two DataFrames with data in them, and the index of each is dates:
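import numpy as np
import pandas as pd

dfcarry = {'EUR3m3m': [1.5, 0.6, 1.7, 1.5, -1.2],
           'EUR6m3m': [2.0, 1.2, 1.3, 0.6, -1.7],
           'EUR6m3m3m': [1.3, 1.0, -1.4, 0.5, np.nan]}
# five values per column need five dates; '22-09-2016' is the fifth date shown in the Result table
dfcarry = pd.DataFrame(dfcarry, index=['26-09-2016','25-09-2016','24-09-2016','23-09-2016','22-09-2016'])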
and
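dfflags = {'EUR3m3m': [1, 0, 1, 1, -1],
           'EUR6m3m': [1, 1, 1, 0, -1],
           'EUR6m3m3m': [1, 1, -1, 0, 0]}
# five values per column need five dates; '22-09-2016' is the fifth date shown in the Result table
dfflags = pd.DataFrame(dfflags, index=['26-09-2016','25-09-2016','24-09-2016','23-09-2016','22-09-2016'])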
Now, what I want to do is limit the absolute value of the sum of the flags to 1, so for any given date I cannot have more than 2 flags in the same direction, i.e. a 1 cancels out a -1:
if abs(dfflags.loc['26-09-2016'].sum()) > 1:
    ...  # then convert one of the flags to zero
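The check above can be run for every date at once by summing the flags across each row. This is only a sketch: it assumes the dates are the row index (so a single date is selected with .loc rather than with dfflags[...]), and it assumes '22-09-2016' as the fifth date, taken from the Result table further down. The names net and over_limit are made up for illustration.

```python
import pandas as pd

# Assumption: '22-09-2016' is the fifth date (it appears in the Result table).
dates = ['26-09-2016', '25-09-2016', '24-09-2016', '23-09-2016', '22-09-2016']
dfflags = pd.DataFrame({'EUR3m3m': [1, 0, 1, 1, -1],
                        'EUR6m3m': [1, 1, 1, 0, -1],
                        'EUR6m3m3m': [1, 1, -1, 0, 0]}, index=dates)

# Net flag per date: a 1 and a -1 on the same date cancel out.
net = dfflags.sum(axis=1)

# Dates whose absolute net flag exceeds the limit of 1.
over_limit = net[net.abs() > 1]
print(over_limit)
```

With this data, three dates exceed the limit: 26-09-2016 (net 3), 25-09-2016 (net 2) and 22-09-2016 (net -2).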
Then, the flag I want to get rid of is the one whose corresponding abs(carry) number is the smallest. If we have too many 1's, we get rid of the 1 where the carry number is lowest. If we have too many -1's in flags, we get rid of a -1 where the carry number is highest (least negative).
In my carry df, I have some NaNs (on purpose).
How do I do this?
So to be clear, for the first date, '26-09-2016', the expected output is to keep a 1 at EUR6m3m, because I keep the 1 corresponding to the highest absolute value of carry (so I keep carry=2.0 and get rid of carry=1.5 and 1.3).
The expected output overall is:
dfflags = {'EUR3m3m': [0, 0, 1, 1, 0],
'EUR6m3m': [1, 1, 1, 0, -1],
'EUR6m3m3m': [0, 0, -1, 0, 0]}
Thanks
EDIT: takes into account the ABS sum of flags as coldsteel pointed out
n = 2  # how many flags can be maximum equal to 1
df = abs(dfflags) * dfcarry
for col in df.columns:  # for each column, set the flags
    if sum(dfflags[col]) > n:
        n_new = n + 2 * len(dfflags[dfflags[col] == -1])
        threshold = min(df[col].nlargest(n=n_new))
        df[col] = np.where(df[col] >= threshold, 1, 0)
    else:
        df[col] = abs(dfflags[col])
# apply the 'filter df' to the dfflags, only keeping the top n 1 in each column
dfflags = dfflags[df == 1].fillna(0)
Result
            EUR3m3m  EUR6m3m  EUR6m3m3m
26-09-2016      0.0      1.0        0.0
25-09-2016      0.0      0.0        1.0
24-09-2016      1.0      1.0       -1.0
23-09-2016      1.0      0.0        0.0
22-09-2016      0.0      0.0        0.0
Open question: Sort carry by values as given or by absolute value?
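One possible way to settle this, sketched under a few assumptions: work date by date (row by row) rather than column by column, and rank by absolute carry. As long as each carry has the same sign as its flag, the smallest abs(carry) is the lowest carry among the 1's and the highest (least negative) carry among the -1's, which is what the question describes. The helper limit_flags and its limit parameter are hypothetical names, flags are assumed to be -1, 0 or 1, a NaN carry is treated as 0 (so such a flag would be dropped first), and '22-09-2016' is assumed as the fifth date.

```python
import numpy as np
import pandas as pd

# Assumption: '22-09-2016' is the fifth date (it appears in the Result table).
dates = ['26-09-2016', '25-09-2016', '24-09-2016', '23-09-2016', '22-09-2016']
dfcarry = pd.DataFrame({'EUR3m3m': [1.5, 0.6, 1.7, 1.5, -1.2],
                        'EUR6m3m': [2.0, 1.2, 1.3, 0.6, -1.7],
                        'EUR6m3m3m': [1.3, 1.0, -1.4, 0.5, np.nan]}, index=dates)
dfflags = pd.DataFrame({'EUR3m3m': [1, 0, 1, 1, -1],
                        'EUR6m3m': [1, 1, 1, 0, -1],
                        'EUR6m3m3m': [1, 1, -1, 0, 0]}, index=dates)


def limit_flags(flags, carry, limit=1):
    """Hypothetical helper: zero out flags for one date until abs(sum) <= limit."""
    flags = flags.copy()
    while abs(flags.sum()) > limit:
        # +1 or -1: the direction that currently has too many flags.
        direction = np.sign(flags.sum())
        # |carry| of the flags in that direction; NaN counts as 0 (assumption).
        candidates = carry[flags == direction].abs().fillna(0)
        # Drop the flag whose |carry| is smallest.
        flags[candidates.idxmin()] = 0
    return flags


# Apply the helper to each date, pairing every row of flags with its row of carry.
result = dfflags.apply(lambda row: limit_flags(row, dfcarry.loc[row.name]), axis=1)
print(result)
```

On the sample data this reproduces the expected output from the question: for '26-09-2016' only the 1 at EUR6m3m survives (carry 2.0), and for '22-09-2016' the -1 at EUR3m3m (carry -1.2) is dropped while the -1 at EUR6m3m (carry -1.7) is kept.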