python pandas: filter a DataFrame by another Series, on multiple columns

Question

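After building a Series that gives the most liquid delivery contract for each day, how do I filter the original DataFrame by those days and contracts? Given these two: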

most_liquid_contracts.head(20)
Out[32]: 
2007-04-26    706
2007-04-27    706
2007-04-29    706
2007-04-30    706
2007-05-01    706
2007-05-02    706
2007-05-03    706
2007-05-04    706
2007-05-06    706
2007-05-07    706
2007-05-08    706
2007-05-09    706
2007-05-10    706
2007-05-11    706
2007-05-13    706
2007-05-14    706
2007-05-15    706
2007-05-16    706
2007-05-17    706
2007-05-18    706
dtype: int64

df.head(20).to_string
Out[40]: 
<bound method DataFrame.to_string of                            
                              delivery  volume
2007-04-27 11:55:00+01:00       705       1
2007-04-27 13:46:00+01:00       705       1
2007-04-27 14:15:00+01:00       705       1
2007-04-27 14:33:00+01:00       705       1
2007-04-27 14:35:00+01:00       705       1
2007-04-27 17:05:00+01:00       705      16
2007-04-27 17:07:00+01:00       705       1
2007-04-27 17:12:00+01:00       705       1
2007-04-27 17:46:00+01:00       705       1
2007-04-27 18:25:00+01:00       705       2
2007-04-26 23:00:00+01:00       706      10
2007-04-26 23:01:00+01:00       706      12
2007-04-26 23:02:00+01:00       706       1
2007-04-26 23:05:00+01:00       706      21
2007-04-26 23:06:00+01:00       706      10
2007-04-26 23:07:00+01:00       706      19
2007-04-26 23:08:00+01:00       706       1
2007-04-26 23:13:00+01:00       706      10
2007-04-26 23:14:00+01:00       706      62
2007-04-26 23:15:00+01:00       706       3>

I tried:

liquid = df[df.index.date==most_liquid_contracts.index & df['delivery']==most_liquid_contracts]

Or do I need to merge? That seems inelegant, and I'm not sure it is right either.

# ATTEMPT 1
most_liquid_contracts.index = pd.to_datetime(most_liquid_contracts.index, unit='d')
df['days'] = pd.to_datetime(df.index.date, unit='d')
mlc = most_liquid_contracts.to_frame(name='delivery')
mlc['days'] = mlc.index.date
data = pd.merge(mlc, df, on=['delivery', 'days'], left_index=True)

# ATTEMPT 2
liquid = pd.merge(mlc, df, on='delivery', how='inner', left_index=True)
# this gets me closer (ie. retains granularity), but somehow seems to be an outer join? it includes the union but not the intersection. this should be a subset of df, but instead has about x50 the rows, at around 195B. df originally has 4B

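But I can't seem to keep the minute-level granularity I need from the original `df`. Essentially, I want `df` restricted to the most liquid contracts (taken from the most_liquid_contracts Series; e.g. April 27 would contain only contracts labeled "706", and April 29 would contain only contracts labeled "706"). Then I want a second DataFrame for the exact opposite: all the other contracts (i.e. the ones that are not the most liquid).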

Update: more information was given in an image (not included here).

Answer 1

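The tricky part is combining two Series/DataFrames whose indexes have different datetime resolutions. Once they are combined sensibly, you can filter as usual.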

# Make sure your series has a name
# Make sure the index is pure dates, not date 00:00:00
most_liquid_contracts.name = 'most'
most_liquid_contracts.index = most_liquid_contracts.index.date

# data is the same object as df, so df also gains the 'day' column
data = df
data['day'] = data.index.date
# Left join: every minute-level row picks up that day's 'most' value
combined = data.join(most_liquid_contracts, on='day', how='left')

Now you can do something like

combined[combined.delivery == combined.most]

This yields the rows of data (df) where data.delivery equals the most_liquid_contracts value for that day.
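To make the join-then-filter steps concrete, here is a minimal end-to-end sketch. The sample rows, the "Europe/London" timezone, and the names `is_liquid`, `liquid`, and `others` are assumptions made for illustration and do not come from the original data. It also shows one way to get the complementary DataFrame the question asks for, by negating the same boolean mask.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data: a minute-level,
# timezone-aware index with a contract code and a traded volume.
idx = pd.to_datetime([
    "2007-04-26 23:00", "2007-04-26 23:01",
    "2007-04-27 11:55", "2007-04-27 13:46", "2007-04-27 17:05",
]).tz_localize("Europe/London")
df = pd.DataFrame(
    {"delivery": [706, 706, 705, 706, 705], "volume": [10, 12, 1, 3, 16]},
    index=idx,
)

# One most-liquid contract per calendar day, indexed by plain dates.
most_liquid_contracts = pd.Series(
    [706, 706],
    index=pd.to_datetime(["2007-04-26", "2007-04-27"]).date,
    name="most",
)

# Add a plain-date helper column, then left-join the daily value onto
# every minute-level row. assign() returns a new frame, so df is unchanged.
combined = df.assign(day=df.index.date).join(
    most_liquid_contracts, on="day", how="left"
)

# Rows whose contract is that day's most liquid one, and the complement.
is_liquid = combined["delivery"] == combined["most"]
liquid = combined[is_liquid]
others = combined[~is_liquid]
```

Because the join is a left join on the date column, `combined` keeps exactly the rows and the minute-level index of `df`. Days missing from `most_liquid_contracts` would get NaN in `most` and therefore end up in `others`.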

Answer 2

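Assuming I've understood you correctly, most_liquid_contracts is a Series of N integers holding the N largest delivery values, and you want to filter df so that it only includes the days whose delivery is high enough to make that list. In that case you can simply drop everything in df that is below the minimum of most_liquid_contracts.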

threshold = min(most_liquid_contracts)
filtered = df[df['delivery'] >= threshold]
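As a small illustration of what this threshold filter keeps, here is a sketch on made-up values (the sample numbers are assumptions, not taken from the question). Note that it compares every row against a single global threshold rather than against each day's own value, so it only fits if the assumption stated in this answer holds.

```python
import pandas as pd

# Hypothetical data: contract codes with volumes, and a list of "top" values.
df = pd.DataFrame({"delivery": [705, 706, 707, 704], "volume": [1, 10, 3, 2]})
most_liquid_contracts = pd.Series([706, 706, 707])

# Keep every row whose delivery is at least the smallest value in the list.
threshold = most_liquid_contracts.min()
filtered = df[df["delivery"] >= threshold]
```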

Answer 1
1, accepted, 2015-01-13 17:56:52

Answer 2
0, 2015-01-13 18:02:40
