How to select this kind of data from a pandas multi-index dataframe

Question

I'm working with futures market data. Here is an example multi-index dataframe:

import numpy as np
import pandas as pd

date_index = pd.date_range('2018-03-20', periods=10)
contract = ['ZN1805', 'ZN1806', 'ZN1807']
price = ['open', 'close']
columns = pd.MultiIndex.from_product([contract, price], names=['contract', 'price'])
df1 = pd.DataFrame(data=np.random.randint(100, 150, (10, len(columns))), index=date_index, columns=columns)

df2 = pd.DataFrame(columns=['contract', 'close'], index=df1.index)
# Set the data in contract column randomly here for illustration
df2.contract = np.random.choice(contract, 10)

Here is what df1 looks like:

df1
Out[357]: 
contract   ZN1805       ZN1806       ZN1807      
price        open close   open close   open close
2018-03-20    145   144    116   127    107   128
2018-03-21    116   143    114   103    114   148
2018-03-22    101   135    143   125    140   129
2018-03-23    106   139    100   127    116   100
2018-03-24    104   101    148   132    102   140
2018-03-25    125   141    106   136    128   134
2018-03-26    148   146    142   143    108   137
2018-03-27    110   123    128   128    124   127
2018-03-28    144   143    117   116    112   140
2018-03-29    143   114    115   105    124   118

and df2 would be:

df2
Out[364]: 
           contract close
2018-03-20   ZN1805   NaN
2018-03-21   ZN1807   NaN
2018-03-22   ZN1806   NaN
2018-03-23   ZN1807   NaN
2018-03-24   ZN1807   NaN
2018-03-25   ZN1806   NaN
2018-03-26   ZN1807   NaN
2018-03-27   ZN1806   NaN
2018-03-28   ZN1805   NaN
2018-03-29   ZN1807   NaN

My problem is: how do I 'pythonically' fill in the close column of df2 from df1, taking for each row the close price that matches its date index and contract value?

I tried this:

from pandas import IndexSlice as idx
df2['close'] = df1.loc[df2.index, idx[df2.contract.values.tolist(), 'close']]

However, I got an error:

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'

I understand I could iterate and filter each row, but is there a more pythonic way to do it?

Answer 1

Select the close level with xs, reshape the result into a Series with unstack, then join it to df2 on two columns (contract and the date index):

s = df1.xs('close', axis=1, level=1).unstack().rename('close')

df2 = (df2.drop(columns='close')
          .reset_index()
          .join(s, on=['contract', 'index'])
          .set_index('index')
          .rename_axis(None))

print(df2)
           contract  close
2018-03-20   ZN1805    124
2018-03-21   ZN1805    112
2018-03-22   ZN1807    118
2018-03-23   ZN1807    136
2018-03-24   ZN1805    103
2018-03-25   ZN1805    135
2018-03-26   ZN1805    138
2018-03-27   ZN1805    109
2018-03-28   ZN1805    129
2018-03-29   ZN1805    104
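
To make the reshaping step easier to follow, here is a minimal self-contained sketch of the same xs / unstack / join idea on a tiny deterministic frame. The data values, the three-day date range, and the names dates, contracts, result and expected are made up for illustration and are not part of the original answer. The last lines cross-check the joined result against a plain row-by-row lookup.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in for df1 (values are invented for illustration).
dates = pd.date_range('2018-03-20', periods=3)
contracts = ['ZN1805', 'ZN1806']
columns = pd.MultiIndex.from_product([contracts, ['open', 'close']],
                                     names=['contract', 'price'])
df1 = pd.DataFrame(np.arange(12).reshape(3, 4), index=dates, columns=columns)

# One chosen contract per date, playing the role of df2.
df2 = pd.DataFrame({'contract': ['ZN1806', 'ZN1805', 'ZN1806']}, index=dates)

# Keep only the 'close' prices, then move the contracts into the row index.
# s is a Series indexed by (contract, date).
s = df1.xs('close', axis=1, level='price').unstack().rename('close')

# reset_index() exposes the unnamed date index as a column called 'index',
# so the two join keys line up with the two levels of s.
result = (df2.reset_index()
             .join(s, on=['contract', 'index'])
             .set_index('index')
             .rename_axis(None))

# Cross-check against a row-by-row lookup.
expected = [df1.loc[d, (c, 'close')] for d, c in df2['contract'].items()]
assert result['close'].tolist() == expected
```

The join keys must be listed in the same order as the index levels of s, which is why contract comes before index.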

Answer 2

@jezrael's answer is very good, but for anyone who is not familiar with xs (like me), here is a more verbose way to get s first:

# idx is pandas.IndexSlice, imported as in the question
s = df1.loc[:, idx[:, 'close']]
s.columns = s.columns.droplevel(1)
s = s.unstack().rename('close')

Of course this 3-line statement doesn't look that attractive. :D Then we can get df2 in the same way:

df2 = (df2.drop(columns='close')
          .reset_index()
          .join(s, on=['contract', 'index'])
          .set_index('index')
          .rename_axis(None))
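
As a quick sanity check, the sketch below builds s both ways, with xs and with IndexSlice plus droplevel, and compares the two. The small frame and the names s_xs, closes and s_loc are assumptions introduced only for this illustration.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice

# Tiny deterministic frame with the same column layout as df1 (invented values).
dates = pd.date_range('2018-03-20', periods=3)
columns = pd.MultiIndex.from_product([['ZN1805', 'ZN1806'], ['open', 'close']],
                                     names=['contract', 'price'])
df1 = pd.DataFrame(np.arange(12).reshape(3, 4), index=dates, columns=columns)

# Route 1: xs selects the 'close' level and drops it in one step.
s_xs = df1.xs('close', axis=1, level='price').unstack().rename('close')

# Route 2: slice with IndexSlice, then drop the 'price' level by hand.
closes = df1.loc[:, idx[:, 'close']].copy()
closes.columns = closes.columns.droplevel('price')
s_loc = closes.unstack().rename('close')

# Both routes should give the same (contract, date)-indexed Series.
assert s_xs.equals(s_loc)
```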
