Setting values with a MultiIndex in pandas

Question

There are already a couple of questions on SO about this, most notably this one; however, none of the answers seem to work for me, and quite a few links to the docs (especially on lexsorting) are broken, so I'll ask another one.

I'm trying to do something (seemingly) very simple. Consider the following MultiIndexed DataFrame:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index),
                pd.Series(np.random.randn(8), index=index)], axis=1)

Now I want to set all values in column 0 to some value (say np.nan) for the observations in category one, i.e. the rows where level second is 'one'. I've failed with:

df.loc(axis=0)[:, "one"][0] = 1 # setting with copy warning

and

df.loc(axis=0)[:, "one", 0] = 1

which yields either a warning about the length of the keys exceeding the length of the index, or one about the index not being lexsorted to sufficient depth.

What is the correct way to do this?
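On the lexsorting warning mentioned in the question: pandas expects a MultiIndex to be sorted before it is sliced by level. Below is a minimal sketch, using a small hypothetical frame with a deliberately unsorted index, that checks the index with is_monotonic_increasing and sorts it with sort_index() before slicing. Sorting may not be the whole story for the second attempt above, though. There, the column label 0 is passed as a third row key to a two-level index.

```python
import numpy as np
import pandas as pd

# Hypothetical example: a two-level index whose first level is out of order
index = pd.MultiIndex.from_tuples(
    [('foo', 'one'), ('bar', 'two'), ('bar', 'one'), ('foo', 'two')],
    names=['first', 'second'],
)
df = pd.DataFrame({0: np.arange(4.0), 1: np.arange(4.0)}, index=index)

print(df.index.is_monotonic_increasing)  # False: the index is not lexsorted

# sort_index() returns a new frame sorted on every index level
df = df.sort_index()
print(df.index.is_monotonic_increasing)  # True

# With a sorted index, slicing on the inner level is safe
df.loc[(slice(None), 'one'), 0] = np.nan
print(df)
```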

Answer 1

I think you can use loc with a tuple to select rows from the MultiIndex, and 0 to select the column:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

# seed added for reproducible testing
np.random.seed(0)
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index),
                pd.Series(np.random.randn(8), index=index)], axis=1)

print(df)
                     0         1
first second                    
bar   one     1.764052 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674

# row key is the full tuple, column key is the label 0
df.loc[('bar', "one"), 0] = 1
print(df)
                     0         1
first second                    
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674

If you need to set all rows where level second has the value one, use slice(None) for the first level:

# slice(None) matches every label in level first
df.loc[(slice(None), "one"), 0] = 1
print(df)
                     0         1
first second                    
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     1.000000  0.144044
      two     2.240893  1.454274
foo   one     1.000000  0.761038
      two    -0.977278  0.121675
qux   one     1.000000  0.443863
      two    -0.151357  0.333674
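The same row key can also be spelled with pd.IndexSlice, which some find more readable than writing slice(None) by hand. The sketch below also assigns np.nan, the value the question actually asks for. To keep it self-contained, it rebuilds a similar seeded frame with pd.DataFrame(np.random.randn(8, 2), ...), so the random values will differ from the output shown above.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
np.random.seed(0)
df = pd.DataFrame(np.random.randn(8, 2), index=index)  # columns 0 and 1

idx = pd.IndexSlice
# idx[:, 'one'] builds the same row key as (slice(None), 'one')
df.loc[idx[:, 'one'], 0] = np.nan
print(df)
```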

Docs.
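Another option, sketched here as an alternative rather than as part of the answer above, is to build a boolean row mask from one index level with Index.get_level_values. You then pass the mask and the column label to a single .loc call. Because this is not level slicing, it does not depend on the index being lexsorted. The frame is a seeded stand-in for the question's data.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
np.random.seed(0)
df = pd.DataFrame(np.random.randn(8, 2), index=index)

# Boolean array: True where the 'second' level equals 'one'
mask = df.index.get_level_values('second') == 'one'

# One .loc call with (row mask, column label) avoids chained assignment
df.loc[mask, 0] = np.nan
print(df)
```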

Answer 1: 4 votes, accepted, 2016-03-14 14:01:48