Assigning values to Pandas MultiIndex DataFrame by index level
I have a Pandas MultiIndex DataFrame and I need to assign values to one of its columns from a Series. The Series shares its index with the first level of the DataFrame's index.
import pandas as pd
import numpy as np
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index = [idx0, idx1], columns = ['A', 'B'])
s = pd.Series([True, False, True],index = np.unique(idx0))
print(df)
print(s)
Output:
             A    B
bar one    NaN  NaN
    two    NaN  NaN
    three  NaN  NaN
baz one    NaN  NaN
foo one    NaN  NaN
    two    NaN  NaN
bar True
baz False
foo True
dtype: bool
These don't work:
df.A = s # does not raise an error, but does nothing
df.loc[s.index,'A'] = s # raises an error
Expected output:
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
A Series (or a dictionary) can be passed to map in place of a function (thanks to @normanius for improving the syntax):
df['A'] = pd.Series(df.index.get_level_values(0)).map(s).values
Or similarly:
df['A'] = df.reset_index(level=0)['level_0'].map(s).values
Results:
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
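The lookup above can also be written without the intermediate Series, since an Index has its own map method. This is a minimal sketch on the question's data, assuming positional assignment via to_numpy() is acceptable; the variable name level0 is only illustrative.

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# First-level label of every row: bar, bar, bar, baz, foo, foo
level0 = df.index.get_level_values(0)

# Look each label up in s, then assign positionally (no index alignment)
df['A'] = level0.map(s).to_numpy()

print(df)
```

Converting to a plain array before assigning sidesteps index alignment, which is what makes the direct df.A = s leave the column unchanged.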
Quoting the question: "df.A = s does not raise an error, but does nothing."
Indeed, this should have worked. Your point is actually related to mine.
The workaround
>>> s.index = pd.Index((c,) for c in s.index)  # wrap each label in a 1-tuple
>>> df.A = s
>>> df
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
Why does the above work?
Because when you do df.A = s directly, without the workaround, you are trying to assign values labelled by a plain pandas.Index into a frame indexed by an instance of its subclass, pandas.MultiIndex, which looks somewhat at odds with the Liskov substitution principle. See for yourself:
>>> type(s.index).__name__
'Index'
whereas
>>> type(df.index).__name__
'MultiIndex'
Hence this workaround, which consists in turning s's index into a one-level pandas.MultiIndex instance:
>>> s.index = pd.Index((c,) for c in s.index)
>>> type(s.index).__name__
'MultiIndex'
and nothing has perceptibly changed:
>>> s
bar True
baz False
foo True
dtype: bool
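The same one-level MultiIndex can also be built with an explicit constructor. This is a minimal sketch, assuming pd.MultiIndex.from_arrays is an acceptable substitute for the tuple trick; it only demonstrates the index conversion, not the subsequent assignment to df.

```python
import pandas as pd

s = pd.Series([True, False, True], index=['bar', 'baz', 'foo'])

# Explicit alternative to wrapping each label in a 1-tuple:
# build a MultiIndex that has a single level.
s.index = pd.MultiIndex.from_arrays([s.index])

print(type(s.index).__name__)  # class name of the new index
print(s.index.nlevels)         # number of levels in it
```

Spelling out the constructor makes the intent (a MultiIndex with exactly one level) visible at the call site, rather than relying on pd.Index turning a sequence of tuples into a MultiIndex.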
A thought: from many points of view (mathematical, ontological), all this somehow suggests that pandas.Index should have been designed as a subclass of pandas.MultiIndex, not the opposite, as it currently is.
You can use the join method on the df DataFrame, but you need to name the index levels and the Series accordingly:
>>> df.index.names = ('lvl0', 'lvl1')
>>> s.index.name = 'lvl0'
>>> s.name = 'new_col'
Then the join method creates a new column in the DataFrame:
>>> df.join(s)
              A    B  new_col
lvl0 lvl1
bar  one    NaN  NaN     True
     two    NaN  NaN     True
     three  NaN  NaN     True
baz  one    NaN  NaN    False
foo  one    NaN  NaN     True
     two    NaN  NaN     True
To assign it to an existing column:
>>> df['A'] = df.join(s)['new_col']
>>> df
                A    B
lvl0 lvl1
bar  one     True  NaN
     two     True  NaN
     three   True  NaN
baz  one    False  NaN
foo  one     True  NaN
     two     True  NaN
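Put together as a script, the join approach might look like the sketch below. It reuses the question's data and the names lvl0, lvl1 and new_col from this answer; the intermediate variable joined is only illustrative.

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# join matches on index level names, so both sides must share 'lvl0'
df.index.names = ['lvl0', 'lvl1']
s.index.name = 'lvl0'
s.name = 'new_col'  # a Series needs a name to become a column

# Left join: every row of df picks up the value for its lvl0 label
joined = df.join(s)
df['A'] = joined['new_col']

print(df)
```

Because joined carries the same two-level index as df, the final assignment aligns row by row and no positional conversion is needed.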