How to drop the first index level and then merge the remaining index values with custom logic for a pandas DataFrame?
Say I have a MultiIndex DataFrame like so:
                   price  volume
year product city
2010 A       LA       10       7
     B       SF        7       9
     C       NY        7       6
             LA       18      21
             SF        4       8
2011 A       LA       13       5
     B       SF        2       4
     C       NY        9       3
             SF        2       0
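To make the example reproducible, here is a minimal sketch of how such a frame could be built. The question does not show how the data was created, so the construction via `pd.MultiIndex.from_tuples` is an assumption; only the values and level names come from the table above.

```python
import pandas as pd

# Index tuples in the same row order as the table: (year, product, city).
index = pd.MultiIndex.from_tuples(
    [
        (2010, "A", "LA"), (2010, "B", "SF"), (2010, "C", "NY"),
        (2010, "C", "LA"), (2010, "C", "SF"),
        (2011, "A", "LA"), (2011, "B", "SF"), (2011, "C", "NY"),
        (2011, "C", "SF"),
    ],
    names=["year", "product", "city"],
)

# Column values copied row by row from the table.
df = pd.DataFrame(
    {
        "price": [10, 7, 7, 18, 4, 13, 2, 9, 2],
        "volume": [7, 9, 6, 21, 8, 5, 4, 3, 0],
    },
    index=index,
)
```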
I want to do a somewhat complex merge in which the first level of the DataFrame index (year) is dropped and the resulting duplicates in the new first level (product) are merged according to some custom logic. In this case I would like the price column to use the values from the 2010 outer index and the volume column to use the values from the 2011 outer index, but I would like a general solution that can be applied to more columns should they exist.
The final DataFrame would look like this, where the price values are those from the 2010 index, the volume values are those from the 2011 index, and missing values are filled with NaN.
              price  volume
product city
A       LA       10       5
B       SF        7       4
C       NY        7       3
        LA       18     NaN
        SF        4       0
You can select by the first level with DataFrame.xs and then combine the selected columns with concat:
df = pd.concat([df.xs(2010)['price'], df.xs(2011)['volume']], axis=1)
It is also possible to use loc (applied to the original df, since the line above reassigns it):
df = pd.concat([df.loc[2010, 'price'], df.loc[2011, 'volume']], axis=1)
print(df)
              price  volume
product city
A       LA       10     5.0
B       SF        7     4.0
C       LA       18     NaN
        NY        7     3.0
        SF        4     0.0
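The question also asks for something that extends to more columns. One possible sketch generalizes the xs-plus-concat approach by driving it from a mapping of column name to outer index value. The helper name `merge_by_level` and its `source` parameter are hypothetical, introduced here for illustration, and the sample frame is rebuilt from the question's table so that the block runs on its own.

```python
import pandas as pd


def merge_by_level(df, source, level=0):
    """Take each column from the slice of df chosen by source[column].

    `source` maps a column name to the value of index level `level`
    that the column should be read from (hypothetical helper).
    """
    # xs drops the selected level, so every piece is indexed by the
    # remaining levels; concat then aligns the pieces on that index
    # and fills combinations missing from a piece with NaN.
    pieces = [df.xs(key, level=level)[col] for col, key in source.items()]
    return pd.concat(pieces, axis=1)


# Sample data copied from the question's table.
rows = [
    (2010, "A", "LA", 10, 7), (2010, "B", "SF", 7, 9),
    (2010, "C", "NY", 7, 6), (2010, "C", "LA", 18, 21),
    (2010, "C", "SF", 4, 8), (2011, "A", "LA", 13, 5),
    (2011, "B", "SF", 2, 4), (2011, "C", "NY", 9, 3),
    (2011, "C", "SF", 2, 0),
]
df = pd.DataFrame(
    rows, columns=["year", "product", "city", "price", "volume"]
).set_index(["year", "product", "city"])

result = merge_by_level(df, {"price": 2010, "volume": 2011})
print(result)
```

Adding another column then only requires one more entry in the mapping, for example `{"price": 2010, "volume": 2011, "rating": 2011}` if a `rating` column existed.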