Concatenate Pandas dataframes and series while controlling MultiIndex values

The pandas concat function lets you concatenate a mixture of Series and DataFrames, but the default way it infers column names for the Series in the resulting DataFrame is not quite what I want.

Example:

Say I have a dictionary containing a collection of DataFrames and Series as values.

import pandas as pd

dict_of_series_and_frames = {
    'x': pd.Series([1, 2, 3]),
    'y': pd.Series([6, 5, 4]),
    'sizes': pd.DataFrame(100, columns=[1, 2, 3], index=range(3)),
    'z': pd.Series([0.1, 0.2, 0.3])
}

Combining them into one DataFrame is very easy:

>>> pd.concat(dict_of_series_and_frames, axis=1)
  sizes            x  y    z
      1    2    3  0  1    2
0   100  100  100  1  6  0.1
1   100  100  100  2  5  0.2
2   100  100  100  3  4  0.3

The problem is the way pandas has filled in the second level for the Series. It seems to be a simple integer count (0, 1, 2, ...). I would like something more logical, such as labelling each Series ('Series name', None) or grouping them as ('Other', 'Series name'), to make them easier to index later (all my Series have unique names).

I found out that pandas puts the Series name in the second level if it has one:

dict_of_series_and_frames = {
    'x': pd.Series([1, 2, 3], name='x'),
    'y': pd.Series([6, 5, 4], name='y'),
    'sizes': pd.DataFrame(100, columns=[1, 2, 3], index=range(3)),
    'z': pd.Series([0.1, 0.2, 0.3])
}

>>> pd.concat(dict_of_series_and_frames, axis=1)
  sizes            x  y    z
      1    2    3  x  y    0
0   100  100  100  1  6  0.1
1   100  100  100  2  5  0.2
2   100  100  100  3  4  0.3

But I don't want to have to ensure that all the Series are named correctly.

Other than ignoring pandas' attempt to build the index and doing it all by hand...

>>> col_names = []
>>> for key, values in dict_of_series_and_frames.items():
...     try:
...         for value in values.columns:
...             col_names.append((key, value))
...     except AttributeError:
...         col_names.append((key, '-'))  # or ('Other', key) say
... 
>>> df = pd.concat(dict_of_series_and_frames, 
...                keys=dict_of_series_and_frames.keys(), 
...                axis=1, ignore_index=True)
>>> df.columns = pd.MultiIndex.from_tuples(col_names)
>>> df
   x  y sizes              z
   -  -     1    2    3    -
0  1  6   100  100  100  0.1
1  2  5   100  100  100  0.2
2  3  4   100  100  100  0.3
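To illustrate why a uniform second-level label makes later indexing easier, here is a minimal sketch. It builds a frame with the same column layout as the result above directly from tuples (so it does not depend on `dict_of_series_and_frames`) and then selects from it. The variable names `x`, `sizes` and `singles` are illustrative only.

```python
import pandas as pd

# Same column layout as the hand-built result: Series get '-' on level 1.
columns = pd.MultiIndex.from_tuples(
    [('x', '-'), ('y', '-'), ('sizes', 1), ('sizes', 2), ('sizes', 3), ('z', '-')]
)
df = pd.DataFrame(
    [[1, 6, 100, 100, 100, 0.1],
     [2, 5, 100, 100, 100, 0.2],
     [3, 4, 100, 100, 100, 0.3]],
    columns=columns,
)

x = df[('x', '-')]                      # one former Series, by its full key
sizes = df['sizes']                     # the sub-frame with columns 1, 2, 3
singles = df.xs('-', axis=1, level=1)   # every former Series: columns x, y, z
```

Because every Series shares the same placeholder on the second level, a single cross-section (`xs`) is enough to pull all of them out at once.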

Am I missing a simpler way to get the desired result above, or something similar?

Ideally in one line using concat.

You can modify or add the names of the Series already in the dictionary, and then apply the concatenation:

for k, v in dict_of_series_and_frames.items():
    if isinstance(v, pd.Series):
        v.name = "-"  # note: this renames the Series stored in the dictionary

df = pd.concat(dict_of_series_and_frames, axis=1)
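If mutating the Series in the dictionary is undesirable, a variation on the same idea is to convert each Series to a one-column DataFrame inside a dict comprehension, which keeps everything in a single `concat` call. This is a minimal sketch, not part of the original answer: the `'-'` label and the `'Other'` group key are arbitrary placeholders, and `series`, `frames` and `grouped` are names introduced here for illustration.

```python
import pandas as pd

dict_of_series_and_frames = {
    'x': pd.Series([1, 2, 3]),
    'y': pd.Series([6, 5, 4]),
    'sizes': pd.DataFrame(100, columns=[1, 2, 3], index=range(3)),
    'z': pd.Series([0.1, 0.2, 0.3]),
}

# Series.to_frame(name) returns a new one-column DataFrame, so the
# Series stored in the dictionary (and their names) are left untouched.
df = pd.concat(
    {k: v.to_frame('-') if isinstance(v, pd.Series) else v
     for k, v in dict_of_series_and_frames.items()},
    axis=1,
)

# Variant: group all Series under one top-level key, ('Other', key).
series = {k: v for k, v in dict_of_series_and_frames.items()
          if isinstance(v, pd.Series)}
frames = {k: v for k, v in dict_of_series_and_frames.items()
          if isinstance(v, pd.DataFrame)}
# Concatenating a dict of Series along axis=1 uses the keys as column names.
grouped = pd.concat({**frames, 'Other': pd.concat(series, axis=1)}, axis=1)
```

With the first form the Series are reachable as `df[('x', '-')]`; with the second, `grouped['Other']` returns all of them as one sub-frame.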
