How to join a multi-index series to a single index dataframe with Pandas?

Consider the following single index DataFrame:

      energy    fat
1      2000      28
2      1900      17
3      2200      30
4      1750      15
5      1800      18
6      1600      12

I also have a MultiIndex Series:

1  vitamin-c    0.0004
   vitamin-a    0.0150
2  vitamin-c    0.0030
3  vitamin-d    1.2000
   vitamin-e    1.0007
   vitamin-c    1.2020
4  vitamin-a    0.0780
5  vitamin-b    0.9650
6  vitamin-e    1.9801
   vitamin-c    1.0011

How can I join the two so the result looks like this:

      energy    fat          vitamins
1      2000      28     vitamin-c    0.0004
                        vitamin-a    0.0150
2      1900      17     vitamin-c    0.0030
3      2200      30     vitamin-d    1.2000
                        vitamin-e    1.0007
                        vitamin-c    1.2020
4      1750      15     vitamin-a    0.0780
5      1800      18     vitamin-b    0.9650
6      1600      12     vitamin-e    1.9801
                        vitamin-c    1.0011

I tried df.join(series, how='inner'), but all I got was the following error message:

"ValueError: cannot join with no level specified and no overlapping names"

Can someone please explain what I'm doing wrong here and how I can combine the two? Thank you!
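To reproduce the error, here is a minimal sketch that rebuilds the question's data by hand. The variable names and the Series name "val" are assumptions; the Series needs some name, because DataFrame.join refuses an unnamed Series. Neither index has named levels, so pandas cannot tell which level of the MultiIndex should line up with the DataFrame's index. The exact wording of the message differs between pandas versions, but it refers to the missing overlapping names.

```python
import pandas as pd

# Hand-built copy of the question's data (variable names are assumptions).
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200, 1750, 1800, 1600],
     "fat": [28, 17, 30, 15, 18, 12]},
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007,
     1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"), (2, "vitamin-c"),
        (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
        (4, "vitamin-a"), (5, "vitamin-b"), (6, "vitamin-e"),
        (6, "vitamin-c"),
    ]),
    name="val",  # join() raises a different error for an unnamed Series
)

# Both indexes have only unnamed levels, so join() cannot pick a level.
error_message = None
try:
    df.join(s, how="inner")
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```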

Option 1
I don't recommend moving things into the index that don't belong there.
That said, you can use pd.DataFrame.join if your index levels are named, or more precisely, if their names match, so pandas knows which level to join on.

df.rename_axis('ord').join(s.rename_axis(['ord', 'vit']).rename('val'))

               energy  fat     val
ord vit                           
1   vitamin-c    2000   28  0.0004
    vitamin-a    2000   28  0.0150
2   vitamin-c    1900   17  0.0030
3   vitamin-d    2200   30  1.2000
    vitamin-e    2200   30  1.0007
    vitamin-c    2200   30  1.2020
4   vitamin-a    1750   15  0.0780
5   vitamin-b    1800   18  0.9650
6   vitamin-e    1600   12  1.9801
    vitamin-c    1600   12  1.0011
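A self-contained version of the one-liner above, with the data rebuilt by hand (my reconstruction, not the answerer's code). The key ingredient is a level name shared by both sides: "ord" on df's index and on the first level of the Series' MultiIndex. Each df row is then repeated for every matching vitamin row.

```python
import pandas as pd

# Hand-built copy of the question's data (variable names are assumptions).
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200, 1750, 1800, 1600],
     "fat": [28, 17, 30, 15, 18, 12]},
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007,
     1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"), (2, "vitamin-c"),
        (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
        (4, "vitamin-a"), (5, "vitamin-b"), (6, "vitamin-e"),
        (6, "vitamin-c"),
    ]),
)

# "ord" is the shared level name: join aligns df's index with the first
# level of the MultiIndex and repeats each df row per matching vitamin.
result = df.rename_axis("ord").join(s.rename_axis(["ord", "vit"]).rename("val"))
print(result)
```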

The same thing, split over a few more lines for readability:

s = s.rename_axis(['ord', 'vit']).rename('val')
df = df.rename_axis('ord')

df.join(s)

               energy  fat     val
ord vit                           
1   vitamin-c    2000   28  0.0004
    vitamin-a    2000   28  0.0150
2   vitamin-c    1900   17  0.0030
3   vitamin-d    2200   30  1.2000
    vitamin-e    2200   30  1.0007
    vitamin-c    2200   30  1.2020
4   vitamin-a    1750   15  0.0780
5   vitamin-b    1800   18  0.9650
6   vitamin-e    1600   12  1.9801
    vitamin-c    1600   12  1.0011
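Since the answer advises against keeping things in the index that don't belong there, one option (my addition, not part of the answer) is to move the vitamin level back into an ordinary column after the join with reset_index(level=...). That leaves only "ord" in the index and gives one row per vitamin, which matches the question's desired layout except that energy and fat are repeated on every row instead of being left blank.

```python
import pandas as pd

# Hand-built copy of the question's data (variable names are assumptions).
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200, 1750, 1800, 1600],
     "fat": [28, 17, 30, 15, 18, 12]},
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007,
     1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"), (2, "vitamin-c"),
        (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
        (4, "vitamin-a"), (5, "vitamin-b"), (6, "vitamin-e"),
        (6, "vitamin-c"),
    ]),
)

s = s.rename_axis(["ord", "vit"]).rename("val")
df = df.rename_axis("ord")
joined = df.join(s)

# Move only the "vit" level out of the index; it becomes the first column.
flat = joined.reset_index(level="vit")
print(flat)
```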

Option 2
We can also use pd.concat together with .loc and pd.Index.get_level_values:

pd.concat(
    [df.loc[s.index.get_level_values(0)].set_index(s.index), s.rename('val')],
    axis=1
)

             energy  fat     val
1 vitamin-c    2000   28  0.0004
  vitamin-a    2000   28  0.0150
2 vitamin-c    1900   17  0.0030
3 vitamin-d    2200   30  1.2000
  vitamin-e    2200   30  1.0007
  vitamin-c    2200   30  1.2020
4 vitamin-a    1750   15  0.0780
5 vitamin-b    1800   18  0.9650
6 vitamin-e    1600   12  1.9801
  vitamin-c    1600   12  1.0011
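A self-contained sketch of Option 2, with the data rebuilt by hand (the intermediate names keys and expanded are mine). Selecting with .loc and a list of repeated labels repeats the matching df rows, so df is stretched to the same length and order as s before the two are placed side by side with concat.

```python
import pandas as pd

# Hand-built copy of the question's data (variable names are assumptions).
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200, 1750, 1800, 1600],
     "fat": [28, 17, 30, 15, 18, 12]},
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007,
     1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"), (2, "vitamin-c"),
        (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
        (4, "vitamin-a"), (5, "vitamin-b"), (6, "vitamin-e"),
        (6, "vitamin-c"),
    ]),
)

keys = s.index.get_level_values(0)           # 1, 1, 2, 3, 3, 3, 4, 5, 6, 6
expanded = df.loc[keys].set_index(s.index)   # one df row per vitamin row
result = pd.concat([expanded, s.rename("val")], axis=1)
print(result)
```

One consequence of this design: .loc raises a KeyError if s contains an id that is missing from df.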

Source data:

In [96]: s
Out[96]:
id   vitamins
1.0  vitamin-c    0.0004
     vitamin-a    0.0150
2.0  vitamin-c    0.0030
3.0  vitamin-d    1.2000
     vitamin-e    1.0007
     vitamin-c    1.2020
4.0  vitamin-a    0.0780
5.0  vitamin-b    0.9650
6.0  vitamin-e    1.9801
     vitamin-c    1.0011
Name: val, dtype: float64

In [97]: df
Out[97]:
   energy  fat
1    2000   28
2    1900   17
3    2200   30
4    1750   15
5    1800   18
6    1600   12

Solution:

In [99]: s.reset_index() \
          .merge(df, left_on='id', right_index=True) \
          .set_index(['id','energy','fat','vitamins'])
Out[99]:
                             val
id  energy fat vitamins
1.0 2000   28  vitamin-c  0.0004
               vitamin-a  0.0150
2.0 1900   17  vitamin-c  0.0030
3.0 2200   30  vitamin-d  1.2000
               vitamin-e  1.0007
               vitamin-c  1.2020
4.0 1750   15  vitamin-a  0.0780
5.0 1800   18  vitamin-b  0.9650
6.0 1600   12  vitamin-e  1.9801
               vitamin-c  1.0011
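A self-contained sketch of the merge approach, assuming the Series levels are named id and vitamins and the Series is named val, as in the printout above (I use integer ids here, whereas that printout shows floats). reset_index turns the index levels into ordinary columns, so merge can match the id column against df's index via left_on and right_index.

```python
import pandas as pd

# Hand-built copy of the data (names follow the printout above).
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200, 1750, 1800, 1600],
     "fat": [28, 17, 30, 15, 18, 12]},
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007,
     1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"), (2, "vitamin-c"),
        (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
        (4, "vitamin-a"), (5, "vitamin-b"), (6, "vitamin-e"),
        (6, "vitamin-c"),
    ], names=["id", "vitamins"]),
    name="val",
)

flat = s.reset_index()                                   # columns: id, vitamins, val
merged = flat.merge(df, left_on="id", right_index=True)  # inner join by default
result = merged.set_index(["id", "energy", "fat", "vitamins"])
print(result)
```

If you would rather keep energy and fat as columns, set_index(["id", "vitamins"]) is enough.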

If you add names to the index and the MultiIndex levels, you can use join:

In [11]: df
Out[11]:
   energy  fat
n
1    2000   28
2    1900   17
3    2200   30
4    1750   15
5    1800   18
6    1600   12

In [12]: df2
Out[12]:
                val
n vitamin
1 vitamin-c  0.0004
  vitamin-a  0.0150
2 vitamin-c  0.0030
3 vitamin-d  1.2000
  vitamin-e  1.0007
  vitamin-c  1.2020
4 vitamin-a  0.0780
5 vitamin-b  0.9650
6 vitamin-e  1.9801
  vitamin-c  1.0011

In [13]: df.join(df2)
Out[13]:
             energy  fat     val
n vitamin
1 vitamin-c    2000   28  0.0004
  vitamin-a    2000   28  0.0150
2 vitamin-c    1900   17  0.0030
3 vitamin-d    2200   30  1.2000
  vitamin-e    2200   30  1.0007
  vitamin-c    2200   30  1.2020
4 vitamin-a    1750   15  0.0780
5 vitamin-b    1800   18  0.9650
6 vitamin-e    1600   12  1.9801
  vitamin-c    1600   12  1.0011

Note: you can name the levels by setting .index.names:

In [21]: df.index.names = ["n"]  # or df.index.name = "n"

In [22]: df2.index.names = ["n", "vitamin"]
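As far as I can tell from how pandas matches level names, only the level you join on needs a matching name; the other MultiIndex level can stay unnamed. A self-contained sketch, with the data rebuilt by hand and a Series named "val" (an assumption) standing in for df2:

```python
import pandas as pd

# Hand-built copy of the question's data (variable names are assumptions).
df = pd.DataFrame(
    {"energy": [2000, 1900, 2200, 1750, 1800, 1600],
     "fat": [28, 17, 30, 15, 18, 12]},
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007,
     1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples([
        (1, "vitamin-c"), (1, "vitamin-a"), (2, "vitamin-c"),
        (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
        (4, "vitamin-a"), (5, "vitamin-b"), (6, "vitamin-e"),
        (6, "vitamin-c"),
    ]),
    name="val",
)

df.index.name = "n"
s.index.names = ["n", None]   # the vitamin level is deliberately left unnamed
result = df.join(s)           # aligns on the shared "n" level only
print(result)
```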
