简体   繁体   中英

Pandas flatten Hierarchical Multi-index

I have a pandas data frame as follows after attempting to flatten it:

df = pd.DataFrame(web.DataReader(stocks, 'yahoo', day, day).iloc[0]).unstack(level=0).droplevel(level=0, axis=1)

Attributes     adjClose        close  ...       volume       date
Symbols                               ...                        
FB           261.399994   261.399994  ...   13587000.0 2020-10-19
AAPL         115.980003   115.980003  ...  120639300.0 2020-10-19
AMZN        3207.209961  3207.209961  ...    5223600.0 2020-10-19
GOOG        1534.609985  1534.609985  ...    1607100.0 2020-10-19
NFLX                NaN          NaN  ...          NaN 2020-10-19

I'm trying to persist this to the database; however, I don't see Symbols in the df.columns. In order to save the df to a following format:

Symbols       adjClose        close  ...       volume       date 
FB           261.399994   261.399994  ...   13587000.0 2020-10-19
AAPL         115.980003   115.980003  ...  120639300.0 2020-10-19
AMZN        3207.209961  3207.209961  ...    5223600.0 2020-10-19
GOOG        1534.609985  1534.609985  ...    1607100.0 2020-10-19
NFLX                NaN          NaN  ...          NaN 2020-10-19

Any suggestions on how to achieve this? My database has a composite key on Symbols, date columns. Thank You.

Symbols is your dataframes index, you'll need to use reset_index to put it into the frame itself. Try this:

df = (pd.DataFrame(web.DataReader(stocks, 'yahoo', day, day)
      .iloc[0])
      .unstack(level=0)
      .droplevel(level=0, axis=1)
      .rename_axis(columns=None) # Gets rid of the "Attributes"
      .reset_index()             # Puts "Symbols" as an actual column, not as the index
)

My 2 addition:

  • rename_axis This should get rid of your "Attributes" title. This is mainly for visual purposes when printing, but can throw off people who aren't used to working with multiindex data. Essentially your column labels are stored in an Index object. This Index object can have a name, so "Attributes" is the name of your columns (pretty strange concept, that isn't super useful for normal Indexes- but has a lot of usefulness when working with a MultiIndex ).
  • reset_index() It seems that your "Symbols" column isn't actually a column (which is why it doesn't appear in df.columns but rather the index for the dataframe. Adding this method will insert the "Symbols" index as a column into the dataframe, and create a new index that is a simple RangeIndex that spans from 0 to the length of your dataframe.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM