
Is there a way to un-nest a pandas DataFrame in a Python 3 Jupyter notebook?

I am importing a JSON file into a Python 3 Jupyter notebook. The JSON file has the following format:

  1. object
    • rooms [26 elements]
      • 0
        • turns
          • fromBathroom
          • fromParking
        • distances
          • dfromBathroom
          • dfromParking
        • depth
        • area
      • 1
        • .... etc.
    • name

I am importing the json file in this way:

import pandas as pd
import numpy as np
import json

with open("rooms.json") as file:
    data = json.load(file)

# pandas >= 1.0 exposes json_normalize at the top level;
# the older pandas.io.json import path is deprecated.
df = pd.json_normalize(data['rooms'])

I am now trying to plot each of the 6 dimensions against each other in a matrix-like format, with 36 total graphs.

I am trying to do this the following way:

col_features = ['fromBathroom', 'fromParking', 'dfromBathroom', 'dfromParking', 'depth', 'area']
pd.plotting.scatter_matrix(df[col_features], alpha = .2, figsize = (14,8))

This does not work, as I am getting an error that reads: KeyError: "['fromBathroom' 'fromParking' 'dfromBathroom' 'dfromParking'] not in index"

This is because those features are nested in 'turns' and 'distances' in the json file. Is there a way to un-nest these features so that I can index into the dataframe the same way I can for depth and area to get the values?

Thank you for any insights.

Maybe you could extract df1 = df['turns'], df2 = df['distances'] and df3 = df[['area', 'depth']], and then combine them with df4 = pd.concat([df1, df2, df3], join='inner', axis=1) (see the pandas documentation for concat).

Or directly: pd.concat([df['turns'], df['distances'], df[['area', 'depth']]], join='inner', axis=1)

EDIT :

I tried the following, which I hope is what you are looking for (the original post linked to an image of this code and the results in Jupyter):

df1 = df['turns']
df2 = df['distances']
df3 = pd.DataFrame(df['depth'])
df4 = pd.DataFrame(df['area'])
df_recomposed = pd.concat([df1, df2, df3, df4], join='inner', axis=1)
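
Note that if df comes from json_normalize, as in the question, the nested keys typically show up as dotted column names such as 'turns.fromBathroom' rather than under a single 'turns' column, which would also explain the KeyError. A minimal sketch of one way to handle that case follows. The sample values are made up for illustration, and the renaming step assumes the leaf names are unique across 'turns' and 'distances'.

```python
import pandas as pd

# Hypothetical sample mirroring the structure described in the question;
# the values are invented for illustration only.
data = {
    "name": "example",
    "rooms": [
        {"turns": {"fromBathroom": 1, "fromParking": 3},
         "distances": {"dfromBathroom": 4.5, "dfromParking": 12.0},
         "depth": 2, "area": 20.0},
        {"turns": {"fromBathroom": 2, "fromParking": 1},
         "distances": {"dfromBathroom": 7.0, "dfromParking": 3.5},
         "depth": 3, "area": 15.5},
    ],
}

df = pd.json_normalize(data["rooms"])

# Inspect the column names: nested keys are flattened with a '.' separator,
# e.g. 'turns.fromBathroom' and 'distances.dfromParking'.
print(list(df.columns))

# Keep only the last part of each dotted name so the columns can be
# indexed the same way as 'depth' and 'area'.
df.columns = [col.split(".")[-1] for col in df.columns]

col_features = ['fromBathroom', 'fromParking', 'dfromBathroom',
                'dfromParking', 'depth', 'area']
features = df[col_features]

# The plot from the question could then be drawn with:
# pd.plotting.scatter_matrix(features, alpha=0.2, figsize=(14, 8))
```

If two nested groups shared a leaf name, the rename would produce duplicate columns, so in that case it is probably safer to keep the dotted names and list them in col_features instead.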

Alternatively, see the question "Pandas - How to flatten a hierarchical index in columns",

where df.columns = [' '.join(col).strip() for col in df.columns.values] should be what you are looking for.
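
As a rough illustration of that flattening, here is a small sketch on a hypothetical frame with two-level columns. The frame and its values are made up, and the names df_multi, flat and leaf are only for this example. It also shows a variant that keeps just the inner level, which gives the plain feature names.

```python
import pandas as pd

# Hypothetical frame with hierarchical (two-level) columns.
columns = pd.MultiIndex.from_tuples([
    ("turns", "fromBathroom"),
    ("turns", "fromParking"),
    ("distances", "dfromBathroom"),
    ("distances", "dfromParking"),
])
df_multi = pd.DataFrame([[1, 3, 4.5, 12.0],
                         [2, 1, 7.0, 3.5]], columns=columns)

# Join both levels into a single string per column.
flat = df_multi.copy()
flat.columns = [' '.join(col).strip() for col in flat.columns.values]
print(list(flat.columns))

# Variant: keep only the innermost level of the column index.
leaf = df_multi.copy()
leaf.columns = leaf.columns.get_level_values(-1)
print(list(leaf.columns))
```

The joined form keeps the parent name in each column, which avoids clashes; the inner-level form matches the col_features list from the question but only works cleanly if the inner names are unique.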
