
Is there a way to un-nest a pandas dataframe in a python3 jupyter notebook?

I am importing a JSON file into a Python 3 Jupyter notebook. The JSON file has the following format:

  1. object
    • rooms [26 elements]
      • 0
        • turns
          • fromBathroom
          • fromParking
        • distances
          • dfromBathroom
          • dfromParking
        • depth
        • area
      • 1
        • .... etc.
    • name

I am importing the JSON file in this way:

import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize

with open("rooms.json") as file:
  data = json.load(file)
df = json_normalize(data['rooms'])

I am now trying to plot each of the 6 dimensions against each other in a matrix-like format, for 36 graphs in total.

I am trying to do this in the following way:

col_features = ['fromBathroom', 'fromParking', 'dfromBathroom', 'dfromParking', 'depth', 'area']
pd.plotting.scatter_matrix(df[col_features], alpha = .2, figsize = (14,8))

This does not work; I get an error that reads: KeyError: "['fromBathroom' 'fromParking' 'dfromBathroom' 'dfromParking'] not in index"

This is because those features are nested under 'turns' and 'distances' in the JSON file. Is there a way to un-nest these features so that I can index into the dataframe to get their values, the same way I can for depth and area?

Thank you for any insights.

Maybe you could extract df1 = df['turns'], df2 = df['distances'] and df3 = df[['area', 'depth']], and then do df4 = pd.concat([df1, df2, df3], join='inner', axis=1) (see the pandas documentation),

or directly: pd.concat([df['turns'], df['distances'], df[['area', 'depth']]], join='inner', axis=1)

EDIT:

I tried something; I hope it is what you are looking for:

Link to the image with the code and the results I get with Jupyter

df1 = df['turns']
df2 = df['distances']
df3 = pd.DataFrame(df['depth'])
df4 = pd.DataFrame(df['area'])
df_recomposed = pd.concat([df1, df2, df3, df4], join='inner', axis=1)
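One caveat worth checking before selecting df['turns']: json_normalize normally flattens nested dictionaries into dotted column names such as turns.fromBathroom, so the nested keys may already be present as columns under a longer name. The following is a minimal sketch of that situation, not the asker's actual data: the two-room sample dictionary is hypothetical and only mirrors the structure described in the question, and it assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize.

```python
import pandas as pd

# Hypothetical two-room sample that mirrors the structure described in the
# question (the real rooms.json has 26 rooms; these values are made up).
data = {
    "name": "example",
    "rooms": [
        {"turns": {"fromBathroom": 1, "fromParking": 3},
         "distances": {"dfromBathroom": 4.5, "dfromParking": 12.0},
         "depth": 2, "area": 20.0},
        {"turns": {"fromBathroom": 2, "fromParking": 1},
         "distances": {"dfromBathroom": 7.0, "dfromParking": 5.5},
         "depth": 3, "area": 15.5},
    ],
}

# Nested dicts are flattened into dotted names, e.g. "turns.fromBathroom".
df = pd.json_normalize(data["rooms"])
print(sorted(df.columns))

# Keep only the part after the last dot so the leaf names can be used directly.
# This assumes the leaf names are unique across "turns" and "distances".
df.columns = [col.split(".")[-1] for col in df.columns]

col_features = ["fromBathroom", "fromParking", "dfromBathroom",
                "dfromParking", "depth", "area"]
features = df[col_features]
```

If the column names look like this in your frame, features can then be passed to pd.plotting.scatter_matrix as in the question.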

Alternatively, see the question "Pandas - How to flatten a hierarchical index in columns", where df.columns = [' '.join(col).strip() for col in df.columns.values] should be what you are looking for.
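The flattening one-liner only applies when the columns are a hierarchical (MultiIndex) index. Here is a small sketch of what it does, using a hypothetical frame built by hand with two-level column labels; whether the asker's frame actually has such columns is an assumption.

```python
import pandas as pd

# Hypothetical frame with two-level column labels (assumed for illustration).
columns = pd.MultiIndex.from_tuples([
    ("turns", "fromBathroom"),
    ("turns", "fromParking"),
    ("depth", ""),
])
df = pd.DataFrame([[1, 3, 2], [2, 1, 3]], columns=columns)

# Each column label is a tuple of level values; join them with a space.
# strip() removes the trailing space left by an empty second level.
df.columns = [" ".join(col).strip() for col in df.columns.values]
flat_names = list(df.columns)
```

Note that the flattened names keep the parent level as a prefix ('turns fromBathroom' rather than 'fromBathroom'), so the list of features used for plotting would need to use those joined names.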
