How can I split a pandas column containing a json into new columns in a given dataframe?

I am looking to split a dataframe column that contains the string representation of a list of dictionaries into separate columns. I've seen a few methods, but I want to avoid splitting the string, since the data has some inconsistencies. For instance, "Melting Point" sometimes takes the place of "Boiling Point", but I do not want melting point and boiling point to end up in the same column.

Here is the column I am trying to split.

[Image: JSON within a pandas column]

# Example below
import pandas as pd

data = [
'''[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]''']

df = pd.DataFrame(data, columns=['experimental_properties'])

I want the first row to look like this:

[Image: expected output, row 1]

I tried a method from here, to no avail: How to convert JSON data inside a pandas column into new columns

pd.io.json.json_normalize(df.experimental_properties.apply(json.loads))

Help is much appreciated!
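
A likely reason the attempt above fails: the strings in the column use single quotes, so they are Python literals rather than valid JSON, and json.loads rejects them. (Separately, pd.io.json.json_normalize was deprecated in pandas 1.0 in favour of pd.json_normalize.) The sketch below only illustrates the parsing difference on one of the example strings; the variable names are made up for illustration.

```python
import json
from ast import literal_eval

# One of the example strings from the question (single-quoted keys and values)
raw = "[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]"

# json.loads expects double-quoted property names, so this raises
try:
    json.loads(raw)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# ast.literal_eval parses Python literals and accepts the single quotes
records = literal_eval(raw)

print(parsed_as_json)       # False
print(records[0]["name"])   # Boiling Point
```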

I hope I've understood your question well. Try:

from ast import literal_eval

df["experimental_properties"] = df["experimental_properties"].apply(
    lambda x: {d["name"]: d["property"] for d in literal_eval(x)}
)
df = pd.concat([df, df.pop("experimental_properties").apply(pd.Series)], axis=1)

print(df)

Prints:

            Boiling Point                                Density
0                115.3 °C                                    NaN
1  91 °C @ Press: 20 Torr                                    NaN
2  58 °C @ Press: 12 Torr  0.8753 g/cm<sup>3</sup> @ Temp: 20 °C
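
Regarding the "Melting Point" concern from the question: because each column is keyed by the "name" field, a row that reports a melting point should land in its own column rather than under "Boiling Point". Here is a minimal sketch of that behaviour. Note that the second row is hypothetical data invented for this illustration (it is not in the question), and to_name_property is just a helper name chosen here. It also builds the frame from a list of dictionaries instead of apply(pd.Series), which is only a variation on the approach above.

```python
from ast import literal_eval

import pandas as pd

data = [
    "[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]",
    # Hypothetical row, invented to illustrate the inconsistency
    "[{'name': 'Melting Point', 'property': '42 °C', 'sourceNumber': 1}]",
]
df = pd.DataFrame(data, columns=["experimental_properties"])


def to_name_property(cell):
    """Parse one cell and map each property name to its value."""
    return {d["name"]: d["property"] for d in literal_eval(cell)}


# One dictionary per row; missing names become NaN in that row
wide = pd.DataFrame(
    df["experimental_properties"].map(to_name_property).tolist(),
    index=df.index,
)
print(wide)
```

Each distinct name gets its own column, so the first row has NaN under "Melting Point" and the second has NaN under "Boiling Point".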

Is the expected output really what you are looking for? Another way to lay out the data would be to use "name", "property", and "sourceNumber" as the column names.

import json
import pandas as pd

data = [
'''[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]''']

# Parse each string into a list of dictionaries
naiveList = []
for i in data:
    # json.loads rejects single quotes, so swap them for double quotes first
    # (note: this breaks if a value itself contains an apostrophe)
    tempStringOfData = i.replace("'", '"')
    tempJsonData = json.loads(tempStringOfData)
    naiveList.append(tempJsonData)

# Flatten the list of lists into a single list of dictionaries
newListOfDictionaries = []
for i in naiveList:
    for j in i:
        newListOfDictionaries.append(j)

df = pd.DataFrame(newListOfDictionaries)
print(df)

Which gives you:

            name                               property  sourceNumber
0  Boiling Point                               115.3 °C             1
1  Boiling Point                 91 °C @ Press: 20 Torr             1
2  Boiling Point                 58 °C @ Press: 12 Torr             1
3        Density  0.8753 g/cm<sup>3</sup> @ Temp: 20 °C             1
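
One limitation of this long layout is that it no longer records which original row each property came from. A possible variation, sketched below under the assumption of pandas 1.0 or later (for pd.json_normalize), keeps the original row number in a column and can then be pivoted back to one column per property name. The column name "row" and the variable names are choices made for this sketch.

```python
from ast import literal_eval

import pandas as pd

data = [
    "[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]",
    "[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]",
    "[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, "
    "{'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]",
]
df = pd.DataFrame(data, columns=["experimental_properties"])

# Parse each cell into a list of dicts, then emit one dict per row;
# explode repeats the original index for every dict in a cell
exploded = df["experimental_properties"].map(literal_eval).explode()

# Long format: one row per property, plus the originating row number
long_df = pd.json_normalize(exploded.tolist())
long_df["row"] = exploded.index.to_numpy()

# Wide format: one column per property name, one row per original row
wide = long_df.pivot(index="row", columns="name", values="property")
print(long_df)
print(wide)
```

Note that pivot raises a ValueError if the same name appears more than once within a single original row, so data with such duplicates would need to be deduplicated or aggregated first.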
