简体   繁体   中英

How can I split a pandas column containing a json into new columns in a given dataframe?

I am looking to split a dataframe column that contains a string of a dictionary into separate columns. I've seen a few methods, but I want to avoid splitting the string since there are some inconsistencies. For instance, "Melting Point" sometimes takes the place of "Boiling Point", but I do not want melting point and boiling point to be in the same column.

Here is the column I am trying to split.

Json within Pandas Column

#example below
data = [
'''[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]''']
  
df = pd.DataFrame(data, columns=['experimental_properties'])

I want it to look like this first row:

Expected Output row 1

I tried a method from here to no avail: How to convert JSON data inside a pandas column into new columns

pd.io.json.json_normalize(df.experimental_properties.apply(json.loads))

Help is much appreciated!

I hope I've understood your question well. Try:

from ast import literal_eval

df["experimental_properties"] = df["experimental_properties"].apply(
    lambda x: {d["name"]: d["property"] for d in literal_eval(x)}
)
df = pd.concat([df, df.pop("experimental_properties").apply(pd.Series)], axis=1)

print(df)

Prints:

            Boiling Point                                Density
0                115.3 °C                                    NaN
1  91 °C @ Press: 20 Torr                                    NaN
2  58 °C @ Press: 12 Torr  0.8753 g/cm<sup>3</sup> @ Temp: 20 °C

Is the expected output really what you are looking for? Another way to visualise the data would be to have "name", "property", and "sourceNumber" as column names.

import json
import pandas as pd

data = [
'''[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]''',
'''[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]''']

#Initialise a naiveList
naiveList = []

#String to List
for i in data:
    tempStringOfData = i
    tempStringOfData = tempStringOfData.replace("\'", "\"")
    tempJsonData = json.loads(tempStringOfData)
    naiveList.append(tempJsonData)

#Initialise a List for Dictionaries
newListOfDictionaries = []
for i in naiveList:
    for j in i:
        newListOfDictionaries.append(j)

df = pd.DataFrame(newListOfDictionaries)
print(df)

Which gives you

            name                               property  sourceNumber
0  Boiling Point                               115.3 °C             1
1  Boiling Point                 91 °C @ Press: 20 Torr             1
2  Boiling Point                 58 °C @ Press: 12 Torr             1
3        Density  0.8753 g/cm<sup>3</sup> @ Temp: 20 °C             1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM