How can I split a pandas column containing a json into new columns in a given dataframe?
I am looking to split a dataframe column that contains a string representation of a list of dictionaries into separate columns. I've seen a few methods, but I want to avoid splitting the string since there are some inconsistencies. For instance, "Melting Point" sometimes takes the place of "Boiling Point", but I do not want melting point and boiling point to be in the same column.
Here is the column I am trying to split.
[Image: JSON within pandas column]
# Example below
import pandas as pd

data = [
    '''[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]''',
    '''[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]''',
    '''[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]''',
]
df = pd.DataFrame(data, columns=['experimental_properties'])
I want the first row to look like this:
[Image: expected output, row 1]
I tried a method from here, to no avail: How to convert JSON data inside a pandas column into new columns
pd.io.json.json_normalize(df.experimental_properties.apply(json.loads))
Help is much appreciated!
I hope I've understood your question well. Try:
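from ast import literal_eval

# Parse each string as a Python literal and map property name -> value
df["experimental_properties"] = df["experimental_properties"].apply(
    lambda x: {d["name"]: d["property"] for d in literal_eval(x)}
)
# Expand each dict into its own columns and attach them to the frame
df = pd.concat([df, df.pop("experimental_properties").apply(pd.Series)], axis=1)
print(df)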
Prints:
            Boiling Point                                Density
0                115.3 °C                                    NaN
1  91 °C @ Press: 20 Torr                                    NaN
2  58 °C @ Press: 12 Torr  0.8753 g/cm<sup>3</sup> @ Temp: 20 °C
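As a side note on why the attempt in the question failed: the strings use single quotes, so they are Python literals rather than valid JSON, and `json.loads` should reject them. The sketch below rebuilds the question's `df`, checks that, and then builds the wide frame with a single `pd.DataFrame` call instead of `apply(pd.Series)`, which tends to be slow on larger frames. The names `json_ok`, `records`, `wide`, and `result` are only illustrative.

```python
import json
from ast import literal_eval

import pandas as pd

data = [
    """[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]""",
    """[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]""",
    """[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]""",
]
df = pd.DataFrame(data, columns=["experimental_properties"])

# JSON requires double-quoted strings, so json.loads raises on this input.
try:
    json.loads(df["experimental_properties"].iloc[0])
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# literal_eval reads the same text as a Python list of dicts.
records = [
    {d["name"]: d["property"] for d in literal_eval(s)}
    for s in df["experimental_properties"]
]

# One DataFrame call; property names missing from a row become NaN.
wide = pd.DataFrame(records, index=df.index)
result = df.drop(columns="experimental_properties").join(wide)
print(result)
```

Because each property name becomes its own key, a row that carries "Melting Point" instead of "Boiling Point" would end up in a separate column rather than being mixed in with the boiling points.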
Is the expected output really what you are looking for? Another way to visualise the data would be to have "name", "property", and "sourceNumber" as column names.
import json
import pandas as pd

data = [
    '''[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]''',
    '''[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]''',
    '''[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]''',
]

# Parse each string into a list of dictionaries
naiveList = []
for i in data:
    # Swap single quotes for double quotes so json.loads accepts the string.
    # Note: this breaks if a value itself contains a quote character.
    tempStringOfData = i.replace("'", '"')
    tempJsonData = json.loads(tempStringOfData)
    naiveList.append(tempJsonData)

# Flatten the nested lists into a single list of dictionaries
newListOfDictionaries = []
for i in naiveList:
    for j in i:
        newListOfDictionaries.append(j)

df = pd.DataFrame(newListOfDictionaries)
print(df)
Which gives you:
            name                               property  sourceNumber
0  Boiling Point                               115.3 °C             1
1  Boiling Point                 91 °C @ Press: 20 Torr             1
2  Boiling Point                 58 °C @ Press: 12 Torr             1
3        Density  0.8753 g/cm<sup>3</sup> @ Temp: 20 °C             1
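Note that this long table no longer records which original row each entry came from. If you want to keep that link and still get one column per property, the long form can be pivoted back. Here is a minimal sketch, assuming every row holds a non-empty list and using `literal_eval` rather than the quote replacement; `long_df`, `wide`, and the `row` column are names made up for this example.

```python
from ast import literal_eval

import pandas as pd

data = [
    """[{'name': 'Boiling Point', 'property': '115.3 °C', 'sourceNumber': 1}]""",
    """[{'name': 'Boiling Point', 'property': '91 °C @ Press: 20 Torr', 'sourceNumber': 1}]""",
    """[{'name': 'Boiling Point', 'property': '58 °C @ Press: 12 Torr', 'sourceNumber': 1}, {'name': 'Density', 'property': '0.8753 g/cm<sup>3</sup> @ Temp: 20 °C', 'sourceNumber': 1}]""",
]
df = pd.DataFrame(data, columns=["experimental_properties"])

# Parse each string, then give every dict its own entry.
# explode() repeats the original row label for each list element.
exploded = df["experimental_properties"].apply(literal_eval).explode()

# Long format: one line per (row, property), with the source row kept.
long_df = (
    pd.DataFrame(exploded.tolist(), index=exploded.index)
    .rename_axis("row")
    .reset_index()
)

# Wide format: one column per property name.
# aggfunc="first" keeps the first value if a name repeats within a row.
wide = long_df.pivot_table(
    index="row", columns="name", values="property", aggfunc="first"
)
print(long_df)
print(wide)
```

Keeping the long table around as well makes it easy to filter on `sourceNumber` before pivoting, should that column ever vary.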