Creating a DataFrame from values extracted from a JSON column in Pandas
I loaded a .csv file into a df, and each row of one of its columns contains a list of dictionaries like the one below.
data = [{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1},
        {"character": "Dr. Grace Augustine", "gender": 1},
        {"character": "Col. Quaritch", "gender": 2}]
But of course, after loading, it's read as a string. So I converted each row in the column to JSON, which makes it easy to extract values by key name. I then need to create a separate df like so.
df = {'character': ['Jake Sully', 'Neytiri', 'Dr. Grace Augustine', 'Col. Quaritch'],
      'gender': [2, 1, 1, 2]}
This is my code, but I can't quite get the desired df output.
df = pd.DataFrame()  # create new df
keys = ['character', 'gender']  # keys to extract values from json
lst = []
for val in data:  # to iterate over data series
    for object in json.loads(val):
        for key in keys:
            lst.append(object[key])
            df = pd.concat([df, pd.DataFrame(lst, columns=[key])], axis=1)
Can someone tell me what I am doing wrong?
pd.DataFrame accepts a list of dictionaries directly:
import pandas as pd

data = [{"character": "Jake Sully", "gender": 2},
        {"character": "Neytiri", "gender": 1},
        {"character": "Dr. Grace Augustine", "gender": 1},
        {"character": "Col. Quaritch", "gender": 2}]
df = pd.DataFrame(data)  # or pd.DataFrame.from_dict(data)
print(df)
             character  gender
0           Jake Sully       2
1              Neytiri       1
2  Dr. Grace Augustine       1
3        Col. Quaritch       2
Therefore, you only need to extract a list of dictionaries from your JSON data. One way to do this is via json.loads.
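As a minimal sketch of the json.loads route: the string below is a hypothetical stand-in for one cell of the loaded CSV column, not the asker's actual data. Note that it must be valid JSON, so a trailing comma inside an object would make json.loads raise an error.

```python
import json

import pandas as pd

# Hypothetical JSON string, standing in for one cell of the loaded column.
raw = '[{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1}]'

records = json.loads(raw)    # parse the string into a list of dicts
df = pd.DataFrame(records)   # one row per dict, one column per key
print(df)
```

The keys of the dictionaries become the column names, so no manual per-key loop is needed.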
A better idea is to read your data directly into a dataframe via pd.read_json.
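A rough sketch of the pd.read_json route, assuming the JSON is available as a string or file. The sample string is hypothetical; it is wrapped in StringIO because pd.read_json expects a path or file-like object, and recent pandas versions deprecate passing a literal JSON string.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON content; in practice this could be a path to a .json file.
raw = '[{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1}]'

# read_json parses the list of records straight into a DataFrame.
df = pd.read_json(StringIO(raw))
print(df)
```

This only applies when the whole input is JSON. For a CSV in which one column holds JSON strings, the column still has to be parsed cell by cell.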
I may not fully understand your question, but I am able to get the df just fine.
data = [{"character": "Jake Sully", "gender": 2},
        {"character": "Neytiri", "gender": 1},
        {"character": "Dr. Grace Augustine", "gender": 1},
        {"character": "Col. Quaritch", "gender": 2}]
pd.DataFrame(data)
Out:
             character  gender
0           Jake Sully       2
1              Neytiri       1
2  Dr. Grace Augustine       1
3        Col. Quaritch       2
Figured it out.
df = pd.DataFrame()  # create new df
keys = ['character', 'gender']  # keys to extract values from json
for key in keys:
    lst_i = []  # values collected for this key
    for row in data:  # iterate over the rows in the column of interest
        for obj in json.loads(row):
            lst_i.append(obj[key])
    # add the collected values as a new column
    df = pd.concat([df, pd.DataFrame(lst_i, columns=[key])], axis=1)
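For comparison, here is a more compact sketch that should give the same result as the loop above, assuming `data` is a column of JSON strings that each hold a list of dictionaries. The sample Series is hypothetical and only mirrors the shape described in the question.

```python
import json
from itertools import chain

import pandas as pd

# Hypothetical column of JSON strings, one list of dicts per row.
data = pd.Series([
    '[{"character": "Jake Sully", "gender": 2}, {"character": "Neytiri", "gender": 1}]',
    '[{"character": "Dr. Grace Augustine", "gender": 1}, {"character": "Col. Quaritch", "gender": 2}]',
])
keys = ["character", "gender"]  # keys to keep as columns

# Parse every row and flatten the per-row lists into one list of dicts.
records = list(chain.from_iterable(json.loads(row) for row in data))

# Passing columns=keys keeps only the listed keys, in that order.
df = pd.DataFrame(records, columns=keys)
print(df)
```

Building the frame once from all records avoids calling pd.concat repeatedly, in line with the earlier answer that pd.DataFrame accepts a list of dictionaries directly.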