Formatting JSON for Pandas Dataframe
I'm trying to wrangle some data to make a recommender system for an app. Of course, to do this I need a record of which users like which posts. I currently have that data in a JSON file that is formatted like this (numbers being post IDs, and letters being user IDs):
{
    "-1234": {
        "abc": "abc",
        "def": "def",
        "ghi": "ghi"
    },
    "-5678": {
        "jkl": "jkl",
        "mno": "mno"
    }
}
I'm trying to figure out how to get this into a pandas DataFrame with one row per post ID, listing the users who liked that post.
Out of laziness, I've tried a few online JSON-to-CSV converters, which unsurprisingly didn't bring it into a usable format for me. I've also tried "print(json_normalize(data))", which did not work either: it put each instance of a like into a separate column.
Any advice?
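For reference, the json_normalize behavior described above can be reproduced with a short sketch. It assumes pandas 1.0 or later, where the function is available as pd.json_normalize, and uses the sample data from the question; the variable name flat is only illustrative.

```python
import pandas as pd

data = {
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"},
}

# json_normalize flattens nested keys into dotted column names,
# so every (post, user) pair becomes its own column in a single row.
flat = pd.json_normalize(data)
print(flat.columns.tolist())
print(flat.shape)
```

This is why the result is hard to use here: the post IDs end up inside the column names instead of in a column of their own.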
In my experience, for such simple formats, writing a quick and dirty loop is usually faster than finding a ready-made solution and customizing it. An example for the data you gave here:
import json
my_json=""" {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}
}"""
parsed_json = json.loads(my_json)
print(parsed_json)
# result:
# {'-1234': {'abc': 'abc', 'def': 'def', 'ghi': 'ghi'},
# '-5678': {'jkl': 'jkl', 'mno': 'mno'}}
for key in parsed_json.keys():
    line = ''
    line += key
    line += ' | '
    for value in parsed_json[key].values():
        line += value + ', '
    line = line[:-2]  # stripping the ', ' from the end of the line
    print(line)
# result:
# -1234 | abc, def, ghi
# -5678 | jkl, mno
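The loop above can probably be shortened with str.join, which avoids having to strip the trailing separator. A minimal sketch on the same sample data; format_likes is a hypothetical helper name, not something from the original answer, and it iterates over the inner keys, which are identical to the values in this dataset.

```python
import json

my_json = """{
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"}
}"""


def format_likes(likes):
    """Return one 'post_id | user, user, ...' string per post (hypothetical helper)."""
    # Iterating over a dict yields its keys, i.e. the user IDs.
    return [f"{post_id} | {', '.join(users)}" for post_id, users in likes.items()]


lines = format_likes(json.loads(my_json))
for line in lines:
    print(line)
```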
This is a solution optimized for the peculiarities in your dataset.
import pandas as pd
data = {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}}
formatted = [{'PostID': d, 'User Like': list(data[d].keys())} for d in data]
df = pd.DataFrame.from_dict(formatted)
Output:
Thanks Zaroth
import json
import pandas as pd
my_json=""" {
"-1234": {
"abc": "abc",
"def": "def",
"ghi": "ghi"
},
"-5678": {
"jkl": "jkl",
"mno": "mno"
}
}"""
parsed_json = json.loads(my_json)
pd.DataFrame(
[(k, [*v]) for k, v in parsed_json.items()],
columns=['PostID', 'User Like']
)
  PostID        User Like
0  -1234  [abc, def, ghi]
1  -5678       [jkl, mno]
pd.DataFrame({
'PostID': [*parsed_json],
'User Like': [[*v] for v in parsed_json.values()]
})
import pandas as pd
data = {"-1234": {"abc": "abc", "def": "def", "ghi": "ghi"}, "-5678": {"jkl": "jkl", "mno": "mno"}}
key = []
val = []
for k, v in data.items():
    key.append(k)  # post ID
    val.append(list(v.values()))  # users who liked that post
pd.DataFrame(zip(key, val), columns=['PostID', 'User Like'])
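Since the stated goal is a recommender system, a long format with one row per (post, user) pair may be easier to work with than a column of lists. This is a sketch under that assumption, not something the question asked for; it uses DataFrame.explode (pandas 0.25 or later), and the names wide and long_df are only illustrative.

```python
import pandas as pd

data = {
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"},
}

# One row per post, with the liking users collected in a list.
wide = pd.DataFrame(
    [(post_id, list(users)) for post_id, users in data.items()],
    columns=["PostID", "User Like"],
)

# explode() gives each list element its own row: one row per (post, user) pair.
long_df = wide.explode("User Like").reset_index(drop=True)
print(long_df)
```

From this shape, operations such as groupby or pivot_table can be applied directly to the PostID and User Like columns.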