
Formatting JSON for Pandas Dataframe

I'm trying to wrangle some data to make a recommender system for an app. Of course, to do this I need a record of which users like which posts. I currently have that data in a JSON file formatted like this (the numbers are post IDs and the letters are user IDs):

    {
       "-1234": {
         "abc": "abc",
         "def": "def",
         "ghi": "ghi"
       },
       "-5678": {
         "jkl": "jkl",
         "mno": "mno"
       }
    }

I'm trying to figure out how to get this into a pandas dataframe that would look like this:

[image: example format]

Out of laziness, I tried a few online JSON-to-CSV converters, which unsurprisingly didn't produce a usable format for me. I also tried "print(json_normalize(data))", which didn't work either: it put each instance of a like into a separate column.
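
For reference, the behaviour described above can be reproduced with a minimal sketch. It assumes pandas 1.0 or later, where the function is exposed as `pd.json_normalize`; the inline `data` dict stands in for the contents of the JSON file.

```python
import pandas as pd

# Stand-in for the parsed JSON file (same shape as the sample above)
data = {
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"},
}

# json_normalize treats the whole dict as a single record and flattens the
# nested keys into dotted column names, so every like becomes its own column.
flat = pd.json_normalize(data)

print(flat.shape)          # one row, one column per like
print(list(flat.columns))  # names of the form "<post id>.<user id>"
```

Because the post IDs are dict keys rather than values, the function has no way to know they should become rows, which is why the result is a single wide row.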

Any advice?

In my experience, for such simple formats, writing a quick and dirty loop is usually faster than finding a ready-made solution and customizing it. An example for the data you gave here:

import json
my_json="""    {
       "-1234": {
         "abc": "abc",
         "def": "def",
         "ghi": "ghi"
    },
       "-5678": {
         "jkl": "jkl",
         "mno": "mno"
    }
    }"""
parsed_json = json.loads(my_json)
print(parsed_json)
# result:
# {'-1234': {'abc': 'abc', 'def': 'def', 'ghi': 'ghi'},
# '-5678': {'jkl': 'jkl', 'mno': 'mno'}}

for key in parsed_json.keys():
    line = ''
    line += key
    line += ' | '
    for value in parsed_json[key].values():
        line += value + ', '
    line = line[:-2] # stripping the ', ' from the end of the line
    print(line)
# result:
# -1234 | abc, def, ghi
# -5678 | jkl, mno
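
The loop above only prints text. As a rough sketch of how the same loop could feed a dataframe instead, the rows can be collected in a list and handed to pandas. The column names `PostID` and `User Like` are an assumption borrowed from the other answers on this page.

```python
import json

import pandas as pd

my_json = """{
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"}
}"""
parsed_json = json.loads(my_json)

# Same loop idea as above, but each post becomes one row (a dict)
# instead of one printed line.
rows = []
for post_id, likes in parsed_json.items():
    rows.append({"PostID": post_id, "User Like": list(likes.values())})

# A list of dicts maps directly onto rows of a DataFrame
df = pd.DataFrame(rows)
print(df)
```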

This is a solution tailored to the peculiarities of your dataset.

import pandas as pd
data = {
       "-1234": {
         "abc": "abc",
         "def": "def",
         "ghi": "ghi"
    },
       "-5678": {
         "jkl": "jkl",
         "mno": "mno"
    }}
formatted = [{'PostID': d, 'User Like': list(data[d].keys())} for d in data]
df = pd.DataFrame.from_dict(formatted)

Output:

[image: output dataframe]

Setup

Thanks Zaroth

import json
import pandas as pd
my_json="""    {
       "-1234": {
         "abc": "abc",
         "def": "def",
         "ghi": "ghi"
    },
       "-5678": {
         "jkl": "jkl",
         "mno": "mno"
    }
    }"""
parsed_json = json.loads(my_json)
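
The question says the data lives in a JSON file rather than in a string. A minimal sketch of loading it with `json.load` follows; the file name `likes.json` is hypothetical, and the temporary directory is only there so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path

sample = {
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"},
}

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; in practice this would be the real data file
    path = Path(tmp) / "likes.json"
    path.write_text(json.dumps(sample), encoding="utf-8")

    # json.load reads from an open file object, json.loads from a string
    with path.open(encoding="utf-8") as fh:
        parsed_json = json.load(fh)

print(parsed_json)
```

Either way, `parsed_json` ends up as the same nested dict, so the code that follows works unchanged.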

Comprehension

pd.DataFrame(
    [(k, [*v]) for k, v in parsed_json.items()],
    columns=['PostID', 'User Like']
)

  PostID        User Like
0  -1234  [abc, def, ghi]
1  -5678       [jkl, mno]

OR

pd.DataFrame({
    'PostID': [*parsed_json],
    'User Like': [[*v] for v in parsed_json.values()]
})
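
import pandas as pd

data = {"-1234": {"abc": "abc", "def": "def", "ghi": "ghi"}, "-5678": {"jkl": "jkl", "mno": "mno"}}

key = []
val = []

# Collect each post ID and the list of users who liked that post
for k, v in data.items():
    key.append(k)
    val.append(list(v.values()))

# list(...) materializes the zip so this also works on older pandas versions
pd.DataFrame(list(zip(key, val)), columns=['PostID', 'User Like'])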
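
If the recommender ends up needing one row per (post, user) pair rather than a list per post, the list column can be unpacked. This is only a sketch of one possible follow-up step, not something the question asked for; it assumes pandas 0.25 or later, where `DataFrame.explode` is available.

```python
import pandas as pd

data = {
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"},
}

# Build the list-per-post frame as in the answers above
df = pd.DataFrame(
    [(post_id, list(likes)) for post_id, likes in data.items()],
    columns=["PostID", "User Like"],
)

# explode repeats the PostID for every element of the list column,
# giving one row per like; reset_index drops the duplicated index labels.
long_df = df.explode("User Like").reset_index(drop=True)
print(long_df)
```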


 