Formatting JSON for a Pandas DataFrame

Question

I'm trying to wrangle some data to build a recommender system for an app. To do this, I need a record of which users like which posts. I currently have that data in a JSON file formatted like this (the numbers are post IDs and the letters are user IDs):

    {
       "-1234": {
         "abc": "abc",
         "def": "def",
         "ghi": "ghi"
       },
       "-5678": {
         "jkl": "jkl",
         "mno": "mno"
       }
    }

I'm trying to figure out how to get this into a pandas DataFrame that would look like this:

[image: example format]

Out of laziness I tried a few online JSON-to-CSV converters, which unsurprisingly didn't produce a usable format. I also tried "print(json_normalize(data))", which did not work either: it put each individual like into its own column.
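For reference, the json_normalize behaviour described above can be reproduced with a small sketch. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize, and uses the sample data from the question. The nested dict is flattened into a single row with one column per post/user pair, which is why it is not the shape wanted here.

```python
import pandas as pd

# Sample data from the question: post IDs mapping to the users who liked them.
data = {
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"},
}

# json_normalize flattens nested keys into dotted column names,
# so every (post, user) pair becomes its own column in a single row.
flat = pd.json_normalize(data)

print(flat.shape)
print(list(flat.columns))
```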

Any advice?

Answer 1

In my experience, for formats this simple, writing a quick and dirty loop is usually faster than finding a ready-made solution and customizing it. An example for the data you gave here:

import json
my_json="""    {
       "-1234": {
         "abc": "abc",
         "def": "def",
         "ghi": "ghi"
    },
       "-5678": {
         "jkl": "jkl",
         "mno": "mno"
    }
    }"""
parsed_json = json.loads(my_json)
print(parsed_json)
# result:
# {'-1234': {'abc': 'abc', 'def': 'def', 'ghi': 'ghi'},
# '-5678': {'jkl': 'jkl', 'mno': 'mno'}}

for key in parsed_json.keys():
    line = ''
    line += key
    line += ' | '
    for value in parsed_json[key].values():
        line += value + ', '
    line = line[:-2] # stripping the ', ' from the end of the line
    print(line)
# result:
# -1234 | abc, def, ghi
# -5678 | jkl, mno
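The loop above only prints formatted lines. As a minimal sketch of the same idea, the loop can collect one record per post and hand the list to pandas. The column names 'PostID' and 'User Like' are borrowed from the other answers and are an assumption about the desired layout.

```python
import json

import pandas as pd

my_json = """{
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"}
}"""
parsed_json = json.loads(my_json)

# Collect one record per post instead of building a printable string.
rows = []
for post_id, likes in parsed_json.items():
    rows.append({"PostID": post_id, "User Like": list(likes.values())})

# A list of dicts becomes one row per dict, with the keys as column names.
df = pd.DataFrame(rows)
print(df)
```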

Answer 2

This is a solution tailored to the peculiarities of your dataset.

import pandas as pd
data = {
       "-1234": {
         "abc": "abc",
         "def": "def",
         "ghi": "ghi"
    },
       "-5678": {
         "jkl": "jkl",
         "mno": "mno"
    }}
formatted = [{'PostID': d, 'User Like': list(data[d].keys())} for d in data]
# formatted is a list of records, so pass it straight to the constructor
df = pd.DataFrame(formatted)

Output:

  PostID        User Like
0  -1234  [abc, def, ghi]
1  -5678       [jkl, mno]

Answer 3

Setup

Thanks Zaroth

import json
import pandas as pd
my_json="""    {
       "-1234": {
         "abc": "abc",
         "def": "def",
         "ghi": "ghi"
    },
       "-5678": {
         "jkl": "jkl",
         "mno": "mno"
    }
    }"""
parsed_json = json.loads(my_json)

Comprehension

pd.DataFrame(
    [(k, [*v]) for k, v in parsed_json.items()],
    columns=['PostID', 'User Like']
)

  PostID        User Like
0  -1234  [abc, def, ghi]
1  -5678       [jkl, mno]

OR

pd.DataFrame({
    'PostID': [*parsed_json],
    'User Like': [[*v] for v in parsed_json.values()]
})
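Since the stated goal is a recommender system, a long format with one row per (post, user) pair may be easier to work with than a column of lists. The following is a sketch of that reshaping, assuming pandas 0.25 or later for DataFrame.explode. The name long_df is only illustrative.

```python
import pandas as pd

data = {
    "-1234": {"abc": "abc", "def": "def", "ghi": "ghi"},
    "-5678": {"jkl": "jkl", "mno": "mno"},
}

# Same frame as above: one row per post, with a list of users who liked it.
df = pd.DataFrame({
    "PostID": list(data),
    "User Like": [list(v) for v in data.values()],
})

# explode() repeats the PostID for each element of the list column,
# giving one row per (post, user) pair.
long_df = df.explode("User Like").reset_index(drop=True)
print(long_df)
```

From there, something like pd.crosstab(long_df['User Like'], long_df['PostID']) would give a user-by-post matrix, if that is the input the recommender needs.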

Answer 4

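import pandas as pd

data = {"-1234": {"abc": "abc", "def": "def", "ghi": "ghi"}, "-5678": {"jkl": "jkl", "mno": "mno"}}

key = []  # post IDs
val = []  # list of users who liked each post

for k, v in data.items():
    key.append(k)
    val.append(list(v.values()))

# pair each post ID with its list of users, one row per post
pd.DataFrame(list(zip(key, val)), columns=['PostID', 'User Like'])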

4 answers

Answer 1: score 1, 2019-07-27 18:08:03
Answer 2: score 1, accepted, 2019-07-27 18:25:31
Answer 3: score 1, 2019-07-27 19:08:48
Answer 4: score 0, 2019-07-27 18:39:41
