
How to read multiple json files into pandas dataframe?

I'm having a hard time loading multiple line-delimited JSON files into a single pandas dataframe. This is the code I'm using:

import os, json
import pandas as pd
import numpy as np
import glob
pd.set_option('display.max_columns', None)

temp = pd.DataFrame()

path_to_json = '/Users/XXX/Desktop/Facebook Data/*' 

json_pattern = os.path.join(path_to_json,'*.json')
file_list = glob.glob(json_pattern)

for file in file_list:
    data = pd.read_json(file, lines=True)
    temp.append(data, ignore_index = True)

It looks like all the files are loading when I look through file_list, but I cannot figure out how to get each file into a dataframe. There are about 50 files with a couple of lines in each file.

Change the last line to:

temp = temp.append(data, ignore_index = True)

The reason we have to do this is that the append does not happen in place. The append method does not modify the data frame; it just returns a new data frame with the result of the append operation.
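A minimal sketch of this non-in-place behavior (shown with pd.concat, since DataFrame.append was removed in pandas 2.0; the sample data is made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1]})
extra = pd.DataFrame({"a": [2]})

# Combining returns a NEW frame; the original df is left untouched.
result = pd.concat([df, extra], ignore_index=True)
# df still has 1 row; result has 2.
```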

Edit:

Since writing this answer I have learned that you should never use DataFrame.append inside a loop, because it leads to quadratic copying (see this answer).

What you should do instead is first create a list of data frames and then use pd.concat to concatenate them all in a single operation. Like this:

dfs = [] # an empty list to store the data frames
for file in file_list:
    data = pd.read_json(file, lines=True) # read data frame from json file
    dfs.append(data) # append the data frame to the list

temp = pd.concat(dfs, ignore_index=True) # concatenate all the data frames in the list.

This alternative should be considerably faster.
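The loop above can be sketched end to end with generated sample files (the directory, file names, and data below are all hypothetical, just to make the example runnable):

```python
import json
import tempfile
from pathlib import Path
import pandas as pd

# Create two small line-delimited JSON files in a temp directory.
tmp = Path(tempfile.mkdtemp())
for name, rows in [("a.json", [{"x": 1}, {"x": 2}]), ("b.json", [{"x": 3}])]:
    (tmp / name).write_text("\n".join(json.dumps(r) for r in rows))

# Read each file into a frame, then concatenate once at the end.
dfs = [pd.read_json(p, lines=True) for p in sorted(tmp.glob("*.json"))]
df = pd.concat(dfs, ignore_index=True)
```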

If you need to flatten the JSON, Juan Estevez's approach won't work as is. Here is an alternative:

import json
import pandas as pd

dfs = []
for file in file_list:
    with open(file) as f:
        json_data = pd.json_normalize(json.loads(f.read()))
    dfs.append(json_data)
df = pd.concat(dfs, sort=False) # or sort=True depending on your needs
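To see what the flattening step does, here is a tiny self-contained example of pd.json_normalize on a nested record (the record itself is made up):

```python
import json
import pandas as pd

# A nested record: "user" holds a sub-object.
raw = '{"user": {"name": "Ann", "age": 30}, "score": 7}'
flat = pd.json_normalize(json.loads(raw))
# Nested keys become dotted column names such as "user.name" and "user.age".
```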

Or, if your JSON files are line-delimited (not tested):

import json
import pandas as pd

dfs = []
for file in file_list:
    with open(file) as f:
        for line in f.readlines():
            json_data = pd.json_normalize(json.loads(line))
            dfs.append(json_data)
df = pd.concat(dfs, sort=False) # or sort=True depending on your needs
from pathlib import Path
import pandas as pd

paths = Path("/home/data").glob("*.json")
df = pd.DataFrame([pd.read_json(p, typ="series") for p in paths])
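This pathlib approach assumes each file contains a single JSON object, which read_json with typ="series" turns into one row. A runnable sketch with generated files (paths and data are hypothetical):

```python
import json
import tempfile
from pathlib import Path
import pandas as pd

# Two sample files, each holding one JSON object.
tmp = Path(tempfile.mkdtemp())
(tmp / "a.json").write_text(json.dumps({"x": 1, "y": 2}))
(tmp / "b.json").write_text(json.dumps({"x": 3, "y": 4}))

# Each file becomes a Series (keys -> index), and the list of
# Series becomes one row per file in the resulting DataFrame.
paths = sorted(tmp.glob("*.json"))
df = pd.DataFrame([pd.read_json(p, typ="series") for p in paths])
```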

Maybe you should state whether the json files were created with pandas pd.to_json() or in another way. I used data which was not created with pd.to_json(), and I think it is not possible to use pd.read_json() in my case. Instead, I programmed a customized for-each loop approach to write everything to the DataFrames.

I combined Juan Estevez's answer with glob. Thanks a lot.

import pandas as pd
import glob

def readFiles(path):
    files = glob.glob(path)
    dfs = [] # an empty list to store the data frames
    for file in files:
        data = pd.read_json(file, lines=True) # read data frame from json file
        dfs.append(data) # append the data frame to the list

    df = pd.concat(dfs, ignore_index=True) # concatenate all the data frames in the list.
    return df
