
Reading a large JSON file into a pandas dataframe

I have a large JSONL file (~100 GB). I want to convert it to a pandas dataframe and apply some functions to a column by iterating over all the rows.

What's the best way to read this JSONL file? I am currently doing the following, but it gets stuck (running on GCP):

import pandas as pd
import json

data = []
with open("my_jsonl_file", 'r') as file:
    for line in file:
        data.append(json.loads(line))

For smaller data you can simply use:

import pandas as pd
path = "test.jsonl"
data = pd.read_json(path, lines=True) 

For large data, you can use something like this:

import jsonlines
import pandas as pd

rows = []
with jsonlines.open(path) as reader:
    for line in reader:
        # get the data in this line
        rows.append({'c1': line})

df = pd.DataFrame(rows, columns=['c1'])

(Note: DataFrame.append never modified the frame in place and was removed in pandas 2.0, so collect the rows in a list and build the dataframe once at the end.)
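Alternatively, pandas itself can stream a JSONL file in chunks: passing chunksize to pd.read_json(lines=True) returns an iterator of DataFrames, so you never hold all 100 GB in memory at once. A minimal sketch (the file "test.jsonl" and the column "c1" here are made up for illustration, matching the names used above):

```python
import json
import pandas as pd

# Build a small sample JSONL file so the sketch is self-contained.
path = "test.jsonl"
with open(path, "w") as f:
    for i in range(10):
        f.write(json.dumps({"c1": i}) + "\n")

# chunksize makes read_json return an iterator of DataFrames
# instead of loading the whole file at once.
results = []
with pd.read_json(path, lines=True, chunksize=4) as reader:
    for chunk in reader:                 # each chunk is a DataFrame of up to 4 rows
        results.append(chunk["c1"] * 2)  # apply your per-column function here

out = pd.concat(results, ignore_index=True)
print(out.tolist())  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Pick a chunksize large enough to amortize parsing overhead but small enough to fit comfortably in RAM; for a ~100 GB file, tens or hundreds of thousands of rows per chunk is a reasonable starting point.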
