Read json file as pandas dataframe?
I am using Python 3.6 and trying to read a JSON file (350 MB) as a pandas DataFrame using the code below. However, I get the following error:
data_json_str = "[" + ",".join(data) + "]"
TypeError: sequence item 0: expected str instance, bytes found
How can I fix the error?
import pandas as pd
# read the entire file into a python array
with open('C:/Users/Alberto/nutrients.json', 'rb') as f:
    data = f.readlines()
# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)
# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON object
# basically... add square brackets to the beginning
# and end, and have all the individual business JSON objects
# separated by a comma
data_json_str = "[" + ",".join(data) + "]"
# now, load it into pandas
data_df = pd.read_json(data_json_str)
From your code, it looks like you are loading a JSON file that has JSON data on each line. read_json supports a lines argument for data like this:
data_df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
Note: Remove lines=True if the file contains a single JSON object rather than a separate JSON object on each line.
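As a small illustration of what lines=True does, here is a minimal sketch that uses an in-memory buffer in place of the real file. The sample records and field names are made up for the example and are not taken from nutrients.json.

```python
import io
import pandas as pd

# Hypothetical sample: two JSON objects, one per line (JSON Lines layout).
json_lines = io.StringIO(
    '{"id": 1, "name": "cheese"}\n'
    '{"id": 2, "name": "butter"}\n'
)

# lines=True tells read_json to parse each line as a separate record.
df = pd.read_json(json_lines, lines=True)
print(df)
```

Each line becomes one row, and the keys of the objects become the columns.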
Using the json module, you can parse the JSON into a Python object and then create a DataFrame from it:
import json
import pandas as pd
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = json.load(f)
df = pd.DataFrame(data)
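Note that json.load expects the whole file to be a single JSON document. If the file has one object per line, it would typically fail with an "Extra data" error. The sketch below shows the case where it does apply, using a temporary file and made-up records (both are assumptions for the example).

```python
import json
import os
import tempfile
import pandas as pd

# Hypothetical sample: the whole file is one JSON array of objects.
records = [{"id": 1, "name": "cheese"}, {"id": 2, "name": "butter"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)

    # json.load parses the entire file as a single JSON document.
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

# A list of dicts maps directly onto rows of a DataFrame.
df = pd.DataFrame(data)
print(df)
```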
If you open the file in binary mode ('rb'), you will get bytes. How about:
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
Also, as noted in this answer, you can use pandas directly, for example:
df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
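To make the bytes-versus-str point concrete, here is a minimal sketch that reproduces the TypeError from the question without touching a file and then avoids it by decoding. The sample lines are invented, and UTF-8 is an assumed encoding.

```python
# Lines as they would come back from a file opened with 'rb' (hypothetical sample).
raw_lines = [b'{"id": 1}\n', b'{"id": 2}\n']

try:
    ",".join(raw_lines)  # str.join rejects bytes items
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding each line to str first (UTF-8 assumed) makes the join work.
text_lines = [line.decode("utf-8").rstrip() for line in raw_lines]
data_json_str = "[" + ",".join(text_lines) + "]"
print(data_json_str)
```

Opening the file in text mode, as suggested above, performs the decoding step for you.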
If you want to convert it into an array of JSON objects, I think this will do what you want:
import json
data = []
with open('nutrients.json', errors='ignore') as f:
    for line in f:
        data.append(json.loads(line))
print(data[0])
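The list built this way can be passed straight to pandas. The following is a minimal sketch of that extra step, assuming one JSON object per line. The in-memory buffer and sample records stand in for the real file, and the blank-line check is an addition that the code above does not have.

```python
import io
import json
import pandas as pd

# Hypothetical sample standing in for a file with one JSON object per line.
f = io.StringIO('{"id": 1, "name": "cheese"}\n\n{"id": 2, "name": "butter"}\n')

data = []
for line in f:
    line = line.strip()
    if not line:  # skip blank lines, which json.loads would reject
        continue
    data.append(json.loads(line))

# A list of dicts becomes one row per dict.
df = pd.DataFrame(data)
print(df)
```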
The easiest way to read a JSON file using pandas is:
pd.read_json("sample.json", lines=True, orient='columns')
To deal with nested JSON like this:
[[{"Value1": 1}, {"value2": 2}], [{"value3": 3}, {"value4": 4}], ....]
use Python basics:
value1 = df['column_name'][0][0].get('Value1')
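Here is a minimal sketch of that lookup, assuming the nested lists end up in a single column. The DataFrame is built by hand for the example, and column_name is a placeholder, as in the answer.

```python
import pandas as pd

# Hypothetical column holding nested lists of single-key dicts.
df = pd.DataFrame({
    "column_name": [
        [{"Value1": 1}, {"value2": 2}],
        [{"value3": 3}, {"value4": 4}],
    ]
})

# Row 0 of the column -> first dict in that list -> value stored under "Value1".
value1 = df["column_name"][0][0].get("Value1")
print(value1)

# dict.get returns None instead of raising when the key is absent.
missing = df["column_name"][1][0].get("Value1")
print(missing)
```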
Please find the code below:
#call the pandas library
import pandas as pd
#set the file location as URL or filepath of the json file
url = 'https://www.something.com/data.json'
#load the json data from the file to a pandas dataframe
df = pd.read_json(url, orient='columns')
#display the top 10 rows from the dataframe (this is to test only)
df.head(10)
Please go through the code and modify it to suit your needs. I have added comments to explain each line of code. Hope this helps!