
Pandas read_json(orient="table") returns NaN if the column is an integer

I'm running into a problem when converting a DataFrame to JSON and back again using orient="table".

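If I load a list into a DataFrame and then export it to JSON with to_json(orient="table"), the schema in the output contains the column name as an int, which seems to be what causes the problem.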
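A quick way to see where the integer comes from: a DataFrame built from a plain list gets integer column labels, and the table schema keeps that label as a JSON number. This is a minimal sketch assuming pandas is installed; the exact "pandas_version" string in the schema varies by release, so it is not inspected here.

```python
import json

import pandas as pd

# Building a DataFrame from a plain list gives it integer column labels.
df = pd.DataFrame(["123"])
print(df.columns.tolist())  # [0]

# The table schema stores that label as the JSON number 0, not the string "0".
schema = json.loads(df.to_json(orient="table"))["schema"]
field_names = [field["name"] for field in schema["fields"]]
print(field_names)  # ['index', 0]
```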

Example

import pandas as pd

# List
arr = ["123"]

# Create the dataframe
dataframe = pd.DataFrame(arr)
print(dataframe)

# Get the table as a schema
dataframe_table_schema = dataframe.to_json(orient='table')
print(dataframe_table_schema)

# Load the DataFrame from the json object
dataframe = pd.read_json(dataframe_table_schema, orient='table')
print(dataframe)

Output

     0
0  123

{"schema": {"fields":[{"name":"index","type":"integer"},{"name":0,"type":"string"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}

     0
0  NaN

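To work around this, we can parse dataframe_table_schema with json.loads, loop over its schema["fields"] entries, check whether each field name is an integer, cast it to a string if it is, and then dump the object back to a JSON string.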

import json

import pandas as pd

# List
arr = ["123"]

# Create the dataframe
dataframe = pd.DataFrame(arr)
print(dataframe)

# Get the table as a schema
dataframe_table_schema = dataframe.to_json(orient='table')

# Load the schema into a dict
dataframe_table_schema_modified = json.loads(dataframe_table_schema)
# Loop over the fields
for field in dataframe_table_schema_modified.get("schema").get("fields"):
    # Get the column name
    column_name = field.get("name", "")
    if isinstance(column_name, int):
        # Cast the field name to a string
        field["name"] = str(column_name)
#  Dump the object to a string
dataframe_table_schema_modified = json.dumps(dataframe_table_schema_modified)
print(dataframe_table_schema_modified)

dataframe = pd.read_json(dataframe_table_schema_modified, orient='table')
print(dataframe)
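The same workaround can be packaged as a small reusable function. This is a sketch that operates only on the JSON string, so it needs nothing beyond the standard library; the name stringify_field_names is hypothetical, and like the loop above it only rewrites integer field names (primaryKey is left untouched).

```python
import json


def stringify_field_names(table_json: str) -> str:
    """Hypothetical helper: return a copy of a table-orient JSON string
    whose integer schema field names have been cast to strings."""
    payload = json.loads(table_json)
    for field in payload["schema"]["fields"]:
        name = field.get("name")
        # Keys in "data" are always strings in JSON, so an integer field
        # name such as 0 can never match the data key "0".
        if isinstance(name, int):
            field["name"] = str(name)
    return json.dumps(payload)


# The schema produced in the question.
raw = ('{"schema": {"fields":[{"name":"index","type":"integer"},'
       '{"name":0,"type":"string"}],"primaryKey":["index"],'
       '"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}')
fixed = stringify_field_names(raw)
print(repr(json.loads(fixed)["schema"]["fields"][1]["name"]))  # '0'
```

The fixed string can then be passed to pd.read_json(..., orient="table") exactly as in the loop version; the restored column is labelled with the string "0" rather than the integer 0.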

Can someone confirm whether this is a bug, or whether there is a proper way to handle this?

pd.show_versions() output (INSTALLED VERSIONS):

commit: None python: 3.8.0.final.0 python-bits: 64 OS: Linux OS-release: 5.8.0-1041-aws machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.25.2 numpy: 1.17.3 pytz: 2019.3 dateutil: 2.8.0 pip: 19.3.1 setuptools: 41.6.0 Cython: 0.29.13 pytest: 5.2.2 hypothesis: None sphinx: 2.2.1 blosc: None feather: None xlsxwriter: 1.2.2 lxml.etree: 4.4.1 html5lib: 1.0.1 pymysql: None psycopg2: 2.8.4 (dt dec pq3 ext lo64) jinja2: 2.10.3 IPython: 7.8.0 pandas_datareader: None b. .1 bottleneck: 1.2.1 fastparquet: None gcsfs: None lxml.etree: 4.4.1 matplotlib: 3.1.1 numexpr: 2.7.0 odfpy: None openpyxl: 3.0.0 pandas_gbq: None pyarrow: None pytables: None s3fs: None scipy: 1.3.1 sqlalchemy: 1.3.10 tables: None xarray: None xlrd: 1.2.0 xlwt: 1.3.0 xlsxwriter: 1.2.2

Your fields and your data don't match.

Notice that in "fields" the column name is 0, i.e. an integer:

"fields":[{"name":"index","type":"integer"},{"name":0,"type":"string"}]
                                                   #^integer

But in "data" the column name is "0", i.e. a string:

"data": [{"index":0,"0":"123"}]
                    #^string
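The mismatch follows from JSON itself: object keys must be strings, so the integer column label can only appear as the string key "0" in each record of "data", while in "fields" it is a plain value and stays a number. Python's standard json module shows the same effect; pandas uses its own encoder, so treat this as an illustration of the rule rather than pandas' exact code path.

```python
import json

# An integer used as a dict *key*, as in each record of "data" ...
record = {"index": 0, 0: "123"}
encoded = json.dumps(record)
print(encoded)  # {"index": 0, "0": "123"}
print(list(json.loads(encoded)))  # ['index', '0']

# ... versus the same integer used as a *value*, as in "fields".
field = {"name": 0, "type": "string"}
round_tripped = json.loads(json.dumps(field))
print(round_tripped["name"] == 0)  # True
```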

You can fix this by specifying string column names when you construct the DataFrame:

import pandas as pd

df = pd.DataFrame(["123"], columns=["A"])
js = df.to_json(orient="table")
df = pd.read_json(js, orient="table")

>>> df
     A
0  123
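If the column names cannot be chosen up front (for example, the list comes from elsewhere), a similar effect can be had by converting the existing labels to strings just before exporting. This sketch assumes string labels are acceptable downstream: the restored frame has the column "0" (a string), not the integer 0. It wraps the JSON in io.StringIO because passing a literal JSON string to read_json is deprecated in recent pandas releases.

```python
from io import StringIO

import pandas as pd

# Default labels from a plain list are integers (here just 0).
df = pd.DataFrame(["123"])

# Cast every column label to a string so "fields" and "data" agree.
df = df.rename(columns=str)

table_json = df.to_json(orient="table")
restored = pd.read_json(StringIO(table_json), orient="table")
print(restored)
```

Using rename(columns=str) keeps the original column order and only changes the label type, so it works for any number of columns.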

