Pandas read_json(orient="table") returns NaN if the column is an integer

Question

I have an issue when converting a DataFrame to JSON and back using orient="table".

If a list is loaded into a DataFrame and then exported as JSON using to_json(orient="table"), the schema in the output includes the column name as an int, which appears to be the cause of the issue.

Example

import pandas as pd

# List
arr = ["123"]

# Create the dataframe
dataframe = pd.DataFrame(arr)
print(dataframe)

# Get the table as a schema
dataframe_table_schema = dataframe.to_json(orient='table')
print(dataframe_table_schema)

# Load the DataFrame from the json object
dataframe = pd.read_json(dataframe_table_schema, orient='table')
print(dataframe)

Output

     0
0  123

{"schema": {"fields":[{"name":"index","type":"integer"},{"name":0,"type":"string"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}

     0
0  NaN

To work around the issue, we can load the JSON into a dict, loop over the fields in dataframe_table_schema.schema.fields, and check whether each field name is an integer. If it is, we cast it to a string, then dump the object back to a JSON string.

import json

import pandas as pd

# List
arr = ["123"]

# Create the dataframe
dataframe = pd.DataFrame(arr)
print(dataframe)

# Get the table as a schema
dataframe_table_schema = dataframe.to_json(orient='table')

# Load the schema into a dict
dataframe_table_schema_modified = json.loads(dataframe_table_schema)
# Loop over the fields
for field in dataframe_table_schema_modified.get("schema").get("fields"):
    # Get the column name
    column_name = field.get("name", "")
    if isinstance(column_name, int):
        # Cast the field name to a string
        field["name"] = str(column_name)
# Dump the object to a string
dataframe_table_schema_modified = json.dumps(dataframe_table_schema_modified)
print(dataframe_table_schema_modified)

dataframe = pd.read_json(dataframe_table_schema_modified, orient='table')
print(dataframe)

Could someone please confirm whether this is a bug, or whether there is a way to handle this correctly?

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.0.final.0
python-bits: 64
OS: Linux
OS-release: 5.8.0-1041-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.25.2
numpy: 1.17.3
pytz: 2019.3
dateutil: 2.8.0
pip: 19.3.1
setuptools: 41.6.0
Cython: 0.29.13
pytest: 5.2.2
hypothesis: None
sphinx: 2.2.1
blosc: None
feather: None
xlsxwriter: 1.2.2
lxml.etree: 4.4.1
html5lib: 1.0.1
pymysql: None
psycopg2: 2.8.4 (dt dec pq3 ext lo64)
jinja2: 2.10.3
IPython: 7.8.0
pandas_datareader: None
bs4: 4.8.1
bottleneck: 1.2.1
fastparquet: None
gcsfs: None
lxml.etree: 4.4.1
matplotlib: 3.1.1
numexpr: 2.7.0
odfpy: None
openpyxl: 3.0.0
pandas_gbq: None
pyarrow: None
pytables: None
s3fs: None
scipy: 1.3.1
sqlalchemy: 1.3.10
tables: None
xarray: None
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.2.2

Answer 1

There is a mismatch between your fields and your data.

Notice that in "fields", the column name is 0, i.e. an integer:

"fields":[{"name":"index","type":"integer"},{"name":0,"type":"string"}]
                                                   #^integer

But in "data", the column name is "0", i.e. a string:

"data": [{"index":0,"0":"123"}]
                    #^string
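
If you want to check for this mismatch programmatically, here is a minimal sketch that uses only the standard json module on the JSON string copied from the question's output. The variable names (table_json, payload, mismatched) are my own and are not part of pandas. It lists every schema field name that has no identically typed key in the first data record:

```python
import json

# JSON string copied verbatim from the question's output
table_json = (
    '{"schema": {"fields":[{"name":"index","type":"integer"},'
    '{"name":0,"type":"string"}],"primaryKey":["index"],'
    '"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}'
)

payload = json.loads(table_json)

# Field names as declared in the schema (may be int or str)
field_names = [field["name"] for field in payload["schema"]["fields"]]

# Keys of a JSON object are always strings after parsing
data_keys = set(payload["data"][0])

# Schema names with no matching key in the data record
mismatched = [name for name in field_names if name not in data_keys]
print(mismatched)  # [0]
```

The integer 0 is reported because JSON object keys are always strings, so the data record can only ever contain "0".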

You can correct this by specifying the column names while constructing your DataFrame:您可以通过在构造 DataFrame 时指定列名来纠正此问题：

df = pd.DataFrame(["123"], columns=["A"])
js = df.to_json(orient="table")
df = pd.read_json(js, orient="table")

>>> df
     A
0  123
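
If you would rather keep the default labels than invent column names, another option is to convert the existing labels to strings before exporting. This is only a sketch under that assumption, and I have not checked it against 0.25.2 specifically. The name restored is mine, and the JSON is wrapped in io.StringIO because newer pandas versions deprecate passing a literal JSON string to read_json:

```python
import io

import pandas as pd

df = pd.DataFrame(["123"])

# Convert the integer column labels (0, 1, ...) to strings ("0", "1", ...)
df.columns = df.columns.map(str)

js = df.to_json(orient="table")

# Read back from a file-like object rather than a raw string
restored = pd.read_json(io.StringIO(js), orient="table")
print(restored)
```

With string labels, the schema field name and the key in "data" are both "0", so the two line up when the table is read back.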
