If I have the following CSV:
"1"
"2"
"23"
and I read it with:
names = ["nullable"]
dtype = [("nullable", 'int32')]
df = pd.read_csv(r"E:\work\nullable.csv",
                 names=names,
                 dtype=dtype,
                 encoding="utf-8")
Looking at df.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
nullable 3 non-null int32
dtypes: int32(1)
memory usage: 140.0 bytes
None
If I add a "" (a NaN) to the CSV and change the dtype to pd.Int32Dtype, df.info() shows object type:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
nullable 3 non-null object
dtypes: object(1)
memory usage: 160.0+ bytes
None
However, if I do
s = pd.Series([1, 2.0, np.nan, 4.0])
s2 = s.astype('Int32')
The dtype is correctly filled in as Int32:
s2.info()
AttributeError("'Series' object has no attribute 'info'")
s2
0 1
1 2
2 NaN
3 4
dtype: Int32
This looks like a bug to me.
Are there any suggestions on how to work around this? I want to save the CSV as parquet, but if I use pd.Int32Dtype the column is saved as a string.
It's not feasible to remove or replace the NaNs.
Pandas read_csv interprets 'NaN' as Null but not 'NAN'. You can pass 'NAN' to the na_values argument.
df = pd.read_csv(r"E:\work\nullable.csv",
                 names=names,
                 dtype=dtype,
                 encoding="utf-8",
                 na_values="NAN")
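As a minimal, self-contained sketch of this approach, the snippet below reads the question's sample data from an in-memory string instead of the file on disk. It assumes a pandas version that accepts the nullable dtype by its string name "Int32" in read_csv, and it passes dtype as a dict of column name to type, which is the form the read_csv documentation describes. The csv_text variable and skip_blank_lines=False are additions for illustration only and are not part of the original code.

```python
import io

import pandas as pd

# Stand-in for the file on disk: the question's values plus one empty entry.
csv_text = '"1"\n"2"\n""\n"23"'

df = pd.read_csv(
    io.StringIO(csv_text),
    names=["nullable"],
    # Assumption: dict of column name -> nullable integer dtype given by name.
    dtype={"nullable": "Int32"},
    # Only matters if the file spells missing values as NAN (upper case).
    na_values=["NAN"],
    # Guard so a truly blank line is kept as a missing row instead of skipped.
    skip_blank_lines=False,
)

print(df.dtypes)
print(df)
```

If the column comes back as Int32 here, the same dtype dict can be tried on the real file before writing it to parquet.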