
Pandas not assuming dtypes when using read_sql?

I have a table in SQL that I'm looking to read into a pandas DataFrame. I can read the table in, but all column dtypes come back as object. When I write the table out to a CSV and then re-read it using read_csv, the correct data types are inferred. Obviously this intermediate step is inefficient, and I just want to be able to read the data directly from SQL with the correct data types.

I have 650 columns in the df, so manually specifying the data types is obviously not possible.
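Roughly, the setup looks like this (the connection string and table name below are placeholders, not the actual ones):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table name, for illustration only.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

df = pd.read_sql("SELECT * FROM spells", engine)
print(df.dtypes)  # every column comes back as object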

So it turns out all the data types in the database are defined as varchar.

It seems read_sql reads the schema and assumes data types based on it. What's strange is that I then couldn't convert those data types using infer_objects().

The only way to do it was to write to a CSV and then read that CSV back with pd.read_csv().
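A minimal, self-contained sketch of that round trip (using an in-memory buffer instead of an actual file, and made-up column values):

import io
import pandas as pd

# Stand-in for what read_sql returns when the source columns are varchar:
# every value is a string, so every column dtype is object.
df = pd.DataFrame({"patient_id": ["101", "102"],
                   "spell_start_date": ["2020-01-01", "2020-01-02"]})
print(df.dtypes)  # all object

# infer_objects() does not help here: it only re-types object columns that
# already hold non-string Python objects; it will not parse strings.
print(df.infer_objects().dtypes)  # still all object

# Round-tripping through CSV text forces read_csv to re-infer dtypes
# from the raw values.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
fixed = pd.read_csv(buf, parse_dates=["spell_start_date"])
print(fixed.dtypes)  # patient_id -> int64, spell_start_date -> datetime64[ns]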

No, pandas doesn't really check the database metadata.

Per the pandas docs, some data types are inferred by default, while others have to be enabled on demand (date formats, for example, need extra guidance).

So a bare pd.read_sql is not fully robust, but it may work on your specific data.

On my Postgres database the comparison looks like this:

column_name        postgres                  pandas
patient_id         character varying         object
spell_id           character varying         object
spell_start_date   date                      object
spell_start_time   time without time zone    object
spell_end_date     date                      object
spell_end_time     time without time zone    object
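If the goal is just to get the date columns back as datetimes, pd.read_sql does accept a parse_dates argument; a hedged sketch against a hypothetical table with the columns above:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table name.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

df = pd.read_sql(
    "SELECT * FROM spells",
    engine,
    parse_dates=["spell_start_date", "spell_end_date"],  # explicit guidance for the date columns
)
print(df.dtypes)  # the two date columns now come back as datetime64[ns]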
