
Pandas not assuming dtypes when using read_sql?

I have a table in SQL that I'm looking to read into a pandas DataFrame. I can read the table in, but all column dtypes come back as object. When I write the table out to a CSV and then re-read it using read_csv, the correct data types are inferred. Obviously this intermediate step is inefficient, and I just want to be able to read the data directly from SQL with the correct data types.

I have 650 columns in the df, so manually specifying the data types is obviously not possible.
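Roughly, the setup looks like this (the connection string and table name below are placeholders, not the actual ones):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table name, for illustration only.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

df = pd.read_sql("SELECT * FROM spells", engine)
print(df.dtypes)  # every column comes back as object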

So it turns out all the data types in the database are defined as varchar.

It seems read_sql reads the schema and assumes data types based on it. What's strange is that I then couldn't convert those data types using infer_objects().

The only way to do it was to write to a CSV and then read that CSV back with pd.read_csv().
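A minimal, self-contained sketch of that round trip (using an in-memory buffer instead of an actual file, and made-up column values):

import io
import pandas as pd

# Stand-in for what read_sql returns when the source columns are varchar:
# every value is a string, so every column dtype is object.
df = pd.DataFrame({"patient_id": ["101", "102"],
                   "spell_start_date": ["2020-01-01", "2020-01-02"]})
print(df.dtypes)  # all object

# infer_objects() does not help here: it only re-types object columns that
# already hold non-string Python objects; it will not parse strings.
print(df.infer_objects().dtypes)  # still all object

# Round-tripping through CSV text forces read_csv to re-infer dtypes
# from the raw values.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
fixed = pd.read_csv(buf, parse_dates=["spell_start_date"])
print(fixed.dtypes)  # patient_id -> int64, spell_start_date -> datetime64[ns]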

No, pandas doesn't really check the database metadata.

Per the pandas docs, some data types are inferred by default, while others have to be enabled on demand (date formats, for example, need extra guidance).

So a bare pd.read_sql is not fully robust, but it may work on your specific data.

On my Postgres database the comparison looks like this:

column_name        postgres                  pandas
patient_id         character varying         object
spell_id           character varying         object
spell_start_date   date                      object
spell_start_time   time without time zone    object
spell_end_date     date                      object
spell_end_time     time without time zone    object
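If the goal is just to get the date columns back as datetimes, pd.read_sql does accept a parse_dates argument; a hedged sketch against a hypothetical table with the columns above:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table name.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

df = pd.read_sql(
    "SELECT * FROM spells",
    engine,
    parse_dates=["spell_start_date", "spell_end_date"],  # explicit guidance for the date columns
)
print(df.dtypes)  # the two date columns now come back as datetime64[ns]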
