简体   繁体   中英

ProgrammingError: (psycopg2.errors.UndefinedColumn), while working with sqlalchemy

I have trouble querying a table, created with sqlalchemy on postgres db (local).

While I am able to execute, and receive query result with:

SELECT * FROM olympic_games 

I am getting an error message when I'm trying to access single column, or perform any other operation on table:

SELECT games FROM olympic_games

The error message is (couple sentences translated from Polish):

ProgrammingError: (psycopg2.errors.UndefinedColumn) BŁĄD: column "games" does not exist

LINE 1: SELECT COUNT(Sport)
^
HINT: maybe you meant "olympic_games.Games".

SQL: SELECT games FROM olympic_games LIMIT 5;]
(Background on this error at: http://sqlalche.me/e/f405 )

It pretty much sums to that program doesn't see, or can access specific column, and display that it doesn't exist.

I tried accessing with table.column format, it didn't work as well. I am also able to see column names, via information_schema.columns

Data (.csv) was loaded with pd.read_csv , and then DataFrame.to_sql . Code below, thanks for help!

engine = create_engine('postgresql://:@:/olympic_games')

with open('olympic_athletes_2016_14.csv', 'r') as file:
    df = pd.read_csv(file, index_col='ID')
df.to_sql(name = 'olympic_games', con = engine, if_exists = 'replace', index_label = 'ID')

Both execute commands returned with same error:

with engine.connect() as con:
    rs = con.execute("SELECT games FROM olympic_games LIMIT 5;")
    df_fetch = pd.DataFrame(rs.fetchall())
df_fetch2 = engine.execute("""SELECT games FROM olympic_games LIMIT 5;""").fetchall()

Essentially, this is the double quoting issue of column identifiers as mentioned in the PostgreSQL manual :

Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other.

When any of your Pandas data frame columns have mixed cases, the DataFrame.to_sql preserves the case sensitivity by creating columns with double quotes at CREATE TABLE stage. Specifically, the below Python Pandas code when using replace

df.to_sql(name='olympic_games', con=engine, if_exists='replace', index_label='ID')

Translates as below in Postgres if Sport was a titled case column in data frame:

DROP TABLE IF EXISTS public."olympic_games";

CREATE TABLE public."olympic_games"
(
    ...
    "Sport" varchar(255)
    "Games" varchar(255)
    ...
);

Once an identifier is quoted with mixed cases, it must always be referred to in that manner. Therefore sport is not the same as "Sport" . Remember in SQL, double quotes actually is different than single quotes which can be interchangeable in Python.

To fix, consider rendering all your Pandas columns to lower case since "games" is the same as games , Games or GAMES (but not "Games" or "GAMES" ).

df.columns = df.columns.str.lower()
df.to_sql(name='olympic_games', con=engine, if_exists='replace', index_label='ID')

Alternatively, leave as is and quote appropriately:

SELECT "Games" FROM olympic_games

Try SELECT "games" FROM olympic_games . In some cases PostgreSQL create the quotes around a columns names. For example if the column name contained mixed register. I have to remind you: PostgreSQL is case sensitive

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM