简体   繁体   中英

Window functions not working in pd.read_sql; Its shows error

I am currently using the European soccer SQLite database for my data analysis in Google collab (Jupyter notebook).

Aim of analysis; For a specific team ex: Chelsea, get the wins and loss label for every match (Done using CASE statement) and after this partition the match count by its season and win_loss result. This is all done within pd.read_sql() statement in google collab(Jupyter notebook).

The statement runs all fine until the window function is introduced. But the query runs all fine in the SQLite DB browser (image attached). The main error i get is OperationalError: near "(": syntax error

Here is the code

    Home_Perf = pd.read_sql(""" --- CTE to get the wins and loss as a home team
                            WITH Homes AS (
                            SELECT season, team_long_name AS HomeTeam,
                                   home_team_goal, away_team_goal,
                                   CASE 
                                       WHEN home_team_goal > away_team_goal THEN 'win'
                                       WHEN home_team_goal < away_team_goal THEN 'loss'
                                       ELSE 'Tie' END AS Win_Loss                                                
                            FROM match
                            ---Inner JOIN for getting the team name
                            INNER JOIN team 
                            ON team_api_id = home_team_api_id
                            WHERE home_team_api_id = 8455)

                            SELECT season, HomeTeam,
                                   COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
                            FROM homes""", conn)
Home_Perf

Here is the error

ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 38))

---------------------------------------------------------------------------

OperationalError                          Traceback (most recent call last)

/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
   1585         try:
-> 1586             cur.execute(*args, **kwargs)
   1587             return cur

OperationalError: near "(": syntax error


The above exception was the direct cause of the following exception:

DatabaseError                             Traceback (most recent call last)

3 frames

<ipython-input-17-9b1c924dbbdd> in <module>()
     15                             SELECT season, HomeTeam,
     16                                    COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
---> 17                             FROM homes""", conn)
     18 Home_Perf

/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
    410             coerce_float=coerce_float,
    411             parse_dates=parse_dates,
--> 412             chunksize=chunksize,
    413         )
    414 

/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
   1631 
   1632         args = _convert_params(sql, params)
-> 1633         cursor = self.execute(*args)
   1634         columns = [col_desc[0] for col_desc in cursor.description]
   1635 

/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
   1596 
   1597             ex = DatabaseError(f"Execution failed on sql '{args[0]}': {exc}")
-> 1598             raise ex from exc
   1599 
   1600     @staticmethod

DatabaseError: Execution failed on sql ' --- CTE to get the wins and loss as a home team
                            WITH Homes AS (
                            SELECT season, team_long_name AS HomeTeam,
                                   home_team_goal, away_team_goal,
                                   CASE 
                                       WHEN home_team_goal > away_team_goal THEN 'win'
                                       WHEN home_team_goal < away_team_goal THEN 'loss'
                                       ELSE 'Tie' END AS Win_Loss                                                
                            FROM match
                            ---Inner JOIN for getting the team name
                            INNER JOIN team 
                            ON team_api_id = home_team_api_id
                            WHERE home_team_api_id = 8455)

                            SELECT season, HomeTeam,
                                   COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
                            FROM homes': near "(": syntax error

tl;dr Google Colab's using SQLite version 3.22 but SQLite only supports Window Functions after version 3.25.


I think the issue is that Google Colab uses an outdated version of SQLlite that does not support window functions. Google has to udpate this, I write this as of March 6. 2022.

To your point, more specifically:

On September 15, 2018 SQLite released version 3.25, and as you can see in the release log here , the first thing they did in this release was:

  1. Add support for window functions

This an issue I've encountered before, and the trick is to update your SQLite3 library, BUT, you are not on a local device, you are on a Google Colab.

So the next thing you want to do is check which version of SQLlite you have. You can do this by running the following line of SQL code: SELECT sqlite_version(); and I got that goodie from here .

And since you are querying within Pandas, you'd run:

pd.read_sql_query("""
SELECT sqlite_version();
""", conn)

So I just ran exactly that today (3/6/22), and I got the following in my Google Colab:

在此处输入图像描述

And this is too old of a version. I don't know how we can update the sqlite library within a Google Colab. So this is as far as I've gotten, and I think we just don't have that functionality at this point. This post may help with the updating of sqlite within the Notebook.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM