Handle empty result with read_sql chunked

I am still learning Python. I need to handle the case where a SQL query returns no rows when using pandas' read_sql function with the chunksize parameter.

Here is the current line:

df = pd.concat([x for x in pd.read_sql(SQL_request,self.connection, chunksize=50000)], ignore_index=True)

When the query returns zero rows I get this error:

  File "[....]\lib\site-packages\pandas\core\reshape\concat.py", line 239, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

What is the best way to handle this? I need to return an empty dataframe even if there are no rows (the columns must be there). I need to keep the chunking; it really helps keep memory usage down.

I thought about running a first query without chunking to check whether there are any rows, and then running a second, chunked query. But that feels like a very bad and inefficient idea.
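For reference, here is a minimal reproduction of the error, assuming an in-memory SQLite database (the table and query are made up for illustration; whether the chunked iterator yields zero chunks for an empty result set can vary between pandas versions):

import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")  # table stays empty

# With no rows, the chunked iterator yields no chunks, so pd.concat
# receives an empty list and raises ValueError: No objects to concatenate
df = pd.concat([x for x in pd.read_sql("SELECT * FROM t", conn, chunksize=50000)], ignore_index=True)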

Since you are trying to concatenate all chunks into a whole dataframe, it seems like you are using chunking only to avoid using too much memory. Maybe you can try our tool ConnectorX (pip install -U connectorx), which aims to improve the performance of pandas.read_sql in terms of both time and memory usage, and provides a similar API. To switch to it, you only need to:

import connectorx as cx

# currently ConnectorX supports postgres, mysql, oracle, mssql and sqlite
# conn_url example on mysql: mysql://username:password@server:port/database

df = cx.read_sql(conn_url, SQL_request)

The reason pandas.read_sql uses a lot of memory while running is its large intermediate Python objects; in ConnectorX we use Rust and stream processing internally to tackle this problem.
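ConnectorX can also split a query and read the partitions in parallel, which is another way it keeps speed up and memory down; a minimal sketch, assuming the result set has a numeric column id to partition on (the conn_url credentials are placeholders):

import connectorx as cx

conn_url = "postgresql://username:password@server:5432/database"

# read the result in 4 partitions, split on the numeric column `id`
df = cx.read_sql(conn_url, SQL_request, partition_on="id", partition_num=4)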

Here are some benchmark results on memory usage (charts for PostgreSQL and MySQL omitted).

Try this:

df = pd.concat([x for x in pd.read_sql(SQL_request, self.connection, chunksize=50000) if not x.empty], ignore_index=True)

EDIT:

Ah, got it: when the query yields no chunks at all, filtering out empty chunks still leaves concat with an empty list, so the same error occurs. Can you try the following code then? I'll update the answer if it works.

try:
    df = pd.concat([x for x in pd.read_sql(SQL_request, self.connection, chunksize=50000)], ignore_index=True)
except ValueError:
    # No chunks were yielded: re-run without chunksize. The result set
    # is empty, so this second query is cheap and still returns an
    # empty DataFrame with the right columns.
    df = pd.read_sql(SQL_request, self.connection)
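If you prefer to avoid the try/except, the same idea can be written as a small helper; a minimal sketch, assuming connection is any connection object pandas.read_sql accepts (the helper name read_sql_chunked is made up here):

import pandas as pd

def read_sql_chunked(sql, connection, chunksize=50000):
    # Materialize the chunk iterator to see whether the query
    # produced any rows at all.
    chunks = list(pd.read_sql(sql, connection, chunksize=chunksize))
    if chunks:
        return pd.concat(chunks, ignore_index=True)
    # No chunks were yielded: re-run without chunksize. The result set
    # is empty, so the second query is cheap and returns an empty
    # DataFrame that still has the column names.
    return pd.read_sql(sql, connection)

# usage: df = read_sql_chunked(SQL_request, self.connection)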
