conn1 = pyodbc.connect('DSN=LUDP-Training Presto',uid='*****', pwd='****', autocommit=True)
sql_query = "SELECT zsourc_sy, zmsgeo, salesorg, crm_begdat, zmcn, zrtm, crm_obj_id, zcrmprod, prod_hier, hier_type, zsoldto, zendcst, crmtgqtycv, currency, zwukrs, netvalord, zgtnper,zsub_4_t \
FROM `prd_updated`.`bw_ms_zocsfs05l_udl` \
WHERE zdcgflag = 'DCG' AND crm_begdat >= '20200101' AND zmsgeo IN ('AP', 'LA', 'EMEA', 'NA')"
I have to load the result of the above query into a pandas DataFrame, but the pd.read_sql call below has been running for more than a couple of hours, since the table has over 10 million rows. Is there a way to speed this up?
contract_table = pd.read_sql(sql_query,conn1)
You can pass a chunksize parameter to the read_sql function (see the pandas docs). Instead of returning a single DataFrame, read_sql then returns an iterator that yields DataFrames, each containing at most the specified number of rows.
df_iter = pd.read_sql(sql_query, conn1, chunksize=100)

for df in df_iter:               # each df is a DataFrame with up to 100 rows in this example
    for row in df.itertuples():  # iterate over the rows of this chunk
        pass                     # do work here
Iterating over chunks like this is an efficient way of processing data that's too large to fit in memory all at once.
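If you still need everything in a single contract_table DataFrame, you can also concatenate the chunks as they arrive, which at least lets you watch progress and trim or downcast columns per chunk before holding the full result. This is only a sketch and assumes the filtered result fits in RAM; the chunksize of 500000 is an arbitrary illustrative value, not something from your setup.

import pandas as pd

chunks = []
df_iter = pd.read_sql(sql_query, conn1, chunksize=500000)  # larger chunks mean fewer fetch round trips

for i, chunk in enumerate(df_iter):
    # Optionally drop unneeded columns or downcast dtypes here to reduce memory per chunk.
    chunks.append(chunk)
    print(f"fetched chunk {i}, {len(chunk)} rows")  # simple progress indicator

contract_table = pd.concat(chunks, ignore_index=True)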