What is the optimal way to read a huge amount of data from an Oracle table into a DataFrame?
I am going to read data from a table in my Oracle database and fetch it into a DataFrame in Python. The table has 22 million records, and using fetchall() takes a long time without returning any result (the query itself runs in Oracle in 1 second).
I have tried slicing the data with the code below, but it is still not efficient.
import cx_Oracle
import pandas as pd

connect_serv = cx_Oracle.connect(user='', password='', dsn='')
cur = connect_serv.cursor()

table_row_count = 22242387
batch_size = 100000

sql = """select t.* from (select a.*, ROW_NUMBER() OVER (ORDER BY column1) as row_num from table1 a) t where t.row_num between :LOWER_BOUND and :UPPER_BOUND"""

data = []
# ROW_NUMBER() starts at 1, so each window is [lower_bound, lower_bound + batch_size - 1]
for lower_bound in range(1, table_row_count + 1, batch_size):
    cur.execute(sql, {'LOWER_BOUND': lower_bound,
                      'UPPER_BOUND': lower_bound + batch_size - 1})
    for row in cur.fetchall():
        data.append(row)

# Build the DataFrame, taking column names from the cursor metadata
df = pd.DataFrame(data, columns=[col[0] for col in cur.description])
I would like to know the proper way to fetch this amount of data in Python in a reasonable time.
It's not the query that is slow; it's the row-by-row stacking of the data with data.append(row).
Try using
data.extend(cur.fetchall())
for starters. It avoids the repeated single-row appends and instead appends the entire set of rows returned by fetchall() at once.
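A minimal sketch of the question's batched loop with extend() in place of the per-row append(), assuming a DB-API style cursor (such as the one cx_Oracle returns) and the same :LOWER_BOUND / :UPPER_BOUND bind names as the question. The helper name fetch_in_batches is hypothetical.

```python
from typing import Any, List, Sequence


def fetch_in_batches(cursor: Any, sql: str, row_count: int,
                     batch_size: int = 100_000) -> List[Sequence[Any]]:
    """Run `sql` once per ROW_NUMBER window and collect every row.

    Hypothetical helper: assumes `sql` uses the :LOWER_BOUND / :UPPER_BOUND
    binds from the question and that ROW_NUMBER() starts at 1.
    """
    data: List[Sequence[Any]] = []
    for lower_bound in range(1, row_count + 1, batch_size):
        cursor.execute(sql, {'LOWER_BOUND': lower_bound,
                             'UPPER_BOUND': lower_bound + batch_size - 1})
        # extend() adds the whole list returned by fetchall() in one call,
        # instead of one append() call per row
        data.extend(cursor.fetchall())
    return data
```

The result list can be passed straight to pd.DataFrame together with the column names from cursor.description.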
You will have to tune the arraysize and prefetchrows parameters. I was having the same issue, and increasing arraysize resolved it. Choose the values based on the memory you have available.
Link: https://cx-oracle.readthedocs.io/en/latest/user_guide/tuning.html?highlight=arraysize#choosing-values-for-arraysize-and-prefetchrows
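A minimal sketch of that tuning, assuming cx_Oracle 8 or later (where Cursor.prefetchrows exists) and an already-open connection. The function name fetch_table and the default value of 10,000 are illustrative assumptions, not values taken from the linked guide; setting prefetchrows equal to arraysize is just one starting point to tune from. Both attributes are set before execute(), and rows are then pulled with fetchmany(), which returns up to arraysize rows per call.

```python
from typing import Any, List, Sequence, Tuple


def fetch_table(connection: Any, sql: str,
                arraysize: int = 10_000) -> Tuple[List[str], List[Sequence[Any]]]:
    """Fetch every row of `sql`, returning (column_names, rows).

    Hypothetical helper; the default arraysize is an illustrative guess,
    to be tuned against the memory you have available.
    """
    cursor = connection.cursor()
    # Set both attributes before execute() so they apply to this query.
    # arraysize: rows transferred per round-trip on each fetch.
    cursor.arraysize = arraysize
    # prefetchrows: rows returned along with execute() (cx_Oracle 8+).
    cursor.prefetchrows = arraysize
    cursor.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows: List[Sequence[Any]] = []
    while True:
        batch = cursor.fetchmany()  # up to cursor.arraysize rows
        if not batch:
            break
        rows.extend(batch)
    cursor.close()
    return columns, rows


# Usage (connection details are placeholders):
# import cx_Oracle
# import pandas as pd
# conn = cx_Oracle.connect(user='', password='', dsn='')
# columns, rows = fetch_table(conn, "select * from table1")
# df = pd.DataFrame(rows, columns=columns)
```

A single query with a larger arraysize avoids re-running the ROW_NUMBER() subquery once per batch; larger values mean fewer round-trips but more memory per fetch.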