What is the best way to read a large amount of data from an Oracle table and fetch it into a data frame

Question

I want to read data from a table in my Oracle database and fetch it into a data frame in Python. The table has 22 million records, and fetchall() runs for a long time without returning any result, even though the query itself runs in Oracle in 1 second.

I tried slicing the data into batches with the code below, but it is still not efficient.

import cx_Oracle
import pandas as pd

connect_serv = cx_Oracle.connect(user='', password='', dsn='')
cur = connect_serv.cursor()

table_row_count = 22242387
batch_size = 100000

# Number the rows by column1, then select one window of row numbers per query
sql = """select t.* from (select a.*, ROW_NUMBER() OVER (ORDER BY column1) as row_num from table1 a) t where t.row_num between :LOWER_BOUND and :UPPER_BOUND"""

data = []
# ROW_NUMBER() starts at 1, so the first window is 1..batch_size
for lower_bound in range(1, table_row_count + 1, batch_size):
    cur.execute(sql, {'LOWER_BOUND': lower_bound,
                      'UPPER_BOUND': lower_bound + batch_size - 1})
    for row in cur.fetchall():
        data.append(row)

I would like to know the proper way to fetch this amount of data in Python in a reasonable time.

Answer 1

The query is not what is slow; the cost comes from stacking the rows one at a time with data.append(row).

Try using

data.extend(cur.fetchall())

for starters. This avoids the repeated single-row appends and instead adds the entire set of rows returned by fetchall at once.
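A minimal sketch of the batched loop with extend in place of the inner append loop. To keep it runnable without an Oracle instance, it uses an in-memory SQLite database as a stand-in, and the helper name collect_rows, the ten-row table, and the simplified query are assumptions made for illustration only. With cx_Oracle the same function would receive the cursor and the windowed query from the question.

```python
import sqlite3


def collect_rows(cursor, sql, bounds):
    """Run sql once per (lower, upper) pair and gather every row in one list."""
    data = []
    for lower, upper in bounds:
        cursor.execute(sql, {"LOWER_BOUND": lower, "UPPER_BOUND": upper})
        # One extend call per batch instead of one append call per row
        data.extend(cursor.fetchall())
    return data


# Demo data: SQLite stands in for Oracle here (assumption for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(1, 11)])

cur = conn.cursor()
sql = (
    "SELECT column1 FROM table1 "
    "WHERE column1 BETWEEN :LOWER_BOUND AND :UPPER_BOUND "
    "ORDER BY column1"
)
rows = collect_rows(cur, sql, [(1, 4), (5, 8), (9, 12)])
print(len(rows))
```

From there, passing the list together with the column names taken from cursor.description to pandas.DataFrame should give the data frame the question asks for.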

Answer 2

You will have to tune the arraysize and prefetchrows parameters. I had the same issue, and increasing arraysize resolved it. Choose the values based on the memory you have available.

Link: https://cx-oracle.readthedocs.io/en/latest/user_guide/tuning.html?highlight=arraysize#choosing-values-for-arraysize-and-prefetchrows
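A rough sketch of what that tuning might look like. arraysize and prefetchrows are attributes of cx_Oracle cursors (prefetchrows from cx_Oracle 8 onward), and the linked guide describes setting them before execute(). The helper fetch_with_tuning, the values 10000, and the FakeCursor and FakeConnection classes are hypothetical; the fakes exist only so the snippet runs without a database, and a real cx_Oracle connection would be passed in their place.

```python
class FakeCursor:
    """Hypothetical stand-in for a cx_Oracle cursor (not part of any library)."""

    def __init__(self, rows):
        self._rows = rows
        self.arraysize = 100   # cx_Oracle's documented default
        self.prefetchrows = 2  # cx_Oracle's documented default

    def execute(self, sql):
        return self

    def fetchall(self):
        return list(self._rows)


class FakeConnection:
    """Hypothetical stand-in for a cx_Oracle connection."""

    def __init__(self, rows):
        self._rows = rows
        self.last_cursor = None

    def cursor(self):
        self.last_cursor = FakeCursor(self._rows)
        return self.last_cursor


def fetch_with_tuning(connection, sql, arraysize=10000, prefetchrows=10000):
    """Fetch all rows after raising the cursor's fetch buffer sizes."""
    cursor = connection.cursor()
    # Set both attributes before execute() so they apply to this query
    cursor.prefetchrows = prefetchrows
    cursor.arraysize = arraysize
    cursor.execute(sql)
    return cursor.fetchall()


conn = FakeConnection([(1, "a"), (2, "b")])
rows = fetch_with_tuning(conn, "select * from table1", arraysize=5000, prefetchrows=5000)
print(rows, conn.last_cursor.arraysize, conn.last_cursor.prefetchrows)
```

Larger values mean fewer round trips to the database at the price of more memory per fetch, which is why the right numbers depend on row width and available RAM.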
