
What is an optimum way to read huge data from an Oracle table and fetch it into a data frame?

I want to read data from a table in my Oracle database into a data frame in Python. The table has 22 million records, and using fetchall() runs for a long time without returning any result. (The query itself runs in Oracle in 1 second.)

I tried slicing the data with the code below, but it is still not efficient.

import cx_Oracle
import pandas as pd
from pandas import DataFrame

connect_serv = cx_Oracle.connect(user='', password='', dsn='')
cur = connect_serv.cursor()

table_row_count = 22242387
batch_size = 100000

# Number every row, then select one window of row numbers per query.
sql = """select t.* from (select a.*, ROW_NUMBER() OVER (ORDER BY column1) as row_num from table1 a) t where t.row_num between :LOWER_BOUND and :UPPER_BOUND"""

data = []
for lower_bound in range(0, table_row_count, batch_size):
    cur.execute(sql, {'LOWER_BOUND': lower_bound,
                      'UPPER_BOUND': lower_bound + batch_size - 1})
    # Rows are appended to the list one at a time.
    for row in cur.fetchall():
        data.append(row)

I would like to know the proper way to fetch this amount of data in Python in a reasonable time.

It's not the query that is slow, it's the stacking of the data with data.append(row).

Try using

data.extend(cur.fetchall())

for starters. This avoids appending rows one at a time and instead adds the whole set of rows returned by fetchall in a single call.
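The same idea can be taken one step further by running the query once and pulling batches with the DB-API `fetchmany` call, rather than re-running a windowed query for every slice. This is a swapped-in variation, not something the answer above prescribes. The sketch below is only an illustration under stated assumptions: `fetch_all_rows` is a hypothetical helper, and an in-memory `sqlite3` database stands in for Oracle so the code runs anywhere. With cx_Oracle, the cursor would come from `connect_serv.cursor()` instead.

```python
import sqlite3


def fetch_all_rows(cursor, batch_size=100_000):
    """Collect every row from an already-executed DB-API cursor, batch by batch."""
    rows = []
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        rows.extend(batch)  # one list operation per batch, not per row
    return rows


# Assumption: sqlite3 replaces the Oracle connection purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 INTEGER, column2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(i, f"value-{i}") for i in range(1000)],
)

cur = conn.cursor()
cur.execute("SELECT column1, column2 FROM table1 ORDER BY column1")
columns = [d[0] for d in cur.description]  # column names for the data frame
rows = fetch_all_rows(cur, batch_size=256)
```

With pandas installed, `pd.DataFrame(rows, columns=columns)` would then build the data frame from the collected rows.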

You will have to tune the arraysize and prefetchrows parameters. I had the same issue, and increasing arraysize resolved it. Choose the values based on the memory you have available.

Link: https://cx-oracle.readthedocs.io/en/latest/user_guide/tuning.html?highlight=arraysize#choosing-values-for-arraysize-and-prefetchrows
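To see why arraysize matters at this scale, here is a rough back-of-the-envelope sketch. It assumes about one network round trip per `arraysize` rows and ignores prefetching and the final end-of-data check, so the numbers are estimates only. `estimated_round_trips` and `tune_cursor` are hypothetical helpers, and a `SimpleNamespace` stands in for a real cx_Oracle cursor so the code runs without a database. The starting values 100 and 2 are the cx_Oracle defaults as I understand them, so check them against your version.

```python
import math
from types import SimpleNamespace


def estimated_round_trips(total_rows, arraysize):
    """Rough number of fetch round trips: one per `arraysize` rows."""
    return math.ceil(total_rows / arraysize)


def tune_cursor(cursor, arraysize, prefetchrows=None):
    """Set the fetch-tuning attributes; do this before cursor.execute()."""
    cursor.arraysize = arraysize
    if prefetchrows is not None:
        cursor.prefetchrows = prefetchrows
    return cursor


# Stand-in for a cx_Oracle cursor (assumption: defaults of 100 and 2).
cursor = SimpleNamespace(arraysize=100, prefetchrows=2)

before = estimated_round_trips(22_242_387, cursor.arraysize)
tune_cursor(cursor, arraysize=10_000)
after = estimated_round_trips(22_242_387, cursor.arraysize)

print(f"round trips at arraysize=100:    {before}")
print(f"round trips at arraysize=10000:  {after}")
```

With a real connection the same two attributes would be set on the cursor returned by `connect_serv.cursor()`. Larger values buy fewer round trips at the cost of more client memory per fetch, which is why the choice depends on the memory you have.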

