
Fastest way to read huge volume of data as dataframe from Oracle using Python

I need to read a huge amount of data from Oracle (around 1 million rows and 450 columns) and bulk load it into Greenplum. I am using the following approach:

import io

import pandas as pd
from sqlalchemy import create_engine

# Oracle source, read through the cx_Oracle dialect
engineor = create_engine('oracle+cx_oracle://xxxx:xxxx@xxxxx:xxxx/?service_name=xxxxx')
sql = "select * from xxxxxx"

# Greenplum target; raw_connection() exposes the psycopg2 cursor for COPY
enginegp = create_engine('xxxxx@xxxxx:xxxx/xxxx')
connection = enginegp.raw_connection()

# Accumulate all chunks as CSV in one in-memory buffer, then COPY it in one go
output = io.StringIO()
for df in pd.read_sql(sql, engineor, chunksize=10000):
    df.to_csv(output, header=False, index=False)
output.seek(0)

cur = connection.cursor()
cur.copy_expert("COPY test FROM STDIN WITH CSV NULL ''", output)
connection.commit()
cur.close()

I have been reading the data in chunks:

for df in pd.read_sql(sql, engineor, chunksize=10000):
    df.to_csv(output, header=False, index=False,mode='a')

Is there a quicker and more seamless way to read big tables from Oracle as a dataframe? The method above works, but it is not reliable: the connection to Oracle sometimes times out or is killed by the DBA, while other runs complete successfully. Given the table size, it seems fragile. I need the data as a dataframe because I need to load it into Greenplum later using the copy method.
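One way to make this pipeline lighter on memory and more tolerant of dropped Oracle sessions is to COPY each chunk into Greenplum as soon as it is fetched, instead of buffering the whole table in a single StringIO. The sketch below is only a variant of the code above under stated assumptions: the Greenplum URL is a hypothetical postgresql+psycopg2 placeholder, arraysize=1000 is an untuned guess (SQLAlchemy's cx_Oracle dialect passes it through to the driver's fetch array size), and committing per chunk trades all-or-nothing loading for shorter transactions.

import io

import pandas as pd
from sqlalchemy import create_engine

# arraysize is passed through to cx_Oracle's fetch array size; 1000 is an
# assumed starting point to tune against row width and network latency.
engineor = create_engine(
    'oracle+cx_oracle://xxxx:xxxx@xxxxx:xxxx/?service_name=xxxxx',
    arraysize=1000,
)
# Hypothetical Greenplum URL; substitute real credentials.
enginegp = create_engine('postgresql+psycopg2://xxxx:xxxx@xxxxx:xxxx/xxxx')

sql = "select * from xxxxxx"
connection = enginegp.raw_connection()
try:
    cur = connection.cursor()
    for df in pd.read_sql(sql, engineor, chunksize=10000):
        buf = io.StringIO()          # holds one chunk of CSV, not the whole table
        df.to_csv(buf, header=False, index=False)
        buf.seek(0)
        cur.copy_expert("COPY test FROM STDIN WITH CSV NULL ''", buf)
        connection.commit()          # per-chunk commit; see note below
    cur.close()
finally:
    connection.close()

With per-chunk commits, a killed Oracle session only costs the current chunk, but a failed run leaves the target table partially loaded; truncate test before retrying, or move the commit after the loop to keep the load atomic at the cost of one long transaction.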

Outsourcer was specifically created to do what you are trying to do, but it was written in Java.

http://www.pivotalguru.com/?page_id=20
