
Extract a few million records from Teradata to Python (pandas)

Original post: 2018-06-26 05:28:56 · Tags: python, sql, odbc, teradata

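My Teradata table holds 6 months of email data: email attributes such as send date and subject line, plus recipient details such as age and gender (about 20 columns in total). It has about 20 million records in total, and I want to pull them into Python for further predictive modeling.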

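I tried running a SELECT query through the "pyodbc" connector, but it just kept running for hours. I then stopped it and changed the query to fetch only 1 month of data (maybe 3 to 4 million records), but it still takes a very long time.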

Is there a better (faster) option than "pyodbc", or a completely different approach?

Any input is appreciated. Thanks!

1 Answer

For communication between Python and Teradata, I recommend the teradata package (pip install teradata; https://developer.teradata.com/tools/reference/teradata-python-module). It connects via ODBC (or REST).

Alternatively, you can use JDBC through JayDeBeApi. JDBC can sometimes be a bit faster than ODBC.

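Both options implement the Python Database API Specification (PEP 249), so none of your surrounding code has to change. For example, pandas.read_sql works fine with either of these connections.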
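Because every PEP 249 connection exposes the same cursor methods, a large result can be streamed in fixed-size batches instead of being loaded in one go. The sketch below assumes only the standard cursor API (execute, fetchmany, arraysize, close); sqlite3 stands in for the Teradata, pyodbc, or JayDeBeApi connection, and the function name iter_batches and the batch size are hypothetical.

```python
import sqlite3
from typing import Iterator, Sequence


def iter_batches(conn, query: str, batch_size: int = 50_000) -> Iterator[Sequence]:
    """Yield the rows of `query` in lists of at most `batch_size` rows.

    Uses only PEP 249 cursor methods, so any DB-API driver should work;
    sqlite3 is used here purely as a stand-in.
    """
    cur = conn.cursor()
    try:
        # Some drivers use arraysize as a prefetch hint; others ignore it.
        cur.arraysize = batch_size
        cur.execute(query)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:  # an empty sequence means the result set is exhausted
                break
            yield rows
    finally:
        # Runs when the generator is exhausted, closed, or garbage-collected.
        cur.close()
```

Processing or writing out each batch as it arrives keeps local memory bounded, although it does not make the network transfer itself any faster. pandas.read_sql accepts a chunksize argument that gives a similar iterator of DataFrames.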

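Your performance problem looks more like one of these other issues:

1) Network connection
2) Python (pandas) memory handling

Re 1) Limited throughput can only be fixed with higher throughput, i.e. a faster network connection.

Re 2) Do as much of the work (feature engineering) inside the database as you can, and make sure your local machine has enough RAM (a rule of thumb for pandas: RAM should be 5 to 10 times the size of the dataset). Apache Arrow might also ease some of your local RAM problems.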
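To illustrate pushing feature engineering into the database, here is a sketch that collapses email-level rows into one row per recipient with a GROUP BY, so only the aggregated rows cross the network. The table and column names (emails, recipient_id, sent_date, opened), the chosen features, and the helper load_features are all hypothetical; sqlite3 again stands in for the Teradata connection, and the query uses only standard SQL aggregates.

```python
import sqlite3

# Hypothetical per-recipient features computed in the database, so one row
# per recipient is transferred instead of one row per email.
FEATURE_QUERY = """
SELECT recipient_id,
       COUNT(*)       AS emails_sent,
       SUM(opened)    AS emails_opened,
       AVG(opened)    AS open_rate,
       MAX(sent_date) AS last_sent
FROM emails
GROUP BY recipient_id
ORDER BY recipient_id
"""


def load_features(conn) -> list:
    """Run FEATURE_QUERY on any PEP 249 connection and return a list of dicts."""
    cur = conn.cursor()
    try:
        cur.execute(FEATURE_QUERY)
        # cursor.description holds one tuple per column; item 0 is the name.
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

With millions of emails but far fewer recipients, the result that reaches pandas shrinks accordingly, which helps both the network transfer and the RAM budget.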

Related questions:

How to process 8 Million records using Python Pandas
Python Dataframe extract list of unique dates from a big datetimeindex of few million rows
Fetching Million records from SQL server and saving to pandas dataframe
Python: slow processing of million records
python extract few lines from the output
Importing 11 million rows from Postgresql to Pandas/Python
Efficient way to use pandas group by on a million records
How to process 10 million records in Oracle DB using Python. (cx_Oracle / Pandas)
Extract string from Pandas/Python
Python Dictionary million records process via threads


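Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.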
