How to split a pandas DataFrame into multiple smaller DataFrames or a list of tuples?

Question

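I am reading a large CSV file into memory using pandas.read_csv(path, low_memory=False). I want to extract certain groups of rows and insert them into a database row by row. I know that lines 11 to 62 go into one table and lines 65 to 10000 go into another. Is there a way to get a subset of rows from the DataFrame to loop over? I also need to process only the rows in the subset where element 2 is not NaN. Thanks for your help.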

Answer 1

There are several solutions to your problem. From the pandas read_csv documentation:

skiprows

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

skipfooter

Number of lines at bottom of file to skip (Unsupported with engine='c').

nrows

Number of rows of file to read. Useful for reading pieces of large files.
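As an illustration of the callable form of skiprows described above, here is a minimal sketch. The in-memory CSV text and the chosen line range are made up for the example, and it assumes the file has no header row.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 20 lines, no header row.
csv_text = "\n".join(f"{i},{i * 10}" for i in range(20))

# The callable receives each 0-indexed line number and returns True
# for lines that should be skipped. Here only lines 5..9 are kept.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,  # assumption: the file has no header row
    skiprows=lambda i: not (5 <= i <= 9),
)

print(df)
```

Because the callable sees every line index, this form can pick out a block from the middle of a file in a single call.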

The most intuitive solution for you would be

df1 = pd.read_csv(path, low_memory=False, skiprows=65, nrows=10000-65)

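But of course you can also use skipfooter. It is not supported by the C parser, so pass engine='python' (low_memory is a C-parser option and is dropped here):

df1 = pd.read_csv(path, skiprows=65, skipfooter=total_rows-10000, engine='python')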
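The exact offsets depend on how the line numbers in the question are counted. The sketch below assumes they are 1-based physical file lines, inclusive on both ends, with no header row and no blank lines inside the file. The helper name read_line_range and the sample data are hypothetical.

```python
import io
import pandas as pd

def read_line_range(source, first_line, last_line):
    """Read physical file lines first_line..last_line (1-based, inclusive).

    Assumes there is no header row and no blank lines, since pandas
    skips blank lines by default and that would shift the count.
    """
    return pd.read_csv(
        source,
        header=None,
        skiprows=first_line - 1,            # lines before the block
        nrows=last_line - first_line + 1,   # size of the block
    )

# Hypothetical 100-line file standing in for the real CSV.
csv_text = "\n".join(f"row{i},{i}" for i in range(1, 101))

table_a = read_line_range(io.StringIO(csv_text), 11, 62)
table_b = read_line_range(io.StringIO(csv_text), 65, 100)
```

With a real file, path can be passed in place of the io.StringIO object; each call then re-reads the file from the start, which is simple but not the cheapest option for very large inputs.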

Answer 2

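You can simply use a boolean condition on a column:

dataframe_name['column_name'] <condition> <value>

Example:

dataframe['row_num'] > 200

This expression produces a boolean mask; indexing with it, as in dataframe[dataframe['row_num'] > 200], returns the matching rows.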
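Since the question already has the whole file in memory, the same idea can be combined with positional slicing. The following is a minimal sketch, assuming "element 2" means the column at position 2 (0-based); the sample data and the slice bounds are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the DataFrame read from the CSV.
df = pd.DataFrame({
    "a": range(10),
    "b": list("abcdefghij"),
    "c": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0, np.nan, 8.0, 9.0, np.nan],
})

# Positional slice: rows 2..5 (0-based, end exclusive, like list slicing).
subset = df.iloc[2:6]

# Boolean mask: keep only rows whose value at position 2 is not NaN.
subset = subset[subset.iloc[:, 2].notna()]

# Plain tuples, one per row, without the index.
rows = list(subset.itertuples(index=False, name=None))

for row in rows:
    print(row)
```

A list of tuples like rows is also the shape that a DB-API cursor's executemany accepts as its parameter sequence, which may suit the insert step in the question.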

2 solutions

Solution 1
0 2020-06-12 19:54:43

Solution 2
0 2020-06-12 19:56:17
