[英]Pandas to_csv progress bar with tqdm
正如標題所暗示的,我試圖在執行pandas.to_csv
顯示進度條。
我有以下腳本:
def filter_pileup(pileup, output, lists):
tqdm.pandas(desc='Reading, filtering, exporting', bar_format=BAR_DEFAULT_VIEW)
# Reading files
pileup_df = pd.read_csv(pileup, '\t', header=None).progress_apply(lambda x: x)
lists_df = pd.read_csv(lists, '\t', header=None).progress_apply(lambda x: x)
# Filtering pileup
intersection = pd.merge(pileup_df, lists_df, on=[0, 1]).progress_apply(lambda x: x)
intersection.columns = [i for i in range(len(intersection.columns))]
intersection = intersection.loc[:, 0:5]
# Exporting filtered pileup
intersection.to_csv(output, header=None, index=None, sep='\t')
在前幾行中,我找到了一種集成進度條的方法,但此方法對最后一行不起作用,我該如何實現?
您可以將數據幀分成n
行的塊,並使用 mode='w' 第一行和 mode="a" 將數據幀逐塊保存到 csv 塊中:
例子:
import numpy as np
import pandas as pd
from tqdm import tqdm
df = pd.DataFrame(data=[i for i in range(0, 10000000)], columns = ["integer"])
print(df.head(10))
chunks = np.array_split(df.index, 100) # chunks of 100 rows
for chunck, subset in enumerate(tqdm(chunks)):
if chunck == 0: # first row
df.loc[subset].to_csv('data.csv', mode='w', index=True)
else:
df.loc[subset].to_csv('data.csv', header=None, mode='a', index=True)
輸出:
integer
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
100%|██████████| 100/100 [00:12<00:00, 8.12it/s]
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.