簡體   English   中英

帶有 tqdm 的 Pandas to_csv 進度條

[英]Pandas to_csv progress bar with tqdm

正如標題所暗示的,我試圖在執行pandas.to_csv顯示進度條。
我有以下腳本:

def filter_pileup(pileup, output, lists):
    tqdm.pandas(desc='Reading, filtering, exporting', bar_format=BAR_DEFAULT_VIEW)
    # Reading files
    pileup_df = pd.read_csv(pileup, '\t', header=None).progress_apply(lambda x: x)
    lists_df = pd.read_csv(lists, '\t', header=None).progress_apply(lambda x: x)
    # Filtering pileup
    intersection = pd.merge(pileup_df, lists_df, on=[0, 1]).progress_apply(lambda x: x)
    intersection.columns = [i for i in range(len(intersection.columns))]
    intersection = intersection.loc[:, 0:5]
    # Exporting filtered pileup
    intersection.to_csv(output, header=None, index=None, sep='\t')

在前幾行中,我找到了一種集成進度條的方法,但此方法對最后一行不起作用,我該如何實現?

您可以將數據幀分成n行的塊,並使用 mode='w' 第一行和 mode="a" 將數據幀逐塊保存到 csv 塊中:

例子:

import numpy as np
import pandas as pd
from tqdm import tqdm

df = pd.DataFrame(data=[i for i in range(0, 10000000)], columns = ["integer"])

print(df.head(10))

chunks = np.array_split(df.index, 100) # chunks of 100 rows

for chunck, subset in enumerate(tqdm(chunks)):
    if chunck == 0: # first row
        df.loc[subset].to_csv('data.csv', mode='w', index=True)
    else:
        df.loc[subset].to_csv('data.csv', header=None, mode='a', index=True)

輸出:

   integer
0        0
1        1
2        2
3        3
4        4
5        5
6        6
7        7
8        8
9        9

100%|██████████| 100/100 [00:12<00:00,  8.12it/s]

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM