Progress bar while parsing files

The code below walks a directory containing XML files, parses each one, and writes the extracted rows to a CSV file that is later used as a dataframe.

from xml.etree import ElementTree as ET
from collections import defaultdict
from pathlib import Path
import csv


directory = 'C:/Users/xml_files'

with open('try.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    # writer = csv.writer(f)

    headers = ['identify','id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt','Counter', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']

    writer.writerow(headers)

    xml_files_list = list(map(str,Path(directory).glob('**/*.xml')))
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        p_get = tree.find('.//Phones/Get').text
        p_set = tree.find('.//Phones/Set').text


        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)

            # <<<<< Indentation was wrong here
            for k,v in sn.attrib.items():
                row[k] = v
            for rn in sn.findall('.//Rational'):
                row['Rational'] = rn.text

            for qu in sn.findall('.//Qualify'):
                row['Qualify'] = qu.text

            for ds in sn.findall('.//Description'):
                row['Description_txt'] = ds.text
                row['Description_text_id'] = ds.attrib['text_id']



            for counter, st in enumerate( sn.findall('.//SetData') ):
                for k,v in st.attrib.items():
                    if v.startswith("-"):
                        v = v.replace("-","",1)
                    v=v.replace(',', '.')
                    row['SetData_'+ str(k)] = v
                row["Counter"] = counter 
                row_data = [row[i] for i in headers]
                row_data[0]=p_get + '_' + p_set
                writer.writerow(row_data)
                row = defaultdict(str)

With more data, it is hard to sit and wait without knowing how far the parsing has progressed.

So I looked for a way to show a progress bar and ended up finding the following:

import tqdm
import time

for i in tqdm.tqdm(range(1000)):
    time.sleep(0.01)
    # or other long operations

I am having trouble integrating this into my code and choosing the range, which should preferably be the number of XML files in that directory.

The tqdm library seemed like the easiest one to use.

You could use:

for xml_file in tqdm.tqdm(xml_files_list):

tqdm should automatically use len(xml_files_list) as the total, and each iteration still yields xml_file.

You also don't need sleep(). The documentation uses it only to slow the loop down for the sake of the example.
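As a minimal sketch of how this could fit into the loop from the question: the function name parse_directory and the returned count of START elements are illustrative assumptions standing in for the row-building logic, and the fallback tqdm defined in the except branch is only there so the sketch still runs without a progress bar when the library is not installed.

```python
from pathlib import Path
from xml.etree import ElementTree as ET

try:
    from tqdm import tqdm
except ImportError:
    # Assumption: if tqdm is not installed, iterate without a progress bar.
    def tqdm(iterable, **kwargs):
        return iterable


def parse_directory(directory):
    """Parse every .xml file under `directory` while showing progress.

    Hypothetical helper: it only counts START elements, as a stand-in
    for the CSV row-building code in the question.
    """
    # Build a list first: a list has a len(), so tqdm can show a total
    # and a percentage instead of just a running counter.
    xml_files = list(Path(directory).glob('**/*.xml'))

    start_count = 0
    for xml_file in tqdm(xml_files, desc='Parsing', unit='file'):
        root = ET.parse(str(xml_file)).getroot()
        start_count += len(root.findall('.//START'))
    return start_count
```

Calling parse_directory('C:/Users/xml_files') would advance the bar once per file. If the file list were left as a generator instead of a list, tqdm could not infer the length, and the count would have to be passed through its total argument.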
