How to extract information from an XML file with Python

I have an XML file and need to extract some information from it. The stock ID is in the tckrsymb tag, and the values I need are in the ntlfinvol, frstpric, MaxPric and MinPric tags.

My issue:

1- I have to combine the stock ID from the tckrsymb tag with the values from the other tags, if possible on one line separated by "|", like the example below.

Example: NEOE3F | 130667.98 | 25.79 | 25.95 | 25.52

2- Some stocks don't have the ntlfinvol, frstpric, MaxPric and MinPric tags. In this case I have to put 0, like the example below.

Example: IDIJ20P282800 | 0 | 0 | 0 | 0

Here is the code I have so far. It extracts the stock ID and uses it as a dictionary key, but the values are always the same because I'm calling the find method of the bs4 library on the whole document, which only returns the first occurrence. I need to extract the stock ID and all of its values (if they exist) for each stock ID.

from bs4 import BeautifulSoup as bs
import lxml

# Read the XML file
with open('BVBG.086.01_BV000328202001200328000001926374007.xml',"r") as file:
    bs_content = bs(file, "lxml")

    paper = {}

    for t in bs_content.find_all("tckrsymb"):
        valor = []

        volume_financeiro = float(bs_content.find('ntlfinvol').get_text())
        valor.append(volume_financeiro)

        preco_abertura = float(bs_content.find("frstpric").get_text())
        valor.append(preco_abertura)

        paper[t.get_text()] = valor

    print(paper)

The result of my program is : {'IDIJ20P282800': [130667.98, 25.79], 'NEOE3F': [130667.98, 25.79], 'DIRRR157': [130667.98, 25.79], 'IDIJ20P282900': [130667.98, 25.79], 'BRFSA4': [130667.98, 25.79], 'BRFSC4': [130667.98, 25.79]}
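
The repeated values above come from searching the whole document on every loop iteration. A minimal sketch of the difference between a document-wide lookup and a lookup scoped to one record is shown here. It uses the standard library's xml.etree.ElementTree instead of bs4, and the sample XML, the record tag name PricRpt and the flat layout are assumptions made for illustration, not taken from the real B3 file. With bs4 the same idea would be to call find on each parent tag rather than on bs_content.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample. The real file has more nesting and
# namespaces, and the record tag name PricRpt is an assumption.
SAMPLE = """
<Document>
  <PricRpt>
    <TckrSymb>IDIJ20P282800</TckrSymb>
  </PricRpt>
  <PricRpt>
    <TckrSymb>NEOE3F</TckrSymb>
    <NtlFinVol>130667.98</NtlFinVol>
    <FrstPric>25.79</FrstPric>
  </PricRpt>
</Document>
"""

root = ET.fromstring(SAMPLE)

# Document-wide search: always the first NtlFinVol in the file,
# no matter which ticker the loop is currently looking at.
global_first = root.find(".//NtlFinVol").text

# Record-scoped search: one lookup per PricRpt, so each value
# stays attached to its own ticker.
scoped = {}
for record in root.iter("PricRpt"):
    ticker = record.findtext(".//TckrSymb")
    scoped[ticker] = record.findtext(".//NtlFinVol")  # None when the tag is absent

print(global_first)
print(scoped)
```

In this sample the first record has no NtlFinVol, so the scoped lookup gives None for it instead of borrowing the value from the next record.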

An example of the XML file is available at this link.

I also have software that reads these files and others from B3 (BVBG.017.02, BVBG.013.02, ...), but I program in Kotlin, so I don't know if it will be of help. In Kotlin, if I configure a tag as optional in the object and it doesn't exist in the document, Kotlin returns null. If it is null, I set the value to zero. Python most likely has something similar.
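
Python does behave in a similar way: a lookup for a missing element returns None, which can then be replaced with zero. The following is only a sketch of that idea using the standard library's xml.etree.ElementTree rather than bs4. The helper names (local_name, find_text, format_records), the record tag PricRpt, the namespace URI and the sample XML are all assumptions for illustration and would need adjusting to the real file.

```python
import xml.etree.ElementTree as ET

# Value tags named in the question, in the requested output order.
FIELDS = ("NtlFinVol", "FrstPric", "MaxPric", "MinPric")


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_text(record, name):
    """Text of the first descendant called `name`, or None if missing or empty."""
    for element in record.iter():
        if local_name(element.tag) == name:
            return (element.text or "").strip() or None
    return None


def format_records(xml_text, record_tag="PricRpt"):
    """Build one 'ticker | v1 | v2 | v3 | v4' line per record (record_tag is assumed)."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter():
        if local_name(record.tag) != record_tag:
            continue
        ticker = find_text(record, "TckrSymb")
        if ticker is None:
            continue  # nothing to label the line with
        # None (tag absent) becomes "0", like the null-to-zero step in Kotlin.
        values = [find_text(record, name) or "0" for name in FIELDS]
        lines.append(" | ".join([ticker] + values))
    return lines


# Hypothetical sample with a made-up namespace, not the real B3 layout.
SAMPLE = """
<Document xmlns="urn:example:hypothetical">
  <PricRpt>
    <SctyId><TckrSymb>IDIJ20P282800</TckrSymb></SctyId>
  </PricRpt>
  <PricRpt>
    <SctyId><TckrSymb>NEOE3F</TckrSymb></SctyId>
    <FinInstrmAttrbts>
      <NtlFinVol>130667.98</NtlFinVol>
      <FrstPric>25.79</FrstPric>
      <MaxPric>25.95</MaxPric>
      <MinPric>25.52</MinPric>
    </FinInstrmAttrbts>
  </PricRpt>
</Document>
"""

result = format_records(SAMPLE)
for line in result:
    print(line)
```

The values are kept as strings so they print exactly as they appear in the file. Wrapping them in float() would work too if numbers are needed for calculations.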
