How to properly print the whole result of a for loop (Python)?

Question

I have a function that takes an input XML, finds the child tags of a given tag, and stores their index and XML as key/value pairs in a dictionary. After filtering this dictionary, I take the values, extract their text, and delete some characters from it. The problem is that the text is not returned as one line, but "piece by piece". I use the "end" workaround to print it on one line, but that does not solve the problem, because I still do not have the whole text as a single value. Also, print only shows every piece if it is inside the loop.

This is the code:

from xml.dom import minidom
import xml.etree.ElementTree as ET

def get_xml_by_tag_names(xml_path, tag_name_1, tag_name_2):

    data = {}
    xml_tree = minidom.parse(xml_path)
    item_group_nodes = xml_tree.getElementsByTagName(tag_name_1)
    for idx, item_group_node in enumerate(item_group_nodes):
        cl_compile_nodes = item_group_node.getElementsByTagName(tag_name_2)
        for _ in cl_compile_nodes:
            data[idx]=[item_group_node.toxml()]
    return data

def main():
    lista_prima = []
    uncinata1 = " < "
    uncinata2 = " >"
    punto = "."
    virgola = ","
    puntoevirgola = ";"
    dash = "-"
    puntoesclamativo = "!"
    duepunti = ":"
    apostrofo = "’"
    puntointerrogativo = "?"
    angolate = "<>"
    data = get_xml_by_tag_names('output2.xml', 'new_line', 'text')
    deletekeys = []
    for k in data:
        for v in data[k]:
            if "10.238" in v:
                # return k
                deletekeys.append(k)
    for item in deletekeys:
        del data[item]



    for value in data.values():
        myxml = ' '.join(value)
        # print(myxml)

        tree = ET.fromstring(myxml)
        lista = ([text.text for text in tree.findall('text')])
        testo = (' '.join(lista))
        testo = testo.replace(uncinata1, "")
        testo = testo.replace(uncinata2, "")
        testo = testo.replace(punto, "")
        testo = testo.replace(virgola, "")
        testo = testo.replace(puntoevirgola, "")
        testo = testo.replace(dash, "")
        testo = testo.replace(puntoesclamativo, "")
        testo = testo.replace(duepunti, "")
        testo = testo.replace(apostrofo, "")
        testo = testo.replace(puntointerrogativo, "")
        testo = testo.replace(angolate, "")


    print(testo)


if __name__ == "__main__":
    main()

My XML is:

<pages>
      <page id="1" bbox="0.000,0.000,462.047,680.315" rotate="0">
        <textbox id="0" bbox="191.745,592.218,249.042,603.578">
    <textline>
         <new_line>
                  <text font="NUMPTY+ImprintMTnum" bbox="297.284,540.828,300.188,553.310" colourspace="DeviceGray" ncolour="0" size="12.482">della quale non conosce che una parte;] </text>
                  <text font="PYNIYO+ImprintMTnum-Italic" bbox="322.455,540.839,328.251,553.566" colourspace="DeviceGray" ncolour="0" size="12.727">prima</text>
                  <text font="NUMPTY+ImprintMTnum" bbox="331.206,545.345,334.683,552.834" colourspace="DeviceGray" ncolour="0" size="7.489">1</text>
                  <text font="NUMPTY+ImprintMTnum" bbox="177.602,528.028,180.850,540.510" colourspace="DeviceGray" ncolour="0" size="12.482">che nonconosce ancora appieno;</text>
                  <text font="NUMPTY+ImprintMTnum" bbox="189.430,532.545,192.908,540.034" colourspace="DeviceGray" ncolour="0" size="7.489">2</text>
                  <text font="NUMPTY+ImprintMTnum" bbox="203.879,528.028,208.975,540.510" colourspace="DeviceGray" ncolour="0" size="12.482">che</text>
                </new_line>
    </textline>
<textline bbox="68.032,408.428,372.762,421.166">
<new_line>
          <text font="NUMPTY+ImprintMTnum" bbox="307.143,408.428,310.392,420.910" colourspace="DeviceGray" ncolour="0" size="12.482">viso] vi</text>
          <text font="NUMPTY+ImprintMTnum" bbox="310.280,408.808,313.243,419.046" colourspace="DeviceGray" ncolour="0" size="10.238">-</text>
          <text font="PYNIYO+ImprintMTnum-Italic" bbox="320.072,408.439,325.868,421.166" colourspace="DeviceGray" ncolour="0" size="12.727">su</text>
          <text font="NUMPTY+ImprintMTnum" bbox="328.829,408.428,338.452,420.910" colourspace="DeviceGray" ncolour="0" size="12.482">m</text>
        </new_line>
</textline>
    </textbox>
    </page>
    </pages>

Basically it returns the text of the XML one piece at a time, like this:

piece of text
piece of text
piece of text

But I need the whole text together, because otherwise I can't process it further. If I print outside the loop, it prints just one line.

I tried print(testo, end=" "), but although that prints everything on one line, the text still can't be processed.

Answer 1

When the for value loop finishes, testo holds only the value from the last iteration, because each iteration overwrites it and the earlier values are never stored. A possible fix is to collect each iteration's string in a list and join the list after the loop:

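testo = []  # collect the cleaned string from every iteration
for value in data.values():
    myxml = ' '.join(value)
    tree = ET.fromstring(myxml)
    tmpstring = ' '.join(text.text for text in tree.findall('text'))
    # strip the unwanted characters, in the same order as in the question
    for to_remove in (" < ", " >", ".", ",", ";", "-", "!", ":", "’", "?", "<>"):
        tmpstring = tmpstring.replace(to_remove, "")
    testo.append(tmpstring)

# after the loop, combine all pieces into a single string
testo = ''.join(testo)
print(testo)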
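
For anyone who wants to try the idea without the original file, here is a minimal self-contained sketch of the same accumulate-then-join pattern. It makes several assumptions that are not in the question: the XML is a shortened inline sample instead of output2.xml, the helper name collect_text and its skip_marker parameter are made up for illustration, only ElementTree is used (no minidom), and the pieces are joined with a space rather than an empty string so that neighbouring words do not run together.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, shortened from the XML in the question.
SAMPLE_XML = """
<pages>
  <new_line>
    <text size="12.482">della quale non conosce che una parte;] </text>
    <text size="12.727">prima</text>
  </new_line>
  <new_line>
    <text size="12.482">viso] vi</text>
    <text size="10.238">-</text>
  </new_line>
  <new_line>
    <text size="12.482">che nonconosce ancora appieno;</text>
    <text size="7.489">2</text>
  </new_line>
</pages>
"""

# The same fragments the answer strips out, in the same order.
TO_REMOVE = (" < ", " >", ".", ",", ";", "-", "!", ":", "’", "?", "<>")


def collect_text(xml_string, skip_marker="10.238"):
    """Return the cleaned text of every kept <new_line> group as one string."""
    root = ET.fromstring(xml_string)
    pieces = []  # created before the loop, so no iteration overwrites another
    for new_line in root.iter("new_line"):
        # Mirror the question's filter: skip a group whose XML mentions the marker.
        if skip_marker in ET.tostring(new_line, encoding="unicode"):
            continue
        # "or ''" guards against <text> elements that have no text at all.
        line = " ".join(t.text or "" for t in new_line.findall("text"))
        for fragment in TO_REMOVE:
            line = line.replace(fragment, "")
        pieces.append(line)
    # Assumption: a space is the wanted separator between groups.
    return " ".join(pieces)


whole_text = collect_text(SAMPLE_XML)
print(whole_text)
```

Because the function returns the joined string instead of printing inside the loop, the result can be passed on to whatever processing comes next.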

solution1 0 ACCEPTED 2020-04-22 12:43:53
