Reading txt file line by line and field by field

Question

I'm struggling to read the TXT file below and output each field I read from it as CSV. I wrote some code that comes close to what I want, but I can't get any further.

TXT file format:

    COMPANY TEST OF BRAZIL-        Junho/2022 Horista
      37-6  WALTER WHITE DA SILVA                 
         1006136-9   MOTORISTA            A33 1     00011523            


001 Hrs Normais Diurnas           183,333    2.555,66 +
031 Hrs Dsr Vencimento             36,667      511,14 +
037 Dsr Adicionais                             306,36 +
053 Reembolso de Vale Transpo                   47,61 +
102 Hrs Extras  ( 60%)             68,680    1.531,84 +
824 Vale Transporte                            500,00 +
290 Alimentacao Funcionario                                   10,50 -
404 Adiantamento Normal Desco                              1.011,95 -
476 Desconto Seconci Dependen                                 65,46 -
511 Inss Normal                                              522,87 -
561 Irf Normal                                                90,07 -
567 Irf Recol Adto                                           214,77 -
820 Desc de Vale Transporte                                  184,00 - 
 
                                             5.452,61+      2.099,62-
                                                  
                                                            3.352,99

       13,94      4.905,00      4.905,00       392,40      2.965,82


 COMPANY TEST OF BRAZIL-        Junho/2022 Horista
     102-0  WILTON PEATER TEMPLATE               
           31022-0   L EQUIPE B           000 1     00011524            


001 Hrs Normais Diurnas           183,333    2.220,16 +
031 Hrs Dsr Vencimento             36,667      444,04 +
037 Dsr Adicionais                             225,77 +
053 Reembolso de Vale Transpo                   26,40 +
102 Hrs Extras  ( 60%)             58,260    1.128,85 +
290 Alimentacao Funcionario                                   10,50 -
404 Adiantamento Normal Desco                                854,04 -
476 Desconto Seconci Dependen                                 98,19 -
511 Inss Normal                                              398,81 -
561 Irf Normal                                                48,77 -
567 Irf Recol Adto                                           211,64 -
820 Desc de Vale Transporte                                  159,85 -
 
 
 
 
 
                                             4.045,22+      1.781,80-
                                                  
                                                            2.263,42

       12,11      4.018,82      4.018,82       321,50      2.554,33

My code reads the first line at the positions I want, but I can't read the lines below it, much less repeat the reading for the next payslip in the file.

    # Read TXT
    with open("I:input\\test.txt", "r") as ft:
        head_text = ft.readline()

    # Capture fields
    ## Head
    competence = head_text[46:59]
    company = head_text[:45]

    print('competence', ';', 'company')
    print(competence, ';', company)

The output at the moment is this:

# competence;company
junho/2022; COMPANY TEST OF BRAZIL

The output should look like this:

# competence;company;id_employee;employee;etc...
junho/2022;COMPANY TEST OF BRAZIL;37-6;WALTER WHITE DA SILVA...
junho/2022;COMPANY TEST OF BRAZIL;102-0;WILTON PEATER TEMPLATE...

Reading and capturing the data line by line, I need each payslip to become one line in the output: the first payslip forms the first line, the second payslip forms the second line, and so on until the end of the TXT file. At the moment I can't move forward and I'm lost.

Answer 1

I think you can use the following code to get your desired output. Make sure that your real data follows the same layout as the sample. You can also edit the template if your desired output needs to change. Please see the code:

# Install the parser first: pip install ttp
# (in a notebook: !pip install ttp)

from ttp import ttp
import json

with open("test.txt", "r") as ft:
    data_to_parse = ft.read()

# Each template line mirrors one line of the payslip header.
ttp_template = """
 {{Part_2|ORPHRASE}}-        {{Part_1}} {{ignore}}
     {{Part_3}}  {{Part_4|ORPHRASE}}
"""

def stack_test(data_to_parse):
    parser = ttp(data=data_to_parse, template=ttp_template)
    parser.parse()

    # Get the result as a JSON string
    results = parser.result(format='json')[0]
    # print(results)

    # Convert the JSON string to Python objects
    result = json.loads(results)
    return result

# print(stack_test(data_to_parse))

# Each item is a dict holding the captured Part_1..Part_4 fields
for i in stack_test(data_to_parse)[0]:
    print(f"{i['Part_1']};{i['Part_2']};{i['Part_3']};{i['Part_4']}")
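
Since the question asks for CSV output, the parsed records could also be written with the standard csv module instead of being printed. The following is only a sketch: it assumes each record is a dictionary with the keys Part_1 to Part_4 from the template, and the sample rows and the rows_to_csv helper are hypothetical names used for illustration.

```python
import csv
import io

# Hypothetical records, shaped like the dictionaries built from the
# Part_1..Part_4 template fields (values taken from the sample file).
rows = [
    {"Part_1": "Junho/2022", "Part_2": "COMPANY TEST OF BRAZIL",
     "Part_3": "37-6", "Part_4": "WALTER WHITE DA SILVA"},
    {"Part_1": "Junho/2022", "Part_2": "COMPANY TEST OF BRAZIL",
     "Part_3": "102-0", "Part_4": "WILTON PEATER TEMPLATE"},
]


def rows_to_csv(rows, out):
    """Write a header plus one ';'-separated line per record to `out`."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["competence", "company", "id_employee", "employee"])
    for row in rows:
        writer.writerow(
            [row["Part_1"], row["Part_2"], row["Part_3"], row["Part_4"]]
        )


# Write to an in-memory buffer here; a real file object works the same way.
buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk, pass a file opened with open("output.csv", "w", newline="") in place of the buffer; the file name is only an example.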

Printing i inside the loop shows each parsed record; the formatted print call then produces the semicolon-separated lines of your desired output.
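
If installing a third-party package is not an option, the same idea can be sketched with only the standard library. This is an alternative to the ttp approach, not part of it. It assumes that every payslip starts with a header line of the form "COMPANY- Month/Year Type" and that the next line beginning with an id such as 37-6 holds the employee. The regular expressions and the names parse_payslips, to_rows and SAMPLE are hypothetical and inferred from the two sample payslips only.

```python
import re

# Patterns inferred from the two sample payslips; other layouts may need changes.
HEADER_RE = re.compile(
    r"^\s*(?P<company>.+?)-\s+(?P<competence>\w+/\d{4})\s+\w+\s*$"
)
EMPLOYEE_RE = re.compile(
    r"^\s*(?P<id_employee>\d+-\d+)\s+(?P<employee>\S.*?)\s*$"
)
FIELDS = ["competence", "company", "id_employee", "employee"]


def parse_payslips(text):
    """Return one dict per payslip, in file order."""
    payslips = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # A header line starts a new payslip.
            current = header.groupdict()
            payslips.append(current)
            continue
        # Only the first id line after a header is the employee line;
        # the following line (job data) also starts with an id.
        if current is not None and "id_employee" not in current:
            employee = EMPLOYEE_RE.match(line)
            if employee:
                current.update(employee.groupdict())
    return payslips


def to_rows(payslips):
    """Join the captured fields of each payslip with ';'."""
    return [";".join(p.get(f, "") for f in FIELDS) for p in payslips]


# Shortened copy of the sample data from the question.
SAMPLE = """\
 COMPANY TEST OF BRAZIL-        Junho/2022 Horista
      37-6  WALTER WHITE DA SILVA
         1006136-9   MOTORISTA            A33 1     00011523

001 Hrs Normais Diurnas           183,333    2.555,66 +
 COMPANY TEST OF BRAZIL-        Junho/2022 Horista
     102-0  WILTON PEATER TEMPLATE
           31022-0   L EQUIPE B           000 1     00011524
"""

print(";".join(FIELDS))
for row in to_rows(parse_payslips(SAMPLE)):
    print(row)
```

Because the loop keeps a "current payslip" and starts a new one whenever a header line appears, each payslip becomes exactly one output line, no matter how many item lines it contains. Further fields (job, totals, item lines) would need their own patterns.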

solution1
0 ACCEPTED 2022-07-12 13:53:29
