Appending variable string to each line in a file with Python

I have very little experience with reading and writing files in Python, and would like to know the best solution for my particular case.

I have a tab-separated file with the following structure, where sentences are separated by a blank line:

Roundup NN
:   :
Muslim  NNP
Brotherhood NNP
vows    VBZ
daily   JJ
protests    NNS
in  IN
Egypt   NNP

Families    NNS
with    IN
no  DT
information NN
on  IN
the DT
whereabouts NN
of  IN
loved   VBN
ones    NNS
are VBP
grief   JJ
-   :
stricken    JJ
.   .

The DT
provincial  JJ
departments NNS
of  IN
supervision NN
and CC
environmental   JJ
protection  NN
jointly RB
announced   VBN
on  IN
May NNP
9   CD
that    IN
the DT
supervisory JJ
department  NN
will    MD
question    VB
and CC
criticize   VB
mayors  NNS
who WP
fail    VBP
to  TO
curb    VB
pollution   NN
.   .

(...)

I want to append a tab followed by a given string to each non-empty line of this file.

For each line, the string to append depends on the value stored in lab_pred_tags in the code below. In each iteration of the for loop, lab_pred_tags has as many elements as there are lines in the corresponding sentence of the text file, i.e. in the example, the lengths of lab_pred_tags for the 3 for loop iterations are 9, 15, and 28.

For the first for loop iteration, lab_pred_tags contains the list : ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'B-GPE']
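
To make the sentence/line correspondence concrete, here is a minimal sketch that groups the file's non-empty lines into sentences, so that each sentence's line count can be compared with the length of its lab_pred_tags. The helper name `read_sentences` is hypothetical, and UTF-8 encoding is an assumption about the file.

```python
from itertools import groupby


def read_sentences(path):
    """Return a list of sentences, each a list of non-empty lines (newline stripped)."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    # groupby splits the lines into alternating runs of blank and non-blank
    # lines; only the non-blank runs are kept, one run per sentence.
    return [
        list(group)
        for has_text, group in groupby(lines, key=lambda l: bool(l.strip()))
        if has_text
    ]
```

Calling `[len(s) for s in read_sentences("PATH_TO_YOUR_FILE")]` then gives the number of lines per sentence, which should match the length of lab_pred_tags in each loop iteration.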

# (...) code to calculate lab_pred
for lab, lab_pred, length in zip(labels, labels_pred, sequence_lengths):
    lab = lab[:length]
    lab_pred = lab_pred[:length]
    # Convert lab_pred from a sequence of numbers to a sequence of strings
    lab_pred_tags = d_u.label_idxs_to_tags(lab_pred, tags)
    # Now what is the best way to append each element of `lab_pred_tags` to the corresponding line in the file?
    # Keep in mind that I need to skip the blank line every time a new for loop iteration starts

For the example, the desired output file is:

Roundup NN  O
:   :   O
Muslim  NNP B-ORG
Brotherhood NNP I-ORG
vows    VBZ O
daily   JJ  O
protests    NNS O
in  IN  O
Egypt   NNP B-GPE

Families    NNS O
with    IN  O
no  DT  O
information NN  O
on  IN  O
the DT  O
whereabouts NN  O
of  IN  O
loved   VBN O
ones    NNS O
are VBP O
grief   JJ  O
-   :   O
stricken    JJ  O
.   .   O

The DT  O
provincial  JJ  O
departments NNS O
of  IN  O
supervision NN  O
and CC  O
environmental   JJ  O
protection  NN  O
jointly RB  O
announced   VBN O
on  IN  O
May NNP O
9   CD  O
that    IN  O
the DT  O
supervisory JJ  O
department  NN  O
will    MD  O
question    VB  O
and CC  O
criticize   VB  O
mayors  NNS O
who WP  O
fail    VBP O
to  TO  O
curb    VB  O
pollution   NN  O
.   .   O

What is the best solution for this?

For testing purposes, I replaced lab_pred_tags with a fixed list that is reused for every sentence. Here is my solution:

    # Test data only: a single tag list, reused for every sentence and long
    # enough for the longest sentence in the example (28 lines).
    lab_pred_tags = ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O',
                     'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O',
                     'O', 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O',
                     'O', 'O', 'B-GPE', 'O']
    index = 0

    with open("PATH_TO_YOUR_FILE", "r") as lab_file, \
            open("PATH_TO_NEW_FILE", "w") as lab_file_2:
        for line in lab_file:
            if line.strip():
                # Non-empty line: append a tab and the next tag.
                # rstrip also handles a last line without a trailing newline.
                lab_file_2.write(
                    line.rstrip("\n") + "\t" + lab_pred_tags[index] + "\n"
                )
                index += 1
            else:
                # Blank line: a new sentence starts, so go back to the first tag
                index = 0
                lab_file_2.write("\n")
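
The snippet above reuses one test list for every sentence. For the real case, where each loop iteration produces its own lab_pred_tags, one possible approach is to collect the lists first and rewrite the file in a single pass afterwards. This is only a sketch: the function name `append_tags` and the parameter `tags_per_sentence` are hypothetical, UTF-8 encoding is assumed, and there is no explicit check that the tag counts match the file.

```python
def append_tags(in_path, out_path, tags_per_sentence):
    """Copy in_path to out_path, appending a tab and a tag to each non-empty line.

    tags_per_sentence: one list of tag strings per sentence, in file order.
    """
    sentences = iter(tags_per_sentence)
    tags = None  # iterator over the current sentence's tags, or None between sentences
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            text = line.rstrip("\n")
            if not text.strip():
                # Blank line: sentence boundary, so the next token
                # starts consuming the next tag list.
                tags = None
                dst.write("\n")
                continue
            if tags is None:
                tags = iter(next(sentences))
            dst.write(text + "\t" + next(tags) + "\n")
```

In the question's loop, this would mean appending each lab_pred_tags to a list (for example `all_tags.append(lab_pred_tags)`) and calling `append_tags("PATH_TO_YOUR_FILE", "PATH_TO_NEW_FILE", all_tags)` once the loop has finished. Writing to a new file rather than editing in place keeps the original intact if something goes wrong halfway through.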
