Appending a variable string to each line in a file with Python

I have very little experience with reading and writing files in Python, and I would like to ask what the best solution is for my particular case.

I have a tab-separated file with the following structure, in which sentences are separated by blank lines:

Roundup NN
:   :
Muslim  NNP
Brotherhood NNP
vows    VBZ
daily   JJ
protests    NNS
in  IN
Egypt   NNP

Families    NNS
with    IN
no  DT
information NN
on  IN
the DT
whereabouts NN
of  IN
loved   VBN
ones    NNS
are VBP
grief   JJ
-   :
stricken    JJ
.   .

The DT
provincial  JJ
departments NNS
of  IN
supervision NN
and CC
environmental   JJ
protection  NN
jointly RB
announced   VBN
on  IN
May NNP
9   CD
that    IN
the DT
supervisory JJ
department  NN
will    MD
question    VB
and CC
criticize   VB
mayors  NNS
who WP
fail    VBP
to  TO
curb    VB
pollution   NN
.   .

(...)
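
To make this structure concrete, here is a minimal sketch of how such a file could be read into sentences. It assumes the fields are separated by a single tab and that any blank line ends a sentence. The helper name `read_sentences` is hypothetical and does not appear in the original code.

```python
from itertools import groupby


def read_sentences(lines):
    """Group an iterable of lines into sentences.

    Each sentence is a list of [token, pos_tag] rows; blank lines act
    as sentence separators. `read_sentences` is a hypothetical helper.
    """
    stripped = (line.rstrip("\n") for line in lines)
    sentences = []
    # groupby yields alternating runs of blank and non-blank lines
    for is_blank, group in groupby(stripped, key=lambda s: s.strip() == ""):
        if not is_blank:
            sentences.append([row.split("\t") for row in group])
    return sentences


sample = ["Roundup\tNN\n", ":\t:\n", "\n", "Families\tNNS\n", "with\tIN\n"]
print([len(s) for s in read_sentences(sample)])  # [2, 2]
```

The per-sentence line counts obtained this way are the lengths that each `lab_pred_tags` list has to match.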

To every non-empty line of this file I want to append first a tab character and then a given string.

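For each line, the string to append depends on the value stored in lab_pred_tags in the code below. On each iteration of the for loop, lab_pred_tags has the same length as the number of lines of the corresponding sentence in the text file. That is, in this example the lengths of lab_pred_tags over the 3 for loop iterations are 9, 15 and 28.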

On the first for loop iteration, lab_pred_tags contains the following list: ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'B-GPE']

# (...) code to calculate lab_pred
for lab, lab_pred, length in zip(labels, labels_pred, sequence_lengths):
    lab = lab[:length]
    lab_pred = lab_pred[:length]
    # Convert lab_pred from a sequence of numbers to a sequence of strings
    lab_pred_tags = d_u.label_idxs_to_tags(lab_pred, tags)
    # What is the best way to append each element of `lab_pred_tags` to the matching line in the file?
    # Keep in mind that a blank line must be skipped every time a new for loop iteration starts
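
One possible way to approach the question in the comments above is to collect the tags of every sentence into a list of lists during the loop and write the file afterwards. The sketch below only illustrates that idea. `idxs_to_tags` is a hypothetical stand-in for `d_u.label_idxs_to_tags`, whose real behavior is not shown in the question, and the numeric data is made up.

```python
def idxs_to_tags(idxs, tags):
    """Hypothetical stand-in for d_u.label_idxs_to_tags: map indices to tag strings."""
    return [tags[i] for i in idxs]


def collect_tags(labels_pred, sequence_lengths, tags):
    """Return one list of tag strings per sentence, trimmed to its true length."""
    all_tags = []
    for lab_pred, length in zip(labels_pred, sequence_lengths):
        lab_pred = lab_pred[:length]  # drop padding beyond the sentence length
        all_tags.append(idxs_to_tags(lab_pred, tags))
    return all_tags


# Made-up example data: index 0 -> 'O', 1 -> 'B-ORG', 2 -> 'I-ORG'
tags = ["O", "B-ORG", "I-ORG"]
labels_pred = [[0, 1, 2, 0, 0], [0, 0, 0, 0, 0]]
sequence_lengths = [4, 2]
print(collect_tags(labels_pred, sequence_lengths, tags))
# [['O', 'B-ORG', 'I-ORG', 'O'], ['O', 'O']]
```

Keeping one inner list per sentence preserves the sentence boundaries, so no index has to be reset by hand when a blank line is reached.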

For this example, the desired output file is:

Roundup NN  O
:   :   O
Muslim  NNP B-ORG
Brotherhood NNP I-ORG
vows    VBZ O
daily   JJ  O
protests    NNS O
in  IN  O
Egypt   NNP B-GPE

Families    NNS O
with    IN  O
no  DT  O
information NN  O
on  IN  O
the DT  O
whereabouts NN  O
of  IN  O
loved   VBN O
ones    NNS O
are VBP O
grief   JJ  O
-   :   O
stricken    JJ  O
.   .   O

The DT  O
provincial  JJ  O
departments NNS O
of  IN  O
supervision NN  O
and CC  O
environmental   JJ  O
protection  NN  O
jointly RB  O
announced   VBN O
on  IN  O
May NNP O
9   CD  O
that    IN  O
the DT  O
supervisory JJ  O
department  NN  O
will    MD  O
question    VB  O
and CC  O
criticize   VB  O
mayors  NNS O
who WP  O
fail    VBP O
to  TO  O
curb    VB  O
pollution   NN  O
.   .   O

What is the best solution?

For testing purposes I modified the lab_pred_tags list. Here is my solution:

    # Test data only: the same list is reused from the start for every sentence
    lab_pred_tags = ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O',
                     'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O',
                     'O', 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O',
                     'O', 'O', 'B-GPE', 'O']
    index = 0

    with open("PATH_TO_YOUR_FILE", "r") as lab_file, \
            open("PATH_TO_NEW_FILE", "w") as lab_file_2:
        for lab_file_line in lab_file:
            if lab_file_line.strip():
                # Non-blank line: append a tab and the next tag.
                # rstrip also handles a last line without a trailing newline.
                new_line = lab_file_line.rstrip("\n") + "\t" + lab_pred_tags[index] + "\n"
                index += 1
                lab_file_2.write(new_line)
            else:
                # Blank line: a new sentence starts, so restart the tag index
                index = 0
                lab_file_2.write("\n")
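
The answer above reuses one test list for every sentence. As a hedged sketch of how the same idea could work with real predictions, the function below takes one tag list per sentence, writes a tab separator as the question asks, and moves on to the next tag list at each sentence boundary. The name `append_tags` and its parameters are hypothetical, and UTF-8 encoding is an assumption.

```python
def append_tags(in_path, out_path, tags_per_sentence):
    """Copy in_path to out_path, appending a tab and a tag to each non-blank line.

    tags_per_sentence holds one inner list per sentence, with one tag
    per line of that sentence. Raises ValueError on a length mismatch.
    """
    sentence_iter = iter(tags_per_sentence)
    tag_iter = None  # tags of the sentence currently being written
    with open(in_path, "r", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            content = line.rstrip("\n")
            if content.strip() == "":
                # Blank line: the next non-blank line starts a new sentence
                tag_iter = None
                dst.write("\n")
                continue
            if tag_iter is None:
                sentence_tags = next(sentence_iter, None)
                if sentence_tags is None:
                    raise ValueError("more sentences in file than tag lists")
                tag_iter = iter(sentence_tags)
            tag = next(tag_iter, None)
            if tag is None:
                raise ValueError("sentence has more lines than tags")
            dst.write(content + "\t" + tag + "\n")
```

A call such as `append_tags("PATH_TO_YOUR_FILE", "PATH_TO_NEW_FILE", all_tags)` would then be made once, after the prediction loop has filled `all_tags` with one list per sentence. Writing to a new file rather than editing in place keeps the original intact if a mismatch is detected halfway through.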

