
Using Python, how do I insert a string in select lines of a text file where the inserted string depends on the content of the line and a known mapping?

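Background

I have a text file (it's a DAT file) that I want to import into a program with its formatting left as-is, except for small additional strings inserted into select lines. The file is too big to make these minor changes manually.

The select lines have the following defining properties:

  • Each begins with select_string_ followed by a unique string $_ that can be detected using a regular expression
  • Each ends with a member of the following set of strings: {'string_A', 'string_B', 'string_C'}

For each select line, the exact string I want to insert depends on which of these set members appears at the end of the line and on a known mapping.

(Non-select lines contain arbitrary strings, and they do not occur in any simple order. Incidentally, in every select line the unique string $_ is followed by _blah_, which is also detectable with a regular expression.)

So, starting at line 1, we have something like the following: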

select_string_$__blah_string_A
non_select_arbitrary_string
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$__blah_string_A
non_select_arbitrary_string
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$__blah_string_B
select_string_$__blah_string_B
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$__blah_string_C
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$__blah_string_C

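For a given select line, the text I want to insert goes immediately after $_, and I want the specific string inserted to reflect the following simple (broadly defined) bijective function f:

f = {(string_A, f(string_A)), (string_B, f(string_B)), (string_C, f(string_C))}

The following dictionary captures this mapping:

{'string_A' : 'f(string_A)', 'string_B' : 'f(string_B)', 'string_C' : 'f(string_C)'}

So, taking string_A as an example: every select line ending in string_A should have f(string_A) inserted after $_. I therefore want all select lines containing string_A to look like this: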

select_string_$_f(string_A)_blah_string_A

Generalizing from this arbitrary example, my question is as follows:

Using Python 3, how do I generate the following text?

select_string_$_f(string_A)_blah_string_A
non_select_arbitrary_string
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$_f(string_A)_blah_string_A
non_select_arbitrary_string
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$_f(string_B)_blah_string_B
select_string_$_f(string_B)_blah_string_B
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$_f(string_C)_blah_string_C
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$_f(string_C)_blah_string_C

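More generally: using Python, how do I insert a string in select lines of a text file where the inserted string depends on the content of the line and a known mapping?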

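Given that $_ is a clear indicator in every line you want to change, we can check for the presence of $_ and then check which of string_A, string_B, or string_C the line ends with.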

string_a = 'string_A'
string_b = 'string_B'
string_c = 'string_C'

testcases = ['select_string_$__blah_string_A', 'select_string_$__blah_string_B', 'select_string_$__blah_string_C', 'non_select_arbitrary_string']

result = []

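for test in testcases:
    # Lines without the '$_' marker are copied through unchanged.
    if '$_' not in test:
        result.append(test)
        continue

    # Split at the first '$_' only, so the line is cut into exactly two parts.
    check = test.split('$_', 1)

    # Re-join the two parts with '$_' plus the text to insert.
    if check[1].endswith(string_a):
        result.append(f'$_f({string_a})'.join(check))
    elif check[1].endswith(string_b):
        result.append(f'$_f({string_b})'.join(check))
    elif check[1].endswith(string_c):
        result.append(f'$_f({string_c})'.join(check))
    else:
        # Marker present but no known ending: keep the line as-is.
        result.append(test)

print(result)

#['select_string_$_f(string_A)_blah_string_A', 'select_string_$_f(string_B)_blah_string_B', 'select_string_$_f(string_C)_blah_string_C', 'non_select_arbitrary_string']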

From here you can write result back to a file.
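A minimal sketch of that last step, assuming the lines come from a file and the mapping is held in a dictionary as in the question. The names `MAPPING`, `insert_after_marker`, and `rewrite_file` are hypothetical, and the values in `MAPPING` are placeholders for whatever f actually returns:

```python
# Hypothetical mapping: line ending -> text to insert after the '$_' marker.
MAPPING = {
    'string_A': 'f(string_A)',
    'string_B': 'f(string_B)',
    'string_C': 'f(string_C)',
}

MARKER = '$_'


def insert_after_marker(line, mapping=MAPPING, marker=MARKER):
    """Return line with the mapped text inserted after the first marker.

    Lines without the marker, or without a known ending, are returned unchanged.
    """
    head, sep, tail = line.partition(marker)
    if not sep:
        return line
    for suffix, insertion in mapping.items():
        if tail.endswith(suffix):
            return head + marker + insertion + tail
    return line


def rewrite_file(src, dst):
    """Read src line by line and write the transformed lines to dst."""
    with open(src, 'r') as fin, open(dst, 'w') as fout:
        for raw in fin:
            # Strip the newline before checking the ending, then restore it.
            body = raw.rstrip('\n')
            newline = raw[len(body):]
            fout.write(insert_after_marker(body) + newline)
```

Calling `rewrite_file('input.txt', 'output.txt')` would process the file one line at a time, so the whole file never has to be held in memory; the file names here are only examples.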

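import re

# Open both files in a with-block so they are closed even if an error occurs.
with open("input.txt", "r") as fin, open("output.txt", "w") as fout:
    for line in fin:
        # Insert f(<ending>) right after "select_string_$_"; other lines pass through unchanged.
        line = re.sub(r'^(select_string_\$_)(.*?(string_A|string_B|string_C))$', r'\1f(\3)\2', line)
        fout.write(line)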

Given your example, this produces:

select_string_$_f(string_A)_blah_string_A
non_select_arbitrary_string
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$_f(string_A)_blah_string_A
non_select_arbitrary_string
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$_f(string_B)_blah_string_B
select_string_$_f(string_B)_blah_string_B
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$_f(string_C)_blah_string_C
non_select_arbitrary_string
non_select_arbitrary_string
select_string_$_f(string_C)_blah_string_C

Regex explanation:

^                                   # beginning of line
  (select_string_\$_)               # group 1, literally "select_string_$_"
  (                                 # group 2
    .*?                             # 0 or more of any character, non-greedy
    (string_A|string_B|string_C)    # group 3, one of string_A or string_B or string_C
  )                                 # end group 2
$                                   # end of line

Replacement:

\1              # content of group 1
f(\3)           # f(, content of group 3, )  
\2              # content of group 2
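
If the text to insert is not literally `f(...)` but has to be looked up in the dictionary from the question, the replacement can be a function instead of a template string: `re.sub` calls it with each match object and uses the returned string as-is. The sketch below is one possible way to do this, assuming the dictionary values are the strings to insert. `MAPPING`, `PATTERN`, and `insert_mapped` are hypothetical names, and the mapping values are placeholders:

```python
import re

# Hypothetical mapping: line ending -> text to insert after "select_string_$_".
MAPPING = {
    'string_A': 'f(string_A)',
    'string_B': 'f(string_B)',
    'string_C': 'f(string_C)',
}

# Build the alternation from the mapping keys, escaping regex metacharacters.
SUFFIXES = '|'.join(re.escape(key) for key in MAPPING)

PATTERN = re.compile(
    rf'''
    ^(select_string_\$_)   # group 1: the literal line prefix
    (.*?({SUFFIXES}))      # group 2: rest of the line; group 3: the known ending
    $                      # end of line (also matches before a trailing newline)
    ''',
    re.VERBOSE,
)


def insert_mapped(line):
    """Insert MAPPING[ending] after the prefix on select lines; leave others unchanged."""
    return PATTERN.sub(
        lambda m: m.group(1) + MAPPING[m.group(3)] + m.group(2),
        line,
    )
```

Because the alternation is generated from the dictionary keys, adding a new ending only requires a new dictionary entry, and a function replacement avoids having to escape backslashes in the inserted text.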
