Python regex: capturing different strings line by line from a .txt file

Question

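I need to extract names/strings from the file a.txt, line by line. I am trying to do this with regular expressions.

For example, from the line below I want to extract the names "Victor Lau" and "Siti Zuzan" and the string "TELEGRAPHIC TRANSFER" into three separate lists, then write them out to an Excel file. You can also see the txt file.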

TELEGRAPHIC TRANSFER 0008563668 040122 BRH BDVI0093 VICTOR LAU 10,126.75-.00 10,126.75- SITI ZUZAN 16:15:09

I tried this code:

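import os
import re

import numpy as np
import pandas as pd

# List5 and List6 are assumed to be filled the same way; that part is not shown in the question
List4, List5, List6 = [], [], []

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith((".txt", ".TXT")) and 'AllBanks' in filename:
        # open() needs the full path, not just the bare name returned by listdir()
        with open(os.path.join(directory, file)) as AllBanks:
            for line in AllBanks:
                # Two or three space-separated alphabetic words
                match4 = re.search(r'( [a-zA-Z]+ [a-zA-Z]+ [a-zA-Z]+ )|( [a-zA-Z]+ [a-zA-Z]+)', line)
                # re.search returns None when nothing matches
                List4.append(match4.group(0).strip() if match4 else 'NA')
df = pd.DataFrame(np.column_stack([List4, List5, List6]), columns=['a', 'b', 'c'])
df.to_excel('AllBanks.xlsx', index=False)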

Answer 1

Your text file looks like fixed-width columns with no delimiters. You can use re capture groups, for example '^(.{20})(.{15})(.{30})'
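As a rough illustration of the capture-group idea, here is a minimal sketch. The widths are an assumption (a 28-character description, 43 skipped characters, then a name field of up to 50 characters), and the helper name `extract` and the padded sample line are made up for the example; the real file's spacing would have to be measured.

```python
import re

# Hypothetical layout: description in columns 0-27, 43 characters that are
# skipped, then a name field of up to 50 characters starting at column 71.
FIXED_WIDTH = re.compile(r'^(.{28}).{43}(.{1,50})')

def extract(line):
    """Return (description, name) from one fixed-width line, or None if it is too short."""
    match = FIXED_WIDTH.match(line.rstrip('\n'))
    if match is None:
        return None
    # Fixed-width fields are space-padded, so trim each captured group
    return tuple(field.strip() for field in match.groups())

# Build a padded sample line, since the original spacing is not visible here
sample = ('TELEGRAPHIC TRANSFER'.ljust(28)
          + '0008563668 040122 BRH BDVI0093'.ljust(43)
          + 'VICTOR LAU'.ljust(50))
print(extract(sample))
```

Groups without parentheses (the `.{43}` part) consume characters without capturing them, which is how unwanted columns are skipped.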

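Or you can specify each column's start position and width, and use those to slice the data out of each line.

This method parses 2 columns from each line of the file and returns an array of rows, each holding an array of columns.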

def parse(filename):
    fixed_columns = [[0, 28], [71, 50]] # start pos and width pairs of columns you want
    rows = []
    with open(filename) as file:
        for line in file:
            cols = []
            for start,wid in fixed_columns:
                cols.append(line[start: start+wid].strip())
            rows.append(cols)
    return rows

for row in parse(filename):
    print(", ".join(row))

Output:

TELEGRAPHIC TRANSFER, LIEW WAI KEEN
TELEGRAPHIC TRANSFER, KWAN SANG@KWAN CHEE SANG
TELEGRAPHIC TRANSFER, VICTOR LAU
TELEGRAPHIC TRANSFER, VICTOR LAU

From here you can save the data any way you like.
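One possible way to save the rows, sketched with the standard-library csv module (Excel opens CSV files directly). The function name `save_rows`, the header labels, and the output file name are assumptions for illustration; if a real .xlsx file is needed, pandas' `DataFrame.to_excel`, as used in the question, is an alternative.

```python
import csv

def save_rows(rows, path, header=('description', 'name')):
    """Write parsed rows to a CSV file that Excel can open."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)

# Sample rows in the same shape that parse() returns
rows = [['TELEGRAPHIC TRANSFER', 'VICTOR LAU'],
        ['TELEGRAPHIC TRANSFER', 'KWAN SANG@KWAN CHEE SANG']]
save_rows(rows, 'AllBanks.csv')
```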
