
python regex: capture different strings line by line from .txt file

I need to extract names/strings from a .txt file line by line. I am trying to use regex to do this.

For example, from the line below I want to extract the names "Victor Lau" and "Siti Zuzan" and the string "TELEGRAPHIC TRANSFER" into three different lists, then output them to an Excel file. Here is a line from the txt file:

TELEGRAPHIC TRANSFER 0008563668 040122 BRH BDVI0093 VICTOR LAU 10,126.75-.00 10,126.75- SITI ZUZAN 16:15:09

I have tried this code:

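import os
import re

import numpy as np
import pandas as pd

directory = "."  # folder that holds the .txt files (assumed; set to your own path)
List4, List5, List6 = [], [], []  # List5 and List6 are filled the same way with other patterns (not shown)

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith((".txt", ".TXT")) and "AllBanks" in filename:
        # os.listdir returns bare names, so join them with the folder before opening
        with open(os.path.join(directory, filename)) as AllBanks:
            for line in AllBanks:
                match4 = re.search(r'( [a-zA-Z]+ [a-zA-Z]+ [a-zA-Z]+ )|( [a-zA-Z]+ [a-zA-Z]+)', line)
                # record 'NA' when nothing matches instead of relying on a bare except
                List4.append(match4.group(0).strip() if match4 else 'NA')
df = pd.DataFrame(np.column_stack([List4, List5, List6]), columns=['a', 'b', 'c'])
df.to_excel('AllBanks.xlsx', index=False)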
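On the sample line as it appears here, the pattern in this attempt matches the first run of two letter-only words it can find, which is " BRH BDVI", not a name. A minimal sketch of an alternative that anchors on the line's structure instead, assuming every line has the same field order as the single sample above (type, two numbers, two codes, first name, amounts, second name, time). The pattern, the function names and the 'NA' fallback are illustrative, not taken from the question's real file, and lines with a different layout will fall back to 'NA'.

```python
import re

# Assumed field order, inferred from the one sample line in the question:
# <type> <number> <number> <code> <code> <name> <amounts...> <name> <hh:mm:ss>
LINE_PATTERN = re.compile(
    r'^(?P<type>[A-Z][A-Z ]*?)\s+\d+\s+\d+\s+\S+\s+\S+\s+'
    r'(?P<name1>[A-Z@][A-Z@ ]*?)\s+[\d,.\-]+'
    r'.*?\s(?P<name2>[A-Z@][A-Z@ ]*?)\s+\d{2}:\d{2}:\d{2}\s*$'
)


def extract_fields(line):
    """Return (type, first name, second name), or 'NA' for each if the line does not fit."""
    m = LINE_PATTERN.match(line)
    if m is None:
        return ('NA', 'NA', 'NA')
    return (m.group('type'), m.group('name1'), m.group('name2'))


def extract_columns(lines):
    """Collect the three fields of every line into three separate lists."""
    types, first_names, second_names = [], [], []
    for line in lines:
        kind, first, second = extract_fields(line)
        types.append(kind)
        first_names.append(first)
        second_names.append(second)
    return types, first_names, second_names


sample = ("TELEGRAPHIC TRANSFER 0008563668 040122 BRH BDVI0093 VICTOR LAU "
          "10,126.75-.00 10,126.75- SITI ZUZAN 16:15:09")
print(extract_fields(sample))
```

The lazy `*?` quantifiers make each name stop at the first point where the following field (an amount or a time) can match, so multi-word names such as "KWAN SANG@KWAN CHEE SANG" are still captured whole.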

Your text file looks like it has fixed-width columns with no delimiters. You can use regex capture groups like '^(.{20})(.{15})(.{30})'.
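A minimal sketch of the capture-group idea, using the illustrative widths 20, 15 and 30 from the pattern above; the real widths of the bank file are not known here, and the sample line is built by hand with `str.ljust` purely for illustration.

```python
import re

# Illustrative widths (20, 15, 30 characters); replace them with your file's real column widths.
FIXED_WIDTH = re.compile(r'^(.{20})(.{15})(.{30})')


def split_fixed_width(line):
    """Return the whitespace-stripped column values, or None if the line is too short."""
    m = FIXED_WIDTH.match(line)
    if m is None:
        return None
    return [field.strip() for field in m.groups()]


# Hypothetical padded line: each value is left-justified to its column width.
sample = "TELEGRAPHIC TRANSFER".ljust(20) + "0008563668".ljust(15) + "VICTOR LAU".ljust(30)
print(split_fixed_width(sample))  # ['TELEGRAPHIC TRANSFER', '0008563668', 'VICTOR LAU']
```

Because `.` does not match a newline, a line shorter than the total width (65 characters here) simply fails to match, which is why the function checks for `None`.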

Or you can specify each column's start position and width and use them to slice the data out of each row.

This function parses 2 columns from each line of your file and returns a list of rows, each of which is a list of column values.

def parse(filename):
    fixed_columns = [[0, 28], [71, 50]] # start pos and width pairs of columns you want
    rows = []
    with open(filename) as file:
        for line in file:
            cols = []
            for start,wid in fixed_columns:
                cols.append(line[start: start+wid].strip())
            rows.append(cols)
    return rows

filename = "AllBanks.txt"  # hypothetical path to your bank export
for row in parse(filename):
    print(", ".join(row))

Output:

TELEGRAPHIC TRANSFER, LIEW WAI KEEN
TELEGRAPHIC TRANSFER, KWAN SANG@KWAN CHEE SANG
TELEGRAPHIC TRANSFER, VICTOR LAU
TELEGRAPHIC TRANSFER, VICTOR LAU

From here you can save the data any way you like.
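Since the question asks for separate lists and an Excel file, here is a minimal sketch of both steps, assuming the rows come from a `parse`-style function that returns one list of column values per line. The helper names and header names are hypothetical, and writing `.xlsx` with pandas also requires an Excel writer package such as openpyxl.

```python
def rows_to_lists(rows):
    """Turn [[col0, col1, ...], ...] into one list per column."""
    return [list(column) for column in zip(*rows)]


def save_rows_to_excel(rows, path, headers):
    """Write the parsed rows to an .xlsx file (needs pandas plus an Excel writer such as openpyxl)."""
    # Imported here so rows_to_lists can be used without pandas installed.
    import pandas as pd

    # DataFrame accepts a list of rows directly, so np.column_stack is not needed.
    pd.DataFrame(rows, columns=headers).to_excel(path, index=False)


rows = [
    ["TELEGRAPHIC TRANSFER", "LIEW WAI KEEN"],
    ["TELEGRAPHIC TRANSFER", "VICTOR LAU"],
]
types, names = rows_to_lists(rows)
print(types)  # ['TELEGRAPHIC TRANSFER', 'TELEGRAPHIC TRANSFER']
print(names)  # ['LIEW WAI KEEN', 'VICTOR LAU']
```

With the `parse` function above, a call like `save_rows_to_excel(parse(filename), "AllBanks.xlsx", ["type", "name"])` would write the spreadsheet. Building the DataFrame from whole rows keeps each row's values together, whereas stacking separately filled lists breaks as soon as one list is shorter than the others. pandas also offers `read_fwf`, which takes `colspecs` as (start, end) pairs and can read a fixed-width file straight into a DataFrame.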
