
Python, parse string by extracting characters and digits substring

I have a string, produced by a machine learning algorithm, that usually spans multiple lines. At the beginning and at the end there can be some lines containing nothing but whitespace, and in between there should be 2 lines, each containing a word followed by some numbers and (sometimes) other characters.

Something like this:


first_word  3 5 7 @  4
second_word 4 5 67| 5 [


I need to extract the 2 words and the numeric characters.

I can eliminate the empty lines by doing something like:

lines_list = initial_string.split("\n")
for line in lines_list:
    if len(line) > 0 and not line.isspace():
        print(line)

but now I was wondering:

  1. whether there is a more robust, general way to do this
  2. how to parse each of the remaining 2 central lines, extracting the word and the digits (and discarding any other characters mixed in between the digits)

I imagine regular expressions could be useful, but I have never really used them, so I'm struggling a little at the moment.

I would use re.findall here:

import re

inp = '''first_word  3 5 7 @  4
second_word 4 5 67| 5 ['''
# \w+ matches runs of letters, digits and underscores, so @, | and [ are skipped
matches = re.findall(r'\w+', inp)
print(matches)  # ['first_word', '3', '5', '7', '4', 'second_word', '4', '5', '67', '5']
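
On the first point of the question (the whitespace-only lines), a small sketch may help: since \w never matches spaces, tabs or newlines, the same call can be run on the raw string without stripping the blank lines first. The padded sample string and the name `padded` are made up for illustration.

```python
import re

# Hypothetical input: the two useful lines surrounded by
# empty and whitespace-only lines.
padded = "\n   \n\t\nfirst_word  3 5 7 @  4\nsecond_word 4 5 67| 5 [\n  \n\n"

# Whitespace-only lines contain no \w characters, so they
# contribute nothing to the result.
tokens = re.findall(r'\w+', padded)
print(tokens)
```

This should give the same ten tokens as above, with no extra empty entries for the blank lines.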

If you want to process each line separately, then simply split the input on line breaks (CR?LF) and use the same approach:

import re

inp = '''first_word  3 5 7 @  4
second_word 4 5 67| 5 ['''
lines = inp.split('\n')
for line in lines:
    matches = re.findall(r'\w+', line)
    print(matches)

This prints:

['first_word', '3', '5', '7', '4']
['second_word', '4', '5', '67', '5']
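
If the word and the numbers need to be kept apart, the per-line approach can be extended along these lines. This is only a sketch under some assumptions not stated in the question: the word is always the first token on its line, the numbers are non-negative integers, and the helper name `parse_lines` is invented here. It uses str.splitlines() so that both "\n" and "\r\n" line endings are handled, and it skips lines that contain no word at all.

```python
import re

def parse_lines(text):
    """Return one (word, numbers) tuple per line that contains a word.

    Hypothetical helper: assumes the word comes first on each line
    and that every number is a non-negative integer.
    """
    parsed = []
    for line in text.splitlines():
        # Leading whitespace, then the word, then the rest of the line.
        match = re.match(r'\s*(\w+)(.*)', line)
        if match is None:
            continue  # empty or whitespace-only line
        word, rest = match.groups()
        # Keep only runs of ASCII digits; @, | and [ are discarded.
        numbers = [int(n) for n in re.findall(r'[0-9]+', rest)]
        parsed.append((word, numbers))
    return parsed

sample = "\n   \nfirst_word  3 5 7 @  4\nsecond_word 4 5 67| 5 [\n \n"
print(parse_lines(sample))
# [('first_word', [3, 5, 7, 4]), ('second_word', [4, 5, 67, 5])]
```

Note that digits glued to the word itself (for example "word1") would stay part of the word, and a line starting with a number would have that number treated as the word; whether that is acceptable depends on what the algorithm actually emits.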
