
How to build a regex for finding words that start with `\n` and a letter and end with a digit OR a word?

Here's an example string; the amount of spacing after the digit can vary.

product_list = 'Buy:\n Milk \nYoughurt 4 \nBread  \nSausages 4     \nBanana '

I want to build a regexp that produces the following output:

import re

re.findall(r'some pattern', product_list)
['Milk', 'Youghurt 4', 'Bread', 'Sausages 4', 'Banana']

This is what I thought it should look like. However, it returns an empty list:

re.findall(r'\n(\w+\w$)', product_list)

The approach of the script below is to first strip off the leading term, in this case `Buy:\n`. Then we use re.findall with the following pattern to find all matches:

(.+?)\s*(?:\n|$)

This says to lazily capture everything up to any optional trailing whitespace, which must then be followed by a newline or the end of the string. Because the whitespace sits outside the capture group, each match comes back already trimmed.

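import re

product_list = 'Buy:\n Milk \nYoughurt 4 \nBread  \nSausages 4     \nBanana '
# Strip the leading term ("Buy:" plus the whitespace that follows it).
product_list = re.sub(r'^[^\s]*\s+', '', product_list)

# Lazily capture each item, leaving trailing whitespace outside the group.
matches = re.findall(r'(.+?)\s*(?:\n|$)', product_list)
print(matches)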

['Milk', 'Youghurt 4', 'Bread', 'Sausages 4', 'Banana']
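As an alternative sketch, the preliminary re.sub step can be skipped by anchoring each match on the preceding `\n` directly. The pattern below is not taken from the answer above; it assumes every item is a single word, optionally followed by a number, with only spaces or tabs in between.

```python
import re

product_list = 'Buy:\n Milk \nYoughurt 4 \nBread  \nSausages 4     \nBanana '

# \n                 the newline that introduces each item
# [ \t]*             optional indentation after that newline
# (\w+               the item name (assumed to be one word) ...
#  (?:[ \t]+\d+)?)   ... optionally followed by spaces/tabs and a number
single_pass = re.findall(r'\n[ \t]*(\w+(?:[ \t]+\d+)?)', product_list)
print(single_pass)
# ['Milk', 'Youghurt 4', 'Bread', 'Sausages 4', 'Banana']
```

Using `[ \t]` rather than `\s` keeps a match from spilling across a newline into the next item. Multi-word names such as "Green Tea" would need a different pattern.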

I would suggest a non-regex approach (a regex seems expensive for this), provided you can guarantee that the input always follows a similar pattern:

list(map(lambda x: x.strip(), product_list.split('\n')))[1:]

Code:

product_list = 'Buy:\n Milk \nYoughurt 4 \nBread  \nSausages 4     \nBanana '

print(list(map(lambda x: x.strip(), product_list.split('\n')))[1:])
# ['Milk', 'Youghurt 4', 'Bread', 'Sausages 4', 'Banana']
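Whether the regex is actually more expensive depends on the input and the Python version, so it may be worth measuring. A minimal sketch with the standard `timeit` module follows; the helper names `with_regex` and `with_split` are made up for illustration, and no particular timing result is claimed here.

```python
import re
import timeit

product_list = 'Buy:\n Milk \nYoughurt 4 \nBread  \nSausages 4     \nBanana '

ITEM_PATTERN = re.compile(r'(.+?)\s*(?:\n|$)')

def with_regex(text):
    # Strip the leading "Buy:" term, then capture each trimmed item.
    body = re.sub(r'^[^\s]*\s+', '', text)
    return ITEM_PATTERN.findall(body)

def with_split(text):
    # Split on newlines, trim each piece, and drop the leading "Buy:" entry.
    return [item.strip() for item in text.split('\n')][1:]

# Both approaches should agree before their speed is compared.
assert with_regex(product_list) == with_split(product_list)

regex_seconds = timeit.timeit(lambda: with_regex(product_list), number=10_000)
split_seconds = timeit.timeit(lambda: with_split(product_list), number=10_000)
print(f'regex: {regex_seconds:.4f}s  split: {split_seconds:.4f}s')
```

On a string this short either approach finishes in microseconds, so readability is probably the better deciding factor.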

This example can be done without a regex: split on `:` first, and then on `\n`.

actual_list = 'Buy:\n Milk \nYoughurt 4 \nBread  \nSausages 4     \nBanana '
product_list = actual_list.split(':')[1]
processed_list = [product.strip() for product in product_list.split('\n') if product.strip() != '']
print(processed_list)
#['Milk', 'Youghurt 4', 'Bread', 'Sausages 4', 'Banana']
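One caveat with `split(':')[1]`: it keeps only the text between the first and second colon, so an item that itself contains `:` would cut the list short. A minimal sketch of a variant using `str.partition`, wrapped in a hypothetical helper named `parse_products` (the name and the second sample string are assumptions for illustration):

```python
def parse_products(text):
    # partition splits on the first ':' only, so later colons stay in the body.
    _, _, body = text.partition(':')
    # Trim each line and skip lines that are empty or whitespace-only.
    return [line.strip() for line in body.splitlines() if line.strip()]

print(parse_products('Buy:\n Milk \nYoughurt 4 \nBread  \nSausages 4     \nBanana '))
# ['Milk', 'Youghurt 4', 'Bread', 'Sausages 4', 'Banana']

print(parse_products('Buy:\n Juice: apple \nBread '))
# ['Juice: apple', 'Bread']
```

If the input has no colon at all, `partition` returns an empty body and the helper yields an empty list instead of raising an IndexError.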
