Split a string using a RegEx pattern with words and numbers

Question

My question came up while trying to help in this post.

I'm searching for a regex pattern that splits this string at 1., 2. and 3., or in general: at one digit (or more, if the list were longer) followed by a dot. The problem is that the string contains other numbers that need to be kept.

test_string = '1. Fruit 12 oranges 2. vegetables 7 carrot 3. NFL 246 SHIRTS'

With this pattern I managed to do so, but I got an empty string at the start and didn't know how to avoid that.

l1 = re.split(r"\s?\d{1,2}\.", test_string)

Output l1:
['', ' Fruit 12 oranges', ' vegetables 7 carrot', ' NFL 246 SHIRTS']

So I switched from splitting to searching for matches of a pattern.

l2 = re.findall(r"(?:^|(?<=\d\.))([\sa-zA-Z0-9]+)(?:\d\.|$)", test_string)

Output l2:
[' Fruit 12 oranges ', ' vegetables 7 carrot ', ' NFL 246 SHIRTS']

This is really close; the only issue left is the extra whitespace around the elements in the list (a leading space on every element).

What would be a good and efficient approach for my task? Should I stick with re.split(), or build a pattern and use re.findall()? Is my pattern fine as it is, or is it far too complicated?

Answer 1

Just add whitespace matching to your expression in two places: (?:\s) before the capture group and \s at the start of the final group:

re.findall(r"(?:^|(?<=\d\.))(?:\s)([\sa-zA-Z0-9]+)(?:\s\d\.|$)", test_string)

The output is: ['Fruit 12 oranges', 'vegetables 7 carrot', 'NFL 246 SHIRTS']
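
For comparison, here is a minimal sketch of both approaches from the question, written so that the markers may have more than one digit. It is not part of the answer above. The function names split_numbered and find_numbered are hypothetical, and both patterns assume that no item contains a number directly followed by a dot (such as "3.5"), since that would be mistaken for a list marker.

```python
import re

test_string = '1. Fruit 12 oranges 2. vegetables 7 carrot 3. NFL 246 SHIRTS'


def split_numbered(text):
    """Hypothetical helper: split on '<digits>.' markers with re.split()."""
    # \s* on both sides swallows the whitespace around each marker.
    parts = re.split(r"\s*\d+\.\s*", text)
    # The text starts with a marker, so the first piece is empty; drop it.
    return [p for p in parts if p]


def find_numbered(text):
    """Hypothetical helper: capture the text after each marker with re.findall()."""
    # Lazy capture up to the next ' <digits>.' marker or the end of the string.
    return re.findall(r"\d+\.\s+(.*?)(?=\s+\d+\.|$)", text)


l3 = split_numbered(test_string)
l4 = find_numbered(test_string)
print(l3)
print(l4)
```

Both calls should give the same list as the answer above. Numbers such as 12 or 246 are left alone because they are not followed by a dot.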
