Python regex not giving the desired output

Question

I'm scraping a site which contains the following string:

"1 Year+ in Category"

or in some cases:

"1 Year+ by user in Category"

I want to separate the Year, Category and the User. I tried a regular split, but it doesn't work in this case because there are two delimiters, 'in' and 'by'. So I used regex. It kind of works, but not properly. Here is the snippet:

dateandcat=re.split(r'.\s[in , by]',rightside[0])

rightside[0] contains the date, category and user. It results in the following output:

['1 Year', 'n Movies']
['1 Year', 'y user', 'n TV shows']
['1 Year', 'y user', 'n TV shows']
['1 Year', 'n Movies']

I could just trim off the first two characters in [1] and [2], but I want to fix the regex. Why is the second character of 'in' and 'by' still showing? How do I fix this?

Answer 1

Try using:

import re

value = "1 Year+ in Category by User"

match = re.match(r"(\d+ \w+\+?) in (\w+)(?: by (\w+)*)?", value)
if match:
    print(match.groups())

Output:

('1 Year+', 'Category', 'User')
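
Note that the strings in the question put the optional "by user" part before "in Category", while the example value above puts it after. A minimal sketch for the question's order follows. It assumes the period is always digits, one word and an optional "+", and that the user name contains no spaces. The names PATTERN and parse are illustrative and not from the original post.

```python
import re

# Assumed layout: "<period> [by <user>] in <category>"
PATTERN = re.compile(
    r"(?P<period>\d+ \w+\+?)"   # e.g. "1 Year+"
    r"(?: by (?P<user>\S+))?"    # optional " by user" (single word assumed)
    r" in (?P<category>.+)"      # rest of the string, may contain spaces
)

def parse(text):
    """Return (period, user, category); user is None when absent."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    return match.group("period"), match.group("user"), match.group("category")

print(parse("1 Year+ in Movies"))            # ('1 Year+', None, 'Movies')
print(parse("1 Year+ by user in TV shows"))  # ('1 Year+', 'user', 'TV shows')
```

Named groups keep the three fields in fixed positions, so the user slot is simply None when the "by" part is missing.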

You can use regex101 to learn more about that regex and others.
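
As for why the stray 'n' and 'y' remain: square brackets define a character class, so [in , by] matches exactly one character out of i, n, space, comma, b and y, not the words "in" or "by". The pattern .\s[in , by] therefore consumes the "+", the space and only the first letter of the delimiter. If you prefer to keep the re.split approach, a sketch using alternation in a non-capturing group could look like this. It assumes the delimiters are surrounded by single whitespace characters and that no field itself contains " in " or " by ". The function name split_fields is illustrative.

```python
import re

def split_fields(text):
    # \+? drops the optional trailing "+", then the whole word "in" or "by"
    # (with its surrounding whitespace) is consumed as the delimiter.
    # (?:...) is non-capturing, so re.split does not return the delimiters.
    return re.split(r"\+?\s(?:in|by)\s", text)

print(split_fields("1 Year+ in Movies"))            # ['1 Year', 'Movies']
print(split_fields("1 Year+ by user in TV shows"))  # ['1 Year', 'user', 'TV shows']
```

Unlike the match-based version, the split result has two or three items depending on the input, so the caller has to check the list length to tell whether a user was present.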

Answer 1: 0 · accepted · 2020-09-13 08:28:05
