What is the right way to exclude an uppercase word with regex in Python?

Question

Let's say I've scraped this from a website.

PARIS - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (2015). Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat 22/05/2015. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I can just use .replace('PARIS - ', '') and then get the text with regex, but what if the place changes from article to article?
How do I exclude the first "PARIS" and " - " and get the rest of the text?
Should I separate the location from the content with regex?
What should I think about or do first when facing a problem like this?

Here's my code to get the first string, for my third question. Assume that text is a variable that contains the text above:

import re
location = re.findall(r'^\w+', text)

Answer 1

Use a regular expression that matches a run of uppercase letters and spaces followed by a hyphen at the start of the text, and replace the match with an empty string.

text = re.sub(r'^[A-Z\s]+\s-\s*', '', text)
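
If you also want to keep the location rather than throw it away, one option is to capture it in a group and slice off the rest. This is only a sketch: the helper name split_dateline and the DATELINE pattern are made up for illustration, and it assumes the place name is written in ASCII uppercase letters and is followed by " - ". Place names with accents or digits would need a wider character class.

```python
import re

# Hypothetical pattern: an uppercase place name (spaces allowed, e.g. "NEW YORK"),
# then whitespace, a hyphen, and whitespace. ASCII letters only, by assumption.
DATELINE = re.compile(r'^([A-Z][A-Z\s]*?)\s+-\s+')

def split_dateline(text):
    """Return (location, content); location is None if no dateline is found."""
    match = DATELINE.match(text)
    if match is None:
        return None, text
    # group(1) is the place name, everything after the match is the article body.
    return match.group(1), text[match.end():]

location, content = split_dateline("PARIS - Lorem ipsum dolor sit amet.")
print(location)
print(content)
```

When the text does not start with an uppercase place name, the function returns the text unchanged, so it should be safe to run on articles that have no dateline.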
