
What's the proper way to exclude uppercase word(s) with regex in Python?

Let's say I've scraped this from a website.

PARIS - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua (2015). Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat 22/05/2015. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

  • I can just use .replace('PARIS - ', '') and then get the text with regex, but what if the place changes from one article to the next?
  • How do I exclude the leading "PARIS" and " - " and get the rest of the text?
  • Should I separate the location from the content with regex?
  • What should I think about or do first when facing a problem like this?

Here's my code to get the first string, for my third question. Assume that text is a variable containing the text above.

location = re.findall(r'^\w+', text)

Use a regular expression that matches a sequence of uppercase letters and whitespace followed by a hyphen at the beginning of the string, and replace the match with an empty string.

text = re.sub(r'^[A-Z\s]+\s-\s*', '', text)
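
If the location is also needed (the third question), a variation on the same idea is to capture both parts instead of deleting one. The following is only a sketch: the names DATELINE and split_dateline are made up for illustration, and it assumes the location is written in ASCII capitals and separated from the body by " - ".

```python
import re

# Hypothetical pattern: an all-caps location, then " - ", then the body.
# The lazy quantifier lets multi-word locations such as "NEW YORK" match
# without swallowing the whitespace before the hyphen.
DATELINE = re.compile(
    r'^(?P<location>[A-Z][A-Z\s]*?)\s+-\s+(?P<content>.*)',
    re.DOTALL,  # let the content span several lines
)

def split_dateline(text):
    """Return (location, content); location is None if no dateline is found."""
    match = DATELINE.match(text)
    if match is None:
        return None, text
    return match.group('location'), match.group('content')

samples = [
    "PARIS - Lorem ipsum dolor sit amet (2015).",
    "NEW YORK - Ut enim ad minim veniam 22/05/2015.",
    "Duis aute irure dolor in reprehenderit.",
]
for sample in samples:
    print(split_dateline(sample))
```

Text without a leading uppercase location comes back unchanged, so the function can be applied to every article regardless of whether it starts with a place name.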


 