Python RegEx: how to find words that start with an uppercase letter followed by lowercase?
I have the following string:
Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01
I'm using findall as a way to split my string, but it splits the I and P of IP. How can I change the expression to get this output?
['Date: 20/8/2020 ', 'Duration: 0.33 ', 'IP: 110.1.x.x ', 'Server:01']
import re
text = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"
my_list = re.findall(r'[a-zA-Z][^A-Z]*', text)
my_list
['Date: 20/8/2020 ', 'Duration: 0.33 ', 'I', 'P: 110.1.x.x ', 'Server:01']
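The split at I and P happens because `[^A-Z]*` stops at every uppercase letter, so each capital letter starts a new match. A minimal sketch that makes this visible by listing where each match begins (the names `matches` and `chunks` are only illustrative):

```python
import re

text = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"

# [^A-Z]* consumes characters only up to the next uppercase letter,
# so the "P" in "IP" ends the match that started at "I".
matches = list(re.finditer(r'[a-zA-Z][^A-Z]*', text))
chunks = [m.group() for m in matches]

for m in matches:
    # Print the start offset and the matched chunk.
    print(m.start(), repr(m.group()))
```

Any key made of consecutive capitals would be broken up the same way, which is why the answers below anchor on the shape of the whole key instead.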
Look for any string that begins with either two uppercase letters or an uppercase letter followed by a lowercase letter, and then match lazily until you reach either the same pattern again or the end of the string.
>>> re.findall(r'([A-Z][a-zA-Z].*?)\s*(?=[A-Z][a-zA-Z]|$)', text)
['Date: 20/8/2020', 'Duration: 0.33', 'IP: 110.1.x.x', 'Server:01']
You may also wish to use this to create a dictionary.
>>> dict(re.split(r'\s*:\s*', m, maxsplit=1) for m in re.findall(r'([A-Z][a-zA-Z].*?)\s*(?=[A-Z][a-zA-Z]|$)', text))
{'Date': '20/8/2020', 'Duration': '0.33', 'IP': '110.1.x.x', 'Server': '01'}
With regex you should always be as precise as possible. So if you know that your input data always looks like that, I would suggest writing the full words in the regex.
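As a rough sketch of that suggestion, the field names can be spelled out as an alternation. The list `fields` is an assumption based on the sample string; it is not something the question states is fixed.

```python
import re

text = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"

# Assumed, fixed set of field names; adjust to the labels your data really uses.
fields = ["Date", "Duration", "IP", "Server"]

# Non-capturing group, so findall returns the whole match (key and value).
pattern = r"\b(?:" + "|".join(map(re.escape, fields)) + r"):\s*\S+"

result = re.findall(pattern, text)
print(result)
```

Note that `\S+` stops at the first whitespace, so this only works while values contain no spaces, as in the sample.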
If that's not what you want, you have to sacrifice some certainty and use a more general pattern.
You can use:
(?<!\S)[A-Z][a-zA-Z]*:\s*\S+
Explanation
(?<!\S)
Assert a whitespace boundary to the left
[A-Z][a-zA-Z]*:
Match an uppercase char A-Z, optional chars a-zA-Z, followed by :
\s*\S+
Match optional whitespace chars and 1+ non-whitespace chars
import re
pattern = r"(?<!\S)[A-Z][a-zA-Z]*:\s*\S+"
s = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"
print(re.findall(pattern, s))
Output
['Date: 20/8/2020', 'Duration: 0.33', 'IP: 110.1.x.x', 'Server:01']
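If separate keys and values are more useful than the joined strings, one possible variation is to add two capture groups to the same pattern. This is only a sketch; the name `parsed` is illustrative and the grouping is my addition, not part of the answer above.

```python
import re

s = "Date: 20/8/2020 Duration: 0.33 IP: 110.1.x.x Server:01"

# Same pattern as above, with the key and the value captured separately.
pattern = r"(?<!\S)([A-Z][a-zA-Z]*):\s*(\S+)"

# With two groups, findall returns (key, value) tuples, which dict() accepts.
parsed = dict(re.findall(pattern, s))
print(parsed)
```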