Questions About Regular Expressions in Python

I am learning about regular expressions, but I'm having trouble making sense of certain things. I am working on an assignment that uses regexes to find various characters and words in a string:

Using the findall function, get all of the instances of non-alphanumeric characters in the string assigned to 'lorem_ipsum'.

Output to the console the number of non-alphanumeric characters. Hint: use the len function. Use the ^ and [] regular expression operators along with the findall() regular expression function.

import re

# lorem_ipsum is the string supplied by the assignment
pattern = re.compile(r'sit-:amet')
occurrance_sit_amet = pattern.findall(lorem_ipsum)
for match in occurrance_sit_amet:
    print(match)

Why would I use the len function? And, even more puzzling, why would I use ^ and [] when they can only be used to find characters at the start of the string and characters in brackets? Also, my code gave me this error:

Cannot read property 'toISOString' of undefined - 9b2bb9d0-119a-11e8-95f3-4351563e5e1b

Can someone explain what that means?

I think you misunderstand what the hints are (though I admit they are quite misleading). When it says "use ^ and []", it's not telling you to use a character class ( [] ) and a start-of-string anchor ( ^ ) separately. It's telling you to combine [] and ^ to form a negated character class, [^...]. A negated character class matches any single character that is not listed inside it.

The regex you need is this:

[^a-zA-Z0-9]

It means "any character except a-z, A-Z, or 0-9".
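
To make the two roles of ^ concrete, here is a minimal sketch. The sample string is made up for illustration and is not the assignment's lorem_ipsum.

```python
import re

# Hypothetical sample text; the real assignment uses its own lorem_ipsum string.
sample = "Hi, there! 42"

# Outside brackets, ^ anchors the match to the start of the string.
starts_with_hi = re.findall(r'^Hi', sample)

# As the first character inside brackets, ^ negates the class:
# match any single character that is NOT a letter or digit.
non_alnum = re.findall(r'[^a-zA-Z0-9]', sample)

print(starts_with_hi)  # ['Hi']
print(non_alnum)       # [',', ' ', '!', ' ']
```

Note that spaces show up in the second result, since a space is neither a letter nor a digit.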

Regarding the len function, your task is to find how many non-alphanumeric characters there are, isn't it? findall returns a list of the matches. That's why you need the length of the list to find out how many such characters there are.

Here is some code:

import re

pattern = re.compile(r'[^a-zA-Z0-9]')
allMatches = pattern.findall(lorem_ipsum)  # list of every non-alphanumeric character
print(len(allMatches))  # how many there are

To answer your questions:

  1. findall returns matches, a list of all substrings that match your pattern. So len(matches) should give you the "number of non-alphanumeric characters."
  2. ^ can be used as the first character inside [] to match any character that is NOT one of the listed characters.
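
As a rough illustration of both points, the sketch below runs on a short stand-in string. The value of lorem_ipsum here is an assumption made for the example, since the assignment's actual text isn't shown.

```python
import re

# Hypothetical stand-in for the assignment's lorem_ipsum string.
lorem_ipsum = "Lorem ipsum dolor sit-amet, consectetur: elit."

# findall returns a plain Python list, one entry per matching character.
matches = re.findall(r'[^a-zA-Z0-9]', lorem_ipsum)

# len of that list is the number of non-alphanumeric characters.
count = len(matches)

print(matches)  # [' ', ' ', ' ', '-', ',', ' ', ':', ' ', '.']
print(count)    # 9
```

Spaces and punctuation are all counted, because the pattern excludes only letters and digits.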

A few issues with your regex that might be causing that error:

  • The hyphen. Hyphens denote a range only when placed between two characters inside a character class ( [] ). Outside one, as in your pattern, the hyphen is just a literal - , so the regex is looking for the exact text sit-:amet rather than for any range of characters.
  • The alphabetical characters in the regex. Aside from the hyphen and colon, right now r'sit-:amet' matches a string composed of specific lowercase alphabetical letters one after another. Even if the regex worked, this would not match your desired "non-alphanumeric" pattern.
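
One way to check how the hyphen behaves is to try it inside and outside a character class. In Python's re module, a hyphen acts as a range operator only between two characters inside []. This is a small sketch with made-up sample text:

```python
import re

# Hypothetical sample text for illustration.
text = "lorem sit-:amet ipsum"

# Outside a character class the hyphen is literal, so this pattern
# only matches the exact substring "sit-:amet".
literal = re.findall(r'sit-:amet', text)

# Inside a character class, a hyphen between two characters is a range:
# [a-c] matches a, b, or c.
in_range = re.findall(r'[a-c]', "abcd-e")

# Placed last in the class, the hyphen is a literal character again.
hyphen_or_colon = re.findall(r'[:-]', text)

print(literal)          # ['sit-:amet']
print(in_range)         # ['a', 'b', 'c']
print(hyphen_or_colon)  # ['-', ':']
```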

What you need is a negated character class that excludes anything alphanumeric:

[^A-Za-z0-9]

P.S. I highly recommend using regexr whenever you have regex assignments. It's a great way to check how pattern-matching syntax works and to test your regexes :)
