Use Python to match a letter in a string only when followed by a space, period, or nothing, without regex?

I am trying to write this code for readability, but the last 'for x in measurements' clearly doesn't work.

The following prints ' t', but I don't want it to match on ' test'.
I do want it to match on the ' t' of 'this is a t' if that were the test case.

Is this possible without resorting to regex?

measurements = ['t', 'tsp', 'T', 'tbl', 'tbs', 'tbsp', 'c']
measurements = ([' ' + x + ' ' for x in measurements] + #space on either side
                [' ' + x + '.' for x in measurements] + #space in front, period in back
                [' ' + x + '' for x in measurements])   #space in front, nothing in back???

string_to_check = 'this is a test'

for measurement in measurements:
    if measurement in string_to_check:
        print(measurement)

Here you could use re.search:

>>> import re
>>> measurements = ['t', 'tsp', 'T', 'tbl', 'tbs', 'tbsp', 'c']
>>> measurements = ([' ' + x + ' ' for x in measurements] + [' ' + x + r'\.' for x in measurements] + [' ' + x + r'\b' for x in measurements])
>>> measurements
[' t ', ' tsp ', ' T ', ' tbl ', ' tbs ', ' tbsp ', ' c ', ' t\\.', ' tsp\\.', ' T\\.', ' tbl\\.', ' tbs\\.', ' tbsp\\.', ' c\\.', ' t\\b', ' tsp\\b', ' T\\b', ' tbl\\b', ' tbs\\b', ' tbsp\\b', ' c\\b']
>>> string_to_check = 'this is a test'
>>> for measurement in measurements:
...     if re.search(measurement, string_to_check):
...         print(measurement)
...
>>>

I did two things here.

  • [' ' + x + r'\.' for x in measurements]: escape the dot in order to match a literal dot, since the dot is a special metacharacter in regex that matches any character.

  • [' ' + x + r'\b' for x in measurements]: add the word boundary \b. Since \b matches between a word character and a non-word character, the pattern won't pick up <space>t from <space>test.
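Note that \b also matches before characters such as a comma, which is slightly broader than "space, period, or nothing". As a rough sketch of a stricter variant, the three cases can be folded into one pattern with a lookahead. The names `units`, `pattern`, and `find_units` are illustrative and not part of the original answer.

```python
import re

units = ['t', 'tsp', 'T', 'tbl', 'tbs', 'tbsp', 'c']

# One alternation of all units; re.escape guards against regex metacharacters.
alternatives = '|'.join(re.escape(u) for u in units)

# A leading space, then a unit, then a lookahead requiring a space,
# a period, or the end of the string (the lookahead consumes nothing).
pattern = re.compile(r' (' + alternatives + r')(?=[ .]|$)')

def find_units(text):
    """Return every unit in text that satisfies the pattern, in order."""
    return pattern.findall(text)

print(find_units('this is a test'))                   # []
print(find_units('this is a t'))                      # ['t']
print(find_units('mix 1 c. flour and 2 tbsp sugar'))  # ['c', 'tbsp']
```

Because the lookahead does not consume the trailing space, two units in a row (for example '1 t t') are both found.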

The problem is that you've coded a different meaning of 'nothing behind it' than the one you're thinking of.

You've included the string ' t' in your list, and it is a substring of the string 'this is a test' (namely, it's sitting there at the front of the word test).

If you want 'nothing behind it' to mean 'at the end of the string', then you'll have to check what's at the end of the string instead of using a substring search.
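One way to do that without regex is str.endswith for the 'nothing behind it' case, keeping the substring test for the other two. This is only a sketch under that reading of the question; the helper name `find_measurements` is made up for illustration.

```python
def find_measurements(text, units):
    """Return the units that follow a space and are themselves followed
    by a space, a period, or the end of the string."""
    found = []
    for unit in units:
        token = ' ' + unit
        if (token + ' ') in text or (token + '.') in text or text.endswith(token):
            found.append(unit)
    return found

units = ['t', 'tsp', 'T', 'tbl', 'tbs', 'tbsp', 'c']
print(find_measurements('this is a test', units))    # []
print(find_measurements('this is a t', units))       # ['t']
print(find_measurements('add 2 tsp. sugar', units))  # ['tsp']
```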

After your list comprehensions, measurements is:
[' t ', ' tsp ', ' T ', ' tbl ', ' tbs ', ' tbsp ', ' c ', ' t.', ' tsp.', ' T.', ' tbl.', ' tbs.', ' tbsp.', ' c.', ' t', ' tsp', ' T', ' tbl', ' tbs', ' tbsp', ' c']

You can find ' t' in measurements, and ' t' occurs in your check string "this is a[ t]est".
So it's correct for the code to print ' t'.

If you want to match exactly ' t' and not ' txxx', you need a word boundary, which means searching with re.search rather than in:
[' ' + x + r'\b' for x in measurements]

A possible non-regex approach is to split string_to_check into a list of words. Then the in operator will look for a word that matches exactly.

measurements = ['t', 'tsp', 'T', 'tbl', 'tbs', 'tbsp', 'c']

string_to_check = 'this is a test'
words_to_check = string_to_check.replace('.', ' ').split()
for measurement in measurements:
    if measurement in words_to_check:
        print(measurement)
