How to match the first word in a string?

I want to match the word 'St' or 'St.' or 'st' or 'st.' BUT only as the first word of a string. For example:

  • 'St. Mary Church Church St.' - should find ONLY the first 'St.'
  • 'st. Mary Church Church St.' - should find ONLY 'st.'
  • 'st Mary Church Church St.' - should find ONLY 'st'

I want to eventually replace the first occurrence with 'Saint'.

Regex sub allows you to define the number of occurrences to replace in a string.

i.e.:

>>> import re
>>> s = "St. Mary Church Church St."
>>> # The dots are escaped so they match a literal '.'; count=1 limits the replacement to the first occurrence.
>>> new_s = re.sub(r'^(St\.|st\.|St|st)\s', r'Saint ', s, count=1)
>>> new_s
'Saint Mary Church Church St.'

Hope it helps.
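A side note on the dot in patterns like this one: in a regex, an unescaped `.` matches any character, so it is worth checking how the pattern behaves on names that merely start with "St". The sketch below compares an unescaped and an escaped variant; the constant names and the `to_saint` helper are made up for illustration and are not part of the answer above.

```python
import re

UNESCAPED = r'^(St.|st.|St|st)\s'    # '.' matches any character here
ESCAPED = r'^(St\.|st\.|St|st)\s'    # '\.' matches a literal dot only


def to_saint(pattern, s):
    """Replace a leading match of `pattern` in `s` with 'Saint '."""
    return re.sub(pattern, 'Saint ', s, count=1)


# With the unescaped dot, the 'a' in 'Sta' is consumed by '.'
print(to_saint(UNESCAPED, "Sta Maria Church"))
# With the escaped dot, 'Sta' is not treated as 'St.'
print(to_saint(ESCAPED, "Sta Maria Church"))
print(to_saint(ESCAPED, "St. Mary Church Church St."))
```

If names such as "Sta Maria" can appear in your data, the escaped form is probably the safer choice.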

You don't need to use a regex for this, just use the split() method on your string to split it by whitespace. This will return a list of every word in your string:

matches = ["St", "St.", "st", "st."]
name = "St. Mary Church Church St."
words = name.split()   # split the string into a list of words
if words and words[0] in matches:   # the 'words and' guard avoids an IndexError on an empty string
    words[0] = "Saint"   # replace the first word in the list (St.) with Saint
new_name = " ".join(words)   # rebuild the name from the words, separated by single spaces
print(new_name)   # Output: "Saint Mary Church Church St."
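One thing to keep in mind with the split-and-join approach is that it rewrites every run of whitespace in the string as a single space. If that matters, a possible variation is to split off only the first word with `maxsplit=1` and leave the rest of the string untouched. This is just a sketch; the function name `replace_first_word` and its parameters are invented for the example.

```python
def replace_first_word(name, matches=("St", "St.", "st", "st."), replacement="Saint"):
    """Return `name` with its first word swapped for `replacement` if it is in `matches`."""
    # maxsplit=1 yields at most two parts: the first word and the untouched remainder
    parts = name.split(maxsplit=1)
    if parts and parts[0] in matches:
        parts[0] = replacement
        return " ".join(parts)
    return name  # no leading match: hand the string back unchanged


print(replace_first_word("St. Mary Church Church St."))
print(replace_first_word("Mary Church Church St."))
```

Only the gap between the first word and the remainder is normalised to one space here; spacing further along in the string is kept as it was.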

Thanks for the question! This is exactly what I was looking for to solve my issue. I wanted to share another regex trick I found while hunting around for this answer. You can simply pass the flags parameter to the sub function. This reduces the number of alternatives you need to spell out in the pattern parameter, which makes the code a little cleaner and reduces the chances of you missing a variant. Cheers!

import re
s = "St. Mary Church Church St."
# Ignoring case shortens the pattern from above slightly; the dot is escaped so it matches a literal '.' only
new_s = re.sub(r'^(st\.|st)\s', r'Saint ', s, count=1, flags=re.IGNORECASE)
print(new_s)   # Saint Mary Church Church St.
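
A small caveat with re.IGNORECASE: it accepts every casing, so forms like 'ST.' or 'sT' are matched as well, which is slightly broader than the four spellings in the question. The sketch below shows this with a precompiled pattern; `ST_PREFIX` and `to_saint` are names chosen for the example only, and the lookahead `(?=\s)` is my own substitution for consuming the space and putting it back.

```python
import re

# 'st' plus an optional literal dot at the start of the string,
# followed by whitespace (checked with a lookahead, so it is not consumed)
ST_PREFIX = re.compile(r'^st\.?(?=\s)', flags=re.IGNORECASE)


def to_saint(s):
    """Replace a leading St/St./st/st. (in any casing) with 'Saint'."""
    return ST_PREFIX.sub('Saint', s, count=1)


print(to_saint("ST. Mary Church Church St."))   # upper-case form is accepted too
print(to_saint("st Mary Church Church St."))
print(to_saint("Street Mary Church"))           # 'Street' is not followed by whitespace after 'St'
```

Whether accepting 'ST.' is desirable depends on your data; if not, listing the four spellings explicitly keeps the match strict.
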
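
import re

string = "St. Mary Church Church St."

replace = {'St': 'Saint', 'St.': 'Saint', 'st': 'Saint', 'st.': 'Saint'}
# Try longer keys first so 'St.' wins over 'St'; escape them so '.' is literal
alternatives = "|".join(re.escape(k) for k in sorted(replace, key=len, reverse=True))
# Anchor at the start of the string and require whitespace (or the end) after the word
pattern = re.compile(r"^(?:%s)(?=\s|$)" % alternatives)
new_string = pattern.sub(lambda m: replace[m.group(0)], string)
print(new_string)   # Saint Mary Church Church St.

This should work, I guess; please check.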

Try using the regex '^\\S+' to match the first run of non-whitespace characters (i.e. the first word) in your string.

import re 

s = 'st Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group()    # 'st'

s = 'st. Mary Church Church St.'
m = re.match(r'^\S+', s)
m.group()    # 'st.'
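
To get from the matched first word to the replacement asked for in the question, one option is to compare `m.group()` against the accepted spellings and then stitch 'Saint' onto the rest of the string using `m.end()`. A minimal sketch, assuming the four spellings from the question; `saint_if_first` and `ST_FORMS` are hypothetical names:

```python
import re

ST_FORMS = {"St", "St.", "st", "st."}


def saint_if_first(s):
    """Replace the first word with 'Saint' when it is one of ST_FORMS."""
    m = re.match(r'\S+', s)   # re.match already anchors at the start of the string
    if m and m.group() in ST_FORMS:
        # keep everything after the first word exactly as it was
        return "Saint" + s[m.end():]
    return s


print(saint_if_first("st. Mary Church Church St."))
print(saint_if_first("Stanley Church St."))
```

Because the comparison is against the whole first word, something like 'Stanley' is left alone.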

Python 3.10 introduced a new Structural Pattern Matching feature (otherwise known as match/case) which can fit this use-case:

s = "St. Mary Church Church St."

words = s.split()
match words:
    case ["St" | "St." | "st" | "st.", *rest]:
        print("Found st at the start")
        words[0] = "Saint"
    case _:
        print("didn't find st at the start")

print(' '.join(words))

Will give:

Found st at the start
Saint Mary Church Church St.

While using s = "Mary Church Church St." will give:

didn't find st at the start
Mary Church Church St.
