
Finding string before certain phrase

Let's say the string representing the phrase is "Holy it is changing again and again".

I want to print out the word "changing" that comes before "again and again", but this word may be different every time. So I need to extract the word before the phrase "again and again". The phrase "Holy it is" should not be extracted.

How can I do that with Python?

I thought about using a regex, as in "Python regex to match word before <", but I'm not too sure how to code it right.

To match any word followed by "again and again", use this regex:

  • ([\w]*) again and again

If you want to include more characters, for example the apostrophe, replace [\w] with [\w'], and do the same for any other characters you add inside the square brackets (some require escaping).

  • Holy it is changing again and again!
  • We are going to play again, and play again and again!
  • OMG again and again!
  • Let's go again and again. Again and again we go!
  • I got roomba'd again and again (requires adding ')
  • Foo became ABC again and again, Bar and Baz. (requires adding the escaped hyphen)
  • More sample regexes!
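The character-class variations described above can be tried out with a short sketch. The helper name words_before and the hyphenated sample sentence are made up for illustration; only the "roomba'd" sentence comes from the list above.

```python
import re

def words_before(char_class, text):
    """Capture every run of characters from char_class that directly
    precedes ' again and again' in text."""
    pattern = re.compile(rf"({char_class}*) again and again")
    return pattern.findall(text)

plain = r"[\w]"                # word characters only
with_apostrophe = r"[\w']"     # word characters plus the apostrophe
with_hyphen = r"[\w'\-]"       # plus an escaped hyphen

apostrophe_text = "I got roomba'd again and again"
hyphen_text = "It became state-of-the-art again and again"  # hypothetical sample

# Without the apostrophe in the class, only the tail of the word is captured.
print(words_before(plain, apostrophe_text))            # ['d']
print(words_before(with_apostrophe, apostrophe_text))  # ["roomba'd"]

# The same happens with hyphenated words until the hyphen is added.
print(words_before(with_apostrophe, hyphen_text))      # ['art']
print(words_before(with_hyphen, hyphen_text))          # ['state-of-the-art']
```

Escaping the hyphen inside the brackets keeps it from being read as a range operator, which is why it is written as \- here.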

To find all occurrences of that pattern, use re.findall:

match = re.findall(r"([\w']*) again and again", phrase), where ([\w']*) matches any word (a sequence of word characters, including the apostrophe). It returns a list of all the words followed by "again and again".

import re

# Raw strings (r"...") keep \w from being treated as a string escape.
phrase = "Holy it is changing again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['changing']

phrase = "Going again, going again and again, and finishing again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['going', 'finishing']

phrase = "Defeated again and again! I got ninja'd again and again!"
match = re.findall(r"([\w']*) again and again", phrase)
# match is ['Defeated', "ninja'd"]
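One thing to watch with the * quantifier: it also accepts zero characters, so when "again and again" follows punctuation rather than a word, findall can report an empty string. The sketch below uses a made-up sentence to show this, and swaps in + as one possible way to require at least one character. This is a suggestion, not part of the answer above.

```python
import re

phrase = "Yes, again and again!"  # hypothetical sample: a comma precedes the phrase

# '*' allows zero characters, so the empty string before the space is captured.
star = re.findall(r"([\w']*) again and again", phrase)

# '+' demands at least one character, so nothing is reported here.
plus = re.findall(r"([\w']+) again and again", phrase)

print(star)  # ['']
print(plus)  # []
```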
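
Alternatively, compile the pattern once and search each line separately:

import re

text = '''

Holy it is changing again and again
Holy it is not changing again and again
Holy it has changed again and again
Holy it has changed once
Holy it used to change again and again
'''

# (\w+) captures the single word right before " again and again".
prog = re.compile(r'(\w+) again and again')
for line in text.splitlines():
    x = prog.search(line)
    # search() returns None for lines that do not contain the phrase.
    if x:
        print(x.group(1))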

This outputs:

changing
changing
changed
change
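If the phrase itself may vary, the same idea could be wrapped in a small function. This is only a sketch: word_before is a hypothetical helper name, and it assumes you want just the first occurrence and that the word and the phrase are separated by whitespace.

```python
import re
from typing import Optional

def word_before(text: str, phrase: str) -> Optional[str]:
    """Return the word directly before the first occurrence of phrase, or None."""
    # re.escape keeps characters such as '.' or '?' in phrase
    # from being interpreted as regex syntax.
    pattern = r"(\w+)\s+" + re.escape(phrase)
    found = re.search(pattern, text)
    return found.group(1) if found else None

print(word_before("Holy it is changing again and again", "again and again"))  # changing
print(word_before("Holy it has changed once", "again and again"))             # None
```

Returning None when there is no match lets the caller decide what to do, instead of raising an AttributeError on a missing match object.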
