Regex to extract a substring from a string in Python

How do we get the following substring from a string using re in Python?

string1 = "fgdshdfgsLooking: 3j #123"
substring = "Looking: 3j #123"

string2 = "Looking: avb456j #13fgfddg"
substring = "Looking: avb456j #13"

Tried:

re.search(r'Looking: (.*#\d+)$', string1)

You need to remove the $ from the regex:

 re.search(r'Looking: (.*#\d+)', string1)
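To see why the anchor matters, here is a small sketch that runs both patterns against the two strings from the question. The variable names `anchored` and `unanchored` are only illustrative.

```python
import re

string1 = "fgdshdfgsLooking: 3j #123"
string2 = "Looking: avb456j #13fgfddg"

anchored = r'Looking: (.*#\d+)$'    # digits must be the last thing in the string
unanchored = r'Looking: (.*#\d+)'   # trailing characters are allowed

# string1 ends right after the digits, so the anchored pattern still matches.
anchored_1 = re.search(anchored, string1)
# string2 has "fgfddg" after the digits, so the anchored pattern finds nothing.
anchored_2 = re.search(anchored, string2)
unanchored_2 = re.search(unanchored, string2)

print(anchored_1.group(0))    # Looking: 3j #123
print(anchored_2)             # None
print(unanchored_2.group(0))  # Looking: avb456j #13
```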

If you also want the captured group to include Looking, wrap the whole pattern in parentheses (the full match, group(0), already includes it):

 re.search(r'(Looking: (.*#\d+))', string1)
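As a quick illustration of the difference, the sketch below compares what each group returns for the two patterns. The names `inner` and `outer` are assumptions made for this example.

```python
import re

string1 = "fgdshdfgsLooking: 3j #123"

inner = re.search(r'Looking: (.*#\d+)', string1)
outer = re.search(r'(Looking: (.*#\d+))', string1)

print(inner.group(0))  # whole match: Looking: 3j #123
print(inner.group(1))  # first group only: 3j #123
print(outer.group(1))  # outer group includes the prefix: Looking: 3j #123
print(outer.group(2))  # inner group: 3j #123
```

In practice, calling group(0) on the simpler pattern is often enough, since it always holds the entire matched text.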

Try:

re.search(r'Looking: (.)*#(\d)+', string1)

  1. It matches the literal text "Looking: ".
  2. After that it looks for zero or more of any character (except a newline).
  3. After that, a "#".
  4. And finally one or more digits.
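The steps above can be checked directly. One caveat: when the quantifier sits outside a capturing group, as in `(.)*`, the group keeps only its last repetition, so the full match is right but the individual groups hold a single character. The `adjusted` pattern below, with the quantifiers moved inside the parentheses, is a suggested variant and not part of the original answer.

```python
import re

string1 = "fgdshdfgsLooking: 3j #123"

# Pattern as posted: the overall match is what we want...
as_posted = re.search(r'Looking: (.)*#(\d)+', string1)
print(as_posted.group(0))        # Looking: 3j #123
# ...but each group holds only the last character it matched.
print(repr(as_posted.group(1)))  # ' '
print(as_posted.group(2))        # 3

# Variant with the quantifiers inside the groups (illustrative adjustment).
adjusted = re.search(r'Looking: (.*)#(\d+)', string1)
print(repr(adjusted.group(1)))   # '3j '
print(adjusted.group(2))         # 123
```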

Your regex is mostly correct. You just need to remove the end-of-line anchor $, because in some cases, such as string2, the target does not sit at the end of the string: extra characters follow it.

import re

string1 = 'fgdshdfgsLooking: 3j #123'
string2 = 'Looking: avb456j #13fgfddg'

pattern = r'Looking: (.*?#\d+)'

match1 = re.search(pattern, string1)
match2 = re.search(pattern, string2)

print('String1:', string1, '|| Substring1:', match1.group(0))
print('String2:', string2, '|| Substring2:', match2.group(0))

Output:

String1: fgdshdfgsLooking: 3j #123 || Substring1: Looking: 3j #123
String2: Looking: avb456j #13fgfddg || Substring2: Looking: avb456j #13

This should work. I've also matched everything before the # lazily by using ?, which matches as few characters as possible and expands only as needed. That avoids matching everything up to a second #, in case a second # followed by digits appears somewhere further along in the string.
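A minimal sketch of the difference between the greedy and lazy forms, using a made-up input that contains two `#` groups. The string `text` is hypothetical and does not come from the question.

```python
import re

# Hypothetical input with a second "#<digits>" later in the string.
text = "Looking: 3j #123 and later #456 end"

greedy = re.search(r'Looking: (.*#\d+)', text)
lazy = re.search(r'Looking: (.*?#\d+)', text)

# Greedy .* runs to the end, then backtracks to the LAST "#<digits>".
print(greedy.group(0))  # Looking: 3j #123 and later #456
# Lazy .*? expands one character at a time and stops at the FIRST "#<digits>".
print(lazy.group(0))    # Looking: 3j #123
```

For the two strings in the question both forms give the same result, since each contains only one `#`. The lazy form only matters if inputs like the one above can occur.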

Try this:

re.search(r"[A-Z]\w+:\s?\w+\s#\d+", string1)
