Regex to find text after last occurrence of character till another one
I am looking for a regular expression that extracts the text starting at " including: " and ending at the first "\n" that follows the last occurrence of "\n*" or "\n•". I have tried this demo, but it doesn't work as I want it to: I would like the match to include the next sentence as well, up to "guidance.\n". I am using Python and I am trying to extract the result into a new column called "Skills" in my pandas DataFrame. The "Job description" column holds the text.
df["Skills"]=df["Job description"].str.extract("including:((?:.)*\\n[*|•])")
You might use:
(?s)\bincluding:(.*\\n[*•]).*?\\n(?![*•])
(?s)
    Inline modifier to make the dot match a newline
\bincluding:
    Match including: preceded by a word boundary
(
    Capture group 1
.*\\n[*•]
    Match till the last occurrence of \n followed by either * or •
)
    Close group 1
.*?\\n
    Match till the first occurrence of \n
(?![*•])
    Negative lookahead, assert that what follows is not * or •
Or, when \\n is a real newline rather than a literal backslash followed by n:
(?s)\bincluding:(.*\n[*•]).*?\n(?![*•])
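To see what each part of this pattern consumes, here is a minimal sketch using the standard `re` module. The sample text is made up for illustration (only the word "guidance." comes from the question), and it assumes the description contains real newlines.

```python
import re

# Hypothetical sample text; real newlines separate the bullet items.
text = (
    "Requirements including:\n"
    "* Python\n"
    "• SQL\n"
    "• Works with minimal guidance.\n"
    "Apply now.\n"
)

# The real-newline pattern from above, written as a raw string.
pattern = re.compile(r"(?s)\bincluding:(.*\n[*•]).*?\n(?![*•])")

m = pattern.search(text)
full_match = m.group(0)  # everything from "including:" through the closing newline
group_1 = m.group(1)     # stops right after the last bullet character

print(repr(full_match))
print(repr(group_1))
```

Note that the full match runs through "guidance.\n", while group 1 ends at the last bullet character, because the parenthesis closes before `.*?\n`.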
For example:
df["Skills"] = df["Job description"].str.extract(r"(?s)\bincluding:(.*\n[*•]).*?\n(?![*•])")
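Since `str.extract` returns the contents of the capture group rather than the whole match, the column produced by this pattern ends at the last bullet character. If the text of the last bullet should be part of the result, one possible variant (an assumption on my part, not part of the answer as posted) moves the closing parenthesis so the group also covers `.*?`. The sketch below checks the variant with plain `re`; the helper name `extract_skills` and the sample string are hypothetical.

```python
import re

# Variant pattern (an assumption, not from the posted answer): group 1 now
# extends through the text of the last bullet, up to the newline that ends it.
SKILLS_RE = re.compile(r"(?s)\bincluding:(.*\n[*•].*?)\n(?![*•])")


def extract_skills(description):
    """Return the bullet block after 'including:', or None if there is no match."""
    m = SKILLS_RE.search(description)
    return m.group(1).strip() if m else None


# Hypothetical job description with real newlines.
sample = (
    "Key skills including:\n"
    "* Python\n"
    "• SQL\n"
    "• Works with minimal guidance.\n"
    "Apply now.\n"
)

skills = extract_skills(sample)
missing = extract_skills("No bullet list here")
print(skills)
```

The same pattern string could be passed to `df["Job description"].str.extract(...)` in place of the one above, if that behavior is what you want.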