Regex to find text after the last occurrence of a character until another character

Question

I am looking for a regular expression that extracts the text starting with " including: " and running through the text after the last occurrence of "\n*" or "\n•", up to the next "\n". In other words, the match should end at the first "\n" that follows the last "\n*" or "\n•". I have tried this demo, but it does not work the way I want: I would like to include the next sentence, up to "guidance.\n". I am using Python and want to extract the result into a new column called "Skills" in my pandas DataFrame. The "Job description" column holds the source text.

df["Skills"]=df["Job description"].str.extract("including:((?:.)*\\n[*|•])")

Answer 1

You might use

(?s)\bincluding:(.*\\n[*•]).*?\\n(?![*•])

(?s) Inline modifier to make the dot match a newline
\bincluding: Match including: preceded by a word boundary
( Capture group 1
- .*\\n[*•] Match till the last occurrence of \n followed by either * or •
) Close group 1
.*?\\n Match till the first occurrence of \n
(?![*•]) Negative lookahead, assert that what directly follows is not * or •
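As a rough illustration of the breakdown above, the following sketch runs the pattern with Python's re module. The sample text is hypothetical and assumes that each "\n" in the data is a literal backslash followed by the letter n. It shows that group 1 stops at the last bullet marker, while the full match continues to the next "\n".

```python
import re

# Hypothetical sample: raw string, so each \n is a literal backslash + "n"
text = r"Skills including: \n* Python\n* SQL\n• Communication and guidance.\nApply now."

# Pattern from the answer; \\n matches a literal backslash followed by "n"
pattern = re.compile(r"(?s)\bincluding:(.*\\n[*•]).*?\\n(?![*•])")
m = pattern.search(text)

full_match = m.group(0)  # from "including:" through the "\n" after "guidance."
group_1 = m.group(1)     # ends at the last bullet marker

print(repr(group_1))
print(repr(full_match))
```

Because the greedy .* backtracks only as far as the last "\n*" or "\n•", group 1 ends right at that marker. The text of the last bullet is part of the full match, not of the group.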

Regex demo

Or, when \\n is a real newline rather than a literal backslash followed by n

(?s)\bincluding:(.*\n[*•]).*?\n(?![*•])

Regex demo

For example

df["Skills"] = df["Job description"].str.extract(r"(?s)\bincluding:(.*\n[*•]).*?\n(?![*•])")
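A minimal sketch of how this might behave on a small DataFrame, assuming the data contains real newlines. The sample row is made up for illustration. Note that str.extract returns only what the capture group holds, so with the pattern as written the result ends at the last bullet marker. The second pattern is a variant of my own, not part of the answer: it moves the lazy .*? inside the group so that the last bullet's text (up to "guidance.") is captured too.

```python
import pandas as pd

# Hypothetical sample data with real newlines
df = pd.DataFrame({
    "Job description": [
        "We need skills including:\n* Python\n* SQL\n"
        "• Clear communication and guidance.\nApply now.\n"
    ]
})

# Pattern from the answer: group 1 ends at the last "\n*" or "\n•"
marker_only = df["Job description"].str.extract(
    r"(?s)\bincluding:(.*\n[*•]).*?\n(?![*•])", expand=False
)

# Variant (an assumption, not the answer's pattern): the lazy part is
# inside the group, so the text of the last bullet is captured as well
df["Skills"] = df["Job description"].str.extract(
    r"(?s)\bincluding:(.*\n[*•].*?)\n(?![*•])", expand=False
)

print(repr(marker_only.iloc[0]))
print(repr(df["Skills"].iloc[0]))
```

Passing expand=False makes str.extract return a Series for a single capture group, which assigns cleanly to one column.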


Answer 1
1 Accepted 2020-12-10 15:54:19