What query should I write for regex to capture the indicated paragraph formats and skip the rest?
I am trying to write a regex query that captures either form of the following paragraphs, from 'DIAGNOSIS' up to (but not including) 'Board of pathologists', and ignores the rest. What is a good regex query for this?
(The "" marks indicate the beginning and end of each paragraph and are not part of the wanted string.)
("THIS IS DIAGNOSIS..." and "diagnosis result" are sample texts for the sake of the question; the real data contains different content in their place.)
Paragraph format 1:
"
DIAGNOSIS:
A- THIS IS THE DIAGNOSIS, NO.1:
B- THIS IS THE DIAGNOSIS, NO.2:
Board of pathologists: . . . . .
"
Paragraph format 2:
"
DIAGNOSIS:
THIS IS THE DIAGNOSIS:
Board of pathologists:
. . . . . .
"
I used "DIAGNOSIS:(\\s*)((\\w*.\\s*)*)". I know that this captures almost anything after DIAGNOSIS, and my output shows that :) I couldn't find a better solution to capture the paragraphs.
You could match ^DIAGNOSIS: from the start of the string.
Then you could repeatedly match the following lines that do not start with Board of pathologists: by using a negative lookahead: (?:(?!Board of pathologists:).*\r?\n)*
The full pattern:
^DIAGNOSIS:\s*(?:\r?\n)(?:(?!Board of pathologists:).*\r?\n)*
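As a rough illustration, the pattern above can be tried in Python against both paragraph formats from the question. This is only a sketch under a few assumptions: the question does not say which language or regex engine is in use, the helper name extract_diagnosis and the sample strings are made up for the example, and re.MULTILINE is added so that ^ also matches when DIAGNOSIS: is not the very first line of the text.

```python
import re

# Pattern from the answer. re.MULTILINE is an assumption added here so that
# ^ matches at the start of any line, not only at the start of the string.
DIAGNOSIS_RE = re.compile(
    r"^DIAGNOSIS:\s*(?:\r?\n)(?:(?!Board of pathologists:).*\r?\n)*",
    re.MULTILINE,
)


def extract_diagnosis(text):
    """Return the DIAGNOSIS block without its trailing newline, or None."""
    match = DIAGNOSIS_RE.search(text)
    return match.group(0).rstrip("\r\n") if match else None


# Sample inputs modelled on the two formats in the question (hypothetical data).
format_1 = (
    "Some header text\n"
    "DIAGNOSIS:\n"
    "A- THIS IS THE DIAGNOSIS, NO.1:\n"
    "B- THIS IS THE DIAGNOSIS, NO.2:\n"
    "Board of pathologists: . . . . .\n"
)
format_2 = (
    "DIAGNOSIS:\n"
    "THIS IS THE DIAGNOSIS:\n"
    "Board of pathologists:\n"
    ". . . . . .\n"
)

print(extract_diagnosis(format_1))
print(extract_diagnosis(format_2))
```

The lookahead is checked at the start of every line, so the match stops at the first line beginning with Board of pathologists:, whether the dots follow on the same line or on the next one. If that line can be indented in the real data, the lookahead would need to allow leading whitespace as well.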