
What query should I write for regex to capture the indicated paragraph formats and skip the rest?

I am trying to write a regex that captures either form of the following paragraphs, from 'DIAGNOSIS' up to (but not including) 'Board of pathologists', and ignores the rest. What is a good regex for this?

(The quotation marks indicate the beginning and end of each paragraph and are not part of the wanted string.)

("THIS IS THE DIAGNOSIS..." and "diagnosis result" are sample texts for the sake of the question; the real data contains different content.)

Paragraph format 1:

"

DIAGNOSIS:

A- THIS IS THE DIAGNOSIS, NO.1:

  • diagnosis results

B- THIS IS THE DIAGNOSIS, NO.2:

  • diagnosis result
  • another result

Board of pathologists: . . . . .

"

Paragraph format 2:

"

DIAGNOSIS:

THIS IS THE DIAGNOSIS:

  • diagnosis results

Board of pathologists:
. . . . . .

"

I used "DIAGNOSIS:(\\s*)((\\w*.\\s*)*)". I know that this captures almost anything after DIAGNOSIS, and my output confirms it :) I couldn't find a better way to capture just these paragraphs.

You could match ^DIAGNOSIS: from the start of the string.

Then you could repeatedly match the following lines that do not start with Board of pathologists: using a negative lookahead: (?:(?!Board of pathologists:).*\r?\n)*

^DIAGNOSIS:\s*(?:\r?\n)(?:(?!Board of pathologists:).*\r?\n)*

Regex demo
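The question does not say which language is used, so here is a minimal sketch in Python as an assumption. It applies the pattern above with two additions that are not in the original answer: a capture group around the body lines, so the DIAGNOSIS: header can be dropped, and the re.MULTILINE flag, so that ^ also matches when DIAGNOSIS: is not the very first line of the text. The function name extract_diagnosis and the sample report are hypothetical.

```python
import re
from typing import Optional

# Pattern from the answer, plus one capture group around the body lines
# (the group and the MULTILINE flag are additions for this sketch).
PATTERN = re.compile(
    r"^DIAGNOSIS:\s*(?:\r?\n)"                    # the header line
    r"((?:(?!Board of pathologists:).*\r?\n)*)",  # lines until the stop marker
    re.MULTILINE,
)


def extract_diagnosis(report: str) -> Optional[str]:
    """Return the text between the DIAGNOSIS: header and the
    'Board of pathologists:' line, or None if there is no header."""
    match = PATTERN.search(report)
    if match is None:
        return None
    # Trim the blank lines that surround the captured body.
    return match.group(1).strip()


# Hypothetical report in "paragraph format 2" with an extra leading line.
sample = (
    "Some unrelated header line\n"
    "DIAGNOSIS:\n"
    "\n"
    "THIS IS THE DIAGNOSIS:\n"
    "\n"
    "  • diagnosis results\n"
    "\n"
    "Board of pathologists:\n"
    ". . . . . .\n"
)

print(extract_diagnosis(sample))
```

The lookahead is only tested at the start of each line, so a line that merely mentions the board somewhere in the middle does not stop the match. If the data has no trailing newline after the last body line and no Board of pathologists: line, that final line is not captured by this pattern.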
