Regex for extracting words before punctuation
I'm trying to extract phrases that occur before a punctuation mark and that consist of capitalized words.
Abstract Algebra. the area of modern mathematics that considers algebraic structures to be sets with operations defined on them, and extends algebraic concepts usually associated with the real number system to other more general systems, such as groups, rings, fields, modules and vector spaces.
Algebra. a branch of mathematics that uses symbols or letters to represent variables, values or numbers, which can then be used to express operations and relationships and to solve equations.
Algebraic Expression. a combination of numbers and letters equivalent to a phrase in language, eg x2 + 3x - 4.
Analytic (Cartesian) Geometry: the study of geometry using a coordinate system and the principles of algebra and analysis, thus defining geometrical shapes in a numerical way and extracting numerical information from that representation.
Inductive reasoning or logic: a type of reasoning that involves moving from a set of specific facts to a general conclusion, indicating some degree of support for the conclusion without actually ensuring its truth.
Currently I'm using the following regex:
(([? ])([A-Z][a-z\s]+)?([A-Z][a-z\s]+?[.:]))
I have two issues with this.
One reason the pattern does not match more than one word in the current data is that it starts with [? ], which matches either a space or a question mark. A phrase at the start of a line has neither in front of it.
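To make the effect of the leading [? ] concrete, here is a minimal Python sketch that runs the pattern from the question against shortened sample strings. The helper name first_match and the shortened strings are assumptions made for illustration; they are not part of the original post.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(([? ])([A-Z][a-z\s]+)?([A-Z][a-z\s]+?[.:]))")


def first_match(text):
    """Return the first full match of ORIGINAL in text, or None (hypothetical helper)."""
    m = ORIGINAL.search(text)
    return m.group(0) if m else None


# No space or ? before "Abstract", so the match can only start at the
# space in front of "Algebra".
print(repr(first_match("Abstract Algebra. the area of modern mathematics")))
# ' Algebra.'

# A one-word term at the start of the line is not matched at all.
print(repr(first_match("Algebra. a branch of mathematics")))
# None

# With a leading space, both words are matched.
print(repr(first_match(" Abstract Algebra. the area of modern mathematics")))
# ' Abstract Algebra.'
```

The third call suggests that the rest of the pattern can span two words; it is the required leading character that gets in the way.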
You might also omit some of the capturing groups and use a single one. Note that you don't have to make the match [a-z\s]+?[.:] non-greedy using a ?, because the character class does not contain a . or a :, so the greedy quantifier stops before the punctuation anyway.
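As a quick check of that point, the sketch below compares the lazy and the greedy form of the sub-pattern on a few shortened sample strings. The names LAZY, GREEDY and samples are illustrative assumptions; only the sub-pattern itself comes from the question.

```python
import re

# Same sub-pattern with and without the lazy quantifier.
LAZY = re.compile(r"[A-Z][a-z\s]+?[.:]")
GREEDY = re.compile(r"[A-Z][a-z\s]+[.:]")

# Shortened sample lines, assumed for illustration.
samples = [
    "Abstract Algebra. the area of modern mathematics",
    "Algebra. a branch of mathematics",
    "Analytic (Cartesian) Geometry: the study of geometry",
]

# Because [a-z\s] cannot consume . or :, both forms stop at the same place.
results_match = all(LAZY.findall(s) == GREEDY.findall(s) for s in samples)
print(results_match)  # True
```

This only shows agreement on these inputs; the general argument is the one given above, that the character class can never run past a . or :.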
To get the capitalized words followed by either . or : you could use:
\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)[.:]
Explanation
\b    Word boundary
(    Capture group 1
[A-Z][a-z]+    Match an uppercase letter A-Z, then 1+ lowercase letters a-z
(?:\s+[A-Z][a-z]+)*    Repeat 0+ times: 1+ whitespace characters, then A-Z and 1+ times a-z
)    Close group
[.:]    Match either . or :
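A minimal Python sketch of how the suggested pattern behaves on the sample data follows. The strings are shortened versions of the question's lines and assume the punctuation directly follows the term with no space in between; the names PATTERN and lines are illustrative.

```python
import re

# The suggested pattern; group 1 holds the capitalized phrase.
PATTERN = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)[.:]")

# Shortened sample lines, assumed for illustration.
lines = [
    "Abstract Algebra. the area of modern mathematics",
    "Algebra. a branch of mathematics",
    "Algebraic Expression. a combination of numbers and letters",
    "Analytic (Cartesian) Geometry: the study of geometry",
    "Inductive reasoning or logic: a type of reasoning",
]

for line in lines:
    # With a single capture group, findall returns the group contents.
    print(PATTERN.findall(line))
# ['Abstract Algebra']
# ['Algebra']
# ['Algebraic Expression']
# ['Geometry']
# []
```

The last two lines are worth noting: the parenthesized word interrupts the phrase, so only Geometry is captured, and "Inductive reasoning or logic" yields nothing because the words before the colon are not capitalized.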
If you also want to match words surrounded by ( and ), you might use an alternation.
\b((?:\([A-Z][a-z]+\)|[A-Z][a-z]+)(?:\s+(?:\([A-Z][a-z]+\)|[A-Z][a-z]+))*)[.:]
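The alternation variant can be tried the same way. This is a sketch under the same assumptions as before (shortened sample strings, illustrative names such as WITH_PARENS); it also shows one limitation that follows from the leading \b.

```python
import re

# The alternation pattern: each word may be plain or wrapped in parentheses.
WITH_PARENS = re.compile(
    r"\b((?:\([A-Z][a-z]+\)|[A-Z][a-z]+)"
    r"(?:\s+(?:\([A-Z][a-z]+\)|[A-Z][a-z]+))*)[.:]"
)

print(WITH_PARENS.findall("Analytic (Cartesian) Geometry: the study of geometry"))
# ['Analytic (Cartesian) Geometry']

print(WITH_PARENS.findall("Abstract Algebra. the area of modern mathematics"))
# ['Abstract Algebra']

# There is no word boundary between the start of the string and "(",
# so a phrase that begins with a parenthesized word loses that word.
print(WITH_PARENS.findall("(Cartesian) Geometry: the study of geometry"))
# ['Geometry']
```

If phrases that start with a parenthesized word matter for your data, the leading \b would need to be replaced with a different anchor; which one is appropriate depends on the surrounding text.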