
How to find a number under a certain word using regex?

I'm trying to get the number 811.00 when it's placed under the word Size.

I know how to get the number when it's NEAR some word, like "Jerusalem" in this case.
But here I'm trying to get the number when it's under the word Size.

Property Size
Jerusalem 811.00
A new property agreement

Thanks. I couldn't find any solution for this.

This can be accomplished by a technique introduced in vertical regex matching, and it requires a regex flavor that supports possessive quantifiers and forward references, such as PCRE or Java.

I don't know if it's worth the effort, but it's certainly an interesting regex task. I found the biggest challenge was keeping the start of the number below within the boundaries of the word above, to the left and right. In the following pattern I tried to capture only full numbers and prevent any partial matching.

^(?:.(?=.*\n(\1?+.)))*?(?=Size)(?:\w\B(?=.*\n\1?+(\2?+\D)))*+.*\n\1?+\2?+(?<![\d.])([\d.]+)
The parts of the regex, explained:

^(?:.(?=.*\n(\1?+.)))*?(?=Size)
    Captures the substring of the line below, up to the column where the word above starts, into $1. The first group grows by one character at each repetition.

(?:\w\B(?=.*\n\1?+(\2?+\D)))*+
    Captures any non-digits in the line below, up to the length of the word above, into $2. \B (non-word boundary) prevents skipping over the margin.

.*\n\1?+\2?+(?<![\d.])([\d.]+)
    Consumes what was captured and captures the number into $3. The negative lookbehind prevents matching numbers partially.

See this demo at regex101 or a PHP demo at tio.run. The number will be found in the third group.

It also works with .NET by replacing the possessive quantifiers with atomic groups (C# demo).
In Notepad++, ([\d.]+) can be replaced with \K[\d.]+ to reset the start of the match so that only the number is found.


More about how it works can be found in this answer about matching a letter below another.
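For engines without possessive quantifiers or forward references, the alignment rule that the pattern enforces can be approximated procedurally. The following is only a sketch in Python, not the regex technique itself: the function name `number_below` and the `NUMBER` pattern are made up for illustration, and it assumes the text is aligned with plain spaces (no tabs). It accepts a number only if it starts in a column covered by the word above, and it reuses the negative lookbehind idea so that the tail of a longer number is never returned.

```python
import re

# A decimal number that is not the tail of a longer number.
# The lookbehind mirrors (?<![\d.]) from the pattern above.
NUMBER = re.compile(r"(?<![\d.])\d+(?:\.\d+)?")

def number_below(text, word):
    """Return the number that starts under `word` on the next line, or None.

    Assumption: columns are aligned with spaces, so character offsets
    in consecutive lines are directly comparable.
    """
    lines = text.splitlines()
    for above, below in zip(lines, lines[1:]):
        col = above.find(word)  # plain substring search, not word-bounded
        if col == -1:
            continue
        for m in NUMBER.finditer(below):
            # Accept only numbers that begin within the word's columns.
            if col <= m.start() < col + len(word):
                return m.group()
    return None

sample = "Property Size\nJerusalem 811.00\nA new property agreement"
print(number_below(sample, "Size"))  # 811.00
```

A number that begins to the left of the word, such as 12811.00 starting two columns before Size, is rejected as a whole instead of being cut down to the part under the word.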

One solution would be to find the index of 'Size' within the header row and then use that index to extract the value under 'Size':

(?<=(\w\s){1}?)(\d+.\d+)

In the example you provided, 'Size' is the second attribute in the row, so one word and a space precede the value you want, (\w\s){1}; we also know that the value is a decimal, (\d+.\d+). With three attributes, you would replace the 1 with a 2, and so on.

Note: this solution assumes that every value under each attribute is a single word.
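The index-based idea can also be sketched without a regex. This is a minimal illustration in Python under the same assumption as the note above (every header name and every value is a single whitespace-separated word); the function name `value_under` is hypothetical.

```python
def value_under(text, word):
    """Return the value in the same word position as `word` on the next line.

    Assumption: every attribute name and every value is a single word,
    so splitting on whitespace yields matching positions.
    """
    lines = text.splitlines()
    for header, below in zip(lines, lines[1:]):
        names = header.split()
        if word in names:
            idx = names.index(word)   # e.g. 1 for 'Size' in 'Property Size'
            values = below.split()
            return values[idx] if idx < len(values) else None
    return None

sample = "Property Size\nJerusalem 811.00\nA new property agreement"
print(value_under(sample, "Size"))  # 811.00
```

If a value contains spaces (for example a two-word city name), the positions no longer line up and this sketch returns the wrong token, which is the limitation the note describes.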


 