
Regex to find string between two strings, excluding outer strings

I know this has been asked a thousand times before, but I could not get any of the previous solutions to work for my case. I'm trying to use a regex in JavaScript to parse a text file. The bit I'm trying to extract is the monetary figure, in a format like 55,555.00. The number of digits can vary throughout the text file. Additionally, the boundary characters and spaces can vary.

I wrote the following to extract what I need from the sample text below:

/((\w\s{10,20})([0-9]{8,}(?=.*[,.]))/g

Sample text:

                  23205        - Grants Current-County Operatin                        4,425,327.00"

"    4   0000047387         Central Equatoria State          1003-1478 Sta Hosp Oper Oct                   85,784.00"
"    4   0000047442         EASTERN EQUATORIA ST             1003-1479 Sta Hosp Oper Oct                   93,137.00"
"    4   0000047485         JONGLEI STATE                    1003-1519 Sta Hosp Oper Oct                  144,608.00"
"    4   0000047501         Lakes State                      1003-1482 Sta Hosp Oper Oct                   93,137.00"
"    4   0000047528         Unity State                      1003-1484 Sta Hosp Oper Oct                   75,980.00"
"    4   0000047532         Northern Bahr-el State           1003-1483 Sta Hosp Oper Oct                   58,824.00"
"    4   0000047615         Western E State                  1003-1488 Sta Hosp Oper Oct                   93,137.00"
"    4   0000047638         Warap State                      1003-1486 Sta Hosp Oper Oct                   51,471.00"
"    4   0000047680         Upper Nile State                 1003-1485 Sta Hosp Oper Oct                  102,941.00"
"    4   0000047703         Western BG State                 1003-1487 Sta Hosp Oper Oct                   34,314.00"
                                                                                             ----------------------
"        Total For Period          4                                                                      833,333.00"
 ----------------------------------------------------------------------------------------------------------------------------
 Fiscal Year        2015/16                               Republic Of South Sudan                         Date     2015/11/20
 Period                   5                                                                               Time       12:58:40
                                                  FreeBalance Financial Management System                 Page              7
 ----------------------------------------------------------------------------------------------------------------------------
                                                            Vendor Analysis Report

                                                              1091 Health (MOH)
  Prd   Voucher #          Vendor Name                      Description                          Amount
  ---   ----------------   ------------------------------   -----------------------------    ----------------------
                                                                                             ----------------------
"  

Here's an example: https://regex101.com/r/nO8nM1/4

The issue is the leading boundary. I am able to exclude the closing boundary (double quotes), but I can't get rid of the leading boundary. I've gotten a couple of things sort of working, but they included the two figures outside the main table (in this case 4,425,327.00 and 833,333.00).

Any help would be much appreciated.

To match float values with an obligatory decimal fraction and a comma (,) as the digit grouping symbol, you can use

\d+(?:,\d{3})*\.\d+

See demo

Explanation:

  • \d+ - 1 or more digits
  • (?:,\d{3})* - 0 or more sequences of
    • , - a comma
    • \d{3} - exactly 3 digits
  • \. - a literal period/dot
  • \d+ - 1 or more digits.
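As a quick check of the breakdown above, here is a minimal sketch that runs the pattern on its own. The `sampleText` string and the variable names are made up for illustration; the string is assembled from figures in the question, not copied from the file.

```javascript
// The number pattern on its own: digits, optional ",ddd" groups,
// then a mandatory "." followed by more digits.
const amountPattern = /\d+(?:,\d{3})*\.\d+/g;

// Hypothetical one-line sample built from values that appear in the question.
const sampleText =
  '4   0000047387   Sta Hosp Oper Oct   85,784.00"   Total For Period   4   833,333.00"';

// With the g flag, String.prototype.match returns every full match.
const amounts = sampleText.match(amountPattern);

// The voucher number and the lone "4" are skipped because they have no decimal part.
console.log(amounts);
```

Note that this pattern alone still picks up the total (833,333.00), so it needs a leading context to restrict it to the table rows.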

To only get the values that appear after Oct, you may use a regex that is a mix of the pattern above and yours:

\w\s{10,20}(\d+(?:,\d{3})*\.\d+)

See another demo

The \w\s{10,20} part matches a word character (\w: a letter, digit, or underscore) followed by 10 to 20 whitespace characters. Only after that does the pattern match the float value and capture it into Group 1, so the leading boundary is part of the overall match but not of the captured group.
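To make the difference between the whole match and Group 1 concrete, here is a small sketch. The `lines` array is a shortened stand-in for the report, and the gap widths (24, 19, 18, 70 spaces) are my approximations of the column spacing in the sample, not exact counts.

```javascript
// Leading context (\w + 10 to 20 whitespace chars) is matched but not captured;
// the amount itself is captured into group 1.
const rowAmount = /\w\s{10,20}(\d+(?:,\d{3})*\.\d+)/g;

// Shortened, hypothetical version of the report; gap widths are approximate.
const lines = [
  'Grants Current-County Operatin' + ' '.repeat(24) + '4,425,327.00"',
  'Sta Hosp Oper Oct' + ' '.repeat(19) + '85,784.00"',
  'Sta Hosp Oper Oct' + ' '.repeat(18) + '144,608.00"',
  'Total For Period          4' + ' '.repeat(70) + '833,333.00"',
];
const text = lines.join("\n");

const fullMatches = [];
const captured = [];
for (const m of text.matchAll(rowAmount)) {
  fullMatches.push(m[0]); // includes the "t" of "Oct" and the spaces
  captured.push(m[1]);    // only the amount
}

// Only the two table rows are captured; the rows with wider gaps are skipped.
console.log(captured);
```

The two figures outside the table are skipped only because their gaps are wider than 20 whitespace characters, so the {10,20} range may need adjusting if the column spacing differs in other files.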

See the JS snippet below (m[1] is where the float value resides):

    // Group 1 captures the amount; the leading \w and whitespace are matched but not captured.
    var re = /\w\s{10,20}(\d+(?:,\d{3})*\.\d+)/gm;

    // The sample text from the question, one array element per line.
    var str = [
      '                  23205        - Grants Current-County Operatin                        4,425,327.00"',
      '',
      '"    4   0000047387         Central Equatoria State          1003-1478 Sta Hosp Oper Oct                   85,784.00"',
      '"    4   0000047442         EASTERN EQUATORIA ST             1003-1479 Sta Hosp Oper Oct                   93,137.00"',
      '"    4   0000047485         JONGLEI STATE                    1003-1519 Sta Hosp Oper Oct                  144,608.00"',
      '"    4   0000047501         Lakes State                      1003-1482 Sta Hosp Oper Oct                   93,137.00"',
      '"    4   0000047528         Unity State                      1003-1484 Sta Hosp Oper Oct                   75,980.00"',
      '"    4   0000047532         Northern Bahr-el State           1003-1483 Sta Hosp Oper Oct                   58,824.00"',
      '"    4   0000047615         Western E State                  1003-1488 Sta Hosp Oper Oct                   93,137.00"',
      '"    4   0000047638         Warap State                      1003-1486 Sta Hosp Oper Oct                   51,471.00"',
      '"    4   0000047680         Upper Nile State                 1003-1485 Sta Hosp Oper Oct                  102,941.00"',
      '"    4   0000047703         Western BG State                 1003-1487 Sta Hosp Oper Oct                   34,314.00"',
      '                                                                                             ----------------------',
      '"        Total For Period          4                                                                      833,333.00"',
      ' ----------------------------------------------------------------------------------------------------------------------------',
      ' Fiscal Year        2015/16                               Republic Of South Sudan                         Date     2015/11/20',
      ' Period                   5                                                                               Time       12:58:40',
      '                                                  FreeBalance Financial Management System                 Page              7',
      ' ----------------------------------------------------------------------------------------------------------------------------',
      '                                                            Vendor Analysis Report',
      '',
      '                                                              1091 Health (MOH)',
      '  Prd   Voucher #          Vendor Name                      Description                          Amount',
      '  ---   ----------------   ------------------------------   -----------------------------    ----------------------',
      '                                                                                             ----------------------',
      '" '
    ].join('\n');

    // Walk through every match and print only the captured amount.
    var m;
    while ((m = re.exec(str)) !== null) {
      document.getElementById("r").innerHTML += m[1] + "<br/>";
    }

    <div id="r"></div>

