C# regular expression (Regex): matched group does not return the matched character

Question

The objective of my C# app is to extract two decimal values (latitude, longitude) from a text document. I tried to apply a regex pattern to pick up those numerals. It is an older app targeting .NET Framework 3.5.

using System.Text.RegularExpressions;

String BB = "<span style=\"font-family:&quot;Times&quot;,&quot;serif&quot;\">\r\n<i>Lat</i>: 29.48434, <i>Long</i>: -81.562445 <o:p></o:p></span></p>\r\n</td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<p class=\"MsoNormal\"><span style=\"font-family:&quot;Times&quot;,&quot;serif&quot;\"><o:p>&nbsp;</o:p></span></p>\r\n<table class=\"MsoNormalTable\" border=\"0\" cellpadding=\"0\">\r\n<tbody>\r\n<tr>\r\n<td style=\"padding:.75pt .75pt .75pt .75pt\">\r\n<p class=\"MsoNormal\"><b><span style=\"font-family:&quot;Times&quot;,&quot;serif&quot;\">Coordinates:</span></b><span style=\"font-family:&quot;Times&quot;,&quot;serif&quot;\">\r\n<i>Lat</i>: 29.48434, <i>Long</i>: -81.562445 <o:p></o:p></span></p>\r\n</td>";

string p2 = @".*Lat\D+(-*[0-9]+\.[0-9]+)\D+Lon\D+(-*[0-9]+\.[0-9]+)";

Console.WriteLine(p2);
foreach (Match collection in Regex.Matches(BB, p2)) {
    foreach ( Group gp in collection.Groups) {
        Console.WriteLine("Match group {0}", gp.Value);
    }
}

I expected Group[2] to include the '-' sign before 81.562445, but the sign is dropped even though the text matches the sub-pattern "(-*[0-9]+\.[0-9]+)". Is there anything I can do to make the group include the '-' sign?

Answer 1

Your pattern looks for non-digit characters (\D+) before the latitude and longitude values. The - is not a digit, so the greedy \D+ consumes it, and since -* may match zero characters, the group then captures only 81.562445. To make the non-digit match non-greedy (lazy), add a ? after the quantifier (\D+?). With -* also tightened to -? so that at most one sign is accepted, your final pattern becomes

string p2 = @".*Lat\D+?(-?[0-9]+\.[0-9]+)\D+Lon\D+?(-?[0-9]+\.[0-9]+)";
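To see the difference between the two patterns side by side, here is a minimal sketch. The class name LatLongDemo, the helper Longitude, and the shortened Sample string are hypothetical and introduced only for illustration; the two patterns are the ones from the question and from this answer. The code sticks to C# 3 syntax so it should compile against Framework 3.5.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class LatLongDemo
{
    // Pattern from the question: greedy \D+ followed by an optional run of '-'.
    public const string Greedy = @".*Lat\D+(-*[0-9]+\.[0-9]+)\D+Lon\D+(-*[0-9]+\.[0-9]+)";

    // Pattern from the answer: lazy \D+? followed by at most one '-'.
    public const string Lazy = @".*Lat\D+?(-?[0-9]+\.[0-9]+)\D+Lon\D+?(-?[0-9]+\.[0-9]+)";

    // Shortened, hypothetical stand-in for the HTML fragment in the question.
    public const string Sample = "<i>Lat</i>: 29.48434, <i>Long</i>: -81.562445 <o:p></o:p>";

    // Returns the text captured by group 2 (the longitude), or null if nothing matched.
    public static string Longitude(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(Longitude(Sample, Greedy)); // prints 81.562445 (sign eaten by \D+)
        Console.WriteLine(Longitude(Sample, Lazy));   // prints -81.562445

        // Parse with the invariant culture so '.' is always the decimal separator.
        double lon = double.Parse(Longitude(Sample, Lazy), CultureInfo.InvariantCulture);
        Console.WriteLine(lon < 0);                   // prints True
    }
}
```

The lazy quantifier stops as soon as the capture group can start matching, so the '-' is left for the group instead of being absorbed by \D+.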

As for the comment about parsing the HTML nodes instead of matching with a regex: this is generally the better approach, but in this case it doesn't gain you much, because the inner text of the relevant elements turns out to be

"\r\nLat: 29.48434, Long: -81.562445 "

and

"\r\n\r\n\r\n\r\nCoordinates:\r\nLat: 29.48434, Long: -81.562445 \r\n"

both of which require a similar amount of massaging to tease out the required data, likely with a regex anyway, unless the remaining content can be expected to match exactly.
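As a rough illustration of that point, the sketch below applies a regex to the two inner-text strings quoted above. The class InnerTextDemo, the method TryParse, and the pattern itself are assumptions made for this example, not part of the original answer. How the inner text is obtained from the HTML is left out, since no particular parser is named here.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class InnerTextDemo
{
    // Assumed pattern for the plain inner text (no tags left to skip over).
    static readonly Regex Coordinates = new Regex(
        @"Lat:\s*(-?[0-9]+\.[0-9]+),\s*Long:\s*(-?[0-9]+\.[0-9]+)");

    // Tries to read latitude and longitude from the inner text of a node.
    public static bool TryParse(string innerText, out double lat, out double lon)
    {
        lat = 0;
        lon = 0;
        Match m = Coordinates.Match(innerText);
        if (!m.Success) return false;

        // Invariant culture keeps '.' as the decimal separator on every machine.
        lat = double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        lon = double.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        return true;
    }

    static void Main()
    {
        // The two inner-text values quoted in the answer.
        string[] samples = {
            "\r\nLat: 29.48434, Long: -81.562445 ",
            "\r\n\r\n\r\n\r\nCoordinates:\r\nLat: 29.48434, Long: -81.562445 \r\n"
        };

        foreach (string s in samples)
        {
            double lat, lon;
            if (TryParse(s, out lat, out lon))
                Console.WriteLine("{0}, {1}",
                    lat.ToString(CultureInfo.InvariantCulture),
                    lon.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Even on the cleaned-up text a pattern is still doing the extraction, which is why parsing the nodes first saves little in this particular case.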

Solution 1
2 2017-09-16 02:24:23
