Filtering on full string match but not on substrings

I've got a long string of numbers and characters and I'd like to extract one substring from it. The thing I'm struggling with is that I need a full match on a certain value (starting with S), and the search must not match when that value is merely the prefix of another value.

Input:

S10     1+0000000297472+00EURS100    1+0000000297472+00EURS1023P  1+0000000816072+00EUR

The input is exactly like this.

Breakdown of input:

S10     1+0000000297472+00EUR
  • Every part starts with a tag S and ends with EUR
  • There are spaces in between because every part has a fixed length

=>

  • index 0 : tag 'S' with length 1
  • index 1 : code with length 7
  • index 8 : numbertype with length 1
  • index 9 : sign with length 1
  • index 10 : value with length 13
  • index 23 : sign with length 1
  • index 24 : exponent with length 2
  • index 26 : unit with length 3
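Because every part has the same layout (1 + 7 + 1 + 1 + 13 + 1 + 2 + 3 = 29 characters), the code can also be compared exactly by slicing the string instead of using a regex. The following is only a sketch under the assumption that the parts are concatenated with no separators, as in the input above; the names FixedWidthRecords and TryFindRecord are made up for illustration.

```csharp
using System;

public static class FixedWidthRecords
{
    // Layout taken from the breakdown above: 1 + 7 + 1 + 1 + 13 + 1 + 2 + 3 = 29.
    // Assumption: records follow each other directly, with no separators.
    private const int RecordLength = 29;
    private const int CodeStart = 1;   // the code begins right after the 'S' tag
    private const int CodeLength = 7;

    // Looks for the record whose code field equals `code` exactly.
    // On success `record` holds the full 29-character part; otherwise it is empty.
    public static bool TryFindRecord(string values, string code, out string record)
    {
        for (int start = 0; start + RecordLength <= values.Length; start += RecordLength)
        {
            // The code field is right-padded with spaces, so trim before comparing.
            string recordCode = values.Substring(start + CodeStart, CodeLength).TrimEnd();
            if (string.Equals(recordCode, code, StringComparison.Ordinal))
            {
                record = values.Substring(start, RecordLength);
                return true;
            }
        }

        record = string.Empty;
        return false;
    }
}
```

Calling FixedWidthRecords.TryFindRecord(values, "10", out var record) on the input above would select only the S10 part, since "100" and "1023P" are different codes once the padding is trimmed.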

I need to match on, for example, S10, and I only want this substring up to and including EUR. I don't want it to match on S100 or S1023P or any other combination, only on exactly S10.

Output:

S10     1+0000000297472+00EUR

I'm trying to use Regex to find my match on 'S + code'. I do a full match on my search query and then reject the match as soon as anything else follows the code. But doing it like this also discards the actual match, because after S10 the value follows, and that is caught by ([^\d|^\D])+\w as well.

 foreach (var field in fieldList)
 {
     var query = "S" + field.BallanceCode;                                
     var index = Regex.Match(values, Regex.Escape(query) + @"([^\d|^\D])+\w").Index;
 }
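One reason the attempt above never finds anything: the class [^\d|^\D] excludes every digit (\d) and every non-digit (\D), so no character can satisfy it. A second pitfall is reading .Index without checking .Success, because a failed match reports index 0. The sketch below only illustrates those two points; the class name NegatedClassDemo is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // The suffix from the attempt above. Inside the brackets, '|' and the second '^'
    // are literal characters; together \d and \D already cover every character,
    // so the negated class cannot match anything.
    public const string Suffix = @"([^\d|^\D])+\w";

    public static Match Find(string values, string query)
    {
        return Regex.Match(values, Regex.Escape(query) + Suffix);
    }
}
```

For any input, NegatedClassDemo.Find(values, "S10").Success is false, so the Index of that result should not be used as a position.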

For example, when looking for S10

needs to match:

S10     1+0000000297472+00EUR

must not match:

S10/15  1+0000001748447+00EUR 
S1023P  1+0000000816072+00EUR
S10000001+0000000546546+00EUR

Update:

Using this code

var index = Regex.Match(values, Regex.Escape(query) + @"\p{Zs}.*?EUR").Index; 

will yield S10, S10/15, etc. when looked for. However, looking for S1000000 in the string doesn't work, because there is no whitespace between the code and 1+

S10000001+0000000546546+00EUR

For example, when looking for S1000000

needs to match:

S10000001+0000000297472+00EUR

must not match:

S10     1+0000001748447+00EUR 
S1023P  1+0000000816072+00EUR
S10/15  1+0000000546546+00EUR

You can use a regex that requires a space to appear right after the field.BallanceCode, unless the code already fills the whole 7-character field:

var index = Regex.Match(values, Regex.Escape(query) + (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") + ".*?EUR").Index;

The regex will match the S10, then one space separator character (\p{Zs}), then any 0 or more characters other than a newline (as few as possible due to *?) up to the first EUR.

The (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") check is necessary to support a 7-character BallanceCode. If the code has 7 characters or more, it fills the whole code field, so we do not check for a space after it. If the length is less than 7, the field is padded, so we require a space.
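To get the matched text rather than only its index, the same pattern can be wrapped in a helper that also checks Success. This is a sketch, not a definitive implementation: CodeMatcher and TryExtract are hypothetical names, and the plain code string stands in for field.BallanceCode.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeMatcher
{
    // Width of the code field, taken from the layout in the question.
    private const int CodeFieldLength = 7;

    // Builds the pattern described above and returns the matched part in `record`.
    // `record` is empty when nothing matches.
    public static bool TryExtract(string values, string code, out string record)
    {
        string pattern = Regex.Escape("S" + code)
            // Shorter codes are padded, so a space must follow them directly.
            + (code.Length < CodeFieldLength ? @"\p{Zs}" : "")
            // Lazily consume the rest of the part up to the first EUR.
            + ".*?EUR";

        Match match = Regex.Match(values, pattern);
        record = match.Success ? match.Value : string.Empty;
        return match.Success;
    }
}
```

Note that the pattern is not anchored to the start of a part, so it relies on "S" plus the code not occurring anywhere else inside a part.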

So you just want the start (S...) and end (...EUR) of each line and skip everything in between?

^([sS]\d+).*?([\d\+]+EUR)$

http://regexr.com/3c1ob
