How to account for special non-ASCII characters in regex

I don't know if this is the issue, but I can't seem to get this to match.

String[] seTab3_HighRes = null;

public Map<String, String> tab3HighResRegex(String x, Map<String, String> map) {
    Pattern Tab3_HighRes_pattern = Pattern.compile("High Resolution Parameters:(.*?Intrabolus pressure)", Pattern.DOTALL);
    Matcher matcherTab3_HighRes_pattern = Tab3_HighRes_pattern.matcher(x);

    while (matcherTab3_HighRes_pattern.find()) {
        System.out.println("Anything here? Nope");
        // Split the captured block into lines
        seTab3_HighRes = matcherTab3_HighRes_pattern.group(1).split("\\n|\\r");
    }
    return map; // the map is returned unchanged in this snippet
}

The text is:

 High Resolution Parameters:
    Intrabolus pressure (@LESR)(mmHg):-3.7 <8.4
    Some other stff: 123
    Intrabolus pressure (avg max)(mmHg):8.3 <17.0

I looked a bit more into the text and noticed there's a ^G character at the end of High Resolution Parameters: when I paste the text into TextPad. What is it, and is that the reason I'm not getting a match? And how do I get rid of it?

Description

You could simply match the ^G (Control-G, the ASCII BEL control character, code 7) with \cG, written as \\cG inside a Java string literal.

This regex does the following:

  • Matches the High Resolution Parameters: heading
  • Finds the first Intrabolus pressure entry
  • Captures the substring after the colon that follows Intrabolus pressure ...

The regex

High\sResolution\sParameters:(?:\cG|[\n\r\s])*(?:Intrabolus\spressure)[^:]*:([^\n]*)

Example

https://regex101.com/r/pE5aI0/1

Explanation

  • Capture Group 0 gets the entire matched string
  • Capture Group 1 gets the Intrabolus pressure value
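
As a rough illustration of how the regex and its capture groups might be used from Java, here is a minimal sketch. The class name IntrabolusDemo, the helper firstIntrabolusValue, and the sample string are hypothetical; the sample assumes the ^G sits directly after the colon, represented here as \u0007. Every backslash in the regex is doubled because it lives in a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntrabolusDemo {
    // The regex from this answer, with backslashes doubled for a Java string literal.
    static final Pattern HIGH_RES = Pattern.compile(
        "High\\sResolution\\sParameters:(?:\\cG|[\\n\\r\\s])*(?:Intrabolus\\spressure)[^:]*:([^\\n]*)");

    // Hypothetical helper: returns capture group 1 of the first match, or null if nothing matches.
    static String firstIntrabolusValue(String text) {
        Matcher m = HIGH_RES.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed sample input: \u0007 stands in for the ^G seen in the editor.
        String text = " High Resolution Parameters:\u0007\n"
            + "    Intrabolus pressure (@LESR)(mmHg):-3.7 <8.4\n"
            + "    Some other stff: 123\n"
            + "    Intrabolus pressure (avg max)(mmHg):8.3 <17.0\n";
        System.out.println(firstIntrabolusValue(text));
    }
}
```

Because the pattern stops at the first Intrabolus pressure line, only that line's value lands in group 1; the later "avg max" line is not captured by this match.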

Expanded 扩展

NODE                     EXPLANATION
----------------------------------------------------------------------
  High                     'High'
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  Resolution               'Resolution'
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  Parameters:              'Parameters:'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \cG                      ^G
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [\n\r\s]                 any character of: '\n' (newline), '\r'
                             (carriage return), whitespace (\n, \r,
                             \t, \f, and " ")
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    Intrabolus               'Intrabolus'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    pressure                 'pressure'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  [^:]*                    any character except: ':' (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^\n]*                   any character except: '\n' (newline) (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
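
On the "how to get rid of it" part of the question, an alternative to matching the control character is to strip it before running the regex. The sketch below is one possible way to do that, not something taken from the original answer: StripBellDemo, stripBell, and valueAfterHeading are hypothetical names, and the simplified pattern is an assumption chosen for the demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripBellDemo {
    // Simplified pattern that does not mention \cG at all (assumed for this sketch).
    static final Pattern PLAIN = Pattern.compile(
        "High Resolution Parameters:\\s*Intrabolus pressure[^:]*:([^\\n]*)");

    // Removes every BEL character (U+0007, displayed as ^G); all other characters are kept.
    static String stripBell(String text) {
        return text.replace("\u0007", "");
    }

    // Returns capture group 1 of the first match, or null if the pattern does not match.
    static String valueAfterHeading(String text) {
        Matcher m = PLAIN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String raw = "High Resolution Parameters:\u0007\n"
            + "    Intrabolus pressure (@LESR)(mmHg):-3.7 <8.4\n";
        // BEL is not whitespace, so \s* cannot skip it and the raw text does not match.
        System.out.println(valueAfterHeading(raw));
        // After stripping, the simplified pattern matches.
        System.out.println(valueAfterHeading(stripBell(raw)));
    }
}
```

Stripping first keeps the pattern simpler, at the cost of an extra pass over the input; matching \cG in the regex leaves the input untouched.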