Regex to parse at least 20 char text without spaces from string

I need to parse a code out of a hand-entered string field in a SQL query for Oracle DB. The field can look something like this:

"i would ?! like / to make * bb a8 001/XYZ/0002/65432178 thank you very much friends"

The number of words differs completely from row to row; the only constant is that the words are separated by spaces. Somewhere in the middle of the string (but possibly at the very beginning or very end) is a CODE of variable length (always at least 20 characters) that I need to parse out. The code never contains spaces and is separated from the rest of the random text by spaces. I need to extract just the code and drop all the other words. So, in my opinion, the only way to identify the code is that it should be a sequence of at least 20 characters without a space. Can you recommend a regex to do this kind of thing? Thank you very much.

So I expect to get a string like this: "001/XYZ/0002/65432178"

You can just look for a sequence of 20 or more characters, each of which is anything except a space:

select regexp_substr(
  'i would ?! like / to make * bb a8 001/XYZ/0002/65432178 thank you very much friends',
  '[^ ]{20,}') as result
from dual;

RESULT               
---------------------
001/XYZ/0002/65432178

[^ ] is a pattern that matches any single character except a space; {20,} means that it has to be repeated a minimum of 20 times, with no maximum.
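To illustrate how the pattern behaves at the edges of the string, here is a minimal sketch. The `inputs` subquery and its three sample strings are made up for this example. With this pattern the first two rows should return the code and the third should return NULL, because it contains no run of 20 or more non-space characters.

```sql
-- Hypothetical sample rows: code at the start, code at the end, and no code at all.
with inputs as (
  select '001/XYZ/0002/65432178 at the very beginning' as txt from dual union all
  select 'at the very end 001/XYZ/0002/65432178' as txt from dual union all
  select 'no token of 20 characters here' as txt from dual
)
select txt,
       -- First run of 20+ characters that are not a space, or NULL if none exists.
       regexp_substr(txt, '[^ ]{20,}') as result
from inputs;
```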

If you want to exclude any whitespace - in case, for instance, there's a tab instead of a space immediately before or after the part you want - you can use a character class instead:

regexp_substr(<your string>, '[^[:space:]]{20,}')
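As a rough comparison of the two patterns, the sketch below builds a made-up string in which the code is surrounded by tabs (`chr(9)`) rather than spaces. The `tabbed_input` name and the sample text are assumptions for illustration only. Since the string contains no spaces at all, the space-only pattern should return the whole string, tabs included, while the `[:space:]` version should return just the code.

```sql
-- Hypothetical input: tabs, not spaces, separate the code from the other words.
with tabbed_input as (
  select 'note:' || chr(9) || '001/XYZ/0002/65432178' || chr(9) || 'thanks' as txt
  from dual
)
select regexp_substr(txt, '[^ ]{20,}')         as space_only,      -- tabs count as ordinary characters
       regexp_substr(txt, '[^[:space:]]{20,}') as any_whitespace   -- tabs end the run
from tabbed_input;
```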

As @MTO points out, these will match the first run of 20 or more non-space characters within the value, and it's quite possible that your user-entered text contains long non-code values that you don't really want to see. It would be better if you could match on an expected pattern for the code.
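As a sketch of what matching on an expected pattern might look like, suppose (purely as an assumption, since the question does not define the code format) that a code is always digits/uppercase letters/digits/digits, as in the single example given. The `data_rows` name and sample strings are hypothetical. The sixth argument of REGEXP_SUBSTR selects the second parenthesised group, so the surrounding whitespace is not returned.

```sql
-- Assumed code format: <digits>/<uppercase letters>/<digits>/<digits>.
-- Adjust the pattern to whatever the real codes look like.
with data_rows as (
  select 'long words like floxinoxinihilipilification with your code 001/XYZ/0002/65432178' as value from dual union all
  select 'only antidisestablishmentarianism here' as value from dual
)
select value,
       regexp_substr(
         value,
         '(^|\s)([[:digit:]]+/[[:upper:]]+/[[:digit:]]+/[[:digit:]]+)(\s|$)',
         1, 1, null,
         2   -- return only the second group, i.e. the code itself
       ) as code
from data_rows;
```

Under that assumption the long words are ignored, and a row without a code yields NULL. Note that this pattern does not enforce the 20-character minimum; a LENGTH check on the result could be added if that matters.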

If you will never have any words of 20 or more characters then you can naively use:

SELECT REGEXP_SUBSTR( value, '\S{20,}' ) AS code,
       value
FROM   data d;

However, if you have words of 20 or more characters, such as:

CREATE TABLE data ( value ) AS
SELECT 'long words like floxinoxinihilipilification and antidisestablishmentarianism with your code 001/XYZ/0002/65432178' FROM DUAL UNION ALL
SELECT 'i would ?! like / to make * bb a8 001/XYZ/0002/65432178 thank you very much friends' FROM DUAL;

Then the above code outputs:

CODE                        | VALUE                                                                                                            
:-------------------------- | :----------------------------------------------------------------------------------------------------------------
floxinoxinihilipilification | long words like floxinoxinihilipilification and antidisestablishmentarianism with your code 001/XYZ/0002/65432178
001/XYZ/0002/65432178       | i would ?! like / to make * bb a8 001/XYZ/0002/65432178 thank you very much friends

Instead, you could try something like returning the word of at least 20 characters that also contains the greatest number of / characters:

SELECT ( SELECT MAX( REGEXP_SUBSTR( d.value, '\S{20,}', 1, LEVEL ) )
                  KEEP ( DENSE_RANK LAST ORDER BY REGEXP_COUNT( REGEXP_SUBSTR( d.value, '\S{20,}', 1, LEVEL ), '/' ) )
         FROM   DUAL
         -- Count with the same pattern used for extraction, so that LEVEL
         -- never runs past the last match and produces a NULL row.
         CONNECT BY LEVEL <= REGEXP_COUNT( d.value, '\S{20,}' )
       ) AS code,
       value
FROM   data d;

Which outputs:

CODE                  | VALUE                                                                                                            
:-------------------- | :----------------------------------------------------------------------------------------------------------------
001/XYZ/0002/65432178 | long words like floxinoxinihilipilification and antidisestablishmentarianism with your code 001/XYZ/0002/65432178
001/XYZ/0002/65432178 | i would ?! like / to make * bb a8 001/XYZ/0002/65432178 thank you very much friends
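To see what that query is ranking, a minimal sketch like the following lists every candidate word for a single string together with its / count. The `one_row` name is made up here, and the query deliberately works on one row only, which keeps the CONNECT BY simple. For this input it should list the two long words with 0 slashes each and the code with 3.

```sql
-- Hypothetical single-row input, taken from the sample data above.
with one_row as (
  select 'long words like floxinoxinihilipilification and antidisestablishmentarianism with your code 001/XYZ/0002/65432178' as value
  from dual
)
select level as occurrence,
       regexp_substr(value, '\S{20,}', 1, level)                    as candidate,
       regexp_count(regexp_substr(value, '\S{20,}', 1, level), '/') as slashes
from one_row
-- One generated row per run of 20+ non-whitespace characters.
connect by level <= regexp_count(value, '\S{20,}');
```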

db<>fiddle here
