
Regex for extracting part of a file path

I am using the regexp_extract function in Impala to find a folder name in a file path, but it doesn't give me the correct result.

I want to parse out "one" from this file path:

/this/one/path/to/hdfs

This is the regex which I used:

regexp_extract(filepath,'[/]+',0)

If we wish to capture the / itself, we might just try ([\\/]+). To extract one, other expressions are possible, such as:

(?:\/[a-z]+\/)(.+?)(?:\/.+)

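and our code might look like this (the index is 1 because non-capturing groups are not numbered, so one sits in the only capturing group):

regexp_extract(filepath, '(?:\/[a-z]+\/)(.+?)(?:\/.+)', 1)

or

regexp_extract(filepath, '(?:\/.+?\/)(.+?)(?:\/.+)', 1)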

Compartments

In this case, we match what comes before one without capturing it, using a non-capturing group:

(?:\/[a-z]+\/)

then we capture one using:

(.+?)

and finally we add a right boundary after one in another non-capturing group:

(?:\/.+)
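The three parts above can be put back together and checked outside Impala. This is only a sketch in Python: its re module is not the RE2 engine Impala uses, but this pattern relies only on features both engines share. The names LEFT, TARGET and RIGHT are made up for illustration.

```python
import re

# The three parts described above, joined into one pattern.
LEFT = r"(?:\/[a-z]+\/)"  # non-capturing: first folder plus its slashes
TARGET = r"(.+?)"         # the only capturing group: the folder we want (lazy)
RIGHT = r"(?:\/.+)"       # non-capturing: the next slash and the rest of the path

pattern = re.compile(LEFT + TARGET + RIGHT)
match = pattern.search("/this/one/path/to/hdfs")

group_count = pattern.groups  # non-capturing groups are not counted
folder = match.group(1)       # group 1 holds the captured folder
print(group_count, folder)
```

Since the pattern has a single capturing group, the folder name is read from group 1.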

RegEx Circuit

jex.im visualizes regular expressions.

Depending on which slash one comes after, we can modify our expression. For example, in this case, this expression should also work:

(?:\/.+?\/)(.+?)(?:\/.+)
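To see where the two expressions differ, here is a rough comparison in Python (again only a stand-in for Impala's RE2 engine, using features common to both). The helper second_folder and the path /Dir1/one/path/to/hdfs are made up for illustration. Neither expression is anchored, so when the first folder is not all lowercase letters, the [a-z]+ version slides forward and captures a later folder.

```python
import re

STRICT = re.compile(r"(?:\/[a-z]+\/)(.+?)(?:\/.+)")  # first folder: a-z only
LOOSE = re.compile(r"(?:\/.+?\/)(.+?)(?:\/.+)")      # first folder: any chars

def second_folder(pattern, path):
    """Hypothetical helper: return group 1 of the first match, or None."""
    m = pattern.search(path)
    return m.group(1) if m else None

plain = "/this/one/path/to/hdfs"
mixed = "/Dir1/one/path/to/hdfs"  # hypothetical path, first folder not a-z only

strict_plain = second_folder(STRICT, plain)
loose_plain = second_folder(LOOSE, plain)
strict_mixed = second_folder(STRICT, mixed)  # match starts at "/one/" instead
loose_mixed = second_folder(LOOSE, mixed)
print(strict_plain, loose_plain, strict_mixed, loose_mixed)
```

On the original path both give one. On the mixed-case path the [a-z]+ version returns path, while the .+? version still returns one.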


Recent Impala versions use the RE2 regex library, and you can access capturing group values through the third argument of the regexp_extract function.

Use the following regex:

^/[^/]+/([^/]+)

See the regex demo (the Go regex flavor is also RE2, which is why that option is selected at regex101). The pattern matches:

  • ^ - start of string
  • / - a / char (no regex delimiters in Impala regex string, hence no need to escape / chars in the pattern)
  • [^/]+ - any 1 or more chars other than /
  • / - a / char
  • ([^/]+) - Capturing group 1 (to get it, the index argument must be set to 1 ): any 1 or more chars other than /

Code:

regexp_extract(filepath, '^/[^/]+/([^/]+)', 1)
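For a quick check outside Impala, the same pattern can be tried in Python. This is a sketch only: Python's re module is not RE2, though this pattern uses nothing that differs between the two, and extract is a made-up stand-in for regexp_extract, not Impala's implementation. It also shows why the attempt in the question returned a slash: index 0 asks for the whole match, and [/]+ matches only the first run of / chars.

```python
import re

def extract(path, pattern, group):
    """Hypothetical stand-in for regexp_extract: the requested group, or None."""
    m = re.search(pattern, path)
    return m.group(group) if m else None

path = "/this/one/path/to/hdfs"

whole_match = extract(path, r"[/]+", 0)               # the question's attempt
second = extract(path, r"^/[^/]+/([^/]+)", 1)         # second folder
short = extract("/this/one", r"^/[^/]+/([^/]+)", 1)   # nothing after the folder
relative = extract("this/one", r"^/[^/]+/([^/]+)", 1) # no leading slash: no match
print(whole_match, second, short, relative)
```

Because of the ^ anchor, the pattern can only match from the start of the path, and it does not need anything to follow the second folder.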


 