Regex for extracting part of a file path

Question

I am using the regexp_extract function in Impala to find a folder name in a file path, but it does not give me the correct result.

I want to parse out "one" from this file path:

/this/one/path/to/hdfs

This is the regex which I used:

regexp_extract(filepath,'[/]+',0)

Answer 1

If we wish to capture the / itself, we might just try ([\\/]+). Other expressions can extract one instead, such as:

(?:\/[a-z]+\/)(.+?)(?:\/.+)

and our code might look like:

regexp_extract(filepath, '(?:\/[a-z]+\/)(.+?)(?:\/.+)', 1)

or

regexp_extract(filepath, '(?:\/.+?\/)(.+?)(?:\/.+)', 1)
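
As a quick check of which group index holds one, here is a minimal Python sketch. The regexp_extract helper is a hypothetical stand-in for the Impala function, and returning an empty string when nothing matches is an assumption. Python's re module is not RE2, but this pattern only uses syntax that both support.

```python
import re

def regexp_extract(text, pattern, index):
    """Hypothetical stand-in for Impala's regexp_extract."""
    match = re.search(pattern, text)
    # Assumption: return an empty string when the pattern does not match.
    return match.group(index) if match else ""

filepath = "/this/one/path/to/hdfs"
pattern = r"(?:\/[a-z]+\/)(.+?)(?:\/.+)"

# Non-capturing groups (?:...) are not numbered, so there is one group.
group_count = re.compile(pattern).groups

whole = regexp_extract(filepath, pattern, 0)   # index 0: the entire match
folder = regexp_extract(filepath, pattern, 1)  # index 1: the only capturing group
```

Here group_count is 1, whole is the full path, and folder is "one", so the value we want sits at index 1.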

Compartments

In this case, a non-capturing group matches, without capturing, what comes before one:

(?:\/[a-z]+\/)

then we capture one using:

(.+?)

and finally we add a right boundary after one in another non-capturing group:

(?:\/.+)
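
To see what each compartment consumes, the sketch below turns all three groups into capturing groups. This is for illustration only and is not the expression you would pass to Impala. It uses Python's re module, which is not RE2 but treats this syntax the same way.

```python
import re

filepath = "/this/one/path/to/hdfs"

# Same three compartments as above, but every group captures,
# so we can inspect the text each one matched.
parts = re.search(r"(\/[a-z]+\/)(.+?)(\/.+)", filepath).groups()

left, middle, right = parts
# left   -> '/this/'          (what comes before one)
# middle -> 'one'             (the folder we want)
# right  -> '/path/to/hdfs'   (the right boundary and the rest)
```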

Depending on which slash one follows, we can modify our expression. For example, in this case, this expression might also work:

(?:\/.+?\/)(.+?)(?:\/.+)
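
The difference between the two expressions shows up when the first folder contains characters outside a-z. The Python sketch below uses a made-up path, /data_2019/one/path/to/hdfs, purely for illustration. Python's re module is not RE2, but both patterns use only syntax the two share.

```python
import re

def first_group(pattern, text):
    # Hypothetical helper: group 1 of the first match, or "" (assumed) if none.
    match = re.search(pattern, text)
    return match.group(1) if match else ""

# Hypothetical path whose first folder has an underscore and digits.
filepath = "/data_2019/one/path/to/hdfs"

strict = first_group(r"(?:\/[a-z]+\/)(.+?)(?:\/.+)", filepath)
relaxed = first_group(r"(?:\/.+?\/)(.+?)(?:\/.+)", filepath)

# strict  -> 'path': [a-z]+ cannot match 'data_2019', so the search
#            slides forward and treats '/one/' as the left boundary.
# relaxed -> 'one':  .+? accepts any characters in the first folder.
```

Neither expression is anchored to the start of the string, so the strict one silently returns a later folder instead of failing.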

Answer 2

The latest Impala versions use the RE2 regex library, and you can access capturing group values through the third argument of the regexp_extract function.

Use the following regex:

^/[^/]+/([^/]+)

See the regex demo (note that the Go regex flavor is also RE2, which is why that option is selected at regex101). It matches:

^ - start of string
/ - a / char (no regex delimiters in Impala regex string, hence no need to escape / chars in the pattern)
[^/]+ - any 1 or more chars other than /
/ - a / char
([^/]+) - Capturing group 1 (to get it, the index argument must be set to 1): any 1 or more chars other than /

Code:

regexp_extract(filepath, '^/[^/]+/([^/]+)', 1)
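
A minimal Python sketch of the same extraction, for trying the pattern outside Impala. The second_folder helper and the empty-string fallback are assumptions made for illustration. Python's re module is not RE2, but this pattern uses only syntax common to both.

```python
import re

def second_folder(path):
    # Hypothetical helper mirroring
    # regexp_extract(filepath, '^/[^/]+/([^/]+)', 1).
    match = re.search(r"^/[^/]+/([^/]+)", path)
    # Assumption: return "" when the path has fewer than two folders.
    return match.group(1) if match else ""

full = second_folder("/this/one/path/to/hdfs")  # 'one'
short = second_folder("/this/one")              # 'one'
missing = second_folder("/this")                # ''
```

Because the pattern is anchored with ^ and requires nothing after the captured folder, it also handles a path that ends at the second folder.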

solution1
2 ACCEPTED 2019-05-26 00:54:42

solution2
1 2019-06-03 09:27:22
