
Regex for extracting part of a file path

I am using a regex function in Impala to find the folder name in a file path, but it doesn't seem to give me the correct result.

I want to parse out "one" from this file path:

/this/one/path/to/hdfs

This is the regex I used:

regexp_extract(filepath,'[/]+',0)
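Running the same pattern through Python's re module shows why this returns only a slash. This is a stand-in, not Impala itself: Python's re is a different engine from Impala's RE2, but a plain character class behaves the same in both. Index 0 asks for the whole match, and [/]+ can only ever match slash characters, so the leftmost match is just the leading /.

```python
import re

path = "/this/one/path/to/hdfs"

# Index 0 corresponds to the whole match. [/]+ matches only runs of "/",
# so the first (leftmost) match is the single leading slash.
whole_match = re.search(r"[/]+", path).group(0)
print(whole_match)
```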

If we only wanted to capture the /, we might try ([\\/]+). To extract one, there are other expressions we could use, such as:

(?:\/[a-z]+\/)(.+?)(?:\/.+)

and, since (.+?) is the only capturing group, our code might look like:

regexp_extract(filepath, '(?:\/[a-z]+\/)(.+?)(?:\/.+)', 1)

or

regexp_extract(filepath, '(?:\/.+?\/)(.+?)(?:\/.+)', 1)
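The index argument selects a capturing group, and (?:...) groups are not numbered. The sketch below uses Python's re as a stand-in for RE2 (these patterns behave the same in both) to confirm that each pattern has exactly one capturing group. That is why the index must be 1 to get one back.

```python
import re

path = "/this/one/path/to/hdfs"
patterns = [
    r"(?:\/[a-z]+\/)(.+?)(?:\/.+)",
    r"(?:\/.+?\/)(.+?)(?:\/.+)",
]

for p in patterns:
    compiled = re.compile(p)
    # Non-capturing (?:...) groups are not counted, so only (.+?) is group 1.
    print(compiled.groups, compiled.search(path).group(1))
```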

Compartments

Here, we match what comes before one without capturing it, using a non-capturing group:

(?:\/[a-z]+\/)

then we capture one using:

(.+?)

and finally we add a right boundary after one in another non-capturing group:

(?:\/.+)
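The right boundary matters because (.+?) is lazy. The following Python re sketch (a stand-in for RE2; the behavior is the same for these constructs) compares the pattern with and without that last group. It also shows a side effect: because the boundary needs at least one character after the next /, a path that ends right at the folder does not match.

```python
import re

path = "/this/one/path/to/hdfs"

# Without a right boundary, the lazy (.+?) stops after a single character.
no_boundary = re.search(r"(?:\/[a-z]+\/)(.+?)", path).group(1)

# With (?:\/.+) after it, (.+?) has to grow until a "/" follows.
with_boundary = re.search(r"(?:\/[a-z]+\/)(.+?)(?:\/.+)", path).group(1)

# The boundary also requires something after that "/", so this finds no match.
short_path = re.search(r"(?:\/[a-z]+\/)(.+?)(?:\/.+)", "/this/one")

print(no_boundary, with_boundary, short_path)
```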

RegEx Circuit

jex.im visualizes regular expressions:

[Image: jex.im diagram of the expression]


Depending on which slash one comes after, we can modify our expression. For example, in this case, this expression might also work:

(?:\/.+?\/)(.+?)(?:\/.+)
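The difference between the two versions only shows up when the first folder name is not purely lowercase letters. This Python re sketch (a stand-in for RE2) uses a hypothetical path whose first folder contains an uppercase letter and a digit. [a-z]+ cannot match This1, so the first pattern skips ahead and silently returns the wrong folder. The .+? version still returns one.

```python
import re

strict = re.compile(r"(?:\/[a-z]+\/)(.+?)(?:\/.+)")
loose = re.compile(r"(?:\/.+?\/)(.+?)(?:\/.+)")

# Hypothetical example path: the first folder has an uppercase letter and a digit.
path = "/This1/one/path/to/hdfs"

strict_result = strict.search(path).group(1)  # match starts at "/one/", not "/This1/"
loose_result = loose.search(path).group(1)    # .+? accepts any first folder name

print(strict_result, loose_result)
```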


The latest Impala versions use the RE2 regex library, and you can easily access capturing group values using the third (index) argument of the regexp_extract function.

Use the following regex:

^/[^/]+/([^/]+)

See the regex demo (note that the Go regex flavor is also RE2, which is why that option is selected at regex101). It matches:

  • ^ - start of string
  • / - a / char (there are no regex delimiters in an Impala regex string, hence no need to escape / chars in the pattern)
  • [^/]+ - any 1 or more chars other than /
  • / - a / char
  • ([^/]+) - Capturing group 1 (to get it, the index argument must be set to 1): any 1 or more chars other than /
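The breakdown above can be checked with Python's re, used here as a stand-in for RE2 (anchors and negated character classes behave the same in both). Because [^/]+ cannot cross a /, group 1 is always exactly the second path segment. The ^ anchor means a path without a leading / does not match at all. The sample paths are illustrative.

```python
import re

pattern = re.compile(r"^/[^/]+/([^/]+)")

paths = ["/this/one/path/to/hdfs", "/this/one", "/This1/one/path", "this/one/path"]

results = {}
for p in paths:
    m = pattern.search(p)
    # ^ pins the match to the start; [^/]+ keeps each piece inside one segment.
    results[p] = m.group(1) if m else None
    print(p, "->", results[p])
```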

Code:

regexp_extract(filepath, '^/[^/]+/([^/]+)', 1)
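To try the call outside Impala, the following is a hypothetical Python stand-in named after regexp_extract. It is not Impala's implementation. It assumes an unanchored search, index 0 meaning the whole match and index N meaning capturing group N, and an empty string when nothing matches. The last of these is an assumption, so check it against your Impala version. Python's re also differs from RE2 in general, though not for this pattern.

```python
import re

def regexp_extract(subject, pattern, index):
    """Hypothetical Python stand-in for Impala's regexp_extract.

    Assumed semantics: unanchored search; index 0 = whole match,
    index N = capturing group N; empty string when there is no match.
    """
    m = re.search(pattern, subject)
    if m is None:
        return ""  # assumption: mirrors an empty result in Impala
    return m.group(index)

filepath = "/this/one/path/to/hdfs"
print(regexp_extract(filepath, "^/[^/]+/([^/]+)", 1))  # the folder name
print(regexp_extract(filepath, "^/[^/]+/([^/]+)", 0))  # the whole match
```

Index 1 returns one, while index 0 returns everything the pattern consumed, /this/one.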

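Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.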

 