Regex for extracting part of a file path
I am using a regex function in Impala to find a folder name in a file path, but it doesn't give me the correct result.
I want to parse out "one" from this file path:
/this/one/path/to/hdfs
This is the regex I used:
regexp_extract(filepath,'[/]+',0)
If we wish to capture the / here, we might just want to try ([\\/]+). There are other expressions that extract one as well, such as:
(?:\/[a-z]+\/)(.+?)(?:\/.+)
and our code might look like the following. The expression has only one capturing group, so the index argument is 1:
regexp_extract(filepath, '(?:\/[a-z]+\/)(.+?)(?:\/.+)', 1)
or
regexp_extract(filepath, '(?:\/.+?\/)(.+?)(?:\/.+)', 1)
In this case, a non-capturing group matches what comes before one without capturing it:
(?:\/[a-z]+\/)
then we capture one using:
(.+?)
and finally we add a right boundary after one in another non-capturing group:
(?:\/.+)
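The three pieces above can be checked outside Impala. The following is a minimal sketch in Python, assuming the pattern behaves the same way in Python's re module as in Impala's regex engine (it uses only non-capturing groups and a lazy quantifier, which both support). The backslashes before / are dropped because / needs no escaping here.

```python
import re

# Same expression as above, without the unnecessary backslashes before "/".
pattern = re.compile(r"(?:/[a-z]+/)(.+?)(?:/.+)")

match = pattern.search("/this/one/path/to/hdfs")

# The two (?:...) groups are not numbered, so the folder name is group 1.
folder = match.group(1)
print(folder)

# Number of capturing groups in the pattern.
print(pattern.groups)
```

The pattern contains exactly one capturing group, which is why the folder name is read from group 1.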
jex.im can be used to visualize regular expressions.
Depending on which slash one follows, we can modify our expression. For example, in this case, this expression might also work:
(?:\/.+?\/)(.+?)(?:\/.+)
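To see where the two expressions differ, here is a small Python sketch. The helper second_folder and the path /data2/one/path/to/hdfs are made up for illustration; Python's re module stands in for Impala's regex engine on the assumption that these simple patterns behave the same in both.

```python
import re

# First folder restricted to lowercase letters.
strict = re.compile(r"(?:/[a-z]+/)(.+?)(?:/.+)")
# First folder may contain any characters (lazy match up to the next "/").
loose = re.compile(r"(?:/.+?/)(.+?)(?:/.+)")

def second_folder(pattern, path):
    """Return capturing group 1 of the first match, or None if nothing matches."""
    m = pattern.search(path)
    return m.group(1) if m else None

print(second_folder(strict, "/this/one/path/to/hdfs"))
print(second_folder(loose, "/this/one/path/to/hdfs"))
print(second_folder(strict, "/data2/one/path/to/hdfs"))
print(second_folder(loose, "/data2/one/path/to/hdfs"))
```

Both expressions return one for the original path. For the hypothetical /data2/... path, the [a-z]+ version cannot match the first folder, so the unanchored search slides forward and returns path instead, while the .+? version still returns one.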
The latest Impala versions use the RE2 regex library, and you can easily access capturing group values using the third argument of the regexp_extract function.
Use the following regex:
^/[^/]+/([^/]+)
See the regex demo (note that the Go regex flavor is also RE2, which is why that option is selected at regex101). It matches:
^ - start of string
/ - a / char (there are no regex delimiters in an Impala regex string, hence no need to escape / chars in the pattern)
[^/]+ - any 1 or more chars other than /
/ - a / char
([^/]+) - Capturing group 1 (to get it, the index argument must be set to 1): any 1 or more chars other than /
Code:
regexp_extract(filepath, '^/[^/]+/([^/]+)', 1)
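The anchored pattern can be tried out locally before running it in Impala. This is a rough sketch in Python, not Impala code: folder_name is a hypothetical helper, the extra paths are invented test inputs, and Python's re module is not RE2, although this pattern uses only syntax common to both.

```python
import re

# ^ anchors the match at the start, so only the second path component is captured.
pattern = re.compile(r"^/[^/]+/([^/]+)")

def folder_name(filepath):
    """Return the second path component, or None when the path has fewer than two."""
    m = pattern.search(filepath)
    return m.group(1) if m else None

print(folder_name("/this/one/path/to/hdfs"))
print(folder_name("/data2/one"))
print(folder_name("/this"))
```

Because of the ^ anchor and the [^/]+ classes, the match cannot slide to a later folder and does not depend on which characters the first folder contains. A path with only one component produces no match in this sketch.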