
regexp_extract to find the value in Hive

I'm new to regexp_extract and need to split a column on / and then pick the 3rd value. For example, from

application/motorola products/routers 

I would want to get routers. If there is no 3rd value, we need to fall back on the 2nd value, which would be motorola products. I tried the following regex pattern, but it doesn't work:

(.*?\/)(.*?\/)(.*?)(\/.*\/)

You are saying a single character is optional. Give the . a quantifier, * or +. I think this regex would actually be better:

(?:([^\/]+?\/)([^\/]+?)\/([^\/]*)|([^\/]+?\/)([^\/]+))

Demo: https://regex101.com/r/dX6uQ9/2

I haven't worked with Hive and don't have it available, so I can't confirm this will work, but I think it should point you in the right direction.
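A quick way to check both patterns without a Hive installation is to run them through another regex engine. The sketch below uses Python's re module and assumes it treats these particular patterns the same way as Hive's Java-based engine; the helper name third_or_second is hypothetical. It also shows that the alternation puts the result in group 3 or group 5 depending on which branch matched, which needs extra handling because regexp_extract takes a single group index.

```python
import re

# Pattern from the question: it needs at least four slashes in the input.
QUESTION_PATTERN = re.compile(r"(.*?\/)(.*?\/)(.*?)(\/.*\/)")

# Alternation pattern suggested in the answer above.
ANSWER_PATTERN = re.compile(
    r"(?:([^\/]+?\/)([^\/]+?)\/([^\/]*)|([^\/]+?\/)([^\/]+))"
)


def third_or_second(value):
    """Return the 3rd segment, or the 2nd when there is no 3rd (None if no match)."""
    m = ANSWER_PATTERN.search(value)
    if m is None:
        return None
    # Group 3 is filled by the first branch, group 5 by the second branch.
    return m.group(3) if m.group(3) is not None else m.group(5)


print(QUESTION_PATTERN.search("application/motorola products/routers"))  # None
print(third_or_second("application/motorola products/routers"))  # routers
print(third_or_second("application/motorola products"))  # motorola products
```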

It sounds like you just want the last value, meaning whatever is after the last /. The regex for that would be [^/]+$ :

select regexp_extract(name, '[^/]+$', 0) from dummy;

If there are two slashes, you get the third value. If there are five slashes, you get the sixth value.
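The behaviour described above can be checked outside Hive. This is a minimal sketch using Python's re module, assuming it handles this pattern the same way as Hive's Java-based regex engine; last_segment is a hypothetical helper name.

```python
import re

# Same pattern as in the query above: everything after the last slash.
LAST_SEGMENT = re.compile(r"[^/]+$")


def last_segment(value):
    """Return the text after the last '/', or None when nothing matches."""
    m = LAST_SEGMENT.search(value)
    return m.group(0) if m else None


print(last_segment("application/motorola products/routers"))  # routers
print(last_segment("application/motorola products"))  # motorola products
print(last_segment("a/b/c/d/e/f"))  # f (five slashes, sixth value)
```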

If you want to stop at the third value even if there are more than two slashes, you can use this:

select regexp_extract(name, '^(?:[^/]+/){0,2}([^/]+)', 1) from dummy;

The index argument, 1, makes it extract what was matched by the first capturing group, ([^/]+).
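To see how the {0,2} prefix gives both the cap and the fallback, here is a small sketch in Python's re module (again assuming it matches Hive's Java-based engine for this pattern; up_to_third is a hypothetical name). The optional prefix consumes up to two "segment/" pieces, and the capturing group takes whatever segment follows.

```python
import re

# Same pattern as in the query above; group 1 is the extracted segment.
UP_TO_THIRD = re.compile(r"^(?:[^/]+/){0,2}([^/]+)")


def up_to_third(value):
    """Return the 3rd segment if present, else the last available one."""
    m = UP_TO_THIRD.search(value)
    return m.group(1) if m else None


print(up_to_third("application/motorola products/routers"))  # routers
print(up_to_third("application/motorola products"))  # motorola products
print(up_to_third("application"))  # application
print(up_to_third("a/b/c/d/e"))  # c (stops at the third value)
```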

Note: I'm assuming the complete value won't start or end with a slash, like /motorola products/routers or application/motorola products/.
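What happens when that assumption is broken can be sketched the same way. The Python check below assumes its re module behaves like Hive's engine for this pattern, and extract_group_1 is a hypothetical name. A leading slash prevents any match, while a trailing slash makes the pattern fall back to the segment before it.

```python
import re

# Same capped pattern as in the query above.
UP_TO_THIRD = re.compile(r"^(?:[^/]+/){0,2}([^/]+)")


def extract_group_1(value):
    """Return capture group 1, or None when the pattern does not match."""
    m = UP_TO_THIRD.search(value)
    return m.group(1) if m else None


# Leading slash: the anchored pattern cannot start on '/', so nothing matches.
print(extract_group_1("/motorola products/routers"))  # None

# Trailing slash: there is no third segment, so the second one is returned.
print(extract_group_1("application/motorola products/"))  # motorola products
```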

select split('application/motorola products/routers','/')[size(split('application/motorola products/routers','/'))-1]
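The split query above picks the last element, so with more than three segments it returns the last value rather than the third. The logic can be sketched in Python as follows; both function names are hypothetical, and note that Python's str.split takes a literal separator whereas Hive's split takes a regex pattern. The second function caps the index at the third element, which is one possible way to get the fallback the question asks for.

```python
def last_part(value):
    """Mirror of the query above: take the last element after splitting on '/'."""
    parts = value.split("/")
    return parts[len(parts) - 1]


def third_or_last(value):
    """Take the 3rd element when it exists, otherwise the last one available."""
    parts = value.split("/")
    # Cap the 1-based position at 3, then convert to a 0-based index.
    return parts[min(len(parts), 3) - 1]


print(last_part("application/motorola products/routers"))  # routers
print(last_part("a/b/c/d"))  # d
print(third_or_last("a/b/c/d"))  # c
print(third_or_last("application/motorola products"))  # motorola products
```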
