I want to extract the word between '/bla-bla-bla/' and '.a12345' in a URL (here, "this-is-the-word")
using regexp_extract in Hive.
INPUT: www.website.com/bla-bla-bla/this-is-the-word.a12345.anotherword.blabla
DESIRED OUTPUT: this-is-the-word
I've tried the two expressions below, but neither of them worked. What regex will produce my desired output from this input?
regexp_extract(URL,'^.*[/]bla[-]bla[-]bla[/]([a-z]+)\\.(a([0-9]+))*$',1)
regexp_extract(URL,'^.*[/]bla-bla-bla[/]([a-z]*)[.]a([0-9]+)*$',1)
You may use
regexp_extract(URL,'^.*/bla-bla-bla/([^/.]+)\\.a[0-9].*$', 1)
It matches:
- ^ - start of string
- .* - any 0+ chars other than line break chars, as many as possible
- /bla-bla-bla/ - a literal /bla-bla-bla/ substring
- ([^/.]+) - Group 1 (what you will get, since the next argument is 1): 1 or more chars other than / and .
- \\.a - a .a substring
- [0-9] - a digit
- .*$ - the rest of the string to its end.
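Hive's regexp_extract is built on Java regular expressions, so the pattern can be checked outside Hive with plain java.util.regex. Below is a minimal sketch, not Hive's actual implementation: the class name RegexpExtractSketch and the extract helper are made up for illustration, and returning an empty string when nothing matches is an assumption about how regexp_extract behaves. It also runs the second attempt from the question, which fails because [a-z]* cannot match the hyphens in the word and because $ is expected right after the digits while the URL continues.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexpExtractSketch {
    // Pattern from the answer. The backslash is doubled in Java source,
    // just as it is inside the Hive string literal.
    static final String ANSWER = "^.*/bla-bla-bla/([^/.]+)\\.a[0-9].*$";

    // Second attempt from the question, kept only to show that it does not match.
    static final String ATTEMPT = "^.*[/]bla-bla-bla[/]([a-z]*)[.]a([0-9]+)*$";

    // Hypothetical helper: returns the requested capture group of the first
    // match, or "" when there is no match (assumed regexp_extract behaviour).
    static String extract(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : "";
    }

    public static void main(String[] args) {
        String url = "www.website.com/bla-bla-bla/this-is-the-word.a12345.anotherword.blabla";
        System.out.println(extract(ANSWER, url, 1));            // prints: this-is-the-word
        System.out.println(extract(ATTEMPT, url, 1).isEmpty()); // prints: true
    }
}
```

If the word may be followed by something other than .a plus a digit, only the tail of the pattern (\\.a[0-9].*$) needs to change; the capture group stays the same.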