
Using REGEXP_EXTRACT in Hive to get a substring of a string

Hi, I am new to Hive. I am using regexp_extract to get a substring from a string.

My string is '/abc/def/ghi/'.

How can I get abc, def, or ghi using the regexp_extract function?

Remove the leading and trailing '/' and use split() to get an array. split() also takes a regular expression as its delimiter:

hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[0];

abc

hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[1];

def

hive> select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/')[2];

ghi

Or in a subquery:

hive> select array[0], array[1], array[2] 
      from (select split(regexp_replace('/abc/def/ghi/','^/|/$',''),'/') as array) s;
OK
_c0     _c1     _c2
abc     def     ghi
Time taken: 0.192 seconds, Fetched: 1 row(s)
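
The reason for trimming the slashes first can be seen by running the same two steps outside Hive. The sketch below uses Python's `re` module purely as a stand-in (Hive evaluates these patterns with Java's regex engine), and the variable names are only illustrative. Hive's handling of empty elements may differ from Python's, so treat this as an illustration of the idea rather than of Hive's exact output.

```python
import re

path = '/abc/def/ghi/'

# Step 1: strip one leading and one trailing '/',
# mirroring regexp_replace(path, '^/|/$', '').
trimmed = re.sub(r'^/|/$', '', path)

# Step 2: split on '/', mirroring split(trimmed, '/').
parts = re.split('/', trimmed)

# Without the trim, the split starts with an empty string
# (the text before the first '/').
untrimmed_parts = re.split('/', path)

print(trimmed)                   # abc/def/ghi
print(parts)                     # ['abc', 'def', 'ghi']
print(untrimmed_parts[0] == '')  # True
```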

We can use regexp_extract by providing a pattern with capture groups around the parts we want to match. The third argument then specifies which group's match is returned.

As an example, to find the content between the second and third path separators, we can try:

regexp_extract('/abc/def/ghi/', '/[^/]+/([^/]+).*', 1)

Note: the pattern above is untested. Forward slashes are not special characters in the Java regular expressions that Hive uses, so escaping them should not be necessary. If it does produce an error, try the escaped form:

regexp_extract('/abc/def/ghi/', '\/[^\/]+\/([^\/]+).*', 1)
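
A quick way to check a pattern like this before running it in Hive is to try it in another regex engine. The sketch below is only a stand-in: `regexp_extract` here is a hypothetical Python helper, not Hive's function, written on the assumption that Hive returns the requested group of the first match and an empty string when nothing matches.

```python
import re

def regexp_extract(subject, pattern, index):
    """Hypothetical stand-in for Hive's regexp_extract (assumed semantics)."""
    match = re.search(pattern, subject)
    return match.group(index) if match else ''

# Unescaped forward slashes.
unescaped = regexp_extract('/abc/def/ghi/', r'/[^/]+/([^/]+).*', 1)

# Escaped forward slashes match the same text.
escaped = regexp_extract('/abc/def/ghi/', r'\/[^\/]+\/([^\/]+).*', 1)

print(unescaped)  # def
print(escaped)    # def
```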

The queries below use one capture group per path segment and select each segment by its group index:

hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',1);
OK
abc123.
Time taken: 0.103 seconds, Fetched: 1 row(s)
hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',2);
OK
def456.
Time taken: 0.1 seconds, Fetched: 1 row(s)
hive> select regexp_extract('/abc123./def456./ghi789/', '\/([\\w\\d.]*)\/([\\w\\d.]*)\/([\\w\\d.]*)',3);
OK
ghi789
Time taken: 0.124 seconds, Fetched: 1 row(s)
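
The three-group pattern above fixes the number of segments in advance. One possible generalization, sketched here in Python rather than HiveQL, builds a pattern that skips n - 1 segments and captures the n-th. The helper name `nth_segment` and the pattern are illustrative assumptions, not part of the original answers. The pattern string it builds should be usable as the pattern argument of regexp_extract with group index 1, but treat that as an assumption as well.

```python
import re

def nth_segment(path, n):
    """Return the n-th (1-based) '/'-delimited segment of path, or '' if absent."""
    # After the leading '/', skip n-1 complete 'segment/' pieces,
    # then capture the next run of non-slash characters.
    pattern = r'^/(?:[^/]+/){%d}([^/]+)' % (n - 1)
    match = re.search(pattern, path)
    return match.group(1) if match else ''

print(nth_segment('/abc123./def456./ghi789/', 1))        # abc123.
print(nth_segment('/abc123./def456./ghi789/', 3))        # ghi789
print(nth_segment('/abc123./def456./ghi789/', 4) == '')  # True
```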


 