
How to get Substring in Hadoop Hive?

My question is how to get a substring in Hive based on a delimiter in the string. My column values are formatted like this:

/Country/State/City/Suburb/Street

Here I only need to get Country.

I have found SPLIT, which returns an array of strings split on '/', and SUBSTR(string a, int begin), which returns the substring starting at the given position.

With SPLIT I then have to pick the desired element out of the resulting array, so I would like to know whether there is an easier way to get the country.

thanks

I used a regular expression to extract the country. The Hive query is:

select regexp_extract(column,'\/(.*)/.*/.*/.*/',1) from substring_tbl;

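My create table statement (on Hive 1.2.0 and later, column is a reserved keyword, so the column may need to be quoted with backticks or renamed):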

create external table substring_tbl(
column string)
LOCATION '/user/root/hive_substring/';

Your input data:

/Country/State/City/Suburb/Street

Query and regular expression to extract desired data:

select regexp_extract(column,'\/(.*)/.*/.*/.*/',1) from substring_tbl;

Output:

Country

Info: regexp_extract(subject, pattern, index) returns the part of the string matched by the capture group with the given index; index 1 here selects the first parenthesized group. More detail about regexp_extract() is available in the Hive LanguageManual UDF page.

Note that if the format of your input data changes, you must change the regular expression too.
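One way to make the pattern less dependent on the number of path segments is to anchor it at the start of the value and capture only up to the second slash. The following is a minimal sketch, not part of the original answer; it runs against a string literal rather than the table and assumes a Hive version that allows SELECT without a FROM clause (0.13 or later):

```sql
-- ^/       : the leading slash at the start of the value
-- ([^/]+)  : group 1, one or more characters that are not a slash
-- The pattern contains no backslashes, so Hive string escaping is not a concern.
SELECT regexp_extract('/Country/State/City/Suburb/Street', '^/([^/]+)', 1);
-- Country
```

Because this pattern only looks at the first segment, it should keep working if the values have more or fewer levels after the country. To use it on the table, replace the literal with the column name and add the FROM clause.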

UPDATE1

Query using the split() function to extract the desired data:

select split(column, '\\/')[1] from substring_tbl;
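The index is [1] rather than [0] because each value starts with '/', so the element before the first delimiter is an empty string. The sketch below illustrates this on a literal and also shows a substr()/locate() variant for readers who would rather not build an array. Both queries are illustrations added here, assuming Hive 0.13 or later for SELECT without FROM; the alias s and the subquery alias t are made up for the example:

```sql
-- split() keeps the empty string that precedes the leading '/'
SELECT split('/Country/State/City/Suburb/Street', '/');
-- ["","Country","State","City","Suburb","Street"]

-- locate('/', s, 2) finds the second slash (position 9 here);
-- substr(s, 2, 9 - 2) then takes the 7 characters after the first slash.
SELECT substr(s, 2, locate('/', s, 2) - 2)
FROM (SELECT '/Country/State/City/Suburb/Street' AS s) t;
-- Country
```

The substr()/locate() form assumes that a second slash exists. For a value such as '/Country', locate() returns 0 and the computed length becomes negative, so that case would need separate handling.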
