
Extract patterned text from a string with Hive

I have data in a column that looks like this:

  • Countryside_Video_-_A18-49_Pub_- Q3 -_Flight_7_18_49_BOTH

  • Countryside Video - M18-25 Validated -Q4 - Flight 1

  • PremiumBrand_2019_Upfront_Video_-_W18-49_Validated_-_Q4_Flight_1_18_49_FEMALE

  • Travel Around the World - W25-54 Validated - Q3 25-54_FEMALE

I need to extract the age and gender value from each string:

  • A18-49
  • M18-25
  • W18-49
  • W25-54

It's tricky because any of the letters A, M, or W can be combined with any number range. The letters signify Age, Male, or Female, and the number range is the age range.

From some googling, it looks like I might be able to use a regexp_extract function, but I'm a novice to Hive. Any help on this would be greatly appreciated!

I don't have Hive on hand to test this, but this might work:

select regexp_extract(col, '([AMW][0-9]{2}[-][0-9]{2})', 1)
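Since the query above was not run against Hive, here is a rough sanity check of the same pattern against the four sample strings from the question. It uses Python's `re` module rather than Hive; for a pattern this simple (a character class, digit ranges, fixed repetition counts) the two regex dialects should agree, but that is an assumption, not something verified on a Hive cluster. The names `PATTERN`, `SAMPLES`, and `extract_demo` are made up for this sketch.

```python
import re

# Same pattern as in the Hive query: one of A/M/W, two digits, a hyphen, two digits.
PATTERN = re.compile(r"([AMW][0-9]{2}-[0-9]{2})")

# The four example values from the question.
SAMPLES = [
    "Countryside_Video_-_A18-49_Pub_- Q3 -_Flight_7_18_49_BOTH",
    "Countryside Video - M18-25 Validated -Q4 - Flight 1",
    "PremiumBrand_2019_Upfront_Video_-_W18-49_Validated_-_Q4_Flight_1_18_49_FEMALE",
    "Travel Around the World - W25-54 Validated - Q3 25-54_FEMALE",
]


def extract_demo(text):
    """Return the first age/gender token in text, or '' if there is none."""
    match = PATTERN.search(text)
    return match.group(1) if match else ""


for sample in SAMPLES:
    print(extract_demo(sample))
```

Only the first match in each string is returned, so trailing pieces such as `25-54_FEMALE` are ignored because they have no leading A, M, or W.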
