
Hive extract text before <BR>

In Apache Hive, how can I extract a substring from a given string? I have a column containing the values below.

I need to extract ABC3170 from the string below, i.e. everything up to <BR>.

My data looks somewhat like the sample below. I want to get all the records and eliminate the text after <BR>.

Col1
---------
G3333
XYZD20
5289
ABC2620
CDF-B700S
CUSTOM MANAGER
ABC3170 <BR></DIV><DIV DIR="AUTO" STYLE="DIRECTION: LTR; MARGIN: 0; PADDING: 0; FONT-FAMILY: SANS-SE

Use the regexp_extract function with a matching Java regex to extract the value before <BR>.

Regex Expression:

(.*?)\\s+<BR> // lazily capture everything before the whitespace that precedes <BR>

Hive Query:

hive> select regexp_extract(<column.name>,"(.*?)\\s+<BR>",1) from <db.name>.<tab.name>;

Example:

hive> select regexp_extract(txt,"(.*?)\\s+<BR>",1),txt from i;
+----------+---------------------------------------------------------------------------------------------------------------------------------------------+--+
|   _c0    |                                                                     txt                                                                     |
+----------+---------------------------------------------------------------------------------------------------------------------------------------------+--+
| ABC3170  | ABC3170 <BR></DIV><DIV DIR="AUTO" STYLE="DIRECTION: LTR; MARGIN: 0; PADDING: 0; FONT-FAMILY: SANS-SERIF; FONT-SIZE: 11PT; COLOR: BLACK; ">  |
+----------+---------------------------------------------------------------------------------------------------------------------------------------------+--+
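
Since the pattern is a Java regex, it can be tried out in plain Java before running it against a table. The sketch below is only an illustration: the class `BeforeBr` and the method `beforeBr` are hypothetical names, not part of Hive. Rows without `<BR>` do not match the pattern, so the sketch falls back to the original value for them; that fallback is an assumption added here to fit the question's wish to keep all records, and it is not something the query above does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: applies the answer's regex in plain Java.
final class BeforeBr {
    // Same pattern as in the answer; "\\s" is the Java source escape for the regex \s.
    private static final Pattern BEFORE_BR = Pattern.compile("(.*?)\\s+<BR>");

    // Returns the text before the whitespace that precedes the first "<BR>".
    // Assumption: when the pattern does not match, the input is returned unchanged.
    static String beforeBr(String value) {
        Matcher m = BEFORE_BR.matcher(value);
        return m.find() ? m.group(1) : value;
    }

    public static void main(String[] args) {
        String[] rows = {
            "G3333",
            "CUSTOM MANAGER",
            "ABC3170 <BR></DIV><DIV DIR=\"AUTO\">"
        };
        for (String row : rows) {
            System.out.println(beforeBr(row));
        }
    }
}
```

Note that `\\s+` requires at least one whitespace character before `<BR>`, so a value such as `ABC3170<BR>` would not match and would be returned unchanged by this sketch.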
