Extracting a substring with a specific pattern in Hive SQL

Question

I have a column with the sample data below. I need to extract every substring that starts with "M6". Is there a way to do this with regexp_extract?

Data Column
HEY01230328_M6K21SG_UNO_NYC_241
M6EW2BJ_UNO_NYC_251
M6HW2WL_UNO_NYC_251
HEY08460329_NA_M6LAB3D_UNO_NYC_241

Desired Output
M6K21SG
M6EW2BJ
M6HW2WL
M6LAB3D

Answer 1

Try using:

SELECT REGEXP_EXTRACT(colname, ".*(M6[^_]*).*", 1) FROM tableName

Regex used:

.*(M6[^_]*).*

Explanation:

.* - matches zero or more occurrences of any character other than a newline
(M6[^_]*) - matches M6 followed by zero or more characters other than _. After M6, it keeps matching until it reaches the next _ or the end of the string. The parentheses capture this sub-match as group 1, which the third argument (1) of REGEXP_EXTRACT returns.
.* - matches zero or more occurrences of any character other than a newline
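
To sanity-check the pattern outside Hive, here is a minimal sketch in Python. It assumes that Python's re module treats this particular pattern the same way as the Java regex engine Hive uses, which should hold for a pattern this simple but is not a general guarantee. The helper name extract_m6 is hypothetical, and returning an empty string when nothing matches is a choice made for this sketch only.

```python
import re

# Same pattern as in the answer: group 1 captures "M6" plus everything
# up to (but not including) the next "_".
PATTERN = re.compile(r".*(M6[^_]*).*")

# Sample rows copied from the question.
rows = [
    "HEY01230328_M6K21SG_UNO_NYC_241",
    "M6EW2BJ_UNO_NYC_251",
    "M6HW2WL_UNO_NYC_251",
    "HEY08460329_NA_M6LAB3D_UNO_NYC_241",
]


def extract_m6(value):
    """Return group 1 of the pattern, or "" if the value has no M6 token.

    Hypothetical helper for illustration; the empty-string fallback is an
    assumption of this sketch.
    """
    match = PATTERN.search(value)
    return match.group(1) if match else ""


results = [extract_m6(row) for row in rows]

for row, code in zip(rows, results):
    print(row, "->", code)
```

For the four sample rows this yields M6K21SG, M6EW2BJ, M6HW2WL and M6LAB3D, matching the desired output. Because the leading .* is greedy, a value containing more than one M6 token would return the last one.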

Solution 1
2 · Accepted · 2021-04-04 04:27:10
