The groupIdx parameter of Spark's regexp_extract function

Question

I don't understand how the last parameter, groupIdx, works in the function below, and I can't find any details in the documentation. I am using this function with groupIdx = 0; when I change this value to anything > 0, I get the error java.lang.IndexOutOfBoundsException: No group 1. Can someone explain how it works and when groupIdx > 0 can be used?

regexp_extract(e: Column, exp: String, groupIdx: Int): Column

Answer 1

The argument selects the part of a match that was captured by the specified capturing group.

See the docs:

regexp_extract(str, regexp[, idx]) - Extracts a group that matches regexp.
Examples:
> SELECT regexp_extract('100-200', '(\d+)-(\d+)', 1);
100

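The 100 substring is captured by the first (\d+) in the regex pattern, and the argument 1 makes the function return only this part of the whole match (which is 100-200). Group 0 is always the whole match, so groupIdx > 0 only works when the pattern contains at least that many capturing groups (parenthesized subexpressions). A pattern without parentheses has no group 1, hence the No group 1 error.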
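To make the indexing concrete, here is a minimal sketch in plain Scala, assuming no Spark on the classpath. The hypothetical helper extractGroup models regexp_extract on a single string using java.util.regex, the Java regex engine that regexp_extract patterns follow; it illustrates the semantics and is not Spark's actual implementation. The message thrown for an out-of-range index, No group 1, is the one produced by java.util.regex.Matcher.group, which matches the error in the question.

```scala
import java.util.regex.Pattern

// Hypothetical helper for illustration: a plain-Scala model of what
// regexp_extract's groupIdx selects. This is NOT Spark's own code.
object GroupIdxDemo {

  /** Finds the first match of `regex` in `s` and returns capturing group `idx`
    * (0 = the whole match, 1 = first parenthesized group, ...).
    * Returns "" when the regex does not match or the chosen group did not
    * take part in the match, mirroring the behaviour Spark documents.
    */
  def extractGroup(s: String, regex: String, idx: Int): String = {
    val m = Pattern.compile(regex).matcher(s)
    if (!m.find()) ""
    else {
      // Throws IndexOutOfBoundsException("No group <idx>") when idx > m.groupCount()
      val g = m.group(idx)
      if (g == null) "" else g
    }
  }

  def main(args: Array[String]): Unit = {
    val twoGroups = """(\d+)-(\d+)""" // two capturing groups
    println(extractGroup("100-200", twoGroups, 0)) // 100-200
    println(extractGroup("100-200", twoGroups, 1)) // 100
    println(extractGroup("100-200", twoGroups, 2)) // 200

    val noGroups = """\d+-\d+""" // no parentheses, so groupCount() == 0
    println(extractGroup("100-200", noGroups, 0)) // 100-200
    try {
      extractGroup("100-200", noGroups, 1)
      println("not reached")
    } catch {
      case e: IndexOutOfBoundsException => println(e.getMessage) // No group 1
    }
  }
}
```

Newer Spark releases may check the index before matching and report an out-of-range groupIdx as an IllegalArgumentException instead of the raw IndexOutOfBoundsException, but the rule is the same: groupIdx must not exceed the number of capturing groups in the pattern.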

Answer 1: accepted, 2019-10-07 11:45:46