Finding strings between dashes using REGEXP_EXTRACT in BigQuery

In BigQuery, I am trying to find a way to extract a particular segment of a string based on how many dashes come before it. The total number of dashes in the string will always be the same. For example, I could be looking for the string after the second dash and before the third dash in the following string:

abc-defgh-hij-kl-mnop

Currently, I am using the following regex to extract it, which counts the dashes from the back:

([^-]+)(?:-[^-]+){2}$

The problem is that if there is nothing between two dashes, the regex doesn't match, because [^-]+ requires at least one character per segment. For example, something like this returns null:

abc-defgh-hij--mnop

Is there a way to use regex to extract the string after a certain number of dashes and cut it off before the subsequent dash?

Thank you!

The following is for BigQuery Standard SQL.

The simplest way in your case is to use SPLIT and OFFSET, as in the example below:

SELECT SPLIT(str, '-')[OFFSET(3)]   

The query above returns an empty string for abc-defgh-hij--mnop. OFFSET is zero-based, so OFFSET(3) selects the segment after the third dash.

To avoid an error when the requested element does not exist, it is better to use SAFE_OFFSET, which returns NULL instead:

SELECT SPLIT(str, '-')[SAFE_OFFSET(3)]   
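
For completeness, here is a minimal self-contained sketch that puts the SPLIT approach next to a regex that counts dashes from the front, which is what the question asked about. The `sample` CTE and the output column names are made up for illustration; only `str` and the two example strings come from the post. The regex uses `[^-]*` instead of `[^-]+` so that an empty segment between two dashes is still allowed to match.

```sql
-- Hypothetical sample data built from the two strings in the question.
WITH sample AS (
  SELECT 'abc-defgh-hij-kl-mnop' AS str
  UNION ALL
  SELECT 'abc-defgh-hij--mnop'
)
SELECT
  str,
  -- Zero-based index: SAFE_OFFSET(2) is the segment after the second dash.
  SPLIT(str, '-')[SAFE_OFFSET(2)] AS after_second_dash,
  -- SAFE_OFFSET(3) is the segment after the third dash (may be empty).
  SPLIT(str, '-')[SAFE_OFFSET(3)] AS after_third_dash,
  -- Regex alternative: skip exactly three "segment plus dash" groups
  -- from the start, then capture everything up to the next dash.
  REGEXP_EXTRACT(str, r'^(?:[^-]*-){3}([^-]*)') AS after_third_dash_regex
FROM sample
```

Changing the `{3}` quantifier (or the offset) selects a different segment. The SPLIT version is usually easier to read, so the regex form is mainly useful when the extraction has to live inside a larger pattern.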
