
regex extract big query with numeric data

How would I be able to grab the number 2627995 from this string?

"hellotest/2627995?hl=en"

I want to grab the number 2627995. Here is my current regex, but it does not work when I use REGEXP_EXTRACT in BigQuery:

(\/)\d{7,7}

SELECT
  REGEXP_EXTRACT(DESC, r"(\/)\d{7,7}")
  AS number
FROM
  `string table`

Here is the output:

Error

Thank you!!

I think you just want to match all digits coming after the last path separator, before either the start of the query parameter or the end of the URL.

SELECT REGEXP_EXTRACT(DESC, r"/(\d+)(?:\?|$)") AS number
FROM `string table`

Demo

Try this: r"\/(\d+)"

Your code returns the slash because you captured it (see the parentheses in (\/)\d{7,7}). When the pattern contains a capturing group, REGEXP_EXTRACT returns only the captured substring.

Thus, you could just wrap the other part of your regex with the parentheses instead:

SELECT
  REGEXP_EXTRACT(DESC, r"/(\d{7})")
  AS number
FROM
  `string table`
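To see the effect of moving the parentheses without running a query, here is a minimal sketch in Python. The helper regexp_extract is hypothetical: it is only a rough stand-in for BigQuery's REGEXP_EXTRACT, built on Python's re module, and assumes the two engines treat these simple patterns the same way.

```python
import re

def regexp_extract(value, pattern):
    """Hypothetical stand-in for REGEXP_EXTRACT (not the BigQuery API).

    Returns capture group 1 when the pattern has a capturing group,
    the whole match when it has none, and None when nothing matches.
    """
    m = re.search(pattern, value)
    if m is None:
        return None
    return m.group(1) if m.re.groups else m.group(0)

url = "hellotest/2627995?hl=en"

# Original pattern: the group wraps the slash, so only "/" comes back.
slash_only = regexp_extract(url, r"(\/)\d{7,7}")

# Fixed pattern: the group wraps the digits instead.
digits = regexp_extract(url, r"/(\d{7})")

print(repr(slash_only))  # '/'
print(repr(digits))      # '2627995'
```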

NOTE:

  • In BigQuery, a regex is specified with a string literal, not a regex literal (which is usually delimited with forward slashes). That is why you do not need to escape the / char (it is not a special regex metacharacter).
  • {7,7} is equal to the {7} limiting quantifier, meaning exactly seven occurrences.
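Both notes can be checked quickly with a small sketch. It uses Python's re module rather than BigQuery, on the assumption that escaping / and the {7,7} quantifier behave the same way in both engines for this input.

```python
import re

url = "hellotest/2627995?hl=en"

# Escaped vs. unescaped slash, and {7,7} vs. {7}: all three variants
# should capture the same seven digits.
patterns = [
    r"\/(\d{7,7})",  # escaped slash, explicit min/max
    r"/(\d{7,7})",   # unescaped slash, explicit min/max
    r"/(\d{7})",     # unescaped slash, short form
]

results = [re.search(p, url).group(1) for p in patterns]
print(results)  # ['2627995', '2627995', '2627995']
```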

Also, if you are sure the number is at the end of the string or is followed by a query string or fragment, you can enhance it as

REGEXP_EXTRACT(DESC, r"/(\d+)(?:[?#]|$)")

where the regex means

  • / - a / char
  • (\d+) - Group 1 (the actual output): one or more digits
  • (?:[?#]|$) - either a ? or # char, or end of string.
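The breakdown above can be tried against a few URL shapes. This is a sketch in Python's re module, not BigQuery itself. The function extract_trailing_number and every sample string other than the one from the question are made up for illustration. Note that Python's $ also matches before a trailing newline, so the sketch assumes inputs without one.

```python
import re

# Same pattern as in the query above: a slash, digits (group 1),
# then ?, #, or end of string.
PATTERN = re.compile(r"/(\d+)(?:[?#]|$)")

def extract_trailing_number(url):
    """Hypothetical helper: return the captured digits, or None if no match."""
    m = PATTERN.search(url)
    return m.group(1) if m else None

samples = [
    "hellotest/2627995?hl=en",  # followed by a query string
    "hellotest/2627995#top",    # followed by a fragment
    "hellotest/2627995",        # at the end of the string
    "hellotest/123/abc",        # digits followed by another path segment
]

for s in samples:
    print(s, "->", extract_trailing_number(s))
```

The first three samples yield 2627995. The last yields None, because the digits are followed by / rather than ?, #, or the end of the string.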
