SUBSTRING_INDEX() in Snowflake

What is the exact equivalent of MySQL's SUBSTRING_INDEX() in Snowflake?

I found SPLIT_PART() in Snowflake, but it does not behave the same way as SUBSTRING_INDEX().

For example, SUBSTRING_INDEX('www.abc.com', '.', 2); returns www.abc

that is, everything to the left of the second occurrence of the delimiter '.'

but

SPLIT_PART('www.abc.com', '.', 2); returns abc

It splits the string first and then returns only the requested part.
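
To make the difference concrete, here is a minimal Python sketch that models the two behaviors described above. The function names are hypothetical stand-ins, not the database functions themselves, and the model only covers positive counts and part numbers.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Rough model of MySQL SUBSTRING_INDEX for count >= 1 only."""
    if count < 1:
        raise ValueError("this sketch only models positive counts")
    # Everything to the left of the count-th delimiter; if there are
    # fewer delimiters than that, the whole string comes back.
    return delim.join(text.split(delim)[:count])


def split_part(text: str, delim: str, part_number: int) -> str:
    """Rough model of Snowflake SPLIT_PART for part_number >= 1 only."""
    parts = text.split(delim)
    # Only the single requested part; out of range gives an empty string.
    return parts[part_number - 1] if 1 <= part_number <= len(parts) else ""


print(substring_index("www.abc.com", ".", 2))  # www.abc
print(split_part("www.abc.com", ".", 2))       # abc
```

The first function keeps a prefix of the parts and rejoins them, while the second returns exactly one part, which is why the two results differ.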

How can I get the same behavior as MySQL's SUBSTRING_INDEX() in Snowflake?

A similar effect can be achieved using ARRAY operations:

SELECT s.c, ARRAY_TO_STRING(ARRAY_SLICE(STRTOK_TO_ARRAY(s.c, '.'), 0, 2), '.')
FROM (VALUES ('www.abc.com')) AS s(c);

How does it work?

  • STRTOK_TO_ARRAY - tokenizes the string into an array
  • ARRAY_SLICE - takes the elements from index 0 up to, but not including, index n
  • ARRAY_TO_STRING - converts the array back to a string using '.' as the delimiter

In steps:

SELECT 
  s.c,
  STRTOK_TO_ARRAY(s.c, '.')   AS arr,
  ARRAY_SLICE(arr, 0, 2)      AS slice,
  ARRAY_TO_STRING(slice, '.') AS result
FROM (VALUES ('www.abc.com')) AS s(c);
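
The same three steps can be sketched in Python to see what each stage produces. This is only an illustration: `first_n_tokens` is a hypothetical name, and dropping empty tokens is an assumption about how STRTOK-style tokenizing treats consecutive delimiters, so check it against your own data before relying on it.

```python
def first_n_tokens(text: str, delim: str, n: int) -> str:
    # Step 1 (STRTOK_TO_ARRAY): tokenize; empty tokens are dropped here
    # on the assumption that the tokenizer never yields empty strings.
    arr = [tok for tok in text.split(delim) if tok]
    # Step 2 (ARRAY_SLICE(arr, 0, n)): keep indexes 0 .. n-1.
    sliced = arr[0:n]
    # Step 3 (ARRAY_TO_STRING): join the kept tokens with the delimiter.
    return delim.join(sliced)


print(first_n_tokens("www.abc.com", ".", 2))  # www.abc
print(first_n_tokens("a..b.c", ".", 2))       # a.b
```

Under that assumption, an input with consecutive delimiters such as 'a..b.c' gives 'a.b', whereas MySQL's SUBSTRING_INDEX('a..b.c', '.', 2) gives 'a.', so the array approach is similar but not identical.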

You can also use REGEXP_SUBSTR here:

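-- In a single-quoted Snowflake string the backslash itself must be escaped,
-- so the regex \. is written as \\.
SELECT REGEXP_SUBSTR('www.abc.com', '^[^.]+\\.[^.]+');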

Here is a demo showing that the regex pattern works as expected.
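
As a quick offline check of the pattern, the sketch below runs the same regex with Python's `re` module. The helper names are hypothetical, and the generalization to n parts is an addition for illustration rather than part of the answer above.

```python
import re


def first_two_parts(text: str):
    # Same pattern as in the answer: one or more non-dot characters,
    # a literal dot, then one or more non-dot characters.
    match = re.match(r"^[^.]+\.[^.]+", text)
    return match.group(0) if match else None


def first_n_parts(text: str, n: int):
    # Hypothetical generalization: repeat the "dot + part" group n - 1 times.
    match = re.match(r"^[^.]+(?:\.[^.]+){%d}" % (n - 1), text)
    return match.group(0) if match else None


print(first_two_parts("www.abc.com"))   # www.abc
print(first_two_parts("abc"))           # None
print(first_n_parts("www.abc.com", 3))  # www.abc.com
```

Note that the regex yields no match when the string has fewer parts than requested, much as REGEXP_SUBSTR returns NULL in that case, while MySQL's SUBSTRING_INDEX would return the whole string.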

The SUBSTRING_INDEX function in MySQL returns the entire string if the delimiter isn't found or if the requested count is greater than the number of occurrences. Assuming you want to preserve that behavior, and that you'd also find it helpful to be able to extract non-contiguous parts of the string, consider this approach.

with cte as (select 'www.abc.com' as txt)

select a.txt, listagg(b.value, '.') within group (order by b.index)
from cte a, lateral split_to_table(a.txt, '.') b
where b.index <= 2 -- or, for example, b.index in (1, 3) to get 'www.com'
group by a.txt;
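
The logic of this query (split into numbered parts, filter by index, rejoin in order) can be sketched in Python as follows. `pick_parts` and its `keep` parameter are hypothetical names used only for illustration; indexes are 1-based to match the `b.index` column used in the query.

```python
from typing import Callable


def pick_parts(text: str, delim: str, keep: Callable[[int], bool]) -> str:
    # Split into parts numbered from 1, keep those whose index passes
    # the filter, and join them back together in index order.
    parts = text.split(delim)
    return delim.join(p for i, p in enumerate(parts, start=1) if keep(i))


print(pick_parts("www.abc.com", ".", lambda i: i <= 2))      # www.abc
print(pick_parts("www.abc.com", ".", lambda i: i in (1, 3)))  # www.com
print(pick_parts("www.abc.com", ".", lambda i: i <= 5))      # www.abc.com
```

Because the filter only removes parts that exist, asking for more parts than the string has returns the whole string, which is the MySQL behavior the answer set out to preserve.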
