Extracting a value from a text field to support GROUP BY in SQL on a Redshift table

Question

I have a Redshift table that essentially contains survey results that I'm trying to do some statistical analysis on (example data at the end of the question). For the purposes of this question, we can assume the table has three columns: response, date_submitted and meta_data.

The response column contains a simple 1 or 0. The date_submitted column (labelled timestamp in the example data) is the datetime when the survey was submitted. The meta_data field is essentially a CSV text field containing a list of key=value pairs.

Example meta_data value below:

f1=v1, f2=v2, templateId=<someTemplateId>, ...

I don't care about most of this data, but templateId is the value I want to pull out of meta_data. I would like to use templateId to GROUP BY all my rows so I can get summary statistics for each templateId. Long term, this value should be extracted into its own column, but given time constraints I can't update our data-generating process for this, and I need to perform the analysis before that work can be done.

Ultimately, the query I'd like to be able to run is:

SELECT sum(response) FROM my_table GROUP BY <template_id>

Example Data

  response                    meta_data                  timestamp
0        0  f1=v1, f2=v2, templateId=77 2021-04-07 09:51:55.655793
1        0  f1=v1, f2=v2, templateId=55 2021-04-07 08:51:55.655793
2        1  f1=v1, f2=v2, templateId=77 2021-04-07 07:51:55.655793
3        1  f1=v1, f2=v2, templateId=77 2021-04-07 06:51:55.655793
4        1  f1=v1, f2=v2, templateId=66 2021-04-07 05:51:55.655793

To clarify the above: f1=v1, f2=v2, templateId=77 is the meta_data entry in the first row.

Answer 1

You should be able to use regexp_substr() with the 'e' (extract subexpression) parameter:

select t.*,
       -- capture everything after "templateId=" up to the next comma (or end of string)
       regexp_substr(meta_data, 'templateId=([^,]+)', 1, 1, 'e') as template_id
from my_table t;
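Before running this against the full table, the extraction and grouping logic can be sanity-checked locally. The sketch below is Python rather than Redshift SQL; it assumes Python's `re` treats `[^,]+` the same way Redshift's POSIX regular expressions do, and the helper names `extract_template_id` and `sum_by_template` are hypothetical. It applies the pattern `templateId=([^,]+)` to the example rows from the question and sums `response` per templateId, mimicking `SELECT template_id, SUM(response) ... GROUP BY template_id`:

```python
import re
from collections import defaultdict

# Capture everything after "templateId=" up to the next comma or the end of the string.
TEMPLATE_ID = re.compile(r"templateId=([^,]+)")

# (response, meta_data) pairs copied from the example data in the question.
ROWS = [
    (0, "f1=v1, f2=v2, templateId=77"),
    (0, "f1=v1, f2=v2, templateId=55"),
    (1, "f1=v1, f2=v2, templateId=77"),
    (1, "f1=v1, f2=v2, templateId=77"),
    (1, "f1=v1, f2=v2, templateId=66"),
]


def extract_template_id(meta_data):
    """Return the templateId value, or None when the key is missing."""
    match = TEMPLATE_ID.search(meta_data)
    return match.group(1).strip() if match else None


def sum_by_template(rows):
    """Mimic SELECT template_id, SUM(response) ... GROUP BY template_id."""
    totals = defaultdict(int)
    for response, meta_data in rows:
        totals[extract_template_id(meta_data)] += response
    return dict(totals)


print(sum_by_template(ROWS))  # {'77': 2, '55': 0, '66': 1}
```

Rows whose meta_data has no templateId end up under the None key in this sketch; in Redshift you may want to filter or label such rows explicitly before grouping.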

(Answer 1: score 0, posted 2021-04-07 14:54:38)