
Extract value from text field to support GROUP BY in SQL from Redshift table

I have a Redshift table containing survey results that I'm trying to run some statistical analysis on (example data at the end of the question). For the purposes of this question, assume the table has three columns: response, date_submitted and meta_data.

The response column contains a simple 1 or 0. The date_submitted column (shown as timestamp in the example data) is the datetime when the survey was submitted. The meta_data field is essentially a CSV text field containing a list of key=value pairs.

An example meta_data value:

f1=v1, f2=v2, templateId=<someTemplateId>, ...
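To make the format concrete, here is a minimal Python sketch that splits a meta_data string like the one above into a dict. It assumes pairs are separated by commas, keys contain no "=", and values contain no ","; parse_meta_data is a hypothetical helper for illustration, not anything provided by Redshift.

```python
def parse_meta_data(meta_data: str) -> dict[str, str]:
    """Split a 'k1=v1, k2=v2, ...' string into a dict.

    Assumes pairs are comma-separated, keys contain no '=',
    and values contain no ','.
    """
    pairs = {}
    for item in meta_data.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas / empty segments
        # Split on the first '=' only, so a value may itself contain '='
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


print(parse_meta_data("f1=v1, f2=v2, templateId=77"))
# {'f1': 'v1', 'f2': 'v2', 'templateId': '77'}
```

Because templateId can appear anywhere in the list, keying on the name (rather than on its position) is what makes the extraction robust.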

I don't care about most of this data, but templateId is the value I want to pull out of meta_data. I would like to GROUP BY templateId so I can get summary statistics for each template. Long term, this value should be extracted into its own column, but given time constraints I can't update our data-generating process yet and need to perform the analysis before that work can be done.

Ultimately, the query I'd like to be able to run is:

SELECT sum(response) FROM my_table GROUP BY <template_id>

Example data:

   response                    meta_data                   timestamp
0         0  f1=v1, f2=v2, templateId=77  2021-04-07 09:51:55.655793
1         0  f1=v1, f2=v2, templateId=55  2021-04-07 08:51:55.655793
2         1  f1=v1, f2=v2, templateId=77  2021-04-07 07:51:55.655793
3         1  f1=v1, f2=v2, templateId=77  2021-04-07 06:51:55.655793
4         1  f1=v1, f2=v2, templateId=66  2021-04-07 05:51:55.655793

To clarify: f1=v1, f2=v2, templateId=77 is the meta_data value in the first row.

You should be able to use regexp_substr():

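-- The key in the data is templateId; capture its value up to the next comma, space, or end of string.
-- The 'e' parameter makes regexp_substr return the captured subexpression rather than the whole match.
select t.*,
       regexp_substr(t.meta_data, 'templateId=([^, ]+)', 1, 1, 'e') as template_id
from my_table t;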
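A minimal Python sketch for sanity-checking the extraction against the example rows before running it on Redshift. It assumes the pattern templateId=([^, ]+) (the key in the data is spelled templateId, and each value is followed by a comma or the end of the string) and uses Python's re module, whose behavior for a simple pattern like this should agree with Redshift's POSIX regex matching. sum_by_template is a hypothetical helper that mimics sum(response) ... GROUP BY template_id in memory.

```python
import re
from collections import defaultdict

# Capture everything after "templateId=" up to the next comma, space, or end of string.
TEMPLATE_ID = re.compile(r"templateId=([^, ]+)")

# Rows copied from the example data: (response, meta_data).
rows = [
    (0, "f1=v1, f2=v2, templateId=77"),
    (0, "f1=v1, f2=v2, templateId=55"),
    (1, "f1=v1, f2=v2, templateId=77"),
    (1, "f1=v1, f2=v2, templateId=77"),
    (1, "f1=v1, f2=v2, templateId=66"),
]


def sum_by_template(rows):
    """Mimic: SELECT template_id, sum(response) ... GROUP BY template_id."""
    totals = defaultdict(int)
    for response, meta_data in rows:
        match = TEMPLATE_ID.search(meta_data)
        # Rows without a templateId are grouped under None in this sketch
        template_id = match.group(1) if match else None
        totals[template_id] += response
    return dict(totals)


print(sum_by_template(rows))
# {'77': 2, '55': 0, '66': 1}
```

If these per-template sums look right, the same regexp_substr expression can be used as the GROUP BY key in the real query.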
