Extract value from text field to support GROUP BY in SQL from Redshift table
I have a Redshift table that essentially contains survey results that I'm trying to do some statistical analysis on (example data at the end of the question). For the purposes of this question we can assume the table has three columns: response, date_submitted and meta_data.
The response column contains a simple 1 or 0. The timestamp (date_submitted) is the datetime when the survey was submitted. The meta_data field is essentially a CSV text field with a list of key=value pairs. Example value:
f1=v1, f2=v2, templateId=<someTemplateId>, ...
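To make the meta_data format concrete, here is a minimal Python sketch that splits one value into a dict. It assumes pairs are separated by commas, that values contain no commas, and that keys contain no "="; parse_meta_data is a hypothetical helper, only useful for checking offline what an extraction should return for a given row.

```python
def parse_meta_data(meta_data):
    """Parse a 'k1=v1, k2=v2, ...' string into a dict of strings.

    Assumes values contain no commas and keys contain no '='.
    """
    pairs = {}
    for field in meta_data.split(","):
        # partition splits on the first '=' only
        key, sep, value = field.strip().partition("=")
        if sep:  # skip empty or malformed fields that have no '='
            pairs[key] = value
    return pairs
```

For example, parse_meta_data("f1=v1, f2=v2, templateId=77")["templateId"] returns "77", regardless of where templateId appears in the list.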
Most of this data I don't care about, but templateId is the value I want to pull out of meta_data. I would like to use templateId to GROUP BY all my rows so I can get summary statistics for each templateId. Long term, this value should be extracted into its own column, but given time constraints I am unable to update our data-generating process for this and need to perform the analysis before that work can be done.
Ultimately, the query I'd like to be able to run is:
SELECT sum(response) FROM my_table GROUP BY <template_id>
Example data

   response  meta_data                     timestamp
0  0         f1=v1, f2=v2, templateId=77   2021-04-07 09:51:55.655793
1  0         f1=v1, f2=v2, templateId=55   2021-04-07 08:51:55.655793
2  1         f1=v1, f2=v2, templateId=77   2021-04-07 07:51:55.655793
3  1         f1=v1, f2=v2, templateId=77   2021-04-07 06:51:55.655793
4  1         f1=v1, f2=v2, templateId=66   2021-04-07 05:51:55.655793
To clarify the above: f1=v1, f2=v2, templateId=77 is the meta_data entry in the first row.
You should be able to use regexp_substr():
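select t.*,
       -- 'e' returns the first capture group: the text after "templateId=" up to the next comma (or end of string)
       regexp_substr(t.meta_data, 'templateId=([^,]+)', 1, 1, 'e') as template_id
from my_table t;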
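The query above adds template_id to each row; for the grouped summary the question asks for, the same expression can go in the select list and be grouped by its ordinal position. Since Redshift isn't available for a quick local check, this is a minimal Python sketch that runs that SQL against an in-memory SQLite database. The regexp_substr registered here is a hypothetical stand-in (via sqlite3's create_function) that emulates only the position, occurrence and 'e' behaviour used in this query; it is not Redshift's implementation, and it returns NULL when nothing matches. The SQL string is written to be Redshift-compatible but is exercised here only against SQLite, using the question's example rows.

```python
import re
import sqlite3


def regexp_substr(source, pattern, position=1, occurrence=1, parameters=""):
    """Hypothetical local stand-in for Redshift's REGEXP_SUBSTR.

    Supports only a 1-based start position, the n-th occurrence and the
    'e' flag (return the first capture group). Returns None (NULL) on no match.
    """
    if source is None:
        return None
    matches = list(re.finditer(pattern, source[position - 1:]))
    if len(matches) < occurrence:
        return None
    match = matches[occurrence - 1]
    return match.group(1) if "e" in parameters else match.group(0)


def template_totals(rows):
    """Load (response, meta_data, date_submitted) rows into an in-memory
    table and return {template_id: sum(response)}."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp_substr", -1, regexp_substr)  # -1: any arg count
    conn.execute(
        "CREATE TABLE my_table (response INTEGER, meta_data TEXT, date_submitted TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT regexp_substr(meta_data, 'templateId=([^,]+)', 1, 1, 'e') AS template_id,
               SUM(response) AS total_responses
        FROM my_table
        GROUP BY 1
        ORDER BY 1
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    example_rows = [
        (0, "f1=v1, f2=v2, templateId=77", "2021-04-07 09:51:55.655793"),
        (0, "f1=v1, f2=v2, templateId=55", "2021-04-07 08:51:55.655793"),
        (1, "f1=v1, f2=v2, templateId=77", "2021-04-07 07:51:55.655793"),
        (1, "f1=v1, f2=v2, templateId=77", "2021-04-07 06:51:55.655793"),
        (1, "f1=v1, f2=v2, templateId=66", "2021-04-07 05:51:55.655793"),
    ]
    print(template_totals(example_rows))  # prints {'55': 0, '66': 1, '77': 2}
```

Note that the extracted ids come back as text, so in Redshift you may want to cast them if numeric ordering matters.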