Pivoting Snowflake query with dynamic date values in either SQL/Python

I have searched many threads on this site about this but could not implement any of the solutions. I am using Snowflake to pull data and then using the PIVOT function to transpose the table. The problem is that I have to specify static values in the PIVOT function. My query pulls a 90-day date range, so constantly changing the dates by hand would not be very efficient. I am pulling the data in Jupyter using the Snowflake connector, so Python is an option.

Sample query (this works):

select * from (
   select date, id, count(products) as prod_count 
   from table1 where date >= '2019-01-01' and date <= '2019-01-05' 
   group by 1, 2) d
pivot (
   max(prod_count) for date in ('2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05')) piv 

I have tried passing a SELECT DISTINCT date query inside the "for date in" piece, but that does not work. I have also tried creating separate dataframes and Python lists containing all of the dates and passing those in instead, but that does not work either. I have also tried various other solutions on this forum, but they seem to be focused on T-SQL or SQL Server syntax, which does not work in my case (at least when I tried). Any help is appreciated.

Edit:

To show a sample of input vs. expected output:

Input:

Date        ID  Products
2019-01-01  1   A
2019-01-01  1   B
2019-01-01  2   A
2019-01-02  1   A
2019-01-02  1   B
2019-01-02  1   C
2019-01-02  2   A
2019-01-02  2   B

Current output (also the expected output, except that the dates should be dynamic):

ID  2019-01-01   2019-01-02
1   2            3
2   1            2
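For reference, the same transformation can be reproduced client-side with pandas `pivot_table`, which discovers the distinct dates itself, so no column list is hard-coded. This is a swapped-in alternative to the SQL PIVOT (not what the answers below do), sketched under the assumption that the unpivoted rows have already been fetched into a DataFrame with the column names shown above:

```python
import pandas as pd

# The question's sample input (Date, ID, Products).
rows = [
    ("2019-01-01", 1, "A"),
    ("2019-01-01", 1, "B"),
    ("2019-01-01", 2, "A"),
    ("2019-01-02", 1, "A"),
    ("2019-01-02", 1, "B"),
    ("2019-01-02", 1, "C"),
    ("2019-01-02", 2, "A"),
    ("2019-01-02", 2, "B"),
]
df = pd.DataFrame(rows, columns=["Date", "ID", "Products"])

# One row per ID, one column per distinct Date, cells = number of products.
wide = df.pivot_table(index="ID", columns="Date", values="Products", aggfunc="count")
print(wide)
```

This yields the counts 2/3 for ID 1 and 1/2 for ID 2, matching the expected output. If an ID can be missing on some dates, passing `fill_value=0` replaces the resulting NaNs with zeros.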

If the range is 90 days you can tweak the function, but what we can do is return a dynamic query with your dynamic parameters as inputs:

import pandas as pd


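def generate_sql_dates(start_date="2019-01-01", end_date="2019-01-05"):
    """Date generator: takes a start and end date, returns the pivot query."""
    # One timestamp per calendar day, both ends inclusive.
    date_arrays = pd.date_range(start_date, end_date, freq="D")

    # Quote each date and join with commas. (Formatting a tuple instead would
    # emit an invalid trailing comma, e.g. ('2019-01-01',), for a one-day range.)
    pivot_dates = ", ".join(f"'{x.strftime('%Y-%m-%d')}'" for x in date_arrays)

    return f"""select * from (
   select date, id, count(products) as prod_count
   from table1 where date >= '{start_date}' and date <= '{end_date}'
   group by 1, 2) d
   pivot (
   max(prod_count) for date in ({pivot_dates})) piv"""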

Running this:

qry = generate_sql_dates('2019-03-05','2019-04-05')
print(qry)

Output:

select * from (
   select date, id, count(products) as prod_count 
   from table1 where date >= '2019-03-05' and date <= '2019-04-05'
   group by 1, 2) d
   pivot (
   max(prod_count) for date in ('2019-03-05', '2019-03-06', '2019-03-07', '2019-03-08', '2019-03-09', '2019-03-10', '2019-03-11', '2019-03-12', '2019-03-13', '2019-03-14', '2019-03-15', '2019-03-16', '2019-03-17', '2019-03-18', '2019-03-19', '2019-03-20', '2019-03-21', '2019-03-22', '2019-03-23', '2019-03-24', '2019-03-25', '2019-03-26', '2019-03-27', '2019-03-28', '2019-03-29', '2019-03-30', '2019-03-31', '2019-04-01', '2019-04-02', '2019-04-03', '2019-04-04', '2019-04-05')) piv

Now, if your dates need to be dynamic, i.e. you're running this daily and want the range to follow the run date, you can use a datetime function, much like GETDATE() in SQL:

start = (pd.to_datetime('today')).strftime('%Y-%m-%d')
end = (pd.to_datetime('today') + pd.DateOffset(days=90)).strftime('%Y-%m-%d')

You could then pass these into the function, e.g. generate_sql_dates(start, end), or leave the default values in place.
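Note that the snippet above counts 90 days forward from today. If the goal is instead the most recent 90 days of history (an assumption about the question's use case), subtract rather than add. A minimal standard-library sketch; `trailing_window` is a hypothetical helper, and its two return values can be passed straight into `generate_sql_dates`:

```python
from datetime import date, timedelta


def trailing_window(days=90, today=None):
    """Return (start, end) as 'YYYY-MM-DD' strings for the `days` days ending on `today`.

    Both ends are inclusive, so start is `days - 1` days before `today`.
    """
    today = today or date.today()
    start = today - timedelta(days=days - 1)
    return start.isoformat(), today.isoformat()


# A fixed "today" keeps the example reproducible; omit it to use the real date.
start, end = trailing_window(90, today=date(2019, 4, 5))
print(start, end)  # 2019-01-06 2019-04-05
```

Making the window inclusive on both ends matches the `date >= start and date <= end` filter in the generated query, so exactly 90 pivot columns are produced.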

This is too long for a comment, but I don't know enough Python to give you a fully functional answer. I can explain the approach for building a dynamic pivot, though.

Once you have your result set in place, use a tool to get a list of the distinct values from the column that you will be pivoting on and turning into column names. In this case, it seems like that will be your date column. As for tools, a SQL SELECT DISTINCT will work, but Python can do the same thing. One way or the other, take the list of values, separate them with commas and wrap each one in delimiters if need be (for dates they will be needed), then save that comma-separated list to a string variable. That might be easier to accomplish in Python, but I think it can be done in Snowflake, too. Use whichever you're more comfortable with.
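The list-building step described above could look like this in Python. It is a minimal sketch, assuming the pivot values have already been fetched into a Python list (for example from a SELECT DISTINCT); `build_column_list` is a hypothetical helper name:

```python
def build_column_list(values):
    """Turn raw pivot values into a quoted, comma-separated list for an IN clause."""
    # De-duplicate and sort; ISO 'YYYY-MM-DD' strings sort in date order.
    distinct = sorted({str(v) for v in values})
    # Wrap each value in single quotes, doubling any embedded quote so the
    # result stays a valid SQL string literal.
    return ", ".join("'" + v.replace("'", "''") + "'" for v in distinct)


# e.g. the date column returned by the first query
dates = ["2019-01-02", "2019-01-01", "2019-01-02"]
print(build_column_list(dates))  # '2019-01-01', '2019-01-02'
```

Converting with `str()` first means `datetime.date` values work too, since their string form is already `YYYY-MM-DD`.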

Next, you'll use that list of column names to build another variable that holds the text of the rest of your query. In the IN clause, concatenate the variable from above (your column list) into the string. The snippet uses T-SQL-style @variable and + concatenation, so adapt that syntax to wherever you build the string.

SET @queryText = 'select * from (
   select date, id, count(products) as prod_count 
   from table1 where date >= ''2019-01-01'' and date <= ''2019-01-05'' 
   group by 1, 2) d
pivot (
   max(prod_count) for date in (' + @listOfColumnValues + ')) piv '

Finally, execute the query contained in @queryText.
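In Python, the "build the query text, then execute it" steps could be sketched as follows, with plain string building taking the place of the @queryText variable. This assumes the column list was already built as described above and that `conn` is an open DB-API connection, for example one returned by `snowflake.connector.connect(...)` from the snowflake-connector-python package; `build_pivot_query` and `run_query` are hypothetical helper names:

```python
def build_pivot_query(column_list, start_date, end_date):
    """Splice a quoted, comma-separated column list into the pivot's IN clause."""
    return (
        "select * from (\n"
        "   select date, id, count(products) as prod_count\n"
        f"   from table1 where date >= '{start_date}' and date <= '{end_date}'\n"
        "   group by 1, 2) d\n"
        "pivot (\n"
        f"   max(prod_count) for date in ({column_list})) piv"
    )


def run_query(conn, query_text):
    """Execute the generated text on an open DB-API connection and fetch all rows."""
    cur = conn.cursor()
    try:
        cur.execute(query_text)
        return cur.fetchall()
    finally:
        cur.close()


query_text = build_pivot_query("'2019-01-01', '2019-01-02'", "2019-01-01", "2019-01-02")
print(query_text)
# rows = run_query(conn, query_text)  # needs a live connection
```

Keeping the text generation separate from execution makes it easy to print and inspect the generated SQL before running it.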

I'll keep the most up-to-date version of this answer on the similar question "How to pivot on dynamic values in Snowflake".

I wrote a Snowflake stored procedure to get dynamic pivots inside Snowflake. It takes 3 steps:

  1. Query (the procedure reads columns named pivot_column and pivot_value from this result)
  2. Call the stored procedure: call pivot_prev_results()
  3. Find the results: select * from table(result_scan(last_query_id(-2)))
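From Jupyter, the three steps could be driven through the Python connector roughly as sketched here. Assumptions: `cur` is a cursor from a snowflake-connector-python connection, and all three statements run in the same session, because `last_query_id` looks at that session's query history. `pivot_statements`, `run_dynamic_pivot` and the step-1 query (which aliases the question's `date` and `count(products)` to the `pivot_column` and `pivot_value` names the procedure reads) are hypothetical:

```python
def pivot_statements(source_query):
    """The three statements of the stored-procedure workflow, in order."""
    return [
        source_query,                   # 1. query exposing pivot_column / pivot_value
        "call pivot_prev_results()",    # 2. build and run the dynamic pivot
        "select * from table(result_scan(last_query_id(-2)))",  # 3. fetch the result
    ]


def run_dynamic_pivot(cur, source_query):
    """Run all three statements on one cursor (one session) and return the pivoted rows."""
    for stmt in pivot_statements(source_query):
        cur.execute(stmt)
    return cur.fetchall()


# Hypothetical step-1 query for the question's table1.
source_query = """
select id, date as pivot_column, count(products) as pivot_value
from table1
where date >= '2019-01-01' and date <= '2019-01-05'
group by 1, 2
"""
```

Because the procedure pivots with `select *`, every column other than pivot_column and pivot_value (here just `id`) becomes a row key of the pivoted output.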

The procedure:

create or replace procedure pivot_prev_results()
returns string
language javascript
execute as caller as
$$
  // Build a quoted, comma-separated list of the distinct pivot_column values
  // from the query run before the CALL, e.g. '2019-01-01','2019-01-02'.
  var cols_query = `
      select '\\'' 
        || listagg(distinct pivot_column, '\\',\\'') within group (order by pivot_column)
        || '\\'' 
      from table(result_scan(last_query_id(-1)))
  `;
  var stmt1 = snowflake.createStatement({sqlText: cols_query});
  var results1 = stmt1.execute();
  results1.next();
  var col_list = results1.getColumnValue(1);

  // Pivot that same earlier result (now two queries back, since cols_query
  // just ran) using the generated column list.
  var pivot_query = `
         select * 
         from (select * from table(result_scan(last_query_id(-2)))) 
         pivot(max(pivot_value) for pivot_column in (${col_list}))
     `;
  var stmt2 = snowflake.createStatement({sqlText: pivot_query});
  stmt2.execute();
  // Return the statements that fetch the pivoted result.
  return `select * from table(result_scan('${stmt2.getQueryId()}'));\n  select * from table(result_scan(last_query_id(-2)));`;
$$;

Check https://hoffa.medium.com/dynamic-pivots-in-sql-with-snowflake-c763933987c for more.

Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM