
Pivoting Snowflake query with dynamic date values in either SQL/Python

I have searched many threads on this site about this but could not get any of the solutions to work. I am using Snowflake to pull data and then using the pivot function to transpose the table. The problem is that I have to specify static values in the pivot function. My query pulls a date range of 90 days, so constantly changing the dates by hand would not be very efficient. I am pulling the data in Jupyter through the Snowflake connection, so Python is an option.

Sample query (this works):

select * from (
   select date, id, count(products) as prod_count 
   from table1 where date >= '2019-01-01' and date <= '2019-01-05' 
   group by 1, 2) d
pivot (
   max(prod_count) for date in ('2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05')) piv 

I have tried passing a select distinct date query inside the "for date in" clause, but that does not work. I have also tried creating separate dataframes and Python lists containing all of the dates and passing those in instead, but that does not work either. I have also tried various other solutions on this forum, but they seem to be focused on T-SQL or SQL Server syntax, which does not work in my case (at least when I tried). Any help is appreciated.

Edit:

To show a sample of input vs expected output:

Input:

Date        ID  Products
2019-01-01  1   A
2019-01-01  1   B
2019-01-01  2   A
2019-01-02  1   A
2019-01-02  1   B
2019-01-02  1   C
2019-01-02  2   A
2019-01-02  2   B

Current (and expected, but dynamic for the dates) output:

ID  2019-01-01   2019-01-02
1   2            3
2   1            2

If the range is 90 days you can tweak the function, but what we can do is return a dynamic query with your dynamic parameters as inputs:

import pandas as pd


def generate_sql_dates(start_date="2019-01-01", end_date="2019-01-05"):
    """Build the pivot query for every day from start_date to end_date (inclusive)."""
    date_arrays = pd.date_range(start_date, end_date, freq='D')

    # Join the quoted dates by hand: formatting a one-element tuple would
    # leave a trailing comma, ('2019-01-01',), which is not valid SQL.
    pivot_dates = ", ".join(f"'{x.strftime('%Y-%m-%d')}'" for x in date_arrays)

    return f"""select * from (
   select date, id, count(products) as prod_count 
   from table1 where date >= '{start_date}' and date <= '{end_date}'
   group by 1, 2) d
   pivot (
   max(prod_count) for date in ({pivot_dates})) piv"""

Running this returns:

qry = generate_sql_dates('2019-03-05','2019-04-05')
print(qry)

output:

select * from (
   select date, id, count(products) as prod_count 
   from table1 where date >= '2019-03-05' and date <= '2019-04-05'
   group by 1, 2) d
   pivot (
   max(prod_count) for date in ('2019-03-05', '2019-03-06', '2019-03-07', '2019-03-08', '2019-03-09', '2019-03-10', '2019-03-11', '2019-03-12', '2019-03-13', '2019-03-14', '2019-03-15', '2019-03-16', '2019-03-17', '2019-03-18', '2019-03-19', '2019-03-20', '2019-03-21', '2019-03-22', '2019-03-23', '2019-03-24', '2019-03-25', '2019-03-26', '2019-03-27', '2019-03-28', '2019-03-29', '2019-03-30', '2019-03-31', '2019-04-01', '2019-04-02', '2019-04-03', '2019-04-04', '2019-04-05')) piv

Now, if your dates need to be dynamic, i.e. you're running this daily and want it to start from a trigger, you can use a datetime function, much like GETDATE() in SQL:

start = (pd.to_datetime('today')).strftime('%Y-%m-%d')
end = (pd.to_datetime('today') + pd.DateOffset(days=90)).strftime('%Y-%m-%d')

You could then pass these into the function, or set them as its default values.
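If the 90-day range is meant to look back from today (as a count over historical rows would suggest) rather than forward, the window could be computed as in this minimal sketch. The helper name `trailing_window` and its `today` parameter are hypothetical, introduced here only for illustration.

```python
import pandas as pd


def trailing_window(days=90, today=None):
    """Return (start, end) as 'YYYY-MM-DD' strings covering `days` days ending today.

    `today` is a hypothetical override so the result is reproducible.
    """
    end = pd.Timestamp(today) if today is not None else pd.Timestamp.today()
    end = end.normalize()
    # Subtract days - 1 so that the inclusive range holds exactly `days` dates.
    start = end - pd.DateOffset(days=days - 1)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


start, end = trailing_window(90, today="2019-04-05")
print(start, end)
```

The two strings could then be passed to generate_sql_dates(start, end).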

This is too long for a comment, but I don't know enough Python to give you a fully functional answer. I can explain the approach for building a dynamic pivot, though.

Once you have your result set in place, get a list of the distinct values from the column that you will be pivoting on and turning into column names. In this case, it seems like that will be your date column. A SQL SELECT DISTINCT will work, but Python can do the same thing. One way or the other, take the list of values, separate them with commas and wrap them in delimiters if need be (for dates, single quotes are needed), then save that comma-separated list to a string variable. That might be easier to accomplish in Python, but I think it can be done in Snowflake, too. Use whatever you're more comfortable with.

Next, you'll use that list of column names to build another variable that will have the rest of your query in it. In the IN clause, you'll append the variable from above with your column list.

SET @queryText = 'select * from (
   select date, id, count(products) as prod_count 
   from table1 where date >= ''2019-01-01'' and date <= ''2019-01-05'' 
   group by 1, 2) d
pivot (
   max(prod_count) for date in (' + @listOfColumnValues + ')) piv '

Finally, execute the query contained in @queryText.
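The approach described above can be sketched in Python. This is only an illustration under assumptions: `build_pivot_query` is a hypothetical helper, and `distinct_dates` stands in for whatever a SELECT DISTINCT on the date column (or a pandas unique) returned.

```python
def build_pivot_query(distinct_dates, start_date, end_date):
    """Build the pivot query text from a list of distinct date strings."""
    if not distinct_dates:
        raise ValueError("need at least one value to pivot on")
    # Wrap each value in single quotes and separate the values with commas.
    column_list = ", ".join(f"'{d}'" for d in sorted(set(distinct_dates)))
    return (
        "select * from (\n"
        "   select date, id, count(products) as prod_count\n"
        f"   from table1 where date >= '{start_date}' and date <= '{end_date}'\n"
        "   group by 1, 2) d\n"
        "pivot (\n"
        f"   max(prod_count) for date in ({column_list})) piv"
    )


query_text = build_pivot_query(
    ["2019-01-02", "2019-01-01", "2019-01-02"], "2019-01-01", "2019-01-05"
)
print(query_text)
```

Because the values are interpolated straight into the SQL text, this sketch is only suitable for values you trust, such as dates read back from your own table.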

I'll keep the most up-to-date version of this answer on the similar question "How to pivot on dynamic values in Snowflake".

I wrote a Snowflake stored procedure to get dynamic pivots inside Snowflake. It takes 3 steps:

  1. Run a query (the procedure below expects it to return columns named pivot_column and pivot_value).
  2. Call the stored procedure: call pivot_prev_results()
  3. Find the results: select * from table(result_scan(last_query_id(-2)))

The procedure:

create or replace procedure pivot_prev_results()
returns string
language javascript
execute as caller as
$$
  var cols_query = `
      select '\\'' 
        || listagg(distinct pivot_column, '\\',\\'') within group (order by pivot_column)
        || '\\'' 
      from table(result_scan(last_query_id(-1)))
  `;
  var stmt1 = snowflake.createStatement({sqlText: cols_query});
  var results1 = stmt1.execute();
  results1.next();
  var col_list = results1.getColumnValue(1);
  
  var pivot_query = `
         select * 
         from (select * from table(result_scan(last_query_id(-2)))) 
         pivot(max(pivot_value) for pivot_column in (${col_list}))
     `;
  var stmt2 = snowflake.createStatement({sqlText: pivot_query});
  stmt2.execute();
  return `select * from table(result_scan('${stmt2.getQueryId()}'));\n  select * from table(result_scan(last_query_id(-2)));`;
$$;
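
From a Python session such as Jupyter, the three steps could be driven through a DB-API style cursor, as in this rough sketch. It assumes the cursor exposes `execute` and `fetchall`, that all three statements run in the same Snowflake session with nothing else executed in between, and that the source query aliases its columns as pivot_column and pivot_value. `run_dynamic_pivot` and `source_query` are hypothetical names.

```python
def run_dynamic_pivot(cursor, source_query):
    """Run the query, call the procedure, then fetch the pivoted rows."""
    # Step 1: the query whose result the procedure will pivot.
    cursor.execute(source_query)
    # Step 2: the procedure reads that result through result_scan.
    cursor.execute("call pivot_prev_results()")
    # Step 3: read the pivoted result produced inside the procedure.
    cursor.execute("select * from table(result_scan(last_query_id(-2)))")
    return cursor.fetchall()


# Assumed shape of the input: one row per (id, date) with the value to pivot.
source_query = """
select id, date as pivot_column, count(products) as pivot_value
from table1
where date >= '2019-01-01' and date <= '2019-01-05'
group by 1, 2
"""
```

Keeping the three statements on one cursor matters because last_query_id() is relative to the current session.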

Check https://hoffa.medium.com/dynamic-pivots-in-sql-with-snowflake-c763933987c for more.
