
How can I unpivot multiple columns using Jupyter Notebook or SQL?

I have a table with a parent-child structure, and I need to unpivot multiple columns that hold different values.

Here is an example:

[image]

The table above is a small example.

Note: I have shown only a few columns here, but the real table has 215 columns of this kind, with different IDs and names. My goal is to flatten all the columns tied to p_id and/or c_id into the expected result table.

I am doing this exercise in Snowflake, but I am familiar with Jupyter Notebook as well. Feel free to provide a solution in SQL or in a Python Jupyter notebook. You can also suggest any other way to handle this kind of data.

This image is part of the comment section. Please check the highlighted part. [image]

Let's start by creating a table shaped like the one in the description:

CREATE OR REPLACE TABLE weird_table
AS 
SELECT 1 AS a, 'b' b, 1 pip1, 2 pip2, 3 pip3, 4 pip4, 11 rip1, 12 rip2, 13 rip3, 14 rip4
UNION ALL SELECT 2 , 'c', 124, 3123, 123, 123, 1231 ,9, 99,999
;

[image]

Now we can create a stored procedure inside Snowflake with JavaScript. The procedure takes the name of a table, finds all the columns whose names end with a number, generates one SELECT statement per numeric suffix, and merges those statements with UNION ALL:

CREATE OR REPLACE PROCEDURE custom_unpivot(TABLE_NAME VARCHAR)
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
// Read a single row only to discover the table's column names.
var stmt = snowflake.createStatement({
    sqlText: "SELECT * FROM " + TABLE_NAME + " LIMIT 1;",
});
stmt.execute();

var cols = [];
for (var i = 1; i <= stmt.getColumnCount(); i++) {
  cols.push(stmt.getColumnName(i));
}
// Columns without a trailing number are kept as identifiers on every row.
var idCols = cols.filter(x => !x.match(/[0-9]+$/));
// Columns with a trailing number are the ones to unpivot.
var unpivotCols = cols.filter(x => x.match(/[0-9]+$/));
var maxUnpivot = Math.max(...unpivotCols.map(x => parseInt(x.match(/[0-9]+$/)[0], 10)));
var colsSansSuffix = [...new Set(unpivotCols.map(x => x.replace(/[0-9]+$/, '')))];

// Build one SELECT per suffix. This assumes every base name has a column
// for every suffix from 1 to maxUnpivot.
var selectsToUnion = [];
for (var n = 1; n <= maxUnpivot; n++) {
  selectsToUnion.push(
    "SELECT "+idCols+","+colsSansSuffix.map(x=>" "+x+n+" AS "+x)+" FROM "+TABLE_NAME
  );
}
return selectsToUnion.join('\nUNION ALL\n');
$$
;
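
To check the string-building logic outside Snowflake, the same steps can be sketched as a plain JavaScript function. This is only a sketch under assumptions: `buildUnpivotSql` is a hypothetical helper name that is not part of the procedure, and the column names are passed in as an array instead of being read through `stmt.getColumnName`.

```javascript
// Hypothetical helper: mirrors the procedure's logic, but takes the
// column names as an argument so it can run without Snowflake.
function buildUnpivotSql(tableName, cols) {
  const suffix = /[0-9]+$/;
  // Columns without a trailing number identify the row.
  const idCols = cols.filter(c => !suffix.test(c));
  // Columns with a trailing number get unpivoted.
  const unpivotCols = cols.filter(c => suffix.test(c));
  const maxUnpivot = Math.max(...unpivotCols.map(c => parseInt(c.match(suffix)[0], 10)));
  const baseNames = [...new Set(unpivotCols.map(c => c.replace(suffix, '')))];

  const selects = [];
  for (let n = 1; n <= maxUnpivot; n++) {
    const unpivoted = baseNames.map(b => `${b}${n} AS ${b}`);
    selects.push(`SELECT ${idCols.concat(unpivoted).join(', ')} FROM ${tableName}`);
  }
  return selects.join('\nUNION ALL\n');
}

const sampleCols = ['A', 'B', 'PIP1', 'PIP2', 'PIP3', 'PIP4', 'RIP1', 'RIP2', 'RIP3', 'RIP4'];
console.log(buildUnpivotSql('weird_table', sampleCols));
```

Like the procedure, this sketch assumes every base name has a column for every suffix; if one group stops earlier than another, the generated SQL would reference a column that does not exist.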

When you call that procedure, it returns a combined SELECT statement that gives you the desired "unpivot":

CALL custom_unpivot('weird_table');

SELECT A,B, PIP1 AS PIP, RIP1 AS RIP FROM weird_table
UNION ALL
SELECT A,B, PIP2 AS PIP, RIP2 AS RIP FROM weird_table
UNION ALL
SELECT A,B, PIP3 AS PIP, RIP3 AS RIP FROM weird_table
UNION ALL
SELECT A,B, PIP4 AS PIP, RIP4 AS RIP FROM weird_table

If you run that generated SQL, it produces the desired results:

[image]
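
The row-level effect of the generated query can also be sketched in plain JavaScript on the two sample rows from `weird_table`. This is an illustration only: `unpivotRows` is a hypothetical name, the rows are hard-coded objects rather than a Snowflake result set, and the output order is just the order the loops produce (SQL guarantees no order without ORDER BY).

```javascript
// Hypothetical in-memory version of the unpivot, for illustration only.
function unpivotRows(rows, idCols, baseNames, maxSuffix) {
  const out = [];
  for (let n = 1; n <= maxSuffix; n++) {   // one pass per generated SELECT
    for (const row of rows) {
      const flat = {};
      for (const id of idCols) flat[id] = row[id];           // keep identifier columns
      for (const b of baseNames) flat[b] = row[b + n];       // PIPn -> PIP, RIPn -> RIP
      out.push(flat);
    }
  }
  return out;
}

const weirdTable = [
  { A: 1, B: 'b', PIP1: 1, PIP2: 2, PIP3: 3, PIP4: 4, RIP1: 11, RIP2: 12, RIP3: 13, RIP4: 14 },
  { A: 2, B: 'c', PIP1: 124, PIP2: 3123, PIP3: 123, PIP4: 123, RIP1: 1231, RIP2: 9, RIP3: 99, RIP4: 999 },
];
const flatRows = unpivotRows(weirdTable, ['A', 'B'], ['PIP', 'RIP'], 4);
console.log(flatRows.length);  // 8 rows: 2 source rows x 4 suffixes
console.log(flatRows[0]);      // { A: 1, B: 'b', PIP: 1, RIP: 11 }
```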

Once you understand how this pattern works within a stored procedure, you can adapt it to other column-naming schemes.


For the follow-up in the comments, try embedding the resulting query in a subquery and filtering it like this:

SELECT *
FROM (
  SELECT A,B, PIP1 AS PIP, RIP1 AS RIP FROM A_weird_table
  UNION ALL
  SELECT A,B, PIP2 AS PIP, RIP2 AS RIP FROM A_weird_table
  UNION ALL
  SELECT A,B, PIP3 AS PIP, RIP3 AS RIP FROM A_weird_table
  UNION ALL
  SELECT A,B, PIP4 AS PIP, RIP4 AS RIP FROM A_weird_table
  UNION ALL
  SELECT A,B, PIP5 AS PIP, RIP5 AS RIP FROM A_weird_table
)
WHERE pip>0 AND rip>0;
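
If the filter is needed every time, the wrapping step can be generated as well. A minimal sketch, assuming the UNION ALL text is already available as a string: `wrapWithPositiveFilter` is a hypothetical helper, and the `> 0` condition is simply copied from the query above.

```javascript
// Hypothetical helper: wraps a generated UNION ALL query in an outer
// SELECT that keeps only rows where every value column is positive.
function wrapWithPositiveFilter(unionSql, valueCols) {
  const indented = unionSql.split('\n').map(line => '  ' + line).join('\n');
  const condition = valueCols.map(c => `${c} > 0`).join(' AND ');
  return `SELECT *\nFROM (\n${indented}\n)\nWHERE ${condition};`;
}

const unionSql = [
  'SELECT A, B, PIP1 AS PIP, RIP1 AS RIP FROM A_weird_table',
  'SELECT A, B, PIP2 AS PIP, RIP2 AS RIP FROM A_weird_table',
].join('\nUNION ALL\n');
console.log(wrapWithPositiveFilter(unionSql, ['PIP', 'RIP']));
```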

