I'm working on the MIMIC-IV dataset, on a task to predict mortality from hospital admission and lab test data, and I'm trying to create a one-hot vector of the m most common lab kinds.
subject_id - represents a patient
admission_id - represents a single admission (hadm_id in the query)
itemid - the kind of lab test taken
Current query:
SELECT
  a.hadm_id,
  a.subject_id,
  l.itemid,
  gender,
  COUNT(*) AS number_of_labs,
  admission_type AS type,
  admission_location AS loc,
  ethnicity,
  marital_status AS ms,
  anchor_age AS age,
  l.itemid IN (
    SELECT itemid
    FROM `{labevents}` AS l
    GROUP BY itemid
    ORDER BY COUNT(itemid) DESC
    LIMIT 256
  ) AS onehot,
  MAX(hospital_expire_flag) AS died
FROM
  `{admissions_table}` AS a
  INNER JOIN `{patients_table}` AS p ON a.subject_id = p.subject_id
  INNER JOIN `{labevents}` AS l ON l.subject_id = p.subject_id
GROUP BY a.subject_id, a.hadm_id, gender, admission_type, admission_location, ethnicity, marital_status, anchor_age, l.itemid
LIMIT 20
Ideally, I want the 'onehot' column I created to hold an array representing a one-hot vector of the m most common labs (in this case m=256).
The data is credentialed-access only, so I can't share it.
One possible approach would be to make a bin template and cross join it with the lab_index.
Below is a simple example of the approach. I believe you can add a filter for the most frequent indexes when creating the temp table.
DECLARE bin_max INT64;
DECLARE bin_min INT64;

-- 1 to 7 subject_id with lab_index of 1:4, 2:2, 3:0, 4:1
CREATE TEMP TABLE dataset AS
SELECT 1 AS subject_id, 1 AS lab_index
UNION ALL SELECT 2 AS subject_id, 1 AS lab_index
UNION ALL SELECT 3 AS subject_id, 1 AS lab_index
UNION ALL SELECT 4 AS subject_id, 1 AS lab_index  -- lab_index 1: one hot index 0
UNION ALL SELECT 5 AS subject_id, 2 AS lab_index
UNION ALL SELECT 6 AS subject_id, 2 AS lab_index  -- lab_index 2: one hot index 1
UNION ALL SELECT 7 AS subject_id, 4 AS lab_index  -- lab_index 4: one hot index 3
;

SET bin_max = (SELECT MAX(lab_index) FROM dataset);
SET bin_min = (SELECT MIN(lab_index) FROM dataset);

WITH
empty_bin AS (
  SELECT *
  FROM UNNEST(GENERATE_ARRAY(0, bin_max - bin_min, 1)) AS bin
),
one_hot_index AS (
  SELECT
    subject_id, lab_index, bin, lab_index - bin_min,
    IF (lab_index - bin_min = bin, 1, 0) AS one_hot,
  FROM dataset
  CROSS JOIN empty_bin
)
SELECT
  subject_id, lab_index,
  STRING_AGG(CAST(one_hot AS STRING), "" ORDER BY bin) AS one_hot, -- <- vector format could be changed
FROM one_hot_index
GROUP BY subject_id, lab_index
ORDER BY subject_id
;
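The filter on the most frequent indexes mentioned above could be sketched as follows. This is only an illustration under assumptions, not a tested solution for MIMIC-IV: the variable m, the CTE names lab_counts and top_labs, and the column slot are hypothetical names introduced here, and the toy rows mirror the example data. Each lab kind is ranked by frequency, the rank becomes its position in the vector, and the result is returned as an ARRAY instead of a string.

```sql
-- Sketch only: m, lab_counts, top_labs and slot are illustrative names.
DECLARE m INT64 DEFAULT 2;  -- assumed vector length; the question uses 256

WITH
dataset AS (
  -- Same toy rows as the example above: (subject_id, lab_index)
  SELECT *
  FROM UNNEST(ARRAY<STRUCT<subject_id INT64, lab_index INT64>>[
    (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 4)
  ])
),
lab_counts AS (
  -- How often each lab kind occurs
  SELECT lab_index, COUNT(*) AS n
  FROM dataset
  GROUP BY lab_index
),
top_labs AS (
  -- 0-based slot of each lab kind, most frequent first;
  -- ties are broken by lab_index so the order is deterministic
  SELECT
    lab_index,
    ROW_NUMBER() OVER (ORDER BY n DESC, lab_index) - 1 AS slot
  FROM lab_counts
)
SELECT
  d.subject_id,
  d.lab_index,
  -- One array element per kept lab kind, ordered by slot
  ARRAY_AGG(IF(t.lab_index = d.lab_index, 1, 0) ORDER BY t.slot) AS one_hot
FROM dataset AS d
CROSS JOIN top_labs AS t
WHERE t.slot < m  -- keep only the m most common lab kinds
GROUP BY d.subject_id, d.lab_index
ORDER BY d.subject_id;
```

With m = 2, only lab_index 1 and 2 get a slot in the vector, so a row whose lab falls outside the top m (lab_index 4 here) should come back as all zeros. On the real tables, dataset would presumably be replaced by the lab events keyed by hadm_id and itemid.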