How to cast (long to wide) dataframe based on some conditions

How can I convert this:

patient_id test    test_value      date_taken
11964    HBA1C         8.60        2017-06-14
11964    Glucose     231.00        2017-05-01
11964    Glucose     202.00        2017-07-01
11964    Glucose     194.00        2017-09-02
11964    Creatinine    1.10        2017-05-01
11964    Creatinine    1.28        2017-08-14

to this?

patient_id  hba1c_earliest  hba1c_latest  hba1c_change  glucose_earliest  glucose_latest
     11964            8.60          8.60        0.0000             231.0           194.0

glucose_change  creatinine_earliest  creatinine_latest  creatinine_change
       -0.1602                 1.10               1.28             0.1636

For the extended dataframe:

.*_earliest columns should hold the lab result with the earliest date. .*_latest columns should hold the lab result with the latest date. .*_change columns should hold the relative change (variation), computed as (Latest - Earliest) / Earliest.
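As a quick sanity check of the formula against the sample rows, the relative change can be worked out by hand. The helper name relative_change below is hypothetical and only used for illustration; it is not part of the question.

```python
def relative_change(earliest: float, latest: float) -> float:
    """Relative change as defined above: (latest - earliest) / earliest."""
    return (latest - earliest) / earliest


# Earliest and latest values taken from the sample table.
glucose_change = relative_change(231.00, 194.00)
creatinine_change = relative_change(1.10, 1.28)
hba1c_change = relative_change(8.60, 8.60)  # single measurement, so no change

print(round(glucose_change, 4))     # -0.1602
print(round(creatinine_change, 4))  # 0.1636
print(hba1c_change)                 # 0.0
```

These match the *_change values in the desired output.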

Use:

print (df.dtypes)
patient_id             int64 <- not necessary
test                  object <- not necessary
test_value           float64 <- necessary
date_taken    datetime64[ns] <- necessary
dtype: object
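If the columns were read as strings, they would need converting first. A minimal sketch, assuming the raw values arrive as text (the two-row frame here is made up for illustration):

```python
import pandas as pd

# Hypothetical raw input where numbers and dates are still strings.
df = pd.DataFrame({
    "patient_id": [11964, 11964],
    "test": ["Glucose", "Glucose"],
    "test_value": ["231.00", "202.00"],
    "date_taken": ["2017-05-01", "2017-07-01"],
})

# Convert to the dtypes the solution relies on.
df["test_value"] = pd.to_numeric(df["test_value"])
df["date_taken"] = pd.to_datetime(df["date_taken"])

print(df.dtypes)
```

With a real datetime column, sorting by date_taken orders rows chronologically rather than relying on string comparison.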

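# df is the original long-format frame; sorting by date makes
# 'first'/'last' within each group the earliest/latest result
df = (df.sort_values(['patient_id','test','date_taken'])
       .groupby(['patient_id','test'])['test_value']
       .agg([('earliest','first'),('latest','last')])
       # relative change: (latest - earliest) / earliest
       .assign(change = lambda x: (x['latest'] - x['earliest'])/ x['earliest'])
       .unstack()                                       # tests move into the columns
       .swaplevel(0,1, axis=1)                          # put the test name first
       .reindex(columns=df['test'].unique(), level=0)   # keep original test order
       )
# flatten the (test, statistic) MultiIndex into names such as Glucose_latest
df.columns = df.columns.map('_'.join)
df = df.reset_index()
print(df)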
   patient_id  HBA1C_earliest  HBA1C_latest  HBA1C_change  Glucose_earliest  \
0       11964             8.6           8.6           0.0             231.0   

   Glucose_latest  Glucose_change  Creatinine_earliest  Creatinine_latest  \
0           194.0       -0.160173                  1.1               1.28   

   Creatinine_change  
0           0.163636  

Explanation:

  1. First, sort by multiple columns with sort_values.
  2. Aggregate with agg, using first and last for the earliest and latest columns.
  3. Create the new change column with assign.
  4. Reshape with unstack.
  5. Swap the levels of the MultiIndex in the columns with swaplevel.
  6. Then reindex so the tests keep the same order as in the original column.
  7. Flatten the MultiIndex in the columns with map and join.
  8. Finally, call reset_index to turn the index into a column.
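The steps can also be followed on a self-contained copy of the sample data. This is a sketch rather than the answer's exact code: it uses keyword-style named aggregation in agg instead of the list of tuples, and the intermediate names long_stats and wide are introduced here only for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "patient_id": [11964] * 6,
    "test": ["HBA1C", "Glucose", "Glucose", "Glucose", "Creatinine", "Creatinine"],
    "test_value": [8.60, 231.00, 202.00, 194.00, 1.10, 1.28],
    "date_taken": pd.to_datetime([
        "2017-06-14", "2017-05-01", "2017-07-01",
        "2017-09-02", "2017-05-01", "2017-08-14",
    ]),
})

# Steps 1-3: one row per (patient_id, test) with earliest, latest, change.
long_stats = (
    df.sort_values(["patient_id", "test", "date_taken"])
      .groupby(["patient_id", "test"])["test_value"]
      .agg(earliest="first", latest="last")
      .assign(change=lambda x: (x["latest"] - x["earliest"]) / x["earliest"])
)
print(long_stats)

# Steps 4-6: move tests into the columns, put the test name on the
# outer level, and restore the order in which tests first appear.
wide = (
    long_stats.unstack()
              .swaplevel(0, 1, axis=1)
              .reindex(columns=df["test"].unique(), level=0)
)

# Steps 7-8: flatten the column MultiIndex and restore patient_id as a column.
wide.columns = wide.columns.map("_".join)
wide = wide.reset_index()
print(wide)
```

Printing long_stats before the reshape makes it easier to see that unstack only pivots values that were already computed per patient and test.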
