
How can I normalize data in a pandas dataframe to the starting value of a time series?

I would like to analyze a dataset from a clinical study using pandas. Patients come to the clinic for several visits, and some parameters are measured at each visit. I would like to normalize the blood parameters to the values of the first visit (baseline values), i.e. Normalized = Parameter[Visit X] / Parameter[Visit 1]. The dataset looks roughly like the following example:

    import pandas as pd 
    import numpy as np
    rng = np.random.RandomState(0)
    df = pd.DataFrame({'Patient': ['A','A','A','B','B','B','C','C','C'],
                       'Visit': [1,2,3,1,2,3,1,2,3],
                       'Parameter': rng.randint(0, 100, 9)},
                       columns = ['Patient', 'Visit', 'Parameter'])
    df

       Patient  Visit   Parameter
    0    A        1     44
    1    A        2     47
    2    A        3     64
    3    B        1     67
    4    B        2     67
    5    B        3     9
    6    C        1     83
    7    C        2     21
    8    C        3     36 

Now I would like to add a column that contains each parameter normalized to the baseline value, i.e. the value at Visit 1. The simplest approach would be to add a column containing only the Visit 1 value for each patient and then divide the Parameter column by this added column. However, I have not managed to create such a column with the baseline value for each respective patient. Maybe there are also one-line solutions that do not require adding another column.

The result should look like this:

       Patient  Visit   Parameter Normalized
    0    A        1     44          1.0
    1    A        2     47          1.07
    2    A        3     64          1.45
    3    B        1     67          1.0
    4    B        2     67          1.0
    5    B        3     9           0.13
    6    C        1     83          1.0
    7    C        2     21          0.25
    8    C        3     36          0.43  

IIUC, use GroupBy.transform:

    df['Normalized'] = df['Parameter'].div(df.groupby('Patient')['Parameter']
                                             .transform('first'))
    print(df)

      Patient  Visit  Parameter  Normalized
    0       A      1         44    1.000000
    1       A      2         47    1.068182
    2       A      3         64    1.454545
    3       B      1         67    1.000000
    4       B      2         67    1.000000
    5       B      3          9    0.134328
    6       C      1         83    1.000000
    7       C      2         21    0.253012
    8       C      3         36    0.433735

To match the two-decimal values in the expected output, round the result:

    df['Normalized'] = df['Parameter'].div(df.groupby('Patient')['Parameter']
                                             .transform('first')).round(2)
    print(df)

      Patient  Visit  Parameter  Normalized
    0       A      1         44        1.00
    1       A      2         47        1.07
    2       A      3         64        1.45
    3       B      1         67        1.00
    4       B      2         67        1.00
    5       B      3          9        0.13
    6       C      1         83        1.00
    7       C      2         21        0.25
    8       C      3         36        0.43
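
One caveat: transform('first') takes the first non-null value per group in the current row order, so it equals the Visit 1 value only when each patient's Visit 1 row comes first. If that ordering is not guaranteed, the baseline can be looked up explicitly, which is essentially the "baseline column" idea from the question. The following is a minimal sketch, not a definitive implementation: the data is made up for illustration (rows are deliberately out of visit order), and it assumes exactly one Visit 1 row per patient.

```python
import pandas as pd

# Made-up data in the question's layout, with rows NOT sorted by visit.
df = pd.DataFrame({
    'Patient':   ['A', 'A', 'A', 'B', 'B', 'B'],
    'Visit':     [2, 1, 3, 3, 1, 2],
    'Parameter': [47, 44, 64, 9, 67, 67],
})

# Series of Visit 1 values indexed by patient (assumes one Visit 1 row each).
baseline = df.loc[df['Visit'] == 1].set_index('Patient')['Parameter']

# Map each row's patient to that patient's baseline, then divide.
df['Baseline'] = df['Patient'].map(baseline)
df['Normalized'] = df['Parameter'] / df['Baseline']

print(df.sort_values(['Patient', 'Visit']))
```

A patient without a Visit 1 row would get NaN in both new columns, since map finds no baseline for that patient.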

If you need to create a new DataFrame:

    df2 = df.assign(Normalized=df['Parameter'].div(df.groupby('Patient')['Parameter'].transform('first')))

We could also use a lambda, as I suggested.

Or:

    df2 = df.copy()
    df2['Normalized'] = df['Parameter'].div(df.groupby('Patient')['Parameter']
                                             .transform('first'))
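
The question mentions several measured parameters, and the same pattern seems to extend to more than one column at a time. The sketch below is only an illustration under assumptions: the column names Hb and WBC and their values are hypothetical, and each patient's rows are assumed to be in visit order already.

```python
import pandas as pd

# Hypothetical columns 'Hb' and 'WBC' stand in for several blood parameters.
df = pd.DataFrame({
    'Patient': ['A', 'A', 'B', 'B'],
    'Visit':   [1, 2, 1, 2],
    'Hb':      [10.0, 12.0, 8.0, 6.0],
    'WBC':     [4.0, 5.0, 10.0, 5.0],
})
params = ['Hb', 'WBC']

# transform('first') on a list of columns returns a frame aligned with df.
baseline = df.groupby('Patient')[params].transform('first')

# Divide column by column and attach the results with a '_norm' suffix.
df2 = df.join(df[params].div(baseline).add_suffix('_norm'))
print(df2)
```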

What @ansev said: GroupBy.transform.

If you wish to preserve the Parameter column, just run the last line he wrote, but with Normalized instead of Parameter as the new column name:

    df = df.assign(Normalized=lambda x: x['Parameter'].div(x.groupby('Patient')['Parameter'].transform('first')))
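
The lambda form is mainly useful in a method chain, because x refers to the frame as it exists at that point in the chain rather than to the original df. As a rough sketch (the Parameter values are copied from the question's example output; the sort step is an added assumption to make Visit 1 the first row per patient):

```python
import pandas as pd

# Parameter values copied from the example output in the question.
df = pd.DataFrame({
    'Patient':   ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'Visit':     [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'Parameter': [44, 47, 64, 67, 67, 9, 83, 21, 36],
})

result = (
    df.sort_values(['Patient', 'Visit'])  # Visit 1 first within each patient
      .assign(Normalized=lambda x: x['Parameter']
              .div(x.groupby('Patient')['Parameter'].transform('first'))
              .round(2))
)
print(result)
```

Because assign returns a new DataFrame, df itself is left unchanged here.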


 