R: Calculate linear regression and get slope for "a subset of data"

My goal is to find the half-life (from the terminal phase, for anyone familiar with pharmacokinetics).

I have some data containing the following:

There are 1500 rows, with ID as the main "key" and 15 rows per ID. The other columns are TIME and CONCENTRATION. What I want to do, for each ID, is remove the first TIME (which equals "000" (numeric)), run lm() on the remaining 14 rows, use abs() to extract the absolute value of the slope, and save it to a new column named THALF. (If anyone is familiar with pharmacokinetics, maybe there is a better way to do this?)

But I have not been able to do this with my limited knowledge of R.

Here is what I've come up with so far:

data_new <- data %>% dplyr::group_by(data$ID) %>% dplyr::filter(data$TIME != 10) %>% dplyr::mutate(THAFL = abs(lm$coefficients[2](data$CONC ~ data$TIME)))

From what I've understood from other Stack Overflow answers, lm$coefficients[2] will extract the slope.
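As a minimal illustration of that idea, here is a sketch on a made-up data frame (`toy`, `x` and `y` are hypothetical names, not the data from the question). The coefficient lives on the fitted model object, so `lm()` has to be called first and the slope extracted from its result:

```r
# Toy data (hypothetical): y falls by exactly 2 units per unit of x.
toy <- data.frame(x = 1:10)
toy$y <- 50 - 2 * toy$x

# Fit first, then pull the coefficient out of the fitted object.
fit <- lm(y ~ x, data = toy)
slope <- coef(fit)[["x"]]   # same value as fit$coefficients[2]
abs_slope <- abs(slope)
print(abs_slope)
```

Indexing by name (`"x"`) instead of position avoids picking up the intercept by mistake if the formula changes.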

However, I have not been able to make this work. I get this error when I try to run the code:

Error: Problem with `mutate()` input `..1`.
x Input `..1` can't be recycled to size 15.
i Input `..1` is `data$ID`.
i Input `..1` must be size 15 or 1, not 1500.
i The error occurred in group 1: data$ID = "pat1".

Any suggestions on how to solve this? If you need more info, please let me know.

(Also, if anyone is familiar with pharmacokinetics: when they ask for the half-life from the terminal phase, do I run lm() starting from the maximum concentration? I have a column with the time at which the highest concentration was observed.)
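On the pharmacokinetics aside: the terminal half-life is usually defined as ln(2) / lambda_z, where lambda_z is the negative slope of log(concentration) against time over the terminal log-linear phase. The sketch below shows that calculation on a simulated single-subject profile. Everything in it is an assumption for illustration: the rate constants `ka` and `ke`, the data frame `pk`, and above all the choice of the last `n_terminal` samples as the "terminal phase", which in practice depends on the data and on judgement.

```r
# Hypothetical single-subject profile: one-compartment curve with absorption.
ka <- 0.05   # assumed absorption rate constant (1/min)
ke <- 0.01   # assumed elimination rate constant (1/min)
pk <- data.frame(
  TIME = c(10, 20, 30, 45, 60, 90, 120, 150, 180, 240, 300, 360, 420, 480)
)
pk$CONC <- 20 * (exp(-ke * pk$TIME) - exp(-ka * pk$TIME))

# Assumption: treat the last n_terminal samples as the terminal phase.
n_terminal <- 5
terminal <- tail(pk[order(pk$TIME), ], n_terminal)
terminal <- terminal[terminal$CONC > 0, ]   # log() needs positive values

# Log-linear regression: the slope of log(CONC) vs TIME is -lambda_z.
fit <- lm(log(CONC) ~ TIME, data = terminal)
lambda_z <- -coef(fit)[["TIME"]]
t_half <- log(2) / lambda_z
print(c(lambda_z = lambda_z, t_half = t_half))
```

Note that this regresses log(CONC), not CONC, on TIME, and that the fit is restricted to late samples rather than everything after the first time point.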

If you still need the observations with TIME == 10 after fitting the model, you can summarise after grouping by ID and then use a right join:

data %>% 
  filter(TIME != 10) %>% 
  group_by(ID) %>%
  summarise(THAFL = abs(lm(CONC ~ TIME)$coefficients[2])) %>% 
  right_join(data, by = "ID")


# A tibble: 30 x 16
   ID      THAFL Sex   Weight..kg. Height..cm. Age..yrs. T134A A443G G769C G955C A990C  TIME  CONC LBM   `data_combine$ID`  CMAX
   <chr>   <dbl> <chr>       <int>       <int>     <int> <int> <int> <int> <int> <int> <dbl> <dbl> <chr> <chr>             <dbl>
 1 pat1  0.00975 F              50         135        47     0     2     1     2     0    10  0    Under pat1                 60
 2 pat1  0.00975 F              50         135        47     0     2     1     2     0    20  6.93 Under pat1                 60
 3 pat1  0.00975 F              50         135        47     0     2     1     2     0    30 12.2  Under pat1                 60
 4 pat1  0.00975 F              50         135        47     0     2     1     2     0    45 14.8  Under pat1                 60
 5 pat1  0.00975 F              50         135        47     0     2     1     2     0    60 15.0  Under pat1                 60
 6 pat1  0.00975 F              50         135        47     0     2     1     2     0    90 12.4  Under pat1                 60
 7 pat1  0.00975 F              50         135        47     0     2     1     2     0   120  9.00 Under pat1                 60
 8 pat1  0.00975 F              50         135        47     0     2     1     2     0   150  6.22 Under pat1                 60
 9 pat1  0.00975 F              50         135        47     0     2     1     2     0   180  4.18 Under pat1                 60
10 pat1  0.00975 F              50         135        47     0     2     1     2     0   240  1.82 Under pat1                 60
# ... with 20 more rows

If you don't want the rows with TIME == 10 to appear in your dataset after fitting the model, you can use mutate:

data %>% 
  filter(TIME != 10) %>% 
  group_by(ID) %>%
  mutate(THAFL = abs(lm(CONC ~ TIME)$coefficients[2]))

# A tibble: 28 x 16
# Groups:   ID [2]
   ID    Sex   Weight..kg. Height..cm. Age..yrs. T134A A443G G769C G955C A990C  TIME  CONC LBM   `data_combine$ID`  CMAX   THAFL
   <chr> <chr>       <int>       <int>     <int> <int> <int> <int> <int> <int> <dbl> <dbl> <chr> <chr>             <dbl>   <dbl>
 1 pat1  F              50         135        47     0     2     1     2     0    20  6.93 Under pat1                 60 0.00975
 2 pat2  M              75         175        29     0     2     0     0     0    20  6.78 Under pat2                 60 0.00835
 3 pat1  F              50         135        47     0     2     1     2     0    30 12.2  Under pat1                 60 0.00975
 4 pat2  M              75         175        29     0     2     0     0     0    30 11.6  Above pat2                 60 0.00835
 5 pat1  F              50         135        47     0     2     1     2     0    45 14.8  Under pat1                 60 0.00975
 6 pat2  M              75         175        29     0     2     0     0     0    45 13.5  Under pat2                 60 0.00835
 7 pat1  F              50         135        47     0     2     1     2     0    60 15.0  Under pat1                 60 0.00975
 8 pat2  M              75         175        29     0     2     0     0     0    60 13.1  Above pat2                 60 0.00835
 9 pat1  F              50         135        47     0     2     1     2     0    90 12.4  Under pat1                 60 0.00975
10 pat2  M              75         175        29     0     2     0     0     0    90  9.77 Under pat2                 60 0.00835
# ... with 18 more rows
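For comparison, the same per-ID calculation can be sketched in base R with `split()` and `sapply()`. The small `data` frame below is a hypothetical stand-in for the real dataset (two IDs, made-up concentrations), and the column name THAFL simply follows the code above:

```r
# Hypothetical stand-in data: two IDs, linear decline after TIME == 10.
data <- data.frame(
  ID   = rep(c("pat1", "pat2"), each = 5),
  TIME = rep(c(10, 20, 30, 45, 60), times = 2),
  CONC = c(0, 18, 17, 15.5, 14,   # pat1: slope -0.1 after TIME 10
           0, 16, 14, 11, 8)      # pat2: slope -0.2 after TIME 10
)

# Drop the first sampling time, then fit one model per ID.
kept <- data[data$TIME != 10, ]
slopes <- sapply(split(kept, kept$ID), function(d) {
  abs(coef(lm(CONC ~ TIME, data = d))[["TIME"]])
})

# Attach the per-ID value to every original row (TIME == 10 rows included).
data$THAFL <- unname(slopes[as.character(data$ID)])
print(slopes)
```

Each model only ever sees the rows of its own ID because the formula is evaluated against the per-group data frame `d`, not against the full `data$` columns.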

You can use broom:

library(broom)
library(dplyr)
library(tidyr)  # provides unnest()
#Code
data %>% group_by(ID) %>%
  filter(TIME!=10) %>%
  do(fit = tidy(lm(CONC ~ TIME, data = .))) %>% 
  unnest(fit) %>%
  filter(term=='TIME') %>%
  mutate(estimate=abs(estimate))

Output:

# A tibble: 2 x 6
  ID    term  estimate std.error statistic p.value
  <chr> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 pat1  TIME   0.00975   0.00334     -2.92  0.0128
2 pat2  TIME   0.00835   0.00313     -2.67  0.0204
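The columns in that output correspond to the coefficient table that base R reports through `summary()`. A minimal sketch on simulated data (the data frame `d`, the seed and the noise level are all made up for illustration) shows where the numbers come from:

```r
# Simulated data (hypothetical): noisy linear decline in CONC over TIME.
set.seed(1)
d <- data.frame(TIME = c(20, 30, 45, 60, 90, 120, 150, 180, 240))
d$CONC <- 15 - 0.05 * d$TIME + rnorm(nrow(d), sd = 0.5)

fit <- lm(CONC ~ TIME, data = d)

# Matrix with one row per term and the columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)".
coefs <- summary(fit)$coefficients
time_row <- coefs["TIME", ]
print(time_row)
```

Taking `abs()` of the estimate alone, as in the code above, leaves the test statistic with its original sign, which is why `statistic` is still negative in the output.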

If you need to join the result back to the original data, try:

#Code 2
data <- data %>% left_join(data %>% group_by(ID) %>%
  filter(TIME!=10) %>%
  do(fit = tidy(lm(CONC ~ TIME, data = .))) %>% 
  unnest(fit) %>%
  filter(term=='TIME') %>%
  mutate(estimate=abs(estimate)) %>%
  select(c(ID,estimate)))

Similar to @RicS.
