R: Calculate linear regression and get slope for "a subset of data"
My goal is to find the half-life (from the terminal phase, if anyone is familiar with pharmacokinetics).
I have some data containing the following: 1500 rows, with ID being the main "key". There are 15 rows per ID. Then I have other columns, TIME and CONCENTRATION.
What I want to do is, for each ID, remove the first TIME (which equals "000" (numeric)), then run the lm() function on the remaining 14 rows per ID, use abs() to extract the absolute value of the slope, and save this to a new column named THALF. (If anyone is familiar with pharmacokinetics, maybe there is a better way to do this?)
However, I have not been able to do this with my limited knowledge of R.
Here is what I've come up with so far:
data_new <- data %>%
  dplyr::group_by(data$ID) %>%
  dplyr::filter(data$TIME != 10) %>%
  dplyr::mutate(THAFL = abs(lm$coefficients[2](data$CONC ~ data$TIME)))
From what I've understood from other Stack Overflow answers, lm$coefficients[2] will extract the slope.
However, I have not been able to make this work. I get this error when I try to run the code:
Error: Problem with `mutate()` input `..1`.
x Input `..1` can't be recycled to size 15.
i Input `..1` is `data$ID`.
i Input `..1` must be size 15 or 1, not 1500.
i The error occurred in group 1: data$ID = "pat1".
Any suggestions on how to solve this? If you need more info, please let me know.
(Also, if anyone is familiar with pharmacokinetics: when they ask for the half-life from the terminal phase, do I run lm() starting from the maximum concentration? I have a column with the time at which the highest concentration was observed.)
If, after fitting the model, you still need the observations with TIME == 10, you can summarise after grouping by ID and then use a right join:
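library(dplyr)

# Fit one model per ID without the TIME == 10 rows, then join the slope back onto all rows
data %>%
  filter(TIME != 10) %>%
  group_by(ID) %>%
  summarise(THAFL = abs(lm(CONC ~ TIME)$coefficients[2])) %>%
  right_join(data, by = "ID")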
# A tibble: 30 x 16
   ID      THAFL Sex   Weight..kg. Height..cm. Age..yrs. T134A A443G G769C G955C A990C  TIME  CONC LBM   `data_combine$ID`  CMAX
   <chr>   <dbl> <chr>       <int>       <int>     <int> <int> <int> <int> <int> <int> <dbl> <dbl> <chr> <chr>             <dbl>
 1 pat1  0.00975 F              50         135        47     0     2     1     2     0    10     0 Under pat1                 60
 2 pat1  0.00975 F              50         135        47     0     2     1     2     0    20  6.93 Under pat1                 60
 3 pat1  0.00975 F              50         135        47     0     2     1     2     0    30  12.2 Under pat1                 60
 4 pat1  0.00975 F              50         135        47     0     2     1     2     0    45  14.8 Under pat1                 60
 5 pat1  0.00975 F              50         135        47     0     2     1     2     0    60  15.0 Under pat1                 60
 6 pat1  0.00975 F              50         135        47     0     2     1     2     0    90  12.4 Under pat1                 60
 7 pat1  0.00975 F              50         135        47     0     2     1     2     0   120  9.00 Under pat1                 60
 8 pat1  0.00975 F              50         135        47     0     2     1     2     0   150  6.22 Under pat1                 60
 9 pat1  0.00975 F              50         135        47     0     2     1     2     0   180  4.18 Under pat1                 60
10 pat1  0.00975 F              50         135        47     0     2     1     2     0   240  1.82 Under pat1                 60
# ... with 20 more rows
If, after fitting the model, you don't want the rows with TIME == 10 to appear in your dataset, you can use mutate instead:
# Drop the TIME == 10 rows, then add the per-ID slope as a new column
data %>%
  filter(TIME != 10) %>%
  group_by(ID) %>%
  mutate(THAFL = abs(lm(CONC ~ TIME)$coefficients[2]))
# A tibble: 28 x 16
# Groups: ID [2]
   ID    Sex   Weight..kg. Height..cm. Age..yrs. T134A A443G G769C G955C A990C  TIME  CONC LBM   `data_combine$ID`  CMAX   THAFL
   <chr> <chr>       <int>       <int>     <int> <int> <int> <int> <int> <int> <dbl> <dbl> <chr> <chr>             <dbl>   <dbl>
 1 pat1  F              50         135        47     0     2     1     2     0    20  6.93 Under pat1                 60 0.00975
 2 pat2  M              75         175        29     0     2     0     0     0    20  6.78 Under pat2                 60 0.00835
 3 pat1  F              50         135        47     0     2     1     2     0    30  12.2 Under pat1                 60 0.00975
 4 pat2  M              75         175        29     0     2     0     0     0    30  11.6 Above pat2                 60 0.00835
 5 pat1  F              50         135        47     0     2     1     2     0    45  14.8 Under pat1                 60 0.00975
 6 pat2  M              75         175        29     0     2     0     0     0    45  13.5 Under pat2                 60 0.00835
 7 pat1  F              50         135        47     0     2     1     2     0    60  15.0 Under pat1                 60 0.00975
 8 pat2  M              75         175        29     0     2     0     0     0    60  13.1 Above pat2                 60 0.00835
 9 pat1  F              50         135        47     0     2     1     2     0    90  12.4 Under pat1                 60 0.00975
10 pat2  M              75         175        29     0     2     0     0     0    90  9.77 Under pat2                 60 0.00835
# ... with 18 more rows
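For context, the error in the question most likely comes from writing data$ID, data$TIME and data$CONC inside the dplyr verbs. data$... always refers to the full 1500-row column, so it cannot be recycled to a 15-row group, whereas bare column names are evaluated within each group. In addition, lm$coefficients[2](...) tries to index the function lm itself; the model has to be fitted first and the coefficient taken from the result. A minimal sketch of the difference, where the toy data frame and its values are made up for illustration and are not the asker's data:

```r
library(dplyr)

# Hypothetical toy data: 2 IDs, 5 time points each (not the asker's data)
toy <- data.frame(
  ID   = rep(c("pat1", "pat2"), each = 5),
  TIME = rep(c(10, 20, 30, 45, 60), times = 2),
  CONC = c(0, 8, 6, 4, 2,
           0, 9, 7, 4, 1)
)

res <- toy %>%
  filter(TIME != 10) %>%   # bare column name, not toy$TIME
  group_by(ID) %>%
  # fit the model first, then extract the TIME coefficient (the slope)
  mutate(THAFL = abs(coef(lm(CONC ~ TIME))[["TIME"]])) %>%
  ungroup()

# One slope per ID, repeated on each of that ID's remaining rows
distinct(res, ID, THAFL)
```

Here coef(fit)[["TIME"]] is used in place of $coefficients[2]; both pick out the slope, but selecting by name does not depend on the position of the term.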
You can use broom:
library(broom)
library(dplyr)
library(tidyr)  # provides unnest()

# Code
data %>%
  group_by(ID) %>%
  filter(TIME != 10) %>%
  do(fit = tidy(lm(CONC ~ TIME, data = .))) %>%
  unnest(fit) %>%
  filter(term == 'TIME') %>%
  mutate(estimate = abs(estimate))
Output:
# A tibble: 2 x 6
  ID    term  estimate std.error statistic p.value
  <chr> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 pat1  TIME   0.00975   0.00334     -2.92  0.0128
2 pat2  TIME   0.00835   0.00313     -2.67  0.0204
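As a side note, do() is marked as superseded in current dplyr releases. The same per-ID fit can probably be written with group_modify(), which should avoid the list column and the unnest() step. This is only a sketch, assuming dplyr >= 0.8.1 and the same ID/TIME/CONC column names; the toy data is made up:

```r
library(dplyr)
library(broom)

# Hypothetical toy data (not the asker's data)
toy <- data.frame(
  ID   = rep(c("pat1", "pat2"), each = 5),
  TIME = rep(c(10, 20, 30, 45, 60), times = 2),
  CONC = c(0, 8, 6, 4, 2,
           0, 9, 7, 4, 1)
)

slopes <- toy %>%
  filter(TIME != 10) %>%
  group_by(ID) %>%
  # .x is the data frame for one ID; tidy() returns one row per model term
  group_modify(~ tidy(lm(CONC ~ TIME, data = .x))) %>%
  ungroup() %>%
  filter(term == "TIME") %>%
  mutate(estimate = abs(estimate))

slopes
```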
If joining with the original data is needed, try:
# Code 2
data <- data %>%
  left_join(
    data %>%
      group_by(ID) %>%
      filter(TIME != 10) %>%
      do(fit = tidy(lm(CONC ~ TIME, data = .))) %>%
      unnest(fit) %>%
      filter(term == 'TIME') %>%
      mutate(estimate = abs(estimate)) %>%
      select(c(ID, estimate))
  )
This is similar to @RicS's answer.
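On the pharmacokinetics side question: in non-compartmental analysis the terminal half-life is usually computed as ln(2) / lambda_z, where lambda_z is the negative slope of a linear regression of log(concentration) on time over the terminal (log-linear) phase. The absolute slope of CONC ~ TIME on the raw scale is therefore not itself a half-life. The sketch below assumes the terminal phase is simply every point after the observed peak, which is a simplification, since choosing the terminal-phase points is normally a judgement call. The data and the names lambda_z and half_life are made up for illustration:

```r
library(dplyr)

# Hypothetical data with an exact exponential decline after the peak at TIME = 60
toy <- data.frame(
  ID   = rep(c("pat1", "pat2"), each = 6),
  TIME = rep(c(10, 30, 60, 120, 180, 240), times = 2),
  CONC = c(0, 10, 16, 8, 4, 2,
           0, 12, 27, 9, 3, 1)
)

half_life <- toy %>%
  group_by(ID) %>%
  # keep only points after the time of the maximum concentration;
  # CONC > 0 guards against log(0)
  filter(TIME > TIME[which.max(CONC)], CONC > 0) %>%
  summarise(
    lambda_z = -coef(lm(log(CONC) ~ TIME))[["TIME"]],  # terminal rate constant
    THALF    = log(2) / lambda_z                       # half-life, in units of TIME
  )

half_life
```

If the time of the peak is already stored in a column (as CMAX appears to be in the output above), the filter could compare TIME against that column instead of using which.max().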