R - How to create a new column in a data frame by a calculation conditional on another column

Question

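In a project, I measured the iodine concentration of tumours (column = ROI_IC) at different off-center positions, i.e. table heights (column = Offcenter), in a CT scanner. I know the true concentration of each tumour (column = Real_IC; there are 4 different tumours with 4 different Real_IC concentrations). Each tumour was measured 10 times at each off-center position (column = Measurement_repeat). I calculated the absolute error between the measured and the real iodine concentration (column = absError_IC).

This is just the head of the data: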

   Offcenter Measurement_repeat Real_IC ROI_IC absError_IC
1          0                  1     0.0    0.4         0.4
2          0                  2     0.0    0.3         0.3
3          0                  3     0.0    0.3         0.3
4          0                  4     0.0    0.0         0.0
5          0                  5     0.0    0.0         0.0
6          0                  6     0.0   -0.1         0.1
7          0                  7     0.0   -0.2         0.2
8          0                  8     0.0   -0.2         0.2
9          0                  9     0.0   -0.1         0.1
10         0                 10     0.0    0.0         0.0
11         0                  1     0.4    0.4         0.0
12         0                  2     0.4    0.3         0.1
13         0                  3     0.4    0.2         0.2
14         0                  4     0.4    0.0         0.4
15         0                  5     0.4    0.0         0.4
16         0                  6     0.4   -0.1         0.5
17         0                  7     0.4    0.1         0.3
18         0                  8     0.4    0.3         0.1
19         0                  9     0.4    0.6         0.2
20         0                 10     0.4    0.7         0.3

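Now I would like to create a new column called corrError_IC.
In this column, the measured iodine concentration (ROI_IC) should be corrected based on the mean absolute error (the mean of the 10 measurements) found at Offcenter = 0 for that specific Real_IC concentration.

Because there are 4 tumour concentrations, there are 4 means at Offcenter = 0, and I want to apply them to the other Offcenter values.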

mean1 = mean of the 10 absError_IC measurements for `Real_IC=0`

mean2 = mean of the 10 absError_IC measurements for `Real_IC=0.4`

mean3 = mean of the 10 absError_IC measurements for `Real_IC=3`

mean4 = mean of the 10 absError_IC measurements for `Real_IC=5`

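Basically, I want the mean absolute error of a given tumour at Offcenter = 0 (there are 4 different tumour types with four different Real_IC values), and then I want to correct all tumours at the other Offcenter positions by the absolute error value derived from the Offcenter = 0 data.

I tried ifelse statements but could not figure it out.

EDIT: Offcenter has specific levels: c(-6,-4,-3,-2,-1,0,1,2,3,4,6)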

Answer 1

This is how I would approach the problem.

Compute the mean of absError_IC grouped by Real_IC.
Left join the original data.frame with the grouped means.

Code example

## replicate sample data sets
ROI_IC = c(0.4, 0.3, 0.3, 0.0, 0.0, -0.1, -0.2, -0.2, -0.1, 0.0, 
           0.4, 0.3, 0.2, 0.0, 0.0, -0.1, 0.1, 0.3, 0.6, 0.7)
df = data.frame("Offcenter"=rep(0, 40),
                "Measurement_repeat"=rep( c(1:10), 4),
                "Real_IC"=rep( c(0,0.4,3,5), each=10), 
                "ROI_IC"=rep(ROI_IC, 2), 
                stringsAsFactors=FALSE)
df$absError_IC = abs(df$Real_IC - df$ROI_IC)

## compute mean of "absError_IC" grouped by "Real_IC",
## using only the reference rows (Offcenter == 0);
## the values and the grouping vector must come from the same subset
ref = df[df$Offcenter==0, ]
mean_values = aggregate(ref[, c("absError_IC")], 
                        by=list("Real_IC"=ref$Real_IC),
                        FUN=mean)
names(mean_values)[which(names(mean_values)=="x")] = "MAE"

## left join to append column
df = merge(df, mean_values, by.x="Real_IC", by.y="Real_IC", all.x=TRUE, all.y=FALSE, sort=FALSE)
## notice that column order shifts based on "key"
df[c(1:5, 10:15), ]
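
The sample data above only contains Offcenter = 0, so the filter on the reference rows is never really exercised. The following is a minimal sketch, not part of the original answer, that repeats the two steps on a toy data frame with two Offcenter levels. The data frame `toy` and all of its values are made up for illustration, and the formula interface of `aggregate` is used instead of the `by=` form.

```r
## toy data (made-up values): 2 tumours x 2 table positions x 2 repeats
toy <- data.frame(
  Offcenter = rep(c(0, 2), each = 4),
  Real_IC   = rep(c(0, 0.4), each = 2, times = 2),
  ROI_IC    = c(0.2, 0.4, 0.5, 0.7, 1.0, 1.2, 1.4, 1.6)
)
toy$absError_IC <- abs(toy$Real_IC - toy$ROI_IC)

## mean absolute error per Real_IC, computed from the Offcenter == 0 rows only
ref <- aggregate(absError_IC ~ Real_IC,
                 data = toy[toy$Offcenter == 0, ],
                 FUN = mean)
names(ref)[names(ref) == "absError_IC"] <- "MAE"

## left join: rows at every Offcenter level receive the Offcenter == 0 mean
toy <- merge(toy, ref, by = "Real_IC", all.x = TRUE, sort = FALSE)

## merge() may reorder rows, so restore a predictable order
toy <- toy[order(toy$Offcenter, toy$Real_IC), ]
print(toy)
```

In this sketch the rows at Offcenter = 2 get the same MAE as the rows at Offcenter = 0 for the same Real_IC, which is the behaviour the question asks for.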

I recommend the data.table package, which is particularly useful when you need to handle large data.

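library(data.table)
## build the data.table from the data.frame above, or read it with fread(<path>)
dt = data.table(df)
## drop the MAE column if it is left over from the base R merge above
if ("MAE" %in% names(dt)) dt[, MAE := NULL]
## only needed if absError_IC is not in the data yet:
## dt[, absError_IC := abs(Real_IC - ROI_IC)]

## compute grouped mean from the reference rows (Offcenter == 0)
mean_values = dt[Offcenter == 0, list("MAE"=mean(absError_IC)), by=list(Real_IC)]

## left join
dt = merge(dt, mean_values, by="Real_IC", all.x=TRUE, all.y=FALSE, sort=FALSE)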

Answer 2

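Consider ave for inline aggregation, where the first argument is a numeric field, the following arguments are grouping fields, and the last argument, which must be passed by name as FUN, is the function to apply: ave(num_vector, ..., FUN=func).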

df$corrError_IC <- with(df, ave(absError_IC, Real_IC, FUN=mean))

To handle NAs, extend the function argument with na.rm:

df$corrError_IC <- with(df, ave(absError_IC, Real_IC, FUN=function(x) mean(x, na.rm=TRUE)))
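
One caveat with `ave` as used above: it averages every row of each Real_IC group, so with several Offcenter levels in the data the result is no longer the Offcenter = 0 mean. A possible workaround, sketched here under the assumption that masking the non-reference rows with NA is acceptable, is to combine `ifelse` with the na.rm variant. The data frame `df2`, its values, and the column names `MAE_ref` and `MAE_all` are hypothetical.

```r
## toy data (made-up values): two Real_IC levels at Offcenter 0 and 2
df2 <- data.frame(
  Offcenter   = c(0, 0, 0, 0, 2, 2, 2, 2),
  Real_IC     = c(0, 0, 0.4, 0.4, 0, 0, 0.4, 0.4),
  absError_IC = c(0.2, 0.4, 0.1, 0.3, 1.0, 1.2, 1.0, 1.2)
)

## hide the non-reference rows from mean() by setting them to NA
ref_only <- ifelse(df2$Offcenter == 0, df2$absError_IC, NA)

## per-group mean of the reference rows, recycled to every row of the group
df2$MAE_ref <- ave(ref_only, df2$Real_IC,
                   FUN = function(x) mean(x, na.rm = TRUE))

## for contrast: plain ave() mixes all Offcenter levels
df2$MAE_all <- ave(df2$absError_IC, df2$Real_IC, FUN = mean)
print(df2)
```

Note that a Real_IC group without any Offcenter = 0 row would get NaN here, since the mean is then taken over no values.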

Answer 3

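I found a way to compute the data I need by creating an extra column that takes the mean absolute error of the 4 Real_IC levels at Offcenter = 0 and matches them wherever Real_IC has the corresponding level. In a second step I subtract these from ROI_IC. But how can the code be simplified into a more general form (at the moment I compute the mean absErrors from their row positions)? Sorry, I am an absolute beginner ;(

Note: my data.frame is called "ds_M".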

#Define absolute errors for the 4 Real_IC levels as variables

average1<-mean(ds_M$absError_IC[1:10]) #for Real_IC=0
average2<-mean(ds_M$absError_IC[11:20]) #for Real_IC=0.4
average3<-mean(ds_M$absError_IC[21:30]) #for Real_IC=3
average4<-mean(ds_M$absError_IC[31:40]) #for Real_IC=5

# New column assigning the correction factor to each Real_IC level
ds_M$absCorr[ds_M$Real_IC==0]<-average1
ds_M$absCorr[ds_M$Real_IC==0.4]<-average2
ds_M$absCorr[ds_M$Real_IC==3]<-average3
ds_M$absCorr[ds_M$Real_IC==5]<-average4

# Calculate new column with corrected ROI_ICs
ds_M$corrError_IC<-ds_M$ROI_IC - ds_M$absCorr
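
One way the code above might be generalised, so that it depends neither on row positions nor on a hard-coded list of Real_IC levels, is a named lookup vector built with `tapply`. This is only a sketch: the toy `ds_M` below uses made-up values, and the lookup assumes that each Real_IC level converts to the same character string everywhere, which holds for values entered as identical literals.

```r
## toy stand-in for ds_M (made-up values): 2 tumours at Offcenter 0 and -2
ds_M <- data.frame(
  Offcenter = c(0, 0, 0, 0, -2, -2, -2, -2),
  Real_IC   = c(3, 3, 5, 5, 3, 3, 5, 5),
  ROI_IC    = c(3.2, 3.4, 4.9, 4.7, 3.6, 3.8, 5.5, 5.7)
)
ds_M$absError_IC <- abs(ds_M$Real_IC - ds_M$ROI_IC)

## one mean per Real_IC level, taken from the Offcenter == 0 rows,
## selected by value rather than by row position
ref <- ds_M[ds_M$Offcenter == 0, ]
ref_means <- tapply(ref$absError_IC, ref$Real_IC, mean)

## look up each row's Real_IC in the named vector of means
ds_M$absCorr <- as.numeric(ref_means[as.character(ds_M$Real_IC)])

## corrected ROI_IC, same subtraction as in the code above
ds_M$corrError_IC <- ds_M$ROI_IC - ds_M$absCorr
print(ds_M)
```

Adding a fifth tumour or another Offcenter level should then need no code changes, because the levels are read from the data.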
