
R - How to create a new column in a dataframe with calculations based on condition of another column

In a project, I measured the iodine concentration of tumors (column = ROI_IC) at different off-center positions, i.e. table heights (column = Offcenter), in a CT scanner. I know the true concentration of each tumor (column = Real_IC; there are 4 different tumors with 4 different Real_IC concentrations). Each tumor was measured 10 times at each off-center position (column = Measurement_repeat). I calculated the absolute error between the measured and the real iodine concentration (column = absError_IC).

This is just the head of the data:

   Offcenter Measurement_repeat Real_IC ROI_IC absError_IC
1          0                  1     0.0    0.4         0.4
2          0                  2     0.0    0.3         0.3
3          0                  3     0.0    0.3         0.3
4          0                  4     0.0    0.0         0.0
5          0                  5     0.0    0.0         0.0
6          0                  6     0.0   -0.1         0.1
7          0                  7     0.0   -0.2         0.2
8          0                  8     0.0   -0.2         0.2
9          0                  9     0.0   -0.1         0.1
10         0                 10     0.0    0.0         0.0
11         0                  1     0.4    0.4         0.0
12         0                  2     0.4    0.3         0.1
13         0                  3     0.4    0.2         0.2
14         0                  4     0.4    0.0         0.4
15         0                  5     0.4    0.0         0.4
16         0                  6     0.4   -0.1         0.5
17         0                  7     0.4    0.1         0.3
18         0                  8     0.4    0.3         0.1
19         0                  9     0.4    0.6         0.2
20         0                 10     0.4    0.7         0.3

Now I would like to create a new column called corrError_IC. In this column, the measured iodine concentration (ROI_IC) should be corrected by the mean absolute error (mean of 10 measurements) that was found for that specific Real_IC concentration at Offcenter = 0.

Because there are 4 tumor concentrations, there are 4 mean values at Offcenter = 0 that I want to apply to the other off-center values.

mean1 = mean of the 10 absError_IC measurements for `Real_IC=0`

mean2 = mean of the 10 absError_IC measurements for `Real_IC=0.4`

mean3 = mean of the 10 absError_IC measurements for `Real_IC=3`

mean4 = mean of the 10 absError_IC measurements for `Real_IC=5`

Basically, I want the average absolute error for each specific tumor at Offcenter = 0 (there are 4 different tumor types with 4 different Real_IC values), and then I want to correct all tumors at the other Offcenter positions by these absolute error values derived from the Offcenter = 0 data.

I tried ifelse statements, but I was not able to figure it out.

EDIT: Offcenter takes specific levels: c(-6,-4,-3,-2,-1,0,1,2,3,4,6)

Here is how I would approach this problem.

  1. Compute the mean of absError_IC grouped by Real_IC, using only the rows where Offcenter = 0.
  2. Left join the original data.frame with the grouped means.

Code Example

## replicate the sample data set (every row here has Offcenter = 0)
ROI_IC <- c(0.4, 0.3, 0.3, 0.0, 0.0, -0.1, -0.2, -0.2, -0.1, 0.0,
            0.4, 0.3, 0.2, 0.0, 0.0, -0.1, 0.1, 0.3, 0.6, 0.7)
df <- data.frame("Offcenter"=rep(0, 40),
                 "Measurement_repeat"=rep(1:10, 4),
                 "Real_IC"=rep(c(0, 0.4, 3, 5), each=10),
                 "ROI_IC"=rep(ROI_IC, 2),
                 stringsAsFactors=FALSE)
df$absError_IC <- abs(df$Real_IC - df$ROI_IC)

## compute mean of "absError_IC" grouped by "Real_IC", using only the Offcenter = 0 rows
## (values and grouping vector must come from the same filtered rows)
centered <- df[df$Offcenter == 0, ]
mean_values <- aggregate(centered$absError_IC,
                         by=list("Real_IC"=centered$Real_IC),
                         FUN=mean)
names(mean_values)[names(mean_values) == "x"] <- "MAE"

## left join to append column
df <- merge(df, mean_values, by="Real_IC", all.x=TRUE, all.y=FALSE, sort=FALSE)
## notice that the key column "Real_IC" moves to the first position
df[c(1:5, 10:15), ]
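
The sample above only contains Offcenter = 0, so it does not show the reference means being carried over to other positions. As a minimal sketch, assuming a made-up data set `toy` with two off-center positions (all values below are hypothetical, not the measured data), the same aggregate-then-merge idea could look like this. The final subtraction follows the asker's stated goal of correcting ROI_IC by the mean absolute error; whether subtraction is the right correction is an assumption.

```r
## Hypothetical toy data: 2 off-center positions x 2 tumors x 3 repeats
toy <- data.frame(
  Offcenter = rep(c(0, 2), each = 6),
  Real_IC   = rep(rep(c(0, 3), each = 3), times = 2),
  ROI_IC    = c(0.1, 0.2, 0.3, 3.2, 3.4, 3.6,   # Offcenter = 0
                0.5, 0.6, 0.7, 3.9, 4.0, 4.1)   # Offcenter = 2
)
toy$absError_IC <- abs(toy$ROI_IC - toy$Real_IC)

## Reference table: one mean absolute error per Real_IC, from Offcenter = 0 only
ref <- aggregate(absError_IC ~ Real_IC,
                 data = toy[toy$Offcenter == 0, ],
                 FUN = mean)
names(ref)[names(ref) == "absError_IC"] <- "MAE"

## Attach the reference value to every row (all off-center positions),
## then subtract it from the measured concentration
toy <- merge(toy, ref, by = "Real_IC", all.x = TRUE, sort = FALSE)
toy$corrError_IC <- toy$ROI_IC - toy$MAE
```

Because `ref` has exactly one row per Real_IC, the left join keeps the row count of `toy` unchanged and every off-center position receives the value computed at the center.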


I suggest using the data.table package, which is particularly useful when you need to manipulate large data sets.

library(data.table)
## dt <- as.data.table(df)   # from an existing data.frame, or dt <- fread(<path>) from a file
## dt[, absError_IC := abs(Real_IC - ROI_IC)]   # only if the column does not exist yet

## compute grouped mean from the Offcenter = 0 rows only
mean_values <- dt[Offcenter == 0, list("MAE"=mean(absError_IC)), by=list(Real_IC)]

## left join
dt <- merge(dt, mean_values, by="Real_IC", all.x=TRUE, all.y=FALSE, sort=FALSE)

Consider ave for inline aggregation: its first argument is the numeric vector, the following arguments are the grouping fields, and the last argument, FUN, which must be passed by name, is the aggregating function: ave(num_vector, ..., FUN=func). The result has the same length as the input, so it can be assigned directly as a new column.

df$corrError_IC <- with(df, ave(absError_IC, Real_IC, FUN=mean))

To handle NAs, wrap mean in an anonymous function that passes na.rm=TRUE:

df$corrError_IC <- with(df, ave(absError_IC, Real_IC, FUN=function(x) mean(x, na.rm=TRUE)))
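
Note that ave as written averages over every row in a Real_IC group, whatever its Offcenter value. One possible way to restrict the mean to Offcenter = 0 while still filling every row is to mask the other rows with NA and rely on na.rm=TRUE. The sketch below assumes a small made-up data frame `toy` and a hypothetical column name `MAE_center`; it also drops any genuine NA measurements at the center.

```r
## Hypothetical toy data with two off-center positions
toy <- data.frame(
  Offcenter   = c(0, 0, 0, 0, 2, 2, 2, 2),
  Real_IC     = c(0, 0, 3, 3, 0, 0, 3, 3),
  absError_IC = c(0.1, 0.3, 0.2, 0.6, 0.5, 0.7, 0.9, 1.1)
)

## Keep absError_IC only where Offcenter == 0; other rows become NA.
## ave() then returns, for every row of a Real_IC group, the mean of the
## non-NA (i.e. centered) values of that group.
toy$MAE_center <- with(toy, ave(ifelse(Offcenter == 0, absError_IC, NA),
                                Real_IC,
                                FUN = function(x) mean(x, na.rm = TRUE)))
```

In this toy data the rows at Offcenter = 2 receive the same per-tumor value as the rows at Offcenter = 0, which is the behaviour the question asks for.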

I found a way to compute what I want: I create an extra column that holds the average absolute error at Offcenter = 0 for each of the 4 Real_IC levels, matched to every row with that Real_IC level. In a second step, I subtract these values from the ROI_IC values. However, how can I simplify that code to a more general form (at the moment I calculate the average absolute errors based on their row positions)? Sorry, I am an absolute beginner ;(

Of note: my data.frame is called "ds_M".

#Define absolute errors for the 4 Real_IC levels as variables

average1<-mean(ds_M$absError_IC[1:10]) #for Real_IC=0
average2<-mean(ds_M$absError_IC[11:20]) #for Real_IC=0.4
average3<-mean(ds_M$absError_IC[21:30]) #for Real_IC=3
average4<-mean(ds_M$absError_IC[31:40]) #for Real_IC=5

# New column assigning the correction factor to each Real_IC level
ds_M$absCorr[ds_M$Real_IC==0]<-average1
ds_M$absCorr[ds_M$Real_IC==0.4]<-average2
ds_M$absCorr[ds_M$Real_IC==3]<-average3
ds_M$absCorr[ds_M$Real_IC==5]<-average4

# Calculate new column with corrected ROI_ICs
ds_M$corrError_IC<-ds_M$ROI_IC - ds_M$absCorr
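
One way the row positions and the four hard-coded levels might be avoided is a named lookup vector built with tapply. This is only a sketch under assumptions: the `ds_M` below is a small hypothetical stand-in with made-up values, and `center` and `averages` are illustrative names that do not appear in the original code.

```r
## Hypothetical stand-in for ds_M (made-up values, 2 positions x 2 tumors x 2 repeats)
ds_M <- data.frame(
  Offcenter = rep(c(0, -2), each = 4),
  Real_IC   = rep(c(0, 0, 0.4, 0.4), times = 2),
  ROI_IC    = c(0.1, 0.3, 0.5, 0.9,   # Offcenter = 0
                0.4, 0.6, 1.0, 1.2)   # Offcenter = -2
)
ds_M$absError_IC <- abs(ds_M$ROI_IC - ds_M$Real_IC)

## Select the reference rows by condition instead of by row position
center <- ds_M[ds_M$Offcenter == 0, ]

## Named vector: one mean absolute error per Real_IC level ("0", "0.4", ...)
averages <- tapply(center$absError_IC, center$Real_IC, mean)

## Look up each row's Real_IC level by name; works for any number of levels
ds_M$absCorr <- as.vector(averages[as.character(ds_M$Real_IC)])

## Corrected ROI_ICs, as in the code above
ds_M$corrError_IC <- ds_M$ROI_IC - ds_M$absCorr
```

The lookup goes through the level names rather than numeric comparison, so it does not depend on the row order or on how many Real_IC levels exist.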
