Apply function to all rows and columns in data frame except first column based on value in first column

Hello, and hopefully I can explain this somewhat simply. I know this can be done with a loop, but that would take far too long, and I need this analysis to run as part of a web page, so some sort of apply function should hopefully work much better.

I have 2 data frames. Data frame A has a list of distinct "anchors" and category values for each one (these are weighted mean values from an already-performed ddply).

  anchor     ecomax    ecomin     volume     price    runtime
1   9482 0.12981362 0.5714286 0.12981362 0.1324330 1.00000000
2   9488 0.01458662 0.5544864 0.01458662 0.2967270 0.04166667
3   9549 0.09734398 0.5721429 0.09734398 0.1219376 1.00000000
4   9574 0.00902656 0.5505136 0.00902656 0.1455307 0.14652568
5   9575 0.00902656 0.5505136 0.00902656 0.1460919 0.14652568
6   9576 0.07608863 0.5613563 0.07608863 0.1114813 1.00000000

Data frame B is a larger data frame of the same category values (ignore the column names for now), but it has multiple entries for each anchor.

  anchor ecomax_max_med ecomin_min_med volume_med price_med run_time_minimum_med
1   9482     0.12981362      0.5714286 0.12981362 0.1120882           1.00000000
2   9482     0.12981362      0.5714286 0.12981362 0.1686777           1.00000000
3   9488     0.01552049      0.5550000 0.01552049 0.2925363           0.04166667
4   9488     0.01292292      0.5535714 0.01292292 0.3041928           0.04166667
5   9549     0.09734398      0.5721429 0.09734398 0.1238916           1.00000000
6   9549     0.09734398      0.5721429 0.09734398 0.1184564           1.00000000

I want to take the difference between each category value in B and its mean in data frame A, matched by anchor; i.e. the first 2 rows of B (anchor 9482) are differenced against the first row of A (the anchor 9482 means), the next 2 rows of B (anchor 9488) against the next row of A (the anchor 9488 means), and so on.

The end result is a new data frame C in which each row/column (other than anchor) is the difference between the values in data frame B and their corresponding anchor means (data frame A). I hope this is fairly straightforward; it can easily be done with a lengthy loop. I'm guessing that this requires some combination of "match" or "by", but I'm not sure, and this has been extremely frustrating. Help!
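The "match" idea mentioned above can be sketched in base R without a loop. This is only a minimal sketch: it uses a two-column subset of the values shown in the question, and it assumes the columns of A and B line up by position after the anchor column (the same assumption the question makes).

```r
# Small stand-ins for data frames A and B (subset of the values in the question)
A <- data.frame(
  anchor = c(9482, 9488, 9549),
  ecomax = c(0.12981362, 0.01458662, 0.09734398),
  price  = c(0.1324330, 0.2967270, 0.1219376)
)
B <- data.frame(
  anchor         = c(9482, 9482, 9488, 9488, 9549, 9549),
  ecomax_max_med = c(0.12981362, 0.12981362, 0.01552049,
                     0.01292292, 0.09734398, 0.09734398),
  price_med      = c(0.1120882, 0.1686777, 0.2925363,
                     0.3041928, 0.1238916, 0.1184564)
)

# For every row of B, find the row of A that has the same anchor
idx <- match(B$anchor, A$anchor)

# Subtract the matched anchor means from B (B - A), anchor column excluded;
# columns are paired by position, which is an assumption of this sketch
diffs <- as.matrix(B[-1]) - as.matrix(A[idx, -1])
colnames(diffs) <- names(A)[-1]

# Data frame C: the anchor plus one difference column per category
C <- data.frame(anchor = B$anchor, diffs)
C
```

Any anchor in B that is missing from A would give `NA` in `idx`, and therefore `NA` differences for that row, so it may be worth checking `anyNA(idx)` first.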

Here is a data.table solution.

It works by merging A and B by anchor (which is set as a key). It then evaluates the expression .e, which we have created to be

list(ecomax_diff = ecomax_max_med - ecomax, ecomin_diff = ecomin_min_med - ecomin, volume_diff = volume_med - volume, price_diff = price_med - price, runtime_diff = run_time_minimum_med - runtime)

using mapply, sprintf and parse.

The solution depends on passing the corresponding column names for each data.table to mapply.

library(data.table)
DA <- data.table(A)
DB <- data.table(B)
setkey(DA, 'anchor')
setkey(DB, 'anchor')

.calls <- mapply(sprintf, as.list(names(DA)[-1]), 
  as.list(names(DB)[-1]), as.list(names(DA)[-1]), 
  MoreArgs = list(fmt = '%s_diff = %s - %s'))

.e <- parse(text = sprintf('list(%s)', paste(.calls, collapse =', ')))


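# Join DB to DA on the key and evaluate the expression for each row of DB.
# Note: this relies on the implicit by-group join of older data.table versions;
# in data.table 1.9.4 and later, use DA[DB, eval(.e), by = .EACHI] to keep the anchor column.
DA[DB, eval(.e)]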
##  anchor ecomax_diff ecomin_diff volume_diff price_diff runtime_diff
## 1:   9482  0.00000000   0.0000000  0.00000000 -0.0203448            0
## 2:   9482  0.00000000   0.0000000  0.00000000  0.0362447            0
## 3:   9488  0.00093387   0.0005136  0.00093387 -0.0041907            0
## 4:   9488 -0.00166370  -0.0009150 -0.00166370  0.0074658            0
## 5:   9549  0.00000000   0.0000000  0.00000000  0.0019540            0
## 6:   9549  0.00000000   0.0000000  0.00000000 -0.0034812            0
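The expression-building step can be tried on its own in base R, which may make it easier to see what ends up being evaluated. This is a minimal sketch: `a_cols`, `b_cols` and `merged_row` are hypothetical stand-ins for `names(DA)[-1]`, `names(DB)[-1]` and one row of the joined table, and the numbers are made up for illustration.

```r
# Hypothetical column names standing in for names(DA)[-1] and names(DB)[-1]
a_cols <- c("ecomax", "price")
b_cols <- c("ecomax_max_med", "price_med")

# Build one "<a>_diff = <b> - <a>" string per pair of columns
calls <- mapply(sprintf, a_cols, b_cols, a_cols,
                MoreArgs = list(fmt = "%s_diff = %s - %s"),
                USE.NAMES = FALSE)

# Wrap the pieces in list(...) and parse the text into an R expression
expr_text <- sprintf("list(%s)", paste(calls, collapse = ", "))
e <- parse(text = expr_text)

# Evaluate the expression against a plain list that plays the role of
# one merged row (illustrative values only)
merged_row <- list(ecomax = 0.5, ecomax_max_med = 0.75,
                   price = 2, price_med = 1.5)
res <- eval(e, merged_row)

expr_text
res
```

Printing `expr_text` before evaluating it is a cheap way to confirm that the column pairs were matched up in the intended order.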

A second, less efficient but perhaps easier to follow solution:

 # calculate the difference between the respective columns (merged appropriately)
 DIFF <- DB[, names(DB)[-1], with = FALSE] - DA[DB][, names(DA)[-1], with = FALSE]
 # combine with the anchor column from DB
 DC <- cbind(DB[, list(anchor)], DIFF)
 # rename with the names from A (otherwise they will have the same names as DB)
 setnames(DC, names(DA))
 # It creates the correct output!
 DC
 ##    anchor      ecomax      ecomin      volume      price      runtime
 ## 1:   9482  0.00000000   0.0000000  0.00000000 -0.0203448            0
 ## 2:   9482  0.00000000   0.0000000  0.00000000  0.0362447            0
 ## 3:   9488  0.00093387   0.0005136  0.00093387 -0.0041907            0
 ## 4:   9488 -0.00166370  -0.0009150 -0.00166370  0.0074658            0
 ## 5:   9549  0.00000000   0.0000000  0.00000000  0.0019540            0
 ## 6:   9549  0.00000000   0.0000000  0.00000000 -0.0034812            0
  • Note: This may become even more straightforward if -.data.table ignores character columns in future versions.
datmer <- merge(datA, datB)
str(datmer)
#------------------    
'data.frame':   6 obs. of  11 variables:
 $ anchor              : int  9482 9482 9488 9488 9549 9549
 $ ecomax              : num  0.1298 0.1298 0.0146 0.0146 0.0973 ...
 $ ecomin              : num  0.571 0.571 0.554 0.554 0.572 ...
 $ volume              : num  0.1298 0.1298 0.0146 0.0146 0.0973 ...
 $ price               : num  0.132 0.132 0.297 0.297 0.122 ...
 $ runtime             : num  1 1 0.0417 0.0417 1 ...
 $ ecomax_max_med      : num  0.1298 0.1298 0.0155 0.0129 0.0973 ...
 $ ecomin_min_med      : num  0.571 0.571 0.555 0.554 0.572 ...
 $ volume_med          : num  0.1298 0.1298 0.0155 0.0129 0.0973 ...
 $ price_med           : num  0.112 0.169 0.293 0.304 0.124 ...
 $ run_time_minimum_med: num  1 1 0.0417 0.0417 1 ...

 datmer2 <- cbind(datmer[,1, drop=FALSE], 
                  as.matrix(datmer[, 2:6])  - as.matrix(datmer[7:11]) )
 datmer2
#--------
  anchor      ecomax     ecomin      volume      price runtime
1   9482  0.00000000  0.0000000  0.00000000  0.0203448       0
2   9482  0.00000000  0.0000000  0.00000000 -0.0362447       0
3   9488 -0.00093387 -0.0005136 -0.00093387  0.0041907       0
4   9488  0.00166370  0.0009150  0.00166370 -0.0074658       0
5   9549  0.00000000  0.0000000  0.00000000 -0.0019540       0
6   9549  0.00000000  0.0000000  0.00000000  0.0034812       0

If you wanted the differences in the order that @mnel used (B - A), the column names would also be the same as those of the second data frame:

 str( cbind(datmer[,1, drop=FALSE], as.matrix(datmer[7:11])  - as.matrix(datmer[2:6]) ) )
'data.frame':   6 obs. of  6 variables:
 $ anchor              : int  9482 9482 9488 9488 9549 9549
 $ ecomax_max_med      : num  0 0 0.000934 -0.001664 0 ...
 $ ecomin_min_med      : num  0 0 0.000514 -0.000915 0 ...
 $ volume_med          : num  0 0 0.000934 -0.001664 0 ...
 $ price_med           : num  -0.02034 0.03624 -0.00419 0.00747 0.00195 ...
 $ run_time_minimum_med: num  0 0 0 0 0 0
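The positional indices (2:6 and 7:11) depend on the column order of the merged data frame. A possible variation, sketched here under the assumption that you can list the matching column names yourself, selects the columns by name instead. `pairs` is a hypothetical lookup that is not part of the answer above, and the data is a small subset of the values in the question.

```r
# Small stand-ins for datA and datB (subset of the values in the question)
datA <- data.frame(
  anchor = c(9482, 9488),
  ecomax = c(0.12981362, 0.01458662),
  price  = c(0.1324330, 0.2967270)
)
datB <- data.frame(
  anchor         = c(9482, 9482, 9488, 9488),
  ecomax_max_med = c(0.12981362, 0.12981362, 0.01552049, 0.01292292),
  price_med      = c(0.1120882, 0.1686777, 0.2925363, 0.3041928)
)

# Hypothetical lookup: names are columns of datA, values are columns of datB
pairs <- c(ecomax = "ecomax_max_med", price = "price_med")

datmer <- merge(datA, datB, by = "anchor")

# B - A, selecting both sets of columns by name rather than by position
diffs <- as.matrix(datmer[unname(pairs)]) - as.matrix(datmer[names(pairs)])
colnames(diffs) <- names(pairs)

datC <- data.frame(anchor = datmer$anchor, diffs)
datC
```

Swapping the two operands of the subtraction gives A - B, as in the first output above.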

