
R - Weighted mean by row for multiple columns based on column-name string values

I have a data.frame "DF" with 2020 observations and 79066 variables. The first column, "Year", runs continuously from 1 to 2020; the other variables are the values.

As a first step, I averaged each row to get one mean value per year, e.g.:

Aver <- apply(DF[,2:79066], 1, mean, na.rm=TRUE)
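For the unweighted version, base R's `rowMeans()` gives the same result as the `apply()` call and is usually much faster on a data frame this wide, since the row means are computed in compiled code rather than by calling `mean()` once per row. A minimal sketch on a tiny made-up `DF` (the column names and values are illustrative only):

```r
# Tiny made-up data frame: column 1 is Year, the rest are value columns
DF <- data.frame(
  Year = 1:3,
  `50.R1.10.yr`  = c(1, 2, NA),
  `300.R1.10.yr` = c(3, 4, 5),
  check.names = FALSE
)

# Unweighted row means of all columns except Year, ignoring NAs
aver_apply    <- apply(DF[, -1], 1, mean, na.rm = TRUE)
aver_rowmeans <- rowMeans(DF[, -1], na.rm = TRUE)

print(aver_rowmeans)  # values 2 3 5
```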

However, I would like to compute a weighted average instead, where each column's weight depends on a string value in its name.

The variable names are "Year" (first column) followed by 79065 value columns. Each value-column name is built from a number from 50 to 300, followed by ".R1" to ".R15", followed by ".10.yr" to ".30.yr". This gives 251 (50 to 300) × 15 (R) × 21 (10 to 30) = 79065 columns, e.g.:

"Year",
"50.R1.10.yr", "50.R1.11.yr", "50.R1.12.yr", ... "50.R1.30.yr",
"51.R1.10.yr", "51.R1.11.yr", "51.R1.12.yr", ... "51.R1.30.yr", ...
"300.R1.10.yr", "300.R1.11.yr", "300.R1.12.yr", ... "300.R1.30.yr",
"50.R2.10.yr", "50.R2.11.yr", "50.R2.12.yr", ... "50.R2.30.yr",
"51.R2.10.yr", "51.R2.11.yr", "51.R2.12.yr", ... "51.R2.30.yr", ...
"300.R2.10.yr", "300.R2.11.yr", "300.R2.12.yr", ... "300.R2.30.yr", ...
"50.R15.10.yr", "50.R15.11.yr", "50.R15.12.yr", ... "300.R15.30.yr".
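As a quick check of the naming scheme, the full set of value-column names can be generated in the stated order and counted. A base-R sketch; using `expand.grid()` for this is my choice, not something from the question:

```r
# expand.grid() varies its first argument fastest, so "yr" changes fastest,
# then the leading number, then the R index, matching the order listed above
g <- expand.grid(yr = 10:30, av = 50:300, r = 1:15)
value_cols <- paste0(g$av, ".R", g$r, ".", g$yr, ".yr")

length(value_cols)        # 251 * 15 * 21 = 79065
head(value_cols, 3)       # "50.R1.10.yr" "50.R1.11.yr" "50.R1.12.yr"
value_cols[22]            # "51.R1.10.yr"
tail(value_cols, 1)       # "300.R15.30.yr"
```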

The weight I would like to assign to each column is based on its leading number (50 to 300). I would like to give the most weight to values in the "50." columns and, following a power function, progressively less weight down to the "300." columns.

The equation fitting my values is a power function: y = 2305.2 * x^-1.019, where x is the leading number (50 to 300).
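Written out, with $x_j$ the leading number of value column $j$ and $y_{t,j}$ its value in year $t$, the intended weighted mean would be the following (a sketch of the computation; the notation is mine, not the asker's):

```latex
w_j = 2305.2\, x_j^{-1.019}, \qquad
\bar{y}_t = \frac{\sum_{j} w_j \, y_{t,j}}{\sum_{j} w_j}
```

Because 2305.2 appears in both numerator and denominator it cancels, so only the exponent affects the result: a "50." column counts about $(300/50)^{1.019} \approx 6.2$ times as much as a "300." column. To mimic `na.rm = TRUE`, both sums would run only over the columns that are non-missing in year $t$.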

E.g.:

library(dplyr)

# One row per leading number (50 to 300) and its power-function weight
av.classes <- data.frame(av = seq(50, 300, 1))
av.classes.weight <- av.classes %>% mutate(weight = 2305.2 * av^-1.019)
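To attach these weights to the value columns, one option is to pull the leading number out of each column name with a regular expression and look it up in the weight table with `match()`. A base-R sketch, assuming the column names follow the pattern described above; the three names used here are made up for illustration:

```r
# Weight table as in the question, rebuilt in base R (no dplyr needed)
av.classes.weight <- data.frame(av = 50:300)
av.classes.weight$weight <- 2305.2 * av.classes.weight$av^-1.019

# A few example value-column names following the "<av>.R<r>.<yr>.yr" pattern
value_cols <- c("50.R1.10.yr", "51.R1.10.yr", "300.R15.30.yr")

# Keep everything before the first "." and convert it to a number
av <- as.numeric(sub("\\..*$", "", value_cols))

# Look each leading number up in the weight table
w <- av.classes.weight$weight[match(av, av.classes.weight$av)]

stopifnot(!anyNA(w))  # every column found a weight
print(round(w, 3))
```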

Thank you for any help.

I guess you could get your weight vector like this:

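library(tidyverse)

# Leading number of each value column, e.g. "50" from "50.R1.10.yr" (Year is dropped)
weights_precursor <- str_split(names(DF)[-1], pattern = "\\.", n = 2, simplify = TRUE)[, 1] %>% 
  as.numeric()

# Power-function weight for each value column
weights <- 2305.2 * weights_precursor ^ -1.019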
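With a `weights` vector in hand, the weighted row mean can be computed for all years at once with a matrix product. The sketch below mimics `na.rm = TRUE` by zeroing missing values and dividing by the sum of weights over the non-missing columns only; a row with no values at all gives `NaN`, as `mean(..., na.rm = TRUE)` would. The small `DF` is made-up example data. Note that on the full 2020 × 79065 data, `as.matrix()` makes a numeric copy of roughly 1.3 GB.

```r
# Made-up example: Year plus three value columns
DF <- data.frame(
  Year = 1:3,
  `50.R1.10.yr`   = c(1, 2, NA),
  `100.R1.10.yr`  = c(2, 2, 2),
  `300.R15.30.yr` = c(4, NA, 6),
  check.names = FALSE
)

# Weight from the leading number of each value column
lead_num <- as.numeric(sub("\\..*$", "", names(DF)[-1]))
weights  <- 2305.2 * lead_num^-1.019

m  <- as.matrix(DF[, -1])  # values only (drop Year)
ok <- !is.na(m)            # TRUE where a value is present
m[!ok] <- 0                # missing values contribute nothing to the numerator

# sum(w * y) over present columns divided by sum(w) over present columns
wmean <- drop(m %*% weights) / drop(ok %*% weights)

result <- data.frame(Year = DF$Year, wmean = wmean)
print(result)
```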

Setting up some sample data:

# check.names = FALSE keeps names like "50.R1.10.yr" instead of prefixing them with "X"
DF <- data.frame(year = 2020, `50.R1.10.yr` = 1, `300.R15.30.yr` = 10, check.names = FALSE)

Getting the numeric vector of column prefixes:

# Split each name on "." and keep the first piece, dropping "year"
prefix <- stringr::str_split(names(DF), "\\.")
prefix <- sapply(prefix, function(x) x[1])[-1]
prefix <- as.numeric(prefix)  # c(50, 300)
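Continuing with this sample, the prefixes can be turned into weights and the single row's weighted mean checked with `stats::weighted.mean()`. A self-contained base-R sketch that rebuilds the same two-column sample (using `strsplit()` instead of `stringr` is my substitution):

```r
# Same sample as above, built with check.names = FALSE so no "X" prefix is added
DF <- data.frame(year = 2020, `50.R1.10.yr` = 1, `300.R15.30.yr` = 10,
                 check.names = FALSE)

# Leading number of each value column, then its power-function weight
prefix  <- as.numeric(sapply(strsplit(names(DF)[-1], ".", fixed = TRUE), `[`, 1))
weights <- 2305.2 * prefix^-1.019

vals <- unlist(DF[1, -1])
wm   <- weighted.mean(vals, weights)

# The "50." column dominates, so the result sits much closer to 1 than to 10
print(wm)
```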
