
Row weights with neuralnet() in R

I'm experimenting with different models in R, using data where each row summarizes a variable number of observations. The training data looks like this:

df <- data.frame('name' = c('Jimmy','Greg','Alice','Alice'),
           'year' = c(2020, 2020, 2020, 2021),
           'high_jump_average' = c(24, 22, 18, 19),
           'high_jump_tests' = c(15, 8, 1, 9),
           'max_squat' = c(405, 365, 245, 265),
           'weight' = c(218, 212, 165, 168))

I'm trying to predict high_jump_average for each person based on their weight and max_squat at the beginning of the year (there are many more features; I'm keeping the question simple).

Each person performs a variable number of high_jump_tests each year. I am attempting to predict their high_jump_average across the (currently unknown) number of high_jump_tests they will take that year.

In the model, I want to weight the rows by how many jump tests were taken that year. For example, I would want the model to understand that Alice's 2020 results (just 1 test/observation) are not as informative as Jimmy's 2020 results (15 tests/observations).

With linear regression, I can use the weights argument:

lm(high_jump_average ~ max_squat + weight,
   data = df,
   weights = high_jump_tests)
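For context, when the counts are integers the weights argument to lm() behaves like frequency weights: the fitted coefficients are identical to an unweighted fit on correspondingly replicated rows (only the standard errors differ). A minimal sketch, re-creating the data above:

```r
# With integer frequency weights, weighted lm() gives the same coefficients
# as fitting an unweighted model on rows replicated weight-many times.
df <- data.frame(high_jump_average = c(24, 22, 18, 19),
                 high_jump_tests   = c(15, 8, 1, 9),
                 max_squat         = c(405, 365, 245, 265),
                 weight            = c(218, 212, 165, 168))

fit_weighted   <- lm(high_jump_average ~ max_squat + weight,
                     data = df, weights = high_jump_tests)

# replicate each row high_jump_tests times, then fit without weights
df_expanded    <- df[rep(seq_len(nrow(df)), df$high_jump_tests), ]
fit_replicated <- lm(high_jump_average ~ max_squat + weight,
                     data = df_expanded)

all.equal(coef(fit_weighted), coef(fit_replicated))  # TRUE
```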

But with neuralnet() I'm not sure how to express this. Currently, I am just duplicating rows based on the number of jump tests observed:

library(dplyr)  # for the %>% pipe
library(tidyr)  # for uncount()

df <- df %>%
  uncount(high_jump_tests)

and then normalizing all of the data and running neuralnet() with this new data frame.

But this increases the number of rows considerably and makes neuralnet() take over an hour to finish.

Is there a way to do this without duplicating the rows? An argument within neuralnet()?

Thanks in advance!

You could assign your weights in the startweights argument:

a vector containing starting values for the weights. Set to NULL for random initialization.

You could use this reproducible code:

df <- data.frame('name' = c('Jimmy','Greg','Alice','Alice'),
                 'year' = c(2020, 2020, 2020, 2021),
                 'high_jump_average' = c(24, 22, 18, 19),
                 'high_jump_tests' = c(15, 8, 1, 9),
                 'max_squat' = c(405, 365, 245, 265),
                 'weight' = c(218, 212, 165, 168))

library(neuralnet)
neuralnet(high_jump_average ~ max_squat + weight, data = df, startweights = c(df$high_jump_tests))
#> $call
#> neuralnet(formula = high_jump_average ~ max_squat + weight, data = df, 
#>     startweights = c(df$high_jump_tests))
#> 
#> $response
#>   high_jump_average
#> 1                24
#> 2                22
#> 3                18
#> 4                19
#> 
#> $covariate
#>      max_squat weight
#> [1,]       405    218
#> [2,]       365    212
#> [3,]       245    165
#> [4,]       265    168
#> 
#> $model.list
#> $model.list$response
#> [1] "high_jump_average"
#> 
#> $model.list$variables
#> [1] "max_squat" "weight"   
#> 
#> 
#> $err.fct
#> function (x, y) 
#> {
#>     1/2 * (y - x)^2
#> }
#> <bytecode: 0x7f7e6876c2e8>
#> <environment: 0x7f7e6876eb70>
#> attr(,"type")
#> [1] "sse"
#> 
#> $act.fct
#> function (x) 
#> {
#>     1/(1 + exp(-x))
#> }
#> <bytecode: 0x7f7e68760fc0>
#> <environment: 0x7f7e687645d8>
#> attr(,"type")
#> [1] "logistic"
#> 
#> $linear.output
#> [1] TRUE
#> 
#> $data
#>    name year high_jump_average high_jump_tests max_squat weight
#> 1 Jimmy 2020                24              15       405    218
#> 2  Greg 2020                22               8       365    212
#> 3 Alice 2020                18               1       245    165
#> 4 Alice 2021                19               9       265    168
#> 
#> $exclude
#> NULL
#> 
#> $net.result
#> $net.result[[1]]
#>          [,1]
#> [1,] 20.75207
#> [2,] 20.75207
#> [3,] 20.75207
#> [4,] 20.75207
#> 
#> 
#> $weights
#> $weights[[1]]
#> $weights[[1]][[1]]
#>           [,1]
#> [1,] 0.9346973
#> [2,] 0.5025252
#> [3,] 0.7208604
#> 
#> $weights[[1]][[2]]
#>           [,1]
#> [1,]  9.094366
#> [2,] 11.657705
#> 
#> 
#> 
#> $generalized.weights
#> $generalized.weights[[1]]
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    0    0
#> [3,]    0    0
#> [4,]    0    0
#> 
#> 
#> $startweights
#> $startweights[[1]]
#> $startweights[[1]][[1]]
#>           [,1]
#> [1,] 0.9346973
#> [2,] 0.5025252
#> [3,] 0.7208604
#> 
#> $startweights[[1]][[2]]
#>            [,1]
#> [1,] -0.6453341
#> [2,]  1.9180050
#> 
#> 
#> 
#> $result.matrix
#>                                        [,1]
#> error                          1.137501e+01
#> reached.threshold              8.283793e-03
#> steps                          1.140000e+02
#> Intercept.to.1layhid1          9.346973e-01
#> max_squat.to.1layhid1          5.025252e-01
#> weight.to.1layhid1             7.208604e-01
#> Intercept.to.high_jump_average 9.094366e+00
#> 1layhid1.to.high_jump_average  1.165770e+01
#> 
#> attr(,"class")
#> [1] "nn"

Created on 2022-07-23 by the reprex package (v2.0.1)
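If switching packages is an option, nnet::nnet() accepts per-row case weights directly through its weights argument, which avoids the row duplication entirely. A minimal sketch; the hidden-layer size and the [0, 1] rescaling here are illustrative assumptions, not tuned values:

```r
library(nnet)  # single-hidden-layer nets; supports per-row case weights

df <- data.frame(high_jump_average = c(24, 22, 18, 19),
                 high_jump_tests   = c(15, 8, 1, 9),
                 max_squat         = c(405, 365, 245, 265),
                 weight            = c(218, 212, 165, 168))

# rescale predictors to [0, 1] so the logistic hidden units do not saturate
df$max_squat <- (df$max_squat - min(df$max_squat)) / diff(range(df$max_squat))
df$weight    <- (df$weight    - min(df$weight))    / diff(range(df$weight))

set.seed(1)
fit <- nnet(high_jump_average ~ max_squat + weight, data = df,
            weights = high_jump_tests,  # one case weight per row
            size = 2,                   # illustrative hidden-layer size
            linout = TRUE,              # linear output unit for regression
            trace = FALSE)
predict(fit, df)
```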
