Row weights with neuralnet() in R

I'm experimenting with different models in R on data that has a variable number of observations per row. The training data looks like this:

df <- data.frame('name' = c('Jimmy','Greg','Alice','Alice'),
           'year' = c(2020, 2020, 2020, 2021),
           'high_jump_average' = c(24, 22, 18, 19),
           'high_jump_tests' = c(15, 8, 1, 9),
           'max_squat' = c(405, 365, 245, 265),
           'weight' = c(218, 212, 165, 168))

I'm trying to predict high_jump_average for each person based on their weight and max_squat at the beginning of the year (there are many more features; I'm keeping the question simple).

Each person performs a variable number of high_jump_tests each year. I am attempting to predict their high_jump_average across the (currently unknown) number of high_jump_tests that they will take that year.

In the model, I want to weight the rows by how many high_jump_tests each person took that year. E.g., in this example, I would want the model to understand that Alice's 2020 results (just 1 test/observation) are less informative than Jimmy's 2020 results (15 tests/observations).

With linear regression, I can use the weights argument:

lm(high_jump_average ~ max_squat + weight,
data = df,
weights = high_jump_tests)

to do this. With neuralnet(), however, I'm not sure how to express it. Currently, I am just duplicating rows based on the number of high_jump_tests observed:

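library(dplyr)  # provides %>%
library(tidyr)  # provides uncount()
df <- df %>%
  uncount(high_jump_tests)  # repeat each row high_jump_tests times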

I then normalize all of the data and run neuralnet() on this new data frame.

But this increases the number of rows considerably, and makes neuralnet() take over an hour to finish.
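For lm(), integer weights and row duplication give the same coefficient estimates, which is why duplicating rows can stand in for a weights argument. Below is a minimal base-R sketch of that equivalence, assuming the original (un-duplicated) data from the question; the names fit_weighted, df_dup, and fit_dup are illustrative only.

```r
# Original data, one row per person-year (copied from the question)
df <- data.frame(name = c("Jimmy", "Greg", "Alice", "Alice"),
                 year = c(2020, 2020, 2020, 2021),
                 high_jump_average = c(24, 22, 18, 19),
                 high_jump_tests = c(15, 8, 1, 9),
                 max_squat = c(405, 365, 245, 265),
                 weight = c(218, 212, 165, 168))

# Fit 1: one row per person-year, weighted by the number of tests
fit_weighted <- lm(high_jump_average ~ max_squat + weight,
                   data = df, weights = high_jump_tests)

# Fit 2: each row repeated high_jump_tests times
# (a base-R stand-in for tidyr::uncount())
df_dup <- df[rep(seq_len(nrow(df)), times = df$high_jump_tests), ]
fit_dup <- lm(high_jump_average ~ max_squat + weight, data = df_dup)

nrow(df)      # 4 rows
nrow(df_dup)  # 33 rows (15 + 8 + 1 + 9)

# The coefficient estimates agree up to numerical tolerance
all.equal(coef(fit_weighted), coef(fit_dup))
```

Only the coefficients match; the standard errors differ, because the duplicated fit treats the 33 rows as independent observations.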

Is there a way to do this without duplicating the rows? Is there an argument within neuralnet() for it?

Thanks in advance!

You could try passing your weights through the startweights argument, which the documentation describes as:

a vector containing starting values for the weights. Set to NULL for random initialization.

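Note that this argument sets the starting values of the network's connection weights; it is not a per-row observation weight like weights in lm(). You could try this reproducible code: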

df <- data.frame('name' = c('Jimmy','Greg','Alice','Alice'),
                 'year' = c(2020, 2020, 2020, 2021),
                 'high_jump_average' = c(24, 22, 18, 19),
                 'high_jump_tests' = c(15, 8, 1, 9),
                 'max_squat' = c(405, 365, 245, 265),
                 'weight' = c(218, 212, 165, 168))

library(neuralnet)
neuralnet(high_jump_average ~ max_squat + weight, data = df, startweights = c(df$high_jump_tests))
#> $call
#> neuralnet(formula = high_jump_average ~ max_squat + weight, data = df, 
#>     startweights = c(df$high_jump_tests))
#> 
#> $response
#>   high_jump_average
#> 1                24
#> 2                22
#> 3                18
#> 4                19
#> 
#> $covariate
#>      max_squat weight
#> [1,]       405    218
#> [2,]       365    212
#> [3,]       245    165
#> [4,]       265    168
#> 
#> $model.list
#> $model.list$response
#> [1] "high_jump_average"
#> 
#> $model.list$variables
#> [1] "max_squat" "weight"   
#> 
#> 
#> $err.fct
#> function (x, y) 
#> {
#>     1/2 * (y - x)^2
#> }
#> <bytecode: 0x7f7e6876c2e8>
#> <environment: 0x7f7e6876eb70>
#> attr(,"type")
#> [1] "sse"
#> 
#> $act.fct
#> function (x) 
#> {
#>     1/(1 + exp(-x))
#> }
#> <bytecode: 0x7f7e68760fc0>
#> <environment: 0x7f7e687645d8>
#> attr(,"type")
#> [1] "logistic"
#> 
#> $linear.output
#> [1] TRUE
#> 
#> $data
#>    name year high_jump_average high_jump_tests max_squat weight
#> 1 Jimmy 2020                24              15       405    218
#> 2  Greg 2020                22               8       365    212
#> 3 Alice 2020                18               1       245    165
#> 4 Alice 2021                19               9       265    168
#> 
#> $exclude
#> NULL
#> 
#> $net.result
#> $net.result[[1]]
#>          [,1]
#> [1,] 20.75207
#> [2,] 20.75207
#> [3,] 20.75207
#> [4,] 20.75207
#> 
#> 
#> $weights
#> $weights[[1]]
#> $weights[[1]][[1]]
#>           [,1]
#> [1,] 0.9346973
#> [2,] 0.5025252
#> [3,] 0.7208604
#> 
#> $weights[[1]][[2]]
#>           [,1]
#> [1,]  9.094366
#> [2,] 11.657705
#> 
#> 
#> 
#> $generalized.weights
#> $generalized.weights[[1]]
#>      [,1] [,2]
#> [1,]    0    0
#> [2,]    0    0
#> [3,]    0    0
#> [4,]    0    0
#> 
#> 
#> $startweights
#> $startweights[[1]]
#> $startweights[[1]][[1]]
#>           [,1]
#> [1,] 0.9346973
#> [2,] 0.5025252
#> [3,] 0.7208604
#> 
#> $startweights[[1]][[2]]
#>            [,1]
#> [1,] -0.6453341
#> [2,]  1.9180050
#> 
#> 
#> 
#> $result.matrix
#>                                        [,1]
#> error                          1.137501e+01
#> reached.threshold              8.283793e-03
#> steps                          1.140000e+02
#> Intercept.to.1layhid1          9.346973e-01
#> max_squat.to.1layhid1          5.025252e-01
#> weight.to.1layhid1             7.208604e-01
#> Intercept.to.high_jump_average 9.094366e+00
#> 1layhid1.to.high_jump_average  1.165770e+01
#> 
#> attr(,"class")
#> [1] "nn"

Created on 2022-07-23 by the reprex package (v2.0.1)
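
When reading the output above, it may help to check how many values startweights can hold. The printed $weights has 3 values for the hidden layer and 2 for the output layer, so the count follows the network architecture rather than the number of rows. The printed $startweights (0.93, 0.50, 0.72, -0.65, 1.92) also differ from the supplied 15, 8, 1, 9, which suggests the vector was not applied as row weights. A small base-R sketch of the arithmetic, where n_connection_weights is a hypothetical helper and not part of neuralnet:

```r
# Count the connection weights of a network with one hidden layer.
# Hypothetical helper for illustration; not a neuralnet function.
n_connection_weights <- function(n_inputs, n_hidden, n_outputs = 1) {
  # each hidden unit gets one weight per input plus an intercept;
  # each output unit gets one weight per hidden unit plus an intercept
  (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
}

# Two inputs (max_squat, weight), one hidden unit (as in the output above), one output
n_weights <- n_connection_weights(n_inputs = 2, n_hidden = 1)
n_rows <- 4  # rows in the example data frame

n_weights            # 5, matching the 3 + 2 values printed under $weights
n_weights == n_rows  # FALSE: the lengths are unrelated to each other
```

The fitted values in $net.result are also identical for all four rows (20.75207), which is worth checking against normalized inputs before drawing conclusions from this fit.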
