
Summing an R matrix, ignoring NAs

I have the following claim counts data (triangular) by limits:

claims=matrix(c(2019,690,712,NA,773,574,NA,NA,232),nrow=3, byrow=T) 

What would be the most elegant way to do the following simple things, resembling Excel's sumif():

  1. put the matrix into as.data.frame() with column names: "100k", "250k", "500k"
  2. sum all numbers except first row; (in this case summing 773,574, and 232). I am looking for a neat reference so I can easily generalize the notation to larger claim triangles.

Sum all numbers, ignoring the NAs: sum(claims, na.rm = T). Thanks to Gregor for the suggestion.

I played around with the package ChainLadder a bit and enjoyed how it handles triangular data, especially in plotting and calculating link ratios. I wonder more generally if basic R suffices in doing some quick and dirty sumif() or pairwise link ratio kind of calculations? It would be a bonus for me if anyone out there could dispense some words of wisdom.

Thank you!

claims = matrix(c(2019,690,712,NA,773,574,NA,NA,232), nrow=3, byrow=TRUE)
claims.df = as.data.frame(claims)
names(claims.df) <- c("100k", "250k", "500k")
# This isn't the best idea because standard column names don't start with numbers.
# If you go non-standard, you'll always have to quote them with backticks, that is:
# claims.df$100k  # doesn't work (syntax error, so it is commented out here)
claims.df$`100k`  # works

# sum everything
sum(claims, na.rm = TRUE)

# sum everything except for the first row
sum(claims[-1, ], na.rm = TRUE)
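Since the question asks for notation that generalizes to larger triangles, here is a minimal sketch of the same idea with row and column totals added. The variable names (total_all, total_below_first, col_totals, row_totals) are only illustrative and do not come from the original post.

```r
# Same triangle as above; the lower-left cells are NA
claims <- matrix(c(2019, 690, 712,
                   NA,   773, 574,
                   NA,   NA,  232),
                 nrow = 3, byrow = TRUE)

# Grand total, skipping NAs
total_all <- sum(claims, na.rm = TRUE)

# Everything except the first row; a negative index drops that row,
# so the same line works for a triangle with any number of rows
total_below_first <- sum(claims[-1, ], na.rm = TRUE)

# Per-column and per-row totals, again skipping NAs
col_totals <- colSums(claims, na.rm = TRUE)
row_totals <- rowSums(claims, na.rm = TRUE)
```

To skip the first k rows instead of just the first, claims[-seq_len(k), ] should work the same way.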

It's much easier to give specific advice to specific questions than general advice. As to "I wonder more generally if basic R suffices in doing some quick and dirty sumif() or pairwise link ratio kind of calculations?", at least as to the sumif comment, I'm reminded of fortunes::fortune(286):

...this is kind of like asking "will your Land Rover make it up my driveway?", but I'll assume the question was asked in all seriousness.

sum adds up whatever numbers you give it. Subsetting based on logicals is so simple that there is no need for a separate sumif function. Say you have x = rnorm(100), y = runif(100).

# sum x if x > 0
sum(x[x > 0])

# sum x if y < 0.5
sum(x[y < 0.5])

# sum x if x > 0 and y < 0.5
sum(x[x > 0 & y < 0.5])

# sum every other x
sum(x[c(T, F)])

# sum all but the first 10 and last 10 x
sum(x[-c(1:10, 91:100)])
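The same subsetting idea covers the Excel case where the condition is on one column and the sum is over another. The sketch below uses a small made-up data frame (the names df, limit and amount and all the values are hypothetical), and adds tapply() for one conditional sum per group.

```r
# Hypothetical claims listing; values are invented for illustration
df <- data.frame(
  limit  = c("100k", "250k", "100k", "500k", "250k"),
  amount = c(10, 20, 30, 40, NA)
)

# Roughly Excel's SUMIF(limit, "100k", amount)
sum_100k <- sum(df$amount[df$limit == "100k"], na.rm = TRUE)

# One conditional sum per distinct limit, skipping NAs
sum_by_limit <- tapply(df$amount, df$limit, sum, na.rm = TRUE)
```

Here sum_by_limit is a named vector, so sum_by_limit[["250k"]] picks out a single group's total.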

I don't know what a pairwise link ratio is, but I'm willing to bet base R can handle it easily.
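As a rough illustration of that bet, here is a sketch that assumes "pairwise link ratio" means the age-to-age factor: each cell of a cumulative development triangle divided by the cell to its left in the same row. That reading, the triangle tri and its values are all assumptions made for illustration, not something taken from the question.

```r
# Hypothetical cumulative triangle: rows are origin periods,
# columns are development periods; values are invented
tri <- matrix(c(100, 150, 165,
                110, 160,  NA,
                120,  NA,  NA),
              nrow = 3, byrow = TRUE)
n <- ncol(tri)

# Individual link ratios: column j + 1 divided by column j, row by row.
# Cells where either value is missing come out as NA.
ratios <- tri[, -1, drop = FALSE] / tri[, -n, drop = FALSE]

# Volume-weighted link ratio per development step, using only the rows
# where both columns are observed
vw_ratios <- sapply(seq_len(n - 1), function(j) {
  ok <- !is.na(tri[, j]) & !is.na(tri[, j + 1])
  sum(tri[ok, j + 1]) / sum(tri[ok, j])
})
```

Both steps use nothing beyond matrix indexing, division and sum, which is the same toolkit as the sumif examples above.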
