
Difference between first non-NA and last non-NA in each row

I have a data frame with up to 5 measurements (x) and their corresponding times:

df = structure(list(x1 = c(92.9595722286402, 54.2085219673818, 
46.3227062573019, 
NA, 65.1501442134141, 49.736451235317), time1 = c(43.2715277777778, 
336.625, 483.975694444444, NA, 988.10625, 510.072916666667), 
x2 = c(82.8368681534474, 53.7981639701784, 12.9993531230419, 
NA, 64.5678816290574, 55.331442940348), time2 = c(47.8166666666667, 
732, 506.747222222222, NA, 1455.25486111111, 958.976388888889
), x3 = c(83.5433119686794, 65.723072881366, 19.0147593408309, 
NA, 65.1989838202356, 36.7000828457705), time3 = c(86.5888888888889, 
1069.02083333333, 510.275, NA, 1644.21527777778, 1154.95694444444
), x4 = c(NA, 66.008102917677, 40.6243513885846, NA, 62.1694420909955, 
29.0078249523063), time4 = c(NA, 1379.22986111111, 520.726388888889, 
NA, 2057.20833333333, 1179.86805555556), x5 = c(NA, 61.0047472617535, 
45.324715258421, NA, 59.862110645527, 45.883161439362), time5 = c(NA, 
1825.33055555556, 523.163888888889, NA, 3352.26944444444, 
1364.99513888889)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))

"NA" means that the person (row) didn't have a measurement.

I would like to calculate the difference between the last existing measurement and the first one.

So for the first row it would be x3 minus x1 (about -9.4), for the second it would be x5 minus x1 (about 6.8), and so on.

I tried something like this, which didn't work:

df$diff = apply(df %>% select(., contains("x")), 1, function(x) head(x, 
na.rm = T) - tail(x, na.rm=T))

Any suggestions? Also, is apply/rowwise the most efficient way, or is there a vectorized function to do that?

A vectorized way is to use max.col, which returns the column index of the "first" and "last" non-NA value in each row via the ties.method parameter:

# Column positions of the "x" columns (defined first, since it is used below)
x_cols <- grep("^x", names(df))

# Subset the data frame to the "x" columns only; convert the tibble to a
# plain data frame for the matrix indexing below
new_df <- as.data.frame(df[x_cols])

# Column number of the first and last non-NA value in each row
first_col <- max.col(!is.na(new_df), ties.method = "first")
last_col <- max.col(!is.na(new_df), ties.method = "last")

# Subtract the first non-NA value from the last one
df$new_calc <- new_df[cbind(1:nrow(df), last_col)] - 
               new_df[cbind(1:nrow(df), first_col)]
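
To see why ties.method does the work here, the following is a minimal sketch on a made-up matrix (the values and the names m, present, rows and diffs are illustrative, not from the question). In the logical matrix every TRUE ties for the row maximum, so "first" and "last" select the leftmost and rightmost non-NA column. For a row with no values at all, every column ties, both picks land on NA cells, and the difference is NA.

```r
# Toy matrix (hypothetical values): NA marks a missing measurement
m <- rbind(
  c(5, 7, NA, NA),   # trailing NAs: last value sits in column 2
  c(NA, 2, NA, 9),   # leading and interior NA
  c(NA, NA, NA, NA)  # no measurements at all
)

present <- !is.na(m)  # logical matrix: TRUE where a value exists

# Among the tied maxima (the TRUEs), take the leftmost or rightmost column
first_col <- max.col(present, ties.method = "first")
last_col  <- max.col(present, ties.method = "last")

# Indexing with a two-column (row, column) matrix extracts one value per row
rows  <- seq_len(nrow(m))
diffs <- m[cbind(rows, last_col)] - m[cbind(rows, first_col)]
diffs
```

Note that ties.method defaults to "random", so it has to be set explicitly for this to be deterministic.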

Using apply, you could do:

x_cols <- grep("^x", names(df))

df$new_calc <- apply(df[x_cols], 1, function(x) {
    new_x <- x[!is.na(x)]
    if (length(new_x) > 0)
      new_x[length(new_x)] - new_x[1L]
    else NA
})
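
As a quick way to check the row-wise logic, here is a small sketch on made-up data with the same x/time layout. The helper name last_minus_first and the data frame toy are hypothetical. The helper returns NA_real_ rather than a bare NA so the result stays numeric even when a row has no measurements.

```r
# Hypothetical helper: last non-NA value minus first non-NA value of a vector
last_minus_first <- function(x) {
  vals <- x[!is.na(x)]
  if (length(vals) == 0L) return(NA_real_)  # no measurements in this row
  vals[length(vals)] - vals[1L]
}

# Made-up data laid out like the question's data frame
toy <- data.frame(
  x1 = c(10, NA, 3),  time1 = c(1, NA, 2),
  x2 = c(12, NA, NA), time2 = c(5, NA, NA),
  x3 = c(NA, NA, 8),  time3 = c(NA, NA, 9)
)

x_cols <- grep("^x", names(toy))  # positions of x1, x2, x3
toy$new_calc <- apply(toy[x_cols], 1, last_minus_first)
toy$new_calc
```

Row 1 uses x2 minus x1, row 2 has no values and yields NA, and row 3 skips the interior NA and uses x3 minus x1.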

We can use tidyverse methods on the tbl_df. Create a row-name column (rownames_to_column), gather the 'x' columns into 'long' format while removing the NA elements (na.rm = TRUE), group by row name, take the difference (diff) of the first and last 'val' values, use complete to restore any row that was dropped because all of its values were NA, and bind the extracted column to the original dataset 'df':

library(tidyverse)
rownames_to_column(df, 'rn') %>% 
    select(rn, starts_with('x')) %>% 
    gather(key, val, -rn, na.rm = TRUE) %>%
    group_by(rn) %>%
    summarise(Diff = diff(c(first(val), last(val)))) %>% 
    mutate(rn = as.numeric(rn)) %>%
    complete(rn = min(rn):max(rn)) %>% 
    pull(Diff) %>%
    bind_cols(df, new_col = .)
# A tibble: 6 x 11
#     x1 time1    x2  time2    x3  time3    x4 time4    x5 time5 new_col
#  <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
#1  93.0  43.3  82.8   47.8  83.5   86.6  NA     NA   NA     NA   -9.42 
#2  54.2 337.   53.8  732    65.7 1069.   66.0 1379.  61.0 1825.   6.80 
#3  46.3 484.   13.0  507.   19.0  510.   40.6  521.  45.3  523.  -0.998
#4  NA    NA    NA     NA    NA     NA    NA     NA   NA     NA   NA    
#5  65.2 988.   64.6 1455.   65.2 1644.   62.2 2057.  59.9 3352.  -5.29 
#6  49.7 510.   55.3  959.   36.7 1155.   29.0 1180.  45.9 1365.  -3.85 
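
In current tidyr, gather() is superseded by pivot_longer(). The same idea could be sketched as below, assuming dplyr and tidyr are installed. The tibble toy and the names diffs and result are made up for illustration, and a left_join on a row-number column stands in for the complete()/bind_cols() step, so rows that are entirely NA come back as NA.

```r
library(dplyr)
library(tidyr)

# Made-up data laid out like the question's data frame
toy <- tibble(
  x1 = c(10, NA, 3),  time1 = c(1, NA, 2),
  x2 = c(12, NA, NA), time2 = c(5, NA, NA),
  x3 = c(NA, NA, 8),  time3 = c(NA, NA, 9)
)

# Long format of the x columns only, with NA values dropped
diffs <- toy %>%
  mutate(rn = row_number()) %>%
  select(rn, starts_with("x")) %>%
  pivot_longer(-rn, names_to = "key", values_to = "val",
               values_drop_na = TRUE) %>%
  group_by(rn) %>%
  summarise(new_col = last(val) - first(val))

# Join back by row number; rows with no values get NA in new_col
result <- toy %>%
  mutate(rn = row_number()) %>%
  left_join(diffs, by = "rn") %>%
  select(-rn)

result$new_col
```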
