
Column Values Depend on Non-NA entries in Other Columns

I need a way to build a column (data_sequence) whose values are taken from other columns (col_1:col_5) according to the following logic.

  1. The columns col_1:col_5 contain both 1) values and 2) NA entries.

  2. The construction of the 'data_sequence' column moves from right to left within col_1:col_5.

  3. Initially, 'data_sequence' takes the values of the rightmost column (col_5), row by row, until the first NA is hit in that column.

  4. col_4 then becomes the column from which data is taken. 'data_sequence' takes the col_4 values in the corresponding rows until col_4 in turn hits an NA.

The process continues in the same way through col_1, at which point 'data_sequence' is fully populated with the appropriate values.
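Read literally, the steps above amount to one pass with a row pointer that only ever moves forward. Below is a minimal base R sketch of that reading. The helper name fill_right_to_left is made up for illustration, and the sample data is rebuilt as a plain data.frame so the block runs on its own.

```r
# Sample data from the question, as a base R data.frame
sixteen_tons <- data.frame(
  row   = 1:12,
  col_1 = c(rep(NA, 5), 1:7),
  col_2 = c(rep(NA, 3), 15:21, rep(NA, 2)),
  col_3 = c(rep(NA, 2), 33:39, rep(NA, 3)),
  col_4 = c(rep(NA, 2), 55:59, rep(NA, 5)),
  col_5 = c(91:93, rep(NA, 9))
)

# Hypothetical helper: walk the columns right to left, copying values
# from the current column until it hits an NA, then switch to the
# next column on the left and continue from the same row.
fill_right_to_left <- function(df, cols) {
  n <- nrow(df)
  data_sequence <- rep(NA_integer_, n)
  column_offset <- rep(NA_integer_, n)
  r <- 1L                               # current row pointer
  for (k in rev(seq_along(cols))) {     # k = 5, 4, ..., 1
    values <- df[[cols[k]]]
    while (r <= n && !is.na(values[r])) {
      data_sequence[r] <- values[r]
      column_offset[r] <- k             # position of the source column
      r <- r + 1L
    }
  }
  df$data_sequence <- data_sequence
  df$column_offset <- column_offset
  df
}

result <- fill_right_to_left(sixteen_tons, paste0("col_", 1:5))
result[, c("row", "data_sequence", "column_offset")]
```

On the sample data this gives the data_sequence and column_offset columns shown in the question. The loop is easy to check against the written rules, though a vectorised approach is likely preferable for large data.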

The values in col_1:col_5 stair-step downward from right to left, as the sample data shows. That is, values in adjacent columns may begin in the same row, but in any adjacent pair of columns the values in the left column never begin in an earlier row than those in the right column.
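If it helps to confirm that a data set really has this stair-step shape before relying on it, one possible check is to compare the first non-NA row of each column. This is only a sketch; the names first_rows and is_stair_step are illustrative and not part of the question.

```r
# Sample data from the question, as a base R data.frame
sixteen_tons <- data.frame(
  row   = 1:12,
  col_1 = c(rep(NA, 5), 1:7),
  col_2 = c(rep(NA, 3), 15:21, rep(NA, 2)),
  col_3 = c(rep(NA, 2), 33:39, rep(NA, 3)),
  col_4 = c(rep(NA, 2), 55:59, rep(NA, 5)),
  col_5 = c(91:93, rep(NA, 9))
)

cols <- paste0("col_", 1:5)

# Row index of the first non-NA value in each column (left to right)
first_rows <- sapply(sixteen_tons[cols], function(x) which(!is.na(x))[1])

# Stair-step: moving from left to right, the starting row never increases
is_stair_step <- all(diff(first_rows) <= 0)

first_rows
is_stair_step
```

For the sample data the starting rows are 6, 4, 3, 3 and 1 for col_1 through col_5, so the check passes. A column that is entirely NA would give an NA starting row and would need separate handling.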

Once 'data_sequence' is populated, I also need a column (column_offset) that gives, for each row, the offset of the column the value came from relative to the first column (row). For example, a value taken from col_5 has an offset of 5.

Any solutions, elegant or otherwise, are greatly appreciated.


library(tibble)

# Sample data; data_sequence and column_offset hold the desired output
sixteen_tons <- tibble(row = 1:12,
               col_1 = c( rep(NA, 5), 1:7 ),
               col_2 = c( rep(NA, 3), 15:21, rep(NA, 2) ),
               col_3 = c( rep(NA, 2), 33:39, rep(NA, 3) ),
               col_4 = c( rep(NA, 2), 55:59, rep(NA, 5) ),
               col_5 = c( 91:93, rep(NA, 9) ),
               data_sequence = c(91:93, 56:59, 38:39, 21, 6:7),
               column_offset = c(rep(5, 3), rep(4, 4), rep(3, 2), rep(2, 1), rep(1, 2))
)

We could use coalesce with max.col

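library(dplyr)
library(purrr)
# Note: invoke() is deprecated in purrr >= 1.0.0; do.call(coalesce, ...) is a base R alternative
sixteen_tons %>% 
  # coalesce over the reversed columns: first non-NA among col_5, col_4, ..., col_1
  mutate(data_sequence2 = invoke(coalesce, 
     rev(across(starts_with('col_')))),
    # index of the rightmost non-NA col_* column in each row
    column_offset2 = max.col(!is.na(across(starts_with('col_'))), 'last'))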

-output

# A tibble: 12 × 10
     row col_1 col_2 col_3 col_4 col_5 data_sequence column_offset data_sequence2 column_offset2
   <int> <int> <int> <int> <int> <int>         <dbl>         <dbl>          <int>          <int>
 1     1    NA    NA    NA    NA    91            91             5             91              5
 2     2    NA    NA    NA    NA    92            92             5             92              5
 3     3    NA    NA    33    55    93            93             5             93              5
 4     4    NA    15    34    56    NA            56             4             56              4
 5     5    NA    16    35    57    NA            57             4             57              4
 6     6     1    17    36    58    NA            58             4             58              4
 7     7     2    18    37    59    NA            59             4             59              4
 8     8     3    19    38    NA    NA            38             3             38              3
 9     9     4    20    39    NA    NA            39             3             39              3
10    10     5    21    NA    NA    NA            21             2             21              2
11    11     6    NA    NA    NA    NA             6             1              6              1
12    12     7    NA    NA    NA    NA             7             1              7              1
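For comparison, the same idea can be sketched in base R only, without dplyr or purrr. This is an illustrative equivalent rather than the answer's own code: max.col() with ties.method = "last" finds the rightmost non-NA column in each row, and matrix indexing then pulls the value at that position.

```r
# Sample data from the question, as a base R data.frame
sixteen_tons <- data.frame(
  row   = 1:12,
  col_1 = c(rep(NA, 5), 1:7),
  col_2 = c(rep(NA, 3), 15:21, rep(NA, 2)),
  col_3 = c(rep(NA, 2), 33:39, rep(NA, 3)),
  col_4 = c(rep(NA, 2), 55:59, rep(NA, 5)),
  col_5 = c(91:93, rep(NA, 9))
)

# Matrix of the col_* columns only
m <- as.matrix(sixteen_tons[grep("^col_", names(sixteen_tons))])

# Rightmost non-NA column per row ("last" breaks ties toward the right)
column_offset <- max.col(!is.na(m), ties.method = "last")

# Value at (row i, column column_offset[i]) for every row
data_sequence <- m[cbind(seq_len(nrow(m)), column_offset)]

sixteen_tons$data_sequence <- data_sequence
sixteen_tons$column_offset <- column_offset
sixteen_tons
```

Both this sketch and the coalesce approach pick the rightmost non-NA value in each row. That agrees with the right-to-left rules in the question for the stair-step sample data, but it would also pick up an isolated value that reappears further right after a gap, so it is worth checking the shape of the data first. A row that is entirely NA would get an offset equal to the number of columns and an NA value.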
