Column Values Depend on Non-NA Entries in Other Columns
I need a way to build a column (data_sequence) from the values in other columns (col_1:col_5), using the following logic.
The columns col_1:col_5 contain both 1) values and 2) NA entries.
Construction of 'data_sequence' moves from right to left across col_1:col_5.
Initially, 'data_sequence' takes the values of the rightmost column (col_5) until the first NA is hit in that column.
col_4 then becomes the column from which to harvest data. 'data_sequence' takes the col_4 values in the corresponding rows until the first NA appears in col_4 after its values.
The process continues through col_1, at which point 'data_sequence' is fully populated with the appropriate values.
The values in col_1:col_5 stair-step downward from right to left, as the sample data below shows. That is, values in adjacent columns may begin in the same row, but in any adjacent pair of columns the values in the left column never begin in an earlier row than those in the right column.
Once 'data_sequence' is populated, I also need a column (column_offset) that gives the column offset relative to the first column (row), i.e. which of col_1:col_5 each value was taken from.
Any solutions, elegant or otherwise, are greatly appreciated.
library(tibble)

# data_sequence and column_offset below show the desired result
sixteen_tons <- tibble(row = 1:12,
                       col_1 = c( rep(NA, 5), 1:7),
                       col_2 = c( rep(NA, 3) , 15:21, rep(NA, 2) ),
                       col_3 = c( rep(NA, 2) , 33:39, rep(NA, 3) ),
                       col_4 = c( rep(NA, 2) , 55:59, rep(NA, 5) ),
                       col_5 = c( 91:93, rep(NA, 9) ),
                       data_sequence = c(91:93, 56:59, 38:39, 21, 6:7),
                       column_offset = c(rep(5,3), rep(4,4), rep(3,2), rep(2,1), rep(1,2) )
)
We could use coalesce with max.col:
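library(dplyr)

sixteen_tons %>%
  mutate(
    # coalesce over the columns in reverse order (col_5 first), so each row
    # gets its rightmost non-NA value; do.call replaces the deprecated purrr::invoke
    data_sequence2 = do.call(coalesce,
                             rev(across(starts_with('col_')))),
    # position of the last non-NA column in each row
    column_offset2 = max.col(!is.na(across(starts_with('col_'))), ties.method = 'last'))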
Output:
# A tibble: 12 × 10
     row col_1 col_2 col_3 col_4 col_5 data_sequence column_offset data_sequence2 column_offset2
   <int> <int> <int> <int> <int> <int>         <dbl>         <dbl>          <int>          <int>
 1     1    NA    NA    NA    NA    91            91             5             91              5
 2     2    NA    NA    NA    NA    92            92             5             92              5
 3     3    NA    NA    33    55    93            93             5             93              5
 4     4    NA    15    34    56    NA            56             4             56              4
 5     5    NA    16    35    57    NA            57             4             57              4
 6     6     1    17    36    58    NA            58             4             58              4
 7     7     2    18    37    59    NA            59             4             59              4
 8     8     3    19    38    NA    NA            38             3             38              3
 9     9     4    20    39    NA    NA            39             3             39              3
10    10     5    21    NA    NA    NA            21             2             21              2
11    11     6    NA    NA    NA    NA             6             1              6              1
12    12     7    NA    NA    NA    NA             7             1              7              1
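For comparison, the same "rightmost non-NA value per row" rule can be sketched in base R without dplyr. This is only an illustration, not part of the original answer: the names `cols` and `last_non_na` are made up here, the sample data is rebuilt as a plain data.frame, and the code assumes every row has at least one non-NA entry in col_1:col_5 (as in the sample data).

```r
# Sample data from the question, rebuilt as a base data.frame
sixteen_tons <- data.frame(
  row   = 1:12,
  col_1 = c(rep(NA, 5), 1:7),
  col_2 = c(rep(NA, 3), 15:21, rep(NA, 2)),
  col_3 = c(rep(NA, 2), 33:39, rep(NA, 3)),
  col_4 = c(rep(NA, 2), 55:59, rep(NA, 5)),
  col_5 = c(91:93, rep(NA, 9))
)

# Matrix of the five value columns (hypothetical helper name)
cols <- as.matrix(sixteen_tons[paste0("col_", 1:5)])

# For each row, the index of the rightmost non-NA column.
# Assumption: no row is entirely NA, otherwise max() would return -Inf.
last_non_na <- apply(cols, 1, function(x) max(which(!is.na(x))))

# The column index is the offset; matrix indexing by (row, column) pairs
# then pulls the corresponding value for each row.
sixteen_tons$column_offset <- last_non_na
sixteen_tons$data_sequence <- cols[cbind(seq_len(nrow(cols)), last_non_na)]
```

Computing the column index first and reusing it for the lookup keeps the two result columns consistent by construction, whereas the coalesce/max.col version derives them independently.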