Scanning rows and replacing values in R

Question

I have this dataset:

sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA",  "125 ocean road") )

   col1            col2              col3           col4
 james 123 forest road                NA             NA
 john           jason 124 valley street             NA
 henry             tim             peter 125 ocean road

I want to find a way to make the second column always contain the "address". The final result would look something like this:

# code to show sample of desired result
 desired_result = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "124 valley street", "125 ocean road"))

   col1              col2
 james   123 forest road
  john 124 valley street
 henry    125 ocean road

I have been trying to think of and research functions in R that can "scan" whether the value in a given row/column starts with a number, and make a decision accordingly.

I had the following idea: I can check whether the value in a given column starts with a number:

sample_data$is_col2_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col2,1,1))
sample_data$is_col3_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col3,1,1))
sample_data$is_col4_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col4,1,1))

   col1            col2              col3           col4 is_col2_a_number is_col3_a_number is_col4_a_number
1 james 123 forest road                NA             NA             TRUE            FALSE            FALSE
2  john           jason 124 valley street             NA            FALSE             TRUE            FALSE
3 henry             tim             peter 125 ocean road            FALSE            FALSE             TRUE

Next, I would try to figure out how to code the following logic:

For a given row, find the first cell that contains the value TRUE
Keep the column corresponding to that cell

I tried this row-by-row:

first_row = sample_data[1,]

ifelse(first_row$is_col2_a_number == "TRUE", first_row[,c(1,2)], ifelse(first_row$is_col3_a_number, first_row[, c(1,3)], first_row[, c(1,4)]))

But I think I have made this unnecessarily complicated. Can someone please give me a hand and suggest how I can continue solving this problem?

Thank you

Answer 1

This should work:

library(dplyr)
library(tidyr)
library(stringr)
sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA",  "125 ocean road") )

tmp <- sample_data %>% 
  mutate(across(col2:col4, ~case_when(str_detect(.x, "^\\d") ~ .x, 
                                      TRUE ~ NA_character_)), 
  address = coalesce(col2, col3, col4)) %>% 
  select(col1, address)
tmp
#>    col1           address
#> 1 james   123 forest road
#> 2  john 124 valley street
#> 3 henry    125 ocean road

Created on 2022-06-30 by the reprex package (v2.0.1)
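The pipeline above blanks out every value that does not start with a digit and then takes the first remaining value per row. For readers who prefer to avoid the extra packages, here is a minimal base R sketch of the same idea. It is only an illustration, not part of the original answer: the names addr_cols, is_addr, first_hit and result are made up here, and it assumes (as in the sample data) that an address is any value starting with a digit.

```r
sample_data <- data.frame(
  col1 = c("james", "john", "henry"),
  col2 = c("123 forest road", "jason", "tim"),
  col3 = c("NA", "124 valley street", "peter"),
  col4 = c("NA", "NA", "125 ocean road"),
  stringsAsFactors = FALSE
)

# Candidate columns that may hold the address (assumed names)
addr_cols <- c("col2", "col3", "col4")
m <- as.matrix(sample_data[addr_cols])

# Logical matrix: TRUE where the cell starts with a digit
is_addr <- matrix(grepl("^[0-9]", m), nrow = nrow(m))

# Index of the first TRUE in each row
first_hit <- max.col(is_addr, ties.method = "first")

# Pick one cell per row via (row, column) matrix indexing
address <- m[cbind(seq_len(nrow(m)), first_hit)]

# Rows without any match would otherwise silently return the first column
address[rowSums(is_addr) == 0] <- NA_character_

result <- data.frame(col1 = sample_data$col1, address = address,
                     stringsAsFactors = FALSE)
result
```

The rowSums() guard matters because max.col() still returns a column index for a row in which no cell matched.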

Answer 2

I thought of a (very inefficient) way to solve my own problem:

sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA",  "125 ocean road") )


sample_data$is_col2_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col2,1,1))
sample_data$is_col3_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col3,1,1))
sample_data$is_col4_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col4,1,1))

a1 <- sample_data[which(sample_data$is_col2_a_number == "TRUE"), ]
a1 <- a1[,c(1,2)]
colnames(a1)[2] <- "i"

b1 <- sample_data[which(sample_data$is_col3_a_number == "TRUE"), ]
b1 <- b1[,c(1,3)]
colnames(b1)[2] <- "i"

c1 <- sample_data[which(sample_data$is_col4_a_number == "TRUE"), ]
c1 <- c1[,c(1,4)]
colnames(c1)[2] <- "i"

final = rbind(a1,b1,c1)

Here is the desired output:

   col1                 i
1 james   123 forest road
2  john 124 valley street
3 henry    125 ocean road
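One thing to watch with the rbind() approach: the rows come out grouped by the column the address was found in, a row that matches in more than one column would appear more than once, and a row with no match is dropped. On the sample data none of this shows, because each row matches exactly one column and the matches happen to be in row order. A rough sketch that fills a single vector in place and keeps the original row order could look like this (the variable names address, values and take are only illustrative, and the check for a leading digit is the same assumption as above):

```r
sample_data <- data.frame(
  col1 = c("james", "john", "henry"),
  col2 = c("123 forest road", "jason", "tim"),
  col3 = c("NA", "124 valley street", "peter"),
  col4 = c("NA", "NA", "125 ocean road"),
  stringsAsFactors = FALSE
)

# Start with no address for any row
address <- rep(NA_character_, nrow(sample_data))

# Visit the candidate columns left to right; only fill rows still missing
for (col in c("col2", "col3", "col4")) {
  values <- as.character(sample_data[[col]])
  take <- is.na(address) & grepl("^[0-9]", values)
  address[take] <- values[take]
}

final <- data.frame(col1 = sample_data$col1, i = address,
                    stringsAsFactors = FALSE)
final
```

Rows where no column starts with a digit keep NA in the i column instead of disappearing from the result.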

Answer 1: score 2, accepted, 2022-06-30 23:43:52
Answer 2: score 1, 2022-07-01 02:37:00
