Scanning and Replacing Values of Rows in R
I have this dataset:
sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA", "125 ocean road") )
col1  col2            col3              col4
james 123 forest road NA                NA
john  jason           124 valley street NA
henry tim             peter             125 ocean road
I want to figure out a way to make the second column always contain the "address". The final product would look something like this:
# code to show sample of desired result
desired_result = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "124 valley street", "125 ocean road"))
col1  col2
james 123 forest road
john  124 valley street
henry 125 ocean road
I have been researching functions in R that can "scan" whether a value in a row/column starts with a number and make a decision accordingly.
I had the following idea: I can check whether the values in a given column start with a number:
sample_data$is_col2_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col2,1,1))
sample_data$is_col3_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col3,1,1))
sample_data$is_col4_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col4,1,1))
  col1  col2            col3              col4           is_col2_a_number is_col3_a_number is_col4_a_number
1 james 123 forest road NA                NA             TRUE             FALSE            FALSE
2 john  jason           124 valley street NA             FALSE            TRUE             FALSE
3 henry tim             peter             125 ocean road FALSE            FALSE            TRUE
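As a side note, the three checks above can probably be collapsed into a single call. The sketch below assumes that "starts with a digit" is the only test needed, so the pattern "^[0-9]" is applied to the whole value and the substr() step is dropped. The names address_cols and starts_with_digit are made up for illustration.

```r
sample_data <- data.frame(
  col1 = c("james", "john", "henry"),
  col2 = c("123 forest road", "jason", "tim"),
  col3 = c("NA", "124 valley street", "peter"),
  col4 = c("NA", "NA", "125 ocean road"),
  stringsAsFactors = FALSE
)

# Columns that may hold the address (assumed; adjust to your data).
address_cols <- c("col2", "col3", "col4")

# One logical column per candidate column:
# TRUE where the value starts with a digit, FALSE otherwise.
starts_with_digit <- sapply(
  sample_data[address_cols],
  function(x) grepl("^[0-9]", x)
)
starts_with_digit
```

The result is a logical matrix with one row per record and one column per candidate column, which can be used to pick the address in a later step.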
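Next, I would try to figure out how to code the following logic: if col2 starts with a number, keep col1 and col2; otherwise, if col3 starts with a number, keep col1 and col3; otherwise keep col1 and col4.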
I tried this row-by-row:
first_row = sample_data[1,]
ifelse(first_row$is_col2_a_number == "TRUE", first_row[,c(1,2)], ifelse(first_row$is_col3_a_number, first_row[, c(1,3)], first_row[, c(1,4)]))
But I think I have made this unnecessarily complicated. Can someone please give me a hand and suggest how I can continue solving this problem?
Thank you
This should work:
library(dplyr)
library(tidyr)
library(stringr)
sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA", "125 ocean road") )
tmp <- sample_data %>%
  mutate(across(col2:col4, ~case_when(str_detect(.x, "^\\d") ~ .x,
                                      TRUE ~ NA_character_)),
         address = coalesce(col2, col3, col4)) %>%
  select(col1, address)
tmp
#>    col1           address
#> 1 james   123 forest road
#> 2  john 124 valley street
#> 3 henry    125 ocean road
Created on 2022-06-30 by the reprex package (v2.0.1)
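If you would rather not load any packages, roughly the same idea can be sketched in base R. This is not part of the answer above, just an alternative under the same assumption (the address is the first value in col2:col4 that starts with a digit). The helper first_address is a hypothetical name introduced here.

```r
sample_data <- data.frame(
  col1 = c("james", "john", "henry"),
  col2 = c("123 forest road", "jason", "tim"),
  col3 = c("NA", "124 valley street", "peter"),
  col4 = c("NA", "NA", "125 ocean road"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first value that starts with a digit,
# or NA if no value in the row does.
first_address <- function(values) {
  hits <- values[grepl("^[0-9]", values)]
  if (length(hits) > 0) hits[[1]] else NA_character_
}

# Work on the candidate columns as a character matrix, one row at a time.
candidates <- as.matrix(sample_data[, c("col2", "col3", "col4")])

result <- data.frame(
  col1 = sample_data$col1,
  address = unname(apply(candidates, 1, first_address)),
  stringsAsFactors = FALSE
)
result
```

apply() loops over rows, so this is likely slower than coalesce() on large data, but it has no dependencies and makes the "first match wins" rule explicit.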
I thought of a (very inefficient) way to solve my own problem:
sample_data = data.frame(col1 = c("james", "john", "henry"), col2 = c("123 forest road", "jason", "tim"), col3 = c("NA", "124 valley street", "peter"), col4 = c("NA", "NA", "125 ocean road") )
sample_data$is_col2_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col2,1,1))
sample_data$is_col3_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col3,1,1))
sample_data$is_col4_a_number = grepl("^[0-9]{1,}$", substr(sample_data$col4,1,1))
a1 <- sample_data[which(sample_data$is_col2_a_number == "TRUE"), ]
a1 <- a1[,c(1,2)]
colnames(a1)[2] <- "i"
b1 <- sample_data[which(sample_data$is_col3_a_number == "TRUE"), ]
b1 <- b1[,c(1,3)]
colnames(b1)[2] <- "i"
c1 <- sample_data[which(sample_data$is_col4_a_number == "TRUE"), ]
c1 <- c1[,c(1,4)]
colnames(c1)[2] <- "i"
final = rbind(a1,b1,c1)
Here is the desired output:
  col1  i
1 james 123 forest road
2 john  124 valley street
3 henry 125 ocean road