Handling numbers within character strings in R
I have the following character vector, whose elements contain numbers:
nums = c("1, 2", "1, 2, 4", "2, 4", "1, 2, 3, 4, 5", "2, 3, 5", NA, NA, NA, NA)
I want to write an algorithm that tests whether a subset of n elements within nums contains exactly n unique numbers and, if so, removes those numbers from the other elements, where n is any number from 1 to 9.
In the example above, the first 3 elements contain only 3 unique numbers (1, 2, 4), so these numbers should be removed from the other elements. The output would be:
nums = c("1, 2", "1, 2, 4", "2, 4", "3, 5", "3, 5", NA, NA, NA, NA)
Note that it could be 2 elements having 2 unique numbers, or 4 elements having 4 unique numbers, and so on.
I'd like to keep the final output as a character vector of the same length as the original.
If I understand correctly, you can apply something like the following:
library(stringr)
library(readr)
library(purrr)
nums = c("1, 2", "1, 2, 4", "2, 4", "1, 2, 3, 4, 5", "2, 3, 5", NA, NA, NA, NA)
# split each element of nums on commas, giving a list of character vectors
num_into_list <- stringr::str_split(nums, ",")
# convert each piece to a number (NA elements stay NA)
num_into_list <- purrr::map(num_into_list, readr::parse_number)
# collect the unique numbers from the first n elements of the list (here n = 3)
not_allowed <- unique(unlist(num_into_list[1:3]))
# in the remaining elements, keep only the values that are not in not_allowed,
# using logical subsetting inside an anonymous function (purrr's ~ shortcut)
output_list <- c(num_into_list[1:3],  # the first 3 elements are unchanged
                 purrr::map(num_into_list[4:9], ~ .x[!(.x %in% not_allowed)]))
# finally paste each element back into one string, so the result is a
# character vector of the same length as nums (unlist() alone would flatten it)
output <- purrr::map_chr(output_list, function(x) {
  if (length(x) == 0 || all(is.na(x))) NA_character_ else paste(x, collapse = ", ")
})
You can turn the above code into a function if you parametrize n, the number of leading elements used to build the not_allowed vector, and use the length of your vector to rebuild the list (in the indexing of the output_list step).
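As a rough illustration of that parametrization, here is a minimal sketch in base R rather than stringr/purrr. The names parse_nums and remove_leading_numbers are made up for this example, and two behaviours are assumptions on my part rather than anything stated above: the vector is returned unchanged when the first n elements do not contain exactly n unique numbers, and an element left with no numbers becomes NA.

```r
# Hypothetical helper: parse one string such as "1, 2, 4" into numbers.
# NA input stays NA.
parse_nums <- function(x) {
  if (is.na(x)) return(NA_real_)
  as.numeric(strsplit(x, ",", fixed = TRUE)[[1]])
}

# Hypothetical helper: if the first n elements contain exactly n unique
# numbers, remove those numbers from every later element.
# Returns a character vector of the same length as nums.
remove_leading_numbers <- function(nums, n) {
  parsed <- lapply(nums, parse_nums)

  # unique non-NA numbers found in the first n elements
  vals <- unlist(parsed[seq_len(n)])
  not_allowed <- unique(vals[!is.na(vals)])

  # assumption: leave the input untouched when the condition does not hold
  if (length(not_allowed) != n) return(nums)

  # drop the not_allowed numbers from the elements after position n
  rest <- seq_along(parsed) > n
  parsed[rest] <- lapply(parsed[rest], function(x) x[!(x %in% not_allowed)])

  # paste each element back into a single string
  # assumption: an element with nothing left becomes NA
  vapply(parsed, function(x) {
    if (length(x) == 0 || all(is.na(x))) NA_character_ else paste(x, collapse = ", ")
  }, character(1))
}

nums <- c("1, 2", "1, 2, 4", "2, 4", "1, 2, 3, 4, 5", "2, 3, 5", NA, NA, NA, NA)
result <- remove_leading_numbers(nums, n = 3)
print(result)
```

This only looks at the first n elements, as in the code above. It does not search every possible subset of n elements, which the question may also need.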