
Handling numbers within character strings in R

I have the following character vector, whose elements contain numbers:

nums = c("1, 2", "1, 2, 4", "2, 4", "1, 2, 3, 4, 5", "2, 3, 5", NA, NA, NA, NA)

I want to write an algorithm that tests whether any subset of n elements within nums contains exactly n unique numbers and, if so, removes those numbers from the other elements, where n is any number from 1 to 9.

In the example above, since the first 3 elements contain only 3 unique numbers (1, 2, 4), these numbers should be removed from the other elements. So the output would be:

nums = c("1, 2", "1, 2, 4", "2, 4", "3, 5", "3, 5", NA, NA, NA, NA)

Note that it could equally be 2 elements containing 2 unique numbers, or 4 elements containing 4 unique numbers, and so on.
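The condition described above can be sketched in base R as a brute-force search over groups of n elements. This is only an illustration under stated assumptions: `find_closed_group`, `parsed` and `candidates` are hypothetical names that do not appear in the original post, and the search simply returns the first matching group it meets.

```r
nums <- c("1, 2", "1, 2, 4", "2, 4", "1, 2, 3, 4, 5", "2, 3, 5", NA, NA, NA, NA)

# Split each element into a numeric vector; NA elements stay NA.
parsed <- lapply(strsplit(nums, ",\\s*"), as.numeric)

# Only non-missing elements can take part in a group.
candidates <- which(!is.na(nums))

# Hypothetical helper: return the indices of the first group of n elements
# that together contain exactly n unique numbers, or NULL if there is none.
find_closed_group <- function(parsed, candidates, n) {
  if (n < 1 || n > length(candidates)) return(NULL)
  # Enumerate positions within `candidates`, then map them back to indices
  # (this avoids combn() treating a single number as 1:number).
  groups <- combn(length(candidates), n, simplify = FALSE)
  for (pos in groups) {
    idx <- candidates[pos]
    if (length(unique(unlist(parsed[idx]))) == n) return(idx)
  }
  NULL
}

group <- find_closed_group(parsed, candidates, 3)
```

For the example vector, no single element and no pair of elements qualifies, while the group found for n = 3 is the first three elements. The number of groups grows quickly with the length of the vector, so this approach only suits short inputs such as the 9-element one here.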

I'd like to keep the final output as a character vector of the same length as the original.

If I understand correctly, you can apply something like the following:

library(stringr)
library(readr)
library(purrr)
nums = c("1, 2", "1, 2, 4", "2, 4", "1, 2, 3, 4, 5", "2, 3, 5", NA, NA, NA, NA)

# split each element of nums on commas; the result is a list with one entry per element
num_into_list <- stringr::str_split(nums, ",")

# convert to numbers
num_into_list <- purrr::map(num_into_list, readr::parse_number)

# collect the unique numbers from the first n elements of the list (here n = 3)
not_allowed <- unique(unlist(num_into_list[1:3]))

# in the remaining elements, keep only the values that are not in the
# not_allowed vector, using logical subsetting inside an anonymous
# function (written with purrr's ~ shortcut)
output_list <- c(num_into_list[1:3],   # the first 3 elements stay the same
                 purrr::map(num_into_list[4:9], ~ .[!(. %in% not_allowed)]))

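# finally collapse each element back into a string, so that the result is a
# chr vector of the same length as nums (NA or emptied elements become NA)
output <- purrr::map_chr(output_list,
                         ~ if (all(is.na(.))) NA_character_ else paste(., collapse = ", "))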

You can wrap the code above in a function by parametrizing the size of the leading subset used to build the not_allowed vector, and by using the length of your vector to reassemble the list (the indices in the output_list step).
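A minimal sketch of such a function, written in base R rather than with stringr, readr and purrr. The name `remove_leading_numbers` and its argument `n` are hypothetical, and the sketch assumes that the group of interest is always the first n elements, as in the code above.

```r
# Hypothetical helper: remove the numbers found in the first n elements of
# `nums` from all remaining elements, and return a character vector of the
# same length as the input.
remove_leading_numbers <- function(nums, n) {
  # Split each element into a numeric vector; NA elements stay NA.
  parsed <- lapply(strsplit(nums, ",\\s*"), as.numeric)

  # Unique numbers found in the first n elements.
  not_allowed <- unique(unlist(parsed[seq_len(n)]))

  # Indices of the remaining elements (setdiff is safe when n is 0).
  rest <- setdiff(seq_along(parsed), seq_len(n))
  parsed[rest] <- lapply(parsed[rest], function(x) x[!(x %in% not_allowed)])

  # Collapse each element back into a string; NA or emptied elements become NA.
  vapply(parsed, function(x) {
    if (length(x) == 0 || all(is.na(x))) NA_character_ else paste(x, collapse = ", ")
  }, character(1))
}

nums <- c("1, 2", "1, 2, 4", "2, 4", "1, 2, 3, 4, 5", "2, 3, 5", NA, NA, NA, NA)
result <- remove_leading_numbers(nums, 3)
```

With n = 3 this reproduces the output requested in the question. The function does not check that the first n elements really contain exactly n unique numbers, so that test would need to be done separately before calling it.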
