

New identifier column in a data frame based on whether a string contains that identifier

I am an absolute novice to R. What I would like to achieve is to add an identifier to each data frame row based on whether a string value in the same row contains that identifier.

Assume the following data frame:

df <- data.frame(Code = c("DE8230", "18FR16", "2UK34", "45BE87C", "1894DE56", "AB12FR", "ES12456"),
                 Type = c("A", "B", "C", "C", "E", "A", "C"),
                 Value = c(12, 14, 8, 20, 21, 16, 5))


      Code Type Value
1   DE8230    A    12
2   18FR16    B    14
3    2UK34    C     8
4  45BE87C    C    20
5 1894DE56    E    21
6   AB12FR    A    16
7  ES12456    C     5

I want to add a country column based on whether an identifier (e.g. DE, FR, UK, BE, ES) is present in the column 'Code', and then to list that country.

What I tried:

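library(dplyr)

identifiers <- c("DE", "FR", "UK") # identifiers of choice

df <- mutate(df, country = 0) # placeholder value for rows with no match

# for each identifier, overwrite country wherever Code contains it
for (i in seq_along(identifiers)) {
  df <- mutate(df,
               country = ifelse(grepl(identifiers[i], Code), identifiers[i], country)
  )
}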

Which yields:

      Code Type Value country
1   DE8230    A    12      DE
2   18FR16    B    14      FR
3    2UK34    C     8      UK
4  45BE87C    C    20       0
5 1894DE56    E    21      DE
6   AB12FR    A    16      FR
7  ES12456    C     5       0

Although this works, I think there must be a more elegant solution that drops the for loop and uses a single dplyr statement. However, I have not been able to figure it out.

NB: It is important that the identifiers are kept in a separate vector or list rather than written into the mutate statement. This is just a hypothetical example; the real datasets and the number of identifiers are much larger.
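For comparison, the loop itself can be collapsed into one vectorised step in base R while still taking the identifiers from a separate vector. This is a minimal sketch, assuming the df and identifiers from the question (redefined so the block runs on its own); hits is just an illustrative helper name. Every identifier is tested against every Code with grepl(..., fixed = TRUE), and the last identifier that matched is kept per row, which mirrors the loop, where later identifiers overwrite earlier ones. Unlike the loop, rows without a match get NA instead of "0".

```r
# Data and identifiers as in the question
df <- data.frame(Code = c("DE8230", "18FR16", "2UK34", "45BE87C", "1894DE56", "AB12FR", "ES12456"),
                 Type = c("A", "B", "C", "C", "E", "A", "C"),
                 Value = c(12, 14, 8, 20, 21, 16, 5))
identifiers <- c("DE", "FR", "UK")

# Logical matrix: one row per Code, one column per identifier.
# fixed = TRUE matches each identifier literally instead of as a regex.
hits <- do.call(cbind, lapply(identifiers, function(id) grepl(id, df$Code, fixed = TRUE)))

# Per row, keep the last identifier that matched (as in the loop, later
# identifiers win); rows without any match get NA.
df$country <- apply(hits, 1, function(h) if (any(h)) identifiers[max(which(h))] else NA_character_)

# Keep only rows where a country was found
df[!is.na(df$country), ]
```

Because fixed = TRUE is used, identifiers containing characters such as . or + are matched literally; a regex-based approach would need them escaped first.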

We may use str_extract, pasting the identifiers into a single pattern string with | as the separator, and extract the matching substring from 'Code':

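library(dplyr)
library(stringr)
library(tidyr)  # drop_na() comes from tidyr

identifiers <- c("DE", "FR", "UK")  # identifiers of choice, as in the question

df %>% 
  # build one regex "DE|FR|UK" and extract the first match from each Code
  mutate(country = str_extract(Code, str_c(identifiers, collapse = "|"))) %>% 
  drop_na(country)  # rows whose Code contains no identifier get NA; drop them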

-output

      Code Type Value country
1   DE8230    A    12      DE
2   18FR16    B    14      FR
3    2UK34    C     8      UK
4 1894DE56    E    21      DE
5   AB12FR    A    16      FR
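If you prefer to avoid the stringr and tidyr dependencies, the same idea can be sketched in base R: paste() builds the alternation pattern, regexpr() locates the first match in each Code, and regmatches() extracts the matched text. This is a minimal sketch, assuming the question's df and identifiers (redefined so the block runs on its own); pattern and m are illustrative names.

```r
# Data and identifiers as in the question
df <- data.frame(Code = c("DE8230", "18FR16", "2UK34", "45BE87C", "1894DE56", "AB12FR", "ES12456"),
                 Type = c("A", "B", "C", "C", "E", "A", "C"),
                 Value = c(12, 14, 8, 20, 21, 16, 5))
identifiers <- c("DE", "FR", "UK")

# Same alternation pattern as str_c(identifiers, collapse = "|"): "DE|FR|UK"
pattern <- paste(identifiers, collapse = "|")

# Position of the first match in each Code (-1 where there is none)
m <- regexpr(pattern, df$Code)

# regmatches() returns only the matched substrings, so write them into the
# matching rows and leave the others as NA
df$country <- NA_character_
df$country[m != -1] <- regmatches(df$Code, m)

# Base R equivalent of drop_na(country)
df[!is.na(df$country), ]
```

Because regmatches() returns one value per successful match only, the result is assigned to df$country[m != -1] rather than to the whole column.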
