Recode a variable based on a partial match in R

Question

This question probably has a simple answer, so I apologize in advance. I would like to use R to recode the values in v2 of df1 so that they look like the values in v2 of df2. Is it possible to do this with a partial match, finding the values in v2 of df1 that contain, say, 'Y' and recoding them to 'Yr' as in v2 of df2?

> df1
  v1   v2
1  1 Yr01
2  2 Yr02
3  3 Yr03
4  4 Yr04
5  5 Yr05

> df2
  v1 v2
1  1 Yr
2  2 Yr
3  3 Yr
4  4 Yr
5  5 Yr

Answer 1

You can use grepl() to generate a logical vector that marks which values match whatever regex you define. For an easy introduction to regex, see: http://www.regular-expressions.info/tutorial.html

df1 <- read.table(text = "
  v1   v2
  1  1 Yr01
  2  2 Yr02
  3  3 Yr03
  4  4 Yr04
  5  5 Yr05", 
  header = TRUE, stringsAsFactors = FALSE)

df1[grepl("Y", df1$v2), "v2"] <- "Yr"
> df1

  v1 v2
1  1 Yr
2  2 Yr
3  3 Yr
4  4 Yr
5  5 Yr

If your data is a factor, convert it to character first, then use the code above.
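A minimal sketch of the factor case mentioned above. The data frame df_f and its "Qtr1" value are made up for illustration; they are not from the question.

```r
# Hypothetical data frame in which v2 is stored as a factor
df_f <- data.frame(v1 = 1:3, v2 = factor(c("Yr01", "Yr02", "Qtr1")))

# Convert the factor to character before assigning a value
# that is not one of the existing factor levels
df_f$v2 <- as.character(df_f$v2)

# Same recode as above: rows whose v2 contains "Y" become "Yr"
df_f[grepl("Y", df_f$v2), "v2"] <- "Yr"
df_f$v2
```

Without the conversion, assigning "Yr" would most likely produce NA values with a warning, because "Yr" is not an existing level of the factor.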

Answer 2

I think this will work for you, but depending on how many groups you have and the size of your data frame, there may be a better way:

df1$v2 <- ifelse(grepl("Y", df1$v2), "Yr", df1$v2)
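If there are several groups, one possible alternative to nesting ifelse() calls is to loop over a lookup vector of patterns. This is only a sketch: df_mixed, recode_map, and the "Mo"/"Wk" values are hypothetical and do not appear in the question.

```r
# Hypothetical data with more than one group
df_mixed <- data.frame(v1 = 1:4,
                       v2 = c("Yr01", "Yr02", "Mo01", "Wk01"),
                       stringsAsFactors = FALSE)

# Hypothetical lookup: names are the substrings to find,
# values are the labels to assign
recode_map <- c(Y = "Yr", M = "Mo")

for (pattern in names(recode_map)) {
  # fixed = TRUE treats the pattern as a literal string, not a regex
  hits <- grepl(pattern, df_mixed$v2, fixed = TRUE)
  df_mixed$v2[hits] <- recode_map[[pattern]]
}
df_mixed$v2
```

Values that match none of the patterns ("Wk01" here) are left unchanged. Note that later patterns are tested against values that may already have been recoded, so the patterns should be chosen so that they do not match each other's labels.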

Answer 3

Another way to do it with regular expressions.

df1$v2 <- gsub("Y.*","Yr", df1$v2)
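One thing to be aware of: the pattern "Y.*" matches from the first "Y" to the end of the string, so any text before the "Y" is kept. The sketch below compares it with an anchored pattern; the vector x and its "MYr03" value are made up for illustration.

```r
# Hypothetical values; the last one has text before the "Y"
x <- c("Yr01", "Yr02", "MYr03")

# Unanchored: replaces everything from the first "Y" onward
unanchored <- gsub("Y.*", "Yr", x)

# Anchored: only rewrites values that start with "Y"
anchored <- sub("^Y.*$", "Yr", x)

unanchored  # "Yr" "Yr" "MYr"
anchored    # "Yr" "Yr" "MYr03"
```

For the data in the question both versions give the same result, since every value starts with "Y".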

3 answers

Answer 1  5  2012-04-14 03:32:29

Answer 2  0  2012-04-14 03:34:11

Answer 3  0  2012-04-14 13:19:22
