
Create yes/no column based on values in two other columns

I have a dataset that looks like this:

df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA")),
                class = "data.frame", row.names = c(NA, -10L))

I want to create a new column called EuropeYN that is either "yes" or "no" depending on whether EITHER of the region columns (Region1 or Region2) contains "Europe". The final data should look like this:

df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA"),
                     EuropeYN = c("yes", "yes", "no", "no", "yes", "yes",
                                  "no", "no", "yes", "no")),
                class = "data.frame", row.names = c(NA, -10L))

I know how to do this when checking whether "Europe" appears in a single column, but I have no idea how to do it across multiple columns. This is what I would do for just one column:

df$EuropeYN <- ifelse(grepl("Europe", df$Region1), "yes", "no")
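Extending that single-column call, one straightforward sketch is to run grepl() once per column and combine the two logical vectors with | (element-wise OR). Note that grepl() does a pattern match, so a value such as "Eastern Europe" would also count as a hit; whether that is wanted is an assumption about your data.

```r
# Example data from the question ("NA" is stored as a literal string here)
df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA")),
                class = "data.frame", row.names = c(NA, -10L))

# One grepl() per column, then combine the results row by row with OR
in_europe <- grepl("Europe", df$Region1) | grepl("Europe", df$Region2)

# Turn the logical vector into the "yes"/"no" labels the question asks for
df$EuropeYN <- ifelse(in_europe, "yes", "no")
```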

Any ideas on the best way to approach this?

A little late, but maybe still worth a look:

library(dplyr)
library(stringr)
df %>%
  rowwise() %>%
  mutate(YN = +any(str_detect(c_across(Region1:Region2), 'Europe')))
# A tibble: 10 x 4
# Rowwise: 
      ID Region1       Region2    YN
   <int> <chr>         <chr>   <int>
 1     1 Europe        NA          1
 2     2 NA            Europe      1
 3     3 Asia          NA          0
 4     4 NA            NA          0
 5     5 Europe        NA          1
 6     6 NA            Europe      1
 7     7 Africa        NA          0
 8     8 NA            NA          0
 9     9 Europe        NA          1
10    10 North America NA          0

Or, without the +, which keeps YN as a logical column instead of coercing it to 0/1 integers:

df %>%
   rowwise() %>%
   mutate(YN = any(str_detect(c_across(Region1:Region2), 'Europe')))
# A tibble: 10 x 4
# Rowwise: 
      ID Region1       Region2 YN   
   <int> <chr>         <chr>   <lgl>
 1     1 Europe        NA      TRUE 
 2     2 NA            Europe  TRUE 
 3     3 Asia          NA      FALSE
 4     4 NA            NA      FALSE
 5     5 Europe        NA      TRUE 
 6     6 NA            Europe  TRUE 
 7     7 Africa        NA      FALSE
 8     8 NA            NA      FALSE
 9     9 Europe        NA      TRUE 
10    10 North America NA      FALSE

If you have several columns to check, you can select them with starts_with() (or contains() or ends_with()) instead of writing out a column range:

df %>%
  rowwise() %>%
  mutate(YN = any(str_detect(c_across(starts_with('R')), 'Europe'))) 
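Since the question asks for "yes"/"no" labels rather than TRUE/FALSE, the any() result can be wrapped in dplyr's if_else(). A minimal sketch of the same rowwise approach, adding ungroup() so that later verbs work on whole columns again (the column name EuropeYN is taken from the question):

```r
library(dplyr)
library(stringr)

# Example data from the question ("NA" is stored as a literal string here)
df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA")),
                class = "data.frame", row.names = c(NA, -10L))

out <- df %>%
  rowwise() %>%
  # any() collapses the per-row vector of matches into a single TRUE/FALSE
  mutate(EuropeYN = if_else(any(str_detect(c_across(Region1:Region2), "Europe")),
                            "yes", "no")) %>%
  # drop the rowwise grouping
  ungroup()
```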

My approach is very similar to yours:

dplyr::mutate(df, EuropeYN = ifelse((Region1 == "Europe" | Region2 == "Europe"), "yes", "no"))
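The example data store missing regions as the string "NA". If your real data contains actual NA values instead (an assumption; the question does not say), comparisons with == return NA, and because NA | FALSE is NA, ifelse() yields NA for those rows. %in% never returns NA, so it is a safer drop-in. A small sketch with a hypothetical data frame df_na:

```r
# Hypothetical data with real NA values instead of the string "NA"
df_na <- data.frame(
  ID = 1:4,
  Region1 = c("Europe", NA, "Asia", NA),
  Region2 = c(NA, "Europe", NA, NA)
)

# == propagates NA: rows 3 and 4 evaluate to NA | NA or FALSE | NA, i.e. NA
with_eq <- ifelse(df_na$Region1 == "Europe" | df_na$Region2 == "Europe", "yes", "no")

# %in% treats NA as "not Europe", so every row gets "yes" or "no"
with_in <- ifelse(df_na$Region1 %in% "Europe" | df_na$Region2 %in% "Europe", "yes", "no")
```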

Two ways:

  1. Literally check each of the two columns:

     ifelse(df$Region1 == "Europe" | df$Region2 == "Europe", "yes", "no")
     # [1] "yes" "yes" "no"  "no"  "yes" "yes" "no"  "no"  "yes" "no"

    This is arguably easier to read and makes the logic explicit.

  2. Select a range of columns and compare them all to "Europe" at once:

     subset(df, select = Region1:Region2) == "Europe"
     #    Region1 Region2
     # 1     TRUE   FALSE
     # 2    FALSE    TRUE
     # 3    FALSE   FALSE
     # 4    FALSE   FALSE
     # 5     TRUE   FALSE
     # 6    FALSE    TRUE
     # 7    FALSE   FALSE
     # 8    FALSE   FALSE
     # 9     TRUE   FALSE
     # 10   FALSE   FALSE

     apply(subset(df, select = Region1:Region2) == "Europe", 1, any)
     #     1     2     3     4     5     6     7     8     9    10
     #  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE

    This works with any number of columns, not just two.

Either result can be assigned back into the frame with df$EuropeYN <- ... . The second one returns TRUE/FALSE, so wrap it in ifelse(..., "yes", "no") if you want the labels.
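Putting option 2 together with the assignment, a minimal sketch that also converts the logical result into the "yes"/"no" labels. apply() returns a vector named after the row names, which ifelse() carries over; that does not affect the values.

```r
# Example data from the question ("NA" is stored as a literal string here)
df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA")),
                class = "data.frame", row.names = c(NA, -10L))

# Logical matrix: one column per Region column, TRUE where the cell is "Europe"
is_europe <- subset(df, select = Region1:Region2) == "Europe"

# any() over each row (MARGIN = 1) gives one TRUE/FALSE per row
df$EuropeYN <- ifelse(apply(is_europe, 1, any), "yes", "no")
```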

Here is a vectorized base R way:

i <- rowSums(df[grep("Region", names(df))] == "Europe") > 0
df$EuropeYN <- c("no", "yes")[i + 1L]
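The indexing trick in the second line works because adding 1L to a logical vector turns FALSE into 1 and TRUE into 2, which are then used as positions in c("no", "yes"). The same steps, broken out with intermediate names chosen for illustration:

```r
# Example data from the question ("NA" is stored as a literal string here)
df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA")),
                class = "data.frame", row.names = c(NA, -10L))

# Logical matrix for every column whose name contains "Region"
hits <- df[grep("Region", names(df))] == "Europe"

# Number of Region columns equal to "Europe" in each row
counts <- rowSums(hits)

# At least one hit -> TRUE
i <- counts > 0

# FALSE + 1L = 1 ("no"), TRUE + 1L = 2 ("yes")
labels <- c("no", "yes")[i + 1L]
```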

We can use if_any() here as a vectorized option in the tidyverse:

library(dplyr)
library(stringr)
df %>%
     mutate(YN = if_any(starts_with("Region"), ~ str_detect(.x, "Europe")))
   ID       Region1 Region2    YN
1   1        Europe      NA  TRUE
2   2            NA  Europe  TRUE
3   3          Asia      NA FALSE
4   4            NA      NA FALSE
5   5        Europe      NA  TRUE
6   6            NA  Europe  TRUE
7   7        Africa      NA FALSE
8   8            NA      NA FALSE
9   9        Europe      NA  TRUE
10 10 North America      NA FALSE
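if_any() returns one TRUE/FALSE per row, so wrapping it in if_else() gives the "yes"/"no" labels from the question. A sketch using the lambda form ~ str_detect(.x, "Europe"), which avoids passing extra arguments through if_any()'s ... :

```r
library(dplyr)
library(stringr)

# Example data from the question ("NA" is stored as a literal string here)
df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA")),
                class = "data.frame", row.names = c(NA, -10L))

out <- df %>%
  # TRUE if any column starting with "Region" contains "Europe"
  mutate(EuropeYN = if_else(if_any(starts_with("Region"), ~ str_detect(.x, "Europe")),
                            "yes", "no"))
```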

Or in base R:

df$YN <-  Reduce(`|`, lapply(df[startsWith(names(df), 'Region')], 
        `%in%`, 'Europe'))
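To see what the Reduce() call does, it helps to look at the intermediate list: lapply() produces one logical vector per Region column, and Reduce(`|`, ...) ORs them together pairwise. With two columns that is the same as writing the | by hand, as this sketch checks:

```r
# Example data from the question ("NA" is stored as a literal string here)
df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA")),
                class = "data.frame", row.names = c(NA, -10L))

# Columns whose names start with "Region"
cols <- df[startsWith(names(df), "Region")]

# One logical vector per column: col %in% "Europe"
per_col <- lapply(cols, `%in%`, "Europe")

# Combine pairwise: per_col[[1]] | per_col[[2]] | ...
yn <- Reduce(`|`, per_col)

# Same result as combining the two columns explicitly
same <- identical(yn, per_col$Region1 | per_col$Region2)
```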

NOTE: It is easier to subset with a logical flag than with "Yes"/"No".
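For example, a logical YN column can index rows directly, while the "yes"/"no" version needs an explicit comparison to get the same rows. A minimal sketch:

```r
# Example data from the question ("NA" is stored as a literal string here)
df <- structure(list(ID = 1:10,
                     Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
                                 "Africa", "NA", "Europe", "North America"),
                     Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
                                 "NA", "NA", "NA", "NA")),
                class = "data.frame", row.names = c(NA, -10L))

# Logical flag and the equivalent text labels
df$YN <- df$Region1 %in% "Europe" | df$Region2 %in% "Europe"
df$EuropeYN <- ifelse(df$YN, "yes", "no")

# The logical flag subsets directly
europe_rows <- df[df$YN, ]
other_rows <- df[!df$YN, ]

# The text version needs a comparison to select the same rows
europe_rows2 <- df[df$EuropeYN == "yes", ]
```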
