Filtering Rows by Matching Column Value to Other Column Value in R

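I'm very new to R, so this may be easier than I expect and I may be overthinking it. Say I have a data.frame (df) and I want to select the rows that match a criterion in another column, but, most importantly, the criterion needs to be unique to each group. For example: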

Column1    Column2    Column3 
Name1      Some Val   Criteria1
Name1      Unwanted   Also Unwanted
Name2      Some Val2  Criteria2
Name2      Unwanted   Also Unwanted

This may be confusing. Basically, I want to select each Some Val according to the criterion that matches each name, so the result I want is:

Column1    Column2    Column3
Name1      Some Val   Criteria1
Name2      Some Val2  Criteria2

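The problem is that this is easy to do when only a few names are involved. But I have thousands of them, which would mean writing out thousands of names and thousands of different criteria.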

Using dplyr you can do

library(dplyr)
library(stringr)  # str_detect() comes from stringr, not dplyr
df %>%
    group_by(Column1) %>%
    filter(str_detect(Column2, "Some Val"))
## A tibble: 2 x 3
## Groups:   Column1 [2]
#  Column1 Column2   Column3
#  <fct>   <fct>     <fct>
#1 Name1   Some Val  Criteria1
#2 Name2   Some Val2 Criteria2

Sample data

df <- read.table(text =
    "Column1    Column2    Column3
Name1      'Some Val'   Criteria1
Name1      Unwanted   'Also Unwanted'
Name2      'Some Val2'  Criteria2
Name2      Unwanted   'Also Unwanted'", header = TRUE)
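
For comparison, the same substring filter can be sketched in base R, without dplyr or stringr. This is only an illustration built on the sample data above; the `stringsAsFactors = FALSE` argument and the name `wanted` are assumptions added here so the columns are plain character vectors and the result can be inspected.

```r
# Sample data, re-created so this block runs on its own
df <- read.table(text =
    "Column1    Column2    Column3
Name1      'Some Val'   Criteria1
Name1      Unwanted   'Also Unwanted'
Name2      'Some Val2'  Criteria2
Name2      Unwanted   'Also Unwanted'", header = TRUE, stringsAsFactors = FALSE)

# Keep the rows whose Column2 contains the literal text "Some Val"
# (fixed = TRUE treats the pattern as plain text, not a regex)
wanted <- df[grepl("Some Val", df$Column2, fixed = TRUE), ]
wanted
```

Like the dplyr version, this relies on every wanted value sharing one text pattern, so it does not by itself handle a different criterion per group.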

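If you want to select rows from each group based on a group-specific criterion, you need some kind of object that specifies the criterion for each group. You can use a data.frame for this (criteria_by_group in the code below).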

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)

df <- tribble(
  ~group_col, ~value_col, ~criteria_col,
  "Name1", "Some Val", "Criteria1",
  "Name1", "Unwanted", "Not Criteria1",
  "Name2", "Some Val2", "Criteria2", 
  "Name2", "Unwanted", "Not Criteria2"
)

criteria_by_group <- tribble(
  ~group_col, ~group_criteria,
  "Name1", "Criteria1",
  "Name2", "Criteria2"
)

df <- left_join(df, criteria_by_group, by = "group_col")

df
#> # A tibble: 4 x 4
#>   group_col value_col criteria_col  group_criteria
#>   <chr>     <chr>     <chr>         <chr>         
#> 1 Name1     Some Val  Criteria1     Criteria1     
#> 2 Name1     Unwanted  Not Criteria1 Criteria1     
#> 3 Name2     Some Val2 Criteria2     Criteria2     
#> 4 Name2     Unwanted  Not Criteria2 Criteria2

df %>%
  group_by(group_col) %>%
  filter(criteria_col == group_criteria[1])
#> # A tibble: 2 x 4
#> # Groups:   group_col [2]
#>   group_col value_col criteria_col group_criteria
#>   <chr>     <chr>     <chr>        <chr>         
#> 1 Name1     Some Val  Criteria1    Criteria1     
#> 2 Name2     Some Val2 Criteria2    Criteria2

Created on 2019-02-27 by the reprex package (v0.2.1)
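
A possible variation on the lookup-table idea, sketched here under the same example data: dplyr's `semi_join()` can match on the group and the criterion in one step, which avoids adding a `group_criteria` column and grouping before the filter. The name `result` is made up for this illustration.

```r
library(dplyr)
library(tibble)

# Example data, re-created so this block runs on its own
df <- tribble(
  ~group_col, ~value_col, ~criteria_col,
  "Name1", "Some Val", "Criteria1",
  "Name1", "Unwanted", "Not Criteria1",
  "Name2", "Some Val2", "Criteria2",
  "Name2", "Unwanted", "Not Criteria2"
)

# One row per group, holding the criterion that group must match
criteria_by_group <- tribble(
  ~group_col, ~group_criteria,
  "Name1", "Criteria1",
  "Name2", "Criteria2"
)

# semi_join() keeps the rows of df that have a matching row in
# criteria_by_group on both keys, and returns only df's columns
result <- semi_join(
  df, criteria_by_group,
  by = c("group_col" = "group_col", "criteria_col" = "group_criteria")
)
result
```

With thousands of groups the lookup table still has to come from somewhere, but once it exists as a data.frame no names or criteria need to be written into the filtering code itself.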
