R: Filtering and adding a repeating identical value in a column for hundreds of rows in a dataframe
Filtering Rows by Matching Column Value to Other Column Value in R
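I'm very new to R, so this may be easier than I expect and I may be overthinking it. Suppose I have a data.frame (df) and I want to select the rows that match a criterion in another column. Most importantly, the criterion has to be specific to each group. For example: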
Column1  Column2    Column3
Name1    Some Val   Criteria1
Name1    Unwanted   Also Unwanted
Name2    Some Val2  Criteria2
Name2    Unwanted   Also Unwanted
This may be confusing. Basically, I want to select each "Some Val" row based on the criterion that matches its name, so the result should be:
Column1  Column2    Column3
Name1    Some Val   Criteria1
Name2    Some Val2  Criteria2
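The problem is that this is easy to do when only a few names are involved. But I have thousands, which would mean writing out thousands of names and thousands of different criteria.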
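Using dplyr (with str_detect() from stringr), you can do:
library(dplyr)
library(stringr)  # provides str_detect()

# Keep the rows whose Column2 contains the text "Some Val"
df %>%
  group_by(Column1) %>%
  filter(str_detect(Column2, "Some Val"))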
# A tibble: 2 x 3
# Groups:   Column1 [2]
#   Column1 Column2   Column3
#   <fct>   <fct>     <fct>
# 1 Name1   Some Val  Criteria1
# 2 Name2   Some Val2 Criteria2
Sample data:
df <- read.table(text =
"Column1 Column2 Column3
Name1 'Some Val' Criteria1
Name1 Unwanted 'Also Unwanted'
Name2 'Some Val2' Criteria2
Name2 Unwanted 'Also Unwanted'", header = TRUE)
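The pattern test above is applied row by row, so the filter itself does not depend on the grouping. As a minimal sketch, assuming the same sample data rebuilt with data.frame() and using a hypothetical `keep` vector, the same selection can be written in base R with grepl():

```r
# Sample data mirroring the question (rebuilt here so the block is self-contained)
df <- data.frame(
  Column1 = c("Name1", "Name1", "Name2", "Name2"),
  Column2 = c("Some Val", "Unwanted", "Some Val2", "Unwanted"),
  Column3 = c("Criteria1", "Also Unwanted", "Criteria2", "Also Unwanted"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each row whose Column2 contains the text "Some Val";
# fixed = TRUE treats the pattern as literal text rather than a regular expression
keep <- grepl("Some Val", df$Column2, fixed = TRUE)

# Subset the rows flagged TRUE, keeping all columns
result <- df[keep, ]
result
```

Note that this matches on substrings, so "Some Val" also selects "Some Val2". It works only when the wanted values share a common piece of text.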
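If you want to select rows from each group based on a group-specific criterion, you need some kind of object that specifies the criterion for each group. You can use a data.frame for this (criteria_by_group in the code below).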
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tibble)
df <- tribble(
~group_col, ~value_col, ~criteria_col,
"Name1", "Some Val", "Criteria1",
"Name1", "Unwanted", "Not Criteria1",
"Name2", "Some Val2", "Criteria2",
"Name2", "Unwanted", "Not Criteria2"
)
criteria_by_group <- tribble(
~group_col, ~group_criteria,
"Name1", "Criteria1",
"Name2", "Criteria2"
)
df <- left_join(df, criteria_by_group, by = "group_col")
df
#> # A tibble: 4 x 4
#> group_col value_col criteria_col group_criteria
#> <chr> <chr> <chr> <chr>
#> 1 Name1 Some Val Criteria1 Criteria1
#> 2 Name1 Unwanted Not Criteria1 Criteria1
#> 3 Name2 Some Val2 Criteria2 Criteria2
#> 4 Name2 Unwanted Not Criteria2 Criteria2
df %>%
  group_by(group_col) %>%
  filter(criteria_col == group_criteria[1])
#> # A tibble: 2 x 4
#> # Groups: group_col [2]
#> group_col value_col criteria_col group_criteria
#> <chr> <chr> <chr> <chr>
#> 1 Name1 Some Val Criteria1 Criteria1
#> 2 Name2 Some Val2 Criteria2 Criteria2
Created on 2019-02-27 by the reprex package (v0.2.1)
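As a possible alternative to joining and then filtering within groups, the lookup table can be used directly as a filter with dplyr's semi_join(). This is only a sketch under the same assumptions as the example above (one criterion per group, stored in criteria_by_group). The `result` name is introduced here for illustration.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~group_col, ~value_col,  ~criteria_col,
  "Name1",    "Some Val",  "Criteria1",
  "Name1",    "Unwanted",  "Not Criteria1",
  "Name2",    "Some Val2", "Criteria2",
  "Name2",    "Unwanted",  "Not Criteria2"
)

# One row per group, naming the criterion that group's rows must match
criteria_by_group <- tribble(
  ~group_col, ~group_criteria,
  "Name1",    "Criteria1",
  "Name2",    "Criteria2"
)

# semi_join() keeps the rows of df whose (group_col, criteria_col) pair
# appears in criteria_by_group, without adding any columns to df
result <- semi_join(
  df, criteria_by_group,
  by = c("group_col" = "group_col", "criteria_col" = "group_criteria")
)
result
```

With thousands of groups, the names and criteria do not have to be written into the filter itself. They only need to exist as rows in the lookup table, which can come from wherever the criteria are already recorded.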