
Delete rows from a data frame in R based on column values in another data frame

I have two data frames as follows:

df1 <- data.frame(fruit=c("apple", "blackberry", "orange", "pear", "grape"), 
color=c("black", "purple", "blue", "green", "red"), 
quantity1=c(1120, 7600, 21409, 120498, 25345), 
quantity2=c(1200, 7898, 21500, 140985, 27098), 
taste=c("sweet", "bitter", "sour", "salty", "spicy"))

df2 <- data.frame(fruit=c("apple", "orange", "pear"), 
color=c("black", "yellow", "green"), 
quantity=c(1145, 65094, 120500))

I would like to delete rows in df1 based on rows in df2. A row is deleted only if it matches all 3 conditions:

  1. The fruit name must match
  2. The color must match
  3. The quantity in df2 must lie between the two quantities in df1 (quantity1 and quantity2)
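Read literally, the three conditions can be checked row by row in base R. The following is only a minimal sketch, not a recommended solution: the helper name `matches_df2` and the result name `kept` are made up for illustration, and it assumes "in between" includes the endpoints (swap `>=`/`<=` for `>`/`<` if the bounds should be exclusive).

```r
df1 <- data.frame(
  fruit = c("apple", "blackberry", "orange", "pear", "grape"),
  color = c("black", "purple", "blue", "green", "red"),
  quantity1 = c(1120, 7600, 21409, 120498, 25345),
  quantity2 = c(1200, 7898, 21500, 140985, 27098),
  taste = c("sweet", "bitter", "sour", "salty", "spicy"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  fruit = c("apple", "orange", "pear"),
  color = c("black", "yellow", "green"),
  quantity = c(1145, 65094, 120500),
  stringsAsFactors = FALSE
)

# For every row of df1, test whether at least one row of df2
# satisfies all three conditions at once.
# Assumption: the quantity bounds are inclusive.
matches_df2 <- vapply(seq_len(nrow(df1)), function(i) {
  any(df2$fruit == df1$fruit[i] &
        df2$color == df1$color[i] &
        df2$quantity >= df1$quantity1[i] &
        df2$quantity <= df1$quantity2[i])
}, logical(1))

# Keep only the rows of df1 without a match in df2.
kept <- df1[!matches_df2, ]
rownames(kept) <- NULL
```

With the example data, apple and pear satisfy all three conditions and are dropped; orange stays because its color differs between the two data frames.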

The output for my example should look like:

df3 <- data.frame(fruit=c("blackberry", "orange", "grape"), 
color=c("purple", "blue", "red"), 
quantity1=c(7600, 21409, 25345), 
quantity2=c(7898, 21500, 27098), 
taste=c("bitter", "sour", "spicy"))

I wonder if tidyverse could also be used:

library(tidyverse)
df1 %>%
  left_join(df2, by = c("fruit", "color")) %>%
  filter(is.na(quantity) | quantity <= quantity1 | quantity >= quantity2)
  
#>        fruit  color quantity1 quantity2  taste quantity
#> 1 blackberry purple      7600      7898 bitter       NA
#> 2     orange   blue     21409     21500   sour       NA
#> 3      grape    red     25345     27098  spicy       NA
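The left join above leaves the helper column `quantity` in the result. As a possible alternative, and assuming dplyr 1.1.0 or later (where `join_by()` accepts inequality conditions), the same idea might be written as an anti join, which returns only the columns of df1. This is a sketch under that version assumption, with inclusive bounds chosen for illustration; the result name `kept` is made up.

```r
library(dplyr)

df1 <- data.frame(
  fruit = c("apple", "blackberry", "orange", "pear", "grape"),
  color = c("black", "purple", "blue", "green", "red"),
  quantity1 = c(1120, 7600, 21409, 120498, 25345),
  quantity2 = c(1200, 7898, 21500, 140985, 27098),
  taste = c("sweet", "bitter", "sour", "salty", "spicy"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  fruit = c("apple", "orange", "pear"),
  color = c("black", "yellow", "green"),
  quantity = c(1145, 65094, 120500),
  stringsAsFactors = FALSE
)

# anti_join() keeps the rows of df1 that have no matching row in df2.
# In join_by(), the left-hand side of each condition refers to df1
# and the right-hand side to df2.
# Assumption: requires dplyr >= 1.1.0; bounds are inclusive.
kept <- anti_join(
  df1, df2,
  by = join_by(fruit, color, quantity1 <= quantity, quantity2 >= quantity)
)
```

Unlike the left join, an anti join cannot duplicate rows of df1 when df2 contains several matching rows.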

With data.table, we can use a non-equi join:

library(data.table)
setDT(df1)[!df2, on = .(fruit, color, quantity1 <= quantity,
           quantity2 >= quantity)]
#        fruit  color quantity1 quantity2  taste
#1: blackberry purple      7600      7898 bitter
#2:     orange   blue     21409     21500   sour
#3:      grape    red     25345     27098  spicy
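Note that `setDT()` converts df1 to a data.table by reference, so df1 itself changes class. If the original data frame should stay untouched, one option is to convert copies with `as.data.table()` first. This is a sketch of that variant; the names `dt1`, `dt2` and `dt3` are chosen here for illustration.

```r
library(data.table)

df1 <- data.frame(
  fruit = c("apple", "blackberry", "orange", "pear", "grape"),
  color = c("black", "purple", "blue", "green", "red"),
  quantity1 = c(1120, 7600, 21409, 120498, 25345),
  quantity2 = c(1200, 7898, 21500, 140985, 27098),
  taste = c("sweet", "bitter", "sour", "salty", "spicy"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  fruit = c("apple", "orange", "pear"),
  color = c("black", "yellow", "green"),
  quantity = c(1145, 65094, 120500),
  stringsAsFactors = FALSE
)

# as.data.table() returns new data.table objects; df1 and df2 keep
# their data.frame class.
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Non-equi anti join: drop the rows of dt1 that match a row of dt2
# on fruit, color, and quantity1 <= quantity <= quantity2.
dt3 <- dt1[!dt2, on = .(fruit, color, quantity1 <= quantity,
                        quantity2 >= quantity)]
```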

Or use the same methodology with fuzzy_anti_join, as shown in this post.

You can use fuzzy_anti_join from the fuzzyjoin package:

fuzzyjoin::fuzzy_anti_join(df1, df2, 
     by = c('fruit', 'color','quantity1' = 'quantity', 'quantity2' = 'quantity'), 
     match_fun = list(`==`, `==`, `<=`, `>=`))

# A tibble: 3 x 5
#  fruit      color  quantity1 quantity2 taste 
#  <chr>      <chr>      <dbl>     <dbl> <chr> 
#1 blackberry purple      7600      7898 bitter
#2 orange     blue       21409     21500 sour  
#3 grape      red        25345     27098 spicy 


