
Comparing each row of one dataframe with a row in another dataframe using R

I'm relatively new to R and have searched for an answer to my problem without finding one. I want to compare two data frames.

library(dplyr)
library(gtools)

v1 <- LETTERS[1:10]

combinations_from_4_letters <- as.data.frame(combinations(n = 10, r = 4, v = v1),
                                             stringsAsFactors = FALSE)
# 210 combinations binned into 15 groups of 14 rows each
combinations_from_4_letters$group <- rep(1:15, each = 14)
combinations_from_2_letters <- as.data.frame(combinations(n = 10, r = 2, v = v1),
                                             stringsAsFactors = FALSE)

The data frame 'combinations_from_4_letters' contains every 4-letter combination that can be made from 10 letters (no repeated letters, order ignored). The combinations are binned into groups 1 to 15. I want to find out how often each pair of the 10 letters (stored in the data frame 'combinations_from_2_letters') occurs in each group, which is basically a frequency table. I started writing a complicated loop over both data frames, but I think there must be a more 'R' solution, similar to comparing a data frame and a vector like:

combinations_from_4_letters %in% combinations_from_2_letters[i, ]

Thank you in advance for your help!

I recommend an approach like the following:

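# add a dummy column to both data frames for a complete cross-join
combinations_from_4_letters <- combinations_from_4_letters %>%
  mutate(ones = 1)
combinations_from_2_letters <- combinations_from_2_letters %>%
  mutate(ones = 1)

joined <- combinations_from_2_letters %>%
  inner_join(combinations_from_4_letters, by = "ones") %>%
  # comparison goes here; comb2 and comb4 are placeholders for your columns,
  # and the real test has to compare row by row (%in% checks the whole column)
  mutate(within = ifelse(comb2 %in% comb4, 1, 0)) %>%
  # group is included so the counts are per pair within each group
  group_by(comb2, group) %>%
  summarise(freq = sum(within))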

You will need to adapt it to your actual column names (after the join, the shared names V1 and V2 carry the suffixes .x and .y) and to your comparison condition.

Key ideas:

  • add a filler column so the join is a complete cross-join
  • mutate a new indicator column for whether the two-letter pair is within the four-letter combination
  • sum the indicators for each two-letter pair
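
As a concrete illustration of the key ideas above, here is a minimal sketch under a few assumptions of mine. The data is rebuilt with base combn() rather than gtools::combinations() so that the snippet runs on its own, the columns are renamed to the hypothetical names p1, p2 and q1 to q4 so nothing clashes after the join, and base merge(..., by = NULL) stands in for the dummy-column join because it produces the same Cartesian product. It is one possible way to write the comparison, not necessarily what was intended above.

```r
library(dplyr)

v1 <- LETTERS[1:10]

# Rebuild the question's data (assumption: combn() gives the same rows,
# in the same lexicographic order, as gtools::combinations() here)
quads <- as.data.frame(t(combn(v1, 4)), stringsAsFactors = FALSE)
names(quads) <- c("q1", "q2", "q3", "q4")   # hypothetical column names
quads$group <- rep(1:15, each = 14)

pairs <- as.data.frame(t(combn(v1, 2)), stringsAsFactors = FALSE)
names(pairs) <- c("p1", "p2")               # hypothetical column names

# Cartesian product: every pair against every 4-letter combination
joined <- merge(pairs, quads, by = NULL)

pair_freq <- joined %>%
  # row-wise test: both letters of the pair occur among the four letters
  mutate(within = (p1 == q1 | p1 == q2 | p1 == q3 | p1 == q4) &
                  (p2 == q1 | p2 == q2 | p2 == q3 | p2 == q4)) %>%
  # count matches per pair within each group
  group_by(p1, p2, group) %>%
  summarise(freq = sum(within), .groups = "drop")
```

Each row of pair_freq is one pair in one group, so there are 45 * 15 = 675 rows. As a sanity check, every pair is contained in choose(8, 2) = 28 of the 210 combinations, so its counts summed over the 15 groups should come to 28.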
