In each data frame row, match each column to a key in another data frame, and sum the values of the key in a new data frame

Question

I'm not quite sure how to word my question. I think that what I want to do is create a loop that takes each value in a data frame row, matches it to a key in another data frame, and sums the key values in each column of that row, storing the result in a new data frame with the same dimensions as the key.

It should be much easier to explain using an example. I'm a complete novice to R and programming and am still learning the vocabulary.

I have a dataframe of words where each column corresponds to a phoneme (unique speech sound).

Words_DF <- data.frame( word = c("CAT", "BAT", "APPLE"), Phoneme1 = c("K", "B", "AE"), Phoneme2 = c("AE", "AE", "P"), Phoneme3 = c("T", "T", "AH"), Phoneme4 = c("Null", "Null", "L"))

    word Phoneme1 Phoneme2 Phoneme3 Phoneme4
 1   CAT        K       AE        T       Null
 2   BAT        B       AE        T       Null
 3 APPLE       AE        P       AH        L

I have another data frame where each phoneme corresponds to a series of binary values.

Phoneme_DF <- data.frame( phoneme = c("AE", "AH", "B", "K", "T", "P", "L"), is_consonant = c(0, 0, 1, 1, 1, 1, 1), is_labial = c(0, 0, 1, 0, 0, 1, 0))


   phoneme is_consonant is_labial
1      AE            0         0
2      AH            0         0
3       B            1         1
4       K            1         0
5       T            1         0
6       P            1         1
7       L            1         0

I'm trying to figure out a way to go through each row of my Words_DF, look up the value in each phoneme column in my Phoneme_DF, and sum the results in a new data frame that looks like this:

New_DF <- data.frame( word = c("CAT", "BAT", "APPLE"), consonants_in_word = c(2, 2, 3), labials_in_word = c(0, 1, 1))

    word consonants_in_word labials_in_word
1   CAT                  2               0
2   BAT                  2               1
3 APPLE                  2               1

I have tried writing some kind of loop that goes through each row of Words_DF and, within each row, goes through each column, looks up that value in the Phoneme_DF, and then sums the results:

New_DF <- data.frame( word = c("CAT", "BAT", "APPLE"), consonants_in_word = c(0, 0, 0), labials_in_word = c(0, 0, 0))

for(i in 1:length(SAMPLE_Words)){
  for(j in 1:length(where(SAMPE_Words[[j]]) %in% SAMPLE_Phoneme_DF[i])) {
    rbind(New_DF, sum(Phoneme_DF[i, ]))
  }
}

I hope my question made sense. Thanks for your help! :)

Answer 1

I think your desired output is off: APPLE (AE, P, AH, L) has only two consonants, P and L, not 3. Try this:

library(tidyverse)

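Words_DF %>%
  # reshape to long format; note that the new column named "key" holds the phonemes
  gather(value, key, -word) %>%
  # "Null" has no match in Phoneme_DF, so its feature columns come back as NA
  left_join(Phoneme_DF, by = c("key" = "phoneme")) %>%
  group_by(word) %>%
  mutate(consonants_in_word = sum(is_consonant, na.rm = TRUE),
         labials_in_word = sum(is_labial, na.rm = TRUE)) %>%
  # keep one row per word
  distinct(word, .keep_all = TRUE) %>%
  select(word, consonants_in_word, labials_in_word)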

Which returns:

# A tibble: 3 x 3
# Groups:   word [3]
   word consonants_in_word labials_in_word
  <chr>              <int>           <int>
1   CAT                  2               0
2   BAT                  2               1
3 APPLE                  2               1

And this is the data I used:

Words_DF <- read.table(text = "word Phoneme1 Phoneme2 Phoneme3 Phoneme4
                               1   CAT        K       AE        T       Null
                               2   BAT        B       AE        T       Null
                               3 APPLE       AE        P       AH        L",
                       stringsAsFactors = FALSE, header = TRUE)

Phoneme_DF <- read.table(text = "phoneme is_consonant is_labial
                                 1      AE            0         0
                                 2      AH            0         0
                                 3       B            1         1
                                 4       K            1         0
                                 5       T            1         0
                                 6       P            1         1
                                 7       L            1         0",
                         stringsAsFactors = FALSE, header = TRUE)
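
As a side note, gather() has since been superseded by pivot_longer() in tidyr (1.0.0 and later), and the mutate() + distinct() pair can be collapsed into a single summarise(). Below is a minimal sketch of the same pipeline under those assumptions. The long-format column names position and phoneme are my own illustrative choices, and the data assumes B is marked as labial, as in the question's printed table.

```r
library(dplyr)
library(tidyr)

# Same data as in the question's printed tables (B is labial).
Words_DF <- data.frame(
  word = c("CAT", "BAT", "APPLE"),
  Phoneme1 = c("K", "B", "AE"),
  Phoneme2 = c("AE", "AE", "P"),
  Phoneme3 = c("T", "T", "AH"),
  Phoneme4 = c("Null", "Null", "L"),
  stringsAsFactors = FALSE
)
Phoneme_DF <- data.frame(
  phoneme = c("AE", "AH", "B", "K", "T", "P", "L"),
  is_consonant = c(0, 0, 1, 1, 1, 1, 1),
  is_labial = c(0, 0, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

New_DF <- Words_DF %>%
  # One row per (word, phoneme slot); "position" and "phoneme" are illustrative names.
  pivot_longer(-word, names_to = "position", values_to = "phoneme") %>%
  # "Null" has no entry in Phoneme_DF, so its features come back as NA.
  left_join(Phoneme_DF, by = "phoneme") %>%
  group_by(word) %>%
  # Collapse to one row per word, ignoring the NAs from unmatched phonemes.
  summarise(
    consonants_in_word = sum(is_consonant, na.rm = TRUE),
    labials_in_word = sum(is_labial, na.rm = TRUE)
  )
```

One difference from the pipeline above: summarise() returns the groups in sorted order (APPLE, BAT, CAT) rather than in the original row order.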

Answer 2

I have the data.table counterpart, for anyone interested:

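library(data.table)
setDT(Words_DF); setDT(Phoneme_DF)  # convert both data frames to data.tables by reference

Phoneme_DF[melt(Words_DF, id.vars = "word", value.name = "phoneme"), on = "phoneme"][
  , lapply(.SD, sum, na.rm = TRUE),
  .SDcols = c("is_consonant", "is_labial"), by = word]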

gives

    word is_consonant is_labial
1:   CAT            2         0
2:   BAT            2         1
3: APPLE            2         1

The procedure is similar to what tyluRp proposed: reshape Words_DF to long format, join it with Phoneme_DF, and then sum is_consonant and is_labial by word.
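
For completeness, the same lookup-and-sum can probably also be done in base R without reshaping. This is only a sketch: it assumes the phoneme columns are character vectors, that unmatched entries such as "Null" should count as zero, and that B is labial as in the question's printed table. match() finds each phoneme's row in Phoneme_DF, and rowSums() adds up the looked-up values per word. The helper name count_feature is hypothetical.

```r
# Same data as in the question's printed tables (B is labial).
Words_DF <- data.frame(
  word = c("CAT", "BAT", "APPLE"),
  Phoneme1 = c("K", "B", "AE"),
  Phoneme2 = c("AE", "AE", "P"),
  Phoneme3 = c("T", "T", "AH"),
  Phoneme4 = c("Null", "Null", "L"),
  stringsAsFactors = FALSE
)
Phoneme_DF <- data.frame(
  phoneme = c("AE", "AH", "B", "K", "T", "P", "L"),
  is_consonant = c(0, 0, 1, 1, 1, 1, 1),
  is_labial = c(0, 0, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Character matrix of phonemes: one row per word, one column per phoneme slot.
phonemes <- as.matrix(Words_DF[, -1])

# Row index into Phoneme_DF for every cell; NA where there is no match ("Null").
idx <- match(phonemes, Phoneme_DF$phoneme)

# Hypothetical helper: look up one feature column and sum it per word.
count_feature <- function(feature) {
  values <- matrix(Phoneme_DF[[feature]][idx], nrow = nrow(phonemes))
  rowSums(values, na.rm = TRUE)
}

New_DF <- data.frame(
  word = Words_DF$word,
  consonants_in_word = count_feature("is_consonant"),
  labials_in_word = count_feature("is_labial")
)
```

Unlike the grouped approaches, this keeps the words in their original row order (CAT, BAT, APPLE), since nothing is reshaped or regrouped.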

solution1
3 ACCEPTED 2017-11-20 01:27:05

solution2
1 2017-11-20 09:47:42
