简体   繁体   中英

Replacing words in Dataframe by values in another Dataframe in R

I'm trying to replace string values in a Dataframe by numeric values in another one . Let me create an example:

df1 <- data.frame("Time" = seq(1, 3, 1),
              "Words" = c("dog cat", "bird fish", "elephant shark turtle"))

This is my main Dataframe. It's pretty simple: The first column refers to the Time spent and the second column shows every word registered on that specific time. The other Dataframe is:

df2 <- data.frame("Words" = c("turtle", "bird", "dog", "elephant", "cat", "shark", "fish"),
              "Points" = seq(1, 7, 1))

This Dataframe refers to the Points that each Word accomplishes. My goal is to calculate the median of points on each time period. For instance, when Time = 1, "dog" equals 3 and "cat" equals 5, so the median will be 4 .

I'm having a lot of trouble with this. The first thing is how to separate the words within a string . Then, I think that I have to create a loop to search in df2 the values that appears in df1 and substitute them by their values . Finally, I have to reach the median of each Time period to finish the task.

Any ideas?

Using tidy approach

library(tidyverse)
df1 %>%
  mutate(Words = strsplit(df1$Words, " ")) %>% 
  unnest(Words) %>% 
  inner_join(df2, by="Words") %>% 
  group_by(Time) %>% 
  summarize(Score = median(Points))

Using data.table

library(data.table)
setDT(df1)[,.(Words=strsplit(Words, " ")[[1]]),by=Time] %>% 
  .[setDT(df2), on=.(Words)] %>% 
  .[,(Score = median(Points)), by=Time]

Output:

   Time Score
  <dbl> <dbl>
1     1   4  
2     2   4.5
3     3   4  

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM