Populate a column in a data frame based on a column in another data frame in R

Question

I have a data frame of comments that looks like this (df1):

Comments
Apple laptops are really good for work,we should buy them
Apple Iphones are too costly,we can resort to some other brands
Google search is the best search engine 
Android phones are great these days
I lost my visa card today

I have another data frame of merchant names that looks like this (df2):

Merchant_Name
Google
Android
Geoni
Visa
Apple
MC
WallMart

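If a Merchant_Name from df2 appears in a comment in df1, I want to append that merchant name to a second column in df1. The match does not have to be exact; approximate matching is required. Also, df1 contains about 500K rows! My final output df might look like this: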

Comments                                                        Merchant
Apple laptops are really good for work,we should buy them       Apple
Apple Iphones are too costly,we can resort to some other brands Apple
Google search is the best search engine                         Google
Android phones are great these days                             Android
I lost my visa card today                                       Visa

How can I do this efficiently in R? Thanks.

Answer 1

This is a job for regex. Check out the grepl command inside lapply.

comments = c(
   'Apple laptops are really good for work,we should buy them',
   'Apple Iphones are too costly,we can resort to some other brands',
   'Google search is the best search engine ',
   'Android phones are great these days',
   'I lost my visa card today'
)

brands = c(
   'Google',
   'Android',
   'Geoni',
   'Visa',
   'Apple',
   'MC',
   'WallMart'
)

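# For each brand, collect the comments that mention it (case-insensitive)
brandinpattern = lapply(
   brands,
   function(brand) {
      # TRUE for every comment that contains the brand name
      commentswithbrand = grepl(x = tolower(comments), pattern = tolower(brand))
      if ( sum(commentswithbrand) > 0) {
         data.frame(
            comment = comments[commentswithbrand],
            brand = brand
         )
      } else {
         # no comment mentions this brand
         data.frame()
      }
   }
)

# Stack the per-brand data frames into a single data frame
brandinpattern = do.call(rbind, brandinpattern)


> brandinpattern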
                                                          comment   brand
1                        Google search is the best search engine   Google
2                             Android phones are great these days Android
3                                       I lost my visa card today    Visa
4       Apple laptops are really good for work,we should buy them   Apple
5 Apple Iphones are too costly,we can resort to some other brands   Apple
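
With around 500K comments, it may be quicker to let a single regular expression do the work instead of looping over brands. What follows is only a minimal sketch, not the method used above: it assumes the merchant names contain no regex metacharacters, that only the first merchant found in each comment is wanted, and that the small df1 and df2 below (rebuilt from the question) stand in for the real data.

```r
# Sample data rebuilt from the question; the real df1 has ~500K rows
df1 <- data.frame(
  Comments = c(
    "Apple laptops are really good for work,we should buy them",
    "Apple Iphones are too costly,we can resort to some other brands",
    "Google search is the best search engine ",
    "Android phones are great these days",
    "I lost my visa card today"
  ),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Merchant_Name = c("Google", "Android", "Geoni", "Visa", "Apple", "MC", "WallMart"),
  stringsAsFactors = FALSE
)

# One alternation pattern: "Google|Android|Geoni|..."
# Assumption: names contain no regex metacharacters
pattern <- paste(df2$Merchant_Name, collapse = "|")

# regexpr() is vectorised over comments; -1 marks "no match"
hits <- regexpr(pattern, df1$Comments, ignore.case = TRUE)

# regmatches() returns the matched text only for comments that matched
found <- regmatches(df1$Comments, hits)

# Map the matched text (e.g. "visa") back to the spelling used in df2
df1$Merchant <- NA_character_
df1$Merchant[hits > 0] <-
  df2$Merchant_Name[match(tolower(found), tolower(df2$Merchant_Name))]

print(df1)
```

Comments that mention no merchant keep NA in the Merchant column, so the row order and row count of df1 stay unchanged.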

Answer 2

Try this:

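# Empty result with the two output columns
final_df <- data.frame(Comments = character(), Merchant_Name = character(), stringsAsFactors = FALSE)

for(i in df1$Comments){
    for(j in df2$Merchant_Name){ 
        # case-insensitive check: does comment i mention merchant j?
        if(grepl(tolower(j),tolower(i))){ 
            # append one row, then stop at the first merchant found
            final_df[nrow(final_df) + 1,] <- c(i, j)
            break
        }
    }
}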


final_df

##                                                         Comments Merchant_Name
##1       Apple laptops are really good for work,we should buy them         Apple
##2 Apple Iphones are too costly,we can resort to some other brands         Apple
##3                        Google search is the best search engine         Google
##4                             Android phones are great these days       Android
##5                                       I lost my visa card today          Visa
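
Neither answer covers the approximate matching the question asks for. As a hedged sketch, base R's agrepl does fuzzy substring matching by edit distance. The misspelled comments and the two-merchant list below are made up for illustration, and the default max.distance of 0.1 is an assumption that would need tuning on real data. Very short names such as MC are likely to give false positives under fuzzy matching, so they are left out here.

```r
# Hypothetical comments with misspelled merchant names
comments <- c(
  "I shopped at Walmart yesterday",
  "Gogle search is fine",
  "I lost my visa card today"
)
# Hypothetical merchant list (long names only)
merchants <- c("WallMart", "Google")

# Logical matrix: one row per comment, one column per merchant.
# agrepl() allows a small edit distance between pattern and text.
fuzzy_hits <- sapply(
  merchants,
  function(m) agrepl(m, comments, ignore.case = TRUE)
)

# First merchant that fuzzily matches each comment, NA when none does
first_hit <- apply(
  fuzzy_hits, 1,
  function(row) if (any(row)) merchants[which(row)[1]] else NA_character_
)

print(data.frame(Comments = comments, Merchant = first_hit))
```

This runs one vectorised agrepl call per merchant, so the cost grows with the number of merchants rather than with an explicit loop over every comment.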

Answer 1: score 5, accepted, 2015-11-13 08:43:37
Answer 2: score 0, 2015-11-13 09:37:49
