
Rename/recode variable value in R based on condition using dplyr

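I have a dataset dataExtended that contains the variables CountryOther and n, where n is the number of wines from that particular country. CountryOther is of type character and n is an integer. What I want to do is rename the values in CountryOther to Other whenever n <= 20. I would like to do this with the dplyr package, but I am not sure how, or whether I should use just mutate or mutate_at.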

Since I could not work out how to write the condition above, I tried to do it manually as follows, but without success:

dataExtended$CountryOther <- dataExtended$Country
dataExtended %>% 
  mutate(CountryOther = recode(CountryOther,
                               China = "Other", 
                               Mexico = "Other", 
                               Slovakia = "Other",
                               Bulgaria = "Other", 
                               Canada = "Other", 
                               Croatia = "Other", 
                               Uruguay = "Other", 
                               Georgia = "Other", 
                               Turkey = "Other", 
                               Moldova = "Other", 
                               Slovenia = "Other", 
                               Hungary = "Other", 
                               Switzerland = "Other", 
                               Greece = "Other", 
                               Israel = "Other", 
                               Lebanon= "Other"))

Create a data.frame / tibble from the linked Red.csv, imported with readr::read_csv():

#> data
# A tibble: 8,666 × 8
   Name                               Country     Region       Winery                   Rating NumberOf…¹ Price Year 
   <chr>                              <chr>       <chr>        <chr>                     <dbl>      <dbl> <dbl> <chr>
 1 Pomerol 2011                       France      Pomerol      Château La Providence       4.2        100 95    2011 
 2 Lirac 2017                         France      Lirac        Château Mont-Redon          4.3        100 15.5  2017 
 3 Erta e China Rosso di Toscana 2015 Italy       Toscana      Renzo Masi                  3.9        100  7.45 2015 
 4 Bardolino 2019                     Italy       Bardolino    Cavalchina                  3.5        100  8.72 2019 
 5 Ried Scheibner Pinot Noir 2016     Austria     Carnuntum    Markowitsch                 3.9        100 29.2  2016 
 6 Gigondas (Nobles Terrasses) 2017   France      Gigondas     Vieux Clocher               3.7        100 19.9  2017 
 7 Marion's Vineyard Pinot Noir 2016  New Zealand Wairarapa    Schubert                    4          100 43.9  2016 
 8 Red Blend 2014                     Chile       Itata Valley Viña La Causa               3.9        100 17.5  2014 
 9 Chianti 2015                       Italy       Chianti      Castello Montaùto           3.6        100 10.8  2015 
10 Tradition 2014                     France      Minervois    Domaine des Aires Hautes    3.5        100  6.9  2014 
# … with 8,656 more rows, and abbreviated variable name ¹​NumberOfRatings

Now, with the help of dplyr:

library(dplyr)

data %>% 
  add_count(Country, name = "WineCount") %>% 
  mutate(CountryOther = ifelse(WineCount <= 20, "Other", Country))

we get

# A tibble: 8,666 × 10
   Name                               Country     Region       Winery      Rating Numbe…¹ Price Year  WineC…² Count…³
   <chr>                              <chr>       <chr>        <chr>        <dbl>   <dbl> <dbl> <chr>   <int> <chr>  
 1 Pomerol 2011                       France      Pomerol      Château La…    4.2     100 95    2011     2256 France 
 2 Lirac 2017                         France      Lirac        Château Mo…    4.3     100 15.5  2017     2256 France 
 3 Erta e China Rosso di Toscana 2015 Italy       Toscana      Renzo Masi     3.9     100  7.45 2015     2650 Italy  
 4 Bardolino 2019                     Italy       Bardolino    Cavalchina     3.5     100  8.72 2019     2650 Italy  
 5 Ried Scheibner Pinot Noir 2016     Austria     Carnuntum    Markowitsch    3.9     100 29.2  2016      220 Austria
 6 Gigondas (Nobles Terrasses) 2017   France      Gigondas     Vieux Cloc…    3.7     100 19.9  2017     2256 France 
 7 Marion's Vineyard Pinot Noir 2016  New Zealand Wairarapa    Schubert       4       100 43.9  2016       63 New Ze…
 8 Red Blend 2014                     Chile       Itata Valley Viña La Ca…    3.9     100 17.5  2014      326 Chile  
 9 Chianti 2015                       Italy       Chianti      Castello M…    3.6     100 10.8  2015     2650 Italy  
10 Tradition 2014                     France      Minervois    Domaine de…    3.5     100  6.9  2014     2256 France
# … with 8,656 more rows, and abbreviated variable names ¹​NumberOfRatings, ²​WineCount, ³​CountryOther
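
Red.csv is not included here, so the same add_count() plus ifelse() pattern can be tried on a small made-up tibble. This is only a sketch: the toy data and the names toy and toy_other are hypothetical, and the threshold is lowered from 20 to 2 so that the effect is visible on four rows.

```r
library(dplyr)

# Hypothetical toy data standing in for Red.csv:
# three wines from Italy and one from Mexico
toy <- tibble(
  Name    = c("Chianti", "Bardolino", "Toscana Rosso", "Cabernet Sauvignon"),
  Country = c("Italy", "Italy", "Italy", "Mexico")
)

toy_other <- toy %>%
  # Attach the per-country row count to every row
  add_count(Country, name = "WineCount") %>%
  # Threshold of 2 is an assumption for the toy data (the thread uses 20)
  mutate(CountryOther = ifelse(WineCount <= 2, "Other", Country))

toy_other$CountryOther
# "Italy" "Italy" "Italy" "Other"
```

add_count() keeps every row and only adds the count column, so no separate count-and-join step is needed.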

We can filter for WineCount <= 30:

# A tibble: 125 × 10
   Name                                  Country     Region         Winery Rating Numbe…¹ Price Year  WineC…² Count…³
   <chr>                                 <chr>       <chr>          <chr>   <dbl>   <dbl> <dbl> <chr>   <int> <chr>  
 1 Steiner 2013                          Hungary     Sopron         Wenin…    3.7     100 24.5  2013        9 Other  
 2 Viile Metamorfosis Merlot 2015        Romania     Dealu Mare     Vitis…    3.5     102  7.5  2015       23 Romania
 3 Halkidiki Limnio - Merlot 2013        Greece      Chalkidiki     Tsant…    3.2     105 12.5  2013       13 Other  
 4 Cabernet Sauvignon 2013               Mexico      Valle de Guad… L. A.…    3.4    1066  8.65 2013        1 Other  
 5 Driopi Classic Agiorgitiko Nemea 2017 Greece      Nemea          Κτημα…    3.7     107 11.5  2017       13 Other  
 6 Malbec de Purcari 2018                Moldova     South Eastern  Châte…    4.1     107 12.0  2018        8 Other  
 7 Cabernet Sauvignon de Purcari 2017    Moldova     South Eastern  Châte…    4.1    1082 13.0  2017        8 Other  
 8 Cabernet Sauvignon 2016               Romania     Samburesti     Caste…    3.3     112  7.9  2016       23 Romania
 9 Aigle Les Murailles Rouge 2015        Switzerland Aigle          Henri…    3.7     112 23.2  2015       12 Other  
10 Γουμένισσα (Goumenissa) 2015          Greece      Goumenissa     Chatz…    3.7     115 20    2015       13 Other

Checking the desired output: several rows in the CountryOther column are filled with "Other".
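
The filtering call itself is not shown in the answer. A minimal sketch, assuming dplyr::filter() was chained onto the pipeline, could look like the following. The toy data, the name toy_small and the thresholds (1 and 2 instead of 20 and 30) are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; the real data in the thread comes from Red.csv
toy <- tibble(
  Country = c("Italy", "Italy", "Italy", "Greece", "Greece", "Mexico")
)

toy_small <- toy %>%
  add_count(Country, name = "WineCount") %>%
  # Recode countries with at most 1 wine (the thread uses 20)
  mutate(CountryOther = ifelse(WineCount <= 1, "Other", Country)) %>%
  # Keep only the rarer countries (the thread uses 30)
  filter(WineCount <= 2)

# Tabulate the recoded column to see how many rows became "Other"
count(toy_small, CountryOther)
```

Filtering slightly above the recoding threshold, as the answer does, shows both recoded and untouched rows side by side, which makes the check more convincing.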

In the end I came up with this code, which works:

#New table with wine count
wineCount <- data %>% count(Country)
#Joining two tables together
dataExtended <- inner_join(wineCount, data, by = "Country")
# Creating new variable CountryOther
dataExtended$CountryOther <- dataExtended$Country
# Renaming count from n to WineCount
dataExtended <- rename(dataExtended, WineCount = n)
# Replacement of countries with WineCount<=20 to Other
dataExtended <- dataExtended %>% 
  mutate(CountryOther = ifelse(WineCount<=20, "Other", CountryOther))
# Final check
unique(dataExtended$CountryOther)
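
As an aside, the count-and-recode steps can also be collapsed into one call with forcats::fct_lump_min(), which is a different technique from the one used in this thread. It lumps every level that occurs fewer than min times, so min = 21 would correspond to n <= 20. The sketch below uses hypothetical toy data and min = 2.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: Italy occurs 3 times, Mexico and Hungary once each
toy <- tibble(Country = c("Italy", "Italy", "Italy", "Mexico", "Hungary"))

toy_lumped <- toy %>%
  # Levels seen fewer than `min` times become "Other";
  # as.character() turns the resulting factor back into a character column
  mutate(CountryOther = as.character(fct_lump_min(Country, min = 2)))

toy_lumped$CountryOther
```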

The problem was that I needed to store the changes back into the data frame, which I had not done before (as you can see in my previous comment):

dataExtended <- rename(dataExtended, WineCount = n)

dataExtended <- dataExtended %>% 
  mutate(CountryOther = ifelse(WineCount<=20, "Other", CountryOther))

I also tested your code; it works well and looks much neater. Thank you very much for your help.
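
The point about storing the result can be illustrated with a small sketch. dplyr verbs return a new tibble and leave their input untouched, so the result has to be assigned. The toy data and the name df are hypothetical.

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(Country = c("Italy", "Mexico"), WineCount = c(30L, 1L))

# Without assignment the recoded tibble is only printed; df itself is unchanged
df %>% mutate(Country = ifelse(WineCount <= 20, "Other", Country))
before <- df$Country   # still "Italy" "Mexico"

# Assigning the result back stores the change
df <- df %>% mutate(Country = ifelse(WineCount <= 20, "Other", Country))
df$Country             # "Italy" "Other"
```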

