Rename/recode variable value in R based on condition using dplyr
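I have a dataset dataExtended that contains the variables CountryOther and n, where n is the number of wines for that particular country. CountryOther is of type character and n is an integer. What I want to do is rename the values in CountryOther to Other whenever n <= 20. I would like to do this with the dplyr package, but I am not sure how, or whether I should use mutate or mutate_at.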
Since I could not work out how to write that condition, I tried doing it manually as follows, but without success:
dataExtended$CountryOther <- dataExtended$Country
dataExtended %>%
  mutate(CountryOther = recode(CountryOther,
    China = "Other",
    Mexico = "Other",
    Slovakia = "Other",
    Bulgaria = "Other",
    Canada = "Other",
    Croatia = "Other",
    Uruguay = "Other",
    Georgia = "Other",
    Turkey = "Other",
    Moldova = "Other",
    Slovenia = "Other",
    Hungary = "Other",
    Switzerland = "Other",
    Greece = "Other",
    Israel = "Other",
    Lebanon = "Other"))
Using Red.csv from the link, imported via readr::read_csv() to create a data.frame / tibble:
#> data
# A tibble: 8,666 × 8
Name Country Region Winery Rating NumberOf…¹ Price Year
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr>
1 Pomerol 2011 France Pomerol Château La Providence 4.2 100 95 2011
2 Lirac 2017 France Lirac Château Mont-Redon 4.3 100 15.5 2017
3 Erta e China Rosso di Toscana 2015 Italy Toscana Renzo Masi 3.9 100 7.45 2015
4 Bardolino 2019 Italy Bardolino Cavalchina 3.5 100 8.72 2019
5 Ried Scheibner Pinot Noir 2016 Austria Carnuntum Markowitsch 3.9 100 29.2 2016
6 Gigondas (Nobles Terrasses) 2017 France Gigondas Vieux Clocher 3.7 100 19.9 2017
7 Marion's Vineyard Pinot Noir 2016 New Zealand Wairarapa Schubert 4 100 43.9 2016
8 Red Blend 2014 Chile Itata Valley Viña La Causa 3.9 100 17.5 2014
9 Chianti 2015 Italy Chianti Castello Montaùto 3.6 100 10.8 2015
10 Tradition 2014 France Minervois Domaine des Aires Hautes 3.5 100 6.9 2014
# … with 8,656 more rows, and abbreviated variable name ¹NumberOfRatings
Now, with the help of dplyr:
library(dplyr)
data %>%
  add_count(Country, name = "WineCount") %>%
  mutate(CountryOther = ifelse(WineCount <= 20, "Other", Country))
we get
# A tibble: 8,666 × 10
Name Country Region Winery Rating Numbe…¹ Price Year WineC…² Count…³
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int> <chr>
1 Pomerol 2011 France Pomerol Château La… 4.2 100 95 2011 2256 France
2 Lirac 2017 France Lirac Château Mo… 4.3 100 15.5 2017 2256 France
3 Erta e China Rosso di Toscana 2015 Italy Toscana Renzo Masi 3.9 100 7.45 2015 2650 Italy
4 Bardolino 2019 Italy Bardolino Cavalchina 3.5 100 8.72 2019 2650 Italy
5 Ried Scheibner Pinot Noir 2016 Austria Carnuntum Markowitsch 3.9 100 29.2 2016 220 Austria
6 Gigondas (Nobles Terrasses) 2017 France Gigondas Vieux Cloc… 3.7 100 19.9 2017 2256 France
7 Marion's Vineyard Pinot Noir 2016 New Zealand Wairarapa Schubert 4 100 43.9 2016 63 New Ze…
8 Red Blend 2014 Chile Itata Valley Viña La Ca… 3.9 100 17.5 2014 326 Chile
9 Chianti 2015 Italy Chianti Castello M… 3.6 100 10.8 2015 2650 Italy
10 Tradition 2014 France Minervois Domaine de… 3.5 100 6.9 2014 2256 France
# … with 8,656 more rows, and abbreviated variable names ¹NumberOfRatings, ²WineCount, ³CountryOther
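To try the same idea without downloading Red.csv, here is a minimal sketch on a made-up tibble. The data, the names toy and toy_out, and the threshold of 2 are assumptions chosen only for illustration. if_else() is swapped in for ifelse() because it checks that both branches have the same type; either should work here.

```r
library(dplyr)

# Hypothetical toy data, not taken from Red.csv
toy <- tibble(
  Name    = c("A", "B", "C", "D", "E", "F"),
  Country = c("France", "France", "France", "Italy", "Italy", "Mexico")
)

# The threshold is 2 here only so that the toy data shows both outcomes
toy_out <- toy %>%
  add_count(Country, name = "WineCount") %>%                        # rows per country
  mutate(CountryOther = if_else(WineCount <= 2, "Other", Country))  # recode rare countries

toy_out
```

In this sketch France (3 rows) keeps its name, while Italy (2 rows) and Mexico (1 row) become "Other".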
We can filter for WineCount <= 30:
# A tibble: 125 × 10
Name Country Region Winery Rating Numbe…¹ Price Year WineC…² Count…³
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int> <chr>
1 Steiner 2013 Hungary Sopron Wenin… 3.7 100 24.5 2013 9 Other
2 Viile Metamorfosis Merlot 2015 Romania Dealu Mare Vitis… 3.5 102 7.5 2015 23 Romania
3 Halkidiki Limnio - Merlot 2013 Greece Chalkidiki Tsant… 3.2 105 12.5 2013 13 Other
4 Cabernet Sauvignon 2013 Mexico Valle de Guad… L. A.… 3.4 1066 8.65 2013 1 Other
5 Driopi Classic Agiorgitiko Nemea 2017 Greece Nemea Κτημα… 3.7 107 11.5 2017 13 Other
6 Malbec de Purcari 2018 Moldova South Eastern Châte… 4.1 107 12.0 2018 8 Other
7 Cabernet Sauvignon de Purcari 2017 Moldova South Eastern Châte… 4.1 1082 13.0 2017 8 Other
8 Cabernet Sauvignon 2016 Romania Samburesti Caste… 3.3 112 7.9 2016 23 Romania
9 Aigle Les Murailles Rouge 2015 Switzerland Aigle Henri… 3.7 112 23.2 2015 12 Other
10 Γουμένισσα (Goumenissa) 2015 Greece Goumenissa Chatz… 3.7 115 20 2015 13 Other
Checking for the desired output: several rows in the CountryOther column are filled with "Other".
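The call that produced the filtered view is not shown above; one plausible way to get it is a filter() step appended to the same pipeline. The sketch below uses hypothetical toy data and smaller thresholds (2 for recoding, 3 for filtering), so that one country inside the filter range keeps its name, much like Romania in the output above.

```r
library(dplyr)

# Hypothetical toy data: 5 French, 3 Romanian, 2 Greek and 1 Mexican wine
toy <- tibble(
  Country = c(rep("France", 5), rep("Romania", 3), rep("Greece", 2), "Mexico")
)

checked <- toy %>%
  add_count(Country, name = "WineCount") %>%
  mutate(CountryOther = if_else(WineCount <= 2, "Other", Country)) %>%
  filter(WineCount <= 3)   # keep only the rarer countries for inspection

# Distinct combinations make the check easier to read
distinct(checked, Country, WineCount, CountryOther)
```

Filtering slightly above the recoding threshold is a convenient check because it shows both recoded and untouched countries side by side.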
In the end I came up with this code, which works:
#New table with wine count
wineCount <- data %>% count(Country)
#Joining two tables together
dataExtended <- inner_join(wineCount, data, by = "Country")
# Creating new variable CountryOther
dataExtended$CountryOther <- dataExtended$Country
# Renaming count from n to WineCount
dataExtended <- rename(dataExtended, WineCount = n)
# Replacement of countries with WineCount<=20 to Other
dataExtended <- dataExtended %>%
  mutate(CountryOther = ifelse(WineCount <= 20, "Other", CountryOther))
# Final check
unique(dataExtended$CountryOther)
The problem was that I needed to store the changes back into the data frame, which I had not been doing before (as you can see in my previous comment):
dataExtended <- rename(dataExtended, WineCount = n)
and
dataExtended <- dataExtended %>%
  mutate(CountryOther = ifelse(WineCount <= 20, "Other", CountryOther))
I also tested your code; it works well and looks much tidier. Thank you very much for your help.
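The point about storing the result comes down to dplyr verbs returning a modified copy and leaving their input unchanged. A small sketch, assuming a hypothetical data frame df with made-up values:

```r
library(dplyr)

# Hypothetical data for illustration only
df <- tibble(Country = c("Italy", "Mexico"), n = c(2650L, 1L))

# The result is computed but not stored, so df itself is unchanged
df %>% mutate(CountryOther = if_else(n <= 20, "Other", Country))
has_col_before <- "CountryOther" %in% names(df)

# Assigning the result back keeps the new column
df <- df %>% mutate(CountryOther = if_else(n <= 20, "Other", Country))
has_col_after <- "CountryOther" %in% names(df)

c(before = has_col_before, after = has_col_after)
```

The same applies to rename(), recode() inside mutate(), and the other dplyr verbs used in this thread.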