Tidy rows in one data frame based on a condition
I have a question about R programming.
I have a data frame in R with the following data:
Country Year Population Bikes Revenue
Austria 1970 85 NA NA
Austria 1973 86 NA NA
AUSTRIA 1970 NA 56 4567
AUSTRIA 1973 NA 54 4390
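To make the examples reproducible, the table above can be read into a data frame named `df`, which is the name the answers assume. This is a minimal sketch using base R's `read.table()`. It assumes R 4.0 or later, where text columns are read as character rather than factor.

```r
# Read the whitespace-separated table; the literal "NA" entries become missing values
df <- read.table(text = "
Country Year Population Bikes Revenue
Austria 1970 85 NA NA
Austria 1973 86 NA NA
AUSTRIA 1970 NA 56 4567
AUSTRIA 1973 NA 54 4390
", header = TRUE)

str(df)
```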
I want to summarise this data so that I get the following:
Country Year Population Bikes Revenue
Austria 1970 85 56 4567
Austria 1973 86 54 4390
That is, I need to collapse the repeated years for each country and attach the Bikes and Revenue values to the matching year and country.
I would really appreciate any help with this.
Thank you.
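Since the request is to join Bikes and Revenue onto the matching country and year, an explicit join is one alternative. The sketch below assumes every row carries either Population or Bikes/Revenue but never both, and that the two spellings of a country name differ only in letter case. The names `population`, `sales` and `key` are illustrative helpers, not part of the question.

```r
library(dplyr)

df <- data.frame(
  Country    = c("Austria", "Austria", "AUSTRIA", "AUSTRIA"),
  Year       = c(1970L, 1973L, 1970L, 1973L),
  Population = c(85L, 86L, NA, NA),
  Bikes      = c(NA, NA, 56L, 54L),
  Revenue    = c(NA, NA, 4567L, 4390L),
  stringsAsFactors = FALSE
)

# Rows holding Population, plus a case-insensitive join key (hypothetical helper)
population <- df %>%
  filter(!is.na(Population)) %>%
  transmute(key = toupper(Country), Country, Year, Population)

# Rows holding Bikes/Revenue, keyed the same way
sales <- df %>%
  filter(is.na(Population)) %>%
  transmute(key = toupper(Country), Year, Bikes, Revenue)

# Attach Bikes and Revenue to the matching country/year, then drop the key
result <- population %>%
  left_join(sales, by = c("key", "Year")) %>%
  select(-key)

print(result)
```

Because `Country` is taken from the Population rows, this version keeps the spelling `Austria`.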
One possibility with dplyr:
library(dplyr)
df %>%
  group_by(Country = toupper(Country), Year) %>%  # "Austria" and "AUSTRIA" become one group
  summarise_all(list(~ sum(.[!is.na(.)])))        # sum the non-NA values of each column
Country Year Population Bikes Revenue
<chr> <int> <int> <int> <int>
1 AUSTRIA 1970 85 56 4567
2 AUSTRIA 1973 86 54 4390
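`summarise_all()` still works, but since dplyr 1.0.0 it has been superseded by `across()`. A minimal sketch of the same summary written with `across()`, assuming dplyr 1.0 or later. `sum(.x, na.rm = TRUE)` computes the same value as `sum(.[!is.na(.)])`. One caveat applies to both versions: if a column is entirely `NA` within a group, the result for that group is `0`, not `NA`.

```r
library(dplyr)

df <- data.frame(
  Country    = c("Austria", "Austria", "AUSTRIA", "AUSTRIA"),
  Year       = c(1970L, 1973L, 1970L, 1973L),
  Population = c(85L, 86L, NA, NA),
  Bikes      = c(NA, NA, 56L, 54L),
  Revenue    = c(NA, NA, 4567L, 4390L),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Country = toupper(Country), Year) %>%
  # across() skips the grouping columns; na.rm = TRUE drops the missing values
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(result)
```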
Or a combination of dplyr and tidyr:
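library(dplyr)
library(tidyr)

df %>%
  group_by(Country = toupper(Country), Year) %>%
  # Within each Country/Year group, copy the non-missing values into the NA cells
  fill(everything(), .direction = "up") %>%
  fill(everything(), .direction = "down") %>%
  # The rows of each group are now identical, so keep only one of them
  distinct()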
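Recent tidyr versions (1.0.0 onward) also accept `.direction = "updown"`, which fills up and then down in a single call. Below is a sketch of the same pipeline using that option, with an added `ungroup()` so the result is a plain tibble. It rebuilds `df` so that it runs on its own. It also assumes that rows within a Country/Year group never hold conflicting non-missing values. Otherwise, `distinct()` would keep more than one row for that group.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Country    = c("Austria", "Austria", "AUSTRIA", "AUSTRIA"),
  Year       = c(1970L, 1973L, 1970L, 1973L),
  Population = c(85L, 86L, NA, NA),
  Bikes      = c(NA, NA, 56L, 54L),
  Revenue    = c(NA, NA, 4567L, 4390L),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Country = toupper(Country), Year) %>%
  # One call: fill upwards, then downwards, within each Country/Year group
  fill(everything(), .direction = "updown") %>%
  ungroup() %>%
  distinct()

print(result)
```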
Or, if for some reason you need the country names to start with an uppercase letter followed by lowercase letters (Austria rather than AUSTRIA):
library(dplyr)
df %>%
  # Normalise to "Austria": lower-case everything, then upper-case the first letter
  mutate(Country = tolower(Country),
         Country = paste0(toupper(substr(Country, 1, 1)), substr(Country, 2, nchar(Country)))) %>%
  group_by(Country, Year) %>%
  summarise_all(list(~ sum(.[!is.na(.)])))
Country Year Population Bikes Revenue
<chr> <int> <int> <int> <int>
1 Austria 1970 85 56 4567
2 Austria 1973 86 54 4390
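The `substr()` approach capitalises only the first character of the whole string, so a multi-word name such as `UNITED KINGDOM` would become `United kingdom`. The following base-R sketch capitalises every word instead. `title_case()` is a hypothetical helper that relies on the `\U` case-conversion escape, which `gsub()` supports only with `perl = TRUE`. The helper capitalises only the ASCII letters `a` to `z`, and only when they appear at the start of the string or after whitespace.

```r
library(dplyr)

# Hypothetical helper: lower-case the string, then upper-case any ASCII letter
# at the start or right after whitespace ("\\U" requires perl = TRUE)
title_case <- function(x) {
  gsub("(^|\\s)([a-z])", "\\1\\U\\2", tolower(x), perl = TRUE)
}

df <- data.frame(
  Country    = c("Austria", "Austria", "AUSTRIA", "AUSTRIA"),
  Year       = c(1970L, 1973L, 1970L, 1973L),
  Population = c(85L, 86L, NA, NA),
  Bikes      = c(NA, NA, 56L, 54L),
  Revenue    = c(NA, NA, 4567L, 4390L),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Country = title_case(Country), Year) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(title_case(c("AUSTRIA", "UNITED KINGDOM")))
print(result)
```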