Replacing multiple values in a column in R

I am trying to write a function that takes two arguments: a continent and the name of a column in a data frame. The function should calculate the mean of that column for the given continent and use it to replace the NAs in that column for that continent. However, I keep running into errors when it comes to actually replacing the values. I have tried several approaches, such as replace, replace_na and mutate, but each one produces errors that I cannot get past. The code works outside a function, but as soon as I put it inside one I get the error shown below.

df<-structure(list(location = c("Algeria", "Angola", "Benin", "Botswana", 
"Burkina Faso", "Burundi"), iso_code = c("DZA", "AGO", "BEN", 
"BWA", "BFA", "BDI"), continent = c("Africa", "Africa", "Africa", 
"Africa", "Africa", "Africa"), date = c("2020-09-02", "2020-09-02", 
"2020-09-02", "2020-09-02", "2020-09-02", "2020-09-02"), total_cases = c(44833, 
2654, 2145, 1733, 1375, 445), new_cases = c(339, 30, 0, 9, 5, 
0), new_cases_smoothed = c(372.143, 53, 4.286, 24.429, 3.286, 
2.143), total_deaths = c(1518, 108, 40, 6, 55, 1), new_deaths = c(8, 
1, 0, 0, 0, 0), new_deaths_smoothed = c(8.857, 0.857, 0.143, 
0.429, 0, 0), total_cases_per_million = c(1022.393, 80.751, 176.934, 
736.937, 65.779, 37.424), new_cases_per_million = c(7.731, 0.913, 
0, 3.827, 0.239, 0), new_cases_smoothed_per_million = c(8.487, 
1.613, 0.354, 10.388, 0.157, 0.18), total_deaths_per_million = c(34.617, 
3.286, 3.299, 2.551, 2.631, 0.084), new_deaths_per_million = c(0.182, 
0.03, 0, 0, 0, 0), new_deaths_smoothed_per_million = c(0.202, 
0.026, 0.012, 0.182, 0, 0), population = c(43851043, 32866268, 
12123198, 2351625, 20903278, 11890781), population_density = c(17.348, 
23.89, 99.11, 4.044, 70.151, 423.062), median_age = c(29.1, 16.8, 
18.8, 25.8, 17.6, 17.5), aged_65_older = c(6.211, 2.405, 3.244, 
3.941, 2.409, 2.562), aged_70_older = c(3.857, 1.362, 1.942, 
2.242, 1.358, 1.504), gdp_per_capita = c(13913.839, 5819.495, 
2064.236, 15807.374, 1703.102, 702.225), extreme_poverty = c(0.5, 
NA, 49.6, NA, 43.7, 71.7), cardiovasc_death_rate = c(278.364, 
276.045, 235.848, 237.372, 269.048, 293.068), diabetes_prevalence = c(6.73, 
3.94, 0.99, 4.81, 2.42, 6.05), female_smokers = c(0.7, NA, 0.6, 
5.7, 1.6, NA), male_smokers = c(30.4, NA, 12.3, 34.4, 23.9, NA
), handwashing_facilities = c(83.741, 26.664, 11.035, NA, 11.877, 
6.144), hospital_beds_per_thousand = c(1.9, NA, 0.5, 1.8, 0.4, 
0.8), life_expectancy = c(76.88, 61.15, 61.77, 69.59, 61.58, 
61.58)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))


fun1 <- function(cont, column)
{
  countries<-df%>%
    filter(continent == cont)
  
  m<-mean(countries[[column]],na.rm=T)

    df[,column]<-ifelse(is.na(df[,column]) & df$continent==cont,m,(df[,column]=df[,column]))
}

fun1("Europe","median_age")

Error: Error during wrapup: Can't recycle input of size 208 to size 1. Error: no more error handlers available (recursive errors?); invoking 'abort' restart

You have a number of problems here. The first is that you seem to have made an error copying your dput output over, so your example code doesn't run. Secondly, you are using the name mean as a variable name in the function, which is very likely to cause confusion when debugging later. The third is that your function doesn't return anything. Lastly, your spacing makes the code very difficult to read: you have lots of vertical space from blank lines, but you don't separate your variable names and operators with spaces. Again, this makes things harder to debug.
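As a minimal sketch of how the original approach could be repaired while staying close to the question's code: the function below returns the modified data and indexes the column with [[ ]] rather than [ , ]. On a tibble, df[, column] gives back a one-column tibble instead of a vector, which is a likely source of the recycling error. The function name fill_na_with_continent_mean, its data argument, and the toy data frame are made up for illustration.

```r
# Hypothetical helper: fill NAs in one column with the mean for one continent.
fill_na_with_continent_mean <- function(data, cont, column) {
  # Rows that belong to the requested continent
  in_cont <- data$continent == cont

  # data[[column]] is a plain vector, for data frames and tibbles alike
  m <- mean(data[[column]][in_cont], na.rm = TRUE)

  # Replace only the NAs that fall inside that continent
  to_fill <- in_cont & is.na(data[[column]])
  data[[column]][to_fill] <- m

  # Return the modified copy; the caller's data frame is not changed in place
  data
}

# Toy data (invented for this example)
toy <- data.frame(
  location = c("Algeria", "Angola", "Benin", "France"),
  continent = c("Africa", "Africa", "Africa", "Europe"),
  extreme_poverty = c(0.5, NA, 49.6, NA)
)

filled <- fill_na_with_continent_mean(toy, "Africa", "extreme_poverty")
filled$extreme_poverty
# The Angola value becomes mean(c(0.5, 49.6)) = 25.05; the France NA is untouched
```

Because the function returns a copy, the result has to be assigned, for example df <- fill_na_with_continent_mean(df, "Africa", "extreme_poverty").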

If you are using dplyr functions, you can take advantage of quasiquotation to make your code simpler and more intuitive to use. For example, you can write the function so that it accepts bare column names, without having to wrap them in "double quotes":

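library(dplyr)

fun1 <- function(cont, col)
{
  # Capture the bare column name so it can be unquoted with !! below
  col <- enquo(col)
  
  # Keep only this continent's rows, then fill NAs with that continent's mean
  filter(df, continent == cont) %>%
    mutate(!!col := replace(!!col, is.na(!!col), mean(!!col, na.rm = TRUE)))
}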

So you can write:

fun1("Africa", new_cases)
#>       location iso_code continent       date total_cases new_cases new_cases_smoothed
#> 1      Algeria      DZA    Africa 2020-09-02       44833       339            372.143
#> 2       Angola      AGO    Africa 2020-09-02        2654        30             53.000
#> 3        Benin      BEN    Africa 2020-09-02        2145         0              4.286
#> 4     Botswana      BWA    Africa 2020-09-02        1733         9             24.429
#> 5 Burkina Faso      BFA    Africa 2020-09-02        1375         5              3.286
#> 6      Burundi      BDI    Africa 2020-09-02         445         0              2.143
#>   total_deaths new_deaths
#> 1         1518          8
#> 2          108          1
#> 3           40          0
#> 4            6          0
#> 5           55          0
#> 6            1          0
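The same idea can also be written with the embrace operator {{ }}, which rlang has supported since version 0.4.0 and which combines the enquo() and !! steps. The following is only a sketch under that assumption: the function name fill_col, its data argument, and the toy tibble are hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical variant: takes the data frame as an argument and
# uses {{ col }} in place of enquo(col) followed by !!col.
fill_col <- function(data, cont, col) {
  data %>%
    filter(continent == cont) %>%
    mutate({{ col }} := replace({{ col }}, is.na({{ col }}),
                                mean({{ col }}, na.rm = TRUE)))
}

# Toy data (invented for this example)
toy <- tibble(
  location = c("Algeria", "Angola", "Benin", "France"),
  continent = c("Africa", "Africa", "Africa", "Europe"),
  extreme_poverty = c(0.5, NA, 49.6, NA)
)

africa <- fill_col(toy, "Africa", extreme_poverty)
africa$extreme_poverty
# The NA for Angola is replaced by mean(c(0.5, 49.6)) = 25.05
```

Passing the data frame in explicitly, rather than reading df from the enclosing environment, keeps the function usable with other data sets and with pipes.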

If you just want to replace all NA values in the numeric columns with the mean of the other countries in that continent, then you don't need a function at all. You can just use:

df <- df %>% 
        group_by(continent) %>%
        mutate(across(total_cases:life_expectancy,
               function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))))

This transforms the entire data frame.
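If the numeric columns are not contiguous, one possible variation selects them by type with where(is.numeric) instead of a column range, and drops the grouping afterwards with ungroup(). This is a sketch on an invented toy tibble, not the original data. Note that a continent whose values are all NA in some column would get NaN, since the mean of an empty set is undefined.

```r
library(dplyr)

# Toy data (invented for this example), with two continents
toy <- tibble(
  location = c("Algeria", "Angola", "Benin", "France", "Spain"),
  continent = c("Africa", "Africa", "Africa", "Europe", "Europe"),
  extreme_poverty = c(0.5, NA, 49.6, NA, 1.0),
  median_age = c(29.1, 16.8, NA, 42.0, 45.5)
)

filled <- toy %>%
  group_by(continent) %>%
  # Within each continent, fill NAs in every numeric column with the group mean
  mutate(across(where(is.numeric),
                ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE)))) %>%
  # Remove the grouping so later operations act on the whole table
  ungroup()

filled
```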
