Best way to apply a custom function to existing column to create a new column in data frame in R

I have a data frame with a character column containing comma-delimited strings of numbers, e.g. 1, 2, 3, 4. I have a custom function that I would like to apply to each value in that column, row by row, and store the result in a new column of the data frame df.

Initial data frame:

A B str
1 1 1, 2, 5
1 2 NA
2 1 NA
2 2 1, 3

Final data frame:

A B str      res
1 1 1, 2, 5  2
1 2 NA       0
2 1 NA       0
2 2 1, 3     1

This is my custom function getCounts:

getCounts <- function(str, x, y){
  if (is.na(str)){
    return(as.integer(0))
  }
  vec <- as.integer(unlist(strsplit(str, ',')))
  count <- 0
  for (i in vec) {
    if (i >= x & i <= y){
      count <- count + 1
    }
  }
  return(as.integer(count))
}


I originally tried lapply, since other posts suggested it was the best fit, but I kept getting an error such as:

df <- df %>% mutate(res = lapply(df$str, getCounts(df$str, 0, 2)))
Error: Problem with `mutate()` input `res`.
x missing value where TRUE/FALSE needed
i Input `res` is `lapply(df$str, getCounts(df$str, 0, 2))`

The only thing that seems to work is mapply, but I don't really understand why, or whether there is a better way to do this.

df <- df %>% mutate(res = mapply(getCounts, df$str, 0, 2))
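
A note on why the two calls behave differently, as a minimal base-R sketch that reuses getCounts and the sample data from this question (the variable names whole_column, res_lapply and res_mapply are illustrative only). In the lapply attempt, getCounts(df$str, 0, 2) is evaluated once on the whole column before lapply ever runs, and getCounts expects a single string; with the NA entries in the column this ends in an error (the exact message may depend on the R version). lapply and mapply both expect the function itself, with the extra arguments passed separately:

```r
# Sample data and function copied from the question.
df <- data.frame(
  A = c(1, 1, 2, 2),
  B = c(1, 2, 1, 2),
  str = c("1, 2, 5", NA, NA, "1, 3"),
  stringsAsFactors = FALSE
)

getCounts <- function(str, x, y) {
  if (is.na(str)) {
    return(as.integer(0))
  }
  vec <- as.integer(unlist(strsplit(str, ",")))
  count <- 0
  for (i in vec) {
    if (i >= x & i <= y) {
      count <- count + 1
    }
  }
  return(as.integer(count))
}

# This is the call the lapply() attempt evaluated first: getCounts() on the
# whole column. It fails because the function is written for one string.
whole_column <- tryCatch(getCounts(df$str, 0, 2), error = function(e) "error")

# Passing the function itself lets lapply() call it once per element.
# The result is a list, so unlist() turns it into a plain vector.
res_lapply <- unlist(lapply(df$str, getCounts, x = 0, y = 2))

# mapply() also calls the function once per element and recycles 0 and 2.
# USE.NAMES = FALSE drops the names it would otherwise take from df$str.
res_mapply <- mapply(getCounts, df$str, 0, 2, USE.NAMES = FALSE)
```

So mapply works here because it receives getCounts as a function and feeds it one element of df$str at a time, which is what the function was written for.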

If I'm reading this right, you should be able to just use rowwise():

df %>%
  rowwise() %>%
  mutate(res = getCounts(str, 0, 2)) %>%
  ungroup()

With your data:

data.frame(
    A = c(1,1,2,2),
    B = c(1,2,1,2),
    str = c('1, 2, 5', NA, NA, '1, 3')
) -> df

getCounts <- function(str, x, y){
    if (is.na(str)){
        return(as.integer(0))
    }
    vec <- as.integer(unlist(strsplit(str, ',')))
    count <- 0
    for (i in vec) {
        if (i >= x & i <= y){
            count <- count + 1
        }
    }
    return(as.integer(count))
}

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df %>%
    rowwise() %>%
    mutate(res = getCounts(str, 0, 2)) %>%
    ungroup()
#> # A tibble: 4 x 4
#>       A     B str       res
#>   <dbl> <dbl> <chr>   <int>
#> 1     1     1 1, 2, 5     2
#> 2     1     2 <NA>        0
#> 3     2     1 <NA>        0
#> 4     2     2 1, 3        1

Created on 2021-03-17 by the reprex package (v1.0.0)

You can try Vectorize:

df %>%
  mutate(res = Vectorize(getCounts)(str, 0, 2))

or sapply:

df %>%
  mutate(res = sapply(str, getCounts, x = 0, y = 2))
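
A loop-free variant is also possible. The sketch below is an illustration under stated assumptions, not part of the answers above: countInRange is a hypothetical helper name, it replaces the explicit for loop with sum() over a logical vector, and it uses vapply() rather than sapply() so that each call is checked to return exactly one integer and no names are attached to the result.

```r
# Hypothetical helper (not from the original post): counts how many of the
# comma-separated integers in one string fall in the closed range [x, y].
countInRange <- function(s, x, y) {
  if (is.na(s)) {
    return(0L)
  }
  vec <- as.integer(strsplit(s, ",", fixed = TRUE)[[1]])
  # The comparison is vectorized, so sum() counts the TRUE values directly.
  sum(vec >= x & vec <= y)
}

# Same values as the str column in the question.
str_col <- c("1, 2, 5", NA, NA, "1, 3")

# vapply() requires every call to return a single integer;
# USE.NAMES = FALSE keeps the result unnamed.
res <- vapply(str_col, countInRange, integer(1), x = 0, y = 2, USE.NAMES = FALSE)
```

Inside a dplyr pipeline the same call would presumably go into mutate(), with str in place of str_col.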


