
Avoiding this use of sapply in R data.table

I've written a function that removes everything from the first parenthesis onwards in a string:

```r
until_parentheses <- function(string) {
  one <- stringr::str_split_fixed(string, "\\(", 2)[1, 1]
  res <- stringr::str_trim(one)
  return(res)
}
```
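The `[1, 1]` index is what makes this function scalar. `str_split_fixed()` returns a character matrix with one row per input string, and `[1, 1]` keeps only the first column of the first row. Indexing with `[, 1]` instead keeps the first column for every row. The sketch below is a hypothetical base-R equivalent (the name `until_parentheses_vec` is made up for illustration) that skips stringr entirely: it deletes from the first `(` to the end of each string and then trims the result.

```r
# Hypothetical vectorized helper (the name is illustrative, not from the question).
# sub() and trimws() both work element-wise on a character vector,
# so the whole column is handled in a single call.
until_parentheses_vec <- function(string) {
  # Remove the first "(" and everything after it; strings without "(" are unchanged
  no_parens <- sub("\\(.*", "", string)
  # Strip leading and trailing whitespace, similar to stringr::str_trim()
  trimws(no_parens)
}

until_parentheses_vec(c("a  (ab)", "b(x) (y)", "no parens here "))
# returns c("a", "b", "no parens here")
```

Because the result has one element per input, it can be assigned directly with `:=` without a loop.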

I also have a data.table with a column that looks something like this:

```r
library(data.table)

messy <- paste(letters[1:10], paste0(c(" (", letters[1:2], ")"), collapse = ""))
dt <- data.table(messy)
```
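It is worth checking exactly what the sample column contains, because the spacing matters to any regular expression used later. This sketch assumes the data.table package is installed; it rebuilds the column from the question and inspects the first values.

```r
library(data.table)

# Rebuild the sample column from the question
messy <- paste(letters[1:10], paste0(c(" (", letters[1:2], ")"), collapse = ""))
dt <- data.table(messy)

# paste0(..., collapse = "") builds the single suffix " (ab)", and paste()
# joins it to each letter with its default sep = " ", so every value has
# two spaces before the parenthesis
head(dt$messy, 3)
# returns c("a  (ab)", "b  (ab)", "c  (ab)")
```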

When I try to use until_parentheses() on the messy column like so:

```r
dt[, ":=" (clean = until_parentheses(messy))]
```

the function only returns the result for the first element of messy, and the clean column is that single result repeated 10 times.
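The repetition comes from data.table rather than from the function itself: when the right-hand side of `:=` has length 1, data.table recycles that one value to every row. A minimal illustration, using a made-up table `dt_demo`:

```r
library(data.table)

dt_demo <- data.table(x = c("a (1)", "b (2)", "c (3)"))

# A length-1 right-hand side is recycled to all rows
dt_demo[, first_only := x[1]]

# A right-hand side with one value per row is assigned element-wise
dt_demo[, per_row := toupper(x)]

dt_demo
```

So any function used on the right-hand side of `:=` needs to return one value per row for the column to come out as expected.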

To get the clean column I want, I am using sapply:

```r
dt[, ":=" (clean_2 = sapply(messy, until_parentheses))]
```

This gives the result I want, but it takes a long time to run when dt has many rows.
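`sapply()` calls the function once per row, so the overhead of each `str_split_fixed()` and `str_trim()` call is paid for every element. If a function genuinely cannot be vectorized, `vapply()` is a stricter alternative: it checks that each call returns exactly one character value, and `USE.NAMES = FALSE` drops the names that `sapply()` otherwise copies from the input strings. It still loops, so it should not be expected to be much faster. The helper `first_part()` below is a hypothetical base-R stand-in for `until_parentheses()`.

```r
# Hypothetical scalar helper standing in for until_parentheses():
# it handles exactly one string per call
first_part <- function(s) trimws(strsplit(s, "(", fixed = TRUE)[[1]][1])

x <- c("a  (ab)", "b  (ab)")

# sapply() keeps the input strings as the names of the result
with_names <- sapply(x, first_part)

# vapply() enforces one character value per call and can drop the names
no_names <- vapply(x, first_part, character(1), USE.NAMES = FALSE)
```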

I feel there are problems with both my until_parentheses() function and my data.table approach. Does anyone have a solution that makes my use of sapply unnecessary here?

Thanks!

You can use the vectorized gsub:

```r
dt[, clean_3 := gsub(" *[(].*", "", messy)]  ## drop the first "(", everything after it, and any spaces just before it
```
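A self-contained version of this approach, assuming the data.table package is installed. The pattern `" *[(].*"` matches any run of spaces (possibly none) in front of the first `(`, together with everything after it; `sub()` would work equally well here because `.*` already consumes the rest of the string in one match. The larger table `big` is made up purely to time a single vectorized pass.

```r
library(data.table)

messy <- paste(letters[1:10], paste0(c(" (", letters[1:2], ")"), collapse = ""))
dt <- data.table(messy)

# One vectorized call over the whole column: drop any spaces before the
# first "(" plus everything from that "(" onwards
dt[, clean_3 := gsub(" *[(].*", "", messy)]

# The same single call on a larger, made-up column; system.time() reports
# how long one vectorized pass takes on the current machine
big <- data.table(messy = rep(messy, 1e4))
system.time(big[, clean_3 := gsub(" *[(].*", "", messy)])
```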
