Avoiding sapply in an R data.table

Question

I've written a function to remove everything from the first parenthesis onwards in a string:

until_parentheses <- function(string) {
  one <- stringr::str_split_fixed(string, "\\(", 2)[1, 1]
  res <- stringr::str_trim(one)
  return(res)
}

And I have a data.table with a column that looks (something) like this:

library(data.table)

messy <- paste(letters[1:10], paste0(c(" (", letters[1:2], ")"), collapse = ""))

dt <- data.table(messy)

When I try to use until_parentheses() on the messy column like so:

dt[, ":=" (clean = until_parentheses(messy))]

The function is applied to only the first element of messy, and the clean column is that result repeated 10 times.
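The behaviour described here appears to follow from the [1, 1] index: str_split_fixed() returns a character matrix with one row per input string, so [1, 1] picks a single cell and := recycles that one value down the column. A minimal sketch, assuming stringr is installed; until_parentheses_vec is a hypothetical name used only for illustration:

```r
library(stringr)

# Small sample input (illustrative, not the data from the question)
x <- c("a (x)", "b (y)", "c (z)")

# str_split_fixed() returns a character matrix: one row per input string
parts <- str_split_fixed(x, "\\(", 2)
dim(parts)

# [1, 1] selects a single cell, so only the first string survives
first_only <- str_trim(parts[1, 1])

# [, 1] keeps the first column for every row, so the function stays vectorized
until_parentheses_vec <- function(string) {
  str_trim(str_split_fixed(string, "\\(", 2)[, 1])
}

cleaned <- until_parentheses_vec(x)
cleaned
```

With a version like this, dt[, clean := until_parentheses_vec(messy)] should fill the column row by row without sapply.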

To get the clean column to come out the way I want, I am using sapply:

dt[, ":=" (clean_2 = sapply(messy, until_parentheses))]

This gives the result I want; however, it takes a long time to run when dt is long.

I feel like there are problems with both my until_parentheses() function and my data.table method. Does anyone have a solution that makes my use of sapply redundant in this instance?

Thanks!

Answer 1

You can use gsub, which is vectorized:

dt[, clean_3 := gsub(' +[(].*', '', messy)] ## drop everything from the first " (" onwards
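Note that the pattern above needs at least one space before the parenthesis. If some strings might have no space there, a variant with optional whitespace may be closer to what until_parentheses() does. The following is a sketch in base R on a plain character vector; the edge-case strings and the alternative pattern are assumptions for illustration, not part of the original answer:

```r
# Sample input; all but the first string are hypothetical edge cases
x <- c("a  (ab)", "b(c)", "c (d) (e)", "no parentheses")

# Pattern from the answer: requires one or more spaces before "("
with_space <- gsub(" +[(].*", "", x)
with_space

# Assumed variant: optional whitespace, then the first "(" and everything after
clean <- sub("\\s*\\(.*", "", x)
clean
```

Inside the data.table this would be written as dt[, clean_4 := sub("\\s*\\(.*", "", messy)]. Unlike str_trim(), neither pattern removes leading whitespace, so an extra trimws() call may be needed if the input can start with spaces.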


Answer 1: 4, accepted, 2016-02-03 04:35:54