data.table - extract all text features

Question

As part of a function, I am trying to isolate all features that are either character or factor. My data set is a data.table.

text_features <- c(names(data_set[sapply(data_set, is.character)]), names(data_set[sapply(data_set, is.factor)]))

When I run the function I get an error message that says:

Error in `[.data.table`(data_set, sapply(data_set, is.character)) : i evaluates to a logical vector length 87 but there are 12992 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.

I understand this error is thrown by a recent version of data.table. How should I change my code so that it works the same way as before and avoids this error?

Note:

packageVersion("data.table")
[1] ‘1.10.4.3’

Thanks

Answer 1

The error occurs because the comma is missing when you subset data_set inside names(): with a single argument inside the brackets, data.table treats the logical vector as a row index (i). You want a subset of the columns, not rows:

data_set[sapply(data_set, is.character)] # subsetting rows
data_set[,sapply(data_set, is.character), with = FALSE] # subsetting columns
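
To see the difference on a small table, here is a minimal sketch. The table toy and its column names are made up for illustration and do not come from the question's data.

```r
library(data.table)

# Hypothetical toy table: 3 rows, 4 columns (names are illustrative only)
toy <- data.table(
  id    = 1:3,
  name  = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  score = c(1.5, 2.5, 3.5)
)

# One logical value per column: FALSE TRUE FALSE FALSE
is_chr <- sapply(toy, is.character)

# With a comma, the logical vector lands in j and selects columns
chr_only <- toy[, is_chr, with = FALSE]
names(chr_only)  # "name"

# Without the comma it lands in i (rows). Its length (4) does not match
# the number of rows (3), so recent data.table versions raise an error.
res <- tryCatch(toy[is_chr], error = function(e) "error")
```

The length mismatch (one value per column versus one per row) is the same situation as the 87 versus 12992 in the error message above.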

All that said, I think a much cleaner solution would be:

text_cols <- names(data_set)[sapply(data_set, class) %in% c("character","factor")]
data_set[, ..text_cols] # subset data
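
If the selection lives inside a function, as in the question, one way to package it is sketched below. The helper name get_text_cols and the toy table are assumptions made for illustration. The sketch tests each column with is.character() and is.factor() instead of comparing class(), which should also pick up ordered factors, since their class vector has two elements ("ordered" and "factor").

```r
library(data.table)

# Hypothetical helper: names of all character or factor columns
get_text_cols <- function(dt) {
  # vapply always returns a logical vector, one element per column
  is_text <- vapply(
    dt,
    function(col) is.character(col) || is.factor(col),
    logical(1)
  )
  names(dt)[is_text]
}

# Illustrative data, including an ordered factor
toy <- data.table(
  id    = 1:3,
  name  = c("a", "b", "c"),
  grade = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE),
  score = c(1.5, 2.5, 3.5)
)

text_cols <- get_text_cols(toy)  # "name" "grade"

# Two ways to pull those columns out of the table
by_prefix <- toy[, ..text_cols]
by_sdcols <- toy[, .SD, .SDcols = text_cols]
```

Both forms select columns by name, so neither depends on the length of a logical vector matching the number of rows.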
