
data.table - Extract all the text features

As part of a function, I am trying to isolate all features that are either character or factor. My data set is a data.table.

text_features <- c(names(data_set[sapply(data_set, is.character)]), names(data_set[sapply(data_set, is.factor)]))

When I run the function, I get the following error:

Error in `[.data.table`(data_set, sapply(data_set, is.character)) : i evaluates to a logical vector length 87 but there are 12992 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.

I understand this error is thrown by a recent version of data.table. How should I change my code so that it works the same way without raising this error?

Note:

packageVersion("data.table")
[1] ‘1.10.4.3’

Thanks

You are getting this error because the comma is missing when you subset data_set inside names(). With a single argument, data.table treats the logical vector as a row index (i), so its length (87, one entry per column) is checked against the 12992 rows. You want a subset of the columns, not rows:

data_set[sapply(data_set, is.character)] # subsetting rows
data_set[,sapply(data_set, is.character), with = FALSE] # subsetting columns
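If only the column names are needed, as in the question, one option is to index names(data_set) directly, which avoids subsetting the data.table at all. The following is a minimal sketch; the toy data_set and its column names are made up for illustration and stand in for the real data.

```r
library(data.table)

# Hypothetical toy data standing in for the real data_set
# (3 rows, 4 columns, so row count and column count differ)
data_set <- data.table(
  id    = 1:3,
  name  = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  value = c(0.1, 0.2, 0.3)
)

# sapply over a data.table iterates over columns: one logical per column
is_chr <- sapply(data_set, is.character)
is_fct <- sapply(data_set, is.factor)

# Index the vector of column names, not the data.table itself
text_features <- c(names(data_set)[is_chr], names(data_set)[is_fct])
print(text_features)
```

This keeps the ordering of the original code: character columns first, then factor columns.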

That said, I think a much cleaner solution would be:

text_cols <- names(data_set)[sapply(data_set, class) %in% c("character","factor")]
data_set[, ..text_cols] # subset data
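One caveat with comparing class strings: a column can have more than one class (an ordered factor has class c("ordered", "factor")), in which case sapply(data_set, class) returns a list and the %in% comparison may miss that column. A possible variant, sketched below with made-up toy data, tests each column with is.character() and is.factor() instead. The column names here are assumptions for illustration only.

```r
library(data.table)

# Hypothetical toy data; "size" is an ordered factor with two classes
data_set <- data.table(
  id    = 1:3,
  name  = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  size  = factor(c("S", "M", "L"), levels = c("S", "M", "L"), ordered = TRUE)
)

# vapply guarantees exactly one logical value per column
is_text <- vapply(
  data_set,
  function(col) is.character(col) || is.factor(col),
  logical(1)
)

text_cols <- names(data_set)[is_text]

# Select the columns by name; with = FALSE treats text_cols as a character vector
text_data <- data_set[, text_cols, with = FALSE]
print(text_cols)
```

The selection with with = FALSE is used here because it already appears above; data_set[, ..text_cols] should select the same columns.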
