R: data.table : searching on multiple columns AND setting data type
Q1: Is it possible to search on two different columns in a data.table? I have a data set of 2 million-odd rows and I want to have the option to search on either of the two columns. One has names and the other has integers.
Example:
x <- data.table(foo=letters,bar=1:length(letters))
x
I want to do
x['c'] : searching on the foo column
as well as
x[2] : searching on the bar column
Q2: Is it possible to change the default data types in a data.table? I am reading in a matrix with both character and integer columns; however, everything is being read in as character.
Thanks! -Abhi
To answer your Q2 first, a data.table is a data.frame, both of which are internally a list. Each column of the data.table (or data.frame) can therefore be of a different class. But you can't do that with a matrix. You can use := to change the class (by reference, so no unnecessary copy is made), for example, of "bar" here:
x[, bar := as.integer(as.character(bar))]
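To make the matrix case from the question concrete, here is a minimal sketch. It assumes the data arrives as a character matrix with named columns; the matrix m and its contents are made up for illustration and are not the asker's data.

```r
library(data.table)

# Hypothetical input: a character matrix, so the integer-looking
# column is stored as strings.
m <- matrix(c("a", "b", "c", "1", "2", "3"), ncol = 2,
            dimnames = list(NULL, c("foo", "bar")))

x <- as.data.table(m)          # both columns arrive as character
x[, bar := as.integer(bar)]    # convert "bar" in place, by reference

sapply(x, class)               # inspect the resulting column classes
```

If a column had come in as a factor rather than character, the as.integer(as.character(bar)) form shown above would be the safer conversion.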
For Q1, if you want to use the fast subset (binary search) feature of data.table, then you have to set a key, using the function setkey.
setkey(x, foo) allows you to fast-subset on 'foo' alone like: x['a'] (or x[J('a')]). Similarly, setting a key on 'bar' allows you to fast-subset on that column.
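A minimal sketch of switching the key between the two columns, using the example table from the question. Note that a bare number in i, as in x[2], selects row number 2 by position rather than looking it up in the key, so the integer lookup is wrapped in J(). The names by_foo and by_bar are only illustrative.

```r
library(data.table)

x <- data.table(foo = letters, bar = seq_along(letters))

# Key on the character column: a bare string in i is looked up in the key.
setkey(x, foo)
by_foo <- x["c"]

# Re-key on the integer column. x[2] would return the second row by
# position, so wrap the value in J() to request a keyed lookup instead.
setkey(x, bar)
by_bar <- x[J(2L)]
```

Keep in mind that setkey re-sorts the table by reference each time it is called, so switching keys back and forth on a large table has a cost of its own.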
If you set the key on both 'foo' and 'bar' then you can provide values for both like so:
setkey(x) # or alternatively setkey(x, foo, bar)
x[J('c', 3)]
However, this returns only the rows where foo == 'c' and bar == 3. Currently, I don't think there is a way to do a | (OR) operation with fast-subset directly. You'll have to resort to a vector-scan approach in that case.
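For the OR case, a vector scan could look like the sketch below. It is just an ordinary logical expression in i and does not use the key; the name either is illustrative.

```r
library(data.table)

x <- data.table(foo = letters, bar = seq_along(letters))

# Vector scan: each condition is evaluated over the whole column,
# then the two logical vectors are combined with |.
either <- x[foo == "c" | bar == 2L]
```

The scan is linear in the number of rows, so whether it is fast enough for a 2 million row table is something to measure on the real data.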
Hope this is what your question was about. Not sure.
Your matrix is already character. Matrices hold only one data type. You can try X['c'] and X[J(2)]. You can change data types with X[, col := as.character(col)]