
R: data.table: searching on multiple columns AND setting data type

Q1:

Is it possible to search on two different columns in a data.table? I have a data set of roughly 2 million rows, and I want the option to search on either of the two columns. One holds names and the other holds integers.

Example:

library(data.table)
x <- data.table(foo = letters, bar = seq_along(letters))
x

I want to be able to do
x['c'] : search on the foo column
as well as
x[2]   : search on the bar column

Q2: Is it possible to change the default data types in a data.table? I am reading in a matrix with both character and integer columns, but everything is being read in as character.

Thanks! -Abhi

To answer your Q2 first: a data.table is a data.frame, and both are internally a list. Each column of a data.table (or data.frame) can therefore be of a different class, which is not possible with a matrix. You can use := to change the class of a column by reference (without making an unnecessary copy), for example of "bar" here:

x[, bar := as.integer(as.character(bar))]
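As a minimal sketch of the situation described in Q2, assume the input is a small character matrix named m (a hypothetical stand-in for the real data, with made-up contents). Converting it with as.data.table gives character columns, and := then fixes the type of "bar":

```r
library(data.table)

# Hypothetical input: a matrix can hold only one type, so the integers
# are stored as character strings alongside the names.
m <- cbind(foo = letters[1:3], bar = as.character(1:3))

x <- as.data.table(m)          # every column arrives as character
sapply(x, class)

x[, bar := as.integer(bar)]    # replace the column by reference
sapply(x, class)
```

If a column had arrived as a factor rather than character, the as.integer(as.character(bar)) form shown above would be the safer conversion.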

For Q1, if you want to use the fast subset feature of data.table (which uses binary search), you have to set a key with the function setkey.

setkey(x, foo)

allows you to fast-subset on 'foo' alone, like x['a'] (or x[J('a')]). Similarly, setting a key on 'bar' allows you to fast-subset on that column.
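The two single-key lookups can be sketched as follows, using the example table from the question. The names by_name and by_number are only illustrative. Note that a bare number in i, as in x[2], selects row 2 by position instead of searching the key, so the integer lookup is wrapped in J():

```r
library(data.table)

x <- data.table(foo = letters, bar = seq_along(letters))

setkey(x, foo)
by_name <- x["c"]          # binary search on the key column foo

setkey(x, bar)             # re-keying sorts x by bar
by_number <- x[J(2L)]      # J() makes this a key lookup, not a row number
```

In this toy table row 2 happens to hold bar == 2, so x[2] would return the same row by coincidence; on real data the two generally differ. Recent versions of data.table also accept an on= argument, for example x[.(2L), on = "bar"], which avoids switching the key back and forth.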

If you set the key on both 'foo' and 'bar', you can provide values for both, like so:

setkey(x) # or alternatively setkey(x, foo, bar)
x[J('c', 3)]

However, this will return only the rows where foo == 'c' and bar == 3. Currently, I don't think there is a way to do a | (OR) operation with fast-subset directly. You'll have to resort to a vector-scan approach in that case.
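A vector-scan version of the OR search might look like this, again on the example table from the question (the name either is only illustrative):

```r
library(data.table)

x <- data.table(foo = letters, bar = seq_along(letters))

# Vector scan: every row is tested against both conditions.
either <- x[foo == "c" | bar == 2L]
either
```

This returns the rows matching either condition, at the cost of evaluating both comparisons over the full columns.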

Hope this is what your question was about. Not sure.

Your matrix is already of type character. Matrices hold only one data type. You can try X['c'] and X[J(2)]. You can change data types with X[, col := as.character(col)]
