
Alternative to data frame in R

What is the right type of object in R for bundling together objects of different types? Data frames don't seem to be right for this: a data frame holds a number of entries (rows), each having the same sub-fields (columns).

What I need is a single object containing different sub-objects, some of which might be arrays.

Example:

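score <- 0.95
confidence_in_score <- 0.5
confidence_interval <- c(0, 1)
token <- "foobar"
# data.frame() recycles the length-1 values to match the length-2 interval,
# so every other field ends up repeated on two rows
object_to_return <- data.frame(score, confidence_in_score, confidence_interval, token)

  score confidence_in_score confidence_interval  token
1  0.95                 0.5                   0 foobar
2  0.95                 0.5                   1 foobar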

What I really want is a container in which one element, confidence_interval, is an array with just two elements, instead of having the interval spread over two rows.

Motivation: to return a single object to the calling program instead of several separate sub-objects.

Try this:

# next 4 lines are from the question
score <- 0.95
confidence_in_score <- 0.5
confidence_interval <- c(0, 1)
token <- "foobar"

# a named list can hold components of different types and lengths
object_to_return <- list(score = score,
                         confidence_in_score = confidence_in_score,
                         confidence_interval = confidence_interval,
                         token = token)
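Since the motivation is to hand one object back to the caller, here is a minimal sketch of a function that returns such a list. The function name `summarise_token` is hypothetical and the values are just the placeholders from the question; the point is how the caller pulls the components back out with `$`.

```r
# Hypothetical function: the name and the hard-coded values are placeholders
# taken from the question, not part of any package
summarise_token <- function(token) {
  score <- 0.95
  confidence_in_score <- 0.5
  confidence_interval <- c(0, 1)
  # bundle everything into one named list and return it
  list(score = score,
       confidence_in_score = confidence_in_score,
       confidence_interval = confidence_interval,
       token = token)
}

result <- summarise_token("foobar")
result$score                  # a single number
result$confidence_interval    # the whole length-2 vector
result$confidence_interval[2] # just the upper limit
str(result)                   # compact overview of all four components
```

The interval stays a single length-2 vector inside the list, so nothing else gets duplicated.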

As an example of returning a list from R itself, look at the last line of the source code of eigen. It returns a list with two components named values and vectors. (Just type eigen on a line by itself in R to see its source.)
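The same pattern is visible from the caller's side by running eigen() on a small matrix. This sketch uses an arbitrary diagonal matrix, chosen only so that the eigenvalues are easy to check by eye:

```r
# An arbitrary diagonal matrix, so the eigenvalues (2 and 3) are easy to check
m <- matrix(c(2, 0,
              0, 3), nrow = 2)
e <- eigen(m)

is.list(e)   # eigen() hands back a list
names(e)     # its two components: "values" and "vectors"
e$values     # eigenvalues, largest first
e$vectors    # matching eigenvectors, one per column
```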

Type ?list from within R for more information and examples.

These links may also be helpful:

https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Lists-and-data-frames

http://www.homogenisation.org/admin/docs/Lists&DataFrames.pdf

The object you are looking for is a list. The elements of a list can be any kind of R object. For example:

str(list(mtcars, summary(mtcars), c('bla', 'spam', 'ham'), array(runif(1000))))
List of 4
 $ :'data.frame':   32 obs. of  11 variables:
  ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
  ..$ disp: num [1:32] 160 160 108 258 360 ...
  ..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
  ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
  ..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
  ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
  ..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
  ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
  ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
  ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
 $ : 'table' chr [1:6, 1:11] "Min.   :10.40  " "1st Qu.:15.43  " "Median :19.20  " "Mean   :20.09  " ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "" "" "" "" ...
  .. ..$ : chr [1:11] "     mpg" "     cyl" "     disp" "      hp" ...
 $ : chr [1:3] "bla" "spam" "ham"
 $ : num [1:1000(1d)] 0.5061 0.0806 0.4081 0.5038 0.8896 ...
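When working with such a list, the difference between single and double brackets matters: `[` returns a sub-list, while `[[` (or `$` for a named element) returns the element itself. A small sketch with made-up contents:

```r
# A small named list with components of different types (made-up contents)
x <- list(scores = c(0.95, 0.5),
          token = "foobar",
          cars = head(mtcars, 3))

x["cars"]          # single brackets: a list of length 1 containing the data frame
x[["cars"]]        # double brackets: the data frame itself
x$cars             # same as x[["cars"]] for a named component
class(x["cars"])   # "list"
class(x[["cars"]]) # "data.frame"
```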

The following tutorial I wrote could be of benefit to you.

I would NOT recommend a list. Any set of data can be represented as tables in a relational database (in R, data frames), which are much easier to work with. In your case, you have two options:

data = data.frame(score = 0.95,
                  confidence_in_score = 0.5,
                  confidence_interval_lower_limit = 0,
                  confidence_interval_upper_limit = 1,
                  token = "foobar")

Or

interval = data.frame(score = 0.95,
                      confidence_in_score = 0.5,
                      token = "foobar")

limit = data.frame(token = c("foobar", "foobar"),
                   type = c("lower_limit", "upper_limit"),
                   value= c(0, 1) )

You can use dplyr and tidyr to manipulate this data easily. For example,

library(dplyr)
library(tidyr)

data %>%
  select(token, 
         confidence_interval_lower_limit,
         confidence_interval_upper_limit) %>%
  gather(type, value, -token)
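If you would rather avoid the extra packages, base R's stack() gives a similar long format. This is a sketch of an alternative, not part of the dplyr/tidyr approach; note that stack() names its output columns values and ind rather than value and type, and drops the token column, which has to be added back:

```r
# Option 1 table from above: one row, limits stored in two columns
data <- data.frame(score = 0.95,
                   confidence_in_score = 0.5,
                   confidence_interval_lower_limit = 0,
                   confidence_interval_upper_limit = 1,
                   token = "foobar")

limit_cols <- c("confidence_interval_lower_limit",
                "confidence_interval_upper_limit")

# stack() puts the selected columns underneath each other:
# 'values' holds the numbers, 'ind' is a factor naming the source column
long <- stack(data[limit_cols])

# re-attach the ID column; tokens repeat once per stacked column,
# matching the column-by-column order used by stack()
long$token <- rep(data$token, times = length(limit_cols))
long
```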

or

# spread the limits to one row per token, then join on the ID column
limit %>%
  spread(type, value) %>%
  full_join(interval, by = "token")
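The same wide join can also be sketched in base R with reshape() and merge(), as an alternative to the tidyr/dplyr pipeline. It assumes the two tables from the second option above; reshape() prefixes the new columns with the value column's name:

```r
# Two-table option from above
interval <- data.frame(score = 0.95,
                       confidence_in_score = 0.5,
                       token = "foobar")

limit <- data.frame(token = c("foobar", "foobar"),
                    type = c("lower_limit", "upper_limit"),
                    value = c(0, 1))

# Long -> wide: one row per token, one column per limit type.
# reshape() names the new columns value.lower_limit and value.upper_limit
wide <- reshape(limit, idvar = "token", timevar = "type", direction = "wide")

# Join the limits back onto the interval table via the shared ID column
combined <- merge(interval, wide, by = "token")
combined
```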

Two rules of relational data are:

Never repeat information anywhere except in ID columns.

No cell should contain more than one piece of information.
