cbind in R - putting values, getting level indices

I think my question is somewhat similar to this one. cbind is changing the values of the vector I am using (or using references to the values). I am basically getting data from a data frame and then organizing them in columns according to a certain factor (interface type). I think it has something to do with the levels, but I am not sure what those even mean right now. Here is what I am doing and the results I am getting:

#Grouping subjects number of collisions data according to the interface they used
> ui1NumCollisions = dout$numCollisions[ dout$Interface=="0"]
> ui2NumCollisions = dout$numCollisions[ dout$Interface=="1"]
> ui3NumCollisions = dout$numCollisions[ dout$Interface=="2"]
> ui4NumCollisions = dout$numCollisions[ dout$Interface=="3"]
#checking data
> ui1NumCollisions
 [1] 43,  30,  37,  6,   22,  9,   19,  9,   14,  106, 50,  53, 
33 Levels: -1, 10, 106, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 21, 22, ... 9,
> ui2NumCollisions
 [1] 17, 16, 23, 12, 15, -1, 11, 26, 19, 32, 36, 13,
33 Levels: -1, 10, 106, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 21, 22, ... 9,
> ui3NumCollisions
 [1] 17, 38, 16, 13, 42, 50, 10, 17, 2,  28, 14, 30,
33 Levels: -1, 10, 106, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 21, 22, ... 9,
> ui4NumCollisions
 [1] 42, 28, 22, 36, 10, 25, 45, 48, 18, 11, 21, 7, 
33 Levels: -1, 10, 106, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 21, 22, ... 9,
#Creates matrix with each column containing collision data for each interface
#(I think)
> uiNumCollisions = cbind( '1' = ui1NumCollisions
+                        , '2' = ui2NumCollisions
+                        , '3' = ui3NumCollisions
+                        , '4' = ui4NumCollisions)
#checking matrix values
> uiNumCollisions
       1  2  3  4
 [1,] 26 10 10 25
 [2,] 20  9 24 19
 [3,] 23 16  9 15
 [4,] 31  5  6 22
 [5,] 15  8 25  2
 [6,] 33  1 29 17
 [7,] 12  4  2 27
 [8,] 33 18 10 28
 [9,]  7 12 13 11
[10,]  3 21 19  4
[11,] 29 22  7 14
[12,] 30  6 20 32
> uiNumCollisionsSummary = summary(uiNumCollisions)
> uiNumCollisionsSummary
       1               2               3              4        
 Min.   : 3.00   Min.   : 1.00   Min.   : 2.0   Min.   : 2.00  
 1st Qu.:14.25   1st Qu.: 5.75   1st Qu.: 8.5   1st Qu.:13.25  
 Median :24.50   Median : 9.50   Median :11.5   Median :18.00  
 Mean   :21.83   Mean   :11.00   Mean   :14.5   Mean   :18.00  
 3rd Qu.:30.25   3rd Qu.:16.50   3rd Qu.:21.0   3rd Qu.:25.50  
 Max.   :33.00   Max.   :22.00   Max.   :29.0   Max.   :32.00 

Notice that 106 is not part of column 1, and the maximum value there is 33 instead. So, why are the values in uiNumCollisions different from the individual columns (ui1NumCollisions, ui2NumCollisions, etc.)? It seems like I am getting the indices of the values from the levels table. What I really wanted were the values themselves. I assume this has a simple answer. I looked at a bunch of problems related to data binding, but could not figure out a solution to this problem from what I found. What am I missing here?

Thanks in advance for the help. Sincerely,

Paulo.

/-------FOLLOW-UP based on reply from DWin-------/

Thanks for the reply. The solution of applying data.frame to uiNumCollisions worked in getting the right data in there. However, when I apply the summary function:

uiNumCollisionsSummary = summary(uiNumCollisions)

I no longer get the statistics I used to (mean, median, etc.). Why is that?

In addition, after that, I want to apply a boxplot to uiNumCollisions and then an ANOVA. For the boxplot, what I use is the following:

par( fig=c(0.0,1.0,0.0,1.0))
temp = boxplot( uiNumCollisions)

The result I get for the boxplot is

"Error in oldClass(stats) <- cl :  adding class "factor" to an invalid object"

For the ANOVA I was using the following code:

temp = c(ui1NumCollisions, ui2NumCollisions, ui3NumCollisions, ui4NumCollisions)
temp.type = rep(c("1", "2", "3", "4"), c(12,12,12,12))
temp.type = factor(temp.type)
options(contrasts = c("contr.helmert", "contr.poly"))
uiNumCollisionsAOV = aov(temp ~ temp.type)
summary(uiNumCollisionsAOV)

However, this obviously will not work unless I convert each column to something else. I tried different fixes, like reapplying factors to each column (e.g. ui1NumCollisions = factor(ui1NumCollisions)). That fixed the factor levels, but when I tried to convert back to numeric values using something like as.numeric(levels(ui1NumCollisions)[ui1NumCollisions]), I only got NAs. So your solution did work and I really appreciate it, but it does not completely resolve my problem. Is there an easier way around this? Perhaps I could simply import the dout table in a way that gives me all the data without the factors, which would resolve all the factor issues I am having?
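The NAs are consistent with the printed vectors above, where every value carries a trailing comma ("43," rather than "43"). Labels like that are not valid numbers, so the conversion fails. A minimal sketch of the effect, using made-up values in a hypothetical vector x and assuming the commas are the only problem in the labels:

```r
# Hypothetical values mimicking the printed output, where each label ends in a comma.
x <- factor(c("43,", "30,", "106,"))

# The labels are not valid numbers, so direct conversion yields NA (with a warning).
bad <- suppressWarnings(as.numeric(as.character(x)))
print(bad)

# Strip the trailing comma from each label first, then convert.
good <- as.numeric(sub(",$", "", as.character(x)))
print(good)
```

This only patches the symptom; reading the file with the right separator avoids the factors in the first place.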

/-------FOLLOW-UP #2-------/

I finally found what the problem was. The values in the file were separated by commas instead of just spaces. The file, data.out, looked like this:

Subject, uiType, numCollisions, startTimeTraining, startTime, endTime, detlaTraining, deltaTask
0, 0, 43, 0, 510.261, 1743.75, 510.261, 1233.49
1, 1, 17, 0, 1198.65, 2044.62, 1198.65, 845.965
2, 2, 17, 0, 445.788, 1622.83, 445.788, 1177.04
3, 3, 42, 0, 254.793, 1196.93, 254.793, 942.132
4, 1, 16, 0, 1583.5, 2887.39, 1583.5, 1303.9
5, 2, 38, 0, 79.095, 886.533, 79.095, 1287.438
6, 3, 28, 0, 866.75, 1617.48, 866.75, 750.73
7, 1, 23, 0, 565.575, 1361.79, 565.575, 796.216
8, 2, 16, 0, 1211.99, 2538.37, 1211.99, 1326.38
...

And it was supposed to look like this:

Subject uiType numCollisions startTimeTraining startTime endTime detlaTraining deltaTask
0 0 43 0 510.261 1743.75 510.261 1233.49
1 1 17 0 1198.65 2044.62 1198.65 845.965
2 2 17 0 445.788 1622.83 445.788 1177.04
3 3 42 0 254.793 1196.93 254.793 942.132
4 1 16 0 1583.5 2887.39 1583.5 1303.9
5 2 38 0 79.095 886.533 79.095 1287.438
6 3 28 0 866.75 1617.48 866.75 750.73
7 1 23 0 565.575 1361.79 565.575 796.216
8 2 16 0 1211.99 2538.37 1211.99 1326.38
...

When I loaded the data table using these lines:

numSamples = 8#or more
dout = read.table("data.out", header = TRUE)
dout = dout[1:numSamples,]
dout

I would get a weird table filled with integers attached to commas, which broke the conversion of my data to numbers and gave me those factors.

After I fixed that, the original code worked like a charm.
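An alternative to editing the file is to tell read.table that the separator is a comma. The sketch below inlines a few rows through the text argument so it runs on its own; the column subset and the variable names (txt, wrong, right) are illustrative, and with the real file one would pass "data.out" instead.

```r
# A few rows in the comma-separated layout, inlined so the sketch is self-contained.
txt <- "Subject, uiType, numCollisions, startTime
0, 0, 43, 510.261
1, 1, 17, 1198.65
2, 2, 17, 445.788"

# Default whitespace separator: the commas stay attached to the values,
# so those columns are not read as numbers.
wrong <- read.table(text = txt, header = TRUE)
str(wrong)

# With sep = "," the fields split on the commas; strip.white trims the
# leftover spaces, and the columns come in as numbers.
right <- read.table(text = txt, header = TRUE, sep = ",", strip.white = TRUE)
str(right)
```

Checking str(dout) right after loading would have shown the non-numeric columns immediately.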

I appreciate the help from DWin and the opportunity to post this issue here, even though it was a rather silly mistake on my part.

Lesson learned: double-check your data after you wake up instead of before going to bed.

Thanks,

Paulo.

Because you extracted those factor columns as vectors, they lost the 'data.frame' class. So it was not so much changing the labels as losing them entirely. When you used cbind, the result was a matrix. Matrices lose any factor attributes, and factor labels are stored in the attributes. So the content of the matrix became the factor indices rather than the factor labels. If instead of cbind you had used the data.frame function, your labels would have remained intact. You probably don't want your column names to be digits, though.

uiNumCollisions = data.frame( one = ui1NumCollisions
                    , two = ui2NumCollisions
                    , three = ui3NumCollisions
                    , four = ui4NumCollisions)
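To see the difference on a small scale, here is a sketch with two made-up factors (f1 and f2 are hypothetical stand-ins for the extracted columns, not the real data):

```r
# Hypothetical stand-ins for two extracted columns: factors whose labels look like numbers.
f1 <- factor(c("43", "30", "106"))
f2 <- factor(c("17", "16", "23"))

# cbind() builds a matrix, so only the integer level codes survive.
m <- cbind(a = f1, b = f2)
print(m)

# data.frame() keeps each column as a factor, labels included.
d <- data.frame(a = f1, b = f2)
print(d)

# Going through the labels recovers the numbers.
nums <- as.numeric(as.character(f1))
print(nums)
```

The levels are sorted as strings, so "106" comes first and the value 43 is stored as code 3. That is the same effect as in the question, where 106 never appears in the matrix.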

It might help if you looked at:

str(ui1NumCollisions)
attributes(ui1NumCollisions)

Strategy 2: You could have kept the NumCollisions extracts as data.frames with:

 ui1NumCollisions = dout[ dout$Interface=="0", "numCollisions", 
                                              drop=FALSE]

Then you would be using cbind.data.frame (behind the scenes) when you called cbind.
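A minimal sketch of this second strategy, using a made-up four-row dout (the column names follow the question's code; the values and the names ui1, ui2 and both are illustrative):

```r
# Hypothetical miniature of dout; column names follow the question's code.
dout <- data.frame(Interface = c("0", "1", "0", "1"),
                   numCollisions = c(43, 17, 30, 16))

# drop = FALSE keeps each extract as a one-column data.frame instead of a bare vector.
ui1 <- dout[dout$Interface == "0", "numCollisions", drop = FALSE]
ui2 <- dout[dout$Interface == "1", "numCollisions", drop = FALSE]

# Because the arguments are data.frames, cbind() dispatches to the data.frame method.
both <- cbind(ui1, ui2)
names(both) <- c("ui1", "ui2")   # both columns were named numCollisions
print(both)
```

The result stays a data.frame, so each column keeps its own type rather than being coerced into a single matrix.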
