Restructuring a data frame for 3D plots in R

I realize that 3D plots are often not the most effective way to present a set of data, but earlier 2D plots I made for a particular dataset suggest that a 3D plot would help separate the information into more distinct clusters for analysis. That said, I've never done this in R, and I'm having trouble restructuring my data frame before making a 3D scatterplot with plot3d().

At the moment, my data frame has 2 columns and a few thousand rows. Column 1 is an identifier (A, B, C, ...) and column 2 is one measured feature for that identifier.

Example:

ID Area 
A   1.2
A   3.0
A   2.7
B   1.4
B   2.5
C   4.3
C   2.1
C   1.7

I will plot the area on the Y axis. Using a function like table(), I can get the number of times A, B, or C occurs (A=3, B=2, C=3), and this count becomes the x coordinate for every row with that ID. What I would also like is a third column that assigns a unique z to each ID sharing a given x. In other words, Z should represent how many times a given X has shown up so far, increasing by 1 for each new ID that has that particular X. The goal is for the area values (y) of all the objects within a particular ID to be stacked above each other over a unique (x, z) coordinate. This is where I am stuck. Given the above input, I would want the final data frame to look like this:

ID(x) Area(y)  Z
    3    1.2   1
    3    3.0   1
    3    2.7   1
    2    1.4   1
    2    2.5   1
    3    4.3   2
    3    2.1   2
    3    1.7   2 

We could do this in several ways.

1. base R - aggregate/ave

We can use aggregate to get the number of rows ('IDx') for each element of the 'ID' column, transform the output dataset ('dfN') by creating the 'Z' column based on the duplicated values in 'IDx', and then merge 'dfN' with the original dataset 'df1'.

dfN <- aggregate(cbind(IDx=seq_along(ID))~ID, df1, FUN=length)
dfN$Z <- with(dfN, ave(IDx, IDx, FUN=function(x) cumsum(duplicated(x))+1L))
 merge(df1, dfN, by='ID')[-1]
 #  Area IDx Z
 #1  1.2   3 1
 #2  3.0   3 1
 #3  2.7   3 1
 #4  1.4   2 1
 #5  2.5   2 1
 #6  4.3   3 2
 #7  2.1   3 2
 #8  1.7   3 2
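
To make the intermediate step easier to follow, here is a minimal sketch that stops before the merge and prints 'dfN' for the sample data. It assumes the same 'df1' as in the question. Because every value inside an 'IDx' group is identical, seq_along is used here as a shorter stand-in for cumsum(duplicated(x))+1L; that substitution is mine, not part of the code above.

```r
# Sample data copied from the question
df1 <- data.frame(ID = c("A", "A", "A", "B", "B", "C", "C", "C"),
                  Area = c(1.2, 3, 2.7, 1.4, 2.5, 4.3, 2.1, 1.7),
                  stringsAsFactors = FALSE)

# Step 1: one row per ID, with IDx = number of rows for that ID
dfN <- aggregate(cbind(IDx = seq_along(ID)) ~ ID, df1, FUN = length)

# Step 2: within each distinct count, number the IDs 1, 2, 3, ...
# (all values in a group are equal, so seq_along gives the same
#  result as cumsum(duplicated(x)) + 1L)
dfN$Z <- with(dfN, ave(IDx, IDx, FUN = seq_along))

print(dfN)
```

For this input, 'dfN' holds one row per ID: A and C both have a count of 3 and receive Z values 1 and 2, while B has a count of 2 and receives Z = 1. The merge then copies these values back onto every row of 'df1'.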

2. base R - ave/rle

We can create the 'IDx' column with ave and then use rle/inverse.rle to create the 'Z' column.

 df1$IDx <- with(df1, ave(seq_along(ID), ID, FUN=length))
 v1 <- with(df1, paste0(ID, IDx))
 df1$Z <- inverse.rle(within.list(rle(v1), values <-ave(lengths, 
             lengths, FUN=function(x) cumsum(duplicated(x))+1L)))
 df1
 #  ID Area IDx Z
 #1  A  1.2   3 1
 #2  A  3.0   3 1
 #3  A  2.7   3 1
 #4  B  1.4   2 1
 #5  B  2.5   2 1
 #6  C  4.3   3 2
 #7  C  2.1   3 2
 #8  C  1.7   3 2
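
The rle step may be easier to see in isolation. The sketch below hard-codes the 'v1' vector that paste0(ID, IDx) produces for the sample data and walks through the run-length encoding. Note that rle works on consecutive runs, so this approach appears to assume that all rows of a given ID are adjacent; that is true for the sample data but is an assumption about the real dataset.

```r
# v1 as produced by paste0(ID, IDx) for the sample data
v1 <- c("A3", "A3", "A3", "B2", "B2", "C3", "C3", "C3")

r <- rle(v1)
r$lengths  # run lengths: 3 2 3
r$values   # run labels: "A3" "B2" "C3"

# Replace each run's label with its running index among runs of equal length
r$values <- ave(r$lengths, r$lengths, FUN = seq_along)

# Expand back to one value per original row
z <- inverse.rle(r)
z  # 1 1 1 1 1 2 2 2
```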

3. data.table

Convert the 'data.frame' to a 'data.table' (setDT) and create 'IDx', i.e. the number of rows (.N), grouped by 'ID'. Based on the duplicated values in 'IDx', we can create the 'Z' column. Set the key to 'ID' (setkey), join with 'df1', and drop the column that is no longer needed by assigning it to NULL (ID := NULL).

library(data.table)
setkey(setDT(df1)[, list(IDx=.N), by = ID][, IDx1:= IDx][,
     list(ID,Z=cumsum(duplicated(IDx1))+1L) , IDx], ID)[df1][, ID := NULL][]

#   IDx Z Area
#1:   3 1  1.2
#2:   3 1  3.0
#3:   3 1  2.7
#4:   2 1  1.4
#5:   2 1  2.5
#6:   3 2  4.3
#7:   3 2  2.1
#8:   3 2  1.7
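
The chained call above is compact but dense. As an alternative sketch, the same result can be built in separate steps with an update join. It assumes the data.table package is installed; the names 'dt', 'cnt' and 'res' are illustrative only. It also builds a fresh table rather than calling setDT on 'df1', since setDT converts 'df1' in place.

```r
library(data.table)

# Sample data from the question, built as a new data.table
dt <- data.table(ID = c("A", "A", "A", "B", "B", "C", "C", "C"),
                 Area = c(1.2, 3, 2.7, 1.4, 2.5, 4.3, 2.1, 1.7))

# IDx: number of rows per ID, added by reference
dt[, IDx := .N, by = ID]

# One row per ID, then number the IDs within each distinct count
cnt <- unique(dt[, .(ID, IDx)])
cnt[, Z := seq_len(.N), by = IDx]

# Update join: copy Z back onto every row of dt, matching on ID
dt[cnt, Z := i.Z, on = "ID"]

# Keep only the plotting columns
res <- dt[, .(IDx, Area, Z)]
res
```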

4. dplyr

The idea is similar to the one above. Instead of merge, we use left_join.

library(dplyr)
left_join(df1, 
            df1 %>% 
              group_by(ID) %>% 
              summarise(IDx=n()) %>% 
              group_by(IDx) %>%
              mutate(Z=cumsum(duplicated(IDx))+1L), by='ID') %>% 
              select(-ID)
 #  Area IDx Z
 #1  1.2   3 1
 #2  3.0   3 1
 #3  2.7   3 1
 #4  1.4   2 1
 #5  2.5   2 1
 #6  4.3   3 2
 #7  2.1   3 2
 #8  1.7   3 2

NOTE: Tested this with another dataset, 'df2'.

data

df1 <- structure(list(ID = c("A", "A", "A", "B", "B", "C", "C", "C"), 
Area = c(1.2, 3, 2.7, 1.4, 2.5, 4.3, 2.1, 1.7)), .Names = c("ID", 
"Area"), class = "data.frame", row.names = c(NA, -8L))

df2 <-  structure(list(ID = c("A", "A", "A", "B", "B", "C", "C", "C", 
"D", "D", "D", "E", "E", "F"), Area = c(1.2, 3, 2.7, 1.4, 2.5, 
4.3, 2.1, 1.7, 1.2, 1.4, 2.1, 1.2, 1.5, 2.3)), .Names = c("ID", 
"Area"), class = "data.frame", row.names = c(NA, -14L))
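
The results for 'df2' are not shown above, so here is a rough base R sketch that applies the same idea to it. The helper 'add_xz' is hypothetical (not part of the original answer) and uses table/match instead of merge; it numbers IDs in order of first appearance, which coincides with the alphabetical order that aggregate would use for this data. The commented plot3d call assumes the rgl package, which is not loaded or tested here.

```r
df2 <- data.frame(
  ID = c("A", "A", "A", "B", "B", "C", "C", "C",
         "D", "D", "D", "E", "E", "F"),
  Area = c(1.2, 3, 2.7, 1.4, 2.5, 4.3, 2.1, 1.7,
           1.2, 1.4, 2.1, 1.2, 1.5, 2.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns x (rows per ID), y (Area), z (index of the
# ID among IDs sharing the same count)
add_xz <- function(df) {
  ids <- unique(df$ID)                  # IDs in order of first appearance
  n   <- as.vector(table(df$ID)[ids])   # number of rows for each ID
  z   <- ave(n, n, FUN = seq_along)     # running index within each count
  pos <- match(df$ID, ids)              # map each row to its ID's position
  data.frame(x = n[pos], y = df$Area, z = z[pos])
}

out <- add_xz(df2)
out

# Possible plotting step (assumes rgl is installed):
# rgl::plot3d(out$x, out$y, out$z)
```

Here A, C and D each have three rows and get z = 1, 2 and 3; B and E have two rows and get z = 1 and 2; F has one row and gets z = 1.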
