Condensing Data Frame in R

I have a simple question, and I really appreciate everyone's input; you have been a great help to my project. This one is about data frames in R.

I have a data frame that looks something like this:

    C <- c("","","","","","","","A","B","D","A","B","D","A","B","D")
    D <- c(NA,NA,NA,2,NA,NA,1,1,4,2,2,5,2,1,4,2)
    G <- list(C=C,D=D)
    T <- as.data.frame(G)
    T
   C  D
1     NA
2     NA
3     NA
4     2
5     NA
6     NA
7     1
8  A  1
9  B  4
10 D  2
11 A  2
12 B  5
13 D  2 
14 A  1
15 B  4
16 D  2 

I would like to condense all the repeated values in C into one row each, so that the result looks similar to this:

    J B C E
  1   2 1
  2 A 1 2 1
  3 B 4 5 4
  4 D 2 2 2

The data is all the same; it is just condensed, with new columns formed to hold the values. I am sure there is an easy way to do this, but I haven't found anything on it in the books I have looked through.

EDIT: I edited the example because it wasn't working with the answers so far. I wonder whether the NAs, the blanks, and the unevenness caused by the blanks are contributing?

This seems to get the results you are looking for. I'm assuming it's OK to remove the NA values, since that matches the desired output you show.

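    # Drop the rows where D is NA
    T <- na.omit(T)
    # Number the occurrences of each value of C (1, 2, 3, ...)
    T$ind <- ave(1:nrow(T), T$C, FUN = seq_along)
    # Spread D into one column per occurrence number
    reshape(T, direction = "wide", idvar = "C", timevar = "ind")
    #    C D.1 D.2 D.3
    # 4      2   1  NA
    # 8  A   1   2   1
    # 9  B   4   5   4
    # 10 D   2   2   2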

    # Alternative with reshape2, using the same ind column
    library(reshape2)
    dcast(T, C ~ ind, value.var = "D", fill = "")
    #   C 1 2 3
    # 1   2 1  
    # 2 A 1 2 1
    # 3 B 4 5 4
    # 4 D 2 2 2
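
The ind column is the key step in both variants: it numbers the occurrences of each value of C, and that counter becomes the column position in the wide table. A minimal base-R sketch of the same idea follows, assuming the example data from the question. The names `df`, `groups`, `width`, and `wide` are illustrative and not part of the answer above. It pads the shorter blank group with NA by hand:

```r
# Example data from the question (stringsAsFactors = FALSE keeps C as character)
df <- data.frame(
  C = c("", "", "", "", "", "", "", "A", "B", "D", "A", "B", "D", "A", "B", "D"),
  D = c(NA, NA, NA, 2, NA, NA, 1, 1, 4, 2, 2, 5, 2, 1, 4, 2),
  stringsAsFactors = FALSE
)
df <- na.omit(df)

# Occurrence counter within each group of C, as in the answer above
df$ind <- ave(seq_len(nrow(df)), df$C, FUN = seq_along)

# Split the values by group, then pad every group to the longest length
groups <- split(df$D, df$C)
width  <- max(lengths(groups))
wide   <- t(sapply(groups, function(x) c(x, rep(NA, width - length(x)))))
```

Here `wide` is a numeric matrix with one row per distinct value of C (including the blank one) and one column per occurrence. The blank group has only two non-NA values, which is why its third cell comes out as NA.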

Here's a reshape solution:

    require(reshape)
    cast(T, C ~ ., function(x) x)

I changed T to my.df to avoid a bad habit (T is also R's shorthand for TRUE). This returns a list, which may not be what you want, but you can convert from there.

    C <- c("A", "B", "D", "A", "B", "D", "A", "B", "D")
    D <- c(1, 4, 2, 2, 5, 2, 1, 4, 2)
    my.df <- data.frame(id = C, val = D)

    ret <- function(x) x
    by.df <- by(my.df$val, INDICES = my.df$id, ret)
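
The "convert from there" step is left open above. One possible way to do it is to bind the per-group vectors into a matrix. This sketch assumes every id has the same number of values, which holds for this trimmed example but not for the question's data with its blank group; with unequal lengths, `rbind` would recycle the shorter vector instead of padding it with NA. The name `wide` and the `val.` column prefix are introduced here for illustration:

```r
C <- c("A", "B", "D", "A", "B", "D", "A", "B", "D")
D <- c(1, 4, 2, 2, 5, 2, 1, 4, 2)
my.df <- data.frame(id = C, val = D)

# One vector of values per id, as in the answer above
by.df <- by(my.df$val, INDICES = my.df$id, function(x) x)

# Stack the per-id vectors as rows; the row names are the ids
wide <- do.call(rbind, by.df)
colnames(wide) <- paste0("val.", seq_len(ncol(wide)))
```

If a data frame is needed rather than a matrix, `as.data.frame(wide)` keeps the ids as row names.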
