create a matrix from data frame

I have a data frame with categorical values:

Names   Dis   Del
    A   0-2   0-2
    A   2-4   0-2
    A   6-8   6-8
    B  8-10  8-10
    C   10+   10+

What I want as output is a matrix with the number of occurrences of each Dis/Del combination in this data:

       0-2  2-4  6-8  8-10  10+      
 0-2     1                       
 2-4     1                    
 6-8               1           
8-10                     1   
 10+                          1  

I also want to export the resulting matrix.

From the comments of @mtoto & @jogo:

table(mydf[-1])

or:

xtabs(~ Dis + Del, data = mydf)

Both give:

      Del
Dis    0-2 10+ 6-8 8-10
  0-2    1   0   0    0
  10+    0   1   0    0
  2-4    1   0   0    0
  6-8    0   0   1    0
  8-10   0   0   0    1

If you want the levels in the correct order (with 10+ last), convert both columns to factors with explicit levels:

mydf$Dis <- factor(mydf$Dis, levels = c("0-2","2-4","6-8","8-10","10+"))
mydf$Del <- factor(mydf$Del, levels = c("0-2","6-8","8-10","10+"))

Running table(mydf[-1]) again now gives:

      Del
Dis    0-2 6-8 8-10 10+
  0-2    1   0    0   0
  2-4    1   0    0   0
  6-8    0   1    0   0
  8-10   0   0    1   0
  10+    0   0    0   1
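The desired output in the question also has a 2-4 column, which never occurs in Del. A minimal sketch of one way to get that square layout, assuming the five bins listed below are the only ones of interest (the names bins and square_tab are made up for this example): give both columns the same levels, since table() keeps levels that have no observations.

```r
# Same data as in the question
mydf <- read.table(text = "Names   Dis   Del
    A   0-2   0-2
    A   2-4   0-2
    A   6-8   6-8
    B  8-10  8-10
    C   10+   10+", header = TRUE)

# One shared set of levels for both columns
# (assumption: these five bins are all that should appear)
bins <- c("0-2", "2-4", "6-8", "8-10", "10+")
mydf$Dis <- factor(mydf$Dis, levels = bins)
mydf$Del <- factor(mydf$Del, levels = bins)

# table() keeps unused factor levels, so the empty 2-4 column is included
square_tab <- table(mydf[-1])
print(square_tab)
```

The 2-4 column then contains only zeros, and rows and columns share the same order.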

Used data:

mydf <- read.table(text="Names   Dis   Del
    A   0-2   0-2
    A   2-4   0-2
    A   6-8   6-8
    B  8-10  8-10
    C   10+   10+", header=TRUE)
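The question also asks about exporting the result. One possible approach, as a sketch only: convert the table to a wide data frame and write it with write.csv(). The file name and the temporary directory are placeholders chosen for this example, and the variable names tab, wide, out_file and check are made up here.

```r
# Same data as in the question
mydf <- read.table(text = "Names   Dis   Del
    A   0-2   0-2
    A   2-4   0-2
    A   6-8   6-8
    B  8-10  8-10
    C   10+   10+", header = TRUE)

tab <- table(mydf[-1])

# as.data.frame.matrix() keeps the wide layout:
# one row per Dis value, one column per Del value
wide <- as.data.frame.matrix(tab)

# Hypothetical output location; replace with your own path
out_file <- file.path(tempdir(), "dis_del_counts.csv")
write.csv(wide, out_file)

# Read the file back to check it; check.names = FALSE keeps
# headers such as "0-2" from being rewritten into syntactic names
check <- read.csv(out_file, row.names = 1, check.names = FALSE)
print(check)
```

Calling as.data.frame(tab) instead would give a long format with one row per combination and a Freq column, which may be preferable depending on what reads the file.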

I think you're looking for the dcast function from the reshape2 package.

df <- data.frame(Dis = c("0-2","2-4", "6-8", "8-10", "10+"),
                 Del = c("0-2", "0-2", "6-8", "8-10", "10+"))

Convert the columns you want to reshape on to factors:

df$Dis <- as.factor(df$Dis)
df$Del <- as.factor(df$Del)

Add a count column to aggregate over:

df$counts <- 1

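Then apply the dcast function. The two columns named in the formula set up the rows and columns of the new matrix. Using sum as fun.aggregate ensures that if the same combination occurs several times, you get the number of occurrences. If you want a binary 0/1 instead, set it to max (you will probably also want fill = 0 in that case, because by default empty cells are filled with the result of applying fun.aggregate to a zero-length vector).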

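library(reshape2)  # provides dcast()

wide_df <- dcast(df,
                 Dis ~ Del,             # rows ~ columns
                 value.var = "counts",  # column to aggregate
                 fun.aggregate = sum)   # sum the 1s to get counts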

Here is the result:

print(wide_df)
   Dis 0-2 10+ 6-8 8-10
1  0-2   1   0   0    0
2  10+   0   1   0    0
3  2-4   1   0   0    0
4  6-8   0   0   1    0
5 8-10   0   0   0    1

To get the same ordering as in your question, create the factors in the conversion step with factor() and an explicit levels argument, in whatever order you want.
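On the choice between a count and a binary 0/1 mentioned above: the difference only shows up when a combination occurs more than once. The following is a base R sketch using table() rather than dcast, so it is not the reshape2 method itself. The small data frame with a duplicated row is made up for illustration.

```r
# Hypothetical data: the ("0-2", "0-2") combination occurs twice
df <- data.frame(Dis = c("0-2", "0-2", "2-4"),
                 Del = c("0-2", "0-2", "0-2"))

# Number of occurrences of each combination
counts <- table(df$Dis, df$Del)

# 1 if the combination occurs at all, 0 otherwise
binary <- (counts > 0) * 1L

print(counts)
print(binary)
```

Here counts holds 2 for the duplicated combination, while binary holds 1.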
