
Convert a data frame of counts to a data frame of 0s and 1s in R

I would like to convert data frame A to data frame B

A = data.frame(male = c(3, 5), female = c(1,2))

B = data.frame(male = c(1,1,1,1,1,1,1,1,0,0,0), female = c(0,0,0,0,0,0,0,0,1,1,1))

I have this method

new <- data.frame(male = c(rep(1, sum(A$male)), rep(0, sum(A$female))), female = c(rep(0, sum(A$male)), rep(1, sum(A$female))))

which gives me the desired data frame.

However, is there a better way to do this as my original data frame (A) is more complex than the example?

UPDATE

The data frame can be more complex, for example with an extra grouping column:

A = data.frame(month = c("July", "August"), male = c(5, 3), female = c(2,1))

to be transformed to

data.frame(month = c(rep("July", 5), rep("July", 2), rep("August", 3), rep("August", 1)),
       male = c(rep(1, 5), rep(0, 2), rep(1, 3), rep(0, 1)),
       female = c(rep(0, 5), rep(1, 2), rep(0, 3), rep(1, 1)))

#    month male female
#1    July    1      0
#2    July    1      0
#3    July    1      0
#4    July    1      0
#5    July    1      0
#6    July    0      1
#7    July    0      1
#8  August    1      0
#9  August    1      0
#10 August    1      0
#11 August    0      1

Thank you.

We can do this with the tidyverse: gather the data into 'long' format, expand the rows by uncounting the 'val' column, set 'val' to 1 in every row, group by 'month' and add a row-index column ('ind'), then spread from 'long' back to 'wide', filling the missing cells with 0.

library(tidyverse)
gather(A, sex, val, -month) %>%
    uncount(val) %>% 
    mutate(val = 1) %>%
    group_by(month = factor(month, levels = month.name)) %>% 
    mutate(ind = row_number()) %>%
    spread(sex, val, fill = 0) %>%
    select(month, male, female)
# A tibble: 11 x 3
# Groups:   month [2]
#   month   male female
#   <fct>  <dbl>  <dbl>
# 1 July       1      0
# 2 July       1      0
# 3 July       1      0
# 4 July       1      0
# 5 July       1      0
# 6 July       0      1
# 7 July       0      1
# 8 August     1      0
# 9 August     1      0
#10 August     1      0
#11 August     0      1

Or using similar logic with data.table

library(data.table)
dcast(melt(setDT(A), id.var = 'month')[, rep(1, value), 
 .(month, variable)], month + rowid(month) ~ variable, 
    value.var = 'V1', fill = 0)[, month_1 := NULL][]

data

A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2,1))
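
As a side note, gather() and spread() are superseded in tidyr 1.0.0 and later. The tidyverse pipeline above can be sketched with pivot_longer() instead, and the 0/1 columns can be built directly from the 'sex' column so no reshaping back to wide is needed. This is only a sketch under the assumption that the count columns are named male and female, as in the question; it is not the answer's original code.

```r
library(dplyr)
library(tidyr)

A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

res <- A %>%
  pivot_longer(-month, names_to = "sex", values_to = "val") %>%  # one row per month/sex pair
  uncount(val) %>%                                               # repeat each row 'val' times
  mutate(male   = as.numeric(sex == "male"),                     # 1 where the row is a male
         female = as.numeric(sex == "female")) %>%               # 1 where the row is a female
  select(month, male, female)
```

Because pivot_longer() keeps the rows of A in their original order, the months stay in the order July, August without converting month to a factor.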

You can use inverse.rle. With the original two-column A (male and female only):

male <- c(1, 0)
female <- c(0, 1)
inverse.rle(list(lengths = sapply(A, sum), values = male))
# [1] 1 1 1 1 1 1 1 1 0 0 0
inverse.rle(list(lengths = sapply(A, sum), values = female))
# [1] 0 0 0 0 0 0 0 0 1 1 1
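
For readers who have not met inverse.rle before, here is a small illustration of what it does: it expands a run-length encoding (a list of lengths and values, as produced by rle) back into the full vector, which is the same thing rep() does when given a vector of times. The vector x is made up for the illustration.

```r
x <- c(1, 1, 1, 0, 0)

r <- rle(x)                  # run-length encoding: lengths 3 2, values 1 0
restored <- inverse.rle(r)   # expands the runs back into the full vector

# The same expansion written with rep()
same_as_rep <- rep(c(1, 0), times = c(3, 2))
```

Both restored and same_as_rep are identical to x, so either form can be used to build the 0/1 columns.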

Now let's apply this method to your complicated data:

library(dplyr)
split(A, A$month) %>%   # split the data by month
  lapply(function(x) {
    counts <- sapply(x[, 2:3], sum)   # number of males and females in this month
    # one data.frame per month: the month column plus male/female columns of ones and zeros
    data.frame(month  = x[, 1],
               male   = inverse.rle(list(lengths = counts, values = c(1, 0))),
               female = inverse.rle(list(lengths = counts, values = c(0, 1))))
  }) %>%
  do.call(dplyr::bind_rows, .) %>%   # bind the list; dplyr's bind_rows is used because rbind formats the row names poorly
  arrange(match(month, month.name))  # split() orders the months alphabetically, so restore calendar order

result:

    month male female
1    July    1      0
2    July    1      0
3    July    1      0
4    July    1      0
5    July    1      0
6    July    0      1
7    July    0      1
8  August    1      0
9  August    1      0
10 August    1      0
11 August    0      1

I'm not sure whether there is a package that deals with this directly, but in base R we can use apply:

do.call(rbind, apply(A, 1, function(x) {
   y <- as.numeric(x[-1])
  data.frame(month = rep(x[1], sum(y)), male = rep(c(1, 0), c(y[1], y[2])), 
             female = rep(c(0, 1), c(y[1], y[2]))) #Thanks @iod for simplifying
})) 


#    month male female
#1    July    1      0
#2    July    1      0
#3    July    1      0
#4    July    1      0
#5    July    1      0
#6    July    0      1
#7    July    0      1
#8  August    1      0
#9  August    1      0
#10 August    1      0
#11 August    0      1

Here, for every row of A we create a data frame whose first column is the month, repeated once per person in that row. The male column gets as many 1s as the row's male count followed by as many 0s as its female count, and the female column is the reverse: 0s for the males, then 1s for the females.
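
The same per-row construction can also be sketched without apply, which avoids the round trip through a character matrix and extends to more than two count columns. The helper name expand_counts and its arguments id_col and count_cols are made up for this illustration; this is one possible approach under those assumptions, not an established function.

```r
A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

# Hypothetical helper: expands count columns into one 0/1 row per counted unit
expand_counts <- function(df, id_col, count_cols) {
  counts <- as.matrix(df[count_cols])   # rows of df x count columns
  # Visit the cells row by row: (row 1, col 1), (row 1, col 2), (row 2, col 1), ...
  n <- as.vector(t(counts))
  row_idx <- rep(rep(seq_len(nrow(df)), each = length(count_cols)), times = n)
  col_idx <- rep(rep(seq_along(count_cols), times = nrow(df)), times = n)
  out <- df[row_idx, id_col, drop = FALSE]   # repeat the id value once per unit
  for (j in seq_along(count_cols)) {
    out[[count_cols[j]]] <- as.numeric(col_idx == j)   # 1 where the unit came from column j
  }
  rownames(out) <- NULL
  out
}

B <- expand_counts(A, "month", c("male", "female"))
```

For the example data, B has 11 rows in the same order as the desired output: five July males, two July females, three August males and one August female.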
