
Creating 4 new dataframes from discrete column in R

Here is the head() of my data frame:

Input Dataframe:

  isCool    isTall    isWide   Building
1      0         0         1          0
2      1         1         0          1
3      1         0         1          2
4      0         1         0          3
5      1         0         0          1

Building has 4 unique values (0, 1, 2 and 3), and I want to output 4 new data frames, each with a flag column for one building. How can I do this in R?

Expected Output:

DF1 (flags building 0 or not building 0)

  isCool    isTall    isWide Building_0
1      0         0         1          1
2      1         1         0          0
3      1         0         1          0
4      0         1         0          0
5      1         0         0          0

DF2 (flags building 1 or not building 1)

  isCool    isTall    isWide Building_1
1      0         0         1          0
2      1         1         0          1
3      1         0         1          0
4      0         1         0          0
5      1         0         0          1

DF3 (flags building 2 or not building 2)

  isCool    isTall    isWide Building_2
1      0         0         1          0
2      1         1         0          0
3      1         0         1          1
4      0         1         0          0
5      1         0         0          0

DF4 (flags building 3 or not building 3)

  isCool    isTall    isWide Building_3
1      0         0         1          0
2      1         1         0          0
3      1         0         1          0
4      0         1         0          1
5      1         0         0          0

EDIT:

The Building column in the input determines the 4 output data frames. For example, DF1 has a flag column Building_0 that indicates whether the observation is in building 0, and DF2 has a flag column Building_1 that indicates whether the observation is in building 1. Each output data frame has the same number of rows as the input data frame.
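
Each output data frame is just the non-Building columns plus one element-wise comparison. A minimal base R sketch for DF1, with the sample data typed in by hand (the names df1 and df_building0 are illustrative, not from the original post):

```r
# Sample data from the question, entered by hand for illustration
df1 <- data.frame(
  isCool   = c(0L, 1L, 1L, 0L, 1L),
  isTall   = c(0L, 1L, 0L, 1L, 0L),
  isWide   = c(1L, 0L, 1L, 0L, 0L),
  Building = c(0L, 1L, 2L, 3L, 1L)
)

# Keep every column except Building ...
df_building0 <- df1[setdiff(names(df1), "Building")]
# ... and add a 0/1 flag: `==` compares element-wise, as.integer() maps TRUE/FALSE to 1/0
df_building0$Building_0 <- as.integer(df1$Building == 0)

df_building0
```

The other three data frames only differ in the value compared against (1, 2 or 3) and the flag column's name.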

EDIT 2:

I've created this function based on Vinícius Félix's solution. However, I still have to call it 4 times, once for each unique value of Building. Is there a way to call the function just once and generate all 4 DFs?

library(dplyr)

# Replace column `colname` with a 0/1 flag column named "<colname>_<num>"
flag_df <- function(df, colname, num) {
  df %>%
    mutate("{colname}_{num}" := if_else(.data[[colname]] == num, 1, 0)) %>%
    select(-all_of(colname))
}

d_1 <- flag_df(test_df, "Building", 0)
d_2 <- flag_df(test_df, "Building", 1)
d_3 <- flag_df(test_df, "Building", 2)
d_4 <- flag_df(test_df, "Building", 3)
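
One way to avoid the four hand-written calls, sketched under an assumption: so that it runs without dplyr, the flag_df below is a base R stand-in with the same arguments (its flag is integer rather than double). The idea is to loop over the sorted unique values with lapply and collect the results in a list; the names vals and dfs are illustrative.

```r
# Base R stand-in for the dplyr-based flag_df() above (same arguments)
flag_df <- function(df, colname, num) {
  out <- df[setdiff(names(df), colname)]                 # drop the original column
  out[[paste0(colname, "_", num)]] <- as.integer(df[[colname]] == num)
  out
}

test_df <- data.frame(
  isCool   = c(0L, 1L, 1L, 0L, 1L),
  isTall   = c(0L, 1L, 0L, 1L, 0L),
  isWide   = c(1L, 0L, 1L, 0L, 0L),
  Building = c(0L, 1L, 2L, 3L, 1L)
)

# One call per unique Building value, collected in a single list
vals <- sort(unique(test_df$Building))
dfs <- lapply(vals, function(n) flag_df(test_df, "Building", n))
names(dfs) <- paste0("d_", seq_along(dfs))

names(dfs)
#> [1] "d_1" "d_2" "d_3" "d_4"
```

The same lapply call should also work with the dplyr version of flag_df, since only the function being called changes.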

Loop over the sorted unique values of 'Building' and, for each one, create a new dataset by appending to the other columns a new 'Building_<value>' column, computed by an element-wise comparison (==) with the looped value:

fn1 <- function(dat, colnm) {
  un1 <- sort(unique(dat[[colnm]]))
  lst1 <- lapply(un1, function(i) {
    # keep every column except the one being flagged
    tmp <- dat[setdiff(names(dat), colnm)]
    # +(...) turns the TRUE/FALSE comparison into 1/0
    tmp[[paste0(colnm, "_", i)]] <- +(dat[[colnm]] == i)
    tmp
  })
  names(lst1) <- paste0("DF_", seq_along(lst1))
  lst1
}

Output:

> fn1(df1, "Building")
$DF_1
  isCool isTall isWide Building_0
1      0      0      1          1
2      1      1      0          0
3      1      0      1          0
4      0      1      0          0
5      1      0      0          0

$DF_2
  isCool isTall isWide Building_1
1      0      0      1          0
2      1      1      0          1
3      1      0      1          0
4      0      1      0          0
5      1      0      0          1

$DF_3
  isCool isTall isWide Building_2
1      0      0      1          0
2      1      1      0          0
3      1      0      1          1
4      0      1      0          0
5      1      0      0          0

$DF_4
  isCool isTall isWide Building_3
1      0      0      1          0
2      1      1      0          0
3      1      0      1          0
4      0      1      0          1
5      1      0      0          0

It is better to keep the data frames in a list, but if you really need separate objects, use list2env (not recommended):

list2env(fn1(df1, "Building"), .GlobalEnv)
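
A less intrusive variant, offered as a sketch: list2env() also takes a target environment, so the data frames can live in a dedicated environment instead of the global workspace. The list lst below is a small hand-built stand-in for the list fn1() returns, and flag_env is a hypothetical name.

```r
# Hand-built stand-in for the named list returned by fn1()
lst <- list(
  DF_1 = data.frame(isCool = c(0L, 1L), Building_0 = c(1L, 0L)),
  DF_2 = data.frame(isCool = c(0L, 1L), Building_1 = c(0L, 1L))
)

# Create one object per list element inside its own environment
flag_env <- new.env()
invisible(list2env(lst, envir = flag_env))

ls(flag_env)
#> [1] "DF_1" "DF_2"

# Access an individual data frame by name
flag_env$DF_2
```

This keeps the global environment clean while still allowing access by name.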

Alternatively, this can be done more concisely with model.matrix:

Map(cbind, list(df1[1:3]), Building = 
       asplit(model.matrix(~  factor(df1$Building)-1), 2))

Output:

[[1]]
  isCool isTall isWide Building
1      0      0      1        1
2      1      1      0        0
3      1      0      1        0
4      0      1      0        0
5      1      0      0        0

[[2]]
  isCool isTall isWide Building
1      0      0      1        0
2      1      1      0        1
3      1      0      1        0
4      0      1      0        0
5      1      0      0        1

[[3]]
  isCool isTall isWide Building
1      0      0      1        0
2      1      1      0        0
3      1      0      1        1
4      0      1      0        0
5      1      0      0        0

[[4]]
  isCool isTall isWide Building
1      0      0      1        0
2      1      1      0        0
3      1      0      1        0
4      0      1      0        1
5      1      0      0        0
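
In the output above every flag column is called Building. A sketch that keeps the model.matrix idea but labels the columns Building_0 to Building_3, as in the expected output. It uses lapply instead of Map/asplit, and the names mm, others and dfs are illustrative.

```r
df1 <- data.frame(
  isCool   = c(0L, 1L, 1L, 0L, 1L),
  isTall   = c(0L, 1L, 0L, 1L, 0L),
  isWide   = c(1L, 0L, 1L, 0L, 0L),
  Building = c(0L, 1L, 2L, 3L, 1L)
)

# One 0/1 column per building level; "- 1" drops the intercept so all levels appear
mm <- model.matrix(~ factor(Building) - 1, data = df1)
colnames(mm) <- paste0("Building_", levels(factor(df1$Building)))

# Pair the non-Building columns with each indicator column in turn
others <- df1[setdiff(names(df1), "Building")]
dfs <- lapply(colnames(mm), function(nm) {
  out <- others
  out[[nm]] <- as.integer(mm[, nm])   # as.integer() also drops the row names
  out
})
names(dfs) <- colnames(mm)
```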

Data:

df1 <- structure(list(isCool = c(0L, 1L, 1L, 0L, 1L), isTall = c(0L, 
1L, 0L, 1L, 0L), isWide = c(1L, 0L, 1L, 0L, 0L), Building = c(0L, 
1L, 2L, 3L, 1L)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5"))

Perhaps we can try the code below

lapply(
  sort(unique(df$Building)),
  function(x) {
    transform(
      df,
      Building = +(Building == x)
    )
  }
)

which gives

[[1]]
  isCool isTall isWide Building
1      0      0      1        1
2      1      1      0        0
3      1      0      1        0
4      0      1      0        0
5      1      0      0        0

[[2]]
  isCool isTall isWide Building
1      0      0      1        0
2      1      1      0        1
3      1      0      1        0
4      0      1      0        0
5      1      0      0        1

[[3]]
  isCool isTall isWide Building
1      0      0      1        0
2      1      1      0        0
3      1      0      1        1
4      0      1      0        0
5      1      0      0        0

[[4]]
  isCool isTall isWide Building
1      0      0      1        0
2      1      1      0        0
3      1      0      1        0
4      0      1      0        1
5      1      0      0        0
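
The transform() approach keeps the name Building for the flag. A sketch of a small extension that renames it to Building_<value> and names the list elements the same way; the names vals and res are illustrative.

```r
df <- data.frame(
  isCool   = c(0L, 1L, 1L, 0L, 1L),
  isTall   = c(0L, 1L, 0L, 1L, 0L),
  isWide   = c(1L, 0L, 1L, 0L, 0L),
  Building = c(0L, 1L, 2L, 3L, 1L)
)

vals <- sort(unique(df$Building))
res <- lapply(vals, function(x) {
  out <- transform(df, Building = +(Building == x))               # 0/1 flag
  names(out)[names(out) == "Building"] <- paste0("Building_", x)   # rename the flag
  out
})
names(res) <- paste0("Building_", vals)
```

Naming the list by value lets you pull out a single data frame with res$Building_3 instead of relying on its position.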

There is a way to create the separate objects directly, but assign is a very "dangerous" function, so be careful!

library(dplyr)
library(purrr)

flag_building <- function(num){
  df %>%
    mutate(Building = if_else(Building == num,1,0)) %>% 
    rename_with(.fn = ~paste0("Building_",num),.cols = "Building") %>% 
    assign(value = .,x = paste0("df_",num),envir = globalenv() )
}

map(unique(df$Building),.f = flag_building)

ls()

[1] "df"            "df_0"          "df_1"          "df_2"         
[5] "df_3"          "flag_building"
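
If the separate objects already exist, here is a sketch of collecting them back into one list with mget() and then removing the loose copies. The two small df_0 and df_1 objects are hand-built stand-ins for what the assign() code above leaves in the workspace.

```r
# Hand-built stand-ins for objects created with assign()
df_0 <- data.frame(isCool = c(0L, 1L), Building_0 = c(1L, 0L))
df_1 <- data.frame(isCool = c(0L, 1L), Building_1 = c(0L, 1L))

# Gather the separate objects into one named list
dfs <- mget(paste0("df_", 0:1))
names(dfs)
#> [1] "df_0" "df_1"

# Remove the loose copies once the list holds them
rm(list = paste0("df_", 0:1))
```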
