
Reshaping a dataframe with NA values in R

I have a dataframe with NA values:

 df <- data.frame("About" = c("Ram","Std 8",NA,NA,NA,"John", "Std 9", NA, NA,NA,NA),
                 "Questions" = c(NA,NA,"Q1","Q2","Q3",NA,NA,"Q1","Q2","Q3","Q4"),
                 "Ratings" = c(NA,NA,7,7,7,NA,NA,7,7,7,7), stringsAsFactors = FALSE)

The expected output is as follows:

 expectedOutput <- data.frame("About" = c("Ram","John"),
                             "Standard" = c("Std 8", "Std 9"),
                             "Q1" = c(7,7),
                             "Q2" = c(7,7),
                             "Q3" = c(7,7),
                             "Q4" = c(0,7))

I tried to achieve this using the reshape() function:

DataTransform <- reshape(df, idvar = "About", v.names = "Ratings", timevar = "Questions", direction = "wide")

Can anyone help me get the expected output by reshaping the given dataframe?

Thanks in advance!

A base R approach:

df2 <- df  # copy df so that the original stays unchanged

First, create a new column, Standard, by filling each NA value in About with the last non-NA value above it. Each block of question rows sits directly below a "Std" row, so every rated row gets the right standard:

df2$Standard <- na.omit(df[,1])[cumsum(!is.na(df[,1]))] 
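The cumsum() indexing in the line above is a forward fill (last observation carried forward): cumsum(!is.na(x)) gives, for each position, the index of the most recent non-NA value. The one-liner relies on the first element being non-NA; if it were NA, its index would be 0, indexing with 0 drops the element, and the result would come out one element short. A sketch of the same idiom as a helper (fill_forward is a hypothetical name) that keeps leading NAs as NA instead:

```r
# Forward fill via cumsum(): map each position to the index of the most
# recent non-NA value, then look that value up among the non-NA entries
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))  # running count of non-NA values seen so far
  idx[idx == 0] <- NA       # leading NAs have nothing to carry forward
  x[!is.na(x)][idx]         # NA index -> NA, otherwise the carried value
}

fill_forward(c("Ram", "Std 8", NA, NA, NA, "John"))
fill_forward(c(NA, "Std 8", NA))
```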

Next, set the entries of About that contain "Std" to NA, fill About forward in the same way so that every row carries the student's name, and keep only the rows that have a rating. This gives finaldf:

df2[grepl("Std",df2[,1]),1] <- NA
df2[,1] <- na.omit(df2[,1])[cumsum(!is.na(df2[,1]))] 
finaldf <- df2[!is.na(df2[,"Ratings"]),]

   About Questions Ratings Standard
3    Ram        Q1       7    Std 8
4    Ram        Q2       7    Std 8
5    Ram        Q3       7    Std 8
8   John        Q1       7    Std 9
9   John        Q2       7    Std 9
10  John        Q3       7    Std 9
11  John        Q4       7    Std 9

The rest is the same reshape() call you tried, followed by replacing the missing rating with 0 and renaming the columns:

out <- reshape(finaldf, idvar = "About", v.names = "Ratings", timevar = "Questions", direction = "wide")
out[is.na(out)] <- 0
colnames(out) <- c("About","Standard","Q1","Q2","Q3","Q4")

which gives:

  About Standard Q1 Q2 Q3 Q4
3   Ram    Std 8  7  7  7  0
8  John    Std 9  7  7  7  7
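The steps of this answer can also be collected into one function. This is a sketch under the same assumption that every name row is directly followed by its "Std" row; reshape_ratings and fill_forward are hypothetical names. Two small changes from the code above: Standard is filled only from the "Std" rows, and the column names are derived with sub() instead of being hard-coded, so an extra question (say Q5) would not break the renaming:

```r
# Hypothetical wrapper around the base R steps above
reshape_ratings <- function(df) {
  # Replace each NA with the last preceding non-NA value (leading NAs stay NA)
  fill_forward <- function(x) {
    idx <- cumsum(!is.na(x))
    idx[idx == 0] <- NA
    x[!is.na(x)][idx]
  }
  is_std <- grepl("Std", df$About)  # rows holding the standard, e.g. "Std 8"
  df$Standard <- fill_forward(ifelse(is_std, df$About, NA))
  df$About <- fill_forward(ifelse(is_std, NA, df$About))
  rated <- df[!is.na(df$Ratings), ]  # keep only the question rows
  out <- reshape(rated, idvar = "About", v.names = "Ratings",
                 timevar = "Questions", direction = "wide")
  out[is.na(out)] <- 0  # unanswered questions become 0
  # "Ratings.Q1" -> "Q1", etc., instead of hard-coding the column names
  names(out) <- sub("^Ratings\\.", "", names(out))
  rownames(out) <- NULL
  out
}

df <- data.frame("About" = c("Ram","Std 8",NA,NA,NA,"John", "Std 9", NA, NA,NA,NA),
                 "Questions" = c(NA,NA,"Q1","Q2","Q3",NA,NA,"Q1","Q2","Q3","Q4"),
                 "Ratings" = c(NA,NA,7,7,7,NA,NA,7,7,7,7), stringsAsFactors = FALSE)
out <- reshape_ratings(df)
print(out)
```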

Here is a concise and clean tidyverse approach. It relies on two assumptions:

  1. A student's name is always followed, in the next row, by a string containing "Std". (If there are other patterns as well, you can extend this approach by adding them to the str_detect() call.)

  2. All other rows of About are NA.

Further, your expected output suggests that you want a question without a rating (Ram has no Q4) to be treated as 0. If you prefer NA instead, drop the values_fill argument from pivot_wider().

library(tidyverse)

df <- data.frame("About" = c("Ram","Std 8",NA,NA,NA,"John", "Std 9", NA, NA,NA,NA),
                "Questions" = c(NA,NA,"Q1","Q2","Q3",NA,NA,"Q1","Q2","Q3","Q4"),
                "Ratings" = c(NA,NA,7,7,7,NA,NA,7,7,7,7), stringsAsFactors = FALSE)

df %>%
  mutate(About = ifelse(str_detect(lead(About), "Std") & !is.na(About),
                       paste(About, lead(About)),
                       NA)) %>%
  fill(About) %>% 
  drop_na(Questions) %>% 
  pivot_wider(names_from = Questions,
              values_from = Ratings,
              values_fill = 0
  )

#> # A tibble: 2 x 5
#>   About         Q1    Q2    Q3    Q4
#>   <chr>      <dbl> <dbl> <dbl> <dbl>
#> 1 Ram Std 8      7     7     7     0
#> 2 John Std 9     7     7     7     7

Created on 2020-06-13 by the reprex package (v0.3.0)
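This result keeps the name and the standard together in one About column ("Ram Std 8"), whereas the expected output has separate About and Standard columns. A base R sketch of splitting them back apart, assuming every combined value contains exactly one " Std" marker and that names never contain "Std" (regmatches() silently drops values without a match, which would make the lengths disagree). The data frame below is only a stand-in with the values printed above:

```r
# Stand-in for the pivot_wider() result above (values copied from its output)
wide <- data.frame(About = c("Ram Std 8", "John Std 9"),
                   Q1 = c(7, 7), Q2 = c(7, 7), Q3 = c(7, 7), Q4 = c(0, 7),
                   stringsAsFactors = FALSE)

# The standard is the "Std ..." suffix; the name is everything before " Std"
standard <- regmatches(wide$About, regexpr("Std.*$", wide$About))
about    <- sub(" Std.*$", "", wide$About)

result <- data.frame(About = about, Standard = standard, wide[, -1],
                     stringsAsFactors = FALSE)
print(result)
```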

Before using reshape() or pivot_wider(), we need to bring the data into a shape suitable for that transformation.

library(tidyverse) #for all the awesome packages
library(janitor) #to clean names


df <- data.frame("About" = c("Ram","Std 8",NA,NA,NA,"John", "Std 9", NA, NA,NA,NA),
                 "Questions" = c(NA,NA,"Q1","Q2","Q3",NA,NA,"Q1","Q2","Q3","Q4"),
                 "Ratings" = c(NA,NA,7,7,7,NA,NA,7,7,7,7), stringsAsFactors = FALSE)

df %>%
  as_tibble() -> df # I like to work with tibble

df
#> # A tibble: 11 x 3
#>    About Questions Ratings
#>    <chr> <chr>       <dbl>
#>  1 Ram   <NA>           NA
#>  2 Std 8 <NA>           NA
#>  3 <NA>  Q1              7
#>  4 <NA>  Q2              7
#>  5 <NA>  Q3              7
#>  6 John  <NA>           NA
#>  7 Std 9 <NA>           NA
#>  8 <NA>  Q1              7
#>  9 <NA>  Q2              7
#> 10 <NA>  Q3              7
#> 11 <NA>  Q4              7


# About and Questions are never filled in the same row, so one column can go: the call below shifts each row's non-NA values to the left

t(apply(df, 1, function(x) c(x[!is.na(x)], x[is.na(x)]))) -> df[] 

df
#> # A tibble: 11 x 3
#>    About Questions Ratings
#>    <chr> <chr>     <chr>  
#>  1 Ram    <NA>     <NA>   
#>  2 Std 8  <NA>     <NA>   
#>  3 Q1    " 7"      <NA>   
#>  4 Q2    " 7"      <NA>   
#>  5 Q3    " 7"      <NA>   
#>  6 John   <NA>     <NA>   
#>  7 Std 9  <NA>     <NA>   
#>  8 Q1    " 7"      <NA>   
#>  9 Q2    " 7"      <NA>   
#> 10 Q3    " 7"      <NA>   
#> 11 Q4    " 7"      <NA>
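A possible simplification of the apply() step, sketched in base R under the assumption (true for this data) that About and Questions are never both filled in the same row: merge the two columns with ifelse(). Unlike apply(), which goes through a character matrix and turns 7 into the padded string " 7", this keeps the ratings numeric, so the parse_number() step is no longer needed. The column names mirror df1 below:

```r
# Same data as in the question
df <- data.frame(About = c("Ram","Std 8",NA,NA,NA,"John","Std 9",NA,NA,NA,NA),
                 Questions = c(NA,NA,"Q1","Q2","Q3",NA,NA,"Q1","Q2","Q3","Q4"),
                 Ratings = c(NA,NA,7,7,7,NA,NA,7,7,7,7),
                 stringsAsFactors = FALSE)

# Take About where it is present, otherwise the question label
key <- ifelse(is.na(df$About), df$Questions, df$About)

# Same two-column layout as df1; the ratings stay numeric
df1 <- data.frame(about = key, questions = df$Ratings, stringsAsFactors = FALSE)
print(df1)
```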


df %>% 
  clean_names() %>%  # lower-case column names
  dplyr::select(-ratings) %>% # drop ratings, which is now all NA
  mutate(questions = questions %>% parse_number()) -> df1 # turn " 7" into the number 7


df1
#> # A tibble: 11 x 2
#>    about questions
#>    <chr>     <dbl>
#>  1 Ram          NA
#>  2 Std 8        NA
#>  3 Q1            7
#>  4 Q2            7
#>  5 Q3            7
#>  6 John         NA
#>  7 Std 9        NA
#>  8 Q1            7
#>  9 Q2            7
#> 10 Q3            7
#> 11 Q4            7

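# This loop builds a vector for the name column, which is appended to the df below.
# A row holds a name when neither it nor the next row (the "Std" row) has a number.

name <- character()
for (i in seq_len(nrow(df1))) {

  if (is.na(df1$questions[i])) {
    # i < nrow(df1) guards against reading past the last row
    if (i < nrow(df1) && is.na(df1$questions[i + 1])) {
      name <- c(name, df1$about[i])
    } else {
      name <- c(name, NA)
    }
  } else {
    name <- c(name, NA)
  }

}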

name 
#>  [1] "Ram"  NA     NA     NA     NA     "John" NA     NA     NA     NA    
#> [11] NA
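The loop can also be written without growing a vector one element at a time. A vectorized base R sketch of the same rule, using the about and questions columns of df1 as printed above (copied here as plain vectors so that the block runs on its own):

```r
# about and questions columns of df1, as printed above
about     <- c("Ram", "Std 8", "Q1", "Q2", "Q3", "John", "Std 9", "Q1", "Q2", "Q3", "Q4")
questions <- c(NA, NA, 7, 7, 7, NA, NA, 7, 7, 7, 7)

# TRUE where the next row has no number; the last row has no next row
next_is_na <- c(is.na(questions[-1]), FALSE)

# A name row has no number itself and is followed by the "Std" row
name <- ifelse(is.na(questions) & next_is_na, about, NA)
print(name)
```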

name %>% 
  enframe(name = NULL, value = "name") -> name_df #converting vector to tibble

name_df 
#> # A tibble: 11 x 1
#>    name 
#>    <chr>
#>  1 Ram  
#>  2 <NA> 
#>  3 <NA> 
#>  4 <NA> 
#>  5 <NA> 
#>  6 John 
#>  7 <NA> 
#>  8 <NA> 
#>  9 <NA> 
#> 10 <NA> 
#> 11 <NA>

df1 %>% 
  bind_cols(name_df)%>% #binding the new column to the original df
  mutate(std = ifelse(is.na(questions) & is.na(name), about, NA)) %>% # mutating a new column for standard
  fill(name) %>% # this will fill the NA with non NA previous value
  fill(std) %>% 
  drop_na(questions) %>% # dropping unnecessary rows
  pivot_wider(names_from = "about", values_from = "questions") -> final_df # now I can use pivot_wider to get the expected result

final_df
#> # A tibble: 2 x 6
#>   name  std      Q1    Q2    Q3    Q4
#>   <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Ram   Std 8     7     7     7    NA
#> 2 John  Std 9     7     7     7     7

Created on 2020-06-13 by the reprex package (v0.3.0)
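In final_df, Ram's Q4 is NA, whereas the expected output has 0; adding values_fill = 0 to the pivot_wider() call, as in the previous answer, handles that. For comparison, a base R sketch of the same cross-tabulation with a fill value, using tapply() and its default argument. It assumes a long table with one row per student and question, like final_df before pivoting; the std column is left out here and would have to be joined back separately, and the rows come out sorted alphabetically:

```r
# Long table: one row per student (name) and question (about),
# with the rating in `questions`, as in final_df before pivot_wider()
long <- data.frame(name      = c("Ram", "Ram", "Ram", "John", "John", "John", "John"),
                   about     = c("Q1", "Q2", "Q3", "Q1", "Q2", "Q3", "Q4"),
                   questions = c(7, 7, 7, 7, 7, 7, 7),
                   stringsAsFactors = FALSE)

# Cross-tabulate name x question; pairs without a row get default = 0, not NA
wide <- tapply(long$questions, list(long$name, long$about), sum, default = 0)
print(wide)
```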
