Reshaping a dataframe with NA values in R
I have a dataframe with NA values:
df <- data.frame("About" = c("Ram","Std 8",NA,NA,NA,"John", "Std 9", NA, NA,NA,NA),
"Questions" = c(NA,NA,"Q1","Q2","Q3",NA,NA,"Q1","Q2","Q3","Q4"),
"Ratings" = c(NA,NA,7,7,7,NA,NA,7,7,7,7), stringsAsFactors = FALSE)
The expected output is as follows:
expectedOutput <- data.frame("About" = c("Ram","John"),
"Standard" = c("Std 8", "Std 9"),
"Q1" = c(7,7),
"Q2" = c(7,7),
"Q3" = c(7,7),
"Q4" = c(0,7))
I tried to achieve this using the reshape function:
DataTransform <- reshape(df, idvar = "About", v.names = "Ratings", timevar = "Questions", direction = "wide")
Can anyone help me get the expected output by reshaping the given dataframe?
Thanks in advance!!
A base R approach:
df2 <- df # Assigning the df into a new one
To create a new column Standard by filling each NA value with the last non-NA value that appears before it,
df2$Standard <- na.omit(df[,1])[cumsum(!is.na(df[,1]))]
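The fill idiom used on the line above can be hard to read at first, so here is a minimal sketch of it on a toy vector (the vector x is made up for illustration and is not part of the question's data). It assumes the first element is not NA; a leading NA would produce an index of 0, which R silently drops, so the result would be shorter than the input.

```r
# Toy vector (hypothetical data) with gaps to fill
x <- c("a", NA, NA, "b", NA)

# cumsum(!is.na(x)) counts how many non-NA values have been seen so far,
# so each position points at the most recent non-NA value
idx <- cumsum(!is.na(x))

# Index into the vector of non-NA values with those counts
filled <- as.vector(na.omit(x))[idx]
filled
```

Here idx is 1 1 1 2 2, so the first three positions take "a" and the last two take "b".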
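Similarly, after setting the entries that contain Std to NA, fill the NA values in the About column with the last non-NA value and keep only the rows that have a rating, which gives finaldf: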
df2[grepl("Std",df2[,1]),1] <- NA
df2[,1] <- na.omit(df2[,1])[cumsum(!is.na(df2[,1]))]
finaldf <- df2[!is.na(df2[,"Ratings"]),]
About Questions Ratings Standard
3 Ram Q1 7 Std 8
4 Ram Q2 7 Std 8
5 Ram Q3 7 Std 8
8 John Q1 7 Std 9
9 John Q2 7 Std 9
10 John Q3 7 Std 9
11 John Q4 7 Std 9
The rest is partly the same as what you did with the reshape() function:
out <- reshape(finaldf, idvar = "About", v.names = "Ratings", timevar = "Questions", direction = "wide")
out[is.na(out)] <- 0
colnames(out) <- c("About","Standard","Q1","Q2","Q3","Q4")
which gives,
About Standard Q1 Q2 Q3 Q4
3 Ram Std 8 7 7 7 0
8 John Std 9 7 7 7 7
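Hard-coding the column names works for this data, but it would need editing if the number of questions changed. As a possible alternative, the "Ratings." prefix that reshape() adds to the widened columns can be stripped with sub(). The data frame below is a hypothetical stand-in for out before renaming, built by hand so the sketch is self-contained.

```r
# Hypothetical wide result carrying the default reshape() column names
out <- data.frame(About = c("Ram", "John"),
                  Standard = c("Std 8", "Std 9"),
                  Ratings.Q1 = c(7, 7),
                  Ratings.Q4 = c(0, 7))

# Strip the "Ratings." prefix from the widened columns; other names are untouched
names(out) <- sub("^Ratings\\.", "", names(out))
names(out)
```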
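Here is a neat and clean tidyverse approach. It relies on two assumptions:
1. After a student's name, the next row always contains a string with "Std". (If there are other patterns, you can extend this approach by adding them to the str_detect call.)
2. All other rows of About are NA.
Also, judging from your expected output, you seem to want missing ratings to be treated as 0. If you prefer NA, you can omit the values_fill argument from pivot_wider.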
library(tidyverse)
df <- data.frame("About" = c("Ram","Std 8",NA,NA,NA,"John", "Std 9", NA, NA,NA,NA),
"Questions" = c(NA,NA,"Q1","Q2","Q3",NA,NA,"Q1","Q2","Q3","Q4"),
"Ratings" = c(NA,NA,7,7,7,NA,NA,7,7,7,7), stringsAsFactors = FALSE)
df %>%
mutate(About = ifelse(str_detect(lead(About), "Std") & !is.na(About),
paste(About, lead(About)),
NA)) %>%
fill(About) %>%
drop_na(Questions) %>%
pivot_wider(names_from = Questions,
values_from = Ratings,
values_fill = 0
)
#> # A tibble: 2 x 5
#> About Q1 Q2 Q3 Q4
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Ram Std 8 7 7 7 0
#> 2 John Std 9 7 7 7 7
Created on 2020-06-13 by the reprex package (v0.3.0)
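To make the effect of values_fill in this answer concrete, here is a small sketch on made-up long data (the names long, id, question and rating are illustrative only). It assumes a tidyr version that accepts a scalar for values_fill, as the answer's own code does.

```r
library(tidyr)

# Hypothetical long data: student "B" has no Q2 rating
long <- data.frame(id = c("A", "A", "B"),
                   question = c("Q1", "Q2", "Q1"),
                   rating = c(7, 7, 7))

# Without values_fill, the missing id/question combination becomes NA
wide_na <- pivot_wider(long, names_from = question, values_from = rating)

# With values_fill = 0, the missing combination becomes 0 instead
wide_zero <- pivot_wider(long, names_from = question, values_from = rating,
                         values_fill = 0)
```

In wide_na the Q2 value for student "B" is NA, while in wide_zero it is 0.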
Before using reshape or pivot_wider, we need to transform the data into a shape suitable for that conversion.
library(tidyverse) #for all the awesome packages
library(janitor) #to clean names
df <- data.frame("About" = c("Ram","Std 8",NA,NA,NA,"John", "Std 9", NA, NA,NA,NA),
"Questions" = c(NA,NA,"Q1","Q2","Q3",NA,NA,"Q1","Q2","Q3","Q4"),
"Ratings" = c(NA,NA,7,7,7,NA,NA,7,7,7,7), stringsAsFactors = FALSE)
df %>%
as_tibble() -> df # I like to work with tibble
df
#> # A tibble: 11 x 3
#> About Questions Ratings
#> <chr> <chr> <dbl>
#> 1 Ram <NA> NA
#> 2 Std 8 <NA> NA
#> 3 <NA> Q1 7
#> 4 <NA> Q2 7
#> 5 <NA> Q3 7
#> 6 John <NA> NA
#> 7 Std 9 <NA> NA
#> 8 <NA> Q1 7
#> 9 <NA> Q2 7
#> 10 <NA> Q3 7
#> 11 <NA> Q4 7
# Shift the non-NA values of each row to the left (NAs go last), so the last column becomes empty and can be removed
t(apply(df, 1, function(x) c(x[!is.na(x)], x[is.na(x)]))) -> df[]
df
#> # A tibble: 11 x 3
#> About Questions Ratings
#> <chr> <chr> <chr>
#> 1 Ram <NA> <NA>
#> 2 Std 8 <NA> <NA>
#> 3 Q1 " 7" <NA>
#> 4 Q2 " 7" <NA>
#> 5 Q3 " 7" <NA>
#> 6 John <NA> <NA>
#> 7 Std 9 <NA> <NA>
#> 8 Q1 " 7" <NA>
#> 9 Q2 " 7" <NA>
#> 10 Q3 " 7" <NA>
#> 11 Q4 " 7" <NA>
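The row-wise shift used above may be easier to follow on a tiny example. The sketch below uses a made-up character matrix m. Note that apply() works on a matrix, so a data frame with mixed column types is converted to character first, which is presumably why the ratings show up as the string " 7" in the output above.

```r
# Hypothetical matrix with NAs in different positions
m <- matrix(c("Ram", NA,   NA,
              NA,    "Q1", "7"),
            nrow = 2, byrow = TRUE)

# For each row, put the non-NA values first and the NAs last;
# apply() returns each processed row as a column, so t() restores the layout
shifted <- t(apply(m, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
shifted
```

The first row is unchanged, and the second row becomes "Q1", "7", NA.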
df %>%
clean_names() %>% # no capitals
dplyr::select(-ratings) %>% # removing the extra columns
mutate(questions = questions %>% parse_number()) -> df1 # make the second column numeric
df1
#> # A tibble: 11 x 2
#> about questions
#> <chr> <dbl>
#> 1 Ram NA
#> 2 Std 8 NA
#> 3 Q1 7
#> 4 Q2 7
#> 5 Q3 7
#> 6 John NA
#> 7 Std 9 NA
#> 8 Q1 7
#> 9 Q2 7
#> 10 Q3 7
#> 11 Q4 7
# This loop builds a vector for the name column, which can then be appended to the df:
# a row holds a name when questions is NA on both that row and the next one
name <- character()
for(i in seq_len(nrow(df1))){
if(is.na(df1[i,2])){
if(is.na(df1[i+1,2])){
name <- c(name , as.character(df1[i,1]))
} else {
name <- c(name, NA)
}
} else {
name <- c(name, NA)
}
}
name
#> [1] "Ram" NA NA NA NA "John" NA NA NA NA
#> [11] NA
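The same vector can probably be built without a loop. The sketch below is one possible vectorized version, not the answer's own method; about, questions, next_q and name_vec are illustrative names, and the two input vectors are retyped from df1 so the block runs on its own.

```r
# Same values as the two columns of df1, retyped so the sketch is self-contained
about <- c("Ram", "Std 8", "Q1", "Q2", "Q3", "John", "Std 9", "Q1", "Q2", "Q3", "Q4")
questions <- c(NA, NA, 7, 7, 7, NA, NA, 7, 7, 7, 7)

# Value of questions on the following row (NA after the last row)
next_q <- c(questions[-1], NA)

# A name row has NA in questions on both the current and the next row
name_vec <- ifelse(is.na(questions) & is.na(next_q), about, NA)
name_vec
```

This relies on the same layout assumption as the loop: each name is immediately followed by its "Std" row.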
name %>%
enframe(name = NULL, value = "name") -> name_df #converting vector to tibble
name_df
#> # A tibble: 11 x 1
#> name
#> <chr>
#> 1 Ram
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 <NA>
#> 6 John
#> 7 <NA>
#> 8 <NA>
#> 9 <NA>
#> 10 <NA>
#> 11 <NA>
df1 %>%
bind_cols(name_df)%>% #binding the new column to the original df
mutate(std = ifelse(is.na(questions) & is.na(name), about, NA)) %>% # mutating a new column for standard
fill(name) %>% # this will fill the NA with non NA previous value
fill(std) %>%
drop_na(questions) %>% # dropping unnecessary rows
pivot_wider(names_from = "about", values_from = "questions") -> final_df # now I can use pivot_wider to get the expected result
final_df
#> # A tibble: 2 x 6
#> name std Q1 Q2 Q3 Q4
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Ram Std 8 7 7 7 NA
#> 2 John Std 9 7 7 7 7
Created on 2020-06-13 by the reprex package (v0.3.0)