I have a dataframe like:
library(tidyverse)
df_mess <- tibble::tribble(
  ~id, ~value, ~answer_text,
  123, 25, "age",
  123, NA, "female",
  234, 29, "age",
  234, NA, "male",
  345, 14, "age",
  345, NA, "female"
)
I would like to reshape it into "tidy" data, i.e. one row per observation (here, one row per id):
df <- tibble::tribble(
  ~id, ~age, ~sex,
  123, 25, "female",
  234, 29, "male",
  345, 14, "female"
)
I tried a version of gather/spread, but I had no luck.
Any lead is appreciated.
If the structure of your data is always the same, I would do something like:
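# copy the next row's answer_text (the sex label) onto the current row
df_mess$new <- dplyr::lead(df_mess$answer_text)
# keep only the rows that hold an age; rows with NA in value are dropped
df_mess <- subset(df_mess, value > 0)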
but this only works for this particular layout, where every age row is immediately followed by the matching sex row.
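To get all the way to the target shape with the same idea, something like the following might work. It is only a sketch and rests on the same assumption as above: within each id, the sex row comes directly after the age row. The column names age and sex are taken from the desired output in the question.

```r
library(dplyr)
library(tibble)

df_mess <- tribble(
  ~id, ~value, ~answer_text,
  123, 25, "age",
  123, NA, "female",
  234, 29, "age",
  234, NA, "male",
  345, 14, "age",
  345, NA, "female"
)

df <- df_mess %>%
  group_by(id) %>%
  # assumption: within each id the sex row directly follows the age row,
  # so lead() pulls the sex label up onto the age row
  mutate(sex = lead(answer_text)) %>%
  ungroup() %>%
  # keep only the rows that carry the age value
  filter(answer_text == "age") %>%
  # drop the helper columns and rename value to age
  select(id, age = value, sex)

df
```

Grouping by id before calling lead() keeps a label from leaking across ids if one id happens to be missing its sex row.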
Here is a solution with spread and gather. The spread will get all variables, like age, where the name of the variable appears in the answer_text column. If the values of the variable are in the answer_text column (like sex in this case), you will need to gather these back, as below.
In order to get the sex column to work, I changed the NAs in value to -99. You could use any value, though. If you spread without something in the value column, it will show as NA in the female and male columns that are created from the spread, and the gather with na.rm = TRUE would then drop those rows.
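# replace NA with a placeholder so spread leaves a non-NA marker in female/male
df_mess[is.na(df_mess)] <- -99
df_mess %>%
  spread(answer_text, value) %>%
  # collapse the female/male columns back into one sex column
  gather(sex, temp, female, male, na.rm = TRUE) %>%
  select(-temp)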
Output:
# A tibble: 3 x 3
     id   age sex
  <dbl> <dbl> <chr>
1   123    25 female
2   345    14 female
3   234    29 male
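If you would rather not use a placeholder value at all, a different technique (not spread/gather) is to split the rows by kind and join them back on id. This is a minimal sketch, assuming the only sex labels are "female" and "male"; sex_labels, ages and sexes are names made up for this example.

```r
library(dplyr)
library(tibble)

df_mess <- tribble(
  ~id, ~value, ~answer_text,
  123, 25, "age",
  123, NA, "female",
  234, 29, "age",
  234, NA, "male",
  345, 14, "age",
  345, NA, "female"
)

# assumption: these are the only labels that encode sex
sex_labels <- c("female", "male")

# rows whose value column holds the age
ages <- df_mess %>%
  filter(answer_text == "age") %>%
  select(id, age = value)

# rows whose answer_text is itself the sex
sexes <- df_mess %>%
  filter(answer_text %in% sex_labels) %>%
  select(id, sex = answer_text)

# one row per id that has both an age row and a sex row
df <- inner_join(ages, sexes, by = "id")

df
```

Note that inner_join drops any id that is missing one of the two rows; left_join would keep it with an NA instead.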
Example with more variables and a legitimate NA in the size variable for id 123.
df_mess <- tibble::tribble(
  ~id, ~value, ~answer_text,
  123, 25, "age",
  123, NA, "female",
  234, 29, "age",
  234, NA, "male",
  345, 14, "age",
  345, NA, "female",
  123, NA, "brown",
  234, NA, "blonde",
  345, NA, "black",
  123, NA, "size",
  234, 30, "size",
  345, 40, "size",
)
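# placeholder for NA so the label columns survive the spread
df_mess[is.na(df_mess)] <- -99
df_clean <- df_mess %>%
  spread(answer_text, value) %>%
  # collapse female/male into sex
  gather(sex, temp, female, male, na.rm = TRUE) %>%
  select(-temp) %>%
  # collapse black/blonde/brown into hair
  gather(hair, temp, black:brown, na.rm = TRUE) %>%
  select(-temp)
# turn the placeholder back into NA (restores the missing size for id 123)
df_clean[df_clean == -99] <- NA
df_clean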
Output:
     id   age  size sex    hair
  <dbl> <dbl> <dbl> <chr>  <chr>
1   345    14    40 female black
2   234    29    30 male   blonde
3   123    25    NA female brown
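With a recent tidyr (1.0.0 or later), the same result can probably be reached without a placeholder by first labelling each row with the variable it belongs to and then using pivot_wider, which is the newer replacement for spread. This is a sketch under stated assumptions, not the method above: numeric_vars and sex_labels are hypothetical helper vectors, and every label that is neither numeric nor a sex label is assumed to be a hair colour.

```r
library(dplyr)
library(tidyr)
library(tibble)

df_mess <- tribble(
  ~id, ~value, ~answer_text,
  123, 25, "age",
  123, NA, "female",
  234, 29, "age",
  234, NA, "male",
  345, 14, "age",
  345, NA, "female",
  123, NA, "brown",
  234, NA, "blonde",
  345, NA, "black",
  123, NA, "size",
  234, 30, "size",
  345, 40, "size"
)

# assumptions: which labels name a numeric variable, and which encode sex
numeric_vars <- c("age", "size")
sex_labels <- c("female", "male")

df_clean <- df_mess %>%
  mutate(
    # decide which output column each row belongs to
    variable = case_when(
      answer_text %in% numeric_vars ~ answer_text,
      answer_text %in% sex_labels ~ "sex",
      TRUE ~ "hair"  # assumption: anything else is a hair colour
    ),
    # hold every value as character so both kinds fit in one column
    val = if_else(answer_text %in% numeric_vars,
                  as.character(value), answer_text)
  ) %>%
  select(id, variable, val) %>%
  pivot_wider(names_from = variable, values_from = val) %>%
  # convert the numeric variables back; the missing size stays NA
  mutate(across(all_of(numeric_vars), as.numeric)) %>%
  arrange(id)

df_clean
```

Because the missing size for id 123 never gets overwritten with a placeholder, there is no need to convert anything back to NA at the end.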