I've spent a fair amount of time trying to figure out my problem, but I couldn't, so I decided to ask here. I have a data set from a survey in which each household has a different identity number. In another column, the individuals within that household are numbered.
Household ID Individuals
173 1
174 1
174 2
175 1
175 2
175 3
What I would like to do is create a new column that refers to the two other columns as follows: if there is just one individual in the household, I want the Household ID (173); if there is more than one individual, I want the Household ID for the first individual (174), Household ID + B for the second one (e.g. 174B), and so on. I have used ifelse but didn't get exactly what I want. Namely:
Household ID Individuals New Column
173 1 173
174 1 174
174 2 174B
175 1 175
175 2 175B
175 3 175C
Thanks in advance.
If we want the output with LETTERS at the end, group by 'HouseholdID' and then paste the 'HouseholdID' with the matching LETTERS based on the 'Individuals' sequence:
library(dplyr)
library(stringr)
df1 %>%
    group_by(HouseholdID) %>%
    mutate(NewColumn = if(n() > 1) c(HouseholdID[1],
              str_c(HouseholdID[-1], LETTERS[Individuals[-1]]))
          else as.character(HouseholdID))
# A tibble: 6 x 3
# Groups: HouseholdID [3]
# HouseholdID Individuals NewColumn
# <int> <int> <chr>
#1 173 1 173
#2 174 1 174
#3 174 2 174B
#4 175 1 175
#5 175 2 175B
#6 175 3 175C
Or it can also be done with make.unique:
df1$NewColumn <- make.unique(as.character(df1$HouseholdID))
Here, instead of LETTERS at the end, duplicates get a numeric suffix (.1, .2, ...), e.g. 174.1 for the second individual in household 174.
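To see what the make.unique suffixes look like, here is a small base R sketch. The vector `ids` is a hypothetical stand-in for df1$HouseholdID, not part of the original answer.

```r
# Hypothetical stand-in for df1$HouseholdID
ids <- c(173L, 174L, 174L, 175L, 175L, 175L)

# make.unique() leaves the first occurrence unchanged and appends
# ".1", ".2", ... to later duplicates (the default separator is ".")
new_ids <- make.unique(as.character(ids))
print(new_ids)
# [1] "173"   "174"   "174.1" "175"   "175.1" "175.2"
```

So this route gives unique labels, but not the B/C lettering asked for in the question.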
Data:
df1 <- structure(list(HouseholdID = c(173L, 174L, 174L, 175L, 175L, 175L),
    Individuals = c(1L, 1L, 2L, 1L, 2L, 3L)),
    class = "data.frame", row.names = c(NA, -6L))
case_when in the dplyr package is a good choice for multiple ifelse:
library(tidyverse)
library(stringr)
df %>% mutate(New = case_when(Individuals == 1 ~ str_c(Household_ID, "", sep = ""),
                              Individuals == 2 ~ str_c(Household_ID, "B", sep = ""),
                              Individuals == 3 ~ str_c(Household_ID, "C", sep = "")))
And here is the result I get:
Household_ID Individuals New
1 173 1 173
2 174 1 174
3 174 2 174B
4 175 1 175
5 175 2 175B
6 175 3 175C
PS: Here is the data, in case you need it.
library(data.table)
df = fread("Household_ID Individuals
173 1
174 1
174 2
175 1
175 2
175 3")
But if there are a lot of unique values in Individuals, you can create a helper column with the letter matching each Individuals value, combine it with Household_ID in another column, and drop the helper column at the end.
df %>%
  mutate(Letter = LETTERS[Individuals]) %>%
  mutate(New = ifelse(Individuals != 1,
                      str_c(Household_ID, Letter, sep = ""),
                      Household_ID)) %>%
  select(-Letter)
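If you would rather avoid extra packages, the same letter-lookup idea can be sketched in base R. This is only an illustration: the data frame is rebuilt here with data.frame() so the snippet is self-contained, and `suffix` is a hypothetical helper name.

```r
# Same data as in the question, with the column names used in this answer
df <- data.frame(Household_ID = c(173L, 174L, 174L, 175L, 175L, 175L),
                 Individuals  = c(1L, 1L, 2L, 1L, 2L, 3L))

# Empty suffix for the first individual, otherwise the matching letter
suffix <- ifelse(df$Individuals == 1, "", LETTERS[df$Individuals])

# paste0() converts the integer ID to character and appends the suffix
df$New <- paste0(df$Household_ID, suffix)
print(df)
```

Note that LETTERS only has 26 entries, so this assumes no household has more than 26 individuals.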
Hope this helps!