
How to create a new column referring to another column?

I've spent a fair amount of time trying to figure this out, but I couldn't, so I decided to ask here. I have a data set from a survey in which each household has its own identity number. Another column gives the number of each individual within that household.

Household ID  Individuals
173           1 
174           1 
174           2
175           1
175           2
175           3

What I would like to do is create a new column based on these two columns. If a household has just one individual, I want the Household ID on its own (173). If it has more than one individual, I want the plain Household ID for the first individual (174), the Household ID + B for the second (e.g. 174B), the Household ID + C for the third, and so on. I have tried ifelse but didn't get exactly what I want, namely:

Household ID  Individuals  New Column
 173           1            173 
 174           1            174
 174           2            174B  
 175           1            175
 175           2            175B
 175           3            175C

Thanks in advance.

If we want the output with LETTERS at the end, group by 'HouseholdID' and then paste the 'HouseholdID' together with the element of LETTERS that matches each 'Individuals' value, leaving the first row of each group without a suffix:

library(dplyr)
library(stringr)
df1 %>% 
  group_by(HouseholdID) %>%
  mutate(NewColumn = if(n() > 1) c(HouseholdID[1], 
          str_c(HouseholdID[-1], LETTERS[Individuals[-1]]))
           else as.character(HouseholdID))
# A tibble: 6 x 3
# Groups:   HouseholdID [3]
#  HouseholdID Individuals NewColumn
#        <int>       <int> <chr>    
#1         173           1 173      
#2         174           1 174      
#3         174           2 174B     
#4         175           1 175      
#5         175           2 175B     
#6         175           3 175C     

Or it can also be done with make.unique:

df1$NewColumn <- make.unique(as.character(df1$HouseholdID))

Here, instead of LETTERS at the end, repeated IDs get a numeric suffix (.1, .2, ...), so the result is 174.1, 175.1, 175.2 and so on.
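
To make the difference from the LETTERS version concrete, here is a small sketch of what make.unique returns for the IDs in this example. The vector names `ids` and `suffixed` are only illustrative. Note that make.unique works purely from the order in which duplicates appear, so it ignores the Individuals column and only matches it when the rows are already sorted within each household.

```r
# Household IDs in the same order as the example data
ids <- c(173L, 174L, 174L, 175L, 175L, 175L)

# The first occurrence is left alone; later duplicates get ".1", ".2", ...
suffixed <- make.unique(as.character(ids))
print(suffixed)
# [1] "173"   "174"   "174.1" "175"   "175.1" "175.2"
```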

data

df1 <- structure(list(HouseholdID = c(173L, 174L, 174L, 175L, 175L, 
175L), Individuals = c(1L, 1L, 2L, 1L, 2L, 3L)), class = "data.frame", 
row.names = c(NA, 
-6L))

case_when in the dplyr package is a good choice when you would otherwise need several nested ifelse calls:

library(tidyverse) ; library(stringr)
df %>% mutate(New = case_when(Individuals == 1 ~ str_c(Household_ID, "", sep = ""),
                              Individuals == 2 ~ str_c(Household_ID, "B", sep = ""),
                              Individuals == 3 ~ str_c(Household_ID, "C", sep = "")))

And here is the result I get:

  Household_ID Individuals  New
1          173           1  173
2          174           1  174
3          174           2 174B
4          175           1  175
5          175           2 175B
6          175           3 175C

PS: Here is the data, in case you need it.

library(data.table)
df = fread("Household_ID  Individuals
            173           1 
            174           1 
            174           2
            175           1
            175           2
            175           3")

But if there are many distinct values in Individuals, writing one case_when branch per value gets tedious. Instead, you can create a helper column holding the letter that matches each Individuals value, build the new column by combining it with the Household ID, and drop the helper column at the end.

df %>% 
  mutate(Letter = LETTERS[Individuals]) %>%
  mutate(New = ifelse(Individuals != 1, 
                      str_c(Household_ID, Letter, sep = ""), 
                      Household_ID)) %>%
  select(-Letter)
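
For comparison, the same letter lookup can be sketched in base R without a helper column. This is only an illustration under some assumptions: it uses the column names HouseholdID and Individuals, rebuilds the example data inline, and the name `suffix` is made up for the sketch. It also assumes no household has more than 26 individuals, because LETTERS has only 26 elements and larger indices would give NA.

```r
# Example data, rebuilt here so the sketch runs on its own
df1 <- data.frame(HouseholdID = c(173L, 174L, 174L, 175L, 175L, 175L),
                  Individuals = c(1L, 1L, 2L, 1L, 2L, 3L))

# Empty suffix for the first individual, otherwise the matching letter
suffix <- ifelse(df1$Individuals == 1L, "", LETTERS[df1$Individuals])

# paste0 converts the integer IDs to character while appending the suffix
df1$NewColumn <- paste0(df1$HouseholdID, suffix)
print(df1$NewColumn)
# [1] "173"  "174"  "174B" "175"  "175B" "175C"
```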

Hope this helps!
