[英]How to combine columns based on contingencies?
I have the following df:我有以下 df:
SUMLEV STATE COUNTY AGEGRP TOT_POP TOT_MALE
50 1 1 0 55601 26995
50 7 33 0 218022 105657
50 14 500 0 24881 13133
50 4 70 0 22400 11921
50 3 900 0 57840 28500
50 22 11 0 10138 5527
I would like to make a new columns named CODE
based on the columns state
and county
.我想根据列
state
和county
创建一个名为CODE
的新列。 I would like to paste the number from state
to the number from county
.我想将
state
的号码粘贴到county
的号码中。 However, if county is a single or double digit number, I would like it to have zeroes before it, like 001
and 033
.但是,如果县是一位数或两位数,我希望它前面有零,例如
001
和033
。
Ideally the final df would look like:理想情况下,最终的 df 看起来像:
SUMLEV STATE COUNTY AGEGRP TOT_POP TOT_MALE CODE
50 1 1 0 55601 26995 1001
50 7 33 0 218022 105657 7033
50 14 500 0 24881 13133 14500
50 4 70 0 22400 11921 4070
50 3 900 0 57840 28500 3900
50 22 11 0 10138 5527 22011
Is there a short, elegant way of doing this?有没有一种简短而优雅的方式来做到这一点?
We can use sprintf
我们可以使用
sprintf
library(dplyr)
df %>%
mutate(CODE = sprintf('%d%03d', STATE, COUNTY))
# SUMLEV STATE COUNTY AGEGRP TOT_POP TOT_MALE CODE
#1 50 1 1 0 55601 26995 1001
#2 50 7 33 0 218022 105657 7033
#3 50 14 500 0 24881 13133 14500
#4 50 4 70 0 22400 11921 4070
#5 50 3 900 0 57840 28500 3900
#6 50 22 11 0 10138 5527 22011
If we need to split the column 'CODE' into two, we can use separate
如果我们需要将“CODE”列一分为二,我们可以使用
separate
library(tidyr)
df %>%
mutate(CODE = sprintf('%d%03d', STATE, COUNTY)) %>%
separate(CODE, into = c("CODE1", "CODE2"), sep= "(?=...$)")
Or extract
to capture substrings as a group或
extract
以捕获子串作为一个组
df %>%
mutate(CODE = sprintf('%d%03d', STATE, COUNTY)) %>%
extract(CODE, into = c("CODE1", "CODE2"), "^(.*)(...)$")
Or with str_pad
或者使用
str_pad
library(stringr)
df %>%
mutate(CODE = str_c(STATE, str_pad(COUNTY, width = 3, pad = '0')))
Or in base R
或者在
base R
df$CODE <- sprintf('%d%03d', df$STATE, df$COUNTY)
df <- structure(list(SUMLEV = c(50L, 50L, 50L, 50L, 50L, 50L), STATE = c(1L,
7L, 14L, 4L, 3L, 22L), COUNTY = c(1L, 33L, 500L, 70L, 900L, 11L
), AGEGRP = c(0L, 0L, 0L, 0L, 0L, 0L), TOT_POP = c(55601L, 218022L,
24881L, 22400L, 57840L, 10138L), TOT_MALE = c(26995L, 105657L,
13133L, 11921L, 28500L, 5527L)), class = "data.frame", row.names = c(NA,
-6L))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.