How to combine multiple character columns into a single column in an R data frame
I am working with Census data and I need to combine four character columns into a single column.
Example:
LOGRECNO STATE COUNTY TRACT BLOCK
60 01 001 021100 1053
61 01 001 021100 1054
62 01 001 021100 1055
63 01 001 021100 1056
64 01 001 021100 1057
65 01 001 021100 1058
I want to create a new column that concatenates the strings in STATE, COUNTY, TRACT, and BLOCK into a single string. Example:
LOGRECNO STATE COUNTY TRACT BLOCK BLOCKID
60 01 001 021100 1053 010010211001053
61 01 001 021100 1054 010010211001054
62 01 001 021100 1055 010010211001055
63 01 001 021100 1056 010010211001056
64 01 001 021100 1057 010010211001057
65 01 001 021100 1058 010010211001058
I've tried:
AL_Blocks$BLOCK_ID<- paste(c(AL_Blocks$STATE, AL_Blocks$County, AL_Blocks$TRACT, AL_Blocks$BLOCK), collapse = "")
But this combines all rows of all four columns into a single string.
You can use do.call and paste0. Try:
AL_Blocks$BLOCK_ID <- do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
Example output:
do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
do.call(paste0, AL_Blocks[2:5])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
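Why this works: a data frame is a list of columns, so do.call passes each selected column to paste0 as a separate argument, and paste0 then concatenates element by element. A minimal sketch, using a two-row stand-in for AL_Blocks (the names `blocks` and `id_cols` are made up for illustration):

```r
# Two-row stand-in for AL_Blocks; all columns are character
blocks <- data.frame(
  STATE  = c("01", "01"),
  COUNTY = c("001", "001"),
  TRACT  = c("021100", "021100"),
  BLOCK  = c("1053", "1054"),
  stringsAsFactors = FALSE
)

id_cols <- c("STATE", "COUNTY", "TRACT", "BLOCK")

# do.call() spreads the list of columns out as separate arguments...
via_do_call <- do.call(paste0, blocks[id_cols])

# ...so it is the same as spelling the call out by hand
by_hand <- paste0(blocks$STATE, blocks$COUNTY, blocks$TRACT, blocks$BLOCK)

identical(via_do_call, by_hand)
# [1] TRUE
via_do_call
# [1] "010010211001053" "010010211001054"
```

Keeping the column names in a vector like `id_cols` makes it easy to change which columns are combined without rewriting the call.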
You can also use unite from "tidyr", like this:
library(tidyr)
library(dplyr)
AL_Blocks %>%
unite(BLOCK_ID, STATE, COUNTY, TRACT, BLOCK, sep = "", remove = FALSE)
# LOGRECNO BLOCK_ID STATE COUNTY TRACT BLOCK
# 1 60 010010211001053 01 001 021100 1053
# 2 61 010010211001054 01 001 021100 1054
# 3 62 010010211001055 01 001 021100 1055
# 4 63 010010211001056 01 001 021100 1056
# 5 64 010010211001057 01 001 021100 1057
# 6 65 010010211001058 01 001 021100 1058
where "AL_Blocks" is provided as:
AL_Blocks <- structure(list(LOGRECNO = c("60", "61", "62", "63", "64", "65"),
STATE = c("01", "01", "01", "01", "01", "01"), COUNTY = c("001", "001",
"001", "001", "001", "001"), TRACT = c("021100", "021100", "021100",
"021100", "021100", "021100"), BLOCK = c("1053", "1054", "1055", "1056",
"1057", "1058")), .Names = c("LOGRECNO", "STATE", "COUNTY", "TRACT",
"BLOCK"), class = "data.frame", row.names = c(NA, -6L))
Try this:
AL_Blocks$BLOCK_ID <- with(AL_Blocks, paste0(STATE, COUNTY, TRACT, BLOCK))
There was a typo in County; it should have been COUNTY (R names are case-sensitive). Also, you don't need the collapse parameter.
I hope that helps.
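To see why the original attempt produced one long string: c() first stacks the columns end to end into a single vector, and collapse then joins every element of that vector. (On a plain data frame, a misspelled column such as AL_Blocks$County is NULL, which c() silently drops.) A small sketch with made-up two-row vectors; `state` and `block` are illustrative names, not the question's data:

```r
state <- c("01", "01")
block <- c("1053", "1054")

# c() stacks the vectors end to end; collapse then joins all four elements
collapsed <- paste(c(state, block), collapse = "")
collapsed
# [1] "010110531054"

# Passing the vectors as separate arguments pastes them row by row
row_wise <- paste0(state, block)
row_wise
# [1] "011053" "011054"
```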
You can try this too
AL_Blocks <- transform(AL_Blocks, BLOCKID = paste(STATE, COUNTY,
TRACT, BLOCK, sep = ""))
Or try this
DF$BLOCKID <-
paste(DF$STATE, DF$COUNTY,
DF$TRACT, DF$BLOCK, sep = "")
(Here is a method to set up the data frame for people coming into this discussion later)
DF <-
data.frame(LOGRECNO = c(60, 61, 62, 63, 64, 65),
STATE = c(1, 1, 1, 1, 1, 1),
COUNTY = c(1, 1, 1, 1, 1, 1),
TRACT = c(21100, 21100, 21100, 21100, 21100, 21100),
BLOCK = c(1053, 1054, 1055, 1056, 1057, 1058))
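Note that this setup stores the codes as numbers, so the leading zeros in the question's data are gone: pasting STATE, COUNTY, TRACT and BLOCK as they are gives strings like 11211001053 rather than the 15-character ID. One possible fix, assuming the fixed widths seen in the question's data (2, 3, 6 and 4 digits), is to zero-pad with sprintf. The sketch below uses a two-row version of DF:

```r
DF <- data.frame(STATE  = c(1, 1),
                 COUNTY = c(1, 1),
                 TRACT  = c(21100, 21100),
                 BLOCK  = c(1053, 1054))

# %02d, %03d, %06d and %04d left-pad each code with zeros to a fixed width
DF$BLOCKID <- sprintf("%02d%03d%06d%04d",
                      as.integer(DF$STATE), as.integer(DF$COUNTY),
                      as.integer(DF$TRACT), as.integer(DF$BLOCK))
DF$BLOCKID
# [1] "010010211001053" "010010211001054"
```

If the data can be read in as character from the start, that avoids the padding step altogether.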
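You can use the tidyverse package:
library(tidyverse)
# sep = "" joins the values directly; the default separator is "_"
DF %>% unite(new_var, STATE, COUNTY, TRACT, BLOCK, sep = "")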
The new kid on the block is the glue package:
library(glue)
library(dplyr)  # provides %>%
# glue_data() looks up the {names} in the columns of the data frame
my_data %>%
glue_data("{STATE}{COUNTY}{TRACT}{BLOCK}")
You can both write and read text files with any specified string separator, not necessarily a single-character separator. This is very useful when the data contains practically every terminal symbol, so that no single symbol can be used as a separator. Here are examples of read and write functions:
writeSepText <- function(df, fileName, separator) {
# Join the fields of each row with the separator string
data <- apply(df, 1, paste, collapse = separator)
con <- file(fileName)
writeLines(data, con)
close(con)
invisible(NULL)
}
writeSepText(df=as.data.frame(Titanic), fileName="/Users/user/break_sep.txt", separator="<break>")
readSepText <- function(fileName, separator) {
data <- readLines(fileName)
# Split each line on the literal separator (fixed = TRUE disables regex matching)
records <- strsplit(data, split = separator, fixed = TRUE)
# Stack the records as rows; assumes every line has the same number of fields
dataFrame <- as.data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
rownames(dataFrame) <- seq_len(nrow(dataFrame))
return(dataFrame)
}
df <- readSepText(fileName="/Users/user/break_sep.txt", separator="<break>"); df