
How to combine multiple character columns into a single column in an R data frame

I am working with Census data and I need to combine four character columns into a single column.

Example:

LOGRECNO STATE COUNTY  TRACT BLOCK
    60    01    001  021100  1053
    61    01    001  021100  1054
    62    01    001  021100  1055
    63    01    001  021100  1056
    64    01    001  021100  1057
    65    01    001  021100  1058

I want to create a new column that concatenates the strings in STATE, COUNTY, TRACT, and BLOCK into a single string. Example:

LOGRECNO STATE COUNTY  TRACT BLOCK  BLOCKID
    60    01    001  021100  1053   010010211001053
    61    01    001  021100  1054   010010211001054
    62    01    001  021100  1055   010010211001055
    63    01    001  021100  1056   010010211001056
    64    01    001  021100  1057   010010211001057
    65    01    001  021100  1058   010010211001058

I've tried:

AL_Blocks$BLOCK_ID<- paste(c(AL_Blocks$STATE, AL_Blocks$County, AL_Blocks$TRACT,    AL_Blocks$BLOCK), collapse = "")

But this combines all rows of all four columns into a single string.

You can use do.call and paste0. Try:

AL_Blocks$BLOCK_ID <- do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])

Example output:

do.call(paste0, AL_Blocks[c("STATE", "COUNTY", "TRACT", "BLOCK")])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
do.call(paste0, AL_Blocks[2:5])
# [1] "010010211001053" "010010211001054" "010010211001055" "010010211001056"
# [5] "010010211001057" "010010211001058"
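
For readers unfamiliar with do.call, here is a minimal sketch of why this works. A data frame is a list of columns, so do.call hands each selected column to paste0 as a separate argument, and paste0 then concatenates them element-wise, row by row. The toy_blocks data frame is a made-up two-row stand-in, not the asker's data.

```r
# Hypothetical two-row stand-in for AL_Blocks
toy_blocks <- data.frame(
  STATE  = c("01", "01"),
  COUNTY = c("001", "001"),
  TRACT  = c("021100", "021100"),
  BLOCK  = c("1053", "1054"),
  stringsAsFactors = FALSE
)

cols <- c("STATE", "COUNTY", "TRACT", "BLOCK")

# do.call() spreads the list of columns into separate arguments ...
via_do_call <- do.call(paste0, toy_blocks[cols])

# ... which is the same as spelling the call out by hand
by_hand <- paste0(toy_blocks$STATE, toy_blocks$COUNTY,
                  toy_blocks$TRACT, toy_blocks$BLOCK)

identical(via_do_call, by_hand)
# [1] TRUE
```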

You can also use unite from "tidyr", like this:

library(tidyr)
library(dplyr)
AL_Blocks %>% 
  unite(BLOCK_ID, STATE, COUNTY, TRACT, BLOCK, sep = "", remove = FALSE)
#   LOGRECNO        BLOCK_ID STATE COUNTY  TRACT BLOCK
# 1       60 010010211001053    01    001 021100  1053
# 2       61 010010211001054    01    001 021100  1054
# 3       62 010010211001055    01    001 021100  1055
# 4       63 010010211001056    01    001 021100  1056
# 5       64 010010211001057    01    001 021100  1057
# 6       65 010010211001058    01    001 021100  1058

where "AL_Blocks" is provided as:

AL_Blocks <- structure(list(LOGRECNO = c("60", "61", "62", "63", "64", "65"), 
    STATE = c("01", "01", "01", "01", "01", "01"), COUNTY = c("001", "001", 
    "001", "001", "001", "001"), TRACT = c("021100", "021100", "021100", 
    "021100", "021100", "021100"), BLOCK = c("1053", "1054", "1055", "1056",
    "1057", "1058")), .Names = c("LOGRECNO", "STATE", "COUNTY", "TRACT", 
    "BLOCK"), class = "data.frame", row.names = c(NA, -6L))

Try this:

AL_Blocks$BLOCK_ID<- with(AL_Blocks, paste0(STATE, COUNTY, TRACT, BLOCK))

There was a typo in County; it should have been COUNTY. Also, you don't need the collapse parameter.
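
To illustrate the point about collapse, here is a small sketch with made-up vectors (state and county are hypothetical names, not part of the original data). Wrapping the columns in c() stacks them into one long vector, and collapse then joins every element together. Passing the columns as separate arguments pastes them element-wise instead.

```r
# Hypothetical example vectors, two "rows" each
state  <- c("01", "02")
county <- c("001", "003")

# c() stacks both columns into one vector; collapse joins all elements
collapsed <- paste(c(state, county), collapse = "")
collapsed
# [1] "0102001003"

# Separate arguments are pasted element-wise, one result per row
row_wise <- paste0(state, county)
row_wise
# [1] "01001" "02003"
```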

I hope that helps.

You can try this too:

AL_Blocks <- transform(AL_Blocks, BLOCKID = paste(STATE, COUNTY,
                       TRACT, BLOCK, sep = ""))

Or try this:

DF$BLOCKID <-
  paste(DF$STATE, DF$COUNTY, 
        DF$TRACT, DF$BLOCK, sep = "")

(Here is a method to set up the data frame for people coming into this discussion later.)

DF <- 
  data.frame(LOGRECNO = c(60, 61, 62, 63, 64, 65),
             STATE = c(1, 1, 1, 1, 1, 1),
             COUNTY = c(1, 1, 1, 1, 1, 1), 
             TRACT = c(21100, 21100, 21100, 21100, 21100, 21100), 
             BLOCK = c(1053, 1054, 1055, 1056, 1057, 1058))
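
Note that this DF stores the codes as numbers, so the leading zeros are gone: pasting STATE, COUNTY, TRACT, and BLOCK yields "11211001053" rather than "010010211001053". One possible way to restore fixed widths is sprintf. This is only a sketch; the widths used below (2, 3, 6, and 4 digits) are an assumption read off the example in the question, and padded_id is a hypothetical name.

```r
# Two rows of the numeric data frame from above
DF <- data.frame(STATE  = c(1, 1),
                 COUNTY = c(1, 1),
                 TRACT  = c(21100, 21100),
                 BLOCK  = c(1053, 1054))

# Plain pasting of numeric columns drops the leading zeros
naive_id <- paste0(DF$STATE, DF$COUNTY, DF$TRACT, DF$BLOCK)
naive_id
# [1] "11211001053" "11211001054"

# %0Nd pads each whole number with zeros to N digits (assumed widths)
padded_id <- sprintf("%02d%03d%06d%04d",
                     DF$STATE, DF$COUNTY, DF$TRACT, DF$BLOCK)
padded_id
# [1] "010010211001053" "010010211001054"
```

If the columns are read in as character in the first place, as in the AL_Blocks example, no padding is needed.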

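You can use the tidyverse packages (unite joins with "_" by default, so set sep = "" to concatenate directly):

library(tidyverse)
DF %>% unite(new_var, STATE, COUNTY, TRACT, BLOCK, sep = "")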

The new kid on the block is the glue package:

library(glue)

# glue_data() evaluates the template against the columns of my_data
glue_data(my_data, "{STATE}{COUNTY}{TRACT}{BLOCK}")

You can both write and read text files with any specified string separator, not necessarily a single-character separator. This is useful when the data contains practically every available symbol, so that no single symbol can serve as a separator. Here are examples of write and read functions:

Write out text with a special separator:

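writeSepText <- function(df, fileName, separator) {
    # Join the fields of each row with the separator string, one line per row
    lines <- apply(df, 1, paste, collapse = separator)
    con <- file(fileName, open = "w")
    on.exit(close(con))  # close the connection even if writing fails
    writeLines(lines, con)
    invisible(NULL)
}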

Test writing out a text file separated by the string "<break>":

writeSepText(df=as.data.frame(Titanic), fileName="/Users/user/break_sep.txt", separator="<break>")

Read in text files with a special separator string:

readSepText <- function(fileName, separator) {
    lines <- readLines(fileName)
    # fixed = TRUE treats the separator as a literal string, not a regex
    records <- strsplit(lines, split = separator, fixed = TRUE)
    # Assumes every line contains the same number of fields
    as.data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
}

Test reading in the text file separated by "<break>":

df <- readSepText(fileName="/Users/user/break_sep.txt", separator="<break>"); df
