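How do I read a text file into R when the variables are not stored on the same row and there is no standard delimiter between columns?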

Question

I am trying to read a text file (https://www.bls.gov/bdm/us_age_naics_00_table5.txt) into R, but I am not sure how to go about parsing it. As you can see, the column names (years) are not all located on the same row, and the spacing between values is not consistent from column to column. I am familiar with read.csv() and read.delim(), but I'm not sure how to read a complex file like this one.

Answer 1

Here is a manual parse:

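library(readr)
# Read the report line by line; it is a fixed-layout text table, not a delimited file
string <- read_lines("https://www.bls.gov/bdm/us_age_naics_00_table5.txt")
string <- string[nchar(string) != 0]  # drop empty lines
string <- string[-c(1, 2)]            # don't contain information
string <- string[string != " "]       # drop lines holding a single space
string <- string[-151]                # footnote
# The hard-coded positions above and the block height below are specific to this file's layout.
# matrix() fills column by column, so each column of sMatrix holds one 30-line block
sMatrix <- matrix(string, nrow = 30)
# Collapse each block into one newline-separated string so read_table() treats it as literal data
dfList <- lapply(seq_len(ncol(sMatrix)), function(x)
  read_table(I(paste(sMatrix[, x], collapse = "\n"))))
df <- do.call(cbind, dfList)
df <- df[, !duplicated(colnames(df))]  # removes columns with duplicate names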
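To see what matrix(string, nrow = 30) and the per-block parse actually do, here is a minimal sketch on a made-up six-line report with two stacked 3-line blocks (hypothetical data, not the BLS file). It swaps in base R read.table(text = ...) for readr::read_table() so it runs without extra packages; the toy layout is an assumption chosen only to mirror the idea of repeated header rows.

```r
# Hypothetical report: two stacked 3-line blocks, each with its own header row
lines <- c(
  "Age 2000 2001",
  "a   1    2",
  "b   3    4",
  "Age 2002 2003",
  "a   5    6",
  "b   7    8"
)

# matrix() fills column by column, so each column holds one block
blocks <- matrix(lines, nrow = 3)

# Parse each block as whitespace-separated text with a header row
parsed <- lapply(seq_len(ncol(blocks)), function(j)
  read.table(text = paste(blocks[, j], collapse = "\n"),
             header = TRUE, check.names = FALSE))

# Bind the blocks side by side, then drop the repeated label column
df <- do.call(cbind, parsed)
df <- df[, !duplicated(names(df))]
print(df)
```

The result has a single Age column followed by the 2000 to 2003 columns. On the real file, the block height passed to matrix() must match the number of lines per block, which is why the answer hard-codes 30.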

If you then want to recode "_" as NA and format the numbers (remove the thousands separators and convert the numeric columns):

df[] <- lapply(df, function(x) {
  x <- gsub(",", "", x)   # strip thousands separators, e.g. "1,234" -> "1234"
  x[x == "_"] <- NA       # recode "_" as NA
  x
})
# Convert a column only if every non-missing value parses as a number (e.g. column 1 can't)
i <- vapply(df, function(x) !anyNA(suppressWarnings(as.numeric(na.omit(x)))), logical(1))
df[i] <- lapply(df[i], as.numeric)
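As a quick check of the cleaning step, here is a minimal sketch that wraps it in a function and runs it on a small made-up data frame. The helper name clean_columns() and the toy values are hypothetical, chosen only to show a label column that stays character next to a year column with a comma and a "_" placeholder.

```r
# Hypothetical helper wrapping the cleaning steps from the answer
clean_columns <- function(df) {
  df[] <- lapply(df, function(x) {
    x <- gsub(",", "", x)   # "1,234" -> "1234"
    x[x == "_"] <- NA       # "_" marks a missing value
    x
  })
  # TRUE for columns whose non-missing values all parse as numbers
  is_num <- vapply(df, function(x) !anyNA(suppressWarnings(as.numeric(na.omit(x)))),
                   logical(1))
  df[is_num] <- lapply(df[is_num], as.numeric)
  df
}

# Made-up input shaped like the parsed table
toy <- data.frame(Age = c("a", "b", "c"),
                  `2000` = c("1,234", "_", "56"),
                  check.names = FALSE)
out <- clean_columns(toy)
str(out)
```

Here Age stays a character column because "a", "b", "c" cannot be parsed as numbers, while the 2000 column becomes the numeric vector c(1234, NA, 56).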

Answer 1
0 2021-05-27 16:59:38