
How to read a text file whose variables are not stored on the same row, and that lacks a standard delimiter from column to column, into R?

I am trying to read a text file (https://www.bls.gov/bdm/us_age_naics_00_table5.txt) into R, but I am not sure how to parse it. As you can see, the column names (years) are not all located on the same row, and the spacing between values is not consistent from column to column. I am familiar with read.csv() and read.delim(), but I'm not sure how to read a complex file like this one.

Here is a manual parse:

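library(readr)

# Read the raw file as a character vector, one element per line
string = read_lines(file = "https://www.bls.gov/bdm/us_age_naics_00_table5.txt")
string = string[nchar(string) != 0]  # drop empty lines
string = string[-c(1, 2)]            # don't contain information
string = string[string != " "]       # drop lines holding a single space
string = string[-151]                # footnote
# The table is stacked in blocks of 30 lines; put each block in its own column
sMatrix = matrix(string, nrow = 30)
# Parse each block as a whitespace-delimited table; lapply always returns a list
dfList = lapply(1:ncol(sMatrix), function(x) readr::read_table(paste(sMatrix[, x])))
df = do.call(cbind, dfList)
df = df[, !duplicated(colnames(df))] # removes columns with duplicate names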
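
To see the idea behind the manual parse without downloading the file, here is a minimal base R sketch on a made-up sample. The sample lines, the `Year` label, the block size of 3, and the helper name `parse_block` are all assumptions for illustration; the real file uses blocks of 30 lines and has a different layout.

```r
# Toy stand-in for the stacked layout: two blocks of 3 lines each
# (1 header line + 2 data lines). This sample is hypothetical.
lines <- c(
  "Year  2000  2001",
  "1994  1,200 1,100",
  "1995  _     1,300",
  "Year  2002  2003",
  "1994  1,000 900",
  "1995  1,250 1,150"
)

block_size <- 3
# matrix() fills column-wise, so each column holds one block of lines
blocks <- matrix(lines, nrow = block_size)

# Hypothetical helper: parse one block as a whitespace-delimited table,
# keeping every field as text so "1,200" and "_" survive untouched
parse_block <- function(block_lines) {
  read.table(text = block_lines, header = TRUE,
             colClasses = "character", check.names = FALSE)
}

parsed <- lapply(seq_len(ncol(blocks)), function(j) parse_block(blocks[, j]))

# Put the blocks side by side, then drop the repeated key column
wide <- do.call(cbind, parsed)
wide <- wide[, !duplicated(colnames(wide)), drop = FALSE]

print(wide)
```

Keeping everything as character at this stage defers number formatting to a separate cleaning step, which is the same split the answer uses.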

If you then want to recode "_" as NA and format the numbers:

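df[df == "_"] = NA                                            # recode "_" as missing
df = as.data.frame(sapply(df, function(x) gsub(",", "", x)))  # strip thousands separators
# TRUE for every column that can be converted to numeric without any NAs, e.g. column 1 can't
i <- apply(df, 2, function(x) !any(is.na(suppressWarnings(as.numeric(na.omit(x))))))
df[i] = lapply(df[i], as.numeric)                             # df[i] also works when only one column qualifies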
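
The cleaning step can be tried on its own with a small made-up data frame. This is a sketch under assumptions: the column names, the values, and the name `is_numeric_col` are hypothetical, and it uses `vapply` in place of `apply` so the result is always a logical vector.

```r
# Hypothetical data frame mimicking the parsed text columns
df <- data.frame(
  label  = c("March 1994", "March 1995"),
  `2000` = c("1,200", "_"),
  `2001` = c("1,100", "1,300"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Recode the placeholder "_" as missing
df[df == "_"] <- NA

# Strip thousands separators; df[] keeps the data frame structure and names
df[] <- lapply(df, function(x) gsub(",", "", x, fixed = TRUE))

# A column qualifies if all of its non-missing values convert cleanly
is_numeric_col <- vapply(df, function(x) {
  vals <- x[!is.na(x)]
  length(vals) > 0 && !anyNA(suppressWarnings(as.numeric(vals)))
}, logical(1))

# Convert only the qualifying columns; the text label column is left alone
df[is_numeric_col] <- lapply(df[is_numeric_col], as.numeric)

str(df)
```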

