Reading the first 4 columns from multiple text files (which might have an unbalanced number of columns per row) in R
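I have text (csv) files like the following to read. The first four columns are the ones I am interested in, but there is a lot of junk after them. I only want to read the first four columns into R.
I want the first four columns, so that the output (the csv opened in Excel) looks like:
Because of SO's limits I cannot paste the whole file or attach it. Here is a smaller example to work with: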
type,latitude,longitude,name,link1,
W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls hskks > ","<just link, jjksskkls hskks > "
W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls hskks > ","<just link, hskks , hsksks > "
W,43.074887,-89.391011,Chautara Restaurant,"<just link, hskks , hsksks > ","<just link, jjksskkls hskks > "
W,43.092866,-89.351587,Dobhan Restaurant,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074746,-89.393137,State Street Cash Mart,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.072801,-89.395718,Dotty Dumplings Dowry,"<just link, jjksskkls , hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074744,-89.393046,Dobra Tea,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.076372,-89.380231,Hi-Madison,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.019624,-89.421822,Candlewood Suites Fitchburg,"<just link, jjksskkls , ssjjs hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.08154,-89.524094,Holiday Inn Hotel & Suites Madison West,"<just link, jjksskkls 100 hskks > ","<just link, jjksskkls , ssjjs hskks > "
Any ideas on how to read only the first four columns when importing into R?
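Based on your comments on the question, your title is somewhat misleading. Where you are running into trouble is in not knowing the exact number of columns the final data.frame will have.
From the ?read.table help page:
count.fields can be useful to determine problems with reading files which result in reports of incorrect record lengths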
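As a minimal sketch of what count.fields reports, assume a small ragged file written to a temporary path. The file contents and the names tmp and n_fields are made up for illustration only:

```r
# Hypothetical ragged file: its rows have 5, 7 and 4 comma-separated fields
tmp <- tempfile(fileext = ".txt")
writeLines(c('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
             '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"',
             '"W",43.072801,-89.395718,"Dotty Dumplings Dowry"'), tmp)

# One field count per line; commas inside double quotes are not
# treated as separators under the default quote setting
n_fields <- count.fields(tmp, sep = ",")
n_fields       # 5 7 4
max(n_fields)  # 7: the widest row decides how many columns are needed
```

The maximum of these counts is the number of columns a rectangular data.frame needs in order to hold every row.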
So, let's try a different answer.
First, let this represent your data:
"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"
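(If this is already saved as a text or csv file, you do not need the next step, but for the sake of a minimal reproducible example...)
Write those lines to a text file to simulate the read.table process: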
writeLines('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"', "myRaggedFile.txt")
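This creates a file named "myRaggedFile.txt" in your working directory, to use with read.table or read.csv. The trick, though, is to use count.fields to determine how many columns the file should have.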
dat <- read.csv("myRaggedFile.txt", header=FALSE,
col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))
dat
# X1 X2 X3 X4 X5 X6 X7 X8
# 1 W 43.07532 -89.38614 Mirch Masala <J, K>
# 2 W 43.07488 -89.39070 Himal Chuli Restaurant <J, K> <J, K> <J, K>
# 3 W 43.07489 -89.39101 Chautara Restaurant <J, K> <J, K>
# 4 W 43.09287 -89.35159 Dobhan Restaurant <J, K> <J, K> <J, K> <J, K>
# 5 W 43.07475 -89.39314 State Street Cash Mart <J, K>
# 6 W 43.07280 -89.39572 Dotty Dumplings Dowry
dat <- dat[1:4] # To keep just the first four columns
## Or, continuing with my original answer:
## read.csv("myRaggedFile.txt", header=FALSE,
## col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))[1:4]
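Since the question mentions several such files, the same idea can be wrapped in a helper and applied to each file in turn. This is only a sketch: the helper name read_first4, the files vector and the two demo files are hypothetical, and it assumes that every file has no header line and at least one row with four or more fields.

```r
# Hypothetical helper: read a ragged, header-less CSV and keep the first four columns
read_first4 <- function(path) {
  # The widest row decides how many columns read.csv must allocate
  n_cols <- max(count.fields(path, sep = ","))
  dat <- read.csv(path, header = FALSE,
                  col.names = paste0("V", seq_len(n_cols)))
  dat[1:4]
}

# Two small made-up files standing in for the real ones
files <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
             '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"'),
           files[1])
writeLines('"W",43.072801,-89.395718,"Dotty Dumplings Dowry"', files[2])

# Read each file, then stack the four-column results row-wise
combined <- do.call(rbind, lapply(files, read_first4))
dim(combined)
```

Fixing the column names to V1 through V4 in every file is what lets rbind stack the pieces without complaining about mismatched names.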
When you read in the file, use the following:
first4columns <- read.table("/file/path/filename.csv", header=TRUE, sep=",")[, 1:4]