Reading the first 4 columns from multiple text files (which might have an unbalanced number of columns per row) in R
I have text (CSV) files like the following to read. The first four columns are the ones I'm interested in, but after that there is a lot of junk. I just want to read the first four columns into R.
I want the first four columns, so that the output (a CSV opened in Excel) would look like:
I could not paste or attach the whole file because of SO's limitations. Here is a smaller example to work with:
type,latitude,longitude,name,link1,
W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls hskks > ","<just link, jjksskkls hskks > "
W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls hskks > ","<just link, hskks , hsksks > "
W,43.074887,-89.391011,Chautara Restaurant,"<just link, hskks , hsksks > ","<just link, jjksskkls hskks > "
W,43.092866,-89.351587,Dobhan Restaurant,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074746,-89.393137,State Street Cash Mart,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.072801,-89.395718,Dotty Dumplings Dowry,"<just link, jjksskkls , hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074744,-89.393046,Dobra Tea,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.076372,-89.380231,Hi-Madison,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.019624,-89.421822,Candlewood Suites Fitchburg,"<just link, jjksskkls , ssjjs hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.08154,-89.524094,Holiday Inn Hotel & Suites Madison West,"<just link, jjksskkls 100 hskks > ","<just link, jjksskkls , ssjjs hskks > "
Any ideas on how to read just the first four columns when importing into R?
Based on your comments on your question, your title is somewhat misleading. Where you're running into problems is not knowing the exact number of columns your final data.frame would have.
From the ?read.table help page:
count.fields can be useful to determine problems with reading files which result in reports of incorrect record lengths
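To see what count.fields actually reports, here is a minimal sketch that writes three of the ragged lines below to a temporary file (tempfile() is used only so the example runs on its own) and counts the fields on each line. Commas inside the double-quoted "<J, K>" values are not treated as separators, because count.fields honours quotes by default.

```r
# Write a few ragged lines to a throwaway file
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
  '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"',
  '"W",43.072801,-89.395718,"Dotty Dumplings Dowry"'
), path)

# One count per line; quoted commas do not start a new field
n_fields <- count.fields(path, sep = ",")
n_fields
# [1] 5 7 4

# The widest line tells you how many columns a data.frame would need
max(n_fields)
```

Lines whose count differs from the rest are the ones that would trip up read.table.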
So, let's try a different answer.
First, let this represent your data:
"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"
(This next step won't be required on your side if the data is already saved as a text or CSV file, but it's needed for the sake of a minimal reproducible example.)
Write those lines to a text file to simulate the read.table process:
writeLines('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"', "myRaggedFile.txt")
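That will create a "ragged" file to be read in using read.table or read.csv. The trick, though, is to use count.fields to figure out how many columns the file should have. This matters because read.table (and therefore read.csv) decides the number of columns by looking at the first five lines of input, unless col.names is longer; supplying one column name per field of the widest line makes room for every column.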
dat <- read.csv("myRaggedFile.txt", header=FALSE,
col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))
dat
# X1 X2 X3 X4 X5 X6 X7 X8
# 1 W 43.07532 -89.38614 Mirch Masala <J, K>
# 2 W 43.07488 -89.39070 Himal Chuli Restaurant <J, K> <J, K> <J, K>
# 3 W 43.07489 -89.39101 Chautara Restaurant <J, K> <J, K>
# 4 W 43.09287 -89.35159 Dobhan Restaurant <J, K> <J, K> <J, K> <J, K>
# 5 W 43.07475 -89.39314 State Street Cash Mart <J, K>
# 6 W 43.07280 -89.39572 Dotty Dumplings Dowry
dat <- dat[1:4] # To keep just the first four columns
## Or, continuing with my original answer:
## read.csv("myRaggedFile.txt", header=FALSE,
## col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))[1:4]
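If you would rather not keep the junk columns in memory at all, a possible variation (a sketch, not part of the original answer) is to pass colClasses: "NULL" tells read.csv to skip a column, and NA lets R guess the type as usual. It still uses count.fields to decide how many columns there are, and it assumes every file has at least four fields on its widest line.

```r
# Recreate a small ragged file so the example runs on its own
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
  '"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"',
  '"W",43.072801,-89.395718,"Dotty Dumplings Dowry"'
), path)

n_cols <- max(count.fields(path, sep = ","))   # widest line: 8 fields

# NA = keep the column and let R pick its type; "NULL" = drop the column
classes <- c(rep(NA, 4), rep("NULL", n_cols - 4))

first4 <- read.csv(path, header = FALSE,
                   col.names = paste0("V", seq_len(n_cols)),
                   colClasses = classes)
str(first4)
```

The result has only V1 to V4; the extra fields are discarded while scanning instead of being subset away afterwards.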
When you read in your file, use something like:
first4columns <- read.table("/file/path/filename.csv", header=TRUE, sep=",")[, 1:4]
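Because the title asks about multiple files, here is a sketch that wraps the count.fields approach in a small helper and applies it to every CSV in a folder with lapply(), then stacks the results with rbind(). The helper name read_first4, the folder, and the file names are hypothetical. It assumes the files have no header row (as in myRaggedFile.txt); for files that start with a header line, you could add skip = 1 to the read.csv() call.

```r
# Hypothetical helper: read only the first n columns of one ragged CSV
read_first4 <- function(path, n = 4) {
  # The widest line in this particular file decides how many columns to allocate
  n_cols <- max(count.fields(path, sep = ","))
  dat <- read.csv(path, header = FALSE,
                  col.names = paste0("V", seq_len(max(n_cols, n))))
  dat[seq_len(n)]   # keep only the first n columns
}

# Two small example files standing in for the real ones
csv_dir <- tempfile("csvs")
dir.create(csv_dir)
writeLines(c('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
             '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>"'),
           file.path(csv_dir, "a.csv"))
writeLines('"W",43.072801,-89.395718,"Dotty Dumplings Dowry"',
           file.path(csv_dir, "b.csv"))

# Read every .csv in the folder and stack the four-column results
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_first4))
combined
```

Counting fields per file matters because each file can have a different maximum width, so a single global col.names might be too short for one of them.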