I have CSV-like text files to read, where only the first four columns are of interest to me; everything after that is junk. I just want to read the first four columns into R.
I want only the first four columns, so that the output (the CSV opened in Excel) would look like:
I could not paste or attach the whole file due to the limitations of SO. Here is a smaller example to work with:
type,latitude,longitude,name,link1,
W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls hskks > ","<just link, jjksskkls hskks > "
W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls hskks > ","<just link, hskks , hsksks > "
W,43.074887,-89.391011,Chautara Restaurant,"<just link, hskks , hsksks > ","<just link, jjksskkls hskks > "
W,43.092866,-89.351587,Dobhan Restaurant,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074746,-89.393137,State Street Cash Mart,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.072801,-89.395718,Dotty Dumplings Dowry,"<just link, jjksskkls , hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074744,-89.393046,Dobra Tea,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.076372,-89.380231,Hi-Madison,"<just link, jjksskkls hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.019624,-89.421822,Candlewood Suites Fitchburg,"<just link, jjksskkls , ssjjs hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.08154,-89.524094,Holiday Inn Hotel & Suites Madison West,"<just link, jjksskkls 100 hskks > ","<just link, jjksskkls , ssjjs hskks > "
Any idea how to read just the first four columns while importing into R?
Based on your comments to your question, your title is somewhat misleading. Where you're running into problems is not knowing the exact number of columns your final data.frame would have.
From the ?read.table help page:
count.fields can be useful to determine problems with reading files which result in reports of incorrect record lengths
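As a rough illustration of what count.fields reports, here is a minimal sketch. The temporary file and its three lines are made up for this example (they are not your real data), and the variable names tmp and n_fields are only placeholders.

```r
# Hypothetical three-line ragged file, written to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
             '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"',
             '"W",43.072801,-89.395718,"Dotty Dumplings Dowry"'),
           tmp)

# Number of comma-separated fields on each line.
# Commas inside double quotes are not treated as separators.
n_fields <- count.fields(tmp, sep = ",")
n_fields       # 5 7 4
max(n_fields)  # 7
```

The maximum is the number of columns a data.frame needs in order to hold every line of the file.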
So, let's try a different answer.
First, let this represent your data:
"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"
(This next step won't be required from your side if this is already saved as a text or csv file, but for the sake of a minimal reproducible example...)
Write those lines to a text file to simulate the read.table process:
writeLines('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"', "myRaggedFile.txt")
That will create a "ragged" file to be read in using read.table or read.csv. The trick, though, is to use count.fields to figure out how many columns the file should have.
dat <- read.csv("myRaggedFile.txt", header=FALSE,
                col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))
dat
#   X1       X2        X3                     X4     X5     X6     X7     X8
# 1  W 43.07532 -89.38614           Mirch Masala <J, K>
# 2  W 43.07488 -89.39070 Himal Chuli Restaurant <J, K> <J, K> <J, K>
# 3  W 43.07489 -89.39101    Chautara Restaurant <J, K> <J, K>
# 4  W 43.09287 -89.35159      Dobhan Restaurant <J, K> <J, K> <J, K> <J, K>
# 5  W 43.07475 -89.39314 State Street Cash Mart <J, K>
# 6  W 43.07280 -89.39572  Dotty Dumplings Dowry
dat <- dat[1:4] # To keep just the first four columns
## Or, continuing with my original answer:
## read.csv("myRaggedFile.txt", header=FALSE,
## col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))[1:4]
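If you would rather not read the junk columns at all, one possible variation is to mark them as "NULL" in colClasses so that they are dropped during import. This is only a sketch under a few assumptions: the temporary file stands in for your real file, and the names tmp, n_cols, first4 and the "V" column prefix are placeholders of my own.

```r
# Recreate the ragged example in a temporary file (stand-in for the real file)
tmp <- tempfile(fileext = ".txt")
writeLines(c('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
             '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"',
             '"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"',
             '"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"',
             '"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"',
             '"W",43.072801,-89.395718,"Dotty Dumplings Dowry"'),
           tmp)

# Widest line in the file decides how many columns must be declared
n_cols <- max(count.fields(tmp, sep = ","))

# NA lets R guess the type of the first four columns;
# "NULL" tells read.csv to skip every column after that
first4 <- read.csv(tmp, header = FALSE,
                   col.names = paste0("V", seq_len(n_cols)),
                   colClasses = c(rep(NA, 4), rep("NULL", n_cols - 4)))
dim(first4)
```

The result should have six rows and only the four wanted columns, so no subsetting is needed afterwards.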
When you read in your file, use something like:
first4columns <- read.table("/file/path/filename.csv", header=TRUE, sep=",")[, 1:4]
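Here is a small sketch of that one-liner applied to the header and the first two data rows from the question, passed in through the text argument instead of a file path. The variable name sample_text is just a placeholder. Keep in mind that read.table uses fill=FALSE by default, so this approach assumes every row has the same number of fields, as in the question's sample; a ragged file would need the count.fields approach instead.

```r
# Header plus two data rows copied from the question's sample
sample_text <- 'type,latitude,longitude,name,link1,
W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls hskks > ","<just link, jjksskkls hskks > "
W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls hskks > ","<just link, hskks , hsksks > "'

# Read everything, then keep only the first four columns
first4columns <- read.table(text = sample_text, header = TRUE, sep = ",")[, 1:4]
names(first4columns)
```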