
Reading the first 4 columns from multiple text files (which may have an unequal number of columns per row) in R

I have several text (CSV) files like the following to read. Only the first four columns are of interest to me; everything after them is junk. I just want to read the first four columns into R.

[Image: screenshot of the input file]

I want only the first four columns, so that the output (a CSV opened in Excel) would look like this:

[Image: screenshot of the desired four-column output]

I couldn't paste or attach the whole file because of Stack Overflow's size limits. Here is a smaller example to work with:

type,latitude,longitude,name,link1,
W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls  hskks > ","<just link, jjksskkls  hskks > "
W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls  hskks > ","<just link,  hskks , hsksks  > "
W,43.074887,-89.391011,Chautara Restaurant,"<just link,  hskks , hsksks  > ","<just link, jjksskkls  hskks > "
W,43.092866,-89.351587,Dobhan Restaurant,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074746,-89.393137,State Street Cash Mart,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.072801,-89.395718,Dotty Dumplings Dowry,"<just link, jjksskkls ,    hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074744,-89.393046,Dobra Tea,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.076372,-89.380231,Hi-Madison,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.019624,-89.421822,Candlewood Suites Fitchburg,"<just link, jjksskkls , ssjjs hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.08154,-89.524094,Holiday Inn Hotel & Suites Madison West,"<just link, jjksskkls 100  hskks > ","<just link, jjksskkls , ssjjs hskks > "

Is there a way to read just the first four columns when importing into R?

Based on your comments on your question, your title is somewhat misleading. The problem you're running into is that you don't know in advance how many columns the final data.frame will have.

From the ?read.table help page:

count.fields can be useful to determine problems with reading files which result in reports of incorrect record lengths
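To illustrate the problem described above, here is a small made-up ragged input (the strings are invented for this demonstration). read.csv decides how many columns to create from the first five lines of input (or from col.names, if that is supplied and longer), so a longer line further down the file does not get enough columns, while count.fields reports the true number of fields on every line:

```r
# Made-up ragged input: the first five lines have 2 fields, the sixth has 4.
raw_lines <- c("a,1", "b,2", "c,3", "d,4", "e,5", 'f,6,"x, y",z')

# Fields per line; the quoted "x, y" counts as a single field.
counts <- count.fields(textConnection(raw_lines), sep = ",")
counts
# [1] 2 2 2 2 2 4

# read.csv sizes the data frame from the first five lines only,
# so it creates fewer columns than the longest line needs.
naive <- read.csv(text = raw_lines, header = FALSE)
ncol(naive)
# [1] 2
```

Comparing max(counts) with ncol(naive) is a quick way to spot this mismatch before trusting the result.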

So, let's try a different answer.

First, let this represent your data:

"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"

(You won't need this next step if your data are already saved as a text or CSV file; it's here only to make a minimal reproducible example.)

Write those lines to a text file to simulate the read.table process:

writeLines('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"', "myRaggedFile.txt")

That creates a "ragged" file to be read with read.table or read.csv. The trick is to use count.fields to determine how many columns the file should have.

dat <- read.csv("myRaggedFile.txt", header=FALSE, 
                col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))
dat
#      X1       X2        X3                     X4     X5     X6     X7     X8
# 1     W 43.07532 -89.38614           Mirch Masala <J, K>                     
# 2     W 43.07488 -89.39070 Himal Chuli Restaurant <J, K> <J, K> <J, K>       
# 3     W 43.07489 -89.39101    Chautara Restaurant <J, K> <J, K>              
# 4     W 43.09287 -89.35159      Dobhan Restaurant <J, K> <J, K> <J, K> <J, K>
# 5     W 43.07475 -89.39314 State Street Cash Mart <J, K>                     
# 6     W 43.07280 -89.39572  Dotty Dumplings Dowry            
dat <- dat[1:4] # To keep just the first four columns
## Or, continuing with my original answer:
## read.csv("myRaggedFile.txt", header=FALSE, 
##          col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))[1:4]
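A possible refinement of the same count.fields idea, sketched under the assumption that the files are comma-separated without a header row, like myRaggedFile.txt. The helper name read_first_n_cols is hypothetical. It differs from the code above in two ways. First, it passes quote = "\"" and comment.char = "" to count.fields, because count.fields defaults to quote = "\"'" and comment.char = "#" while read.csv uses quote = "\"" and comment.char = "", so an apostrophe or # inside a field (say, a name like "Dumpling's") could otherwise skew the count. Second, it sets colClasses to "NULL" for every column after the fourth, so read.csv skips those columns while reading instead of reading them and dropping them afterwards:

```r
# Hypothetical helper (not part of base R): read only the first `n`
# columns of a ragged, comma-separated file without a header row.
read_first_n_cols <- function(file, n = 4) {
  # Count fields using the same quote/comment rules that read.csv uses.
  n_fields <- max(count.fields(file, sep = ",", quote = "\"",
                               comment.char = ""), na.rm = TRUE)
  keep <- min(n, n_fields)
  # NA lets R guess each kept column's type; "NULL" skips a column.
  classes <- c(rep(NA, keep), rep("NULL", n_fields - keep))
  read.csv(file, header = FALSE,
           col.names = paste0("V", seq_len(n_fields)),
           colClasses = classes)
}

# Recreate the ragged example file in a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  '"W",43.075319,-89.386145,"Mirch Masala","<J, K>"',
  '"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"',
  '"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"',
  '"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"',
  '"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"',
  '"W",43.072801,-89.395718,"Dotty Dumplings Dowry"'
), tmp)

dat4 <- read_first_n_cols(tmp, n = 4)
str(dat4)
```

Because col.names covers every field found by count.fields, read.csv creates enough columns even when the longest line appears after the first five lines.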

When you read in your file, use something like:

first4columns <- read.table("/file/path/filename.csv", header=TRUE, sep=",")[, 1:4]
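Since the question mentions multiple files, here is a sketch of applying this "read everything, keep columns 1 to 4" approach to every CSV in a folder and stacking the results. The folder csv_demo and the files part1.csv and part2.csv are made up for the demonstration and filled with rows from the question's sample. It uses read.csv rather than read.table(sep = ","): read.csv also defaults to fill = TRUE and treats only double quotes as quote characters. It assumes every file's rows have no more fields than its first five lines, as in the question's sample; for truly ragged files, the count.fields approach above is the safer choice.

```r
# Made-up folder with two small files in the question's format.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c(
  'type,latitude,longitude,name,link1,',
  'W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls  hskks > ","<just link, jjksskkls  hskks > "',
  'W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls  hskks > ","<just link,  hskks , hsksks  > "'
), file.path(demo_dir, "part1.csv"))
writeLines(c(
  'type,latitude,longitude,name,link1,',
  'W,43.074744,-89.393046,Dobra Tea,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "'
), file.path(demo_dir, "part2.csv"))

# Read each file, keep its first four columns, then stack them.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
first4 <- do.call(rbind, lapply(files, function(f) read.csv(f)[, 1:4]))
first4
```

rbind requires matching column names, which holds here because every file shares the same header line.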

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM