
Reading the first 4 columns from multiple text files (which might have an unbalanced number of columns per row) in R

I have text (CSV) files like the following to read. The first four columns are of interest to me, but after that there is a lot of junk. I just want to read the first four columns into R.

[Screenshot of the input file]

I want just the first four columns, so that the output (the CSV opened in Excel) would look like this:

[Screenshot of the desired output]

Due to Stack Overflow's limitations, I could not paste or attach the whole file. Here is a smaller example to work with:

type,latitude,longitude,name,link1,
W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls  hskks > ","<just link, jjksskkls  hskks > "
W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls  hskks > ","<just link,  hskks , hsksks  > "
W,43.074887,-89.391011,Chautara Restaurant,"<just link,  hskks , hsksks  > ","<just link, jjksskkls  hskks > "
W,43.092866,-89.351587,Dobhan Restaurant,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074746,-89.393137,State Street Cash Mart,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.072801,-89.395718,Dotty Dumplings Dowry,"<just link, jjksskkls ,    hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.074744,-89.393046,Dobra Tea,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.076372,-89.380231,Hi-Madison,"<just link, jjksskkls  hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.019624,-89.421822,Candlewood Suites Fitchburg,"<just link, jjksskkls , ssjjs hskks > ","<just link, jjksskkls , ssjjs hskks > "
W,43.08154,-89.524094,Holiday Inn Hotel & Suites Madison West,"<just link, jjksskkls 100  hskks > ","<just link, jjksskkls , ssjjs hskks > "

Any ideas on how to read just the first four columns when importing into R?

Based on your comments on the question, the title is somewhat misleading. Where you are running into problems is not knowing in advance the exact number of columns your final data.frame will have.

From the ?read.table help page:

count.fields can be useful to determine problems with reading files which result in reports of incorrect record lengths.
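As a quick illustration of what count.fields reports, here is a minimal sketch on a small hypothetical three-line file (the temporary path and rows exist only for this demo). It returns one field count per line, and quoted fields containing commas count as a single field:

```r
# Hypothetical three-line ragged file, written to a temporary path for the demo
path <- tempfile(fileext = ".csv")
writeLines(c('"W",43.1,-89.4,"A","<J, K>"',
             '"W",43.2,-89.5,"B"',
             '"W",43.3,-89.6,"C","<J, K>","<J, K>"'), path)

# One count per line; the quoted "<J, K>" counts as one field, not two
n_fields <- count.fields(path, sep = ",")
n_fields
# [1] 5 4 6

# The widest row tells you how many columns the data frame needs
max(n_fields)
# [1] 6
```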

So, let's try a different answer.

First, let the following represent your data:

"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"

(This next step isn't needed on your side if the data is already saved as a text or CSV file, but it keeps this a minimal reproducible example.)

Write those lines to a text file to simulate reading an existing file with read.table:

writeLines('"W",43.075319,-89.386145,"Mirch Masala","<J, K>"
"W",43.07488,-89.390698,"Himal Chuli Restaurant","<J, K>","<J, K>","<J, K>"
"W",43.074887,-89.391011,"Chautara Restaurant","<J, K>","<J, K>"
"W",43.092866,-89.351587,"Dobhan Restaurant","<J, K>","<J, K>","<J, K>","<J, K>"
"W",43.074746,-89.393137,"State Street Cash Mart","<J, K>"
"W",43.072801,-89.395718,"Dotty Dumplings Dowry"', "myRaggedFile.txt")

That creates a "ragged" file to be read in with read.table or read.csv. The trick, though, is to use count.fields to figure out how many columns the file should have. By default, read.table works out the number of columns from the first five lines of input, so a wider row further down would be wrapped onto a new row; passing col.names of the right length avoids that.

dat <- read.csv("myRaggedFile.txt", header=FALSE, 
                col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))
dat
#      X1       X2        X3                     X4     X5     X6     X7     X8
# 1     W 43.07532 -89.38614           Mirch Masala <J, K>                     
# 2     W 43.07488 -89.39070 Himal Chuli Restaurant <J, K> <J, K> <J, K>       
# 3     W 43.07489 -89.39101    Chautara Restaurant <J, K> <J, K>              
# 4     W 43.09287 -89.35159      Dobhan Restaurant <J, K> <J, K> <J, K> <J, K>
# 5     W 43.07475 -89.39314 State Street Cash Mart <J, K>                     
# 6     W 43.07280 -89.39572  Dotty Dumplings Dowry            
dat <- dat[1:4] # To keep just the first four columns
## Or, continuing with my original answer:
## read.csv("myRaggedFile.txt", header=FALSE, 
##          col.names=1:max(count.fields("myRaggedFile.txt", sep=",")))[1:4]
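The question title mentions multiple files. A minimal sketch that wraps the count.fields approach in a helper and stacks the results, assuming all files are comma-separated, start with a header row, and share the same first four columns. read_first_four, f1 and f2 are hypothetical names, and the rows are shortened from the question's sample:

```r
# Hypothetical helper: read only the first four columns of one ragged CSV file
read_first_four <- function(path, header = TRUE) {
  # Widest row in the file; quote = "\"" matches read.csv's default quoting
  n_max <- max(count.fields(path, sep = ",", quote = "\""), na.rm = TRUE)
  # Name every possible column up front so no row is wrapped, then keep 1-4
  dat <- read.csv(path, header = FALSE, skip = if (header) 1 else 0,
                  col.names = paste0("V", seq_len(n_max)))[1:4]
  if (header) {
    # Reuse the first four names from the header line
    names(dat) <- scan(path, what = "", sep = ",", nlines = 1, quiet = TRUE)[1:4]
  }
  dat
}

# Demo with two small temporary files of different widths
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c('type,latitude,longitude,name,link1,',
             'W,43.075319,-89.386145,Mirch Masala,"<just link, a > ","<just link, b > "',
             'W,43.07488,-89.390698,Himal Chuli Restaurant,"<x>"'), f1)
writeLines(c('type,latitude,longitude,name,link1,',
             'W,43.074744,-89.393046,Dobra Tea,"<x>","<y>","<z>"'), f2)

# Read each file, then stack the four-column results
combined <- do.call(rbind, lapply(c(f1, f2), read_first_four))
combined
```

For real files, c(f1, f2) would be replaced by something like list.files("your/folder", pattern = "\\.csv$", full.names = TRUE), where the folder name is a placeholder.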

When you read in your file, use something like:

first4columns <- read.table("/file/path/filename.csv", header=TRUE, sep=",")[, 1:4]
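Subsetting with [, 1:4] assumes every row has the same number of fields, as in the question's sample; with header=TRUE, read.table defaults to fill=FALSE and stops with an error on ragged rows. Under that same-width assumption, another option is to drop the extra columns while parsing: in read.csv, a colClasses entry of "NULL" skips that column and NA lets R guess the type. A minimal sketch, where the temporary file and the two sample rows exist only for the demo:

```r
# Two rows from the question's sample, written to a temporary file for the demo
path <- tempfile(fileext = ".csv")
writeLines(c('type,latitude,longitude,name,link1,',
             'W,43.075319,-89.386145,Mirch Masala,"<just link, jjksskkls  hskks > ","<just link, jjksskkls  hskks > "',
             'W,43.07488,-89.390698,Himal Chuli Restaurant,"<just link, jjksskkls  hskks > ","<just link,  hskks , hsksks  > "'), path)

# Total number of fields per row (6 here, counting the trailing empty header field)
n_cols <- max(count.fields(path, sep = ",", quote = "\""), na.rm = TRUE)

# NA = let read.csv guess the type; "NULL" = skip that column entirely
keep4 <- c(rep(NA, 4), rep("NULL", n_cols - 4))
first4columns <- read.csv(path, colClasses = keep4)
str(first4columns)
```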
