How to remove trailing spaces in write.table in R
I have a large data frame which looks similar to this format:
line1
line2<tab>value1
When I read it into R using read.csv, it is coerced into a data frame as follows:
V1<tab>V2
line1<tab>NA
line2<tab>value1
I can replace the NA with an empty string, but when I write the data using write.table, line 1 of the output file ends with a tab and empty space.
How do I make the output match the input format, i.e. with the trailing tab and whitespace removed?
Sample file:
#Sample SGA file format
@HD VN:1.0.0 IA:NA
@PL NM:TEST
1 1 705 50947 YDL185W YOR202W - - -
1 2 377 50947 YDL185W YOR202W - - -
1 3 317 50947 YDL185W YOR202W - - -
...
@SP CF:ORF,IGNA
TEST 1
TEST2 1
dput(head(data)):
structure(list(V1 = c("#Sample SGA file format", "@HD",
"@PL", "1", "1", "1"), V2 = c("", "VN:1.0.0", "NM:TEST", "1",
"2", "3"), V3 = c("", "IA:NA", "", "705", "377", "317"), V4 = c(NA,
NA, NA, 50947L, 50947L, 50947L), V5 = c("", "", "", "YDL185W",
"YDL185W", "YDL185W"), V6 = c("", "", "", "YOR202W", "YOR202W",
"YOR202W"), V7 = c("", "", "", "-", "-", "-"), V8 = c("", "",
"", "-", "-", "-"), V9 = c("", "", "", "-", "-", "-")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"), row.names = c(NA,
6L), class = "data.frame")
and str(data):
'data.frame': 1541 obs. of 9 variables:
$ V1: chr "#Sample SGA file format" "@HD" "@PL" "1" ...
$ V2: chr "" "VN:1.0.0" "NM:TEST" "1" ...
$ V3: chr "" "IA:NA" "" "705" ...
$ V4: int NA NA NA 50947 50947 50947 50947 50947 50947 50947 ...
$ V5: chr "" "" "" "YDL185W" ...
$ V6: chr "" "" "" "YOR202W" ...
$ V7: chr "" "" "" "-" ...
$ V8: chr "" "" "" "-" ...
$ V9: chr "" "" "" "-" ...
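A minimal reproduction of the problem, assuming a two-line tab-separated input like the one at the top of the question (the file contents and temporary paths are illustrative). Reading with fill = TRUE pads the short first line, and write.table then emits an empty last field, i.e. a trailing tab, on that line even with na = "".

```r
# Hypothetical two-line input: the first line has one field, the second has two
input <- tempfile(fileext = ".tsv")
writeLines(c("line1", "line2\tvalue1"), input)

# fill = TRUE pads the short first row (read.csv also uses fill = TRUE)
data <- read.table(input, sep = "\t", fill = TRUE, stringsAsFactors = FALSE)

# Write it back with missing values as empty strings
output <- tempfile(fileext = ".tsv")
write.table(data, output, sep = "\t", na = "", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# The first line now ends in a tab that was not in the input
print(readLines(output))
```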
I'll wager a guess. It sounds like you could do one of two things.
First, you could use:
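data[is.na(data)] <- ''
library(stringr)
# paste each row into one tab-separated string, then strip trailing whitespace;
# quote/col.names = FALSE stop write.table adding quotes and an "x" header
write.table(str_trim(apply(data, 1, paste, collapse = '\t'), side = 'right'),
            'fileout.tsv',
            quote = FALSE, row.names = FALSE, col.names = FALSE)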
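The same paste-and-trim idea can be sketched in base R, assuming R >= 3.2.0 for trimws(which = "right"). Writing with writeLines sidesteps the quoting and header that write.table adds to a plain character vector. The small data frame here is hypothetical, modelled on the sample file.

```r
# Hypothetical data frame shaped like the question's: short rows padded with NA
data <- data.frame(
  V1 = c("@PL", "1"),
  V2 = c("NM:TEST", "1"),
  V3 = c(NA, "705"),
  V4 = c(NA, 50947L),
  stringsAsFactors = FALSE
)

data[is.na(data)] <- ""  # blank out the padding (coerces V4 to character)

# One tab-separated string per row, with trailing tabs/spaces removed
lines <- trimws(apply(data, 1, paste, collapse = "\t"), which = "right")

output <- tempfile(fileext = ".tsv")
writeLines(lines, output)
print(readLines(output))
```

Trimming only the right side matters: a row whose first field is empty would otherwise lose its leading tab and shift every column.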
Or you can use a command line utility like sed to remove trailing whitespace from a file:
sed 's/[[:blank:]]*$//' fileout.tsv > fileout_trimmed.tsv
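The same post-processing step can also be done from R itself with readLines, sub and writeLines. This is a sketch assuming the file fits in memory; the path and its contents are placeholders.

```r
# Placeholder file whose first line ends in a tab and a space
path <- tempfile(fileext = ".tsv")
writeLines(c("line1\t ", "line2\tvalue1"), path)

# Strip any run of spaces/tabs at the end of each line
cleaned <- sub("[[:blank:]]+$", "", readLines(path))
writeLines(cleaned, path)  # overwrite the file in place

print(readLines(path))
```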
This is very convoluted, but here goes.
Read line1 as the header in read.csv:
foo <- read.csv("input.csv")
Write just the first column name using write:
write(colnames(foo)[1], "output.csv")
Finally, append the rest of the table to the same file, without column names:
write.table(foo, "output.csv", sep = ",", row.names = FALSE, col.names = FALSE, append = TRUE, quote = FALSE)
This should give you an output file in the same format as the input file.
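One caveat with reading line1 as a header: when the header row has one field fewer than the data rows, read.table (and so read.csv) uses the first data column as row names, which write.table(row.names = FALSE) would then drop. A variant of the same "write line 1 separately, then append the rest" idea, sketched under that assumption, copies the first line verbatim with readLines and reads the remaining rows with skip = 1. The file contents are hypothetical.

```r
# Hypothetical comma-separated input: a one-field first line, then full rows
input <- tempfile(fileext = ".csv")
writeLines(c("line1", "line2,value1", "line3,value2"), input)

# Keep the short first line exactly as it is
first <- readLines(input, n = 1)

# Read the remaining rows without treating anything as a header
body <- read.csv(input, header = FALSE, skip = 1, stringsAsFactors = FALSE)

output <- tempfile(fileext = ".csv")
writeLines(first, output)  # line 1 on its own, no trailing separator
write.table(body, output, sep = ",", append = TRUE,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

print(readLines(output))
```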
This is similar to Justin's answer, using a regex.
cn <- file("output.txt", open = "w")  # open a write connection to the file
writeLines(paste(names(data), collapse = "\t"), con = cn)  # write the header
# convert each data frame row into one string, fields separated by tabs
to.print <- apply(data, 1, paste, collapse = "\t")
# delete any run of trailing empty or <tab>NA fields
to.print <- gsub("(\\t(NA)?)+$", "", to.print)
writeLines(to.print, con = cn)  # write the data frame rows
close(cn)
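To see what the trailing-field cleanup has to handle on data like the question's, here is a sketch on a hypothetical two-row data frame. After apply/paste, the padded row ends in a mix of NA and empty fields, so the pattern below strips any run of trailing tab-plus-empty or tab-plus-NA fields rather than a single <tab>NA; a field such as IA:NA is left alone because it is not preceded directly by a tab.

```r
# Hypothetical data: the first row is padded with NA and an empty string
data <- data.frame(
  V1 = c("@HD", "1"),
  V2 = c("VN:1.0.0", "2"),
  V3 = c("IA:NA", "377"),
  V4 = c(NA, 50947L),
  V5 = c("", "YDL185W"),
  stringsAsFactors = FALSE
)

# paste turns NA into the literal text "NA"
rows <- apply(data, 1, paste, collapse = "\t")

# Remove any run of trailing fields that are empty or the literal NA
cleaned <- gsub("(\t(NA)?)+$", "", rows)
print(cleaned)
```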
If you want read.table to behave exactly as read.csv does, all you need to do is make the parameters the same:
read.table(file, header = TRUE, sep = ",", quote="\"", dec=".",
fill = TRUE, comment.char="")
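A quick check of that claim on a small hypothetical CSV snippet (with a quoted comma and a short row) passed through the text argument:

```r
# Small CSV text with a quoted field containing a comma and a short last row
csv <- 'a,b,c\n1,"x,y",3\n4,5'

via_csv <- read.csv(text = csv)
via_table <- read.table(text = csv, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

print(identical(via_csv, via_table))
```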