How to remove trailing whitespace from write.table output in R

Question

I have a large data file that looks similar to this format:

line1
line2<tab>value1

When it is read into R using read.csv, it is forced into a data frame as follows:

V1<tab>V2
line1<tab>NA
line2<tab>value1

I can replace the NA with an empty string, but when I write the data out using write.table, I get a tab followed by an empty field after line1 in the output file.

How do I make the output match the format of the input, i.e. with the trailing tab-delimited whitespace removed?

Sample file appended:

#Sample SGA file format
@HD VN:1.0.0    IA:NA
@PL NM:TEST
1   1   705 50947   YDL185W YOR202W -   -   -
1   2   377 50947   YDL185W YOR202W -   -   -
1   3   317 50947   YDL185W YOR202W -   -   -
...
@SP CF:ORF,IGNA
TEST    1
TEST2   1

dput(head(data)):

structure(list(V1 = c("#Sample SGA file format", "@HD", 
"@PL", "1", "1", "1"), V2 = c("", "VN:1.0.0", "NM:TEST", "1", 
"2", "3"), V3 = c("", "IA:NA", "", "705", "377", "317"), V4 = c(NA, 
NA, NA, 50947L, 50947L, 50947L), V5 = c("", "", "", "YDL185W", 
"YDL185W", "YDL185W"), V6 = c("", "", "", "YOR202W", "YOR202W", 
"YOR202W"), V7 = c("", "", "", "-", "-", "-"), V8 = c("", "", 
"", "-", "-", "-"), V9 = c("", "", "", "-", "-", "-")), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"), row.names = c(NA, 
6L), class = "data.frame")

and str(data):

'data.frame':   1541 obs. of  9 variables:
 $ V1: chr  "#Sample SGA file format" "@HD" "@PL" "1" ...
 $ V2: chr  "" "VN:1.0.0" "NM:TEST" "1" ...
 $ V3: chr  "" "IA:NA" "" "705" ...
 $ V4: int  NA NA NA 50947 50947 50947 50947 50947 50947 50947 ...
 $ V5: chr  "" "" "" "YDL185W" ...
 $ V6: chr  "" "" "" "YOR202W" ...
 $ V7: chr  "" "" "" "-" ...
 $ V8: chr  "" "" "" "-" ...
 $ V9: chr  "" "" "" "-" ...

Answer 1

I'll wager a guess. It sounds like you could do one of two things.

First, you could use

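data[is.na(data)] <- ''
library(stringr)
# Trim only the right side so leading whitespace is preserved;
# quote and col.names are turned off so just the raw lines are written.
write.table(str_trim(apply(data, 1, paste, collapse='\t'), side='right'),
            'fileout.tsv',
            row.names=FALSE, col.names=FALSE, quote=FALSE)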

Or you can use a command-line utility like sed to remove trailing whitespace from each line of a file:

sed -e 's/[[:space:]]*$//'
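
The same trimming can also be done in base R without stringr. This is only a minimal sketch under assumptions: the two-row data frame is a hypothetical stand-in for the question's data, and the output goes to a temporary file rather than a real path.

# Hypothetical sample mirroring the question: the short row has an NA field.
data <- data.frame(V1 = c("line1", "line2"),
                   V2 = c(NA, "value1"),
                   stringsAsFactors = FALSE)

# Blank out NAs, then join each row's fields with tabs.
data[is.na(data)] <- ""
rows <- apply(data, 1, paste, collapse = "\t")

# Drop trailing tabs and spaces only; leading whitespace is left alone.
rows <- sub("[ \t]+$", "", rows)

out <- tempfile(fileext = ".tsv")  # assumed output location
writeLines(rows, out)
readLines(out)

writeLines is used instead of write.table here because each row is already a finished line of text, so no quoting or header handling is needed.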

Answer 2

This is very convoluted, but here goes.

Read line1 as a header with read.csv: foo <- read.csv("input.csv")
Write just the first column name using write: write(colnames(foo)[1],"output.csv")
Finally, append the rest of the table without column names: write.table(foo,"output.csv",sep=",",row.names=TRUE,col.names=FALSE,append=TRUE,quote=FALSE). The row names are written because read.csv uses the first column as row names when the header has one fewer field than the data rows.

This should get you the output file in the same format as the input file.
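
The steps above can be sketched end to end as follows. This is an illustration under assumptions, not a general solution: the three-line input is hypothetical, temporary files stand in for input.csv and output.csv, and it relies on the header having exactly one fewer field than the data rows, in which case read.csv takes the first column as row names (so the row names are written back out).

# Hypothetical two-column input whose first line has a single field.
input  <- tempfile(fileext = ".csv")
output <- tempfile(fileext = ".csv")
writeLines(c("line1", "line2,value1", "line3,value2"), input)

# line1 becomes the only column name; line2 and line3 become row names.
foo <- read.csv(input)

# Write the lone header field, then append the rows without column names.
write(colnames(foo)[1], output)
write.table(foo, output, sep = ",", row.names = TRUE,
            col.names = FALSE, append = TRUE, quote = FALSE)

# Compare the round-tripped file with the original.
identical(readLines(input), readLines(output))

Because the first column is turned into row names, this only works when those values are unique, and it does not carry over to files with more than two columns.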

Answer 3

This is similar to Justin's answer, using regex.

cn <- file("output.txt",open="w") #opens write connection to file
writeLines(paste(names(data),collapse="\t"),con=cn) #writes header
#converts data frame into vector of character, with fields separated by tabs
to.print <- apply(data,1,paste,collapse="\t") 
to.print <- gsub("\\tNA$","",to.print) #deletes trailing <tab>NA
writeLines(to.print,con=cn) #writes data frame rows
close(cn)
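
When a row ends in several empty or NA fields, as the header lines of the sample SGA file do, the pattern can be widened to strip all of them at once. The following is a hedged sketch: the data frame is a hypothetical excerpt, the header line is omitted, and any genuine trailing field whose value is the literal text NA would also be removed.

# Hypothetical rows in the shape of the question's SGA file.
data <- data.frame(V1 = c("@HD", "@PL", "1"),
                   V2 = c("VN:1.0.0", "NM:TEST", "1"),
                   V3 = c("IA:NA", NA, "705"),
                   V4 = c(NA, NA, "50947"),
                   stringsAsFactors = FALSE)

# Collapse each row into one tab-separated string (NA becomes the text "NA").
to.print <- apply(data, 1, paste, collapse = "\t")

# Remove every trailing field that is empty or exactly NA.
# "IA:NA" survives because the pattern needs a tab directly before NA.
to.print <- sub("(\t(NA)?)+$", "", to.print, perl = TRUE)

out <- tempfile(fileext = ".txt")  # assumed output location
writeLines(to.print, out)
readLines(out)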

Answer 4

If you want read.table to behave exactly as read.csv does, all you need to do is make the parameters the same:

read.table(file, header = TRUE, sep = ",", quote="\"", dec=".",
     fill = TRUE, comment.char="")
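
A quick way to check the equivalence is to parse the same text with both functions and compare the results. The inline CSV text below is a hypothetical sample; both calls use the text argument so that no file is needed.

# Hypothetical CSV content with an empty field in the first data row.
csv_text <- "V1,V2\nline1,\nline2,value1"

a <- read.csv(text = csv_text)
b <- read.table(text = csv_text, header = TRUE, sep = ",", quote = "\"",
                dec = ".", fill = TRUE, comment.char = "")

# Both calls should produce the same data frame.
identical(a, b)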

4 answers

Answer 1: score 4, accepted, 2012-08-30 16:28:19

Answer 2: score 3, 2012-08-29 21:18:36

Answer 3: score 3, 2012-08-31 18:52:55

Answer 4: score -1, 2012-08-29 21:29:04
