How to remove trailing spaces in write.table in R
I have a large data frame that looks similar to the following format:
line1
line2<tab>value1
When I read it into R with read.csv, it is coerced into a data frame like this:
V1<tab>V2
line1<tab>NA
line2<tab>value1
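I can replace the NA with an empty string, but when I write the data out with write.table, the output file has a tab and whitespace after line1.
How can I make the output match the input format, i.e. remove the trailing tab and whitespace?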
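To make the symptom concrete, here is a minimal sketch that reproduces it. The two-line input and the temporary output file are hypothetical stand-ins for the real data, and tab-separated input is assumed. Note that header = FALSE is passed explicitly, because read.table otherwise treats a first line with one fewer field as a header.

```r
# Hypothetical two-line input; the first line has fewer fields than the second.
input <- c("line1", "line2\tvalue1")

# fill = TRUE pads the short row so that both rows have two columns.
df <- read.table(text = input, sep = "\t", header = FALSE, fill = TRUE,
                 na.strings = "", stringsAsFactors = FALSE)

# Write to a temporary file, rendering NA as an empty string.
out <- tempfile(fileext = ".tsv")
write.table(df, out, sep = "\t", na = "", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# The first line comes back with a trailing tab that the input did not have.
written <- readLines(out)
print(written)
```

The padded cell is written as an empty field, so the separator before it still appears in the output.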
Sample file attached:
#Sample SGA file format
@HD VN:1.0.0 IA:NA
@PL NM:TEST
1 1 705 50947 YDL185W YOR202W - - -
1 2 377 50947 YDL185W YOR202W - - -
1 3 317 50947 YDL185W YOR202W - - -
...
@SP CF:ORF,IGNA
TEST 1
TEST2 1
dput(head(data)):
structure(list(V1 = c("#Sample SGA file format", "@HD",
"@PL", "1", "1", "1"), V2 = c("", "VN:1.0.0", "NM:TEST", "1",
"2", "3"), V3 = c("", "IA:NA", "", "705", "377", "317"), V4 = c(NA,
NA, NA, 50947L, 50947L, 50947L), V5 = c("", "", "", "YDL185W",
"YDL185W", "YDL185W"), V6 = c("", "", "", "YOR202W", "YOR202W",
"YOR202W"), V7 = c("", "", "", "-", "-", "-"), V8 = c("", "",
"", "-", "-", "-"), V9 = c("", "", "", "-", "-", "-")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"), row.names = c(NA,
6L), class = "data.frame")
and str(data):
'data.frame': 1541 obs. of 9 variables:
$ V1: chr "#Sample SGA file format" "@HD" "@PL" "1" ...
$ V2: chr "" "VN:1.0.0" "NM:TEST" "1" ...
$ V3: chr "" "IA:NA" "" "705" ...
$ V4: int NA NA NA 50947 50947 50947 50947 50947 50947 50947 ...
$ V5: chr "" "" "" "YDL185W" ...
$ V6: chr "" "" "" "YOR202W" ...
$ V7: chr "" "" "" "-" ...
$ V8: chr "" "" "" "-" ...
$ V9: chr "" "" "" "-" ...
I'm taking a guess here. It sounds like you could do one of two things.
First, you could use
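data[is.na(data)] <- ''  # replace NA with empty strings
library(stringr)
# paste each row into one tab-separated string and trim trailing whitespace only;
# quote=FALSE and col.names=FALSE keep quotes and an "x" header out of the file
write.table(str_trim(apply(data, 1, paste, collapse='\t'), side='right'),
            'fileout.tsv',
            quote=FALSE, row.names=FALSE, col.names=FALSE)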
Alternatively, you could use a command-line utility such as sed to strip the trailing whitespace from the file:
sed -e 's/[[:space:]]*$//'
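The same trailing-whitespace cleanup can also be sketched in R itself, without leaving the session. The example lines below are hypothetical; in practice they would come from something like readLines on the written file.

```r
# Hypothetical lines, as they might be read back from the written file.
lines <- c("line1\t", "line2\tvalue1", "line3\t\t  ")

# Remove any run of spaces and tabs at the end of each line.
trimmed <- sub("[ \t]+$", "", lines)
print(trimmed)
```

The trimmed vector could then be written back with writeLines. Interior tabs are left alone because the pattern is anchored to the end of the line.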
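This is quite convoluted, but here goes.
Read the file with read.csv, so that line1 becomes the header:
foo <- read.csv("input.csv")
Use write to write only the first column name:
write(colnames(foo)[1], "output.csv")
Finally, use write.table with append to add the rest of the table, without column names:
write.table(foo, "output.csv", sep=",", row.names=F, col.names=F, append=T, quote=F)
This will give you an output file in the same format as the input file.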
This is similar to Justin's answer, but uses a regular expression.
cn <- file("output.txt",open="w") #opens write connection to file
writeLines(paste(names(data),collapse="\t"),con=cn) #writes header
#converts data frame into vector of character, with fields separated by tabs
to.print <- apply(data,1,paste,collapse="\t")
to.print <- gsub("\\tNA$","",to.print) #deletes trailing <tab>NA
writeLines(to.print,con=cn) #writes data frame rows
close(cn)
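The pattern above removes a single trailing <tab>NA. If rows can be short by more than one field, a variant that strips every trailing <tab>NA group may be closer to what is needed. This is a sketch on a hypothetical three-column data frame, not part of the original answer. It would also strip a genuine trailing value that happens to be the text "NA".

```r
# Hypothetical data frame with ragged rows padded by NA.
data <- data.frame(V1 = c("line1", "line2", "line3"),
                   V2 = c(NA, "value1", "a"),
                   V3 = c(NA, NA, "b"),
                   stringsAsFactors = FALSE)

# Collapse each row into one tab-separated string; NA becomes the text "NA".
to.print <- apply(data, 1, paste, collapse = "\t")

# Strip every trailing "<tab>NA" group, not just the last one.
to.print <- gsub("(\\tNA)+$", "", to.print)
print(to.print)
```

The result can be passed to writeLines exactly as in the answer above.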
If you want read.table to behave exactly like read.csv, just give it the same arguments:
read.table(file, header = TRUE, sep = ",", quote="\"", dec=".",
fill = TRUE, comment.char="")
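As a quick check of that equivalence, the sketch below reads the same hypothetical comma-separated text both ways and compares the results. The input and the variable names are made up for illustration.

```r
# Hypothetical comma-separated input with a header row.
txt <- c("id,name", "1,foo", "2,bar")

# read.csv with its defaults.
via_csv <- read.csv(text = txt)

# read.table with the arguments listed above.
via_table <- read.table(text = txt, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")

# Compare the two data frames.
print(identical(via_csv, via_table))
```

For the ragged file in the question, the relevant arguments would be sep = "\t", header = FALSE and fill = TRUE.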