How to get empty last elements from strsplit() in R?

I need to process some data that is mostly CSV. The problem is that strsplit() drops the empty field after a comma that comes at the end of a line (e.g., the one after 3 in the example below).

> strsplit("1,2,3,", ",")
[[1]]
[1] "1" "2" "3"

I'd like it to be read in as [1] "1" "2" "3" NA instead. How can I do this? Thanks.

Here are a couple of ideas:

scan(text="1,2,3,", sep=",", quiet=TRUE)
#[1]  1  2  3 NA

unlist(read.csv(text="1,2,3,", header=FALSE), use.names=FALSE)
#[1]  1  2  3 NA

Those both return numeric vectors (scan() gives doubles by default, while the read.csv() call yields integers). You can wrap as.character() around either of them to get the exact output you show in the question:

as.character(scan(text="1,2,3,", sep=",", quiet=TRUE))
#[1] "1" "2" "3" NA 

Or you could specify what="character" in scan(), or colClasses="character" in read.csv(). The output is slightly different: the trailing field comes back as an empty string rather than NA.

scan(text="1,2,3,", sep=",", quiet=TRUE, what="character")
#[1] "1" "2" "3" "" 

unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character"), use.names=FALSE)
#[1] "1" "2" "3" "" 

You could also specify na.strings="" along with colClasses="character", so that the empty field is read as NA:

unlist(read.csv(text="1,2,3,", header=FALSE, colClasses="character", na.strings=""), 
       use.names=FALSE)
#[1] "1" "2" "3" NA 
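
If you would rather stay with strsplit() itself, here is a minimal sketch of one workaround. It relies on the behaviour documented in ?strsplit: a match at the very end of the string is dropped, so appending one extra delimiter makes the added comma the one that gets discarded. The helper name split_keep_trailing and its empty_as_na argument are made up for illustration, and fixed=TRUE assumes the delimiter is a literal string, not a regular expression.

```r
# Hypothetical helper: split on a literal delimiter and keep a trailing empty field.
split_keep_trailing <- function(x, sep = ",", empty_as_na = TRUE) {
  # strsplit() discards a match at the end of the string, so append one more
  # delimiter: the added one is dropped and the original trailing field survives.
  parts <- strsplit(paste0(x, sep), sep, fixed = TRUE)
  if (empty_as_na) {
    # Turn empty fields into NA, as asked for in the question.
    parts <- lapply(parts, function(p) {
      p[p == ""] <- NA_character_
      p
    })
  }
  parts
}

split_keep_trailing("1,2,3,")[[1]]
#[1] "1" "2" "3" NA

split_keep_trailing("1,2,3")[[1]]
#[1] "1" "2" "3"
```

Like strsplit(), this returns a list with one element per input string, so it should also work on a whole character vector of lines.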

Hadley Wickham's stringr package, and the stringi package it is built on (by Marek Gagolewski), are a huge improvement on the base string functions: they are fully vectorized and have a consistent function interface. str_split() keeps the trailing empty string:

library(stringr)
str_split("1,2,3,", ",")
#[[1]]
#[1] "1" "2" "3" "" 

as.integer(unlist(str_split("1,2,3,", ",")))
#[1]  1  2  3 NA

Using the stringi package:

library(stringi)
stri_split_fixed("1,2,3,", ",")
#[[1]]
#[1] "1" "2" "3" "" 

# you can directly specify that you want to omit these empty elements
stri_split_fixed("1,2,3,", ",", omit_empty = TRUE)
#[[1]]
#[1] "1" "2" "3"
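
The omit_empty argument is also documented as accepting NA, in which case empty tokens are replaced with NA instead of being dropped. That seems to be exactly the output the question asks for. A short sketch, assuming stringi is installed (the wrapper name split_keep_na is just for illustration):

```r
library(stringi)

# Hypothetical wrapper: split on a fixed delimiter; empty tokens become NA.
split_keep_na <- function(x, sep = ",") {
  stri_split_fixed(x, sep, omit_empty = NA)
}

split_keep_na("1,2,3,")
#[[1]]
#[1] "1" "2" "3" NA
```

Note that this turns every empty field into NA, not just the trailing one, so "1,,3," would give two NA values.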
