
Splitting a string of Unicode characters in R

I have a column of Unicode characters that I need to split so I can calculate the frequency of each one. I have tried a number of different ways to split them but am not making any headway. The input format of the data is

[1] "\U00010603"
[2] "\U0001076b\U00010631\U0001076b"
[3] "\U00010631\U00010633"
[4] "\U0001061a\U00010655\U00010609\U00010631"
... 

and I'd like the output to be

[1] "\U00010603"
[2] "\U0001076b"
[3] "\U00010631"
[4] "\U0001076b"
...

I have tried

df <- c("\U00010603","\U0001076b\U00010631\U0001076b", "\U00010631\U00010633","\U0001061a\U00010655\U00010609\U00010631")

df1 <- strsplit(df, "\\", fixed = TRUE)

df1 <- lapply(df, strsplit, split = '\\', fixed = TRUE)

I have also tried various forms of \U0. In every case the output is essentially identical to the input list. Thank you for your help.

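These are Unicode characters: each \UXXXXXXXX sequence is an escape that R uses to print a single character, so the strings contain no literal backslash to split on. To split them into individual characters, use an empty string as the separator: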

strsplit(df,"")
[[1]]
[1] "\U00010603"

[[2]]
[1] "\U0001076b" "\U00010631" "\U0001076b"

[[3]]
[1] "\U00010631" "\U00010633"

[[4]]
[1] "\U0001061a" "\U00010655" "\U00010609" "\U00010631"

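Note that the first element contains only one character, so it comes back unchanged, while the other elements are split into their individual characters.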
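Since the goal in the question is a frequency count, the split result can be flattened and tabulated. The following is only a minimal sketch, assuming a session that handles UTF-8 strings; the names `chars` and `freq` are illustrative and not from the original post, and `df` is the character vector defined in the question.

```r
# Sample data from the question; each \UXXXXXXXX escape is ONE character.
df <- c("\U00010603",
        "\U0001076b\U00010631\U0001076b",
        "\U00010631\U00010633",
        "\U0001061a\U00010655\U00010609\U00010631")

# The strings hold no literal backslash, which is why splitting on "\\"
# returned the input unchanged.
any(grepl("\\", df, fixed = TRUE))

# Number of characters in each element.
nchar(df, type = "chars")

# Split every element into single characters and flatten into one vector
# (chars is a hypothetical name used for illustration).
chars <- unlist(strsplit(df, ""))

# Frequency of each character, most frequent first
# (freq is a hypothetical name used for illustration).
freq <- sort(table(chars), decreasing = TRUE)
freq
```

With the four sample strings, `chars` holds ten characters, and "\U00010631" is the most frequent with three occurrences. If the data is a data frame column rather than a plain vector, the same calls should apply to that column after `as.character()`.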


 