
Removing white space: cleaning oddly formatted web-scraped data in R

So I have some data I web scraped, and when I export it with write.csv I get huge white spaces in Excel. Here is a small sample from my data frame:

dat <- data.frame(one = "\r\n Something", two = "\n\n\n another one")

Would anyone happen to know how to approach the issue of removing white space?

You have two semi-complicated questions in here. The first, "Would anyone happen to know how to approach the issue of removing white space?", is too vague and broad for me to help with beyond suggesting the functions in the stringr package. ¯\_(ツ)_/¯ I don't know if that helps.
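As a rough sketch of the stringr route (assuming stringr 1.3.0 or later, which added str_squish()): squish every character column before calling write.csv(), so no \r, \n or padding ends up in the cells Excel displays. The output path below is only a placeholder.

```r
library(stringr)

# Sample data from the question; stringsAsFactors = FALSE keeps the columns character
dat <- data.frame(one = "\r\n Something", two = "\n\n\n another one",
                  stringsAsFactors = FALSE)

# str_trim() only strips the ends; str_squish() also collapses internal runs
# of spaces, tabs, \r and \n into a single space
dat[] <- lapply(dat, function(col) if (is.character(col)) str_squish(col) else col)

# Write the cleaned copy; the temp-dir path stands in for a real output file
out <- file.path(tempdir(), "dat_clean.csv")
write.csv(dat, out, row.names = FALSE)
```

Assigning into dat[] rather than dat keeps the result a data frame with the original column names.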

The second, "Secondary: could anyone help me by showing me how to clean up my "referee.report" text? This is the column I'm most interested in. I would especially like to remove the "\r\n" among other things.", is more concrete, so here is an attempt at it.

referee.report = structure(c("\r\n                                    \r\n                                        DOI: 10.5256/f1000research.6599.r7859\r\n                                    \r\n                                                                                                                                                                                                                        \r\n                                        I have read the revised article by Horrell and D'Orazio. They have responded appropriately to\r\n                                                                                    ... Continue reading\r\n                                                                            \r\n                                    \r\n                                        I have read the revised article by Horrell and D'Orazio. They have responded appropriately to the concerns/questions raised by all 3 reviewers. Accordingly, I recommend indexing the submitted revised article.\r\n                                        \r\n                                                                                            \r\n                                                                                                                I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.                                                                                                     
\r\n                                                                                    \r\n                                        Competing Interests:\r\n                                        No competing interests were disclosed.\r\n                                                                                Close\r\n                                    \r\n                                    \r\n                                        REPORT A CONCERN\r\n                                    \r\n                                ", 
                             "\r\n                                    \r\n                                        DOI: 10.5256/f1000research.6601.r7701\r\n                                    \r\n                                                                                                                                                                                                                        \r\n                                        The revision\r\n                                                                                    ... Continue reading\r\n                                                                            \r\n                                    \r\n                                        The revision is approved\r\n                                        \r\n                                                                                            \r\n                                                                                                                I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.                                                                                                     \r\n                                                                                    \r\n                                        Competing Interests:\r\n                                        No competing interests were disclosed.\r\n                                                                                Close\r\n                                    \r\n                                    \r\n                                        REPORT A CONCERN\r\n                                    \r\n                                "
), .Names = c("http://f1000research.com/articles/3-288/v2", "http://f1000research.com/articles/4-34/v2"
))

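library(stringr)

cleanOutput <- function(listObject) {
  # Clean each report separately; vapply keeps the URL names on the result
  vapply(listObject, function(report) {
    lines <- str_split(report, "\\r\\n")[[1]]   # split on Windows line breaks
    lines <- trimws(lines)                        # strip leading/trailing spaces and tabs
    lines <- lines[!is.na(lines) & lines != ""]   # eliminate empty values and NAs
    paste(lines, collapse = " ")                  # rejoin into a single string
  }, character(1))
}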

cleanOutput(referee.report)

Try this function?

EDIT: This version also removes the \t from the start of the lines.

Edit: It turns out the trimming step (str_trim, or trimws as used above) already removes the \t at the start of the lines, so that edit was not needed.
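To check the point about leading tabs: base trimws() treats spaces, \t, \r and \n as white space by default, so a line like the ones in referee.report loses its leading tabs without any extra step (stringr's str_trim() gives the same result for this input). The line below is made up for illustration.

```r
# Hypothetical line resembling the scraped report text
line <- "\t\t  Competing Interests:\r\n"

trimws(line)                  # both ends trimmed: tabs, spaces, \r and \n removed
trimws(line, which = "left")  # only the leading tabs and spaces removed
```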

Update: Polka's code works somewhat, but lapply removes the backslashes. Since the variable is in list form I'd need to convert it to a character vector, but when I do, the backslashes return.
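One likely explanation for the backslashes "returning" (an assumption, since the output in question is not shown): print() displays control characters as escape sequences, whereas writeLines() and cat() write the actual characters. The backslash is not stored in the string; \r\n is two characters, a carriage return and a line feed. A small demonstration with a made-up string:

```r
x <- "line one\r\nline two"

print(x)       # shows the escapes, e.g. "line one\r\nline two"
writeLines(x)  # writes the real characters, so no backslashes appear
nchar(x)       # "\r" and "\n" each count as one character

# Once the line breaks are replaced, print() has no escapes left to show
cleaned <- gsub("\r\n", " ", x, fixed = TRUE)
print(cleaned)
```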

Update: changed paste() so it concatenates all the strings and returns a single value. This produces the same result.
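The paste() change matters because paste() called on a single vector without collapse returns one string per element; the collapse argument is what joins them into a single value. Illustrated on two pieces taken from the report text:

```r
pieces <- c("Competing Interests:", "No competing interests were disclosed.")

paste(pieces)                  # still two elements; nothing is joined
paste(pieces, collapse = " ")  # one string, elements separated by a space
```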


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM