How to read comma-delimited fields with pipes as text delimiters in R

So I've got a .txt file that uses commas to separate fields, but it also uses pipes ("|") as text delimiters. I would like to read this .txt file using R (though I could use other programmes if this is impossible in R), and I would like all values to end up in the right columns.

A sample of the data:

15,|0370A01D-DC1E-4534-8176-A08A1E2F82E4|,|EDU|,|Education|,|Appropriations and authorization regarding higher education issues.|,|2008|
16,|03A8F7BB-9716-4494-BF41-013C27B5ECA6|,|GOV|,|Government Issues|,|issues affecting local government including appropriations|,|2003|
17,|04696109-082B-4EF6-9AA8-A6DB1013D15D|,|TEC|,|Telecommunications|,|RUS Broadband Applikcation|,|2008|
18,|04FA0BA7-E9D2-4F1E-8193-45F023065C89|,|DOC|,|District of Columbia|,|HUD Appropriations FY2009, CDBG
Financial Services Appropriations FY2009, District of Columbia
Commerce, Justice, Science Appropriations, Juvenile Justice, Byrne Grant|,|2008|
19,|04FA0BA7-E9D2-4F1E-8193-45F023065C89|,|HOU|,|Housing|,|HUD Appropriations FY2009, CDBG
Financial Services Appropriations FY2009, District of Columbia
Commerce, Justice, Science Appropriations, Juvenile Justice, Byrne Grant|,|2008|

Each record contains a row number (15, 16, ..., 19), a |uniqueID|, a three-letter |IssueID|, a longer |Issue| description, a |SpecificIssue|, and a |Year|.

The closest I have come to reading this file is the following code (I know that specifying the pipe as the separator is incorrect, but it gives the best result so far):

library(data.table)
lob_issues2 <- fread("file.txt", sep = "|", fill = TRUE)

This results in the following table.

As you can see, the SpecificIssue values in rows 18 and 19 are causing trouble. Perhaps these values are too long or something, and this makes R put parts of them into new columns. I would like R to keep these values in the SpecificIssue column. Any suggestions on what code to use to achieve that?
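To see why rows 18 and 19 misbehave, here is a minimal sketch (not from the original thread). It writes the sample to a temporary file and reads it with base R's read.table while the pipes are *not* treated as quote characters. The commas and line breaks inside SpecificIssue then act as ordinary delimiters, so the five records come back as nine rows, one per physical line. This suggests that the embedded separators, not the length of the values, cause the problem. The names `sample_lines`, `path` and `naive` are illustrative only.

```r
# Sample records from the question, one string per physical line.
# Records 18 and 19 each span three physical lines.
sample_lines <- c(
  "15,|0370A01D-DC1E-4534-8176-A08A1E2F82E4|,|EDU|,|Education|,|Appropriations and authorization regarding higher education issues.|,|2008|",
  "16,|03A8F7BB-9716-4494-BF41-013C27B5ECA6|,|GOV|,|Government Issues|,|issues affecting local government including appropriations|,|2003|",
  "17,|04696109-082B-4EF6-9AA8-A6DB1013D15D|,|TEC|,|Telecommunications|,|RUS Broadband Applikcation|,|2008|",
  "18,|04FA0BA7-E9D2-4F1E-8193-45F023065C89|,|DOC|,|District of Columbia|,|HUD Appropriations FY2009, CDBG",
  "Financial Services Appropriations FY2009, District of Columbia",
  "Commerce, Justice, Science Appropriations, Juvenile Justice, Byrne Grant|,|2008|",
  "19,|04FA0BA7-E9D2-4F1E-8193-45F023065C89|,|HOU|,|Housing|,|HUD Appropriations FY2009, CDBG",
  "Financial Services Appropriations FY2009, District of Columbia",
  "Commerce, Justice, Science Appropriations, Juvenile Justice, Byrne Grant|,|2008|"
)

# Write the sample to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# quote = "" disables quoting: every comma separates fields and every line
# break ends a record, so the multi-line SpecificIssue values are split up.
# fill = TRUE pads the short continuation lines instead of raising an error.
naive <- read.table(path, sep = ",", quote = "", fill = TRUE)

nrow(naive)  # one row per physical line, not one per record
```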

Thanks in advance. Also, if you think another programme is better for this, please let me know.

Use the quote= argument to tell read.table that | is the quote character, so that commas and line breaks inside |...| stay part of the field instead of being treated as separators:

lob_issues2  <- read.table("file.txt", quote = "|", sep = ",")
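Here is a self-contained version of the answer, sketched under a few assumptions. The sample above is written to a temporary file instead of "file.txt". The column names are our own labels, taken from the question's description, because the sample has no header line. `sample_lines` and `path` are illustrative names.

```r
# Sample records from the question, one string per physical line.
sample_lines <- c(
  "15,|0370A01D-DC1E-4534-8176-A08A1E2F82E4|,|EDU|,|Education|,|Appropriations and authorization regarding higher education issues.|,|2008|",
  "16,|03A8F7BB-9716-4494-BF41-013C27B5ECA6|,|GOV|,|Government Issues|,|issues affecting local government including appropriations|,|2003|",
  "17,|04696109-082B-4EF6-9AA8-A6DB1013D15D|,|TEC|,|Telecommunications|,|RUS Broadband Applikcation|,|2008|",
  "18,|04FA0BA7-E9D2-4F1E-8193-45F023065C89|,|DOC|,|District of Columbia|,|HUD Appropriations FY2009, CDBG",
  "Financial Services Appropriations FY2009, District of Columbia",
  "Commerce, Justice, Science Appropriations, Juvenile Justice, Byrne Grant|,|2008|",
  "19,|04FA0BA7-E9D2-4F1E-8193-45F023065C89|,|HOU|,|Housing|,|HUD Appropriations FY2009, CDBG",
  "Financial Services Appropriations FY2009, District of Columbia",
  "Commerce, Justice, Science Appropriations, Juvenile Justice, Byrne Grant|,|2008|"
)

# Write the sample to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# quote = "|" makes read.table treat text between pipes as a single field,
# so the commas and line breaks inside SpecificIssue no longer split it.
# The column names are illustrative labels; they are not stored in the file.
lob_issues2 <- read.table(
  path, sep = ",", quote = "|",
  col.names = c("Row", "uniqueID", "IssueID", "Issue", "SpecificIssue", "Year")
)

nrow(lob_issues2)             # one row per record (15 to 19)
lob_issues2$SpecificIssue[4]  # the full three-line value for record 18
```

Because all fields are read as text first and then type-converted, Row and Year should come back as integers once the pipes are stripped.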
