
Parse text file with separator to create dataframe in R?

Hi, I have a text file that looks like this:

[1] "Development Name - Woodstock Terrace"                   
[2] "Location - 920 Trinity Avenue, Bronx 10456"             
[3] "Number of Apts. - 319"                                  
[4] "Type of Project - Co-op"                                
[5] "Development Name - York Hill Apartments"                
[6] "Location - 1540 York Avenue, New York 10028"            
[7] "Number of Apts. - 296"                                  
[8] "Type of Project - Co-op"

I want a dataframe with columns for the development name, location, number of apartments, and type of project. Each new row starts with a new development name. In the actual file there are a few hundred rows.

I'm not sure how to do this. Maybe by using " - " as a separator with read_delim? Please help!

Assuming the input shown reproducibly in the Note at the end, we convert it to DCF format by replacing the first " - " on each line with ": " and inserting a newline before each line that starts with Development, so that the records are separated by blank lines. Then we read that in using read.dcf, convert the result to a data frame and fix the column types.

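library(magrittr)

input %>%
  sub(" - ", ": ", .) %>%                 # first " - " becomes the "key: value" separator
  sub("^(Development)", "\n\\1", .) %>%   # blank line before each new record
  textConnection %>%
  read.dcf %>%                            # character matrix, one row per record
  as.data.frame %>%
  type.convert(as.is = TRUE)              # e.g. Number of Apts. becomes integer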

giving:

      Development Name                         Location Number of Apts. Type of Project
1    Woodstock Terrace  920 Trinity Avenue, Bronx 10456             319           Co-op
2 York Hill Apartments 1540 York Avenue, New York 10028             296           Co-op
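
To make the conversion described above easier to follow, here is a minimal sketch that stops after the two sub() calls and prints the intermediate DCF text before handing it to read.dcf. The variable names dcf_lines and records are only illustrative, and the input vector is copied from the question so the block runs on its own.

```r
# Sample lines copied from the question.
input <- c("Development Name - Woodstock Terrace",
           "Location - 920 Trinity Avenue, Bronx 10456",
           "Number of Apts. - 319",
           "Type of Project - Co-op",
           "Development Name - York Hill Apartments",
           "Location - 1540 York Avenue, New York 10028",
           "Number of Apts. - 296",
           "Type of Project - Co-op")

# Step 1: turn the first " - " on each line into the DCF "key: value" separator.
dcf_lines <- sub(" - ", ": ", input)

# Step 2: put a newline in front of each "Development ..." line so that
# records end up separated by a blank line.
dcf_lines <- sub("^(Development)", "\n\\1", dcf_lines)

# Inspect the text that read.dcf will parse.
writeLines(dcf_lines)

# Step 3: parse it; read.dcf returns a character matrix with one row per record.
con <- textConnection(dcf_lines)
records <- read.dcf(con)
close(con)
print(dim(records))
```

Printing the intermediate text is a quick way to check that every record starts with the same field before relying on the parsed result.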

Note

input <- c("Development Name - Woodstock Terrace", "Location - 920 Trinity Avenue, Bronx 10456", 
"Number of Apts. - 319", "Type of Project - Co-op", "Development Name - York Hill Apartments", 
"Location - 1540 York Avenue, New York 10028", "Number of Apts. - 296", 
"Type of Project - Co-op")

Read your text into a data frame with one column. Let's name the column X1:

library(tidyverse)  # provides tibble, stringr, dplyr and tidyr

df=tibble(X1=c("Development Name - Woodstock Terrace",   
               "Location - 920 Trinity Avenue",          
               "Number of Apts. - 319",                  
               "Type of Project - Co-op",                
               "Development Name - York Hill Apartments",
               "Location - 1540 York Avenue",            
               "Number of Apts. - 296",                  
               "Type of Project - Co-op"))

Create the column-name and value vectors and combine them into a new data frame. The four patterns in ColumnNames are recycled along df$X1, so this relies on every record listing its fields in the same order:

ColumnNames=c("Development Name - ","Location - ","Number of Apts. - ","Type of Project - ")
Columns=str_extract(df$X1,ColumnNames)%>%str_remove(' - ')
Values=str_remove_all(df$X1,ColumnNames)
df0=tibble(Cols=Columns,Vals=Values)

Pivot the new data frame wider, using a per-group row index to tell the records apart. See also the pivot_wider issue "Values in `values_from` are not uniquely identified; output will contain list-cols":

df1=df0%>%
  group_by(Cols)%>%
  mutate(row = row_number())%>%   # record index within each column name
  ungroup()%>%
  pivot_wider(names_from = Cols,values_from=Vals,id_cols=row)%>%
  select(-row)

> df1
# A tibble: 2 x 4
  `Development Name`   Location           `Number of Apts.` `Type of Project`
  <chr>                <chr>              <chr>             <chr>            
1 Woodstock Terrace    920 Trinity Avenue 319               Co-op            
2 York Hill Apartments 1540 York Avenue   296               Co-op   
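
The split-and-pivot steps above depend on the lines arriving in a fixed order. As a hedged alternative, the following base R sketch splits each line at its first " - " and starts a new record whenever the key is "Development Name", so it does not rely on recycling a pattern vector. It assumes every record begins with a "Development Name" line; the names key, value, record, fields, wide and result are illustrative only.

```r
# Sample lines copied from the question (the real file has a few hundred).
input <- c("Development Name - Woodstock Terrace",
           "Location - 920 Trinity Avenue, Bronx 10456",
           "Number of Apts. - 319",
           "Type of Project - Co-op",
           "Development Name - York Hill Apartments",
           "Location - 1540 York Avenue, New York 10028",
           "Number of Apts. - 296",
           "Type of Project - Co-op")

# Split each line at its first " - " only.
key   <- sub(" - .*$", "", input)                  # text before the first " - "
value <- sub("^.*? - ", "", input, perl = TRUE)    # text after the first " - "

# Assumption: a new record starts at every "Development Name" line.
record <- cumsum(key == "Development Name")

# Fill a records-by-fields character matrix; fields missing from a record stay NA.
fields <- unique(key)
wide <- matrix(NA_character_, nrow = max(record), ncol = length(fields),
               dimnames = list(NULL, fields))
wide[cbind(record, match(key, fields))] <- value

# Convert to a data frame and let R pick suitable column types.
result <- type.convert(as.data.frame(wide, stringsAsFactors = FALSE), as.is = TRUE)
print(result)
```

Because the record index comes from the data rather than from line position, a record with a missing field would simply get NA in that column instead of shifting the later values.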
