Hi, I have a text file that looks like this:
[1] "Development Name - Woodstock Terrace"
[2] "Location - 920 Trinity Avenue, Bronx 10456"
[3] "Number of Apts. - 319"
[4] "Type of Project - Co-op"
[5] "Development Name - York Hill Apartments"
[6] "Location - 1540 York Avenue, New York 10028"
[7] "Number of Apts. - 296"
[8] "Type of Project - Co-op"
I want a dataframe with columns for the development name, location, number of apartments, and type of project. Each new row starts with a new development name. In the actual file there are a few hundred rows.
I'm not sure how to do this. Maybe by using " - " as a separator with read_delim? Please help!
Assuming the input shown reproducibly in the Note at the end, we convert it to DCF format by replacing the " - " separator (space, minus, space) with ": " (colon, space) and inserting a newline before each line that starts with Development. We then read the result in using read.dcf, convert it to a data frame and fix the column types.
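library(magrittr)

input %>%
  sub(" - ", ": ", .) %>%               # "Field - value" becomes "Field: value"
  sub("^(Development)", "\n\\1", .) %>% # blank line before each record
  textConnection %>%
  read.dcf %>%                          # character matrix, one row per record
  as.data.frame %>%
  type.convert(as.is = TRUE)            # e.g. "319" becomes integer 319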
giving:
      Development Name                         Location Number of Apts. Type of Project
1    Woodstock Terrace  920 Trinity Avenue, Bronx 10456             319           Co-op
2 York Hill Apartments 1540 York Avenue, New York 10028             296           Co-op
Note
input <- c("Development Name - Woodstock Terrace", "Location - 920 Trinity Avenue, Bronx 10456",
"Number of Apts. - 319", "Type of Project - Co-op", "Development Name - York Hill Apartments",
"Location - 1540 York Avenue, New York 10028", "Number of Apts. - 296",
"Type of Project - Co-op")
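To see why read.dcf can parse the converted text, it may help to look at the intermediate lines. The following is a minimal sketch of the same two substitutions without the pipe; the names dcf_lines, con and result are only illustrative and not part of the answer above.

```r
# Same sample input as in the Note
input <- c("Development Name - Woodstock Terrace",
           "Location - 920 Trinity Avenue, Bronx 10456",
           "Number of Apts. - 319", "Type of Project - Co-op",
           "Development Name - York Hill Apartments",
           "Location - 1540 York Avenue, New York 10028",
           "Number of Apts. - 296", "Type of Project - Co-op")

# Step 1: turn the first " - " on each line into ": " (DCF "field: value" syntax)
dcf_lines <- sub(" - ", ": ", input)

# Step 2: prefix a newline to every line starting with "Development",
# so that each record is separated from the previous one by an empty line
dcf_lines <- sub("^(Development)", "\n\\1", dcf_lines)

# Inspect the intermediate DCF text
writeLines(dcf_lines)

# read.dcf returns a character matrix with one row per record
con <- textConnection(dcf_lines)
result <- read.dcf(con)
close(con)

print(result)
```

Each record in the printed text is a block of "field: value" lines, and records are separated by an empty line, which is the layout read.dcf expects.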
Read your text as a data frame with one column. Let's name the column X1:
library(tidyverse)  # provides tibble, stringr, dplyr and tidyr

df <- tibble(X1 = c("Development Name - Woodstock Terrace",
                    "Location - 920 Trinity Avenue",
                    "Number of Apts. - 319",
                    "Type of Project - Co-op",
                    "Development Name - York Hill Apartments",
                    "Location - 1540 York Avenue",
                    "Number of Apts. - 296",
                    "Type of Project - Co-op"))
Create the column-name and value vectors and combine them into a new data frame:
# Split each line at the first " - " into a field name and its value
Parts <- str_split_fixed(df$X1, " - ", n = 2)
Columns <- Parts[, 1]
Values <- Parts[, 2]
df0 <- tibble(Cols = Columns, Vals = Values)
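Pivot the new data frame wider. A row index is needed first, because each column name occurs once per development and the values would otherwise not be uniquely identified. See also the pivot_wider issue "Values in `values_from` are not uniquely identified; output will contain list-cols".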
df1 <- df0 %>%
  group_by(Cols) %>%
  mutate(row = row_number()) %>%  # 1 for the first development, 2 for the second, ...
  ungroup() %>%
  pivot_wider(id_cols = row, names_from = Cols, values_from = Vals) %>%
  select(-row)
> df1
# A tibble: 2 x 4
  `Development Name`   Location           `Number of Apts.` `Type of Project`
  <chr>                <chr>              <chr>             <chr>
1 Woodstock Terrace    920 Trinity Avenue 319               Co-op
2 York Hill Apartments 1540 York Avenue   296               Co-op
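All four columns in this result are character (<chr>). If the number of apartments is needed as a number, one option is base R's type.convert. The following is a minimal sketch, assuming a hand-built data frame that stands in for df1 so that it runs on its own:

```r
# Hand-built stand-in for the pivoted result (every column is character)
df1 <- data.frame(
  "Development Name" = c("Woodstock Terrace", "York Hill Apartments"),
  "Location" = c("920 Trinity Avenue", "1540 York Avenue"),
  "Number of Apts." = c("319", "296"),
  "Type of Project" = c("Co-op", "Co-op"),
  check.names = FALSE,        # keep spaces and dots in the column names
  stringsAsFactors = FALSE
)

# as.is = TRUE leaves text columns as character instead of making them factors;
# columns whose values all look like integers are converted to integer
df1 <- type.convert(df1, as.is = TRUE)

# Show the resulting column types
str(df1)
```

After the conversion, the apartment counts can be summed or compared numerically, while the name, location and project type stay as text.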