I have a column in a dataframe with scraped prices like this:
prices
$1,50 $1,20
$1,50
$1,75 $1,25 $1,35
In short, each cell of the column can hold several prices. I would like to split them on $ into separate columns. This is the result I need for the example above:
prices              price1  price2  price3
$1,50 $1,20         1,50    1,20    NA
$1,50               1,50    NA      NA
$1,75 $1,25 $1,35   1,75    1,25    1,35
I have tried the following, but neither option does what I need:
str_split(prices, pattern = '[$]') # I get a column with values like this c("", "1,50")
separate(prices, sep = '[$]', into = c("price1", "price2"), remove = FALSE)
# price1 is created empty. I also want to use this inside a function,
# and the number of prices can vary between data frames.
One approach using dplyr and tidyr:
library(dplyr)
library(tidyr)
df %>%
  rowwise() %>%
  mutate(price = list(gsub("$", "", strsplit(prices, " ")[[1]], fixed = TRUE))) %>%
  unnest_wider(price, names_sep = "")
Output:
  prices            price1 price2 price3
  <chr>             <chr>  <chr>  <chr>
1 $1,50 $1,20       1,50   1,20   NA
2 $1,50             1,50   NA     NA
3 $1,75 $1,25 $1,35 1,75   1,25   1,35
Input:
df = structure(list(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"
)), class = "data.frame", row.names = c(NA, -3L))
In base R you could do:
read.table(text=df$prices, fill=TRUE, header = FALSE, sep='$', dec = ',')[-1]
    V2   V3   V4
1 1.50 1.20   NA
2 1.50   NA   NA
3 1.75 1.25 1.35
If you don't want them as numeric but as character strings that keep the comma, you can do:
read.table(text=df$prices, fill=TRUE, header=FALSE, sep='$', na.strings='')[-1]
    V2   V3   V4
1 1,50 1,20 <NA>
2 1,50 <NA> <NA>
3 1,75 1,25 1,35
You can then change the names, e.g. set them to paste0('prices', seq(ncol(df1))), where df1 is the data frame returned above.
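The renaming step could look like the sketch below. The names df1 and result are only illustrative, and strip.white = TRUE and stringsAsFactors = FALSE are my additions (not part of the call above), so that the character fields carry no trailing space and stay character on older R versions.

```r
# Input as used in the other answers
df <- data.frame(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"))

# Split on '$'. The first field is the empty text before the leading '$',
# so the first column is dropped with [-1].
df1 <- read.table(text = df$prices, fill = TRUE, header = FALSE, sep = "$",
                  na.strings = "", strip.white = TRUE,
                  stringsAsFactors = FALSE)[-1]

# Rename V2, V3, ... to price1, price2, ... for however many columns exist
names(df1) <- paste0("price", seq_len(ncol(df1)))

# Attach the new columns to the original one
result <- cbind(df, df1)
```

Because the names are built from ncol(df1), the same lines should work inside a function when the number of prices differs between data frames.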
If your default locale has comma as the decimal separator, then:
library(tidyverse)
options("readr.default_locale" = readr::locale(decimal_mark = ","))
df <- tibble(prices =
c("$1,50 $1,20",
"$1,50",
"$1,75 $1,25 $1,35"))
df |>
mutate(prices = prices |>
str_split(" ") |>
map( ~ str_remove(., "\\$"))) |>
unnest_wider(prices) |>
mutate(across(everything(), readr::parse_number))
#> New names:
#> New names:
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> # A tibble: 3 × 3
#> ...1 ...2 ...3
#> <dbl> <dbl> <dbl>
#> 1 1.5 1.2 NA
#> 2 1.5 NA NA
#> 3 1.75 1.25 1.35
Otherwise:
df |>
mutate(prices = prices |>
str_split(" ") |>
map( ~ str_remove(., "\\$"))) |>
unnest_wider(prices) |>
mutate(across(everything(), ~ readr::parse_number(., locale = readr::locale(decimal_mark = ","))))
#> New names:
#> New names:
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> # A tibble: 3 × 3
#> ...1 ...2 ...3
#> <dbl> <dbl> <dbl>
#> 1 1.5 1.2 NA
#> 2 1.5 NA NA
#> 3 1.75 1.25 1.35
With cSplit from splitstackshape:
library(splitstackshape)
s <- cSplit(df, "prices", "$", type.convert = TRUE)[, -1]
df[, paste0("price", 1:ncol(s))] <- s
#              prices price1 price2 price3
#1       $1,50 $1,20   1,50   1,20   <NA>
#2             $1,50   1,50   <NA>   <NA>
#3 $1,75 $1,25 $1,35   1,75   1,25   1,35
In this approach we convert the data to long form using separate_rows, transform it using transform and convert back to wide form using reshape. We use a mix of dplyr, tidyr and base functions, choosing among them based on whichever gives shorter code.
1) Add a P column that is a copy of prices, strip the $ signs, and add a column row that numbers the original rows. Then separate the prices column into rows, add a column n that numbers the prices within each original row, and convert to wide form. reshape is a bit less code than pivot_wider in this case, but the latter could have been used. We also use transform, which is like mutate except that it outputs a plain data frame, which reshape needs. At the end, select the columns we need.
library(dplyr)
library(tidyr)
DF %>%
mutate(P = prices, prices = gsub("\\$", "", prices), row = 1:n()) %>%
separate_rows(prices, sep = " +") %>%
transform(n = ave(1:nrow(.), row, FUN = seq_along)) %>%
reshape(dir = "wide", idvar = c("row", "P"), timevar = "n", sep = "") %>%
select(prices = P, everything(), -row)
giving:
             prices prices1 prices2 prices3
1       $1,50 $1,20    1,50    1,20    <NA>
3             $1,50    1,50    <NA>    <NA>
4 $1,75 $1,25 $1,35    1,75    1,25    1,35
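For comparison, the pivot_wider variant mentioned above might look like the following sketch. It is not part of the original answer. The grouping step replaces the transform/ave call, and names_prefix = "price" is my choice so the columns come out as price1, price2, ... as asked in the question.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"))

wide <- DF %>%
  # keep the original string in P, strip '$', and number the original rows
  mutate(P = prices, prices = gsub("\\$", "", prices), row = row_number()) %>%
  # one row per price
  separate_rows(prices, sep = " +") %>%
  # n is the position of each price within its original row
  group_by(row) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  # back to wide form: one column per position
  pivot_wider(names_from = n, values_from = prices, names_prefix = "price") %>%
  select(prices = P, everything(), -row)
```

Unlike reshape, pivot_wider returns a tibble with row names 1 to 3 rather than 1, 3, 4.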
2) If you want the prices converted to numeric and the decimal point is a dot in the current locale, use the code below, which replaces the commas with dots and adds convert = TRUE to separate_rows. If comma is the decimal point in the current locale, omit the gsub line that replaces commas with dots.
DF %>%
mutate(P = prices, prices = gsub("\\$", "", prices),
prices = gsub(",", ".", prices),
row = 1:n()) %>%
separate_rows(prices, sep = " +", convert = TRUE) %>%
transform(n = ave(1:nrow(.), row, FUN = seq_along)) %>%
reshape(dir = "wide", idvar = c("row", "P"), timevar = "n", sep = "") %>%
select(prices = P, everything(), -row)
The input in reproducible form:
DF <-
structure(list(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"
)), class = "data.frame", row.names = c(NA, -3L))