
How do I upload a CSV file to R with one row that contains a list in the form of ["123", "456", "789"]?

I am trying to upload a CSV file that has various data in the normal format (a column name followed by either numeric or string values), as well as one column that holds a list of numbers of varying length in ["x"] format (i.e. row 1 = ["111", "222"], row 2 = ["333"], row 3 = ["555","666","777"]). How do I load that data so that I can run analysis on it?

When I turned it into a character string, the data came back as "[\\"x\\"]". When I turned it into a factor, it looked like the format in the CSV. But I still can't do anything with the [" present.

Hi, you can use the stringr package to pull the digits out of the square brackets. I think the \\ shows up because the backslash is the escape character R uses for the inner set of double quotes. Either way, this will simplify it.

I made some ugly data:

df <- data.frame(x = c(1, 2, 3),
                 y = c('[\\"111\\", \\"222\\"]', '[\\"333\\"]', '[\\"555\\", \\"666\\", \\"777\\"]'))
df
  x                                 y
1 1            [\\"111\\", \\"222\\"]
2 2                       [\\"333\\"]
3 3 [\\"555\\", \\"666\\", \\"777\\"]
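On the escaping point above, here is a small check you can run. It assumes the CSV field itself contains no backslashes (I can't see your file, so that is a guess), and the string `s` is made up for illustration. If that holds, the backslashes you see are only added by print() for display and are not part of the data:

```r
# Hypothetical value, as it might arrive from the CSV (no backslashes inside)
s <- '["111", "222"]'

print(s)       # print() escapes the inner double quotes for display
writeLines(s)  # writeLines() shows the raw characters

# Count the characters and look for a literal backslash
n_chars <- nchar(s)
has_backslash <- grepl("\\", s, fixed = TRUE)
```

If `has_backslash` is FALSE for your own column, there is nothing to strip apart from the brackets and quotes.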

Now, using a regex with stringr::str_extract_all, we grab every run of one or more consecutive digits.

df$y <- stringr::str_extract_all(df$y, "(\\d+)")

The pattern (\\d+) simply says: grab groups of 1 or more digits.

This yields a nested list (one character vector per row) without the \\ or the quotes included.

  x             y
1 1      111, 222
2 2           333
3 3 555, 666, 777
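To tie this back to reading an actual file, here is a rough end-to-end sketch. The CSV text is invented, and I am assuming the list column is a quoted field with its inner quotes doubled, which is how most tools export such a value. I used base R's regmatches/gregexpr here instead of stringr so it runs without extra packages; it should behave the same as the str_extract_all call above for this pattern.

```r
# Hypothetical CSV content standing in for the real file
csv_text <- 'x,y
1,"[""111"", ""222""]"
2,"[""333""]"
3,"[""555"", ""666"", ""777""]"'

# read.csv() turns the doubled quotes back into single double quotes
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
raw_first <- df$y[1]

# Base R counterpart of stringr::str_extract_all(df$y, "\\d+"):
# one character vector of digit runs per row
df$y <- regmatches(df$y, gregexpr("[0-9]+", df$y))
```

For a real file you would replace `text = csv_text` with the file path.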

They are still strings, so if you want to compute with the numbers you need to convert them first, for example:

> eval(parse(text = df$y[[1]][1])) / 111
[1] 1
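As an alternative to eval(parse()), as.numeric() does the same conversion more directly and will not execute arbitrary text. A minimal sketch, where `y` is a hand-built stand-in for the extracted list column:

```r
# Stand-in for df$y after the extraction step (still character)
y <- list(c("111", "222"), "333", c("555", "666", "777"))

# Convert every element of the list to numeric
y_num <- lapply(y, as.numeric)

first_ratio <- y_num[[1]][1] / 111  # same check as the eval(parse()) line
row_sums <- sapply(y_num, sum)      # one total per original row
```

I would usually reach for this unless the strings hold real R expressions rather than plain numbers.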

For the whole data frame, you may consider unnesting it and making a new column (or overwriting the original) to change the data type and turn the strings into evaluable expressions. For this we can use some of the tidyverse (tidyr::unnest and dplyr::mutate):

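library(magrittr)  # provides the %>% pipe

df %>% 
  tidyr::unnest(cols = y) %>%   # one row per extracted number
  dplyr::rowwise() %>%          # evaluate each row on its own
  dplyr::mutate(numeric_y = eval(parse(text = y))) 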

# A tibble: 6 x 3
      x y     numeric_y
  <dbl> <chr>     <dbl>
1     1 111         111
2     1 222         222
3     2 333         333
4     3 555         555
5     3 666         666
6     3 777         777
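If you would rather not pull in the tidyverse, roughly the same long table can be built in base R. This is only a sketch: the data frame is rebuilt by hand to mirror the example, and `long` is a name I made up.

```r
# Rebuild the example: x plus a list column of extracted strings
df <- data.frame(x = c(1, 2, 3))
df$y <- list(c("111", "222"), "333", c("555", "666", "777"))

# Repeat each x once per extracted number, then flatten and convert y
long <- data.frame(
  x = rep(df$x, lengths(df$y)),
  numeric_y = as.numeric(unlist(df$y))
)
```

`lengths()` gives the number of values in each row's list, which is what keeps x lined up with the flattened y.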
