I am trying to clean up some data in R, but I am struggling to get it done. Currently, I have multiple columns, some of which contain multiple values/entries per cell. However, I only care about the names and their matching numbers.
Here's my data as of now:
ID Name(s) Number(s) ...
#1 X, Y 123, 456
#2 Z 789
#3 Y, Z 456, 789
#4 W 0
...
What I want to achieve is a clean list of names matched with the corresponding number, like this:
Name Number
W 0
X 123
Y 456
Z 789
The same number always corresponds to the same name; I simply don't have a clean version of this data. I would appreciate your help!
We can use separate_rows to get the comma-separated values into different rows, arrange the data, and select only the unique rows with distinct.
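library(dplyr)

df %>%
  # split on a comma plus any following spaces, as in the question's "X, Y"
  tidyr::separate_rows(Name, Number, sep = ",\\s*") %>%
  select(-ID) %>%      # keep only Name and Number
  arrange_all() %>%    # sort rows by every remaining column
  distinct()           # drop duplicate Name/Number pairs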
# Name Number
#1 W 0
#2 X 123
#3 Y 456
#4 Z 789
data
df <- structure(list(ID = 1:4, Name = c("X,Y", "Z", "Y,Z", "W"),
Number = c("123,456", "789", "456,789", "0")),
class = "data.frame", row.names = c(NA, -4L))
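The question's sample shows a space after each comma (X, Y), while the data above has none. As a rough sketch of the same split-and-deduplicate idea in base R, assuming the cells really do contain that space and that every row has as many names as numbers, something like the following could work. The object names names_split, numbers_split and lookup are made up for illustration.

```r
# Sample data mirroring the question, including the space after each comma
df <- data.frame(
  ID     = 1:4,
  Name   = c("X, Y", "Z", "Y, Z", "W"),
  Number = c("123, 456", "789", "456, 789", "0"),
  stringsAsFactors = FALSE
)

# Split each cell on a comma followed by optional whitespace
names_split   <- strsplit(df$Name,   ",\\s*")
numbers_split <- strsplit(df$Number, ",\\s*")

# Guard (assumption): every row has as many names as numbers
stopifnot(lengths(names_split) == lengths(numbers_split))

# Flatten to long format, drop duplicate pairs, then sort by name
lookup <- unique(data.frame(
  Name   = unlist(names_split),
  Number = as.numeric(unlist(numbers_split)),
  stringsAsFactors = FALSE
))
lookup <- lookup[order(lookup$Name), ]
rownames(lookup) <- NULL
lookup
```

This avoids extra packages and converts Number to numeric along the way. The stopifnot guard fails early if a row's names and numbers do not line up.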
We can use cSplit to split the data into 'long' format.
library(splitstackshape)
library(data.table)
unique(cSplit(df, c("Name", "Number"), ",", "long")[order(Name, Number),
.(Name, Number)])
# Name Number
#1: W 0
#2: X 123
#3: Y 456
#4: Z 789
df <- structure(list(ID = 1:4, Name = c("X,Y", "Z", "Y,Z", "W"),
Number = c("123,456", "789", "456,789", "0")),
class = "data.frame", row.names = c(NA, -4L))
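Both approaches rely on the question's statement that the same number always corresponds to the same name. A minimal sketch of how that assumption might be checked on the deduplicated result is shown below. Here lookup is a hypothetical stand-in for the table produced by either approach, and one_to_one is an illustrative name.

```r
# Hypothetical deduplicated lookup table, standing in for the real result
lookup <- data.frame(
  Name   = c("W", "X", "Y", "Z"),
  Number = c(0, 123, 456, 789)
)

# After deduplication, a repeated name or number signals conflicting pairs
one_to_one <- !anyDuplicated(lookup$Name) && !anyDuplicated(lookup$Number)
one_to_one
```

If this returns FALSE on real data, some name is paired with more than one number (or vice versa), and the raw rows would need a closer look.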