dcast in R: counting ids that are repeated across sources

I have a dataset named my_data with more than 100,000 rows. A sample looks like this:

id                   source
8166923397733625478  happimobiles
8166923397733625478  Springfit
7301100145962413274  Duroflex
6703062895304712434  happimobiles
6897156268457025524  themrphone
37564799155342281    Sangeetha Mobiles
1159098248970201145  Sangeetha Mobiles

I used the code below and also table(my_data).

library(readxl)
library(data.table)  # needed for setDT() and dcast()

my_data <- read_excel("C:\\Users\\ashishpatodia\\Desktop\\R\\Code\\Sample_Data_Overlap.xlsx", sheet = "10000 sample")

setDT(my_data)
(cohorts <- dcast(unique(my_data)[, cohort := source, by = id], cohort ~ source, fun.aggregate = length, value.var = "cohort"))

I want an output in which every id is counted under its own source and also under every other source in which it is repeated. For example, the id ending in 5478 falls under both happimobiles and Springfit. happimobiles has the ids 8166923397733625478 and 6703062895304712434, which makes its count 2, and 1 of those ids is shared with Springfit.

Expected output

              happimobiles  Springfit  Duroflex  themrphone  Sangeetha
happimobiles             2          1         0           0          0
Springfit                1          1         0           0          0
Duroflex                 0          0         1           0          0
themrphone               0          0         0           1          0
Sangeetha                0          0         0           0          2

I have also tried

Pivot <- dcast(my_data, source ~ source, value.var = "id", fun.aggregate = length)

which gives the correct count of records for each individual partner, but not the overlaps between partners.

I also tried

crossprod(table(my_data))

But this does not give the correct answer.

Link to the entire data, on which I want the code to run:

https://docs.google.com/spreadsheets/d/1HUoRlVVf8EBedj1puXdgtTS6GGeFsXYqjVicUwbc5KE/edit#gid=0

We can use table with crossprod from base R. On the sample data given under "data" it returns the expected overlap matrix:

crossprod(table(my_data))
#            source
#source              Duroflex happimobiles Sangeetha Mobiles Springfit themrphone
#  Duroflex                 1            0                 0         0          0
#  happimobiles             0            2                 0         1          0
#  Sangeetha Mobiles        0            0                 2         0          0
#  Springfit                0            1                 0         1          0
#  themrphone               0            0                 0         0          1
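
One possible reason the same call looks wrong on the full data: crossprod(M) computes t(M) %*% M, so cell (i, j) equals the number of ids shared by sources i and j only when every id/source cell of the table is 0 or 1. If the full sheet contains repeated id/source rows (an assumption, the sample does not show any), the counts get inflated. A minimal sketch on made-up toy data, where the ids "a", "b", "c" and the duplicated first row are hypothetical:

```r
# Toy data (hypothetical): the first row is deliberately duplicated.
toy <- data.frame(
  id     = c("a", "a", "a", "b", "c"),
  source = c("happimobiles", "happimobiles", "Springfit",
             "happimobiles", "Duroflex"),
  stringsAsFactors = FALSE
)

# Without deduplication the repeated row inflates the happimobiles cells.
raw <- crossprod(table(toy))
raw["happimobiles", "happimobiles"]   # 5 instead of 2

# Drop duplicate rows first, so each id/source cell is 0 or 1.
overlap <- crossprod(table(unique(toy)))
overlap["happimobiles", "happimobiles"]   # 2 distinct ids
overlap["happimobiles", "Springfit"]      # 1 id in common
```

On my_data the equivalent call would be crossprod(table(unique(my_data))).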

data

my_data <- structure(list(id = c(8166923397733625856, 8166923397733625856, 
7301100145962413056, 6703062895304712192, 6897156268457025536, 
37564799155342280, 1159098248970201088), source = c("happimobiles", 
"Springfit", "Duroflex", "happimobiles", "themrphone", "Sangeetha Mobiles", 
"Sangeetha Mobiles")), class = "data.frame", row.names = c(NA, 
-7L))
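
Note that the ids above are stored as doubles and no longer match the 19-digit values in the question (for example ...5478 became ...5856), because a double cannot represent integers of that size exactly. Distinct ids could therefore collapse into one. A hedged sketch of the same sample with the ids kept as character strings; the name my_data_chr is introduced here only for illustration:

```r
# Two different 19-digit ids parse to the same double.
same_double <- 8166923397733625478 == 8166923397733625479
same_double   # TRUE

# Same sample as in the question, with ids kept as text.
my_data_chr <- data.frame(
  id = c("8166923397733625478", "8166923397733625478",
         "7301100145962413274", "6703062895304712434",
         "6897156268457025524", "37564799155342281",
         "1159098248970201145"),
  source = c("happimobiles", "Springfit", "Duroflex", "happimobiles",
             "themrphone", "Sangeetha Mobiles", "Sangeetha Mobiles"),
  stringsAsFactors = FALSE
)

overlap_chr <- crossprod(table(my_data_chr))
overlap_chr["happimobiles", "Springfit"]   # 1 id in common
```

When importing with read_excel, passing col_types = "text" may keep the ids intact, assuming the workbook itself stores them as text rather than as numbers.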
