R: circlize circos plot - how to plot unconnected areas between sectors with minimal overlap

Question

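I have a data frame with features shared between 4 groups of patients and cell types. I have many different features, but the shared ones (present in more than one group) are only a few.

I want to make a circos plot that shows the few connections between shared features across the patient and cell type groups, while also giving an idea of how many non-shared features each group has.

The way I picture it, the plot should have 4 sectors (one per patient and cell type group) with a few connections between them. Each sector's size should reflect the total number of features in that group, and most of that area should not be connected to the other groups but left empty.

This is what I have so far, but I don't want a sector dedicated to each feature, only one for each patient and cell type group.

MWE: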

library(circlize)

patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
dat

chordDiagram(as.data.frame(dat), transparency = 0.5)
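
For reference, the two quantities the desired plot needs (the size of each group sector and the number of features shared by each pair of groups) can be computed from this MWE in base R. This is only a sketch; the names `group`, `inc`, `sizes` and `shared` are my own and not part of circlize.

```r
# Same toy data as in the MWE above
patients   <- c(rep("patient1", 20), rep("patient2", 10))
cell.types <- c(rep("cell1", 12), rep("cell2", 8), rep("cell1", 6), rep("cell2", 4))
features   <- c(paste("feature", 1:12, sep = "_"),
                paste("feature", 9:16, sep = "_"),
                paste("feature", c(1, 2, 9, 10, 17, 18), sep = "_"),
                paste("feature", c(1, 18, 19, 20), sep = "_"))
group <- paste(patients, cell.types, sep = "|")

# Incidence matrix: one row per group, one column per feature (0/1)
inc <- 1L * (unclass(table(group, features)) > 0)

# Sector size = number of distinct features in each group
sizes <- rowSums(inc)

# shared[i, j] = number of features present in both group i and group j
# (the diagonal repeats the sector sizes)
shared <- tcrossprod(inc)

print(sizes)
print(shared)
```

The off-diagonal entries of `shared` would be the link widths, and `sizes` the sector widths, if each group is drawn as a single sector.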

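EDIT!!

What @m-dz showed in his answer is actually the format I am looking for: 4 sectors for the 4 different patient/cell.type combinations, with only the connections shown. The non-connected features, although not shown, should still count toward the size of the sector.

However, I realize that my case is more complex than the MWE above.

A feature is considered present in 2 patient/cell.type groups not only when it is identical in the 2 groups, but also when it is similar (sequence identity above a threshold). As a result, I have redundancies...

Feature A in patient1-cell1 can be connected to feature A in patient2-cell1, but also to feature B... Feature A should be counted only once for patient1-cell1 (unique count), and fan out to 2 different features in patient2-cell1.

See the example below for a more precise picture of what my actual data looks like. Let's see whether we can get to the final circos plot with it! Thanks!!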

##MWE
#NON OVERLAPPING SETS!

#1: non-shared features
nonshared <- data.frame(patient=c(rep("pat1",20), rep("pat2",10)), cell.type=c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4)), feature=paste("a",1:30,sep=''))
nonshared

#2: features shared between cell types within same patient
sharedcells <- data.frame(patient=c(rep("pat1",3), rep("pat2",4)), cell.types=c(rep("cell1||cell2",3),rep("cell1||cell2",4)), features=c("b1||b1","b1||b1","b1||b1","b2||b2","b3||b3","b4||b4","b4||b5"))
sharedcells

#3: features shared between patients within same cell types
sharedpats <- data.frame(patients=c(rep("pat1||pat2",2), rep("pat1||pat2",6)), cell.type=c(rep("cell1",2),rep("cell2",6)), features=c("c1||c1","c2||c1","c3||c3","c3||c4","c3||c5","c6||c5","c7||c7","c8||c8"))
sharedpats

#4: features shared between patients and cell types
#4.1: shared across pat1-cell1, pat1-cell2, pat2-cell1, pat2-cell2
sharedall1 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1||pat2-cell2",4)), features=c("d1||d1||d1||d1","d2||d2||d2||d3","d4||d4||d3||d3","d5||d5||d5||d5"))
#4.2: shared across pat1-cell1, pat1-cell2, pat2-cell1
sharedall2 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1",2)), features=c("d6||d6||d6","d7||d7||d7"))
#4.3: shared across pat1-cell1, pat1-cell2, pat2-cell2
sharedall3 <- data.frame(both="pat1-cell1||pat1-cell2||pat2-cell2", features="d8||d8||d9")
#4.4: shared across pat1-cell1, pat2-cell1, pat2-cell2
sharedall4 <- data.frame(both="pat1-cell1||pat2-cell1||pat2-cell2", features="d10||d10||d9")
#4.5: shared across pat1-cell2, pat2-cell1, pat2-cell2
sharedall5 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1||pat2-cell2",3)), features=c("d11||d11||d11","d12||d13||d13","d12||d14||d14"))
#4.6: shared across pat1-cell1, pat2-cell2
sharedall6 <- data.frame()
#4.7: shared across pat1-cell2, pat2-cell1
sharedall7 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1",2)), features=c("d15||d16","d17||d17"))

sharedall <- rbind(sharedall1, sharedall2, sharedall3, sharedall4, sharedall5, sharedall6, sharedall7)
sharedall
#you see there might be overlaps between the different subsets of sharedall, but not between sharedall, sharedpats, sharedcells, and nonshared

#I NEED A CIRCOS PLOT THAT SHOWS ALL THE CONNECTIONS. THE NON-CONNECTED (nonshared) FEATURES SHOULD NOT BE SHOWN, BUT THEY SHOULD COUNT TOWARD THE SIZE OF THE SECTOR (CORRESPONDING TO A PATIENT-CELL COMBINATION)

#THE FEATURES SHOULD BE COUNTED UNIQUELY, SO IF THERE ARE ENTRIES LIKE:
#3 pat1||pat2     cell2   c3||c3
#4 pat1||pat2     cell2   c3||c4
#5 pat1||pat2     cell2   c3||c5
#THE FEATURE c3 SHOULD BE COUNTED ONCE FOR pat1, AND EXPAND TO 3 DIFFERENT FEATURES IN pat2
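
To make the unique-count rule concrete, here is a small base R check on the cell2 rows of `sharedpats`. It is a sketch only, and the variable names are made up for illustration: each `"x||y"` entry is one link, while each side contributes only its distinct feature names to the sector size.

```r
# The six cell2 entries of sharedpats (pat1 feature || pat2 feature)
pairs <- c("c3||c3", "c3||c4", "c3||c5", "c6||c5", "c7||c7", "c8||c8")

# fixed = TRUE so that "||" is treated literally, not as a regex
parts <- strsplit(pairs, "||", fixed = TRUE)
pat1_features <- vapply(parts, `[`, character(1), 1)
pat2_features <- vapply(parts, `[`, character(1), 2)

n_links <- length(pairs)                   # every row is one connection
n_pat1  <- length(unique(pat1_features))   # c3, c6, c7, c8
n_pat2  <- length(unique(pat2_features))   # c3, c4, c5, c7, c8

cat("links:", n_links, " pat1 unique:", n_pat1, " pat2 unique:", n_pat2, "\n")
```

So c3 adds one unit to the pat1 sector but is the origin of three links.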

Answer 1

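A side note on the expected result: the aim is to create a plot showing how many features are shared, ignoring either the individual features (1st plot below) or the overlap between shared features (e.g. on the 2nd plot it looks as if the same features are shared by all groups, which is not what the 1st plot shows, but what matters is the proportion of features shared between groups).

The code below produces the following two figures (figure 1 is left for reference):

All individual features

Simple counts of unique and shared features

One of them should meet the expectations.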

# Prep. data --------------------------------------------------------------

nonshared <- data.frame(patient=c(rep("pat1",20), rep("pat2",10)), cell.type=c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4)), feature=paste("a",1:30,sep=''))
sharedcells <- data.frame(patient=c(rep("pat1",3), rep("pat2",4)), cell.types=c(rep("cell1||cell2",3),rep("cell1||cell2",4)), features=c("b1||b1","b1||b1","b1||b1","b2||b2","b3||b3","b4||b4","b4||b5"))
sharedpats <- data.frame(patients=c(rep("pat1||pat2",2), rep("pat1||pat2",6)), cell.type=c(rep("cell1",2),rep("cell2",6)), features=c("c1||c1","c2||c1","c3||c3","c3||c4","c3||c5","c6||c5","c7||c7","c8||c8"))
sharedall1 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1||pat2-cell2",4)), features=c("d1||d1||d1||d1","d2||d2||d2||d3","d4||d4||d3||d3","d5||d5||d5||d5"))
sharedall2 <- data.frame(both=c(rep("pat1-cell1||pat1-cell2||pat2-cell1",2)), features=c("d6||d6||d6","d7||d7||d7"))
sharedall3 <- data.frame(both="pat1-cell1||pat1-cell2||pat2-cell2", features="d8||d8||d9")
sharedall4 <- data.frame(both="pat1-cell1||pat2-cell1||pat2-cell2", features="d10||d10||d9")
sharedall5 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1||pat2-cell2",3)), features=c("d11||d11||d11","d12||d13||d13","d12||d14||d14"))
sharedall6 <- data.frame()
sharedall7 <- data.frame(both=c(rep("pat1-cell2||pat2-cell1",2)), features=c("d15||d16","d17||d17"))
sharedall <- rbind(sharedall1, sharedall2, sharedall3, sharedall4, sharedall5, sharedall6, sharedall7)

#I NEED A CIRCOS PLOT THAT SHOWS ALL THE CONNECTIONS. THE NON-CONNECTED (nonshared) FEATURES SHOULD NOT BE SHOWN, BUT THEY SHOULD COUNT TOWARD THE SIZE OF THE SECTOR (CORRESPONDING TO A PATIENT-CELL COMBINATION)

#THE FEATURES SHOULD BE COUNTED UNIQUELY, SO IF THERE ARE ENTRIES LIKE:
#3 pat1||pat2     cell2   c3||c3
#4 pat1||pat2     cell2   c3||c4
#5 pat1||pat2     cell2   c3||c5
#THE FEATURE c3 SHOULD BE COUNTED ONCE FOR pat1, AND EXPAND TO 3 DIFFERENT FEATURES IN pat2



# Start -------------------------------------------------------------------

library(circlize)
library(data.table)
library(magrittr)
library(stringr)
library(RColorBrewer)

# Split and pad with 0 ----------------------------------------------------
fun <- function(x) unlist(tstrsplit(x, split = '||', fixed = TRUE))

nonshared %>% setDT()
sharedcells %>% setDT()
sharedpats %>% setDT()
sharedall %>% setDT()

nonshared <- nonshared[, .(group = paste(patient, cell.type, sep = '-'), feature)][, feature := paste0('a', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedcells <- sharedcells[, lapply(.SD, fun), by = 1:nrow(sharedcells)][, .(group = paste(patient, cell.types, sep = '-'), feature = features)][, feature := paste0('b', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedpats <- sharedpats[, lapply(.SD, fun), by = 1:nrow(sharedpats)][, .(group = paste(patients, cell.type, sep = '-'), feature = features)][, feature := paste0('c', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]
sharedall <- sharedall[, lapply(.SD, fun), by = 1:nrow(sharedall)][, .(group = both, feature = features)][, feature := paste0('d', str_pad(str_extract(feature, '[0-9]+'), 2, 'left', '0'))]

dt_split <- rbindlist(
  list(
    nonshared,
    sharedcells,
    sharedpats,
    sharedall
  )
)

# Set key and self join to find shared features ---------------------------
setkey(dt_split, feature)
dt_join <- dt_split[dt_split, .(group, i.group, feature), allow.cartesian = TRUE] %>%
  .[group != i.group, ]

# Create a "sorted key" ---------------------------------------------------
# key := paste(sort(.SD)...
# To leave only unique combinations of groups and features
dt_join <-
  dt_join[,
          key := paste(sort(.SD), collapse = '|'),
          by = 1:nrow(dt_join),
          .SDcols = c('group', 'i.group')
          ] %>%
  setorder(feature, key) %>%
  unique(by = c('key', 'feature')) %>%
  .[, .(
    group_from = i.group,
    group_to = group,
    feature = feature)]

# Rename and key ----------------------------------------------------------

dt_split %>% setnames(old = 'group', new = 'group_from') %>% setkey(group_from, feature)
dt_join %>% setkey(group_from, feature)



# Individual features -----------------------------------------------------

# Features without connections --------------------------------------------

dt_singles <- dt_split[, .(group_from, group_to = group_from, feature)] %>%
  .[, N := .N, by = feature] %>%
  .[!(N > 1 & group_from == group_to), !c('N')]

# Bind all, add some columns etc. -----------------------------------------

dt_bind <- rbind(dt_singles, dt_join) %>% setorder(group_from, feature, group_to)

dt_bind[, ':='(
  group_from_f = paste(group_from, feature, sep = '.'),
  group_to_f = paste(group_to, feature, sep = '.'))]
dt_bind[, feature := NULL]  # feature can be removed

# Colour
dt_bind[, colour := ifelse(group_from_f == group_to_f, "#FFFFFF00", '#00000050')]  # Change first to #FF0000FF to show red blobs

# Prep. sectors -----------------------------------------------------------

sectors_f <- union(dt_bind[, group_from_f], dt_bind[, group_to_f]) %>% sort()

colour_lookup <-
  union(dt_bind[, group_from], dt_bind[, group_to]) %>% sort() %>%
  structure(seq_along(.) + 1, names = .)
sector_colours <- str_replace_all(sectors_f, '.[a-d][0-9]+', '') %>%
  colour_lookup[.]

# Gaps between sectors ----------------------------------------------------

gap_sizes <- c(0.0, 1.0)
gap_degree <-
  sapply(table(names(sector_colours)), function(i) c(rep(gap_sizes[1], i-1), gap_sizes[2])) %>%
  unlist() %>% unname()
# gap_degree <- rep(0, length(sectors_f))  # Or no gap



# Plot! -------------------------------------------------------------------

# Each "sector" is a separate patient/cell/feature combination

circos.par(gap.degree = gap_degree)
circos.initialize(sectors_f, xlim = c(0, 1))
circos.trackPlotRegion(ylim = c(0, 1), track.height = 0.05, bg.col = sector_colours, bg.border = NA)

for(i in 1:nrow(dt_bind)) {
  row_i <- dt_bind[i, ]
  circos.link(
    row_i[['group_from_f']], c(0, 1),
    row_i[['group_to_f']], c(0, 1),
    border = NA, col = row_i[['colour']]
  )
}

# "Feature" labels
circos.trackPlotRegion(track.index = 2, ylim = c(0, 1), panel.fun = function(x, y) {
  sector.index = get.cell.meta.data("sector.index")
  circos.text(0.5, 0.25, sector.index, col = "white", cex = 0.6, facing = "clockwise", niceFacing = TRUE)
}, bg.border = NA)

# "Patient/cell" labels
for(s in names(colour_lookup)) {
  sectors <- sectors_f %>% { .[str_detect(., s)] }
  highlight.sector(
    sector.index = sectors, track.index = 1, col = colour_lookup[s],
    text = s, text.vjust = -1, niceFacing = TRUE)
}

circos.clear()



# counts of unique and shared features ------------------------------------

xlims <- dt_split[, .N, by = group_from][, .(x_from = 0, x_to = N)] %>% as.matrix()
links <- dt_join[, .N, by = .(group_from, group_to)]
colours <- dt_split[, unique(group_from)] %>% structure(seq_along(.) + 1, names = .)

library(circlize)

sectors = names(colours)
circos.par(cell.padding = c(0, 0, 0, 0))
circos.initialize(sectors, xlim = xlims)
circos.trackPlotRegion(ylim = c(0, 1), track.height = 0.05, bg.col = colours, bg.border = NA)

for(i in 1:nrow(links)) {
  link <- links[i, ]
  circos.link(link[[1]], c(0, link[[3]]), link[[2]], c(0, link[[3]]), col = '#00000025', border = NA)
}

# "Patient/cell" labels
for(s in sectors) {
  highlight.sector(
    sector.index = s, track.index = 1, col = colours[s], 
    text = s, text.vjust = -1, niceFacing = TRUE)
}

circos.clear()
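
The "set key and self join" step is the core of the code above, so here is the same idea on a tiny hand-made table in base R, using `merge` instead of data.table. It is a sketch under my own assumptions: the toy rows and the column names `group_from` / `group_to` are illustrative, not taken from the real data.

```r
# Long format: one row per (group, feature) occurrence
long <- data.frame(
  group   = c("pat1-cell1", "pat2-cell1", "pat1-cell2", "pat2-cell2", "pat1-cell1"),
  feature = c("d01", "d01", "d02", "d02", "a01"),
  stringsAsFactors = FALSE
)

# Self join on feature: every pair of groups that carry the same feature
joined <- merge(long, long, by = "feature", suffixes = c("_from", "_to"))

# Keep each unordered pair once and drop self matches
links <- joined[joined$group_from < joined$group_to, ]

# Number of shared features per pair of groups (the link widths)
link_counts <- aggregate(feature ~ group_from + group_to, data = links, FUN = length)

print(links)
print(link_counts)
```

Features that occur in a single group (`a01` here) never survive the filter, so they add to the sector size but produce no link.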

Edit: just adding the link from a deleted comment: see this answer for a nice example with labels!

Answer 2

@m-dz pointed in the right direction. I can provide more details using the mock data.

Let's start from here:

patients <- c(rep("patient1",20), rep("patient2",10))
cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))

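as.data.frame converts dat to a three-column data frame (i.e. an adjacency list, in which each link starts at the first column and points to the second column)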

dat = as.data.frame(dat, stringsAsFactors = FALSE)

Generate colors for the patients/cells and for the features.

features = unique(dat[[2]])
features_col = structure(rand_color(length(features)), names = features)
patients_col = structure(2:5, names = unique(dat[[1]]))

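If a feature exists in only one patient/cell combination and you do not want to show it but still want to keep its place in the plot, you can set its color to #FFFFFF00 (white with full transparency, so that it does not cover other links). Here we want the link colors to be the same as those of the feature sectors.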

col = ifelse(dat[[3]], features_col[dat[[2]]], "#FFFFFF00")
col = gsub("FF$", "80", col) # half transparent
features_count = tapply(dat[[3]], dat[[2]], sum)
# set color to white if it only exists in one patient/cell
col[features_count[dat[[2]]] == 1] = "#FFFFFF00"

The final chord diagram:

chordDiagram(dat, col = col, grid.col = c(features_col, patients_col))

You can see that the feature sectors with visible links have at least two links pointing to patients/cells.
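
The color masking can be checked on its own without drawing anything. The following is a minimal sketch with a hand-made adjacency list and fixed hex colors (both are assumptions for illustration; the answer uses `rand_color()` and the real `dat`):

```r
# Toy adjacency list: f1 occurs in two groups, f2 in only one
edges <- data.frame(
  group   = c("p1|c1", "p2|c1", "p1|c1", "p2|c1"),
  feature = c("f1", "f1", "f2", "f2"),
  Freq    = c(1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Fixed 8-digit hex colors (RRGGBBAA) so the result is reproducible
feature_col <- c(f1 = "#E41A1CFF", f2 = "#377EB8FF")

# Absent links (Freq == 0) become fully transparent white
link_col <- ifelse(edges$Freq > 0, feature_col[edges$feature], "#FFFFFF00")

# Opaque alpha "FF" -> "80" (half transparent); "#FFFFFF00" is left alone
link_col <- sub("FF$", "80", link_col)

# Hide every link of a feature that occurs in only one group
n_groups <- tapply(edges$Freq, edges$feature, sum)
link_col[n_groups[edges$feature] == 1] <- "#FFFFFF00"

print(link_col)
```

Only the two `f1` links keep a visible color; `f2` still occupies its sector but draws nothing.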

Answer 3

Prepare the data

    library(circlize)
    patients <- c(rep("patient1",20), rep("patient2",10))
    cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
    features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"),     paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
    dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
    dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
    dat<-as.data.frame(dat,stringsAsFactors = FALSE)

Get all pairwise combinations of patient and cell type

    df=NULL
    for(i in levels(as.factor(dat$feature))){
        temp<-as.data.frame(matrix(combn(dat[which(dat$feature==i),1],2),byrow = TRUE,ncol=2),stringsAsFactors = FALSE)
        temp$feature=i
        temp$Freq=1
        Freq_0<-subset(dat$Var1,dat$feature==i & dat$Freq==0)
        for(j in Freq_0){
          temp$Freq[temp$V1==j | temp$V2==j]=0
        }
        df<-rbind(df,temp)
    }

Add colors

    df$color=rainbow(dim(df)[1])
    df[which(df$Freq==0),5]="white"
    df$Freq=1
    chordDiagram(df[,c(-3,-5)], transparency = 0.5,col = df$color)

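Different links represent different features; a link is colored white where 'Freq' is 0

I changed the color "white" to "black", since black is more conspicuous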
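
The pair enumeration inside the loop above relies on `combn`. As a quick sketch on four made-up group labels (the `groups` vector is an assumption for illustration), the `matrix(combn(...), byrow = TRUE, ncol = 2)` idiom gives the same result as simply transposing the `combn` output:

```r
groups <- c("patient1|cell1", "patient1|cell2", "patient2|cell1", "patient2|cell2")

# combn() returns a 2 x choose(n, 2) matrix, one pair per column
pairs_loop  <- matrix(combn(groups, 2), byrow = TRUE, ncol = 2)  # as in the loop above
pairs_plain <- t(combn(groups, 2))                               # equivalent, shorter

print(pairs_plain)
print(identical(pairs_loop, pairs_plain))
```

With 4 groups this yields 6 rows per feature, which is why the chord diagram gets one potential link per feature and pair of groups.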

If you want to leave out the 'feature' attribute... let's prepare the data first

    library(circlize)
    patients <- c(rep("patient1",20), rep("patient2",10))
    cell.types <- c(rep("cell1",12), rep("cell2",8),rep("cell1",6), rep("cell2",4))
    features <- c(paste("feature",1:12,sep="_"), paste("feature",9:16,sep="_"), paste("feature",c(1,2,9,10,17,18),sep="_"), paste("feature",c(1,18,19,20),sep="_"))
    dat <- data.frame(patient=patients, cell.type=cell.types, feature=features)
    dat <- with(dat, table(paste(patient,cell.type,sep='|'), feature))
    dat<-as.data.frame(dat,stringsAsFactors = FALSE)
    df=NULL
    for(i in levels(as.factor(dat$feature))){
      temp<-as.data.frame(matrix(combn(dat[which(dat$feature==i),1],2),byrow = TRUE,ncol=2),stringsAsFactors = FALSE)
      temp$feature=i
      temp$Freq=1
      Freq_0<-subset(dat$Var1,dat$feature==i & dat$Freq==0)
      for(j in Freq_0){
        temp$Freq[temp$V1==j | temp$V2==j]=0
      }
      df<-rbind(df,temp)
    }

Process it

    library(dplyr)
    df1<-subset(df,df$Freq==1)
    df0<-subset(df,df$Freq==0)
    df1_mod<-summarise(group_by(df1,V1,V2),Freq=n())
    df0_mod<-summarise(group_by(df0,V1,V2),Freq=n())

Add colors

    df1_mod$color<-rainbow(nrow(df1_mod)) # one color per linked pair instead of a hard-coded 5
    df0_mod$color<-"white"
    df_res<-rbind(df0_mod,df1_mod)

Plot it

    chordDiagram(df_res, transparency = 0.5,col = df_res$color)

These plots show that there are many zeros in 'Freq'.

3 answers

Answer 1: 4, accepted, 2017-03-24 10:11:27
Answer 2: 3, 2017-03-25 09:27:08
Answer 3: 1, 2017-03-26 14:25:58
