R make circle/chord diagram with circlize from dataframe

I would like to make a chord diagram using the circlize package. I have a data frame of cars with four columns. The first two columns contain the brand and model the respondent owned, and the last two contain the brand and model the respondent migrated to.

Here is a simple example of the data frame:

   Brand_from model_from Brand_to Model_to
1:      VOLVO        s80      BMW  5series
2:        BMW    3series      BMW  3series
3:      VOLVO        s60    VOLVO      s60
4:      VOLVO        s60    VOLVO      s80
5:        BMW    3series     AUDI       s4
6:       AUDI         a4      BMW  3series
7:       AUDI         a5     AUDI       a5

It would be great to turn this into a chord diagram. I found a working example in the help, but I have not been able to convert my data into the right format for the plot. The code below is from the help of the circlize package. It produces one layer; I guess I need two, brand and model.

library(circlize)

mat = matrix(1:18, 3, 6)
rownames(mat) = paste0("S", 1:3)
colnames(mat) = paste0("E", 1:6)

rn = rownames(mat)
cn = colnames(mat)
factors = c(rn, cn)
factors = factor(factors, levels = factors)
col_sum = apply(mat, 2, sum)
row_sum = apply(mat, 1, sum)
xlim = cbind(rep(0, length(factors)), c(row_sum, col_sum))

par(mar = c(1, 1, 1, 1))
circos.par(cell.padding = c(0, 0, 0, 0))
circos.initialize(factors = factors, xlim = xlim)
circos.trackPlotRegion(factors = factors, ylim = c(0, 1), bg.border = NA,
                       bg.col = c("red", "green", "blue", rep("grey", 6)), track.height = 0.05,
                       panel.fun = function(x, y) {
                         sector.name = get.cell.meta.data("sector.index")
                         xlim = get.cell.meta.data("xlim")
                         circos.text(mean(xlim), 1.5, sector.name, adj = c(0.5, 0))
})

col = c("#FF000020", "#00FF0020", "#0000FF20")
for(i in seq_len(nrow(mat))) {
  for(j in seq_len(ncol(mat))) {
    circos.link(rn[i], c(sum(mat[i, seq_len(j-1)]), sum(mat[i, seq_len(j)])),
                cn[j], c(sum(mat[seq_len(i-1), j]), sum(mat[seq_len(i), j])),
                col = col[i], border = "white")
  }
}
circos.clear()

This code produces the following plot:

[image]

The ideal result would look like this example, but with car brands instead of continents and, on the inner circle, the car models belonging to each brand:

[image]

Since I have updated the package a little, there is now a simpler way to do this. I am giving another answer here in case someone is interested.

In recent versions of circlize, chordDiagram() accepts both an adjacency matrix and an adjacency list as input, which means you can now pass the function a data frame containing pairwise relations. There is also a highlight.sector() function, which can highlight or mark more than one sector at the same time.
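A minimal sketch of the adjacency-list form, assuming the model columns from the question: each row is one from/to pair, and an optional third numeric column gives the link width. The column name n and the use of aggregate() to count repeated pairs are choices made for illustration, not part of the original answer. The plotting call is guarded so the block also runs without circlize installed.

```r
# Model-level migrations from the question (one row per respondent).
df <- data.frame(
  model_from = c("s80", "3series", "s60", "s60", "3series", "a4", "a5"),
  model_to   = c("5series", "3series", "s60", "s80", "s4", "3series", "a5"),
  stringsAsFactors = FALSE
)

# Count how often each from/to pair occurs; `n` is a hypothetical column name.
df$n <- 1
links <- aggregate(n ~ model_from + model_to, data = df, FUN = sum)

# chordDiagram() reads the first two columns as from/to and the third as value.
if (requireNamespace("circlize", quietly = TRUE)) {
  circlize::chordDiagram(links, directional = 1)
  circlize::circos.clear()
}
```

In the example data every pair occurs exactly once, so all counts are 1; with real survey data the counts would set the link widths.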

I will reproduce the plot I showed before, but with shorter code:

df = read.table(textConnection("
 brand_from model_from brand_to model_to
      VOLVO        s80      BMW  5series
        BMW    3series      BMW  3series
      VOLVO        s60    VOLVO      s60
      VOLVO        s60    VOLVO      s80
        BMW    3series     AUDI       s4
       AUDI         a4      BMW  3series
       AUDI         a5     AUDI       a5
"), header = TRUE, stringsAsFactors = FALSE)

brand = c(structure(df$brand_from, names=df$model_from),
          structure(df$brand_to,names= df$model_to))
brand = brand[!duplicated(names(brand))]
brand = brand[order(brand, names(brand))]
brand_color = structure(2:4, names = unique(brand))
model_color = structure(2:8, names = names(brand))

The values of brand, brand_color and model_color are:

> brand
     a4      a5      s4 3series 5series     s60     s80
 "AUDI"  "AUDI"  "AUDI"   "BMW"   "BMW" "VOLVO" "VOLVO"
> brand_color
 AUDI   BMW VOLVO
    2     3     4
> model_color
     a4      a5      s4 3series 5series     s60     s80
      2       3       4       5       6       7       8

This time we add only one additional track, which holds the lines and brand names. Note also that the input is actually a data frame (df[, c(2, 4)]).

library(circlize)
gap.degree = do.call("c", lapply(table(brand), function(i) c(rep(2, i-1), 8)))
circos.par(gap.degree = gap.degree)

chordDiagram(df[, c(2, 4)], order = names(brand), grid.col = model_color,
    directional = 1, annotationTrack = "grid", preAllocateTracks = list(
        list(track.height = 0.02))
)

As before, the model names are added manually:

circos.trackPlotRegion(track.index = 2, panel.fun = function(x, y) {
    xlim = get.cell.meta.data("xlim")
    ylim = get.cell.meta.data("ylim")
    sector.index = get.cell.meta.data("sector.index")
    circos.text(mean(xlim), mean(ylim), sector.index, col = "white", cex = 0.6, facing = "inside", niceFacing = TRUE)
}, bg.border = NA)

Finally, we add the lines and the brand names with the highlight.sector() function. Here sector.index can be a vector of length greater than 1, and the line (a thin rectangle) will cover all the specified sectors. A label is added in the middle of the sectors, and its radial position is controlled by the text.vjust option.

for(b in unique(brand)) {
  model = names(brand[brand == b])
  highlight.sector(sector.index = model, track.index = 1, col = brand_color[b], 
    text = b, text.vjust = -1, niceFacing = TRUE)
}

circos.clear()

[image]

The key here is to convert your data into a matrix (an adjacency matrix in which rows correspond to 'from' and columns correspond to 'to').

df = read.table(textConnection("
 Brand_from model_from Brand_to Model_to
      VOLVO        s80      BMW  5series
        BMW    3series      BMW  3series
      VOLVO        s60    VOLVO      s60
      VOLVO        s60    VOLVO      s80
        BMW    3series     AUDI       s4
       AUDI         a4      BMW  3series
       AUDI         a5     AUDI       a5
"), header = TRUE, stringsAsFactors = FALSE)

from = paste(df[[1]], df[[2]], sep = ",")
to = paste(df[[3]], df[[4]], sep = ",")

mat = matrix(0, nrow = length(unique(from)), ncol = length(unique(to)))
rownames(mat) = unique(from)
colnames(mat) = unique(to)
for(i in seq_along(from)) mat[from[i], to[i]] = 1

The value of mat is:

> mat
            BMW,5series BMW,3series VOLVO,s60 VOLVO,s80 AUDI,s4 AUDI,a5
VOLVO,s80             1           0         0         0       0       0
BMW,3series           0           1         0         0       1       0
VOLVO,s60             0           0         1         1       0       0
AUDI,a4               0           1         0         0       0       0
AUDI,a5               0           0         0         0       0       1
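As a possible alternative to the loop, the same kind of adjacency matrix can be sketched with base R's table(), which cross-tabulates the from/to labels. This is an assumption about what may be convenient, not the answer's own method; the name mat2 is hypothetical. It differs from the loop in two ways: rows and columns come out sorted, and a pair that occurs several times is counted rather than set to 1.

```r
# "BRAND,model" labels, written out from the example data frame.
from <- c("VOLVO,s80", "BMW,3series", "VOLVO,s60", "VOLVO,s60",
          "BMW,3series", "AUDI,a4", "AUDI,a5")
to   <- c("BMW,5series", "BMW,3series", "VOLVO,s60", "VOLVO,s80",
          "AUDI,s4", "BMW,3series", "AUDI,a5")

# Cross-tabulate the pairs; each cell counts the rows sharing that pair.
tab <- table(from, to)

# Convert to a plain matrix, keeping the row and column names.
mat2 <- matrix(as.vector(tab), nrow = nrow(tab),
               dimnames = list(rownames(tab), colnames(tab)))
```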

Then pass the matrix to chordDiagram, specifying order and directional. The order is specified manually to make sure that models of the same brand are grouped together.

par(mar = c(1, 1, 1, 1))
chordDiagram(mat, order = sort(union(from, to)), directional = TRUE)
circos.clear()

To make the figure more elaborate, you can create a track for brand names, a track for brand identification lines, and a track for model names. We can also make the gap between brands larger than the gap within each brand.

1. Set gap.degree:

circos.par(gap.degree = c(2, 2, 8, 2, 8, 2, 8))
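The hard-coded vector above can also be derived from the sector names. The sketch below assumes sectors are named "BRAND,model" and sorted so that each brand is contiguous, and uses the same 2-degree and 8-degree gaps. The variable names sectors, brand_of, sizes and gap_degree are hypothetical.

```r
# Sector names in plotting order (sorted, so brands are contiguous).
sectors <- sort(c("VOLVO,s80", "BMW,3series", "VOLVO,s60", "AUDI,a4",
                  "AUDI,a5", "BMW,5series", "AUDI,s4"))

# Strip the model part to get the brand of each sector.
brand_of <- sub(",.*$", "", sectors)

# Number of consecutive sectors per brand.
sizes <- rle(brand_of)$lengths

# Small gap after each sector within a brand, large gap after the last one.
gap_degree <- unlist(lapply(sizes, function(n) c(rep(2, n - 1), 8)))
```

The result could then be passed as circos.par(gap.degree = gap_degree) in place of the literal vector.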

2. Before drawing the chord diagram, create two empty tracks with the preAllocateTracks argument, one for brand names and one for identification lines.

par(mar = c(1, 1, 1, 1))
chordDiagram(mat, order = sort(union(from, to)),
    directional = TRUE, annotationTrack = "grid", preAllocateTracks = list(
        list(track.height = 0.02),
        list(track.height = 0.02))
)

3. Add the model names to the annotation track (this track is created by default; it is the thicker track in both the left and right figures). Note that it is the third track counting from the outside of the circle inward.

circos.trackPlotRegion(track.index = 3, panel.fun = function(x, y) {
    xlim = get.cell.meta.data("xlim")
    ylim = get.cell.meta.data("ylim")
    sector.index = get.cell.meta.data("sector.index")
    model = strsplit(sector.index, ",")[[1]][2]
    circos.text(mean(xlim), mean(ylim), model, col = "white", cex = 0.8, facing = "inside", niceFacing = TRUE)
}, bg.border = NA)

4. Add the brand identification line. Because a brand covers more than one sector, we need to calculate the start and end degrees of the line (arc) manually. Below, rou1 and rou2 are the heights of the two borders of the second track. The identification lines are drawn in the second track.

all_sectors = get.all.sector.index()
rou1 = get.cell.meta.data("yplot", sector.index = all_sectors[1], track.index = 2)[1]
rou2 = get.cell.meta.data("yplot", sector.index = all_sectors[1], track.index = 2)[2]

start.degree = get.cell.meta.data("xplot", sector.index = all_sectors[1], track.index = 2)[1]
end.degree = get.cell.meta.data("xplot", sector.index = all_sectors[3], track.index = 2)[2]
draw.sector(start.degree, end.degree, rou1, rou2, clock.wise = TRUE, col = "red", border = NA)

5. First get the coordinate of the text in the polar coordinate system, then map it to the data coordinate system with reverse.circlize. Note that the cell you map the coordinate back to and the cell you draw the text in must be the same cell.

m = reverse.circlize( (start.degree + end.degree)/2, 1, sector.index = all_sectors[1], track.index = 1)
circos.text(m[1, 1], m[1, 2], "AUDI", cex = 1.2, facing = "inside", adj = c(0.5, 0), niceFacing = TRUE, 
    sector.index = all_sectors[1], track.index = 1)

The other two brands use the same code:

start.degree = get.cell.meta.data("xplot", sector.index = all_sectors[4], track.index = 2)[1]
end.degree   = get.cell.meta.data("xplot", sector.index = all_sectors[5], track.index = 2)[2]
draw.sector(start.degree, end.degree, rou1, rou2, clock.wise = TRUE, col = "green", border = NA)
m = reverse.circlize( (start.degree + end.degree)/2, 1, sector.index = all_sectors[1], track.index = 1)
circos.text(m[1, 1], m[1, 2], "BMW", cex = 1.2, facing = "inside", adj = c(0.5, 0), niceFacing = TRUE, 
    sector.index = all_sectors[1], track.index = 1)

start.degree = get.cell.meta.data("xplot", sector.index = all_sectors[6], track.index = 2)[1]
end.degree  = get.cell.meta.data("xplot", sector.index = all_sectors[7], track.index = 2)[2]
draw.sector(start.degree, end.degree, rou1, rou2, clock.wise = TRUE, col = "blue", border = NA)
m = reverse.circlize( (start.degree + end.degree)/2, 1, sector.index = all_sectors[1], track.index = 1)
circos.text(m[1, 1], m[1, 2], "VOLVO", cex = 1.2, facing = "inside", adj = c(0.5, 0), niceFacing = TRUE, 
    sector.index = all_sectors[1], track.index = 1)

circos.clear()
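The three near-identical blocks above differ only in the first and last sector of each brand and in the color. A small base-R sketch, assuming sector names of the form "BRAND,model" with each brand contiguous, can compute those endpoints so the drawing code could run in a loop. The helper name brand_ranges is hypothetical and is not part of circlize.

```r
# Hypothetical helper: for sectors named "BRAND,model" and ordered so that
# each brand is contiguous, return the first and last sector of each brand.
brand_ranges <- function(sectors) {
  brand <- sub(",.*$", "", sectors)
  data.frame(
    brand = unique(brand),
    first = sectors[!duplicated(brand)],
    last  = sectors[!duplicated(brand, fromLast = TRUE)],
    stringsAsFactors = FALSE
  )
}

sectors <- c("AUDI,a4", "AUDI,a5", "AUDI,s4", "BMW,3series", "BMW,5series",
             "VOLVO,s60", "VOLVO,s80")
ranges <- brand_ranges(sectors)
```

In a loop over the rows of ranges, first and last could take the place of all_sectors[1] and all_sectors[3] in the get.cell.meta.data("xplot", ...) calls, with brand as the label text.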

If you want to set colors, please see the package vignette. If you want, you can also use circos.axis to add axes to the plot.

[image]

Read in your data using read.table, which results in a 7x4 data.frame (brand.txt should be tab-separated).

mt <- read.table("//your-path/brand.txt", header = TRUE, sep = "\t", na.strings = "NA")

Your variable names, names(mt), are: "Brand_from", "model_from", "Brand_to" and "Model_to". Select your two variables of interest, for example:

mat <- table(mt$Brand_from, mt$model_from)

This results in the following table:

# > mat
#         3series a4 a5 s60 s80
#   AUDI        0  1  1   0   0
#   BMW         2  0  0   0   0
#   VOLVO       0  0  0   2   1

Then you can run everything from "rn = rownames(mat)" onward exactly as in the circlize script you provided.

[image]
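Another possible choice of the two variables, sketched here as an assumption about what the plot is meant to show, is the brand owned against the brand migrated to. That gives a square from/to table at the brand level. The data frame is written out inline so the block runs without brand.txt; the name brand_flow is hypothetical.

```r
# Brand columns from the example data, written out inline.
mt <- data.frame(
  Brand_from = c("VOLVO", "BMW", "VOLVO", "VOLVO", "BMW", "AUDI", "AUDI"),
  Brand_to   = c("BMW", "BMW", "VOLVO", "VOLVO", "AUDI", "BMW", "AUDI"),
  stringsAsFactors = FALSE
)

# Rows are the brand owned, columns the brand migrated to.
brand_flow <- table(mt$Brand_from, mt$Brand_to)
```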
