R make circle/chord diagram with circlize from dataframe
I would like to make a chord diagram using the circlize package. I have a data frame of cars with four columns. The first two columns contain the brand and model the respondent owned, and the next two columns contain the brand and model the respondent migrated to.
Here is a simple example of the data frame:
Brand_from model_from Brand_to Model_to
1: VOLVO s80 BMW 5series
2: BMW 3series BMW 3series
3: VOLVO s60 VOLVO s60
4: VOLVO s60 VOLVO s80
5: BMW 3series AUDI s4
6: AUDI a4 BMW 3series
7: AUDI a5 AUDI a5
It would be great to be able to make this into a chord diagram. I found an example in the help that worked, but I'm not able to convert my data into the right format to make the plot. This code is from the help in the circlize package. It produces one layer; I guess I need two, brand and model.
library(circlize)
mat = matrix(1:18, 3, 6)
rownames(mat) = paste0("S", 1:3)
colnames(mat) = paste0("E", 1:6)
rn = rownames(mat)
cn = colnames(mat)
factors = c(rn, cn)
factors = factor(factors, levels = factors)
col_sum = apply(mat, 2, sum)
row_sum = apply(mat, 1, sum)
xlim = cbind(rep(0, length(factors)), c(row_sum, col_sum))
par(mar = c(1, 1, 1, 1))
circos.par(cell.padding = c(0, 0, 0, 0))
circos.initialize(factors = factors, xlim = xlim)
circos.trackPlotRegion(factors = factors, ylim = c(0, 1), bg.border = NA,
    bg.col = c("red", "green", "blue", rep("grey", 6)), track.height = 0.05,
    panel.fun = function(x, y) {
        sector.name = get.cell.meta.data("sector.index")
        xlim = get.cell.meta.data("xlim")
        circos.text(mean(xlim), 1.5, sector.name, adj = c(0.5, 0))
    })
col = c("#FF000020", "#00FF0020", "#0000FF20")
for(i in seq_len(nrow(mat))) {
    for(j in seq_len(ncol(mat))) {
        circos.link(rn[i], c(sum(mat[i, seq_len(j-1)]), sum(mat[i, seq_len(j)])),
            cn[j], c(sum(mat[seq_len(i-1), j]), sum(mat[seq_len(i), j])),
            col = col[i], border = "white")
    }
}
circos.clear()
This code produces the following plot:
The ideal result would be like this example, but instead of continents I would like car brands, and on the inner circle the car models belonging to each brand.
Since I have updated the package a little, there is now a simpler way to do it. I will give another answer here in case someone is interested in it.
In recent versions of circlize, chordDiagram() accepts both an adjacency matrix and an adjacency list as input, which means you can now pass the function a data frame that contains pairwise relations. There is also a highlight.sector() function, which can highlight or mark more than one sector at the same time.
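As a rough illustration of the adjacency-list form mentioned above, here is a minimal base R sketch that collapses the model columns of the question's data into one row per from/to pair with a count. The names `value` and `adj_list` are only illustrative, and the data frame is retyped by hand from the question.

```r
# Model columns retyped from the question's example data.
df <- data.frame(
  model_from = c("s80", "3series", "s60", "s60", "3series", "a4", "a5"),
  model_to   = c("5series", "3series", "s60", "s80", "s4", "3series", "a5"),
  stringsAsFactors = FALSE
)

# Give every respondent a weight of 1, then sum the weights per from/to pair.
df$value <- 1
adj_list <- aggregate(value ~ model_from + model_to, data = df, FUN = sum)

# Three columns: from, to, and how many respondents made that move.
print(adj_list)
```

Every move in this small example is distinct, so each count is 1; with real survey data, repeated moves would show up as larger values in the third column.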
I will implement the plot I showed before, but with shorter code:
df = read.table(textConnection("
brand_from model_from brand_to model_to
VOLVO s80 BMW 5series
BMW 3series BMW 3series
VOLVO s60 VOLVO s60
VOLVO s60 VOLVO s80
BMW 3series AUDI s4
AUDI a4 BMW 3series
AUDI a5 AUDI a5
"), header = TRUE, stringsAsFactors = FALSE)
brand = c(structure(df$brand_from, names=df$model_from),
structure(df$brand_to,names= df$model_to))
brand = brand[!duplicated(names(brand))]
brand = brand[order(brand, names(brand))]
brand_color = structure(2:4, names = unique(brand))
model_color = structure(2:8, names = names(brand))
The values of brand, brand_color and model_color are:
> brand
a4 a5 s4 3series 5series s60 s80
"AUDI" "AUDI" "AUDI" "BMW" "BMW" "VOLVO" "VOLVO"
> brand_color
AUDI BMW VOLVO
2 3 4
> model_color
a4 a5 s4 3series 5series s60 s80
2 3 4 5 6 7 8
This time, we add only one additional track, which holds the lines and the brand names. Note also that the input is now a data frame (df[, c(2, 4)]).
library(circlize)
gap.degree = do.call("c", lapply(table(brand), function(i) c(rep(2, i-1), 8)))
circos.par(gap.degree = gap.degree)
chordDiagram(df[, c(2, 4)], order = names(brand), grid.col = model_color,
    directional = 1, annotationTrack = "grid", preAllocateTracks = list(
        list(track.height = 0.02))
)
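To see what the gap.degree expression above evaluates to, here is a small worked example in base R. The brand vector is retyped from the printed output earlier in this answer; nothing here needs circlize.

```r
# Named vector: model -> brand, as printed above.
brand <- c(a4 = "AUDI", a5 = "AUDI", s4 = "AUDI",
           `3series` = "BMW", `5series` = "BMW",
           s60 = "VOLVO", s80 = "VOLVO")

# For a brand with i models: i - 1 small gaps (2 degrees) between its models,
# followed by one large gap (8 degrees) after its last model.
gap.degree <- do.call("c", lapply(table(brand), function(i) c(rep(2, i - 1), 8)))

print(unname(gap.degree))
```

There is one gap per sector, so the vector has seven entries, and the larger value falls after the last model of each brand. This relies on the sectors being drawn in the same brand-sorted order as names(brand).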
As before, the model names are added manually:
circos.trackPlotRegion(track.index = 2, panel.fun = function(x, y) {
    xlim = get.cell.meta.data("xlim")
    ylim = get.cell.meta.data("ylim")
    sector.index = get.cell.meta.data("sector.index")
    circos.text(mean(xlim), mean(ylim), sector.index, col = "white", cex = 0.6, facing = "inside", niceFacing = TRUE)
}, bg.border = NA)
Finally, we add the lines and the brand names with the highlight.sector() function. Here the value of sector.index can be a vector of length greater than 1, and the line (or thin rectangle) will cover all the specified sectors. A label is added in the middle of the sectors, and its radial position is controlled by the text.vjust option.
for(b in unique(brand)) {
    model = names(brand[brand == b])
    highlight.sector(sector.index = model, track.index = 1, col = brand_color[b],
        text = b, text.vjust = -1, niceFacing = TRUE)
}
circos.clear()
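For reference, the vector passed as sector.index on each pass of the loop above can be inspected without drawing anything. This is only a sketch in base R; `models_by_brand` is an illustrative name, and the brand vector is retyped from the output shown earlier.

```r
# Named vector: model -> brand, as printed earlier in this answer.
brand <- c(a4 = "AUDI", a5 = "AUDI", s4 = "AUDI",
           `3series` = "BMW", `5series` = "BMW",
           s60 = "VOLVO", s80 = "VOLVO")

# Group the model names by brand; each list element is what the loop
# computes as names(brand[brand == b]) for one brand b.
models_by_brand <- split(names(brand), brand)
str(models_by_brand)
```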
The key here is to convert your data into a matrix (an adjacency matrix in which rows correspond to 'from' and columns correspond to 'to').
df = read.table(textConnection("
Brand_from model_from Brand_to Model_to
VOLVO s80 BMW 5series
BMW 3series BMW 3series
VOLVO s60 VOLVO s60
VOLVO s60 VOLVO s80
BMW 3series AUDI s4
AUDI a4 BMW 3series
AUDI a5 AUDI a5
"), header = TRUE, stringsAsFactors = FALSE)
from = paste(df[[1]], df[[2]], sep = ",")
to = paste(df[[3]], df[[4]], sep = ",")
mat = matrix(0, nrow = length(unique(from)), ncol = length(unique(to)))
rownames(mat) = unique(from)
colnames(mat) = unique(to)
for(i in seq_along(from)) mat[from[i], to[i]] = 1
The value of mat is:
> mat
BMW,5series BMW,3series VOLVO,s60 VOLVO,s80 AUDI,s4 AUDI,a5
VOLVO,s80 1 0 0 0 0 0
BMW,3series 0 1 0 0 1 0
VOLVO,s60 0 0 1 1 0 0
AUDI,a4 0 1 0 0 0 0
AUDI,a5 0 0 0 0 0 1
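The loop above marks each from/to pair with 1. If you would rather have the matrix hold counts (so that repeated moves give wider links), one possible alternative is table(). This is a sketch, not part of the original approach, and `mat2` is an illustrative name.

```r
# The question's data, read from an inline string.
df <- read.table(text = "
Brand_from model_from Brand_to Model_to
VOLVO s80 BMW 5series
BMW 3series BMW 3series
VOLVO s60 VOLVO s60
VOLVO s60 VOLVO s80
BMW 3series AUDI s4
AUDI a4 BMW 3series
AUDI a5 AUDI a5
", header = TRUE, stringsAsFactors = FALSE)

from <- paste(df[[1]], df[[2]], sep = ",")
to   <- paste(df[[3]], df[[4]], sep = ",")

# Cross-tabulate from x to; unclass() drops the "table" class and leaves
# a plain integer matrix with row and column names.
mat2 <- unclass(table(from, to))
print(mat2)
```

Note that table() orders rows and columns alphabetically rather than by first appearance, which does not matter once order is given explicitly to chordDiagram.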
Then pass the matrix to chordDiagram, specifying order and directional. order is specified manually to make sure that sectors of the same brand are grouped together.
library(circlize)
par(mar = c(1, 1, 1, 1))
chordDiagram(mat, order = sort(union(from, to)), directional = TRUE)
circos.clear()
To make the figure more complex, you can create a track for brand names, a track for brand identification lines, and a track for model names. We can also make the gaps between brands larger than the gaps within each brand.
1. Set gap.degree:
circos.par(gap.degree = c(2, 2, 8, 2, 8, 2, 8))
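The hard-coded vector above has one entry per sector: 2 degrees after a sector that is followed by the same brand, and 8 degrees after the last sector of a brand. A sketch of how it could be derived from the sorted sector names instead of typed by hand; `sectors`, `sector_brand`, `is_last` and `gap_degree` are illustrative names.

```r
# Sector names in the order used for the plot, i.e. sort(union(from, to)).
sectors <- c("AUDI,a4", "AUDI,a5", "AUDI,s4",
             "BMW,3series", "BMW,5series",
             "VOLVO,s60", "VOLVO,s80")

# The brand is the part before the comma.
sector_brand <- sub(",.*$", "", sectors)

# A sector is the last of its brand if the next sector has a different brand;
# the final sector is always the last of its brand.
is_last <- c(sector_brand[-1] != sector_brand[-length(sector_brand)], TRUE)

gap_degree <- ifelse(is_last, 8, 2)
print(gap_degree)
```

The result could then be passed as circos.par(gap.degree = gap_degree) in place of the literal vector.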
2. Before drawing the chord diagram, we create two empty tracks with the preAllocateTracks argument, one for brand names and one for identification lines.
par(mar = c(1, 1, 1, 1))
chordDiagram(mat, order = sort(union(from, to)),
    directional = TRUE, annotationTrack = "grid", preAllocateTracks = list(
        list(track.height = 0.02),
        list(track.height = 0.02))
)
3. Add the model names to the annotation track (this track is created by default; it is the thicker track in both the left and right figures. Note that it is the third track counting from the outside of the circle inward).
circos.trackPlotRegion(track.index = 3, panel.fun = function(x, y) {
    xlim = get.cell.meta.data("xlim")
    ylim = get.cell.meta.data("ylim")
    sector.index = get.cell.meta.data("sector.index")
    model = strsplit(sector.index, ",")[[1]][2]
    circos.text(mean(xlim), mean(ylim), model, col = "white", cex = 0.8, facing = "inside", niceFacing = TRUE)
}, bg.border = NA)
4. Add the brand identification lines. Because a brand covers more than one sector, we need to manually calculate the start and end degrees of the line (arc). In the following, rou1 and rou2 are the heights of the two borders of the second track. The identification lines are drawn in the second track.
all_sectors = get.all.sector.index()
rou1 = get.cell.meta.data("yplot", sector.index = all_sectors[1], track.index = 2)[1]
rou2 = get.cell.meta.data("yplot", sector.index = all_sectors[1], track.index = 2)[2]
start.degree = get.cell.meta.data("xplot", sector.index = all_sectors[1], track.index = 2)[1]
end.degree = get.cell.meta.data("xplot", sector.index = all_sectors[3], track.index = 2)[2]
draw.sector(start.degree, end.degree, rou1, rou2, clock.wise = TRUE, col = "red", border = NA)
5. First get the coordinates of the text in the polar coordinate system, then map them to the data coordinate system with reverse.circlize. Note that the cell you map the coordinates back to and the cell you draw the text in must be the same cell.
m = reverse.circlize( (start.degree + end.degree)/2, 1, sector.index = all_sectors[1], track.index = 1)
circos.text(m[1, 1], m[1, 2], "AUDI", cex = 1.2, facing = "inside", adj = c(0.5, 0), niceFacing = TRUE,
sector.index = all_sectors[1], track.index = 1)
The other two brands use the same code.
start.degree = get.cell.meta.data("xplot", sector.index = all_sectors[4], track.index = 2)[1]
end.degree = get.cell.meta.data("xplot", sector.index = all_sectors[5], track.index = 2)[2]
draw.sector(start.degree, end.degree, rou1, rou2, clock.wise = TRUE, col = "green", border = NA)
m = reverse.circlize( (start.degree + end.degree)/2, 1, sector.index = all_sectors[1], track.index = 1)
circos.text(m[1, 1], m[1, 2], "BMW", cex = 1.2, facing = "inside", adj = c(0.5, 0), niceFacing = TRUE,
sector.index = all_sectors[1], track.index = 1)
start.degree = get.cell.meta.data("xplot", sector.index = all_sectors[6], track.index = 2)[1]
end.degree = get.cell.meta.data("xplot", sector.index = all_sectors[7], track.index = 2)[2]
draw.sector(start.degree, end.degree, rou1, rou2, clock.wise = TRUE, col = "blue", border = NA)
m = reverse.circlize( (start.degree + end.degree)/2, 1, sector.index = all_sectors[1], track.index = 1)
circos.text(m[1, 1], m[1, 2], "VOLVO", cex = 1.2, facing = "inside", adj = c(0.5, 0), niceFacing = TRUE,
sector.index = all_sectors[1], track.index = 1)
circos.clear()
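The sector positions used above (1 to 3 for AUDI, 4 to 5 for BMW, 6 to 7 for VOLVO) were typed by hand. As a sketch, they could also be computed from the sector names, which would let the three repeated blocks be driven by a loop. The `all_sectors` vector below is an assumed copy of what get.all.sector.index() returns for this plot; `ranges` is an illustrative name.

```r
# Assumed value of get.all.sector.index() for this plot.
all_sectors <- c("AUDI,a4", "AUDI,a5", "AUDI,s4",
                 "BMW,3series", "BMW,5series",
                 "VOLVO,s60", "VOLVO,s80")

# The brand is the part before the comma.
sector_brand <- sub(",.*$", "", all_sectors)

# Positions of the sectors belonging to each brand, then first/last per brand.
idx <- split(seq_along(all_sectors), sector_brand)
ranges <- t(sapply(idx, range))
colnames(ranges) <- c("first", "last")
print(ranges)
```

For a brand b, all_sectors[ranges[b, "first"]] and all_sectors[ranges[b, "last"]] are then the sectors whose "xplot" values give start.degree and end.degree.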
If you want to set colors, please see the package vignette. If you want, you can also use circos.axis to add axes to the plot.
Read in your data using read.table, resulting in a 7x4 data.frame (brand.txt should be tab-separated).
mt <- read.table("//your-path/brand.txt", header = TRUE, sep = "\t", na.strings = "NA")
Your variable names, names(mt), are: "Brand_from", "model_from", "Brand_to" and "Model_to". Select your two variables of interest, for example:
mat <- table(mt$Brand_from, mt$model_from)
This results in the following table:
# > mat
#         3series a4 a5 s60 s80
# AUDI          0  1  1   0   0
# BMW           2  0  0   0   0
# VOLVO         0  0  0   2   1
Then you can run everything from "rn = rownames(mat)" onward, exactly as in the circlize script you provided.
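If the two variables of interest are the brand owned and the brand migrated to, the same table() call gives a brand-to-brand flow matrix. A minimal sketch with the question's data typed inline rather than read from brand.txt; `brand_flow` is an illustrative name.

```r
# The question's data, read from an inline string instead of a file.
mt <- read.table(text = "
Brand_from model_from Brand_to Model_to
VOLVO s80 BMW 5series
BMW 3series BMW 3series
VOLVO s60 VOLVO s60
VOLVO s60 VOLVO s80
BMW 3series AUDI s4
AUDI a4 BMW 3series
AUDI a5 AUDI a5
", header = TRUE, stringsAsFactors = FALSE)

# Rows: brand owned; columns: brand migrated to; cells: number of respondents.
brand_flow <- table(mt$Brand_from, mt$Brand_to)
print(brand_flow)
```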