
Lines between certain points in a plot, based on the data? (with R)

I have done my research and googling but have yet to find a solution to the following problem. I have quite often found solutions to R-related issues on this forum, so I thought I'd give it a try and hope that somebody can suggest something. I need this for my PhD thesis; anybody whose code or suggestions I use will naturally be acknowledged and credited.

So: I need to draw lines/segments to connect points in a plot (of multidimensional scaling, specifically) in R (SPSS-based solutions are welcome as well). The lines should not run between all points, only between those that represent properties/variables shared by at least one data item, and their placement should be derived from the same data that the plot itself is based on. Let me exemplify; below are some fictional data with dummy variables, where '1' means that the item has the property:

       "properties"
        a   b   c
"items" ---------
tree  | 1   1   0
house | 0   1   1
hut   | 0   1   1
book  | 1   0   0

[Image: the multidimensional scaling plot of the properties, with grey connecting lines drawn manually]

The plot is a multidimensional scaling plot (distances are to be interpreted as dissimilarities). This is the logic:

  • there's a line between A and B, because there is at least one item/variable ("tree") in the data that has both properties;
  • there is a line between B and C, because there are items in the data ("house" and "hut") that have both properties;
  • there is an item ("book") that has only one property (A), so it does not affect the placement of the lines;
  • importantly, there is no line between A and C because there are no items in the data that have both properties.
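
The rule in the list above amounts to checking, for each pair of properties, whether any item has both. Here is a minimal base-R sketch of that check. It assumes the data sit in a 0/1 matrix with items as rows and properties as columns; the names tab, cooc and pairs are illustrative and not part of the original post:

```r
## Example data from the table above: rows are items, columns are properties
tab <- matrix(c(1, 1, 0,
                0, 1, 1,
                0, 1, 1,
                1, 0, 0),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("tree", "house", "hut", "book"),
                              c("a", "b", "c")))

## t(tab) %*% tab: entry [i, j] counts the items that have
## both property i and property j
cooc <- crossprod(tab)

## Keep each unordered pair once (upper triangle) and only
## those pairs that co-occur in at least one item
linked <- which(cooc > 0 & upper.tri(cooc), arr.ind = TRUE)
pairs <- data.frame(from = rownames(cooc)[linked[, "row"]],
                    to   = colnames(cooc)[linked[, "col"]],
                    stringsAsFactors = FALSE)
print(pairs)
```

For the example data this yields the pairs a-b and b-c, and no a-c pair, which matches the lines drawn by hand.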

What I am looking for is a way to add automatically/computationally the grey lines that I have, for now, drawn manually on the plot above. The automatic drawing should be based on the data as described above. With a small data set, drawing the lines manually is no problem, but it becomes a problem when there are dozens of such "properties" and hundreds of items/rows of data. Any ideas? Some R code (commented if possible) would be most welcome!

EDIT: It seems I forgot something very important. First thing, the solution proposed by @GaborCsardi below works perfectly with the example data, thanks for that! But I forgot to include that the linking of the points should also be "conservative", with as few connecting lines as possible. For example, if there is an item that has all the "properties", then it should not create lines between every single property point in the plot just because of that, if the points are connected by other items already, even if indirectly. So a plot based on the following data should not be a full triangle, even though item1 has all three properties:

      A B C
item1 1 1 1
item2 1 1 0
item3 0 1 1

Instead, A and B, and B and C, should each be connected by a line, but a line between A and C would be excessive, as they are already indirectly connected (through B). Could this be done with incidence graphs?

This is very easy if you use graphs and create the projection of the bipartite graph that you have in your table. E.g.:

library(igraph)

## Some example data
mat <- "       properties
        items  a   b   c
        tree   1   1   0
        house  0   1   1
        hut    0   1   1
        book   1   0   0
       "
tab <- read.table(textConnection(mat), skip=1,
                  header=TRUE, row.names=1)

## Create a bipartite graph
graph <- graph.incidence(as.matrix(tab))

## Project the bipartite graph
proj <- bipartite.projection(graph)

## Plot one of the projections, the one you need 
## happens to be the second one
plot(proj$proj2)

## Minimum spanning tree of the projection
plot(minimum.spanning.tree(proj$proj2))

For more information, see the manual pages, i.e. ?"igraph-package", ?graph.incidence, ?bipartite.projection and ?plot.igraph.
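
To put the "conservative" links on the MDS plot itself rather than in a separate graph plot, the edges can be drawn with segments() at the MDS coordinates. A spanning tree of a fully connected triangle is not unique, and which edges it keeps depends on the edge weights. The base-R sketch below therefore makes two explicit assumptions that are not in the original post: links shared by more items are preferred (a Kruskal-style maximum spanning forest), and Euclidean distances between the property columns stand in for whatever dissimilarity the real MDS uses. All variable names are illustrative.

```r
## Second example from the question: item1 has all three properties
tab <- matrix(c(1, 1, 1,
                1, 1, 0,
                0, 1, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("item1", "item2", "item3"),
                              c("A", "B", "C")))

## Number of items shared by each pair of properties
cooc <- crossprod(tab)

## Candidate edges: every unordered pair shared by at least one item
idx <- which(cooc > 0 & upper.tri(cooc), arr.ind = TRUE)
edges <- data.frame(from   = unname(idx[, "row"]),
                    to     = unname(idx[, "col"]),
                    weight = cooc[idx])

## ASSUMPTION: consider the most strongly supported links first
edges <- edges[order(-edges$weight), ]

## Kruskal-style greedy pass: keep an edge only if it joins two
## properties that are not yet connected, even indirectly
comp <- seq_len(ncol(tab))          # component label of each property
keep <- logical(nrow(edges))
for (k in seq_len(nrow(edges))) {
  a <- comp[edges$from[k]]
  b <- comp[edges$to[k]]
  if (a != b) {
    keep[k] <- TRUE
    comp[comp == b] <- a            # merge the two components
  }
}
tree <- edges[keep, ]

## ASSUMPTION: Euclidean distance between property columns;
## substitute the dissimilarity used for the actual MDS
xy <- cmdscale(dist(t(tab)), k = 2)

## Draw the MDS points and only the retained links
plot(xy, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
segments(xy[tree$from, 1], xy[tree$from, 2],
         xy[tree$to, 1],   xy[tree$to, 2], col = "grey")
text(xy, labels = rownames(xy))
```

With these data the retained links are A-B and B-C; the A-C link is dropped because A and C are already connected through B. If the properties fall into several unconnected groups, the same loop produces one tree per group.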
