
Can canonical SMILES strings be constructed for arbitrary labelled graphs?

I have been reading a few papers on canonical SMILES strings recently, and it seems to me that the canonicalization procedure usually relies on adding extra chemical information to the SMILES string representation. Now I am wondering:

Is there an algorithm that can compute some type of canonical SMILES string for a general vertex- and edge-labelled graph? That is, is there an algorithm that provably maps isomorphic labelled graphs to the same canonical SMILES string?

Thank you in advance.

There is a difference between the 'representation' of a graph and the ordering of its vertices. SMILES is one example of a 'line notation'; InChI is another. You can convert between these notations, and chemical databases usually provide chemical structure information in several such formats where possible.

So yes: given a canonical ordering of the atoms (vertices) of a chemical graph, you can write it out as a SMILES string. There certainly are algorithms that can canonically order general graphs with vertex and edge labels. (Note: the usual terminology is 'canonical labelling' of a graph, but that gets a bit confusing when talking about graphs that already have vertex and/or edge labels.)
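To make "canonical ordering" concrete, here is a minimal brute-force sketch. It is not how nauty or any SMILES toolkit works: it simply tries every vertex ordering and keeps the smallest encoding, so it takes factorial time and is only usable on tiny graphs. The function name and the `(u, v, label)` edge format are assumptions made for this illustration, and labels are assumed to be mutually comparable (for example, all strings).

```python
from itertools import permutations

def canonical_form(vertex_labels, edges):
    """Smallest encoding over all vertex orderings (illustrative only).

    vertex_labels: list of labels, index = vertex id
    edges: list of (u, v, edge_label) tuples for an undirected graph
    """
    best = None
    for perm in permutations(range(len(vertex_labels))):
        # perm[i] is the original vertex placed at position i
        pos = {v: i for i, v in enumerate(perm)}
        key = (
            tuple(vertex_labels[v] for v in perm),
            tuple(sorted(
                (min(pos[u], pos[v]), max(pos[u], pos[v]), lab)
                for u, v, lab in edges
            )),
        )
        if best is None or key < best:
            best = key
    return best

# Two numberings of the same labelled graph: C-O (single), C=N (double)
g1 = (["C", "O", "N"], [(0, 1, "single"), (0, 2, "double")])
g2 = (["N", "C", "O"], [(1, 2, "single"), (1, 0, "double")])
print(canonical_form(*g1) == canonical_form(*g2))
```

Isomorphic labelled graphs produce the same set of candidate encodings, so the minimum is the same for both. Practical tools reach the same result without enumerating every permutation.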

For example, nauty/Traces, the standard tool for computing canonical orderings, can be used on vertex-labelled graphs quite easily. Since the algorithm is based on partition refinement, the vertex labels are simply used as the initial partition. Edge labels are a little trickier, but can be handled by converting the graph into a layered graph where edge labels become layers. This is described in the nauty user guide and also here.
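The layered-graph idea can be sketched as follows. This is a simplified variant with one layer per distinct edge label, written for illustration; the construction in the nauty user guide may differ in detail (for example, in how many layers it uses), and the function name and data layout here are assumptions. The output is a graph with vertex colours only, which is the form a partition-refinement tool expects.

```python
def layered_graph(vertex_labels, edges):
    """Turn an edge-labelled graph into a vertex-coloured graph without edge labels.

    vertex_labels: list of labels, index = vertex id
    edges: list of (u, v, edge_label) tuples for an undirected graph
    Returns (colours, new_edges); node `layer * n + v` is the copy of v in that layer.
    """
    n = len(vertex_labels)
    # One layer per distinct edge label; a single dummy layer if there are no edges.
    edge_labels = sorted({lab for _, _, lab in edges}) or [None]
    layer_of = {lab: i for i, lab in enumerate(edge_labels)}

    def node(v, layer):
        return layer * n + v

    # Colour each copy by (edge label of its layer, original vertex label),
    # so a colour-preserving map can never mix layers or vertex labels.
    colours = [(lab, vertex_labels[v]) for lab in edge_labels for v in range(n)]

    new_edges = set()
    # "Vertical" edges tie the copies of the same original vertex together.
    for v in range(n):
        for layer in range(len(edge_labels) - 1):
            new_edges.add((node(v, layer), node(v, layer + 1)))
    # "Horizontal" edges: each original edge lives only in the layer of its label.
    for u, v, lab in edges:
        a, b = node(u, layer_of[lab]), node(v, layer_of[lab])
        new_edges.add((min(a, b), max(a, b)))
    return colours, sorted(new_edges)

colours, new_edges = layered_graph(["C", "O", "N"],
                                   [(0, 1, "single"), (0, 2, "double")])
print(colours)
print(new_edges)
```

The colours would then be passed to the canonical-labelling tool as the initial partition.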

Another possibility is the 'Signatures' algorithm, which is designed for chemicals but can work with arbitrary graphs with vertex and edge labels. The representation is not the same as SMILES, but the ordering it generates can of course be used to map across. The only part I am doubtful about is representing an arbitrary set of vertex labels in a SMILES string: since atoms are written as short element symbols ('C', 'N', 'O', etc.), you would need some encoding of the labels as such symbols, which limits things somewhat.

On a final note, the vertex and edge labels actually make it easier to find a canonical ordering of the vertices. There is a very simple algorithm called the 'Morgan algorithm' (see here) that generates invariants for the atoms. In graph terminology, it iteratively assigns labels to the vertices of the graph, updating those labels based on the labels of the neighbouring vertices. This also seems related to the 'Weisfeiler-Lehman' algorithm described here.
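The iterative relabelling described above can be sketched as a colour refinement loop. This is written in the spirit of Morgan and Weisfeiler-Lehman rather than as a faithful copy of either: the original Morgan algorithm sums neighbour connectivity values, whereas this sketch re-ranks each vertex by its own colour plus the sorted list of (edge label, neighbour colour) pairs. Function names are hypothetical, and labels are assumed to be mutually comparable.

```python
def compress(values):
    """Replace arbitrary comparable values by dense ranks 0, 1, 2, ..."""
    ranks = {val: i for i, val in enumerate(sorted(set(values)))}
    return [ranks[val] for val in values]

def refine(vertex_labels, edges):
    """Refine vertex colours until the partition stops splitting.

    vertex_labels: list of labels, index = vertex id
    edges: list of (u, v, edge_label) tuples for an undirected graph
    """
    n = len(vertex_labels)
    neighbours = [[] for _ in range(n)]
    for u, v, lab in edges:
        neighbours[u].append((v, lab))
        neighbours[v].append((u, lab))

    colour = compress(vertex_labels)  # vertex labels give the initial partition
    for _ in range(n):
        signatures = [
            (colour[v], tuple(sorted((lab, colour[w]) for w, lab in neighbours[v])))
            for v in range(n)
        ]
        new_colour = compress(signatures)
        # Classes can only split, so an unchanged class count means stability.
        if len(set(new_colour)) == len(set(colour)):
            break
        colour = new_colour
    return colour

# C-C-C-O chain: the oxygen label breaks the symmetry of the carbon chain
print(refine(["C", "C", "C", "O"], [(0, 1, "s"), (1, 2, "s"), (2, 3, "s")]))
# C-O-C: the two carbons stay equivalent
print(refine(["C", "O", "C"], [(0, 1, "s"), (1, 2, "s")]))
```

Vertices that still share a colour at the end are candidates for being symmetric. Refinement alone does not always separate every pair, which is why tools such as nauty combine it with a search.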

For a graph without vertex or edge labels, such an algorithm can only use the connectivity of the graph to construct labels. With vertex labels, however, the symmetries are broken and the search space to find an ordering is smaller. Consider the extreme case where the graph starts with unique labels for each vertex - equivalent to a chemical composed of one of each element, which is a little unlikely! - then the ordering of the vertices is simply some ordering over the vertex labels.
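The extreme case is easy to show in code. In this small sketch (the function name and the `(u, v, label)` edge format are assumptions for illustration), every vertex has a distinct label, so sorting by label gives the canonical order directly and no search is needed.

```python
def canonical_edges_unique_labels(vertex_labels, edges):
    """Canonical edge list when every vertex label is distinct."""
    if len(set(vertex_labels)) != len(vertex_labels):
        raise ValueError("vertex labels must be unique")
    # The canonical position of a vertex is the rank of its label.
    order = sorted(range(len(vertex_labels)), key=lambda v: vertex_labels[v])
    pos = {v: i for i, v in enumerate(order)}
    return sorted(
        (min(pos[u], pos[v]), max(pos[u], pos[v]), lab) for u, v, lab in edges
    )

# The same graph under two different vertex numberings
a = canonical_edges_unique_labels(["C", "O", "N"], [(0, 1, "single"), (0, 2, "double")])
b = canonical_edges_unique_labels(["N", "C", "O"], [(1, 2, "single"), (1, 0, "double")])
print(a == b)
```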
