Does this do structural formulae too?
I was thinking of InChI [0], but on Googling SMILES and SELFIES I found this talk [1] and this paper [2], and my goodness, I've been down a few rabbit holes since...
[0] https://en.wikipedia.org/wiki/International_Chemical_Identif...
[1] https://www.inchi-trust.org/wp/wp-content/uploads/2019/12/18...
[2] https://pubs.rsc.org/en/content/articlehtml/2022/dd/d1dd0001...
No. In Python you can use RDKit (https://github.com/rdkit/rdkit) for that.
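A minimal sketch of what the RDKit route might look like, assuming the `rdkit` package is installed. The ethanol SMILES is only an example input, not something from the thread. It parses a SMILES string into a molecular graph, derives the sum formula from it, and lays out 2D coordinates, which are the basis of a drawn structural formula.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

# Example input chosen for illustration: ethanol.
smiles = "CCO"

# Parse the SMILES string into a molecular graph; RDKit returns None on invalid input.
mol = Chem.MolFromSmiles(smiles)
if mol is None:
    raise ValueError(f"RDKit could not parse {smiles!r}")

# Sum formula derived from the graph (implicit hydrogens are counted).
formula = CalcMolFormula(mol)

# Compute 2D coordinates and serialise them as a MOL block,
# a format that molecule editors can generally open.
AllChem.Compute2DCoords(mol)
mol_block = Chem.MolToMolBlock(mol)

print(formula)
print(mol_block)
```

Note the direction: going from a structure to a sum formula is easy, while the reverse is not possible in general because the formula drops all bonding information.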
Note: There are two standardized formats for this called SMILES and SELFIES. SMILES is much better supported, but SELFIES is more robust. I'm integrating them into some bio and chem software I'm working on.
You can do things like use PubChem's API to look up molecules similar to a given SMILES string.
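A rough sketch of such a lookup against PubChem's PUG REST service, using only the standard library. The helper names `similarity_url` and `similar_cids` are made up for illustration, and the default threshold and record limit are arbitrary choices. It assumes the `fastsimilarity_2d` endpoint with the SMILES passed in the URL path.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def similarity_url(smiles, threshold=90, max_records=10):
    """Build a PUG REST 2D-similarity URL for a SMILES string (hypothetical helper)."""
    query = urlencode({"Threshold": threshold, "MaxRecords": max_records})
    # Percent-encode everything, since SMILES uses characters such as '=', '(' and '#'.
    encoded = quote(smiles, safe="")
    return f"{PUG_REST}/compound/fastsimilarity_2d/smiles/{encoded}/cids/JSON?{query}"


def similar_cids(smiles, threshold=90, max_records=10):
    """Return PubChem CIDs of compounds similar to `smiles` (needs network access)."""
    with urlopen(similarity_url(smiles, threshold, max_records), timeout=30) as resp:
        payload = json.load(resp)
    # The CID list is expected under IdentifierList.CID; fall back to an empty list.
    return payload.get("IdentifierList", {}).get("CID", [])
```

Calling `similar_cids("CC(=O)Oc1ccccc1C(=O)O")` (aspirin) should give back a list of integer CIDs. SMILES containing `/` for stereochemistry may not survive being placed in the URL path, so it is worth checking the PUG REST docs for how to pass those.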
I believe most molecule editors can load and save SMILES.
SMILES and SELFIES are molecular graph representations and aren't meant to solve the "parse this sum formula" problem.
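For contrast, here is a small sketch of what the "parse this sum formula" problem looks like. The output is just element counts with no bonds, which is why C2H6O describes both ethanol and dimethyl ether while their SMILES strings (`CCO` and `COC`) differ. `parse_formula` is a hypothetical helper written for this comment. It handles element symbols, counts and parentheses only, and does not check that a symbol is a real element.

```python
import re
from collections import Counter

# One token is an element symbol, a number, or a parenthesis.
TOKEN = re.compile(r"([A-Z][a-z]?)|(\d+)|([()])")


def parse_formula(formula):
    """Parse a sum formula such as 'Ca(OH)2' into a dict of element counts."""
    tokens = TOKEN.findall(formula)
    # findall silently skips unmatched characters, so verify nothing was dropped.
    if "".join("".join(t) for t in tokens) != formula:
        raise ValueError(f"unexpected character in {formula!r}")

    stack = [Counter()]  # one Counter per open parenthesis level
    i = 0
    while i < len(tokens):
        element, number, paren = tokens[i]
        i += 1
        if paren == "(":
            stack.append(Counter())
            continue
        if number:
            raise ValueError("count without a preceding element or group")
        if paren == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
        else:
            group = Counter({element: 1})
        # An optional number right after an element or group multiplies it.
        count = 1
        if i < len(tokens) and tokens[i][1]:
            count = int(tokens[i][1])
            i += 1
        for symbol, n in group.items():
            stack[-1][symbol] += n * count

    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return dict(stack[0])


print(parse_formula("C6H12O6"))
print(parse_formula("Ca(OH)2"))
```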
SELFIES is for generative AI. If you ask a VAE to generate SMILES, it will spit out some invalid strings. That can't happen with SELFIES, because every SELFIES string decodes to a valid molecule; that is the one application where its robustness matters.
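To make the "invalid strings" point concrete, here is a stdlib-only sketch of two ways a generated SMILES string can break: an unbalanced branch parenthesis or a ring-closure label that is opened but never closed. `looks_structurally_valid` is a hypothetical helper, and it checks necessary conditions only (no valence or aromaticity checks). For real work, handing the string to RDKit's `Chem.MolFromSmiles` and checking for `None` is the safer test.

```python
def looks_structurally_valid(smiles):
    """Check two necessary (not sufficient) conditions for a SMILES string."""
    depth = 0           # current nesting depth of branch parentheses
    open_rings = set()  # ring-closure numbers seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits inside are isotopes, H counts or charges.
            end = smiles.find("]", i)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label, e.g. %12.
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not (label.isascii() and label.isdigit()):
                return False
            open_rings ^= {int(label)}
            i += 2
        elif ch in "0123456789":
            # Single-digit ring-closure label: toggles the ring open or closed.
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not open_rings


for candidate in ["c1ccccc1", "c1ccccc", "CC(=O)O", "CC(C"]:
    print(candidate, looks_structurally_valid(candidate))
```

A character-level generator has to learn these long-range pairings on its own, whereas the SELFIES decoder is built so that any token sequence maps to a valid molecule.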
What about InChI? Isn't that a common way of describing molecules as well?
Good point!
Does SMILES (the Simplified Molecular Input Line Entry System) have an EBNF definition? https://en.wikipedia.org/wiki/Simplified_Molecular_Input_Lin... claims there is a context-free grammar.
This is insanely cool.