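A non-profit academic platform founded by scholars returned from overseas
Sharing information, integrating resources
Scholarly exchange, with the occasional lighter topic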
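Retrosynthesis is the process of designing a synthetic route for a desired target molecule; it requires identifying the best strategy for combining simpler molecules into the target product. Typically, a retrosynthesis involves a series of reaction steps that build the target from simpler precursor molecules. One of the main challenges in this process is exploring the large retrosynthetic hypergraph, which represents all possible synthetic routes to a given target molecule. Most existing methods use only the local information obtained from single-step retrosynthesis and do not account for the strategic decisions that human experts typically make when planning a multi-step synthesis. In practice, a variety of strategic decisions can be made during a multi-step synthesis to simplify the route and improve its efficiency.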
Fig. 1 | Frequency of distances in the Pistachio (red solid and dashed lines) and in the Schneider dataset (blue solid and dashed lines).
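Federico Zipoli and colleagues at IBM Research Europe propose a new method for generating complete retrosynthetic pathways that incorporates the synthesis strategies of human experts, which existing single-step prediction models cannot capture. The method uses chemical reaction fingerprints, commonly used for reaction classification, to capture multi-step strategies. Retrosynthetic routes are represented as strings in fingerprint space. Fingerprints of published chemical pathways are used to populate a database, which then guides the expansion of the retrosynthesis tree through a scoring method that ranks branches and favors those closer to pathways devised by human experts.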
Fig. 2 | Representation of chemical synthesis paths in fingerprint space.
Growing strings in a chemical reaction space for searching retrosynthesis pathways
Federico Zipoli, Carlo Baldassari, Matteo Manica, Jannis Born & Teodoro Laino
Machine learning algorithms have shown great accuracy in predicting chemical reaction outcomes and retrosyntheses. However, designing synthesis pathways remains challenging for existing machine learning models which are trained for single-step prediction. In this manuscript, we propose to recast the retrosynthesis problem as a string optimization problem in a data-driven fingerprint space, leveraging the similarity between chemical reactions and embedding vectors. Based on this premise, multi-step complex synthesis can be conceptualized as sequences that link multidimensional vectors (fingerprints) representing individual chemical reaction steps. We extracted an extensive corpus of chemical synthesis from patents and converted them into multidimensional strings. While optimizing the retrosynthetic path, we use the Euclidean metric to minimize the distance between the expanded trajectory of the growing retrosynthesis string and the corpus of extracted strings. By doing so, we promote the assembly of synthetic pathways that, in the chemical reaction space, will be more similar to existing retrosyntheses, thereby inheriting the strategic guidelines designed by human experts. We integrated this approach into the RXN platform (https://rxn.res.ibm.com/) and present the method’s application to complex synthesis as well as its ability to produce better synthetic strategies than current methodologies.
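The distance-guided ranking described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D tuples standing in for reaction fingerprints, the sliding-window alignment against each corpus string, and the function names `string_distance`, `distance_to_corpus`, and `rank_branches` are all assumptions made for illustration. The only elements taken from the text are the representation of a route as a sequence of fingerprint vectors and the use of the Euclidean metric to prefer expansions that stay close to a corpus of known routes.

```python
import math
from typing import List, Sequence

Fingerprint = Sequence[float]        # one reaction step (toy stand-in for a real fingerprint)
ReactionString = List[Fingerprint]   # a multi-step route as a sequence of fingerprints


def string_distance(a: ReactionString, b: ReactionString) -> float:
    """Euclidean distance between two equal-length strings,
    treating each string as one concatenated vector."""
    return math.sqrt(sum(math.dist(x, y) ** 2 for x, y in zip(a, b)))


def distance_to_corpus(partial: ReactionString, corpus: List[ReactionString]) -> float:
    """Smallest distance from a partial route to any same-length window
    of any corpus string (hypothetical alignment scheme)."""
    k = len(partial)
    best = math.inf
    for ref in corpus:
        # Corpus strings shorter than the partial route yield no windows.
        for start in range(len(ref) - k + 1):
            best = min(best, string_distance(partial, ref[start:start + k]))
    return best


def rank_branches(path: ReactionString,
                  candidates: List[Fingerprint],
                  corpus: List[ReactionString]) -> List[int]:
    """Return candidate indices ordered from closest to farthest
    from the corpus after appending each candidate to the path."""
    scored = [(distance_to_corpus(path + [c], corpus), i)
              for i, c in enumerate(candidates)]
    return [i for _, i in sorted(scored)]


# Toy data: two known routes and one partially grown route.
corpus = [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
          [(5.0, 5.0), (6.0, 5.0)]]
path = [(0.0, 0.1)]
candidates = [(4.0, 4.0), (1.0, 0.1), (0.0, 3.0)]

ranking = rank_branches(path, candidates, corpus)
print(ranking)
```

In this toy setup the second candidate is ranked first, because appending it keeps the growing string almost on top of the first two steps of the first corpus route. A real system would use high-dimensional learned fingerprints and a far larger corpus, where an exhaustive scan like this one would likely need to be replaced by an indexed nearest-neighbor search.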