计算化学公社


Obtaining and Converting Chemical Identifiers

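## I. Online web services
### 1. Chemical Identifier Resolver
Provided by the [NCI/CADD group](https://cactus.nci.nih.gov/) at the U.S. National Institutes of Health; the home page is (https://cactus.nci.nih.gov/chemical/structure). The server's URL API makes it possible to retrieve chemical identifiers quickly and in bulk.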

URL API scheme: `https://cactus.nci.nih.gov/chemical/structure/"structure identifier"/"representation"`

Example: Chemical name to Standard InChIKey: `https://cactus.nci.nih.gov/chemical/structure/aspirin/stdinchikey`
**Note:** Special characters in the input must be URL-escaped, otherwise the request fails. Triple bonds in SMILES strings, written as `#`, have to be URL-escaped as `%23` (e.g. the SMILES string of ethyne has to be given as `C%23C` instead of `C#C` when it is part of a URL). Similarly, question marks, which can occur in InChI, need to be URL-escaped as `%3F`.
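The escaping described in the note does not have to be done by hand. A minimal sketch, assuming Python's standard `urllib.parse.quote` with its default settings (which leave `/` unescaped); the helper name `build_resolver_url` is hypothetical and not part of the resolver's API:

```python
from urllib.parse import quote

BASE = 'https://cactus.nci.nih.gov/chemical/structure/{0}/{1}'


def build_resolver_url(identifier, representation):
    """Return a resolver URL with the identifier percent-encoded."""
    # quote() turns '#' into %23 and '?' into %3F; '/' is kept by default.
    return BASE.format(quote(identifier), representation)


print(build_resolver_url('C#C', 'stdinchikey'))
# https://cactus.nci.nih.gov/chemical/structure/C%23C/stdinchikey
```

Using a general encoder also covers characters other than `#` and `?` that are not listed in the note.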

Using Python's `requests` library:

```python
import requests

# URL template of the Chemical Identifier Resolver
cactus = 'https://cactus.nci.nih.gov/chemical/structure/{0}/{1}'
ide = 'C#C'  # SMILES of ethyne
ide = ide.replace('#', '%23')  # URL-escape the triple-bond character
rep = 'stdinchikey'  # the desired output is StdInChIKey
# Other representations (see also
# https://cactus.nci.nih.gov/chemical/structure_documentation):
# rep = 'smiles'      # the desired output is SMILES
# rep = 'stdinchi'    # the desired output is StdInChI
# rep = 'iupac_name'  # the desired output is IUPAC name
# rep = 'cas'         # the desired output is CAS Registry Number
# rep = 'formula'     # the desired output is Chemical Formula
url = cactus.format(ide, rep)
response = requests.get(url)
response.raise_for_status()
print(response.text)  # InChIKey=HSFWRNGVRCDJHI-UHFFFAOYSA-N
```
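Since the URL API is meant for bulk retrieval, the single request can be wrapped in a loop. The following is only a sketch under stated assumptions: it uses the standard library's `urllib` instead of `requests`, the names `fetch_text` and `resolve_many` are hypothetical, and it assumes that an identifier the server cannot resolve shows up as an HTTP error.

```python
import time
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE = 'https://cactus.nci.nih.gov/chemical/structure/{0}/{1}'


def fetch_text(url):
    """Download a URL and return its body as stripped text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode('utf-8').strip()


def resolve_many(identifiers, representation, fetch=fetch_text, delay=0.0):
    """Map each identifier to its resolved text, or None on an HTTP error."""
    results = {}
    for identifier in identifiers:
        url = BASE.format(quote(identifier), representation)
        try:
            results[identifier] = fetch(url)
        except HTTPError:
            # Assumption: unresolvable input surfaces as an HTTP error.
            results[identifier] = None
        time.sleep(delay)  # optional pause between consecutive requests
    return results
```

A call such as `resolve_many(['aspirin', 'C#C'], 'stdinchikey', delay=0.5)` would query the server once per identifier. The `fetch` parameter exists so that the download step can be swapped out, for example for a cached or mocked fetcher.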

### 2. OPSIN web service
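Home page: (https://opsin.ch.cam.ac.uk/).
The service is maintained by the Centre for Molecular Informatics at the University of Cambridge and converts IUPAC names into several kinds of chemical identifiers. Usage instructions are at (https://opsin.ch.cam.ac.uk/instructions.html).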

**Using Python's `requests` library:**
```python
import json
import requests

opsin = 'https://opsin.ch.cam.ac.uk/opsin/{0}.{1}'
iupac_name = 'Acetylene'
ext = 'json'
reps = ('smiles', 'stdinchikey')
url = opsin.format(iupac_name, ext)
response = requests.get(url)
response.raise_for_status()
representation = json.loads(response.text)
for rep in reps:
    print(rep + ': ' + representation[rep])
```

### 3. Others
For example, the [NIST Chemistry WebBook](https://webbook.nist.gov/chemistry/), among others.

## II. Cheminformatics software packages and toolkits
### 1. RDKit
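RDKit performs these conversions locally, without a network request. A minimal sketch, assuming an RDKit build with InChI support; the function name `smiles_to_identifiers` is hypothetical, and the RDKit imports sit inside the function only so that the module still loads on machines where RDKit is not installed:

```python
def smiles_to_identifiers(smiles):
    """Convert a SMILES string into several identifiers using RDKit."""
    # Imported here so the module still loads where RDKit is absent.
    from rdkit import Chem
    from rdkit.Chem import rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for unparsable input
        raise ValueError('invalid SMILES: ' + smiles)
    return {
        'smiles': Chem.MolToSmiles(mol),  # RDKit canonical SMILES
        'stdinchi': Chem.MolToInchi(mol),
        'stdinchikey': Chem.MolToInchiKey(mol),
        'formula': rdMolDescriptors.CalcMolFormula(mol),
    }
```

Calling `smiles_to_identifiers('CC(=O)Cl')` should yield the same InChI and InChIKey for acetyl chloride as the Open Babel commands in the next subsection.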

### 2. Open Babel
For basic Open Babel usage, see this [Chinese tutorial](https://zhuanlan.zhihu.com/p/40577681) or the [official manual](https://openbabel.org/docs/dev/F ... matics_Formats.html).

For the Standard InChIKey, see (https://openbabel.org/docs/dev/FileFormats/InChIKey.html).
```shell
# Quote the SMILES so the shell does not interpret the parentheses
obabel -:"CC(=O)Cl" -oinchi     # InChI=1S/C2H3ClO/c1-2(3)4/h1H3
obabel -:"CC(=O)Cl" -oinchikey  # WETWJCDKMRHUPV-UHFFFAOYSA-N
```
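The same command can be driven from Python, which keeps it in line with the other examples in this post. This is a sketch only: it assumes the `obabel` executable is on the `PATH`, and the helper names `obabel_command` and `convert` are hypothetical.

```python
import shutil
import subprocess


def obabel_command(smiles, out_format):
    """Build the argument list for converting one SMILES string."""
    # '-:' passes the SMILES inline; '-o<format>' selects the output format.
    return ['obabel', '-:' + smiles, '-o' + out_format]


def convert(smiles, out_format):
    """Run obabel and return its standard output, stripped."""
    if shutil.which('obabel') is None:
        raise RuntimeError('obabel executable not found on PATH')
    result = subprocess.run(obabel_command(smiles, out_format),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing the arguments as a list bypasses the shell, so characters such as the parentheses in `CC(=O)Cl` need no extra quoting. For example, `convert('CC(=O)Cl', 'inchikey')` runs the second command shown above.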

### 3. Others
See (https://zhuanlan.zhihu.com/p/74540059) or the relevant literature.


