
Python function about chemical formulas

I have a CSV file that contains the names of chemical compounds along with some other information. I need to add new columns holding each compound's formula, its molecular weight, and the number of H, C, N, O and S atoms in the formula. I am stuck on the atom-counting part: I have a function for it (countAtoms, below), but I don't know how to integrate it with the rest of the code and make it work.

import pandas as pd
import urllib.request
import copy
import re

df = pd.read_csv('AminoAcids.csv')

def countAtoms(string, counts=None):
    # Start from a copy of any counts passed in, so the caller's dict is never mutated.
    curDict = dict(counts) if counts else {}

    # Each match is an element symbol plus an optional count, e.g. ("C", "3") or ("N", "").
    for atom, number in re.findall(r"([A-Z][a-z]*)([0-9]*)", string):
        # A symbol with no number after it stands for a single atom.
        curDict[atom] = curDict.get(atom, 0) + (int(number) if number else 1)
    return curDict

df["Formula"] = ['C3H7NO2', 'C6H14N4O2 ','C4H8N2O3','C4H7NO4 ',
'C3H7NO2S ','C5H9NO4','C5H10N2O3','C2H5NO2 ','C6H9N3O2',
'C6H13NO2','C6H13NO2','C6H14N2O2 ','C5H11NO2S ','C9H11NO2',
'C5H9NO2 ','C3H7NO3','C4H9NO3 ','C11H12N2O2 ','C9H11NO3 ','C5H11NO2']
df["Molecular Weight"] = ['89.09','174.2','132.12',
'133.1','121.16','147.13','146.14','75.07','155.15',
'131.17','131.17','146.19','149.21','165.19','115.13',
'105.09','119.12','204.22','181.19','117.15']
df["H"] = 0
df["C"] = 0
df["N"] = 0
df["O"] = 0
df["S"] = 0
df.to_csv("AminoAcids.csv", index=False)
print(df.to_string())

If I understand correctly, you should be able to use str.extract here:

df["H"] = df["Formula"].str.extract(r'H(\d+)')
df["C"] = df["Formula"].str.extract(r'C(\d+)')
df["N"] = df["Formula"].str.extract(r'N(\d+)')
df["O"] = df["Formula"].str.extract(r'O(\d+)')
df["S"] = df["Formula"].str.extract(r'S(\d+)')
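
One caveat with the patterns above: `\d+` needs at least one digit, so an element that appears once with no explicit count (the `N` in `C3H7NO2`, the `S` in `C3H7NO2S`) comes back as NaN. Here is a minimal sketch of one way around that, assuming each element occurs at most once per formula, as in the question's data. The small DataFrame is a stand-in for the real one, not the asker's file.

```python
import pandas as pd

# Stand-in for the question's DataFrame (assumption: same "Formula" column,
# including the stray trailing spaces seen in the original data).
df = pd.DataFrame({"Formula": ["C3H7NO2", "C3H7NO2S ", "C6H14N4O2 "]})

for element in ["H", "C", "N", "O", "S"]:
    # (\d*) also accepts an empty count; (?![a-z]) keeps "C" from matching "Cl".
    counts = df["Formula"].str.extract(rf"{element}(?![a-z])(\d*)", expand=False)
    # NaN means the element is absent (0); "" means present with no number (1).
    df[element] = counts.fillna("0").replace("", "1").astype(int)

print(df)
```

Because `str.extract` only returns the first match, this would undercount a formula that repeats an element, such as `CH3COOH`.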

Here is another approach with a similar result:

df.join(df['Formula'].str.findall(r'([A-Z])(\d*)').map(dict).apply(pd.Series).replace('', 1))

>>>
'''
        Formula Molecular Weight   C   H  N  O    S
0       C3H7NO2            89.09   3   7  1  2  NaN
1    C6H14N4O2             174.2   6  14  4  2  NaN
2      C4H8N2O3           132.12   4   8  2  3  NaN
3      C4H7NO4             133.1   4   7  1  4  NaN
4     C3H7NO2S            121.16   3   7  1  2  1.0
5       C5H9NO4           147.13   5   9  1  4  NaN
6     C5H10N2O3           146.14   5  10  2  3  NaN
7      C2H5NO2             75.07   2   5  1  2  NaN
8      C6H9N3O2           155.15   6   9  3  2  NaN
9      C6H13NO2           131.17   6  13  1  2  NaN
10     C6H13NO2           131.17   6  13  1  2  NaN
11   C6H14N2O2            146.19   6  14  2  2  NaN
12   C5H11NO2S            149.21   5  11  1  2  1.0
13     C9H11NO2           165.19   9  11  1  2  NaN
14     C5H9NO2            115.13   5   9  1  2  NaN
15      C3H7NO3           105.09   3   7  1  3  NaN
16     C4H9NO3            119.12   4   9  1  3  NaN
17  C11H12N2O2            204.22  11  12  2  2  NaN
18    C9H11NO3            181.19   9  11  1  3  NaN
19     C5H11NO2           117.15   5  11  1  2  NaN
'''
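
Since the question asks how to plug the existing counting function into the DataFrame, here is a hedged sketch of that route. `count_atoms` is a simplified stand-in for the asker's `countAtoms`, and the three-row DataFrame is illustrative data rather than the real CSV. It assumes flat formulas with no parentheses or hydrates.

```python
import re
import pandas as pd

def count_atoms(formula):
    """Return {element: count} for a flat formula such as 'C3H7NO2S'."""
    counts = {}
    for atom, number in re.findall(r"([A-Z][a-z]*)(\d*)", formula):
        # No number after the symbol means one atom; repeats are summed.
        counts[atom] = counts.get(atom, 0) + (int(number) if number else 1)
    return counts

# Illustrative stand-in for the question's data.
df = pd.DataFrame({"Formula": ["C3H7NO2", "C3H7NO2S ", "C11H12N2O2 "]})

# One dict per row becomes one column per element.
atom_counts = pd.DataFrame(df["Formula"].map(count_atoms).tolist(), index=df.index)
# Keep only the requested elements, in a fixed order; absent elements become 0.
atom_counts = atom_counts.reindex(columns=["H", "C", "N", "O", "S"]).fillna(0).astype(int)

df = df.join(atom_counts)
print(df)
```

Unlike the single-pattern approaches, this sums repeated symbols, so `count_atoms("CH3COOH")` gives two carbons and four hydrogens. It also yields integer columns with 0 instead of NaN for missing elements.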

