
python: concatenating multiple pandas strings within FOR loop

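I am working on a Python program that loops over specific .SDF files in a given directory and stores some information about each file in a pandas DataFrame. A dedicated function takes an .SDF file and returns a DataFrame with a single row containing all the required information. In the code below I am trying to apply this function (which works fine) to many .SDF files and then append all the rows to a new DataFrame (which should contain as many rows as there are processed files). How can I correctly implement the concatenation of the individual DataFrames inside the for loop?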

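import glob
import os

import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors, PandasTools, rdMolDescriptors


def load_sdf_file(file, key):
    """
    Read molecules from an SDF file and store some of their properties in a DataFrame
    """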
    df = PandasTools.LoadSDF(file)
    df['Source'] = key
    df['LogP']   = df['ROMol'].apply(Chem.Descriptors.MolLogP)
    df['MolWt']  = df['ROMol'].apply(Chem.Descriptors.MolWt)
    df['LipinskyHBA'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBA)
    df['LipinskyHBD'] = df['ROMol'].apply(Chem.rdMolDescriptors.CalcNumLipinskiHBD)
   
    df = df[['Source','LogP','MolWt','LipinskyHBA','LipinskyHBD']]
    return df


pwd = os.getcwd()
filles='sdf'
results='results'
#set directory to analyse
data = os.path.join(pwd,filles) 


os.chdir(data)
dirlist = [os.path.basename(p) for p in glob.glob(data + '/*.sdf')]
# create an empty DataFrame with the same columns as the df returned by the function
all = pd.DataFrame(columns=['Source','LogP','MolWt','LipinskyHBA','LipinskyHBD'])

for sdf in dirlist:
    try:
        sdf_name = sdf.rsplit(".", 1)[0]
        key = f'{sdf_name}'
        df = load_sdf_file(sdf, key)
        print(f'{sdf_name}.sdf has been processed')
        # this does not work!
        all.append(df)
    except:
        print(f'{sdf_name}.sdf has not been processed')
 

Try pandas.concat(): store the DataFrames in a list inside the loop and concatenate them once afterwards:

import pandas as pd

list_of_df = []
for _ in range(10):
    list_of_df.append(pd.DataFrame({'col_a':[1,1,1], 'col_b':[2,2,2]}))
df = pd.concat(list_of_df)
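
Applied to the loop in the question, the same pattern might look like the sketch below. Note that DataFrame.append never modified the frame in place (it returned a new one, so the bare call `all.append(df)` discarded its result), and the method was removed in pandas 2.0. To keep the example runnable without RDKit, `load_one` is a hypothetical stand-in for `load_sdf_file` that returns a one-row DataFrame with placeholder values; `collect` and `COLUMNS` are likewise names made up for illustration.

```python
import pandas as pd

COLUMNS = ['Source', 'LogP', 'MolWt', 'LipinskyHBA', 'LipinskyHBD']


def load_one(name):
    """Hypothetical stand-in for load_sdf_file(): one row of placeholder values."""
    return pd.DataFrame([[name, 0.0, 0.0, 0, 0]], columns=COLUMNS)


def collect(names, loader=load_one):
    """Call loader on every name and concatenate the resulting DataFrames."""
    frames = []
    for name in names:
        try:
            frames.append(loader(name))  # list.append does modify the list in place
        except Exception as exc:
            print(f'{name} has not been processed: {exc}')
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame
        return pd.DataFrame(columns=COLUMNS)
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0
    return pd.concat(frames, ignore_index=True)


result = collect(['a', 'b', 'c'])
print(result.shape)
```

In the original program, `collect(dirlist, loader=...)` would receive a small wrapper that derives the key from the file name and calls `load_sdf_file(sdf, key)`. Concatenating once after the loop also avoids copying the growing frame on every iteration.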

