簡體   English   中英

錯誤:沒有這樣的模塊:fts4; 使用 - 帶有 anaconda jupyter 筆記本的模糊匹配器包

[英]error: no such module: fts4; using - fuzzymatcher package with anaconda jupyter notebook

我正在使用 jupyter notebook 並嘗試運行此代碼:


import pubchempy as pcp
from rdkit import Chem
import numpy as np
import pandas as pd
import json
import time
import fuzzymatcher

start_time = time.time()
#     CD = pd.read_excel(excelpathin)

    data = pd.ExcelFile(r"C:\Users\USER\Downloads\MOD_REINJ_NEG_ChemSpider Results.xlsx")
    data = pd.ExcelFile(excelpathin)

    df = data.parse (sheet_name=0)
    inKey = list()
    for idx,sd in enumerate(df['Structure']):
        print(idx)
        F = open ("temp.sdf","w")
        F.writelines (sd)
        F.close()
        suppl = Chem.SDMolSupplier ('temp.sdf')
        mol = next (suppl)
        if mol==None:
            inKey.append (np.nan)
        else:
            inKey.append (Chem.MolToInchiKey (mol))

    inKey = pd.DataFrame(inKey)
    inKey.columns = ['InChIKey']
    CD = pd.concat([df, inKey], axis=1, sort=False)
    
    
    print("--- %s seconds --f-add 3 cols" % (time.time() - start_time))

    # From here is the joindata function with modification
    # Load the parse HMDB file
    with open(jsonpathin, 'r') as read_file:
        data = json.load(read_file)

    start_time = time.time()
    # Load the parse HMDB file
    # with open('D:/BCDD/Documents/Tal/Projects/HMDB/DataSets/Parser_HMDB.py Output/serum_metabolites.json', 'r') as read_file:
    #     data = json.load(read_file)


    # Create a data frame from the list of dictionaries
    # df_hmdb = pd.DataFrame(data,  columns=['accession', 'name', 'chemical_formula', 'inchikey', 'disease_name' ])
    df_hmdb = pd.DataFrame(data)
    df_hmdb.drop(['description', 'synonyms', 'kegg_id', 'meta_cyc_id', 'pathway_name'], axis=1)

    df_excel = CD
    # Merge by inchikey
    joindata_by_inchikey = pd.merge(left=df_excel, right=df_hmdb, how='inner', left_on='InChIKey', right_on='inchikey')

    print("--- %s seconds --f-merge by inchikey " % (time.time() - start_time))

    start_time = time.time()
    # Reduce the rows to those we DID find a match by inchkey in bothe data sets
    df_hmdb_reduce_byinchik = df_hmdb.loc[~df_hmdb['inchikey'].isin(df_excel['InChIKey'])]
    df_excel_reduce_byinchik = df_excel.loc[~df_excel['InChIKey'].isin(joindata_by_inchikey['InChIKey'])]


    # joindata_by_name = fuzzymatcher.fuzzy_left_join(df_excel, df_hmdb, left_on="Name", right_on="name")
    joindata_by_name = fuzzymatcher.fuzzy_left_join(df_excel_reduce_byinchik, df_hmdb_reduce_byinchik, left_on="Name",
                                                    right_on="name")

    # Selecting threshold  best_match_score>0.25 maybe adjustments needed
    joindata_by_name = joindata_by_name[joindata_by_name['best_match_score'] > 0.55]
    # Drop columns the
    joindata_by_name.drop(['best_match_score', '__id_left', '__id_right'], axis=1, inplace=True)
    print("--- %s seconds --f-merge by name" % (time.time() - start_time))

    start_time = time.time()
    # Reduce the rows to those we DID find a match by inchkey in and by name both data sets
    df_hmdb_reduce_byname = df_hmdb_reduce_byinchik.loc[~df_hmdb_reduce_byinchik['name'].isin(joindata_by_name['name'])]
    df_excel_reduce_byname = df_excel_reduce_byinchik.loc[
        ~df_excel_reduce_byinchik['Name'].isin(joindata_by_name['Name'])]
    # Remove spaces between letters on  'Formula' ( there is  a warning)
    df_excel_reduce_byname.loc[:, 'Formula'] = df_excel_reduce_byname['Formula'].str.replace(' ', '')

    # Merge by chemical_formula
    joindata_by_CF = pd.merge(left=df_excel_reduce_byname, right=df_hmdb_reduce_byname, how='inner', left_on='Formula',
                              right_on='chemical_formula')

    # This data inculed rows from the original EXCEL file that we did NOT find and match ( by inchikey nor name nor CF)
    df_excel_reduce_byCF = df_excel_reduce_byname.loc[
        ~df_excel_reduce_byname['Formula'].isin(joindata_by_CF['chemical_formula'])]

    # Create a list of all columns of the HMDB JSON data
    colnames = joindata_by_inchikey.columns[6:]
    # Add those names as empty columns to the df_excel_reduce_byCF. reducedata in all the rows from the original Excel
    # that did NOT find a match and added the columns of the HMDB
    reducedata = df_excel_reduce_byCF.reindex(columns=[*df_excel_reduce_byCF.columns.tolist(), *colnames])

    # Append all the data sets
    # out = joindata_by_inchikey.append(joindata_by_name.append(joindata_by_CF))
    out = joindata_by_inchikey.append(joindata_by_name.append(joindata_by_CF.append(reducedata)))

    print("--- %s seconds --f-merge by CF" % (time.time() - start_time))
    

(我不確定代碼是否相關,因為我加載了未在問題中共享的文件)

這是我得到的錯誤:

OperationalError                          Traceback (most recent call last)
<ipython-input-8-26c4a723b687> in <module>
     23 # joindata_by_name = fuzzymatcher.fuzzy_left_join(df_excel, df_hmdb, left_on="Name", right_on="name")
     24 joindata_by_name = fuzzymatcher.fuzzy_left_join(df_excel_reduce_byinchik, df_hmdb_reduce_byinchik, left_on="Name",
---> 25                                                 right_on="name")
     26 
     27 # Selecting threshold  best_match_score>0.25 maybe adjustments needed

~\Anaconda3\lib\site-packages\fuzzymatcher\__init__.py in fuzzy_left_join(df_left, df_right, left_on, right_on, left_id_col, right_id_col)
     39     m = Matcher(dp, dg, s)
     40     m.add_data(df_left, df_right, left_on, right_on,  left_id_col, right_id_col)
---> 41     m.match_all()
     42 
     43     return m.get_left_join_table()

~\Anaconda3\lib\site-packages\fuzzymatcher\matcher.py in match_all(self)
     87         self.scorer.add_data(self)
     88 
---> 89         self.data_getter.add_data(self)
     90 
     91         # Get a table that contains only the matches, scores and ids

~\Anaconda3\lib\site-packages\fuzzymatcher\data_getter_sqlite.py in add_data(self, matcher)
     58                  USING fts4({} TEXT, _concat_all TEXT, _concat_all_alternatives TEXT);
     59               """.format(matcher.right_id_col)
---> 60         con.execute(sql)
     61         con.execute("INSERT INTO fts_target SELECT * FROM df_right_processed")
     62 

OperationalError: no such module: fts4

與fuzzymatcher的錯誤線用於不同的計算機上使用Pycharme工作,但對jupyter筆記本事實並非如此。

我檢查了這個答案:

  1. 操作錯誤:沒有這樣的模塊:fts4
  2. OperationalError:沒有這樣的模塊:fts4

但沒有用,我不明白該怎么做

任何幫助想法暗示想法都得到贊賞

我遇到了與您描述的相同的問題,我找到了解決方案。

  1. 您必須轉到https://www.sqlite.org/download.html並從本頁的 Windows 預編譯二進制文件部分下載適當的 dll 文件。 您必須根據您的硬件架構下載 32 位或 64 位。

  2. 解壓縮 zip 文件並復制內容並粘貼,然后手動將其粘貼到“您的 python \\bin 文件夾”中。

在 (1) 和 (2) 之后,我可以在 Jupyter Notebook 中毫無問題地使用 Fuzzymatcher 包。

如果這些步驟不能解決您的問題,您將看到以下網站: https : //deshmukhsuraj.wordpress.com/2015/02/07/windows-python-users-update-your-sqlite3/comment-page-1/?未批准=947&moderation-hash=9de09b93a31bd079256474474e71b32d#comment-947

查看該網站對我找到問題的答案非常有用。 我希望它對你有用。 最好的事物

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM