
Enrichment Analysis with GSEAPY

I am trying to run an enrichment analysis with gseapy enrichr on a list of gene names that look like the following:

0     RAB4B
1     TIGAR
2     RNF44
3     DNAH3
4    RPL23A
5     ARL8B
6     CALB2
7     MFSD3
8      PIGV
9    ZNF708
Name: 0, dtype: object
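Since gene_list is sent straight to Enrichr, it may be worth checking that the Series holds only clean, unique symbol strings before the call. Below is a minimal Python sketch under that assumption. clean_gene_list is a hypothetical helper, not part of gseapy. It only strips whitespace, drops missing values and removes duplicates. It does not check whether a symbol is valid.

```python
from typing import Iterable, List


def clean_gene_list(genes: Iterable) -> List[str]:
    """Return unique, whitespace-stripped gene symbols in their original order.

    Non-string entries (e.g. NaN or None coming from a pandas Series)
    and empty strings are dropped.
    """
    seen = set()
    cleaned: List[str] = []
    for g in genes:
        if not isinstance(g, str):
            continue  # skips NaN, None and numbers
        symbol = g.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned
```

Iterating over a pandas Series yields its values, so something like glist2 = clean_gene_list(glist2) should work on the Series shown above.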

I am using the following code:

import gseapy

# run enrichr
# if you are only interested in the DataFrame that enrichr returns, set no_plot=True

# list, DataFrame and Series inputs are supported; glist2 is the Series shown above
enr = gseapy.enrichr(gene_list=glist2,
                     gene_sets=['ARCHS4_Cell-lines', 'KEGG_2016', 'KEGG_2013',
                                'GO_Cellular_Component_2018',
                                'GO_Cellular_Component_AutoRIF',
                                'GO_Cellular_Component_AutoRIF_Predicted_zscore',
                                'GO_Molecular_Function_2018',
                                'GO_Molecular_Function_AutoRIF',
                                'GO_Molecular_Function_AutoRIF_Predicted_zscore'],
                     organism='Human',  # don't forget to set organism to the one you want, e.g. 'Yeast'
                     description='test_name',
                     outdir='test/enrichr_kegg',
                     # no_plot=True,
                     cutoff=1  # test dataset; for real data use a lower value between 0 and 1
                     )

However, I am receiving the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Adjusted P-value'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-78-dad3e0840d86> in <module>
      9                  outdir='test/enrichr_kegg',
     10                  # no_plot=True,
---> 11                  cutoff=1 # test dataset, use lower value from range(0,1)
     12                 )

~/venv/lib/python3.7/site-packages/gseapy/enrichr.py in enrichr(gene_list, gene_sets, organism, description, outdir, background, cutoff, format, figsize, top_term, no_plot, verbose)
    500     # set organism
    501     enr.set_organism()
--> 502     enr.run()
    503 
    504     return enr

~/venv/lib/python3.7/site-packages/gseapy/enrichr.py in run(self)
    418                               top_term=self.__top_term, color='salmon',
    419                               title=self._gs,
--> 420                               ofname=outfile.replace("txt", self.format))
    421                 if msg is not None : self._logger.warning(msg)
    422             self._logger.info('Done.\n')

~/venv/lib/python3.7/site-packages/gseapy/plot.py in barplot(df, column, title, cutoff, top_term, figsize, color, ofname, **kwargs)
    498     if colname in ['Adjusted P-value', 'P-value']:
    499         # check if any values in `df[colname]` can't be coerced to floats
--> 500         can_be_coerced = df[colname].map(isfloat)
    501         if np.sum(~can_be_coerced) > 0:
    502             raise ValueError('some value in %s could not be typecast to `float`'%colname)

/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

/shared-libs/python3.7/py/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'Adjusted P-value'

It seems that everything runs fine until the adjusted p-values come into play. Also, when I enter my gene names into sites like BioMart, I get results for the values I input, so I don't know what is going wrong with the 'Adjusted P-value' in my code. Can anyone point me in the right direction? Thanks
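Judging from the traceback, the KeyError is raised inside gseapy's plot.barplot, which is called from Enrichr.run(). So the results table handed to the plotting step has no 'Adjusted P-value' column, even though the enrichment request itself returned without raising. One possible cause, which is an assumption and not confirmed here, is that one of the requested libraries came back with no results, for example because its name is not offered by the Enrichr server. The sketch below checks for that. split_libraries and run_enrichr_without_plots are hypothetical helpers. gseapy.get_library_name() and the no_plot argument come from gseapy itself, but their exact behaviour may differ between gseapy versions.

```python
from typing import Iterable, List, Tuple


def split_libraries(requested: Iterable[str],
                    available: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split requested Enrichr library names into (found, missing), keeping order."""
    available_set = set(available)
    found: List[str] = []
    missing: List[str] = []
    for name in requested:
        if name in available_set:
            found.append(name)
        else:
            missing.append(name)
    return found, missing


def run_enrichr_without_plots(gene_list, gene_sets):
    """Run enrichr only on library names the server lists, without plotting.

    Needs gseapy and network access. gseapy is imported inside the function
    so that split_libraries can be used on its own without gseapy installed.
    """
    import gseapy

    # Called with no arguments, get_library_name() uses its default organism
    # (assumed here to be Human, matching organism="Human" below).
    found, missing = split_libraries(gene_sets, gseapy.get_library_name())
    if missing:
        print("Not listed by Enrichr, skipped:", missing)
    if not found:
        raise ValueError("none of the requested libraries are available")
    enr = gseapy.enrichr(gene_list=gene_list, gene_sets=found,
                         organism="Human", no_plot=True)
    return enr.results  # DataFrame of enrichment results
```

Skipping the plots with no_plot=True also lets you look at the returned results table directly and see which libraries, if any, came back empty.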

How many genes do you have in your gene list? I had the same issue. My gene list had about 22,000 genes; when I kept only the top 5,000, the problem was solved. You can of course change that number as you wish. I hope this helps. Here is my code:

import gseapy

enr_res = gseapy.enrichr(gene_list=glist[:5000], organism='human',  # first 5000 genes only
                         gene_sets=['GO_Biological_Process_2018', 'KEGG_2019_Human',
                                    'WikiPathways_2019_Human', 'GO_Biological_Process_2017b'],
                         description='pathway', cutoff=0.5)
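Note that glist[:5000] keeps the first 5000 entries, which are the "top" genes only if glist is already sorted by the ranking you care about. If you instead have (gene, score) pairs, for example log fold changes from a differential expression table, a minimal sketch for picking the top n by absolute score could look like this. top_n_genes is a hypothetical helper, and whether absolute score is the right ranking depends on your analysis.

```python
from typing import Iterable, List, Tuple


def top_n_genes(scored: Iterable[Tuple[str, float]], n: int = 5000) -> List[str]:
    """Return up to n gene symbols with the largest absolute score.

    Ties keep their input order because sorted() is stable (also with
    reverse=True). A duplicated symbol keeps only its highest-ranked entry.
    """
    if n <= 0:
        return []
    ranked = sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)
    genes: List[str] = []
    seen = set()
    for gene, _score in ranked:
        if gene not in seen:
            seen.add(gene)
            genes.append(gene)
            if len(genes) == n:
                break
    return genes
```

For example, glist = top_n_genes(zip(df['gene'], df['log2FC'])), where df and its column names are placeholders for your own table.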

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM