feature_pathway_overrepresentation.py source code

python

Project: PathCORE-T  Author: greenelab
import numpy as np
import pandas as pd
from scipy import stats


def single_side_pathway_enrichment(pathway_definitions,
                                   gene_signature,
                                   n_genes):
    """Identify overrepresented pathways using the Fisher's exact test for
    significance on a given pathway definition and gene signature.
    (FDR correction for multiple testing is applied in
    `_significant_pathways_dataframe`).

    Parameters
    -----------
    pathway_definitions : dict(str -> set(str))
      Pathway definitions, *post*-overlap-correction if this function
      is called from `pathway_enrichment_with_overlap_correction`.
      A pathway (key) is defined by a set of genes (value).
    gene_signature : set(str)
      The set of genes we consider to be enriched in a feature.
    n_genes : int
      The total number of genes for which we have assigned weights in the
      features of an unsupervised model.

    Returns
    -----------
    pandas.Series
      For each pathway, the p-value from applying Fisher's exact test.
    """
    if not gene_signature:
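        # Explicit dtype keeps the empty result consistent with the
        # float p-values returned in the non-empty case.
        return pd.Series(name="p-value", dtype=float)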
    pvalues_list = []
    for pathway, definition in pathway_definitions.items():
        if isinstance(definition, tuple):
            definition = set.union(*definition)

        both_definition_and_signature = len(definition & gene_signature)
        in_definition_not_signature = (len(definition) -
                                       both_definition_and_signature)
        in_signature_not_definition = (len(gene_signature) -
                                       both_definition_and_signature)
        neither_definition_nor_signature = (n_genes -
                                            both_definition_and_signature -
                                            in_definition_not_signature -
                                            in_signature_not_definition)
        contingency_table = np.array(
            [[both_definition_and_signature, in_signature_not_definition],
             [in_definition_not_signature, neither_definition_nor_signature]])
        try:
            _, pvalue = stats.fisher_exact(
                contingency_table, alternative="greater")
            pvalues_list.append(pvalue)
        # A FloatingPointError can occur when
        # `neither_definition_nor_signature` is very large and
        # `both_definition_and_signature` is very small (near zero).
        # In that case the pathway is treated as not enriched (p = 1.0).
        except FloatingPointError:
            pvalues_list.append(1.0)
    pvalues_series = pd.Series(
        pvalues_list, index=pathway_definitions.keys(), name="p-value")
    return pvalues_series
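
The one-sided test above (`alternative="greater"`) asks how likely it is to see at least the observed overlap between a pathway and the gene signature by chance. That p-value equals the upper tail of a hypergeometric distribution, so a small standard-library sketch can make the arithmetic concrete. The helper name `hypergeom_upper_tail` and the toy gene and pathway names are hypothetical and are not part of PathCORE-T; this illustrates the calculation and is not the project's implementation.

```python
from math import comb


def hypergeom_upper_tail(overlap, n_definition, n_signature, n_genes):
    """P(X >= overlap) when n_signature genes are drawn from n_genes,
    of which n_definition belong to the pathway."""
    max_overlap = min(n_definition, n_signature)
    total = comb(n_genes, n_signature)
    tail = sum(
        comb(n_definition, k) * comb(n_genes - n_definition, n_signature - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy inputs (hypothetical names), shaped like the function's arguments.
pathway_definitions = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g7", "g8"},
}
gene_signature = {"g1", "g2", "g3", "g4"}
n_genes = 10

pvalues = {
    name: hypergeom_upper_tail(
        len(genes & gene_signature), len(genes), len(gene_signature), n_genes)
    for name, genes in pathway_definitions.items()
}
```

Here `pathway_A` lies entirely inside the signature, so its p-value is small (7 of the 210 possible signatures contain all three genes), while `pathway_B` has no overlap and gets a p-value of 1.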