matplotlib_charting.py source file

python
Project: seniority_list  Author: rubydatasystems
import numpy as np


def mark_quantiles(df, quantiles=10):
    '''Add a "quantile" column to the input dataframe identifying quantile
    membership as integers.  The quantile membership (category) is
    calculated for each employee group separately, based on the employee
    population in month zero.

    The output dataframe permits attributes for employees within month zero
    quantile categories to be analyzed throughout all the months of the
    data model.

    The number of quantiles to create within each employee group is selected
    by the "quantiles" input.

    The function uses numpy arrays and functions to compute the quantile
    assignments, and the pandas index alignment feature to assign month zero
    quantile membership to the long-form, multi-month output dataframe.

    This function is used within the quantile_groupby function.

    inputs
        df (dataframe)
            Any pandas dataframe containing an "eg" (employee group) column
            and an "mnum" (month number) column
        quantiles (integer)
            The number of quantiles to create.

            example:

            If the input is 10, the output dataframe will contain a
            "quantile" column of integers 1 - 10.  The count of each
            integer will be approximately the same within each employee
            group.  The first quantile members will be marked with a 1,
            the second with 2, etc., through to the last quantile, 10.
    '''
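    # scale fractions to integers so that floor division by "mod" yields
    # a zero-based quantile number
    mult = 1000
    mod = mult / quantiles
    aligned_df = df.copy()
    # keep only the month zero rows for the quantile calculation
    df = df[df.mnum == 0][['eg']].copy()
    eg_arr = df.eg.values
    bins_arr = np.zeros_like(eg_arr)
    # employee groups are assumed to be integers numbered from 1
    unique_egs = np.arange(eg_arr.max()) + 1
    for eg in unique_egs:
        eg_count = eg_arr[eg_arr == eg].size
        # fractional position of each employee within the group, capped
        # below 1 so the last employee stays in the final quantile
        this_eg_arr = np.clip((np.arange(eg_count) + 1) / eg_count, 0, .9999)
        this_bin_arr = (this_eg_arr * mult // mod).astype(int) + 1
        # write the group's quantile numbers back into their row positions
        np.put(bins_arr, np.where(eg_arr == eg)[0], this_bin_arr)

    df['quantile'] = bins_arr
    # index alignment broadcasts month zero quantiles to every month;
    # rows whose index is absent from month zero receive NaN
    aligned_df['quantile'] = df['quantile']
    return aligned_df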
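
The two ideas the function relies on, the per-group binning arithmetic and the pandas index alignment, can be tried in isolation. The following is a minimal sketch, not part of the seniority_list project: the helper name `bin_numbers`, the employee numbers 101 to 108, and the tiny two-month frame are all made up for illustration, and the index is assumed to hold unique employee numbers within month zero.

```python
import numpy as np
import pandas as pd


def bin_numbers(count, quantiles):
    """Return 1-based quantile numbers for `count` ordered employees.

    Hypothetical helper that mirrors the arithmetic inside the loop of
    mark_quantiles for a single employee group.
    """
    mult = 1000
    mod = mult / quantiles
    # fractional position within the group, capped below 1 so the last
    # employee stays inside the final quantile
    frac = np.clip((np.arange(count) + 1) / count, 0, .9999)
    return (frac * mult // mod).astype(int) + 1


# eight employees split into four quantiles
bins = bin_numbers(8, 4)
print(bins)  # [1 2 2 3 3 4 4 4]

# month zero labels keyed by (made-up) employee number
month_zero = pd.Series(bins, index=np.arange(101, 109))

# long-form data: employees 101 and 108 appear in months 0 and 1
long_df = pd.DataFrame({'mnum': [0, 0, 1, 1]},
                       index=[101, 108, 101, 108])

# assignment aligns on the index, so each row picks up the month zero
# quantile of its employee number
long_df['quantile'] = month_zero
print(long_df)
```

An employee whose fractional position falls exactly on a quantile boundary is placed in the next quantile, which is why the first quantile holds one member here and the last holds three. With large groups the counts differ by only a few members.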