import numpy as np


def mark_quantiles(df, quantiles=10):
    '''Add a column named "quantile" to the input dataframe identifying
    quantile membership as integers. The quantile membership (category) is
    calculated for each employee group separately, based on the employee
    population in month zero.

    The output dataframe permits attributes for employees within month zero
    quantile categories to be analyzed throughout all the months of the
    data model.

    The number of quantiles to create within each employee group is selected
    by the "quantiles" input.

    The function uses numpy arrays and functions to compute the quantile
    assignments, and the pandas index data alignment feature to assign
    month zero quantile membership to the long-form, multi-month output
    dataframe.

    This function is used within the quantile_groupby function.

    inputs

        df (dataframe)
            Any pandas dataframe containing an "eg" (employee group) column
            and an "mnum" (month number) column
        quantiles (integer)
            The number of quantiles to create.

            example:

            If the input is 10, the output dataframe will contain a column
            of integers from 1 to 10. The count of each integer within an
            employee group will be approximately the same.
            The first quantile members will be marked with a 1, the second
            with 2, etc., through to the last quantile, 10.
    '''
    mult = 1000
    mod = mult / quantiles
    # keep the full multi-month frame for the final index-aligned assignment
    aligned_df = df.copy()
    # month zero rows only, employee group column only
    df = df[df.mnum == 0][['eg']].copy()
    eg_arr = df.eg.values
    bins_arr = np.zeros_like(eg_arr)
    # assumes employee groups are numbered 1 through eg_arr.max()
    unique_egs = np.arange(eg_arr.max()) + 1
    for eg in unique_egs:
        eg_count = eg_arr[eg_arr == eg].size
        # fractional position within the group, capped just below 1 so that
        # the last member stays in the final quantile
        this_eg_arr = np.clip((np.arange(eg_count) + 1) / eg_count, 0, .9999)
        this_bin_arr = (this_eg_arr * mult // mod).astype(int) + 1
        # write this group's bins into the positions occupied by the group
        np.put(bins_arr, np.where(eg_arr == eg)[0], this_bin_arr)
    df['quantile'] = bins_arr
    # pandas aligns on the index, so every month's rows receive the
    # month zero quantile for the matching index value
    aligned_df['quantile'] = df['quantile']
    return aligned_df
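
The two techniques named in the docstring (per-group bin arithmetic with numpy, then pandas index alignment onto the multi-month frame) can be sketched on a toy dataset. This is a minimal illustration, not part of the original module: the employee numbers, the frame names `month0`, `month1`, and `long_df`, and the choice of 2 quantiles are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two employee groups, indexed by employee number.
month0 = pd.DataFrame(
    {"mnum": 0, "eg": [1, 1, 1, 1, 2, 2]},
    index=[101, 102, 103, 104, 201, 202],
)
# Month one: employees 104 and 202 are no longer present.
month1 = month0.iloc[[0, 1, 2, 4]].assign(mnum=1)
long_df = pd.concat([month0, month1])

quantiles = 2
mult = 1000
mod = mult / quantiles

eg_arr = month0.eg.values
bins = np.zeros(len(month0), dtype=int)
for eg in np.unique(eg_arr):
    mask = eg_arr == eg
    count = mask.sum()
    # fractional position within the group, capped just below 1
    frac = np.clip((np.arange(count) + 1) / count, 0, .9999)
    bins[mask] = (frac * mult // mod).astype(int) + 1

# Month zero result, indexed by employee number.
month0_quantile = pd.Series(bins, index=month0.index)

# Index alignment: each row of long_df, in any month, picks up the
# month zero value stored under the same employee number.
long_df["quantile"] = month0_quantile

print(long_df)
```

In this sketch the four members of group 1 receive bins 1, 2, 2, 2 rather than an even 1, 1, 2, 2, because a fractional position that lands exactly on a boundary falls into the higher bin. The imbalance is at most a member or so per bin, so it matters little for large groups.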
matplotlib_charting.py source code