malet.plot_utils.data_processor#
Attributes#
Functions#
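- select_df: Select df rows with matching values from filt_dict, except exclude_fields.
- homogenize_df: Homogenize index values of df with reference to select_df(ref_df, filt_dict).
- avgbest_df: Average over avg_over and get best result over best_over.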
Module Contents#
- malet.plot_utils.data_processor.ValueLike#
- malet.plot_utils.data_processor.select_df(df: pandas.DataFrame, filt_dict: Dict[str, ValueLike], *exclude_fields: str, equal: bool = True, drop: bool = False, validate: bool = True) pandas.DataFrame[source]#
Select df rows with matching values from filt_dict, except exclude_fields.
This is a vectorized, single-pass version of the original implementation.
Original behavior preserved:
- Asserts that df is non-empty.
- Asserts that filt_dict keys exist in df.index.names.
- Validates that requested values exist in each index level.
- Raises early if intermediate filtering yields an empty dataframe.
- Supports equal (keep matches) and drop (drop filtered levels).
Performance notes:
- Builds one boolean mask and slices once, instead of repeated df.loc calls.
- Avoids repeated DataFrame materialization inside Python loops.
- Parameters:
df (pandas.DataFrame) – DataFrame with MultiIndex.
filt_dict (Dict[str, Any]) – Mapping from index level to allowed values.
exclude_fields (str) – Index levels to exclude from filtering.
equal (bool) – If True, keep matching rows; otherwise exclude them.
drop (bool) – If True, drop filtered index levels.
validate (bool) – If True, run key/value existence checks.
- Returns:
Filtered DataFrame.
- Return type:
pandas.DataFrame
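The single-mask idea described in the performance notes can be sketched with plain pandas. This is a minimal illustration, not malet's implementation: the helper name select_rows_sketch, the level names (optimizer, lr, seed), and the data are all hypothetical, and validation and exclude_fields handling are omitted.

```python
import numpy as np
import pandas as pd


def select_rows_sketch(df, filt_dict, drop=False):
    """Hypothetical stand-in showing the one-mask approach (not malet's code)."""
    mask = np.ones(len(df), dtype=bool)
    for level, value in filt_dict.items():
        # Accept a scalar or a collection of allowed values per index level.
        allowed = list(value) if isinstance(value, (list, tuple, set)) else [value]
        # AND each level's membership test into the one shared mask.
        mask &= df.index.get_level_values(level).isin(allowed)
    out = df[mask]  # the DataFrame is sliced exactly once
    if drop:
        # Remove the filtered levels; at least one level must remain.
        out = out.droplevel(list(filt_dict))
    return out


# Made-up example grid: 2 optimizers x 2 learning rates x 2 seeds.
index = pd.MultiIndex.from_product(
    [["adam", "sgd"], [0.1, 0.01], [0, 1]],
    names=["optimizer", "lr", "seed"],
)
df = pd.DataFrame({"metric": np.arange(8, dtype=float)}, index=index)

selected = select_rows_sketch(df, {"optimizer": "adam", "lr": [0.1]}, drop=True)
```

Here selected keeps only the two seed rows for optimizer "adam" at lr 0.1, indexed by seed alone.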
- malet.plot_utils.data_processor.homogenize_df(df: pandas.DataFrame, ref_df: pandas.DataFrame, filt_dict: Dict[str, ValueLike], *exclude_fields: str, validate: bool = True) pandas.DataFrame[source]#
Homogenize index values of df with reference to select_df(ref_df, filt_dict).
Original intent (unchanged):
- Align df so that its remaining index grid matches the grid induced by select_df(ref_df, filt_dict, drop=True).
Original caveats (preserved verbatim):
- grid should be complete, else some fields in filt_dict will be missing.
- also, when metric in filt_dict, step and total_steps can be metric-dependent and could return empty df.
Performance improvement:
- Replaces per-row select_df + concat with a single vectorized MultiIndex membership test using isin.
- Parameters:
df (pandas.DataFrame) – DataFrame to homogenize.
ref_df (pandas.DataFrame) – Reference DataFrame.
filt_dict (Dict[str, Any]) – Filter used to define the reference grid.
exclude_fields (str) – Index levels excluded from filtering.
validate (bool) – Run validation checks.
- Returns:
Homogenized DataFrame.
- Return type:
pandas.DataFrame
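The MultiIndex membership test mentioned above can be illustrated with plain pandas. This is only a sketch of the technique under assumed names: ref_grid stands in for the index of select_df(ref_df, filt_dict, drop=True), and the levels and values are invented for the example.

```python
import pandas as pd

# Made-up data: 2 optimizers x 2 learning rates x 2 seeds.
index = pd.MultiIndex.from_product(
    [["adam", "sgd"], [0.1, 0.01], [0, 1]],
    names=["optimizer", "lr", "seed"],
)
df = pd.DataFrame({"metric": range(8)}, index=index)

# Hypothetical reference grid: the (optimizer, lr) pairs that should survive.
ref_grid = pd.MultiIndex.from_tuples(
    [("adam", 0.1), ("sgd", 0.01)], names=["optimizer", "lr"]
)

# Project df's index onto the levels the reference grid uses, keeping their order.
shared = list(ref_grid.names)
extra = [name for name in df.index.names if name not in shared]
projected = df.index.droplevel(extra)

# One vectorized membership test replaces per-row selection plus concat.
keep = projected.isin(ref_grid)
homogenized = df[keep]
```

Rows whose (optimizer, lr) pair is absent from ref_grid are dropped in one step; the seed level is left untouched.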
- malet.plot_utils.data_processor.avgbest_df(df: pandas.DataFrame, metric_field: str, avg_over: Set[str] = set(), best_over: Set[str] = set(), best_of: Dict[str, Any] = dict(), best_at_max: bool = True, validate: bool = True) pandas.DataFrame[source]#
Average over avg_over and get best result over best_over.
Original semantics preserved:
- avg_over: aggregate (mean + SEM) over these index levels.
- best_over: choose hyperparameter values yielding best metric_field.
- best_of: restrict best search to a fixed subset of index values, then apply the chosen hyperparameter globally.
- best_at_max controls argmax vs argmin selection.
Original internal logic (preserved):
- aggregate index: avg_over, best_over
- key index: best_of, others
Performance improvements:
- Vectorized filtering and grouping.
- No repeated slicing inside loops.
- homogenize_df uses index membership instead of concat.
- Parameters:
df (pandas.DataFrame) – Base dataframe to operate over.
metric_field (str) – Metric used to select best hyperparameter.
avg_over (Set[str]) – MultiIndex levels to average over.
best_over (Set[str]) – MultiIndex levels to select best over.
best_of (Dict[str, Any]) – Fixed index values for best selection.
best_at_max (bool) – True if larger metric is better.
validate (bool) – Enable validation checks.
- Returns:
Processed DataFrame.
- Return type:
pandas.DataFrame
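The average-then-select-best pattern can be sketched with pandas groupby. This is a simplified illustration under stated assumptions, not the function's actual code: it ignores best_of, the level names and metric values are invented, and the metric column is simply called "metric".

```python
import pandas as pd

# Made-up results: 2 optimizers x 2 learning rates x 2 seeds.
index = pd.MultiIndex.from_product(
    [["adam", "sgd"], [0.1, 0.01], [0, 1]],
    names=["optimizer", "lr", "seed"],
)
df = pd.DataFrame(
    {"metric": [0.70, 0.74, 0.80, 0.84, 0.60, 0.62, 0.50, 0.58]},
    index=index,
)

avg_over = ["seed"]   # levels to average away
best_over = ["lr"]    # levels to pick the best value of

# Step 1: mean and standard error of the mean over the avg_over levels.
group_levels = [n for n in df.index.names if n not in avg_over]
stats = df.groupby(level=group_levels)["metric"].agg(["mean", "sem"])

# Step 2: for each remaining key (here: optimizer), keep the row with the
# highest mean. Use idxmin instead of idxmax when a smaller metric is better.
key_levels = [n for n in group_levels if n not in best_over]
best_rows = stats.groupby(level=key_levels)["mean"].idxmax()
best = stats.loc[best_rows.tolist()]
```

In this toy data, best holds one row per optimizer: lr 0.01 for "adam" and lr 0.1 for "sgd", each with its mean and SEM over seeds.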