4. Profiling and Optimizing the Model#
Last Update: 19 August 2023
This notebook demonstrates how to profile a modelx model and subsequently optimize it by using
CashValue_ME_EX1 as an example. The optimized model, named
CashValue_ME_EX4, is included in the
savings library. It is approximately 3 to 4 times faster than
CashValue_ME_EX1, though performance may vary depending on the specific hardware environment.
This approach to profiling and optimization is not exclusive to this model; it can be applied to any model in lifelib that uses modelx and depends heavily on pandas. The profiling process is carried out through the following steps:
Measuring run time: Use the timeit standard library to measure the run time of the model.
Profiling the Model: Utilize the start_stacktrace, get_stacktrace and stop_stacktrace functions of modelx to create a profile of the model.
Analyzing Profiling Results: Review the output to identify which cells are consuming the most time.
In our example, we will find that the most time-consuming cells are those heavily relying on pandas operations. This will lead us to the following optimization strategy:
Replace Pandas with Numpy: We will concentrate on replacing pandas DataFrames and Series with numpy arrays to avoid certain time-consuming pandas operations.
It’s important to recognize that these optimizations will come with trade-offs:
Readability: Switching from pandas objects to numpy arrays may reduce the data’s readability, as pandas objects often provide more descriptive indexes than integer-indexed numpy arrays.
Functionality: pandas offers richer operations on tabular data, so choosing between numpy for speed and pandas for ease of use is a trade-off that needs careful consideration.
Ultimately, these insights and optimization methods will guide us toward a model that balances performance with functionality, based on specific project needs.
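Before moving on, here is a toy illustration of this trade-off. The data below is made up and is not taken from the model; it only contrasts a label-based pandas lookup with positional access on the underlying numpy array.

import pandas as pd

# Toy data: charge rates keyed by a descriptive label
rates = pd.Series([0.10, 0.09, 0.08], index=['type_1', 'type_2', 'type_3'])

rates['type_2']      # pandas: label-based and self-describing, but each lookup pays indexing overhead

arr = rates.values   # the same data as a bare numpy array
arr[1]               # numpy: fast positional access, but the labels are gone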
Measuring the runtime of the model#
The code below loads CashValue_ME_EX1 and assigns it to ex1.
import timeit
import pandas as pd
import modelx as mx
ex1 = mx.read_model('CashValue_ME_EX1')
In the default configuration of
CashValue_ME_EX1, we have it set up for 1 model point across 10,000 scenarios. This means the computational demand is equivalent to running the calculations for 10,000 model points in just one scenario.
The default model point is identified by the product spec id (spec_id) A, which doesn't include any surrender charge. However, for the purpose of this example, we're interested in observing how surrender charge rates are determined based on different product specs. So, we'll adjust the model point table to refer to a more diverse set of model points with varying product specs.
ex1.Projection.model_point_table = ex1.Projection.model_point_moneyness  # Set multiple model points
ex1.Projection.model_point_table['spec_id'] = ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A']  # Set various spec IDs
ex1.Projection.model_point_moneyness
The product specs by spec_id are defined in product_spec_table. The is_wl column indicates whether each type is whole life or not. To save run time and memory, let's set is_wl to False for all the specs.
ex1.Projection.product_spec_table['is_wl'] = False
ex1.Projection.product_spec_table
For the same reason, we reduce the number of scenarios from 10,000 to 1,000.
ex1.Projection.scen_size = 1000
Now let's see how much time the model takes for a run. The code below calculates result_pv() while measuring the run time with timeit. The argument number=1 indicates the run is performed only once.
timeit.timeit('ex1.Projection.result_pv()', globals=globals(), number=1)
Let’s output the mean of the present value of net cashflows of
ex1, as we want to check it against the result of the optimized model.
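The cell that prints this value is not reproduced above. A minimal sketch, assuming the Projection space defines a pv_net_cf cells that returns the present values of net cashflows per model point and scenario (as in other savings models), would be:

ex1.Projection.pv_net_cf().mean()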
Profiling the run#
To profile ex1, we use modelx's feature to trace a run. modelx offers 3 functions, start_stacktrace, get_stacktrace and stop_stacktrace, to start, output and stop tracing the call stack during a run. The code block below is an idiomatic expression for using the functions:
try:
    mx.start_stacktrace(maxlen=None)
    ex1.Projection.result_pv()
    df = pd.DataFrame.from_dict(
        mx.get_stacktrace(summarize=True), orient="index")
finally:
    mx.stop_stacktrace()
    ex1.clear_all()
In this example, we want more concise output on which cells are taking time and how much, so we define a custom function, based on the code block above, that profiles a run and reports the results.
def get_time_info(model):

    try:
        mx.start_stacktrace(maxlen=None)
        model.Projection.result_pv()
        df = pd.DataFrame.from_dict(
            mx.get_stacktrace(summarize=True), orient="index")
    finally:
        mx.stop_stacktrace()
        model.clear_all()

    # Remove model and space names from index
    prefixlen = len(model.name + '.Projection.')
    df.index = [s[prefixlen:] for s in df.index]

    # Add duration %
    total = df['duration'].sum()
    df['dur_perc'] = df['duration'] * 100 / total
    df = df[['calls', 'duration', 'dur_perc']]

    return df.sort_values(['dur_perc'], ascending=False)
The code below performs a profiling run and outputs the 10 most time-consuming cells.
ex1.clear_all()  # Clear the result from the previous run
get_time_info(ex1).iloc[:10]
UserWarning: call stack trace activated
UserWarning: call stack trace deactivated
The output shows that surr_charge_rate(t) is consuming the most time, more than 40% of the total run time. Its formula looks like below.
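The definition can be retrieved interactively through the formula property of the cells object (it is also reproduced in the next section):

ex1.Projection.surr_charge_rate.formula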
Optimizing the model#
surr_charge_rate(t) represents the surrender charge rates to be applied at time
t. The surrender charge rates are defined by rate ID (such as
type_1) and duration, and stored in
surr_charge_table as a DataFrame.
surr_charge_table_stacked() transforms the DataFrame into a Series by combining the row and column indexes of the DataFrame into a MultiIndex.
        duration
type_1  0           0.10
        1           0.09
        2           0.08
        3           0.07
        4           0.06
dtype: float64
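The exact definition of surr_charge_table_stacked() is not shown here. A minimal sketch of such a transformation with pandas, assuming surr_charge_table has durations as its row index and rate IDs as its columns, could look like this:

def surr_charge_table_stacked():
    # Stack the columns into rows, giving a Series indexed by (duration, rate ID),
    # then swap the levels so the rate ID comes first, matching the lookup key
    return surr_charge_table.stack().swaplevel().sort_index()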
surr_charge_rate(t) looks up
surr_charge_table_stacked() using a MultiIndex key, which is created from
duration(t) and other cells as in the formula definition below.
def surr_charge_rate(t):
    idx = pd.MultiIndex.from_arrays(
        [has_surr_charge() * surr_charge_id(),
         np.minimum(duration(t), surr_charge_max_idx())])

    return surr_charge_table_stacked().reindex(idx, fill_value=0).set_axis(
        model_point().index, inplace=False)
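Note that has_surr_charge() * surr_charge_id() yields an empty string for model points without a surrender charge. Such keys are absent from the stacked table, so reindex(..., fill_value=0) maps them to a rate of 0. A toy illustration of this behaviour (the data is not from the model):

import pandas as pd

table = pd.Series(
    [0.10, 0.09, 0.08],
    index=pd.MultiIndex.from_tuples([('type_1', 0), ('type_1', 1), ('type_1', 2)]))

# '' stands for a model point with no surrender charge; it is not in the table
keys = pd.MultiIndex.from_arrays([['type_1', '', 'type_1'], [1, 1, 2]])
table.reindex(keys, fill_value=0).values   # array([0.09, 0.  , 0.08])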
Many of the formulas in our model rely heavily on pandas operations. However, we’ve found that we can achieve the same results at a significantly lower computational cost by refactoring these formulas. The key is to use numpy arrays in place of pandas DataFrames or Series.
For instance, the formula
surr_charge_rate(t) can be reformulated as:
def surr_charge_rate(t):
    ind_row = np.minimum(duration(t), surr_charge_max_idx())
    return surr_charge_table.values.flat[
        surr_charge_table_column() + ind_row * len(surr_charge_table.columns)]
Here, we introduce a new function,
surr_charge_table_column(), which is defined as:
def surr_charge_table_column():
    return surr_charge_table.columns.searchsorted(
        surr_charge_id(), side='right') - 1
In this revised version,
surr_charge_rate(t) outputs a numpy array for each
t rather than a Series. Similarly, other parts of our model can be adjusted to yield numpy arrays instead of Series or DataFrames.
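To see why the flat-index arithmetic works, recall that .values is a row-major 2-D array, so the element at (row, col) sits at flat position col + row * ncols; also note that searchsorted assumes the column labels are sorted. A small standalone check with toy data (not from the model):

import numpy as np
import pandas as pd

# Toy table: rows are durations, columns are (sorted) rate IDs
table = pd.DataFrame({'type_1': [0.10, 0.09, 0.08],
                      'type_2': [0.05, 0.04, 0.03]})

col = table.columns.searchsorted(['type_2', 'type_1'], side='right') - 1   # [1, 0]
row = np.array([2, 0])                                                     # durations

table.values.flat[col + row * len(table.columns)]   # array([0.03, 0.1 ])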
Furthermore, a contributor conducted an in-depth analysis of the present value logic within the model and found a way to express it more succinctly, making it both clearer and faster than the original implementation. Since the present value of each cashflow item is computed only once, the runtime gain from this change is small, but the refactoring still contributes to a cleaner and more efficient model overall.
All these refinements have been integrated into
CashValue_ME_EX4, which is now part of our library. For a comprehensive comparison between the original and updated versions, you can view the link below:
In many of the updated formulas, you might notice the
.values expression at the end. This property, when invoked on a DataFrame or Series, returns the contained data as a numpy array, bypassing the DataFrame or Series container.
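For example (a standalone snippet, not taken from the model):

import pandas as pd

df = pd.DataFrame({'type_1': [0.10, 0.09], 'type_2': [0.05, 0.04]})
df.values             # array([[0.1 , 0.05], [0.09, 0.04]]) -- the whole table as a 2-D numpy array
df['type_1'].values   # array([0.1 , 0.09]) -- a single column as a 1-D numpy array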
With all that said, let’s now evaluate the performance of our optimized model.
ex4 = mx.read_model('CashValue_ME_EX4')
timeit.timeit('ex4.Projection.result_pv()', globals=globals(), number=1)
ex4.clear_all()  # Clear the result from the previous run
get_time_info(ex4).iloc[:10]
UserWarning: call stack trace activated
UserWarning: call stack trace deactivated