The BasicTerm_SC Model#

Overview#

The BasicTerm_SC model is a variant of BasicTerm_S that is optimized for generating a compiled model using Cython with modelx-cython.

Like the original model, BasicTerm_SC can be exported as a pure Python model (also called a “nomx” model), which is written as standard Python objects and does not depend on modelx. This exported model can then be compiled with Cython using the modelx-cython package (see the relevant blog post for more details on modelx-cython).

A compiled version of BasicTerm_SC runs about 7–8 times faster than its nomx model but yields the same results as BasicTerm_S. With future improvements to modelx-cython, it is expected that the compiled model will run even faster. Although BasicTerm_S can also be compiled by modelx-cython, it achieves only about twice the speed of its nomx model, because it was not written to take full advantage of Cython optimizations.

The BasicTerm_SC model thus serves as an example of how to optimize a modelx model for generating a fast compiled version.

Optimization Strategy#

BasicTerm_SC is derived from BasicTerm_S by applying the following changes to make its cythonized model run faster:

  • Use of primitive types: To accelerate cash flow projection over a large number of model points, high-level Python objects (such as strings, lists, dictionaries, and pandas DataFrames) are removed from the Projection space. Formulas in that space are instead written to operate primarily on primitive numeric types, such as int and float.

  • Separate data and projection: Cells for reading input data from files have been moved to the Data space. Input data (such as policy attributes) is held in NumPy arrays instead of pandas DataFrames.

  • Parameterize with array indices: The Projection space is parameterized by an array index, idx (rather than point_id), to identify model points. idx is used as a key to look up the value of the selected model point in the arrays of policy attributes.
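The three changes above can be illustrated with a minimal sketch. It is not the model's actual code: the attribute values are illustrative, and the helper name policy_term_of is hypothetical. Each attribute lives in its own NumPy array, and a formula receives only an integer index and returns a primitive value.

```python
import numpy as np

# Illustrative policy attributes for three model points, one array per attribute.
policy_term = np.array([10, 20, 10], dtype=np.int64)
sum_assured = np.array([622000.0, 752000.0, 799000.0], dtype=np.float64)


def policy_term_of(idx: int) -> int:
    """Policy term in years of the model point at array index idx (hypothetical helper)."""
    return int(policy_term[idx])


def claim_pp(idx: int) -> float:
    """Death benefit per policy of the model point at array index idx."""
    return float(sum_assured[idx])


print(policy_term_of(1), claim_pp(2))
```

Because the inputs and outputs are plain int and float values, a translator such as modelx-cython has a chance to assign C types to them, which is the motivation stated above.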

Basic Usage#

This section explains the steps to create the cythonized version of the BasicTerm_SC model.

Install C compiler, Cython, and modelx-cython#

In addition to lifelib and modelx, you need to install Cython and modelx-cython. Because Cython requires a C compiler, install one if necessary by following the instructions in Cython’s official documentation.

Then install Cython and modelx-cython using pip:

> pip install Cython

> pip install modelx-cython

Or using conda, if you’re using Anaconda:

> conda install Cython

> conda install modelx-cython

Copy the basiclife library#

Create your own copy of the basiclife library by following the steps in the Quick Start page. Within the copied folder, you will find a BasicTerm_SC subfolder that contains the model.

Because BasicTerm_SC is a modelx model, you can load and run it from IPython or Spyder (with the spyder-modelx plugin).

For example, in IPython with your current directory set to the location of BasicTerm_SC:

>>> import modelx as mx

>>> model = mx.read_model("BasicTerm_SC")

If you’re using Spyder, open the MxExplorer pane, right-click on an empty area, choose Read Model, then select the BasicTerm_SC folder.

Export and compile the model#

First, export the modelx model to create a pure-Python (nomx) model. Use the export method on the model object as follows (IPython example):

>>> model.export("BasicTerm_SC_nomx")

This command exports the model as a Python package named “BasicTerm_SC_nomx” in the current directory.

You can test the exported (nomx) model by importing it and accessing its cells. For instance:

>>> from BasicTerm_SC_nomx import mx_model

>>> mx_model.Projection[0].result_pv()

>>> mx_model.Projection[9999].result_pv()

(Here, mx_model is the nomx model object.)
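As a rough sketch of how the exported model could be used across many model points, the helper below sums pv_net_cf() over a range of array indices. The function name total_pv_net_cf is hypothetical; it relies only on the Projection[idx] access pattern shown above and on the pv_net_cf() cells listed later on this page.

```python
def total_pv_net_cf(model, n_points):
    """Sum pv_net_cf() over array indices 0 .. n_points - 1.

    `model` is any object exposing Projection[idx].pv_net_cf(),
    such as the exported nomx model object.
    """
    return sum(model.Projection[idx].pv_net_cf() for idx in range(n_points))


# Usage with the exported package (requires BasicTerm_SC_nomx on the import path):
# from BasicTerm_SC_nomx import mx_model
# print(total_pv_net_cf(mx_model, 10_000))
```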

Cythonize the model#

The modelx-cython package provides a shell command named mx2cy. To generate a compiled version of BasicTerm_SC, change to the directory containing the exported nomx model (e.g., the folder with BasicTerm_SC_nomx) and run:

> mx2cy BasicTerm_SC_nomx

To see help options, run:

> mx2cy --help

usage: mx2cy [-h] [--sample SAMPLE] [--spec SPEC] [--setup SETUP] [--translate-only | --compile-only] model_path

Translate an exported modelx model into Cython and compile it.

positional arguments:
  model_path        Path to an exported modelx model to translate into Cython

options:
  -h, --help        show this help message and exit
  --sample SAMPLE   Path to a sample file to run for collecting type information (default: sample.py)
  --spec SPEC       Path to a spec file for setting parameters (default: spec.py)
  --setup SETUP     Path to a setup file for Cython (default: setup.py)
  --translate-only  Perform translation only (default: False)
  --compile-only    Perform compilation only (default: False)

The mx2cy command requires two files, sample.py and spec.py, both included in the basiclife library. When run, mx2cy does the following:

  1. Executes sample.py to collect run-time type information for BasicTerm_SC_nomx.

  2. Outputs a folder BasicTerm_SC_nomx_cy, containing Cython-translated files.

  3. Compiles the translated files into a binary module using Cython, producing the compiled model in BasicTerm_SC_nomx_cy.
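If you prefer to drive mx2cy from a Python script, the command line can be assembled from the options in the help text above. The helper build_mx2cy_command is a hypothetical convenience, not part of modelx-cython; it only builds the argument list and does not run anything.

```python
import shlex


def build_mx2cy_command(model_path, sample=None, spec=None, setup=None,
                        translate_only=False, compile_only=False):
    """Build the argument list for mx2cy from the options in its help text."""
    if translate_only and compile_only:
        # The usage line shows these two flags as mutually exclusive.
        raise ValueError("--translate-only and --compile-only cannot be combined")
    cmd = ["mx2cy"]
    if sample is not None:
        cmd += ["--sample", sample]
    if spec is not None:
        cmd += ["--spec", spec]
    if setup is not None:
        cmd += ["--setup", setup]
    if translate_only:
        cmd.append("--translate-only")
    if compile_only:
        cmd.append("--compile-only")
    cmd.append(model_path)
    return cmd


print(shlex.join(build_mx2cy_command("BasicTerm_SC_nomx", translate_only=True)))
```

The returned list can be passed to subprocess.run() once mx2cy is installed and on the PATH.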

Test the compiled model#

Use the run_sc.py script to test speed and memory usage. For the compiled (Cython) model:

> python run_sc.py --cython
{'value': 1448.9630534601538, 'mem_use': 477.5078125, 'time': 1.5274626000027638}

For comparison, run the nomx version:

> python run_sc.py --nomx
{'value': 1448.9630534601563, 'mem_use': 2050.42578125, 'time': 11.355696699989494}

The script calculates the present value of net cash flows for 10,000 model points and outputs the total present value, the maximum memory usage, and the time taken.
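The two outputs above can be compared directly. The sketch below copies the printed dictionaries and derives the speed-up, the memory ratio, and whether the totals agree to floating-point precision. The tolerance of 1e-12 is an arbitrary choice for illustration.

```python
import math

# Results copied from the two runs shown above.
cython_run = {'value': 1448.9630534601538, 'mem_use': 477.5078125, 'time': 1.5274626000027638}
nomx_run = {'value': 1448.9630534601563, 'mem_use': 2050.42578125, 'time': 11.355696699989494}

speedup = nomx_run['time'] / cython_run['time']
mem_ratio = nomx_run['mem_use'] / cython_run['mem_use']
same_value = math.isclose(nomx_run['value'], cython_run['value'], rel_tol=1e-12)

print(f"speed-up: {speedup:.1f}x, memory ratio: {mem_ratio:.1f}x, values agree: {same_value}")
```

For this particular run, the compiled model is roughly 7.4 times faster and uses roughly a quarter of the memory, while the totals differ only in the last few digits.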

Model Specifications#

BasicTerm_SC consists of two spaces: Data and Projection.

  • The Data space contains references to input files and cells for loading input data. In BasicTerm_S, these tasks were done in the Projection space. In BasicTerm_SC, the model point table is kept as a pandas DataFrame in model_point_table, and each attribute (e.g., policy_term) is stored in a NumPy array. The NumPy array index is used for identifying model points.

  • The Projection space contains the projection logic for a single model point, parameterized by an array index idx (instead of point_id). Formulas here reference the NumPy arrays in Data.

The Data Space#

The Data space reads input data from the model's internal files and provides it to Projection as numpy arrays. The references for input data and their associated files are listed below.

Input Files and References#

Reference            Input File
model_point_table    model_point_table.xlsx
mort_table           mort_table.xlsx
disc_rate_ann        disc_rate_ann.xlsx

Parameters and References

(In all the sample code below, the global variable BasicTerm_SC refers to the BasicTerm_SC model.)

model_point_table#

This reference holds model point data as a pandas DataFrame read from the internal associated file, model_point_table.xlsx.

>>> BasicTerm_SC.Data.model_point_table
           age_at_entry sex  policy_term  policy_count  sum_assured
point_id
1                    47   M           10             1       622000
2                    29   M           20             1       752000
3                    51   F           10             1       799000
4                    32   F           20             1       422000
5                    28   M           15             1       605000
                ...  ..          ...           ...          ...
9996                 47   M           20             1       827000
9997                 30   M           15             1       826000
9998                 45   F           20             1       783000
9999                 39   M           20             1       302000
10000                22   F           15             1       576000

[10000 rows x 5 columns]

The columns of the DataFrame represent model point attributes, such as age_at_entry, sex, policy_term, policy_count and sum_assured. Each column is referenced from the cell of the same name, which returns that attribute for all model points as a numpy array. Unlike BasicTerm_S, point_id is not used as the model point identifier. Instead, the array index is used.

disc_rate_ann#

This reference holds annual discount rates by duration as a pandas Series read from the internal associated file, disc_rate_ann.xlsx.

>>> BasicTerm_SC.Data.disc_rate_ann
year
0      0.00000
1      0.00555
2      0.00684
3      0.00788
4      0.00866
        ...
146    0.03025
147    0.03033
148    0.03041
149    0.03049
150    0.03056
Name: disc_rate_ann, Length: 151, dtype: float64

This Series is referenced from disc_rate_ann_array(), which converts it into a numpy array. disc_rate_ann_array() is in turn referenced from the Projection space.

mort_table#

This reference holds a mortality table by age and duration as a DataFrame. The table is read from the associated internal file, mort_table.xlsx.

>>> BasicTerm_SC.Data.mort_table
            0         1         2         3         4         5
Age
18   0.000231  0.000254  0.000280  0.000308  0.000338  0.000372
19   0.000235  0.000259  0.000285  0.000313  0.000345  0.000379
20   0.000240  0.000264  0.000290  0.000319  0.000351  0.000386
21   0.000245  0.000269  0.000296  0.000326  0.000359  0.000394
22   0.000250  0.000275  0.000303  0.000333  0.000367  0.000403
..        ...       ...       ...       ...       ...       ...
116  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
117  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
118  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
119  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
120  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000

[103 rows x 6 columns]

This is referenced from mort_table_array(), which converts the DataFrame into a numpy array. mort_table_array() adds rows filled with nan to align the row index with age. mort_table_array() is referenced from the Projection space.
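The nan padding described above can be sketched as follows. This is an illustration under stated assumptions rather than the model's actual code: it uses only the first three rows of the table shown above, and the variable names are hypothetical. Since the table starts at age 18, prepending 18 rows of nan makes the row index equal to the age.

```python
import numpy as np

FIRST_AGE = 18  # first age in the mortality table shown above

# First three rows of the table (ages 18 to 20), columns are durations 0 to 5.
rates = np.array([
    [0.000231, 0.000254, 0.000280, 0.000308, 0.000338, 0.000372],  # age 18
    [0.000235, 0.000259, 0.000285, 0.000313, 0.000345, 0.000379],  # age 19
    [0.000240, 0.000264, 0.000290, 0.000319, 0.000351, 0.000386],  # age 20
])

# Prepend nan rows for ages 0 to 17 so that mort_array[age] is the row for that age.
padding = np.full((FIRST_AGE, rates.shape[1]), np.nan)
mort_array = np.vstack([padding, rates])

print(mort_array.shape)   # (21, 6)
print(mort_array[19, 2])  # rate for age 19, duration 2
```

With this layout a lookup needs no offset arithmetic, and an accidental lookup below the first age yields nan instead of a wrong rate.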

np#

The numpy module.

pd#

The pandas module.

Model point data#

The model point data is stored in an Excel file named model_point_table.xlsx under the library directory.

age_at_entry()

Age at entry

sex()

Sex

policy_term()

Policy term

policy_count()

Policy counts

point_id()

Point ID

sum_assured()

Sum assured

Assumption data#

The mortality table is stored in mort_table.xlsx under the model folder and read into mort_table as a DataFrame. mort_table_array() converts this DataFrame into a NumPy array, adding rows filled with nan to align row indices with ages. mort_table_array() is then used by mort_rate() in the Projection space.

The discount rate data is stored in disc_rate_ann.xlsx under the model folder and read into disc_rate_ann as a Series, which is then converted into a NumPy array.

mort_table_array()

Mortality table as a numpy array

disc_rate_ann_array()

Annual discount rates

The Projection Space#

The Projection space includes projection logic for individual model points.

Projection is derived from basiclife.BasicTerm_S.Projection by applying changes to make its compiled model run faster.

Policy attributes and other input data are read from Data, which is referenced as data in Projection.

In Data, policy attributes, such as policy_term(), are returned as 1-dimensional numpy arrays. Consequently, Projection is parameterized with idx, which represents the array index to identify model points.

Parameters and References

(In all the sample code below, the global variable BasicTerm_SC refers to the BasicTerm_SC model.)

idx#

Array index to identify a model point. Policy attributes, such as policy_term(), are returned as 1-dimensional numpy arrays in Data. idx is defined as a Reference, and its value is used for determining the selected model point. By default, 0 is assigned. To select another model point, assign its array index to it:

>>> BasicTerm_SC.Projection.idx = 2

idx is also defined as the parameter of the Projection space, which makes it possible to create dynamic child spaces for multiple model points:

>>> BasicTerm_SC.Projection.parameters
('idx',)

>>> BasicTerm_SC.Projection[1]
<ItemSpace BasicTerm_SC.Projection[1]>

>>> BasicTerm_SC.Projection[2]
<ItemSpace BasicTerm_SC.Projection[2]>

data#

The Data space.

np#

The numpy module.

pd#

The pandas module.

Projection parameters#

This model represents new business, with all model points issued at time 0. The time step is monthly. Cash flows and other time-dependent variables are indexed by t.

Flows that accumulate throughout period t (until t+1) have indices t, while balance items indexed by t represent the value at that exact time.

proj_len()

Projection length in months

duration(t)

Duration in force in years
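A minimal sketch of the two cells above, assuming a monthly time step starting at t=0. The formulas are assumptions made for illustration and are not confirmed by this page, although a projection length of 12 * policy_term + 1 is consistent with the 121-row result_cf() output shown in the Results section, which appears to be for a 10-year policy. Here policy_term is passed as an argument for simplicity.

```python
def duration(t: int) -> int:
    """Duration in force in whole years at month t (assumed formula)."""
    return t // 12


def proj_len(policy_term: int) -> int:
    """Projection length in months, including t=0 and the maturity month (assumed formula)."""
    return 12 * policy_term + 1


print(duration(0), duration(11), duration(12))  # 0 0 1
print(proj_len(10))                             # 121
```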

Model point data#

The same model_point_table.xlsx file under the model folder is referenced to obtain model point data such as ages, sum assured, and terms.

age(t)

The attained age at time t.

age_at_entry()

The age at entry of the selected model point

sex()

The sex of the selected model point

sum_assured()

The sum assured of the selected model point

policy_term()

The policy term of the selected model point.

Assumptions#

mort_rate() reads annual mortality rates from mort_table_array() in Data, and mort_rate_mth() converts them to monthly rates.

disc_rate_mth() reads annual discount rates from disc_rate_ann_array() in Data and converts them to monthly rates, which disc_factor() uses to compute discount factors.

lapse_rate() is defined as a simple function of policy duration. expense_acq() is the acquisition expense per policy at t=0, and expense_maint() is the annual maintenance expense per policy, inflated at a constant rate (inflation_rate()).

mort_rate(t)

Mortality rate to be applied at time t

mort_rate_mth(t)

Monthly mortality rate to be applied at time t

disc_factor(t)

Discount factor at time t.

disc_rate_mth(t)

Monthly discount rate

lapse_rate(t)

Lapse rate

expense_acq()

Acquisition expense per policy

expense_maint()

Annual maintenance expense per policy

inflation_factor(t)

The inflation factor at time t

inflation_rate()

Inflation rate
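The annual-to-monthly conversions mentioned above can be sketched with the usual compounding conventions. These formulas are assumptions for illustration (constant monthly rates that compound to the annual rate) and may differ in detail from the model's own formulas. The functions take the rates as arguments, unlike the cells of the same names.

```python
import math


def mort_rate_mth(annual_mort_rate: float) -> float:
    """Monthly mortality rate whose 12-month survival equals the annual survival."""
    return 1 - (1 - annual_mort_rate) ** (1 / 12)


def disc_rate_mth(annual_disc_rate: float) -> float:
    """Monthly discount rate that compounds to the annual rate over 12 months."""
    return (1 + annual_disc_rate) ** (1 / 12) - 1


def disc_factor(t: int, monthly_rate: float) -> float:
    """Discount factor for a cashflow at month t under a flat monthly rate."""
    return (1 + monthly_rate) ** (-t)


q_mth = mort_rate_mth(0.000231)  # annual rate for age 18, duration 0, from the table
i_mth = disc_rate_mth(0.00555)   # annual discount rate for year 1, from the Series

# Twelve months of monthly rates reproduce the annual figures.
print(math.isclose((1 - q_mth) ** 12, 1 - 0.000231))
print(math.isclose(disc_factor(12, i_mth), 1 / 1.00555))
```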

Policy values#

By default, the death benefit for each policy (claim_pp()) equals sum_assured. All model points pay monthly premiums for the entire policy term.

The monthly premium per policy (premium_pp()) is calculated as (1 + loading_prem) * net_premium_pp, where the net premium is set so that the present value of net premiums equals the present value of claims. This product has no surrender value.

claim_pp(t)

Claim per policy

net_premium_pp()

Net premium per policy

loading_prem()

Loading per premium

premium_pp()

Monthly premium per policy
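The premium calculation described above can be sketched as follows. Setting the present value of net premiums equal to the present value of claims means dividing the present value of claims by the present value of policies in force. The input values are hypothetical round numbers chosen for illustration, and the functions take their inputs as arguments, unlike the cells of the same names.

```python
def net_premium_pp(pv_claims: float, pv_pols_if: float) -> float:
    """Net premium such that PV(net premiums) equals PV(claims)."""
    return pv_claims / pv_pols_if


def premium_pp(net_premium: float, loading_prem: float) -> float:
    """Monthly premium per policy: (1 + loading_prem) * net_premium_pp."""
    return (1 + loading_prem) * net_premium


# Hypothetical inputs, not taken from the model.
net = net_premium_pp(pv_claims=6000.0, pv_pols_if=100.0)
print(net)                                 # 60.0
print(premium_pp(net, loading_prem=0.5))   # 90.0
```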

Policy decrement#

Initially, each model point is assumed to have one policy in force. The in-force policies decrease by lapses and deaths each month, and any remaining policies at the end of the policy term reach maturity and exit.

pols_death(t)

Number of deaths occurring at time t

pols_if(t)

Number of policies in-force

pols_if_init()

Initial Number of Policies In-force

pols_lapse(t)

Number of lapses occurring at time t

pols_maturity(t)

Number of maturing policies
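The decrement logic described above can be sketched as a set of mutually recursive functions. This is a simplified illustration, not the model's code: the flat monthly rates and the 2-year term are hypothetical, the ordering of deaths before lapses within a month is an assumption, and functools.lru_cache stands in for the caching that modelx cells provide.

```python
from functools import lru_cache

TERM_MONTHS = 24         # hypothetical 2-year policy term
MORT_RATE_MTH = 0.0001   # hypothetical flat monthly mortality rate
LAPSE_RATE_MTH = 0.008   # hypothetical flat monthly lapse rate


def pols_death(t):
    """Deaths during month t."""
    return pols_if(t) * MORT_RATE_MTH


def pols_lapse(t):
    """Lapses during month t, applied to survivors of death (assumed ordering)."""
    return (pols_if(t) - pols_death(t)) * LAPSE_RATE_MTH


def pols_maturity(t):
    """Policies remaining at the end of the term mature and exit."""
    if t == TERM_MONTHS:
        return pols_if(t - 1) - pols_lapse(t - 1) - pols_death(t - 1)
    return 0.0


@lru_cache(maxsize=None)
def pols_if(t):
    """Policies in force at time t, starting from one policy at t=0."""
    if t == 0:
        return 1.0
    if t > TERM_MONTHS:
        return 0.0
    return pols_if(t - 1) - pols_lapse(t - 1) - pols_death(t - 1) - pols_maturity(t)


print(pols_if(0), pols_if(TERM_MONTHS))
```

In this sketch the policies in force fall to zero at the end of the term, which matches the all-zero final row of the result_cf() output in the Results section.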

Cashflows#

Cashflows consist of acquisition expenses at t=0, maintenance expenses thereafter, commissions, premiums, and claims. Commissions are assumed to be 100% of premium during the first policy year and zero afterward.

claims(t)

Claims

commissions(t)

Commissions

premiums(t)

Premium income

expenses(t)

Acquisition and maintenance expenses

net_cf(t)

Net cashflow
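The commission rule and the net cashflow can be sketched as below. The functions take their inputs as arguments, unlike the model's cells, and the amounts are hypothetical. The sign convention (premiums less claims, expenses, and commissions) is inferred from the result_pv() figures shown in the Results section.

```python
def commissions(t: int, premium_income: float) -> float:
    """100% of premium income during the first policy year, zero afterward."""
    return premium_income if t // 12 == 0 else 0.0


def net_cf(premiums: float, claims: float, expenses: float, commissions_paid: float) -> float:
    """Net cashflow: premiums less claims, expenses, and commissions."""
    return premiums - claims - expenses - commissions_paid


# Hypothetical monthly amounts for illustration.
print(commissions(0, 90.0))             # 90.0
print(commissions(12, 90.0))            # 0.0
print(net_cf(100.0, 30.0, 10.0, 20.0))  # 40.0
```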

Present values#

Cells whose names begin with pv_ compute present values of various flows. Although pols_if() is not itself a cashflow, its present value, pv_pols_if(), is used as an annuity factor in net_premium_pp().

pv_claims()

Present value of claims

pv_commissions()

Present value of commissions

pv_expenses()

Present value of expenses

pv_net_cf()

Present value of net cashflows.

pv_pols_if()

Present value of policies in-force

pv_premiums()

Present value of premiums

check_pv_net_cf()

Check present value summation
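A present value cell can be sketched as a discounted sum over monthly cashflows. The flat monthly rate, the cashflow amounts, and the convention of discounting each flow from the start of its month are all assumptions made for illustration. The last lines show the kind of consistency check that check_pv_net_cf() suggests: the present value of the net cashflow should equal the combination of the individual present values.

```python
def disc_factor(t: int, monthly_rate: float) -> float:
    """Discount factor for month t under a flat monthly rate (assumed convention)."""
    return (1 + monthly_rate) ** (-t)


def present_value(cashflows, monthly_rate: float) -> float:
    """Sum of cashflows indexed by month t, each discounted by disc_factor(t)."""
    return sum(cf * disc_factor(t, monthly_rate) for t, cf in enumerate(cashflows))


MONTHLY_RATE = 0.001  # hypothetical
premiums = [100.0, 99.0, 98.0]
claims = [30.0, 30.0, 30.0]
expenses = [50.0, 5.0, 5.0]
commissions = [100.0, 99.0, 98.0]
net_cf = [p - c - e - m for p, c, e, m in zip(premiums, claims, expenses, commissions)]

pv_net_cf = present_value(net_cf, MONTHLY_RATE)
pv_from_parts = (present_value(premiums, MONTHLY_RATE)
                 - present_value(claims, MONTHLY_RATE)
                 - present_value(expenses, MONTHLY_RATE)
                 - present_value(commissions, MONTHLY_RATE))

print(abs(pv_net_cf - pv_from_parts) < 1e-9)
```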

Results#

result_cf() returns a DataFrame of monthly cashflows and decrements for a selected model point:

>>> result_cf()
      Premiums     Claims  ...  Policies Death  Policies Exits
0    94.840000  34.180793  ...        0.000055        0.008742
1    94.005734  33.880120  ...        0.000054        0.008665
2    93.178806  33.582091  ...        0.000054        0.008588
3    92.359153  33.286684  ...        0.000054        0.008513
4    91.546710  32.993876  ...        0.000053        0.008438
..         ...        ...  ...             ...             ...
116  62.432465  63.534771  ...        0.000102        0.001107
117  62.317757  63.418038  ...        0.000102        0.001105
118  62.203260  63.301519  ...        0.000102        0.001103
119  62.088973  63.185215  ...        0.000102        0.001101
120   0.000000   0.000000  ...        0.000000        0.000000

[121 rows x 8 columns]

result_pv() returns the present values of these cashflows, along with each flow’s percentage relative to the present value of premiums:

>>> result_pv()
              Premiums       Claims    Expenses  Commissions  Net Cashflow
PV         8251.931435  5501.074678  748.303591  1084.601434    917.951731
% Premium     1.000000     0.666641    0.090682     0.131436      0.111241

result_cf()

Result table of cashflows

result_pv()

Result table of present value of cashflows
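The figures in the result_pv() output above can be reconciled with a few lines of arithmetic. The sketch copies the printed values and checks that the net cashflow equals premiums less claims, expenses, and commissions, and that the second row is each present value divided by the present value of premiums. Small differences are expected because the printed values are rounded to six decimals.

```python
# Present values copied from the result_pv() output above.
pv = {
    "Premiums": 8251.931435,
    "Claims": 5501.074678,
    "Expenses": 748.303591,
    "Commissions": 1084.601434,
    "Net Cashflow": 917.951731,
}

net = pv["Premiums"] - pv["Claims"] - pv["Expenses"] - pv["Commissions"]
ratios = {name: value / pv["Premiums"] for name, value in pv.items()}

print(round(net, 6))
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.6f}")
```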