Generating seriatim model points for cluster analysis example
This notebook is modified from generate_model_points_with_duration.ipynb in the basiclife library and generates the seriatim policies for the example performed by the cluster_model_points.ipynb notebook. The modifications are:
- policy_count is set to 1 for all the model points.
- duration_mth is modified to be positive, i.e. all model points are existing policies.
Columns:
- policy_id: Policy identifier (the index of the table).
- age_at_entry: Issue age. The samples are distributed uniformly from 20 to 59.
- sex: "M" or "F" to indicate the policyholder's sex. Not used.
- policy_term: Policy term in years. The samples are evenly distributed among 10, 15 and 20.
- policy_count: The number of policies. Set to 1 for every model point.
- sum_assured: Sum assured. The samples are uniformly distributed from 10,000 to 1,000,000.
- duration_mth: Months elapsed from issue until t=0. Uniformly distributed from 1 to 12 times policy_term - 1.
Number of model points: 10000
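The column specifications above can be expressed as a handful of assertions. The following is a minimal sketch, not part of the original notebook: `check_model_points` is a hypothetical helper name, and the two-row `sample` table is hand-made for illustration only.

```python
import pandas as pd

def check_model_points(df: pd.DataFrame) -> None:
    """Raise AssertionError if df violates the column specifications.

    Hypothetical helper for illustration; it is not part of lifelib.
    """
    assert df["age_at_entry"].between(20, 59).all()
    assert df["sex"].isin(["M", "F"]).all()
    assert df["policy_term"].isin([10, 15, 20]).all()
    assert (df["policy_count"] == 1).all()
    assert df["sum_assured"].between(10_000, 1_000_000).all()
    # duration_mth must lie between 1 and 12 * policy_term - 1 inclusive
    assert (df["duration_mth"] >= 1).all()
    assert (df["duration_mth"] <= 12 * df["policy_term"] - 1).all()

# Hand-made sample with the same columns as the generated table
sample = pd.DataFrame(
    {
        "age_at_entry": [47, 29],
        "sex": ["M", "F"],
        "policy_term": [10, 20],
        "policy_count": [1, 1],
        "sum_assured": [622000.0, 752000.0],
        "duration_mth": [28, 213],
    },
    index=pd.Index([1, 2], name="policy_id"),
)
check_model_points(sample)
```

The same function could be called on the generated table once it has been built, to confirm that every policy is in force (positive duration, shorter than the policy term).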
This notebook can be run online on Google Colab. You need a Google account and must be logged in to it to do so.
The next code cell is relevant only when you run this notebook on Google Colab. It installs lifelib and creates a copy of the library for this notebook.
[1]:
import sys, os

if 'google.colab' in sys.modules:
    lib = 'cluster'; lib_dir = '/content/'+ lib
    if not os.path.exists(lib_dir):
        !pip install lifelib
        import lifelib; lifelib.create(lib, lib_dir)

    %cd $lib_dir
[2]:
import numpy as np
from numpy.random import default_rng # Requires NumPy 1.17 or newer
rng = default_rng(12345)
# Number of Model Points
MPCount = 10000
# Issue Age (Integer): 20 - 59 year old
age_at_entry = rng.integers(low=20, high=60, size=MPCount)
# Sex (Char)
Sex = [
"M",
"F"
]
sex = np.fromiter(map(lambda i: Sex[i], rng.integers(low=0, high=len(Sex), size=MPCount)), np.dtype('<U1'))
# Policy Term (Integer): 10, 15, 20
policy_term = rng.integers(low=0, high=3, size=MPCount) * 5 + 10
# Sum Assured (Float): 10000 - 1000000
sum_assured = np.round((1000000 - 10000) * rng.random(size=MPCount) + 10000, -3)
# Duration in month (Int): 0 < Duration(mth) < Policy Term in month
duration_mth = np.floor((policy_term * 12 - 1) * rng.random(size=MPCount)).astype(int) + 1
# Policy Count (Integer): 1
policy_count = 1
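The duration formula above maps a uniform draw u in [0, 1) to floor((12 * policy_term - 1) * u) + 1, which is an integer from 1 to 12 * policy_term - 1. The sketch below is not part of the original notebook. It checks those bounds on a small sample and shows an alternative that passes array-valued bounds to `Generator.integers`. The alternative, named `alt_duration_mth` here for illustration, samples from the same range but does not reproduce the same random numbers.

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 1000  # small sample size, chosen for illustration

policy_term = rng.integers(low=0, high=3, size=n) * 5 + 10

# Same formula as in the notebook cell
duration_mth = np.floor((policy_term * 12 - 1) * rng.random(size=n)).astype(int) + 1

# Alternative: draw integers directly; `high` is exclusive and may be an array
alt_duration_mth = rng.integers(low=1, high=policy_term * 12)

for d in (duration_mth, alt_duration_mth):
    assert d.min() >= 1
    assert (d <= policy_term * 12 - 1).all()
```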
[3]:
import pandas as pd
attrs = [
"age_at_entry",
"sex",
"policy_term",
"policy_count",
"sum_assured",
"duration_mth"
]
data = [
age_at_entry,
sex,
policy_term,
policy_count,
sum_assured,
duration_mth
]
model_point_table = pd.DataFrame(dict(zip(attrs, data)), index=range(1, MPCount+1))
model_point_table.index.name = "policy_id"
model_point_table
[3]:
policy_id | age_at_entry | sex | policy_term | policy_count | sum_assured | duration_mth |
---|---|---|---|---|---|---|
1 | 47 | M | 10 | 1 | 622000.0 | 28 |
2 | 29 | M | 20 | 1 | 752000.0 | 213 |
3 | 51 | F | 10 | 1 | 799000.0 | 39 |
4 | 32 | F | 20 | 1 | 422000.0 | 140 |
5 | 28 | M | 15 | 1 | 605000.0 | 76 |
... | ... | ... | ... | ... | ... | ... |
9996 | 47 | M | 20 | 1 | 827000.0 | 168 |
9997 | 30 | M | 15 | 1 | 826000.0 | 169 |
9998 | 45 | F | 20 | 1 | 783000.0 | 158 |
9999 | 39 | M | 20 | 1 | 302000.0 | 41 |
10000 | 22 | F | 15 | 1 | 576000.0 | 167 |
10000 rows × 6 columns
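Note that policy_count is the scalar 1 rather than an array, yet the table shows it in every row. pandas broadcasts a scalar value in the dictionary across the rows of the DataFrame. A minimal sketch with made-up three-row data:

```python
import numpy as np
import pandas as pd

ages = np.array([47, 29, 51])  # illustrative values only

# The scalar 1 is repeated for each of the three index labels
table = pd.DataFrame(
    {"age_at_entry": ages, "policy_count": 1},
    index=range(1, 4),
)
table.index.name = "policy_id"
```

If every value in the dictionary were a scalar and no index were given, pandas would raise a ValueError, because it could not tell how many rows to create.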