.. _builtin_fit_example:

Fitting a Two Compartment |poppk| Model
###########################################

The :ref:`simple_fit_example` shows how to fit a one compartment PK model to pre-existing data. This example again uses a pre-existing data set, this time to fit a two compartment model with absorption from a depot compartment and bolus dosing, see :numref:`fig_two_comp_depot_diagram_fit`:-

.. _fig_two_comp_depot_diagram_fit:

.. figure:: /_autogen/quick_start/builtin_fit_example/compartment_diagram.*
    :width: 80%
    :align: center
    
    Two compartment model with depot dosing for :ref:`fit_script`. Here |camt| is the size of the bolus dose specified in the |data_file|. |mka|, |mcl|, |mq|, |mv1|, |mv2| are all |model_params| to be estimated for each individual.
    
This model is called a **two compartment** model because the |depot| is not counted; only the compartments that represent drug distribution volumes in the body are counted, in this case |central| and |peri|.
    
This |popy| model is also created by default when :ref:`creating_a_fit_script`, using :ref:`popy_create`.
    
.. note::
    
   See the :ref:`sum_link_builtin_fit_example_fit` obtained by the |popy| developers for this example, including input script and input data file.

    
.. _running_builtin_fit_script:

Run the Fit Script
========================

This fitting example uses these two files:-

.. code-block:: console

    c:\PoPy\examples\builtin_fit_example.pyml
                     builtin_fit_example_data.csv
                            
:ref:`open_a_popy_command_prompt` to set up the |popy| environment in this folder:-

.. code-block:: console

    c:\PoPy\examples\

With the |popy| environment enabled, view the script in an editor by typing:-

.. code-block:: console

    $ popy_edit builtin_fit_example.pyml

Then run the :ref:`fit_script` from the command line using :ref:`popy_run`:-

.. code-block:: console

    $ popy_run builtin_fit_example.pyml

When the fit script has completed, you can view the output of the fit using :ref:`popy_view`, by typing the following command:-

.. code-block:: console

    $ popy_view builtin_fit_example.pyml.html

Note the extra '.html' extension in the above command. This command opens a local .html file in your web browser to summarise the result of the fitting.

You can compare your local html output with the pre-computed documentation output, see :ref:`sum_link_builtin_fit_example_fit`. You should expect some minor numerical differences when comparing results with the documentation.
    
.. _summary_of_builtin_fit_results:

Summary of Fit Results
=============================

The result of running the fit script is |popy|'s best estimate of the presumed unknown |fes| variables:-

.. literalinclude:: /_autogen/quick_start/builtin_fit_example/final_fx_params.txt
    :language: pyml
        
The aim of a :ref:`fit_script` is to optimise the |fes| and |res| so as to maximise the likelihood of observing the input data, given the model structure defined in 'builtin_fit_example.pyml'. The input data in this case is the |cdvcen| column in 'builtin_fit_example_data.csv', which contains 50 individuals, each with 5 observations at random time points following a bolus dose event.

You can visually compare the initial |fx| and fitted |fx| outputs with the input data, see :numref:`table_pred_vs_target_plots_builtin`.

.. _table_pred_vs_target_plots_builtin:

.. list-table:: Model predictions vs original data points for first three individuals 
    
    * - .. thumbnail:: /_autogen/quick_start/builtin_fit_example/images/fit_dense/000000.*
      - .. thumbnail:: /_autogen/quick_start/builtin_fit_example/images/fit_dense/000001.*
      - .. thumbnail:: /_autogen/quick_start/builtin_fit_example/images/fit_dense/000002.* 

In the graphs above the blue dots represent the original data points. The solid blue line represents the model predictions given the final |fx| parameters and fitted |rx| values for each individual. The dashed blue lines represent the model predictions given initial |fx| parameters and |rx| values set to zero.

Note that in this model all individuals receive a bolus dose at time 2.0. The amount of drug in the |central| compartment then follows a complex |pk| curve, as it is first absorbed from the |depot| compartment and then eliminated over time, whilst also exchanging with the |peripheral| compartment.

The graphs show that |popy| has adjusted the |fx| and |rx| parameters so that the |pk| curves match the input data more closely, and therefore maximise the likelihood of the data being generated by this model.

The data file included in this example is synthesised from a |pk| model of the same form as the one described in 'builtin_fit_example.pyml' (see :ref:`builtin_gen_example`). The model structure is therefore known to be correct in this case, so we should expect a good model fit.

.. _syntax_of_builtin_fit_script:

Syntax of Fit Script
=======================================

The mixed effect population structure is defined in the |level_params| section as follows:-

.. literalinclude:: 
    /_autogen/quick_start/builtin_fit_example/fit_sections/LEVEL_PARAMS.pyml
    :language: pyml

There are 5 mean |fe| parameters, |ie| |fx_main_builtin|, a 5x5 covariance matrix |fx_isv_mat_builtin|, a proportional noise variable :pyml:`f[PNOISE]` and a 5 element vector |rx_vec_builtin| of |res| defined for each individual. There are 50 individuals in the data set. This model is therefore attempting to estimate the following:

* 6 main |fx| parameters, namely the 5 means plus :pyml:`f[PNOISE]`.
* 15 covariance |fx| parameters. The covariance matrix is symmetric, so only its lower triangle is free.
* 5 |rx| per individual.

That is 271 parameters in total, i.e. (6 + 15) |fx| + (50 * 5) |rx|.
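
The same bookkeeping can be written as a few lines of plain Python. This is an illustrative check of the arithmetic only and is not part of |popy|:

.. code-block:: python

    # Parameter count for this example, using the sizes quoted in the text.
    n_model_params = 5   # KA, CL, V1, Q, V2
    n_indiv = 50         # individuals in the data file

    n_main_fx = n_model_params + 1                         # 5 medians plus f[PNOISE]
    n_cov_fx = n_model_params * (n_model_params + 1) // 2  # free elements of a symmetric 5x5 matrix
    n_rx = n_indiv * n_model_params                        # one 5 element r vector per individual

    n_total = n_main_fx + n_cov_fx + n_rx
    print(n_main_fx, n_cov_fx, n_rx, n_total)  # 6 15 250 271

The n(n + 1)/2 term is why the number of covariance parameters grows quickly as more model parameters are given random effects.
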
        
The allowable ranges and starting values for the main |fx| are defined using the following syntax:-

.. code-block:: pyml

    f[X] ~ P start_x
   
Here the 'P' is short for 'positive'. This expression is actually a shortcut for:-

.. code-block:: pyml

    f[X] ~ unif(0.0, +inf) start_x

Here a |unif_dist| defines the range of allowed values [0.0, +inf]. It is quite common to require |pkpd| model parameters to be non-negative so that they make physical sense. The 'start_x' value is the initial value of |fx| used in the optimisation, which is usually an initial guess by the modeller.

Each individual has a unique |rx_vec_builtin| vector because the random effects are defined at the INDIV level. For more info on the syntax above see |level_params|. Here the |rx| are defined as following a zero-mean multivariate normal distribution:-

.. code-block:: pyml

    r[KA, CL, V1, Q, V2] ~ mnorm([0,0,0,0,0], f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv])

Note that the second argument of |mnorm_dist|, the square covariance matrix |fx_isv_mat_builtin|, is a global parameter shared by all individuals.
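
In a fit the |rx| are estimated rather than sampled. Drawing samples from this distribution still helps to show what the statement means. The standard-library Python sketch below uses the textbook Cholesky construction for a zero-mean multivariate normal, which is not necessarily how |popy| works internally. The helper names are hypothetical, and the covariance is simply the initial ``spd_matrix`` value used in this example:

.. code-block:: python

    import math
    import random

    def cholesky(a):
        """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
        n = len(a)
        low = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = sum(low[i][k] * low[j][k] for k in range(j))
                if i == j:
                    low[i][j] = math.sqrt(a[i][i] - s)
                else:
                    low[i][j] = (a[i][j] - s) / low[j][j]
        return low

    def sample_zero_mean_mnorm(low, rng):
        """One draw r = L z with z ~ N(0, I), so that cov(r) = L L^T."""
        z = [rng.gauss(0.0, 1.0) for _ in low]
        return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]

    # Illustrative shared covariance: the initial spd_matrix values used in this example.
    omega = [[0.05 if i == j else 0.01 for j in range(5)] for i in range(5)]
    low = cholesky(omega)   # factorised once, because omega is shared by all individuals

    rng = random.Random(0)
    r_by_indiv = [sample_zero_mean_mnorm(low, rng) for _ in range(50)]  # one r vector each
    print(len(r_by_indiv), len(r_by_indiv[0]))  # 50 5

Factorising the covariance once and reusing the factor for every individual mirrors the fact that the matrix is shared. Each individual still gets its own draw.
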
    
When fitting a model, the |fx_isv_mat_builtin| matrix is defined using a |spd_mat_dist|:-
    
.. code-block:: pyml

    f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv] ~ spd_matrix() [
        [0.05],
        [0.01, 0.05],
        [0.01, 0.01, 0.05],
        [0.01, 0.01, 0.01, 0.05],
        [0.01, 0.01, 0.01, 0.01, 0.05],
    ]

Here **spd** is short for |spd|. This distribution always returns a matrix with positive eigenvalues, starting from the initial matrix:-
    
.. math::

    \begin{pmatrix}
    0.05     & 0.01      & 0.01     & 0.01  & 0.01 \\
    0.01     & 0.05      & 0.01     & 0.01  & 0.01 \\
    0.01     & 0.01      & 0.05     & 0.01  & 0.01 \\
    0.01     & 0.01      & 0.01     & 0.05  & 0.01 \\
    0.01     & 0.01      & 0.01     & 0.01  & 0.05 \\
    \end{pmatrix}
    
Note that, as the matrix is symmetric, it is only necessary to specify the lower triangle elements. |popy| updates the 15 free elements of this matrix to increase the likelihood.
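
As a sanity check outside |popy|, the short Python sketch below takes the lower-triangle rows written in the ``spd_matrix()`` block. It expands them into the full symmetric matrix, counts the free elements and checks positive definiteness with Sylvester's criterion. The helper names are hypothetical. The sketch only checks the initial matrix and says nothing about how |popy| keeps the matrix positive definite during the fit:

.. code-block:: python

    # Lower-triangle rows exactly as written in the spd_matrix() block.
    lower_rows = [
        [0.05],
        [0.01, 0.05],
        [0.01, 0.01, 0.05],
        [0.01, 0.01, 0.01, 0.05],
        [0.01, 0.01, 0.01, 0.01, 0.05],
    ]

    def expand_lower(rows):
        """Build the full symmetric matrix from its lower-triangle rows."""
        n = len(rows)
        full = [[0.0] * n for _ in range(n)]
        for i, row in enumerate(rows):
            for j, value in enumerate(row):
                full[i][j] = value
                full[j][i] = value
        return full

    def determinant(a):
        """Determinant by Gaussian elimination with partial pivoting."""
        a = [row[:] for row in a]
        n = len(a)
        det = 1.0
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
            if a[pivot][col] == 0.0:
                return 0.0
            if pivot != col:
                a[col], a[pivot] = a[pivot], a[col]
                det = -det
            det *= a[col][col]
            for r in range(col + 1, n):
                factor = a[r][col] / a[col][col]
                for c in range(col, n):
                    a[r][c] -= factor * a[col][c]
        return det

    def is_positive_definite(m):
        """Sylvester's criterion (symmetric m): every leading principal minor is positive."""
        return all(determinant([row[:k] for row in m[:k]]) > 0.0
                   for k in range(1, len(m) + 1))

    omega = expand_lower(lower_rows)
    n_free = sum(len(row) for row in lower_rows)
    print(n_free, is_positive_definite(omega))  # 15 True
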
    
Given the |fx| and |rx|, the mapping to the |mx| for each individual is defined in the |model_params| section:-
    
.. literalinclude:: 
    /_autogen/quick_start/builtin_fit_example/fit_sections/MODEL_PARAMS.pyml
    :language: pyml

This shows that the :pyml:`m[KA], m[CL], m[V1], m[Q], m[V2]` parameters for each individual are log-normally distributed, with median values of :pyml:`f[KA], f[CL], f[V1], f[Q], f[V2]`. There is a proportional noise parameter :pyml:`f[PNOISE]` shared by all individuals, and a small fixed additive noise constant :pyml:`m[ANOISE]`. For more info on the syntax above see |model_params|.
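
The Python sketch below checks the log-normal behaviour numerically. It assumes the common form :pyml:`m[CL] = f[CL]*exp(r[CL])`; the exact expressions are in the included MODEL_PARAMS file. The numbers and the ``individual_param`` helper are made up for illustration:

.. code-block:: python

    import math
    import random
    import statistics

    def individual_param(f_median, r):
        """Log-normal individual value, assuming the form m = f * exp(r)."""
        return f_median * math.exp(r)

    f_cl = 2.0                  # hypothetical population median, f[CL]
    sd_r = math.sqrt(0.05)      # hypothetical CL_isv variance 0.05 -> standard deviation
    rng = random.Random(1)
    m_cl = [individual_param(f_cl, rng.gauss(0.0, sd_r)) for _ in range(10001)]

    # exp() is monotonic, so median(m) = f * exp(median(r)) = f * exp(0) = f.
    print(statistics.median(m_cl))   # close to 2.0
    print(min(m_cl) > 0.0)           # True: log-normal values are always positive

The same reasoning applies to the other model parameters. It also explains why this parameterisation keeps every individual parameter positive.
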

A two compartment model with first order elimination and bolus dosing via a depot compartment is defined in the |derivatives| section:-
    
.. literalinclude:: 
    /_autogen/quick_start/builtin_fit_example/fit_sections/DERIVATIVES.pyml
    :language: pyml

The bolus arrives in the |depot| compartment, due to the :pyml:`@bolus` term appearing on the right hand side of the :pyml:`d[DEPOT]` equation:-

.. code-block:: pyml

    d[DEPOT] = @bolus{amt:c[AMT]} - m[KA]*s[DEPOT] 

The amount of the bolus dose is |camt|, which is defined in the data file. In this case it is always 100 units and occurs at time 2.0 for all individuals. The rate constant for transfer out of the |depot| compartment is :pyml:`m[KA]`, so the transfer rate :pyml:`m[KA]*s[DEPOT]` is first order with respect to :pyml:`s[DEPOT]`.
    
The |central| compartment, which represents the blood plasma and is where drug concentration :term:`observations` are made, is defined as follows:-

.. code-block:: pyml

    d[CENTRAL] = m[KA]*s[DEPOT] - s[CENTRAL]*m[CL]/m[V1] - s[CENTRAL]*m[Q]/m[V1]  + s[PERI]*m[Q]/m[V2]
   
This consists of the input from the |depot|, :pyml:`m[KA]*s[DEPOT]`, and a first order elimination term, :pyml:`s[CENTRAL]*m[CL]/m[V1]`, that represents the removal of the drug from the blood plasma. Here the elimination rate constant is expressed as the ratio of:-

* :pyml:`m[CL]`: |clear| from |central| compartment
* :pyml:`m[V1]`: |vol| of |central| compartment
   
The last two terms, :pyml:`- s[CENTRAL]*m[Q]/m[V1]  + s[PERI]*m[Q]/m[V2]`, are the exact negatives of the terms in the |peripheral| compartment equation. Any drug that leaves one of these compartments by this route therefore enters the other:-

.. code-block:: pyml

    d[PERI] = s[CENTRAL]*m[Q]/m[V1] - s[PERI]*m[Q]/m[V2]
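
The Python sketch below shows the three equations working together by integrating them with a fixed-step fourth-order Runge-Kutta scheme. It is an illustration under stated assumptions, not |popy|'s ODE solver. The parameter values are hypothetical, and the bolus is applied at t = 0, so t is the time since the dose:

.. code-block:: python

    import math

    def derivatives(s, m):
        """d[DEPOT], d[CENTRAL], d[PERI] as written in the DERIVATIVES section."""
        depot, central, peri = s
        d_depot = -m["KA"] * depot
        d_central = (m["KA"] * depot - central * m["CL"] / m["V1"]
                     - central * m["Q"] / m["V1"] + peri * m["Q"] / m["V2"])
        d_peri = central * m["Q"] / m["V1"] - peri * m["Q"] / m["V2"]
        return [d_depot, d_central, d_peri]

    def rk4_step(s, m, dt):
        """One classical fourth-order Runge-Kutta step of size dt."""
        k1 = derivatives(s, m)
        k2 = derivatives([x + 0.5 * dt * k for x, k in zip(s, k1)], m)
        k3 = derivatives([x + 0.5 * dt * k for x, k in zip(s, k2)], m)
        k4 = derivatives([x + dt * k for x, k in zip(s, k3)], m)
        return [x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                for x, a, b, c, d in zip(s, k1, k2, k3, k4)]

    def simulate(m, dose_amt=100.0, t_end=24.0, dt=0.01):
        """Bolus of dose_amt into DEPOT at t = 0, then integrate to t_end."""
        s = [dose_amt, 0.0, 0.0]
        times, states = [0.0], [s]
        for i in range(1, round(t_end / dt) + 1):
            s = rk4_step(s, m, dt)
            times.append(i * dt)
            states.append(s)
        return times, states

    params = {"KA": 1.0, "CL": 2.0, "V1": 10.0, "Q": 1.0, "V2": 20.0}  # hypothetical m values
    times, states = simulate(params)
    conc = [s[1] / params["V1"] for s in states]   # p[DV_CENTRAL] = s[CENTRAL]/m[V1]

    # The depot equation has the closed form dose * exp(-KA * t); compare at t = 1.
    print(states[100][0], 100.0 * math.exp(-params["KA"] * times[100]))

The Q terms cancel between d[CENTRAL] and d[PERI]. The total amount over the three compartments can therefore only fall through the CL/V1 elimination term, which gives a quick sanity check for any implementation of this model.
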
    
To compute the objective function for each row of the data set, the |sx| values are used to compute |px| prediction variables which are then compared with the target |cx| values from the data file, as defined below:-
    
.. literalinclude:: 
    /_autogen/quick_start/builtin_fit_example/fit_sections/PREDICTIONS.pyml
    :language: pyml

The predicted variable :pyml:`p[DV_CENTRAL]` is defined as follows:-

.. code-block:: pyml

    p[DV_CENTRAL] = s[CENTRAL]/m[V1]
    
Hence this is a concentration, because we are dividing the amount :pyml:`s[CENTRAL]` by the |vol| of the |central| compartment. The other two lines:-

.. code-block:: pyml
    
    var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
    c[DV_CENTRAL] ~ norm(p[DV_CENTRAL], var)

show that we are comparing the model prediction :pyml:`p[DV_CENTRAL]` with the data |cdvcen| using a |norm_dist| likelihood as the error model. The variance is essentially a proportional noise model, where the standard deviation of the proportional noise is :pyml:`m[PNOISE]`. Here :pyml:`m[ANOISE]` is fixed to a small positive constant to avoid zero variances when :pyml:`p[DV_CENTRAL]` is close to zero. For more info on the syntax above see |predictions|.
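
The Python below sketches what this means for a single data row. It computes the variance and the corresponding -2 log likelihood contribution, assuming (as the variable name ``var`` suggests) that the second argument of ``norm`` is a variance. The function names and numbers are hypothetical, and |popy| may add or drop constant terms in its own objective:

.. code-block:: python

    import math

    def obs_variance(pred, anoise, pnoise):
        """var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2"""
        return anoise ** 2 + pnoise ** 2 * pred ** 2

    def minus_two_log_lik(obs, pred, var):
        """-2 * log density of obs under a normal with mean pred and variance var."""
        return math.log(2.0 * math.pi * var) + (obs - pred) ** 2 / var

    anoise, pnoise = 0.001, 0.1     # hypothetical m[ANOISE] and m[PNOISE]
    pred = 5.0                      # hypothetical p[DV_CENTRAL]
    var = obs_variance(pred, anoise, pnoise)
    print(minus_two_log_lik(5.5, pred, var))   # one row's contribution
    print(obs_variance(0.0, anoise, pnoise))   # stays positive because of m[ANOISE]

With ``pred`` set to 0.0, the role of :pyml:`m[ANOISE]` is visible: without it the variance would be zero and the log term would be undefined.
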
                
|popy| is essentially trying to find the best combination of fixed parameters as follows:-

* :pyml:`f[KA]` - the median absorption rate constant from the |depot| -> |central| compartment
* :pyml:`f[CL]` - the median |clear| of the |central| compartment
* :pyml:`f[V1]` - the median |vol| of the |central| compartment
* :pyml:`f[Q]` - the median |clear| between the |central| <-> |peripheral| compartments 
* :pyml:`f[V2]` - the median |vol| of the |peripheral| compartment
* |fx_isv_mat_builtin| - the covariance matrix of the |rx|, which describes how the parameters above vary over the population of individuals.
* :pyml:`f[PNOISE]` - the proportional noise not explained by the model in the |cdvcen| data.

The unexplained noise :pyml:`f[PNOISE]` and the between subject variability |fx_isv_mat_builtin| confound each other. However, the population as a whole contains enough data to separate them using maximum likelihood [Sheiner1980]_.

In |popy| the likelihood is optimised iteratively, with the |fx| and |rx| being updated at each iteration. In this case, the objective function, which is derived from the likelihood, progressed as follows (:numref:`table_obj_vs_time_builtin_fit_example`):

.. _table_obj_vs_time_builtin_fit_example:

.. csv-table:: Objective values vs iteration number and time
    :file: ../../_autogen/quick_start/builtin_fit_example/builtin_fit_example.pyml_output/fit/OBJV_vs_time.csv
    
Note that the objective function is defined as -2 times the log likelihood. Therefore, the lower the objective function, the more likely the input data are to be observed given the current |fx| values. By default |popy| stops the fitting algorithm once the objective function has stopped decreasing.
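
Because the objective is -2 log L, the change in objective between two iterations translates directly into a likelihood ratio. The minimal Python sketch below illustrates this; the function name and objective values are hypothetical:

.. code-block:: python

    import math

    def likelihood_ratio(objv_old, objv_new):
        """How many times more likely the data are at the new fx values.

        With OBJV = -2 * log L, L_new / L_old = exp((objv_old - objv_new) / 2).
        """
        return math.exp((objv_old - objv_new) / 2.0)

    # Hypothetical objective values from two iterations.
    print(likelihood_ratio(1000.0, 990.0))  # exp(5), about 148.4
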

.. _builtin_msim_example:

Visual Predictive Check for Two Compartment |poppk| Model
=============================================================

The :ref:`builtin_fit_example` section showed fitting a |pkpd| model to a data set. 

As shown previously in :ref:`simple_msim_example1`, it is possible to use the fitted |fes| values, i.e. the optimised |fx| variables, to generate a :term:`visual predictive check`, often abbreviated to 'VPC'.

Running the MSim Script
-----------------------------------

It is presumed that you have already run the 'builtin_fit_example.pyml' script from :ref:`builtin_fit_example`. If so, you should have access to the following output folder:-

.. code-block:: console
 
    builtin_fit_example.pyml_output/
        msim/
            builtin_fit_example_msim.pyml

:ref:`open_a_popy_command_prompt` in the 'msim' subfolder, then open the :ref:`msim_script` in an editor by typing:-

.. code-block:: console

    $ popy_edit builtin_fit_example_msim.pyml

You can then run the script using:-
    
.. code-block:: console

    $ popy_run builtin_fit_example_msim.pyml
    
Running this script outputs the following .svg file:-

.. code-block:: console

    builtin_fit_example_msim.pyml_output/
        DV_CENTRAL_sim,DV_CENTRAL_wrt_TIME_SINCE_LAST_DOSE_comb_quant_sim_vpc/
            000000.svg

This graphic should look something like :numref:`fig_builtin_msim_vpc`:-
    
.. _fig_builtin_msim_vpc:

.. figure:: builtin_msim_vpc.*
    :width: 80%
    :align: center
    
    Visual Predictive Check for Complex |poppk| model.

Here the y axis is the concentration in the |central| compartment and the x axis is the time since the last dose (:term:`TSLD`). See :ref:`simple_msim_example1` for a more general description of the |vpc| plot shown in :numref:`fig_builtin_msim_vpc`.

In this case, the :term:`TSLD` values are grouped into 12 equally spaced bins along the x axis. Each bin needs a minimum number of data points, and there are only 250 data points in this toy example, hence the small number of bins.

The blue dots (original data) are shown mainly to give some visual corroboration of the quantiles (solid blue line). With only 12 bins, each bin is quite wide, so some data points with quite different TSLD values are grouped together. This grouping issue is most obvious at small values of TSLD. During this uptake period, the drug is mainly being absorbed into the |central| compartment and has not yet been cleared from the blood plasma (see :numref:`fig_builtin_msim_vpc`). Only more data, allowing more bins, can really fix this issue.
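
Equal-width binning with per-bin percentiles can be sketched in Python as below. This is a generic illustration, not |popy|'s VPC implementation. The following are all assumptions:

* the choice of 5th, 50th and 95th percentiles;
* the ``min_points`` threshold;
* the randomly generated stand-in data.

With 250 points in 12 bins there are only about 21 points per bin on average, so the outer percentiles rest on very few values.

.. code-block:: python

    import math
    import random
    import statistics

    def equal_width_bins(tsld, n_bins):
        """Index (0 .. n_bins - 1) of the equally spaced TSLD bin for each value."""
        lo, hi = min(tsld), max(tsld)
        width = (hi - lo) / n_bins
        # The largest value would land just past the last bin, so clamp it in.
        return [min(int((t - lo) / width), n_bins - 1) for t in tsld]

    def vpc_bin_summary(tsld, values, n_bins=12, min_points=5):
        """(5th, 50th, 95th) percentiles of values per bin, or None for sparse bins."""
        idx = equal_width_bins(tsld, n_bins)
        summary = []
        for b in range(n_bins):
            in_bin = [v for v, i in zip(values, idx) if i == b]
            if len(in_bin) < min_points:
                summary.append(None)
                continue
            cuts = statistics.quantiles(in_bin, n=20)   # cut points at 5%, 10%, ..., 95%
            summary.append((cuts[0], cuts[9], cuts[18]))
        return summary

    # Hypothetical stand-in data: 250 points, the same count as this example.
    rng = random.Random(2)
    tsld = [rng.uniform(0.0, 24.0) for _ in range(250)]
    conc = [8.0 * (math.exp(-0.1 * t) - math.exp(-1.0 * t)) + rng.gauss(0.0, 0.3) for t in tsld]

    counts = [0] * 12
    for i in equal_width_bins(tsld, 12):
        counts[i] += 1
    print(counts)                          # points per bin, about 21 on average
    print(vpc_bin_summary(tsld, conc)[0])  # percentiles in the first bin
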
    
.. _syntax_in_builtin_msim_script:
    
Syntax in the MSim Script
------------------------------------------

For each individual in the original data set, new synthetic data sets are created by sampling new |res| |rx| variables, and by sampling new measurement noise for all data rows. The synthetic populations vary because the |rx| for each individual are resampled here:-

.. code-block:: pyml

    LEVEL_PARAMS:
        INDIV:
            params: |
                r[KA, CL, V1, Q, V2] ~ mnorm([0,0,0,0,0], f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv])

Measurement noise is then added here:-

.. code-block:: pyml

    PREDICTIONS: |
        p[DV_CENTRAL_sim] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL_sim]**2
        c[DV_CENTRAL_sim] ~ norm(p[DV_CENTRAL_sim], var)

This procedure creates N new data sets, which can be compared with the original data set, where N is defined here:-

.. code-block:: pyml

    OUTPUT_OPTIONS: 
        n_pop_samples: 100
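
The whole procedure amounts to a nested loop. The Python sketch below is a hypothetical outline rather than |popy| code. Here ``sample_r`` and ``predict`` stand in for the LEVEL_PARAMS sampling and the model solution. The toy versions used here are deliberately much simpler than the two compartment model, and the value 100 passed in plays the role of n_pop_samples:

.. code-block:: python

    import math
    import random

    def simulate_populations(n_pop_samples, obs_times, sample_r, predict, anoise, pnoise, rng):
        """Return n_pop_samples synthetic data sets as lists of (indiv, time, value) rows.

        Each sample redraws every individual's r vector and every row's noise,
        using var = anoise**2 + pnoise**2 * pred**2 as in the PREDICTIONS section.
        """
        populations = []
        for _ in range(n_pop_samples):
            rows = []
            for indiv, times in obs_times.items():
                r = sample_r(rng)                     # new random effects for this individual
                for t in times:
                    pred = predict(t, r)
                    sd = math.sqrt(anoise ** 2 + pnoise ** 2 * pred ** 2)  # gauss() takes a std dev
                    rows.append((indiv, t, rng.gauss(pred, sd)))
            populations.append(rows)
        return populations

    # Hypothetical toy stand-ins, much simpler than the two compartment model.
    def toy_sample_r(rng):
        return [rng.gauss(0.0, math.sqrt(0.05))]

    def toy_predict(t, r):
        return 10.0 * math.exp(r[0]) * math.exp(-0.2 * t)

    obs_times = {i: [1.0, 2.0, 4.0, 8.0, 12.0] for i in range(50)}   # 50 individuals x 5 rows
    pops = simulate_populations(100, obs_times, toy_sample_r, toy_predict, 0.001, 0.1, random.Random(3))
    print(len(pops), len(pops[0]))   # 100 250

In this sketch, increasing n_pop_samples only adds more passes of the outer loop, so the run time grows roughly linearly with it.
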

You can increase the number of samples to make the |vpc| more representative of your model. The more complex the |pkpd| model, the more synthetic data samples you will need.

As this model has more parameters compared to the :ref:`simple_fit_example`, it may be worth increasing the 'n_pop_samples' and re-running the :ref:`msim_script`. This is left as an exercise for the reader.
   
 

