.. _simple_fit_example:
 
Fitting a Simple |poppk| Model using |popy|
############################################

Here we are going to work with the simplest possible single compartment model and bolus dose, see :numref:`fig_simple_popkp_diagram_fit`:-

.. _fig_simple_popkp_diagram_fit:

.. figure:: /_autogen/quick_start/fit_example1/compartment_diagram.*
    :width: 50%
    :align: center
    
    One compartment model with bolus dosing for :ref:`fit_script`. Here |camt| is the amount of the bolus dose. |mke| is the elimination rate and |scentral| is the current amount in the central compartment. See :ref:`variable_types` for a summary of the prefixes used in |popy|.
    
In this example, we will walk through fitting the one compartment model shown in :numref:`fig_simple_popkp_diagram_fit` to a pre-existing |data_file| using |popy|, explaining the commands, input files and output files at each step.
    
.. note::
    
   See the :ref:`sum_link_fit_example1_fit` obtained by the |popy| developers for this example, including input script and input data file.
    
.. _run_fit_script_simple:

Run the Fit Script
====================

To fit a model in |popy|, you need a model file ending in .pyml and a |data_file| in comma-separated value format (.csv). See the files in your |popy| 'examples' subdirectory:-

.. code-block:: console

    c:\PoPy\examples\fit_example1.pyml
                     fit_example1_data.csv
                     
:ref:`open_a_popy_command_prompt` to set up the |popy| environment in this folder:-

.. code-block:: console

    c:\PoPy\examples\

With the |popy| environment enabled, open the script using:-

.. code-block:: console

    $ popy_edit fit_example1.pyml

then call :ref:`popy_run` on the :ref:`fit_script` from the command line:-

.. code-block:: console

    $ popy_run fit_example1.pyml

While the script runs, informative text describing the progress of the fitting process is printed in the command window. When the run has completed, you can view the output using:-

.. code-block:: console

    $ popy_view fit_example1.pyml.html

Note the extra '.html' extension in the above command. The command :ref:`popy_view` opens a local .html file in your web browser to summarise the result of the fitting.

You can compare your local html output with the pre-computed documentation output, see :ref:`sum_link_fit_example1_fit`. You should expect some minor numerical differences when comparing results with the documentation. If you are concerned by any differences in results relative to the official |popy| documentation, see :ref:`validate_popy`.


.. _summary_of_fit_results:

Summary of Fit Results
=============================

The results of running the fitting script are |popy|'s best estimate for the presumed unknown |fes| variables:-

.. literalinclude:: /_autogen/quick_start/fit_example1/final_fx_params.txt
    :language: pyml
    
In |popy| |fes| are denoted using the |fx| notation, where 'X' is the name of the |fe|.
        
The purpose of a :ref:`fit_script` is to optimise the |fes| and |res| by maximizing the likelihood of observing the input data given the model structure defined in 'fit_example1.pyml'. In this case, the input data is the |cdvcen| column in 'fit_example1_data.csv', which contains 20 individuals, each with 5 observations at random time points following a bolus dose event.

You can visually compare the |pk| curves using the initial |fx| and fitted |fx| outputs with the input data in :numref:`table_pred_vs_target_plots_simple`.

.. _table_pred_vs_target_plots_simple:

.. list-table:: Model predictions vs original data points for first three individuals 
    
    * - .. thumbnail:: /_autogen/quick_start/fit_example1/images/fit_dense/000000.*
      - .. thumbnail:: /_autogen/quick_start/fit_example1/images/fit_dense/000001.*
      - .. thumbnail:: /_autogen/quick_start/fit_example1/images/fit_dense/000002.*

In the graphs above the blue dots represent the observed data points. The solid blue line represents the model predictions based on the final |fx| parameters and fitted |rx| values for each individual. The dashed blue lines represent the model predictions based on initial |fx| parameters and |rx| values set to zero.

Note that in this model a bolus dose is received by all individuals at time 1.0. The amount of drug then follows a first-order exponential decay curve as the drug is eliminated from the body over time.

The graphs show how |popy| has optimized the |fx| and |rx| parameters to maximize the likelihood of the data under this model.


.. _more_detailed_explanation_of_fit_script:

Syntax of Fit Script
=======================================

This section looks more closely at how the model file works. It explains the fitting script notation used to represent the components of a mathematical model, such as fixed and random effects and the equation relating the parameters to the observed data.

The data file included in this example is simulated from a first order |pk| model of the same form described in 'fit_example1.pyml'. The population structure is defined in the |effects| section as follows:-

.. literalinclude:: 
    /_autogen/quick_start/fit_example1/fit_sections/EFFECTS.pyml
    :language: pyml

There are three population |fes| (|fx| parameters) to be estimated and one |rx|, which can take a different value for each individual, sampled from the population distribution. There are 20 individuals in the data set, therefore this model is attempting to estimate 23 parameters in total (i.e. 3 f[X] + 20 r[X]). The |fes| are defined as follows:-

.. code-block:: pyml

    f[X] ~ unif(min_x, max_x) start_x
   
Here a uniform distribution is used to define a range of allowed values [min_x, max_x], as a kind of prior. Currently in |popy|, |fx| are restricted to having a |unif_dist| prior.

Note that it is quite common to require |pkpd| model parameters to be non-negative, in order to make physical sense. The 'start_x' value is the initial value for |fx| used in the optimisation, which is usually an initial guess by the modeller. The |rx| are here sampled from a zero-mean, univariate normal distribution with a variance |fke_isv| that is optimized for the population:-

.. code-block:: pyml

    r[KE] ~ norm(0, f[KE_isv])

Each individual has a unique set of |rx| values, because the |res| are defined at the |id| level. This has the effect of creating a single |rke| sample for each identity in the data file. For more info on the syntax above see |effects|.
    
The mapping from |fx| and |rx| to the |mx| for each individual is defined in the |model_params| section:-
    
.. literalinclude:: 
    /_autogen/quick_start/fit_example1/fit_sections/MODEL_PARAMS.pyml
    :language: pyml

This models the |mke| elimination rate for each individual as log-normally distributed, with a median value of |fke| and a variance parametrised by |fke_isv|. There is a shared proportional noise parameter |fpnoise| for all individuals. For more info on the syntax above see |model_params|.
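
As a rough illustration of this mapping, the Python sketch below (not |popy| code) samples one random effect per individual and converts it to an individual elimination rate. It assumes the usual log-normal form ``m[KE] = f[KE] * exp(r[KE])``; the function name and the numeric values are hypothetical and are not taken from 'fit_example1.pyml'.

.. code-block:: python

    import math
    import random


    def sample_individual_ke(f_ke, f_ke_isv, n_individuals, seed=0):
        """Sample one m[KE] per individual, assuming m[KE] = f[KE] * exp(r[KE])."""
        rng = random.Random(seed)
        sd = math.sqrt(f_ke_isv)  # norm(0, f[KE_isv]) is parametrised by a variance
        m_ke = []
        for _ in range(n_individuals):
            r_ke = rng.gauss(0.0, sd)  # r[KE] ~ norm(0, f[KE_isv])
            m_ke.append(f_ke * math.exp(r_ke))
        return m_ke


    # Hypothetical values chosen for illustration only.
    ke_values = sample_individual_ke(f_ke=0.1, f_ke_isv=0.05, n_individuals=20)
    print(min(ke_values), max(ke_values))

Because ``exp(r)`` is always positive, every individual elimination rate stays positive, and a random effect of zero recovers the median value.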

The |derivatives| section defines how the parameters and dosing history relate to the observed data.  In this case, we have simple bolus dosing and first-order elimination:-
    
.. literalinclude:: 
    /_autogen/quick_start/fit_example1/fit_sections/DERIVATIVES.pyml
    :language: pyml
    
The amount of the bolus dose is |camt|, which is taken from the data file for each individual. In this example it is always 100 units and occurs at time point 1.0 for every individual. Elimination is first order with respect to |scentral|, the amount in the single compartment, with rate constant |mke|. For more info on the syntax above see |derivatives|.
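
To make the first-order elimination concrete, the Python sketch below (not |popy| code) integrates ``d s[CENTRAL]/dt = -m[KE] * s[CENTRAL]`` with a simple Euler scheme and compares the result with the closed-form exponential decay. The ODE form is an assumption based on the description above; the function names, the step size and the value of ``m_ke`` are hypothetical.

.. code-block:: python

    import math


    def central_amount_exact(t, dose_amt=100.0, dose_time=1.0, m_ke=0.1):
        """Closed-form amount in the central compartment at time t."""
        if t < dose_time:
            return 0.0  # nothing in the compartment before the bolus
        return dose_amt * math.exp(-m_ke * (t - dose_time))


    def central_amount_euler(t_end, dose_amt=100.0, dose_time=1.0, m_ke=0.1, dt=0.001):
        """Integrate d s/dt = -m_ke * s forward from the bolus time with Euler steps."""
        if t_end < dose_time:
            return 0.0
        s_central = dose_amt  # the bolus adds the whole dose instantaneously
        n_steps = int(round((t_end - dose_time) / dt))
        for _ in range(n_steps):
            s_central += dt * (-m_ke * s_central)
        return s_central


    print(central_amount_exact(11.0), central_amount_euler(11.0))

The two values should agree closely, and the agreement improves as ``dt`` is reduced.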
    
For each row of the data set, |cx| values are compared with |px| variables predicted by the model, as defined below:-
    
.. literalinclude:: 
    /_autogen/quick_start/fit_example1/fit_sections/PREDICTIONS.pyml
    :language: pyml

This section shows that we are comparing model prediction :pyml:`p[CEN]` with |cdvcen| using a proportional noise model, where the standard deviation of the proportional noise is :pyml:`m[PNOISE]`. Here :pyml:`m[ANOISE]` is fixed to a small positive constant, in order to avoid zero variances when :pyml:`p[CEN]` is close to zero. For more detailed information on the syntax above see |predictions|.
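
The noise model can be written out numerically as follows. This Python sketch (not |popy| code) assumes the variance takes the combined form ``(p[CEN] * m[PNOISE])**2 + m[ANOISE]**2``; the function names and the numeric values are hypothetical.

.. code-block:: python

    import math


    def residual_variance(p_cen, m_pnoise, m_anoise):
        """Variance of an observation around the prediction p[CEN]."""
        return (p_cen * m_pnoise) ** 2 + m_anoise ** 2


    def log_likelihood(dv_central, p_cen, m_pnoise, m_anoise):
        """Log density of one observation under norm(p[CEN], variance)."""
        var = residual_variance(p_cen, m_pnoise, m_anoise)
        return -0.5 * (math.log(2.0 * math.pi * var) + (dv_central - p_cen) ** 2 / var)


    # Hypothetical values: the same absolute error of 5 units is more surprising
    # at a low prediction than at a high one, because the noise is proportional.
    print(log_likelihood(55.0, 50.0, m_pnoise=0.1, m_anoise=0.001))
    print(log_likelihood(10.0, 5.0, m_pnoise=0.1, m_anoise=0.001))

Note that the variance stays strictly positive even when the prediction is zero, which is the role of the small additive term.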
    
|popy| finds the best combination of the estimated parameters:-

* |fke| - the median elimination rate - which roughly makes sure that the |pk| curves are of the correct shape to fit the data.
* |fke_isv| - the magnitude of the variability in |mke| between individuals
* |fpnoise| - the proportional noise not explained by the model in the |cdvcen| data

.. comment from James
    I actually think this paragraph may need a bit of a re-write, as the real problem is that it is a double integral, not so much that the model is not identifiable.

The unexplained noise |fpnoise| and between-subject variance |fke_isv| compete with each other to explain the data. For example, do measurements vary from the average model prediction because the measurements lack precision (or because of some unknown mechanism), or because subjects just vary a lot in their physiology? This dual explanation for noisy data makes population mixed-effects models difficult to fit. However, the population as a whole contains enough data to solve this problem using maximum likelihood [Sheiner1980]_.

In |popy| the likelihood is optimised iteratively, with the |fx| and |rx| being updated at each iteration. In this case, the likelihood (or objective function) progressed as shown in :numref:`obj_vs_time_fit_example1`:-

.. _obj_vs_time_fit_example1:

.. csv-table:: Objective function at each iteration for simple |poppk| example
    :file: ../../_autogen/quick_start/fit_example1/fit_example1.pyml_output/fit/OBJV_vs_time.csv

Note that the objective function is defined as -2 * the log likelihood (ignoring fixed proportionality constants in the likelihood, which only shift the objective by a constant). Therefore, the lower the value of the objective function, the better the estimated parameters fit the observed data. By default, |popy| stops the fitting algorithm once the objective function has stopped decreasing.
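
As an illustration of why a lower objective value indicates a better fit, the Python sketch below (not |popy|'s algorithm) evaluates -2 * log likelihood for a handful of noise-free synthetic observations under two candidate elimination rates. It ignores the random effects entirely, which the real objective function does not, and all names and numeric values are hypothetical.

.. code-block:: python

    import math


    def predict(t, m_ke, dose_amt=100.0, dose_time=1.0):
        """Predicted amount after a bolus with first-order elimination."""
        if t < dose_time:
            return 0.0
        return dose_amt * math.exp(-m_ke * (t - dose_time))


    def objective(times, observations, m_ke, m_pnoise=0.1, m_anoise=0.001):
        """Return -2 * sum of normal log densities, keeping the constant terms."""
        total = 0.0
        for t, dv in zip(times, observations):
            pred = predict(t, m_ke)
            var = (pred * m_pnoise) ** 2 + m_anoise ** 2
            total += math.log(2.0 * math.pi * var) + (dv - pred) ** 2 / var
        return total


    times = [2.0, 4.0, 8.0, 16.0, 32.0]
    observations = [predict(t, m_ke=0.1) for t in times]  # synthetic, noise free

    print(objective(times, observations, m_ke=0.1))  # value used to generate the data
    print(objective(times, observations, m_ke=0.3))  # poorer candidate

The candidate that generated the observations gives the smaller objective value, which is the behaviour an optimiser exploits when it searches for the best parameters.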

.. _simple_msim_example1:

Visual Predictive Check for Simple |poppk| Model
=================================================

..  comment
    The :ref:`previous example<simple_fit_example>` showed fitting a |pkpd| model to a data set. 

Given the estimated parameter values, |ie| the optimised |fx| variables, we can check whether the model and its estimated parameters are a good description of the observed data using a :term:`visual predictive check` (VPC).

..
    it is possible to test that the fitted values are sensible by generating what is known as a :term:`visual predictive check`, often abbreviated to 'VPC'.
    
Running the MSim Script
--------------------------------------

When you run a |popy| :ref:`fit_script`, it automatically generates several other scripts, including an 'msim' simulation script.  For the simple model which we have already fitted, this script can be found in:-

.. comment 
    It's presumed that you have already run the 'fit_example1.pyml' script from :ref:`simple_fit_example`. If you have then you should have access to the following output folder:-

.. code-block:: console
 
    fit_example1.pyml_output/
        msim/
            fit_example1_msim.pyml

To view or edit the :ref:`MSim script<msim_script>`, which runs the simulation, navigate to:-
    
.. code-block:: console

    fit_example1.pyml_output/
        msim/

:ref:`open_a_popy_command_prompt` in the 'msim' subfolder, then run:-

.. code-block:: console

    $ popy_edit fit_example1_msim.pyml
    
This opens the :ref:`msim_script` for viewing or editing. You can then run the script with the following command:-

.. code-block:: console

    $ popy_run fit_example1_msim.pyml
    
Running the 'fit_example1_msim.pyml' script creates the following .svg file in the output directory:-

.. code-block:: console

    fit_example1_msim.pyml_output/
        DV_CENTRAL_sim,DV_CENTRAL_wrt_TIME_SINCE_LAST_DOSE_comb_quant_sim_vpc/
            000000.svg

This graphic should look something like :numref:`fig_simple_msim_vpc`:-

.. _fig_simple_msim_vpc:

.. figure:: simple_msim_vpc.*
    :width: 80%
    :align: center
    
    Visual Predictive Check for Simple |poppk| model.
    
In the VPC graph, the y axis is the amount in the single compartment and the x axis is the time since the last dose (:term:`TSLD`). It's common to use TSLD in a plot that combines all individuals, as different individuals may have been administered doses at different times in a real life analysis, so absolute times are not comparable.
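
A minimal Python sketch of how time since last dose could be derived from absolute times is given below. It is not |popy| code; the function name is hypothetical and the second individual's dose time is invented purely for illustration.

.. code-block:: python

    def time_since_last_dose(obs_time, dose_times):
        """Return obs_time minus the most recent dose time, or None before any dose."""
        previous = [d for d in dose_times if d <= obs_time]
        if not previous:
            return None  # observation taken before the first dose
        return obs_time - max(previous)


    # In this example every individual is dosed at time 1.0.
    print(time_since_last_dose(3.5, [1.0]))
    # Hypothetical individual dosed at a different absolute time.
    print(time_since_last_dose(12.5, [10.0]))

Both observations map to the same TSLD value, so they can share an x axis even though their absolute times differ.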

The TSLD values are grouped into 5 bins along the x axis. Note you need a minimum number of data points in each bin and there are only 100 data points in this simple example, hence the small number of bins.

In :numref:`fig_simple_msim_vpc`, the blue dots represent the original data points. In each bin the 5%, 50% and 95% quantiles are plotted for the original data set (see solid blue lines). Also in each bin, the same quantiles are computed for each of the 100 synthetic data samples. The 90% confidence interval for each of these quantiles, calculated across the 100 simulated data sets, is depicted by the shaded blue region.

The key result from the VPC graph is that the solid blue lines (|ie| quantiles from the original data set) mostly lie within the shaded blue regions (quantile ranges from the synthetic data sets). Since this is the case here, the model performs adequately on the :term:`VPC`. Note that each solid blue line should be within its shaded region approximately 90% of the time, because the synthetic quantile ranges are constructed as a 90% confidence interval. You can change the number of simulated data sets in the :ref:`msim_script`. The quantiles of interest and the confidence intervals for those quantiles are specified in the :ref:`vpc_script`.

The blue dots (original data) are mainly shown to give some visual corroboration of the quantiles (solid blue lines). Because this graph has only 5 time axis bins, each bin is quite wide, so the data points on the left side of each time bin tend to have higher concentrations. This within-bin sample distortion is quite common. Only more bins, which in turn require more data, can address this issue.
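
The binned quantile comparison described above can be sketched in Python as follows. This is a simplified illustration, not |popy|'s implementation: the equal-width binning, the linear-interpolation quantile rule and all function names are assumptions.

.. code-block:: python

    def quantile(values, q):
        """Quantile by linear interpolation between sorted values (0 <= q <= 1)."""
        ordered = sorted(values)
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


    def bin_index(tsld, t_min, t_max, n_bins):
        """Assign a TSLD value to one of n_bins equal-width bins."""
        frac = (tsld - t_min) / (t_max - t_min)
        return min(int(frac * n_bins), n_bins - 1)


    def binned_quantiles(tslds, values, t_min, t_max, n_bins, qs=(0.05, 0.5, 0.95)):
        """Return, per bin, the requested quantiles of the values in that bin."""
        bins = [[] for _ in range(n_bins)]
        for tsld, value in zip(tslds, values):
            bins[bin_index(tsld, t_min, t_max, n_bins)].append(value)
        return [[quantile(b, q) for q in qs] if b else None for b in bins]


    def vpc_bands(tslds, simulated_sets, t_min, t_max, n_bins, level=0.90):
        """Per bin and quantile, the central interval across simulated data sets."""
        per_set = [binned_quantiles(tslds, sim, t_min, t_max, n_bins)
                   for sim in simulated_sets]
        tail = (1.0 - level) / 2.0
        bands = []
        for b in range(n_bins):
            if per_set[0][b] is None:  # empty bin: nothing to summarise
                bands.append(None)
                continue
            band = []
            for k in range(len(per_set[0][b])):
                across_sets = [s[b][k] for s in per_set]
                band.append((quantile(across_sets, tail),
                             quantile(across_sets, 1.0 - tail)))
            bands.append(band)
        return bands

In this sketch, ``binned_quantiles`` applied to the observed data plays the role of the solid lines, and ``vpc_bands`` applied to the simulated data sets plays the role of the shaded regions.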
    
    
Syntax of MSim Script
-----------------------------------------

The :ref:`msim_script` processes these three elements:-

* A data set 
* The model - as automatically defined in the :ref:`msim_script` file
* The estimated model parameters

New synthetic data sets are created by sampling new |res| (|rx| variables) for each individual in the original data set and new measurement noise for every data row. The synthetic populations therefore vary due to sampling the |rx| for each individual here:-

.. code-block:: pyml

    EFFECTS:
        ID: |
            r[KE] ~ norm(0, f[KE_isv])

And adding measurement noise here:-

.. code-block:: pyml

    PREDICTIONS: |
        p[CEN_sim] = s[CENTRAL]
        var = (p[CEN_sim]*m[PNOISE])**2 + m[ANOISE]**2
        c[DV_CENTRAL_sim] ~ norm(p[CEN_sim], var)

Note the simulated data c[DV_CENTRAL_sim] has a slightly different name from the original data set field |cdvcen|, in order to avoid name clashes when constructing graphs.

The **~** notation in the |predictions| section of a |popy| script has two slightly different interpretations in fitting versus simulation scripts, in terms of how the operator compares the left hand side (lhs) and right hand side (rhs) of the expression:-

1. In simulation scripts **~** means **sample** the lhs from the distribution on the rhs
2. In fitting scripts **~** means evaluate the **likelihood** of the rhs given the lhs

In a :ref:`msim_script` the former sampling definition is used. In a :ref:`fit_script` the latter likelihood definition is employed. 
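
The two readings of **~** can be contrasted with a small Python sketch (not |popy| code; the function names and numeric values are hypothetical). Note that ``norm`` in the snippets above takes a variance as its second argument, so the sketch converts it to a standard deviation before sampling.

.. code-block:: python

    import math
    import random


    def tilde_sample(mean, var, rng):
        """Simulation reading: draw the lhs from norm(mean, var)."""
        return rng.gauss(mean, math.sqrt(var))


    def tilde_log_likelihood(lhs, mean, var):
        """Fitting reading: log density of the observed lhs under norm(mean, var)."""
        return -0.5 * (math.log(2.0 * math.pi * var) + (lhs - mean) ** 2 / var)


    rng = random.Random(0)
    simulated = tilde_sample(mean=50.0, var=25.0, rng=rng)  # as in an MSim script
    score = tilde_log_likelihood(simulated, mean=50.0, var=25.0)  # as in a fit script
    print(simulated, score)

In the first function the left hand side is an output of the expression; in the second it is an input taken from the data file.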

This procedure creates a set of N new data sets, which can be compared with the original data set. In this case N=100 is defined in the |output_options| section:-

.. code-block:: pyml

    OUTPUT_OPTIONS: 
        n_pop_samples: 100

You can increase the number of samples, in order to estimate the percentiles, and their confidence intervals, more accurately. If the |pkpd| model contains more parameters or the |data_file| is more structured, you probably need 500-1000 population samples. 

Conceptually, if the model is sensible and the fitted |fx| parameters are well estimated then the artificial data sets generated by sampling the random variables should generate |pkpd| curves that resemble the observed data |pkpd| curves.

If your VPC curves do |not| look like the original data, it may be possible to improve upon your model. The pattern of differences between your VPC predictions and the original data set may give you some clues as to how to improve your model.

Note that the VPC says nothing about your model's ability to **generalise**; it only compares the model with the original data. For example, if you want to predict the response to much higher doses than those present in your original data set, the VPC provides no guarantee that predictions will be accurate.

.. only:: browser

    .. _next_steps_simple_fit:

    Next Steps
    ============

    You can see more complicated :ref:`examples_index`. Another :ref:`fit_script` example walkthrough is :ref:`builtin_fit_example`.

    Alternatively read the :ref:`input_data_format` description or see examples of using |popy| to generate synthetic data from a single script in :ref:`simple_tut_example`.
