.. _builtin_gen_example:

Generate a Two Compartment |poppk| Data Set
#############################################

The :ref:`builtin_fit_example` section showed fitting a |pkpd| model to a pre-existing data set. However, in |popy| it is also possible to use a :ref:`gen_script` to generate a data set from a model file instead, |ie| the opposite of a :ref:`fit_script`.

In this example we will demonstrate how to generate new data from a two compartment model with absorption and bolus dosing, see :numref:`fig_two_comp_depot_diagram_gen`:-

.. figure:: /_autogen/quick_start/builtin_gen_example/compartment_diagram.*
    :name: fig_two_comp_depot_diagram_gen
    :width: 80%
    :align: center
    
    Two compartment model with depot dosing for :ref:`gen_script`. This is the same model as :ref:`fig_two_comp_depot_diagram_fit`.
       
The ability to generate synthetic data from a model is especially useful if you wish to demonstrate a model, but do not have access to a real data set. Real data is expensive to obtain and, even if it exists, may have issues regarding completeness, accuracy or confidentiality. A further disadvantage of real data is that the true underlying model is never known.
 
.. note:: See the :ref:`sum_link_builtin_gen_example_gen` obtained by the |popy| developers for this example, including input script and output data file.
 
.. _running_builtin_gen_script:
 
Running the Gen Script
======================

This generating example makes use of a single file:-

.. code-block:: console

    c:\PoPy\examples\builtin_gen_example.pyml
                            
:ref:`open_a_popy_command_prompt` to set up the |popy| environment in this folder:-

.. code-block:: console

    c:\PoPy\examples\

With the |popy| environment enabled, open the :ref:`gen_script` in an editor as follows:-

.. code-block:: console

    $ popy_edit builtin_gen_example.pyml

then execute the script using :ref:`popy_run` from the command line:-

.. code-block:: console

    $ popy_run builtin_gen_example.pyml

When the gen script has completed, you can view the output of the generating process using :ref:`popy_view`, by typing the following command:-

.. code-block:: console

    $ popy_view builtin_gen_example.pyml.html

Note the extra '.html' extension in the above command. This command opens a local .html file in your web browser to summarise the result of the generating process.

You can compare your local html output with the pre-computed documentation output, see :ref:`sum_link_builtin_gen_example_gen`. You should expect some minor numerical differences when comparing results with the documentation.

.. _quick_summary_of_builtin_gen_results:

Summary of Gen Results
=============================

The main inputs of the generating script are the |fes| |fx| variables as defined in the |level_params| of the :ref:`gen_script`. In this case the |fx| are all constant and summarised here:-

.. literalinclude:: /_autogen/quick_start/builtin_gen_example/gen_fx_params.txt
    :language: pyml
    
If the |fx| are random variables, which in |popy| are defined using a **~**, then the :ref:`gen_script` will sample each |fx| variable once. Sampling the |fx|, however, makes more sense if you are creating multiple synthetic data sets, see :ref:`mgen_script`.
    
Given the global |fx| variables, the :ref:`gen_script` then creates the requested number of individuals (in this case 50) and samples a set of time points (in this case 5) and dosing times (in this case a single bolus dose) for each individual. This step defines the number of rows in the synthetic data set. 

.. 
    Note it's possible to use inter occasion variance to specify more complex t :term:`IOV`

The next stage is to sample any |cx| variables specified for each individual. In this example the only |cx| variables defined in the :ref:`gen_script` are the |cid| field and the |camt| value (which in this case is constant for all individuals). The |ctime| and |ctype| fields are created by |popy| automatically. We now have most of a valid |popy| data set, but no observation values are defined yet.

To generate observations, the |rx| variables for each individual are sampled. These, along with the dose times and the observation time period, are enough to simulate smooth |pkpd| curves from the |model_params|, |derivatives| and |predictions| defined in the script.

You can visualise the model prediction outputs (|px| variables) by examining the plots for the first three individuals in the data set.

.. _table_synthetic_gen_plots:

.. list-table:: Synthetic data plots for first three individuals
    
    * - .. thumbnail:: /_autogen/quick_start/builtin_gen_example/images/gen_dense/000000.*
      - .. thumbnail:: /_autogen/quick_start/builtin_gen_example/images/gen_dense/000001.*
      - .. thumbnail:: /_autogen/quick_start/builtin_gen_example/images/gen_dense/000002.* 

In :numref:`table_synthetic_gen_plots` above, the dotted blue line represents the model predictions given the |fx| parameters and sampled |rx| values for each individual. No noise is added to this curve and it is plotted at regular unit time steps, so it is smooth.

The solid blue dots represent the observations with noise added at randomly sampled time points for each individual. The solid blue dots are the values that end up in the synthetic data file under the |cdvcen| field.

Note in this model a bolus dose is received by all individuals at time 2.0. After the dose, the concentration of the drug in the |central| compartment increases as drug is absorbed from the |depot| compartment. Then the drug concentration falls as the drug is metabolised. The decay curve is first order with an inflection point due to the |peripheral| compartment. 

The doses are the same for all individuals, but the smooth curves generated by the model vary due to each individual having a different |rx| vector.
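
The rise and fall of the |central| concentration described above can be sketched numerically. The following Python fragment is a minimal illustration only, not the |derivatives| section of the :ref:`gen_script`. It assumes the standard textbook mass-balance form of a two compartment model with a depot, uses forward Euler integration purely for brevity, and every parameter value passed in (``ka``, ``cl``, ``v1``, ``q``, ``v2`` and the dose amount) is hypothetical:-

.. code-block:: python

    def simulate_two_comp_depot(ka, cl, v1, q, v2, dose_amt, dose_time,
                                t_end, dt=0.01):
        """Forward Euler sketch of a depot -> central <-> peripheral model.

        Returns a list of (time, central concentration) pairs.
        """
        depot, central, peripheral = 0.0, 0.0, 0.0  # drug amounts
        n_steps = int(round(t_end / dt))
        dose_step = int(round(dose_time / dt))
        curve = []
        for step in range(n_steps + 1):
            if step == dose_step:
                depot += dose_amt  # the bolus dose enters the depot
            curve.append((step * dt, central / v1))
            # Mass balance for each compartment (assumed textbook form).
            d_depot = -ka * depot
            d_central = (ka * depot - (cl / v1) * central
                         - (q / v1) * central + (q / v2) * peripheral)
            d_peripheral = (q / v1) * central - (q / v2) * peripheral
            depot += dt * d_depot
            central += dt * d_central
            peripheral += dt * d_peripheral
        return curve

    # Hypothetical parameter values, chosen only to give a plausible curve.
    curve = simulate_two_comp_depot(ka=0.5, cl=2.0, v1=10.0, q=1.0, v2=20.0,
                                    dose_amt=100.0, dose_time=2.0, t_end=50.0)
    peak_time, peak_conc = max(curve, key=lambda point: point[1])

In this sketch the concentration stays at zero until the dose at time 2.0, rises while drug is absorbed from the depot, and then decays. Changing the parameter values changes the shape of the curve, which mirrors how a different |rx| vector gives each individual a different curve.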


.. _more_detailed_explanation_of_builtin_gen_script:

Syntax in the Gen Script
=======================================

The |level_params| section defines the population structure that the :ref:`gen_script` will create as follows:-

.. literalinclude:: 
    /_autogen/quick_start/builtin_gen_example/gen_sections/LEVEL_PARAMS.pyml
    :language: pyml

This |level_params| structure is similar to the :ref:`syntax_of_builtin_fit_script` with some additional lines to define new individuals, doses and observation times.

The number of individuals is defined by the following line:-

.. code-block:: pyml

    c[ID] = sequential(50)
    
This specifies a sequence where the first individual is '1', the second is '2' |etc| up to '50'.

This line specifies a single dose record for each individual at time 2.0:-
 
.. code-block:: pyml

    t[DOSE] = 2.0
    
This line requests a sample of 5 time points uniformly distributed in the period [1.0, 50.0]:-

.. code-block:: pyml

    t[OBS] ~ unif(1.0, 50.0; 5)
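
For readers more familiar with Python, the population structure defined by these lines can be imitated with the standard ``random`` module. This is only an analogy and says nothing about how |popy| implements sampling internally. The function name ``sample_obs_times`` is hypothetical, and sorting the sampled times is an assumption made here for readability:-

.. code-block:: python

    import random

    def sample_obs_times(rng, low=1.0, high=50.0, n=5):
        """Draw n observation times uniformly from [low, high]."""
        # Sorting is an assumption of this sketch, made for readability.
        return sorted(rng.uniform(low, high) for _ in range(n))

    rng = random.Random(12345)   # fixed seed so the sketch is repeatable
    ids = list(range(1, 51))     # plays the role of sequential(50)
    dose_time = 2.0              # plays the role of t[DOSE] = 2.0
    obs_times = {indiv: sample_obs_times(rng) for indiv in ids}

Each of the 50 individuals receives its own set of 5 observation times, while the dose time is shared by everyone.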
   
Here the random effects are sampled from a zero-mean, multivariate normal distribution, as follows:-

.. code-block:: pyml

    r[KA, CL, V1, Q, V2] ~ mnorm([0,0,0,0,0], f[KA_isv,CL_isv,V1_isv,Q_isv,V2_isv])

Note that the second parameter of mnorm, the square covariance matrix |fx_isv_mat_builtin|, is a global parameter shared by all individuals. Each individual has a unique |rx_vec_builtin| vector, because the random effects are defined at the INDIV level. For more information on the syntax above see |level_params|.
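
To make the idea of one shared covariance matrix and one random effect vector per individual concrete, here is a minimal Python sketch using only the standard library. It draws from a zero-mean multivariate normal via a Cholesky factor, which is a common textbook approach and not necessarily the one |popy| uses. The covariance values in ``isv`` are hypothetical, as are the helper names ``cholesky`` and ``sample_mnorm``:-

.. code-block:: python

    import math
    import random

    def cholesky(matrix):
        """Lower triangular factor of a symmetric positive definite matrix."""
        n = len(matrix)
        lower = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                partial = sum(lower[i][k] * lower[j][k] for k in range(j))
                if i == j:
                    lower[i][j] = math.sqrt(matrix[i][i] - partial)
                else:
                    lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
        return lower

    def sample_mnorm(rng, mean, cov):
        """One draw from a multivariate normal with the given mean and cov."""
        lower = cholesky(cov)
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
                for i, m in enumerate(mean)]

    names = ["KA", "CL", "V1", "Q", "V2"]
    # Hypothetical 5x5 covariance matrix, shared by all individuals.
    isv = [
        [0.10, 0.00, 0.00, 0.00, 0.00],
        [0.00, 0.10, 0.02, 0.00, 0.00],
        [0.00, 0.02, 0.10, 0.00, 0.00],
        [0.00, 0.00, 0.00, 0.10, 0.00],
        [0.00, 0.00, 0.00, 0.00, 0.10],
    ]
    rng = random.Random(12345)
    # One random effect vector per individual, three individuals shown.
    effects = {indiv: dict(zip(names, sample_mnorm(rng, [0.0] * 5, isv)))
               for indiv in (1, 2, 3)}

The matrix ``isv`` is created once and reused for every draw, whereas ``effects`` holds a different vector for each individual.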
    
The |model_params| and |derivatives| sections of this :ref:`gen_script` are the same as the :ref:`syntax_of_builtin_fit_script`, so are not discussed here.
    
The |predictions| section in the :ref:`gen_script` defines how the dependent |cx| variables are sampled given the |px| model predictions:-

.. literalinclude:: 
    /_autogen/quick_start/builtin_gen_example/gen_sections/PREDICTIONS.pyml
    :language: pyml
    
|popy| samples |cdvcen| for each row of the data set, to create a synthetic noisy measurement at each time point for each individual.
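
The general idea of turning noise-free predictions into noisy observations can be sketched in Python as follows. The noise model here is an assumption chosen for illustration, namely a normal distribution whose variance combines an additive and a proportional term. The noise model actually used is the one defined in the |predictions| section shown above. The names ``sample_observations``, ``add_sd`` and ``prop_sd``, and all numeric values, are hypothetical:-

.. code-block:: python

    import math
    import random

    def sample_observations(rng, predictions, add_sd, prop_sd):
        """Add normally distributed noise to each noise-free prediction."""
        noisy = []
        for pred in predictions:
            # Assumed noise model: additive plus proportional variance.
            sd = math.sqrt(add_sd ** 2 + (prop_sd * pred) ** 2)
            noisy.append(rng.gauss(pred, sd))
        return noisy

    rng = random.Random(12345)
    # Hypothetical noise-free predictions at five observation times.
    predictions = [0.0, 1.8, 2.4, 1.1, 0.5]
    observations = sample_observations(rng, predictions,
                                       add_sd=0.05, prop_sd=0.1)

One noisy value is produced per prediction, which corresponds to one observation row per time point in the synthetic data set.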

.. _structure_of_data_file:

Structure of output synthetic data file
=================================================

The |cx| variables are saved to disk. For an example data file see :ref:`sum_link_builtin_gen_example_gen`. The first few lines of 'synthetic_data.csv' are shown in :numref:`table_synthetic_csv_data` below:-

.. _table_synthetic_csv_data:

.. csv-table:: First 10 rows of 'synthetic_data.csv' file
    :header-rows: 1
    :file: ../../_autogen/quick_start/builtin_gen_example/builtin_gen_example.pyml_output/gen/synthetic_data_trunc.csv


This shows some of the typical properties of |popy|'s :ref:`input_data_format`, where the main fields are:-

* TYPE - Specifies either a dose or an observation row.
* ID - The identifier for a given subject.
* TIME - The time stamp of the row event.
* AMT - The size of the dose at a given time.
* DV_CENTRAL - The synthetic observed values.
* DV_CENTRAL_FLAG - Indicates valid measurement rows, 1=valid 0=ignore.
* orig_data_row - The data row number within an individual subject.
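
If you want to inspect a file with these fields outside of |popy|, a few lines of Python are enough. The sketch below uses the standard ``csv`` module and keeps only rows whose DV_CENTRAL_FLAG is 1. The rows in ``SAMPLE`` are invented placeholders and are not output from |popy|. In particular the TYPE values, the column order and the numbers are assumptions, and ``valid_observations`` is a hypothetical helper name:-

.. code-block:: python

    import csv
    import io

    # Invented placeholder rows, NOT real PoPy output.
    SAMPLE = """TYPE,ID,TIME,AMT,DV_CENTRAL,DV_CENTRAL_FLAG,orig_data_row
    dose,1,2.0,100.0,0.0,0,0
    obs,1,5.3,0.0,1.92,1,1
    obs,1,17.8,0.0,0.84,1,2
    dose,2,2.0,100.0,0.0,0,0
    obs,2,9.1,0.0,1.40,1,1
    """

    def valid_observations(handle):
        """Group (TIME, DV_CENTRAL) pairs by ID for rows flagged as valid."""
        by_id = {}
        for row in csv.DictReader(handle, skipinitialspace=True):
            if float(row["DV_CENTRAL_FLAG"]) == 1:
                pair = (float(row["TIME"]), float(row["DV_CENTRAL"]))
                by_id.setdefault(row["ID"], []).append(pair)
        return by_id

    observations = valid_observations(io.StringIO(SAMPLE))

To read a real file, pass an open file handle instead of the ``io.StringIO`` object, for example ``open('synthetic_data.csv', newline='')``.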


.. _random_seed_gen_script:

Controlling Random Seed in |popy| scripts
===========================================

Note that the .csv data file generated by the :ref:`gen_script` on your own machine will likely contain different values, due to the random sampling of |re| realizations and the random noise added to each observation.

If you wish to obtain new random results each time you re-run the :ref:`gen_script` then change the 'rand_seed' option to 'auto' as follows:-

.. code-block:: pyml

    METHOD_OPTIONS: {rand_seed: auto}

However, if you re-run the :ref:`gen_script` with a fixed 'rand_seed' number, you should obtain exactly the same results on your machine as before, due to this setting:-

.. code-block:: pyml

    METHOD_OPTIONS: {rand_seed: 12345}
    
Using a fixed number for the 'rand_seed' makes any sampling process in |popy| replicable.    
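
The same principle applies to any seeded pseudo-random number generator. As a loose analogy only, and not a description of |popy| internals, the Python sketch below shows that a fixed seed reproduces the same draws, whereas an unseeded generator is initialised from a system source and so usually differs between runs. The function name ``draw`` is hypothetical:-

.. code-block:: python

    import random

    def draw(seed, n=3):
        """Return n standard normal draws from a generator seeded with seed."""
        rng = random.Random(seed)
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    first = draw(12345)
    second = draw(12345)   # same seed, so the same values as 'first'
    other = draw(54321)    # different seed, so a different sequence
    fresh = draw(None)     # seeded by the system, loosely like 'auto'

Here ``first`` and ``second`` are identical lists, which is the behaviour a fixed 'rand_seed' is intended to give.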
    




