.. _input_data_format:

|popy| Data Format
####################

The |popy| data file records |obs| and dosing regimens for each individual in a study.

The columns (fields) of the data file fall into four main types, listed in :numref:`table_field_types`:-

.. _table_field_types:

.. list-table:: |popy| data fields
    :header-rows: 1

    * - Field
      - Comment
      
    * - :ref:`required_fields`
      - TYPE/ID/TIME
      
    * - :ref:`dosing_fields`
      - dosing regimen data
      
    * - :ref:`obs_fields`
      - observed measurements
        
    * - :ref:`extra_fields` 
      - extra covariate information
      
The data file values for each field can be accessed using the |cx| notation in the |popy| |script_file|.

.. _required_fields:

Required Fields
================

A |popy| data set requires the following fields:-

* :ref:`TYPE` - type of row
* :ref:`ID` - identity 
* :ref:`TIME` - time field 

Note the names 'TYPE', 'ID' and 'TIME' are the default names of these three required fields. You can use other field names if you choose to redefine them in the |script_file| |data_fields| section.

.. _type:

TYPE
------

The 'TYPE' field specifies the event that is happening in each row of the data file. The different types of row are as follows:-

* obs - Measurements that contribute to the log likelihood as defined in the |predictions| section.
* dose - Creates a dose according to the dosing functions in the |derivatives| section.
* pred - Extra prediction data points. |popy| will output extra |px| data at these time points, but they do |not| contribute to the likelihood.
* reset - Set the |sx| compartment states back to their initial values (usually zero).
* reset+dose - A 'reset' combined with a 'dose' event.

The row types above have direct equivalents in |nonmem| in terms of the |evid| integer values.

Typically, a drug trial data set consists mainly of 'obs' and 'dose' rows, with a few 'reset' rows per subject.
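
The row types listed above can be checked before a data file is loaded. The sketch below is plain Python and is not part of |popy|; the helper name ``find_unknown_types`` is hypothetical, and it assumes that a named dose such as 'dose:drug1' (see :ref:`multi_dose_types`) is the only permitted variation on the five basic values:-

.. code-block:: python

    # Hypothetical helper for illustration only, not part of the PoPy API.
    BASIC_TYPES = {"obs", "dose", "pred", "reset", "reset+dose"}


    def find_unknown_types(type_values):
        """Return the TYPE values that are not recognised row types."""
        unknown = []
        for value in type_values:
            # Assumption: 'dose:<name>' is a named dose, so split off the name.
            base, sep, name = value.partition(":")
            if sep and (base != "dose" or not name):
                unknown.append(value)
            elif base not in BASIC_TYPES:
                unknown.append(value)
        return unknown


    # Only the misspelt 'observation' value is reported.
    print(find_unknown_types(["obs", "dose", "dose:drug1", "observation", "reset+dose"]))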
 
.. _id:

ID
------

The 'ID' field value defines the individual for a given row. As |popy| is a |poppkpd| system, the 'ID' field is required because the data is split over multiple individuals to form a population.

Note that non-population analysis can be performed in |popy| by assigning all rows the same 'ID' value. 

.. _time:

TIME
------

The 'TIME' field defines the time stamp for each row. 

The time field must be non-decreasing from row to row (equal times are allowed), unless a |TYPE| = 'reset' or 'reset+dose' row is reached. Note that when the :ref:`ID` identifier changes between rows, an implicit 'reset' occurs.

For an example of a valid combination of TYPE/ID/TIME data see :numref:`table_popy_time`.

.. _table_popy_time:

.. list-table:: |popy| time reset example 
    :header-rows: 1

    * - |type|
      - |id|
      - |time|
      - comment
      
    * - obs
      - Bob
      - 0.0
      - observation at time zero
    
    * - dose
      - Bob
      - 4.0
      - dose for Bob at time 4.0
      
    * - obs
      - Bob
      - 4.0
      - observation for Bob at time 4.0
         
    * - obs
      - Bob
      - 8.0
      - later observation
          
    * - obs
      - Ruth
      - 0.0
      - time goes back, allowed because the ID is new

    * - dose
      - Ruth
      - 10.0
      - dose for Ruth at time 10.0
      
    * - obs
      - Ruth
      - 20.0
      - later observation
      
    * - reset
      - Ruth
      - 30.0
      - |sx| reset at time 30.0
      
    * - obs
      - Ruth
      - 1.0
      - observation following reset

In :numref:`table_popy_time` the time always increases or stays the same in consecutive rows, but time is allowed to go backwards after a new ID or a reset.
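
The ordering rule illustrated by :numref:`table_popy_time` can be sketched as a plain Python check. This is an illustration only and not how |popy| validates data; the function name is hypothetical, and it assumes that a 'reset' or 'reset+dose' row is itself exempt from the check and that the row following it may carry any time:-

.. code-block:: python

    # Hypothetical check for illustration only, not part of the PoPy API.
    RESET_TYPES = {"reset", "reset+dose"}

    # (TYPE, ID, TIME) rows copied from the time reset example table.
    TABLE_ROWS = [
        ("obs", "Bob", 0.0),
        ("dose", "Bob", 4.0),
        ("obs", "Bob", 4.0),
        ("obs", "Bob", 8.0),
        ("obs", "Ruth", 0.0),
        ("dose", "Ruth", 10.0),
        ("obs", "Ruth", 20.0),
        ("reset", "Ruth", 30.0),
        ("obs", "Ruth", 1.0),
    ]


    def find_time_violations(rows):
        """Return the indices of rows whose TIME goes backwards illegally."""
        violations = []
        prev_id = None
        prev_time = None  # None means "no constraint on the next row"
        for index, (row_type, row_id, time) in enumerate(rows):
            same_individual = row_id == prev_id
            if same_individual and prev_time is not None and row_type not in RESET_TYPES:
                if time < prev_time:
                    violations.append(index)
            # Assumption: after a reset row the next row may carry any time.
            prev_time = None if row_type in RESET_TYPES else time
            prev_id = row_id
        return violations


    print(find_time_violations(TABLE_ROWS))  # the table is valid, so no indices

Under these assumptions, a pair of rows such as ``("obs", "Bob", 4.0)`` followed by ``("obs", "Bob", 2.0)`` is reported, because time goes backwards without a new ID or a reset.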


.. _dosing_fields:

Dosing Fields
===============

Dosing events are created in the data file using 'dose' values in the |type| field.

There are two methods of associating data dose rows with the |derivatives| section in the |popy| |script_file|, as follows:-

* :ref:`single_dose_type`
* :ref:`multi_dose_types`

The first involves using just the 'dose' value, the second involves defining dose type names.

The amount of each dose is usually specified in an |amt| field, see below.

.. _amt:

AMT
------

Note that in |popy|, AMT is |not| a keyword. It is just the conventional name for the dose amount field used in this documentation. See |nm_amt| for the |nonmem| keyword.

.. _single_dose_type:

Single Dose Type
-------------------

The simplest way to create doses at a set of fixed times is shown in :numref:`table_popy_single_doses`.

.. _table_popy_single_doses:

.. list-table:: |popy| single dose type example 
    :header-rows: 1

    * - |type|
      - |time|
      - |amt|
      - comment
      
    * - dose
      - 1.0
      - 100
      - dose of 100 at time 1.0
      
    * - dose
      - 2.0
      - 200
      - dose of 200 at time 2.0
      
    * - dose
      - 3.0
      - 100
      - dose of 100 at time 3.0

Note that this creates 3 doses at times [1.0, 2.0, 3.0]. The script file loading this data set should have a |derivatives| section something like:-

.. code-block:: pyml

    DERIVATIVES: |
        d[DEPOT] = @bolus{amt: c[AMT]} - m[KE] * s[DEPOT]

Note that the :ref:`@bolus` dose has no name associated with it.
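
To see what the three dose rows do to the depot compartment, the following plain Python sketch evaluates the closed-form solution of the derivative above: each bolus adds its amount to the depot instantly, and the depot then decays exponentially at rate KE. It illustrates the equation only and is not how |popy| solves the model. The function name and the KE value are hypothetical, and the depot is assumed to start at zero:-

.. code-block:: python

    import math

    # (TIME, AMT) pairs copied from the single dose type example table.
    DOSES = [(1.0, 100.0), (2.0, 200.0), (3.0, 100.0)]


    def depot_amount(t, doses, ke):
        """Amount in the depot at time t for bolus doses with elimination rate ke."""
        total = 0.0
        for dose_time, amt in doses:
            if dose_time <= t:
                # Each dose decays exponentially from the moment it is given.
                total += amt * math.exp(-ke * (t - dose_time))
        return total


    KE = math.log(2.0)  # hypothetical rate, giving a half-life of one time unit
    for t in [0.5, 1.0, 2.0, 3.0]:
        print(t, round(depot_amount(t, DOSES, KE), 2))

With a half-life of one time unit, the depot is empty before the first dose, holds 100 at time 1.0, 250 at time 2.0 (50 remaining plus the new 200) and 225 at time 3.0.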

.. _multi_dose_types:

Multiple Dose Types
---------------------

If you have multiple types of dose in your analysis, |eg| two different drugs being prescribed, then you need to give each dose type a name, as shown in :numref:`table_popy_multi_doses`.

.. _table_popy_multi_doses:

.. list-table:: |popy| multi dose type example 
    :header-rows: 1

    * - |type|
      - |time|
      - AMT_DRUG1
      - AMT_DRUG2
      - comment
      
    * - dose:drug1
      - 1.0
      - 100
      - 0
      - 100 units of drug1
      
    * - dose:drug2
      - 2.0
      - 0
      - 200
      - 200 units of drug2
      
    * - dose:drug1
      - 3.0
      - 50
      - 0
      - 50 units of drug1

The data file above creates 2 doses of drug1 and 1 dose of drug2. The script file loading this data set should have a |derivatives| section something like:-

.. code-block:: pyml

    DERIVATIVES: |
        dose[drug1] = @bolus{amt: c[AMT_DRUG1]}
        dose[drug2] = @bolus{amt: c[AMT_DRUG2]}
        d[DEPOT1] = dose[drug1] - m[KE1] * s[DEPOT1]
        d[DEPOT2] = dose[drug2] - m[KE2] * s[DEPOT2]

The important aspect here is that the :ref:`@bolus` doses are defined with names 'drug1' and 'drug2'. These names also appear in the |type| field in the data set as 'dose:drug1' and 'dose:drug2'.

An alternative naming syntax is as follows:-

.. code-block:: pyml

    DERIVATIVES: |
        d[DEPOT1] = @bolus{amt: c[AMT_DRUG1], name: 'drug1'} - m[KE1] * s[DEPOT1]
        d[DEPOT2] = @bolus{amt: c[AMT_DRUG2], name: 'drug2'} - m[KE2] * s[DEPOT2]

Note that when creating a |popy| data set, you only need to specify a name for each type of dose. You can leave the modelling decision of where each dose appears in the compartment model to a later time.
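
The link between the 'dose:<name>' values and the amount fields can be sketched in plain Python. This is an illustration only, not |popy| code; the function name and the ``AMOUNT_FIELD`` mapping are hypothetical, with the mapping mirroring the explicit :pyml:`c[AMT_DRUG1]` and :pyml:`c[AMT_DRUG2]` references in the script above:-

.. code-block:: python

    # Hypothetical mapping from dose name to amount field, mirroring the script.
    AMOUNT_FIELD = {"drug1": "AMT_DRUG1", "drug2": "AMT_DRUG2"}

    # Rows copied from the multi dose type example table.
    ROWS = [
        {"TYPE": "dose:drug1", "TIME": 1.0, "AMT_DRUG1": 100, "AMT_DRUG2": 0},
        {"TYPE": "dose:drug2", "TIME": 2.0, "AMT_DRUG1": 0, "AMT_DRUG2": 200},
        {"TYPE": "dose:drug1", "TIME": 3.0, "AMT_DRUG1": 50, "AMT_DRUG2": 0},
    ]


    def doses_by_name(rows, amount_field):
        """Group (TIME, amount) pairs by the dose name given in the TYPE field."""
        grouped = {}
        for row in rows:
            base, sep, name = row["TYPE"].partition(":")
            if base != "dose" or not sep:
                continue  # not a named dose row
            amount = row[amount_field[name]]
            grouped.setdefault(name, []).append((row["TIME"], amount))
        return grouped


    print(doses_by_name(ROWS, AMOUNT_FIELD))

For the table above this gives two doses for 'drug1' (100 at time 1.0 and 50 at time 3.0) and one dose for 'drug2' (200 at time 2.0).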
        
.. _obs_fields:

Observation Fields
=====================

Another important set of fields in the data file are the columns that define observed measurements. Observation rows are defined by setting |type| = 'obs'.

This section shows examples of the following:-

* :ref:`single_obs_field`
* :ref:`single_obs_field_missing`
* :ref:`multiple_obs_fields`

Note that in each case the |predictions| section of the |popy| |script_file| is associated with observation fields in the data file in order to compute the likelihood correctly.

.. _single_obs_field:

Single Observed Field
----------------------

An example of a single observed field is shown in :numref:`table_single_obs`.

.. _table_single_obs:

.. list-table:: |popy| single observed field example 
    :header-rows: 1

    * - |type|
      - DRUG_CONC
      
    * - obs
      - 10.5
      
    * - obs
      - 15.5
      
    * - obs
      - 2.0

In this simple case the |predictions| section may look something like:-

.. code-block:: pyml

    PREDICTIONS: |
        p[DRUG_CONC] = s[CEN]/m[V]
        c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])
  
Note that :pyml:`c[DRUG_CONC]` references the 'DRUG_CONC' field of the data set. Here the likelihood is computed by comparing the model prediction :pyml:`p[DRUG_CONC]` with the data file observation :pyml:`c[DRUG_CONC]` for **all** rows of the data set where |type| = 'obs'.

Therefore all values of the data column 'DRUG_CONC' have to be valid observations. If you have missing values then you need to use the data structure in :ref:`single_obs_field_missing`.
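
As a rough illustration of what "contributing to the likelihood" means here, the plain Python sketch below sums a normal log density over every observation. It is not the |popy| implementation. It assumes that the second argument of ``norm`` is a variance (as the name 'ANOISE_var' suggests), and the predicted values and function names are hypothetical:-

.. code-block:: python

    import math


    def normal_log_density(observed, predicted, variance):
        """Log density of a normal distribution with the given mean and variance."""
        return -0.5 * (math.log(2.0 * math.pi * variance)
                       + (observed - predicted) ** 2 / variance)


    def log_likelihood(observations, predictions, variance):
        """Sum the log density over all 'obs' rows; every row must be valid."""
        return sum(normal_log_density(c, p, variance)
                   for c, p in zip(observations, predictions))


    drug_conc = [10.5, 15.5, 2.0]   # DRUG_CONC column from the table above
    predicted = [10.0, 16.0, 2.5]   # hypothetical model predictions p[DRUG_CONC]
    print(log_likelihood(drug_conc, predicted, variance=1.0))

Because every row enters the sum, a placeholder value in 'DRUG_CONC' would distort the result, which is why missing values need the flag field described in the next section.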
    
.. _single_obs_field_missing:

Observed Field with Missing Data
-----------------------------------

An example of a single observed field with some **missing** data is shown in :numref:`table_single_obs_missing`.
 
.. _table_single_obs_missing:

.. list-table:: |popy| single observed field missing data example 
    :header-rows: 1

    * - |type|
      - DRUG_CONC
      - DRUG_CONC_FLAG
      - comment
      
    * - obs
      - 10.5
      - 1
      - DRUG_CONC valid
      
    * - obs
      - 0.0
      - 0
      - DRUG_CONC invalid
      
    * - obs
      - -5.0
      - 0
      - DRUG_CONC invalid
      
    * - obs
      - 2.0
      - 1
      - DRUG_CONC valid

In this case the |predictions| section may still look something like:-

.. code-block:: pyml

    PREDICTIONS: |
        p[DRUG_CONC] = s[CEN]/m[V]
        c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])
  
However, not all the |type| = 'obs' rows contribute to the likelihood in this case. Only the rows that have |type| = 'obs' **and** DRUG_CONC_FLAG = 1 contribute.

It is similar to having the following 'if' statement in your |predictions| section:-

.. code-block:: pyml

    PREDICTIONS: |
        p[DRUG_CONC] = s[CEN]/m[V]
        if c[DRUG_CONC_FLAG] > 0.5:
            c[DRUG_CONC] ~ norm(p[DRUG_CONC], m[ANOISE_var])

You can include the 'if' statement in your |predictions| section if you like, but it is not required (or encouraged).

Note also that omitting the 'DRUG_CONC_FLAG' field from your data set has a similar effect to creating a 'DRUG_CONC_FLAG' field and setting all its values to 1, |ie| flags default to 1 in |popy|.
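
The flag behaviour described above can be sketched in plain Python. This is an illustration only, not |popy| code; the function name is hypothetical, and it assumes that the flag field is named by appending '_FLAG' to the observation field name, as in :numref:`table_single_obs_missing`:-

.. code-block:: python

    # Rows copied from the missing data example table.
    ROWS = [
        {"TYPE": "obs", "DRUG_CONC": 10.5, "DRUG_CONC_FLAG": 1},
        {"TYPE": "obs", "DRUG_CONC": 0.0, "DRUG_CONC_FLAG": 0},
        {"TYPE": "obs", "DRUG_CONC": -5.0, "DRUG_CONC_FLAG": 0},
        {"TYPE": "obs", "DRUG_CONC": 2.0, "DRUG_CONC_FLAG": 1},
    ]


    def contributing_values(rows, field):
        """Return the observations that would enter the likelihood for one field."""
        flag_field = field + "_FLAG"  # assumed naming convention
        values = []
        for row in rows:
            flag = row.get(flag_field, 1)  # a missing flag field behaves like 1
            if row["TYPE"] == "obs" and flag > 0.5:
                values.append(row[field])
        return values


    print(contributing_values(ROWS, "DRUG_CONC"))  # only the two flagged rows

If the 'DRUG_CONC_FLAG' entries are removed from ``ROWS``, all four values are returned, matching the default of 1.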
            
If you have multiple observation types in your data set then flag fields become more important, see the example data structure in :ref:`multiple_obs_fields`.

.. _multiple_obs_fields:

Multiple Observed Fields
---------------------------

An example of multiple observed fields is shown in :numref:`table_multiple_obs`.
 
.. _table_multiple_obs:

.. list-table:: |popy| multiple observed fields
    :header-rows: 1

    * - |type|
      - DRUG1
      - DRUG1_FLAG
      - DRUG2
      - DRUG2_FLAG
      - comment
      
    * - obs
      - 10.5
      - 1
      - 0.2
      - 1
      - both drugs valid
      
    * - obs
      - 10.5
      - 1
      - 0.0
      - 0
      - only drug1 valid
      
    * - obs
      - -4.1
      - 0
      - 0.0
      - 0
      - both drugs invalid
      
    * - obs
      - -4.1
      - 0
      - 0.5
      - 1
      - only drug2 valid

In this case the |predictions| section may look something like:-

.. code-block:: pyml

    PREDICTIONS: |
        p[DRUG1] = s[CEN1]/m[V1]
        c[DRUG1] ~ norm(p[DRUG1], m[ANOISE_var1])
        p[DRUG2] = s[CEN2]/m[V2]
        c[DRUG2] ~ norm(p[DRUG2], m[ANOISE_var2])
        
Here |popy| uses the 'DRUG1_FLAG' and 'DRUG2_FLAG' fields from the data set to only compute the likelihood from valid observations. You don't have to use 'if' statements in the |predictions| section to achieve this.

.. _extra_fields:

Extra Fields
=====================

The other columns of the |popy| data file are available to use in the following :term:`verbatim` sections:-

* |model_params|
* |states|
* |derivatives|
* |predictions|

For example, the following shows simple :ref:`covariate modelling <covariates>` using the |model_params| section:-

.. code-block:: pyml

    MODEL_PARAMS: |
        m[X] = f[X] + f[X_Y_EFFECT]*c[Y]

Here the |mx| parameter is modelled as having a linear relationship with the :pyml:`c[Y]` covariate from the data file.
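
The arithmetic of this linear relationship can be worked through in plain Python. The numerical values and names below are hypothetical and chosen only to show how the same fixed effects give a different parameter value for each individual's covariate:-

.. code-block:: python

    def model_param_x(f_x, f_x_y_effect, c_y):
        """Linear covariate model: m[X] = f[X] + f[X_Y_EFFECT] * c[Y]."""
        return f_x + f_x_y_effect * c_y


    F_X = 2.0           # hypothetical value of f[X]
    F_X_Y_EFFECT = 0.5  # hypothetical value of f[X_Y_EFFECT]
    covariate_y = {"Bob": 1.0, "Ruth": 3.0}  # hypothetical c[Y] per individual

    for individual, c_y in covariate_y.items():
        print(individual, model_param_x(F_X, F_X_Y_EFFECT, c_y))

With these values, Bob has a parameter value of 2.5 and Ruth has 3.5.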

It is also possible to use |cx| variables in the other sections. One use case is when you already have |pk| parameters estimated (from a previous study) and wish to use these |cx| variables in the |derivatives| section, instead of estimating |mx| parameters for each individual.

..  comment
    We don't have any PD examples yet, so add this later.
    Maybe an example of loading in previous |pk| results in a |pd| example???

.. only:: browser

    .. _next_steps_data_format:

    Next Steps
    ================

    You can use the information above to construct your own |popy| data sets from real data. If you have a previously constructed |nonmem| data set then see :ref:`nonmem_dat_to_popy_dat` for guidance on how to convert such a data set to |popy| format.

    See :ref:`simple_tut_example` for an example of creating a synthetic |popy| data file from a single script. It is also possible to create multiple data sets, see :ref:`simple_mtut_example`.
