.. _preprocess:

PREPROCESS
#############

An optional :term:`verbatim` section that creates extra |cx| variables after an :ref:`input data file<input_data_format>` is loaded, and can also remove rows from the data. It acts as a flexible filter written in |python|.

The |preprocess| section is available in the following scripts:-

* :ref:`fit_script`
* :ref:`sim_script`
* :ref:`mfit_script`
* :ref:`msim_script`

|ie| where there is a |data_file| loaded by the script.

.. _example_preprocess:

Example PREPROCESS section
=======================================

.. code-block:: pyml

    PREPROCESS: |
        # exclude negative concentrations
        if c[CONC] < 0.0: return
        # create new OCCASION variable
        if c[DAY] <= 3:
            c[OCCASION] = 1
        elif 3 < c[DAY] <= 6:
            c[OCCASION] = 2
        elif 6 < c[DAY] <= 8:
            c[OCCASION] = 3
        else:
            c[OCCASION] = 4

The example above shows the two operations a |preprocess| section can perform, namely:-

* Exclude data rows
* Create extra |cx| data columns

The line:-

.. code-block:: pyml

    if c[CONC] < 0.0: return
    
removes all rows with CONC less than zero from the data set. The bare :python:`return`, which returns no value, is the |popy| convention for ignoring a particular row.
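
In ordinary |python| a bare :python:`return` yields ``None``. The sketch below shows how a row filter could act on that signal. It is an illustration only, not |popy|'s implementation: ``keep_row`` and ``filter_rows`` are hypothetical names, the rows are plain dictionaries, and the explicit ``return True`` for retained rows exists only in this sketch (a real |preprocess| section never returns a value).

.. code-block:: python

    def keep_row(c):
        # hypothetical per-row check, mirroring the PREPROCESS line above
        if c["CONC"] < 0.0:
            return  # bare return gives None, so this sketch drops the row
        return True  # sketch-only marker for a retained row

    def filter_rows(rows):
        # keep only the rows for which the check did not return early
        return [c for c in rows if keep_row(c) is not None]

    rows = [{"CONC": 1.2}, {"CONC": -0.5}, {"CONC": 0.0}]
    kept = filter_rows(rows)

Because the comparison is strict, a row with CONC exactly equal to zero is retained.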

The remaining lines create a new :pyml:`c[OCCASION]` variable, as follows:-

.. code-block:: pyml

    if c[DAY] <= 3:
        c[OCCASION] = 1
    elif 3 < c[DAY] <= 6:
        c[OCCASION] = 2
    elif 6 < c[DAY] <= 8:
        c[OCCASION] = 3
    else:
        c[OCCASION] = 4
        
The simple |python| assignment to :pyml:`c[OCCASION]` creates the 'OCCASION' field. The :python:`if/elif/else` statements are standard |python| syntax and partition the data rows into occasions according to the existing :pyml:`c[DAY]` data field.
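
To check where the boundaries fall, the same branching can be tried as ordinary |python|. This is a minimal sketch, assuming DAY is numeric; the helper name ``occasion_for_day`` and the sample days are hypothetical and not part of |popy|.

.. code-block:: python

    def occasion_for_day(day):
        """Hypothetical helper mirroring the if/elif/else chain above."""
        if day <= 3:
            return 1
        elif 3 < day <= 6:
            return 2
        elif 6 < day <= 8:
            return 3
        else:
            return 4

    days = [1, 3, 3.5, 6, 7, 8, 9]
    occasions = [occasion_for_day(d) for d in days]
    # occasions == [1, 1, 2, 2, 3, 3, 4]

Each upper bound is inclusive, so day 3 falls in occasion 1 and day 6 in occasion 2. Each :python:`elif` is only reached when the earlier tests have failed, so the lower bounds are redundant but harmless.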

Note that the remaining sections of the script file, |eg| |effects|, |derivatives| etc, can use the new :pyml:`c[OCCASION]` variable as though it already existed in the data file.

The use of |python| syntax here means the above can be expanded in arbitrarily complex ways to add more |cx| variables or exclude other rows from the data set.

A common use of the |preprocess| section is to remove an individual from the analysis, as follows:-

.. code-block:: pyml

    PREPROCESS: |
        # exclude an individual
        if c[ID] == '7': return

Or potentially multiple individuals:-
        
.. code-block:: pyml

    PREPROCESS: |
        # exclude multiple individuals
        if c[ID] in ['7','9','41']: return
        
Note that here the 'ID' field is a |python| :term:`string<str>`, |not| a :term:`float` or :term:`integer<int>`, which is why the values are quoted.
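
The distinction matters because |python| never treats a string and a number as equal. The snippet below is ordinary |python|, not a |preprocess| section, and the variable names are chosen only for illustration.

.. code-block:: python

    subject_id = "7"  # an ID held as a string, as described above

    matches_string = subject_id == "7"                # True
    matches_integer = subject_id == 7                 # False
    in_string_list = subject_id in ["7", "9", "41"]   # True
    in_integer_list = subject_id in [7, 9, 41]        # False

In plain |python| terms, comparing a string ID against unquoted numbers would therefore match no rows, and nothing would be excluded.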

.. _rules_preprocess:

Rules for PREPROCESS section
=======================================

Like all :term:`verbatim` sections, the |preprocess| section of the config file accepts free-form pseudo |python| code. However, the following rules apply:-

* **Only** |cx| variables and local |python| variables are allowed 
* |cx| on the |rhs| and within :python:`if` statements must be previously defined on the |lhs| or in the |data_file|
* |cx| declared on the |lhs| must |not| already exist in the |data_file|
* :python:`return` must always be **null**, |ie| it must |not| return a value

So you can |not| use |mx|, |fx|, |rx|, |dx| etc variables in this section. 
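
As an illustration, the fragment below is intended to follow each of the rules above. It is a sketch only and has not been run against |popy|. It assumes CONC and DAY are columns in the |data_file|, as in the earlier example; ``min_conc``, WEEK and LATE are hypothetical names introduced here.

.. code-block:: pyml

    PREPROCESS: |
        # local python variable
        min_conc = 0.0
        # c[CONC] comes from the data file, so it can be tested in an if statement
        # the return is null
        if c[CONC] < min_conc: return
        # c[WEEK] is not in the data file, so it can be created on the lhs
        if c[DAY] <= 7:
            c[WEEK] = 1
        else:
            c[WEEK] = 2
        # c[WEEK] was defined on the lhs above, so it can now appear on the rhs
        c[LATE] = c[WEEK] - 1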

The |preprocess| function is run once, shortly after loading in the |data_file|, so it is more efficient to create the required |cx| variables in this section than to create temporary variables in the |model_params| or |derivatives| sections.

As with all :term:`verbatim` sections, it is possible to introduce syntax errors by writing malformed |python|. These should be picked up when |popy| attempts to compile or run the |preprocess| function as a temporary .py file.
