.. _distributions:

Probability Distributions
###########################

The distributions available for use in |popy| models are shown in :numref:`table_prob_dists`:-

.. _table_prob_dists:

.. list-table:: Probability Distributions
    :header-rows: 1
    
    * - Name
      - Syntax
         
    * - :ref:`Uniform <unif_dist>`
      - x ~ unif(min_x, max_x) init_x

    * - :ref:`Normal <norm_dist>`
      - y ~ norm(mean, var)
      
    * - :ref:`Censored Normal <cennorm_dist>`
      - y ~ cennorm(mean, var, LLQ=llq, ULQ=ulq)
    
    * - :ref:`Rectified Normal <rectnorm_dist>`
      - y ~ rectnorm(mean, var, LLQ=llq, ULQ=ulq)
      
    * - :ref:`Truncated Normal <truncnorm_dist>`
      - y ~ truncnorm(mean, var, MIN=min, MAX=max)
      
    * - :ref:`Truncated Censored Normal <trunccennorm_dist>`
      - y ~ trunccennorm(mean, var, MIN=min, LLQ=llq, ULQ=ulq, MAX=max)
     
    * - :ref:`Truncated Rectified Normal <truncrectnorm_dist>`
      - y ~ truncrectnorm(mean, var, MIN=min, LLQ=llq, ULQ=ulq, MAX=max)
    
    * - :ref:`Multivariate Normal <mnorm_dist>`
      - y_vec ~ mnorm(mean_vec, var_mat)
            
    * - :ref:`Bernoulli <bern_dist>` 
      - y ~ bernoulli(p)
      
    * - :ref:`Poisson <poisson_dist>`
      - y ~ poisson(lambda)
      
    * - :ref:`Binomial <bin_dist>`
      - y ~ binomial(p, n)
      
    * - :ref:`Negative Binomial <neg_bin_dist>`
      - y ~ negbinomial(p, n)
      

.. _unif_dist:
    
Uniform Distribution
======================

The Uniform is a continuous univariate distribution, written in |popy| as:-

.. code-block:: pyml

    x ~ unif(min_x, max_x) init_x

The uniform distribution is used to define a range of values for an unknown scalar that you wish |popy| to estimate.

The input parameters are:-

* min_x - the **minimum** value that variable 'x' is allowed to take during estimation.
* max_x - the **maximum** value that variable 'x' is allowed to take during estimation.
* init_x - the **initial** value that variable 'x' takes at the start of estimation. 
    
The output 'x' and inputs 'min_x', 'max_x', 'init_x' are all continuous values.
    
For more information see :wiki_link:`Uniform Distribution on Wikipedia <Uniform_distribution>`.

.. _uniform_dist_example:

Uniform Distribution Examples
--------------------------------
    
You use the :ref:`unif_dist` in the |effects| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    f[KE] ~ unif(0.001, 100) 0.05
        
The above expression limits the :pyml:`f[KE]` variable to the range [0.001, 100] with an initial starting value of 0.05.

Alternatively you can do:-

.. code-block:: pyml
    
    f[KE] ~ unif(0.001, +inf) 0.05

Which limits :pyml:`f[KE]` to be greater than 0.001. Note that an equivalent shortcut is available as follows:-

.. code-block:: pyml
    
    f[KE] ~ P 0.05
    
Where 'P' stands for positive. You can also have an unconstrained variable as follows:-

.. code-block:: pyml
    
    f[KE] ~ U 0.05
    
Where 'U' stands for unlimited. The equivalent long form is:-
    
.. code-block:: pyml
    
    f[KE] ~ unif(-inf, +inf) 0.05
    
    
.. _norm_dist:

Normal Distribution
=====================

The Normal distribution is used for continuous variables and written in |popy| as:-

.. code-block:: pyml

    x ~ norm(mean, var)

The Normal models a Gaussian distribution with two parameters 'mean' and 'var'. 

The input parameters are:-

* mean - the expected value of the Normal
* var - the variance of the Normal
    
The output 'x' and inputs 'mean', 'var' are all continuous values.
    
For more information see :wiki_link:`Normal Distribution on Wikipedia <Normal_distribution>`.

.. _norm_dist_re_example:

Normal Random Effect Example
-----------------------------
    
You can use the :ref:`norm_dist` in the |effects| section of a |popy| script, to define a |rx| |re| variable as follows:-
    
.. code-block:: pyml
    
    EFFECTS: 
        ID: |
            r[KE] ~ norm(0, f[KE_isv])
        
Here the :pyml:`r[KE]` scalar variable is defined as a normal with mean zero and positive scalar variance :pyml:`f[KE_isv]`.

:pyml:`r[KE]` is defined at the 'ID' level, so each individual in the population has an independent sample of this normal distribution.

.. _norm_dist_lik_example:

Normal Likelihood Example
----------------------------
    
You can use the :ref:`norm_dist` in the |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        c[DV_CENTRAL] ~ norm(p[DV_CENTRAL], var)
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[DV_CENTRAL]` observation from the |data_file|, when modelled as a Normal variable, with mean p[DV_CENTRAL] and variance 'var'.
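As an aside, the combined additive and proportional noise model above can be sketched in plain Python (not |popy| syntax), using hypothetical values standing in for m[ANOISE], m[PNOISE], the prediction and the observation:-

.. code-block:: python

    import math

    def normal_loglik(y, mean, var):
        """Log likelihood of observation y under a Normal(mean, var)."""
        return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

    # Hypothetical values standing in for m[ANOISE], m[PNOISE] and p[DV_CENTRAL]
    anoise, pnoise = 0.1, 0.2
    pred = 5.0                                 # prediction p[DV_CENTRAL]
    var = anoise**2 + pnoise**2 * pred**2      # additive + proportional variance
    obs = 4.6                                  # observed c[DV_CENTRAL]
    ll = normal_loglik(obs, pred, var)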
      

.. _cennorm_dist:

Censored Normal Distribution
===============================

The |cennorm_dist| is used to model whether the output of a Normal random variable lies within a particular range and is written in |popy| as:-

.. code-block:: pyml

    x ~ cennorm(mean, var, LLQ=llq, ULQ=ulq)

The |cennorm_dist| models a Censored Gaussian distribution with two parameters 'mean' and 'var' and two limit parameters 'llq' and 'ulq'.

The input parameters are:-

* mean - the expected value of the Normal
* var - the variance of the Normal
* llq - |llq_long| - optional parameter - default value is -inf
* ulq - |ulq_long| - optional parameter - default value is +inf
    
The inputs 'mean', 'var', 'llq', 'ulq' are all continuous values. The default values above imply that the following:-

.. code-block:: pyml

    x ~ cennorm(mean, var)
    
Is the same as this:-

.. code-block:: pyml

    x ~ cennorm(mean, var, LLQ=-inf, ULQ=+inf)

Which is completely uninformative, as for any value of x the likelihood is one and the log likelihood contribution zero.

The 'llq' and 'ulq' values define 3 adjacent regions in the range [-inf, +inf]. When sampling from a Censored Normal, the output can take one of three values:-

* llq - represents a sample in the range [-inf, llq]
* (llq + ulq)/2 - represents a sample in the range [llq, ulq]
* ulq - represents a sample in the range [ulq, +inf]

The probability of each value is computed using the cumulative normal distribution for each range. The three range probabilities sum to one.
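These three probabilities can be verified in plain Python (not |popy| syntax), using the normal cumulative distribution function and hypothetical parameter values:-

.. code-block:: python

    import math

    def norm_cdf(x, mean, var):
        """Cumulative distribution function of a Normal(mean, var)."""
        return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

    mean, var, llq, ulq = 5.0, 4.0, 2.0, 10.0   # hypothetical parameter values
    p_below = norm_cdf(llq, mean, var)          # sample reported as llq
    p_between = norm_cdf(ulq, mean, var) - norm_cdf(llq, mean, var)  # (llq + ulq)/2
    p_above = 1.0 - norm_cdf(ulq, mean, var)    # sample reported as ulq

    assert abs(p_below + p_between + p_above - 1.0) < 1e-12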

For more information see :wiki_link:`Cumulative Normal Distribution on Wikipedia <Cumulative_distribution_function>`.

.. _cennorm_dist_lik_example:

Censored Normal Likelihood Example
-----------------------------------
    
You can use the |cennorm_dist| in the |predictions| section of a |popy| :ref:`fit_script` to model |blq_long| (|blq|) data, |ie| observations that are |not| observed directly, but are known to be below a certain |llq_long| (|llq|), as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        llq = 2.0
        if c[DV_CENTRAL] <= llq:
            c[DV_CENTRAL] ~ cennorm(p[DV_CENTRAL], var, LLQ=llq)
        else:
            c[DV_CENTRAL] ~ norm(p[DV_CENTRAL], var)
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed |cdvcen| observation from the |data_file|. |cdvcen| observations greater than |llq| are modelled as a Normal variable, with mean p[DV_CENTRAL] and variance 'var'. |cdvcen| observations less than or equal to |llq| are modelled using the cumulative normal distribution, with the same mean and variance, over the range [-inf, llq]. This |blq| data model is referred to as method 'M3' in [Beal2001]_.

Note that any value for |cdvcen| in the data set less than or equal to llq is treated as a |blq| observation by this model.

Also note that |popy| requires the keyword syntax 'LLQ=llq' here to clarify the purpose of the third |cennorm_dist| parameter. It is also possible to model |alq_long| (|alq|) observations, as follows:-

.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        ulq = 100.0
        if c[DV_CENTRAL] >= ulq:
            c[DV_CENTRAL] ~ cennorm(p[DV_CENTRAL], var, ULQ=ulq)
        else:
            c[DV_CENTRAL] ~ norm(p[DV_CENTRAL], var)
            
Or potentially both |blq| and |alq| observations:-

.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        llq = 2.0
        ulq = 100.0
        if c[DV_CENTRAL] <= llq or c[DV_CENTRAL] >= ulq:
            c[DV_CENTRAL] ~ cennorm(p[DV_CENTRAL], var, LLQ=llq, ULQ=ulq)
        else:
            c[DV_CENTRAL] ~ norm(p[DV_CENTRAL], var)

The 'if' statement above makes it reasonably clear how |popy| models |blq| and |alq| data when fitting a model. However, these formulae are long-winded and difficult to sample from, so in practice it is recommended to use a :ref:`rectnorm_dist` instead, see below.

.. _rectnorm_dist:

Rectified Normal Distribution
===============================

The |rectnorm_dist| combines a |cennorm_dist| and a |norm_dist|. Its primary purpose is modelling of |blq| and |alq| observations. It is written in |popy| as:-

.. code-block:: pyml

    x ~ rectnorm(mean, var, LLQ=llq, ULQ=ulq)

The |rectnorm_dist| models |blq| and |alq| |obss| using a |cennorm_dist| and a |norm_dist| for fully observed data, with shared parameters 'mean' and 'var' over the following ranges:-

* [-inf, llq] - cennorm(mean, var, LLQ=-inf, ULQ=llq)
* [llq, ulq] - norm(mean, var)
* [ulq, +inf] - cennorm(mean, var, LLQ=ulq, ULQ=+inf)

The input parameters are:-

* mean - the expected value of the Normal
* var - the variance of the Normal
* llq - |llq_long| - optional parameter - default value is -inf
* ulq - |ulq_long| - optional parameter - default value is +inf
    
The inputs 'mean', 'var', 'llq', 'ulq' are all continuous values. The default values above imply that the following:-

.. code-block:: pyml

    x ~ rectnorm(mean, var)
    
Is the same as this:-

.. code-block:: pyml

    x ~ rectnorm(mean, var, LLQ=-inf, ULQ=+inf)

Which is the same as a |norm_dist|:-

.. code-block:: pyml

    x ~ norm(mean, var)

The 'llq' and 'ulq' values define 3 adjacent regions in the range [-inf, +inf]. When sampling from a Rectified Normal, the output can take one of three types of value:-

* llq - represents a sample in the range [-inf, llq]
* [llq, ulq] - a Normal sample in the range [llq, ulq]
* ulq - represents a sample in the range [ulq, +inf]

The discrete probability of a value of |llq| or less is computed using the cumulative normal distribution over the range [-inf, llq]. The discrete probability of a value of |ulq| or more is computed using the cumulative normal distribution over the range [ulq, +inf]. The continuous probability density function (pdf) in the range [llq, ulq] is that of the Normal distribution itself. The area under the pdf in the range [llq, ulq], added to the two discrete probabilities, sums to one.
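Sampling from a Rectified Normal is straightforward: draw from the underlying Normal and clamp the draw to the [llq, ulq] range. A minimal Python sketch (not |popy| syntax):-

.. code-block:: python

    import math
    import random

    def rectnorm_sample(mean, var, llq=-math.inf, ulq=math.inf, rng=random):
        """Draw from a Normal(mean, var) and rectify: draws below llq are
        reported as llq and draws above ulq are reported as ulq."""
        x = rng.gauss(mean, math.sqrt(var))
        return min(max(x, llq), ulq)

Draws below 'llq' therefore accumulate as a discrete probability mass at 'llq', and similarly for 'ulq'.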

For more information see :wiki_link:`Rectified Gaussian Distribution on Wikipedia <Rectified_Gaussian_distribution>`.

.. _rectnorm_dist_lik_example:

Rectified Normal Likelihood Example
-------------------------------------
    
You can use the |rectnorm_dist| in the |predictions| section of a |popy| :ref:`fit_script` to model |blq_long| (|blq|) data, |ie| observations that are |not| observed directly, but are known to be below a certain |llq_long| (|llq|), as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        llq = 2.0
        c[DV_CENTRAL] ~ rectnorm(p[DV_CENTRAL], var, LLQ=llq)
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed |cdvcen| observation from the |data_file|. |cdvcen| observations greater than |llq| are modelled as a Normal variable, with mean p[DV_CENTRAL] and variance 'var'. |cdvcen| observations less than or equal to |llq| are modelled using the cumulative normal distribution, with the same mean and variance, over the range [-inf, llq]. This |blq| data model is referred to as method 'M3' in [Beal2001]_ and recommended by [Ahn2008]_.

Note that any value for |cdvcen| in the data set less than or equal to llq is treated as a |blq| observation by this model. You can vary the |llq| limit for each observation by specifying the limit as a separate field in the |data_file|:-

.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        c[DV_CENTRAL] ~ rectnorm(p[DV_CENTRAL], var, LLQ=c[LLQ])

You can then remove the |blq| limit for selected observations by setting :pyml:`c[LLQ]` to zero or a large negative number. Sometimes a |blq| observation is recorded in the |data_file| using a separate flag field and the |cdvcen| value itself is then the |llq|. In this case you could use the |PREDICTIONS| section above and |PREPROCESS| the data to compute a suitable :pyml:`c[LLQ]` field as follows:-

.. code-block:: pyml
    
    PREPROCESS: 
        if c[BLQ_FLAG] > 0.5:
            c[LLQ] = c[DV_CENTRAL]
        else:
            c[LLQ] = -inf

Also note that |popy| requires the keyword syntax 'LLQ=llq' here to clarify the purpose of the third |rectnorm_dist| parameter. It is also possible to model |alq_long| (|alq|) observations, as follows:-

.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        ulq = 100.0
        c[DV_CENTRAL] ~ rectnorm(p[DV_CENTRAL], var, ULQ=ulq)
            
Or potentially both |blq| and |alq| observations:-

.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        llq = 2.0
        ulq = 100.0
        c[DV_CENTRAL] ~ rectnorm(p[DV_CENTRAL], var, LLQ=llq, ULQ=ulq)
       
The functionality above is the same as combining a |cennorm_dist| and a |norm_dist| using an 'if' statement, see :ref:`cennorm_dist_lik_example`. However using a |rectnorm_dist| is recommended as it is more compact and more flexible. For example, the syntax above works in the context of a |gen_script|, |sim_script| or |tut_script| as well as a |fit_script|, |ie| you can sample from a |rectnorm_dist| easily.

The |rectnorm_dist| is the principal way that |popy| modellers are encouraged to deal with |blq| and |alq| data.

.. _truncnorm_dist:

Truncated Normal Distribution
===============================

The |truncnorm_dist| is based on the |norm_dist|, but with the domain of the distribution limited to a range [min,max]. It can be used to restrict a |norm_dist| to say all positive values. It is written in |popy| as:-

.. code-block:: pyml

    x ~ truncnorm(mean, var, MIN=min, MAX=max)

The input parameters are:-

* mean - the expected value of the Normal
* var - the variance of the Normal
* min - minimum value of truncated range - optional parameter - default value is -inf
* max - maximum value of truncated range - optional parameter - default value is +inf
    
The inputs 'mean', 'var', 'min', 'max' are all continuous values. The default values above imply that the following:-

.. code-block:: pyml

    x ~ truncnorm(mean, var)
    
Is the same as this:-

.. code-block:: pyml

    x ~ truncnorm(mean, var, MIN=-inf, MAX=+inf)

Which is the same as a |norm_dist|:-

.. code-block:: pyml

    x ~ norm(mean, var)

Note the |truncnorm_dist| is different from the |rectnorm_dist|. A |truncnorm_dist| rescales its |pdf|, so that the area under the curve in the domain [min, max] is one. There is zero probability mass outside of the [min, max] range. A |rectnorm_dist| keeps the same |pdf| as the |norm_dist| within the range [llq, ulq], but includes the cumulative probability outside this region to achieve a total probability of one. 
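The rescaling performed by the |truncnorm_dist| can be illustrated in plain Python (not |popy| syntax), dividing the normal |pdf| by the probability mass inside [min, max]:-

.. code-block:: python

    import math

    def norm_pdf(x, mean, var):
        """Probability density function of a Normal(mean, var)."""
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def norm_cdf(x, mean, var):
        """Cumulative distribution function of a Normal(mean, var)."""
        return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

    def truncnorm_pdf(x, mean, var, lo=-math.inf, hi=math.inf):
        """Normal pdf rescaled so the area between lo and hi is one.
        There is zero probability density outside [lo, hi]."""
        if x < lo or x > hi:
            return 0.0
        mass = norm_cdf(hi, mean, var) - norm_cdf(lo, mean, var)
        return norm_pdf(x, mean, var) / mass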

For more information see :wiki_link:`Truncated Normal Distribution on Wikipedia <Truncated_normal_distribution>`.

.. _truncnorm_dist_lik_example:

Truncated Normal Likelihood Example
-------------------------------------
    
You can use the |truncnorm_dist| in the |predictions| section of a |popy| :ref:`fit_script` to model data that is known to occur in a certain range, |eg| all positive data:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        c[DV_CENTRAL] ~ truncnorm(p[DV_CENTRAL], var, MIN=0)
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed |cdvcen| observation from the |data_file|. 

Also note that |popy| requires the keyword syntax 'MIN=min' here to clarify the purpose of the third |truncnorm_dist| parameter. It is also possible to model observations with a known upper limit, |eg| data that is known to be negative, as follows:-

.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        c[DV_CENTRAL] ~ truncnorm(p[DV_CENTRAL], var, MAX=0)
            
Or potentially observations that are known to be within a single standard deviation:-

.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        std = sqrt(var)
        min = p[DV_CENTRAL] - std
        max = p[DV_CENTRAL] + std
        c[DV_CENTRAL] ~ truncnorm(p[DV_CENTRAL], var, MIN=min, MAX=max)
       
Note that if values of |cdvcen| lie outside the range [min, max] then this model makes little sense, as the likelihood of such observations is zero and the log likelihood is -inf.

You might wish to use |truncnorm_dist| to generate synthetic positive-only observations from a model. An alternative is to use |rectnorm_dist| and generate synthetic data with a small positive |llq|.

.. _trunccennorm_dist:

Truncated Censored Normal Distribution
=========================================

The |trunccennorm_dist| is based on the |cennorm_dist|, but with the domain of the distribution limited to a range [min,max]. It can be used to restrict a |cennorm_dist| to say all positive values. It is written in |popy| as:-

.. code-block:: pyml

    x ~ trunccennorm(mean, var, MIN=min, LLQ=llq, ULQ=ulq, MAX=max)

The input parameters are:-

* mean - the expected value of the Normal
* var - the variance of the Normal
* min - minimum value of truncated range - optional parameter - default value is -inf
* llq - |llq_long| - optional parameter - default value is -inf
* ulq - |ulq_long| - optional parameter - default value is +inf
* max - maximum value of truncated range - optional parameter - default value is +inf
    
The inputs 'mean', 'var', 'min', 'llq', 'ulq', 'max' are all continuous values. The default values above imply that the following:-

.. code-block:: pyml

    x ~ trunccennorm(mean, var)
    
Is the same as this:-

.. code-block:: pyml

    x ~ trunccennorm(mean, var, MIN=-inf, LLQ=-inf, ULQ=+inf, MAX=+inf)

Which is completely uninformative, as for any value of x the likelihood is one and the log likelihood contribution zero.

Note the |trunccennorm_dist| rescales a |cennorm_dist|, so that the area under the curve in the domain [min, max] is one. There is zero probability mass outside of the [min, max] range. Effectively the range [-inf,+inf] is split into 5 sub ranges:-

* [-inf, min] - zero probability
* [min, llq] - cumulative normal probability
* [llq, ulq] - cumulative normal probability
* [ulq, max] - cumulative normal probability
* [max, +inf] - zero probability
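These region probabilities can be verified in plain Python (not |popy| syntax), dividing each cumulative normal probability by the total mass inside [min, max], with hypothetical parameter values:-

.. code-block:: python

    import math

    def norm_cdf(x, mean, var):
        """Cumulative distribution function of a Normal(mean, var)."""
        return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

    mean, var = 5.0, 4.0
    mn, llq, ulq, mx = 0.0, 2.0, 10.0, 12.0    # hypothetical MIN, LLQ, ULQ, MAX
    scale = norm_cdf(mx, mean, var) - norm_cdf(mn, mean, var)   # truncation mass

    p_blq = (norm_cdf(llq, mean, var) - norm_cdf(mn, mean, var)) / scale
    p_mid = (norm_cdf(ulq, mean, var) - norm_cdf(llq, mean, var)) / scale
    p_alq = (norm_cdf(mx, mean, var) - norm_cdf(ulq, mean, var)) / scale

    assert abs(p_blq + p_mid + p_alq - 1.0) < 1e-12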

.. _trunccennorm_dist_lik_example:

Truncated Censored Normal Likelihood Example
----------------------------------------------
    
You can use the |trunccennorm_dist| in the |predictions| section of a |popy| :ref:`fit_script` to model data that is known to occur in a certain range, |eg| all positive data:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        llq = 2.0
        if c[DV_CENTRAL] <= llq:
            c[DV_CENTRAL] ~ trunccennorm(p[DV_CENTRAL], var, MIN=0, LLQ=llq)
        else:
            c[DV_CENTRAL] ~ truncnorm(p[DV_CENTRAL], var, MIN=0)
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed |cdvcen| value from the |data_file|, which is known to be positive, with a |llq| of 2.0. 

The model above shows how you might implement the 'M4' method described in [Beal2001]_, which conditions on the |blq| data being positive. However a more convenient notation for doing this is described in |truncrectnorm_dist| below.


.. _truncrectnorm_dist:

Truncated Rectified Normal Distribution
=========================================

The |truncrectnorm_dist| is based on the |rectnorm_dist|, but with the domain of the distribution limited to a range [min,max]. It can be used to restrict a |rectnorm_dist| to say all positive values. It is written in |popy| as:-

.. code-block:: pyml

    x ~ truncrectnorm(mean, var, MIN=min, LLQ=llq, ULQ=ulq, MAX=max)

The input parameters are:-

* mean - the expected value of the Normal
* var - the variance of the Normal
* min - minimum value of truncated range - optional parameter - default value is -inf
* llq - |llq_long| - optional parameter - default value is -inf
* ulq - |ulq_long| - optional parameter - default value is +inf
* max - maximum value of truncated range - optional parameter - default value is +inf
    
The inputs 'mean', 'var', 'min', 'llq', 'ulq', 'max' are all continuous values. The default values above imply that the following:-

.. code-block:: pyml

    x ~ truncrectnorm(mean, var)
    
Is the same as this:-

.. code-block:: pyml

    x ~ truncrectnorm(mean, var, MIN=-inf, LLQ=-inf, ULQ=+inf, MAX=+inf)

Which is the same as a |norm_dist|:-

.. code-block:: pyml

    x ~ norm(mean, var)

Note the |truncrectnorm_dist| rescales a |rectnorm_dist|, so that the area under the curve in the domain [min, max] is one. There is zero probability mass outside of the [min, max] range. Effectively the range [-inf,+inf] is split into 5 sub ranges:-

* [-inf, min] - zero probability
* [min, llq] - cumulative normal probability
* [llq, ulq] - normal probability density
* [ulq, max] - cumulative normal probability
* [max, +inf] - zero probability

.. _truncrectnorm_dist_lik_example:

Truncated Rectified Normal Likelihood Example
-----------------------------------------------
    
You can use the |truncrectnorm_dist| in the |predictions| section of a |popy| :ref:`fit_script` to model data that is known to occur in a certain range, |eg| all positive data:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        c[DV_CENTRAL] ~ truncrectnorm(p[DV_CENTRAL], var, MIN=0, LLQ=2.0)
        

The above syntax in a :ref:`fit_script` specifies the likelihood of the observed |cdvcen| value from the |data_file|, which is known to be positive, with a |llq| of 2.0. 

The model above shows the recommended way for |popy| modellers to implement the 'M4' method described in [Beal2001]_, which conditions on the |blq| data being positive. 

The |truncrectnorm_dist| is easier to sample from, and therefore easier to use in a |gen_script|, |tut_script| or |sim_script|, compared to the 'if' statement method shown in :ref:`trunccennorm_dist_lik_example`.

Note in many cases it may be easier and more appropriate to use the 'M3' method and the |rectnorm_dist| shown in :ref:`rectnorm_dist_lik_example`.

.. _mnorm_dist:
    
Multivariate Normal Distribution
==================================

The Multivariate Normal distribution is used for vectors of continuous variables and is written in |popy| as:-

.. code-block:: pyml

    output_vector ~ mnorm(mean_vector, covariance_matrix)
    
The Multivariate Normal is a generalisation of the :ref:`norm_dist` with two parameters 'mean_vector' and 'covariance_matrix', as follows:-

* mean_vector - the mean of the 'output_vector'
* covariance_matrix - the covariance of the 'output_vector' elements
    
The 'output_vector' must have the same number of dimensions as the 'mean_vector'. Also the 'covariance_matrix' needs to be |spd| with a matching dimensionality. See :ref:`matrices` for examples of how to define the covariance matrix.
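Sampling from a Multivariate Normal is typically done via a Cholesky factorisation of the covariance matrix. A minimal 2-dimensional Python sketch (not |popy| syntax), assuming the covariance matrix is |spd|:-

.. code-block:: python

    import math
    import random

    def chol2(cov):
        """Lower triangular Cholesky factor L of a 2x2 SPD matrix, cov = L @ L.T."""
        a, b, c = cov[0][0], cov[0][1], cov[1][1]
        l00 = math.sqrt(a)
        l10 = b / l00
        l11 = math.sqrt(c - l10 * l10)
        return [[l00, 0.0], [l10, l11]]

    def mnorm_sample(mean, cov, rng=random):
        """One draw from a 2-d multivariate normal via x = mean + L z,
        where z is a vector of independent standard normal draws."""
        L = chol2(cov)
        z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        return [mean[0] + L[0][0] * z[0],
                mean[1] + L[1][0] * z[0] + L[1][1] * z[1]]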
    
For more information see :wiki_link:`Multivariate Normal Distribution on Wikipedia <Multivariate_normal_distribution>`.

.. _mnorm_dist_re_example:

Multivariate Normal Random Effect Example
------------------------------------------
    
You can use the :ref:`mnorm_dist` in the |effects| section of a |popy| script, to define a vector of |rx| |res| variables as follows:-
    
.. code-block:: pyml
    
    EFFECTS:
        ID: |
            r[KA,CL,V] ~ mnorm([0, 0, 0], f[KA_isv,CL_isv,V_isv])
        
Here the :pyml:`r[KA,CL,V]` variable is defined as a 3 element vector with mean zero. :pyml:`[0,0,0]` is a 3 element 'mean_vector' and :pyml:`f[KA_isv,CL_isv,V_isv]` is a 3x3 'covariance_matrix'. The :pyml:`f[KA_isv,CL_isv,V_isv]` matrix can be a diagonal or square symmetric matrix, see :ref:`matrices`.

The :pyml:`r[KA,CL,V]` is defined at the 'ID' level, so each individual in the population has an independent sample of this multivariate normal distribution.


.. _bern_dist:
    
Bernoulli Distribution
=========================

The Bernoulli is a univariate discrete distribution used to model binary variables, and is written in |popy| as:-

.. code-block:: pyml

    y ~ bernoulli(prob_success)

The Bernoulli models the distribution of a single Bernoulli trial. 

The input parameters are:-

* prob_success - probability of success of the Bernoulli trial
    
The output 'y' is a binary value, |ie| either 1 for success or 0 for failure. 'prob_success' is a real valued number in the range [0,1].
    
For more information see :wiki_link:`Bernoulli Distribution on Wikipedia <Bernoulli_distribution>`.
    
.. _bern_dist_example:

Bernoulli Likelihood Example
-----------------------------
    
You can use the :ref:`bern_dist` in the |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        conc = s[X]/m[V]
        p[DV_BERN] = 1.0 / (1.0 + exp(-conc))
        c[DV_BERN] ~ bernoulli(p[DV_BERN])
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[DV_BERN]` binary observation from the |data_file|, when modelled as a Bernoulli variable, with success rate dependent on 'conc' via a logistic transform.
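The logistic transform and the Bernoulli likelihood above can be sketched in plain Python (not |popy| syntax), with a hypothetical 'conc' value:-

.. code-block:: python

    import math

    def bernoulli_loglik(y, p):
        """Log likelihood of a binary observation y under a Bernoulli(p)."""
        return math.log(p) if y == 1 else math.log(1.0 - p)

    conc = 0.7                            # hypothetical concentration value
    p = 1.0 / (1.0 + math.exp(-conc))     # logistic transform into [0, 1]
    ll_success = bernoulli_loglik(1, p)
    ll_failure = bernoulli_loglik(0, p)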
    
.. _poisson_dist:
    
Poisson Distribution
======================

The Poisson is a univariate discrete distribution used to model count variables, written in |popy| as:-

.. code-block:: pyml

    y ~ poisson(lambda)

The Poisson models the distribution of the number of events occurring within a fixed time interval, if each individual event occurs independently and at constant rate 'lambda'. 

The input parameters are:-

* lambda - the expected number of occurrences within the time interval
    
The output 'y' is the observed count, |ie| a non-negative integer value. 'lambda' is a positive real valued number, which represents the mean rate of event occurrence.
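The Poisson probability mass function can be written in plain Python (not |popy| syntax) as:-

.. code-block:: python

    import math

    def poisson_pmf(k, lam):
        """Probability of observing k events when the expected count is lam."""
        return lam ** k * math.exp(-lam) / math.factorial(k)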
    
For more information see :wiki_link:`Poisson Distribution on Wikipedia <Poisson_distribution>`.
    
.. _poisson_dist_example:

Poisson Likelihood Example
----------------------------
    
You can use the :ref:`poisson_dist` in the |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        c[COUNT] ~ poisson(m[LAMBDA])
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[COUNT]` count observation from the |data_file|, when modelled as a Poisson variable with estimated rate parameter :pyml:`m[LAMBDA]`.
    
    
.. _bin_dist:
    
Binomial Distribution 
================================

The binomial is a univariate discrete distribution, written in |popy| as:-

.. code-block:: pyml

    num_successes ~ binomial(prob_success, num_trials)
    
The binomial models the distribution of the number of successes given a fixed number of independent :ref:`Bernoulli<bern_dist>` trials.
    
The input parameters are:-

* prob_success - probability of success of each Bernoulli trial
* num_trials - number of Bernoulli trials

Here the output 'num_successes' is an integer. 'num_trials' is also an integer and 'prob_success' is a real valued number in the range [0,1].
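The binomial probability mass function can be written in plain Python (not |popy| syntax) as:-

.. code-block:: python

    import math

    def binomial_pmf(k, n, p):
        """Probability of k successes in n independent Bernoulli(p) trials."""
        return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)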
    
For more information see :wiki_link:`Binomial Distribution on Wikipedia <Binomial_distribution>`. 

.. _bin_dist_example:

Binomial Likelihood Example
--------------------------------------
    
You can use the :ref:`bin_dist` in |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        conc = s[X]/m[V]
        p[DV_B] = 1.0 / (1.0 + exp(-conc))
        c[DV_B] ~ binomial(p[DV_B], c[N_OBS])
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[DV_B]` count data from the |data_file| when modelled as the number of successes in :pyml:`c[N_OBS]` trials. Here the success rate depends on 'conc' via a logistic transform. 
    
.. _neg_bin_dist:
    
Negative Binomial Distribution 
================================

The negative binomial is a univariate discrete distribution, written in |popy| as:-

.. code-block:: pyml

    num_fails ~ negbinomial(prob_success, num_of_successes)
    
The negative binomial models the distribution of the number of failures for a series of independent :ref:`Bernoulli<bern_dist>` trials until the success count reaches 'num_of_successes'.
    
The input parameters are:-

* prob_success - probability of success of each Bernoulli trial
* num_of_successes - the number of successes at which the sequence of trials stops

Here the output 'num_fails' is an integer. 'num_of_successes' is also an integer and 'prob_success' is a real valued number in the range [0,1].
    
For more information see :wiki_link:`Negative Binomial Distribution on Wikipedia <Negative_binomial_distribution>`. However note that the Wikipedia page inverts the definition of success/failure. In practice there are many ways of parameterising the negative binomial; |popy| uses the |scipy| parameterisation described here:-

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.nbinom.html
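Under this convention, the probability of observing 'num_fails' failures before the 'num_of_successes'-th success can be written in plain Python (not |popy| syntax) as:-

.. code-block:: python

    import math

    def negbinomial_pmf(k, prob_success, num_of_successes):
        """Probability of k failures before the num_of_successes-th success,
        each trial succeeding independently with probability prob_success
        (the scipy.stats.nbinom convention)."""
        n, p = num_of_successes, prob_success
        return math.comb(k + n - 1, k) * p ** n * (1.0 - p) ** k

With 'num_of_successes' equal to one this reduces to the geometric distribution.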

.. _neg_bin_dist_example:

Negative Binomial Likelihood Example
--------------------------------------
    
You can use the :ref:`neg_bin_dist` in |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        conc = s[X]/m[V]
        p[DV_NB] = 1.0 / (1.0 + exp(-conc))
        c[DV_NB] ~ negbinomial(p[DV_NB], 1)
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[DV_NB]` count data from the |data_file| when modelled as the number of failures of a Bernoulli variable (with success rate dependent on 'conc' via a logistic transform) until the occurrence of the first success. 
         
    
    
..  comment 
    Hide custom likelihood examples for now cos not exposing use_laplacian currently.
    
    .. _custom_lik_dists:
         
    Custom Likelihood Distributions
    ==================================

    In the |predictions| section of a :ref:`fit_script` can specify your own customised log likelihood distribution using the syntax:-

    .. code-block:: pyml

        log_lik ~ custom(expression)
        
    For example :-
          
    .. code-block:: pyml
        
        PREDICTIONS: 
            conc = s[X]/m[V]
            p[DV_BERN] = 1.0 / (1.0+ exp(-conc))
            c[DV_BERN] ~ custom(-2*log(p[DV_BERN]))
          
    Is equivalent to using the inbuilt :ref:`bern_dist` as follows:-
          
    .. code-block:: pyml
        
        PREDICTIONS: 
            conc = s[X]/m[V]
            p[DV_BERN] = 1.0 / (1.0+ exp(-conc))
            c[DV_BERN] ~ bernoulli(p[DV_BERN])

    Note here the custom log likelihood is expressed as:-

    .. math::

        -2 * log(p)
        
    Where :math:`p` is the :wiki_link:`Probability mass function on Wikipedia <Probability_mass_function>` for the distribution used to compute the likelihood.
        
.. comment
    using custom() silently switches to using the use_laplacian: True option in JOE (which is a bit sneaky) - and makes the objective function comparison invalid. Maybe it would be better to expose 'use_laplacian' in the binary version and throw an informative error if the ~custom() function is used?
    
.. comment
    and more custom() examples e.g. for normals etc, a continuous pdf version.
    