.. _distributions:

Probability Distributions
###########################

The distributions available for use in |popy| models are shown in :numref:`table_prob_dists`:-

.. _table_prob_dists:

.. list-table:: Probability Distributions
    :header-rows: 1
    
    * - Name
      - Syntax
      - Type
         
    * - :ref:`Uniform <unif_dist>`
      - x ~ unif(min_x, max_x) init_x
      - continuous univariate

    * - :ref:`Normal <norm_dist>`
      - y ~ norm(mean, var)
      - continuous univariate
      
    * - :ref:`Multivariate Normal <mnorm_dist>`
      - y_vec ~ mnorm(mean_vec, var_mat)
      - continuous multivariate
            
    * - :ref:`Bernoulli <bern_dist>`
      - y ~ bernoulli(p)
      - discrete univariate
      
    * - :ref:`Poisson <poisson_dist>`
      - y ~ poisson(lambda)
      - discrete univariate
      
    * - :ref:`Negative Binomial <neg_bin_dist>`
      - y ~ negbinomial(p, r)
      - discrete univariate
      

      

.. _unif_dist:
    
Uniform Distribution
======================

The Uniform is a continuous univariate distribution, written in |popy| as:-

.. code-block:: pyml

    x ~ unif(min_x, max_x) init_x

The uniform distribution is used to define a range of values for an unknown scalar that you wish |popy| to estimate.

The input parameters are:-

* min_x - the **minimum** value that variable 'x' is allowed to take during estimation.
* max_x - the **maximum** value that variable 'x' is allowed to take during estimation.
* init_x - the **initial** value that variable 'x' takes at the start of estimation. 
    
The output 'x' and inputs 'min_x', 'max_x', 'init_x' are all continuous values.
    
For more information see :wiki_link:`Uniform Distribution on Wikipedia <Uniform_distribution>`.
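
As an illustrative aside (plain Python with SciPy, not |popy| syntax), the same min/max parameterisation maps onto SciPy's ``uniform``, which takes ``loc`` and ``scale`` instead:-

.. code-block:: python

    from scipy.stats import uniform

    min_x, max_x = 0.001, 100.0

    # scipy parameterises the uniform by loc (= min) and scale (= max - min)
    dist = uniform(loc=min_x, scale=max_x - min_x)

    # the density is constant, 1 / (max_x - min_x), inside the bounds ...
    inside = dist.pdf(50.0)

    # ... and zero outside them
    outside = dist.pdf(200.0)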

.. _uniform_dist_example:

Uniform Distribution Examples
--------------------------------
    
You use the :ref:`unif_dist` in the |effects| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    f[KE] ~ unif(0.001, 100) 0.05
        
The above expression limits the :pyml:`f[KE]` variable to the range [0.001, 100], with an initial value of 0.05.

Alternatively you can do:-

.. code-block:: pyml
    
    f[KE] ~ unif(0.001, +inf) 0.05

This limits :pyml:`f[KE]` to values greater than 0.001. Note that an equivalent shortcut is available as follows:-

.. code-block:: pyml
    
    f[KE] ~ P 0.05
    
Where 'P' stands for positive. You can also have an unconstrained variable as follows:-

.. code-block:: pyml
    
    f[KE] ~ U 0.05
    
Where 'U' stands for unlimited. The equivalent long form is:-
    
.. code-block:: pyml
    
    f[KE] ~ unif(-inf, +inf) 0.05
    
    
.. _norm_dist:

Normal Distribution
=====================

The Normal distribution is used for continuous variables and is written in |popy| as:-

.. code-block:: pyml

    x ~ norm(mean, var)

The Normal is a Gaussian distribution with two parameters, 'mean' and 'var'.

The input parameters are:-

* mean - the expected value of the Normal
* var - the variance of the Normal
    
The output 'x' and inputs 'mean', 'var' are all continuous values.
    
For more information see :wiki_link:`Normal Distribution on Wikipedia <Normal_distribution>`.
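
Note that the second parameter is the variance, not the standard deviation. As an illustrative aside (plain Python with SciPy, not |popy| syntax), the density can be checked against SciPy's ``norm``, which instead takes the standard deviation as its ``scale``:-

.. code-block:: python

    import numpy as np
    from scipy.stats import norm

    mean, var = 2.0, 4.0

    # scipy's norm takes the standard deviation, so pass sqrt(var)
    dist = norm(loc=mean, scale=np.sqrt(var))

    # evaluate the Gaussian density directly from the formula for comparison
    x = 3.0
    pdf_manual = np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    pdf_scipy = dist.pdf(x)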

.. _norm_dist_re_example:

Normal Random Effect Example
-----------------------------
    
You can use the :ref:`norm_dist` in the |effects| section of a |popy| script, to define a |rx| |re| variable as follows:-
    
.. code-block:: pyml
    
    EFFECTS: 
        ID: |
            r[KE] ~ norm(0, f[KE_isv])
        
Here the :pyml:`r[KE]` scalar variable is defined as a normal with mean zero and positive scalar variance :pyml:`f[KE_isv]`.

:pyml:`r[KE]` is defined at the 'ID' level, so each individual in the population has an independent sample of this normal distribution.

.. _norm_dist_lik_example:

Normal Likelihood Example
----------------------------
    
You can use the :ref:`norm_dist` in the |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        p[DV_CENTRAL] = s[CENTRAL]/m[V1]
        var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2
        c[DV_CENTRAL] ~ norm(p[DV_CENTRAL], var)
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[DV_CENTRAL]` values from the |data_file|, when modelled as a Normal variable with mean :pyml:`p[DV_CENTRAL]` and variance 'var'.
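
The combined additive plus proportional error model used for 'var' above can be sketched in plain Python (hypothetical helper, not |popy| syntax):-

.. code-block:: python

    def combined_error_var(pred, anoise, pnoise):
        """Variance of a combined additive + proportional error model,
        mirroring: var = m[ANOISE]**2 + m[PNOISE]**2 * p[DV_CENTRAL]**2."""
        return anoise ** 2 + pnoise ** 2 * pred ** 2

    # the additive term dominates at low predictions ...
    low = combined_error_var(pred=0.01, anoise=0.1, pnoise=0.2)

    # ... and the proportional term dominates at high predictions
    high = combined_error_var(pred=100.0, anoise=0.1, pnoise=0.2)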
      

.. _mnorm_dist:
    
Multivariate Normal Distribution
==================================

The Multivariate Normal distribution is used for vectors of continuous variables and is written in |popy| as:-

.. code-block:: pyml

    output_vector ~ mnorm(mean_vector, covariance_matrix)
    
The Multivariate Normal is a generalisation of the :ref:`norm_dist` with two parameters 'mean_vector' and 'covariance_matrix', as follows:-

* mean_vector - the mean of the 'output_vector'
* covariance_matrix - the covariance of the 'output_vector' elements
    
The 'output_vector' must have the same number of dimensions as the 'mean_vector'. Also the 'covariance_matrix' needs to be |spd| with a matching dimensionality. See :ref:`matrices` for examples of how to define the covariance matrix.
    
For more information see :wiki_link:`Multivariate Normal Distribution on Wikipedia <Multivariate_normal_distribution>`.
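
As an illustrative aside (plain Python with NumPy, not |popy| syntax), drawing from a multivariate normal requires exactly this pairing of a mean vector with an |spd| covariance matrix of matching dimension:-

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    mean_vec = np.zeros(3)
    # the covariance must be symmetric positive definite, with dimensions
    # matching the mean vector
    cov_mat = np.array([[1.0, 0.5, 0.0],
                        [0.5, 2.0, 0.3],
                        [0.0, 0.3, 1.5]])

    samples = rng.multivariate_normal(mean_vec, cov_mat, size=100_000)

    # the sample covariance approaches cov_mat as the sample size grows
    sample_cov = np.cov(samples, rowvar=False)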

.. _mnorm_dist_re_example:

Multivariate Normal Random Effect Example
------------------------------------------
    
You can use the :ref:`mnorm_dist` in the |effects| section of a |popy| script, to define a vector of |rx| |res| variables as follows:-
    
.. code-block:: pyml
    
    EFFECTS:
        ID: |
            r[KA,CL,V] ~ mnorm([0, 0, 0], f[KA_isv,CL_isv,V_isv])
        
Here the :pyml:`r[KA,CL,V]` variable is defined as a 3 element vector with mean zero. :pyml:`[0,0,0]` is a 3 element 'mean_vector' and :pyml:`f[KA_isv,CL_isv,V_isv]` is a 3x3 'covariance_matrix'. The :pyml:`f[KA_isv,CL_isv,V_isv]` matrix can be a diagonal or square symmetric matrix, see :ref:`matrices`.

The :pyml:`r[KA,CL,V]` vector is defined at the 'ID' level, so each individual in the population has an independent sample of this multivariate normal distribution.


.. _bern_dist:
    
Bernoulli Distribution
=========================

The Bernoulli is a univariate discrete distribution used to model binary variables, written in |popy| as:-

.. code-block:: pyml

    y ~ bernoulli(prob_success)

The Bernoulli models the distribution of a single Bernoulli trial. 

The input parameters are:-

* prob_success - the probability of success of the Bernoulli trial
    
The output 'y' is a binary value, |ie| either 1 for success or 0 for failure. 'prob_success' is a real valued number in the range [0,1].
    
For more information see :wiki_link:`Bernoulli Distribution on Wikipedia <Bernoulli_distribution>`.
    
.. _bern_dist_example:

Bernoulli Likelihood Example
-----------------------------
    
You can use the :ref:`bern_dist` in the |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        conc = s[X]/m[V]
        p[DV_BERN] = 1.0 / (1.0+ exp(-conc))
        c[DV_BERN] ~ bernoulli(p[DV_BERN])
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[DV_BERN]` binary observation from the |data_file|, when modelled as a Bernoulli variable, with success rate dependent on 'conc' via a logistic transform.
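
The logistic transform and Bernoulli log likelihood above can be sketched in plain Python (hypothetical values, not |popy| syntax):-

.. code-block:: python

    import numpy as np

    def bernoulli_log_lik(y, p):
        """Log likelihood of a binary observation y given success probability p."""
        return y * np.log(p) + (1 - y) * np.log(1 - p)

    conc = 1.5                            # hypothetical concentration value
    p = 1.0 / (1.0 + np.exp(-conc))       # logistic transform, as in the script

    ll_success = bernoulli_log_lik(1, p)  # observed y = 1
    ll_failure = bernoulli_log_lik(0, p)  # observed y = 0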
    
.. _poisson_dist:
    
Poisson Distribution
======================

The Poisson is a discrete univariate distribution used to model count variables, written in |popy| as:-

.. code-block:: pyml

    y ~ poisson(lambda)

The Poisson models the distribution of the number of events occurring within a fixed time interval, if each individual event occurs independently and at constant rate 'lambda'. 

The input parameters are:-

* lambda - the expected number of occurrences within the time interval
    
The output 'y' is the observed count, |ie| a non-negative integer value. 'lambda' is a positive real valued number, which represents the mean rate of event occurrence.
    
For more information see :wiki_link:`Poisson Distribution on Wikipedia <Poisson_distribution>`.
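
As an illustrative aside (plain Python with SciPy, not |popy| syntax), the Poisson probability mass function can be evaluated directly from its formula:-

.. code-block:: python

    import math

    from scipy.stats import poisson

    lam, k = 3.0, 2

    # P(Y = k) = exp(-lambda) * lambda**k / k!
    pmf_manual = math.exp(-lam) * lam ** k / math.factorial(k)
    pmf_scipy = poisson.pmf(k, lam)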
    
.. _poisson_dist_example:

Poisson Likelihood Example
----------------------------
    
You can use the :ref:`poisson_dist` in the |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        c[COUNT] ~ poisson(m[LAMBDA])
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[COUNT]` values from the |data_file|, when modelled as Poisson counts with estimated rate parameter :pyml:`m[LAMBDA]`.
    
    
.. _neg_bin_dist:
    
Negative Binomial Distribution 
================================

The negative binomial is a univariate discrete distribution, written in |popy| as:-

.. code-block:: pyml

    num_successes ~ negbinomial(prob_success, num_of_fails)
    
The negative binomial models the distribution of the number of successes in a series of independent :ref:`Bernoulli <bern_dist>` trials before the failure count reaches 'num_of_fails'.
    
The input parameters are:-

* prob_success - the probability of success of each Bernoulli trial
* num_of_fails - the number of failed Bernoulli trials after which the sequence stops

Here the output 'num_successes' is a non-negative integer, 'num_of_fails' is a positive integer and 'prob_success' is a real valued number in the range [0,1].
    
For more information see :wiki_link:`Negative Binomial Distribution on Wikipedia <Negative_binomial_distribution>`.
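
Assuming the convention above (successes counted until the failure count reaches 'num_of_fails'), the distribution maps onto SciPy's ``nbinom`` with the success and failure roles swapped (a sketch in plain Python, not |popy| syntax):-

.. code-block:: python

    from scipy.stats import nbinom

    p_success, num_of_fails = 0.25, 4

    # scipy's nbinom(n, p) counts failures before the n-th success, so under
    # the convention above we swap the roles of success and failure
    dist = nbinom(n=num_of_fails, p=1.0 - p_success)

    # expected number of successes: p * r / (1 - p)
    mean_successes = dist.mean()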

.. _neg_bin_dist_example:

Negative Binomial Likelihood Example
--------------------------------------
    
You can use the :ref:`neg_bin_dist` in the |predictions| section of a |popy| :ref:`fit_script` as follows:-
    
.. code-block:: pyml
    
    PREDICTIONS: 
        conc = s[X]/m[V]
        p[DV_NB] = 1.0 / (1.0 + exp(-conc))
        c[DV_NB] ~ negbinomial(p[DV_NB], 1)
        
The above syntax in a :ref:`fit_script` specifies the likelihood of the observed :pyml:`c[DV_NB]` count data from the |data_file|, when modelled as the number of successes of a Bernoulli variable (with success rate dependent on 'conc' via a logistic transform) before the first failure occurs.
         
    
    
..  comment 
    Hide custom likelihood examples for now cos not exposing use_laplacian currently.
    
    .. _custom_lik_dists:
         
    Custom Likelihood Distributions
    ==================================

    In the |predictions| section of a :ref:`fit_script` you can specify your own customised log likelihood distribution using the syntax:-

    .. code-block:: pyml

        log_lik ~ custom(expression)
        
    For example :-
          
    .. code-block:: pyml
        
        PREDICTIONS: 
            conc = s[X]/m[V]
            p[DV_BERN] = 1.0 / (1.0+ exp(-conc))
            c[DV_BERN] ~ custom(-2*log(p[DV_BERN]))
          
    Is equivalent to using the inbuilt :ref:`bern_dist` as follows:-
          
    .. code-block:: pyml
        
        PREDICTIONS: 
            conc = s[X]/m[V]
            p[DV_BERN] = 1.0 / (1.0+ exp(-conc))
            c[DV_BERN] ~ bernoulli(p[DV_BERN])

    Note here the custom log likelihood is expressed as:-

    .. math::

        -2 * log(p)
        
    Where :math:`p` is the :wiki_link:`Probability mass function on Wikipedia <Probability_mass_function>` for the distribution used to compute the likelihood.
        
.. comment
    using custom() silently switches to using the use_laplacian: True option in JOE (which is a bit sneaky) - and makes the objective function comparison invalid. Maybe it would be better to expose 'use_laplacian' in the binary version and throw an informative error if the ~custom() function is used?
    
.. comment
    and more custom() examples e.g. for normals etc, a continuous pdf version.
    