An analytic approach to probability arises naturally if we map the sample space to a mathematical structure suitable for analysis; this is the motivation for random variables. When extending a deterministic variable to a stochastic one, the "first order uncertainty" is variance, not expectation.

**Random variable** $\chi$ is a measurable mapping from a probability space to a measurable space:
$\chi \in \mathcal{M}(\Omega, \Sigma; X, \Sigma_X)$ and $P(\Omega) = 1$.
The codomain of a random variable is typically a Euclidean space with the Lebesgue sigma-algebra,
or a Banach or Hilbert space with the Borel sigma-algebra:
$(\mathbb{R}^n, \mathcal{L})$, $(H, \mathcal{B}(\mathcal{T}_d))$.
**Induced sigma-algebra** $\Sigma_\chi$ on the domain by a random variable
is the collection of preimages of measurable sets in the codomain:
$\Sigma_\chi := \{\chi^{-1}(B) : B \in \Sigma_X\}$.

**Distribution** $\mu: \Sigma_X \to [0,1]$ of a random variable
is the probability measure induced on its codomain:
$\forall B \in \Sigma_X$, $\mu(B) = P(\chi^{-1}(B))$.
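
The induced measure can be checked by hand on a finite probability space. The sketch below is illustrative only: the two-dice sample space and the names `omega`, `P`, `chi`, and `distribution` are assumptions made for the example, not part of the definitions above.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, with the uniform measure.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def chi(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]


def distribution(B):
    """Induced measure mu(B) = P(chi^{-1}(B)) for a set B of values."""
    preimage = [w for w in omega if chi(w) in B]
    return sum((P[w] for w in preimage), Fraction(0))


mu_seven = distribution({7})             # measure of the event "sum is 7"
mu_all = distribution(set(range(2, 13)))  # measure of the whole codomain
```

Exact rational arithmetic via `fractions.Fraction` keeps the total mass exactly 1, which floating point would only approximate.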

**Cumulative distribution function** (CDF) $F_\chi(x)$ of a real random variable
is the real function that gives the probability measure of the half-line to the left of each value:
$F_\chi(x) = \mu((-\infty, x])$.
The CDF is a convenient representation of the distribution:
it always exists, and it determines the distribution uniquely if the sigma-algebra on the codomain is Borel.
**Probability density function** (PDF) $f_\chi (x)$ of a real random variable
is the derivative of the cumulative distribution function, if it exists:
$f_\chi (x) = \mathrm{d} F_\chi (x) / \mathrm{d} x$.
**Probability mass function** (PMF) $f_\chi (x)$ of a discrete random variable
is the real function that assigns each value of the random variable its induced measure:
$f_\chi (x) = P(\chi^{-1}(\{x\}))$.
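
The relation between CDF and PDF can be verified numerically. The following is a minimal sketch assuming a unit-rate exponential random variable, whose CDF is $1 - e^{-x}$ for $x > 0$; the function names and the step size `h` are hypothetical choices for the example.

```python
import math


def cdf(x):
    """CDF of a unit-rate exponential random variable (assumed example)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def pdf_numeric(x, h=1e-5):
    """Central-difference approximation of dF/dx at x."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


approx = pdf_numeric(1.0)
exact = math.exp(-1.0)  # known density e^{-x} evaluated at x = 1
```

At $x = 0$ this CDF has a kink, so the derivative, and hence the PDF, does not exist at that single point.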

**Expectation** $\mathbb{E} \chi$ of a random variable is its Lebesgue integral:
$\mathbb{E} \chi = \int_\Omega \chi~\mathrm{d}P$.
The Lebesgue integral provides a uniform definition
for the expectation of discrete and continuous random variables,
and ensures completeness of function spaces, e.g. Banach and Hilbert spaces of functions.

Theorem (change of variables): The Lebesgue integral of a real random variable on a probability space equals the Stieltjes integral of the identity function w.r.t. the cumulative distribution function: $\int_\Omega \chi~\mathrm{d} P = \int_X x~\mathrm{d} \mu = \int_{\mathbb{R}} x~\mathrm{d} F_\chi$.
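
On a finite probability space both integrals reduce to sums, so the theorem can be checked directly. The sketch below assumes three independent flips of a biased coin with $P(\text{heads}) = 1/3$; the setup and the names `omega`, `P`, `chi`, `mu` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

p_heads = Fraction(1, 3)               # assumed bias of the coin
omega = list(product("HT", repeat=3))  # sample space of three flips


def P(w):
    """Probability of a single outcome under independent flips."""
    prob = Fraction(1)
    for c in w:
        prob *= p_heads if c == "H" else 1 - p_heads
    return prob


def chi(w):
    """Random variable: number of heads in the outcome."""
    return w.count("H")


# Left-hand side: integrate chi over the sample space w.r.t. P.
lhs = sum(chi(w) * P(w) for w in omega)

# Right-hand side: build the induced distribution mu on the codomain,
# then integrate the identity function w.r.t. mu.
mu = {}
for w in omega:
    mu[chi(w)] = mu.get(chi(w), Fraction(0)) + P(w)
rhs = sum(x * m for x, m in mu.items())
```

Both sums give the same value; the second one never refers to the sample space once `mu` is known.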

**Characteristic function** of a random variable
can be thought of as the Fourier transform of the PDF, but unlike the PDF it always exists:
(1) scalar form: $\varphi_\chi (t) \equiv \mathbb{E} e^{it\chi} =
\int_{\mathbb{R}} e^{itx}~\mathrm{d} \mu(x)$;
(2) vector form: $\Phi_{\mathbf{\chi}}(\mathbf{w}) \equiv \mathbb{E}e^{i \mathbf{w}^T \mathbf{\chi}}
= \mathcal{F} f_{\mathbf{\chi}}(\mathbf{x})$, where the last equality requires the PDF to exist.
The characteristic function uniquely determines the distribution of a random variable;
if the characteristic function is integrable, then the PDF exists and is recovered by the inverse transform:
$f_{\mathbf{\chi}}(\mathbf{x}) = \mathcal{F}^{-1} \Phi_{\mathbf{\chi}}(\mathbf{w})$.
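
The scalar form can be evaluated by numerical quadrature. As a rough sketch, assume a standard normal random variable, whose characteristic function is known to be $e^{-t^2/2}$; the integration bounds, grid size, and function names below are hypothetical choices for the example.

```python
import cmath
import math


def normal_pdf(x):
    """Density of the standard normal distribution (assumed example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def char_fn(t, lo=-10.0, hi=10.0, n=20_000):
    """Trapezoid-rule approximation of E exp(i t chi) on [lo, hi]."""
    h = (hi - lo) / n
    total = 0j
    for j in range(n + 1):
        x = lo + j * h
        weight = 0.5 if j in (0, n) else 1.0  # half weight at the endpoints
        total += weight * cmath.exp(1j * t * x) * normal_pdf(x)
    return total * h


phi_one = char_fn(1.0)       # compare with exp(-1/2)
closed_form = math.exp(-0.5)
```

The imaginary part is numerically zero here because the density is symmetric about the origin.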

Weak convergence of random variables implies pointwise convergence of the corresponding characteristic functions.

If a random variable has moments up to the $k$-th order, then the characteristic function is $k$ times continuously differentiable on the entire real line. If a characteristic function has a $k$-th derivative at 0, then the random variable has moments up to the $k$-th order if $k$ is even, and up to the $(k-1)$-th order if $k$ is odd. The $k$-th moment can be computed as $\mathbb{E} \chi^k = (-i)^k \varphi_\chi^{(k)} (0)$, if the right-hand side is well defined.
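
The moment formula can be illustrated with finite differences. The sketch below assumes a unit-rate exponential random variable, whose characteristic function is $1/(1 - it)$ and whose $k$-th moment is $k!$; the helper `moment_from_cf` and the step size `h` are hypothetical, and only the first two moments are handled.

```python
def phi(t):
    """Characteristic function of a unit-rate exponential (assumed example)."""
    return 1.0 / (1.0 - 1j * t)


def moment_from_cf(cf, k, h=1e-3):
    """Approximate the k-th moment as (-i)^k times the k-th derivative at 0."""
    if k == 1:
        deriv = (cf(h) - cf(-h)) / (2 * h)              # central first difference
    elif k == 2:
        deriv = (cf(h) - 2 * cf(0.0) + cf(-h)) / h**2   # central second difference
    else:
        raise ValueError("this sketch only supports k = 1 or k = 2")
    return ((-1j) ** k * deriv).real


m1 = moment_from_cf(phi, 1)  # should be close to 1! = 1
m2 = moment_from_cf(phi, 2)  # should be close to 2! = 2
```

Finite differences lose accuracy quickly for higher derivatives, so a symbolic or automatic differentiation tool would be a better fit beyond low orders.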

Table: Standard Form of Dominant Moments

Name | Definition | Interpretation | Dimension | Range† |
---|---|---|---|---|
mean | first raw moment | central tendency | same as the variable | $(-\infty, \infty)$ |
standard deviation | square root of the second central moment | variation | same as the variable | $[0,\infty)$ |
skewness | normalized third central moment | lopsidedness | dimensionless | $(-\infty, \infty)$ |
excess kurtosis | normalized fourth central moment minus 3, so that the normal distribution has value 0 | (for symmetric distributions) probability concentration at the center and tails versus around one standard deviation from the mean | dimensionless | $[-2, \infty)$ |

† If it exists.
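
The four quantities in the table can be computed directly for a discrete distribution. The following is a minimal sketch; the function `standardized_moments` and the fair-coin example are assumptions for illustration. A fair coin is a convenient test case because it attains the lower bound $-2$ of excess kurtosis.

```python
import math


def standardized_moments(values, probs):
    """Return (mean, standard deviation, skewness, excess kurtosis)."""
    mean = sum(p * x for x, p in zip(values, probs))

    def central(k):
        # k-th central moment of the discrete distribution
        return sum(p * (x - mean) ** k for x, p in zip(values, probs))

    std = math.sqrt(central(2))
    skewness = central(3) / std**3
    excess_kurtosis = central(4) / std**4 - 3.0
    return mean, std, skewness, excess_kurtosis


# Fair coin taking values 0 and 1 with equal probability.
mean, std, skew, exkurt = standardized_moments([0.0, 1.0], [0.5, 0.5])
```

The function divides by the standard deviation, so it is undefined for a degenerate (constant) random variable.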

Classification of positive random variables by concentration: [@Taleb2018]

- compact support;
- sub-Gaussian: $\exists a > 0: 1 - F(x) = \mathcal{O}(e^{-ax^2})$ as $x \to \infty$;
- Gaussian;
- sub-exponential: no exponential moment; sum dominated by the maximum for large values [@Embrechts1979];
- power law (p>3): finite mean & variance;
- power law (2<p≤3): finite mean;
- power law (1<p≤2);