Important asymptotic theorems
Machine learning algorithms are very popular. However, many of them do not deliver stable, consistent performance, because they are not grounded in statistical inference. As a result, the statistical theory of estimation, established over hundreds of years, is becoming a more and more interesting research direction.
In this blog, I will introduce a few important asymptotic theorems that are fundamental for proving properties of machine learning methods, such as SVM and Markov chains.
Fatou-Lebesgue Lemma:
If the random variables $X_n \geq 0$ almost surely for all $n$, then
$$E\Big[\liminf_{n\to\infty} X_n\Big] \leq \liminf_{n\to\infty} E[X_n].$$
By using the Fatou-Lebesgue Lemma, we can prove (a) the Monotone Convergence Theorem and (b) the Lebesgue Dominated Convergence Theorem.
(a) Monotone Convergence Theorem: If $\{X_n\}$ is a sequence of nonnegative measurable functions with $0 \leq X_n \uparrow X$ almost surely, then
$$\lim_{n\to\infty} E[X_n] = E[X].$$
(b) Lebesgue Dominated Convergence Theorem: If the random variables $X_n \to X$ almost surely, and there is an integrable random variable $Y$ such that $|X_n| \leq Y$ almost surely for all $n$, then $X$ is integrable, $\lim_{n\to\infty} E[X_n] = E[X]$, and $\lim_{n\to\infty} E|X_n - X| = 0$.
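To see how the lemma gives (b), here is a minimal sketch (assuming, as above, $X_n \to X$ a.s. with $|X_n| \leq Y$ and $Y$ integrable): apply the lemma to the two nonnegative sequences $Y + X_n$ and $Y - X_n$.

```latex
% Sketch: Fatou-Lebesgue implies the Dominated Convergence Theorem.
% Apply the lemma to Y + X_n >= 0 and Y - X_n >= 0:
\begin{align*}
E[Y] + E[X] = E\big[\liminf_n (Y + X_n)\big]
  &\le \liminf_n E[Y + X_n] = E[Y] + \liminf_n E[X_n],\\
E[Y] - E[X] = E\big[\liminf_n (Y - X_n)\big]
  &\le \liminf_n E[Y - X_n] = E[Y] - \limsup_n E[X_n].
\end{align*}
% Since E[Y] < \infty, cancel it on both sides:
%   \limsup_n E[X_n] <= E[X] <= \liminf_n E[X_n],
% hence \lim_n E[X_n] = E[X].
```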
Partial convergence relations:
Almost sure convergence implies convergence in probability, which in turn implies convergence in distribution (law); convergence in $L^p$ also implies convergence in probability:
$$X_n \xrightarrow{a.s.} X \;\Longrightarrow\; X_n \xrightarrow{P} X \;\Longrightarrow\; X_n \xrightarrow{d} X, \qquad X_n \xrightarrow{L^p} X \;\Longrightarrow\; X_n \xrightarrow{P} X.$$
The converse implications fail in general.
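As a classical counterexample (my addition, not spelled out in the original post) showing that convergence in probability does not imply almost sure convergence, take independent indicators with slowly decaying probabilities; the argument uses the divergence half of the Borel-Cantelli lemma stated next.

```latex
% In-probability convergence without a.s. convergence:
% take independent X_n with P(X_n = 1) = 1/n, P(X_n = 0) = 1 - 1/n.
\[
P(|X_n| > \varepsilon) = \tfrac{1}{n} \to 0
\quad\Longrightarrow\quad X_n \xrightarrow{P} 0,
\qquad\text{but}\qquad
\sum_{n} P(X_n = 1) = \sum_{n} \tfrac{1}{n} = \infty,
\]
% so by the second Borel-Cantelli lemma (independence), X_n = 1
% infinitely often a.s., and X_n does not converge almost surely.
```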
Borel-Cantelli Lemma: For a sequence of events $\{A_n\}$, if $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(A_n \text{ infinitely often}) = 0$.
The Borel-Cantelli Lemma is useful in problems related to a.s. convergence. It could be written as
$$P\Big(\limsup_{n\to\infty} A_n\Big) = P\Big(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k\Big) = 0.$$
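A minimal simulation sketch (my own illustration, with an arbitrary choice of probabilities): when $P(A_n) = 1/n^2$, the probabilities are summable, so on almost every sample path only finitely many of the events occur.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000          # number of events per sample path
paths = 200          # number of independent sample paths

# Independent events A_n with summable probabilities P(A_n) = 1/n^2.
p = 1.0 / np.arange(1, N + 1) ** 2

last_occurrence = []
for _ in range(paths):
    occurred = rng.random(N) < p          # indicator of each A_n
    idx = np.flatnonzero(occurred)
    last_occurrence.append(idx[-1] + 1 if idx.size else 0)

# Borel-Cantelli: P(A_n i.o.) = 0, so the last occurrence is finite
# on (almost) every path -- typically a small index here.
print("largest index of an occurring event, across paths:",
      max(last_occurrence))
```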
Laws of Large Numbers: Let $X_1, X_2, \dots$ be i.i.d. random variables with mean $E[X_i] = \mu$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. When the convergence is in probability or in law, the result is known as the weak law of large numbers (WLLN): $\bar{X}_n \xrightarrow{P} \mu$. If the convergence is almost sure, it is the strong law of large numbers (SLLN): $\bar{X}_n \xrightarrow{a.s.} \mu$.
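A quick simulation sketch of the law of large numbers (an illustration I am adding; the distribution and sample sizes are arbitrary choices): the running mean of i.i.d. Exponential(1) draws settles near $\mu = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# i.i.d. Exponential(1) samples, true mean mu = 1.
x = rng.exponential(scale=1.0, size=1_000_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

# The running mean approaches mu as n grows (LLN).
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: mean = {running_mean[n - 1]:.4f}")
```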
Central Limit Theorem: Let $X_1, X_2, \dots$ be i.i.d. random variables with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Then
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} N(0, 1).$$
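A simulation sketch of the CLT (again my own illustration): standardized means of skewed Exponential(1) samples behave approximately like a standard normal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 20_000
mu, sigma = 1.0, 1.0                      # Exponential(1): mean 1, sd 1

# Standardized sample means: sqrt(n) * (xbar - mu) / sigma.
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma

# Should be close to N(0,1): mean ~ 0, sd ~ 1, ~95% within +/-1.96.
print("mean:", z.mean().round(3), " sd:", z.std().round(3),
      " P(|Z|<1.96):", np.mean(np.abs(z) < 1.96).round(3))
```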
Slutsky’s Theorem: Let $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$ for a constant $c$. Then $X_n + Y_n \xrightarrow{d} X + c$, $X_n Y_n \xrightarrow{d} cX$, and, provided $c \neq 0$, $X_n / Y_n \xrightarrow{d} X / c$.
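A classic use (my illustration): replacing the unknown $\sigma$ in the CLT by the sample standard deviation $S_n$. Since $S_n \xrightarrow{P} \sigma$, Slutsky’s theorem gives $\sqrt{n}(\bar{X}_n - \mu)/S_n \xrightarrow{d} N(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, mu = 500, 20_000, 1.0

x = rng.exponential(1.0, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                 # S_n ->P sigma

# Slutsky: t_n = sqrt(n) * (xbar - mu) / S_n ->d N(0, 1).
t = np.sqrt(n) * (xbar - mu) / s
print("mean:", t.mean().round(3), " sd:", t.std().round(3))
```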
SVM¹ could be an application of the Lebesgue Dominated Convergence Theorem and the Central Limit Theorem. We can use these theorems, together with the partial convergence relations above, to analyze the hinge loss function when the data are not linearly separable: by restricting the problem to a Hilbert space, a bounded minimizing sequence admits a weakly convergent subsequence, and the asymptotic normality property can be applied under suitable conditions on the regularization parameter.
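As a concrete anchor for the hinge loss, here is a minimal sketch (my own, not the original post’s code) of a soft-margin linear SVM trained by subgradient descent on the regularized objective $\frac{\lambda}{2}\|w\|^2 + \frac{1}{n}\sum_i \max(0, 1 - y_i(w^\top x_i + b))$; the learning rate and $\lambda$ are arbitrary choices.

```python
import numpy as np

def svm_subgradient(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM via subgradient descent on
    (lam/2)*||w||^2 + mean hinge loss; y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                       # margin violators
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Non-separable toy data: two overlapping Gaussian blobs.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1.0, 1.5, size=(200, 2)),
               rng.normal(+1.0, 1.5, size=(200, 2))])
y = np.hstack([-np.ones(200), np.ones(200)])

w, b = svm_subgradient(X, y)
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```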
Results about the Markov chain² can be proved by using the Borel-Cantelli Lemma. For example, the probability of visiting a state infinitely often, starting from that state, is 0 or 1 according to whether the sum of the $n$-step return probabilities converges or diverges.
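A small simulation sketch (mine, with an arbitrary 3-state transition matrix): simulate a path and count returns to the start state. For a finite irreducible chain every state is recurrent, so the visit count keeps growing with the path length.

```python
import numpy as np

rng = np.random.default_rng(5)

# Arbitrary 3-state transition matrix (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

def simulate(P, start, steps):
    """Simulate a Markov chain path: the next state depends only on
    the current state (Markov property)."""
    state, path = start, [start]
    for _ in range(steps):
        state = rng.choice(len(P), p=P[state])
        path.append(state)
    return np.array(path)

path = simulate(P, start=0, steps=50_000)
# Finite irreducible chain: state 0 is recurrent, i.e. it is visited
# infinitely often a.s. -- the visit count grows with the path length.
print("visits to state 0:", np.count_nonzero(path == 0))
```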
1. SVM is a supervised learning model with associated learning algorithms that analyze data used for classification and regression analysis.
2. A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In probability theory and related fields, a Markov process, named after the Russian mathematician Andrey Markov, is a stochastic process that satisfies the Markov property: no matter how the process arrived at its present state, the probability of transitioning to any particular state depends solely on the current state and the time elapsed. The state space, or set of all possible states, can be anything: letters, numbers, weather conditions, sales volume, etc.