
Hidden Markov Models (HMMs)
Let's consider a stochastic process X(t) that can assume N different states, s1, s2, ..., sN, with first-order Markov chain dynamics. Let's also suppose that we cannot observe the state of X(t), but we have access to another process O(t), connected to X(t), which produces observable outputs (often known as emissions). The resulting process is called a Hidden Markov Model (HMM), and a generic schema is shown in the following diagram:

For each hidden state si, we need to define a transition probability P(i → j), normally represented as a matrix when the variable is discrete. Under the Markov assumption, we have:

P(xt = sj | xt-1 = si, xt-2, ..., x1) = P(xt = sj | xt-1 = si) = P(i → j)
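As a concrete illustration (a minimal sketch; the three states and all the numerical values are invented for this example), the discrete transition probabilities can be stored in a NumPy array whose entry A[i, j] = P(i → j), so every row must be a valid probability distribution:

import numpy as np

# Hypothetical transition matrix for N = 3 hidden states:
# A[i, j] = P(i -> j)
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Every row must sum to 1, because from state si the system
# always moves to exactly one of the N states
assert np.allclose(A.sum(axis=1), 1.0)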
Moreover, given a sequence of observations o1, o2, ..., oM, we also make the following assumption about the independence of the emission probability:

P(oi | x1, x2, ..., xi, o1, o2, ..., oi-1) = P(oi | xi)
In other words, the probability of the observation oi (in this case, we mean the value at time i) is conditioned only on the state of the hidden variable at time i (xi). Conventionally, the first state x0 and the last one, xEnding, are non-emitting; therefore, all the sequences start with index 1 and end with an extra timestep corresponding to the final state.
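As a minimal sketch of this assumption (the emission matrix and its values are invented for illustration), the discrete emission probabilities can also be stored in a matrix, and sampling an observation requires nothing but the current hidden state:

import numpy as np

rng = np.random.default_rng(1000)

# Hypothetical emission matrix for N = 3 hidden states and
# 4 observation symbols: B[i, k] = P(o = k | x = si)
B = np.array([[0.6, 0.2, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.2, 0.2, 0.3, 0.3]])

# Emitting an observation uses only the current hidden state xi,
# never the previous states or the previous observations
current_state = 1
observation = rng.choice(4, p=B[current_state])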
HMMs can be employed in all those contexts where it's impossible to measure the state of a system (we can only model it as a stochastic variable with a known transition probability), but it's possible to access some data connected to it. An example is a complex engine made up of a large number of parts. We can define some internal states and learn a transition probability matrix (we're going to learn how to do that), but we can only receive measurements provided by specific sensors.
Sometimes, even if not extremely realistic, it's useful to include the Markov assumption and the emission probability independence in our model. The latter can be justified by considering that we can sample all the peak emissions corresponding to precise states and, as the random process O(t) is implicitly dependent on X(t), it's not unreasonable to think of it as a process that follows X(t).
The Markov assumption holds for many real-life processes, either because they are naturally first-order Markov processes, or because the states contain all the history needed to justify a transition. In other words, in many cases, if the state is A, the system transitions to B and finally to C. We assume that when in C, the system moved from a state (B) that carries part of the information provided by A.
For example, if we are filling a tank, we can measure the level (the state of our system) at time t, t+1, ... If the water flow is modeled by a random variable because we don't have a stabilizer, we can find the probability that the water has reached a certain level at time t, p(Lt = x|Lt-1). Of course, it doesn't make sense to condition on all the previous states, because if the level is, for example, 80 m at time t-1, all the information needed to determine the probability of a new level (state) at time t is already contained in this state (80 m).
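A short simulation can make this concrete (the inflow distribution and all the numbers are assumptions made for this sketch): each new level is computed from the previous level alone, so conditioning on any older state adds no information:

import numpy as np

rng = np.random.default_rng(1000)

# Hypothetical tank-filling process: the level at time t depends
# only on the level at time t-1 plus a random (unstabilized) inflow
level = 80.0
levels = [level]

for t in range(10):
    inflow = max(rng.normal(loc=1.0, scale=0.2), 0.0)  # random water flow
    level += inflow
    levels.append(level)

# p(Lt = x | Lt-1) can be evaluated from the previous level alone;
# no earlier history is required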
At this point, we can start analyzing how to train a hidden Markov model and how to determine the most likely hidden states given a sequence of observations. For simplicity, we call A the transition probability matrix and B the matrix containing all the emission probabilities P(oi|xi). The resulting model is fully determined by the knowledge of these two elements: HMM = { A, B }.
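Putting the pieces together, here is a minimal sketch (the matrices and the choice of a non-emitting initial state x0 = 0 are assumptions made for illustration) that generates an observation sequence from HMM = { A, B } by ancestral sampling:

import numpy as np

rng = np.random.default_rng(1000)

# Hypothetical HMM = { A, B } with 3 hidden states and 4 symbols
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.6, 0.2, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.2, 0.2, 0.3, 0.3]])

def sample_sequence(length, initial_state=0):
    # The initial state x0 is non-emitting: it only seeds
    # the first transition
    states, observations = [], []
    x = initial_state
    for _ in range(length):
        x = rng.choice(3, p=A[x])   # transition depends on x(t-1) only
        o = rng.choice(4, p=B[x])   # emission depends on x(t) only
        states.append(int(x))
        observations.append(int(o))
    return states, observations

states, observations = sample_sequence(10)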