Fundamentals of Information Theory
Shannon’s information theory is a way to quantify information and to mathematically frame communication.
A communication takes place between two endpoints:
- sender: made of an information source and an encoder
- receiver: made of an information destination and a decoder
Information is carried by a channel in the form of a sequence of symbols of a finite alphabet.
The receiver gets information only through the channel:
- it is uncertain about what the next symbol will be, until the symbol arrives
- thus we model the sender as a random variable
Acquiring information is modeled as getting to know an outcome of a random variable $X$. The amount of information depends on the distribution of $X$.
Intuitively: the closer $X$ is to a uniform distribution, the higher the amount of information I get from knowing an outcome.
Encoding maps each outcome to a finite sequence of symbols: more symbols should be needed when more information is sent.
Measuring uncertainty: Entropy
The desirable properties of a measure of uncertainty are:
- Non-negative measure of uncertainty
- "combining uncertainties" should map to adding entropies
Let $X$ be a discrete random variable with outcomes in $\lbrace x_1, \dots, x_n \rbrace$, with $\Pr[X = x_i] = p_i$ for all $i$.
The entropy of $X$ is $H(X) = -\sum_{i=1}^{n} p_i \log(p_i)$.
The measurement unit of entropy depends on the base of the logarithm: the typical case, base $2$, gives bits.
Example: $X$ is a sequence of 6 uniformly random letters (from a 26-letter alphabet), then $H(X) = 6 \log_2(26) \approx 28.2$ bits.
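To make the definition concrete, here is a minimal Python sketch (not from the original notes) that computes $H(X)$ from a list of probabilities and reproduces the 6-letter example above:

```python
import math

def shannon_entropy(probs, base=2):
    """H(X) = -sum_i p_i * log(p_i), in units determined by the log base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# One uniformly random letter from a 26-symbol alphabet: log2(26) ~ 4.70 bits.
one_letter = [1 / 26] * 26
print(shannon_entropy(one_letter))

# Entropy adds up over independent outcomes: 6 letters give 6 * log2(26) ~ 28.2 bits.
print(6 * shannon_entropy(one_letter))
```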
Shannon’s noiseless coding theorem
It is possible to encode the outcomes of i.i.d. random variables, each one with entropy $H(X)$, into no less than $H(X)$ bits per outcome on average. If fewer than $H(X)$ bits per outcome are used, some information will be lost.
Direct consequence:
- Arbitrary compression of bitstrings is impossible without losing information
- Cryptographic hashes must discard some information
- Guessing a piece of information (= one outcome of $X$) is at least as hard as guessing an $H(X)$-bit-long bitstring
- overlooking for a moment the effort of decoding the guess
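As an illustration of the theorem's bound, the following Python sketch (not part of the original notes, using a small hypothetical distribution) builds a Huffman code and compares its average codeword length with the entropy; the average length can approach $H(X)$ but never drop below it:

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy of a {symbol: probability} distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def huffman_code(probs):
    """Build a binary Huffman code; returns {symbol: codeword}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # the two least likely subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical source distribution (dyadic, so the bound is met exactly).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)                     # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(entropy(probs), avg_len)  # 1.75 bits vs. 1.75 bits per symbol
```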
Min-Entropy
It is possible for very different distributions to have the same entropy, even though guessing their outcomes is not equally hard.
We define the min-entropy of $X$ as $H_{\infty}(X) = -\log_2(\max_{i} p_i)$.
Intuitively: it's the entropy of a r.v. with uniform distribution, where the probability of each outcome is $\max_{i} p_i$.
Guessing the most common outcome of $X$ is at least as hard as guessing an $H_{\infty}(X)$-bit-long bitstring.
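A small Python sketch (an illustration, not part of the original notes) showing two distributions with the same entropy but different min-entropy:

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum_i p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs):
    """H_inf(X) = -log2(max_i p_i): determined only by the most likely outcome."""
    return -math.log2(max(probs))

uniform = [0.25] * 4                          # four equally likely outcomes
skewed  = [0.5, 0.125, 0.125, 0.125, 0.125]   # one dominant outcome

# Both have H(X) = 2 bits, but H_inf(X) is 2 bits vs. 1 bit.
for dist in (uniform, skewed):
    print(shannon_entropy(dist), min_entropy(dist))
```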
Example of min-entropy
Consider the random variable: $$ X = \begin{cases} 0^{128} &\text{with Pr } \frac{1}{2} \\ a, \quad a \in \lbrace 0, 1 \rbrace^{128} &\text{with Pr } \frac{1}{2^{128}} \end{cases} $$
Predicting an outcome shouldn't be too hard: just say $0^{128}$ and you will be right half of the time.
- $H_{\infty}(X) = -\log_2\frac{1}{2} = 1$ bit
So min-entropy tells us that guessing the most common outcome of $X$ is only as hard as guessing a bitstring that is a single bit long.
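To connect the two measures on this example, here is a sketch (assuming, as an interpretation not spelled out above, that the remaining probability mass $\frac{1}{2}$ is spread uniformly over the other $2^{128}-1$ strings) that computes both entropies analytically:

```python
import math

# Skewed 128-bit source: 0^128 with Pr 1/2; the remaining mass 1/2 is assumed
# to be spread uniformly over the other 2**128 - 1 strings.
n_other = 2**128 - 1
p_zero = 0.5
p_other = 0.5 / n_other

# Shannon entropy, computed analytically instead of summing 2**128 terms.
shannon = p_zero * math.log2(1 / p_zero) + n_other * p_other * math.log2(1 / p_other)
min_entropy = -math.log2(max(p_zero, p_other))

print(shannon)       # ~65 bits: the source looks very uncertain on average
print(min_entropy)   # 1 bit: the best single guess succeeds half of the time
```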