Information Theory

Master course in Communication Engineering

Prof. Alberto Bononi Tel. 0521 905760 alberto.bononi@unipr.it http://www.tlc.unipr.it/bononi/didattica/TI/TI.html
Course Objectives Provide an introduction to Information Theory, with applications mainly in the area of signal compression and digital communications.

Classes (A.A. 2024/2025) All classes will be held in person in room B/3 (scientific complex):
Monday 10:30-12:30; Friday 10:30-12:30. Video lectures from a previous year are available (see the warning on how to set audio quality in the videos), along with class notes. The ID and password to access the videos/slides will be communicated in class during the first lecture.

Office Hours Monday 15:00-17:00. Please schedule an appointment first by emailing me at alberto.bononi[AT]unipr.it. We can meet in person in my office, or online in the following Teams Virtual Classroom.

Credits This course is worth 6 credits (CFU)

Prerequisites Entry-level courses covering: Probability theory and stochastic processes; Fourier analysis in continuous and discrete time; Fourier analysis of linear time-invariant systems. A short guide to review background material can be found here. Video lectures of the preparatory course held in September 2017 can be found here. To get the prep-course userid and password, please send me an email.

Textbook
[1] T. M. Cover, J. A. Thomas, "Elements of Information Theory". John Wiley and Sons, 1991.

Complementary Reading
[2] R. Blahut, "Principles and Practice of Information Theory". Addison-Wesley, 1988.
[3] J. Cioffi, "Ch. 8: Fundamental Limits of Coding and Sequences", http://www.stanford.edu/~cioffi

Exams

Oral only, to be scheduled on an individual basis. When ready, please contact the instructor by email at alberto.bononi[AT]unipr.it, specifying the requested date. The exam consists of solving some proposed exercises and explaining the theoretical details connected with them, for a total time of about 1 hour. If you wish, you may consult a summary of important formulas written on A SINGLE A4 sheet.
NOTE: The exam may be split into two distinct parts and scheduled on different days at the student's request: Part 1 Data Compression; Part 2 Channel Coding.
IMPORTANT NOTE: even if you register on ESSE3 for an exam, please send an email to alberto.bononi[AT]unipr.it to inform me directly and to schedule the time and date of the actual test, which is an individual interview.

Syllabus (2 hours per class; chapters and numbering from your textbook)

CLASS 1:
Intro:
Course organization, objectives, textbooks, exam details. Sneak preview of the course, motivations, applications. Assigned reading: Ch. 1 of textbook. Physical justification and definition of entropy. Examples of entropy calculation. Up to sec. 2.1.
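As a small illustration of the entropy calculations mentioned above, here is a minimal Python sketch. The function name and the example PMFs are illustrative choices, not taken from the course material.

```python
import math

def entropy(pmf):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), in bits (0 log 0 taken as 0)."""
    if abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Illustrative PMFs (assumed examples)
h_fair = entropy([0.5, 0.5])       # fair coin
h_biased = entropy([0.9, 0.1])     # biased coin: less uncertainty than a fair one
h_uniform4 = entropy([0.25] * 4)   # uniform over 4 symbols: log2(4) bits
```

The uniform PMF attains the maximum log2(M) over M symbols, and any bias lowers the entropy.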

CLASS 2:
Definition of joint and conditional entropy, example 2.2.1. Relative entropy, mutual information and their relation. Chain rules for PMFs and entropy.
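The relations among joint entropy, conditional entropy, relative entropy and mutual information can be checked numerically. The sketch below uses an illustrative 4x4 joint PMF (an assumed example; rows index Y, columns index X) and computes I(X;Y) in two ways: from entropies and as a relative entropy.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative joint PMF p(x, y): joint[y][x], assumed for this sketch
joint = [
    [1/8,  1/16, 1/32, 1/32],
    [1/16, 1/8,  1/32, 1/32],
    [1/16, 1/16, 1/16, 1/16],
    [1/4,  0.0,  0.0,  0.0],
]

p_y = [sum(row) for row in joint]         # marginal of Y (row sums)
p_x = [sum(col) for col in zip(*joint)]   # marginal of X (column sums)

h_x = H(p_x)
h_y = H(p_y)
h_xy = H([p for row in joint for p in row])

# Chain rule: H(X,Y) = H(Y) + H(X|Y)
h_x_given_y = h_xy - h_y

# Mutual information from entropies
mi = h_x + h_y - h_xy

# Mutual information as relative entropy D( p(x,y) || p(x)p(y) )
mi_kl = sum(
    p * math.log2(p / (p_x[j] * p_y[i]))
    for i, row in enumerate(joint)
    for j, p in enumerate(row)
    if p > 0
)
```

Both expressions for the mutual information agree, and H(X|Y) does not exceed H(X).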

CLASS 3:
Conditional relative entropy, conditional mutual information, chain rules for D and I. Inequalities for D and I. Max and min of H, H(X|Y)<=H(X) and generalizations. Convex functions. Jensen's inequality, examples.

CLASS 4:
First hour: log-sum inequality, convexity of D, concavity of H. Concavity of I in p(x) and convexity in p(y|x). Exercise: mixing increases entropy. Second hour: Definition of Markov chain and first properties for 3 random variables (RV) X,Y,Z. Data processing inequality. Counter-example.

CLASS 5:
First hour: sufficient statistics: definition in terms of mutual information. Examples: number of successes in repeated trials; sample mean in estimation of common mean in a vector of independent Gaussian RVs. Sufficient statistics and hypothesis testing: factorization theorem. Second hour: Fano inequality. Exercise 2.32.

CLASS 6:
Exercises 2.5, 2.4, 2.27, 2.30 (after brief introduction to the method of Lagrange multipliers), 2.21.

CLASS 7:
Ch 3: Asymptotic equipartition property (AEP): introduction. Probability theory refresher: convergence in probability, Chebyshev inequality, weak law of large numbers, AEP. Typical set and properties. Example with binary sequences.
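For binary sequences, the typical set can be explored numerically. The sketch below assumes an i.i.d. Bernoulli(p) source; the values of n, p and eps are illustrative assumptions. It groups sequences by their number of ones, since all sequences with the same count share the same probability.

```python
import math

def typical_set_stats(n, p, eps):
    """Size and probability of the eps-typical set for n i.i.d. Bernoulli(p) bits."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # source entropy, bits
    size, prob = 0, 0.0
    for k in range(n + 1):  # k = number of ones in the sequence
        # Sample entropy -(1/n) log2 p(x^n) of any sequence with k ones
        sample_entropy = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(sample_entropy - h) <= eps:
            count = math.comb(n, k)  # number of sequences with k ones
            size += count
            prob += count * p**k * (1 - p)**(n - k)
    return h, size, prob

# Assumed parameters for illustration
n, p, eps = 100, 0.3, 0.1
h, size, prob = typical_set_stats(n, p, eps)
bits_needed = math.log2(size)  # at most n * (h + eps)
```

The typical set holds most of the probability while containing far fewer than 2^n sequences; keep n moderate here, since p**k underflows for very long sequences.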

CLASS 8:
First hour: relation between the typical set and high-probability sets. Theorem 3.3.1. Second hour: Problem solving: Exercises 3.8, 3.9. Ch 4: Entropy rates: introduction. Definition of discrete-time stochastic process and stationarity.

CLASS 9:
First hour: introduction to discrete-time Markov chains (DTMC): transition matrix, update law (Chapman-Kolmogorov), stationary distribution. Two-state example: state diagram, evolution towards limit distribution. Evaluation with flux balancing. Second hour: Entropy rates H and H'. H=H' for stationary processes. Statement of AEP theorem for stationary ergodic sources (Shannon/Breiman/McMillan). Explicit evaluation of H for DTMC. Examples.
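The two-state example can be sketched as follows, assuming transition probabilities a (state 0 to 1) and b (state 1 to 0); the numerical values are illustrative. The update law is iterated from an arbitrary initial distribution and compared with the closed-form stationary distribution obtained by flux balancing, mu_0 a = mu_1 b. The entropy rate of the stationary chain is then the average of the per-state transition entropies.

```python
import math

def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Assumed transition probabilities: a = P(0 -> 1), b = P(1 -> 0)
a, b = 0.2, 0.4
P = [[1 - a, a],
     [b, 1 - b]]

# Update law mu(n+1) = mu(n) P, started in state 0
mu = [1.0, 0.0]
for _ in range(200):
    mu = [mu[0] * P[0][0] + mu[1] * P[1][0],
          mu[0] * P[0][1] + mu[1] * P[1][1]]

# Flux balancing: mu_0 * a = mu_1 * b
mu_closed = [b / (a + b), a / (a + b)]

# Entropy rate of the stationary chain: sum_i mu_i H(P[i])
rate = mu_closed[0] * hb(a) + mu_closed[1] * hb(b)

# Entropy of the stationary marginal, an upper bound on the rate
h_marginal = hb(mu_closed[1])
```

The iterated distribution converges to the flux-balance solution, and the entropy rate is below the entropy of the stationary marginal because of the memory in the chain.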

CLASS 10:
Doubly-stochastic matrices and uniform steady-state distribution. Connections with entropy as defined in statistical thermodynamics: DTMC on microstates with doubly-stochastic transition matrix. Entropy increases towards steady-state distribution entropy. Example 4 (eq 4.50-4.52). Hidden Markov models (HMM): entropy rate.

CLASS 11:
Problem solving: Ex. 4.1 mixing increases entropy. Conditions for the observable Y in an HMM to be a DTMC. Examples where Y is not a DTMC. Point a. of Ex. 4.18 on entropy rate of a stationary but not ergodic process.

CLASS 12:
Problem solving: First hour: points b, c of Ex. 4.18. Second hour: Ex. 4.10 entropy rate of a second-order Markov process: study of hidden Markov chain. Ex. 4.6.

CLASS 13:
Ch 5: Data compression. Examples of codes. Kraft inequality. Search of optimal codes with Lagrange multipliers method. Noiseless coding theorem.

CLASS 14:
Comments on first Shannon Theorem: when p is not dyadic. Quasi-optimal Shannon codes. Shannon super-codes are asymptotically optimal. Extra cost on minimal code length when using a PMF that differs from the true PMF. McMillan Theorem: every uniquely decodable code satisfies the Kraft inequality. Introduction to Huffman codes: examples 1, 2.

CLASS 15:
Huffman codes: example 3 (dummy symbols), Exercise 5.32, example 5.73 (set of different optimal codelengths). Competitive optimality of Shannon code. Proof of optimality of Huffman code.
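A minimal sketch of the binary Huffman procedure, computing only the codeword lengths, is given below. The PMF is an illustrative assumption, and the D-ary case with dummy symbols is not covered. The result is checked against the Kraft inequality and the bound H <= L < H + 1.

```python
import heapq
import math

def huffman_lengths(pmf):
    """Codeword lengths of a binary Huffman code for the given PMF."""
    # Heap entries: (probability, unique tiebreak id, symbol indices in subtree)
    heap = [(p, i, [i]) for i, p in enumerate(pmf)]
    heapq.heapify(heap)
    lengths = [0] * len(pmf)
    next_id = len(pmf)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)  # least likely subtree
        p2, _, s2 = heapq.heappop(heap)  # second least likely subtree
        for s in s1 + s2:                # merged symbols gain one code bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next_id, s1 + s2))
        next_id += 1
    return lengths

# Assumed example PMF
pmf = [0.25, 0.25, 0.2, 0.15, 0.15]
lengths = huffman_lengths(pmf)

avg_len = sum(p * l for p, l in zip(pmf, lengths))     # expected length L
source_entropy = -sum(p * math.log2(p) for p in pmf)   # entropy H
kraft_sum = sum(2.0 ** -l for l in lengths)            # must not exceed 1
```

Different tie-breaking rules may give different sets of lengths with the same expected length.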

CLASS 16:
First hour: Optimal compression of Markov sources. Description of Lempel-Ziv algorithm for universal compression. Second hour: Channel capacity: introduction, definition of discrete memoryless channel (DMC), examples of capacity computation: ideal channel, noisy channel with disjoint outputs, noisy typewriter.

CLASS 17:
Capacity of binary symmetric channel (BSC), binary erasure channel (BEC). Symmetric, weakly-symmetric and Gallager-symmetric channels. Convexity of C on convex set of input PMFs. Hints to numerical techniques to evaluate max I.
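One numerical technique for evaluating max I is the Blahut-Arimoto algorithm. The sketch below is a plain-Python version under stated assumptions: the channel is given as a matrix W[x][y] = p(y|x), the iteration starts from a uniform input PMF, and the crossover and erasure probabilities are illustrative. It is not necessarily the method presented in class.

```python
import math

def blahut_arimoto(W, tol=1e-9, max_iter=10000):
    """Capacity (bits) and maximizing input PMF of a DMC with W[x][y] = p(y|x)."""
    n_in, n_out = len(W), len(W[0])
    r = [1.0 / n_in] * n_in  # start from the uniform input PMF
    lower = 0.0
    for _ in range(max_iter):
        # Output PMF induced by the current input PMF
        q = [sum(r[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # c[x] = exp( D( W(.|x) || q ) ), with natural logarithms
        c = [math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                          for y in range(n_out) if W[x][y] > 0))
             for x in range(n_in)]
        z = sum(r[x] * c[x] for x in range(n_in))
        lower, upper = math.log(z), math.log(max(c))  # bounds on C, in nats
        r = [r[x] * c[x] / z for x in range(n_in)]    # multiplicative update
        if upper - lower < tol:
            break
    return lower / math.log(2), r

def hb(p):
    """Binary entropy function in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# BSC with assumed crossover probability 0.1: C = 1 - hb(0.1)
c_bsc, _ = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])

# BEC with assumed erasure probability 0.25: C = 1 - 0.25
c_bec, _ = blahut_arimoto([[0.75, 0.25, 0.0], [0.0, 0.25, 0.75]])

# Z channel with crossover 1/2: asymmetric, so the optimal input is not uniform
c_z, r_z = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
```

For the symmetric channels the uniform input is already optimal and the loop stops at the first iteration; the Z channel needs several iterations.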

CLASS 18:
Introduction to proof of II Shannon Theorem. Channel coding, ideas on typical-sequence decoding. Jointly typical set and its properties. Average and maximum error probability, achievable rate and operational channel capacity. Statement of II Shannon theorem.

CLASS 19:
First hour: proof of direct part of II Shannon theorem. Second hour: proof of converse part of II Shannon theorem.

CLASS 20:
First hour: joint source-channel coding theorem. Second hour: exercises on channel capacity: 7.8, 7.9 (Z channel), 7.3 (memory increases capacity), 7.12 (unused symbol). Ex 7.23 assigned as homework.

CLASS 21:
Differential entropy (Ch 9): definition, examples (uniform, Gaussian); AEP, properties of typical set. 2^h = edge length of typical set. Joint and conditional diff. entropy. Ex: multivariate Gaussian. Relative entropy and mutual information. Inequalities. Hadamard. Shift and change of scale. Multivariate Gaussian maximizes entropy for a given covariance matrix.

CLASS 22:
Mutual information for discrete X and continuous Y. Ex: evaluation for PAM signal with equally likely symbols on discrete-time memoryless additive Gaussian channel (DTMAGC). Capacity of DTMAGC. Sampling Theorem and Shannon Capacity formula. Gaussian additive noise is a worst case for capacity.

CLASS 23:
Parallel Gaussian channels: capacity. Discrete-time additive Gaussian channels (DTAGC) with memory: capacity. Introduction to Toeplitz matrices and Toeplitz distribution theorem. DTAGC capacity evaluation (water-pouring).
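Water-pouring for a finite set of parallel Gaussian channels can be sketched as follows. The noise variances and the total power are illustrative assumptions (all noise variances and the power must be positive). Channel i receives power max(level - N_i, 0), with the water level chosen so that the powers add up to the budget.

```python
import math

def water_filling(noise, total_power):
    """Water level, power allocation and capacity (bits per vector use)."""
    order = sorted(noise)
    # Try to use the k quietest channels, from all of them down to one
    for k in range(len(order), 0, -1):
        level = (total_power + sum(order[:k])) / k
        if level > order[k - 1]:  # all k channels sit below the water level
            break
    powers = [max(level - n, 0.0) for n in noise]
    capacity = sum(0.5 * math.log2(1 + p / n) for p, n in zip(powers, noise))
    return level, powers, capacity

# Assumed example: three channels, total power 3
noise = [1.0, 2.0, 5.0]
level, powers, capacity = water_filling(noise, 3.0)
```

In this example the noisiest channel lies above the water level and receives no power.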

CLASS 24:
Continuous-time additive Gaussian channel (CTAGC) with memory: capacity evaluation of CTAGC using equivalent input noise, Karhunen-Loeve basis and continuous-time Toeplitz distribution theorem. Examples.