Chapter 4 Mesurable funtions

A big part of measure theory is the study of functions which are compatible with the measure spaces. We begin with a basic definition which will be satisfied by all the functions we are interested in.

Definition 4.1 (Mesasurable functions) If \((E, \mathcal{E})\) and \((F, \mathcal{F})\) are two measurable spaces and \(f\) is a function \(E \rightarrow F\), then we say \(f\) is measurable if for every \(A \in \mathcal{F}\) we have \(f^{-1}(A) \in \mathcal{E}\).

Lemma 4.1 Suppose that \(\mathcal{A} \subset \mathcal{F}\) is such that \(\sigma(\mathcal{A})= \mathcal{F}\). If \(f\) is a function such that for every \(A \in \mathcal{A}\) we have \(f^{-1}(A) \in \mathcal{E}\) then \(f\) is measurable.

Proof. First we note that

\[ f^{-1}\left( \bigcup_i A_i \right) = \bigcup_i f^{-1}(A_i), \]

and

\[ f^{-1}(B \setminus A) = f^{-1}(B) \setminus f^{-1}(A). \]

Now if we consider \(\{ A \in \mathcal{F} \, :\, f^{-1}(A) \in \mathcal{E}\}\) then this is a \(\sigma\)-algebra, as \(\mathcal{E}\) is a \(\sigma\)-algebra and \(f^{-1}\) preserves set operations. Therefore, \(\{ A \in \mathcal{F} \, :\, f^{-1}(A) \in \mathcal{E}\}\) is a \(\sigma\)-algebra containing \(\mathcal{A}\) therefore \(\mathcal{F} \subseteq \{ A \in \mathcal{F} \, :\, f^{-1}(A) \in \mathcal{E}\}\) so \(f\) is measurable.

Remark. In particular note that the above lemma means that whenever we have \(f: E \rightarrow \mathbb{R}\) and \(\mathbb{R}\) is equipped with the Borel \(\sigma\) algebra, we know that \(f\) is measurable if \(f^{-1}((-\infty, b])\) is a measurable set for every \(b\).

Lemma 4.2 If \(E, F\) are topological spaces, equipped with their Borel \(\sigma\)-algebras, and we have \(f:E \rightarrow F\) is a continuous map then \(f\) is measurable.

Proof. This is on the exercise sheet and a solution on the solution sheet.

Lemma 4.3 If \((E, \mathcal{E}), (F, \mathcal{F})\) and \((G, \mathcal{G})\) are all measurable spaces and \(f : E \rightarrow F\) and \(g: F \rightarrow G\) are measurable then so is \(g \circ f\).

Proof. Take any set \(A \in \mathcal{G}\) then \((g\circ f)^{-1}(A) = \{ x \in E \,:\, g(f(x)) \in A\}\). Let us call \(B = \{y \in F \,:\, g(y) \in A\} = g^{-1}(A)\) then \((g \circ f)^{-1}(A) = \{ x \in E \,:\, f(x) \in B\} = f^{-1}(B)\). Then as \(g\) is measurable and \(A \in \mathcal{G}\) then \(B \in \mathcal{F}\). In the same way as \(f\) is measurable and \(B \in \mathcal{F}\) then \(f^{-1}(B) \in \mathcal{E}\). As \(f^{-1}(B) = (g \circ f)^{-1}(A)\) this shows that \((g \circ f)^{-1}(A) \in \mathcal{E}\) for every \(A \in \mathcal{G}\) and hence \(g \circ f\) is measurable.

Lemma 4.4 Suppose that \(f: \mathbb{R} \rightarrow \mathbb{R}\) is a monotone function. Then \(f\) is measurable with respect to the Borel \(\sigma\)-algebra.

Proof. Suppose without loss of generality that \(f\) is increasing then \(f^{-1}((-\infty, b])\) is \(\emptyset, (-\infty, \infty)\) or \((-\infty, a)\) or \((-\infty, a]\) for some \(a\). All these possibilities are Borel measurable sets.

Lemma 4.5 If \(f_n\) is a sequence of measurable function taking values in \((\mathbb{R}, \mathcal{B}(\mathbb{R})\) then the following functions are all measurable:

  • \(-f_1\)
  • \(\lambda f_1\) for \(\lambda >0\) a fixed contant.
  • \(f_1 \wedge f_2\)
  • \(f_1 \vee f_2\)
  • \(f_1+f_2\),
  • \(f_1 f_2\),
  • \(\sup_n f_n\),
  • \(\inf_n f_n\),
  • \(\limsup_n f_n\),
  • \(\liminf_n f_n\).

Proof. We only show two result. The rest are on the assignment.

In order to show that any of these functions are measureable we want to look at \(f^{-1}((-\infty, b])\) or a similar set. \((f_1 \vee f_2)^{-1}((-\infty, b]) = \{ x \,:\, \max\{f_1(x), f_2(x)\} \leq b\} = \{ x \,:\, f_1(x) \leq b \, \mbox{and} \, f_2(x) \leq b\} = \{ x \,:\, f_1(x) \leq b \} \cap \{x \,:\, f_2(x) \leq b\} = f_1^{-1}((-\infty, b]) \cap f_2^{-1}((-\infty, b])\). Now since \(f_1\) and \(f_2\) are both measureable the sets \(f_1^{-1}((-\infty, b])\) and \(f_2^{-1}((-\infty, b])\) are both measurable. We also know that the intersection of two measurable sets is measurable.

\((f_1+f_2)^{-1}((b,\infty)) = \{ x \,:\, f_1(x) + f_2(x) > b \}\). Now if \(f_1(x) > b-f_2(x)\) then there exists a \(q \in \mathbb{Q}\) such that \(f_1(x)> q > b - f_2(x)\). Let us define the set \(A= \bigcup_{q \in \mathbb{Q}} \{ x \,:\, f_1(x) > q\} \cap \{ x\,:\, f_2(x) > b-q\}\). Since \(f_1,f_2\) are both measurable \(A\) is a countable union of measurable sets so measurable. We can also see that if \(x \in A\) then \(f_1(x) + f_2(x) > b\) and our observation shows that in fact \(A= \{ x \,:\, f_1(x) + f_2(x) > b \}\). Therefore, \(f_1+f_2\) is measurable.

Definition 4.2 (Image measure) We can use a measurable function \(f\) to define an image measure. Suppose \(\mu\) is a measure on \((E, \mathcal{E})\) and \(f\) is a measurable function \((E, \mathcal{E}) \rightarrow (F, \mathcal{F})\) then we can define a new measure \(\nu\) by saying that

\[ \nu(A) = \mu(f^{-1}(A)), \]

for every \(A \in \mathcal{F}\). We write \(\nu = \mu \circ f^{-1}\).

We can use the notion of image measure to construct further measures from Lebesgue measure.

Lemma 4.6 Suppose \(g: \mathbb{R} \rightarrow \mathbb{R}\) and that \(g\) is non-constant, right-continuous and non-decreasing. Let us define \(g(-\infty) = \lim_{x \rightarrow -\infty} g(x)\) and \(g(\infty) = \lim_{x \rightarrow \infty} g(x)\) and let us call the interval \(I:= (g(-\infty),g(\infty))\) (this might be the whole of \(\mathbb{R}\). Define a partial inverse to \(g\) by \(f: I \rightarrow \mathbb{R}\) by

\[ f(x) = \inf \{y \,:\, x \leq g(y)\}. \]

Then \(f\) is left-continuous and non-decreasing and \(f(x) \leq y\) if and only if \(x \leq g(y)\).

Proof. Fix \(x \in I\) and consider the set \(J_x = \{ y \in \mathbb{R}\,:\, x \leq g(y)\}\) by definition of \(I\) we know that \(J_x\) is non empty and is not the whole of \(\mathbb{R}\) (this shows that \(f\) is well defined). As \(g\) is non-decreasing, if \(y \in J_x\) and \(y' \geq y\), then \(y' \in J_x\). As \(g\) is right-continuous, if \(y_n \in J_x\) and \(y_n \downarrow y\) then \(y \in J_x\) (noting the \(\leq\) sign in \(J_x\)). Now using this we have that if \(x \leq x'\) then \(J_x \supseteq J_{x'}\) so \(f(x) \leq f(x')\). We also have that if \(x_n \uparrow x\) then \(J_x = \bigcap_n J_{x_n}\), so \(f(x_n) \rightarrow f(x)\).

Theorem 4.1 Let \(g\) be a non-constant, right-continuous and non-decreasing function from \(\mathbb{R} \rightarrow \mathbb{R}\). There exists a unique Radon measure on \(\mathbb{R}\) such that for all \(a,b \in \mathbb{R}\) with \(a < b\)

\[ \mathrm{d}g((a,b]) = g(b) - g(a). \]

We call this measure the Lebesgue Steitjles measure associated with \(g\). Furthermore, every Radon measure on \(\mathbb{R}\) can be represented this way.

Proof. Define \(I\) and \(f\) as in the Lemma above. Then we can construct \(\mathrm{d}g\) as the image measure of Lebesgue measure on \(I\). That is to say we can let \(\mathrm{d}g = \lambda \circ f^{-1}\). If this is the case then

\[ \mathrm{d}g ((a,b]) = \lambda \left(\{ x \, :\, f(x) > a, f(x) \leq b \} \right) = \lambda ((g(a), g(b)]) = g(b) - g(a). \]

The standard argument for uniqueness of measures (as for that of Lebesgue measure) gives uniqueness of this measure.

Finally, if \(\nu\) is a Radon measure on \(\mathbb{R}\) then we can define a function \(g\), by

\[ g(y) = \nu((0,y]), \quad y \geq 0, \quad g(y) = -\nu((y,0]), \quad y<0. \]

Then \(\nu = \mathrm{d}g\) by uniqueness.

4.0.1 Random variables and the measure theoretic formulation of probability - NONEXAMINABLE

The structure of measure theory allows us to put probability theory on a firm footing. This section is mainly as an example and for those of you who are interested in probability.

Definition 4.3 We call a measure space \((\Omega, \mathcal{F}, \mathbb{P})\) a probability space (and use different letters for the different bits) if \(\mathbb{P}(\Omega) = 1\). In this setting we still have \(\Omega\) is a set, \(\mathcal{F}\) is a \(\sigma\)-algebra and \(\mathbb{P}\) is a measure. We call \(\mathbb{P}\) a probability measure.

The set \(\Omega\) is not really described and we view it as the space of all possible outcomes. We call \(A \subset \Omega\) and event if \(A \in \mathcal{A}\), and \(\mathbb{P}(A)\) the probability of an event happening.

Definition 4.4 A random variable, \(X\) is a measurable function from a probability space \((\Omega, \mathcal{F}, \mathbb{P})\) to another measurable space \((E, \mathcal{A})\).

Under this way of writng things, with \(B \in \mathcal{A}\), we have \(\mathbb{P}(X \in B) = \mathbb{P}(\{ \omega \in \Omega \,:\, X(\omega) \in B\}) = \mathbb{P}( X^{-1}(B))\), where \(X^{-1}(B) \in \mathcal{F}\) as \(X\) is measurable. We call \(X^{-1}(B)\) the event that \(X \in B\). When working with probability people usually suppress the argument \(\omega\).

Definition 4.5 The law of a random variable, \(X\) is the image measure of \(\mathbb{P}\) under the measurable function \(X\). I.e. if \(X: (\Omega, \mathcal{F}, \mathbb{P}) \rightarrow (E, \mathcal{A})\) then we define a measure \(\mu_X\) on \((E, \mathcal{A})\) by \(\mu_X(B) = \mathbb{P}(X \in B)\).

Remark. The law of a random variable is an object which allows us to understand both probability densities and discrete probability distributions in the same way.

If \(X\) takes values in \(\mathbb{R}\) then the distribition function of \(X\), \(F_X(x) = \mu_X((-\infty, x])\). If \(X\) has density \(f(x)\) then \(\mu_X\) is equal to the measure given by \(\mu_X(A) = \int_A f(x) \mathrm{d}x\), though we still don’t know how to integrate over complicated sets.

4.1 Convergence of measurable functions

Definition 4.6 (Almost everywhere / Almost surely) We use the short hand almost everywhere (or almost surely in a probability space) to discuss properties that are true everywhere except a measure zero set.

Definition 4.7 (Convergence almost everywhere) Let \((E, \mathcal{E}, \mu)\) be a measureable space. A sequence of measureable functions, \((f_n)_{n \geq 1}: E \rightarrow F\), converges almost everywhere to \(f\) if

\[ \mu \left( \{ x \in E \,:\, f_n(x) \not\to f(x) \} \right) = 0 \]

Definition 4.8 (Convergence in measure) Let \((E, \mathcal{E}, \mu)\) be a measureable space. A sequence of real valued measureable functions, \((f_n)_{n \geq 1}: E \rightarrow F\), converges in measure to \(f\) if for every \(\epsilon > 0\)

\[ \mu \left( \{ x \, :\, |f(x) - f_n(x)| > \epsilon \} \right) \rightarrow 0, \quad \mbox{as}\, n \rightarrow \infty. \]

Example 4.1 The sequence of functions \(f_n(x) = x^n\) converges to \(0\) Lebesgue almost everywhere on \([0,1]\), and in measure, but it doesn’t converge pointwise as it doesn’t converge at \(x=1\).

Example 4.2 The- sequence of functions \(f_n(x) = 1_{[n,n+1]}(x)\) converges to 0 Lebesgue almost everywhere (in fact everywhere) but not in measure.

Example 4.3 Consider the sequence of functions \(f_1 = 1_{[0,1/2)}, f_2 = 1_{[1/2, 1)}, f_3 = 1_{[0,1/4)}, f_4 = 1_{[1/4, 1/2)}, f_5= 1_{[1/2, 3/4)}, f_6 = 1_{[3/4,1)}, f_7 = 1_{[0,1/8)}, f_8 = 1_{[1/8, 1/4)} \dots\) then \(f_n\) converges to 0 in measure, but \(f_n(x)\) does not converge for any \(x\).

We can prove a quasi-equivalence between these two notions of measure. Before we do this we need to introduce a very useful lemma, the Borel-Cantelli Lemma. We introduce it here as it is used to prove the following theorem but it is a useful tool to have whilst doing measure theory, particularly probability theory. First let us also introduce some more notation

Definition 4.9 Let \((A_n)_n\) be a sequence of measurable sets then we have the following names

\[ \limsup_n A_n = \bigcap_n \bigcup_{m \geq n} A_m = \{ A_n \, \mbox{infinitely often}\,\}, \]

and

\[ \liminf_n A_n = \bigcup_n \bigcap_{m \geq n} A_m = \{ A_n \, \mbox{eventually}\,\}. \]

The last names are more common when the \(A_n\) are events in a probability space.

Lemma 4.7 (First Borel-Cantelli Lemma) Let \((E, \mathcal{E}, \mu)\) be a measure space. Then if \(\sum_n \mu(A_n) < \infty\) it follows that \(\mu(\limsup_n A_n) = 0)\).

Proof. For any \(n\) we have

\[ \mu(\limsup_n A_n) \leq \mu \left( \bigcup_{m \geq n} A_m\right) \leq \sum_{m \geq n} \mu(A_m). \]

Then the right hand side goes to zero as \(n \rightarrow \infty\), so \(\mu(\limsup_n A_n) = 0\).

Lemma 4.8 (Second Borel-Cantelli Lemma - NONEXAMINABLE) Let \((E, \mathcal{E}, \mu)\) be a probability space (\(\mu(E) =1\)). Then suppose that the events are all inedepenent and that \(\sum_n \mu(A_n) = \infty\) then it will follow that \(\mu(\limsup_n A_n) =1\).

Proof. First we note that \(\mu(A_i^c \cap A_j^c) = \mu ((A_i \cup A_j)^c) = 1 - \mu(A_i \cup A_j) = 1 - \mu(A_i) - \mu(A_j)+ \mu(A_i)\mu(A_j) = (1-\mu(A_i))(1-\mu(A_j))\). We use the inequality \(1-a \leq e^{-a}\). Let \(a_n = \mu(A_n)\) then

\[ \mu \left( \bigcap_{m=n}^N A_m^c \right) = \Pi_{m=n}^N (1-a_m) \leq \exp \left( - \sum_{m=n}^N a_m \right) \rightarrow 0, \quad \mbox{as}\, N \rightarrow 0. \]

Therefore, \(\mu \left( \bigcap_{m \geq n} A_m^c \right) = 0\) for every \(n\). So \(\mu(\limsup_n A_n ) = 1- \mu(\bigcup_n \bigcap_{m \geq n} A_m^c) = 1\).

Theorem 4.2 Let \((E, \mathcal{E}, \mu)\) be a measure space and \((f_n)_n\) be a sequence of measurable functions. Then we have the following:

  • Suppose that \(\mu(E) < \infty\) and that \(f_n \rightarrow 0\) almost everywhere, then \(f_n \rightarrow 0\) in measure.
  • If \(f_n \rightarrow 0\) in measure then there exists some subsequence \((n_k)_k\) such that \(f_{n_k} \rightarrow 0\) almost everywhere.

Proof. Suppose that \(f_n \rightarrow 0\) almost everywhere. Then

\[ \mu(\{ x\,:\, |f_n(x)| \leq \epsilon \}) \geq \mu \left( \bigcap_{m \geq n} \{ x \,:\, |f_m(x)| \leq \epsilon\}\right) \uparrow \mu \left(|f_n| \leq \epsilon \, \mbox{eventually} \right) \geq \mu(\{ x \,:\, f_n(x) \rightarrow 0\}) = \mu(E), \]

therefore,

\[ \mu(\{ x\,:\, |f_n(x)|> \epsilon) = \mu(E) - \mu(\{ x\,:\, |f_n(x)| \leq \epsilon \}) \rightarrow 0. \]

Now suppose that \(f_n \rightarrow 0\) in measure. We can find a subsequence \(n_k\) such that

\[ \mu(\{ x \,:\, |f_{n_k}(x)| > 1/k \}) \leq 2^{-k}. \]

Therefore

\[ \sum_k \mu(\{ x \,:\, |f_{n_k}(x)| > 1/k \})<\infty.\]

Therefore by the first Borel-Canteli lemma we have that

\[ \mu \left(\{ x\,:\, |f_{n_k}(x)|>1/k \, \mbox{infinitely often}\} \right) = 0. \]

Therefore \(f_{n_k} \rightarrow 0\) almost everywhere.

4.2 Egoroff’s Theorem and Lusin’s Theorem

Theorem 4.3 (Egoroff's Theorem) Let \((E, \mathcal{E}, \mu)\) be a finite measure space and \((f_n)_n\) be a sequence of real valued measurable functions on \(E\). If \(\mu\) is finite and if \(f_n\) converges \(\mu\)-almost everywhere to \(f\) then for each positive \(\epsilon\) there is a set \(A\) with \(\mu(A^c)< \epsilon\), such that \(f_n\) converges uniformly on \(A\) to \(f\).

Proof. For each \(n\) let \(g_n(x) = \sup_{j \geq n}|f_j(x)-f(x)|\). Then \(g_n\) is a positive function which is finite almost everywhere. The sequence \((g_n)_n\) converges to 0 almost everywhere and so in measure. Therefore, for each positive integer \(k\) we can find \(n_k\) such that

\[ \mu \left( \{ x \,:\, g_{n_k} > 1/k \} \right) < \epsilon 2^{-k}. \]

Define sets \(A_k = \{ x\,:\, g_{n_k}(x) \leq 1/k\}\) and let \(A= \bigcap_k A_k\). The set \(A\) has

\[ \mu(A^c) = \mu \left( \bigcup_k A_k^c \right) \leq \sum_k \mu(A_k^c) \leq \sum_k \epsilon 2^{-k} = \epsilon. \]

We want to show that \(f_n\) converges uniformly to \(f\) on \(A\). For each \(\delta\) there exists a \(k\) such that \(1/k < \delta\), then as \(A \subseteq A_k\), we have that for every \(n \geq n_k\),

\[ |f_n - f| \leq g_{n_k} \leq 1/k < \delta, \]

uniformly on all \(x \in A_k\) and hence in \(A\).

This proof motivates the following definition of convergence of functions, which we won’t use a lot.

Definition 4.10 (Almost uniform convergence) We say a sequence of functions \((f_n)_{n \geq 1}\) converges almost uniformly on a measure space \((E, \mathcal{E}, \mu)\) if for every \(\epsilon >0\) there exists a set \(A\) with \(\mu(A^c)< \epsilon\) with \(f_n \rightarrow f\) uniformly on \(A\).

We can use Egoroff’s theorem to prove a result called Lusin’s theorem. First let us recall the definition of regularity

Definition 4.11 Let \(E\) be a topological space and \(\mu\) be a measure on \((E, \mathcal{B}(E))\) then say \(\mu\) is regular if for every \(A \in \mathcal{B}(E)\) we have

  • \[\mu(A) = \inf \{ \mu(U) \,:\, A \subseteq U, \mbox{$U$ is open}\},\]
  • \[\mu(A) = \sup \{ \mu(K) \,:\, K \subseteq A, \mbox{$K$ is compact}\}.\]

Theorem 4.4 (Lusin's Theorem) Suppose that \(f\) is a measurable function and \(A \subseteq \mathbb{R}^d\) is a Borel set and \(\lambda(A) < \infty\) then for any \(\epsilon >0\) there is a compact subset \(K\) of \(A\) with \(\lambda(A \setminus K) < \epsilon\) such that the restriction of \(f\) to \(K\) is continuous.

Remark. This theorem can be generalised to locally compact Hausdorff spaces, see Cohn’s book.

Proof. Suppose first that \(f\) only takes countably many values, \(a_1, a_2, a_3, \dots\) on \(A\) the let \(A_k = \{ x \in A \,:\, f(x) = a_k\}\), by measurablility of \(f\) we can see that \(A_k = f^{-1}(\{a_k\})\) is measurable. We know that \(A = \bigcup_n A_n\) so by continuity of measure \(\lambda(\bigcup_{k=1}^n A_k) \uparrow \lambda(A)\). Since \(\lambda(A) < \infty\) we have that for any \(\epsilon >0\) there exists \(n\) such that \(\lambda(A \setminus \bigcup_{k=1}^n A_k) < \epsilon/2\). By the regularity of Lebesgue measure we can find compact subsets \(K_1, \dots, K_n\) such that \(\lambda(A_k \setminus K_k) \leq \epsilon/2n\) for \(k=1,\dots,n\). Then let \(K = \bigcup_{k=1}^n K_k\). This is a compact subset of \(A\) and

\[ \lambda(A \setminus K) \leq \lambda(A\setminus \bigcup_{k=1}^n A_k) + \lambda(\bigcup_{k=1}^n A_k \setminus \bigcup_{k=1}^n K_k ) < \epsilon/2 + \epsilon/2. \]

Now \(f\) restricted to \(K\) is continuous since the \(K_i\) are disjoint and \(f\) is constant on each \(K_i\).

Now we have proved the special case where \(f\) takes countably many values we can use this to prove the theorem for general \(f\). Let \(f_n = 2^{-n} \lfloor 2^n f \rfloor\) then \(2^{-n} \geq f(x)-f_n(x) \geq 0\) so \(f_n(x) \rightarrow f(x)\), uniformly. Now, \(f_n\) can only take countably many values, so by our special case of Lusin’s theorem there exists a \(K_n \subseteq A\), compact, such that \(\lambda(A \setminus K_n) \leq \epsilon 2^{-n}\), and \(f_n\) is continuous on \(K_n\). Now let \(K_\infty = \bigcap_n K_n\), then \(K_\infty\) is compact and \(\lambda(A \setminus K_\infty) = \lambda (\bigcup_n(A \setminus K_n)) \leq \sum_n \epsilon 2^{-n} = \epsilon\). Now we have that \(f_n\) converges uniformly to \(f\) on \(K_\infty\) and \(f_n\) is continuous on \(K_\infty\) for each \(n\). As the uniform limit of continuous functions is continuous this shows that \(f\) is continuous on \(K_\infty\).