
Hilbert Space

  1. A complex vector space \(H\) is called an inner product space if to each ordered pair of vectors \(x,y\in H\) there is associated a complex number \((x,y)\), the so-called “inner product” of \(x\) and \(y\), such that the following rules hold:
    \((a)\) \((y,x)=\overline{(x,y)}\). (The bar denotes complex conjugation.)
    \((b)\) \((x+y,z)=(x,z)+(y,z),\quad x,y,z\in H\)
    \((c)\) \((ax,y)=a(x,y),\quad x,y\in H, a\text{ is a scalar}\)
    \((d)\) \((x,x)\ge0,\quad \forall x\in H\)
    \((e)\) \((x,x)=0\) only if \(x=0\).
    By \((d)\), we may define \(\lVert x\rVert\), the norm of the vector \(x\in H\), to be the non-negative square root of \((x,x)\). Thus \(\lVert x\rVert^2=(x,x)\).
    Moreover,
    \((b)\) and \((c)\) may be combined into the statement: for every \(y\in H\), the mapping \(x\to(x,y)\) is a linear functional on \(H\).
    \((a)\) and \((c)\) show that \((x,ay)=\overline{a}(x,y)\).
    \((a)\) and \((b)\) imply the second distributive law: \[(z,x+y)=(z,x)+(z,y)\]

  2. The Schwarz Inequality \[|(x,y)|\leq\lVert x\rVert\lVert y\rVert,\quad\forall x,y\in H\]
    There is a complex number \(a\) such that \(|a|=1\) and \(a(y,x)=|(x,y)|\). For any real \(r\), we then have \[(x-ray,x-ray)=(x,x)-ra(y,x)-r\bar{a}(x,y)+r^2(y,y)\] The expression on the left is real and non-negative, and \(\bar{a}(x,y)=\overline{a(y,x)}=|(x,y)|\). Hence \[\lVert x\rVert^2-2r|(x,y)|+r^2\lVert y\rVert^2\ge0\] for every real \(r\). If \(\lVert y\rVert=0\), we must have \(|(x,y)|=0\); otherwise this inequality is false for large positive \(r\). If \(\lVert y\rVert>0\), take \[r=\frac{|(x,y)|}{\lVert y\rVert^2}\] which gives \(\lVert x\rVert^2-|(x,y)|^2/\lVert y\rVert^2\ge0\), that is, \[|(x,y)|^2\leq \lVert x\rVert^2\lVert y\rVert^2\]
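
As a numerical sanity check of the argument above, the sketch below assumes \(H=\mathbb{C}^5\) with \((x,y)=\sum_k x_k\overline{y_k}\); any finite-dimensional example would do. It checks the Schwarz inequality on random vectors. It also checks that the real quadratic in \(r\) used in the proof is non-negative, including at the proof's choice \(r=|(x,y)|/\lVert y\rVert^2\). The helper names `inner`, `norm` and `quadratic` are hypothetical, and the tolerances only absorb floating-point rounding.

```python
import random

# Assumption: H = C^n with (x, y) = sum_k x_k * conj(y_k),
# which is linear in x and conjugate-linear in y, as in rules (a)-(c).
def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def norm(x):
    # (x, x) is real and non-negative, so take the real part before the root
    return inner(x, x).real ** 0.5

def quadratic(x, y, r):
    """The real quadratic ||x||^2 - 2r|(x,y)| + r^2 ||y||^2 from the proof."""
    return norm(x) ** 2 - 2 * r * abs(inner(x, y)) + r ** 2 * norm(y) ** 2

def random_vector(n, rng):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

rng = random.Random(0)
for _ in range(1000):
    x, y = random_vector(5, rng), random_vector(5, rng)
    # Schwarz inequality (small tolerance for floating-point rounding)
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
    # The quadratic is non-negative for every real r; try a few values ...
    for r in (-2.0, -0.5, 0.0, 0.5, 2.0):
        assert quadratic(x, y, r) >= -1e-9
    # ... and the proof's choice r = |(x,y)| / ||y||^2
    r = abs(inner(x, y)) / norm(y) ** 2
    assert quadratic(x, y, r) >= -1e-9
```

Choosing `y` as a scalar multiple of `x` gives equality up to rounding. This is the case where the quadratic has a double root.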

  3. The Triangle Inequality \[\lVert x+y\rVert\leq\lVert x\rVert+\lVert y\rVert,\quad x,y\in H\]
    By the Schwarz inequality, \((x,y)+(y,x)=2\,\mathrm{Re}\,(x,y)\leq2|(x,y)|\leq2\lVert x\rVert\lVert y\rVert\), so \[\lVert x+y\rVert^2=(x+y,x+y)=(x,x)+(x,y)+(y,x)+(y,y)\\ \leq \lVert x\rVert^2+2\lVert x\rVert\lVert y\rVert+\lVert y\rVert^2\\ =(\lVert x\rVert+\lVert y\rVert)^2\]
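
This proof uses two facts: \((x,y)+(y,x)=2\,\mathrm{Re}\,(x,y)\leq2|(x,y)|\), and the triangle inequality that follows from it. Both can be spot-checked in a hypothetical finite-dimensional setting, \(H=\mathbb{C}^4\) with \((x,y)=\sum_k x_k\overline{y_k}\). This is a sketch for intuition, not a proof.

```python
import random

# Assumption: H = C^n with (x, y) = sum_k x_k * conj(y_k).
def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def add(x, y):
    return [p + q for p, q in zip(x, y)]

rng = random.Random(1)
for _ in range(1000):
    x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(4)]
    y = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(4)]
    # Cross terms: (x,y) + (y,x) = 2 Re (x,y), which is at most 2|(x,y)|
    cross = inner(x, y) + inner(y, x)
    assert abs(cross.imag) < 1e-12
    assert cross.real <= 2 * abs(inner(x, y)) + 1e-12
    # Triangle inequality ||x + y|| <= ||x|| + ||y||
    assert norm(add(x, y)) <= norm(x) + norm(y) + 1e-12
```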

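  4. We define the distance between \(x\) and \(y\) to be \(\lVert x-y\rVert\). With this distance \(H\) is a metric space; in particular, the triangle inequality \(\lVert x-z\rVert\leq\lVert x-y\rVert+\lVert y-z\rVert\) follows from item 3. If this metric space is complete, that is, if every Cauchy sequence in \(H\) converges to a point of \(H\), then \(H\) is called a Hilbert space.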

  5. If \((x,y)=0\) for some \(x,y\in H\), we say that \(x\) is orthogonal to \(y\), and write \(x\perp y\). Since \((x,y)=0\) implies \((y,x)=\overline{(x,y)}=0\), the relation \(\perp\) is symmetric.
    Let \(x^{\perp}\) denote the set of all \(y\in H\) which are orthogonal to \(x\); and if \(M\) is a subset of \(H\), let \(M^{\perp}\) be the set of all \(y\in H\) which are orthogonal to every \(x\in M\).
    \(x^{\perp}\) is a subspace of \(H\), since \(x\perp y\) and \(x\perp y'\) imply \(x\perp (y+y')\) and \(x\perp ay\) for every scalar \(a\).

  6. A closed subspace of \(H\) is a subspace that is a closed set relative to the topology induced by the metric of \(H\). Now \(x^{\perp}\) is precisely the set of points where the function \(y\to(x,y)\) is \(0\), and this function is continuous, since by the Schwarz inequality \(|(x,y)-(x,y')|=|(x,y-y')|\leq\lVert x\rVert\lVert y-y'\rVert\). Hence \(x^{\perp}\) is a closed subspace of \(H\). Since \[M^{\perp}=\underset{x\in M}{\bigcap}x^{\perp}\] \(M^{\perp}\) is an intersection of closed subspaces, and it follows that \(M^{\perp}\) is a closed subspace of \(H\).

  7. Every nonempty, closed, convex set \(E\) in a Hilbert space \(H\) contains a unique element of smallest norm. In other words, there is one and only one \(x_0\in E\) such that \(\lVert x_0\rVert\leq\lVert x\rVert\) for every \(x\in E\).
    First, for all \(x,y\in H\), \[\lVert x+y\rVert^2+\lVert x-y\rVert^2=(x+y,x+y)+(x-y,x-y)\\ =(x,x)+(y,y)+(x,y)+(y,x)+(x,x)+(-y,-y)+(x,-y)+(-y,x)\\ =2(x,x)+2(y,y)\\ =2\lVert x\rVert^2+2\lVert y\rVert^2\] This is known as the parallelogram law. If we interpret \(\lVert x\rVert\) as the length of the vector \(x\), it says that the sum of the squares of the diagonals of a parallelogram is equal to the sum of the squares of its sides, a familiar proposition in plane geometry.
    Let \(\delta=\inf\;\{\lVert x\rVert:x\in E\}\). For any \(x,y\in E\), the parallelogram law gives \[\lVert \frac{1}{2}x+\frac{1}{2}y\rVert^2+\lVert \frac{1}{2}x-\frac{1}{2}y\rVert^2=\frac{1}{4}\lVert x+y\rVert^2+\frac{1}{4}\lVert x-y\rVert^2\\ =\frac{1}{2}\lVert x\rVert^2+\frac{1}{2}\lVert y\rVert^2\] Since \(E\) is convex, \((x+y)/2\in E\), so \(\lVert (x+y)/2\rVert\geq\delta\), that is, \(\lVert x+y\rVert^2\geq4\delta^2\). Multiplying the identity above by \(4\) therefore gives \[\lVert x-y\rVert^2=2\lVert x\rVert^2+2\lVert y\rVert^2-\lVert x+y\rVert^2\\ \leq 2\lVert x\rVert^2+2\lVert y\rVert^2-4\delta^2,\quad(x,y\in E)\]
    If also \(\lVert x\rVert=\lVert y\rVert=\delta\), the right side of this inequality is \(0\), so \(x=y\). Hence \(E\) contains at most one element of norm \(\delta\). The definition of \(\delta\) shows that there is a sequence \(\{y_n\}\) in \(E\) such that \(\lVert y_n\rVert\to\delta\) as \(n\to\infty\). Applying the inequality above to \(y_n,y_m\in E\) gives \[\lVert y_n-y_m\rVert^2\leq 2\lVert y_n\rVert^2+2\lVert y_m\rVert^2-4\delta^2\] and the right side tends to \(0\) as \(n\to\infty\) and \(m\to\infty\). This shows that \(\{y_n\}\) is a Cauchy sequence. Since \(H\) is complete, there exists an \(x_0\in H\) such that \(y_n\to x_0\), that is, \(\lVert y_n-x_0\rVert\to0\) as \(n\to\infty\). Since \(y_n\in E\) and \(E\) is closed, \(x_0\in E\). Since the norm is a continuous function on \(H\), it follows that \[\lVert x_0\rVert=\lim_{n\to\infty}\lVert y_n\rVert=\delta\]
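
The sketch below shows the parallelogram law and the key estimate \(\lVert x-y\rVert^2\leq2\lVert x\rVert^2+2\lVert y\rVert^2-4\delta^2\) at work. It assumes \(H=\mathbb{C}^3\) and takes for \(E\) the hypothetical affine hyperplane \(\{x:(x,a)=c\}\), which is closed and convex. For this particular \(E\), the element of smallest norm is \(x_0=\frac{c}{\lVert a\rVert^2}a\). Indeed, every point of \(E\) has the form \(x_0+w\) with \(w\perp a\), and then \(\lVert x_0+w\rVert^2=\lVert x_0\rVert^2+\lVert w\rVert^2\).

```python
import random

# Assumption: H = C^3 with (x, y) = sum_k x_k * conj(y_k), and E is the
# hypothetical closed convex set E = {x : (x, a) = c} (an affine hyperplane),
# chosen because its smallest-norm element has the closed form c a / ||a||^2.
def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def add(x, y):
    return [p + q for p, q in zip(x, y)]

def sub(x, y):
    return [p - q for p, q in zip(x, y)]

def scale(t, x):
    return [t * p for p in x]

def min_norm_point(a, c):
    return scale(c / norm(a) ** 2, a)

def random_point_of_E(a, c, rng):
    """x0 + w with (w, a) = 0, obtained by removing the a-component of a random v."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in a]
    w = sub(v, scale(inner(v, a) / norm(a) ** 2, a))
    return add(min_norm_point(a, c), w)

a, c = [1 + 1j, 2 + 0j, -1j], 3 - 2j
x0 = min_norm_point(a, c)
delta = norm(x0)
rng = random.Random(2)
for _ in range(500):
    x, y = random_point_of_E(a, c, rng), random_point_of_E(a, c, rng)
    # Parallelogram law
    lhs = norm(add(x, y)) ** 2 + norm(sub(x, y)) ** 2
    assert abs(lhs - (2 * norm(x) ** 2 + 2 * norm(y) ** 2)) < 1e-9
    # The key estimate ||x - y||^2 <= 2||x||^2 + 2||y||^2 - 4 delta^2 on E
    assert norm(sub(x, y)) ** 2 <= 2 * norm(x) ** 2 + 2 * norm(y) ** 2 - 4 * delta ** 2 + 1e-9
    # No point of E has smaller norm than x0
    assert norm(x) >= delta - 1e-12
```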

  8. Let \(M\) be a closed subspace of a Hilbert space \(H\). Then:
    \((a)\) Every \(x\in H\) has a unique decomposition \[x=Px+Qx,\;\;Px\in M,\; Qx\in M^{\perp}\] \(P\) and \(Q\) are called the orthogonal projections of \(H\) onto \(M\) and \(M^{\perp}\).
    \((b)\) \(Px\) and \(Qx\) are the nearest points to \(x\) in \(M\) and in \(M^{\perp}\), respectively.
    \((c)\) The mappings \(P:H\to M\) and \(Q:H\to M^{\perp}\) are linear.
    \((d)\) \(\lVert x\rVert^2=\lVert Px\rVert^2+\lVert Qx\rVert^2\).
    \((e)\) If \(M \ne H\), then there exists \(y \in H\), \(y \ne 0\), such that \(y\perp M\).
    Uniqueness in (a): suppose that \(x'+y'=x''+y''\) for some vectors \(x',x''\) in \(M\) and \(y',y''\) in \(M^{\perp}\). Then \[x'-x''=y''-y'\] Since \(x'-x''\in M\), \(y''-y'\in M^{\perp}\), and \(M\cap M^{\perp}=\{0\}\) (an element \(z\) of both satisfies \((z,z)=0\)), we have \(x'=x''\) and \(y'=y''\). Hence \(Px\) and \(Qx\), once they exist, are unique.
    Existence: the set \[x+M=\{x+y:y\in M\}\] is closed and convex, so by item 7 it contains a unique element of smallest norm. Define \(Qx\) to be this element, and define \(Px=x-Qx\). Since \(Qx\in x+M\), we have \(Px\in M\). Thus \(P\) maps \(H\) into \(M\).
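    To show that \(Qx\in M^{\perp}\), let \(y\in M\) with \(\lVert y\rVert=1\). For every scalar \(a\), \(Qx-ay\in x+M\), so the minimizing property of \(Qx\) shows that \[(Qx, Qx)=\lVert Qx\rVert^2\leq\lVert Qx-ay\rVert^2=(Qx-ay,Qx-ay)\] Expanding the right side gives \[0\leq-a(y,Qx)-\bar{a}(Qx,y)+a\bar{a}\] With \(a=(Qx,y)\), this gives \(0\leq-|(Qx,y)|^2\), so that \((Qx,y)=0\). Since every nonzero element of \(M\) is a scalar multiple of a unit vector in \(M\), \(Qx\perp M\), that is, \(Qx\in M^{\perp}\). Thus \(Q\) maps \(H\) into \(M^{\perp}\), and (a) is proved.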
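    For (b): we have already seen that \(Px\in M\). If \(y\in M\), then \(Px-y\in M\) is orthogonal to \(Qx\), so \[\lVert x-y\rVert^2=\lVert Qx+(Px-y)\rVert^2=\lVert Qx\rVert^2+\lVert Px-y\rVert^2\] which is obviously minimized when \(y=Px\). Hence \(Px\) is the nearest point to \(x\) in \(M\). Similarly, if \(z\in M^{\perp}\), then \(\lVert x-z\rVert^2=\lVert Px+(Qx-z)\rVert^2=\lVert Px\rVert^2+\lVert Qx-z\rVert^2\), which is minimized when \(z=Qx\). Then (b) is proved.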
    For (c): let \(a,b\) be scalars and \(x,y\in H\). Applying (a) to \(ax+by\), \(x\) and \(y\) gives \[ax+by=P(ax+by)+Q(ax+by)=aPx+bPy+aQx+bQy\] Hence \[P(ax+by)-aPx-bPy=aQx+bQy-Q(ax+by)\] The left side is in \(M\) and the right side is in \(M^{\perp}\). Hence both are \(0\), so \[P(ax+by)=aPx+bPy\] and \[Q(ax+by)=aQx+bQy\] that is, \(P\) and \(Q\) are linear. Then (c) is proved.
    For (d): since \(Px\perp Qx\), \[\lVert x\rVert^2=\lVert Px+Qx\rVert^2=\lVert Px\rVert^2+\lVert Qx\rVert^2\] Then (d) is proved.
    For (e): take \(x \in H\), \(x \notin M\), and put \(y = Qx\). Since \(Px \in M\), \(x \ne Px\), hence \(y = x - Px \ne 0\), and \(y\in M^{\perp}\) by (a). Then (e) is proved.
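
The following is a finite-dimensional sketch of the decomposition \(x=Px+Qx\). It assumes \(H=\mathbb{C}^5\) and takes \(M\) to be the span of two random vectors. It uses Gram-Schmidt to obtain an orthonormal basis \(e_1,e_2\) of \(M\), then the formula \(Px=\sum_k (x,e_k)e_k\). Both are assumptions of the sketch: the proof above constructs \(Qx\) as a minimizer instead, and the formula corresponds to the best-approximation property of \(s_F(x)\) proved later in the text. The checks correspond to parts (a), (b) and (d); `gram_schmidt` and `project` are hypothetical names.

```python
import random

# Assumption: H = C^5 with (x, y) = sum_k x_k * conj(y_k), and M is the
# span of finitely many vectors. Gram-Schmidt is used only to get an
# orthonormal basis e_1, ..., e_m of M; the text does not use it.
def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis for the span of `vectors` (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(w, e)                       # component of w along e
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = norm(w)
        if n > tol:                               # skip dependent vectors
            basis.append([wi / n for wi in w])
    return basis

def project(x, basis):
    """Return (Px, Qx) with Px = sum_k (x, e_k) e_k in M and Qx = x - Px."""
    px = [0j] * len(x)
    for e in basis:
        c = inner(x, e)
        px = [p + c * ei for p, ei in zip(px, e)]
    qx = [xi - p for xi, p in zip(x, px)]
    return px, qx

rng = random.Random(3)

def rand_vec(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

basis = gram_schmidt([rand_vec(5), rand_vec(5)])  # M: a 2-dimensional subspace
x = rand_vec(5)
px, qx = project(x, basis)
assert all(abs(inner(qx, e)) < 1e-9 for e in basis)              # Qx in M-perp, part (a)
assert abs(norm(x) ** 2 - norm(px) ** 2 - norm(qx) ** 2) < 1e-9  # part (d)
for _ in range(100):                                             # part (b)
    t = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    y = [p + t * e for p, e in zip(px, basis[0])]                # another point of M
    assert norm(qx) <= norm([xi - yi for xi, yi in zip(x, y)])
```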
  9. If \(L\) is a continuous linear functional on \(H\), then there is a unique \(y\in H\) such that \[Lx=(x,y)\quad(x\in H)\]
    If \(Lx=0\) for all \(x\), take \(y=0\). Otherwise, define \[M=\{x:Lx=0\}\] The linearity of \(L\) shows that \(M\) is a subspace, and the continuity of \(L\) shows that \(M\) is closed. Since \(Lx\ne 0\) for some \(x\in H\), \(M\ne H\), so by part (e) of 8 there exists \(z\in M^{\perp}\), \(z\ne0\); dividing by its norm, we may assume \(\lVert z\rVert=1\). For any \(x\in H\), \[L\Bigl((Lx)z-(Lz)x\Bigr)=(Lx)(Lz)-(Lz)(Lx)=0\] so \((Lx)z-(Lz)x\in M\). Thus \(\Bigl((Lx)z-(Lz)x,\,z\Bigr)=0\). Since \(\lVert z\rVert=1\), this gives \[Lx=(Lx)(z,z)=(Lz)(x,z)=(x,\overline{Lz}z)\] Hence \(y=\overline{Lz}z\) has the required property. For uniqueness: if \((x,y)=(x,y')\) for all \(x\in H\), then \((x,y-y')=0\) for all \(x\in H\). In particular \((y-y',y-y')=0\), hence \(y=y'\).
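
The construction in this proof can be followed step by step in \(\mathbb{C}^3\). The sketch assumes \((x,y)=\sum_k x_k\overline{y_k}\) and a hypothetical functional \(Lx=\sum_k c_kx_k\). Then \(M=\{x:Lx=0\}=\{x:(x,\bar c)=0\}\), where \(\bar c\) is the vector of conjugates, so \(M^{\perp}\) is spanned by \(\bar c\). The unit vector \(z\) is therefore \(\bar c/\lVert c\rVert\), and the formula \(y=\overline{Lz}\,z\) returns \(y=\bar c\).

```python
import random

# Assumption: H = C^3 with (x, y) = sum_k x_k * conj(y_k), and the
# hypothetical functional L x = sum_k c_k x_k for a fixed vector c.
def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

c = [2 - 1j, 0.5j, -3 + 0j]

def L(x):
    return sum(ck * xk for ck, xk in zip(c, x))

# M = {x : L x = 0} = {x : (x, conj(c)) = 0}, so M-perp is spanned by conj(c);
# normalizing gives the unit vector z used in the proof.
cbar = [ck.conjugate() for ck in c]
z = [v / norm(cbar) for v in cbar]

# The proof's formula: y = conj(L z) z
y = [L(z).conjugate() * zk for zk in z]

rng = random.Random(4)
for _ in range(100):
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]
    assert abs(L(x) - inner(x, y)) < 1e-9          # L x = (x, y)
```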

  10. If \(V\) is a vector space, if \(x_1,\cdots,x_n\in V\) and if \(c_1,\cdots,c_n\) are scalars, then \(c_1x_1+\cdots+c_nx_n\) is called a linear combination of \(x_1,\cdots,x_n\). The set \(\{x_1,\cdots,x_n\}\) is called independent if \(c_1x_1+\cdots+c_nx_n=0\) implies that \(c_1=\cdots=c_n=0\). If \(S\subset V\), the set \([S]\) of all linear combinations of all finite subsets of \(S\) is a vector space; \([S]\) is the smallest subspace of \(V\) which contains \(S\), and is called the span of \(S\).

  11. A set of vectors \(u_a\) in a Hilbert space \(H\), where \(a\) runs through some index set \(A\), is called orthonormal if it satisfies the orthogonality relations \((u_a,u_b)=0\) for all \(a\ne b,\;a\in A,\;b\in A\), and if it is normalized so that \(\lVert u_a\rVert=1\) for each \(a\in A\).
    If \(\{u_a:a\in A\}\) is orthonormal, we associate with each \(x\in H\) a complex function on the index set \(A\), defined by \[\hat{x}(a)=(x,u_a),\quad(a\in A)\] The numbers \(\hat{x}(a)\) are called the Fourier coefficients of \(x\) relative to the set \(\{u_a\}\).

  12. Suppose that \(\{u_a:a\in A\}\) is an orthonormal set in \(H\) and that \(F\) is a finite subset of \(A\). Let \(M_F\) be the span of \(\{u_a:a\in F\}\).
    If \(\varphi\) is a complex function on \(A\) that is \(0\) outside \(F\), then there is a vector \(y\in M_F\), namely \[y=\sum_{b\in F}\varphi(b)u_b\] whose Fourier coefficients are \[\hat{y}(a)=(y,u_a)=\Bigl(\sum_{b\in F}\varphi(b)u_b,u_a\Bigr)=\varphi(a)\] for every \(a\in A\). By orthonormality only the term \(b=a\) survives, and for \(a\notin F\) every term vanishes. Also \[\lVert y\rVert^2=\Bigl(\sum_{b\in F}\varphi(b)u_b,\sum_{c\in F}\varphi(c)u_c\Bigr)=\sum_{b\in F}|\varphi(b)|^2=\sum_{a\in F}|\hat{y}(a)|^2\]
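
The sketch below uses a concrete orthonormal set. It assumes \(H=\mathbb{C}^N\) with \(N=8\), and the normalized discrete Fourier vectors \(u_a[j]=e^{2\pi i ja/N}/\sqrt N\) for \(a\in A=\{0,\dots,N-1\}\). It builds \(y=\sum_{b\in F}\varphi(b)u_b\) from a function \(\varphi\) stored as a dictionary. It then checks that \(\hat y(a)=\varphi(a)\) for every \(a\in A\), including \(a\notin F\), and that \(\lVert y\rVert^2=\sum_{b\in F}|\varphi(b)|^2\). The names `u` and `combine` are hypothetical.

```python
import cmath

# Assumption: H = C^N with (x, y) = sum_k x_k * conj(y_k), and the orthonormal
# set {u_a : a in A}, A = {0, ..., N-1}, is the normalized discrete Fourier
# basis u_a[j] = exp(2 pi i j a / N) / sqrt(N).
N = 8

def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def u(a):
    return [cmath.exp(2j * cmath.pi * j * a / N) / N ** 0.5 for j in range(N)]

def combine(phi):
    """y = sum_{b in F} phi(b) u_b, where phi is a dict {b: phi(b)} and F = its keys."""
    y = [0j] * N
    for b, coeff in phi.items():
        y = [yj + coeff * uj for yj, uj in zip(y, u(b))]
    return y

phi = {1: 2 - 1j, 4: 0.5j, 6: -3 + 0j}          # phi is 0 outside F = {1, 4, 6}
y = combine(phi)
for a in range(N):                              # every a in A, including a not in F
    assert abs(inner(y, u(a)) - phi.get(a, 0)) < 1e-9   # y-hat(a) = phi(a)
assert abs(inner(y, y).real - sum(abs(v) ** 2 for v in phi.values())) < 1e-9
```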

  13. Suppose that \(\{u_a:a\in A\}\) is an orthonormal set in \(H\) and that \(F\) is a finite subset of \(A\). Let \(M_F\) be the span of \(\{u_a:a\in F\}\). If \(x\in H\) and \[s_F(x)=\sum_{a\in F}\hat{x}(a)u_a\] then \[\lVert x-s_F(x)\rVert<\lVert x-s\rVert\] for every \(s\in M_F\) other than \(s=s_F(x)\), and \[\sum_{a\in F}|\hat{x}(a)|^2\leq\lVert x\rVert^2\]
    By the computation of Fourier coefficients above, \(s_F(x)\) has the same Fourier coefficients as \(x\) at every \(a\in F\). Hence \((x-s_F(x),u_a)=0\) for \(a\in F\), that is, \(x-s_F(x)\perp M_F\). If \(s\in M_F\), then \(s_F(x)-s\in M_F\), so \[\lVert x-s\rVert^2=\lVert (x-s_F(x))+(s_F(x)-s)\rVert^2=\lVert x-s_F(x)\rVert^2+\lVert s_F(x)-s\rVert^2\] This gives the strict inequality, and shows that \(s_F(x)\) is the unique nearest point to \(x\) in \(M_F\). Taking \(s=0\), \[\sum_{a\in F}|\hat{x}(a)|^2=\lVert s_F(x)\rVert^2\leq\lVert x-s_F(x)\rVert^2+\lVert s_F(x)\rVert^2=\lVert x\rVert^2\]
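
A finite-dimensional example illustrates this result. The sketch assumes \(H=\mathbb{C}^8\) with the normalized discrete Fourier vectors \(u_a[j]=e^{2\pi i ja/8}/\sqrt8\), and takes \(F=\{0,3,5\}\). It computes \(s_F(x)\) for a random \(x\) and checks three things: that \(x-s_F(x)\) is orthogonal to each \(u_a\), \(a\in F\); Bessel's inequality over \(F\); and that \(\lVert x-s_F(x)\rVert<\lVert x-s\rVert\) for random \(s\in M_F\). The function name `s_F` mirrors the notation above and is otherwise hypothetical.

```python
import cmath
import random

# Assumption: H = C^N with (x, y) = sum_k x_k * conj(y_k), the normalized
# discrete Fourier vectors u_a as the orthonormal set, and a proper subset F of A.
N = 8

def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def u(a):
    return [cmath.exp(2j * cmath.pi * j * a / N) / N ** 0.5 for j in range(N)]

def s_F(x, F):
    """s_F(x) = sum_{a in F} x-hat(a) u_a with x-hat(a) = (x, u_a)."""
    s = [0j] * N
    for a in F:
        coeff = inner(x, u(a))
        s = [sj + coeff * uj for sj, uj in zip(s, u(a))]
    return s

rng = random.Random(5)
F = [0, 3, 5]
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
best = s_F(x, F)
resid = [xj - bj for xj, bj in zip(x, best)]
assert all(abs(inner(resid, u(a))) < 1e-9 for a in F)      # x - s_F(x) is orthogonal to M_F
assert sum(abs(inner(x, u(a))) ** 2 for a in F) <= norm(x) ** 2 + 1e-12   # Bessel over F
for _ in range(200):
    # a random s in M_F (almost surely different from s_F(x))
    s = [0j] * N
    for a in F:
        t = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        s = [sj + t * uj for sj, uj in zip(s, u(a))]
    assert norm(resid) < norm([xj - sj for xj, sj in zip(x, s)])
```

Taking \(F\) to be all of \(A\) turns the inequality into an equality here, because the eight vectors \(u_a\) span \(\mathbb{C}^8\).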

  14. If \(0\leq \varphi(a)\leq\infty\) for each \(a\in A\), then \[\sum_{a\in A}\varphi(a)\] denotes the supremum of the set of all finite sums \(\varphi(a_1)+\cdots+\varphi(a_n)\), where \(a_1,\cdots,a_n\) are distinct elements of \(A\).
    \(\sum_{a\in A}\varphi(a)\) is thus precisely the Lebesgue integral of \(\varphi\) relative to the counting measure \(\mu\) on \(A\). In this context one usually writes \(\ell^p(A)\) for \(L^p(\mu)\).
    A complex function \(\varphi\) with domain \(A\) is thus in \(\ell^2(A)\) if and only if \[\sum_{a\in A}|\varphi(a)|^2<\infty\] \(\ell^2(A)\) is a Hilbert space, with inner product \[(\varphi,\psi)=\sum_{a\in A}\varphi(a)\overline{\psi(a)}\] Here the sum over \(A\) stands for the integral of \(\varphi\overline{\psi}\) with respect to the counting measure. Note that \(\varphi\overline{\psi}\in \ell^1(A)\) because \(\varphi\) and \(\psi\) are in \(\ell^2(A)\), since \(|\varphi\overline{\psi}|\leq\frac12\bigl(|\varphi|^2+|\psi|^2\bigr)\).
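
In code, an element of \(\ell^2(A)\) that vanishes outside a finite set can be stored as a dictionary from indices to values. The keys may be arbitrary hashable objects, which mirrors the fact that \(A\) is an arbitrary index set. The sketch below computes \((\varphi,\psi)=\sum_a\varphi(a)\overline{\psi(a)}\) and the \(\ell^1\) norm of \(\varphi\overline\psi\); only indices where both functions are stored contribute. The helper names are hypothetical, and the dictionary representation is an assumption of the sketch.

```python
# Assumption: an element of l^2(A) that vanishes outside a finite set is stored
# as a dict {a: phi(a)}; keys may be any hashable objects, so A need not be
# countable or ordered. Missing keys mean phi(a) = 0.
def l2_inner(phi, psi):
    """(phi, psi) = sum_a phi(a) * conj(psi(a)); only common keys contribute."""
    return sum(v * psi[a].conjugate() for a, v in phi.items() if a in psi)

def l2_norm(phi):
    return l2_inner(phi, phi).real ** 0.5

def l1_norm_of_product(phi, psi):
    """sum_a |phi(a) conj(psi(a))|, finite since phi and psi are in l^2(A)."""
    return sum(abs(v * psi[a]) for a, v in phi.items() if a in psi)

phi = {"a": 1 + 1j, "b": 2 + 0j, "c": -1j}
psi = {"b": 1j, "c": 3 + 0j, "d": 5 + 0j}
value = l2_inner(phi, psi)                 # only keys "b" and "c" contribute
assert abs(value) <= l2_norm(phi) * l2_norm(psi)                        # Schwarz in l^2(A)
assert l1_norm_of_product(phi, psi) <= l2_norm(phi) * l2_norm(psi) + 1e-12
```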

  15. Let \(S\) be the class of all complex, measurable, simple functions \(s\) on \(X\) such that \[\mu\Bigl(\{x:s(x)\ne0\}\Bigr)<\infty\] If \(1\leq p<\infty\), then \(S\) is dense in \(L^p(\mu)\). (A subset \(A\) of a topological space \(X\) is called dense in \(X\) if every point \(x \in X\) either belongs to \(A\) or is a limit point of \(A\); that is, the closure of \(A\) is the whole space \(X\). Informally, every point of \(X\) either lies in \(A\) or is arbitrarily “close” to a member of \(A\).)
    It is clear that \(S\subset L^p(\mu)\). Suppose \(f\ge0\), \(f\in L^p(\mu)\), and let \(\{s_n\}\) be simple measurable functions such that \[0\leq s_1\leq s_2\leq\cdots\leq f\] and \(s_n(x)\to f(x)\) as \(n\to\infty\), for every \(x\in X\). Since \(0\leq s_n\leq f\), we have \(s_n\in L^p(\mu)\). A simple function in \(L^p(\mu)\) with \(p<\infty\) vanishes outside a set of finite measure, hence \(s_n\in S\). Since \(|f-s_n|^p\leq f^p\), the dominated convergence theorem shows that \(\lVert f-s_n\rVert_{p}\to0\) as \(n\to\infty\). Thus \(f\) is in the \(L^p\)-closure of \(S\). The general case (\(f\) complex) follows by applying this to the positive and negative parts of the real and imaginary parts of \(f\).
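    For the counting measure on \(A\), a simple function vanishes outside a set of finite measure exactly when it vanishes outside a finite set. Applied to this measure, the theorem therefore shows that the functions \(\varphi\) that are zero except on some finite subset of \(A\) are dense in \(\ell^2(A)\).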
    If \(\varphi\in\ell^2(A)\), then \(\{a\in A:\varphi(a)\ne0\}\) is at most countable. For if \(A_n\) is the set of all \(a\) where \(|\varphi(a)|>\frac{1}{n}\), then the number of elements of \(A_n\) is at most \[\sum_{a\in A_n}|n\varphi(a)|^2\leq n^2\sum_{a\in A}|\varphi(a)|^2\] Each \(A_n\ (n=1,2,3,\cdots)\) is thus a finite set, and \(\{a\in A:\varphi(a)\ne0\}\) is the union of the countably many sets \(A_n\).
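
The counting bound \(|A_n|\leq n^2\sum_a|\varphi(a)|^2\) can be checked on a concrete example, which in code is necessarily finitely supported. The sketch assumes \(\varphi(k)=1/k\) for \(k=1,\dots,200\) and \(\varphi=0\) elsewhere; `level_set` is a hypothetical helper.

```python
def level_set(phi, n):
    """A_n = {a : |phi(a)| > 1/n} for phi stored as a dict {a: phi(a)}."""
    return {a for a, v in phi.items() if abs(v) > 1 / n}

# Hypothetical example: phi(k) = 1/k for k = 1, ..., 200 (zero elsewhere).
phi = {k: 1 / k for k in range(1, 201)}
total = sum(abs(v) ** 2 for v in phi.values())       # sum_a |phi(a)|^2
for n in range(1, 100):
    # |A_n| <= n^2 * sum_a |phi(a)|^2, as in the counting argument
    assert len(level_set(phi, n)) <= n ** 2 * total
# The support of phi is the union of the sets A_n
support = {a for a, v in phi.items() if v != 0}
assert support == set().union(*(level_set(phi, n) for n in range(1, 202)))
```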

  16. An isometry is simply a mapping that preserves distances. Suppose that \(X\) and \(Y\) are metric spaces, \(X\) is complete, \(f: X \to Y\) is a continuous mapping, \(X\) has a dense subset \(X_0\) on which \(f\) is an isometry, and \(f(X_0)\) is dense in \(Y\). Then \(f\) is an isometry of \(X\) onto \(Y\). (That \(f\) is an isometry on \(X_0\) means that the distance between \(f(x_1)\) and \(f(x_2)\) in \(Y\) equals the distance between \(x_1\) and \(x_2\) in \(X\), for all \(x_1, x_2\in X_0\).)
    That \(f\) is an isometry on all of \(X\) follows from the continuity of \(f\) and the density of \(X_0\). Both sides of \(d_Y(f(x_1),f(x_2))=d_X(x_1,x_2)\) are continuous in \((x_1,x_2)\), and they agree on \(X_0\times X_0\). To see that \(f\) is onto, pick \(y\in Y\). Since \(f(X_0)\) is dense in \(Y\), there is a sequence \(\{x_n\}\) in \(X_0\) such that \(f(x_n)\to y\) as \(n\to\infty\). Thus \(\{f(x_n)\}\) is a Cauchy sequence in \(Y\). Since \(f\) is an isometry on \(X_0\), \(\{x_n\}\) is also a Cauchy sequence. The completeness of \(X\) now implies that \(\{x_n\}\) converges to some \(x\in X\), and the continuity of \(f\) shows that \(f(x)=\lim f(x_n)=y\). Hence \(f\) is an isometry of \(X\) onto \(Y\).

  17. Let \(\{u_a:a\in A\}\) be an orthonormal set in \(H\), and let \(P\) be the space of all finite linear combinations of the vectors \(u_a\). Then the inequality \[\sum_{a\in A}|\hat{x}(a)|^2\leq\lVert x\rVert^2,\quad \forall x \in H\] holds, and \(x\to\hat{x}\) is a continuous linear mapping of \(H\) onto \(\ell^2(A)\) whose restriction to the closure \(\bar{P}\) of \(P\) is an isometry of \(\bar{P}\) onto \(\ell^2(A)\).
    Let \(F\) be a finite subset of \(A\) and let \(M_F\) be the span of \(\{u_a:a\in F\}\). By the preceding result on \(s_F(x)\), \[\sum_{a\in F}|\hat{x}(a)|^2=\lVert s_F(x)\rVert^2\leq\lVert x\rVert^2\] Since the sum over \(A\) is the supremum of these finite sums, \[\sum_{a\in A}|\hat{x}(a)|^2\leq\lVert x\rVert^2\] This is called Bessel's inequality. It shows that the mapping \(x\to\hat{x}\) maps \(H\) into \(\ell^2(A)\). The linearity of this mapping is obvious. Applying Bessel's inequality to \(y-x\), we see that \[\lVert \hat{y}-\hat{x}\rVert_2\leq\lVert y-x\rVert\] Thus \(x\to\hat{x}\) is continuous.
    If \(x\in P\), then \(x\) is a linear combination of \(\{u_a:a\in F\}\) for some finite \(F\subset A\). By the computation of Fourier coefficients of such combinations above, \(\hat{x}\) vanishes outside \(F\) and \[\lVert x\rVert^2=\sum_{a\in F}|\hat{x}(a)|^2=\lVert\hat{x}\rVert_2^2\] Conversely, every function on \(A\) with finite support is \(\hat{x}\) for some \(x\in P\). Thus \(x\to\hat{x}\) is an isometry of \(P\) onto the subspace of \(\ell^2(A)\) consisting of the functions with finite support, which is dense in \(\ell^2(A)\) as shown above. Since \(\bar{P}\) is closed in the complete space \(H\), it is complete. The isometry lemma above, with \(X=\bar{P}\), \(X_0=P\) and \(Y=\ell^2(A)\), then shows that the restriction of \(x\to\hat{x}\) to \(\bar{P}\) is an isometry of \(\bar{P}\) onto \(\ell^2(A)\). The fact that the mapping \(x\to\hat{x}\) carries \(H\) onto \(\ell^2(A)\) is known as the Riesz-Fischer theorem.

  18. Let \(\{u_a:a\in A\}\) be an orthonormal set in \(H\). Each of the following four conditions on \(\{u_a\}\) implies the other three:
    \((a)\) \(\{u_a\}\) is a maximal orthonormal set in \(H\). Maximal orthonormal sets are often called orthonormal bases.
    \((b)\) The set \(P\) of all finite linear combinations of members of \(\{u_a\}\) is dense in \(H\).
    \((c)\) The equality \[\sum_{a\in A}|\hat{x}(a)|^2=\lVert x\rVert^2\] holds for every \(x\in H\).
    \((d)\) The equality \[\sum_{a\in A}\hat{x}(a)\overline{\hat{y}(a)}=(x,y)\] holds for all \(x,y\in H\).
    The last formula is known as Parseval’s identity. Since \(\hat{x}\) and \(\hat{y}\) are in \(\ell^2(A)\), \(\hat{x}\overline{\hat{y}}\) is in \(\ell^1(A)\), so the sum on the left is well defined.
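    If \(P\) is not dense in \(H\), then its closure \(\bar{P}\) is a closed subspace different from \(H\). By part (e) of 8, there is then a vector \(u\ne0\) with \(u\perp\bar{P}\). In particular \(u\perp u_a\) for every \(a\in A\), and adjoining \(u/\lVert u\rVert\) to \(\{u_a\}\) gives a larger orthonormal set. Thus \(\{u_a\}\) is not maximal when \(P\) is not dense, so (a) implies (b). If (b) holds, then \(\bar{P}=H\). The isometry of \(\bar{P}\) onto \(\ell^2(A)\) established above then gives \(\lVert\hat{x}\rVert_2=\lVert x\rVert\) for every \(x\in H\), which is (c). That (c) implies (d) follows from the Hilbert space identity \[4(x,y)=\lVert x+y\rVert^2-\lVert x-y\rVert^2+i\lVert x+iy\rVert^2-i\lVert x-iy\rVert^2\] which is equally valid with \(\hat{x},\hat{y}\) in place of \(x, y\), simply because \(\ell^2(A)\) is also a Hilbert space. By (c) and linearity, each norm on the right is unchanged when \(x,y\) are replaced by \(\hat{x},\hat{y}\).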
    Finally, if (a) is false, then \(\{u_a\}\) is properly contained in a larger orthonormal set, so there exists \(u\in H\), \(u\ne0\), such that \((u,u_a)=0\) for all \(a\in A\). If \(x=y=u\), then \((x,y)=\lVert u\rVert^2>0\) but \(\hat{x}(a)=0\) for all \(a\in A\), hence (d) fails. Thus (d) implies (a).
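
Here is a finite-dimensional check of the polarization identity and of Parseval's identity. It assumes \(H=\mathbb{C}^6\) with \((x,y)=\sum_kx_k\overline{y_k}\), and uses the six normalized discrete Fourier vectors as the maximal orthonormal set, with \(A=\{0,\dots,5\}\). The function `polarization` evaluates the right side of the identity divided by \(4\). Applying it to \(\hat x,\hat y\) in \(\ell^2(A)\) gives \((x,y)\) again, as the proof asserts. All names are hypothetical.

```python
import cmath
import random

# Assumption: H = C^N with (x, y) = sum_k x_k * conj(y_k); the N normalized
# discrete Fourier vectors u_a (a = 0, ..., N-1) form a maximal orthonormal set.
N = 6

def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def u(a):
    return [cmath.exp(2j * cmath.pi * j * a / N) / N ** 0.5 for j in range(N)]

def hat(x):
    """Fourier coefficients x-hat(a) = (x, u_a), as a vector indexed by a."""
    return [inner(x, u(a)) for a in range(N)]

def polarization(x, y):
    """(||x+y||^2 - ||x-y||^2 + i||x+iy||^2 - i||x-iy||^2) / 4, which equals (x, y)."""
    def sq(t):
        w = [p + t * q for p, q in zip(x, y)]
        return inner(w, w).real
    return (sq(1) - sq(-1) + 1j * sq(1j) - 1j * sq(-1j)) / 4

rng = random.Random(6)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
y = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
assert abs(polarization(x, y) - inner(x, y)) < 1e-9                # the identity in H
assert abs(polarization(hat(x), hat(y)) - inner(x, y)) < 1e-9      # same identity in l^2(A)
assert abs(inner(hat(x), hat(y)) - inner(x, y)) < 1e-9             # Parseval's identity (d)
```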
  19. Two algebraic systems of the same nature are said to be isomorphic if there is a one-to-one mapping of one onto the other which preserves all relevant properties. Two vector spaces are isomorphic if there is a one-to-one linear mapping of one onto the other. The linear mappings are the ones which preserve the relevant concepts in a vector space, namely, addition and scalar multiplication.
    Two Hilbert spaces \(H_1\) and \(H_2\) are isomorphic if there is a one-to-one linear mapping \(\Lambda\) of \(H_1\) onto \(H_2\) which also preserves inner products: \((\Lambda x, \Lambda y) = (x, y)\) for all \(x, y \in H_1\). Such a \(\Lambda\) is an isomorphism (or, more specifically, a Hilbert space isomorphism) of \(H_1\) onto \(H_2\).
    If \(\{u_a:a\in A\}\) is a maximal orthonormal set in a Hilbert space \(H\), and if \(\hat{x}(a)=(x,u_a)\), then the mapping \(x\to\hat{x}\) is a Hilbert space isomorphism of \(H\) onto \(\ell^2(A)\). Indeed, by the equivalence of conditions (a) to (d) above, \(P\) is dense in \(H\). The isometry of \(\bar{P}=H\) onto \(\ell^2(A)\) then shows that the mapping is one-to-one and onto, and Parseval’s identity shows that it preserves inner products.

  20. A set \(\mathscr P\) is said to be partially ordered by a binary relation \(\leq\) if
    \((a)\) \(a\leq b\) and \(b\leq c\) imply \(a\leq c\).
    \((b)\) \(a\leq a\) for every \(a\in\mathscr P\).
    \((c)\) \(a\leq b\) and \(b\leq a\) imply \(a= b\).
    A subset \(\mathscr Q\) of a partially ordered set \(\mathscr P\) is said to be totally ordered (or linearly ordered) if every pair \(a,b\in \mathscr Q\) satisfies either \(a\leq b\) or \(b\leq a\).
  21. The Hausdorff Maximality Theorem: every nonempty partially ordered set contains a maximal totally ordered subset.

  22. Every orthonormal set \(B\) in a Hilbert space \(H\) is contained in a maximal orthonormal set in \(H\).
    Let \(\mathscr P\) be the class of all orthonormal sets in \(H\) which contain the given set \(B\), partially ordered by set inclusion. Since \(B\in\mathscr P\), \(\mathscr P\ne\varnothing\). Hence, by the Hausdorff maximality theorem, \(\mathscr P\) contains a maximal totally ordered class \(\Omega\). Let \(S\) be the union of all members of \(\Omega\). Clearly \(B\subset S\); we claim that \(S\) is a maximal orthonormal set.
    If \(u_1,u_2\in S\), then \(u_1\in A_1\) and \(u_2\in A_2\) for some \(A_1, A_2\in \Omega\). Since \(\Omega\) is totally ordered, \(A_1\subset A_2\) or \(A_2\subset A_1\). Say \(A_1\subset A_2\), so that \(u_1\in A_2\) and \(u_2\in A_2\). Since \(A_2\) is orthonormal, \((u_1,u_2)=0\) if \(u_1\ne u_2\), and \((u_1,u_2)=1\) if \(u_1= u_2\). Thus \(S\) is an orthonormal set.
    Suppose \(S\) is not maximal. Then \(S\) is a proper subset of an orthonormal set \(S^*\). Then \(S^*\in\mathscr P\), \(S^*\notin\Omega\) (every member of \(\Omega\) is contained in \(S\)), and \(S^*\) contains every member of \(\Omega\). Hence we may adjoin \(S^*\) to \(\Omega\) and still have a total order. This contradicts the maximality of \(\Omega\).
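
In a finite-dimensional space the maximality principle is not needed. An orthonormal list \(B\) in \(\mathbb{C}^n\) can be enlarged to a maximal orthonormal set constructively, by running Gram-Schmidt over the standard basis vectors after \(B\). The sketch below does this. It is a substitute for the Hausdorff-maximality argument in the finite-dimensional case only, not an implementation of it, and `extend_to_maximal` is a hypothetical name.

```python
# Assumption: H = C^n (finite-dimensional) with (x, y) = sum_k x_k * conj(y_k).
# Gram-Schmidt over the standard basis replaces the maximality argument here.
def inner(x, y):
    return sum(p * q.conjugate() for p, q in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def extend_to_maximal(B, n, tol=1e-10):
    """Return an orthonormal basis of C^n that contains the orthonormal list B."""
    basis = [list(b) for b in B]
    for j in range(n):
        w = [1 + 0j if i == j else 0j for i in range(n)]   # j-th standard basis vector
        for b in basis:
            c = inner(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]       # remove the b-component
        nw = norm(w)
        if nw > tol:                                       # w is not in the current span
            basis.append([wi / nw for wi in w])
    return basis

s = 2 ** -0.5
B = [[s + 0j, s * 1j, 0j]]
S = extend_to_maximal(B, 3)
assert len(S) == 3 and S[0] == B[0]
```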

BIBLIOGRAPHY

1. Rudin W. Real and complex analysis. Tata McGraw-hill education; 2006.