Vectors are not just arrows

Vectors are pointy arrows in space, at least that’s what most of us were taught. This description does a good job of capturing how vectors behave when manipulated through scaling, addition, and transformations. However, it says little about what vectors can actually represent.

Consider $$v = (1,1,2),$$ $$f(x) = 2-x^2,$$ $$T(x,y,z) = (x+2y,\ 5y-3z),$$ and $$g(x) = \exp(x) + \sin(\pi x).$$ All these mathematical objects are vectors in distinct spaces – $v$ belongs to the vector space $\mathbb{R}^3$, $f(x)$ belongs to the vector space ${\mathcal{P}_{2}}(\mathbb{R})$, $T$ belongs to the vector space $\mathcal{L}(\mathbb{R}^3,\mathbb{R}^2)$, and $g(x)$ belongs to the vector space of continuous real-valued functions $C(\mathbb{R})$. You don’t have to understand what these symbols and Greek letters represent for now.

Roughly speaking$^{[1]}$, a set $V$ is a vector space over $\mathbb{R}$ if

  1. The sum of any two elements of $V$ is contained in $V$,
  2. Scaling any element of $V$ by some real number produces an element that is contained in $V$, and
  3. $V$ contains $0$.

Note that the last condition requires the zero element to have the same structure as the other elements of $V$. For example, the zero of $\mathbb{R}^3$ is the vector $(0,0,0)$. It follows from this definition that many kinds of mathematical objects could be made into vector spaces.
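To make this concrete, here is a minimal Python sketch (my own illustration, not from any library) treating both triples in $\mathbb{R}^3$ and real-valued functions as vectors; the helpers `add` and `scale` are hypothetical names I chose for pointwise operations.

```python
import numpy as np

# Vectors in R^3: addition and scaling keep us inside R^3.
v = np.array([1.0, 1.0, 2.0])
w = np.array([0.5, -1.0, 3.0])
print(v + w)    # still a triple of real numbers
print(2.5 * v)  # still a triple of real numbers

# "Vectors" in C(R): real-valued functions added pointwise also form a vector space.
f = lambda x: 2 - x**2
g = lambda x: np.exp(x) + np.sin(np.pi * x)

def add(f, g):
    """Pointwise sum of two functions, itself a function in C(R)."""
    return lambda x: f(x) + g(x)

def scale(c, f):
    """Pointwise scaling of a function by a real number c."""
    return lambda x: c * f(x)

h = add(f, scale(3.0, g))  # h is again a real-valued function, i.e. a vector in C(R)
print(h(0.0))              # f(0) + 3*g(0) = 2 + 3*1 = 5.0
```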

The study of vector spaces and their properties is crucial in various fields of mathematics, most notably Linear Algebra and Functional Analysis. This article will primarily focus on vector spaces in the context of Linear Algebra, as a good understanding of Linear Algebra is required in order to study how other mathematical, scientific, and engineering fields utilize this powerful construction. It is important to note that one article, or even a hundred articles, cannot do Linear Algebra justice. Therefore, my hope is to expose readers to the beauty of this side of mathematics, and my bigger hope is to motivate some readers to study these concepts for their own sake.

Linear Algebra could be broken down into three parts: vector spaces, linear maps, and inner products. It turns out that the most interesting results from linear algebra are derived from the use of inner products. However, before discussing inner products, linear maps need to be defined.

Linear Maps

A linear map is a function $T: V \rightarrow W$ that maps vectors belonging to the vector space $V$ to vectors belonging to the vector space $W$ while preserving linearity. Linearity in this case requires that $T$ is homogeneous and additive. Homogeneity is just a fancy way to state that for a vector $v \in V$ and a number $\lambda \in \mathbb{R}$, applying $T$ on the vector $\lambda v$ produces the same result as applying $T$ on $v$ and then multiplying by $\lambda$. In more mathematical terms, we write

$$T(\lambda v) = \lambda T(v) \ \text{ for every } \ v \in V \ \text{ and } \ \lambda \in \mathbb{R}.$$

In almost the same way, for all vectors $u,v \in V$, $T$ is additive when applying $T$ on the vector $u+v$ produces the same result as applying $T$ on $u$ and $v$ individually, and then summing the results. This is expressed as

$$T(u + v) = T(u) + T(v) \ \text{ for every } \ u,v \in V.$$
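As a quick numerical illustration (a sketch of my own, not part of the article), the map $T(x,y,z) = (x+2y,\ 5y-3z)$ from the introduction can be checked against both conditions on randomly drawn vectors:

```python
import numpy as np

def T(v):
    """The linear map T(x, y, z) = (x + 2y, 5y - 3z) from the introduction."""
    x, y, z = v
    return np.array([x + 2*y, 5*y - 3*z])

rng = np.random.default_rng(0)
u, v = rng.standard_normal(3), rng.standard_normal(3)
lam = 2.7

print(np.allclose(T(u + v), T(u) + T(v)))   # additivity: True
print(np.allclose(T(lam * v), lam * T(v)))  # homogeneity: True

# A nonlinear map fails the same check, e.g. squaring each component.
S = lambda v: v**2
print(np.allclose(S(u + v), S(u) + S(v)))   # False in general
```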

If these two descriptions appear like jargon, then consider the following examples. You are at the gym lifting relatively heavy weights; weights that are challenging to lift, yet you still manage to lift them. It is the beginning of January, therefore, you run into many friends you almost never see at the gym. While you are lifting, your mathematician friend tells you that you are doing it all wrong. He says you should lift light weights for many repetitions instead of lifting heavy weights for only a few repetitions. What your friend intends to say is that the benefits of weight lifting are homogeneous, in the sense that lifting $5kg$ for $50$ repetitions yields the same benefits as lifting $50kg$ for $5$ repetitions; which would save you from the burden of lifting the heavier weights. This could not be further from the truth, as in reality muscle hypertrophy (the muscle-building process) is better achieved through lifting challenging weights – meaning that the relationship between lifting and muscle hypertrophy is not homogeneous.

To illustrate additivity, imagine that you are on a trip with your best friend and just found out that you have an exam coming up in ten days. Unfortunately, you and your friend, who is also taking the same exam, will only be back home in five days. You start worrying and consider studying for the exam during the trip. This does not sit well with your friend, as he spent most of his savings on this trip. He comes up with a genius idea to save the day – upon arriving back home, one of you shall spend the whole five days studying from the lecture notes while the other shall spend them studying from the tutorial questions; therefore, ten days would be spent studying cumulatively. Now for this story to work, and not pose any ethical dilemmas, suppose that the exam allows for peer collaboration. Therefore, on exam day, you put your brains together and submit the same answers. You also happen to have a third friend who took the same exam. He was too broke to join you both on the trip, and consequently, he spent the whole ten days studying from both the lecture notes and the tutorial questions. Again, to give credence to the story, you did not ask the third friend to help you with the exam as that would have been in bad taste. Assuming that all three in the friend group have the same level of ability, who scored higher? If you, your best friend, and the third friend scored just about the same, then the relationship between the variety of content used in studying and the exam outcome is additive. In this case, the relationship between studying and exam performance is the function $T$, while the contents used in studying are the vectors found in $V$. In reality, studying from both materials produces what is called an interaction effect that leads to better-than-linear outcomes.

The concept of linearity is not exclusive to Linear Algebra, or to mathematics in general. Consider what was presented in the article On Antifragility, where it was revealed that fragile, and antifragile, natural systems exhibit nonlinearity.

Inner products

Now that the notion of linear maps has been established, we introduce the main appeal of this article. An inner product is a function $\langle \cdot, \cdot \rangle : V \times V \rightarrow \mathbb{R}$ mapping pairs of elements of $V$ to a number in $\mathbb{R}$. To be more concrete, on some vector space $V$, the inner product function accepts vectors $u$ and $v$ and outputs some number that, roughly speaking, represents how similar$^{[2]}$, or dissimilar, these two vectors are.

Any function $f: V \times V \rightarrow \mathbb{R}$ is said to be an inner product on $V$ if it satisfies the requirements below:

  1. Positivity: $\langle v, v \rangle \geq 0$ for all $v \in V$.
  2. Definiteness: $\langle v, v \rangle = 0$ if and only if $v = 0$.
  3. Additivity on the first slot: $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$ for all $u,v,w \in V.$
  4. Homogeneity on the first slot: $\lambda \langle u, v \rangle = \langle \lambda u, v \rangle$ for all $u,v \in V$ and all $\lambda \in \mathbb{R}.$
  5. Symmetry$^{[3]}$: $\langle u, v \rangle = \langle v, u \rangle$ for all $u,v \in V.$

These conditions ensure that the inner product is a linear function$^{[4]}$. When an inner product $\langle \cdot, \cdot \rangle$ is defined on the vector space $V$, $V$ becomes an inner product space. Inner product spaces possess many interesting properties, one of them being orthogonality. When two vectors have a zero inner product, they are said to be orthogonal. As mentioned before, the inner product, roughly speaking, measures how similar any two vectors are, and when the inner product equals zero, the two vectors are dissimilar.
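For a concrete feel of how the choice of inner product changes which vectors count as orthogonal, here is a short sketch of my own (anticipating the example in footnote [2]) comparing the Euclidean dot product with a weighted inner product on $\mathbb{R}^2$:

```python
import numpy as np

def euclidean(u, v):
    """The standard dot product on R^2."""
    return float(np.dot(u, v))

def weighted(u, v):
    """The weighted inner product <(a1, a2), (b1, b2)> = 3*a1*b1 + a2*b2."""
    return 3.0 * u[0] * v[0] + u[1] * v[1]

u, v = np.array([2.0, -1.0]), np.array([1.0, 2.0])
print(euclidean(u, v))  # 0.0 -> orthogonal under the dot product
print(weighted(u, v))   # 4.0 -> not orthogonal under the weighted inner product
```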

The concept of orthogonality could be used to construct an orthonormal basis of $V$. A list of vectors $e_1, \dots, e_n$ is called an orthonormal basis of $V$ if every vector in $V$ could be expressed as a linear combination of these vectors, and $$\langle e_i, e_j \rangle = \begin{cases} 1 & \text{if } i=j \\ 0 & \text{otherwise} \end{cases}.$$

It turns out that each vector $v \in V$ could be expressed as a linear combination of the list $e_1, \dots, e_n$ by projecting $v$ onto each $e_i$ as follows

$$v = \langle v, e_1 \rangle e_1 + \dots + \langle v, e_n \rangle e_n$$ Conceptually, $\langle v, e_i \rangle$ represents how much of $e_i$ is needed to construct $v$.
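Here is a small sketch of this expansion in $\mathbb{R}^3$ (my own example, assuming the columns of a randomly generated orthogonal matrix serve as an orthonormal basis): the coefficients $\langle v, e_i \rangle$ reassemble $v$ exactly.

```python
import numpy as np

# Build an orthonormal basis of R^3 from the QR decomposition of a random matrix.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
basis = [Q[:, i] for i in range(3)]

v = np.array([1.0, 1.0, 2.0])
coeffs = [np.dot(v, e) for e in basis]              # <v, e_i>
reconstruction = sum(c * e for c, e in zip(coeffs, basis))
print(np.allclose(reconstruction, v))               # True
```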

A step away from abstractions

So far, everything has been abstracted away, and for good reason. When developing and proving mathematical theorems, the fewer assumptions we make about the nature of what is being studied, the stronger the theorems are. In the preceding discussions, $V$ was only assumed to be a vector space.

Consider polynomials of the form $$p(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$$ where $a_0, a_1, a_2, \dots, a_m \in \mathbb{R}$. There are an infinite number of choices of coefficients $a_0, a_1, a_2, \dots, a_m$ which means that there is an infinite number of such polynomials. We could collect all these polynomials and place them in a set denoted by ${\mathcal{P}_{m}}(\mathbb{R})$ – we call this set the set of all polynomials of degree at most $m$.

Notice that for any polynomial $p \in {\mathcal{P}_{m}}(\mathbb{R})$ and scalar $\lambda \in \mathbb{R}$

$$
\begin{aligned}
(\lambda p)(x) &= \lambda p(x)\\
&= \lambda (a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m) \\
&= \lambda a_0 + \lambda a_1 x + \lambda a_2 x^2 + \dots + \lambda a_m x^m
\end{aligned} $$

As ${\mathcal{P}_m}(\mathbb{R})$ already includes all polynomials of the desired form, it follows that $\lambda p \in {\mathcal{P}_m}(\mathbb{R})$. Similarly, for every $p, q \in {\mathcal{P}_m}(\mathbb{R})$,

$$
\begin{aligned}
(p+q)(x) &= p(x) + q(x) \\
&= (a_0 + a_1 x + \dots + a_m x^m) + (b_0 + b_1 x + \dots + b_m x^m) \\
&= (a_0+b_0) + (a_1+b_1) x + \dots + (a_m+b_m) x^m
\end{aligned}, $$ which shows that $p+q \in {\mathcal{P}_m}(\mathbb{R})$. Lastly, the zero polynomial $0$ is found in ${\mathcal{P}_m}(\mathbb{R})$. These properties imply that ${\mathcal{P}_m}(\mathbb{R})$ (the set of polynomials of degree at most $m$) is a vector space.
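Under the assumption that a polynomial in ${\mathcal{P}_m}(\mathbb{R})$ is represented by its array of $m+1$ coefficients, the closure properties above amount to the observation that coefficientwise sums and scalings never raise the degree. A minimal sketch:

```python
import numpy as np

m = 3
p = np.array([1.0, -2.0, 0.0, 4.0])  # 1 - 2x + 4x^3
q = np.array([0.5, 1.0, 3.0, 0.0])   # 0.5 + x + 3x^2

print(p + q)            # coefficientwise sum: still m + 1 coefficients
print(2.0 * p)          # coefficientwise scaling: still m + 1 coefficients
print(np.zeros(m + 1))  # the zero polynomial lives in the same space
```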

Approximating $e^x$ and $\sin(x)$

The classic route of estimating functions with polynomials is through the use of Taylor series. Constructing the Taylor series of a function is quite simple in practice. Given a function $f(x)$, we write

$$p_f(x) = \sum_{n=0}^{\infty}{a_n x^n},$$ where

$$a_n = \frac{f^{(n)}(0)}{n!}.$$

A consequence of the construction of the Taylor coefficients is that the function $f$ must be infinitely differentiable at zero. Moreover, the Taylor series is not guaranteed to correctly approximate $f$ on all of $\mathbb{R}$. Fortunately, it is well established that the Taylor series of $e^x$ and $\sin(x)$ converge to their underlying functions for all $x \in \mathbb{R}$. Moreover, we could restrict computations to the finitely many terms of the Taylor series up to degree $m$, and in many cases, it only takes a very small degree $m$ to reasonably approximate the desired function on relatively small intervals.
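As an illustration (a sketch of my own, using the known closed-form derivatives of $e^x$ and $\sin(x)$ at zero rather than symbolic differentiation), the truncated Taylor polynomials up to degree $m$ can be evaluated directly:

```python
import numpy as np
from math import factorial

def taylor_exp(x, m):
    """Degree-m Taylor polynomial of e^x at 0: every derivative of e^x at 0 equals 1."""
    return sum(x**n / factorial(n) for n in range(m + 1))

def taylor_sin(x, m):
    """Degree-m Taylor polynomial of sin(x) at 0: derivatives cycle through 0, 1, 0, -1."""
    cycle = [0.0, 1.0, 0.0, -1.0]
    return sum(cycle[n % 4] * x**n / factorial(n) for n in range(m + 1))

x = 1.5
print(taylor_exp(x, 8), np.exp(x))  # agree to several decimal places for this x and m
print(taylor_sin(x, 8), np.sin(x))
```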

Interestingly enough, using Linear Algebra for the task of estimating $e^x$ and $\sin(x)$ yields better results! Recall that for some arbitrary positive integer $m$, ${\mathcal{P}_m}(\mathbb{R})$ is a vector space. With this in mind, the space could be made into an inner product space by defining the inner product

$$\langle p, q \rangle = \int_a^b{p(x)\, q(x)\, dx}$$ where $p,q \in {\mathcal{P}_m}(\mathbb{R})$ and $[a,b]$ is the interval we concern ourselves with.

We fix $a$ and $b$ to obtain an orthonormal basis $e_1, \dots, e_{m+1}$ of ${\mathcal{P}_m}(\mathbb{R})$$^{[5]}$ (the space has dimension $m+1$). With an orthonormal basis in hand, we can go ahead and project some arbitrary, non-polynomial, function $f$ onto ${\mathcal{P}_m}(\mathbb{R})$ using the expression below. $$q_f = \langle f, e_1 \rangle e_1 + \dots + \langle f, e_{m+1} \rangle e_{m+1}$$

where $\langle f, e_1 \rangle, \dots, \langle f, e_{m+1} \rangle$ are numbers that could be interpreted as how much of each $e_i$ goes into the projection of $f$. Here, $q_f$ is called the orthogonal projection of $f$ onto ${\mathcal{P}_{m}}(\mathbb{R})$ – it is a polynomial that provides an approximation of $f$.
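The sketch below is my own implementation under stated assumptions: the inner product $\int_a^b pq$ is approximated with the trapezoid rule on a dense grid, and the monomials $1, x, \dots, x^m$ are orthonormalized with the Gram-Schmidt procedure mentioned in footnote [5]. It projects $f(x) = e^x$ onto ${\mathcal{P}_m}(\mathbb{R})$ on $[-4,4]$.

```python
import numpy as np

a, b, m = -4.0, 4.0, 5
xs = np.linspace(a, b, 4001)  # dense grid for numerical integration

def inner(f, g):
    """<f, g> = integral_a^b f(x) g(x) dx, approximated by the trapezoid rule."""
    y = f(xs) * g(xs)
    return float(np.sum((y[:-1] + y[1:]) * np.diff(xs)) / 2.0)

def gram_schmidt(funcs):
    """Orthonormalize a list of functions with respect to the inner product above."""
    ortho = []
    for f in funcs:
        def residual(x, f=f, prev=tuple(ortho)):
            # subtract the components of f along the already-built orthonormal functions
            return f(x) - sum(inner(f, e) * e(x) for e in prev)
        norm = np.sqrt(inner(residual, residual))
        ortho.append(lambda x, r=residual, n=norm: r(x) / n)
    return ortho

monomials = [lambda x, k=k: x**k for k in range(m + 1)]
basis = gram_schmidt(monomials)

f = np.exp
coeffs = [inner(f, e) for e in basis]                       # <f, e_i>
q_f = lambda x: sum(c * e(x) for c, e in zip(coeffs, basis))

print(np.max(np.abs(q_f(xs) - f(xs))))  # uniform error of the degree-m projection
```

Note that the basis depends only on $a$, $b$, and $m$; projecting a different function onto the same space only requires recomputing the coefficients $\langle f, e_i \rangle$.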

To get a sense of what has been presented thus far, consider the following interactive graph of $f(x) = e^x$ on the domain $[-4,4]$ and its polynomial approximations using the methods presented above for various degrees $m$.

$$\text{Figure 1: Approximations of } e^x$$

Although the corresponding polynomial approximations of $e^x$ from both procedures converge to $e^x$ as $m \rightarrow \infty$, it is clear that the orthogonal projection procedure provides a faster rate of convergence (hint: use the slider). To get a different sense of the major difference between the two procedures, consider the task of approximating $g(x) = \sin(x)$ on the interval $[-3\pi, 3\pi]$$^{[6]}$.

$$\text{Figure 2: Approximations of } \sin(x)$$

It is not hard to see the differences in how the two approximations converge to $\sin(x)$; this was not clear from the example of $e^x$. The two methods are fundamentally different. Taylor series provide reasonable approximations of the function of interest in some neighborhood of $x=0$, and this neighborhood tends to grow as the degree of the polynomial is increased. On the other hand, the orthogonal projection finds the “shadow” $q_g$ of $\sin(x)$ – which could be interpreted as minimizing the dissimilarity between the polynomial $q_g$ and $\sin(x)$ over all of $[a,b]$ at once.

As a sidenote, consider the classic counterexample used to show that a Taylor series need not converge to its underlying function:

$$h(x) = \begin{cases} e^{-1/x^2} & \text{if } x\neq 0 \\ 0 & \text{if } x = 0 \end{cases}.$$

This function is infinitely differentiable at zero, and all of those derivatives equal zero. Therefore, the Taylor series of $h$ is the constant polynomial $p_h(x) = 0$. As $h(x) \neq 0$ for all $x \neq 0$, the Taylor series of $h$ agrees with $h$ only at $x=0$. In a similar way to what has been presented thus far, the graph below compares the Taylor and orthogonal projection methods when applied to approximate $h(x)$ on $[-4,4]$.

$$\text{Figure 3: Approximations of } h(x)$$

As opposed to using Taylor approximations, using orthogonal projections provides polynomials that converge to $h(x)$ on $[a,b]$. This is because the flatness of $h$ near zero does not influence the coefficients of these polynomials in the same way it does in the Taylor case.
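A rough numerical check of this (my own sketch, with a dense least-squares polynomial fit standing in for the continuous orthogonal projection): the Taylor polynomial of $h$ is identically zero, so its error never shrinks, while a degree-8 projection-style polynomial tracks $h$ on $[-4,4]$.

```python
import numpy as np

def h(x):
    """h(x) = exp(-1/x^2) for x != 0, and h(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    out[nz] = np.exp(-1.0 / x[nz]**2)
    return out

xs = np.linspace(-4.0, 4.0, 2001)
taylor = np.zeros_like(xs)                            # p_h(x) = 0 everywhere
fit = np.polynomial.Polynomial.fit(xs, h(xs), deg=8)  # least-squares proxy for q_h

print(np.max(np.abs(h(xs) - taylor)))   # ~0.94: the Taylor "approximation" never improves
print(np.max(np.abs(h(xs) - fit(xs))))  # noticeably smaller uniform error
```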

Last words

Mathematics is filled with beauty that tends to shy away. The results in this article were not, and are not supposed to be, easily digestible. The article’s aim was only to reveal the rewards of the rigorous treatment of mathematics. Therefore, the most fitting conclusion would be that abstractions do not necessarily imply non-applicability.


$[1]$: In more rigorous terms, $V$ must be closed under addition and scalar multiplication. Moreover, the addition of elements of $V$ must be both associative and commutative and scalar multiplication must be associative and distributive. And lastly, for every $v \in V$, $v+0=v$, $v+(-v) =0$, and $1v=v$.

$[2]$: Inner products are chosen on the basis of how we desire to express similarity between vectors. For example, the vectors $(2,-1)$ and $(1,2)$ are considered to be dissimilar in Euclidean space, as the Euclidean inner product (known as the dot product) is $\langle (2,-1), (1,2) \rangle = 2 \times 1 + (-1) \times 2 = 0$. However, using the weighted inner product defined by $\langle (a_1,a_2), (b_1,b_2) \rangle = 3 a_1 b_1 + a_2 b_2$ results in $\langle (2,-1), (1,2) \rangle = 3 \times (2 \times 1) + (-1) \times 2 = 4$. When the inner product of any two vectors equals zero, the vectors are said to be orthogonal.

$[3]$: If the vector space $V$ is defined over the complex numbers $\mathbb{C}$, the condition would be called conjugate symmetry and would entail $\langle u, v \rangle =\overline{ \langle v, u \rangle}$ for all $u,v \in V$.

$[4]$: On both $u$ and $v$ only if $V$ is a vector space over $\mathbb{R}$. If $V$ is a vector space over $\mathbb{C}$, then the inner product is linear only with respect to $u$.

$[5]$: Orthonormal bases of inner product spaces could be obtained by using the Gram-Schmidt Procedure.

$[6]$: One need not estimate $\sin(x)$ outside $[-\pi,\pi]$ since $\sin(x) = \sin(x+2\pi)$ for all $x \in \mathbb{R}$.
