Matrices Represent Linear Functions

A matrix is simply a grid of numbers that represents a linear function.

A linear function is one where each component of the output vector is a linear combination of the components of the input vector. If you’re talking about a scalar-valued function (one output), your output is a linear combination of all your inputs. Thus you have one linear combination with as many terms as you have inputs. If you have a scalar-valued univariate function (1 input and 1 output), your output is a linear combination of your single input. Thus not only do you have a single linear combination, but that linear combination only has 1 term. In other words, your output is just a scaled version of your input. (Adding a constant, as in y = 3x + 2, makes the function affine rather than linear, and a matrix by itself cannot store that constant.)
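To make the scalar-valued case concrete, here is a minimal Python sketch. The helper name `linear_combination` is hypothetical, chosen only for illustration. It takes one coefficient per input and returns a single output, so a function of 3 inputs is described by exactly 3 numbers, and a univariate function by just 1.

```python
def linear_combination(coefficients, inputs):
    """Scalar-valued linear function: one output, one term per input.

    Hypothetical helper for illustration only.
    """
    if len(coefficients) != len(inputs):
        raise ValueError("need exactly one coefficient per input")
    # Multiply each input by its coefficient and add the terms up.
    return sum(c * x for c, x in zip(coefficients, inputs))


# Three inputs -> one linear combination with three terms.
print(linear_combination([2, -1, 5], [1, 4, 2]))  # 2*1 + (-1)*4 + 5*2 = 8

# One input -> a single term, i.e. the input is just scaled.
print(linear_combination([3], [7]))  # 21
```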

So, for each component in your output, you will have a linear combination, and that linear combination will have as many terms as you have inputs. So in total, you will have num_outputs * num_inputs coefficients! You can write down all these coefficients in a little grid called a matrix. Thus the numbers in a matrix tell you the coefficients of all the linear combinations in a linear function. This is why we say that a matrix represents a linear function (aka linear transformation): it contains all the coefficients of the linear transformation.
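Putting the rows together gives a sketch of the general case in Python. The names `apply_matrix` and `count_coefficients` are hypothetical, and the grid is assumed to be a plain list of rows (one row per output, one column per input) rather than any particular library's matrix type.

```python
def apply_matrix(matrix, vector):
    """Apply a linear function stored as a grid of coefficients.

    Assumed layout: one row per output component, one column per input component.
    """
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("each row needs one coefficient per input")
    # Each output component is the linear combination described by one row.
    return [sum(c * x for c, x in zip(row, vector)) for row in matrix]


def count_coefficients(matrix):
    # num_outputs (rows) * num_inputs (columns)
    return len(matrix) * len(matrix[0])


# 2 outputs, 3 inputs -> 2 * 3 = 6 coefficients.
m = [[1, 0, 2],
     [0, 3, 1]]
print(count_coefficients(m))       # 6
print(apply_matrix(m, [1, 1, 1]))  # [3, 4]
```

Plain lists keep the sketch dependency-free; a numerical library would normally handle this for you.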

Example

Let’s say you have the function y=3x. Clearly, this is a linear function: your output (y) is a linear combination of your input (x). This is a scalar-valued univariate function (1 output, 1 input), so we only have one linear combination of 1 term, and 1 coefficient is sufficient to represent this function! We can store this lil coefficient in a lil matrix, like so: [3].

When someone comes along and gives us an input for our linear function, we know that the output will be a linear combination of this input (since we know our function is linear). Since the input has only 1 component, the output is simply a scaled version of this input. In other words, we can multiply any input by the coefficient in our matrix to generate the function's output.
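As a small check of this idea, the sketch below stores [3] as a 1x1 Python list of lists (an assumed representation) and multiplies each input by its one coefficient. On the sample inputs tried, the result agrees with y=3x.

```python
matrix = [[3]]  # the 1x1 matrix [3] representing y = 3x


def f(x):
    # One output, one input: the output is the single coefficient times the input.
    return matrix[0][0] * x


for x in [0, 1, 2.5, -4]:
    print(x, f(x))

# Compare against y = 3x on a range of sample inputs.
print(all(f(x) == 3 * x for x in range(-5, 6)))  # True
```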

A more complicated example

Let’s say you have a function with 2 inputs and 2 outputs (a vector-valued multivariate function). One way to represent a function of 2 inputs and 2 outputs is via 2 scalar-valued functions that take the same 2 inputs:

  1. o1 = 3*i1 + 4*i2
  2. o2 = 2*i1 + 5*i2
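Written as code, the two formulas above are just two ordinary Python functions of the same inputs, and bundling their results gives the 2-input, 2-output function. The names `o1` and `o2` mirror the text; `f` is a hypothetical name for the combined function.

```python
def o1(i1, i2):
    # First output: linear combination with coefficients 3 and 4.
    return 3 * i1 + 4 * i2


def o2(i1, i2):
    # Second output: linear combination with coefficients 2 and 5.
    return 2 * i1 + 5 * i2


def f(i1, i2):
    # The vector-valued function is the two scalar-valued functions bundled together.
    return (o1(i1, i2), o2(i1, i2))


print(f(1, 2))  # (11, 12)
```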

Here o1 stands for output one, i1 for input one, o2 for output two, and i2 for input two.

As you can see, we have 2 linear combinations (one for o1 and another for o2), each with 2 terms (one specifying the coefficient of i1, the other specifying the coefficient of i2). Thus we need 4 coefficients to represent this vector-valued multivariate linear function. We can store these coefficients in a matrix like so:

[3 4] <-- 3 and 4 are the coefficients for o1's linear combination
[2 5] <-- 2 and 5 are the coefficients for o2's linear combination

By convention, we store the coefficients of each linear combination on a separate row. So 3 and 4 are the coefficients for o1’s linear combination, while 2 and 5 are the coefficients for o2’s linear combination.
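The sketch below stores the same coefficients row by row in a Python list of lists (an assumed representation) and checks, on a handful of sample inputs, that rebuilding each output from its row reproduces the written-out formulas. The helper names `from_matrix` and `written_out` are hypothetical.

```python
# Coefficients of each linear combination on a separate row.
matrix = [
    [3, 4],  # o1 = 3*i1 + 4*i2
    [2, 5],  # o2 = 2*i1 + 5*i2
]


def from_matrix(i1, i2):
    # Rebuild each output from its row of coefficients.
    return tuple(row[0] * i1 + row[1] * i2 for row in matrix)


def written_out(i1, i2):
    # The same function written directly as two formulas.
    return (3 * i1 + 4 * i2, 2 * i1 + 5 * i2)


# On these sample inputs, the four numbers in the grid reproduce the function.
samples = [(0, 0), (1, 0), (0, 1), (2, -3), (7, 5)]
print(all(from_matrix(a, b) == written_out(a, b) for a, b in samples))  # True

print(matrix[1])  # [2, 5], the coefficients for o2
```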

Let’s say we have an input, <4,4>, and we want to feed it to this linear function (aka linear transformation). How do we calculate an output? Well, we know that the first component of our output vector will be the linear combination of our input components using the coefficients 3 and 4. Similarly, we know that the second component of our output vector will be a linear combination of our input components with coefficients 2 and 5. So o1 = 3*4 + 4*4 = 28 and o2 = 2*4 + 5*4 = 28, giving the output <28,28>.
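Carrying out that calculation step by step in Python (a minimal sketch using plain lists, no library assumed), each row of the matrix produces one component of the output:

```python
matrix = [[3, 4],
          [2, 5]]
x = [4, 4]

output = []
for row in matrix:
    # Each output component is a linear combination of the input components,
    # using the coefficients on that row.
    terms = [c * xi for c, xi in zip(row, x)]
    output.append(sum(terms))
    print(" + ".join(f"{c}*{xi}" for c, xi in zip(row, x)), "=", sum(terms))

print(output)  # [28, 28]
```

This prints `3*4 + 4*4 = 28`, then `2*4 + 5*4 = 28`, then `[28, 28]`.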

To recap, a linear function is one where each component of the output vector is a linear combination of the components of the input vector. We store the coefficients of the linear combinations in a matrix. Since all you need to represent a linear function/transformation are these coefficients, a matrix represents a linear function/transformation.

So really, a matrix is nothing special; it is just one way to represent a linear function. What is special is linear functions themselves. They are much easier to work with (e.g. to solve) than nonlinear functions. Even when what we are interested in isn’t linear, we can usually approximate it linearly, at least near a given point.

Summary

A linear function (aka linear transformation) is one where each component of your output vector is a linear combination of the components of your input vector. You will have as many linear combinations as you have output components, and each of your linear combinations will have as many terms as you have input components. Thus you can represent a linear transformation by num_input_components * num_output_components coefficients. You can store these coefficients in a matrix. By convention, you store the coefficients of each linear combination on a separate row.