(A new question of the week)
Looking for a new topic, I realized that a recent question involves determinants, and an older one provides the background for that. We’ll continue the series on determinants by seeing how they can be used in finding the inverse of a matrix, and how something called the adjugate matrix might fit in (with side trips into Cramer’s Rule and row reduction).
Finding an inverse using determinants
This question came from Sarah, in February of last year:
I was studying matrices, and was thinking, is there some proof on finding the inverse of a matrix?
I know how to do it step by step by heart but I do not understand what I’m doing and why it is like that.
For example, the inverse uses the determinant of a matrix – how do you interpret it? For instance, if the determinant of a 3×3 matrix is 2, what is that telling you about the matrix?
We also find minors – if an element has a minor of -1, what does that really mean, please?
We’ve recently seen what a determinant means, algebraically and geometrically; but the “meaning” in this context is a little different. We haven’t yet looked at minors, which are determinants of sub-matrices.
Doctor Fenton answered:
Hi Sarah,
Yes, there are ways of proving that a given algorithm does produce an inverse to a matrix, and there is more than one way to compute the inverse, one of which is to use determinants.
It would help to know what you already know about matrices. Do you use matrices to solve systems of linear equations, to transform vectors (column matrices), or for some other application?
Sarah replied,
Thanks for your reply. I’m using it in a course about mathematical economics where it is mostly applied to finding inverses to solve a system of 3 equations. If you’re familiar with some economic theory, there is also an application to find OLS estimators in a regression.
We had covered matrices before, but now I want to understand a bit deeper what I’m actually doing.
So I know if I’m using determinants, I can find the reciprocal of that and multiply by the adjoint, where the adjoint is the transpose of the cofactor matrix but beyond that, I still don’t know what the determinant is. I’ve always learnt it as “ad – bc”.
Even minors, I get the definition that you delete the ith and jth row and column and find determinant of resultant matrix, but doing that by heart is a bit strange because I don’t understand why I am doing that, in the sense I don’t know what the minor shows you and how it leads to the inverse matrix. I think that logic is why you can only apply inverses to square matrices, although to solve systems of equations, number of equations = number of unknowns shouldn’t be a problem.
We had previously covered row reduction technique, I also know Laplace expansion and the short hand rule. And we have solved systems using Cramer’s rule.
Thanks
We’ll touch on most of these topics: finding the inverse using what she calls the “adjoint”, more often today called the “adjugate”, and also by row reduction; “minors” in a determinant (used in finding the adjugate, and also in the Laplace expansion for evaluating a determinant); and Cramer’s rule for solving a system of equations.
Sometime we will look into what matrices are, why they are added and multiplied as they are, and so on. But we’ll see the basics of multiplication and inverses momentarily.
What is a matrix inverse?
Doctor Fenton responded, first stating what an inverse is:
Thank you for clarifying what you already know. Using the adjugate (previously called the adjoint) matrix to find the inverse is not the most efficient way to compute the inverse. I will illustrate the ideas with 2×2 matrices, although the idea works for square matrices of any size (only square matrices can have an inverse).
When I multiply two 2×2 matrices AB, with
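$$A=\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\text{ and }B=\begin{bmatrix}b_{11}&b_{12}\\b_{21}&b_{22}\end{bmatrix},$$
note that the product is
$$\begin{bmatrix}a_{11}b_{11}+a_{12}b_{21}&a_{11}b_{12}+a_{12}b_{22}\\a_{21}b_{11}+a_{22}b_{21}&a_{21}b_{12}+a_{22}b_{22}\end{bmatrix}=\begin{bmatrix}\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}b_{11}\\b_{21}\end{bmatrix}&\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}b_{12}\\b_{22}\end{bmatrix}\end{bmatrix}=\begin{bmatrix}A(B_1)&A(B_2)\end{bmatrix},$$
where \(B_1\) and \(B_2\) are the first and second columns of B. That is, to multiply A by the matrix \(B=[B_1\ B_2]\) on the right, you just multiply each of the columns in B by A.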
To help us follow this, I’ll make a simple 2×2 example:
$$A=\begin{bmatrix}1&2\\3&4\end{bmatrix},B=\begin{bmatrix}2&-1\\1&3\end{bmatrix}\\
AB=\begin{bmatrix}1&2\\3&4\end{bmatrix}\begin{bmatrix}2&-1\\1&3\end{bmatrix}=\begin{bmatrix}1\cdot2+2\cdot1&1\cdot-1+2\cdot3\\3\cdot2+4\cdot1&3\cdot-1+4\cdot3\end{bmatrix}=\begin{bmatrix}4&5\\10&9\end{bmatrix}=Y$$
The first column of the product is A times the first column of B:
$$\begin{bmatrix}1&2\\3&4\end{bmatrix}\begin{bmatrix}2\\1\end{bmatrix}=\begin{bmatrix}1\cdot2+2\cdot1\\3\cdot2+4\cdot1\end{bmatrix}=\begin{bmatrix}4\\10\end{bmatrix}$$
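The column-by-column view can also be checked mechanically. Here is a minimal Python sketch, assuming matrices are stored as lists of rows; the helper names `matmul` and `column` are made up for this illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def column(M, j):
    """Return column j of M as an n-by-1 matrix."""
    return [[row[j]] for row in M]

A = [[1, 2], [3, 4]]
B = [[2, -1], [1, 3]]

Y = matmul(A, B)
print(Y)                        # [[4, 5], [10, 9]]
print(matmul(A, column(B, 0)))  # [[4], [10]], the first column of Y
print(matmul(A, column(B, 1)))  # [[5], [9]], the second column of Y
```

Each column of the product depends only on the matching column of B, which is the fact the inverse calculation will rely on.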
That’s how we multiply. So what is the inverse?
The inverse of a matrix A (if it exists) is the matrix \(A^{-1}\) such that
$$AA^{-1}=A^{-1}A=I,$$
where I is the identity matrix.
If A is invertible, and we want to solve the matrix equation AX=B, where
X is a 2×1 column matrix \(\begin{bmatrix}x_1\\x_2\end{bmatrix}\) and B is a column matrix \(\begin{bmatrix}b_1\\b_2\end{bmatrix}\), we multiply AX=B by \(A^{-1}\) and get \(X=A^{-1}B\) as the solution.
For our A, the inverse (which we’ll calculate below in two ways) turns out to be $$A^{-1}=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix},$$ which we can check by seeing that $$AA^{-1}=\begin{bmatrix}1&2\\3&4\end{bmatrix}\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}=\begin{bmatrix}1\cdot-2+2\cdot\frac{3}{2}&1\cdot1+2\cdot-\frac{1}{2}\\3\cdot-2+4\cdot\frac{3}{2}&3\cdot1+4\cdot-\frac{1}{2}\end{bmatrix}=\begin{bmatrix}1&0\\0&1\end{bmatrix}$$ and
$$A^{-1}A=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}\begin{bmatrix}1&2\\3&4\end{bmatrix}=\begin{bmatrix}-2\cdot1+1\cdot3&-2\cdot2+1\cdot4\\\frac{3}{2}\cdot1-\frac{1}{2}\cdot3&\frac{3}{2}\cdot2-\frac{1}{2}\cdot4\end{bmatrix}=\begin{bmatrix}1&0\\0&1\end{bmatrix}.$$
If we wanted to solve the equation \(AX=Y\), $$\begin{bmatrix}1&2\\3&4\end{bmatrix}X=\begin{bmatrix}4&5\\10&9\end{bmatrix},$$ we could multiply both sides by \(A^{-1}\) to get
$$X=A^{-1}Y=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}\begin{bmatrix}4&5\\10&9\end{bmatrix}=\begin{bmatrix}-2\cdot4+1\cdot10&-2\cdot5+1\cdot9\\\frac{3}{2}\cdot4-\frac{1}{2}\cdot10&\frac{3}{2}\cdot5-\frac{1}{2}\cdot9\end{bmatrix}=\begin{bmatrix}2&-1\\1&3\end{bmatrix},$$ which is our B above.
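These checks are easy to repeat by machine. The sketch below uses Python's `fractions.Fraction` so that the halves stay exact; the `matmul` helper is a hypothetical name for this illustration, and the inverse is simply typed in from the text rather than computed.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
A_inv = [[Fraction(-2), Fraction(1)],
         [Fraction(3, 2), Fraction(-1, 2)]]
Y = [[4, 5], [10, 9]]
identity = [[1, 0], [0, 1]]

print(matmul(A, A_inv) == identity)   # True
print(matmul(A_inv, A) == identity)   # True

# Solving AX = Y by multiplying on the left by the inverse recovers B.
X = matmul(A_inv, Y)
print([[int(x) for x in row] for row in X])   # [[2, -1], [1, 3]]
```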
Inverse by solving equations
So, how do we find that inverse matrix?
To simplify notation by reducing the number of super- and subscripts, let me denote the inverse matrix of A, \(A^{-1}\), by C, so that \(C_1\) is the first column of C and \(C_2\) the second.
The equation \(AA^{-1}=AC=I\) can be written as
$$AC=A[C_1:C_2]=[AC_1:AC_2]=[E_1:E_2],$$
since
\(E_1=\begin{bmatrix}1\\0\end{bmatrix}\) is the first column of I and \(E_2=\begin{bmatrix}0\\1\end{bmatrix}\) is the second. Then \(AC_1=E_1\) and \(AC_2=E_2\), which says that \(C_1\) is the solution to \(AX=E_1\), and \(C_2\) is the solution to \(AX=E_2\).
In our example, we find the two columns of the inverse by solving $$AC_1=E_1$$ $$\begin{bmatrix}1&2\\3&4\end{bmatrix}C_1=\begin{bmatrix}1\\0\end{bmatrix}$$ and $$AC_2=E_2$$ $$\begin{bmatrix}1&2\\3&4\end{bmatrix}C_2=\begin{bmatrix}0\\1\end{bmatrix}$$
But you know how to solve AX=B by row reducing the augmented matrix [A:B] (the matrix A augmented with B as an extra column) to the form [I:X], so that the solution X is the last column of the reduced augmented matrix.
Then, to find the inverse matrix, we augment the matrix A with the identity matrix [A:I] (a 2×4 matrix) and row reduce to the form [I:C], and the inverse matrix will be the right half of the reduced 2×4 matrix. (If the left half cannot be reduced to I, then the matrix A is not invertible.) That is the efficient way to find \(A^{-1}\).
This is the standard method that he referred to before, and which we’ll see below. But we can also use determinants to solve this equation, which will lead to the adjugate. For that, keep reading …
Cramer’s rule and the inverse
Finding \(C_1\) and \(C_2\) each amounts to solving a system of equations, which we can do with determinants:
If you solve
$$\begin{aligned}ax+by&=u\\cx+dy&=v\end{aligned}$$
with elimination, multiplying the first equation by d and the second equation by b, and then subtracting, you get
$$(ad-bc)x=du-bv,$$
so
$$x=\frac{du-bv}{ad-bc},$$
or
$$x=\frac{\begin{vmatrix}u&b\\v&d\end{vmatrix}}{\begin{vmatrix}a&b\\c&d\end{vmatrix}},$$
and similarly \(y=\dfrac{av-cu}{ad-bc}\) is a quotient of determinants. This indicates where determinants can come from and can lead to Cramer’s Rule, but using determinants is not the best way to find the inverse.
Here we have derived Cramer’s Rule by brute force in the 2×2 case. As Wikipedia puts it,
Consider a system of n linear equations for n unknowns, represented in matrix multiplication form as follows: $$A\mathbf{x}=\mathbf{b}$$
where the n × n matrix A has a nonzero determinant, and the vector \(\mathbf{x}=(x_1,\dots,x_n)^T\) is the column vector of the variables. Then the theorem states that in this case the system has a unique solution, whose individual values for the unknowns are given by: $$x_i=\frac{\det(A_i)}{\det(A)}\; \; \; i=1,\dots n$$ where \(A_i\) is the matrix formed by replacing the i-th column of A by the column vector \(\mathbf{b}\).
So let’s solve our system this way, in order to find the inverse of A:
To find the first column of our inverse, we need to solve
$$\begin{bmatrix}1&2\\3&4\end{bmatrix}C_1=\begin{bmatrix}{\color{Green}1}\\{\color{Green}0}\end{bmatrix}$$
Cramer’s rule gives this solution:
$$C_{11}=\frac{\begin{vmatrix}{\color{Green}1}&2\\{\color{Green}0}&{\color{Red}4}\end{vmatrix}}{\begin{vmatrix}1&2\\3&4\end{vmatrix}}=\frac{1\cdot{\color{Red}4}-2\cdot0}{1\cdot4-2\cdot3}=\frac{{\color{Red}4}}{-2}=-2$$
$$C_{21}=\frac{\begin{vmatrix}1&{\color{Green}1}\\ {\color{Red}3}&{\color{Green}0}\end{vmatrix}}{\begin{vmatrix}1&2\\3&4\end{vmatrix}}=\frac{1\cdot0-1\cdot{\color{Red}3}}{1\cdot4-2\cdot3}=\frac{-{\color{Red}3}}{-2}=\frac{3}{2}$$
But observe that the determinant on the top, in each case, is just the element (4 or 3) opposite the 1, with an alternating sign; I’ve highlighted them. These, as we’ll see, are cofactors.
So the first column is $$C_{1}=\begin{bmatrix}-2\\\frac{3}{2}\end{bmatrix}$$
Similarly, to solve
$$\begin{bmatrix}1&2\\3&4\end{bmatrix}C_2=\begin{bmatrix}{\color{Green}0}\\{\color{Green}1}\end{bmatrix}$$
we use
$$C_{12}=\frac{\begin{vmatrix}{\color{Green}0}&{\color{Red}2}\\{\color{Green}1}&4\end{vmatrix}}{\begin{vmatrix}1&2\\3&4\end{vmatrix}}=\frac{0\cdot4-{\color{Red}2}\cdot1}{1\cdot4-2\cdot3}=\frac{-{\color{Red}2}}{-2}=1$$
$$C_{22}=\frac{\begin{vmatrix}{\color{Red}1}&{\color{Green}0}\\3&{\color{Green}1}\end{vmatrix}}{\begin{vmatrix}1&2\\3&4\end{vmatrix}}=\frac{{\color{Red}1}\cdot1-0\cdot3}{1\cdot4-2\cdot3}=\frac{{\color{Red}1}}{-2}=-\frac{1}{2}$$
So the second column of the inverse is $$C_{2}=\begin{bmatrix}1\\-\frac{1}{2}\end{bmatrix}$$
This gives us the inverse I showed before,
$$A^{-1}=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}$$
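The same calculation can be sketched in Python. This is only an illustration of the 2×2 case worked above: the function names `det2`, `replace_column` and `cramer2` are invented here, and the inverse is assembled by solving once for each column of the identity.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2x2 matrix: ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

def replace_column(M, j, col):
    """Copy of M with column j replaced by the list col."""
    return [[col[i] if k == j else M[i][k] for k in range(len(M))]
            for i in range(len(M))]

def cramer2(A, b):
    """Solve the 2x2 system A x = b by Cramer's Rule."""
    d = det2(A)
    if d == 0:
        raise ValueError("determinant is zero: no unique solution")
    return [Fraction(det2(replace_column(A, j, b)), d) for j in range(2)]

A = [[1, 2], [3, 4]]
C1 = cramer2(A, [1, 0])   # first column of the inverse
C2 = cramer2(A, [0, 1])   # second column of the inverse

# Put the two solution columns side by side.
inverse = [[C1[0], C2[0]],
           [C1[1], C2[1]]]
print(inverse == [[-2, 1], [Fraction(3, 2), Fraction(-1, 2)]])   # True
```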
We almost used the adjugate here, though we haven’t yet even talked about what it is. We’ll get there eventually, but first, he answered the side questions:
Determinants have a geometric interpretation. The determinant of
$$\begin{bmatrix}a&b\\c&d\end{bmatrix}$$
is (up to sign) the area of the parallelogram with sides given by the vectors (a,b) and (c,d) in the plane. I don’t know of any significance of this fact for solving linear systems, other than the fact that if the determinant is 0, then the system either has no solution or infinitely many solutions, depending upon the right side B.
Does this help?
This is the subject of our last two posts.
Finding the inverse by row reduction
Sarah asked for a little more:
Thank you so much for that, Dr Fenton.
Just to make sure I understood, could you kindly illustrate through an example? I can then apply that myself to a 3×3, don’t worry 🙂
Why is there such an emphasis on determinants not being the most efficient way, please?
The part on deriving the determinant and how it can lead to Cramer’s Rule is very interesting, thank you.
What about the part on minors, particularly interpreting them – the idea behind WHY we delete the ith row and jth column and take the determinant of the resultant matrix.
Thank you!
Doctor Fenton replied with, first, a statement of what we did above with Cramer’s Rule:
By an example, I assume that you want an example of using row reduction to compute an inverse of a matrix. In the 2×2 case, the determinant approach gives the inverse matrix of
$$\begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1}=\frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix},$$
which doesn’t require much computation.
That matrix is, in fact, the adjugate.
Then he gave an example of the more efficient method of finding inverses, before getting back to minors:
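For a 3×3 example, to find
$$\begin{bmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{bmatrix}^{-1},$$
we write
$$\left[\begin{array}{rrr|rrr}1&-1&0&1&0&0\\1&0&-1&0&1&0\\-6&2&3&0&0&1\end{array}\right]$$
and row reduce to
$$\left[\begin{array}{rrr|rrr}1&0&0&-2&-3&-1\\0&1&0&-3&-3&-1\\0&0&1&-2&-4&-1\end{array}\right],$$
so
$$\begin{bmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{bmatrix}^{-1}=\begin{bmatrix}-2&-3&-1\\-3&-3&-1\\-2&-4&-1\end{bmatrix}.$$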
We’ll see the adjugate method, for the same matrix, later.
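For readers who want to see the row reduction spelled out step by step, here is a minimal Python sketch of reducing [A:I] to [I:C]. The function name `inverse_by_row_reduction` is hypothetical, exact fractions are used to avoid rounding, and the pivoting rule (take the first nonzero entry in the column) is just one simple choice.

```python
from fractions import Fraction

def inverse_by_row_reduction(A):
    """Row reduce [A : I] to [I : C] and return C, the inverse of A."""
    n = len(A)
    # Build the augmented matrix [A : I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry here.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("left half cannot be reduced to I: not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Subtract multiples of the pivot row to clear the rest of the column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # The right half of the reduced matrix is the inverse.
    return [row[n:] for row in M]

A3 = [[1, -1, 0], [1, 0, -1], [-6, 2, 3]]
print(inverse_by_row_reduction(A3) ==
      [[-2, -3, -1], [-3, -3, -1], [-2, -4, -1]])   # True
```

Applied to the 2×2 matrix with rows (1, 2) and (3, 4), the same function returns the inverse with rows (-2, 1) and (3/2, -1/2).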
Row operations are preferred because of complexity. In the 3×3 case, the arithmetic work required is not onerous either way, but for larger matrices, there is a big difference. It’s not hard to see that in general, computing an n×n determinant requires computing n! terms, while row-reducing an n×n matrix to upper triangular form takes roughly \(n^3/6\) operations, so reducing the left half of the augmented n×(2n) matrix to the identity will take about \(n^3/3\) operations. For n=2 or 3, n! and \(n^3/3\) are comparable, but for larger n, say n=10, 10! is over \(3\times10^6\), while \(10^3/3\) is about 300. For n=100, the value of 100! is an integer with 158 digits, while \(100^3/3\) is in the hundreds of thousands.
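The numbers quoted here are easy to reproduce. The short Python sketch below just evaluates the two rough estimates from the paragraph above, n! terms and about n³/3 operations; the function name `compare` is made up, and neither figure is an exact operation count.

```python
import math

def compare(n):
    """Return (n!, n^3 // 3): the two rough work estimates for an n x n matrix."""
    return math.factorial(n), n**3 // 3

for n in (2, 3, 10):
    print(n, compare(n))
# 2 (2, 2)
# 3 (6, 9)
# 10 (3628800, 333)

terms, row_ops = compare(100)
print(len(str(terms)), row_ops)   # 158 333333
```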
To compute the value of large determinants, it is more efficient to use row operations to transform the matrix to upper triangular form, since the determinant of a triangular matrix is just the product of its diagonal elements, and the effects of row operations on a determinant are easy to determine: interchanging rows changes the sign of the determinant; multiplying a row by a constant multiplies the determinant by the same constant; and replacing a row by the sum of itself and another row doesn’t change the determinant.
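Here is a minimal Python sketch of that idea, with a hypothetical function name `det_by_row_reduction`. It only swaps rows (flipping the sign each time) and subtracts a multiple of one row from another, which leaves the determinant unchanged just as adding another row does, so the answer is the signed product of the diagonal.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Determinant via reduction to upper triangular form."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    sign = 1
    for col in range(n):
        # Look for a nonzero pivot at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)   # a column with no pivot means determinant 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign         # a row interchange changes the sign
        # Subtracting a multiple of the pivot row does not change the determinant.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]        # product of the diagonal entries
    return result

print(det_by_row_reduction([[1, -1, 0], [1, 0, -1], [-6, 2, 3]]))   # -1
print(det_by_row_reduction([[1, 2], [3, 4]]))                       # -2
```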
This provides a way to find determinants that is quicker than doing it directly; but in the adjugate method we’re about to see, we’d need to calculate many determinants!
Minors and cofactors
The adjugate is defined in terms of minors, which arise in the Laplace expansion of a determinant; so he explained that first. Here is what it looks like for a 3×3 determinant, starting with the algebraic definition we saw two weeks ago:
As for the Laplace expansion, I don’t know how Laplace discovered it, but if you look at the 3×3 case,
$$\begin{aligned}\det\begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}&=aei+cdh+bfg-ceg-afh-bdi\\&=a(ei-hf)+b(fg-di)+c(dh-eg)\\&=a\det\begin{bmatrix}e&f\\h&i\end{bmatrix}-b\det\begin{bmatrix}d&f\\g&i\end{bmatrix}+c\det\begin{bmatrix}d&e\\g&h\end{bmatrix}.\end{aligned}$$
You can pick any row (or column) and rewrite the determinant as a sum of the entries in that row (or column) times determinants which are the minors of the entries.
Each element of one row (here, the top) is multiplied by the determinant of the matrix formed by removing that element’s row and column. The minor of the bold entry here is the determinant of the part in red, and the cofactor is the minor multiplied by \(\pm1\):
$$\begin{vmatrix}\mathbf{a}&b&c\\d&{\color{Red}e}&{\color{Red}f}\\g&{\color{Red}h}&{\color{Red}i}\end{vmatrix}$$
$$\begin{vmatrix}a&\mathbf{b}&c\\ {\color{Red}d}&e&{\color{Red}f}\\ {\color{Red} g}&h&{\color{Red}i}\end{vmatrix}$$
$$\begin{vmatrix}a&b&\mathbf{c}\\ {\color{Red}d}&{\color{Red}e}&f\\ {\color{Red} g}&{\color{Red}h}&i\end{vmatrix}$$
The same pattern is true, almost trivially, of the 2×2 determinant: the minors are just the diagonally opposite entries, as I mentioned above.
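The expansion along the first row translates directly into a short recursive routine. This is a sketch only, with made-up names `submatrix` and `det_laplace`; as discussed above, it is fine for small matrices but far too slow for large ones.

```python
def submatrix(M, i, j):
    """M with row i and column j deleted; its determinant is the minor."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det_laplace(M):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    # Each entry times its minor, with alternating signs +, -, +, ...
    return sum((-1) ** j * M[0][j] * det_laplace(submatrix(M, 0, j))
               for j in range(len(M)))

print(det_laplace([[1, 2], [3, 4]]))                       # -2
print(det_laplace([[1, -1, 0], [1, 0, -1], [-6, 2, 3]]))   # -1
```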
Inverse by adjugate
Sarah now asked for the one missing piece:
Thank you Dr Fenton! This is why I love asking questions here – I always learn more than I ever thought I would before asking!
The part about number of operations isn’t as obvious to me, but I do get the gist why row operations are quicker.
Could you elaborate on the notion of minors, please? I’m still unsure what a minor of 4 would really be saying. I think there’s more to it that I just don’t know about.
And what about proving that 1/det multiplied by adjugate indeed gives you the inverse matrix, please?
Thank you 🙂
Doctor Fenton answered:
As I think I said earlier, I just regard minors as quantities which arise in evaluating determinants. As a determinant, it has a geometric interpretation as an area or volume in 2 or 3 dimensions, but I am not aware of any geometric significance to that fact. The Laplace expansion (or cofactor expansion) tells you that the absolute value of a 3×3 determinant is a volume of a 3-dimensional parallelepiped, which is a linear combination of some 2-dimensional areas (the areas corresponding to the minors of the determinant), but I don’t know that this interpretation helps understand what a determinant is.
This could be interesting to think more about, but if there is a meaning, it is not obvious.
Now we finally get to the adjugate:
As for the inverse formula of an invertible matrix A, you form the cofactor matrix C of A, where the entry in the ith row and jth column is \(c_{ij}\), the cofactor of the entry \(a_{ij}\) in A (that is, \((-1)^{i+j}M_{ij}\), where the minor \(M_{ij}\) is the determinant obtained by deleting the ith row and jth column of A). Next, you transpose the cofactor matrix to get \(C^T\). This is the adjugate matrix.
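Then the matrix product \(AC^T\) is
$$\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\cdots&a_{nn}\end{bmatrix}\begin{bmatrix}c_{11}&c_{21}&\cdots&c_{n1}\\c_{12}&c_{22}&\cdots&c_{n2}\\\vdots&\vdots&&\vdots\\c_{1n}&c_{2n}&\cdots&c_{nn}\end{bmatrix},$$
so the 11 entry of the product is
$$a_{11}c_{11}+a_{12}c_{12}+\cdots+a_{1n}c_{1n},$$
which is exactly the cofactor expansion of det(A). The 12 entry of the product is
$$a_{11}c_{21}+a_{12}c_{22}+\cdots+a_{1n}c_{2n},$$
which is the cofactor expansion of the determinant of the matrix
$$\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{11}&a_{12}&\cdots&a_{1n}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\cdots&a_{nn}\end{bmatrix}.$$
This matrix has a repeated row, so the determinant of this matrix is 0. The same reasoning applies to every other off-diagonal entry.
Then the product \(AC^T\) is
$$\begin{bmatrix}\det(A)&0&0&\cdots&0\\0&\det(A)&0&\cdots&0\\0&0&\det(A)&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&\det(A)\end{bmatrix},$$
which is det(A)I, where I is the n×n identity matrix.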
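To make this concrete, here is the same product written out for a general 2×2 matrix, using the letters a, b, c, d from the 2×2 inverse formula earlier. The adjugate of that matrix has rows (d, -b) and (-c, a), and multiplying gives

$$\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}=\begin{bmatrix}ad-bc&-ab+ba\\cd-dc&-cb+da\end{bmatrix}=(ad-bc)\begin{bmatrix}1&0\\0&1\end{bmatrix}.$$

The diagonal entries are both the determinant, and the off-diagonal entries cancel. Whenever det(A) is not zero, dividing both sides of \(AC^T=\det(A)I\) by det(A) shows that \(A\cdot\frac{1}{\det(A)}C^T=I\), which is the inverse formula Sarah asked about.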
2×2 example
We’ve already done this in our 2×2 example. With $$A=\begin{bmatrix}1&2\\3&4\end{bmatrix},$$ the cofactor matrix is $$C=\begin{bmatrix}4&-3\\-2&1\end{bmatrix},$$ swapping diagonally opposite entries and changing the sign of every other one. Its transpose is $$C^T=\begin{bmatrix}4&-2\\-3&1\end{bmatrix},$$ which is the adjugate. Dividing this by the determinant, \(1\cdot4-2\cdot3=-2,\) we get $$A^{-1}=\begin{bmatrix}\frac{4}{-2}&\frac{-2}{-2}\\\frac{-3}{-2}&\frac{1}{-2}\end{bmatrix}=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}.$$ This is what we got before.
Can you see the connection between this and what we did with Cramer’s Rule?
3×3 example
Now let’s do a 3×3 example; using the example Doctor Fenton used above, I’ll take $$A=\begin{bmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{bmatrix}.$$
The cofactor of the first entry, \(a_{11}\), is $$(-1)^{1+1}\begin{vmatrix}0&-1\\2&3\end{vmatrix}=2,$$ so that is the first entry of the cofactor matrix. The cofactor of \(a_{12}\) is $$(-1)^{1+2}\begin{vmatrix}1&-1\\-6&3\end{vmatrix}=-(-3)=3.$$ Continuing, the cofactor matrix is $$C=\begin{bmatrix}2&3&2\\3&3&4\\1&1&1\end{bmatrix},$$ and the adjugate is $$C^T=\begin{bmatrix}2&3&1\\3&3&1\\2&4&1\end{bmatrix}.$$
The determinant of A is (using cofactors in the first row) $$\det(A)=\begin{vmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{vmatrix}=1\cdot2+(-1)\cdot3+0\cdot2=2-3+0=-1.$$
So the inverse is $$A^{-1}=\frac{C^T}{\det(A)}=\frac{1}{-1}\begin{bmatrix}2&3&1\\3&3&1\\2&4&1\end{bmatrix}=\begin{bmatrix}-2&-3&-1\\-3&-3&-1\\-2&-4&-1\end{bmatrix},$$ as we got by row reduction. We can check this by multiplying:
$$AA^{-1}=\begin{bmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{bmatrix}\begin{bmatrix}-2&-3&-1\\-3&-3&-1\\-2&-4&-1\end{bmatrix}=\\\begin{bmatrix}1\cdot-2+-1\cdot-3+0\cdot-2&1\cdot-3+-1\cdot-3+0\cdot-4&1\cdot-1+-1\cdot-1+0\cdot-1\\1\cdot-2+0\cdot-3+-1\cdot-2&1\cdot-3+0\cdot-3+-1\cdot-4&1\cdot-1+0\cdot-1+-1\cdot-1\\-6\cdot-2+2\cdot-3+3\cdot-2&-6\cdot-3+2\cdot-3+3\cdot-4&-6\cdot-1+2\cdot-1+3\cdot-1\end{bmatrix}=\\\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}$$
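The whole adjugate procedure fits in a few lines of Python. This is a sketch under simple assumptions (integer entries, lists of rows, determinants by first-row expansion), and all of the function names are invented for the illustration.

```python
from fractions import Fraction

def submatrix(M, i, j):
    """M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return 1   # the empty determinant, so that 1x1 minors work out
    return sum((-1) ** j * M[0][j] * det(submatrix(M, 0, j))
               for j in range(len(M)))

def cofactor_matrix(M):
    """Matrix whose (i, j) entry is (-1)^(i+j) times the minor of entry (i, j)."""
    n = len(M)
    return [[(-1) ** (i + j) * det(submatrix(M, i, j)) for j in range(n)]
            for i in range(n)]

def inverse_by_adjugate(M):
    """Transpose the cofactor matrix and divide by the determinant."""
    d = det(M)
    if d == 0:
        raise ValueError("determinant is zero: no inverse")
    adjugate = [list(col) for col in zip(*cofactor_matrix(M))]
    return [[Fraction(x, d) for x in row] for row in adjugate]

A = [[1, -1, 0], [1, 0, -1], [-6, 2, 3]]
print(cofactor_matrix(A))   # [[2, 3, 2], [3, 3, 4], [1, 1, 1]]
print(inverse_by_adjugate(A) ==
      [[-2, -3, -1], [-3, -3, -1], [-2, -4, -1]])   # True
```

For an n×n matrix this evaluates n² minors, each by a full cofactor expansion, which is the inefficiency Doctor Fenton warned about.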