CS 230
When we write a program, our first concern is that the program should be correct, in other words, that it should produce correct answers. However, we also need to concern ourselves with the efficiency of programs -- that is, how much time and memory our program takes to solve the problem. From a practical standpoint, a program which is guaranteed to produce the correct answer, but which takes three billion years to run, is not much more useful than a program that gives a wrong answer almost immediately.
When we analyze the efficiency of a program, we are generally trying to estimate how much time or memory the program needs, as a function of the input size. The function describing how much time the algorithm needs is called the time complexity, while the function describing how much memory or storage it needs is called the space complexity of the algorithm.
How to describe the size of the input depends on the problem we are trying to solve. For instance, recall the simple code we used in Lecture 3 to do multiplication:
(define multiply-1
  (lambda ((a <number>) (b <integer>))
    (if (zero? b)
        0
        (+ a (multiply-1 a (- b 1))))))
This function uses time proportional to the magnitude of b. By this we mean that there is some constant c such that the function takes no more than cb units of time to multiply a by b. For this reason, we say that the runtime of this function is linear in b.
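To see this linear behavior concretely, here is a sketch in plain Scheme (without the course's type annotations) that counts recursive calls instead of measuring time; the name count-multiply-1 is ours, introduced only for this illustration:

;; Like multiply-1, but returns a pair (product . call-count).
(define (count-multiply-1 a b)
  (if (zero? b)
      (cons 0 1)
      (let ((sub (count-multiply-1 a (- b 1))))
        (cons (+ a (car sub))
              (+ 1 (cdr sub))))))

;; (cdr (count-multiply-1 7 100))   => 101
;; (cdr (count-multiply-1 7 1000))  => 1001  ; the count grows linearly in b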
The asymptotic behaviour of a function f refers to the growth of f(n) as n becomes large. When f(n) describes the running time of an algorithm on a problem of size n, we typically ignore small values of n, since we are usually interested in estimating how slow the program will be on large inputs. A typical rule of thumb is:
The slower the asymptotic growth rate, the better the algorithm.
(Although this is often not the whole story.) By this measure, a linear algorithm is better than a quadratic algorithm: for any positive constants c and d, there is always some value of n at which cn^2 overtakes dn (namely, whenever n > d/c). For moderate values of n, the quadratic algorithm could very well take less time than the linear one, for instance if c is significantly smaller than d. However, the linear algorithm will always be better for sufficiently large inputs.
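To make the crossover concrete, here is a small sketch with invented constants, comparing a linear cost of 1000n steps (d = 1000) against a quadratic cost of n^2 steps (c = 1); the crossover point is n = d/c = 1000:

;; Hypothetical cost functions with made-up constants.
(define (linear-cost n) (* 1000 n))   ; d = 1000
(define (quadratic-cost n) (* n n))   ; c = 1

;; (linear-cost 100)      => 100000
;; (quadratic-cost 100)   => 10000      ; quadratic is cheaper here
;; (linear-cost 10000)    => 10000000
;; (quadratic-cost 10000) => 100000000  ; linear wins for large n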
One way of gauging the runtime of an algorithm is to say that f(n) is an upper bound on the running time on any input of size n. This is called worst-case analysis. The algorithm might well take less time on some inputs of size n, but it doesn't matter -- if the algorithm takes n^2 steps on even a single input of each size n, it's still a quadratic algorithm.
A popular alternative to worst-case analysis is called average-case analysis. Here, we do not look for an upper bound on the worst-case running time, but instead try to calculate the expected time spent on a randomly chosen input. This type of analysis is typically harder, since it usually involves probabilistic arguments, and often requires assumptions about the distribution of inputs that may be difficult to justify.
In estimating the running time of multiply-1 (or any other procedure) we don't know (or care) what the constant c is. We know that it is a constant of moderate size, but other than that it is not important; we have enough evidence from the asymptotic analysis to know that a logarithmic-time algorithm for multiplication (see below) is faster than the linear version, even though the constants may differ somewhat. (This does not always hold, since the constants can sometimes make a difference, but it has been a good enough rule of thumb that it is quite widely applied.)
We may not even be able to measure the constant c directly. For example, we may know that a given expression of the language, such as an if expression, takes a constant number of machine instructions, but we may not know exactly how many. Moreover, the same sequence of instructions executed on a Pentium will take less time than on a 486 (although the difference will be roughly a constant factor). So these estimates are usually accurate only up to a constant factor. For these reasons, we usually ignore constant factors in comparing asymptotic running times.
Computer scientists have developed a convenient notation for hiding the constant factor. We write O(n) (read: "order n") instead of "cn for some constant c." Thus an algorithm is said to be O(n) or linear time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn on inputs of size n. An algorithm is said to be O(n^2) or quadratic time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn^2 on inputs of size n. O(1) means constant time.
Polynomial time means n^O(1), or n^c for some constant c. Thus any constant, linear, quadratic, or cubic (O(n^3)) time algorithm is a polynomial-time algorithm.
This is called big-O notation. It captures the important differences in the asymptotic growth rates of functions.
One important advantage of big-O notation is that it makes algorithms much easier to analyze, since we can conveniently ignore low-order terms. For example, an algorithm that runs in time
10n^3 + 24n^2 + 3n log n + 144
is still a cubic algorithm, since
10n^3 + 24n^2 + 3n log n + 144
   ≤ 10n^3 + 24n^3 + 3n^3 + 144n^3    (for n ≥ 1)
   = (10 + 24 + 3 + 144) n^3
   = O(n^3).
Of course, since we are ignoring constant factors, any two linear algorithms will be considered equally good by this measure. There may even be some situations in which the constant is so huge in a linear algorithm that even an exponential algorithm with a small constant may be preferable in practice. This is a valid criticism of asymptotic analysis and big-O notation. However, as a rule of thumb it has served us well. Just be aware that it is only a rule of thumb -- the asymptotically optimal algorithm is not necessarily the best one.
Some orders of growth seen often in complexity analysis are
| O(1) | constant |
| O(log n) | logarithmic |
| O(n) | linear |
| O(n log n) | "n log n" |
| O(n^2) | quadratic |
| O(n^3) | cubic |
| n^O(1) | polynomial |
| 2^O(n) | exponential |
Here log means log_2, the logarithm to the base 2, although the base doesn't really matter, since logarithms to different bases differ only by a constant factor. Note also that 2^O(n) and O(2^n) are not the same! For example, 4^n = 2^(2n) is 2^O(n) but not O(2^n).
Let f and g be functions from positive integers to positive integers. We say f = O(g) (read: "f is order g") if there exists a fixed constant c such that for all n,
f(n) ≤ c·g(n).
Equivalently, f is O(g) if the function f(n)/g(n) is bounded above.
We say f = o(g) (read: "f is little-o of g") if for every real constant c > 0, we have
f(n) < c·g(n)
for all but perhaps finitely many n.
Equivalently, f is o(g) if the function f(n)/g(n) tends to 0 as n tends to infinity.
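These definitions can also be explored empirically: tabulate f(n)/g(n) at increasing values of n and watch whether the ratio stays bounded (suggesting f = O(g)) or heads toward 0 (suggesting f = o(g)). Here is a sketch in plain Scheme (the procedure name ratio-table is ours); this gives evidence, of course, not a proof:

;; Print f(n)/g(n) at a few growing values of n.
(define (ratio-table f g)
  (for-each (lambda (n)
              (display n)
              (display " -> ")
              (display (exact->inexact (/ (f n) (g n))))
              (newline))
            '(10 100 1000 10000 100000)))

;; Example: log n = o(n), so the ratio tends to 0. Scheme's log is the
;; natural logarithm, but the base only changes the ratio by a constant.
;; (ratio-table log (lambda (n) n))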
Here is an example: n log n = o(n^(1+c)) for any constant c > 0. To see this, take the logarithm of the ratio n log n / n^(1+c):
log(n log n / n^(1+c))
   = log(n log n) - log(n^(1+c))
   = log n + log log n - (1+c) log n
   = log log n - c log n
and observe that this tends to negative infinity as n grows, since c log n grows much faster than log log n. Hence the ratio itself tends to 0, as required.
We now introduce some convenient rules for manipulating expressions involving order notation. These rules, which we state without proof, are useful for working with orders of growth:
| c·f = O(f) for any constant c | multiplicative constants can be dropped (e.g. 14n^2 = O(n^2)) |
| n^a = O(n^b) whenever a ≤ b | higher powers of n dominate lower ones |
| n^a = O(b^n) for any a and any b > 1 | any exponential dominates any polynomial |
| log n = O(n^a) for any a > 0 | any polynomial dominates any logarithm |
| O(f) + O(g) = O(f + g) | and f + g = O(max(f, g)) |
| if f = O(g) and g = O(h), then f = O(h) | big-O is transitive |
Now we can use these ideas to analyze the asymptotic running time of Scheme procedures. The use of order notation can greatly simplify our task here. We assume that the primitive operations of our language, such as arithmetic operations, tests, and primitives like cons, car, and cdr all take constant time.
Consider the following multiplication routine:
(define multiply-1
  (lambda ((a <number>) (b <integer>))
    (if (zero? b)
        0
        (+ a (multiply-1 a (sub1 b))))))
What is the order of growth of the time required by multiply-1 as a function of n, where n is the magnitude of the parameter b? Note that the "size" of a number can be measured either in terms of its magnitude or in terms of the number of digits (the space it takes to write the number down). Often the number of digits is used, but here we use the magnitude. Note that it takes only about log_10 x digits to write down a number of magnitude x (a number of magnitude one million needs only 7 digits), so these two measures are very different. Make sure you know which one is being used.
We assume that all the primitive operations in the multiply-1 procedure (zero?, if, +, and sub1) and the overhead for procedure calls take constant time. Thus if n = 0, the procedure takes constant time. If n > 0, the time taken on an input of magnitude n is constant time plus the time taken by the recursive call on n-1. Thus the running time T(n) of multiply-1 is a solution of
| T(n) = T(n-1) + O(1) | for n > 0 |
| T(0) = O(1) |
This is called a recurrence relation. It simply states that the time to multiply a number a by another number b of size n > 0 is the time required to multiply a by a number of size n-1 plus a constant amount of work (the primitive operations performed). Furthermore, the time to multiply a by zero is a constant (only constant-time primitives are involved). In other words, there are constants c and d such that T(n) satisfies
| T(n) = T(n-1) + c | for n > 0 |
| T(0) = d |
This more specific recurrence relation has a unique closed form solution, namely
T(n) = d + cn
which is O(n), so the algorithm is linear in the magnitude of its second argument. One can obtain this equation by generalizing from small values of n, and then prove by induction on n that it is indeed a solution to the recurrence relation.
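The induction step is just the observation that T(n) = T(n-1) + c = (d + c(n-1)) + c = d + cn. As a sanity check, here is a sketch in plain Scheme that computes T(n) directly from the recurrence, using invented constants c = 3 and d = 5, and compares it against the closed form:

;; T(0) = d, T(n) = T(n-1) + c, with made-up constants c = 3, d = 5.
(define (T n)
  (if (zero? n)
      5                     ; d
      (+ (T (- n 1)) 3)))  ; T(n-1) + c

;; The claimed closed form, d + c*n.
(define (closed-form n)
  (+ 5 (* 3 n)))

;; (T 10)           => 35
;; (closed-form 10) => 35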
Now consider the following procedure for multiplying two numbers:
(define multiply-2
  (lambda ((a <number>) (b <integer>))
    (cond ((zero? b) 0)
          ((even? b)
           (multiply-2 (* 2 a) (quotient b 2)))
          (else
           (+ a (multiply-2 a (sub1 b)))))))
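To watch the halving at work, here is a sketch of a traced variant (the name multiply-2-traced is ours) that prints b at each call:

;; Like multiply-2, but prints b at every call so the halving is visible.
(define (multiply-2-traced a b)
  (display b)
  (newline)
  (cond ((zero? b) 0)
        ((even? b) (multiply-2-traced (* 2 a) (quotient b 2)))
        (else (+ a (multiply-2-traced a (- b 1))))))

;; (multiply-2-traced 3 100) prints 100 50 25 24 12 6 3 2 1 0
;; (one per line) and returns 300 -- ten calls, where multiply-1
;; would have made 101.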
Again we want an expression for the running time in terms of n, the magnitude of the parameter b. The recurrence for this problem is more complicated than the previous one:
| T(n) = T(n-1) + O(1) | if n > 0 and n is odd |
| T(n) = T(n/2) + O(1) | if n > 0 and n is even |
| T(0) = O(1) |
We somehow need to figure out how often the first versus the second branch of this recurrence will be taken. It's easy if n is a power of two, i.e. if n = 2^m for some integer m. In this case the branch for odd inputs (the first case of the recurrence) is taken only when n = 1, because 2^m is even except when m = 0, i.e. when n = 1. Note further that T(1) = O(1), because T(1) = T(0) + O(1) = O(1) + O(1) = O(1). Thus, for this special case we get the recurrence
| T(n) = T(n/2) + O(1) | for n > 1 and n a power of 2 |
| T(1) = O(1) |
or
| T(n) = T(n/2) + c | for n > 1 and n a power of 2 |
| T(1) = d |
for some constants c and d. The closed form solution of this is, for powers of 2,
T(n) = d + c log_2 n
which is O(log n).
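One can check this closed form against the recurrence directly as well; here is a sketch with the same invented constants c = 3 and d = 5, restricted to powers of 2 (the helper names are ours):

;; T(1) = d, T(n) = T(n/2) + c, for n a power of 2 (c = 3, d = 5).
(define (T-pow2 n)
  (if (= n 1)
      5                                ; d
      (+ (T-pow2 (quotient n 2)) 3)))  ; T(n/2) + c

;; Integer logarithm base 2, assuming n is a power of 2.
(define (log2 n)
  (if (= n 1) 0 (+ 1 (log2 (quotient n 2)))))

;; (T-pow2 1024)            => 35
;; (+ 5 (* 3 (log2 1024)))  => 35   ; d + c*log_2(n)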
What if n is not a power of 2? The running time is still O(log n) even in this more general case. Intuitively, this is because if n is odd, then n-1 is even, so on the next recursive call the input will be halved. Thus the input is halved at least once in every two recursive calls, which is all you need to get O(log n).
A good way to handle this formally is to charge the cost of a call to multiply-2 on an odd input to the recursive call on an even input that must immediately follow it. We reason as follows: on an even input n, the cost is the cost of the recursive call on n/2 plus a constant, or
T(n) = T(n/2) + O(1)
as before. On an odd input n, we recursively call the procedure on n-1, which is even, so we immediately call the procedure again on (n-1)/2. Thus the total cost on an odd input is the cost of the recursive call on (n-1)/2 plus a constant. In this case we get
T(n) = T((n-1)/2) + O(1)
In either case,
T(n) ≤ T(n/2) + O(1)
which, by unrolling (T(n) ≤ T(n/2) + c ≤ T(n/4) + 2c ≤ ... ≤ T(1) + c log_2 n), has the solution O(log n). This approach is more or less the same as explicitly unwinding the cond clause that handles odd inputs:
(define multiply-2
  (lambda ((a <number>) (b <integer>))
    (cond ((zero? b) 0)
          ((even? b)
           (multiply-2 (* 2 a) (quotient b 2)))
          (else
           (+ a (multiply-2 (* 2 a) (quotient (sub1 b) 2)))))))
then analyzing the rewritten program, without actually doing the rewriting.
Charging one operation to another (bounding the number of times one thing can happen by the number of times that another thing happens) is a common technique for analyzing the running time of complicated algorithms.
Order notation is a useful tool, and should not be thought of as being just a theoretical exercise. For example, the practical difference in running times between the logarithmic multiply-2 and the linear multiply-1 is noticeable even for moderate values of n. With the version of Scheme on my Mac, (multiply-1 100 100) takes 16 milliseconds and (multiply-2 100 100) takes less than 1 millisecond -- not a big difference, overall, and certainly not enough to notice. However, when b = 10,000, multiply-1 takes 8.65 seconds whereas multiply-2 takes less than 1 millisecond. This is already quite a huge difference -- about a factor of 9000.
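The exact numbers depend on the machine and the Scheme implementation. A portable way to see the same gap is to count recursive calls instead of milliseconds; the helper names below are ours, introduced only for this comparison:

;; Number of calls multiply-1 and multiply-2 make for a given b.
(define (calls-1 b)
  (if (zero? b) 1 (+ 1 (calls-1 (- b 1)))))

(define (calls-2 b)
  (cond ((zero? b) 1)
        ((even? b) (+ 1 (calls-2 (quotient b 2))))
        (else (+ 1 (calls-2 (- b 1))))))

;; (calls-1 10000) => 10001
;; (calls-2 10000) => 19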
When b = 100,000, multiply-1 runs out of stack space before finishing, because there are too many deferred operations. The multiply-2 procedure still requires less than a millisecond for this value. By the time b gets to around 10^50, multiply-2 takes the 16 milliseconds or so needed by multiply-1 on b = 100.
The key points in this handout are:
- We measure the efficiency of an algorithm by how its running time (or space) grows as a function of the input size, usually ignoring constant factors and small inputs.
- Big-O notation hides constant factors and low-order terms, giving a convenient way to compare asymptotic growth rates.
- Worst-case analysis bounds the time on any input of size n; average-case analysis estimates the expected time on a random input.
- The running time of a recursive procedure can be captured by a recurrence relation, whose closed form solution gives the order of growth (e.g. O(n) for multiply-1, O(log n) for multiply-2).
- Asymptotic analysis is a rule of thumb, but a powerful one: the logarithmic multiply-2 beats the linear multiply-1 dramatically even for moderately large inputs.