# Andreas Lauschke Consulting

## Convergence Acceleration with Canonical Contractions of Continued Fractions

Oftentimes the exact value of a function is not required and it can be difficult or expensive (time-consuming) to calculate. On today's computing platforms the operations +,-,*, and / are fast and efficient. Other functions, e. g. log or exp or sqrt are expensive.

Of course, your mileage will vary. On today's computing platforms the calculation of a log or exp or sqrt is not slow in ABSOLUTE terms. A modern CPU can do this in a few milliseconds. But when hundreds of millions of such calculations are required, e. g. in problems of simulation/sampling or data mining, the problem starts growing. Hundreds of millions of milliseconds *DO* add up!

I have seen in numerous real-life cases in large institutions that when people found that their simulation or sampling ended up getting too slow, the usual thing happened: "We have to buy faster computers" or "We have to buy a computing grid". For decades it has been the path of least resistance to simply "throw more hardware at it" when things became too slow (tax codes actually make this quite rewarding because computer hardware can be depreciated at accelerated rates in most countries). This is not a rant against grid computing, in fact, I am a friend of it, and in many cases grid computing is the only viable solution, but I have seen numerous cases where "throwing more hardware at it" would not have been necessary and was done because of lack of thinking. Oftentimes better math is cheaper and cleaner for the environment than bigger hardware technology and more electricity to power it!

Why are Continued Fractions so cool? What's their advantage?

Continued Fractions have properties that other approximation approaches and in general, iterated function systems, do not possess. Just to mention a few:

• The so-called Contraction Theorems make it possible to contract continued fractions. This is like "missing out several steps in between". It's like when running marathon you can make steps that are twice, four times, eight times as big, which, however, require more time to make. But if the time it takes to make steps twice as big is less than twice the time for "normal" steps, you gain speed!
• Continued Fractions have better convergence properties than power series. Whereas many power series for functions converge only in a tight convergence radius between the first poles near the evolution point in a real interval, a continued fraction theorem guarantees that for all meromorphic functions that have a continued fraction representation, this continued fraction converges EVERYWHERE in the COMPLEX plane EXCEPT at the poles. (So instead of a real interval you get the whole complex plane minus a nullset!)
• As already mentioned above, continued fractions are cheap to compute. They only require the four arithmetic operations +,-,*,/. Sometimes they also require ^ (exponentiation), but whenever they do, for sure the power series requires even more of these!
• Continued Fractions usually are numerically more stable than power series. Consider that in a power series every term consists of a fraction as a factor in front of a monomial, and a monomial that has an exponent. As more and more terms are needed in a power series to improve the approximation, the factorials in the denominators of the factorials get bigger and the exponents of the monomials get bigger. Together with the fact that power series cannot capture poles and convergence towards constants (see next item) this disadvantage becomes even worse as in problems for which power series are unsuitable, ever more terms are required to try to "improve" a bad approximation, resulting in bigger and bigger exponents and monomials and factorials and denominators! As an example, just compare the first few terms of the Taylor series for the tangent function with the first few terms of the simple continued fraction expansion. You need to calculate the Bernoulli numbers to compute
• which converges quite slowly and doesn't capture the poles, and for the continued fraction you only need

 or equivalently

which converge quite fast and capture the poles (two poles per step). And then with the series you only get convergence between -Pi/2 and Pi/2, whereas the continued fraction expansion converges in the whole complex plane (except at the poles).

• Continued Fractions fall into the realm of rational function approximation (again, unlike power series!). x can appear in a denominator, which makes it possible to capture poles. Power series cannot capture poles/hyperbolas or convergence towards a constant as a polynomial has to move towards +/- Infinity as x moves towards +/- Infinity. For many functions in real life power series are downright unsuitable. But:
• All series, and not just power series, can be converted to continued fractions. This opens a path to take an existing series, convert it to a continued fraction, and try to improve the convergence speed with a contraction theorem or other continued fraction "tool"!
• Many continued fractions collapse to the Pade approximation, which some people find easier to understand (subjective viewpoint).
• The Backwards-Recursion Algorithm

The evaluations of the continued fraction expansions presented here use the backward recurrence method. This method has been shown to be extremely stable for most continued fraction expansions, which is extremely important on numerical platforms that incur truncation/round-off error due to the limitations of machine precision. It can be shown that the backward recurrence method ("from tail to head") is vastly more stable (even self-correcting) than the forward recurrence method ("from head to tail") for two important classes of continued fractions: the Stieltjes continued fractions (which includes the C-fractions) and those that fulfill the parabolic convergence region theorem. Several function classes with known Stieltjes continued fraction expansions include: exponential integrals, incomplete gamma functions, logarithms of gamma functions, the error function, ratios of successive Bessel functions of the first kind, Euler's hypergeometric function, as well as various elementary transcendental functions. The forward recurrence method (which solves a second-order linear difference equation), however, can be computationally more efficient due to the carry-over of results from one step to the next, which is a property the backward recurrence method does not possess.

The backward recurrence method of the continued fraction expansion is also more stable than its conversion to a Pade approximation (even when several forms of the Horner form of the numerator and denominator polynomials are used), which is very important on strictly numerical platforms.

Aren't there better approximations with other functions? If I don't limit myself to +,-,*,/, couldn't I come closer?

Absolutely. But you give up speed. Any function that is more involved than these four arithmetic operations is slower, usually MUCH slower. Some of the best approximations use special functions (e. g. hypergeometric functions, exponential integrals, etc.), but these are very expensive to compute. In fact, people seek series and continued fraction approximations to special functions because they are willing to accept a small error for a major speed-up.

What's the relevance of the base-10 logarithmic plot?

There are many ways to measure error. Depending on what the focus is (do you want to measure "goodness" or "badness"?) one should pick a measure that is diagnostic for the object to be studied. Usually when studying convergence behavior it is common in mathematics to look at the maximum error and try to minimize it. The idea is: given that it is a maximum error, if you can minimize or manage that, then you can control the convergence, because all other errors are even smaller. This is the approach to control "badness" and minimize it to ensure the quality of the approximation or convergence.

The point of the base-10 log plot of the relative error measures "goodness" of the approximation. The approximations shown here are so good that the curves on the left comparison plots appear like "on top of one another". It is impossible to see the quality of the approximation because they are virtually identical. So to measure the quality of the approximation we have to "zoom in", and given that the logarithm goes to -Infinity with exponential speed for x->0 we can use this to "amplify" visually the closeness of the approximation. The base-10 log additionally has the advantage that the negative of the number tells us how many decimal digits are correct. Thus, the display is:

Log[10, Abs[1-approximated/actual]].

If the approximation is good, the ratio to the actual number is around 1 (this is in line with what we would expect from a measure for a "relative error" -- a relative error is a ratio), subtracted from 1 gives a number around 0, we take the absolute value of this because the log of a negative number is a complex number (which is not helpful here), and then the log of that number goes towards -Infinity. If the approximation is bad, the ratio is off from 1, subtracted from 1 is off from 0, and then the log of the absolute value of that number will be somewhere around 0. That means the number of correct decimal digits is low (or there are none).

One consequence of this is that really bad approximations don't show using this diagnostic measure. If the approximation is off by powers of 10, the base-10 log will just be a small positive number. But for high quality approximations these can be discarded anyway. When the objective is to find high-quality approximations we need to look at measures for goodness of the approximation, not badness.

For example, look at the pages for the tangent and exponential functions. Already for n=0 the red and blue curves are identical. This is not the case for the gray and green curves. As the red and blue curves appear to be identical, it is necessary to "zoom in" or "amplify visually" the convergence behavior. Looking at the base-10 log plot or the relative error then tells us how many decimal digits are correct for a given x. So when the blue curve is between -40 and -60 it means you get between 40 and 60 correct decimal digits!

Should I always pick the contraction? Isn't it always faster?

By all means, NO! It is up to the individual computing platform and tweaking of compiler and runtime options to find out which one of the two continued fraction expansions is faster for a prescribed accuracy. In the case of the contracted continued fraction the terms to be computed are more complicated, which costs computing time, but fewer steps are required to attain a prescribed accuracy. This is a question that cannot be answered a-priori as compiled computing platforms are too different and the values chosen for compiler and runtime settings are too diverse to make a definitive statement here. However, already for n=0 or n=1 the convergence acceleration of the contracted continued fraction is so significant that the number of additional computations required in every step by the contracted continued fraction is not of importance. On today's computing platforms the standard arithmetic operations +,-,*, and / are "cheap" and fast, which ties in with another advantage of continued fraction expansions: they simply don't need any further arithmetic operations than these four!

For example, using the logarithm example and picking n=0, the comparison is between

and

Both yield the same accuracy, the latter is the contraction of the former, and which one is computed faster depends entirely on the computing platform and compiler and runtime settings. Using simpler terms means having to go "deeper", and going "shallow" means using more complicated terms.

Notice the quality of the approximations. As far out as 30 both formulae above have an error of less than 3% (using only n=0!).

It is left as an exercise to the enthusiastic reader to determine how many terms are needed (and to study their complexity!) in the Taylor series for the natural logarithm with an evolution point of 0 or 1 when an error of 3% at 30 is considered acceptable. (Hint: don't jump out the window. To print this number you'd need more paper than there are trees in Brazil!) Then do the same as a Pade approximation, and you'll agree with me just how much more powerful rational function approximation is! (even when there are no poles)