This penalty method is based on the paper "Penalty and front-fixing methods for the numerical solution of American option problems" from Nielsen, Skavhaug, and Tveito.

The following webMathematica example is an extension of the backwards parabolic PDE solver method with adaptive grid using NDSolve which I have contributed to the new Mathematica Cookbook by Sal Mangano, published by O'Reilly. (recipe 14.8, page 574) The method presented in recipe 14.8 only prices European options, but with the penalty method shown below it is effectively possible to price American options like European options, and NDSolve provides the very flexible method of lines for parabolic problems, so a solution for this (nonlinear backwards parabolic) problem based on NDSolve remains the method of choice.

In the following the parameter sensitivities are demonstrated with webMathematica. A more rigorous explanation of the theory behind penalty methods is shown further down on this page.

The penalty parameters used here are: C is 10^6, epsilon is 10^-10.

The strike k is 12, time goes from 0 to 1 year in steps of 1/12 of a year (one month).

The grid used is

It is constructed so that the grid points are the densest at the strike, and then from 2k through 3k constant step sizes are used.

In the following you can vary the degree parameter (must be an even number) to change how dense the points near the strike are, compared to the points away from the strike.

Theoretical Explanation of the Penalty Method

To solve a PDE system, like the B/S PDE system, the PDE and three conditions are needed, two conditions in space and one in time. That makes it a Dirichlet problem. There are others, e. g. with a derivative condition it's called a von Neumann problem. This is not done here, I solve the traditional
B/S PDE with Dirichlet conditions.
That leads to an "open boundary value problem" or "moving boundary value problem", because the point between the normal price process region and the intrinsic value region (can be thought of as "European region" vs "American region" or "not in exercise region" vs "early exercie region", whatever terminology is used for that) is not known, but would need to be known to change the pricing method. So the boundary of the PDE process is "open", it's unknown, hence the name "open boundary value problem". Wikipedia article:
Boundary Value Problem

Penalty methods are nothing new. They have been used for centuries, including on applications in PDE solution methods. Here is a Mathematica demonstration I implemented that uses a penalty method that solves Linear Optimization Problems:
Affine Scaling Interior Point method

The European PDE is
PDE = 0
and the American partial differential inequality is
PDE ≤ 0

with PDE being a shorthand for the B/S PDE term. So in the European case it is an equality, in the American case it is an inequality. One way to eliminate the problem that the open boundary is not known although it has to be dealt with (because it divides the region into two subintervals where the PDE shows different behavior), is to transform the problem into what is called a "linear complementarity problem". Basically, a product of two differences is created, and that difference is set to 0. That way at least one of the two factors is forced to zero (and when a difference is 0, the terms are equal), it's impossible that BOTH factors are different from 0, because then the product is no longer 0. So with
(a-b) (c-d) = 0
a = b or c = d holds (or both), it's impossible that a is different from b AND c different from d at the same time, because then the product could no longer be 0 (hence the term "complementarity problem", they "complement" each other in the sense that when one is different from 0, the other must be zero, but never both non-zero.
Wikipedia article:
Linear Complementarity Problem

Also, this is nothing new, this has been done in mathematics for centuries.
What is now done is adding a penalty term to the European PDE and set it EQUAL to 0 (although in the American case it has to be less than 0). If the penalty term is chosen in the right manner, and I'll explain the term in a moment, a term is constructed that is basically 0 (mathematicians use "epsilon" for such cases) in the European case, because for PDE=0 it's already the case that the PDE is 0, so PDE+penaltyterm=0 must mean penaltyterm must be 0, or it's no longer the European PDE case. In the American case, however, the penaltyterm needs to go to -infinity (or infinity, it doesn't matter in which direction it "runs away"). In that case the American condition is enforced, because the PDE needs to be ≤ 0.
Now, the unknown boundary is exactly the point where the European region touches the American region, and the pricing method would have to change. So by adding the penaltyterm the PDE solver effectively switches the method without ever having computed the open boundary (which was the root of the problem, because there is no closed-form solution for it, and to find it algorithmically is expensive).

Now use a penalty term like
epsilon=10^-10
C=10^5
penaltyterm=(epsilon C) / (epsilon-(K-S))

The actual numbers epsilon and C are not very important, it only needs to be ensured that their product is sufficiently small (10^-10 10^5 = 10^-5, a proxy for 0), but C alone is "big" (a proxy for infinity). If now we are in the European region, K is different from S, their difference is some number (unimportant what it is), subtracting from epsilon is also irrelevant, epsilon C is 10^-5 (proxy for 0), so the whole term is basically 0. That's what we wanted in the European case, because PDE = 0. If we are in the American region, K equals S, the difference is 0, epsilon cancels against epsilon, so the whole term is C, which is 10^5 (proxy for infinity).
And there you have it. By adding a certain, well-crafted penalty term to the European B/S PDE, you can price American. Now all that is needed is a backwards parabolic solver. We use Mathematica's NDSolve function for that. It uses the "method of lines" internally, which is a very general method that can be used to solve all PDEs, including elliptic and hyperbolic problems (not always efficient, but for backwards parabolic problems it is).

Comparison to Traditional Tree Methods
Now, unlike most option pricing methods, which compute option by option (take a tree, computation is submitted to get an approximation for ONE option, and for another option the algorithm has to be submitted again), with the penalty method a whole option SURFACE is obtained! A good PDE solver will store its output in memory and not just produce ONE number. The output is an array of values, an array of option prices for every space/time coordinate. Now all that is needed is to "look up" the option price that is required by reading the value for the space/time coordinate from memory. It's a discrete grid, so it's necessary to interpolate in case there is no exact node in the grid for the stock price and time to expiration you need, but then a simple interpolation using the neighbors is done (again here's where Mathematica shines, because the return from NDSolve is already an InterpolationObject, you can read off values directly from the InterpolationObject, no need for you to implement any interpolation functions, M does that for you).

Greeks
The "standard greeks" also come for free with that, the information for delta, gamma, and theta is contained in the grid structure returned from the PDE solver. As every node in the grid has neighbors (exceptions are only at the boundary, so its "neighbor towards the interior" is taken), it's trivial to compute any derivative in terms of space and time, and that is delta, gamma, and theta. For vega and rho it's necessary to submit again to get a new interpolation back, because rate and volatility are *shape* defining inputs, price and time to expiration are not. But, if you compare that with a traditional tree method, it's necessary to submit a new tree computation for every greek, and multiple times so, including for the "standard greeks", not just vega and rho. And it's necessary to submit at least two trees to get one derivative value, if the simplest of linear approximations is used, the secant rule. The derivative approximation for the first derivative is (f(x+h)-f(x-h))/(2h). So it becomes necessary to submit two times, one for f(x+h) and one for f(x-h), and that is just for the simple secant rule, which is the simplest and least exact approximation (it's first order, and for a second-order approximation 3 f terms are needed). For higher order approximations more f terms are required, so the running time grows. And this provides only ONE greek. Thus, to get delta, gamma, theta, rho, and vega it's now necessary to submit 10 trees. And that provides the five greeks ONLY for one
price/time pair! And trees are expensive! If the five greeks are needed for another price/time pair, 10 trees have to be submitted again and computed with the secant rule, and so forth. If a second-order approximation is desired, 15 trees have to be submitted and computed with that for the approximation. And unless this is cached ("pre-populate in memory"), this is computed "on the fly", live, while you are trading. Slow, slow, slow. The interpolation object from the solver resides in memory, so a trading application can simply read off the values when needed. By the time the option price (or a greek) is needed, it's already computed, waiting to be read off from memory.

NDSolve Performance
To compute the entire surface with M's NDSolve (method of lines used internally) always took around 47 ms on a 2003 Pentium computer with Windows 2000. I've seen tree implementations that took 100 - 200 ms, and as said, that's just ONE price, and not a whole grid of values!

Precision
The B/S PDE makes no stipulations as to the complexity of the terms S, K, T, rho, sigma. Usually pricing models assume vol and rate are constant numbers. This is hardly realistic. As we all know, rates are a curve ("rate curve", "yield curve"), and vol usually exhibits a "smile" or other pattern. The B/S PDE works for these non-constant / function-based expressions for rate and vol as well. It's merely a description of the process, the PDE itself doesn't require any term to be constant (the solvers people implement usually do, because they can't handle non-constant / function-based expressions for vol and rate). The PDE itself was derived with arbitrary S, K, T, rho, and sigma (some of them probably need to be positive). The method of lines works always, and allows us to use functions for them. So, in the above I was using a 1dim rate curve and a 2dim vol surface, and that allowed me to model things extremely realistically. I used a real rate curve (also usually derived as an interpolation from some node points) and an interpolation of values for the vol surface, and voilà!
NDSolve accepts a user-defined grid as input. Given that a plain vanilla option payout profile has a "kink" at the strike (a point where the function is not differentiable), a solver will produce a more precise solution if the grid is made denser around the strike, and as the
B/S PDE is quite well-behaved the grid can be made coarser "in the wings". It also speeds up the method a bit, because NDSolve doesn't overshoot or has trouble with the step-sizing mechanism, but that is an implementation detail specific to NDSolve, so it's not all that relevant from the theoretic viewpoint. However, where a function in the space or time constraints is not continuous or has a "kink", the grid SHOULD be made denser (for precision), and where the function is smooth it should be made coarse (for speed). One gets oscillations and other undesired artefacts in the output near the strike if the grid is uniform (constant spacing everywhere), which lowers precision, and in some cases can even cause complete blow-up. One has to keep in mind that B/S PDE, the O/U PDE, and the heat equation are backwards in time, which means that the problem can never be solved GENERALLY. These problems are sufficiently benign, but for extreme parameter inputs any backwards parabolic solver will blow up, even on the B/S and O/U PDEs (and the heat equation, which is equivalent).

Flexibility
This penalty method is amenable to all problems that can be priced with the B/S PDE. Likewise, it can be used for pricing with the O/U PDE (Ornstein/Uhlenbeck), for example to price Eurodollar futures, where returns are not log-normally distributed, but normally. In that case it's just necessary to replace the B/S PDE with the O/U PDE, and for futures it is necessary to replace carry with 0, because futures don't have carry. Nat gas and crude exhibit "volatility spike-up" as time to expiration approaches. The vols go up with decreasing time to expiration. Eurodollars show the opposite, that is called "vol roll-down": the vols go down towards 0 as expiration approaches. Stocks are wildly different, some stocks increase their vols, some decrease as time goes to 0. It's now possible to model vol spike-up or vol roll-down with appropriately chosen functions, for example obtained as curve fittings/regressions and interpolations from historic vol data. In fact, crude and nat gas also show a lot of seasonal behavior in their vols, but their "month-specific" vol behavior is usually the same. What that means is that the vol behavior of all Jan expirations of the past n years is always quite similar, but the Jan vol behavior is very different from, say, the Sep expiration vol behavior. So we can construct a "Jan vol function" from historic data of Jan expirations, a "Feb vol function" from historic data of Feb expirations, asf. That makes the modelling very precise, even including seasonality in the pricing. Of course, using data from the past 5 to 10 years to construct a vol function is an assumption that can be wrong. Energy prices have gone beserk in the past years, that was causing vol behaviors to change accordingly. But, it is still MUCH better than using a CONSTANT number!

Speeding up with L2 Norm
Another way to model this is by using the L2 norm as a replacement of the vol function. This makes the process to solve the PDE much faster (because it is a constant number), but the L2 norm does, of course, not represent the behavior of the function as accurately as the function itself. One has to study the particular case to determine if the increase in speed is worth the loss in precision.

Final Comment
It should be added that the new PDE with the penalty term is no longer linear. The penalty term makes it a non-linear problem. The B/S PDE is of the type "linear backwards parabolic", and the penalty term destroys linearity. But a PDE solver doesn't require the PDE to be linear. The method of lines works on non-linear problems as well.

Literature
Nielsen, Skavhaug, Tveito: Penalty and front-fixing methods for the numerical solution of American option problems
several papers by Vetzal and Forsyth

This website is powered by webMathematica.
Please visit www.webMathematica.com to find out how you can empower your website with Mathematica.

Copyright (c) 2010 Andreas Lauschke Consulting. All Rights Reserved.