
This blog, together with
Romain's, is one of the main homes of stories about how
Rcpp can help with getting code to run faster in the context of the
R system for statistical programming and analysis. By making it easier to get already existing C or C++ code
to R, or equally to extend R with new C++ code,
Rcpp can help in
getting stuff done.
And it is often fairly straightforward to do so.
In this context, I have a nice new example. And for once, it is work-related. I generally cannot share too much of what we do there as this is, well,
proprietary, but I have this nice new example. The other day, I was constructing (large) time series of implied volatilities. Implied volatilities
can be thought of as the complement to an option's price: given a price (and all other observables which can be thought of as fixed), we compute an
implied volatility price (typically via the standard
Black-Scholes model).
Given a changed implied volatility, we infer a new price -- see
this
Wikipedia page for more details. In essence, it opens the door to all sorts of arbitrage and relative value pricing adventures.
Now, we observe prices fairly frequently to create somewhat sizeable time series of option prices. And each price corresponds to one matching implied
volatility, and for each such price we have to solve a small and straightforward optimization problem: to compute the implied volatility given the
price. This is usually done with an iterative root finder.
The problem comes from the fact that we have to do this (i) over and over and over for large data sets, and (ii) that there are a number of callbacks
from the (generic) solver to the (standard) option pricer.
So our first approach was to just call the corresponding function
GBSVolatility
from the
fOption package from the trusted
Rmetrics project by Diethelm Wuertz et al. This worked fine, but even with the usual tricks of splitting over
multiple cores/machines, it simply took too long for the resolution and data amount we desired. One of the problems is that this function (which
uses the proper
uniroot
optimizer in R) is not inefficient per se, but simply makes to many function call back to the option pricer as
can be seen from a quick glance at the code. The helper function
.fGBSVolatility
gets called time and time again:
R> GBSVolatility
function (price, TypeFlag = c("c", "p"), S, X, Time, r, b, tol = .Machine$double.eps,
maxiter = 10000)
TypeFlag = TypeFlag[1]
volatility = uniroot(.fGBSVolatility, interval = c(-10, 10),
price = price, TypeFlag = TypeFlag, S = S, X = X, Time = Time,
r = r, b = b, tol = tol, maxiter = maxiter)$root
volatility
<environment: namespace:fOptions>
R>
R> .fGBSVolatility
function (x, price, TypeFlag, S, X, Time, r, b, ...)
GBS = GBSOption(TypeFlag = TypeFlag, S = S, X = X, Time = Time,
r = r, b = b, sigma = x)@price
price - GBS
<environment: namespace:fOptions>
So the next idea was to try the corresponding function from my
RQuantLib package which
brings (parts of)
QuantLib to R. That was seen as been lots faster already. Now, QuantLib is pretty big and so
is RQuantLib, and we felt it may not make sense to install it on a number of machines just for this simple problem. So one evening this week I
noodled around for an hour or two and combined (i) a basic Black/Scholes calculation and (ii) a standard univariate zero finder (both of which can be
found or described in numerous places) to minimize the difference between the observed price and the price given an implied volatility. With about
one hundred lines in C++, I had something which felt fast enough. So today I hooked this into R via a two-line wrapper in quickly-created package using
Rcpp.
I had one more advantage here. For our time series problem, the majority of the parameters (strike, time to maturity, rate, ...) are fixed, so we can
structure the problem to be vectorised right from the start. I cannot share the code or more the details of my new implementation. However, both
GBSVolatility
and
EuropeanOprionImpliedVolatility
are on CRAN (and as I happen to maintain these for
Debian, also just one
sudo apt-get install r-cran-foptions r-cran-rquantlib
away if you're
on
Debian or
Ubuntu). And writing the other solver is really not that
involved.
Anyway, here is the result, courtesy of a quick run via the
rbenchmark package. We create a vector of length
500; the implied volatility computation will be performed at each point (and yes, our time series are much longer indeed). This is replicated 100
times (as is the default for
rbenchmark) for each of the three approaches:
xyz@xxxxxxxx:~$ r xxxxR/packages/xxxxOptions/demo/timing.R
test replications elapsed relative user.self sys.self user.child sys.child
3 zzz(X) 100 0.038 1.000 0.040 0.000 0 0
2 RQL(X) 100 3.657 96.237 3.596 0.060 0 0
1 fOp(X) 100 448.060 11791.053 446.644 1.436 0 0
xyz@xxxxxxxx:~$
The new local solution is denoted by
zzz(X)
. It is already orders of magnitude faster than the
RQL(x)
function using
RQuantLib (which is, I presume, due to my custom solution internalising the loop). And
the new approach is a laughable amount faster than the basic approach (shown as
fOp
) via
fOptions. For one hundred replications of solving implied volatilities for
all elements of a vector of size 500, the slow solution takes about 7.5 minutes --- while the fast solution takes 38 milliseconds. Which comes to a
relative gain of over 11,000.
So sitting down with your C++ compiler to craft a quick one-hundred lines, combining two well-known and tested methods, can reap sizeable benefits. And
Rcpp makes it trivial to call this from
R.