benchmark n.
[techspeak] An inaccurate measure of computer performance. "In the computer
industry, there are three kinds of lies: lies, damn lies, and benchmarks." Well-known
ones include Whetstone, Dhrystone, Rhealstone (see h), the Gabriel LISP benchmarks
(see gabriel), the SPECmark suite, and LINPACK. See also machoflops, MIPS, smoke
and mirrors.
(http://www.tuxedo.org/~esr/jargon/html/entry/benchmark.html)
There are at least two fundamental objections to applying benchmarks to calculators. The first is
that a simple benchmark, by its very nature, measures only a narrow aspect of the calculator's
performance. The second is not the fault of the benchmark itself: its results can be taken well out
of context to infer that one calculator is 'better' than another in some vague, broad sense.
I do not claim that some HP calculator is 'better' than some TI calculator, or vice versa, because such a
claim cannot be made on the basis of a single benchmark. Instead, I examine the Savage benchmark
because it is instructive and interesting as a specific example of a benchmark.
The Savage benchmark is a simple iterative floating-point calculation:
1. Set a = 1
2. Set a = tan(atan(exp(ln(sqrt(a*a)))))+1
3. Perform step 2 a total of n times
where
tan() is the tangent function
atan() is the arc-tangent function
exp() is the exponential function e^x, where e is the base of the natural logarithm
ln() is the natural logarithm function
sqrt() is the square root function
n is the number of loop iterations
and the calculation is performed with the angle mode in radians.
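The steps above can be sketched in Python as follows. This is a minimal illustration rather than a calculator program: the function name savage and the iteration count of 1000 are arbitrary choices made here, and Python's math module works in radians and in IEEE double precision, so the result will differ from what any particular calculator produces.

```python
import math

def savage(n):
    """Run n iterations of the Savage calculation, starting from a = 1.

    The name 'savage' is a hypothetical choice for this sketch.
    """
    a = 1.0
    for _ in range(n):
        # math.tan and math.atan work in radians, matching the required angle mode
        a = math.tan(math.atan(math.exp(math.log(math.sqrt(a * a))))) + 1.0
    return a

# n = 1000 is an arbitrary example value; the exact result is n + 1 = 1001
print(savage(1000))
```

With exact arithmetic the printed value would be 1001; in floating point it is typically very close to, but not exactly, that value.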
The two results for this benchmark are relative accuracy and execution time. The relative accuracy is
(a-(n+1))/(n+1), where a is the value after the last iteration. Strictly speaking this is a relative
error, and its magnitude indicates the number of correct significant digits: if the relative accuracy
is 1E-9, then we have nearly nine accurate significant digits. In general, the execution time is
specified as time/n, the time for each iteration. If the same number of iterations is used for both
calculators, then the total execution time can be reported and compared instead.
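As a rough illustration of how the two results might be computed, the sketch below times the loop and evaluates (a-(n+1))/(n+1). The names savage and run_benchmark are hypothetical, the digit estimate -log10(|relative accuracy|) is one common way to read the figure, and timings on a desktop Python interpreter say nothing about any calculator.

```python
import math
import time

def savage(n):
    """Run n iterations of the Savage calculation, starting from a = 1."""
    a = 1.0
    for _ in range(n):
        a = math.tan(math.atan(math.exp(math.log(math.sqrt(a * a))))) + 1.0
    return a

def run_benchmark(n):
    """Return (relative accuracy, approximate correct digits, seconds per iteration)."""
    start = time.perf_counter()
    a = savage(n)
    elapsed = time.perf_counter() - start
    rel = (a - (n + 1)) / (n + 1)
    # A relative accuracy of exactly zero means no digits were lost at all
    digits = -math.log10(abs(rel)) if rel != 0 else math.inf
    return rel, digits, elapsed / n

rel, digits, per_iter = run_benchmark(1000)  # n = 1000 is an arbitrary example
print(f"relative accuracy: {rel:.3e}")
print(f"approx. correct digits: {digits:.1f}")
print(f"time per iteration: {per_iter:.3e} s")
```

Dividing the elapsed time by n gives the per-iteration figure, which allows runs with different iteration counts to be compared.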
If all the functions in the benchmark calculation were performed with no loss of precision, the relative
accuracy would be zero, since a reduces to n+1.
Benchmarks can be grouped into two classes, 'synthetic' and 'live'. A live benchmark attempts to
simulate a realistic mix of all the operations the user would perform with the calculator. A synthetic
benchmark attempts to measure a much narrower aspect of performance. The Savage benchmark
is clearly a synthetic benchmark, because it only measures a few functions out of the hundreds
available on the calculator. Further, it is particularly 'synthetic' because we would never go to the
trouble to actually calculate this expression as a real problem: we know that the final result is n+1.