Abstract
Interval arithmetic has traditionally been hampered by low performance of programs written to take advantage of it. I suggest a method based on IEEE Std 7542008, which has the potential of being much faster.
Introduction
Interval arithmetic is one attempt to answer the question of computation accuracy  given a set of floatingpoint operations, how accurate are the results?
Interval arithmetic replaces a single floatingpoint number with a range of numbers which we know contains the value. For example, measuring a 3m x 4m room with a metric tape measure might give results accurate to within 0.5cm, which would be represented as [2.995, 3.005] x [3.995, 4.005]. Any operations on intervals are performed in a manner which preserves this property. For example,
add(a, b) == add([a_inf, a_sup], [b_inf, b_sup]) == [a_inf + b_inf, a_sup + b_sup].
A basic implementation
The basic idea of interval arithmetic is to perform each operation twice – once in “round to infinity” (RD) mode, and once in “round to +infinity” (RU) mode. While this is guaranteed to work, it has the following disadvantages:

Each operation incurs three changes of the rounding mode – to RU, to RD, and back to the default (typically “round to nearest” or RN).

On most modern processors, each of these switches is extremely expensive. For example, on latemodel Intel processors reading the SSE rounding mode is expensive, while setting it is a serializing operation!
The RU approach
An approach that works nicely for some operations is to take advantage of the fact that RD( a ) == RU( a ). In order to do this, we store the interval as [a, b] rather than [a, b]. It is obvious that addition/subtraction may be performed directly with only RU, while multiplication and division may be performed with some additional signbit flipping.
This has the advantages of:
However, it cannot be universally used. For example, sqrt(a) == NaN, not sqrt(a).
The “round to nearest” approach
The problem of interval arithmetic boils down to an issue of selection. Given an infinitelyprecise result A for one of the bounds, we need to round down if we are calculating the lower bound, or round up if we are calculating the upper bound of the interval.
We know that RN( A ) is equal to either RD( A ) or RU( A ), but which?
It turns out that this information can be made available. Furthermore, use of “round to nearest” mode is required in order to make this work.
Exact transforms
Given two floatingpoint numbers a, b, it is possible to calculate exact transforms as follows:
 TwoSum()  (a, b) are transformed into (s,e) such that s = RN(a + b) and e = a + b  s
s = RN(a + b)
a' = RN(s  b)
b' = RN(s  a')
da = RN(a  a')
db = RN(b  b')
e = RN(da + db)
The TwoSum() algorithm requires 6 floatingpoint operations.
 TwoProd  (a, b) are transformed into (p, e) such that p = RN(a * b) and e = a * b  p
Split(x, n)
Require: C = 2^n + 1, where n = (bits in mantissa + 1)/2
t1 = RN(C * x)
t2 = RN(x  t1)
xh = RN(t1 + t2)
xl = RN(x  xh)
TwoProd(x, y)
(xh, xl) = Split(x, s)
(yh, yl) = Split(y, s)
p = RN(x * y)
t1 = RN(p + RN(xh * yh))
t2 = RN(t1 + RN(xh * yl))
t3 = RN(t2 + RN(xl * yh))
e = RN(t3 + RN(xl * yl))
The TwoProd() algorithm requires 17 floatingpoint operations
If e is negative, then RN( a + b ) == RU( a + b ), and RD( a + b ) == prev( s ). If e is positive, then RN( a + b ) == RD( a + b ), and RU( a + b ) == succ( s ). Similar arguments apply to TwoProd().
Basic Functions
The TwoSum() and TwoProd() procedures are sufficient for performing interval addition/subtraction and multiplication. If an fma() operation is present (it is mandatory under IEEE 7542008), it is possible to calculate the following functions:
p = RN(a * b)
e = fma(a, b, p)
The TwoProdFMA() algorithm performs a TwoProd() with 2 floatingpoint operations, instead of 17.
 Division()  q = RN(a / b); r = fma(q, b, a)
 Reciprocal()  q = RN(1.0 / b); r = fma(q, b, 1.0)
 Sqrt()  q = RN( sqrt(a) ); r = fma(q, q, a)
If r is negative, then the result = RN( A ) == RU( A ) and RD( A ) == prev( result ). If r is positive, then the result = RN( A ) == RD( A ) and RU( A ) == succ( result ).
Note that:
 prev(x) and succ(x) are required by IEEE Std 754, and are implemented in C++ 2011 as std::nextafter()
 If an fma() operation is not present in hardware, it may be simulated in software by using the TwoProd() and TwoSum() operations. This requires a total of 35 floatingpoint operations – 17 for TwoProd(), and 18 for the 3 TwoSum() operations required.
Testing
The following C++ implementations of interval arithmetic were tested:

“Basic” interval arithmetic – sets the rounding mode to RU, RD, and default as necessary

“RU” interval arithmetic – stores [a, b] as [a, b], eliminating the requirement to set the rounding mode to RD in many cases.

“round to nearest” interval arithmetic – implements the algorithms defined in this paper.
In all cases, SIMD instructions were used if available and useful.
The test involves 100,000,000 operations with random operands, as follows:
 Addition
 Multiplication
 division (0.0 not included in the denominator’s interval)
 square sqr(x) = (x*x)
 square root
 hypot(x, y) = sqrt(sqr(x), sqr(y)).
This hypot() function was chosen because it includes addition (which may be optimized by the RU method), squaring (which may also optimized by the RU method), and square root (which may be optimized only by the “round to nearest” method). It is also easily calculated in straightforward C++ code.
The code was compiled using Visual Studio 2013 in x64 Release mode, using the /fp:strict compiler option. Note that use of this option is essential if accurate results are to be obtained.
The results (in nanoseconds per operation) on an Intel Core i7 2670QM processor are as follows:
Speed Test Results
 Basic
 RU
 Round to nearest

Addition
 104
 79
 25

Multiplication
 230
 194
 61

Division
 280
 217
 94

Square
 238
 185
 42

Square Root
 139
 83
 44

hypot()
 804
 702
 173

Notes:

The times are given after subtracting the loop overhead and the time required to generate random inputs.

Unlike the latest Intel processors, this 2670QM processor does not contain fma() in hardware, and must simulate it in software. This significantly reduces the advantage of the “round to nearest” method for multiplication, division, square, and square root operations.
Summary
An interval arithmetic library that uses roundtonearest mode is practical, and on current popular hardware – significantly (3+ times) faster than the directed rounding approach.
Future directions
An interval math package that will include most of the forward functions required by the IEEE1788 Standard for interval arithmetic is currently under development.
References

IEEE Std 7542008

IEEE Std 17882015

Handbook of FloatingPoint Arithmetic, JeanMichel Muller et al., Birkhäuser

Interval Arithmetic: from Principles to Implementation, T. Hickey, Q. Ju, M.H. van Emden

Interval Operations in Rounding to Nearest, S.M. Rump, P. Zimmerman, S. Boldo, G. Melquiond