Representation of Numbers

The number of bits is usually fixed for any given computer. Using a plain fixed-point binary representation with that fixed number of bits gives us insufficient range and precision to do relevant engineering calculations.

Floating Point Numbers

To achieve the range of values needed with the same number of bits, we use floating point numbers, or floats for short. Instead of using each bit as the coefficient of a power of 2, floats allocate bits to three different parts: the sign indicator, \(s\), which says whether a number is positive or negative; the characteristic or exponent, \(c\), which is the power of 2; and the fraction, \(f\), which is the coefficient of the exponent. Almost all platforms map Python floats to IEEE 754 double precision, which uses 64 bits in total: 1 bit is allocated to the sign indicator, 11 bits to the exponent, and 52 bits to the fraction. With 11 bits allocated to the exponent, there are 2048 values that it can take. Since we want to be able to represent very precise numbers, we want some of these values to correspond to negative exponents (i.e., to allow numbers that are between 0 and 1 (base10)). To accomplish this, 1023 is subtracted from the stored exponent to normalize it; the value subtracted from the exponent is commonly referred to as the bias. The fraction \(f\) is a value between 0 and 1, so the significand \(1+f\) is a number between 1 and 2. In binary, this means the leading digit of the significand is always 1, and it would therefore be a waste of bits to store it; to save space, the leading 1 is dropped.

Special values:

Note that the exponent patterns \(c\) = 000…000 (used for zero and subnormal numbers) and \(c\) = 111…111 (used for \(\pm\infty\) and NaN) are reserved for these special cases, which limits the exponent range for the other numbers. These cases are described in detail further below.

In Python, we could get the float information using the sys package as shown below:

In [5]:
import sys
sys.float_info
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

A float can then be represented as:

\(n = (-1)^s 2^{e-1023} (1+f)\) (for 64-bit), where \(e\) is the value of the stored exponent (i.e., the characteristic \(c\) described above).


Example: What is the number 1 10000000010 1000000000000000000000000000000000000000000000000000 (IEEE754) in base10?

The exponent in decimal is \(1 \cdot 2^{10} + 1 \cdot 2^{1} - 1023 = 3\). The fraction is \(1 \cdot \frac{1}{2^1} + 0 \cdot \frac{1}{2^2} + ... = 0.5\). Therefore \(n = (-1)^1 \cdot 2^{3} \cdot (1 + 0.5) = -12.0\) (base10).
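As a quick sanity check (a minimal sketch, not part of the original example), the same conversion can be scripted directly from the three bit fields; the variable names are ours:

# decode the IEEE754 bit pattern 1 10000000010 1000...0 by hand
s = 1                                     # sign bit
c = int('10000000010', 2)                 # stored exponent (characteristic) = 1026
f = int('1' + '0'*51, 2) / 2**52          # fraction bits 1000...0 as a value in [0, 1) = 0.5
print((-1)**s * 2**(c - 1023) * (1 + f))  # -12.0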


Example: What is 15.0 (base10) in IEEE754? What is the largest number smaller than 15.0? What is the smallest number larger than 15.0?

Since the number is positive, \(s = 0\). The largest power of two that is smaller than 15.0 is 8, so the exponent is 3, making the characteristic \(3 + 1023 = 1026 (base10) = 10000000010(base2)\). Then the fraction is \(15/8-1=0.875(base10) = 1\cdot \frac{1}{2^1} + 1\cdot \frac{1}{2^2} + 1\cdot \frac{1}{2^3}\) = 1110000000000000000000000000000000000000000000000000 (base2). When put together this produces the following conversion:
15 (base10) = 0 10000000010 1110000000000000000000000000000000000000000000000000 (IEEE754)

The next smallest number is 0 10000000010 1101111111111111111111111111111111111111111111111111 = 14.9999999999999982236431605997

The next largest number is 0 10000000010 1110000000000000000000000000000000000000000000000001 = 15.0000000000000017763568394003

Therefore, the IEEE754 number 0 10000000010 1110000000000000000000000000000000000000000000000000 not only represents the number 15.0, but also all real numbers that lie closer to it than to either of its immediate neighbors (i.e., the interval extending halfway to each neighbor). Any computation whose exact result falls within this interval will be assigned 15.0.
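These neighbors can be checked in Python with numpy.nextafter, which returns the next representable float in a given direction (a minimal sketch; this function is not used elsewhere in these notes):

import numpy as np
print(np.nextafter(15.0, 0.0))           # largest double smaller than 15.0
print(np.nextafter(15.0, 20.0))          # smallest double larger than 15.0
print(np.nextafter(15.0, 20.0) - 15.0)   # the distance to the next number above 15.0, 2**-49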

We call the distance from one number to the next the gap. Because the fraction is multiplied by \(2^{e-1023}\), the gap grows as the number represented grows. The gap at a given number can be computed using the function spacing in numpy.

In [2]:
import numpy as np

Example: Use the spacing function to determine the gap at 1e9. Verify that adding a number to 1e9 that is less than half the gap at 1e9 results in the same number.

In [5]:
np.spacing(1e9)
1.1920928955078125e-07
In [5]:
1e9 == (1e9 + np.spacing(1e9)/3)
True

There are special cases for the value of a floating point number when \(e = 0\) (i.e., \(e\) = 00000000000 (base2)) and when \(e = 2047\) (i.e., \(e\) = 11111111111 (base2)), which are reserved. When the exponent is 0, the leading 1 in the significand takes the value 0 instead. The result is a subnormal number, which is computed by \(n=(-1)^s2^{-1022}(0+f)\) (note: it is -1022 instead of -1023). When the exponent is 2047 and \(f\) is nonzero, the result is “Not a Number” (NaN), which means that the value is undefined. When the exponent is 2047, \(f = 0\), and \(s = 0\), the result is positive infinity. When the exponent is 2047, \(f = 0\), and \(s = 1\), the result is negative infinity.
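For concreteness, the reserved bit patterns can be inspected with the standard struct module (a minimal sketch, not part of the original notes; the helper function fields is ours):

import struct

def fields(x):
    # reinterpret the 64 bits of a double as an unsigned integer and split the three fields
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    s = bits >> 63
    c = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    return s, c, f

for x in [float('inf'), float('-inf'), float('nan'), 5e-324, 0.0]:
    print(x, fields(x))   # c = 2047 for inf/NaN, c = 0 for the subnormal and for zero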


Example: Compute the base10 value for 0 11111111110 1111111111111111111111111111111111111111111111111111 (IEEE754), the largest defined number for 64 bits, and for 0 00000000001 0000000000000000000000000000000000000000000000000000 (IEEE754), the smallest positive normal number. Note that the exponents are, respectively, e = 2046 and e = 1 to comply with the previously stated rules. Verify that Python agrees with these calculations using sys.float_info.max and sys.float_info.min.

In [5]:
largest = (2**(2046-1023))*((1 + sum(0.5**np.arange(1, 53))))
largest
1.7976931348623157e+308
In [5]:
sys.float_info.max
1.7976931348623157e+308
In [5]:
smallest = (2**(1-1023))*(1+0)
smallest
2.2250738585072014e-308
In [5]:
sys.float_info.min
2.2250738585072014e-308

Numbers that are larger than the largest representable floating point number result in overflow, and Python handles this case by assigning the result to inf. Numbers that are smaller than the smallest subnormal number result in underflow, and Python handles this case by assigning the result to 0.


Example: Show that adding 2 to the maximum 64-bit float results in the same number. The Python float does not have sufficient precision to represent the +2 relative to sys.float_info.max, so the operation is essentially equivalent to adding zero. Also show that adding the maximum 64-bit float to itself results in overflow, and that Python assigns this overflowed result to inf.

In [5]:
sys.float_info.max + 2 == sys.float_info.max
True
In [5]:
sys.float_info.max + sys.float_info.max
inf

Example: The smallest subnormal number in 64 bits has s = 0, e = 00000000000, and
f = 0000000000000000000000000000000000000000000000000001. Using the special rules for subnormal numbers, this results in the subnormal number \((-1)^{0}2^{-1022}\cdot 2^{-52} = 2^{-1074}\). Show that \(2^{-1075}\) underflows to 0.0 and therefore cannot be distinguished from 0.0, and that \(2^{-1074}\) does not.

In [5]:
2**(-1075)
0.0
In [5]:
2**(-1075) == 0
True
In [5]:
2**(-1074)
5e-324

So, what have we gained by using IEEE754 versus plain binary? Using 64 bits of binary gives us \(2^{64}\) distinct numbers. Since the number of bits does not change between binary and IEEE754, IEEE754 must also give us at most \(2^{64}\) numbers. In plain (fixed-point) binary, numbers have a constant spacing between them. As a result, you cannot have both range (i.e., a large distance between the minimum and maximum representable numbers) and precision (i.e., small spacing between numbers); the trade-off depends on where you put the radix point in your number. IEEE754 overcomes this limitation by using very high precision at small numbers and very low precision at large numbers. This limitation is usually acceptable because the gap at large numbers is still small relative to the size of the number itself. Therefore, even if the gap is millions wide, it is irrelevant to normal calculations if the number under consideration is in the trillions or higher.
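A short check of this behavior (a sketch reusing np.spacing from above): the absolute gap grows with the magnitude of the number, while the gap relative to the number stays at roughly \(10^{-16}\):

import numpy as np
for v in [1e-10, 1.0, 1e10, 1e20]:
    print(v, np.spacing(v), np.spacing(v)/v)   # absolute gap and gap relative to the number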

Machine epsilon \(\epsilon_{m}\)

Machine epsilon is defined as the distance (gap) between 1 and the next larger floating point number. In 64-bit double precision, \(\epsilon_{m} = 2^{-52} \approx 2.22\times 10^{-16}\).

In Python one can get the machine epsilon with the following code:

In [1]:
def machineEpsilon():
    # Halve eps until adding it to 1.0 no longer changes the result in floating point.
    # Note: the value returned is the first eps with 1.0 + eps == 1.0, i.e. the unit
    # roundoff 2**-53, which is half of the gap between 1.0 and the next larger float
    # (2**-52 = sys.float_info.epsilon).
    eps = 1.0
    while eps + 1.0 != 1.0:
        eps /= 2.0
    return eps
eps = machineEpsilon()
print("Machine epsilon: {}".format(eps))
Machine epsilon: 1.1102230246251565e-16
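As a cross-check (a minimal sketch, assuming sys and numpy are available as in the cells above), the gap between 1.0 and the next larger float can be read off directly; it is twice the value returned by the halving loop above (see the comment in the function):

import sys
import numpy as np
print(sys.float_info.epsilon)            # 2.220446049250313e-16, the gap from 1.0 to the next float
print(np.spacing(1.0))                   # the same gap, computed by numpy
print(np.nextafter(1.0, 2.0) - 1.0)      # the same gap again, via the next representable float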

Error in Numerical Methods

Every result we compute in numerical methods contains errors! Our job is to reduce the impact of these errors. We model the error as:
Approximate result = True Value + Error
\(\hat{x} = x + \Delta x\)

Absolute error: \(E_{a} = \left |x - \hat{x} \right |\)

Relative error: \(E_{r} = \frac{\left |x - \hat{x} \right |}{x}\)

The relative error can also be multiplied by 100 percent to express it as the percent relative error: \[\varepsilon_{r} = \frac{\left |x - \hat{x} \right |}{x}*100 \%\]

The true value is rarely available in numerical methods. The true value will be known only when we deal with functions that can be solved analytically. In real-world applications, we will not know the true answer a priori. For these situations, an alternative is to normalize the error using the best available estimate of the true value (approximate error), that is, to the approximation itself, as in \[\varepsilon_{a} = \frac{approximate \hspace{1mm} error}{approximation}*100 \%\] where the subscript a signifies that the error is normalized to an approximate value.

One of the challenges of numerical methods is to determine error estimates in the absence of knowledge regarding the true value. For example, certain numerical methods use an iterative approach to compute answers. In such an approach, a present approximation is made on the basis of a previous approximation. This process is performed repeatedly, or iteratively, to compute successively better and better approximations. For such cases, the error is often estimated as the difference between the previous and current approximations, and the percent relative error is determined according to \[\varepsilon_{a} = \frac{current \hspace{1mm} approximation - previous \hspace{1mm} approximation}{current \hspace{1mm} approximation}*100 \%\] When performing computations, we may not be concerned with the sign of the error; rather, we are interested in whether the absolute value of the percent relative error is lower than a prespecified percent tolerance \(\varepsilon_{s}\). Therefore, it is often useful to employ the absolute value of \(\varepsilon_{a}\). For such cases, the computation is repeated until
\(\left | \varepsilon_{a} \right | < \varepsilon_{s}\),
at which point our result is assumed to be within the prespecified acceptable level \(\varepsilon_{s}\).
It is also convenient to relate these errors to the number of significant figures in the approximation. It can be shown (Scarborough, 1966) that if the following criterion is met, we can be assured that the result is correct to at least n significant figures: \( \varepsilon_{s} = (0.5 \times 10^{2 - n})\% \)
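As a small illustration of the stopping criterion \(\left | \varepsilon_{a} \right | < \varepsilon_{s}\) combined with the Scarborough criterion, here is an iterative Maclaurin-series evaluation of \(e^{x}\) (a sketch; the function name, the test point x = 0.5, and the choice of 8 significant figures are ours, not from the text above):

import math

def exp_series(x, sig_figs=8):
    # Scarborough criterion: tolerance (in percent) for at least sig_figs correct significant figures
    eps_s = 0.5 * 10 ** (2 - sig_figs)
    total, term, i = 1.0, 1.0, 0          # the i = 0 term of the series is 1
    eps_a = 100.0
    while eps_a > eps_s:
        i += 1
        term *= x / i                     # next term x**i / i!
        previous, total = total, total + term
        eps_a = abs((total - previous) / total) * 100   # approximate percent relative error
    return total, i, eps_a

approx, terms, err = exp_series(0.5)
print(approx, math.exp(0.5), terms, err)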

Significant digits

Significant figures of a number are digits that carry meaningful information. They are the digits beginning with the leftmost nonzero digit and ending with the rightmost “correct” digit, including final zeros that are exact.

Example: Density of Floating Point Numbers

In [5]:
# # Density of Floating Point Numbers
# This example enumerates all possible floating point numbers in a (toy) floating point system and shows them in a plot to illustrate their density.
import matplotlib.pyplot as pt
import numpy as np

significand_bits = 4
exponent_min = -3
exponent_max = 4

fp_numbers = []
for exp in range(exponent_min, exponent_max+1):
    for sbits in range(0, 2**significand_bits):
        significand = 1 + sbits/2**significand_bits 
        fp_numbers.append(significand * 2**exp)
        
fp_numbers = np.array(fp_numbers)
print(fp_numbers)

pt.plot(fp_numbers, np.ones_like(fp_numbers), "+")
#pt.semilogx(fp_numbers, np.ones_like(fp_numbers), "+")
[ 0.125      0.1328125  0.140625   0.1484375  0.15625    0.1640625
  0.171875   0.1796875  0.1875     0.1953125  0.203125   0.2109375
  0.21875    0.2265625  0.234375   0.2421875  0.25       0.265625
  0.28125    0.296875   0.3125     0.328125   0.34375    0.359375
  0.375      0.390625   0.40625    0.421875   0.4375     0.453125
  0.46875    0.484375   0.5        0.53125    0.5625     0.59375
  0.625      0.65625    0.6875     0.71875    0.75       0.78125
  0.8125     0.84375    0.875      0.90625    0.9375     0.96875
  1.         1.0625     1.125      1.1875     1.25       1.3125
  1.375      1.4375     1.5        1.5625     1.625      1.6875
  1.75       1.8125     1.875      1.9375     2.         2.125
  2.25       2.375      2.5        2.625      2.75       2.875
  3.         3.125      3.25       3.375      3.5        3.625
  3.75       3.875      4.         4.25       4.5        4.75
  5.         5.25       5.5        5.75       6.         6.25
  6.5        6.75       7.         7.25       7.5        7.75
  8.         8.5        9.         9.5       10.        10.5
 11.        11.5       12.        12.5       13.        13.5
 14.        14.5       15.        15.5       16.        17.
 18.        19.        20.        21.        22.        23.
 24.        25.        26.        27.        28.        29.
 30.        31.       ]

Sources of Error

The Computational Error = \(\hat{f}(x) - f(x)\) = Truncation Error + Rounding Error

Main sources of error in numerical computation:

  • Rounding error: occurs when digits after the decimal point (1/3 = 0.3333...) are lost (0.3333) due to the limited memory available for storing one numerical value. Rounding error is the error made due to the inexact representation of quantities computed by the algorithm.
  • Truncation error: occurs when discrete values are used to approximate a mathematical expression (e.g., the approximation sin 𝜃 ≈ 𝜃 for small angles 𝜃). Truncation error is the error made due to approximations made by the algorithm (simplified models used in our approximation), e.g. the representation of a function by a finite Taylor series \[f(x + h) \approx g(h) = \sum_{i=0}^{k}\frac{{f^{(i)}(x)}}{i!}h^{i} \] The absolute truncation error of this approximation is \[f(x + h) - g(h) = \sum_{i=k+1}^{\infty }\frac{{f^{(i)}(x)}}{i!}h^{i} = O(h^{k+1}) \hspace{1mm} \text{as} \hspace{1mm} h\rightarrow 0 \]

To study the propagation of round-off error in arithmetic we can use the notion of conditioning. The condition number tells us the worst-case amplification of output error with respect to input error.

Example: Floating Point Arithmetic and the Series for the Exponential Function

This example sums the series \[ \exp(x) \approx \sum_{i=0}^n \frac{x^i}{i!} \] for varying n and varying x. It then prints the partial sum, the true value, and the final term of the series.

In [5]:
# # Floating Point Arithmetic and the Series for the Exponential Function
import numpy as np
import matplotlib.pyplot as pt

# What this example does is sum the series exp(x) 
a = 0.0
x = 1e0 # flip sign
true_f = np.exp(x)
e = []

for i in range(0, 10): # crank up
    d = np.prod(
            np.arange(1, i+1).astype(np.float64))  # i! as a float (empty product = 1.0 when i = 0)
    # series for exp
    a += x**i / d
    print(a, np.exp(x), x**i / d)    
    e.append(abs(true_f-a)/true_f)

pt.semilogy(e)
1.0 2.718281828459045 1.0
2.0 2.718281828459045 1.0
2.5 2.718281828459045 0.5
2.6666666666666665 2.718281828459045 0.16666666666666666
2.708333333333333 2.718281828459045 0.041666666666666664
2.7166666666666663 2.718281828459045 0.008333333333333333
2.7180555555555554 2.718281828459045 0.001388888888888889
2.7182539682539684 2.718281828459045 0.0001984126984126984
2.71827876984127 2.718281828459045 2.48015873015873e-05
2.7182815255731922 2.718281828459045 2.7557319223985893e-06
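Following the "# flip sign" hint in the cell above, the same series behaves very differently for a negative argument such as x = -20 (a sketch; the number of terms, 120, is ours and is more than enough for the partial sums to settle): the terms alternate in sign and grow to about \(4\times10^{7}\) before decaying, so cancellation destroys most of the significant digits of the tiny result \(e^{-20}\approx 2\times10^{-9}\).

import numpy as np

x = -20.0
a, term = 0.0, 1.0
for i in range(120):
    if i > 0:
        term *= x / i        # term = x**i / i!
    a += term
print(a, np.exp(x))          # the summed value typically agrees with np.exp(-20) in few, if any, digits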

Example: Floating Point and the Harmonic Series

It is known from mathematics that \[ \sum_{n=1}^\infty \frac 1n=\infty \] Let's see what we get using floating point:

In [5]:
# # Floating Point and the Harmonic Series
# 
import numpy as np

n = int(0)
float_type = np.float32
my_sum = float_type(0)

while True:
    n += 1
    last_sum = my_sum
    my_sum += float_type(1 / n)
    
    if n % 200000 == 0:
        print("1/n = %g, sum0 = %g"%(1.0/n, my_sum))
1/n = 5e-06, sum0 = 12.7828
1/n = 2.5e-06, sum0 = 13.4814
1/n = 1.66667e-06, sum0 = 13.8814
1/n = 1.25e-06, sum0 = 14.1666
1/n = 1e-06, sum0 = 14.3574
1/n = 8.33333e-07, sum0 = 14.5481
1/n = 7.14286e-07, sum0 = 14.7388
1/n = 6.25e-07, sum0 = 14.9296
1/n = 5.55556e-07, sum0 = 15.1203
1/n = 5e-07, sum0 = 15.311
1/n = 4.54545e-07, sum0 = 15.4037
1/n = 4.16667e-07, sum0 = 15.4037
1/n = 3.84615e-07, sum0 = 15.4037
1/n = 3.57143e-07, sum0 = 15.4037
1/n = 3.33333e-07, sum0 = 15.4037
1/n = 3.125e-07, sum0 = 15.4037
              .
              .
              .
Traceback (most recent call last):

  File "H:\discoD\Programacion\Python\Errors.py", line 129, in 
    if n % 200000 == 0:

KeyboardInterrupt
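Why does the float32 sum stall near 15.4037? Once 1/n is smaller than half the gap between adjacent float32 values at the current sum, adding it no longer changes the stored sum. A minimal check (the particular values 15.4037 and n = 2 200 000 are taken from the output above):

import numpy as np

s = np.float32(15.4037)
print(np.spacing(s))                       # gap at the stalled sum, about 1e-06 for float32
print(1/2200000)                           # a typical term once the sum stops growing, about 4.5e-07
print(s + np.float32(1/2200000) == s)      # True: the term is below half the gap, so the sum is stuck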

Example: Picking apart a floating point number

In [5]:
# # Picking apart a floating point number
# Never mind the details of this function...

def pretty_print_fp(x):
    print("---------------------------------------------")
    print("Floating point structure for %r" % x)
    print("---------------------------------------------")
    import struct
    s = struct.pack("d", x)

    def get_bit(i):
        byte_nr, bit_nr = divmod(i, 8)
        return int(bool(
            s[byte_nr] & (1 << bit_nr)
            ))

    def get_bits(lsb, count):
        return sum(get_bit(i+lsb)*2**i for i in range(count))

    # https://en.wikipedia.org/wiki/Double_precision_floating-point_format

    print("Sign bit (1:negative):", get_bit(63))
    exponent = get_bits(52, 11)
    print("Stored exponent: %d" % exponent)
    print("Exponent (with offset): %d" % (exponent - 1023))
    fraction = get_bits(0, 52)
    if exponent != 0:
        significand = fraction + 2**52
    else:
        significand = fraction
    print("Significand (binary):", bin(significand)[2:])
    print("Shifted significand:", repr(significand / (2**52)))

pretty_print_fp(3)

# Things to try:
# 
# * Twiddle the sign bit
# * 1,2,4,8
# * 0.5,0.25
# * $2^{\pm 1023}$, $2^{\pm 1024}$
# * `float("nan")`
---------------------------------------------
Floating point structure for 3
---------------------------------------------
Sign bit (1:negative): 0
Stored exponent: 1024
Exponent (with offset): 1
Significand (binary): 11000000000000000000000000000000000000000000000000000
Shifted significand: 1.5

Example: Floating Point Error in Horner's Rule Polynomial Evaluation

The following example is taken from Applied Numerical Linear Algebra by James Demmel, SIAM 1997.

Here are three methods for evaluating the function

\[f(x)=(x-2)^9 = -512 + 2304x -4608x^2 +5376x^3 -4032x^4 +2016x^5-672x^6+144x^7-18x^8+x^9 \]
In [5]:
# ****Floating Point Error in Horner's Rule Polynomial Evaluation****
# The following example is taken from *Applied Numerical Linear Algebra* by James Demmel, SIAM 1997.
# 
import numpy as np

#Evaluate polynomial in factored form
def f(x):
    return (x-2.)**9

#coefficients for expanded form
coeffs = np.asarray([-512., 2304., -4608., 5376., -4032., 2016., -672., 144., -18., 1.])

#Evaluate polynomial using coefficients
def p(x):
    return np.inner(coeffs, np.asarray([x**i for i in range(10)]))

#Evaluate Horner's rule for polynomial
def h(x):
    y = 0.
    #[::-1] looks at all elements with stride -1, reversing the order
    for c in coeffs[::-1]:
        y = x*y+c
    return y

#Define 8000 points between 1.92 and 2.08
xpts = 1.92+np.arange(8000.)/50000.

import matplotlib.pyplot as pt

#plot functions evaluated at each point using Horner's rule and using the factored form
pt.plot(xpts,[h(x) for x in xpts],label='Evaluation by Horner\'s rule')
pt.plot(xpts,[f(x) for x in xpts],label='Evaluation in factored form')
pt.legend()
pt.show()

It seems Horner's rule is inaccurate when \(f(x)\approx 0\); let's try to understand why.

The first method uses \((x-2)^9\) directly, incurring a backward error of \(\epsilon\) (machine epsilon): given \(z=\textit{fl}(x-2)\), we can compute \(z^9\) to essentially the same precision, thereby solving the problem for \(\hat{x}=\textit{fl}(x-2)+2=x+ \Delta x\), with \(|\Delta x|\leq \epsilon\).

The second uses the inner product formula \[f(x)=p(x)= \sum_{i=0}^9c_ix^i =\begin{bmatrix} -512 & 2304 & -4608 & 5376 & -4032 & 2016 & -672 & 144 & -18 & 1 \end{bmatrix}\begin{bmatrix} 1\\ x \\ x^2 \\ x^3 \\ x^4 \\ x^5 \\ x^6 \\ x^7 \\ x^8 \\ x^9 \end{bmatrix}\]

The third uses Horner's rule, which requires fewer operations

\[f(x)=h(x) = c_0 + (c_1 + \ldots (c_8 + c_9x)x \ldots )x\]

Each addition in the last two methods incurs a relative error of at most \(\epsilon\). An error in the innermost parentheses would correspond to evaluating the function at a slightly perturbed \(x\). However, the error in the summation done last contributes directly to the result. When \(\left|f(x)\right|<|c_0|\epsilon\), the result will contain no accurate significant digits.

In terms of backward stability, extrapolating the above argument gives the backward absolute error bound \[\textit{fl}(h(x))-f(x)=\Delta x \leq \epsilon \left(1+\left|\frac{df^{-1}}{dx}(x)\right|\right).\]

More generally, given their factorized form, we can evaluate any function with unconditional backward stability. But factorization is hard!
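The claim that the Horner result contains no accurate digits once \(\left|f(x)\right|<|c_0|\epsilon\) can be spot-checked numerically (a sketch that re-defines the functions so it can run on its own; the evaluation point 2.001 is ours):

import numpy as np

coeffs = np.asarray([-512., 2304., -4608., 5376., -4032., 2016., -672., 144., -18., 1.])

def f(x):                 # factored form
    return (x - 2.)**9

def h(x):                 # Horner's rule on the expanded coefficients
    y = 0.
    for c in coeffs[::-1]:
        y = x*y + c
    return y

x0 = 2.001
print(f(x0))              # factored value: (0.001)**9 = 1e-27
print(h(x0))              # dominated by rounding noise; nothing like the true value 1e-27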

Example: Decimal module

The Python decimal module provides support for decimal floating point arithmetic. By default the module carries 28 significant digits of precision (this can be increased), and Decimal objects support the normal Python arithmetic operations.

Simple example: 0.1 + 0.2 - 0.3

In [5]:
import decimal
a, b, c = 0.1, 0.2, 0.3

# ‘Decimal’ object instead of normal float type
dec_a = decimal.Decimal('0.1')
dec_b = decimal.Decimal('0.2')
dec_c = decimal.Decimal('0.3')
print('float a+b-c =',a+b-c)
print('decimal a+b-c =',dec_a+dec_b-dec_c)
float a+b-c = 5.551115123125783e-17
decimal a+b-c = 0.0
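The working precision of the decimal context can also be changed through decimal.getcontext() (a small sketch; the choice of 50 digits and the fraction 1/7 are ours):

import decimal

decimal.getcontext().prec = 50                       # carry 50 significant digits from here on
print(decimal.Decimal(1) / decimal.Decimal(7))       # 1/7 printed to 50 significant digits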

Conditioning

Absolute Condition Number

The absolute condition number is a property of the problem, which measures its sensitivity to perturbations in input \[ \kappa_{abs}(f) = \lim_{size \hspace{1mm} of \hspace{1mm} input \hspace{1mm} perturbation\rightarrow 0}\hspace{2mm} \max_{inputs} \hspace{2mm} \max_{perturbations \hspace{1mm} in \hspace{1mm} inputs}\left | \frac{perturbation \hspace{1mm} in \hspace{1mm} output}{perturbation \hspace{1mm} in \hspace{1mm} input} \right | \] For problem f at input x it is simply the derivative of f at x, \[ \kappa_{abs}(f) = \lim_{\Delta x\rightarrow 0}\left | \frac{f(x+\Delta x)-f(x)}{\Delta x} \right | = \left | \frac{df}{dx} (x)\right | \] When considering a space of inputs \(\chi\) it is \[\kappa_{abs} = \textup{max}_{x\in \chi }\left | \frac{df}{dx} (x)\right | \]

(Relative) Condition Number

The relative condition number considers relative perturbations in input and output, so that \[ \kappa (f)=\kappa_{rel} (f) = \max_{x\in \chi }\lim_{\Delta x\rightarrow 0}\left | \frac{(f(x+\Delta x)-f(x))/f(x)}{\Delta x/x}\right |=\frac{\kappa_{abs} (f)\left | x \right |}{\left | f(x) \right |} \]
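As a small illustration (the example function is ours, not from the notes): for \(f(x) = x - 1\) the relative condition number is \(\kappa(f) = \left | x f'(x)/f(x) \right | = \left | x/(x-1) \right |\), which grows without bound as \(x \rightarrow 1\); this is the conditioning view of the cancellation discussed further below.

# relative condition number of f(x) = x - 1 (illustrative example): kappa = |x / (x - 1)|
for x in [2.0, 1.1, 1.001, 1.000001]:
    print(x, abs(x / (x - 1.0)))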

Posedness and Conditioning

What is the condition number of an ill-posed problem?

  • If the condition number is bounded and the solution is unique, the problem is well-posed.
  • An ill-posed problem \( f \) either has no unique solution or has a (relative) condition number of \( \kappa (f)= \infty \).
  • This condition implies that the solutions to problem \( f \) are continuous and differentiable in the given space of possible inputs to \( f \).
  • Sometimes well-posedness is defined to only require continuity.
  • Generally, \( \kappa (f)\) can be thought of as the reciprocal of the distance (in an appropriate geometric embedding of problem configurations) from \( f \) to the nearest ill-posed problem.

Example: Conditioning

In [5]:
# # Conditioning of evaluating tan()
import numpy as np
import matplotlib.pyplot as pt

# Let us estimate the sensitivity of evaluating the $\tan$ function:
x = np.linspace(-5, 5, 1000)
pt.ylim([-10, 10])
pt.plot(x, np.tan(x))

x = np.pi/2 - 0.0001
#x = 0.1
print(x)

print(np.tan(x))

dx = 0.00005
print(np.tan(x+dx))

# ## Condition number estimates
# ### From evaluation data
# 
print(np.abs(np.tan(x+dx) - np.tan(x))/np.abs(np.tan(x)) / (np.abs(dx) / np.abs(x)))

# ### Using the derivative estimate

import sympy as sp

xsym = sp.Symbol("x")

f = sp.tan(xsym)
df = f.diff(xsym)
print(df)

# Evaluate the derivative estimate. Use `.subs(xsym, x)` to substitute in the value of `x`.

print((xsym*df/f).subs(xsym, x))
1.5706963267948966
9999.999966661644
19999.99998335545
31413.926693068603
tan(x)**2 + 1
15706.9633726542

Stability and Accuracy

Accuracy:

An algorithm is accurate if \( \hat{f}(x)=f(x) \) for all inputs \( x \) when \( \hat{f}(x) \) is computed in infinite precision

  • In other words, the truncation error is zero (rounding error is ignored)
  • More generally, an algorithm is accurate if its truncation error is negligible in the desired context
  • Yet more generally, the accuracy of an algorithm is expressed in terms of bounds on the magnitude of its truncation error

Stability:

An algorithm is stable if its output in finite precision (floating point arithmetic) is always near its output in exact precision

  • Stability measures the sensitivity of an algorithm to roundoff and truncation error
  • In some cases, such as the approximation of a derivative using a finite difference formula, there is a trade-off between stability and accuracy

Rounding error in operations

Addition and Subtraction

  • Subtraction is just negation of a sign bit followed by addition
  • Catastrophic cancellation occurs when the magnitude of the result is much smaller than the magnitude of both operands
  • Cancellation corresponds to losing significant digits

Multiplication and Division

  • Multiplication is a lot safer than addition in floating point
  • To analyze its error, we use a 2-term Taylor series approximation typical in relative error analysis
  • Consequently, multiplication f(x, y) = xy is always well-conditioned, \( \kappa(f)=3 \)
  • Division is multiplication by the reciprocal, and reciprocation is also well-conditioned

Example: Round-off error by floating-point arithmetic and Accumulation of round-off error

In [5]:
# Round-off error by floating-point arithmetic 
print(4.9 - 4.845 == 0.055)

print(4.9 - 4.845)

print(4.8 - 4.845)

print(0.1 + 0.2 + 0.3 == 0.6)

# Accumulation of round-off error
# If we only do once
print(1 + 1/3 - 1/3)

def add_and_subtract(iterations):
    result = 1
    
    for i in range(iterations):
        result += 1/3

    for i in range(iterations):
        result -= 1/3
    return result

# If we do this 100 times
print(add_and_subtract(100))

# If we do this 1000 times
print(add_and_subtract(1000))

# If we do this 10000 times
print(add_and_subtract(10000))
False
0.055000000000000604
-0.04499999999999993
False
1.0
1.0000000000000002
1.0000000000000064
1.0000000000001166

Example: Catastrophic Cancellation

In [5]:
# # Catastrophic Cancellation
import numpy as np

# Let's make two numbers with very similar magnitude:
x = 1.48234
y = 1.48235

print(x-y)
# Now let's compute their difference in double precision:
x_dbl = np.float64(x)
y_dbl = np.float64(y)
diff_dbl = x_dbl-y_dbl

print(repr(diff_dbl))

# * What would the correct result be?
# -------------
# Can you predict what will happen in single precision?

x_sng = np.float32(x)
y_sng = np.float32(y)
diff_sng = x_sng-y_sng

print(diff_sng)
-1.0000000000065512e-05
-1.0000000000065512e-05
-1.001358e-05

Example: Truncation Error vs Rounding Error

In this example, we'll investigate two common sources of error: Truncation error and rounding error.

In [5]:
# # Truncation Error vs Rounding Error
# In this example, we'll investigate two common sources of error: Truncation error and rounding error.
# **Task:** Approximate a function (here: a parabola, by a line)
import numpy as np
import matplotlib.pyplot as pt

center = -1
width = 6

def f(x):
    return - x**2 + 3*x

def df(x):
    return -2*x + 3

grid = np.linspace(center-width/2, center+width/2, 100)

fx = f(grid)
pt.plot(grid, fx)
pt.plot(grid, f(center) + df(center) * (grid-center))

pt.xlim([grid[0], grid[-1]])
pt.ylim([np.min(fx), np.max(fx)])

  • What's the error we see?
  • What if we make width smaller?

Example: Classical calculation of \(\pi\)

Approximating a circle by inscribed polygons (the Liu Hui method): start from a regular hexagon inscribed in a unit circle (side length \(S = 1\)) and repeatedly double the number of sides \(N_{sides}\).

After many iterations: \[ \pi \approx \frac{S \times N_{sides}}{2}\]

In [5]:
import math
nsides = 6.
length = 1.
pi = length*nsides/2.
print('\t\t  sides',end='\t\t\t')
print('   S       ',end='\t\t\t')
print('pi(calc) ',end='\t\t')
print('diff pi(real-calc)')
print('%15.2f' %nsides,end='\t\t')
print('%.15f' % length,end='\t')
print('%.15f' % pi,end='\t')
print('%.15f' % abs(math.pi-pi))
for i in range(30):
    length = (2. - (4. - length**2)**0.5)**0.5
    nsides *= 2
    pi = length*nsides/2.
#    print('-'*30)
    print('%15.1f' %nsides,end='\t\t')
    print('%.15f' % length,end='\t')
    print('%.15f' % pi,end='\t')
    print('%.15f' % abs(math.pi-pi))
	sides			 S       	     pi(calc) 	      diff pi(real-calc)
           6.00		1.000000000000000	3.000000000000000	0.141592653589793
           12.0		0.517638090205042	3.105828541230250	0.035764112359543
           24.0		0.261052384440103	3.132628613281237	0.008964040308556
           48.0		0.130806258460286	3.139350203046872	0.002242450542921
           96.0		0.065438165643553	3.141031950890530	0.000560702699263
          192.0		0.032723463252972	3.141452472285344	0.000140181304449
          384.0		0.016362279207873	3.141557607911622	0.000035045678171
          768.0		0.008181208052471	3.141583892148936	0.000008761440857
         1536.0		0.004090612582340	3.141590463236762	0.000002190353031
         3072.0		0.002045307360705	3.141592106043048	0.000000547546745
         6144.0		0.001022653813994	3.141592516588155	0.000000137001638
        12288.0		0.000511326923607	3.141592618640789	0.000000034949004
        24576.0		0.000255663463975	3.141592645321216	0.000000008268577
        49152.0		0.000127831731987	3.141592645321216	0.000000008268577
        98304.0		0.000063915865994	3.141592645321216	0.000000008268577
       196608.0		0.000031957932997	3.141592645321216	0.000000008268577
       393216.0		0.000015978971709	3.141593669849427	0.000001016259634
       786432.0		0.000007989482381	3.141592303811738	0.000000349778055
      1572864.0		0.000003994762034	3.141608696224804	0.000016042635011
      3145728.0		0.000001997367121	3.141586839655041	0.000005813934752
      6291456.0		0.000000998711352	3.141674265021758	0.000081611431964
     12582912.0		0.000000499355676	3.141674265021758	0.000081611431964
     25165824.0		0.000000249344118	3.137475099502783	0.004117554087010
     50331648.0		0.000000124672059	3.137475099502783	0.004117554087010
    100663296.0		0.000000061439062	3.092329219213245	0.049263434376548
    201326592.0		0.000000029802322	3.000000000000000	0.141592653589793
    402653184.0		0.000000014901161	3.000000000000000	0.141592653589793
    805306368.0		0.000000000000000	0.000000000000000	3.141592653589793
   1610612736.0		0.000000000000000	0.000000000000000	3.141592653589793
   3221225472.0		0.000000000000000	0.000000000000000	3.141592653589793
   6442450944.0		0.000000000000000	0.000000000000000	3.141592653589793

What's wrong here? S is actually a very small number! When we do the calculation, we have to worry about the finite accuracy of the floating point numbers. For a polygon with 6,291,456 sides, S ~ 1/1,000,000 (1 over 1M), so \(4 - S^2 \approx 4 - 10^{-12}\), which is already not far from the limit of a double-precision floating point number: the subtraction \(2 - \sqrt{4 - S^2}\) cancels almost all of the significant digits. This is a typical round-off error!
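One standard remedy (a sketch, not part of the original example) is to rewrite the update so that the subtraction of nearly equal quantities disappears: since \(2-\sqrt{4-S^{2}} = \frac{S^{2}}{2+\sqrt{4-S^{2}}}\), the new side length can be computed as \(S_{new} = S/\sqrt{2+\sqrt{4-S^{2}}}\):

import math

nsides, length = 6., 1.
for i in range(30):
    # rationalized update: sqrt(2 - sqrt(4 - S**2)) == S / sqrt(2 + sqrt(4 - S**2)),
    # which avoids the catastrophic cancellation of the original recurrence
    length = length / (2. + (4. - length**2)**0.5)**0.5
    nsides *= 2
pi_calc = length*nsides/2.
print(pi_calc, abs(math.pi - pi_calc))   # the error now stays at round-off level instead of blowing up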

Example: Leibniz formula of \(\pi\)

Approximating \(\pi\) with the arctangent series evaluated at \(x = 1/\sqrt{3}\): \[ \pi = 6\arctan\frac{1}{\sqrt{3}} = 6\sum_{n=0}^{\infty}\frac{(-1)^{n}}{2n+1}\left(\frac{1}{\sqrt{3}}\right)^{2n+1} \]

In [5]:
import math
print('step    \t pi(calc)    \t\t diff')
pi = 0.
numerator = 1./3.**0.5

for n in range(31):
    pi += numerator/(2.*n+1.)*6.
    numerator *= -1./3.
    print('%4d' % n,end='\t')
    print('%.15f' % pi,end='\t')
    print('%.15f' % abs(math.pi-pi))
step    	 pi(calc)    		 diff
   0	3.464101615137755	0.322508961547962
   1	3.079201435678005	0.062391217911788
   2	3.156181471569955	0.014588817980162
   3	3.137852891595681	0.003739761994112
   4	3.142604745663085	0.001012092073292
   5	3.141308785462884	0.000283868126909
   6	3.141674312698838	0.000081659109045
   7	3.141568715941785	0.000023937648008
   8	3.141599773811507	0.000007120221714
   9	3.141590510938081	0.000002142651712
  10	3.141593304503083	0.000000650913289
  11	3.141592454287647	0.000000199302146
  12	3.141592715020381	0.000000061430588
  13	3.141592634547315	0.000000019042478
  14	3.141592659521715	0.000000005931922
  15	3.141592651733998	0.000000001855795
  16	3.141592654172576	0.000000000582783
  17	3.141592653406166	0.000000000183627
  18	3.141592653647827	0.000000000058034
  19	3.141592653571404	0.000000000018389
  20	3.141592653595636	0.000000000005843
  21	3.141592653587935	0.000000000001859
  22	3.141592653590388	0.000000000000595
  23	3.141592653589605	0.000000000000188
  24	3.141592653589855	0.000000000000062
  25	3.141592653589775	0.000000000000018
  26	3.141592653589801	0.000000000000008
  27	3.141592653589792	0.000000000000001
  28	3.141592653589795	0.000000000000002
  29	3.141592653589794	0.000000000000001
  30	3.141592653589794	0.000000000000001

It converges much more quickly than the previous version. This is because the Taylor expansion is evaluated close to x = 0 (here at \(x = 1/\sqrt{3}\)), where the series converges rapidly.

Remark: How the errors behave

After N iterations of a computation:

  • Approximation errors: In principle, the approximation (truncation) error is reduced if we perform more iterations (e.g. keep more terms in a Taylor expansion).
  • Roundoff errors: The roundoff error goes in the opposite direction; the more iterations we perform, the more the error accumulates.

Limitation of the total error: roughly, after N iterations the total error is the sum of a truncation part that decreases like \(N^{-\alpha}\) and a roundoff part that grows like \(N^{\beta}\epsilon_{m}\), so it cannot be reduced below some minimum value.

Example: \(\pi\) with the Leibniz formula (the \(\pi/4\) version): \(\alpha \approx 1\), \(\beta \approx 1\) (since every factor of 10 more steps improves the calculation by about 1 digit). If we do the calculation in double precision, when will we reach the smallest (critical) total error? Balancing \(N^{-1}\) against \(N\epsilon_{m}\) gives \(N \approx 1/\sqrt{\epsilon_{m}} \approx 10^{8}\), i.e. a smallest total error of roughly \(10^{-8}\).

It's actually just a rough guesstimation, but it's always good to keep this idea in mind when you write your code!


The total numerical error is the sum of the truncation and round-off errors. In general, the only way to minimize round-off errors is to increase the number of significant figures carried by the computer. Furthermore, the round-off error will increase due to subtractive cancellation or due to an increase in the number of computations in an analysis. In contrast, the truncation error can be reduced by decreasing the step size. Because a decrease in step size can lead to subtractive cancellation or to an increase in the number of computations, the truncation errors are decreased as the round-off errors are increased. Therefore, we are faced with the following dilemma: the strategy for decreasing one component of the total error leads to an increase of the other component. In a computation, we could conceivably decrease the step size to minimize truncation errors only to discover that in doing so, the round-off error begins to dominate the solution and the total error grows! Thus, our remedy becomes our problem.

One challenge that we face is to determine an appropriate step size for a particular computation. We would like to choose a large step size in order to decrease the amount of calculation and round-off error without incurring the penalty of a large truncation error. Since the total error typically first decreases and then increases again as the step size is reduced, the challenge is to identify the point of diminishing returns where round-off error begins to negate the benefits of step-size reduction. In actual cases, however, such situations are relatively uncommon because most computers carry enough significant figures that round-off errors do not predominate. Nevertheless, they sometimes do occur and suggest a sort of "numerical uncertainty principle" that places an absolute limit on the accuracy that may be obtained using certain computerized numerical methods.

Example: Numerical derivatives

Suppose you have a function f(x) and you want to compute f'(x). By definition:

\[ f{}'(x) = \lim_{h\rightarrow 0}\frac{f(x + h) - f(x)}{h} \]

In principle we could simply insert a small h, perhaps as small as the precision of the numerical calculation allows. But, as we will see, for numerical derivatives smaller is not always better.

Let's try such a simple function that we could actually do the exact calculations easily:

\[ f(x)=x^{2}+\exp (x)+\log (x) +\sin (x) \] \[\Rightarrow f{}'(x)=2x+\exp (x)+\frac{1}{x}+\cos (x)\]
In [5]:
import math

def f(x):  
    return x**2+math.exp(x)+math.log(x)+math.sin(x)
def fp(x): 
    return 2.*x+math.exp(x)+1./x+math.cos(x)

# Starting from h = 1E-2
x, h = 0.5, 1E-2
fp_exact = fp(x)
print('Exact = %.16f\n' % fp_exact)
print ('\th \t\t   Numerico \t\t       diff')
while h>1E-15:
    fp_numeric = (f(x+h) - f(x))/h	
    print('%.1e, ' % h, end=' ')
    print('%.16f, ' % fp_numeric, end=' ')
    print('%.16f' % abs(fp_numeric-fp_exact))
    h /= 10.   # retry with smaller h
Exact = 5.5263038325905010

  h 		Numerico 		diff
1.0e-02,  5.5224259820642496,  0.0038778505262513
1.0e-03,  5.5258912717413011,  0.0004125608491998
1.0e-04,  5.5262623253238274,  0.0000415072666735
1.0e-05,  5.5262996793148380,  0.0000041532756629
1.0e-06,  5.5263034173247396,  0.0000004152657613
1.0e-07,  5.5263037901376313,  0.0000000424528697
1.0e-08,  5.5263038811759193,  0.0000000485854184
1.0e-09,  5.5263038589714579,  0.0000000263809570
1.0e-10,  5.5263038589714579,  0.0000000263809570
1.0e-11,  5.5263127407556549,  0.0000089081651540
1.0e-12,  5.5262461273741783,  0.0000577052163226
1.0e-13,  5.5311311086825290,  0.0048272760920280
1.0e-14,  5.5511151231257818,  0.0248112905352809

What's the problem? For a small h, let's perform the Taylor expansion: \[ f(x +h) \approx f(x) + hf{}'(x)+ \frac{h^{2}}{2}f{}''(x)+\frac{h^{3}}{6}f{}'''(x)+\cdots \]
This is what we are calculating: \[ \frac{f(x+h)-f(x)}{h} \approx f{}'(x)+ \frac{h}{2}f{}''(x)+\frac{h^{2}}{6}f{}'''(x)+\cdots \] In principle this approximation has a truncation error of \( O(h) \). But there is also a round-off error, closely related to the machine precision: \[ f(x+h) \approx f(x) + hf{}'(x)+ \frac{h^{2}}{2}f{}''(x)+\frac{h^{3}}{6}f{}'''(x)+\cdots + \epsilon _{m} \] Accounting for it in the numerical derivative gives \[ f{}'_{numerical}(x)=\frac{f(x+h)-f(x)}{h} \approx f{}'(x)+ \left [ \frac{h}{2}f{}''(x)+\frac{h^{2}}{6}f{}'''(x)+\cdots \right ]+O\left(\frac{\epsilon _{m}}{h}\right) \] The total error is therefore \( \sim O(h)+O(\frac{\epsilon _{m}}{h}) \).
For a double precision number, \( \epsilon _{m}\approx O(10^{-15}) - O(10^{-16}) \).
The total error therefore saturates when \( h \approx O(\sqrt{\epsilon _{m}})\approx O(10^{-8}) \).
This simply limits the precision of numerical derivatives computed this way: it cannot be better than about \(10^{-8}\).

We can improve on this with the following "trick": \[ f(x +\frac{h}{2}) \approx f(x) + \frac{h}{2}f{}'(x)+ \frac{h^{2}}{8}f{}''(x)+\frac{h^{3}}{48}f{}'''(x)+\cdots \] \[ f(x -\frac{h}{2}) \approx f(x) - \frac{h}{2}f{}'(x)+ \frac{h^{2}}{8}f{}''(x)-\frac{h^{3}}{48}f{}'''(x)+\cdots \] \[ f{}'_{numerical}(x) \approx \frac{f(x+\frac{h}{2})-f(x-\frac{h}{2})}{h} \approx f{}'(x)+ \left [ \frac{h^{2}}{24}f{}'''(x)+ O(h^{4})\cdots \right ]+O\left(\frac{\epsilon _{m}}{h}\right) \] The total error is \( \sim O(h^{2})+O(\frac{\epsilon _{m}}{h})\approx O(h^{2})+O(\frac{10^{-16}}{h}) \).
The total error now saturates at about \( O(10^{-10})\), reached when \( h \approx O(\epsilon _{m}^{1/3})\approx O(10^{-5}) \).
This is the central difference method.

In [5]:
import math

def f(x):  
    return x**2+math.exp(x)+math.log(x)+math.sin(x)
def fp(x): 
    return 2.*x+math.exp(x)+1./x+math.cos(x)

x, h = 0.5, 1E-2
fp_exact = fp(x)
print('Exact = %.16f\n' % fp_exact)
print ('  h \t    Numerico     \t       diff')
while h>1E-15:
    fp_numeric = (f(x+h/2.) - f(x-h/2.))/h
    print('%.0e, ' % h, end=' ')
    print('%.16f,' % fp_numeric, end=' ')
    print('%.16f' % abs(fp_numeric-fp_exact))
    h /= 10.
Exact = 5.5263038325905010

  h 	    Numerico     	       diff
1e-02,  5.5263737163485871, 0.0000698837580861
1e-03,  5.5263045313882486, 0.0000006987977477
1e-04,  5.5263038395758635, 0.0000000069853625
1e-05,  5.5263038326591731, 0.0000000000686722
1e-06,  5.5263038325481508, 0.0000000000423501
1e-07,  5.5263038323261062, 0.0000000002643947
1e-08,  5.5263038367669983, 0.0000000041764974
1e-09,  5.5263036369268530, 0.0000001956636480
1e-10,  5.5263038589714579, 0.0000000263809570
1e-11,  5.5263127407556549, 0.0000089081651540
1e-12,  5.5266902165840284, 0.0003863839935274
1e-13,  5.5266902165840284, 0.0003863839935274
1e-14,  5.5511151231257818, 0.0248112905352809

We can get a further improvement with the same cancellation trick, combining the central differences with steps \(h/2\) and \(h/4\): \[ f{}'_{numerical}(x)\approx\frac{8f(x+\frac{h}{4}) -8f(x-\frac{h}{4}) - f(x+\frac{h}{2})+f(x-\frac{h}{2})}{3h} + \left [ O(h^{4})\cdots \right ]+O\left(\frac{\epsilon _{m}}{h}\right) \] The total error is \( \sim O(h^{4})+O(\frac{\epsilon _{m}}{h})\approx O(h^{4})+O(\frac{10^{-16}}{h}) \).
The total error now saturates at about \( O(10^{-13})\), reached when \( h \approx O(\epsilon _{m}^{1/5})\approx O(10^{-3}) \).

In [5]:
import math

def f(x):  
    return x**2+math.exp(x)+math.log(x)+math.sin(x)
def fp(x): 
    return 2.*x+math.exp(x)+1./x+math.cos(x)

x, h = 0.5, 1E-2
fp_exact = fp(x)
print('Exact = %.16f\n' % fp_exact)
print ('  h \t     Numerico    \t       diff')
while h>1E-15:
    fp_numeric = \
    (8.*f(x+h/4.)+f(x-h/2.)-8.*f(x-h/4.)-f(x+h/2.))/(h*3.)
    print('%.0e, ' % h, end=' ')
    print('%.16f,' % fp_numeric, end=' ')
    print('%.16f' % abs(fp_numeric-fp_exact))
    h /= 10.
Exact = 5.5263038325905010

  h 	     Numerico    	       diff
1e-02,  5.5263038315869801, 0.0000000010035208
1e-03,  5.5263038325903402, 0.0000000000001608
1e-04,  5.5263038325925598, 0.0000000000020588
1e-05,  5.5263038327701954, 0.0000000001796945
1e-06,  5.5263038328442100, 0.0000000002537091
1e-07,  5.5263038249246188, 0.0000000076658822
1e-08,  5.5263037257446959, 0.0000001068458051
1e-09,  5.5263040070011948, 0.0000001744106939
1e-10,  5.5263127407556549, 0.0000089081651540
1e-11,  5.5263571496766408, 0.0000533170861399
1e-12,  5.5258020381643282, 0.0005017944261727
1e-13,  5.5215091758024446, 0.0047946567880564
1e-14,  5.5807210704491190, 0.0544172378586181