**2's complement**

Fig. 2's complement representation

**Positive numbers** in 2's complement

The sign bit a_(n-1) is 0.

The remaining n-1 bits hold the magnitude, so there are 2^(n-1) distinct non-negative values.

They range from all zeros to all ones (2^0 + 2^1 + 2^2 + ... + 2^(n-2)), i.e. from 0 to 2^(n-1) - 1.

**Negative numbers** in 2's complement

The sign bit a_(n-1) is 1.

All bits, including the most significant bit a_(n-1), are used to compute the value.

The most significant bit carries a negative weight and the remaining bits carry positive weights, so

the value is -2^(n-1) + sum(i = 0 to n-2) a_i * 2^i

**Smallest (most negative) number**: the lower n-1 bits are all zero and only the msb contributes, giving -2^(n-1).

**Largest negative number**: the lower n-1 bits are all 1 and offset the msb as far as possible, giving -2^(n-1) + (2^(n-2) + 2^(n-3) + ... + 2^1 + 2^0) = -1.
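The weighted-sum rule above can be checked with a short sketch. The function name `twos_complement_value` is only illustrative and is not part of the original notes.

```python
def twos_complement_value(bits: str) -> int:
    """Value of an n-bit two's complement bit string such as '1011'."""
    n = len(bits)
    # The msb has weight -2^(n-1); every other bit a_i has weight +2^i.
    value = -int(bits[0]) * 2 ** (n - 1)
    for i, bit in enumerate(reversed(bits[1:])):
        value += int(bit) * 2 ** i
    return value


print(twos_complement_value("1000"))  # only the msb is set: most negative value
print(twos_complement_value("1111"))  # lower bits cancel all but 1 of the msb
print(twos_complement_value("0111"))  # largest positive 4-bit value
```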

**Observations**

There is a single representation of zero: all n bits are zero.

We can represent -2^(n-1) but not +2^(n-1) in 2's complement.

When the bits of a positive number x are complemented (1's complement), the result is -(x+1) in decimal. For example:

+3 is 0011; its complement 1100 is -4

+7 is 0111; its complement 1000 is -8

To negate a number (+x to -x), first take the 1's complement, which gives -(x+1), then add 1 to get -x.

**Negation of zero** gives zero only if we ignore the carry out. The other special case is -2^(n-1): inverting 100...0 gives 011...1 and adding 1 gives 100...0 again, so negating -2^(n-1) returns the same number.
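A minimal sketch of negation by "invert, then add 1" on a fixed word size, showing the two special cases. The helper name `negate` and the use of a mask to model an n-bit register are assumptions made for illustration.

```python
def negate(x: int, n: int) -> int:
    """Negate an n-bit two's complement pattern: invert the bits, add 1."""
    mask = (1 << n) - 1
    ones_complement = ~x & mask          # flip every bit inside the n-bit word
    return (ones_complement + 1) & mask  # the final & mask drops the carry out


for pattern in (0b0011, 0b0000, 0b1000):
    print(f"{pattern:04b} -> {negate(pattern, 4):04b}")
```

Zero and the most negative pattern both map back to themselves, which is the exception described above.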

**Arithmetic Operations in 2's complement**

**Addition**

Discard any carry out of the most significant bit: when adding two 4-bit numbers, drop the fifth bit of the result if there is one. This is different from overflow, which gives an incorrect result.

When the result is larger than the word size can hold, we call it overflow. The ALU must signal that the result should not be used.

Example of overflow:

add 1001 // -7

1010 // -6

10011 // dropping the carry leaves 0011 = +3, but the true sum -13 does not fit in 4 bits

Overflow can occur whether or not there is a carry.

Overflow occurs when two positive numbers or two negative numbers are added and the sign of the result is the opposite of the operands' sign.
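The sign rule for overflow can be sketched as follows. The function name and the returned tuple layout are illustrative assumptions, not something the notes define.

```python
def add_twos_complement(a: int, b: int, n: int):
    """Add two n-bit patterns; return (result, carry_out, overflow)."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    total = (a & mask) + (b & mask)
    result = total & mask   # keep n bits, discarding the carry out
    carry = total >> n
    # Overflow: operands share a sign and the result's sign differs from it.
    overflow = (a & sign) == (b & sign) and (result & sign) != (a & sign)
    return result, carry, overflow


print(add_twos_complement(0b1001, 0b1010, 4))  # -7 + -6: carry and overflow
print(add_twos_complement(0b0111, 0b0001, 4))  # 7 + 1: overflow without carry
print(add_twos_complement(0b1111, 0b0001, 4))  # -1 + 1: carry without overflow
```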

**Multiplication**

//booth algorithm

Division

**Overflow**

When adding unsigned binary numbers, overflow occurs when the final carry-out is 1.

In 2's complement, the final carry-out is always ignored, and overflow occurs when both operands have the same sign and the result has the opposite sign.

**One's Complement**

Used to represent negative numbers.

Positive numbers are the same as in the sign-magnitude system (with the first bit zero).

A negative number is obtained by inverting all the bits of the positive number.

Zero has two representations, +0 and -0.

The range of numbers is -(2^(n-1) - 1) to +(2^(n-1) - 1).

**Floating Point Representation**

A number is represented approximately, to a fixed number of **significant** digits, and scaled using an **exponent**.

It is a way to represent numbers in scientific notation.

A floating point number consists of

1) a significand (also called mantissa or coefficient): a signed digit string.

The significand length determines the precision to which a number can be represented.

2) a signed integer exponent (scale factor), which determines the range: more exponent bits means more range.

| sign (1 bit) | E (8 bits) | Significand (23 bits) |

Fig. 32-bit floating point format

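The value is S * B^e, where S is the significand, B is the base and e is the exponent.

B is implicit and the same for all numbers. The exponent is stored with a bias, typically 2^(k-1) - 1 for a k-bit exponent field.

The radix point is assumed to be just after the MSB of the significand.

The advantage of the biased representation is that non-negative floating point numbers can be treated as integers for comparison purposes.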

**sign**: 0 positive, 1 negative

**exponent**: stored in biased form, that is, the bias is subtracted from the field to get the true exponent

An 8-bit exponent field holds values from 0 to 255; with a bias of 127 this becomes -127 to +128.

**Significand:** stored in normalized form: the msb must be 1 and the radix point comes right after it.

The leading 1 is implicit and need not be stored, so a 23-bit field represents a 24-bit significand.

In normalized form the value of the significand lies in [1, 2).

**IBM/360 floating point format**

Total length 32 bits

Exponent 7 bits

Fraction 24 bits

Bias is 2^(7-1) = 64, unlike the IEEE bias of 2^(n-1) - 1

Base is 16, unlike IEEE's base 2

There is no hidden 1 in the significand,

so normalization is to the form 0.bbbb only.

A number in IBM/360 floating point format looks like

1/0 eeeeeee fffffff..

(sign bit, 7 exponent bits, 24 fraction bits; the fraction has no hidden 1).

To get its value, convert the exponent field to decimal, subtract the bias of 64, and raise the base 16 to that power:

(sign) 0.ffff * 16^(exponent - 64)
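A minimal decoding sketch of the layout just described (1 sign bit, 7-bit excess-64 exponent, 24-bit fraction with no hidden bit). The function name `ibm360_to_float` is hypothetical, and the sample word 0x41240000 is simply a bit pattern chosen to encode 2.25.

```python
def ibm360_to_float(word: int) -> float:
    """Decode a 32-bit IBM/360 single precision word into a Python float."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    exponent = (word >> 24) & 0x7F           # 7-bit field, excess-64
    fraction = (word & 0xFFFFFF) / 2 ** 24   # 0.ffff, no hidden 1
    return sign * fraction * 16.0 ** (exponent - 64)


print(ibm360_to_float(0x41240000))  # 0 1000001 0010 0100 0000 ...
```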

**IEEE 754 floating point standard**

**single precision**: 1 bit sign, 8 bit exponent, 23 bit fraction (24 bit significand = 1+ fraction) (“1” is implied)

**double precision**: 1 bit sign, 11 bit exponent, 52 bit fraction (53 bit significand)


The fraction field of the significand represents a value between 0 and 1 (bits from left to right have weights 2^-1, 2^-2, …)

Since 0 has no leading 1, it is given the reserved exponent value 0 so that hardware won’t attach a leading 1 to it

Exponent is “biased” (biased notation) to make sorting easier

all 0s is smallest exponent (most negative) all 1s is largest (most positive)

bias of 127 for single precision and 1023 for double precision

(-1)^sign × (1 + fraction) × 2^(exponent - bias)

Exponent fields of all zeros and all ones are reserved for special values (zero, subnormals, infinity and NaN)

**Range of exponents**

An exponent field of all zeros (0) or all ones (255) denotes a special value

For single precision

the field ranges from 1 to 254 for normalized numbers, so the true exponent ranges from -126 to +127

For double precision

the true exponent ranges from -1022 to +1023

**Mantissa number of bits**

A normalized number adds a hidden 1, making the effective mantissa length

24 bits for single precision

53 bits for double precision

**Denormalized or subnormal numbers**

Denormal numbers have a hidden significand bit of 0

and an exponent field of all zeros

0.0: sign 0 or 1, exponent 00000000, fraction 00000000000000000000000 (hidden bit is a 0)

subnormal: sign 0 or 1, exponent 00000000, fraction not all zeros (hidden bit is a 0)

normalized: sign 0 or 1, exponent > 0, fraction any bit pattern (hidden bit is a 1)

**Representation of Zero**

There are two representations of zero, +0 and -0

Exponent and mantissa are all 0; only the sign bit differs

In hex: 0x0000 0000 and 0x8000 0000

**Representation of infinity and NaN**

• +∞ and -∞ are denoted with an exponent of all 1's and a fraction of all 0's

Fields in order: s e f

+∞: 0 11111111 00000... (0x7f80 0000)

-∞: 1 11111111 00000... (0xff80 0000)

NaN: ? 11111111 (not all zero)

(For NaN, S is either 0 or 1, E=0xff, and F is anything but all zeros)
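The special-value rules for zero, subnormals, infinity and NaN can be collected into one small classifier. This is only a sketch for single precision; the function name and the returned labels are illustrative.

```python
def classify_single(word: int) -> str:
    """Classify a 32-bit IEEE 754 single precision bit pattern."""
    sign = "-" if (word >> 31) & 1 else "+"
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if exponent == 0:
        # Hidden bit is 0: either a signed zero or a subnormal.
        return sign + ("zero" if fraction == 0 else "subnormal")
    if exponent == 0xFF:
        # All-ones exponent: infinity if the fraction is zero, else NaN.
        return (sign + "infinity") if fraction == 0 else "NaN"
    return sign + "normalized"


for w in (0x00000000, 0x80000000, 0x00000001, 0x7F800000, 0xFF800000, 0x7FC00000):
    print(f"0x{w:08x} {classify_single(w)}")
```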

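When the significand is written as a fraction 0.bbb..., a normalized significand s satisfies 1/B <= s < 1,

where B is the base.

For base 16 the fraction lies between 1/16 (inclusive) and 1 (exclusive);

for base 2 it lies between 0.5 (inclusive) and 1 (exclusive).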

**Error in representing recurring fractions**

xT – true value

xA – approximate value

True Error in xA (exact value of the error) = E(x) = xT - xA

True Relative Error in xA = (xT - xA) / xT

True Percentage Relative Error in xA = (xT - xA) / xT * 100
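As a small worked illustration of the three error measures, take the recurring fraction 1/3 approximated by 0.333. The choice of example values and the function names are assumptions; exact rational arithmetic is used so that the errors themselves are not affected by rounding.

```python
from fractions import Fraction


def true_error(x_true, x_approx):
    return x_true - x_approx


def relative_error(x_true, x_approx):
    return (x_true - x_approx) / x_true


def percent_relative_error(x_true, x_approx):
    return relative_error(x_true, x_approx) * 100


x_true = Fraction(1, 3)         # 0.3333... recurring
x_approx = Fraction(333, 1000)  # truncated to three decimal digits

print(true_error(x_true, x_approx))
print(relative_error(x_true, x_approx))
print(percent_relative_error(x_true, x_approx))
```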

Problem: Consider a 16-bit floating point format with 1 bit for sign, 5 for exponent and 10 for fraction. The exponent is in excess-15 representation.

a) Find the decimal value of this number in the format

0 01001 0101000000

Solution

The sign bit is 0, so the number is positive

The exponent is in excess 15, so e + 15 = 9

e = -6

So the value is 1.0101_2 * 2^-6, which is (1 + 1/4 + 1/16)/64 = 2.05078 * 10^-2

b) Convert the decimal number -50.75 to the above representation

50_10 = 110010_2 and .75_10 = .11_2

so -50.75 = -1.1001011 * 2^5

The biased exponent value will be 15 + 5 = 20_10 = 10100_2

Dropping the leading (hidden) bit of the significand and padding the fraction to 10 bits,

the number is 1 10100 1001011000
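Both parts of this problem can be cross-checked in code. The 1/5/10-bit layout with excess-15 exponent has the same field layout as IEEE 754 half precision, which Python's `struct` module exposes as format code `e`, so the sketch below leans on that. The helper names are illustrative only.

```python
import struct


def half_from_bits(bits: str) -> float:
    """Decode a 16-bit pattern written as 'S EEEEE FFFFFFFFFF'."""
    word = int(bits.replace(" ", ""), 2)
    return struct.unpack(">e", word.to_bytes(2, "big"))[0]


def half_to_bits(x: float) -> str:
    """Encode x and return its sign, exponent and fraction fields."""
    (word,) = struct.unpack(">H", struct.pack(">e", x))
    b = format(word, "016b")
    return f"{b[0]} {b[1:6]} {b[6:]}"


print(half_from_bits("0 01001 0101000000"))  # part a
print(half_to_bits(-50.75))                  # part b
```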

Problem: A 6-bit wide number is represented in 2's complement. What are the maximum and minimum numbers?

Solution: The minimum is when the first bit is 1 and the rest are 0, i.e. -2^(n-1) = -32

The maximum is when the first bit is 0 and all other n-1 bits are 1, i.e. 2^(n-1) - 1 = 31

Problem: Consider the bit pattern 1010 1100 1011 0101 0011 0000 0011 1000. What value does it represent in

1) 2's complement

Since the number is negative (msb is 1), we find its magnitude by

a) subtracting 1

b) flipping the bits

2) single precision floating point

sign = negative

The exponent is the 8 bits after the sign, 010 11001 = 89. It is biased, so we subtract 127 from it,

i.e. 89 - 127 = -38

The significand is the remaining bits with an implicit 1, so

1.011 0101 0011 0000 0011 1000

The number is -1.011 0101 0011 0000 0011 1000 * 2^-38
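The two interpretations of this bit pattern can be worked out mechanically. The sketch below writes the pattern as the hex word 0xACB53038 and uses Python's `struct` module as an independent check on the hand calculation; the variable names are illustrative.

```python
import struct

word = 0xACB53038  # 1010 1100 1011 0101 0011 0000 0011 1000
raw = word.to_bytes(4, "big")

# 1) Two's complement: subtract 1, flip the bits, and negate the magnitude.
magnitude = (word - 1) ^ 0xFFFFFFFF
as_int = -magnitude

# 2) Single precision: split the fields and apply (-1)^s * 1.f * 2^(e - 127).
sign = word >> 31
exponent = (word >> 23) & 0xFF
fraction = word & 0x7FFFFF
by_hand = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)

print(as_int, struct.unpack(">i", raw)[0])
print(exponent - 127)
print(by_hand, struct.unpack(">f", raw)[0])
```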

Problem: Convert -1101001.10011 to IEEE format

Solution

a) Normalize: -1.10100110011 * 2^6

b) exponent = 6 + 127 = 133 = 10000101

c) The fraction is 10100110011 and the remaining bits are zero, so the number is 1 10000101 10100110011000000000000

Problem: Consider the IBM base-16 single precision format.

Represent the decimal number 2.25 in this format.

Solution

1) get the sign bit, 0

2) express the number in binary

(2.25)d = (10.01)b

Base 16 format means number is represented

**Offset Binary Numbers (Excess-K)**

All ones corresponds to the maximal positive value, 111111...

**excess 2^(n-1)**

Take an example where a 4-bit number is stored in excess 8

4-bit number 1111

In unsigned notation it is 15 in decimal

In offset (biased) notation the number is 15 - 8 = 7

So the range starts at -8 (all zeros: 0 - 8)

and the maximum is 7 (all ones: 15 - 8)

Zero is represented by 1000, because 8 - 8 = 0 (excess-8)

**Comparison to 2's complement**

Looking at the minimum and maximum above, we can see that inverting the first bit of an excess-8 code gives the same number in 2's complement

For example the minimum, -8, is 0000 in excess 8;

invert the first bit to get 1000, which is -8 in 2's complement

Similarly, -1 is 0111 in excess-8;

invert the first bit to get 1111, which is -1 in 2's complement

**Example**

Excess 2^(n-1) is used in the IBM 360/370 mainframes to store the exponent

Excess (2^(n-1) - 1) is used in IEEE 754 to store the exponent
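The relationship between excess-2^(n-1) codes and 2's complement (flip the first bit) can be checked exhaustively for 4 bits. The helper names below are hypothetical.

```python
def to_excess(value: int, n: int) -> int:
    """Excess-2^(n-1) code: stored pattern = value + 2^(n-1)."""
    return value + (1 << (n - 1))


def from_excess(stored: int, n: int) -> int:
    return stored - (1 << (n - 1))


def to_twos(value: int, n: int) -> int:
    """n-bit two's complement pattern of value."""
    return value & ((1 << n) - 1)


for value in (-8, -1, 0, 7):
    e = to_excess(value, 4)
    t = to_twos(value, 4)
    print(f"{value:3d}  excess-8 {e:04b}  2's complement {t:04b}")
```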

**Converting IEEE to decimal number**

IEEE single precision: 1 bit sign, 8 bit exponent, the rest significand

The significand 1.f lies in 1 <= s < 2 (equivalently 1/2 <= s < 1 if it is written as 0.1f)

For single precision

total length = 32 bits

exponent = 8 bits

bias = 2^(8-1) - 1 = 127

fraction is 32 - (8+1) = 23 bits

For double precision

total length = 64 bits

exponent = 11 bits

bias is 2^(11-1) - 1 = 1023

fraction is 64 - (11+1) = 52

Steps

1) get the sign bit from the representation

2) get the next 8 bits for the exponent from the representation

this is in excess (2^(8-1) - 1), i.e. excess-127, for single precision

3) subtract 127 from the exponent value

e = extracted exponent - 127

4) get the 23-bit fraction f from the representation

and add one to the fraction: 1.f

Now the number is

(sign) (1.f) * 2^e
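The decoding steps above translate directly into code. This sketch only handles normalized numbers (stored exponent 1 to 254); the function name `decode_single` is illustrative, and `struct` is used purely as a cross-check.

```python
import struct


def decode_single(word: int) -> float:
    """Decode a normalized IEEE 754 single precision word step by step."""
    sign = (word >> 31) & 1        # step 1: sign bit
    stored = (word >> 23) & 0xFF   # step 2: 8-bit excess-127 exponent
    e = stored - 127               # step 3: remove the bias
    f = word & 0x7FFFFF            # step 4: 23-bit fraction
    significand = 1 + f / 2 ** 23  # add the hidden 1 to get 1.f
    return (-1) ** sign * significand * 2.0 ** e


word = 0xC2D33000  # 1 10000101 10100110011000000000000
print(decode_single(word))
print(struct.unpack(">f", word.to_bytes(4, "big"))[0])
```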

**Convert from binary to IEEE format**

**convert from decimal to IEEE**

1) convert the decimal number into binary

2) normalize it to the form 1.bbb * 2^e

3) find the stored exponent by adding the bias to e

4) store bbb... in the fraction field and store the exponent

When bbb... is shorter than the fraction field, fill the remaining bits with zeros

(sign) (exponent) bbbbbbbbbb

Example

1) 10.6

10 = 1010, .6 = .10011001...

For single precision

1) the number in binary is 1010.10011001... with the repeating pattern 1001

2) normalize it: 1.01010011001... * 2^3

3) the exponent is 3 + 127 = 130, in binary 10000010

4) the sign is 0

The fraction is the repeating pattern cut off (and rounded) at 23 bits

0 10000010 01010011001..
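To see the full 23-bit fraction for 10.6, one option is to let Python's `struct` module do the conversion and print the three fields. This is a sketch with an illustrative helper name; note that `struct` rounds to the nearest representable value, so the last bits of the repeating pattern are rounded rather than simply cut off.

```python
import struct


def single_fields(x: float) -> str:
    """Return the sign, exponent and fraction fields of x as single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    b = format(word, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"


print(single_fields(10.6))
print(single_fields(10.5))  # exactly representable, fraction padded with zeros
```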

**Converting decimal to IBM/360 floating point format**

Steps

1) get the sign bit of the number

2) express the decimal number in binary notation

3) normalize it; since the base is 16 we move the radix point 4 bits at a time instead of 1

Multiplying by 16 (2^4) moves the binary radix point right by 4 places,

so if we moved the radix point left by 4 bits n times, the exponent is n

For example binary 10.01 in normalized base-16 form is

.001001 * 16^1

and binary .00001 is

.1 * 16^-1

4) the fraction is what lies to the right of the radix point in normalized form

5) the stored exponent is 64 + the power of 16; express it in binary

Example

2.25

1)sign = +

2)binary for 2.25 decimal = 10.01

3)normalized value = .001001 * 16^1

4)fraction value is 001001 and rest zeros for 24 bits total

5)exponent is 64 + 1 = 65, in binary 1000001

6) number is 0 1000001 001001000000000000000000
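The conversion steps can be sketched as an encoder. The function name is hypothetical, the fraction is truncated to 24 bits rather than rounded, and no check is made that the exponent fits in 7 bits, so treat it as an illustration of the normalization step rather than a complete converter.

```python
from fractions import Fraction


def float_to_ibm360(x: float) -> int:
    """Encode x as a 32-bit IBM/360 single precision word (truncating)."""
    if x == 0:
        return 0
    sign = 1 if x < 0 else 0
    frac = abs(Fraction(x))
    power = 0
    # Normalize to 1/16 <= frac < 1 by moving the radix point 4 bits at a time.
    while frac >= 1:
        frac /= 16
        power += 1
    while frac < Fraction(1, 16):
        frac *= 16
        power -= 1
    fraction_bits = int(frac * 2 ** 24)  # keep 24 fraction bits, drop the rest
    return (sign << 31) | ((power + 64) << 24) | fraction_bits


print(format(float_to_ibm360(2.25), "032b"))
print(format(float_to_ibm360(1 / 32), "032b"))  # binary .00001 = .1 * 16^-1
```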

**Floating Point Arithmetic**

1) Align the radix points: the exponent values should be the same for both operands

2)perform operation

3) normalize the result

**Overflow and Underflow**

**Overflow**

Overflow is detected by checking the exponent value before and during normalization:

the positive exponent exceeds the maximum possible exponent value

It is represented by setting the result to infinity

**Underflow**

Occurs when the number is too small (near zero): getting a 1 before the radix point in normalized form would push the exponent field down to zero

Underflow may result in denormalized values, where the hidden bit is zero and the exponent field is all zeros

Precision is reduced

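**Largest negative number (magnitude)**

(The limits in this section treat every exponent pattern as an ordinary number. In IEEE 754 the all-zeros and all-ones exponents are reserved, so the true exponent of a normalized number only runs from -126 to +127.)

exponent all 1's = 255 - 127 = 128

fraction all 1's = (.111...) = (1/2)^1 + (1/2)^2 + ... + (1/2)^23 = ((1/2)^24 - (1/2))/(-1/2) = 1 - 2^-23

significand would be 2 - 2^-23

sign is negative

number is -(2 - 2^-23) * 2^128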

**Negative overflow**

numbers less than -(2-2^-23) * 2^128 are negative overflow

**smallest magnitude negative number**

exponent all 0's = -127

fraction all 0's =0

significand =1+0 =1

number - 2^-127

**Negative underflow**

numbers between -2^-127 and 0 are negative underflow

**smallest positive number expressible**

exponent all zeros

fraction all zeros

significand 1

number 2^-127

**Positive underflow**

numbers between 0 and 2^-127 are positive underflow

**largest positive number**

all exponent bits 1

all fraction bits 1 = 1-2^-23

significand = 2-2^-23

number (2-2^-23)*2^128
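The limits above let the exponent field take any value. For comparison, here is a sketch of the corresponding limits under actual IEEE 754 single precision, where the all-zeros and all-ones exponent fields are reserved; it reads the extreme bit patterns back through Python's `struct` module. The helper name is illustrative.

```python
import struct


def single_from_word(word: int) -> float:
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


largest = single_from_word(0x7F7FFFFF)             # exponent 254, fraction all 1's
smallest_normal = single_from_word(0x00800000)     # exponent 1, fraction 0
smallest_subnormal = single_from_word(0x00000001)  # exponent 0, fraction 0...01

print(largest == (2 - 2 ** -23) * 2.0 ** 127)
print(smallest_normal == 2.0 ** -126)
print(smallest_subnormal == 2.0 ** -149)
print(single_from_word(0x7F800000))                # all-ones exponent: infinity
```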

**Rounding**

When a number cannot be represented in the limited precision bits, we round the result

There are a number of ways of rounding

IEEE standard requires

ALU register also has three extra bits other than significand 1.bbbbb

extra bits in order from lsb

1)guard bit

2)round bit

3)sticky bit

When the mantissa has to be shifted to align the radix point, bits that fall off the lsb go into the extra bits. These bits can also be set in division and multiplication

guard and round bits are for precision

if sticky bit is set to 1 it remains at 1 to indicate what was beyond lsb

**round toward 0**. truncates any bits that the representation does not hold. The approximate value represented errs on the side of being closer to 0.

**round toward positive infinity**. chooses representations that err on the side of being to "the right" as viewed on a number line.

**round toward negative infinity**. chooses representations that err (for representing approximate values) on the side of being to "the left" as viewed on a number line.

**round to nearest**. This method attempts to distinguish which approximate value that may be represented is closer to the desired value.
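The four rounding modes can be compared on an exact value by keeping only a given number of fraction bits. This is a sketch with hypothetical names; for round to nearest it assumes ties go to the even neighbour, which is the usual IEEE default but is not stated in the notes above.

```python
import math
from fractions import Fraction


def round_to_bits(x: Fraction, bits: int, mode: str) -> Fraction:
    """Round x to a multiple of 2^-bits using the named rounding mode."""
    scaled = x * 2 ** bits
    if mode == "toward_zero":
        n = math.trunc(scaled)
    elif mode == "toward_positive":
        n = math.ceil(scaled)
    elif mode == "toward_negative":
        n = math.floor(scaled)
    elif mode == "nearest":
        n = round(scaled)  # Python rounds exact ties to the even integer
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Fraction(n, 2 ** bits)


x = Fraction(11, 8)  # binary 1.011, to be kept to 2 fraction bits
for mode in ("toward_zero", "toward_positive", "toward_negative", "nearest"):
    print(mode, round_to_bits(x, 2, mode), round_to_bits(-x, 2, mode))
```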

**Number Conversions**

All operations below are on 7-bit numbers

**Decimal number to 1's complement form**

1) find the representation of the positive number, for example 63

2) invert the bits

Represent -63 in 1's complement

Solution

The representation of 63 is 0111111, so its 1's complement is the inversion of this, that is 1000000

Represent 0 in 1's complement

1's complement has two representations of zero: +0 (all zeros) and -0 (all ones)

**Decimal number to signed (sign-magnitude) representation**

1) set the sign bit (msb) to 1 when the number is negative, else 0

2) represent the magnitude with the remaining n-1 bits in ordinary binary

Represent -23 in signed representation

1) msb = 1

2) binary of 23 is 16 + 4 + 2 + 1 = 010111

so the signed number is 1010111

**Decimal number to 2's complement**

For a negative number

1) find the binary for +N

2) -N = (1's complement of N) + 1

Note: remember to write the number in the required number of bits first

For example, if we write 23 as 10111 in 5 bits instead of 7 and perform the operation:

23 = 10111

2's complement = (01000 + 1) = 01001, which is a positive number and therefore incorrect

Represent -23

1) 23 = 0010111

2) 1's complement of 23 = 1101000

2's complement is 1101001

**2's complement to decimal number**

For a negative number

(the msb is 1)

1) value = -((1's complement of B) + 1)

Example: represent the 2's complement number 10111 in decimal

B = -(01000 + 1) = -(01001) = -9
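The conversions in this section can be gathered into a few helpers and checked against the worked examples. The function names are illustrative; each takes the word size n explicitly, since the note above about using the required number of bits matters for the result.

```python
def sign_magnitude(value: int, n: int) -> str:
    """Sign bit followed by the magnitude in n-1 bits."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n - 1}b")


def ones_complement(value: int, n: int) -> str:
    """Positive values as-is; negative values as the inverted bits of |value|."""
    mask = (1 << n) - 1
    word = value if value >= 0 else ~abs(value) & mask
    return format(word, f"0{n}b")


def twos_complement(value: int, n: int) -> str:
    """1's complement plus 1, which is the same as value modulo 2^n."""
    return format(value & ((1 << n) - 1), f"0{n}b")


def from_twos_complement(bits: str) -> int:
    """Decode by inverting, adding 1 and negating when the msb is 1."""
    mask = (1 << len(bits)) - 1
    word = int(bits, 2)
    if bits[0] == "1":
        return -((~word & mask) + 1)
    return word


print(sign_magnitude(-23, 7), ones_complement(-23, 7), twos_complement(-23, 7))
print(from_twos_complement("10111"))
```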

http://pages.cs.wisc.edu/~cs354-1/cs354/karen.notes/flpt.apprec.html

www.cs.sunysb.edu/~cse220/Homework/HW1-sol.pdf //IEEE and floating point
