Wednesday, December 29, 2010

Computer Network - Data Link Layer

Maximum data rate
For noiseless channel
If V be the number of discrete level and H be the bandwidth in Hz
maximum data rate = 2H log V
2H samples per second are required to completely reconstruct signals passed through loss pas filter of bandwidth H


For noisy channel
maximum data rate = Hlog(1+S/N)
conversion of S/N ratio to decible  = 10log(10,S/N) (to the base 10)
its irrespective of number of discrete levels

Data link layer
checks error transmission flow control
header and trailer attached to the network packet and called as frame
Ack is not a necessary service for data link its just optimization because instead of requiring n/w layer to send complete pkt again data link layer send the fragments that are not recieved


No Ack connectionless service
appropiate in case of real time traffic like voice where bad data is better than late data. used for less error prone channels

with Ack connectionless service
for unreliable channels like wireless


Framing methods
How do the receiver know in the continuous stream of data different frames
1)character count
2)flag byte with byte stuffing
include flag byte before header and after trailer
for each flag byte include escape character and one for each escape byte in data also
3)
to ensure that frame sequence never appears in the data
to frame sync sequence 01111110 containing six adjacent 1's a 0 bit is stuffed after every five 1's

Error detection and correction
error detecting code ,having enough information to identify error and
error correcting code, having enough redundacy to correct the corrupted data
consider data of m bits and r redundant bits
n = m+r
to detect d error we need distance d+1 code
to correct d error we would need 2d+1 code
 (n+1)2^m <= 2^n where n=m+r

Hamming rule
m+r+1 <= 2^r
this is lower limit on number of bits needed to correct single error in any of the n bits in message
we need to detect error in all the transmitted m+r bits and  
additional 1 is for no error

bits that are power of 2  {1 2 4 8 ...} are check bits rest are filled with data bits
each check bit forces the parity of some collection of bits including itself  A bit may be included in several parity computation

to which check bits the databit in position k contributes
like data bit at position contributes to check bits 1 2 8

A data bit k is checked by those check bits only

Problem A host uses odd parity to compute a hamming code
1)what is hamming code for bit sequence 1011001?
solution
from the hamming rule formula we find the minimum check bits required
it comes as r=4 for this message of size m=7 so total n =11
leave the check bits position which would be 1 2 4 8
and  calculate check bit for each data bit at that position
1 2  3  4  5 6 7 8 9 10 11
--------------------------
_ _  1  _  0 1 1 _ 0  0   1
find the check bits that contribute at each data positions

databit  check bit positions 
-------------------

3               0011
5               0101
6               0110
7               0111
9               1001
10             1010
11             1011

so check bit at position 2^0 forces parity for data bits at position {3,5,7,9,11} ={10101} odd
at position 2^1 to {3,6,7,10,11} = {11101} even
at position 2^2 to {5,6,7} ={011}even
at position 2^3 to {9,10,11} = {001} odd

so check bits at position 2 and 4 are 1
so hamming code is
0111 0 1 10 0  0   1

b)which bit is in error if following hamming code is recieved 0010011
solution
we have to include the check bit itself in parity check here
2^0 bit for {1,3,5,7} = 0101 error
2^1 for {2,3,6,7} = 0111 no error
2^2 for {4,5,6,7} =0011 error

error is in data position  2^0+2^2 = 5

Cyclic redundancy check
for polynomial of degree k include k zero bits at the end of the message m and divide m by polynomial g


upon receiving the checksummed frame receiver divides it by G(x) that is (T(x)+E(x))/G(x) T(x)/G(x) is zero so result of computation is E(x)/G(x) those error that are factor of polynomial will give zero other would give non zero result and would be caught

one bit error with CRC
E(x) = x^i so if G(x) has two or more terms it will give non zero remainder hence detected

Detecting double bit error
G(x) should not divide x^k + 1 for any k up to maximum  length of frame

Data link layer
control information or frame header(info also included)
1)kind whether any data in the frame or not
2)seq and ack


Problem A packet is divided in to 10 frames. chance frame getting correctly at receiver is .80 how many packet transmission required for successful
Solution  probability of successful transmission of packet = 1- .8^10
expected number of tranmission = {1 to infty} i (1-p)^i-1
geometric distribution
value = 1/p


making two changes to any valid code gives another valid code in case of parity so it has hamming distance of 2

MAC
Static channel allocation
if there are N users bandwith is divided in to N equal sized portions
mean time delay T
capacity of channel C bps
arrival rate of frames l frames/sec with mean 1/u bits/frame

service rate that is how many frames the channel can process = C/(1/u) =Cu  frames/sec
T = 1/ (uC - l)
that is when there is contention for channel we get the delay of T

For N subchannels
each with C/N bps capacity
frame arrival rate for each l/N frames/sec
Now T' = 1/(u(C/N) - (l/N))  = NT
that is mean dealy is N times that of single channel delay time

Dynamic Channel Allocation
Pure Aloha
feedback property of broadcasting : that is whenever a station transmits data it will also receive the same data so it has a way to know that if data got corrupted in transmission
terminal transmits whenever they have data to send without worry about collision
if two terminals try to transmit at the same time there will be collision data gets garbled and they need to send data again with random backoff time

Efficiency of Pure Aloha
frame time is the amount of time needed to transmit the fixed length frame
An infinite population of user generate new frames according to a poisson distribution with
mean N frames / frame time
If N>1 then  there will be collision on each transmission, that is channel can handle only 1 frame per frame time
we should have N<1 for reasonable throughput

There will be retransmissions of frame that suffered collision consider probability of k transmission attempts per frame time to be poisson distribution with mean G per frame time. G>= N
at low load G~= N at high loads number of collisons will be high so G>N

If there are N stations in the system
each station transmits with the probability p'
G is Average number of packets in the system
In period time t average packets will be  Np'
In vulnerable period of 2t it would be 2N*p'   //each transmits with p' only but for period of 2t

G' for vulnerable period is therefore 2Np'

Since G is the mean number of packet and from the poisson distribution we can find

Find the probability of k frame transmission give mean G
P[k] that is k packet trammisions occur

Probability that k frames are generated during a given frame time according to poisson distribution is
P[k] = (G^k e^-G)/k! where G is the mean number of frames


Expected number of transmission required for each sent frame = {k 1 to infty} kPk
e^2Np' or e^2G
What is throughput
P probability that a packet is successfully transmitted  //different from p' which is specific to station
Throughput average number of packets that can successfully go through
S = GP that is this fraction of transmitted frames escape collision

Find P, the probability that a packet is successfully transmitted
if t is the time to send frame then 2t is vulnerable period

P = probability of no other traffic during vulnerable period
probability of zero frame P[0] =  e^-Gfor duration of 2t its P[0] = e^-G'
so probability of no traffic during vulnerable period is e^-2G   or e ^-2Np'
and S = G * e^-2G  or
S in term of p' and N is S = Np' * e^-2Np'

When is maximum throughput achieved
Maximum through put is at G =1/2 or when power of  e is =1 by taking derivative wrt G
and is S= 1/2e or .184 so 18% is best channel utilization

Slotted Aloha
divides the time into discrete intervals each of frame time
user need to agree on slot boundaries
data is sent only at beginning of next slot
vulnerability period is halved t only

Probability of  successfully transmitting a packet in vulnerable period is e^-G
Probability of no other traffic in vulnerable period is e^-G which leads to
Throughput S =G e^-G
maximum throughput at G=1 with S =1/e = .368 that is 36.8%

Expected number of transmission required for each sent frame = {k 1 to infty} kPk
Pk probability requiring k transmission is from binomial distribution its
(1-p)^k-1*(p) where p is probability of successful transmission that is without collision
in case of slotted aloha its e^-G so
E = {k 1 to infty} k * (e^-G) (1-e^-G)^k-1  = e^G


Carrier Sense Multiple Access Protocol
stations listens to channel transmission and act accordingly

Persistent and Nonpersistent CSMA
first listen to channel, if busy wait until idle
if collision occurs , station waits random amount of time and starts all over again
1 persistent because transmits with probability 1 when finds channel idle

Cases in which collision can occur
if first station starts sending and second and already started transmission but because of propagation delay first station had sensed channel as free
while third station is transmitting and two stations are waiting for channel to be free they both will start with probability 1 so there will be collision

Non persistenet CSMA
In case of busy channel it doesnot continously sense the channel instead waits random amount of time

p persistent CSMA
applies to slotted channels
in case of idle channel it transmits with probability p and with probability 1-p defers till next slot

CSMA with collision detection
In CSMA stations donot transmit when they detects the channel as busy
In case of collision detection stations aborts transmission if they detect collision
saves channel bandwidth and time

How much time will it take for station to realize there has been collision
consider t to be propagation delay between farthest stations
transmission from station1 reaches station2 at time t0+t  if station1 transmits at t0
now if station2 starts transmitting just before the signal from station1 reaches station2 there will be collision and station2 will find it immediately but it will take 2t-e time to reach station1
So in the worst case a station cannot be sure that it has seized the channel until it has transmitted for period 2t without hearing collision
we model it as slotted aloha with 2t contention period(slot width)
Since station must continually  monitor the channel for collision so CSMA CD with single channel are half duplex because the receiving logic is used in sensing

Collision Free protocol
A bit map method
contention period consists of N slots and jth station sends a 1 in the jth slot if it has data queued for transmission After contention period is over each station is aware of other stations and they start transmitting in numerical order
Performance
low order stations has to wait on average 1.5N slots and
high order stations wait on average .5N slots
the mean for all stations is N slots
overhead per frame is N bits and amount of data d bits then efficiency is d/N+d
at high load its d/d+1


Ethernet
Manchester Encoding
used in Ethernet
to unambigously determine the start end or middle of each bit without reference to external clock
each bit is divided into two equal intervals.
A binary 1 bit is sent by having the voltage set high during the first interval and low in the second
A binary 0 bit is just the reverse low during first and then high.
requires twice as much bandwidth than straight binary encoding
to send data at 10Mbps signal has to change 20million times/sec

Differential Manchester
1 bit is indicated by absence of a transition at start of interval
0 bit is indicated by presence of a transition at start of interval
signal changes in the middle for both

Ethernet MAC protocol
preamble of 8bytes
sender and destination address 6byte address
0 for ordinary address and 1 for group address, group address allows multiple station to listen to single address its called multicast and address of all 1's is broadcast address
there are 48 - 2
type field tell the which process to give frame to according to proctocol name in type
data field is up to 1500 bytes maximum or including the destination and source address its 1518bytes
This doesnot include the Preamble and Start framedelimiter which are total 8 bytes
minimum data length also specified because
1)byte of 0 causes problem of stray bits in the channel so to distinguish valid frames from garbage there is minimum size of 64 bytes for frame and if data portion of frame is less than 46 bytes the padding is done
2)if the stations complete transmission of short frame before the noise burst gets back to it the sender would assume successful transmission only after time 2t can it know if collision occured
so for 10Mbps LAN with cable length of 2500meters , the worst round trip time has been determined as 50microsec Therefore minimum frame should be transmitted for at least this much time
Transmission of a bit on 10Mbps takes 100nsec so smallest frame should have 500 bits and so is the minimum frame for ethernet is 512 bits or 64bytes

So minimum frame size in ethernet including preamble and sfd is 512 + 64 bits

Checksum 32bit hashcode of data only error detection like CRC

Binary exponential backoff Algorithm(in case of collision not when senses channel busy)
After collision time is divided into discrete slots whose length is equal to 2t
After first collision each station waits either 0 or 1 slot times before trying again
After second collision each station waits either 0 1 2 3 at random
After third it would be 0  ..2^3-1

After 10 collisions its fixed at 1023 only if collision still happens and reaches 16 then its reported as failure

the value k that each station chooses depends on which collision number is it for that station

Problem
If two stations A and B collide  at first transmission A wins first backoff transmits successfully and then both tries to send again and collides again what is probability that A wins second backoff
Here slots avaiable for A are 0 and 1 because its 1st round for A
and for B slots available are 0 1 2 3 because its 2nd round for B
So the probability that A wins is if A chooses 0 and B chooses 123 anything or A chooses 1 and B chooses 2or 3
P(A wins) = 1/2 * 3/4 + 1/2 *2/4 = 5/ 8

How transmission in Ethernet works
1)Once it has the ethernet frame it senses the channel
If idle then start transmitting the frame
If channel is busy it must wait until it find the channel idle plus a additional wait of 96 bit times once it find it idle
2)monitors the channel after starting transmission If no signal from other stations found it has successful transmission if it finds the signal from other stations it stops transmission and transmits 48 bit jam signal instead
3)After transmitting the jam signal it enters exponential backoff and chooses the value K at random from 0 to 2^k-1 Then it waits for K*512 bit times and repeats again.

Why is jam signal required
In case B aborts the transmission after detecting transmission from A and has not put enough bits on the line, A may not be able to detect the signal and hence B needs to send jam signal


Performance of Ethernet
k stations are always ready to transmit
if each tranmits during contention period with probability p then probability A that some station acquires the channel is
A = k(1-p)^k-1p

A is maximized when p=1/k with A -> 1/e as k>infty

Expected number of slots per contention  = {j:1 to infty} jA(1-A)^j-1  = 1/A
each slot has duration of 2t the mean contention interval is 2t/A
for optimum value of p, expected number of contentions  =e
and contention interval would be at most 2te = 5.4e

Channel Efficiency
If mean frame takes P sec to transmit
Channel efficiency = P/(P+2t/A)
the longer the cable the longer will be contention interval

P=F/B framelength /bandwidth
A = 1/e for optimal case
t = L/c  , c is speed of signal propagation and L is length of cable

Channel efficiency = F/B/(F/B + 2(L/c)*(e)) = 1/(1+2BLe/cF)

Problem bandwidth C =100Mbps  10000 bits/frame  arrival rate 900frames/sec Delay for average frame?
Solution T = 1/(uC -l) 
mean frame length 1/u = 1/10000 =  10^-4

Problem Let N stations share 56-kbps pure Aloha. Each outputs 1000bit frame on average once 100sec
maximum value of N
Solution
required bandwidth for each station is 1000/100 bit/sec 10bit/sec
In Pure aloha usable bandwidth  is .184*56kbps = 10.3kbps
so N * (bandwidth for each station) = allowed bandwidth
N= 10300/10

In pure aloha, at small load,transmission can start instantly for slotted it has to wait for next slot introducing mean 1/2 slot of delay

Problem 10000 stations compete for slotted aloha channel Average station makes 18request/hour A slot is 125micro sec What is approx total channel load
Solution  Load is number of request in slot time
number of request per sec = 18/3600 = .005 request/sec
total number of request = .005*10000 = 50 request/sec
In slot period of 125micro sec number of request = 50 *125*10^-6 req/sec = 50/8000

Problem All formula  for slotted Aloha
G is number of requests in the vulnerable period = (number of request/sec) * vulnerable period
Solution Probability of success in slotted aloha is e^-G
Probability of success on kth transmission is (1-e^-G)^k e^-G
Expected number of transmission attempts =e^G

Problem In slotted Aloha 10% of slots are idle
1)what is G
probability of successful transmission is .1
so G= -ln P =-ln .1 =2.3
2)channel overload or underload
since G>1 so its overload

Problem If backlogged station donot accept new arrivals at MAC and unbacklogged accepts only single
Consider arrival rate of .05 arrival/slottime at each station by independent process, backlogged attempt retransmission with probability .02 in each slot
1)what is probability of arrival of new frame at MAC of unbacklogged & backlogged station
unbacklogged station will accept only first arrival
Probability of no arrivals = e^G
probability for at least one arrival at unbacklogged  = 1-e^-G  where G is .05
= .05
MAC of backlogged do not accept new arrivals so P=0
2)If there are 20 stations and 5 of them are backlogged what is probibility that one of the backlogged will see successfull transmission
For successful transmission no arrivals should be at backlogged station p= (e^-G)^15 = (1-.05)^15 = .472
For unbacklogged only one of them should transmit that is rest 4 should not transmit(
C(5 1)(1-.02)^4 * (.02)

Backlogged station do not accepts new arrivals at MAC because they donot have buffer So effective arrival rate at medium will decrease as the number of backlogged station increases
Poisson arrival rate is distributed equally at each station so if we arrival rate at medium is l then arrival rate at station is l/m for each m stations

Probability of unbacklogged station to transmit is 1- e^-G
Probability of i unbacklogged station transmit at next slot =
(ways to choose among i unbacklogged stations) (probability of transmitting for i unbacklogged stations)*(probability of not transmitting for remaining unbacklogged station)


Problem How long does a station has to wait before it can retransmit again
1)bit map protocol
assume frame is d bits If stations is the highest numbered
N bit of contention period + period when all N-1 transmits ( when last station wants to send just after its contention slot is gone)
= N + (N-1)*d   (assumption same 1bit for single slot time)

Problem Two CSMA/CD stations are trying to transmit multiframe file After each frame is sent they contend for channel using backoff algorithm
1)What is probability that contention ends on round k
Using backoff exponential algo
At round one they have single slot and they both try to transmit in that and hence collides
At round two there are 2 slots waiting for 0 and 1
At i th round station  can choose to wait in any 2^i-1 slots

Probability of choosing any one slot in ith period is 1/(2^i-1)
probability of both station choosing same slot is (1/2^i-1) * (1/2^i-1) = 1/2^2(i-1)
Total probability of choosing any of the 2^i-1 slot is hence P = 2^i-1 * (1/2^2(i-1)) = 1/2^i-1
this is probability of collision in ith round
for success on kth round we should have k-1 round in collison and kth round on success
that is
{product i:1 to k-1 }(1/2^i-1)  * (1- 1/2^i-1) 

 Problem what is baud rate in case of Ethernet
Solution
Ethernet uses manchester encoding which uses two signals to represent a bit so baud rate is 2 times the data rate because it has 1/2 bits/signal. In case of binary encoding baud rate = data rate
so for Ethernet 10mbps it should be 20megabaud

Problem Consider node A and B on the same 10Mbps Ethernet, propagation delay between two nodes is 225bit times
1)Node A starts transmitting and before it finishes node B starts transmitting .Can a finish transmitting before it detects that B has transmitted
The first bit from A reaches B at t=225 In the worst case B starts transmission at t=224 this transmission would reach A at 224+225(sent from B) =449 Since minimum frame size is 512+64 bit times A should be still transmitting and would detect collision
2)Suppose A and B both sends frame at the same time, the frames collide and then both chose different values of K in CSMA/CD Can retransmission collide Assume A chooses K=0 and B K=1?
Solution
At t=0 A and B starts transmitting
At t=225 both hears each other signals and start transmitting jam signal which is 48 bit time so
At t=225+48=273 both finishes transmitting jam signal, last bit of jam signal takes another 225 bit time to reach A and B
Immediately after sending the jam signal A B starts in exponential backoff
At t=273 A starts sensing the channel for transmission and B waits for 1*512 bit time that is 273+512=785  
A waits for the jam signal from B to reach it
At t=273+225=498 jam signal from B reaches A and A finds the channel idle and now before transmitting A has to wait for 96 bits
At t=498+96=594 A starts transmitting
A's first bit would reach B at 594+225 =819
so B would find the channel idle at 785 and would start wait in 96 bit period that is 881
At t=819 B would find signal from A and would not transmit

Problem Consider CSMA/CD network with 1Gbps over 1km cable. The speed of signal is 2*10^5km/sec.What is the minimum frame size
Solution propagation time = 5microsec
RTT = 10microsec
It should transmit for at least RTT period that is 1Gps*10microsec = 10^9 *10*10^-6 = 10000bits


Baud and Bit rate
Baud rate is signal /sec while bit rate is bits/sec
bit rate = baud rate * bits/signal

If we are using 1 bit per signal its same
If we are using
Baud rate in case of serial data transfer baud rate = bit rate single bit is used to represent signal
1 start bit 1/2 stop bit 0/1 parity bit for serial communication asynchronous
Example
1)calculate overhead for data size 7 bit ,1stop and no parity
overhead = 2/9 *100
2)represent 0101 1010 with 1 stop and no parity
start bit represented with 1 bit and stop with 0
1 0101 1010 0
3)time to transfer 4Mbits at 9600 baud rate
since in serial communication baud rate = bit rate we have
time = 4Mbits/9600 sec =


Stop and wait protocol

Line utilization is 1/(1+2a)   or transmission time/ (transmission+RTT)
L packet size and  u bandwidth
transmission time = L/u
a= propagation time / transmission time
transmission time is time to transmit a frame


Sliding window

Link utilization
1             when n >1+2a
N/1+2a   else


Problem Consider one way propagation time of .35sec, transmission rate of 1Mbps, and packet of 1000 bits ack time negligible
1)minumum timeout period for stop and wait ARQ

Timeout from end of the packet is 2*propagation delay
if we consider from start of packet it will add tramission time to this qunatity
2)minimum window size for go-Back N for 100% utilization assuming no errors?
window should be sufficiently large so as to transmit for the period until ack recieved
N /(1+2a) >=1
N >= 1+2a
frame transmission time = 1000/10^6 sec = 1ms
propagation time = .35
a= .35/10^-3 =  350

N = 701

Problem Consider 10Mbps bandwidth with propagation delay of 270ms and frame size of 10K-bit What is link utilization for stop and wait ARQ assuming probability of single frame error =10^-3
Solution line utlization = (1-P)/(1+2a)
For goback N its
N(1-p)/(q+2a)(1-p+Np)



Token Ring
Token cirulates in the ring topology LAN
when station wants to transmit data it converts token into data frame and start transmission at last it converts it in token back and circulates it in the ring
At each station a delay is introduced on data transmission
Release after transmission

Ring latency number of bits that can be in transit simultaneously or time for bit to circulate ring

Reinsertion strategies
1)Multi-Token operation: Free token transmitted immediately after the last bit of data frame ,that is, as soon as station has transmitted its data it will release free token
2)Single-Token operation: free token is inserted after last bit of busy token is received back that is free token is independent of data frame and depends on the ring length
If frame is longer than latency then that would mean busy token is recieved and its still transmitting so station would release free token when last bit of data frame is transmitted like multi token
3)Single frame operation
free token inerted when last bit of transmitted frame is received back at station

utilization = 1/ (1+a/N)
X maximum frame transmission time allowed per station
Throughput
Multi token



Problem consider a ring topology LAN with total length 2d with two stations d apart Ack is recieved by circulation of packet itself There are N=10 other stations on the ring each introduces delay of 1 bit
Solution
1)find the 1bit delay that is howlong is delay at each station in seconds
1Mbps is capacity so it would take 1microsec delay
2)total delay for N+2 stations = 12microsec
3)find how long it takes for 100 bit packet to fully circulate the ring and return to sender
propagation delay is 2d*5microsec = 10microsec Adding one bit delay gives 22microsec to fully circulate distance of 2d
4)transmission time of host 100/1M = 100microsec
5)host A will recieve the last bit at 122microsec for 100 bit packet
6)so for 10000 packets it would take 10000*122 = 1.22sec

Problem Consider there are N stations in token ring LAN Frame size is L bit, RMbps speed ,b bits latency per adapter d m between each stations with v speed find effective transfer time for all 3 strategy
Solution
1)Reinsertion after completion of transmission or Multi-Token operation
that is token is inserted after the transmission is complete
time to pass token to next station =  propagation time + bit latency delay, assuming token transmission time negligible
effective transmission time = frame transmission time + time to pass token to next station
                                        = L/R + (d/v+b/R)
Efficiency = L/R /(L/R+(d/v+b/R))

2)reinsertion after return of token or Single Token operation
where token is inserted after last bit of busy token is received back or if transmission is still in progress then when busy token is received then after transmission ends

Effective time = Max(frame transmission time,ring latency ) + time to pass token to next station
ring latency = time required for bit to circulate ring = N(d/v+b/R)
                   = Max{L/R,N(d/v+b/R)} + (d/v+b/R)
3)Reinsertion after return of frame or Single frame operation
time for last bit of frame to reach back to station = frame transmission time + ring latency
Effective transmission time = frame transmission time + ring latency + time to pass token to next station
Effective transmission time = L/R +  N(d/v+b/R) + (d/v+b/R)

Throghput
1)Single Frame
effective time = frame transmission X + ring latency τ'
consider normal ring latency a' = τ'/X
ρmax =  MX/(M(X+τ') +τ') = 1/(1+a'(1+1/m))
1)Multiple token
effective time = frame transmission X + ring latency τ'
consider normal ring latency a' = τ'/X
ρmax =  MX/(M(X+τ') +τ') = 1/(1+a'(1+1/m))


If each station transmits up to k frames/token Maximum throghput ?


References

nptel.iitm.ac.in/courses/.../Computer%20networks/pdf/M3L2.pdf

Monday, December 27, 2010

Database Normalization - Normalized Forms , ER diagram and Relational algebra

Relational Algebra
Select Operation
select tuples that satisfy given predicate
the predicate is the where clause
lowest number of tuples selection operation can return is zero and at most n when relation r has n tuples

Project operation
Project operation selects the columns which should be retrieved
that is similiar to select A,B,C from .... where to select only column A B C we need to use project
eliminates duplicates
so a projection on relation r may return at most n tuples, if relation r has n tuples, if all distinct tuples and will return 1 if all same tuples
it restricts the columns unlike selection which restricts rows

Union operation
tuples that appear in either of two relations or both of them
union should be taken on compatible relations
conditions or union opearator to be valid
a) relation must have same number of attributes
b)domain of the attributes must be same

Set difference
tuples in one relation but not in another
same condition like union holds for set difference operation also that is arity and domain condition

Cross product
argument should have distinct names, distinguish same attribute name with table alias
tuple for each pair of tuple in r and s
n1*n2 number of tuples in crossproduct
to get the correct result from cross product we need to identiy tuples in r with tuples in s
that is some condition of the form (r.A = r.B and .. )

Rename operation
allows us to give name to the relation expression
example
a way to find the maximum balance like max function in sql is like
select all those balances which are not greatest that  is they are less than some other value
and subtract them from the available balance values
we can use positional parameters $1 $2 to address the attributes instead o giving them names

Additional operations this can be expressed in terms of fundamental operations
Set intersection
can be rewritten with set difference like
r intersection s = r- (r-s)

Natural join operation
allows to combine certain selection and cartesian product in single operation
in terms of fundamental operation
1)first get the same column names R intersection S which gives same attributes that is {A1,A2..}|
2)perform selection with matching these columns like {r.A1 = s.A1 and r.A2 = s.A2..} in predicate
3) perform projection for R union S attributes
does not returns duplicate tuples
natural join is associative
when no attributes in common that is R intersection S = empty then it returns cartesian product
natural join would return only one attribute for common attribute from r and s


Theta join

Division operation
for queries with statement like 'for all'
S attribute set is subset of R every attribute of S is also in R
r / s is relation on the schema R-S( all attributes in R not in S) that is it contains only attributes R-S
conditions
1)for every tupple ts in s there is a tuple tr in r satisfying

find tuples that donot appears in r that is from r x s - r that is all those tuples for which value in r exists for all values in s will be eliminated in this step then we can subtract is from r to get the required tuples
example
r be relation below and R={x,c}
x    c
-------
x1 c1
x2 c2
x3 c1
x3 c2

and S be relation below and S= {c}
c
--
c1
c2
we want to find those x for which there all c are mapped like x3 which has both c1 and c2 relation tuple
1)find those tuples which are not mapped
R-S = {x}
1)we find cross product   projection(r)_R-S * s_c  which will give all possible relations that is for each x there will be mapping for all c's
3) now we subtract r relation from above cross product which will eliminate all the tuples from r which are in s
we will be left with tuples which donot have all c mapped to x

Outer join

Tuple relational calculas
non procedural
{t| p(t)} set of all tuple t such that predicate P is true for t
{t | e r(Q(t))} r(Q (t)) is the relation of which t is tuple r is created from expression on relation
t[A] value of tuple on attribute A
the attribute on which we specify the condition is only contained in the final expression result

example
find loan number of each loan with amt > 1200
predicate t[loan_number] = s[loan_number] and s[loan_number] > 1200
so the result will have only single attribute loan_number

if there are two relations involved in the statement then we use there exist clause for each and connect them
implications if p then q


Translating to Tuple calculas form
1)if asked to find only A attribute then in the complete predicate we will use t on only A
2)if n relations are involved we need n tuple relation like u element of r,  x element o s ..
3)if we are using implication then we also need to specify what happen when this is not true like if we are asked to find all the customers who have account at all branches in brooklyn
if they have branch we can specify will predicall for all and implication but we also need to connect this relation with expression that tell include all customer names if no branch at brooklyn
4)for every wheare clause involving two relations we write it like
w is tuple such that on relation r {r[A] = s[B]}
5) If we are asked to find something like
find all projects which supplier S supplies entirely
find all the supplier which do not sell green color parts
first we find the complement that is in first case we find the projects which other suppliers also supply
and then subtract it from the project supplier S supplies
in second case we find suppliers which sells green  color parts and subtract them from supplier list

we write query like divison of RA in tuple calculas as
we use 'for all' clause on the relation s whose all attribute values should be mapped and we check for each value in s if there exist a tuple in r mapped to it

SQL

All aggregate function except count(*) ignores null values in input
all aggregate functions returns value of null when applied to empty collection
Nested queries
Set membership

some comparison returns true if  value of tuple is greater that atleast one
=some is identical to IN
<>some is not same as  NOT IN

Integrity constraint
Referential integrity
Child table or referencing table stores the foreign key that references the parent table or referenced table primary key
no insert can be done in the child table if corresponding foreign key does not exist in parent table
Rules
Restrict disallow the deletion or update of  referenced data that is deletion from parent table if not allowed
Set to Null set to null in referencing table associated data on update or delete
Cascade on update in referenced table update the associated data in referencing table and on delete in referenced table delete all associated dependent rows
No action  differs from Restrict in sense that checked at end of statement

Normalization
why use normalization
repetition of inormation, data will be repeated for each instance used in the relation,wastage of space
updation would be problem we need to check all rows to update the value, costly updates
inability to represent certain inormation,  if certain attributes are not available we cannot insert data other attributes and
while deleting we may delete all the data not leaving any master record trace

Functional dependency
type of constraint on set of relations
a  ---> b
set of attributes a, b belong to relation r
holds on schema R, if any legal relation r(R) for all pairs of tuple t1 and t2 in r if t1[a] = t2[a] then t1[b] = t2[b] that is for a value of 'a' there corresponds a unique value of 'b'  one to many not allowed

superkey
if K-->R that is for all pair of tuples if t1[K] = t2[K] then t1=t2

Normal forms

First Normal form
there should not be set of values assigned to attribute that is
no field of  relation should be like {address1.1,address1.2} that would require to parse the value with extra programming effort

Trivial dependency
dependency that are satisfied by all realtions
A  -> A for attributes A
or a ---> b where b is subset of a

Closure of functional dependecy
F+ set of all functional dependecy logically implied by F

Axioms
1) Reflexivity
trivial dependecy A--->B  if B is subset of A
2)Augmentation rule
If A--->B and C is set of attributes then AC-->BC
or
If If A--->B and C is set of attributes then AC-->B
or
If A--->B and C->D  then AC-->BD
3)transitivity
If A--->B and B--->C then A ->C


Rules
Union rule:
If A->B and A->C then A->BC
2)Decomposition rule reverse of union
If A->BC holds then A->B and A-> must also hold
3)pseudo transitive
If A->B and BC->D then AC->D


Closure of attribute set

Finding the set of attributes determined by A
algorithm
set result =A
while result changes do
  for each functional dependency B->C
  if B is subset of result  include C in the result  //

Problem
A, E ---> D -- 1
A --> C -- 2
E --> A -- 3
C ---> E -- 4

find (BA)+
Solution
result =BA
1)find lhs attribute set, superset of BA
2 so we include C, result= BAC 
2) repeat A with new result
i)result=BACE   from 4
ii)result = BACED from 1

Problems Find all CK for
A, D ---> C -- 1
D ---> E      -- 2
E, B ---> A  -- 3
A ---> B      -- 4
Solution:
1) find the attribute set that determines all other
A,D -> CEB
2)can we remove any of the A or D from the above dependency

since D doesnot appear on right side of any functional dependecy that cannot be determined by any other attribute so D cannot be removed
A is determined by EB and B is determined by A so we cannot remove A also
3)checking for other CK
find which other attributes derive the first CK i.e AD
applying psuedotransitivity on 2 and 3 we get DB->A

b)
A ---> C, D   -- 1
C, D ---> E   -- 2
E, A ---> B -- 3
Solution
1) since A doesnot appear on right of any FD so A should be in CK
2) A can derive any other attribute set if we check
A ->E
A-> EA
A->B
so A is the CK


c)
A, E ---> D -- 1
A --> C -- 2
E --> A -- 3
C ---> E -- 4
Solution
BA ->ABCED
AB is the CK
we see A is determined by E so BE is also CK
and E can be determined from C so EC is also CK


d)
B --> A,E   -- 1
E, B --> C   -- 2
A, C ---> D   -- 3
A ---> B      -- 4
Solution
B->ABECD
so is A


Cannonical Cover
extraneous attribute : if attribute can be removed from functional dependency without changing the closure
1) attribute a of A set is extraneous if for FD A -->B
F logically implies  F-{A->B} U (A-a) -->B

example
F: AB->C and A->C
Lets see if A is extraneous or not in AB->C
F' = B->C U A->C
does F logically implies above F'
no F does not implies B->C


Check if B is extraneous  in AB->C
A->C U A->C  =A->C
F has the same

try reverse does F-{A->B} U (A-a) -->B  logically implies F for
AB->C and A->E
let B be extraneous in AB->C then
F'=A->C , A->E
does F' logically implies F
well it does and will always imply F irrespective of FD's

Method
A' = A-a
check if A' -> B can be inferred from F that is compute closure A'+ under F

2) attribute b of B set is extraneous in FD A-->B
if F-{A->B} U (A) -->(B-b) logically implies F
example
F: AB->CD and A->C
let C be extraneous in AB->CD then
F'= AB->D , A->C
does F' implies F, yes
reveres would be always true in this case also that F would always imply F'
Method
check if A ->b is implied by F' that is computer A+ on F' and check if right hand side has b

Cannonical cover
set of depndencies Fc such that F logically implies Fc
1)no functional dependency in Fc contains an extraneous attribute
2)each lef t side of functional dependency in Fc is unique

cannonical cover may not be unique


Decomposition
A set of relation schemas { R1, R2,…, Rn } is a
decomposition of R if
R = R1 U R2 U …..U Rn
each Ri is a subset of R ( for i = 1,2…,n)

Given instances of the decomposed relations,
we may not be able to reconstruct the
corresponding instance of the original relation
– information loss
Lossy decomposition or lossy join bad design gets more tuples but less information
After Natural Join, we get  extra tuples. Thus, there is loss of information.
A decomposition {R1, R2,…, Rn} of a relation R is called a lossless decomposition for R if the natural join of
R1, R2,…, Rn produces exactly the relation R.
 
     R(A, B, C)
          |
     Decompose
R1(A, B)   R2(A, C)
         |
    Recover
   R’(A, B, C)

Thus,
R’ = R

R : relation
F : set of functional dependencies on R
X,Y : decomposition of R

Decomposition is lossles if :
if X ∩ Y forms a superkey of either X or Y, the decomposition of R is a lossless decomposition

Superkey
no distinct tuple in the realtion have same values for the attributes in this set


Normal forms
1NF
all attributes directly or indirect depend on candidate key


2NF
1)should be in 1NF
2)Every non key  attribute is fully dependent on candidate key


when not in 2NF its vulnerable to update anomaly


example AB is key for R(ABCD)
if we have values like
A    B    C     D
A1 B1  C1   D1
A2 B1  C1   D2
A3 B1  C1   D3

here B ->C that is C is dependent only on B which is partial key
now if we have update C for B1  we need to make sure we update all B1 records

So we divide the table in to two BC and ABD for table to be in 2NF
3)case when mutual dependence between non key attribute
2NF eliminates certain update anomaly but not all
the case when non key attribute is dependent on another non key attribute is not handled here
example
R(supplier_no,status,city)
and functional dependency are
--------------------------
supplier_no --> status
supplier_no -->city
city ---> status
---------------------------
This relation is in 2NF but not in 3NF
Reason  here both non key attributes status and city are dependent on whole of candidate key but there is dependency city -->status which can lead to update anomaly

4) Anomalies in 2NF
Insertion anomaly : we cannot maintain status of city in above case until we have supplier in that city
Deletion Anomaly:same way if we delete all supplier for a city we loose status information
Updation Anomaly:status occurs many times like if we update status for a particular city we have to update all rows with that city otherwise inconsistency

5)can exist only if table has composite key, always in 2NF if no composite key

6) Case when not proper subset of candidate key and not super key either
Consider the Relation R(pubId, title,pagecount,price)
if FD's are
PubId, title --> pagecount,price
PubId, pagecount --> price
This is still in 2NF because price depends not on the proper subset of candidate key{pubId,title} but on part of candidate key with some other non key attribute
so its not proper subet of key
and not superkey either
This is not in 3NF because
1)non key attribute price should be dependent on superkey or we can say
2) pubId,title -> pagecount,pubId -->price, which is transitive dependency that is non prime attribute price here is transitively dependent on candidate key {pubId,title}

reference

7)Check for all the candidate key for partial dependency

3NF
1)If relation has only one candidate key then no non key column should determine another non key column
2) 2NF say non key attributes depend on whole key, 3NF restraints it further and says non key attribute depends on nothing but the whole key
that is in 3NF cases like X->A,B where X is composite candidate key and A and B are non prime attribute then A->B is not allowed though B depends on whole key X but it also depends on A
2)For each functional dependency X-->A either of the condition should hold true
i)X->A non trivial dependency that is A is subset of X or
ii)X is superkey or
iii)A-X that is nontrivial attributes on right side is prime attribute that is part of some candidate key
3)Anomalies in 3NF
can suffer from all three in case when its not in BCNF
4)when all attributes are prime then its in 3NF

BCNF
1)more restrict than 3NF
2)for every functional dependency X->A either
i)X->A is trivial dependency or
ii)X is superkey
3)3NF with no overlapping candidate key is guaranteed to be in BCNF
4)Dcomposition in BCNF may not preserve certain dependency when attribute functionally dependent moves in to another table

Decomposition of BCNF
for A -> B where A is not key we divide the relation R in to AB and {R-B}
Any relation in BCNF is in 3NF

Problem
1)R(A, B, C)
FD's
--------------
AB -> C,
C -> B
-------------

check for BCNF
i)find closure of the left side attributes
for C->B
closure of C is B that is not = ABC so not key hence not in BCNF

Check for 3NF
i)find the minimal keys
AB and AC
since all right side attributes are prime its in 3NF or 
ii)check 3NF conidtion for each dependecy
AB - >C   AB is key so satisfies 3NF
C->B   C is not key but B is part of key so it also satisfy 3NF


Problem R(A,B,C,D,E) and FD’s
A->B,
B->AE,
AC->D
why not in BCNF?  Decompose in BCNF
solution
i) A->B check closure of A which is ABE so not in 3NF
ii) B->AE  closure of B is BAE this also violates
iii) AC-> D  closure of AC is ACDBE  which satisfies
Now we decompose relation such that FD i and ii are satisfied
first rewrite FD like we check this on new FD whenever some transitive dependecy comes from present FD
A-> B , B->A , B->E,A->E, AC->D
for A->B  decompose it in R1(AB) and R2(ACDE) and check if FDs hold
on R2 A ->E violates as A is not superkey so we decompose R2 as R3(AE) R4(ACD)
now every FD holds

Problem  R(A,B,C,D,E) with functional dependencies A→ E, BC→ A, DE→ B.
Solution
find keys
ACE ACD ACB
since all right side are prime key attributes or part of superkey so it satisfies the 3NF
but left side are not the key so not in BCNF
BCNF decomposition
R1(AE)  and  R2(ABCD) holds A->E violates BC - A
R1(AE) R2(BCA) R3(CBD)  only dependency in R3 is trivial BCD -> BCD
lost the dependency DE->B


Problem  R1(A,B), R2(C,D,E,F) A→ B, C → D, D → EF
Solution
key for R1 is A and key for R2 is C
it holds 2NF condition because A-> B, B depends on complete key, for C->D, D depends on complete key and D->EF, EF doesnot depends on partial key either



Entity Relationship
coneptual representation : ER
logical: relational model with attributes datatype lenght constraints
physical: with schema def views triggers

Realtionship set
set of same type of relations
each entity plays a role in the relation
same entity set can participate in relation set more than one time with different roles
descriptive attributes on relation set
relationship instance in the relation set must be uniquely identifiable from its participating entity without descriptive attributes:
Mapping cardinality: number of entities to which another entity can be associated using relationship set
Participation :if every entity in E participates in atleast one relationship in R total participation

Key for relationship set
Ei entities participating in the realtionship
primarykey(Ei) and {a1,a2,..an} attributes associated with realtionship set

describes individual relationhip and union of keys forms the superkey

set of attributes  primarykey(E1) U  primarykey(E2) U primarykey(E3) ..U {a1,a2..an}

when realtionhship set is many to one
the primary key of the relationship set is identified from the key of the entity on the many side

Translating from ER model o relational model

Many to many relationship
employee[ssn]----------works_in(since)-----deptartment[did]
many employess can work in a department and
and a employee can work in many departments
in translating relation work_in it need to include primary key of both participating entities(which becomes it's PK) and its own attributes


One to Many relation
employee[ssn] <-----------manages(since) ----------department[did]
each employee can manage number of departments but (constraint)
each department is managed by at most one employee

since each deparment has a unique manager we can combine the department entity and manages relation without introducing any redundancy

manages seperate would have been like

ssn ,dId ,since , primary key (did), foreign key(ssn) reference Employees, foreign key (did) REFERENCES Departments)

with combined dept_manages for total participation from "many side"
 attributes of department + attribute of manages above
ssn ,did ,since , primary key (did), foreign key(ssn) reference Employees


Total Participation
If every department must have a manager that is must appear in manages relation with non null ssn then
we represent relation manages and department in single table only but on deletion of employee or ssn we don't delete the dept_manages tuple
and ssn constrain of not null

Weak entity set
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
– Owner entity set and weak entity set must participate in a one-to-many relationship set (1 owner, many weak entities).
– Weak entity set must have total participation in this identifying relationship set.

Weak entity set and identifying relationship set are translated into a single table.

employee ([ssn],name)-----------policy(cost) <-----------------------dependents ([pname],age)

Dep_policy
pname,age,cost,ssn
with primary key [ssn ,pname] and foreign key [ssn] referencing employee and on delete cascade

when owner entity is deleted all the owned weak entities must also be deleted



References
www.cs.sjsu.edu/faculty/lee/.../26Presentation_Jung_T_Chang.ppt
pages.cs.wisc.edu/~dbbook/openAccess/firstEdition/.../mod5l1-2.pdf
https://agora.cs.illinois.edu/display/cs411sp10/Assignments
www.cs.arizona.edu/classes/cs460/fall09/hmwk2.pdf

Monday, December 20, 2010

Number representations - IEEE floating point, 2's compliment,biased ..

 2's compliment


Fig 2's compliment representation

Positive number in 2s compliment
sign bit an-1 is zero
n-1 bits are allowed for the magnitude of number so total distinct positive numbers will be 2^(n-1)
numbers range from all zero to all ones(2^0+2^1+2^2 ...2^(n-2)) i.e from 0 to 2^(n-1) - 1

Negative Number in 2s compliment
sign bit an-1 is 1
for magnitude of number all bits are used in calculation including the most significant an-1 bit
Calculation of magnitude considers most significant bit as negative and rest as positive so
magnitude is calculated as  -2^(n-1)+ sum(0 to n-2)ai*2^i
smallest number in negative that can be represented is when all the n-1 bits are zero and only the msb contributes that is -2^(n-1)
Maximum number in negative 2s compliment is when all n-1 bits are 1 and they reduce the effect of msb, it is -2^(n-1) + (2^n-2 + 2^(n-3) ....2^1+2^0) = -1

Observations
single representation of zero, when all n bits are zero
we can represent -2^(n-1)  but not +2^(n-1) in 2s compliment
when bits of positive number n are complimented(1s compliment)  it becomes -(n+1) in decimal example
+3 is 0011, its compliment 1100 is -4
+7 is 0111 its compliment 1000 is -(8)
To obtain the negation of the number that is +n to -n we need to first take 1s compliment which will give us -(n+1) then we can add single digit to get the required number -n
negation of zero  will give zero only if we ignore the carry, exception case is for negation of -2^n-1 also where all bits will become zero , we get same number -2^n-1

Arithmetic Operations in 2s compliment

Addition
discard the extra bit when adding , two 4bit numbers discard the 5 msb if any of result. It is different from overflow which gives incorrect result.
when result is large than can be handled in word size we call it overflow. ALU must signal not to use the result.
example of overflow
add   1001  //-7
         1010  //-6
       10011  //   +3
Overflow can occur whether or not there is carry
overflow occurs when sign of result for two positive numbers or two negative numbers is opposite

Multiplication
//booth algorithm

Division

Overflow
when adding unsigned binary numbers, overflow occurs when the final carry-out is a 1
In 2's complement, the final carry-out is always ignored, and overflow occurs when there is a sign change

One's Complement
used to represent negative numbers
positive numbers same as in sign magnitude system (with first bit zero )
negative number is obtained by inverting the bits of positive number
zero has two representation +0 and -0
range of number sis -(2^n-1 - 1) to (2^n-1 -1)

Floating Point Representation
for numbers too large or too small
number is represented approx to fixed number of significant digits and scaled using exponent
way to represent scientific numbers
A floating point number consist of
1) significand mantissa or coefficient signed digit string.
significand length determines to which precision number can be represented, prportional to accuracy
2)a signed integer exponent scale factor proportional to range, more bits to the exponent means more range

|---|--------------|---------------------|
|---|--------------|---------------------|
sign(1bit)   E(8bit)       Significand(23bit)

Fig. 32 bit floating point Format

 S*B^e
B is bias, implicit, same for all numbers. bias value is typically 2^e-1 - 1
radix point is assumed to be after MSB bit of signigicand

advantage of biased representation is that non negative floating point numbers can be treated as integers for comparison purpose

sign: 0 positive, 1 negative

exponent: stored in biased form, that is, bias is subtracted from the field to get true exponent
exponent is 8 bits means number from 0 to 255 with bias it becomes -127 to +128


Significand: stored in normalized form, msb should be one and radix point should be after msb
msb one is taken as implicit and need not be stored, so 23 bit of significand can represnt 24 bits
In normalized value of significand lies in [1,2)

IBM/360 floating point format
total length 32 bits
exponent 7 bits
fraction 24 bits
bias 2^7-1  = 64 unlike IEEE, 2^n-1 -1 bias

base is 16 unlike IEEE  base-2
there are no hidden 1 significand
so normalization is to 0.bbbb only
If number is represented in IBM/360 floating point like
1/0 eeeeeee fffffff..

fraction has no hidden 1.
we get the decimal for the binary representation exponent and raise it to the base 16
(sign) 0.ffff  * 16 ^exponent


IEEE 754 floating point standard
 single precision: 1 bit sign, 8 bit exponent, 23 bit fraction (24 bit significand = 1+ fraction) (“1” is implied)
double precision: 1 bit sign, 11 bit exponent, 52 bit fraction (53 bit significand)



Fractional portion of the significand represents a fraction between 0 and 1 (each bit from left to right has weight 2-1, 2-2, …)
Since 0 has no leading 1, it is given the reserved exponent value 0 so that hardware won’t attach a leading 1 to it
Exponent is “biased” (biased notation) to make sorting easier
all 0s is smallest exponent (most negative) all 1s is largest (most positive)
bias of 127 for single precision and 1023 for double precision

(–1)sign × (1 + fraction) × 2exponent – bias

Exponent and mantissa of all zeros and ones are special values

Range of exponents 
exponent of all zero(0) and all ones(255) defines special values
For single precision
it can range from 1 to 254 and in bias we would say -126 to +127
for double precision
-1022 to +1023

Mantissa number of bits
A normalized number would add a hidden 1 making the mantissa of 2
24 bit for single precision
53 bit for double precision


Denomalized or subnormal number
demormal numbers have hidden bit of significand as 0
have exponent of all zeros 

0.0         0 or 1   00000000   00000000000000000000000
                                (hidden bit is a 0)

subnormal   0 or 1   00000000   not all zeros
                                (hidden bit is a 0)

normalized  0 or 1    > 0        any bit pattern
                                (hidden bit is a 1)

Representation of Zero
Two representation of zero +0 and -0

exponent and mantissa all 0

in hex 0x0000 0000 and  0x8000 0000

Representation of infinity and NaN
• +∞ an -∞ are denoted with an exponent of all 1's

s  e        f
  +∞     0 11111111 00000... (0x7f80 0000)
  -∞     1 11111111 00000... (0xff80 0000)
  NaN    ? 11111111 (not all zero) 
 
(S is either 0 or 1, E=0xff, and F is anything but all zeros)




significand fraction lies betwee 1/B <= s <=1
where B is base
for base 16 fraction lies betwee 1/16 and 1 inclusive
for base 2 its .5 to 1 inclusive

Error in representing recurring fractions

xT – true value
xA – approximate value

True Error in xA (exact value of the error) = E(x) = xT - x A
True Relative Error in xA = xT-xA / xT
True Percentage Relative Error in xA = xT-xA / xT * 100


Problem Consider 16 bit floating point format  with 1 bit for sign 5 for exponent and 10 for fraction exponent is in excess 15 representation
a) decimal for number in this format
0 01001 0101000000
Solution
sign bit is positive
exponent is in excess 15 so e+15 = 9
e=-6
so the value is 1.01012 * 2-6 which is (1 + 1/4 + 1/16)/64 = 2.05078 * 10-2

b)convert decimal number -50.75 in above representation
5010 = 1100102  and .7510 = .112 
so -50.75 = -1.1001011 * 25
 biased exponent value will be 15 + 5 = 2010 = 101002
leaving the first bit in significand
Number is 1 10100 1001011000


Problem 6 bit wide number represented in 2's compliment what is maximum and minimum number
Solution Minimum when first bit is 1 i.e -2^n-1 = -32
Max number when first bit is zero and sum of all n-1 bits i.e 2^n-1 -1  = 31




Problem Consider bit pattern 1010 1100 1011 0101 0011 0000 0011 1000 what value does it represent in
1)2's compliment
since number is negative(msb is 1) we find the value by
1)subtracting  1
2)fliping bits

2)single precision floating point
sign = negative
exponent 8 bits after sign 010 11001, its biased so we have to subtract 127 from it
i.e 89-127 = -38
significand is remaining bits with an implicit 1 so
1.011 0101 0011 0000 0011 1000
number is - 1.011 0101 0011 0000 0011 1000 * 2^-38

Problem Convert - 1101001.10011 to IEEE format

Solution
a) Normalize : -1.10100110011 * 2 ^6
b)exponent = 6+127= 133 = 10000101
significand 10100110011  and remaining bits goes zero


Problem Consider IBM base-16 single precision format
represent decimal number 2.25 in this format
Solution
1)get sign bit, 0
2)express number in binary
(2.25)d = (10.01)b

Base 16 format means number is represented

Offset Binary Numbers (Excess-K)
all zeros corresponds to minimal negative value 00000000..
all ones correspond to maximal positive value 111111...


excess 2^n-1
take an example where 4 bit number is stored in  excess 8
4 bit number 1111
in unsigned notation its 15 in decimal
in offset biased notation the number is  15 - 8 =7
so decimal range for number is from -8, all zero's 0 - 8
and max would be 7, when all 1's  15-8
zero would be represented by 1000 because 8- 8(excess-8)

comparison to 2's compliment 
If we look at the minimum and maximum number above we can see that if we invert the first bit we get the same number in 2's compliment
like minimum -8 is 0000 in excess 8
invert first bit 1000 and interpret as 2's compliment its same
for 8 we have 0111 in excess-8
invert first bit 1111 and in 2's compliment it is 8

Example
excess 2^n-1 is used in IBM mainframe 360/370  to store the exponent
excess (2^n-1 - 1) is used in IEEE 754 to store the exponent



converting  IEEE  to decimal number
IEEE single precision  1 bit sign 8 bit exponent rest signigicand
number in significand lies between  1/2 <=s <= 1

For single precision
total length = 32 bits
exponent =  8 bits
bias =  2^8-1 -1 =127
fraction is 32 - (8+1) = 23 bits

for double precision
total lenght = 64 bits
exponent= 11bits
bias is 2^11-1 -1 =1023
fraction is 64 - (11+1) = 52
 steps
1)get the sign bit from
2)get the next 8 bits for the exponent from the representation
this is in excess 2^8-1 -1 i.e excess-127 for single precision
3)subtract 127 from the exponent value
e = extracted exponent - 127
4) get the 24 bit fraction f the representation
and one to the fraction 1.f
now number is
(sign) (1.f) * 2^e

Convert from binary to IEEE format


convert from decimal to IEEE
1)convert decimal number into binary
2)normalize it form 1.bbb * 2^e
3) find the exponent by adding bias
4)add bbbb... to significand and store exponent
when bbb... lenght is less than number of bits to store store remaining zeros
(sign) (exponent)  bbbbbbbbbb

Example


1)10.6
10 =   1010 ,  .6=10011001...
For single precision
1) so number in binary is  1010.10011001 with repeating pattern 1001
2) normalize it, 1.01010011001... * 2^3
3) exponent is 3+127 = 130, in binary 10000010
4) sign is 0
fraction becomes repeating pattern  for total length of 23
0 10000010 01010011001..

Converting decimal to IBM/360 floating point format

Steps
1)get the sign bit of the number
2)express decimal number in binary notation
3)normalize it, its base-16 so we move 4 bits at a time instead of 1
since we are moving radix point in binary  *2^4(16) will move radix right 4 places
and if we moved n times exponent is n
like for binary 10.01 we represent it in normal base 16 as
.001001 *  16^1
for binary .00001
.1*16^-1
4)fraction is to the right of radix point in normalized form
5)exponent is 64+power of base-16, express it in binary

Example
2.25
1)sign = +
2)binary for 2.25 decimal = 10.01
3)normalized value = .001001 * 16^1
4)fraction value is 001001 and rest zeros for 24 bits total
5)exponent is 64 + 1 = 65, in binary 1000001
6) number is 0 1000001 001001000000000000000000


Floating Point Arithmetic
1)Align radix point, exponent values should be same for both opernads
2)perform operation
3)

Overflow and Underflow 


Overflow
overflow is check by checking exponent value before and during normalization
positive exponent exceeds the maximum possible exponent value
represented by setting to infinity 
Underflow
occurs when number is too small(near zero) and getting a 1 before radix point in normalized form causes exponent field to be zero 
underflow may result in demormalized values where hidden bit will be zero and exponent wil be all zeros
precision would be reduced 



largest negative number(magnitude)
exponent all 1's = 255-127 = 128
fraction all 1's  = (.111...) = (1/2)^1 +(1/2)^2 .....(1/2)^23 = ((1/2)^24 - (1/2))/(-1/2) = -2^-23+1
significand would be 2-2^-23
sign -1
number is  -(2-2^-23) *2^128
Negative overflow
numbers less than -(2-2^-23) *2^128 are negative underflow
smallest magnitude negative number 
exponent all 0's = -127
fraction all 0's =0
significand =1+0 =1
number - 2^-127
Negative underflow
numbers greater than - 2^-127 are negative underflow
smallest positive number expressible
exponent all zeros
fraction all zeros
significand 1
number 2^-127
Positive underflow
numbers less than 2^-127 are positive underflow
largest positive number
all exponent bits 1
all fraction bits 1 = 1-2^-23
significand = 2-2^-23
number  (2-2^-23)*2^128


Rouding
when number cannot represented in limited precision bits, we round the results
there are number of ways of rounding 
IEEE standard requires
ALU register also has three extra bits  other than significand 1.bbbbb
extra bits in order from lsb
1)guard bit
2)round bit
3)sticky bit

when mantissa has to be shifter to align the radix point, bits that fall off the lsb goes into extra bits.These bits can also be set in division and multiplication
guard and round bits are for precision 
if sticky bit is set to 1 it remains at 1 to indicate what was beyond  lsb
  1. round toward 0. truncates any bits that the representation does not hold. result is approximate value represented err on the side of being closer to 0.
  2. round toward positive infinity. chooses representations that err on the side of being to "the right" as viewed on a number line.
  3. round toward negative infinity. chooses representations that err (for representing approximate values) on the side of being to "the left" as viewed on a number line.
  4. round to nearest. This method attempts to distinguish which approximate value that may be represented is closer to the desired value. 

Number Conversions

All Operations on 7bit number
Decimal Number to 1's compliment form
1)find the representation for positive number that is 127
2)negate the bits  
Represent -127 in 1's compliment
solution
representation of 127 is 0111111 so 1's compliment is negation of this that is1000000
Represent 0 in 1's compliment
1's compliment has two representation for zero +0(all zero's) and -0(will all 1's)

Decimal Number to signed representation 
1)include sign bit (msb)  =1 when number is negative else 0
2)represent the number with remaining n-1 bits in normal form

Represent -23 in signed representation
1)msb =1
2)binary of 23 is 16+4+2+1 = 010111
so signed number is 1010111

Decimal Number to 2's compliment
for negative number
1)find binary for +N
2) -N = (1's compliment[n] +1)

Note remember to convert number to required number of bits
example if we convert 23 to 10111 in 5 bits instead of 6 and perform operation like
23 = 10111
2's compliment = (01000 +1) = 01001 it gives positive number which is incorrect

represent -23
1)23=010111

2)1's compliment of 23=  1101000
2's compliment  is 1101001


2's compliment to decimal number
For negative
if number has more 1's then zero then
1)B = (1's compliment B + 1)
example represent 2's compliment 10111 in decimal
B = -(01000 +1) = -(01001) = -9



    http://pages.cs.wisc.edu/~cs354-1/cs354/karen.notes/flpt.apprec.html
    www.cs.sunysb.edu/~cse220/Homework/HW1-sol.pdf      //IEEE and floating point

    Thursday, December 16, 2010

    Network Layer - Routing

    Transport layer provides service between two processes running on different hosts
    network layer provides service between  hosts
    network layer of sending host transfers packet to nearby router, which switches packet from inbound link to outbound link

    Functions of network layer
    Path determination: Network layer must determine the path taken by packets as they sent.The algorithms to calculate path are called routing algorithm
    Switching:The router must move the inbound packet to outbound link
    Call setup: Protocols like ATM requires a handshake before sending packets similar to transport layer handshake

    Network layer service classification
    Network layer service can be provided as either Datagram service or Virtual circuit service.

    Virtual Circuit(connection oriented service)
    VC setup : sender specifies the destination address to its n/w layer. The n/w layer determines the path , updates the path in each packet switch and may also reserve the resources
    Data Transfer
    VC termination Sender informs n/w layer which in turn informs the end system and updates the table on the path in packet switch to reflect that VC no longer exists
    In connection setup window size maximum packet size data rate timer values like can be negotiated
    File transfer Remote login Video on demand are applications that needs connection oriented service
    There are situations in which connection oriented service can tranmit packet out of order for example in case of user termination signal

    Unlike transport layer in which connection setup only involves two end systems here in n/w layer all the packets switches in the path are also involved

    Datagram(connection less service)
    Sender informs the n/w layer which attaches the destination address to packet(unlike VC) and moves packet in the network. so packets may follow different paths because routing table may have modified
    Best effort service, timing between packets is not guaranteed to be preserved also delivery of the packet is not guaranteed
    short burst applications like user verification, fund transfer uses datagram service

    Routing Principles
    Determines the path that packets are to follow

    Routing Algorithm
    Problem definition Given a set of routers, with links connecting them Find optimal path from source to destination
    Global Routing Algorithms Least cost path is determined using complete knowledge about the n/w. The algorithm takes the connectivity between all nodes and all costs as input
    Calculations can be done at centralized location or can be replicated at multiple sites
    Also called link state algorithms

    Decentralized Algorithm
    No node has the complete information, each node has knowledge of only its directly attached links and then through iterative process of calculation and exchange of information with its neighboring nodes  it calculates the path
    also called Distance vector algorithms as only direction is known to each node to which it needs to forward packet


    Link state routing Algorithm
    each node broadcasts the cost of all its attached links
    Now each node has identical and complete view of the system
    each node can run the algorithm and compute same set of least cost path as other nodes

    Dijkstra Algorithm
    computes least cost path from one node to all other nodes
    for non negative edge path cost only
    After kth iteration of algorithm the least cost path are know to k destination nodes and among the least cost paths to all destination nodes the k path will have k smallest cost
    c(i,j) link cost from node i to node j, if not directly connected its infinity
    c(i,j) = c(j,i)
    p(v):  neighbor of b on least cost path to v

    Steps

    Initially
    mark source node as current node and make it permanent node
    mark each neighboring node with distance from A
    Examine all tentatively labeled nodes and make one with smallest permanent and call it new current node

     After first initialization
    1)Examine all adjacent nodes to current node
    2)If  sum of label on the current node + distance to neighbor node  <  label on neighbor node
      relabel the neighbor node to new distance from current node
    3) check all tentative nodes in whole graph
    4) mark one with smallest distance permanent

    Complexity of algorithm
    In the first iteration we need to check n nodes to find minimum
    In the second iteration we check n-1 nodes to find minimum
    so it has n(n-1)/2

    Distance Vector Algorithm
    Asynchronous iterative self terminating distributed algorithm
    Each node has a distance table in which rows are all destinations in n/w and columns are directly attached neighbors
    Dx(y,z) = C(x,z)+min(Dz(Y,w))
    X  Distance table shows that X can reach Y through Z with cost D(Y,Z)
    Bellman ford algorithm
    graph with negative edge weights
    If graph contains negative cycle i.e sum of weights on cycle comes to be negative then there can be no shortest path
    Bellman ford can detect negative weight cycles and report their existence but cannot produce correct result if negative cycle is reachable from source

    Link cost changes and failures
    when a node X detects change in cost form itself to its neighbor Y it updates the distance table and informs the neighbor Z if minimum cost path has changed
    neighbors Z updates its cost to X from Y and informs its neighbors if it has changed minimum cost path
    It informs X again in case minimum cost path has changed and X updates its table and it continues

    1)node Y updates its distance to X to be 60 it checks its minimum cost path to X has changed from 4 to 6 that is minimum cost path is available from Z to X with cost of  6, so it updates its neightbours
    2)Z updates its minimum cost to Z via Z its 1+ new min cost that is 7. Z sees that it has new minimum cost it updates its neighour
    3)Y gets update from Z that it can reach X in 7 so Y updates its table and so on ..
    This problem is called count to infinity
     Y and Z will exchange data and update their cost to X until the cost from Z to X via Y becomes greater than 50


    Comparision of Distance vector and link state
    Message complexity Link state O(nE) message are sent to let every node know cost of each link
    DV algorithm requires message exchange between directly connected nodes at each iteration
    Convergence Link state suffers from oscillation while DV can converge slowly and suffers count to infinity
    Roubutness

    Internet protocol
    IP is unreliable(no guarantee that packet reaches destination) and connectionless(no state information about successive datagram) protocol

    IP header
    normal size is 20bytes unless options are present
     bits are transmitted in big endian format that is 0-7 8-15 .. which is network byte order in TCP/IP
    fields
    Ip version
    header length(4 bit) : 32 bit words in header including options, limits size of ip header to 60byte
    TOS(4bits):
    minimum delay Telnet Rlogin FTP control TFTP .. interactive applications for small amount of data
    maximum throughput :FTP file transfer .... any bulk data transfer
    Maximum reliability : routing protocols SNMP
    Minimize monetary cost :usenet NNTP
    total lenght(16 bit): length of IP datagram in bytes so maximum 65535 bytes, this field changes when datagram is fragmented
    a host is not required to recieve a datagram larger than 576bytes
    required field because some datalink layer protocols pad small frames to be of minimum length so to know th boundry

    identification field uniquely identifies each datagram sent by host
    TTL upper limit on number of routers though which a datagram can pass, decremented by 1 by each router that handles the datagram, when this field reaches zero datagram is thrown away, prevents the datagram from getting caught in routing loop forever
    header checksum is calculated over IP header only unlike TCP and other protocol.



    IP addressing
    host has single interface while router has multiple interfaces, each inteface has its own IP address.no two interfaec have same IP address
    each interface belongs to a networks and shares that network address the host part specific interface
    32 bit long

    Token Bucket


    Problem Give global distance vector table when

    1) each node knows distance to only its neighbors
    table contains only entries of neighbors 

    2)each node has reported information it had in previous step to its neighbors
    each node now knows about its neighbors neighbor like A now knows of distance to E and F

    3) same step is performed again
    each node now knows about all nodes since there are no more that third level neighbors



    2)each node shares the above information with its neighbors ----------- each node knows about its neighbor neighbors

    Sockets
    connect
    sends SYN
    called from client
    connect return when error or connection estabilitsed
    error
    no response to syn
    rst sent by server hard error
    ICMP error soft error

    Bind
    no need to call bind from client
    binds IP and port
    if not called ephemral port is choosen
    server has to bind some reserved port for it
    if server not binds IP, use that which is in client packet as destination IP

    listen
    called on server
    state from - closed to listen

    accept
    returns next completed connection`

    Problem How many class B networks and hosts are possible
    Solution
    Class B has 16 bits for netid out of which 2bits are fixed and 16bits for hostid
    number of networks = 2^14 - 2 (leaving all ones and all zeros also)
    number of hosts = 2^16 -2

    Problem If delay are recorded as 8 bit number, 50 routers in n/w distance vectors are exchanged twice/sec then how much bandwidth per link is used in both direction
    Solution
    Length of each distance vector exchanged between routers is 50*8bit
    this is the information about all the destinations and minimum distance to them shared by each router with its neighbor
    Now for a link a distance vector is shared by both routers
    there are 2distance vector exchanged per second by each router so for a link total 4 distance vetors would be shared in both directions so usage is 1600bits/sec

    Problem Consider 6Mbps network regulated with token bucket which is filled at rate of 1Mbps Its initially filled with 8Mbits How long can computer transmit at full 6Mbps
    Solution Lets S be the time interval for which it transmits at full 6Mbps
    For this time the total bits transmitted = Tokens consumed in this period
    total bits transmitted in S is 6Mbps * S sec
    total number of tokens consumed during S = tockens already present in bucket + tokens generated in time S
     = 8Mbits + 1Mbps*S
    so 6S = 8+S
    S=1.6

    Problem

    Problem
    http://www.seas.gwu.edu/~cheng/232/