Computer Science Cafe: serializability

Showing posts with label serializability. Show all posts

Wednesday, November 24, 2010

Serializability, Isolation Recoverability - Concurrency Control in Database

Why we want to run transactions concurrently?

Concurrent or overlapping execution of transactions are efficient

How we ensure the correctness of the concurrent transactions?

Concurrency control(Serializability) and Recovery are two criteria that ensure the correctness of concurrent transactions

Why concurrency control is needed

Lost Update : update of some data by one transaction is lost by update from another transaction
Dirty Read or temporary update problem : one transaction updates the value of common data and aborts before it can revert the changes transaction 2 reads the value of updated variable.
incorrect summary problem: transaction reads the data while another transaction is still changing the data

Why recovery is needed

In any kind of problem like hardware malfunction, software error exceptions or violating the concurrency property, deadlock recovery of transaction is needed

Transaction states

begin transaction marks the beginning of transaction.
end_transaction specifies transaction execution is complete and system check whether changes can be permanently applied.
rollback or abort for unsuccessful end of tranaction

Fig. transaction states

At commit point all transaction operations have been logged and new entry is done in log 'commit T' stating that all transaction operation permanently logged
before writing commit T the complete log should be written to disk from buffers
Rollback : when commit T statement is not found in log, its rollbacked

How recoverability is implemented

System log is kept on disk which logs transaction like write old value new value read.
Protocol that do not provide cascading rollback do not need to keep read entry

Schedules
Recoverable Schedule : If T2 reads a data item written by T1 commit operation of T1 should appear before commit operation of T2.

Cascadeless Schedule: If T2 reads a data item written by T1 commit operation of T1 should appear before read operation of T2.
Strict Schedule : If a write operation of T1 precedes a conflicting operation of T2 (either read or write), then the commit event of T1 also precedes that conflicting operation of T2.

Fig. schedules recoverability

pattern for recoverable schedule would be like

Fig. simple pattern for recoverability schedules on vertical time line

What is Serializability

If executing interleaved transaction results in same outcome as serial schedule(running transaction in some sequnence) then they are considered serializable. this schedule is type of nonserial schedule.

Serializabality might be compromised in some cases but recoverability compromise would mean violating database integrity. Isolation levels decide tradeoff between correctness and concurrency

Isolation level
when the changes made by one operation becomes visible to another operation
most relaxed ACID property
From more stringent to relaxed isolation levels
Serializable as if transactions execute in complete isolation, in serial fashion. read locks released immediately but write locks released at end of transaction
Read Uncommited Dirty reads allowed that is another transaction reads the data value but then first transaction can revert so other transaction has incorrect value
Repeatable read: A phantom read occurs when range of rows is read by one transaction and another transaction inserts or delete a record in between rows returned by same query is different at two points in time for a transaction that is another transaction updates and commits in between

Read commited(Non repeatable read):prevents dirty read that is reads commited data only.
UPDATE or DELETE of any of the rows read by earlier transaction

	Dirty Read	Lost Update	Phantom Records
Read uncommitted	yes	yes	yes
Read committed	No	yes	yes
Repeatable read	No	no	yes
Serializable	No	no	no

Types of serializability

View and conflict serializability

conflict is subset of view serializability
Conflict is widely utilized because it is easier to determine and covers a substantial portion of the view serializable

Equivalence to serial schedule such that

In view serializable, two schedules write and read the same data values.
and In conflict seriablizable, same set of respective chronologically ordered pairs of conflicting operations.

Conflict serializable
In conflict serializabability two schedules are conflict equivalent and we can reorder the non conflicting operation to get the serial schedule
Conflicting operation
1) they are upon same data item
2)At least one of them is write
3) they are from different transactions
Non commutative that is their orders matter

Testing conflict serializability
Test is through precedence graph. The acyclic preced graph shows conflict serialiczable schedule and topological sorting of that graph gives the serializable schedules
cycle of commited transaction can be prevented by aborting an undecided transaction on each cycle in precedence graph of all transaction

view serializability is NP complete
materialized conflict if the requested conflicting operation is actually executed
//pending

Problem 1 Consider two schedules
S1 : r1(x) w1(x) r1(y) c1
S2: r2(x) r2(y) c2
1.What the number of possible schedules.
2. How many serial schedules are possible
3. Determine type of schedules(recoverable cascadeless strict) and serializability(conflict seriablizable or not) for below schedules
S1: r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); w 1 (Y); C 2 ; C 1
S2: r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; w 2 (X); C 2
S3: r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); w 1 (Y); C 1 ; C 2

Solution

1.
In general, given m transactions with number of operations n1, n2, ..., nm, the number
of possible schedules is: (n1 + n2 + ... + nm)! / (n1! * n2! * ... * nm!)
here m=2 and n1= 4 and n2 = 3
so (3+4)! / (3!*4!)
7C3 or 7C4
2.Number of serial schedules is m! where m is number of transactions
so here we have 2!

3. To check type of schedule
we need to check all conflicting operations of schedule
S1 : r2(X) comes after w1(x) , since no commit before r2(X) its not cascadeless and strict
commit of T1 should be before T2 in this case for it to be recoverable which is not so it is not even recoverable. Its non recoverable schedule
S2: there is only one conflicting operation w1(X) w2(X) and for that we have commit of T1 before w2(X) so its strict schedule and so is cascadeless also
S3: for one of the conflicting operation w1(X) r2(X) there is no commit in between so its not cascadeless and strict. To check for recoverable commit of T1 should be before that of T2 which is there

To check serializability
S1 precedence graph is
T1 ---> T2
acyclic so its conflict serializable

S2 precedence graph
T1-------->T2
^ |

|_________|

there is cycle so its not conflict serializable

//pending problems
http://academic.udayton.edu/SaverioPerugini/courses/Winter2006/cps432/index.html

Friday, October 29, 2010

Locking Protocol for concurrency control

Two way phase locking 2PL

guarantees conflict serializability
may be subjected to deadlocks
doesnot guarantees cascading rollbacks
In expanding phase, number of locks can only increase
In shrinking phase locks only released

2PL is superset of SS2PL(rigourness)

Strict 2PL
For any two transactions T1, T2, if a write operation precedes a conflicting operation of T2 (either read or write), then the commit event of T1 also precedes that conflicting operation of T2.

Any strict schedule is cascadeless, but not the converse. Strictness allows efficient recovery of databases from failure.

//http://en.wikipedia.org/wiki/Schedule_%28computer_science%29#Strict
Release exclusive lock only after end of transaction that is unlock happens only after commit or abort
guarantees serialiability and recoverable schedule too
read locks can be released
S2PL class of schedule is (2PL intersection Strictness)
release its write locks only after it has ended
read locks are released regularly during phase 2
we need to know the phase 1 end seperate from transaction end

Transaction can be in following states

Running its executing
Ready its programs execution has ended and its waiting to be Ended
Ended or completed It is either commited or Aborted de

SS2PL Strong Strict 2PL (Rigorous )
release both locks only after END
has actully one phase only, no phase 2
SS2pL enforce both conflict serializability and strictness

Deadlock in 2PL

Other enforcing mechanism for serializability
Time stamp ordering

Correctness of database transaction is defined by its
serializability or isolation and its recoverability

relaxing the serializability criteria isolation levels

view and conflict serializability

Problem 1

Determine whether
1)schedule is serializable or not
2)schedule can be produced by 2PL
3)schedule can be produced by Strict 2PL

Schedule S1

T1	T2	T3
		r(D)
w(A)
	r(B)
w(B)
	r(D)
		w(D)

Solution

1) To check the serializability we draw the precedence graph of the schedule

http://www-stud.uni-due.de/~selastoe/?mdl=dbms&mode=precedence

T2--------->T1
|
|
v
T3

Since the graph is acyclic, so its serializable

2)
Non-serializable schedules can not be 2PL or strict 2PL.
Subset of serializable schedules can be 2PL or strict 2PL

2PL protocol says for a transaction all locks should be requested before releasing any lock

T1	T2	T3
		Ls(D)
		r(D)
Lx(A)
w(A)
	Ls(B)
	r(B)
	Ls(D) U(B)
Lx(B)
w(B)
U(A) U(B)
	r(D)
	U(D)
		Lx(D)
		w(D)
		U(D)

Fig. showing how 2 phase locking protocol works for this schedule

Here transaction requested shared lock for read and exclusive lock for write
Shared locks on data item are compatible like here T2 requested shared lock on data item D which was shared locked by T3.

If we look at only the transaction lock requests we can see that for each transaction we first requested all the
locks before releasing any

T1	T2	T3
		Ls(D)

Lx(A)

	Ls(B)

	Ls(D) U(B)
Lx(B)

U(A) U(B)

	U(D)
		Lx(D)

		U(D)

Fig showing only the lock request and lock release for all transactions

Ls = shared lock
Lx = Exclusive lock
U = Unlock

converting shared lock to exclusive lock is part of growing phase while
converting exclusive lock to shared lock is part of shrinking phase

Problem read to write lock and write to read lock conversions
Solution A read lock can be converted to write lock if no other readers, this converion happens by first unlocking the data item and then acquiring the write lock, if the request are already queued by some other processes for write lock might not get lock instantly
write to read is always possible.

If transaction Ti has shared lock and transaction Tj request exclusive lock on same item first transaction Ti has to release the lock and then transaction Tj can acquire the lock

Problem consider schedule S =T1 R(x), T2:R(Y) ,T1:W(z) , T1:commit, T3:R(y) , T3:r(z) ,T2 w(y), T3: w(x),T2:commit , T:commit

2PL
T1       T2          T3
Ls(x)
r(x)
               Ls(y)
               r(y)
Lx(z)
w(z)
U(x)U(z)
commit;
                                Ls(y)
                                 r(y)
                                Ls(z)
                                 r(z)
                                 Lx(x) U(y)
                 Lx(y)
                 w(y)
               U(y)
                commit;
                               w(x)
                               U(x)
                              commit;

Strick 2PL would require that T2 gets the Lx(y) while T3 has shared lock on y but T3 commits after T2 request so it cannot get the Lx(y)

In the 2PL we are allowed to read or write even after some unlock only condition is no lock can come after first unlock

In S2pL all exclusive lock unlocks comes at the end after commit or abort

Dealing with Deadlock
Deadlock Prevention
A transaction locks all data item it refers to before execution this prevents deadlocks since transaction never has to wait
Another approach is to impose ordering on all the data items and to require that transactions lock data items only in sequence consistent with ordering example tree protocol

Conservative 2PL uses this approach

Deadlock detection and resolution
wait for graph,created using lock table, is kept which is used to detect the cycle and then one of the transaction is rolled back
transactions are added to wait for graph as they get blocked

Deadlock Avoidance
some avoid deadlock by not letting the cycle to complete, if discovers that blocking a transaction is likely to create a cycle it rollback the transaction
wound wait and wait die algorithms use timestampt to avoid deadlock by rolling back victim

Starvation
particular transaction never gets a chance to proceed further

Timestamp Protocol
selects ordering among transaction in advance instead at time of execution

timestamp monotonically increasing
timestamp is assigned by system before transaction starts
timestamp for transaction determines serializability order thus if Ti < Tj then system must ensure that Ti appears before transaction Tj in schedule

we associate with each data item Q two timestamps
W-timestamp(Q) largest timestamp of any transaction that executed write (Q) successully
simliarly R-timestamp(Q) for read

Timestamp ordering Protocol
ensures conflict serializability
ensures freedom from deadlock since no transaction ever waits , its just killed
possibility of starvation
can generate schedules not recoverable but can be guaranteed in several ways

1) when transaction T issues write(X) check if
read_TS(X) > TS(T) or write < TS then some younger transaction has already read the data item X so abort
and rollback T
when transaction issues read(X)
if younger transaction has already written to the data item X abort and rollback T
ensures that any conflicting operations are executed in timestamp order

Strict Timestamp Ordering
ensures strict schedule along with conflict serializable
instead of writing when transaction has higher timestamp then on data item it waits until transaction that wrote value of data item commits or aborts

Thomas write Rule
does not ensures conflict serializability
1)IF read_ts(X) >TS(T) that is some younger transaction read the data item before T
abort and rollback T and reject operation
2)write_ts(x) > ts(T) donot execute the write but continue and will be detected in 1)

Pages

Wednesday, November 24, 2010

Serializability, Isolation Recoverability - Concurrency Control in Database

Friday, October 29, 2010

Locking Protocol for concurrency control