Tuesday, 6 September 2016

Construction of an FA from an RE

We can use Thompson's Construction to find out a Finite Automaton from a Regular Expression. We will reduce the regular expression into smallest regular expressions and converting these to NFA and finally to DFA.

Some basic RA expressions are the following −

Case 1 − For a regular expression ‘a’, we can construct the following FA −

Case 2 − For a regular expression ‘ab’, we can construct the following FA −

Case 3 − For a regular expression (a+b), we can construct the following FA −

Case 4 − For a regular expression (a+b)*, we can construct the following FA −

Method

Step 1	Construct an NFA with Null moves from the given regular expression.
Step 2	Remove Null transition from the NFA and convert it into its equivalent DFA.

Problem

Convert the following RA into its equivalent DFA − 1 (0 + 1)* 0

Solution

We will concatenate three expressions "1", "(0 + 1)*" and "0"

Now we will remove the ε transitions. After we remove the ε transitions from the NDFA, we get the following −

It is an NDFA corresponding to the RE: 1 (0 + 1)* 0. If you want to convert it into a DFA, simply apply the method of converting NDFA to DFA

Arden's Theorem

In order to find out a regular expression of a Finite Automaton, we use Arden’s Theorem along with the properties of regular expressions.

Statement −

Let P and Q be two regular expressions.

If P does not contain null string, then R = Q + RP has a unique solution that is R = QP^*

Proof −

R = Q + (Q + RP)P [After putting the value R = Q + RP]

= Q + QP + RPP

When we put the value of R recursively again and again, we get the following equation −

R = Q + Q^P + QP² + QP³…..

R = Q (ε + P + P² + P³ + …. )

R = QP^* [As P* represents (ε + P + P² + P³ + ….) ]

Hence, proved.

Assumptions for Applying Arden’s Theorem −

The transition diagram must not have NULL transitions
It must have only one initial state

Method

Step 1 − Create equations as the following form for all the states of the DFA having n states with initial state q₁.

q₁ = q₁R₁₁ + q₂R₂₁ + … + q_nR_n1 + ε

q₂ = q₁R₁₂ + q₂R₂₂ + … + q_nR_n2

…………………………

……………………………

q_n = q₁R_1n + q₂R_2n + … + q_nR_nn

R_ij represents the set of labels of edges from q_i to q_j, if no such edge exists, then R_ij = ∅

Step 2 − Solve these equations to get the equation for the final state in terms of R_ij

Problem

Construct a regular expression corresponding to the automata given below −

Solution

Here the initial state is q₂ and the final state is q₁.

The equations for the three states q₁, q_{2_{, and q₃ are as follows −}}

q₁ = q₁a + q₃a + ε (ε move is because q₁ is the initial state0

q₂ = q₁b + q₂b + q₃b

q₃ = q₂a

Now, we will solve these three equations −

q₂ = q₁b + q₂b + q₃b

= q₁b + q₂b + (q₂a)b (Substituting value of q₃)

= q₁b + q₂(b + ab)

= q₁b (b + ab)^* (Applying Arden’s Theorem)

q₁ = q₁a + q₃a + ε

= q₁a + q₂aa + ε (Substituting value of q₃)

= q₁a + q₁b(b + ab^*)aa + ε (Substituting value of q₂)

= q₁(a + b(b + ab)*aa) + ε

= ε (a+ b(b + ab)*aa)^*

= (a + b(b + ab)*aa)^*

Hence, the regular expression is (a + b(b + ab)*aa)^*.

Problem

Construct a regular expression corresponding to the automata given below −

Solution

Here the initial state is q₁ and the final state is q₂

Now we write down the equations −

q₁ = q₁0 + ε

q₂ = q₁1 + q₂0

q₃ = q₂1 + q₃0 + q₃1

Now, we will solve these three equations −

q₁ = ε0^* [As, εR = R]

So, q1 = 0^*

q2 = 0*1 + q20

So, q2 = 0*1(0)^* [By Arden’s theorem]

Hence, the regular expression is 0^*10^*.

Turing machine

Turing Machines

Turing Machine

The languages accepted by Turing machine are said to be recursively enumerable. A Turing Machine (TM) is a device with a finite amount of read only hard memory (states) and an unbounded amount of read/write tape memory.

Recursive languages are closed under complementation, union, intersection, concatenation and Kleene closure.
A Turing machine is said to be partially decide a problem, if the following two conditions are satisfied.
1. The problem is a decision problem.
2. The Turing machine accepts as given input if and only if the problem has an answer ‘yes’ for the input that is the Turing machine accepts the language L.
A Turing machine is said to be decide a problem, if it partially decides the problem and all its computations are halting computations.

Universal Turing Machine (UTM)

A UTM is a specified Turing machine that can simulate the behaviour of any TM.
A UTM is capable of running any algorithm.

For simulating even a simple behaviour, a Universal Turing Machine must have a large number of states. If we modify our basic model by increasing the number of read/write heads, the number of dimensions of input tape and adding a special purpose memory, then we can design a Universal Turing Machine.

Definition of Turing Machine

A Turing Machine (TM) is defined as 7-tuples.

TM = (Q, Σ, Γ, δ, q₀, b, F),

where, Q is a finite non-empty set of states, Σ is a non-empty set of input symbols (alphabets) which is a subset of Γ and b ∈ Σ, Γ is a finite non-empty set of tape symbols, δ is the transition function which maps (Q × Γ) to (Q × Γ × {L, R}), q₀ is the initial state and q₀ ∈ Q, b is the blank and b ∈ Γ, F is the set of final states and F ⊆ Q.

Transition Function of a Turing Machine

The transition function Q × Γ → Q × Γ × {L, R} states that if a Turing machine is in some state (from set Q), by taking a tape symbol (from set Γ), it goes to some next state (from set ï) by overwriting (replacing) the current symbol by another or same symbol and the read/write head moves one cell either left (L) or right (R) along the tape.

Transactions and Concurrency Control

Atomicity: Atomicity requires that each transaction is all or nothing. If one part of .the transaction fails, the entire transaction fails and the database state is left unchanged.

Consistency: If each transaction is consistent and the data base starts on as consistent, it ends up as consistent

Isolation: Execution of one transaction is isolated from that of another transactions. It ensures that concurrent execution of transaction results in a system state that would be obtained if transaction were executed serially, i.e., one after the other.

Durability: Durability means that once a transaction has been committed, it will remain even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently.

If a transaction commits, its effects persist.

A transaction starts with any SQL statement and ends with a COMMIT or ROLLBACK.
COMMIT statement makes changes permanent to the database.
ROLLBACK statement reverses changes.
A transaction includes one or more database access operations. These can include insertion, deletion, modification or retrieval operations.

Concurrency Control: Process of managing simultaneous execution of transactions in a shared database, is known as concurrency control. Basically, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

Need of Concurrency Control: Simultaneous execution of transactions over a shared database can create several data integrity and consistency problems.

Lost Update: This problem occurs when two transactions that access the same database items, have their operations interleaved in a way that makes the value of some database items incorrect.

Dirty Read Problems: This problem occurs when one transaction reads changes the value while the other reads the value before committing or rolling back by the first transaction.

Inconsistent Retrievals: This problem occurs when a transaction accesses data before and after another transaction(s) finish working with such data.

We need concurrence control, when

The amount of data is sufficiently great that at any time only fraction of the data can be in primary memory and rest should be swapped from secondary memory as needed.
Even if the entire database can be present in primary memory, there may be multiple processes.

Schedule: A schedule (or history) is a model to describe execution of transactions running in the system. When multiple transactions are executing concurrently in an interleaved fashion, then the order of execution of operations from the various transactions is known as a schedule or we can say that a schedule is a sequence of read, write, abort and commit operations from a set of transactions.

Classification of Schedules Based on Serializability: The schedules can be classified as Serial Schedule. A schedule in which the different transactions are not interleaved (i.e., transactions are executed from start to finish one-by-one).

Complete Schedule: A schedule that contains either a commit or an abort action for each transaction. Consequently, a complete schedule will not contain any active transaction at the end of the schedule.

Non-serial Schedules: Non-serial schedules are interleaved schedules and these schedules improve performance of system (i.e., throughput and response time). But concurrency or interleaving operations in schedules, might lead the database to an inconsistent state. We need to seek to identify schedules that are:

As fast as interleaved schedules.
As consistent as serial schedules.

Conflicting Operations: When two or more transactions in a non-serial schedule execute concurrently, then there may be some conflicting operations. Two operations are said to be conflicting, if they satisfy all of the following conditions.

The operations belong to different transactions.
Atleast one of the operations is a write operation.
The operations access the same object or item.

The following set of operations is conflicting

While the following sets of operations are not conflicting

Serializable Schedule: A schedule S of n transactions is serializable, if it is equivalent to some serial schedule of the same n transactions.

A non-serial schedule S is serializable is equivalent to saying that it is correct, because it is equivalent to a serial schedule. There are two types of serializable schedule

Conflict serializable schedule
View serializable schedule

Conflict Serializable Schedule: When the schedule (S) is conflict equivalent to some serial schedule (Sï), then that schedule is called as conflict serializable schedule. In such a case, we can reorder the non-conflicting operations in S until we form the equivalent serial schedule S’.

Conflict Equivalence: The schedules S₁ and S₂ are said to be conflict equivalent, if the following conditions are satisfied:

Both schedules S₁ and S₂ involve the same set of transactions (including ordering of operations within each transaction).
The order of each pair of conflicting actions in S₁ and S₂ are the same.

Testing for Conflict Serializability of a Schedule: There is a simple algorithm that can be used to test a schedule for conflict serializability. This algorithm constructs a precedence graph (or serialization graph), which is a directed graph. A precedence graph for a schedule S contains

A node for each committed transaction in S.
An edge from T_i to T_j, if an action of T_i precedes and conflicts with one of T_j’s operations.

A schedule S is conflict serializable if and only if its precedence graphs is acyclic.

Note: The above example of schedule is not conflict serializable schedule.

View Serializable Schedule: A schedule is view serializable, if it is view equivalent to some serial schedule. Conflict serializable ⇒ View serializable, but not vice-versa.

View Equivalence: Two schedules S₁ and S₂ are view equivalent,

T_i reads initial value of database object A in schedule S₁ then T_i also reads initial value of database object A in schedule S₂.
T_i reads value of A written by T_j in schedule S₁ then T_i also reads value of A written by T_j in schedule S₂.
T_i writes final value of A in schedule S_l then T_i also writes final value of A in S₂.

In the above example, both schedules S₁ and S₂ are view equivalent. So, they are view serializable schedules. But S₁ schedule is not conflict serializable schedule and S₂ is not conflict serializable schedule because cycle is not formed.

Recoverable Schedule: Once a transaction T is committed, it should never be necessary to rollback T. The schedules under this criteria are called recoverable schedules and those do not, are called non-recoverable. A schedule S is recoverable, if no transaction T in S commits until all transactions T’, that have written an item that T reads, have committed.

Cascadeless Schedule: These schedules avoid cascading rollbacks. Even, if a schedule is recoverable to recover correctly from failure of transaction T_i, we may have to rollback several transactions. This phenomenon in which a single transaction failure which leads to a series of transaction rollbacks, is called cascading rollbacks.

Cascading rollback is undesirable, since it leads to the undoing of a significant amount of work. It is desirable to restrict the schedules to those, where cascading rollback cannot occur. Such schedules are called cascadeless schedules.

Cascadeless Schedule

Strict Schedule: A schedule is strict,

If overriding of uncommitted data is not allowed.
Formally, if it satisfies the following conditions
1. T_j reads a data item X after T_i has terminated (aborted or committed).
2. T_j writes a data item X after T_i has terminated (aborted or committed).

Concurrency Control with Locking : If concurrency control with locking technique is used, then locks prevent multiple transactions from accessing the items concurrently.

Access on data only, if TA ‘has lock’ on data.
Transactions request and release locks.
Schedule allows or differs operations based on lock table.

Lock: A lock is a variable associated with a data item that describes the status of the item with respect to possible operations that can be applied to it. (e.g., read lock, write lock). Locks are used as means of synchronizing the access by concurrent transactions to the database items. There is one lock per database item.

Problems Arising with Locks: There are two problems which are possible when using locks to control the concurrency among transactions

Deadlock: Two or more competing transactions are waiting for each other to complete to obtain a missing lock.
Starvation: A transaction is continually denying the access to a given data item.

Purpose of Concurrency Control with Locking

To enforce isolation among conflicting transactions.
To preserve database consistency.
To resolve read-write, write-read and write-write conflicts.

Locking is an operation that secures the permission to read, and permission to write a data item.

Lock (X): Data item X is locked on behalf of the requesting transaction.

Unlocking is an operation which removes these permissions from the data item.

Unlock (X): Data item X is made available to all other transactions.

Types of Locks: The two types of Locks are given below

Binary Locks: A binary lock has two states: Locked/Unlocked. If a database item X is locked, then X cannot be accessed by any other database operation.

Shared/Exclusive Locks (Read/Write Locks): There are three states:- Read locked / Writelocked / Unlocked. Several transactions can access the same item X for reading (shared lock), however if any of the transactions want to write the item X, the transaction must acquire an exclusive lock on the item.

Guaranteeing Serializability by Two-phase Locking: A transaction is said to follow the two-phase locking protocol, if all locking operations precede the first unlock operation in the transaction.

Such a transaction can be divided into two phases:

Expanding or Growing Phase During which new locks on items can be acquired but none can be released.
Shrinking Phase During which existing locks can be released but no new locks can be acquired.

Variations of Two-phase Locking (2PL) Technique:

Basic 2PL (2 Phase Locking) Transaction locks data items incrementally. This may cause the problem of deadlock.
Conservative 2PL (Static 2PL) Prevents deadlock by locking all desired data items before transaction begins execution. However, it is difficult to use in practice because of the need to predeclare the read-set and write-set, which is not possible in most situations.
Strict 2PL A more stricter version of basic algorithm, where unlocking of write lock is performed after a transaction terminates (commits or aborts and rolled back). Hence, no other transaction can read or write an item that is written by transaction.
Rigorous 2PL Strict 2PL is not deadlock free. A more restrictive variation of strict 2PL, is rigorous 2PL, which also guarantees strict schedule. In this variation, a transaction does not release any of its locks (exclusive or shared) until after it commits or aborts.

Deadlock: A deadlock is a condition in which two or more transaction are waiting for each other deadlock (T₁ and T₂). An example is given below

Deadlock Prevention: A transaction locks all data items; it refers to before it begins execution. This way of locking prevents deadlock, since a transaction never waits for a data item. The conservative two-phase locking uses this approach.

Deadlock Detection and Resolution: In this approach, deadlocks are allowed to happen. The scheduler maintains a wait-for-graph for detecting cycle. If a cycle exists, then one transaction involved in the cycle is selected (victim) and rolled back.

A wait-for-graph is created using the lock table. As soon as a transaction is blocked, it is added to the graph.
When a chain like T_i waits for T_j, T_j waits for T_K and T_k waits for T_i or T_joccurs, then this creates a cycle. One of the transactions of the cycle is selected and rolled back.

Deadlock Avoidance: There are many variations of two-phase locking algorithm. Some avoid deadlock by not lefting the cycle to complete. This is as soon as the algorithm discovers that blocking a translation is likely to create a cycle, it rolls back the transaction.

Following schemes use transaction times-tamps for the sake of deadlock avoidance

Wait-die scheme (Non-preemptive): Older transaction may wait for younger one to release data item. Younger transactions never wait for older ones; they are rolled back instead. A transaction may die several times before acquiring the needed data item.

Wound-wait scheme (Preemptive): Older transaction bounds (forces rollback) of younger transaction instead of waiting for it. Younger transactions may wait for older ones. May be fewer rollbacks than wait-die scheme.

Starvation: Starvation occurs when a particular transaction consistently waits or restarted and never gets a chance to proceed further. In a deadlock resolution scheme, it is possible that the same transaction may consistently be selected as victim and rolled back.

Time-stamp Based Concurrency Control Algorithm: This is a different approach that guarantees serializability involves using transaction time-stamps to order transaction execution for an equivalent serial schedule.

Time-stamp: A time-stamp is a unique identifier created by the DBMS to identify a transaction.

Time-stamp is a monotonically increasing variable (integer) indicating the age of an operation or a transaction.
A larger time-stamp value indicates a more recent event or operation.

Starvation versus Deadlock

Pumping Lemma

Pumping Lemma for regular languages

Suppose that a language L is regular. Then there is a FA that accepts L.
Let n be the number of states of that FA. Then for any string x in L with |x| ≥ n, there are strings u, v and w which satisfy the following:
- x = uvw
- |uv| ≤ n
- |v| > 0 is same as v ≠ ε
- For every integer m ≥ 0, uv^mw ∈ L.
If L is regular then for every x such that |x| ≥ n then there exists uvw such that x=uvw, v ≠ ε, |uv| ≤ n, and for which uvⁱw is in L for every i.

Pumping Lemma gives a necessity for regular languages.

Pumping Lemma is not a sufficiency, that is, even if there is an integer n that satisfies the conditions of Pumping Lemma, the language is not necessarily regular.

Pumping Lemma can not be used to prove the regularity of a language.

It can only show that a language is non-regular.

Example: L = a^kb^kis non-regular, where k is a natural number.

Suppose that L is regular and let n be the number of states of an FA that accepts L. Consider a string x = aⁿbⁿ for that n.
Then there must be strings u, v, and w such that
x = uvw,
|uv| ≤ n
|v| > 0, and
for every m ≥ 0, uv^mw ∈ L.
Since |v| > 0, v has at least one symbol.
Also since |uv| ≤ n, v = a^p, for some p > 0,
Let us now consider the string uv^mw for m = 2.
Then uv²w = a^n-pa^2pbⁿ = a^n+pbⁿ. Since p > 0 , n + p ≠ n .
Hence a^n+pbⁿ can not be in the language L represented by a^kb^k.
This violates the condition that for every m ≥ 0, uv^mw ∈ L.
Hence L is not a regular language.

Pumping Lemma for CFL’s

Let L be a CFL. Then there exists a constant N such that if z ∈L s.t. |z|≥N, then we can write z=uvwxy,
|vwx| ≤ N
vx ≠ ε
For all k ≥ 0 : uv^kwx^ky ∈ L

Impotant properties in Theory of computation

Regular and Context Free Languages

Every regular language is also CFL, but every CFL need not be regular.
Every DCFL is also CFL, but every DCFL need not be regular.
Every regular language is also DFCL, but every DCFL need not regular.

Regular Languages

{w | w ∈ {a, b }^* }
{aw | w ∈ {a, b }^* }
{bw | w ∈ {a, b }^* }
{wa | w ∈ {a, b }^* }
{awb | w ∈ {a, b }^* }
{w₁abw₂ | w₁,w₂ ∈ {a, b }^* }
{ a^mbⁿ | m,n>0 }
{ a^mbⁿc^k | m,n,k>=0}
{a²ⁿ | n>=0}

Non-Regular Languages

{ aⁿbⁿ | n is a positive integer }
{ ww | w ∈ {a, b }^* }
{w | w has an equal number of a’s and b’s}
{w| w is a palindrome of a’s and b’s}
{aⁿ | n is prime}

Deterministic CFLs (DCFLs)

{ aⁿbⁿ | n is a positive integer }
{w | w has an equal number of a’s and b’s}
{ a^mbⁿ | m < n}
{ a^mbⁿ | m = 2n}
{ a^mbⁿc^k | if m is even, then n=k}
{wCw^R | w ∈ {a, b }^* , C is a special symbol and w^R is the reverse of string w}

CFL’s (NCFL’s)

{ww^R | w ∈ {a, b }^* , and w^R is the reverse of string w}
{ a^mbⁿc^k | m=n or n=k}
{ a^mbⁿc^k | if m=n, then n=k}
All regulars
All DCFLs

Non-CFL’s

{ ww | w ∈ {a, b }^*}
{ aⁿbⁿcⁿ | n>=0}
{aⁿ | n is prime}
{ a^mbⁿc^k | m<n<k}

Important Properties:

Let L be a Context Free Languages, and R be a regular language. Then
- L ∩ R = always CFL and need not be regular
- L ∪ R = always CFL and need not be regular
- L – R = always CFL and need not be regular
- R – L = Always CSL but need not be CFL
Let D be a DCFL, and R be a regular language. Then
- D ∩ R = always DCFL and need not be regular
- D ∪ R = always DCFL and need not be regular
- D – R = always DCFL and need not be regular
- R – D = Always DCFL but need not be regular

undecidable problems in Theory of Computation

Undecidability

Decidable Problem

If there is a Turing machine that decides the problem, called as Decidable problem.
A decision problem that can be solved by an algorithm that halts on all inputs in a finite number of steps.
A problem is decidable, if there is an algorithm that can answer either yes or no.
A language for which membership can be decided by an algorithm that halts on all inputs in a finite number of steps.
Decidable problem is also called as totally decidable problem, algorithmically solvable, recursively solvable.

Undecidable Problem

A problem that cannot be solved for all cases by any algorithm whatsoever.
Equivalent Language cannot be recognized by a Turing machine that halts for all inputs.

The following problems are undecidable problems:

Halting Problem: A halting problem is undecidable problem. There is no general method or algorithm which can solve the halting problem for all possible inputs.
Emptyness Problem: Whether a given TM accepts Empty?
Finiteness Problem: Whether a given TM accepts Finite?
Equivalence Problem: Whether Given two TM’s produce same language?. Is L(TM1) = L(TM2) ?
Is L(TM1) ⊆ L(TM2) ? (Subset Problem)
Is L(TM1) Ո L(TM2) = CFL?
Is L(TM1) = Σ* ? (Totality Problem)
Is the complement of L(G1) context-free ?

Undecidable problems are two types: Partially decidable (Semi-decidable) and Totally not decidable.

Semi decidable: A problem is semi-decidable if there is an algorithm that says yes. if the answer is yes, however it may loop infinitely if the answer is no.
Totally not decidable: A problem is not decidable if we can prove that there is no algorithm that will deliver an answer.