Posts Tagged ‘database’

Transaction Isolation levels

Tuesday, June 1st, 2010

Isolation levels come into play when a transaction requires an exclusive right to use a resource thereby isolating it from other transactions. This isolation is provided by obtaining locks on the resource. What locks need to be set for the transaction is determined by SQL Server referring to the Isolation Level that has been set. Lower Isolation Levels allow multiple users to access the resource simultaneously thereby increasing concurrency but also introducing the possibility of concurrency related problems. Higher Isolation Levels eliminate concurrency related problems and increase the data accuracy but they may introduce blocking and in some cases result in deadlocks.


The isolation levels defined by the ANSI/ISO SQL standard are:

TRANSACTION ISOLATION LEVEL READ UNCOMMITTED:

This level of isolation allows users to read data that has not been committed. This increases throughput since the user does not have to wait for the other transaction to complete its execution before it can read the data. But if the user reads data that has been modified by another transaction, which happens to fail at some point of time and roll-back, the data read initially by the user, is wrong or inconsistent. This is called dirty reads. To illustrate with an example, 

Create a new database.

Create table Employee with the following fields: Emp_Name, Salary

Open two query windows in SQL Server and execute the following lines of code.

First query window:

begintran

UPDATE [Employee]
SET [Emp_Name]='ABC',
    [Salary]=1234

Second query window:

begintran

SETTRANSACTIONISOLATIONLEVELREAD UNCOMMITTED
SELECT*FROM [Employee]

Execute the query in the first window. After that, execute the second query in a different query window. Note that the transaction in the first window has not committed or ended.

The result of the select statement gives,

Emp_Name | Salary
ABC| 1234

Now go back to the first query window and execute the command,

ROLLBACKTRAN

The value in the [Employee] table goes back to the original value, say ‘DEF’.

So the second user ended up with the wrong data or was subjected to a dirty read. This happens because the level of isolation allows you to read data that is not yet committed.

TRANSACTION ISOLATION LEVEL READ COMMITTED:

It offers a higher level of isolation as it prevents users from reading data that has not been commited yet.
Execute the same steps as above, after changing the level of isolation as‘READ COMMITTED’ in the second query window.

First query window:

begintran

UPDATE [Employee]
SET [Emp_Name]= 'ABC',
    [Salary]=1234

Second query window:

begintran

SETTRANSACTIONISOLATIONLEVELREAD COMMITTED
SELECT*FROM [Employee]

Execute the query in the first window.Now on executing the second query, we do not get any result and the status is shown as ‘executing’.

SQL query
Now go back to the first query window and commit the transaction.

committran

As soon as the first transaction commits, the result is displayed in the second query window. As a result, the end-user always gets data that is consistent.In this isolation level, read locks are acquired on selected data but they are released immediately whereas write locks are released only at the end of the transaction. This level of isolationhas an adverse effect on performance as the second process has to wait for the first transaction to commit.

TRANSACTION ISOLATION LEVEL REPEATABLE READ:

This level of isolation implies that once a transaction reads existing data, read data would never be changed in that transaction irrespective of how many times the same select statement is issued.

Consider the case where two users are accessing data from the same table. User1 beginstransaction and executes a select query on the table. At the same time user2 performs an update on the same table. Read Committed isolation level does not prevent this from happening. Now when user1 executesthe same select query again, within the same transaction he gets a set of values different from the first select statement.This is a data consistency problem related to Read Committed isolation Level called Non Repeatable Read.

This can be avoided in Repeatable Read Isolation level by not releasing Shared Locks (acquired to read data)till the end of transaction. Therefore in the above mentioned scenario, user2 will have to wait till shared locks on the data are released, thus allowing the update only after the first transaction completes execution.

In Repeatable Read Isolation level a select statement also would hold locks till the end of transaction. Again as the level of isolation increases, understand how Concurrency is gradually coming down.

TRANSACTION ISOLATION LEVEL SERIALIZABLE:

This is the highest level of isolation, where each statement is executed serially in an isolated manner. i.e, each statement is executed one after the other. Almost all database anomalies including phantom reads are eliminated in this level of isolation.

Consider the case where two users are accessing data from the same table. User1 starts his transaction and executes a select query on the table. At the same time user2 performs an insert on the same table. TheRepeatable Read Isolation level prevents updates on data holding a shared lock, but it allows new rows to be inserted.Now when user1 executes the same select query again, within the same transaction, it returns an extra row which was not expected by the first user. This is the problem of phantom reads.

The serializable isolation level does not allow updates on data till the current transaction completes execution. It also prevents new rows to be inserted to the table until the transaction completes, thereby returning the same result set no matter how many times the same select statement is executed within a single transaction. But this level of isolation has a serious effect on concurrency thereby reducing performance. So it should be used only in circumstances which require a high level of data consistency.

The above four isolation levels are said to provide pessimistic control, because they acquire locks on data foreseeing the problems related to concurrency. There are two more levels of isolation which are new to SqlServer that provide optimistic control.UnderOptimistic Control, SQL Server does not hold locks but once read checks for inconsistency for the next read.

The two new levels of isolation are:
SNAPSHOT ISOLATION LEVEL:

This Isolation Level prevents all concurrency related problems just like Serializable Isolation Level; in addition to that it allows multiple updates for the same resource by different transactions at the same time. It maintains consistent versions of records in a version store.

When a user performs a select statement on a set of records it obtains a versioned copy from the version store. Every time a select statement is issued, the last consistent copy of the record, when the transaction started, is returned and the same version is read during the entire transaction, irrespective of the data being modified by other transactions. Now a second user also obtains a versioned copy and performs some updates on the data and commits the transaction. Now if the first user tries to perform an update on the data, the SqlServer detects this conflict and aborts the transaction.If another transaction starts changing the same record, another version will be stored and so on; resulting in a longer linked list in the version store. Maintaining and traversing through longer lists will impact the performance.  SQL Server keeps versions of the record until there is no reference for it.

READ COMMITTED SNAPSHOT ISOLATION LEVEL:
The only difference between ReadCommitted and Read Committed Snapshot isolation levels is that, Read Committed Snapshot is Optimistic whereas Read Committed is Pessimistic.

Read Committed Snapshot Isolation Level is set at the database level. It differs from Snapshot in two ways; Unlike Snapshot, it always returns the latest consistent version and there is no conflict detection. Since this level does not require maintaining previous versions, there will be no impact for performance like Snapshot but all the concurrency related problems except dirty reads can happen.


-DB Team                                                      

    Performance boosting for MySQL

    Wednesday, March 17th, 2010

    Suppose, a web page displays the paginated result containing links to the pages that show other sections of a search result. There are MySQL language specific constructs that can be useful in situations when you want to restrict the number of rows that a query returns, but also determine the number of rows in the full result set without running the query again. In the pagination scenario, we can easily determine how many other pages are needed for the rest of the result.

    Technically, a MySQL SELECT statement may include a LIMIT clause to restrict the number of rows the server returns to the client. In the cases where it is desirable to know how many rows the statement would have returned without the LIMIT, but without running the statement again, include a SQL_CALC_FOUND_ROWS option in the SELECT statement, and then invoke FOUND_ROWS() afterward:

    mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name

     -> WHERE id > 100 LIMIT 10;

    mysql> SELECT FOUND_ROWS();

    The second SELECT returns a number indicating how many rows the first SELECT would have returned had it been written without the LIMIT clause.

    In the absence of the SQL_CALC_FOUND_ROWS option in the most recent SELECT statement, FOUND_ROWS() returns the number of rows in the result set returned by that statement.

    If you are using SELECT SQL_CALC_FOUND_ROWS, MySQL must calculate how many rows are in the full result set. However, this is faster than running the query again without LIMIT, because the result set need not be sent to the client.

    To embrace this strategy which mitigates the extravagance of a second COUNT query, another option is to be considered – SQL_NO_CACHE

    MySQL Query caching is like a key-value cache with the key being the query itself and the resultset being the value. Once we turn on the cache, the pagination is faster with the second query using COUNT()!

    When using SQL_CALC_FOUND_ROWS the application has to calculate the found rows every single time we request the page, because the query changes, while the COUNT() – Query always remains the same, meaning that its result comes from the query cache from the second time on.

    So if the query cache is on, it is better to use COUNT() to leverage the performance gain provided by the query caching mechanism. It would be better to use FOUND_ROWS()in all other cases.

    By DB Team, Software Associates

      Deadlock Scenario in SQL Server

      Monday, March 8th, 2010

       

      What is Deadlock

      A deadlock is a situation wherein two or more competing actions are waiting for the other to finish, and thus neither ever does. In Database systems, deadlock refers to a situation in which two or more processes are waiting for the other one to release the resource and neither of them finishes and concludes with a deadlock situation. Deadlocking represents a failure of processes to work and play well together. Deadlocks are the dark side of concurrency, that is, they occur when some combination of locking, coding, or transaction management goes wrong.

      We have mainly the following conditions that can result in a Deadlock scenario, they are as follows

      1.       Mutual exclusion condition: a resource that cannot be used by more than one process at a time

      2.       Hold and wait condition: processes already holding resources may request for new resources

      3.       No preemption condition: No resource can be forcibly removed from a process holding it, resources can be released only by the explicit action of the process

      4.       Circular wait condition: two or more processes form a circular chain where each process waits for a resource that the next process in the chain holds

      In SQL Server, Deadlocking occurs when two user processes have locks on separate objects and each process is trying to acquire a lock on the object that the other process has. When this happens, SQL Server identifies the problem and ends the deadlock by automatically choosing one process and aborting it, allowing the other process to continue. The aborted transaction is rolled back and an error message is sent to the user who instantiated the aborted process. Generally, the transaction that requires the least amount of overhead to rollback is the transaction that is aborted.

      How to Track Deadlock

      We have different methods to track down Deadlocks. Let’s have a glance at the following methods

      ·         In SQL Server 2000 we can set Trace Flag 1204 to capture the processes involved in deadlocks.

      ·         In SQL Server 2005 we have SQL Server Profiler. Using this approach we will get detailed information on deadlocks.

       

      Best practices to get rid of Deadlocks are as follows

      • Implement a well normalized Database design
      • Have the application access server objects in the same order each time.
      • During transactions, don't allow any user input. Collect it before the transaction begins.
      • Try to avoid the usage of cursors whenever possible.
      • Keep transactions as short as possible. Reduce the number of round trips between your application and SQL Server by using stored procedures or keeping transactions with a single batch.
      •  Try to implement database level caching
      •  Reduce lock time. Try to develop your application so that it grabs locks at the latest possible time, and then releases them at the earliest.
      •  If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK.
      •  Consider using the NOLOCK hint to prevent locking if the data being locked is not modified often.
      •  If appropriate, use the lowest level of isolation as possible for the user connection running the transaction. 
      • Consider using bound connections.

       By DB Team,Software Associates

        Caching with dependencies

        Monday, February 15th, 2010

         

        This blog is continuation of our blog on Data caching

        Caching with Dependencies

        As time passes, the information in your data source may change. If your code uses caching, you may be unaware of the changes and might continue using out-of date information from the cache. To overcome this problem cache dependencies are introduced. They allow a cached item be dependent on another resource so that when that resource changes the cached item is removed automatically.

        Three types of dependencies mainly play in caching.

        • Dependencies on other cached items
        • Dependencies in files or folders
        • Dependencies on a Database query

        File Dependencies

        To use cache dependency, you need to create a cache dependency object, then need to supply that dependency object, while add a new cached item. For example when we create a cache dependency object that depends on an XML file, when the file is changed, the cache dependency will be invalidated and the dependent cached item will be evicted from the cache immediately.

        We can also specify a directory that needs to be monitored for changes, or we can supply an array of string that represents multiple files and directories.

        Cache Item dependencies

        Here we can create a cached item that’s dependent on another item in the cache. Now, when the first cached item changes or is removed from the cache, the second will automatically be dropped from the cache as well.

        SQL Server Cache Dependencies

        A more complex kind of cache dependency is the SQL Server cache dependency. SQL cache dependencies provide the ability to automatically invalidate a cached data object, when related data is modified in the database.

        SQL Cache dependencies still entails more complexity than just using a time based expiration policy. It’s more difficult to set up, and it can be more of a headache to maintain. If it’s acceptable for certain information to be used without reflecting all the most recent changes (and developers often overestimate the importance of up-to-the millisecond live information), you may not need at all.

        SQL Server 2005 and 2008 have built in notification infrastructure called “Service Broker”. This manages queues, which are database objects that have the same standing as table, stored procedures, or views.

        When using notification with SQL Server, we get following benefits.

        • Notification is much finer grained- Instead of invalidating our cached object when the table changes, SQL Server invalidate the cached object only when a row that affects codes query is inserted, updated or deleted.
        • It’s more intelligent-A notification message is sent the first time the data is changed but not if the data is changed again (unless we register for notification messages by adding an item back to the cache).
        • No special steps are required to set up notification-We just need to call a start service once in our code to enable. There is no other complex coding steps involves.

        There are the most important rules to manage notifications. If you break one of these rules, you won’t receive an error. However, the notification message will be sent as soon as you register, and cached item will be invalidated immediately.

          Output Caching

          Monday, February 15th, 2010

           

          I recommend before you start reading this blog please go through our blog on Web Caching

           

          Output Caching

          With output caching, the final rendered HTML of the page is cached. When the same page is requested again, the control systems are not created, the page life cycle doesn’t start, and none of the code executes. Instead the cached HTML is served. Clearly output caching gets the theoretical maximum performance increase, because all the overhead of the code is side stepped.

          Client side Caching

          One of the cache options is to store the page exclusively on the client side. In this case the browser stores a copy and will automatically use this page if the client browses back to the page or retypes the page’s URL. However if the user clicks the refresh button, the cached copy will be abandoned and the page will be re-requested from the server, which will run the appropriate page code once again.

          Client side caching is less common than server-side caching because the page is still re-created for every separate user and would not reduce code execution or database access nearly as dramatically as server-side caching (which shares a single copy among all users). However client side caching can be a useful technique if the cached page uses some sort of personalized data. Even though each user is in a separate session, the page will be created only once and reused for all clients, ensuring that most will receive the wrong greeting. Instead it can either use fragment caching to cache the generic portion of the page or use client side caching to store a user specific version on each client’s computer.

          Multiple version Caching

          One of the main considerations in caching is deciding when a page can be reused and when information must be accurate up to the last second. Developers, with their love of instant gratification (and lack of patience) generally tend to overemphasize the importance of real time information. We can usually use caching to efficiently reuse slightly stale data without a problem and with a considerable performance improvement.

          Sometimes information needs to be dynamic. eg., if the page uses information from the current user’s session to tailor the user interface. In this case, full page caching just isn’t appropriate because the same page cannot be reused for requests from different users. Likewise is the case when the page is receiving information from another page through the URL query string. Again the page becomes too dynamic to cache.

          In these cases of consideration, the cache server creates different versions of the same page based on different criteria (like URL query string). Now when we request a page with additional information, the cache engine examines the information. If the parameter matches a previous request and a cached copy of that page exists, it will be reused, else a new copy of the page will created and cached separately.

          Output caching works well if the pages depend only on server-side data and data in URL query string. However this does not work if the page output depends on user specific information such as session data or cookies as there is no way to vary caching based on these criteria. Output caching also won’t work with dynamic pages that change their content in response to control events. In such scenarios data caching can perform more effectively to cache specific information.

            Caching – A performance booster

            Tuesday, June 16th, 2009

            For years, technology products have used caching to bring high performance. Simply put a cache becomes most of the time, the final solution to increase the performance of a data request. Page and DNS Caching are now critical for all operations in the wired space. Recently caching became really popular for Database access management.

            Caching provides two key benefits for Internet users

            • Reduce latency – Though the requests satisfy from the cache, which is closer to the client, caching reduces the overall network latency and returns much faster pages to the browsers.
            • Reduce network traffic – Caching, for some extent reduces the network bandwidth requirement as much as 35 percentage. Because most of the requests gets fulfilled locally and can avoid the transfer over expensive WAN. This saves money if the host charges for the traffic.

            Web Caching

            A web Cache lies between one or more web servers and clients, which analyzes the requests comes, save copies (HTML, images and files) and waits for another request for the same and responds with the saved copies instead of asking for the original to server it again. While Caching used to store Database information, retrieving the information from database takes time. With careful optimization we can reduce the time and lessen the burden imposed on database to a certain extent, but sure we can never eliminate it. In the case of page caching, cache server monitors which page object are frequently requested and store those objects locally. When these objects are later requested, the cache delivers it from local storage rather than forwarding the requests on to the original server.

            Web Caching can mainly divided into

            To know more about Web Caching and Data Caching click the above links. You can directly write to us for any queries.

              Database skills update

              Monday, November 17th, 2008

               

              Figure 1-1 Database Management life cycle

              Organizations today must deliver tangible business value from their database investments. Adding to this challenge is the constant change inherent in complex, often diverse environments. From application and platform changes to increased user activity, such changes place added stress on both the database environment and the professionals charged with ensuring its peak performance.

              Software Associates provides the breadth and depth of technical expertise required to meet the challenges presented by an ever-changing IT landscape. Software Associates leverages its prominent experts to provide best practices and proven techniques to help organizations maximize efficiency throughout the database development lifecycle.

              Our approach to database management is not limited to database design alone. Along with it we offer further database management services. Our database management solutions life cycle is shown in figure 1-1.


              We follow 80 to 85% of the world's best practices in database management. We provide database management solutions in different scales such as:

              • Database Standardization
              • Database Normalization
              • Database Customization
              • Database Migration
              • Database Integration
              • Backup & Restoration
              • Repair of corrupted database
              • Total Database Management

              Our team has proven database skills in:

              • Design normalized database as per project requirement.
              • Stored Procedures.
              • User defined functions.
              • Triggers.
              • Complex queries, especially "SELECT" queries using several tables; and JOINS – INNER,
              • LEFT, UNION etc
              • CURSORS, WHILE Loop etc for data iteration
              • Temporary tables, @Table variables for temporarily storing table data for required contexts.

              Our experience in database development has led us to develop a series of "best practices." Our goal is to create a set of guidelines that help organizations create usable, sustainable, and sharable databases. We follow the best practices in database management.

              1) Anticipating the possibility of changes that may occur in the database we make our DB design strong enough to meet future requirements.

              2) Implementing the 5 Ws of database design: who, what, when, where, why. We've got a logging and audit system to cover where and why and we do have a system that covers who, what and when too. And it has come in handy a few times. Every table in our database has a few special fields:

              • Creation Time
              • Created By
              • Last Modified Time
              • Last Modified By

              These four fields are on every table and our application logic updates them appropriately.

              3) We prefer to use stored procedures instead of 'inline queries' because of its features like:

              • Precompiled execution
              • Reduced client/server traffic
              • Efficient reuse of code and programming abstraction
              • Enhanced security controls

              4) Strictly follow T-SQL standards.

              5) Name/ Map entity tables considering OOP.

              6) Do not use * in select statement.

              7) Do not prefix "sp" in stored procedures.

              8) Avoid Cursor, use forward; cursor only if necessary.

              9) Avoid global temporary tables.

              10) Avoid SQL Injection

              11) Design Relational database in hierarchical structure.

              12) Use GUID in tables for the distributed databases.

              13) Provide encryption to ensure database security.

              14) Provide encrypted fields in tables to ensure security of user credentials, if required.

              How we deal with different Database Management Systems:

              MS Access

              SQL query

              For most of the projects we had used the in' line queries' but now we implement stored procedures as it is now available on MS Access

              Parameters

              We have strictly used parameters instead of concatenating the values to generating DB queries. This, we feel, is the best approach and practice, although we find most of the developers still using the concatenating method.

              This approach has helped us to reduce the development time to handle the sql injection issues, which is the main head- ache for Access development.

              Table management

              Despite all the management facilities available to manage Table and data in MS Access, most of the time we used Queries to create tables and in bulk data management scenarios. Through this approach, we could easily manage database which is working online, without interrupting the System process.

              MS SQL

              SQL 2005

              We have implemented most of the features provided by MS SQL 2005

              • Stored Procedure
              • Functions
              • Views

              We have also followed the MS recommended T-SQL standards.

              DB Design

              Rather than following pre-defined patterns we have focused on the customer's requirement and value thus reducing development time.

              Query building

              We have reduced the use of methods and commands (like CURSOR, Temporary Tables), which use more response time and memory and affect the whole system process.

              Role based management

              The plethora of role based data management features provided by the SQL server have been effectively used while connecting to online database. This helped implement effective security in db management.

              MySQL

              We use PhpMyAdmin for database development & management.
              Future time expansion is considered during database design phase.
              Direct modification of the live database is not permitted in most cases. Modifications are done through scripts.
              Direct export of database is not allowed. Export is possible only by using Scripts.

              Data management with MS Office

              Most of our work includes the conversion of the manual process to online. So at the database design time we have to consider the customer's existing data, and start with those.

              At these times we found that most of our clients did their daily work with the help of Excel spread sheet for data storage and management. So to start with these data, we had to migrate the data to our New Application's database first.

              We called this process "bulk data uploading", which we successfully achieved with the help of OLEDB management classes. To achieve this, we developed an application for gathering the data from Excel and uploading it to the Database.

              Then we added a feature to the application for a client, by which they can download whole database table data in an excel sheet. Clients can also upload bulk data to the Application system through excel sheet entry, where we provided proper v alidation and error messages for each column records.