I've read about this topic multiple times but often forget the details, as it is not often that I need to work on it in my applications. So here I am, taking quick notes for a faster recap later.
The first thing to know is what issues can arise from multiple concurrent transactions if no preventive measures are taken.
lost update : I have found different meanings for this term. The Hibernate book describes it as follows: one transaction commits its modifications and another one rolls back, which undoes the updates done by the other transaction.
However, in many other places lost-update means what is described below as the second-lost-update problem. And the ANSI SQL-92 standard does not seem to talk about it at all.
dirty read : One transaction fetches a record and modifies it. Another transaction fetches the same record and sees the modifications made by the first transaction even though they are not yet committed.
unrepeatable read : One transaction reads a record, another transaction commits some modifications to the same record, and then the first transaction reads the same record again and finds it different from what it read previously.
One special case of unrepeatable read is the second-lost-update problem. Let's say one transaction reads a record and modifies it, and another transaction reads the same record and modifies it too. The first one commits its modifications, then the second one commits, and the modifications made by the first transaction are lost.
phantom read : One transaction reads a list of records by some filter criteria. Another transaction starts, inserts some records that fulfill that filter criteria, and commits. The first transaction reads the list again with the same filter criteria but ends up getting a different set of records this time.
Now, to overcome the above issues, the ANSI standard defines the following transaction isolation levels.
read uncommitted : this permits all three: dirty reads, unrepeatable reads and phantom reads.
read committed : this does not permit dirty reads but allows unrepeatable as well as phantom reads.
repeatable read : this does not permit dirty or unrepeatable reads but allows phantom reads.
serializable : this is the strictest transaction isolation level; it effectively does not allow concurrent transactions and permits none of dirty reads, unrepeatable reads or phantom reads. It is usually not used in typical applications because it is too restrictive and performs badly.
The exact implementation of the above isolation levels varies significantly among vendors. One has to consult the docs of the particular DB to understand the impact of each isolation level on performance and scalability.
In Hibernate, by default, every JDBC connection to a database is in the default isolation level of the DBMS, usually read-committed or repeatable-read. You can change this by explicitly setting the hibernate.connection.isolation property; Hibernate will then set the given isolation level on each new JDBC connection (note that it will not change the default isolation level of your DB).
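As a minimal sketch, the property can be set programmatically like this (it could equally go into hibernate.properties or hibernate.cfg.xml; this assumes a hibernate.cfg.xml is on the classpath). The value is one of the integer constants from java.sql.Connection:

```java
import java.sql.Connection;

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class IsolationSetup {
    public static void main(String[] args) {
        // 1 = READ_UNCOMMITTED, 2 = READ_COMMITTED,
        // 4 = REPEATABLE_READ,  8 = SERIALIZABLE
        Configuration cfg = new Configuration()
                .configure() // reads hibernate.cfg.xml from the classpath
                .setProperty("hibernate.connection.isolation",
                        String.valueOf(Connection.TRANSACTION_READ_COMMITTED));

        // Hibernate now sets read-committed on every new JDBC connection;
        // the database's own default isolation level is left untouched.
        SessionFactory sessionFactory = cfg.buildSessionFactory();
    }
}
```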
Optimistic concurrency control:
An optimistic approach assumes that everything will be OK and detects conflicts only at the end, when data is written/flushed to the DB.
Multi-user applications usually default to optimistic concurrency control and database connections with a read-committed isolation level.
Let's say the isolation level is set to read-committed. Transactions A and B start at the same time and both read and modify the same record (they cannot see each other's changes, as dirty reads are not permitted at the read-committed isolation level). Transaction A commits and then B does. Then one of the following three things can happen.
last commit wins : both transactions commit successfully, B overrides any modifications done by A, and no error is raised.
first commit wins : when the second transaction (B) is committed, the conflict is detected and an error is raised.
merge conflicting updates : the conflicts are detected and an interactive dialog helps the user resolve them.
With Hibernate you get last-commit-wins by default. You can get the first-commit-wins strategy by enabling optimistic concurrency control, which requires versioning to be enabled for the entities.
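As a sketch, versioning is enabled by adding a version property to the entity (shown here with JPA annotations; the Item class and its fields are made up for illustration). In hbm files the same is done with a version element.

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Item {

    @Id
    private Long id;

    // Hibernate increments this on every update and adds
    // "where VERSION = ?" to the UPDATE statement; if another
    // transaction committed in between, zero rows match and a
    // StaleObjectStateException is thrown: first-commit-wins.
    @Version
    private int version;

    private String description;

    // getters and setters omitted for brevity
}
```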
In the end, if you need more fine-grained control over locking, you can use the variety of "SELECT ... FOR UPDATE ..." statements via LockMode.UPGRADE et al. This is called explicit pessimistic locking.
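A rough sketch of what that looks like with the classic Hibernate API, reusing the hypothetical Item from the sketch above:

```java
import org.hibernate.LockMode;
import org.hibernate.Session;
import org.hibernate.Transaction;

public class PessimisticLockExample {

    public static void rename(Session session, Long itemId) {
        Transaction tx = session.beginTransaction();

        // Issues "select ... for update" (on databases that support it),
        // so no other transaction can lock this row until we commit.
        Item item = (Item) session.get(Item.class, itemId, LockMode.UPGRADE);
        item.setDescription("updated under an explicit row lock");

        tx.commit();
    }
}
```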
Reference:
Chapter 9, "Transactions and Concurrency", in the book "Java Persistence with Hibernate"
ANSI SQL-92 Standard Spec
Thursday, September 10, 2009
Introduction to the term "ORM"
In enterprise applications we have many objects that we want to be persistent, i.e. we want their existence to survive beyond system restarts. For that to happen, one should be able to store an object's state in some persistent storage and reconstruct the object from that state whenever needed. One object is the simple case; very often we need to store a whole object graph that we want to recover later.
Relational databases have long been used to store data and the technology is mature, hence they are the natural choice for storing object state.
Relational databases expose a SQL-based API for interaction. In Java, one can use JDBC to create SQL queries, bind arguments to parameters, initiate execution of the query, scroll through the query results, retrieve values from the result-set and so on. These are very low-level data access tasks, and as an application developer one is more interested in solving the business problem than in data access. So came the concept of a persistence layer, which should abstract away all the details of data access and provide an API that lets the business layer deal just with objects and not with low-level artifacts such as result-sets.
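To make the low-level flavour concrete, here is a rough sketch of plain JDBC at work (the connection URL and the users table are made up for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PlainJdbcExample {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:h2:mem:demo", "sa", "");
             PreparedStatement ps = con.prepareStatement(
                     "select id, username from users where id = ?")) {

            ps.setLong(1, 42L);                        // bind the parameter
            try (ResultSet rs = ps.executeQuery()) {   // execute the query
                while (rs.next()) {                    // scroll the results
                    long id = rs.getLong("id");        // pick values out by hand
                    String username = rs.getString("username");
                    System.out.println(id + ": " + username);
                }
            }
        }
    }
}
```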
Let us look at some of the problems that the persistence layer faces due to the paradigm mismatch between the object view and the relational view of the world.
The Problem of Granularity:
Suppose one object contains another object of a different type (one and only one instance, and no other object contains objects of this type). Should we create two different tables for the two types, or just one table with columns covering the fields of both?
The Problem of Subtypes:
How do we model inheritance in a relational database?
The Problem of Associations:
How do we model one-to-many and many-to-many relationships?
The Problem of Object Graph Navigation:
Abstracting out the details of data access so that the business layer can navigate object graphs in a natural object-oriented way, like aUser.getBillingDetails().getAccountNumber(), rather than using SQL join queries.
If the application is simple enough, we can model our system relationally or hand-code the persistence layer with SQL/JDBC. Other options are serialization, EJB entity beans, or using object-oriented database systems rather than relational databases.
ORM (Object-Relational Mapping) is an attempt to create a generic piece of software that can take care of all the persistence-related issues mentioned above.
It is the automated (and transparent) persistence of the objects in an application to the tables in a relational database, using metadata that describes the mapping between the objects and the database tables. An ORM is supposed to provide at least the following things (a short sketch of such an API follows the list):
- An API for performing basic CRUD operations on objects of persistent classes.
- A query language based on the persistent classes and their properties.
- Facilities for specifying mapping metadata.
- Various optimization facilities such as lazy association fetching, caching, and automatic syncing of object state with the database.
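As a rough sketch of what those facilities look like in Hibernate (User is a hypothetical mapped persistent class with a name property):

```java
import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class OrmApiSketch {

    public static void demo(SessionFactory sessionFactory) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();

        User user = new User("alice");       // a plain persistent object
        session.save(user);                  // CRUD API: no INSERT written by hand

        // Query language phrased in terms of classes and properties,
        // not tables and columns
        List users = session
                .createQuery("from User u where u.name = :name")
                .setParameter("name", "alice")
                .list();

        tx.commit();
        session.close();
    }
}
```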
(Potential) advantages of using ORM: productivity, maintainability, performance and vendor independence.
Reference: Chapter 1, Hibernate in Action
Saturday, May 23, 2009
hibernate code generation
Nowadays I'm working on a server-side backend service that exposes an API through a REST interface. To get to the data model, we first created a business domain model containing the business entities, value objects and some enumerations. From this domain model we created an ER diagram, and from there a DB data model. Next comes the ORM layer (using Hibernate), and we figured out that there were two ways to go:
- Write Hibernate hbm files and generate code and DB scripts from them.
- Create DB scripts from the data model and reverse-engineer this schema to generate Hibernate classes and hbm files.
At the moment we decided that we want to keep a close grip on what gets created inside the DB, so we went ahead with the second approach. After some use, it looks like we're taking a lot of pain with workarounds to avoid touching the generated Java code (so that we can change the data model at any time and regenerate the code). Let us see if this approach works out fine, or if at some point we switch to the first approach and start modifying the hbm files and generating DB scripts from them.
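For reference, a minimal sketch of what the first approach looks like with Hibernate's hbm2ddl tool (User.hbm.xml is a hypothetical mapping file):

```java
import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaExport;

public class GenerateDbScripts {
    public static void main(String[] args) {
        Configuration cfg = new Configuration()
                .addResource("User.hbm.xml"); // hypothetical hbm mapping file

        SchemaExport export = new SchemaExport(cfg);
        export.setOutputFile("schema.sql");   // where the DDL script lands
        export.create(true, false);           // write the script, don't run it against the DB
    }
}
```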