To use an ORM or not to use an ORM? That is the question.

The object-relational impedance mismatch
Those of us who spend most of our software lives in the object oriented world are aware of the concept of 'impedance mismatch'. We design and build object oriented systems, but persist our data to relational databases. These two structures are very different and require translation of the data as we move data to and from our persistence store. This usually requires the creation of a custom query and translation layer within the application. Object Relational Mappers are an attempt to free us from this concern.

In this post I aim to set out reasons why you would, or would not, opt for an ORM. My experience of ORMS has been in the MS .NET world, so I will be using .NET ORM libraries as reference points. After investigating a number of ORMs or ORM-aspirants (including SubSonic, and the MS Entity Framework), many of them useful and productive, I have formed the opinion that NHibernate, a .NET port of the Java's Hibernate ORM, is the more mature and feature-rich of the .NET options currently available. So most of my comments will relate to NHibernate. However, my survey of different ORMs shows that they all face similar issues, and they all address the same issues in a similar manner. This is not surprising really, given that they all set out to solve the same fundamental problem.

Non-ORM solutions
Before I proceed too far I would like to point out that there are other solutions for this problem:

The aforementioned custom translation layer - in the .NET world using ADO.NET and stored procedures (you don't really write dynamic SQL do you?*).
Data Access Layer generators such as LLBLGen.

* Actually, I do believe that in the appropriate circumstances, a carefully crafted dynamic SQL solution can be the best option.

I am, of course, assuming that an n-tier approach is being employed. I have to acknowledge that there are a large number of commercially viable applications that use databinding from the UI to the database.

I won't address the pro's and con's of these approaches here, each probably warrants a post of their own. I just wanted to point out that there is more than one way to skin this particular cat.

The arguments
Advocates of ORMs tend to claim the following:

ORMs free developers from writing CRUD code and can focus on business concerns rather than persistence concerns, making them more productive.
ORMs produce SQL that is as good as, or better than, the SQL the average developer writes.
The ability to connect to legacy databases.
The ability to connect to and query databases that you can cannot alter.

ORM detractors raise concerns such as:

Performance degradation due to the introduction of a generic translation layer.
Performance degradation due to not using stored procedures, so no caching of query plans.
Loss of control.
Needing to learn another query language (eg. HQL)

Productivity

Yes, the developer is freed from writing boring CRUD and object-relational translation logic. However, the developer still does need to define the object-relational mappings as dictated by the ORM library being used (XML for NHibernate, code for the Fluent NHibernate interface).
Having set up the mappings, the developer will then spend time using a profiler tool such as SQL Server Profiler to check that the queries being generated are appropriate eg.

lazy loaded collections are being loaded 'lazily'
collections are being loaded with a single query rather than multiple individual select statements

The mapped objects provide a core benefit for the maintenance of an application. If a field/property needs to be added/changed every object-database interaction for that object is updated with 3 simple changes:

The object definition
The database table(s) the object maps to
The mapping definition

So 'vanilla' functions come at low cost. But the developer will still need to write queries. In some cases this may need to be in SQL, possibly bypassing any caching mechanisms used by the ORM. If it is SQL, one of the advantages of using the ORM has just been lost. In other cases this may mean coming to grips with a new query language and idiom (eg. HQL).
Depending on how any queries are written, the maintenance effort when changing the properties on an object may be increased beyond what was outlined earlier.
Most ORMs offer a way of generating a SQL script for your database based on the mappings that have been defined. So there is no need to create this script. However, often these scripts are devoid of constraints, foreign keys and indexes. With advocates citing the caching and batching options available over indexes and pointing out that the relationships do not need to be defined in two places. Detractors would contend that the database will be accessed anyway, and that foreign keys and constraints are effective belts and braces to ensure data integrity - if there is an error or defect in the application layer at least the data stays free from corruption. Trying to repair a corrupted database is not a trifling matter.

Design
I cannot really comment on the value of using an ORM to access a legacy database or a database that cannot be altered. I have only used ORMs on 'green field' projects.

What I have found is that ORM implementation details leak into both the domain design and database design. I find this somewhat irksome. Call me a purist if you like, but I like to design my domain object graph to accurately represent the domain. And I like to design my database to efficiently and effectively store the data for the domain; read optimising, or write optimising the structure and indexes. To then take both designs and have to alter them to fit within the constraints of the ORM annoys me. Experienced ORMers have claimed to me that they never have to do this. But when I review their domain and database designs, I find that they have instinctively built these ORM oriented aspects into their designs. They have made the same compromises - they are just less aware of them due to their familiarity. Examples:

Making all methods and properties on a class virtual so the ORM can create proxy objects.
Creation of discriminator columns in database tables, and the creation of discriminator enumerations to be able to map inheritance relationships.
Requiring a parameterless constructor so a default object can be created then have its properties assigned to (I dislike parameterless constructors and generic default values for properties. I believe it is better to 'enforce' the creation of a valid object. I would rather have compilation and build failures at development than system instability due to unexpected behaviour arising from unexpected combinations of data (some default, some supplied)).
You cannot apply 'setter' logic to a lazy-loaded property - it invokes the property. So you have to either map to the backing field (breaking encapsulation) or have a separate internal property to map to. Either way, you still cannot execute any logic on them.
Having a summary object and a detail object that represent the same record in the database, just with less data and fewer relationships on the summary object. This is a common scenario - think list views and detail views. Such a relationship cries out for inheritance. But having two objects mapping to the same row in a database table is not something ORMs like. Yes, there are workarounds, but they are workarounds that will require modification of your database and/or your domain model.
NHibernate wraps lazy-loaded properties with its own proxies. This is fine in a web application where the NHibernate session normally exists for the entire processing time of the web request. But ORMs and NHibernate have a wider use and application than just web applications. Suddenly session management becomes very important. The developer must keep in mind whether there is an open session or not, and must also be aware of the current context ("Will the lazy-loaded property be populated or not?"). This is hardly separation of concerns and being agnostic about your persistence layer.
If you are going to make use of caching options then the system needs to be designed without leveraging some of the power databases offer. Sometimes the best way of implementing something (eg. change logging) is at the database layer. If you are using an ORM, and especially if you use caching, you need to be aware that data changes at this level will not automatically cascade into your application layer. ORMs detractors may say that excluding the option to access the power that databases offer is a bit like owning a Ferrari, but only using it to visit the supermarket once a week. Advocates will counter that many business applications don't utilise that power anyway and that spreading business logic across is not a good approach.

Performance

Caching options made available by various ORM implementations can provide significant performance gains.
While the general ORM approach precludes the use of stored procedures (often you can still use them if you wish to) and the performance gains offered by cached execution plans and hints, many ORMs do cache the queries preventing rebuilding the query from scratch.
For web applications generally (not always) the performance hit is due to two factors, the http request-response round trip and data access. In these cases, the performance of the ORM is deemed to be 'good enough'. Most of the time spent in loading and serving a page is in the http request-response round trip. The ORM would have to perform rather poorly to have an impact on this.
Other types of applications present a different story. This is where the caching options provided come into play. Also, it is worthwhile bearing in mind that certain poor-performing processes can be optimised using SQL and stored procedures directly if necessary.
Even if the script for the database has been generated by the ORM, it is possible to add indexes post script generation if certain accesses are slow.
A poorly written query will perform poorly whether it is written in SQL, LINQ, HQL or another language. However, having a level of abstraction from SQL means that a developer is less likely to accidentally create a query that locks large portions of the database.

Other considerations

If you have a small software development team and you are finding it difficult to find the right people, adopting an ORM may help to change the required skillset and increase the number of prospective applicants.
Contrary to what I stated above, swapping SQL knowledge for knowledge of NHibernate, for example, may not help much. And you will still need someone who knows something about SQL, but at least you won't need a DBA (sorry guys).
Never forget the maintenance story. The green field part of a software product lifespan probably takes 9-12 months on average (depending on resourcing and feature-set). After this initial development work, the product can go on being used and developed for 10-12 years, or even longer. An ORM can simplify maintenance by:

isolating business logic to one tier (nothing in triggers or stored procedures).
making extension/modification simpler. If you need to add or change a field, just update your domain object, add the new column to your database table, update your mapping definition, and apart from any specialised queries - your job is done. In theory, this is less hassle than updating all the CRUD procedures (and their callers) used for reading and writing the data.

If your team are not fully conversant in SQL, then you have the additional assurance that defects are unlikely to be introduced in the data access layer.

Conclusion

Do not expect ORMs to give you a productivity boost. But they won't make your team any slower either.
Using an ORM will allow you to do more technically interesting work and less CRUD-oriented work.
Design the system as well as you can, but expect to make design compromises to work with, and work around, the ORM.
In most cases, using an ORM will not reduce the performance of your application.
Using an ORM can reduce some of the burden of maintenance and 'organic' growth of a system.

software project musings

Search This Blog

To use an ORM or not to use an ORM? That is the question.

Comments

Post a Comment