Active Record versus Data Mapper

Published
2009-07-20 11:30
Written by
rasantiago - member of the CiviCRM community
Some recent discussions and debates about Active Record and Data Mapper have come up in the context of new architectural proposals for CiviCRM from Dharmatech and raSANTIAGO. We think it is important that the differences between the two are understood and that some erroneous perceptions are cleared up. This is not to claim that either design pattern is above criticism. It is to say that there are some misperceptions that prevent a more intelligent discussion of the trade-offs between these two design patterns. Our hope is to bring some clarity to this discussion.

What is Active Record

     Active Record, properly defined, is a design pattern where an object is represented as a record in a table in a relational database. As most know at this point, this has been the ORM design pattern of choice in Ruby on Rails (RoR), although that is starting to change a bit. Because of the assumption of "one object - one record", a lot of programming efficiency can be gained when it comes to the standard CRUD operations. This is the source of much of RoR's programming efficiency (i.e. not having to write code but still getting lots of functionality).
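
     To make the "one object - one record" idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not CiviCRM code and not how RoR implements the pattern; the ActiveRecord base class, the contact table and its columns are assumptions made up for illustration. The point is that the subclass declares only a table name and column list and still gets the CRUD operations for free.

```python
import sqlite3

# Hypothetical in-memory database and table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")


class ActiveRecord:
    """Illustrative base class: each instance stands for exactly one table row."""

    table = None   # table name, set by subclasses
    columns = ()   # non-key column names, set by subclasses

    def __init__(self, id=None, **values):
        self.id = id
        for col in self.columns:
            setattr(self, col, values.get(col))

    def save(self):
        # INSERT when the object has no row yet, UPDATE otherwise.
        values = [getattr(self, col) for col in self.columns]
        if self.id is None:
            marks = ", ".join("?" for _ in self.columns)
            cols = ", ".join(self.columns)
            cur = conn.execute(
                f"INSERT INTO {self.table} ({cols}) VALUES ({marks})", values
            )
            self.id = cur.lastrowid
        else:
            sets = ", ".join(f"{col} = ?" for col in self.columns)
            conn.execute(
                f"UPDATE {self.table} SET {sets} WHERE id = ?", values + [self.id]
            )
        return self

    @classmethod
    def find(cls, id):
        cols = ", ".join(cls.columns)
        row = conn.execute(
            f"SELECT {cols} FROM {cls.table} WHERE id = ?", (id,)
        ).fetchone()
        if row is None:
            return None
        return cls(id=id, **dict(zip(cls.columns, row)))

    def delete(self):
        conn.execute(f"DELETE FROM {self.table} WHERE id = ?", (self.id,))
        self.id = None


class Contact(ActiveRecord):
    # No SQL written here: the base class derives it from these two lines.
    table = "contact"
    columns = ("name", "email")


alice = Contact(name="Alice", email="alice@example.org").save()
alice.email = "alice@example.net"
alice.save()
loaded = Contact.find(alice.id)
print(loaded.name, loaded.email)
```

     Because every object maps to one row with a known key, the base class can generate all four statements mechanically. That is the kind of efficiency the pattern's constraint buys.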

What Active Record is Not

     It is important to highlight what Active Record is not, for the sake of addressing the confusion that has resulted in some very erroneous statements in recent blog posts and comments.

     Active Record is not a database interface. It is a way in which objects are represented. The relationship is from the object to the database, not from the database to the object. This may seem subtle but for some reason there is the erroneous implication that if we use Active Record then the only way we are allowed to work with the database is through objects. Even more erroneous is the idea that we are not allowed to use our most powerful SQL Fu when retrieving records which will be turned into objects or creating/updating/deleting records. This is simply not the case.
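
     As a hedged illustration of this point, the sketch below shows an Active Record style class that accepts arbitrary hand-written SQL and turns the resulting rows into objects. The schema, the sample data and the find_by_sql helper are all hypothetical and exist only for this example; real ORM implementations differ in how they expose this.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contribution (
        id INTEGER PRIMARY KEY, contact_id INTEGER, amount REAL
    );
    INSERT INTO contact (id, name) VALUES (1, 'Alice');
    INSERT INTO contact (id, name) VALUES (2, 'Bob');
    INSERT INTO contact (id, name) VALUES (3, 'Carol');
    INSERT INTO contribution (contact_id, amount) VALUES (1, 50);
    INSERT INTO contribution (contact_id, amount) VALUES (1, 75);
    INSERT INTO contribution (contact_id, amount) VALUES (2, 10);
    INSERT INTO contribution (contact_id, amount) VALUES (3, 200);
    """
)


class Contact:
    """One object per contact row, but the query that finds the rows is free-form."""

    def __init__(self, id, name):
        self.id = id
        self.name = name

    @classmethod
    def find_by_sql(cls, sql, params=()):
        # The caller supplies any SQL that yields (id, name) rows;
        # each row becomes one Contact object.
        return [cls(*row) for row in conn.execute(sql, params)]


# A join with aggregation, written by hand, still yields ordinary objects.
big_donors = Contact.find_by_sql(
    """
    SELECT c.id, c.name
    FROM contact c
    JOIN contribution k ON k.contact_id = c.id
    GROUP BY c.id, c.name
    HAVING SUM(k.amount) >= ?
    ORDER BY c.name
    """,
    (100,),
)
names = [c.name for c in big_donors]
print(names)
```

     The objects remain "one object - one record"; only the way the records are selected is custom.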

     Active Record is also not a table-driven design methodology for applications. Put simply, while Active Record copes easily with database schema changes, it does not imply that the database schema determines how an application is architected. Instead, the object design of the application comes first, and that design is then persisted in the database through Active Record.

What is Data Mapper

     Data Mapper is very similar to Active Record but with one major design difference: The representation of an object is not necessarily a record from a table in a relational database. Hence, the keyword "Mapper" instead of "Record". Data Mapper can implement Active Record albeit without the efficiencies that come with the assumption of the "one object - one record" constraint.

     This feature of the design is important when you start to consider data stores other than relational databases (e.g. external data stores accessed through REST, file-based data storage or schema-free databases). Indeed, the rise of web applications and web services has fueled the need for more flexibility in the ORM layer.
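
     A small sketch may help show the difference. Below, the same plain Contact object is persisted by two hypothetical mappers: one spreads it across two relational tables, the other stores it as a JSON document in a key-value store (a Python dict stands in for a schema-free database). All class, table and field names are assumptions for illustration, not part of any existing ORM.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    """Plain domain object: it knows nothing about how it is stored."""

    name: str
    emails: List[str] = field(default_factory=list)
    id: Optional[int] = None


class SqlContactMapper:
    """Maps one Contact onto rows in two tables (contact and email)."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("CREATE TABLE email (contact_id INTEGER, address TEXT)")

    def insert(self, contact):
        cur = self.conn.execute(
            "INSERT INTO contact (name) VALUES (?)", (contact.name,)
        )
        contact.id = cur.lastrowid
        self.conn.executemany(
            "INSERT INTO email (contact_id, address) VALUES (?, ?)",
            [(contact.id, address) for address in contact.emails],
        )

    def find(self, id):
        row = self.conn.execute(
            "SELECT name FROM contact WHERE id = ?", (id,)
        ).fetchone()
        if row is None:
            return None
        emails = [
            r[0]
            for r in self.conn.execute(
                "SELECT address FROM email WHERE contact_id = ? ORDER BY rowid",
                (id,),
            )
        ]
        return Contact(name=row[0], emails=emails, id=id)


class JsonContactMapper:
    """Maps the same Contact onto JSON documents held in a dict."""

    def __init__(self):
        self.documents = {}

    def insert(self, contact):
        contact.id = len(self.documents) + 1
        self.documents[contact.id] = json.dumps(
            {"name": contact.name, "emails": contact.emails}
        )

    def find(self, id):
        doc = self.documents.get(id)
        if doc is None:
            return None
        data = json.loads(doc)
        return Contact(name=data["name"], emails=data["emails"], id=id)


sql_mapper = SqlContactMapper(sqlite3.connect(":memory:"))
json_mapper = JsonContactMapper()
for mapper in (sql_mapper, json_mapper):
    mapper.insert(Contact("Alice", ["alice@example.org", "a@example.net"]))

from_sql = sql_mapper.find(1)
from_json = json_mapper.find(1)
print(from_sql == from_json)
```

     Note that every mapping had to be written by hand here. That flexibility is what Data Mapper offers, and it is also the cost discussed later in this post.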

What Data Mapper is Not

     Like Active Record, Data Mapper is not a datastore interface nor is it a datastore driven design methodology. More importantly, the Data Mapper design pattern is not any more or less scalable than Active Record. Put simply, just because you change the relationship between the object and its representation in a datastore does not directly imply any greater scalability or efficiency.

Four Classes of Criticisms

     Each of these ORM designs can and should be criticised. Too often, though, criticisms of topics related to the ORM design are confused with criticisms of the ORM design itself. There are four classes of criticisms which often get confused when it comes to ORM layers. Only one of them actually has anything to do with the ORM design. These classes are:

  • Database - More often than not, performance criticisms concern the database and how it is set up rather than the ORM design. Issues of this type include: 1) database engine issues (e.g. MyISAM vs InnoDB, MySQL vs Postgres), 2) configuration issues (e.g. caching configuration, execution planning config, logging) and 3) hardware configuration issues (e.g. installed on the same box as the web server or on a separate one). These are most often the culprits when it comes to performance and scaling. It is important to note that our choice of MySQL does bring with it some limitations in terms of performance, but this has nothing to do with the ORM layer.
  • ORM Implementation - The next level of criticism is of the ORM implementation (e.g. Doctrine vs Propel). There are a lot of ORM implementations out there, and to take one of those implementations and assume that it is indicative of the ORM design is incorrect. That said, it is important to criticise the ORM implementation because, as has been mentioned in previous posts, all ORMs need to handle custom SQL (or datastore manipulation) with ease and transparency in order to be very useful. In addition, ORM implementations that know how to take advantage of the database are much better than those that do not. Not all ORM implementations are equal.
  • ORM Design - This is the only level of criticism that has anything to do with Active Record or Data Mapper. These criticisms focus on the constraints and implications of having a specific relationship between data in a data store and an object. This will be the focus of the next section.
  • Application Object Design - This one is one of the greatest sources of problems in implementation and performance (and of confused analysis of ORM designs). In short, poor object design leads to an over (or under) abundance of objects and relationships. This will slow down even the best ORMs. But, again, this is not a criticism of the ORM; it is a criticism of the object design. Cleaning up the object design often brings huge performance increases.

Criticism of the Data Mapper and Active Record ORM Designs in the Context of CiviCRM

     There is only one fundamental aspect of these designs that is at issue, the relationship between an object and its representation in a database. All other issues fall into the other categories as described already (the most important at this stage of the game being the ORM Implementation).

     Thus there is only one pressing question: Does CiviCRM have any need to consider data storage in anything other than a relational database? We think the answer at this point is that it does not and will not for some time to come.

     Given that, the benefit of Data Mapper is not apparent. In fact, some collateral issues arise. Specifically, Data Mapper, while allowing for a flexible relationship between object and data store, also requires that this relationship be set up by the programmer. This leads to two important problems:

  • The way that a mapper is set up for each object now needs to be managed; otherwise a diverse development team will generate many different mappers. This in turn results in a lack of maintainability. The alternative, of course, is to lay down a standard for how the mapper should work and be implemented. But at that point you are almost back to Active Record.
  • Assuming "one object - one record" brings with it a lot of opportunity for optimization, and Data Mapper gives up that assumption (Note: whether a particular Active Record implementation takes advantage of these opportunities is a separate question).

     I believe these two problems work against what we all would like to see with the CiviCRM code base: greater maintainability without sacrificing, in fact improving, performance.

     But Active Record is not without its issues as well.

     There are times when we would prefer an object representation of a set of records or even a whole table or perhaps an XML document. It is important to remember that both of these design patterns are important because they offer a managed relationship between objects and data store. We desire easy object persistence so we can take advantage of objects when programming business logic and applications. The objects we sometimes desire, though, are representations of very large data sets or of unstructured data. These do not lend themselves to the "one object - one record" constraint.

     This problem is NOT to be confused with BULK CRUD operations (i.e. performing a CRUD operation on a large number of objects at once). To reiterate a previous point: Active Record is not a database interface; nothing keeps us from generating the SQL for these types of bulk CRUD operations.
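
     As a rough sketch of what that can look like, the hypothetical Contact class below keeps the usual one-object-per-row lookup but also offers a bulk operation that issues a single UPDATE statement without instantiating any objects. The table, the do_not_email column and the opt_out_domain method are assumptions invented for this example.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact ("
    "id INTEGER PRIMARY KEY, email TEXT, do_not_email INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO contact (email) VALUES (?)",
    [("a@old.example",), ("b@old.example",), ("c@new.example",)],
)


class Contact:
    """Active Record style class: one instance per contact row."""

    def __init__(self, id, email, do_not_email):
        self.id = id
        self.email = email
        self.do_not_email = bool(do_not_email)

    @classmethod
    def find(cls, id):
        row = conn.execute(
            "SELECT id, email, do_not_email FROM contact WHERE id = ?", (id,)
        ).fetchone()
        return cls(*row) if row else None

    @classmethod
    def opt_out_domain(cls, domain):
        """Bulk update: one UPDATE statement, no objects instantiated."""
        cur = conn.execute(
            "UPDATE contact SET do_not_email = 1 WHERE email LIKE ?",
            ("%@" + domain,),
        )
        return cur.rowcount  # number of rows changed


changed = Contact.opt_out_domain("old.example")
print(changed, Contact.find(1).do_not_email, Contact.find(3).do_not_email)
```

     Objects loaded after the bulk statement simply reflect the new state of their rows, so the two styles of access coexist without conflict.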

     Here we are talking about scenarios where we really want to work in an object space where objects represent large and/or unstructured data sets. This is neither a trivial nor an uncommon need. It is another way in which Data Mapper, along with several other design patterns, can be useful. It is also one of the primary drivers behind the recent merging of RoR and Merb.

     Right now CiviCRM has limited need for these types of object representations, although there are some fronts where it may happen more and more. One of these, of course, is data importing. The other likely front is reporting. When the need becomes more serious, though, there is nothing preventing us from using a different ORM layer for those specific modules or subsystems.

Conclusions

     If you have read this far then perhaps at minimum you understand more of raSANTIAGO's perspective on web application architecture, specifically the ORM layer. Even better, perhaps you understand why raSANTIAGO has made the type of proposal it has and objected to the Data Mapper proposal. Best of all would be if this has provided a useful framework for discussing the pros and cons of each design pattern in the context of CiviCRM.

Comments

Hello,

I certainly don't have enough knowledge to add to this conversation but I just wanted to say I am really appreciating reading your ideas on this and learning a lot from your blogs.