Proposal for a new CiviCRM Architecture: The ORM Layer....Doctrine

Published
2009-07-08 11:35
This is a follow-up to our last post proposing a new architecture for CiviCRM. Much appreciation for everyone's patience. In this post we want to go over the use of Doctrine, a PHP implementation of the Active Record design pattern made popular through Ruby on Rails. The Doctrine Project has done a great job of maintaining detailed documentation, and Doctrine has a lot of features that we believe everyone will find useful when working with CiviCRM objects. We have posted some of our working code for the new ORM and REST API here on GitHub. We have given this code set the working name civiBASE.
For those who are not familiar with Active Record and ORMs, a little background will help. An object-relational mapping (ORM) layer translates objects from a programming environment into records in a relational database and vice versa. In our case, we want the ORM to take PHP objects, store them in MySQL and then retrieve them again.
Active Record is a very popular design pattern used to implement ORMs. Its biggest advantage is that it takes care of most of the database-level work (i.e. the SQL code) for the CRUD operations and for accessing related objects. Such is the case with Doctrine.
For example, let's look at the creation, updating and retrieval of an object, as well as accessing a related object. The following snippets of code show each of these steps.
Let's suppose that we have a simple User object and that each User object has an Email object. The User object has two attributes: username and password. The Email object has one attribute: address.
First let's create the User object, set some attributes and save it.
$user = new User();
$user->username='foo';
$user->password='bar';
$user->save();

So far so good. We have not had to write any SQL. Now let's retrieve this object. For simplicity, let's assume we know the id of the object is 5. In Doctrine, as in Active Record implementations generally, each object has a unique id. We will retrieve the object and then add an associated Email object.
$user = Doctrine::getTable('User')->find(5);
$user->Email['address'] = 'foo@bar.com';
$user->save();
Notice a few things. First, we retrieved the User object only using the unique id. Next, even though we had not instantiated an associated Email object for the user object, Doctrine takes care of instantiating the new Email object and setting the address attribute. Finally, notice that the save() command takes care of storing the User object and the Email object. No SQL code, yeah!
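For anyone curious what is going on behind that save() call, here is a rough sketch of the idea in plain PHP. It is not Doctrine's implementation: the SketchRecord, SketchUser and SketchEmail classes, the owner_id field and the in-memory array standing in for MySQL are all made up for illustration. It only mimics the two behaviors noted above: a related object is created on first access, and saving the parent also saves the related object.

```php
<?php
// Illustrative only: a tiny in-memory active record, NOT Doctrine's code.
abstract class SketchRecord
{
    // Fake "database": table name => id => row. Stands in for MySQL.
    public static $db = [];

    public $id = null;
    protected $fields = [];
    protected $related = [];

    // Table name for this record type.
    abstract protected function table();

    // Map of relation name => related class name (hypothetical has-one).
    protected function relations()
    {
        return [];
    }

    public function __get($name)
    {
        $relations = $this->relations();
        if (isset($relations[$name])) {
            // Create the related record lazily on first access.
            if (!isset($this->related[$name])) {
                $class = $relations[$name];
                $this->related[$name] = new $class();
            }
            return $this->related[$name];
        }
        return isset($this->fields[$name]) ? $this->fields[$name] : null;
    }

    public function __set($name, $value)
    {
        $this->fields[$name] = $value;
    }

    public function save()
    {
        $table = $this->table();
        if ($this->id === null) {
            // New record: hand out the next id for this table.
            $existing = isset(self::$db[$table]) ? self::$db[$table] : [];
            $this->id = count($existing) + 1;
        }
        self::$db[$table][$this->id] = $this->fields;
        // Cascade: store every related record that was touched.
        foreach ($this->related as $record) {
            $record->owner_id = $this->id; // made-up foreign key name
            $record->save();
        }
        return $this;
    }

    public static function find($id)
    {
        $record = new static();
        $table = $record->table();
        if (!isset(self::$db[$table][$id])) {
            return null;
        }
        $record->id = $id;
        $record->fields = self::$db[$table][$id];
        return $record;
    }
}

class SketchEmail extends SketchRecord
{
    protected function table() { return 'email'; }
}

class SketchUser extends SketchRecord
{
    protected function table() { return 'user'; }
    protected function relations() { return ['Email' => 'SketchEmail']; }
}

// Same steps as in the post, against the fake store.
$user = new SketchUser();
$user->username = 'foo';
$user->password = 'bar';
$user->save();

$user = SketchUser::find(1);
$user->Email->address = 'foo@bar.com';
$user->save();

echo SketchRecord::$db['email'][1]['address'], "\n"; // prints foo@bar.com
```

A real ORM does far more than this (dirty tracking, transactions, actual SQL generation), but the shape of the calling code is the same.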
This should provide a good flavor for the type of code that we get to write with Doctrine. There is very good documentation online that will step you through getting started and comfortable with the fundamental concepts of Doctrine and Active Record. This can be found at http://www.doctrine-project.org/documentation/manual/1_1/en/introduction. I will not focus on much of that here, but will highlight two very important features worthy of close attention.
First, Doctrine can go from an existing database and generate the necessary ActiveRecord files for each table taking into account all foreign key relationships, existing fields and data types as well as any cascading relationships and constraints between tables. We took advantage of this feature to get us started. Details of how you do this can be found in the documentation already referenced.
The second feature, and perhaps the more important one from a long-term perspective, is the set of files Doctrine uses to represent an object (these are what it generates automatically from an existing database). Specifically, it uses Object, BaseObject and ObjectTable classes.
So for example, for a Contact object we would find:
1. a BaseContact class found in a file called BaseContact.php (BaseObject)
2. a Contact class found in a file called Contact.php (Object)
3. and perhaps a ContactTable class found in a file called ContactTable.php (ObjectTable)
The last one is listed with a "perhaps" because it is not always necessary. The Base class defines the object, its attributes, its relationships and any special accessor methods. The Object class is used to implement all methods on the object. The ObjectTable class is a utility that captures special database-level functions (e.g. retrieval optimizations involving SQL or DQL, the Doctrine Query Language). We do not utilize this class yet in our implementation.
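To give a feel for how the three classes divide the work, here is a small sketch that runs on its own. StubRecord and StubTable are made-up stand-ins so that no library is needed; with Doctrine 1.x loaded, the generated classes would extend Doctrine_Record and Doctrine_Table instead. The first_name and last_name columns and the getDisplayName() and findByLastName() methods are assumptions chosen for illustration, not the actual CiviCRM schema or our civiBASE code.

```php
<?php
// Stand-ins so the sketch runs without any library.
abstract class StubRecord
{
    protected $columns = [];
    protected $values = [];

    public function __construct()
    {
        $this->setTableDefinition();
    }

    abstract public function setTableDefinition();

    protected function hasColumn($name, $type)
    {
        $this->columns[$name] = $type;
    }

    public function __get($name)
    {
        return isset($this->values[$name]) ? $this->values[$name] : null;
    }

    public function __set($name, $value)
    {
        if (!isset($this->columns[$name])) {
            throw new InvalidArgumentException("Unknown column: $name");
        }
        $this->values[$name] = $value;
    }
}

class StubTable
{
    protected $rows = [];

    public function add(StubRecord $record)
    {
        $this->rows[] = $record;
    }
}

// 1. BaseContact.php: the generated part, describing the columns.
abstract class BaseContact extends StubRecord
{
    public function setTableDefinition()
    {
        $this->hasColumn('first_name', 'string'); // hypothetical column
        $this->hasColumn('last_name', 'string');  // hypothetical column
    }
}

// 2. Contact.php: hand-written methods on the object.
class Contact extends BaseContact
{
    public function getDisplayName()
    {
        return trim($this->first_name . ' ' . $this->last_name);
    }
}

// 3. ContactTable.php: optional retrieval helpers.
class ContactTable extends StubTable
{
    public function findByLastName($lastName)
    {
        return array_values(array_filter(
            $this->rows,
            function (Contact $c) use ($lastName) {
                return $c->last_name === $lastName;
            }
        ));
    }
}

$table = new ContactTable();

$ada = new Contact();
$ada->first_name = 'Ada';
$ada->last_name = 'Lovelace';
$table->add($ada);

$grace = new Contact();
$grace->first_name = 'Grace';
$grace->last_name = 'Hopper';
$table->add($grace);

$found = $table->findByLastName('Hopper');
echo $found[0]->getDisplayName(), "\n"; // prints Grace Hopper
```

The point of the split is that the Base file can be regenerated from the database at any time without touching the hand-written code in the other two files.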
Since this is a brief introduction, I will not go on much longer other than to provide some details of the code found in our CiviCRM Doctrine Git repository.
There are two directories in the repo: one named packages and one named doctrine. The doctrine directory should go at the same level as CRM. The contents of the packages directory should be added to the existing packages directory.
Using Doctrine requires a bootstrap.php file which sets up connectivity and references the Doctrine library. We have set it up so that it pulls the database connectivity info from CiviCRM. It also holds the path to the Doctrine library, which it expects to find in the packages directory.
The do_merge.php file is a good example of how we have used Doctrine as an API for writing custom functionality. In this case we needed an auto-merge function where we could hand off two contact ids and have them merged into a single contact. This function can be invoked from the command line (i.e. php do_merge.php ). This is a great way of seeing how we have implemented some generic reusable object behaviors.
Lastly, and certainly not least, there is a CRUD.php file. This is a simple REST interface which allows us to access all CiviCRM objects and perform basic CRUD operations. We will do a separate blog entry for this interface.
In the meantime, please poke around and ask questions.
Roberto

Comments

Hi Roberto

I don't know a great deal about Doctrine specifically, but the model set out in your previous post sounds interesting and based on current development patterns would seem to be the direction to go - the challenge being the amount of accumulated knowledge of the core developers and the impact this would have for them.

For us, having an API compatible with the current one will certainly be an issue due to the number of systems we have that use it.

Thanks for all this effort and for sharing your thinking and development code!

I look forward to hearing Dave, Lobo, Kurund & team's thoughts on the code and Doctrine.

Andrew

Hi,

Could you compare it more with the existing architecture and show what Doctrine makes easier/quicker...

Have you compared it with the model in Django? My experience is that for basic CRUD and 1->n or n->n relationships it works well; for more complex ones, the abstraction layer is too stupid to handle it, and working around it is almost always more painful than going SQL all the way.

(might be because I'm more comfortable with SQL than ORM X, but it's safe to assume that's the majority ;)

X+

X+,

I understand your point of view. Indeed, many are very comfortable with SQL over a particular ORM. But, the question is, how many people are comfortable with your SQL? For supporting the whole community there is greater need for, and certainty in, convention even if in some cases it causes you to learn a bit more about using an ORM instead of quickly hacking out some SQL.

That said, Doctrine is one of the better ActiveRecord implementations. We have already applied the auto-merge functionality we built (available in the civiBASE repo) to de-dupe a large data set. To wit, at first Jesse was a little frustrated because of performance issues (i.e. it seemed like Doctrine was too dumb). Here was that moment when the gut instinct would be to write some SQL. But, it turns out that the default object retrieval behavior was not appropriate for us. By asking Doctrine to do a shallow retrieve we accomplished our task with one additional line of code and with no SQL.

Still, there are times when you will need to break out the mystical powers of SQL Fu. The architecture of Doctrine recognizes that there are cases where we will need to assist the ORM to get things done effectively. Hence the ObjectTable class. This is where those types of optimizations are captured. But using that approach requires you to follow conventions for embedding your SQL. The end result is keeping SQL very near the data store layer and out of all other business logic and application code and, more importantly, providing some boundaries to promote reuse (i.e. your CRUD optimization becomes available for all to reuse). Lastly, there is the Doctrine Query Language, which is very powerful and geared to the types of specialized CRUD operations that commonly pop up.

In terms of comparisons: well, Django is a Python thing and is a good framework. I have only had a chance to do an architectural and partial code review of the framework but have not actually used it for anything meaningful. I am not sure I am in any position to really do a good compare/contrast.

But DAO/BAO is different and it is a great suggestion that a comparison be provided. I will follow up with another blog entry on common operations in Doctrine versus DAO/BAO. That should be helpful to many I believe. But, there are two other important features to focus on in Doctrine: Migrations and Fixtures. These are not available in DAO/BAO. Perhaps that is another blog post on its own. With the recent talk of better testing and upgrading this would be useful to chat about.

ra+

Lobo et al gave a lot of thought to the XML data definition stuff, and on the whole, they did a good job with it. Once you grok how the xml/ directory works, it's not too hard to add additional tables, and it becomes easier to automate things like upgrades and schema changes.

In the 1.x data model, if you wanted something to scale well, using the XML data definitions and writing BAO classes was a better way to do things than custom data was. Now custom data scales better than it used to. But for multiple deployment purposes (and for sharing applications with the community), I'd guess that xml/BAO would still be better than custom data.

Any data abstraction layer needs to be:
1) Fast
2) Scale well
3) Lead to code a human can understand.
4) Lead to code that MySQL (at least) can generate efficient queries for.

Some ways of modifying an object may be prettier or more elegant or even cooler than what we have, but if they don't do the four things above well, it's not worth churning the code base to go to them.

Having spent a lot of time deep in the guts of the CiviCRM code base, I'm a little skeptical with the direction you're going with this.

I'm not a fan of QuickForms either, especially since it tucks so much data into $_SESSIONS that it's nigh impossible to figure out what it's doing. Debugging is particularly difficult.

But my own sense is that you're concentrating your effort in the wrong place. The major problems doing CiviCRM development have tended to be:

1. The complexity of the class hierarchy. It takes a long time to understand how dispatch works, or to understand how the query generator works.
2. The interdependence of different parts of the library. I have yet to do a non-trivial project that didn't require hitting 20 or more core files to integrate new functionality. In comparison: on Drupal projects, I haven't needed to modify Drupal core in years.
3. There still isn't a good packaging model for new functionality, although I understand a bit of work was done for 2.0. Adding UI generally means dropping multiple files in multiple directories. There isn't a simple way for a chunk of code to integrate itself into CiviCRM screens. And while custom fields are much better than they were in 1.x, it's still hard to manage a separate table (although the XML data defs could become this).
4. Scaling is hard. This is where I'm most skeptical, because it isn't clear to me that a Ruby-like approach would be an improvement. Reading your last diary, I'm in general agreement with Joe Murray: pushing some core functionality into the database itself would make it much easier to do bulk operations, and it would likely also make the PHP code easier to understand. I built a large library of SQL and perl code to do bulk operations in 1.x, because it was orders of magnitude faster to do these things independently, outside of CiviCRM, than I could do them in PHP. But that's a lot of effort, and since what I do is fairly specialized, it's not the sort of thing I can put up on a server and say "have at" for people with similar problems. And if we pushed certain core operations (add a contact record, delete a contact record) into the database and had a good set of hooks, we'd have both good speed and much simpler client code.

If given a single choice, I'd say that packaging is the highest priority, since it would allow people to do non-trivial projects that were easier to port between versions of CiviCRM, and would decrease the need some of us have had to fork the CiviCRM tree in order to track and maintain the modifications we've made in a particular version.

first of all kudos to raSantiago and folks for stepping up and doing some due diligence, work and prototyping. I think it helps to keep the team thinking about potential new technologies that we can adopt in the future.

I think the debate about ORM vs SQL has been in existence since the days of ORM. I also think DB_DO has some fairly primitive ORM capabilities (which we have not exploited as much). So moving to a better full fledged ORM is definitely an option.

Both Matt and Rob make some good points that should be noted and addressed. The ORM layer is one part of the puzzle, i'm a lot more interested in seeing how all the pieces fit and play together. So am looking forward to the blog posts and code to come.

Would be great to continue with the dedupe example to basically rebuild the dedupe system in civicrm and show some of the pros/cons of the new system, couple of thoughts:

Step 1: Merge two contacts (already implemented)
Step 2: User interface to merge two contacts (with user options to select which parts to merge and skip)
Step 3: API for the above merge process
Step 4: Dedupe rules and applying these rules to find merge candidates etc. (this gets into both complex queries and relatively complex UI), i.e. the same functionality that CiviCRM 2.2 exposes today.
Step 5: An API for the above

lobo

Things have gone quiet on this front lately, but it seemed like a really useful direction to be exploring. Is civiBASE still alive? The git repo seems to no longer be available.

It sounds like some really valuable work has been done here and we'd be keen to check it out as we are working with Doctrine too at present.

Andrew
Community Builders Australia

http://civicrm.org/blogs/xavier/database-layer-evaluation