Persistence Refactoring

Published
2013-10-11 15:45
Written by
peterh - member of the CiviCRM community - view blog guidelines

We've started to look into changing how objects get persisted in CiviCRM and what can we do to make things easier for people extending CiviCRM. Part of our approach is to try and integrate Doctrine into CiviCRM.

Background

We get a lot of requests for customization and feel that the current CiviCRM customization infrastructure doesn’t meet most of our customization needs. We get a lot of requests to customize event checkouts and one of the things we frequently want to do is modify what happens at the end of the checkout process. A recent example is a subscription system where subscribers get a number of free passes to an event. We did this by manipulating some of the internal form variables to not do the payment part of checkout. The main problem with this approach is that we don’t really expect it to survive updates over a long period because it uses some internal mechanisms.

 

We’d like to fix this situation by separating out a lot of the object configuration work (creating participants and contributions) from the persistence work (saving the participants and contributions to the database). Currently, a lot of the configuration code is interspersed with the persistence code. This makes it difficult to reuse code. For example, if I call CRM_Event_BAO_Participant::create(...) to create a participant record and I want it to do everything in there, but one small part, I need to add some arguments to that function or copy it and not include the part I don’t want. This is because the function persists the participant (as well all some other objects). If we take the persistence part of that function and just returns the objects that we want to create, the caller can then manipulate the objects before they get persisted making the one small change.

 

Approach

We talked with some members of the core team and decided to try incorporating Doctrine into CiviCRM. The main thing that Doctrine buys us is that it handles associations between objects very nicely. This makes it easy to build up a tree of objects in lower level generic routines, and then let upper level routines manipulate the objects, and then have the objects get persisted.

 

We wanted to do this with a representative, but small, part of the code so that we could get it done in a reasonable amount of time and provide a realistic model for this approach going forward. Because we work with the event checkout system a lot, we decided to try and rewrite CRM_Event_Form_Registration_Confirm::postProcess to use doctrine because that’s where most of the persistence in the event checkout process happens.

 

The rewriting should follow this general pattern:

1. Create Doctrine objects instead of using the DBObject objects and static functions.

2. Set all the properties and relationships between the Doctrine objects in memory without persisting.

3. Call appropriate hooks for the action being taken passing the Doctrine objects and accepting a list of objects back.

4. Persist the objects returned from the hooks.

5. Call appropriate hooks for actions that need to happen after persistence.

6. Run any standard after persistence actions (send emails, show page, etc…)

 

Results

We rewrote about 25% of the CRM_Event_Form_Registration_Confirm::postProcess function so far and we wanted to share our progress so far before the Dalesbridge sprint was over. The current code is up at https://github.com/giant-rabbit/civicrm-core/tree/doctrine right now. The checkout process isn’t fully working, but we have a unit test setup for that function that at least gets through all the parts we’ve rewritten.

 

Here’s a look at what happens to  CRM_Event_Form_Registration_Confirm::updateContactFields when we change to this approach:

It gets a lot simpler. Partly this is because some of the code moves elsewhere. The code that checks to see if this contact is already in the group gets handled in some code I added to addToGroups method in the Contact entity that Doctrine generates.  That looks like this:

 public function addToGroups($groups)
   {
       $new_group_contacts = array();
       foreach ($groups as $group) {
           $found = FALSE;
           foreach ($this->getGroupContacts() as $group_contact) {
               if ($group_contact->getGroup() == $group && $group_contact->isActive()) {
                   $found = TRUE;
               }
           }
           if (!$found) {
               $group_contact = new Civicrm\GroupContact();
               $group_contact->setStatus('Added');
               $group_contact->setGroup($group);
               $this->addGroupContact($group_contact);
               $new_group_contacts[] = $group_contact;
           }
       }
       return $new_group_contacts;
   }

The nice thing about moving code into the Contact entity method is that this fairly general purpose code that prevents subscribing more than once to the same group is now easier to use in more than one place. Also it’s now decoupled from a lot of other code that did other things like send confirmation emails.


The other thing that happens to the code that was in updateContactFields is that it is going to move to the end of the postProcess function like this:

 

/* After all of the objects created with this event registration have been persisted */

foreach ($this->new_group_contacts as $new_group_contact) {

 CRM_Group_Emails_Confirmation::send($new_group_contact);

}

 

Another spot to look at is CRM_Event_Form_Registration_Confirm::processContribution. There’s not quite as much code reduction in this part, but I do think it gets easier to read.

 

One thing that I think will be really nice when using Doctrine is that we can setup methods to handle a lot of the standard text to lookup value stuff to make it easier to use.

This:

$contribParams['contribution_status_id'] = array_search('Pending', $allStatuses);

Becomes:

$contribution->setStatusByName('Pending');

 

Doctrine Implementation

Doctrine has a bunch of ways to work with it, so we’ll layout some details about how we set it up to work with CiviCRM.

 

All the Doctrine files live in ‘src’. Doctrine files are split into two main categories: metadata and entities. The metadata lives in src/Civicrm/metadata and the entities live in src/Civicrm.

 

Metadata

I chose YML for the format of the metadata files because I like YML. No real technical reason there. The YML metadata files define the structure of your data. Doctrine has a tool which can generate the metadata from an existing database and I used that. It isn’t perfect though. It doesn’t seem to pick up one-to-many relationships. So we have to add those manually, here’s an example of how I added phones to contacts in src/metadata/Civicrm.Contact.dcm.yml:

  oneToMany:
       phones:
           cascade: ['persist']
           mappedBy: contact
           targetEntity: Civicrm\Phone

Entities

Once you have the metadata setup, you can use that to generate entities. One big difference between the generated DAO objects and Doctrine entities is that you don’t split the entity code into the generated file (DAO) and the file you add your additional methods (BAO). Doctrine leaves any code you put in the entity file alone when you regenerate your entities. Continuing the example from above, once I added that code to the metadata file, I ran ./vendor/bin/doctrine orm:generate:entities which added the addPhone, removePhone, and getPhones methods to the Contact entity. I then added this line to the addPhone method:

$phone->setContact($this);
That line ensures that when you persist the contact, the phone also gets persisted with a reference to this contact.

Regenerating the entities won’t mess with that code I added to the addPhone method. Because of this sometimes if you make certain changes to the metadata, you need to remove methods from the entity manually. I’ve had to do this a few times and it’s no problem.

 

Repositories

Repositories are another major component of Doctrine that we are making use of. Repositories are basically collections of queries. Instead of putting queries inside the entities, Doctrine puts them in Repository classes. I have an example one that I did to make it easy to get a specific option value out of the civicrm_option_value table:

https://github.com/giant-rabbit/civicrm-core/blob/doctrine/src/Civicrm/OptionValueRepository.php

You can see how it gets used in the Basic Usage section below.

Basic Usage

I created a unit test to test the work I’m doing on the CRM_Event_Form_Registration_Confirm::postProcess function and it provides a good example of how to use Doctrine in the CiviCRM code base. This code creates an event with the minimum associated objects to do a credit card registration.

 

   $entity_manager = CRM_DB_EntityManager::singleton();
   $config = CRM_Core_Config::singleton();
   $domain = $entity_manager->getReference('Civicrm\Domain', 1);
   $civi_event_component = $entity_manager->getRepository('Civicrm\Component')->findOneBy(array('name' => 'CiviEvent'));
   $payment_processor_type = $entity_manager->getRepository('Civicrm\PaymentProcessorType')->findOneBy(array('name' => 'Dummy'));
   $payment_processor = new Civicrm\PaymentProcessor();
   $payment_processor->setBillingMode(1);
   $payment_processor->setDomain($domain);
   $payment_processor->setIsActive(TRUE);
   $payment_processor->setIsDefault(TRUE);
   $payment_processor->setClassName('Payment_Dummy');
   $payment_processor->setName('Test');
   $payment_processor->setPaymentProcessorType($payment_processor_type);
   $entity_manager->persist($payment_processor);
   $entity_manager->flush();
   $price_set = new Civicrm\PriceSet();
   $price_set->setDomain($domain);
   $price_set->setName("Test");
   $price_set->setTitle("Test");
   $price_set->setExtends($civi_event_component->getId());
   $price_field = new Civicrm\PriceField();
   $price_field->setName('Test');
   $price_field->setLabel('Test');
   $price_field->setHtmlType('Text');
   $price_field->setIsActive(TRUE);
   $price_set->addPriceField($price_field);
   $price_field_value = new Civicrm\PriceFieldValue();
   $price_field_value->setName('Test');
   $price_field_value->setLabel('Test');
   $price_field_value->setAmount(22);
   $price_field_value->setDeductibleAmount(0);
   $price_field_value->setIsActive(TRUE);
   $price_field->addPriceFieldValue($price_field_value);
   $participant_role = $entity_manager->getRepository('Civicrm\OptionValue')->findOne('participant_role', 'Attendee');
   $financial_type = $entity_manager->getRepository('Civicrm\FinancialType')->findOneBy(array('name' => 'Event Fee'));
   $event = new Civicrm\Event();
   $event->setTitle('CRM_Event_Form_Registration_ConfirmTest');
   $event->setIsActive(TRUE);
   $event->setIsOnlineRegistration(TRUE);
   $event->setIsMultipleRegistrations(FALSE);
   $event->setIsMonetary(TRUE);
   $event->addPaymentProcessor($payment_processor);
   $event->addPriceSet($price_set);
   $event->setDefaultRoleId($participant_role->getValue());
   $event->setIsEmailConfirm(TRUE);
   $event->setFinancialType($financial_type);
   $entity_manager->persist($event);
   $entity_manager->flush();

Problems

We’re happy to report that Doctrine works well alongside the existing CiviCRM code base. It even handles very nicely some difficult areas like the civicrm_price_set_entity which has a foreign key to different tables based on the entity_table column (see Civicrm.PriceSetEntity.dcm.yml). The only problem I’ve had with so far are fields that store foreign keys in a text field like the payment_processor column in civicrm_event. I think that’s possible to do using some of Doctrine’s hooks, but it’ll need some more time as working inside the hook I think would work well is a little bit complicated.

 

What’s Next

Assuming the CiviCRM community thinks that this is a good way to go, the first step would be to get the Doctrine code integrated into the code base. What I have in the Giant Rabbit github repository is nearly there. The only thing that needs to  be done is that I’m using a new class called CRM_DB_Settings to store the database settings and I think we need to get that loaded from the CIVICRM_DSN constant to keep the settings file backwards compatible. Once that is done, people can just start using Doctrine most places inside CiviCRM.

 

Because we rushed our plan a little bit to get some of the work done inside the postProcess function before the Dalesbridge sprint, we want to go back to the tests and get tests written that cover all the branches in the existing postProcess code. Once that’s done we’ll pick back up and finish converting the postProcess method to use Doctrine.

 

Comments

Looks pretty promising. A few thoughts/questions after reading the blog and the git-commits:

1. Metadata -- YAML seems acceptable, but I would vote for annotations if there are no functional differences. For someone adding a new field, tracing the code, or generating documentation, the annotation approach feels like it would have less cross-referencing/lookup/duplication than the YAML/XML approaches. (But maybe I'm biased because I used annotations in past projects.)

2. Config -- The idea of defining DB settings in a more structured way seems good.

3. General comment -- It looks like there are several improvements/fixes to the auto-generated YML/PHP files. Those make sense.

4. Namespacing -- Looking at a few modules/extensions/patches, we have some variation in how people use namespaces with Civi -- I've seen "\CRM\*", "\CiviCRM\*", and now "\Civicrm\*". We should pick one. Personally, I initially leaned toward \CRM\* (because it was familiar). Someone made the argument it should be different "CRM_*" to avoid confusion about which classes us "_" and which use "\". Now looking at this patch, "\Civicrm\*" looks a bit easier to type than "\CiviCRM", but lowercase "crm" feels weird.  Now... I think "\Civi" makes the most sense. It's easy to type and to pronounce, and it reflects our de facto naming convention (where many projects are "CiviFoo" rather than "CiviCRM-Foo")

5. The "Repository" pattern seems pretty useful. Would we consider migrating static helpers from BAO's to non-static helpers in repositories?

6. Runtime Schema Variations -- Depending on the configuration of custom-data and extensions, a given database may have extra tables/relationships, and I'm wondering how well these would work with Doctrine. There seem to be a few levels, e.g.

  a. You cannot access these tables/relationships at all.
  b. You can use the same datasource as Doctrine and run bare SQL queries -- without any real support from the ORM.
  c. You can use DQL to write queries that touch both core tables and extra tables. You can use EntityManager for basic, single-object CRUD.
  d. You can use the full, combined domain model with mapped PHP fields/functions -- e.g. you can say "$contact->getHRJobs()" (where "HRJob" is an example entity defined by an extension).

It looks to me like we're at "a" or "b" right now. Could we get to "c" by instructing Doctrine to scan for metadata in more locations?

7. The patch currently uses composer to download Doctrine. I'm happy with that... because I think we should be pulling clean copies of our dependencies (so that there's less temptation to hack/fork them). But it poses a small logistical issue because our development team/documentation isn't accustomed to fetching dependencies as a separate step -- and our process is already pretty difficult for contributors. I'm thinking to add "composer" support to civicrm-drupal-project ( https://github.com/civicrm/civicrm-drupal-project )... and then we could recommend that as the baseline for developing core/extension code. Thoughts?

1. Yeah. I don't really have an opinion about YML vs. annotated. I'll give annotated a try.

4. When I started I was worried about namespace conflicts, so I picked something not in use, but I'm pretty sure we could use CRM if we wanted to. I kind of like the idea of just having sort of one namespace which means we should stick with CRM, but it might also make sense to have a different name space for code that follows the PSR0 conventions in which case we could go for Civi for that stuff. I don't really care too much about this, so I'll just go with whatever the consensus is.

5. Ultimately, one of the long terms goals is to get rid of all the BAO static code. So definitely some of it would be moving into the repository patterns.

6. So it turns out that the problem with runtime schema variations isn't really the metadata. Doctrine has an event for messing with the metadata as it gets loaded, so we could do the runtime schema stuff there. The problem, though, is that doesn't affect the entities. The metadata just defines how the database maps to your entity objects, so if you add a column to your metadata you have to add the member variables and the accessor methods to your entity. I've thought about a couple of approaches to this problem, but I'm not really sure what would be good.

Approach A:

You can have Doctrine regenerate your entities with that new schema once you do your runtime metadata, but I'm not sure that's a good idea. If we did do that, here's how I would see it working:

    1. Your extension implements a CiviCRM extension install hook that lets you alter the metadata for Doctrine. So you would add a new column to the civicrm_contact table there. 

    2. CiviCRM would use Doctrine to regenerate the Contact entity php file and it would now have the members and methods for your new columns.

What I don't like about this approach is regenerating the Contact entity file. It definitely introduces a bunch of complications. The upgrade process would be more complicated because it would have to regenerate all the entity files instead of just using the one from the release. Also, the Doctrine entity generation stuff doesn't work well when you remove columns. It seems to just leave the member and methods for the added columns in there and you'd have to remove them manually which would make uninstallation of extensions difficult. If we could get it to work, though, it would be really cool.

Approach B:

Use the Doctrine metadata events to let extensions alter the metadata. This would happen on every page load, not just installation. Then have all our entities inherit from an entity base class that has a bunch of methods for handling schema extensions. So it would have stuff that lets you do stuff like:

$contact->setField('pet_name', 'Fluffy);

$subscription = new \MyExtension\Subscription();

$contact->addOneToMany('subscriptions', $subscription);

$contact->getOneToMany('subscriptions');

$contact->removeOneToMany('subscriptions', $subscription);

$fluffy = new \MyExtension\FurType();

$contact->setManyToOne('fur_type', $fluffy);

We would use the metadata to figure out that from the first argument what table the data is in and how to set it. The thing that's not nice about this is that you now how two sets of interfaces to access columns. One for the standard Doctrine entities and then another set for columns added by extensions. Definitely nicer than how things work now. Also, I'm a little worried about approaches that have the metadata alter hooks called on every page load. We could probably figure out a way to cache that. I think Doctrine already has some metadata caching mechanism that might work with this. I think queries will still be able to use DQL if we setup stuff like this.

7. I'm not too worried about adding a composer step to the setup process for developers. It's a very standard PHP thing now. Also, I come from the Rails world where you always type 'bundle' whenever you grab a Rails project from somewhere, so I think because composer is so well supported, it isn't like trying to generate the DAO stuff which is not well known. Also, we could always start working on a little command-line tool that helps with this sort of stuff like I did for the tests. The tests/run.php command, sets up a MySQL RAM server, grabs the civicrm-packages stuff to packages, creates the DAO objects, downloads and installs Drupal, configures Drupal and CiviCRM for tests and then starts running the tests, so including composer in that is no problem. 

 

It looks like you are doing a bunch of really great things in the snippets - moving logic out of the form layer & reducing complexity in the form layer, using OO classes better, getting rid of PEAR. Yay - all great stuff. I wish I had time to really take a close look.

 

Question, if you were creating an extension with a crudable object now - would you use Doctrine instead of creating a DAO? How easy is it to get up & running with that against current 4.4 environments?

Right this second, I would still use DAO. Just need to work out the naming issue that Tim brought up above and some settings stuff. Once that's done, there's really no reason not to use the Doctrine stuff instead of DAO. This week I'm going to make a branch with just the Doctrine stuff without my changes to the event stuff and get the settings issue fixed. If we can get the naming thing decided, then that should be enough to let people use it. The main downside to using it instead of DAO, is that you miss out on the stort of standard things that happen if you use CRM_Contact_BAO_Contact::createProfileContact. 

Here's my naming preference (using contact as an example):

1. CRM\Contact

2. Civi\Contact

3. CiviCRM\Contact

We could break it out a bit more like it is in the current system, but I'm not sure I see any advantage to that. Since these objects are sort of the center of CiviCRM, it makes sense to me to have them be very high up in the hierarchy. Even if we do them as listed above, you can still have stuff as CRM\Contact\ContactHelper and what not. If we did break it out, it would be like this:

CRM\Contact\Contact and CRM\Event\Participant

We might want to separate out all of the doctrine stuff, but I'm not sure if I think it is necessary:

CRM\Doctrine\Contact

Or mimic the current DAO stuff:

CRM\Contact\Doctrine\Contact

Or:

CRM\Contact\Entity\Contact

Opinions?

I'm not hugely keen on including Doctrine in the name per

CRM\Doctrine\Contact

CRM\Contact\Doctrine\Contact

 I think logic / functionality rather than technology is a better long-term convention.

 

I'll watch this space to see when you think Doctrine is a better option. I've masked the lack of a BAO/ DAO in my extension with a heavily 'to-do'd' API @ the moment but my api would look a lot tidier if there was a proper DB layer behind it.

Ok. I've started over a little bit and have gone with this namespacing:

Civi\Contact\Contact

Civi\Core\Phone

I decided not to go with CRM just to make it easier in case there might be some namespace conflicts. 

I also switched to using annotations for the metadata instead of YAML. Still a little unsure about this decision. I think the YAML was cleaner. I think the only technical reason not to use annotations is that we might have to do some stuff to deal with performance. If we start puting other code in src\Civi besides the entities, the Doctrine code that parses the entities will have to read through all those other files as well as the entities to do its parsing. I have it setup to cache the parsed annotations, so it shouldn't be an issue.

I have the new version up at https://github.com/giant-rabbit/civicrm-core/tree/doctrine if anyone wants to take a look at it. If anyone has any objections, make them soon as I'll want to get started writing code based on this stuff next week.

 

I am really excited about what you have done Peter, and definitely think we should push CiviCRM in this direction - Doctrine, composer, a modern replacement for QuickForm - the whole stack is overdue for an overhaul!

Would Doctrine replace the current DAO layer (and therefore should not implement any business logic), or replace both the DAO and BAO by adding business logic methods to the data access object?

If replacing just the DAO, then shouldn't the forms code call the BAO objects rather than the DAO/Doctrine directly? And then our refactoring efforts focus on porting the BAO to use this new Doctrine-based layer rather than the current DAO.

If also replacing the BAO, then I would get worried about the refactoring that would be implied with such a change: from an architectural perspective, I have always considered the BAO as the central layer in CiviCRM - it rests on top of the DAO, and supports the Forms layer, the API's, and the extensions. Theoretically, no code whether in forms, api or extensions should call the DAO directly (yeahh, I know, not always the case!). So replacing the BAO with a new Doctrine-based layer would imply refactoring most of CiviCRM in the long term.

Either ways you can count me in on this effort.

 

 

The DAO layer is close but not quite the same as Doctrine's entity-class layer -- e.g. it's acceptable to add some extra domain logic to an entity class, but the DAOs *never* have any logic. The BAO layer would likely be split up in various ways -- some logic moves from a BAO to an entity class; some moves to a repository class; some moves to a new class.

I like the idea of listing some specific refactorings that will help us move from the DAO/BAO architecture to an entity/repository architecture. If we do these refactorings across the entire codebase (but in sequence), then we'll always know where things stand generally. For example, the steps might be:

  1. Separate BAO/DAO class hierarchies. The overwhelming majority of BAO code consists of static helpers (which could be placed anywhere -- we just call them BAO by convention). It will be easier to plan refactorings if we can rely on this. (Crudely speaking, we should remove references to $this and self:: from BAOs, and we should change the base-class.)
  2. Convert DAOs to Doctrine entities. (e.g. "$x = new CRM_Foo_DAO_Foo();" => "$x = new Civi\Foo();")
  3. Convert BAOs to Doctrine repositories -- e.g. change "CRM_Foo_BAO_Foo::twiddle($id)" to "$entityManager->getRepository(...)->twiddle($id)".
  4. Move single-record methods from repository-classes to entity-classes. (e.g. "$entityManager->getRepository(...)->twiddle($id)" => "$entityManager->find($id)->twiddle()".

There may be a better series of refactorings, but I think it's important that (a) we define a series steps that we can visualize and (b) that we provide some consistency/integrity within the codebase at each step.

Tim, I kind of get your #1 point but part of me feels a bit queasy that it sounds like we are moving further from OO rather than closer to it. However, I guess where I'm hoping to get to is that the entities should have relevant and thought out actions on them rather than a multitude of 'some what related to contribution actions'.  Do you imagine having the business logic in an OOP style at some point...?

"The only problem I’ve had with so far are fields that store foreign keys in a text field like the payment_processor column in civicrm_event"

 

They are a few places where there is a mismatch between the field and it's FK, most notably the value on the civicrm_option_values table.

I think we shouldn't try to get Doctrine hook its way around it. It's a schema bug to have that mismatch and we need to change the type of value, or create a new column, or create a separate table per group.

The later would have my preference if symfony can generate a simple CRUD admin interface for a new table from the YAML directly without any need to code.

X+

Hi Peter,

I think your approach is better, cos the fresh start is really what's needed most of all.  However, before I became aware of your work, I wrote a Doctrine integration which annotates the existing DAO classes.  Convergent evolution, whoohoo!  Please see,

https://github.com/civicrm/civicrm-core/pull/2124

There is also a PR on -packages to add Doctrine/orm, if that's helpful:

https://github.com/civicrm/civicrm-packages/pull/36

It's awesome to see a movement in this direction!  Customizability of entity behavior and the presentation layer have definitely been our pain points at Wikimedia as well.

-Adam