Taking Config Management to the Next Level

Published
2013-02-10 04:38

As part of my course, I have been researching what it would take to plug an external storage engine into CiviCRM, and how other open source systems do it. The answer lies in a better config system that allows this to be done in a scalable, pluggable manner. As I make progress, I'll show more reasons to get excited and curious about building a better config system. Drupal 8 has spent a fair bit of time on configuration management to make things easier, and we shouldn't shy away from learning from them and others.
 

Standardizing config with hierarchical storage


The second layer of Drupal's proposed configuration system moves config data from the file system into the database for fast access, so that all config data is stored in one place. CiviCRM has already been doing this decently for quite some time with the civicrm_settings table. The new things to learn here are:

1. Standardising access to config

So we could have things like:

$config->system->cms->framework_version
$config->system->profile->double_optin
$config->component->civimail->replies
2. Storing information hierarchically in a tree of configuration objects

If we try to categorize CiviCRM configuration and put it in a tree structure, this is how the "group_name" column of the civicrm_settings table (or "config branches") might look:

civicrm.system
civicrm.system.cms
civicrm.system.custom
civicrm.system.mail
civicrm.system.profile
civicrm.system.activity
civicrm.system.logging
civicrm.system.report
....
civicrm.component // stores general component level config settings
civicrm.component.contribute
civicrm.component.event
civicrm.component.campaign
civicrm.component.campaign.petition
...
civicrm.extension // stores general extension config settings
civicrm.extension.my-extension // extension specific config settings
civicrm.extension.demoqueue
civicrm.extension.search-basic
...


Please note these naming conventions are not fixed; I'm just throwing out ideas on how things might look. This is how configs could be accessed and worked with:

$config = civicrm_config('civicrm.system.cms');
echo $config->userFramework;
echo $config->userFrameworkVersion;
echo $config->userFrameworkClass;
echo $config->userHookClass;

$config = civicrm_config('civicrm.system.profile');
// Hyphenated property names need PHP's brace syntax: $config->{'name'}
$config->{'double-optin'} = true;
$config->{'add-to-group-double-optin'} = false;
$config->save();

// Loads all config under the 'civicrm.system.' prefix
$config = civicrm_config('civicrm.system');
echo $config->profile->{'double-optin'};
echo $config->cms->userFrameworkVersion;
echo $config->custom->templateDir;
echo $config->custom->phpDir;

// When we are interested in core settings only
$config = civicrm_config('civicrm.system.core');
echo $config->{'base-url'};
echo $config->{'root-dir'};
echo $config->{'compile-dir'};

// Component namespace
$mailConfig = civicrm_config('civicrm.component.civimail');
echo $mailConfig->workflow;
echo $mailConfig->replies;
3. Storing a new revision of the config every time it changes, which helps with rollbacks in case of failures.

4. Not loading the entire config for every page request.
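To make points 2 to 4 concrete, here is a minimal Python sketch (Python is used purely as a language-neutral illustration; the real code would be PHP). It assumes a hypothetical in-memory store keyed by dotted group names: saving appends a revision instead of overwriting, rollback drops the newest revision, and loading a prefix such as civicrm.system returns only that branch, converted into nested objects. `ConfigStore` and its methods are invented for this example; in CiviCRM the prefix lookup would presumably become a query on the group_name column of civicrm_settings.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert nested dicts into attribute-accessible objects."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    return value


class ConfigStore:
    """Hypothetical store keyed by dotted group names like 'civicrm.system.cms'."""

    def __init__(self):
        # group name -> list of revisions (dicts of settings); the last one is current
        self._revisions = {}

    def save(self, group, settings):
        # Append a new revision instead of overwriting, so older ones stay available.
        self._revisions.setdefault(group, []).append(dict(settings))

    def rollback(self, group):
        # Drop the newest revision, making the previous one current again.
        history = self._revisions.get(group, [])
        if len(history) < 2:
            raise ValueError(f"no earlier revision of {group!r} to roll back to")
        history.pop()

    def load(self, prefix):
        """Load only the requested branch: the group itself plus its sub-groups."""
        tree = {}
        for group, history in self._revisions.items():
            if group == prefix:
                tree.update(history[-1])
            elif group.startswith(prefix + "."):
                # 'civicrm.system.cms' loaded under 'civicrm.system' becomes tree['cms']
                node = tree
                for part in group[len(prefix) + 1:].split("."):
                    node = node.setdefault(part, {})
                node.update(history[-1])
        return to_namespace(tree)
```

With this layout, `store.load("civicrm.system").cms.userFramework` mirrors the `$config->cms->userFramework` style above, and groups outside the requested prefix (for example civicrm.component.*) are never touched.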
 

So much for the primary config. What about other site-wide configs such as event/contribution pages, custom sets and profiles? They could extend the same config system. Since they are stored as serialized arrays, it's possible to create new configs from existing ones. For example, how cool would it be to create a new profile by merging two existing profiles, something like:

civicrm.system.profile.page3 = array_merge(civicrm.system.profile.page1, civicrm.system.profile.page2)
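PHP's array_merge() on string-keyed arrays is a shallow merge in which later values overwrite earlier ones. A minimal Python sketch of the same idea follows; the profile structure and its "fields" key are hypothetical, and the field lists are unioned rather than overwritten so the new profile collects the fields of both sources.

```python
def merge_profiles(first, second):
    """Shallow merge like PHP array_merge() on string keys: values from `second` win.

    The hypothetical 'fields' lists are concatenated instead (skipping duplicates),
    so the merged profile contains the fields of both source profiles.
    """
    merged = {**first, **second}
    if "fields" in first or "fields" in second:
        fields = list(first.get("fields", []))  # copy, so `first` is not mutated
        for field in second.get("fields", []):
            if field not in fields:
                fields.append(field)
        merged["fields"] = fields
    return merged
```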

Like Drupal, we will also need an interface (UI, drush/CLI, APIs) to convert and interpret these special configs.

 

Managing Environments with configs on disk

Drupal likes to store all active configs on disk, under a special files/config directory.

$ ls files/config_*
system.site.yml
system.schema.yml
views.schema.yml
views.view.frontpage.yml
...


There is a lot more discussion of other possible file formats (XML, JSON, INI), each with its own pros and cons, here: http://groups.drupal.org/node/159044.

If a config gets corrupted, it is re-stored from disk. So a new config is always written to both disk and the database, with the database storing all revisions. Drupal stores configs on disk to make configuration portable and version-control friendly, so configs can be migrated from, say, a development to a production environment via SFTP or $vcs commit/push/update/pull.
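The dual-write and restore-from-disk behaviour described above can be sketched as follows. This is a hypothetical Python illustration, not Drupal's actual implementation: `DualWriteConfig` and the `config_revision` table are invented names, SQLite stands in for the site database, and JSON stands in for YAML because Python's standard library has no YAML parser.

```python
import json
import sqlite3
from pathlib import Path


class DualWriteConfig:
    """Hypothetical: every save goes to a file on disk and a new DB revision."""

    def __init__(self, config_dir, db_path=":memory:"):
        self.config_dir = Path(config_dir)
        self.config_dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS config_revision ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " name TEXT NOT NULL, data TEXT NOT NULL)"
        )

    def _file(self, name):
        return self.config_dir / f"{name}.json"

    def save(self, name, data):
        payload = json.dumps(data, sort_keys=True)
        self._file(name).write_text(payload)  # portable, version-control friendly copy
        with self.db:  # commits; older revisions are kept, never overwritten
            self.db.execute(
                "INSERT INTO config_revision (name, data) VALUES (?, ?)", (name, payload)
            )

    def load(self, name):
        """Return the newest DB revision; if missing or corrupt, restore it from disk."""
        row = self.db.execute(
            "SELECT data FROM config_revision WHERE name = ? ORDER BY id DESC LIMIT 1",
            (name,),
        ).fetchone()
        try:
            return json.loads(row[0])  # TypeError if row is None
        except (TypeError, json.JSONDecodeError):
            data = json.loads(self._file(name).read_text())
            self.save(name, data)  # re-store the disk copy as a fresh revision
            return data
```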

An individual config file such as block.block.bartik.search.yml looks like:

id: bartik.search
label: Search
uuid: a2398568-7cc6-4a3a-8c20-3210075487bd
region: sidebar_first
weight: '-1'
module: search
status: '1'
visibility:
  path:
    visibility: '0'
    pages: ''
......

 


If you try installing Drupal 8, you will see new directories like:

deepak@bfc:/var/www/d8.loc$ ls -la sites/default/files/config_{$some-random-number}/
drwxrwxr-x 2 www-data www-data 4096 Feb  6 18:39 active/
drwxrwxr-x 2 www-data www-data 4096 Feb  6 18:24 staging/

 


Say you have two Drupal sites, example.com and dev.example.com, set up on two different machines, and we are interested in deploying a site name change from dev to production. Each site has its own data store:

deepak@bfc:/var/www/d8.loc$ ls -la example.com/sites/default/files/config_{$some-random-number}/
drwxrwxr-x 2 www-data www-data 4096 Feb  6 18:39 active/
drwxrwxr-x 2 www-data www-data 4096 Feb  6 18:24 staging/


deepak@bfc:/var/www/d8.loc$ ls -la dev.example.com/sites/default/files/config_{$some-random-number}/
drwxrwxr-x 2 www-data www-data 4096 Feb  6 18:39 active/
drwxrwxr-x 2 www-data www-data 4096 Feb  6 18:24 staging/
 

1. Change the site name at http://dev.example.com/admin/config/system/site-information.
2. Copy system.site.yml from ../dev.example.com/sites/default/files/config_*/active to ../example.com/sites/default/files/config_*/staging. (This can be done manually or via git).
3. At example.com visit admin/config/development/sync and click "Import all".
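The staging-to-active step above can be sketched roughly as follows. This is a simplified, hypothetical Python illustration of what "Import all" does at the file level: it only detects staged files that are new or differ from the active copy and copies them over. Drupal's real import also writes to the database and notifies modules, which this sketch leaves out; `pending_changes` and `import_all` are invented names.

```python
import shutil
from pathlib import Path


def pending_changes(staging_dir, active_dir):
    """Names of staged config files that are new or differ from the active copy."""
    changes = []
    for staged in sorted(Path(staging_dir).glob("*.yml")):
        active = Path(active_dir) / staged.name
        # Compare full contents; a missing active file counts as a change.
        if not active.exists() or staged.read_bytes() != active.read_bytes():
            changes.append(staged.name)
    return changes


def import_all(staging_dir, active_dir):
    """Rough file-level analogue of 'Import all': copy pending files into active/."""
    changed = pending_changes(staging_dir, active_dir)
    for name in changed:
        shutil.copy2(Path(staging_dir) / name, Path(active_dir) / name)
    return changed
```

Listing the pending changes before copying mirrors the review step the sync page offers before you click "Import all".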

Another interesting thing here is being able to store these configs in pluggable storage backends in addition to the file system, such as APC, chdb, MongoDB or some other system, for increased performance. It seems Drupal is going to benefit from the Symfony framework here.

I have been trying to figure out what the configuration for these pluggable storage backends is going to look like, but haven't had any success. Considering that the primary settings.php is not going to change, it seems it could still be $conf or $database, in a slightly more generalized form (for MongoDB, for example):

// ...
$conf['mongodb_connections'] = array(
  // Connection name/alias
  'default' => array(
    // Omit USER:PASS@ if Mongo isn't configured to use authentication.
    'host' => 'localhost',
    // Database name
    'db' => 'drupal_mongo',
  ),
);
$conf['cache_default_class']         = 'DrupalMongoDBCache';
$conf['page_cache_without_database'] = TRUE;
$conf['page_cache_invoke_hooks']     = FALSE;
// ...
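One way to picture pluggable storage is a common read/write interface with interchangeable backends, similar in spirit to how `$conf['cache_default_class']` above names the cache class to use. The Python sketch below is hypothetical: the class names are invented, `MemoryStorage` merely stands in for a fast backend such as APC or MongoDB, and `CachedStorage` layers a fast backend over a slower, authoritative one.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class ConfigStorage(ABC):
    """Hypothetical interface that every storage backend would implement."""

    @abstractmethod
    def read(self, name): ...

    @abstractmethod
    def write(self, name, data): ...


class MemoryStorage(ConfigStorage):
    """Stand-in for a fast backend such as APC or MongoDB."""

    def __init__(self):
        self._data = {}

    def read(self, name):
        return self._data.get(name)

    def write(self, name, data):
        self._data[name] = data


class FileStorage(ConfigStorage):
    """Stores each config as a JSON file in a directory."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def read(self, name):
        path = self.directory / f"{name}.json"
        return json.loads(path.read_text()) if path.exists() else None

    def write(self, name, data):
        (self.directory / f"{name}.json").write_text(json.dumps(data))


class CachedStorage(ConfigStorage):
    """Reads from the fast backend first, filling it from the slow one on a miss."""

    def __init__(self, fast, slow):
        self.fast, self.slow = fast, slow

    def read(self, name):
        data = self.fast.read(name)
        if data is None:
            data = self.slow.read(name)
            if data is not None:
                self.fast.write(name, data)
        return data

    def write(self, name, data):
        self.slow.write(name, data)  # authoritative copy first
        self.fast.write(name, data)
```

Callers only see `read()` and `write()`, so swapping the file backend for a database or cache backend would not change the code that uses the config.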

Managing Environments in CiviCRM

Making config changes is probably the best time to think about providing better support for managing environments: dev, staging and production.

Given the new config tree structure, the civicrm.settings.php file format could be changed to XML, JSON, INI or something else. Assuming XML for now, here is how the file might look:

civicrm.system.xml (or would civicrm.config.xml sound better?)

<configdata>
    <production>
        <base-url>live.example.com</base-url>
        <root-dir>/var/www/htdocs/drupal/sites/all/modules/civicrm</root-dir>
        <compile-dir>/var/www/htdocs/drupal/sites/default/files/civicrm/templates_c</compile-dir>
        <database>
            <default>
                <driver>mysql</driver>
                <host>db.example.com</host>
                <dbname>dbname</dbname>
                <username>dbuser</username>
                <password>secret</password>
            </default>
        </database>
        <cms>
            <user-framework>foo</user-framework>
            <user-framework-dsn>foo://foo@bar</user-framework-dsn>
        </cms>
        <component>
            <civimail>
                <workflow>false</workflow>
                <replies>true</replies>
            </civimail>
        </component>
    </production>
    <staging extends="production">
        <debug>true</debug>
        <base-url>dev.example.com</base-url>
        <compile-dir>/var/www/htdocs/drupal/sites/dev.example.com/files/civicrm/templates_c</compile-dir>
        <database>
            <default>
                <driver>mysql</driver>
                <host>dev.example.com</host>
                <dbname>devdbname</dbname>
                <username>devuser</username>
                <password>devsecret</password>
            </default>
        </database>
    </staging>
</configdata>

 


An interesting aspect of this Zend-style layout (as used by Zend_Config_Xml) is that a section can override specific values and inherit the rest of the properties.
This is how the initialization code might look:

// Builds and stores the config tree; could, say, return the civicrm.system.core branch.
// An APPLICATION_ENV environment variable configured in the virtual host could alternatively take precedence.
$config = civicrm_config_init('path/to/civicrm.config.xml', 'staging');
// Working with multiple databases
$config = civicrm_config('civicrm.system');
crm_db_activate($config->database->extra);
// Execute queries here against the additional database whose DSN is named 'extra'
crm_db_activate();
// Back to working with the default database ($config->database->default)
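The override-and-inherit behaviour can be sketched in Python as: parse each environment section into a nested dict, then deep-merge it on top of the section named in its `extends` attribute. This is a hypothetical illustration in the spirit of Zend_Config_Xml, not its actual code; `load_section` and the helpers are invented names, and leaf values are kept as plain strings.

```python
import xml.etree.ElementTree as ET


def element_to_value(element):
    """Leaf elements become strings; elements with children become dicts."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: element_to_value(child) for child in children}


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_section(xml_text, section):
    """Resolve one environment section, following its 'extends' chain."""
    sections = {child.tag: child for child in ET.fromstring(xml_text)}

    def resolve(name, seen=()):
        if name in seen:
            raise ValueError(f"circular 'extends' chain at {name!r}")
        element = sections[name]
        values = element_to_value(element)
        parent = element.get("extends")
        return deep_merge(resolve(parent, seen + (name,)), values) if parent else values

    return resolve(section)
```

Because the merge is recursive, a staging section that overrides only `<database><default><host>` would still inherit dbname, username and the rest from production.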

The whole idea of this post is to look at configs from a new perspective, evaluating what Drupal has already done and how that could help CiviCRM. When it comes to implementation there is still a lot to discuss and think about, for example converting nested arrays into nested objects, import/export/sync/parsing, writing to disk, and the API implementation. As we move forward and make progress, we'll write more detailed posts on particular implementations.

You can read more about Drupal's configuration management initiative here, in case you haven't already: http://groups.drupal.org/node/155559.


Comments

1. Terminology: "Group Name", "Config File", "Config Branches", "Config Volume", etc

In civicrm_settings, we organize items with "group_name". In the last link, the Drupal Configuration API is described as supporting a function "config($file)" where "$file" is basically the name of a config-file on disk (omitting the file's base directory and extension).

The two serve basically the same purpose -- e.g. the group_name or $file indicates ownership and purpose (e.g. what module created the group and why), and it helps manage performance (D8 reads an entire file on demand; Civi can fetch from settings based on group_name).

Within each group_name or $file, there are several more individual pieces of configuration data.

What terminology should we use for these concepts? How about:

 * "ConfigGroup" for a volume, file, or list of closely related settings
 * "ConfigItem" for a record or subtree stored inside a ConfigGroup

2. Standardized access to config

The notations in #1 and #2 of the blog post look a little different to me:

 * Get one root object; all ConfigGroups and ConfigItems can be accessed as properties  (e.g. "$config->system->cms->framework_version")
 * Get an object for the ConfigGroup ($config = config('civicrm.system.cms')) and then access ConfigItems as properties ("$config->framework_version")

It looks like the first notation ("$config->system->cms->framework_version") might just be an introductory example, and the second more fully formed.  The second also aligns better with the designs in civicrm_settings (group_name) and D8 Config API (file).

3. Hierarchy

It's a good idea to come up with one naming convention that captures our myriad packages (core code, components, native CiviCRM extensions, Drupal modules, Joomla plugins, etc).  In the cases of hook_civicrm_managed and the Resources API, we had similar issues and used these conventions:

 * Each string is "an extension key"
 * For "native CiviCRM extensions", the key is a reverse-domain  
   (like "nz.co.fuzion.omngateway" which is mentioned in info.xml)
 * For core code and components, the key is "civicrm"
 * For Drupal modules, the key is "drupal.{$module_name}"
 * For Joomla plugins, the key is "joomla.{$plugin_name}"

The "extension key" (with those conventions) would be a good candidate for ConfigGroups.
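The conventions listed above can be captured in one small helper, sketched here in Python. The package "kind" labels ("core", "component", "extension", "drupal", "joomla") and the function name are hypothetical; only the resulting key formats come from the list.

```python
def extension_key(kind, name=None):
    """Build an 'extension key' following the conventions listed above (sketch)."""
    if kind in ("core", "component"):
        return "civicrm"
    if kind == "extension":
        # Native CiviCRM extensions already have a reverse-domain key from info.xml.
        return name
    if kind in ("drupal", "joomla"):
        return f"{kind}.{name}"
    raise ValueError(f"unknown package kind: {kind!r}")
```

A ConfigGroup name could then be built by prefixing with this key, so settings from different packages cannot collide.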

4. Richness of metadata

In 4.3, Eileen's done a lot of work to produce much richer metadata about our configuration. This goes deeper than D8's approach. For example, each setting is declared with default values, data types, validation, etc. This is enough to create web configuration forms without any extra coding for each ConfigItem:

 * https://github.com/eileenmcnaughton/eu.tttp.setting

 * http://svn.civicrm.org/civicrm/trunk/settings/

Drupal's approach to this metadata is... more lazy? Although some folks have tried to add it in via module:

 * http://drupal.org/project/variable

It's cool functionality -- would be nice to keep it.
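As a rough illustration of why declared metadata is so useful, the Python sketch below derives defaults, type validation and generic form fields from one table. The metadata shape (keys "type", "default", "title") and the widget mapping are simplified and hypothetical; the real 4.3 settings files carry more keys.

```python
# Hypothetical, simplified metadata; real settings declarations carry more keys.
SETTINGS_METADATA = {
    "debug_enabled": {"type": "Boolean", "default": False, "title": "Enable Debugging"},
    "max_attachments": {"type": "Integer", "default": 3, "title": "Maximum Attachments"},
}

TYPES = {"Boolean": bool, "Integer": int, "String": str}  # illustrative subset
WIDGETS = {"Boolean": "checkbox", "Integer": "text", "String": "text"}


def validate_setting(name, value):
    """Reject values whose type does not match the declared type."""
    spec = SETTINGS_METADATA[name]  # KeyError for undeclared settings
    expected = TYPES[spec["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for non-Boolean settings.
    if isinstance(value, bool) and expected is not bool:
        raise TypeError(f"{name} expects {spec['type']}")
    if not isinstance(value, expected):
        raise TypeError(f"{name} expects {spec['type']}")
    return value


def effective_settings(stored):
    """Declared defaults, overridden by whatever has been stored (after validation)."""
    values = {name: spec["default"] for name, spec in SETTINGS_METADATA.items()}
    for name, value in stored.items():
        values[name] = validate_setting(name, value)
    return values


def form_fields():
    """Enough information to render a generic settings form with no per-setting code."""
    return [(name, spec["title"], WIDGETS[spec["type"]])
            for name, spec in SETTINGS_METADATA.items()]
```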

5. I don't think it would be a good idea to store the DB credentials for staging/production in the same file. For starters, one splits staging and production due to a lack of trust. For example, you may trust a developer enough to work on staging sites -- but not enough to work on production. If you put the credentials in the same file, then the developer gets access to both. Similarly, you expect bugs to go into staging -- e.g. the staging system might "accidentally" read the credentials for the production system -- and proceed to trample on production data. If the staging/production systems store their DB credentials independently, then these risks go away.

#1. By ConfigItem, do you mean something that holds metadata for a particular group/level? ConfigGroup sounds good.

Just thinking: it might be better if individual pieces/columns of config data were also part of the returned object.

Example:

$config = config('civicrm.system.cms');
$config->created_id;
$config->created_date;
...

#2. The idea is config('level') also retrieves sub-levels. And properties are accessed in the hierarchical way they appear.

#3. For configs, Drupal is also trying to use the module.my-module.* and system.module.module-name.* naming conventions.
Exact extension-key being part of branch name is fine too.

#4. We might also need to consider permissions (to write/change a config) as another piece of metadata.
I haven't checked Eileen's work yet, but I like the idea of defaults (and others) in the DB. Will check.

#5. How about civicrm.settings.{$Environment}.xml files in that case, that is each environment has its own settings file.
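A per-environment file could be selected at bootstrap from an environment variable, for example the APPLICATION_ENV variable mentioned in the post. The Python sketch below is hypothetical: the function name, the "production" fallback and the alphanumeric check (which keeps values like "../x" out of the path) are illustrative choices.

```python
import os
from pathlib import Path


def settings_file(config_dir, environ=None, default="production"):
    """Pick civicrm.settings.{env}.xml based on an APPLICATION_ENV-style variable."""
    environ = os.environ if environ is None else environ
    env = environ.get("APPLICATION_ENV", default)
    if not env.isalnum():
        # Refuse anything that could escape config_dir, e.g. '../etc'.
        raise ValueError(f"invalid environment name: {env!r}")
    path = Path(config_dir) / f"civicrm.settings.{env}.xml"
    if not path.is_file():
        raise FileNotFoundError(path)
    return path
```

Since each server only needs its own file, staging machines never have to hold production credentials, which addresses the trust concern raised in point 5.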
 

The Drupal variable module (above) turns on an interface for editing variables. However, it relies on the variables' metadata being declared via a hook. From what I see, most Drupal modules of significance declare their variables in this way.

Deepak - one thing I found when I looked at declaring the settings metadata (in the settings directory as per Tim's comment) was that xml did not easily cope with the custom separator character - which is the default for a bunch of fields.

 

One reason why the settings metadata needs to be declared if settings are to be set by something other than the form layer (e.g. the setting.create API call) is that there are arrays stored as arrays and arrays stored as separated strings in the settings table. So, prior to 4.3, the information about how to serialize the array lived only in the form layer.
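The Python sketch below shows how per-setting metadata could drive (de)serialization outside the form layer. Assumptions, all labelled: the separator is taken to be chr(1), matching what CiviCRM calls VALUE_SEPARATOR; separated values are wrapped in leading and trailing separators; JSON stands in for PHP's serialize(); and the mapping of the two example settings to storage formats is illustrative only.

```python
import json

SEPARATOR = "\x01"  # assumed to match CiviCRM's VALUE_SEPARATOR

# Hypothetical metadata: how each array-valued setting is stored.
STORAGE_FORMAT = {
    "contact_view_options": "separated",
    "mailing_backend": "serialized",
}


def encode(name, value):
    fmt = STORAGE_FORMAT[name]
    if fmt == "separated":
        # Wrap with separators on both ends, e.g. [1, 2] -> "\x011\x012\x01"
        return SEPARATOR + SEPARATOR.join(str(v) for v in value) + SEPARATOR
    if fmt == "serialized":
        return json.dumps(value)  # stand-in for PHP's serialize()
    raise ValueError(f"unknown storage format for {name!r}")


def decode(name, stored):
    fmt = STORAGE_FORMAT[name]
    if fmt == "separated":
        # Separated strings lose their original types: every item comes back as a str.
        return [part for part in stored.strip(SEPARATOR).split(SEPARATOR) if part]
    if fmt == "serialized":
        return json.loads(stored)
    raise ValueError(f"unknown storage format for {name!r}")
```

Without the metadata table, an API call has no way of knowing which of the two encodings a given setting expects, which is exactly the gap described above.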

 

I also looked at the possibility of hooks wanting to change the defaults for settings, the thinking being that if you installed an 'India' extension, it could change the defaults for all the India-related settings, and you could view and revert them to those defaults as appropriate (which is what the CiviConfigure extension is about).
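The layering described above (core defaults, defaults altered by an extension, then stored values, with "revert" meaning "forget the stored value") maps neatly onto Python's `collections.ChainMap`. This is a hypothetical sketch; `LayeredSettings` and `alter_defaults` are invented names standing in for whatever hook CiviCRM would actually expose.

```python
from collections import ChainMap


class LayeredSettings:
    """Stored values override extension-supplied defaults, which override core defaults."""

    def __init__(self, core_defaults):
        self.core_defaults = dict(core_defaults)
        self.extension_defaults = {}
        self.stored = {}
        # Lookups search the maps left to right; the ChainMap keeps references,
        # so later updates to any layer are visible immediately.
        self.view = ChainMap(self.stored, self.extension_defaults, self.core_defaults)

    def alter_defaults(self, overrides):
        """What a hypothetical defaults hook (e.g. an 'India' extension) might contribute."""
        self.extension_defaults.update(overrides)

    def set(self, name, value):
        self.stored[name] = value

    def revert(self, name):
        """Drop the stored value so the (possibly extension-altered) default applies again."""
        self.stored.pop(name, None)

    def get(self, name):
        return self.view[name]
```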

 

NB: in 4.3 it is possible to set settings using the API/drush:

 drush civicrm-api setting.create debug_enabled=1 userFrameworkLogging=1

civicrm_api('setting', 'create', array('debug_enabled' => 1, 'userFrameworkLogging' => 1));

etc

My notes are here

http://wiki.civicrm.org/confluence/display/CRM/Settings+Metadata+and+Usage#SettingsMetadataandUsage-Drush

 

As a bit of an aside (but relevant if refactoring): it would be good to keep on our radar that we should try to get the DSNs out of the variables that are exposed when you use smartyDebug, to make leaving debugging on less risky.