Preparing for a session at CiviCon London last month, I realised I needed better data than the out-of-the-box sample data. It's a situation I've found myself in a few times, so rather than write a one-off script to generate the data, I decided to try to write something more generic and reusable. The result is something I am calling CiviPop (or Pop if three syllables / 7 letters is too much for you).
The basic idea is that you create a 'pop file' (in YAML format) that summarizes the data you want to generate, and let Pop do the hard work of generating realistic-sounding names, associated entities, etc.
At its simplest, you can use it to create a single set of entities. For example, the following will create 1,000 individuals.
- Individual: 1000
Behind the scenes, Pop is giving each individual a realistic name (using the Faker library) and is creating associated entities as well: each contact will have between 0 and 3 emails, and between 0 and 2 phone numbers and postal addresses.
You can create more complex Pop files that contain multiple instructions, some of which have extra parameters to override defaults:
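To make the idea of "associated entities" concrete, here is a minimal Python sketch of the behaviour described above. It is not Pop's actual code: the function names, the hard-coded name lists (standing in for what a library like Faker would supply) and the record layout are all assumptions made for illustration.

```python
import random

# Hypothetical stand-ins for the names a library such as Faker would supply.
FIRST_NAMES = ["Alice", "Bilal", "Chen", "Dana"]
LAST_NAMES = ["Okafor", "Smith", "Garcia", "Novak"]


def make_individual(rng):
    """Build one fake individual with a random number of related records."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # randint is inclusive at both ends, matching "between 0 and 3" emails.
    emails = [
        f"{first}.{last}{i}@example.org".lower()
        for i in range(rng.randint(0, 3))
    ]
    # Likewise "between 0 and 2" phone numbers and postal addresses.
    phones = [f"555-01{rng.randint(0, 99):02d}" for _ in range(rng.randint(0, 2))]
    addresses = [
        f"{rng.randint(1, 200)} Example Street" for _ in range(rng.randint(0, 2))
    ]
    return {
        "first_name": first,
        "last_name": last,
        "emails": emails,
        "phones": phones,
        "addresses": addresses,
    }


def make_individuals(count, seed=None):
    """Generate `count` individuals; pass a seed for repeatable output."""
    rng = random.Random(seed)
    return [make_individual(rng) for _ in range(count)]
```

Calling `make_individuals(1000)` would be the rough equivalent of the one-line pop file above, except that it only builds dictionaries in memory rather than creating anything in CiviCRM.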
- Individual: 1000
  children:
    - Contribution: 10-50 # Each individual will be given between 10 and 50 contributions
- Event: 30
  fields:
    is_online_registration: true
- Participant: 400
- Group: 70
- GroupContact: 300 # Pop will select already existing Groups and Contacts at random when creating these GroupContact entities.
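Two ideas in this file can be sketched in a few lines of Python: a count that is either a fixed number or a range such as 10-50, and linking entities by picking existing records at random. Again, this is an illustrative sketch and not how Pop is implemented; `resolve_count` and `make_group_contacts` are hypothetical names, and the sketch assumes ranges are inclusive at both ends.

```python
import random


def resolve_count(spec, rng):
    """Turn a count such as 1000 or "10-50" into a concrete number."""
    text = str(spec)
    if "-" not in text:
        return int(text)
    low, high = (int(part) for part in text.split("-", 1))
    # Assumption: both ends of the range are inclusive.
    return rng.randint(low, high)


def make_group_contacts(group_ids, contact_ids, count, rng):
    """Pair randomly chosen existing groups with randomly chosen contacts."""
    return [
        {"group_id": rng.choice(group_ids), "contact_id": rng.choice(contact_ids)}
        for _ in range(count)
    ]
```

For example, `resolve_count("10-50", random.Random())` returns some number from 10 to 50, and `make_group_contacts` draws each pairing independently, so this simple version can produce the same group/contact pair more than once.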
See the README.md for full details of what you can do with Pop syntax.
You can use Pop via CiviCRM's new command line tool cv as follows:
$ cv pop /path/to/pop.yml
It's also available as a standalone library should you feel inclined to integrate it with other tools and processes.
Some situations I'm imagining that Pop might come in handy:
- generating large data sets to stress test code and/or hosting infrastructure (e.g. will the find contribution search still work nicely on this server once we've received 3,000,000 donations?)
- preparing for client demos / conference presentations etc. where the sample data isn't good / realistic enough
- improving the sample data that we distribute with CiviCRM
As of today, the pop command for cv lives in my fork of the cv repo, and there's a PR to add it to the official repo. If you try it, whether or not it works for you, feel free to leave feedback on that PR; that will help it on its way. It's early days and there is lots to improve, but feel free to have a play around and let me know what you think. You can file issues on the Pop repository or email me (michaelmcandrew@thirdsectordesign.org) with any questions or ideas.
Comments
Nice work batman.
Seems like a really useful addition to the tools
Brilliant. For extra points, what about an option for localizing that data - e.g. language and/or country? Or does it just use the existing site settings?
Yeah - that's a nice idea and something I haven't thought about a lot yet. Interested in more detail on what it would look like. Faker does have some localisation that might be helpful.
We did something similar a while ago - in case anyone is interested: https://github.com/systopia/civicrm-fake-data
Afaik, it is localizable as well...