Many modern web applications have spam deterrents such as CAPTCHAs, Bayesian filters, and URL or IP detection. One example: try two consecutive searches on the CiviCRM.org forum and you will get an error that looks like
"Your last search was less than 5 seconds ago. Please try again later."
The concept behind this flood control is to prevent a webbot (automated script) from spamming and flooding the server.
Sometimes this technique is useful in place of something like a CAPTCHA system: when someone performs a search on the forum, it would be annoying to have to play the guessing game with a CAPTCHA every time, which discourages use of the search functionality.
We are applying the same concept to the CiviContribute contribution page in an attempt to stop spammers from using the contribution form as a gateway to test fake or stolen credit cards. See the code in the link below (pastebin):
The concept here is very simple: when a contribution form is successfully submitted, we insert a record into CiviCRM's cache table containing the user's IP address, the contribution page ID and the timestamp of when the form was successfully submitted. During form validation we check whether the same incoming IP address has submitted the form within the flooding interval (in this case 60 seconds) and, if so, give them a gentle error message.
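The logic above can be sketched as follows. This is a minimal Python stand-in, not the actual CiviCRM code: the in-memory dict plays the role of the cache table, and the function names and the 60-second interval are assumptions taken from the description.

```python
import time

FLOOD_INTERVAL = 60  # seconds between allowed submissions, per the post

# Hypothetical in-memory stand-in for the cache-table records:
# maps (ip, page_id) -> timestamp of the last successful submission.
_flood_log = {}

def check_flood(ip, page_id, now=None):
    """Run during form validation: return an error message if this IP
    submitted this page within the flood interval, else None."""
    now = time.time() if now is None else now
    last = _flood_log.get((ip, page_id))
    if last is not None and now - last < FLOOD_INTERVAL:
        return ("Your last submission was less than %d seconds ago. "
                "Please try again later." % FLOOD_INTERVAL)
    return None

def record_submission(ip, page_id, now=None):
    """Run after a successful submission: record IP, page and timestamp."""
    _flood_log[(ip, page_id)] = time.time() if now is None else now
```

A second submission from the same IP for the same page inside the interval gets the error; once the interval has elapsed, the check passes again.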
The downside to this approach is that if the user is behind a proxy, the IP address recorded will be the proxy's, so another person behind the same proxy attempting to submit the form within the interval will see the error.
In any case, this is a rather simple implementation and can be used on any CiviCRM form.
Hope you guys find this helpful,
I'd recommend not using the cache table - this is not a cache and you wouldn't want to remove flood events when the cache is invalidated.
Also to get around the same-IP address issue you can start a session and record the session ID.
Agree with dalin, it isn't a cache ;)
The session ID is probably too trivial to bypass (if your spam bot doesn't accept cookies, it offers no protection at all), isn't it?
since i suggested the cache table, i'll defend the idea :)

1. Avoids creating another table to store this.
2. The cache table has a "group key" to demarcate various sections. Typically most cache cleanups involve only specific "group keys".
3. The data contained is valid for 1 / 5 / 10 mins, so the lifetime is quite brief. We are trying to reduce flood requests, not eliminate them completely.

lobo
Thanks for the suggestion guys.
I looked at a few different flood control implementations and essentially they broke down into session-based or database-based. Obviously there are pros and cons to both approaches (as xavier mentioned). From my previous experience writing webbots for remote posting, the database-based solution was more reliable.
It would probably also be good practice to use a mechanism such as Drupal's cron to clean up the flood control records in the cache table.
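A periodic cleanup job could look something like this sketch (again Python rather than the actual PHP, with the record store assumed to be a dict keyed by (ip, page_id) with timestamp values, as in the flood-check description):

```python
def purge_expired(flood_log, max_age, now):
    """Remove flood records older than max_age seconds.

    flood_log: dict mapping (ip, page_id) -> last submission timestamp.
    Returns the number of records removed. Intended to be run from a
    scheduled job such as Drupal cron.
    """
    stale = [key for key, ts in flood_log.items() if now - ts > max_age]
    for key in stale:
        del flood_log[key]
    return len(stale)
```

Since the records only matter for the length of the flood interval, anything older than a few minutes can be dropped safely.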
There is another easy way to block spam-bots, that is far simpler:
Simply put an input field in the form that is hidden via CSS. No human user would edit the value of this field; only bots (which don't interpret CSS) would fill it in. So getting rid of generic bots is as easy as blocking all non-default values for that field.
The downside is that it doesn't work with bots tailored to this form. But there are ways to make it harder for them too.
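This honeypot check is a one-liner on the server side. A Python sketch (the field name "website" is just an example of something a generic form-filling bot would be tempted to fill in):

```python
def is_honeypot_triggered(form_data, honeypot_field="website"):
    """Return True if the CSS-hidden field carries a non-default value,
    i.e. the submission is likely from a bot that ignored the stylesheet."""
    return form_data.get(honeypot_field, "") != ""
```

A submission where the hidden field is empty (or absent) passes; any non-empty value gets the submission rejected.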