How to Protect Contact Forms from Spam with Proof of Work
Do you operate a personal or business web site? If so, you most likely have a "contact" page with instructions for how a visitor can send you a message. However, if you post an email address directly on a web site, you're asking for trouble. That address will get scraped by bots pretty quickly and will start receiving spam.
How do you avoid publishing an email address? You can add a contact form that sends email from your server instead. However, if you've tried this then you've probably noticed that bots will simply autofill the form and submit it, leaving you right back with your original problem.
How do you prevent spam? By making it costly! In a previous post I described how to set up a paywall for high priority messages:
But perhaps you don't want to charge people money or require that they have bitcoin they can send you in the first place. How can we make it harder for bots to abuse a simple form? Most web engineers would likely say "add a captcha!"
Captchas are a form of proof of work; they rely upon the assumption that computers can't be automated to interpret semi-scrambled images to the level that humans can. So instead of making a computer perform inane, repetitive work, we make humans perform inane, repetitive work. And yet there is no shortage of tools for cracking captchas.
As someone who spends 100% of his time browsing the web behind a VPN or via Tor, I really hate captchas because they treat me as a second-class netizen. I often find myself having to fill out 3+ captcha puzzles because the risk scoring deems my IP address to be suspicious. I prefer not to burden other privacy-conscious folks with this on my own web site. Also, I prefer to limit dependencies on 3rd party APIs... more on that later.
Hashcash to the Rescue!
Hashcash is a proof-of-work algorithm that has been used as a denial-of-service countermeasure in a number of systems.
A hashcash stamp constitutes a proof-of-work which takes a parameterizable amount of work to compute for the sender. The recipient can verify received hashcash stamps efficiently. Hashcash was invented by Adam Back in 1997; you can learn all about it here.
While the original intent was for hashcash to be used to add a cost to sending email and thus deter high volume email spammers, hashcash can be added to any internet service in order to impose a computational cost to clients who might otherwise abuse a server's resources.
How does a hashcash transaction work? The main question comes down to how the puzzle is generated. For email hashcash, the puzzle is related to the date and recipient's email address. For Bitcoin, it's related to the content of the current block header (which includes the hash of the previous block). For protecting the submission of a web form, the server needs to generate a puzzle that can't be guessed ahead of time by the client. The process goes like so:
- The client requests a web page from the server. It can be any page that submits data with a form.
- The server calculates a stamp based on the client’s IP address, the current time on the server, and a server-defined salt value. The stamp is then scrambled using a hash function for security, and is embedded invisibly in the form.
- The client fills out the form and clicks the submit button.
- Before the client’s computer actually submits the data to the server, it gets the hidden stamp from the page, brute forces a collision string (nonce) and adds the nonce into the form fields that get submitted.
- Before the server accepts and processes the data from the client, it checks to see if a hashcash stamp was submitted.
- If there is a stamp, the server checks that it is valid: that it was posted before the stamp expired, and that the hash of the stamp and nonce meets the chosen difficulty target. This "collision check" is performed by looking at the first N bits of the binary representation of the solution hash and checking that they are all equal to 0.
- If the stamp check fails or if there is no stamp, the server returns an error, and can either ignore the submitted data, request that the user try again, or present some alternate way for the client to authenticate itself.
- If the stamp check succeeds, the server accepts the data and processes it as appropriate.
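To make the steps above more concrete, here is a minimal sketch of the whole round trip. It is written in Python purely for illustration (the library discussed in this post is JS/PHP), and it is not that library's code. The names `SALT`, `DIFFICULTY_BITS`, `STAMP_LIFETIME`, `make_stamp`, `solve` and `verify` are all hypothetical. I'm also assuming two design details the steps don't spell out: the stamp carries its issue time in the clear so the server can recompute it, and the "scrambling" is done with an HMAC keyed by the salt.

```python
import hashlib
import hmac

SALT = b"server-secret-salt"   # hypothetical server-side secret
DIFFICULTY_BITS = 12           # kept low so this demo finishes quickly
STAMP_LIFETIME = 600           # assumed validity window, in seconds


def make_stamp(client_ip: str, now: int) -> str:
    """Server: derive an unguessable stamp from IP, time, and salt."""
    mac = hmac.new(SALT, f"{client_ip}|{now}".encode(), hashlib.sha256)
    return f"{now}:{mac.hexdigest()}"


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(stamp: str, bits: int) -> int:
    """Client: brute-force a nonce whose hash meets the target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{stamp}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce
        nonce += 1


def verify(stamp: str, nonce: int, client_ip: str, now: int, bits: int) -> bool:
    """Server: check authenticity, expiry, and the difficulty target."""
    try:
        issued = int(stamp.partition(":")[0])
    except ValueError:
        return False                      # malformed stamp
    expected = make_stamp(client_ip, issued)
    if not hmac.compare_digest(stamp.encode(), expected.encode()):
        return False                      # not a stamp we issued to this IP
    if not 0 <= now - issued <= STAMP_LIFETIME:
        return False                      # stamp has expired
    digest = hashlib.sha256(f"{stamp}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits
```

The asymmetry is the whole point: `solve` needs about 2^N hashes on average, while `verify` needs one HMAC and one hash. A real deployment would presumably also remember recently used stamps so that a single solution can't be replayed; this sketch leaves that out.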
Rolling Your Own Hashcash
For several years I used hashcash.io to protect my web site forms. This was a 3rd party API that essentially doled out shares of work for the client to mine Monero. Unfortunately, this API broke recently when the operator failed to renew their SSL certificate.
So I started looking around for a hashcash implementation I could host myself. There are countless hashcash repositories on GitHub; it's implemented in every major language. However, very few of them have both client (in-browser) and server-side code that can be easily used to protect the submission of a web form. The few I came across that did were:
The University of Rhode Island - this is a barebones JS/PHP code snippet that uses SHA1 and doesn't look particularly user friendly.
Hashcash-js - this code is pretty good, but it's 15 years old and uses CRC32 as the hash function.
Eric Martindale's Hashcasher - this looks like a great option for folks who run their web site on a Node.js server. Unfortunately, my site is static HTML with one PHP page for the contact form.
So I spent a day modernizing the hashcash-js codebase; the code is now more succinct and uses SHA256 as its hash function. The code is available here:
Choosing the Right Difficulty
I also wrote a benchmarking script you can run to get a sense of how long it takes to find a solution at a given difficulty target. Note that this code runs the PoW loop in the main JavaScript thread, so it will degrade the browser's performance a bit while running. As such, I think a great future improvement would be to offload the actual PoW generation to a web worker.
At time of writing I'm using a target of 19 which should take most devices between 30 and 60 seconds to solve. This figure will need to be updated every few years as CPUs become faster.
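For a rough feel of how the target translates into work, here is a small sketch. It is not the benchmarking script mentioned above; it's a hypothetical Python stand-in, and the function names are my own. The only hard fact it relies on is that a target of N leading zero bits takes 2^N hash attempts on average, so each extra bit doubles the expected solve time. The measured rate comes from native SHA-256 on whatever machine runs the script, so it will not match what a browser achieves in JavaScript.

```python
import hashlib
import time


def expected_hashes(bits: int) -> int:
    """Average number of attempts needed for `bits` leading zero bits."""
    return 2 ** bits


def measure_hash_rate(samples: int = 100_000) -> float:
    """SHA-256 hashes per second on this machine (not in a browser)."""
    start = time.perf_counter()
    for nonce in range(samples):
        hashlib.sha256(f"benchmark:{nonce}".encode()).digest()
    elapsed = time.perf_counter() - start
    return samples / elapsed


def expected_seconds(bits: int, hashes_per_second: float) -> float:
    """Average solve time at a given hash rate."""
    return expected_hashes(bits) / hashes_per_second


if __name__ == "__main__":
    rate = measure_hash_rate()
    for bits in (16, 19, 22):
        secs = expected_seconds(bits, rate)
        print(f"{bits} bits: ~{secs:.2f}s on average at {rate:,.0f} H/s")
```

A target of 19 works out to 524,288 attempts on average, so the solve time on any device is simply that number divided by the device's hash rate.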
The Proof is in the Posting
My contact form was unprotected for several days after I disabled the hashcash requirements.
During that time I received dozens of spam emails through the form. Once I implemented my new self hosted hashcash library, the spam stopped once more!
Update: in the year since implementing this, I've bumped up my contact form's difficulty target a couple of times. One thing worth noting is that while it does a good job of stopping the vast majority of spambots, it won't stop 100% of them. I still tend to get 1 successful spam submission every few weeks. I think the simple explanation for this is that once in a while, much like with mining a bitcoin block, a bot will get lucky and find a solution to the hashcash puzzle within a few seconds. This is to be expected: every hash attempt is an independent trial, so solutions arrive as a Poisson process and the time to the first solution is roughly exponentially distributed, which makes unusually fast solves uncommon but far from impossible.
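To put a number on "getting lucky", here is a back-of-the-envelope sketch (my own illustration, with a hypothetical function name). Each hash attempt succeeds with probability 2^-N, so the chance of solving within k attempts is 1 - (1 - 2^-N)^k.

```python
import math


def prob_solved_within(bits: int, attempts: int) -> float:
    """Chance that at least one of `attempts` hashes meets a `bits`-bit target."""
    p = 2.0 ** -bits                         # success chance of a single hash
    # 1 - (1 - p) ** attempts, computed in a numerically stable way
    return -math.expm1(attempts * math.log1p(-p))


if __name__ == "__main__":
    bits = 19
    tenth_of_average = 2 ** bits // 10       # one tenth of the expected work
    print(f"{prob_solved_within(bits, tenth_of_average):.1%}")
```

At a target of 19, a client that does only a tenth of the average work still finds a solution roughly 9.5% of the time. Raising the target makes every attempt more expensive, but this "lucky tail" never goes away.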