How to protect web pages from email harvesting
How to protect web pages from email harvesting is a question every webmaster should ask. The simple techniques used to build web sites in the early days of the internet were sufficient then, but today they open the door to hackers and spammers. How your web pages are coded and designed plays an important role in your site's security.
So, how can we protect web pages from email harvesting?
First, let's explain what email address harvesting is, why it is bad, and how it relates to you. Then we will look at how to keep your web site from becoming a victim of email address harvesting.
What is email address harvesting?
Email address harvesting is the process of collecting lists of email addresses, by various methods, for use in bulk email or other purposes usually grouped under the term spam. On the web, harvesting is done with special software known as "harvesting bots", "harvesting robots", or "harvesters", which crawl web pages and capture every email address they find.
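To make the threat concrete, here is a minimal sketch of what the core of a harvester might look like: a regular expression run over the raw HTML of a page. The pattern and the `harvest` function are illustrative assumptions, not the code of any real harvesting tool, which will typically use more elaborate rules.

```python
import re

# Illustrative pattern (an assumption, not taken from a real harvester):
# a local part, "@", then a dotted domain ending in a 2+ letter TLD.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

def harvest(html_source: str) -> list:
    """Return each unique email-like string in the raw HTML, in order of appearance."""
    found = []
    for match in EMAIL_PATTERN.findall(html_source):
        if match not in found:
            found.append(match)
    return found

page = '<p>Contact: <a href="mailto:foo@example.com">foo@example.com</a></p>'
print(harvest(page))  # ['foo@example.com']
```

Note that the harvester never renders the page; it only scans the source text, which is why the techniques below focus on what appears in the page source.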
Email address harvesting is bad because once your or your client's email address gets onto spammers' lists, the inbox is quickly flooded with spam and junk. How does this relate to you? If you run, design, or build a web page, you need to take preventive steps to keep the email addresses on it from being harvested. (If you don't, you could even be held liable.) And if your own email address is displayed somewhere, you certainly don't want to have to create a new email account just because the old one is buried in spam.
Why would anyone want to harvest email addresses?
Spam, phishing, spoofing, direct marketing: all of these techniques share one goal, which is to sell you something or to carry out some illegal activity that brings the sender money or another benefit at your expense. If someone with bad intentions has your email address, you can become the target of those intentions. Harvesting email addresses can also make money on its own: many spammers collect email address lists only to sell them to marketing companies.
How to protect web pages from email harvesting - security tips
We already know that email address harvesting is harmful and that spammers have software that searches the web for email addresses. Now let's look at how to protect web pages from email harvesting. The following is a list of methods for hiding email addresses in the page source so that email harvesting bots are less likely to find them. Each method has its advantages and disadvantages, so it is up to you to decide which one suits your needs best.
Plain HTML code
First, let's look at how email addresses are usually displayed on websites, starting with the easy stuff. Email addresses are often coded into web pages like this:
<a href="mailto:foo@example.com">foo@example.com</a>
This example produces a clickable foo@example.com link. If you click it, your mail client (e.g. Outlook) opens with this address in the To: field. This format is exactly what email harvesting software is looking for, and it is where harvesters get the majority of their email addresses.
Email address written out
Some people make the job harder for email address harvesting software by writing the email address out, as shown in the following two examples.
<a href="mailto:foo@example.com">foo[AT]example[DOT]com</a> and foo[AT]example[DOT]com
This is a bit better than the plain HTML format, but notice that the first example still contains the real email address in the mailto link, so email harvesting software can still find it. The second example leaves out the <a href> tag, so the address is no longer clickable and the visitor has to copy it, fix it, and paste it into his or her email client. Substituting @ with [AT] and the dot with [DOT] is a nice idea, but nothing is easier than telling the harvesting software "if you find [AT], replace it with @".
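How little work that replacement takes can be shown with a short sketch. The `deobfuscate` helper below is hypothetical; it simply undoes the [AT]/[DOT] substitution, case-insensitively, along with common variants such as "(at)" and "(dot)" surrounded by spaces.

```python
import re

def deobfuscate(text: str) -> str:
    """Hypothetical helper: turn "[AT]"/"(at)" into "@" and "[DOT]"/"(dot)" into "."."""
    # Optional whitespace, an opening [ or (, the word, a closing ] or ), optional whitespace.
    text = re.sub(r"\s*[\[(]\s*at\s*[\])]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.IGNORECASE)
    return text

print(deobfuscate("foo[AT]example[DOT]com"))      # foo@example.com
print(deobfuscate("foo (at) example (dot) com"))  # foo@example.com
```

Two substitutions are enough to turn the written-out address back into one that a regular expression like the harvester's will match.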
Fake email address or switched domains
A good way to protect your email address in a web page is to fake it for the email harvesting robot and let the human know that it has been faked.
<a href="mailto:foo@example[REMOVETHIS].com">foo@example[REMOVETHIS].com</a>
This is not a bad approach, but you really have to let the visitor know that he or she needs to fix the email address before sending a message to it. Many people blindly click, copy, and paste, so make the marker obvious (for example, by displaying [REMOVETHIS] in red or with a strikethrough). The technique works well against email harvesting bots: even if they collect the address, it is invalid and therefore useless to the spammer. On the other hand, addresses in this format may confuse users if the idea is not explained well.
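A link like this can be generated rather than typed by hand. The sketch below is one possible way to do it, assuming the marker goes right before the last dot of the domain; the `fake_mailto` name and its `marker` parameter are hypothetical. It wraps the visible marker in a `<del>` element, which browsers render with a strikethrough by default.

```python
import html

def fake_mailto(user: str, domain: str, marker: str = "[REMOVETHIS]") -> str:
    """Hypothetical helper: build a mailto link with a marker humans must delete.

    ("foo", "example.com") -> foo@example[REMOVETHIS].com
    """
    # Split "example.com" into "example", ".", "com" at the last dot.
    name, dot, tld = domain.rpartition(".")
    faked = f"{user}@{name}{marker}{dot}{tld}"
    # Visible text: the marker is struck through so visitors notice it.
    visible = (
        f"{html.escape(user)}@{html.escape(name)}"
        f"<del>{html.escape(marker)}</del>{dot}{html.escape(tld)}"
    )
    return f'<a href="mailto:{html.escape(faked)}">{visible}</a>'

print(fake_mailto("foo", "example.com"))
```

Because the href itself contains the marker, a harvester that reads the mailto link collects an address that does not exist.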
You can also protect web pages from email harvesting by enclosing individual email address parts with HTML comments.
foo<!-- >@. -->@<!-- >@. -->example<!-- >@. -->.<!-- >@. -->com
This is displayed as foo@example.com. Placing the @, ., and > symbols inside comments makes it a little more difficult for email harvesting software to extract the address from your web page. Unfortunately, the drawback is that the address cannot be a clickable mailto link, so the visitor cannot open his or her mail client with a single click.
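The markup above can be produced mechanically. In the sketch below, `comment_obfuscate` (a hypothetical name) surrounds the "@" and every "." with the same filler comment used in the example. The second function, also hypothetical, shows why the protection is only "a little more difficult" to beat: a harvester that strips HTML comments before scanning gets the plain address back.

```python
import re

FILLER = "<!-- >@. -->"  # the decoy comment from the example above

def comment_obfuscate(address: str) -> str:
    """Surround "@" and every "." in the address with the filler comment."""
    parts = []
    for ch in address:
        if ch in "@.":
            parts.append(f"{FILLER}{ch}{FILLER}")
        else:
            parts.append(ch)
    return "".join(parts)

def strip_comments(html_source: str) -> str:
    """What a smarter harvester might do first: delete all <!-- ... --> comments."""
    return re.sub(r"<!--.*?-->", "", html_source, flags=re.DOTALL)

hidden = comment_obfuscate("foo@example.com")
print(hidden)                  # foo<!-- >@. -->@<!-- >@. -->example<!-- >@. -->.<!-- >@. -->com
print(strip_comments(hidden))  # foo@example.com
```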
Unicode characters, hexadecimal or decimal entities
Another way to protect web pages from email harvesting is to encode the email address in a form that a browser understands but a harvester cannot read without some additional work. For example, foo@example.com written as decimal entities looks like this:
&#102;&#111;&#111;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;
This gibberish is the same address as the one in the Plain HTML code section above, just written in a different notation (decimal entities). Even though a human cannot read it in the page source, a browser or email client displays it as foo@example.com, exactly like the plain mailto:foo@example.com link. Here is a page that tells you how this can be done in PHP: PHP loop through string.
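The same loop over a string can be sketched in Python. The function names below are hypothetical. Each character is replaced by its decimal code point wrapped in `&#...;`, which is exactly the decimal entity form; browsers decode character references in both link text and attribute values, so the link still works.

```python
import html

def to_decimal_entities(text: str) -> str:
    """Encode every character as a decimal HTML character reference, e.g. "f" -> "&#102;"."""
    return "".join(f"&#{ord(ch)};" for ch in text)

def encoded_mailto(address: str) -> str:
    """Build a mailto link whose address (in href and link text) is entity-encoded."""
    encoded = to_decimal_entities(address)
    return f'<a href="mailto:{encoded}">{encoded}</a>'

print(to_decimal_entities("foo"))                             # &#102;&#111;&#111;
print(html.unescape(to_decimal_entities("foo@example.com")))  # foo@example.com
```

The second line of output shows the round trip: decoding the entities gives back the original address.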
If you want to know how the gibberish translates to readable letters, take a look at an ASCII table (decimal 102 = the character f). Our ASCII to hex converter and dec to hex converter tools can help you when setting this up.
This is not a bad idea, but from an email harvesting robot's perspective it is again similar to the methods above: a robot can decode character entities just as easily as a browser does. Not every harvesting robot is programmed to do this conversion, though. If you combine a mix of plain (Unicode) characters, decimal entities, and hexadecimal entities, you will be another step ahead.
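A mixed encoding can be sketched as follows, assuming a random choice per character; the `mixed_encode` name and its `seed` parameter are hypothetical. Letters and digits may stay literal or become a decimal or hexadecimal entity, while "@" and "." are always encoded so that no literal address appears in the source. The last line also illustrates the caveat above: Python's standard `html.unescape` decodes the result in one call, just as a harvester with entity support would.

```python
import html
import random
from typing import Optional

def mixed_encode(text: str, seed: Optional[int] = None) -> str:
    """Encode each character as a literal, a decimal entity, or a hex entity.

    Alphanumeric characters pick one of the three forms at random; anything
    else (notably "@" and ".") always becomes an entity.
    """
    rng = random.Random(seed)
    parts = []
    for ch in text:
        forms = [f"&#{ord(ch)};", f"&#x{ord(ch):x};"]
        if ch.isalnum():
            forms.append(ch)
        parts.append(rng.choice(forms))
    return "".join(parts)

encoded = mixed_encode("foo@example.com", seed=7)
print(encoded)
print(html.unescape(encoded))  # foo@example.com
```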
Email address or its parts displayed as images
Another way to protect web pages from email harvesting is to use a small image that contains either the full email address or parts of it. Although extracting text from an image (optical character recognition) is possible, only a few email harvesting programs can do it. Reading addresses from images is resource-intensive, so for most email address harvesters it is not worth the effort.
Showing the whole email address as an image is very effective, but it has some major disadvantages too. Only user agents that render images will display the address. Visually impaired users may not be able to read it, and no visitor can copy and paste it as text. And if you have thousands of visitors per day, serving the images can become a performance issue as well. You can mitigate some of these disadvantages by replacing only the AT and DOT with images.
This keeps the address unreadable to email harvesting robots that only look for plain text, while leaving it semi-readable to visually impaired humans, whose screen readers can announce the images' alternative text.
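Replacing only the AT and DOT with images can be scripted as below. This is a sketch under stated assumptions: the image paths `/images/at.png` and `/images/dot.png` are hypothetical placeholders for your own small images, and the alt text "at" and "dot" is what a screen reader would announce. Keep in mind that a harvester reading alt attributes could reverse this just as easily as [AT] and [DOT].

```python
import html

# Hypothetical image paths: point these at small images showing "@" and ".".
AT_IMG = '<img src="/images/at.png" alt="at">'
DOT_IMG = '<img src="/images/dot.png" alt="dot">'

def image_substitute(address: str) -> str:
    """Return HTML for the address with "@" and "." replaced by inline images."""
    parts = []
    for ch in address:
        if ch == "@":
            parts.append(AT_IMG)
        elif ch == ".":
            parts.append(DOT_IMG)
        else:
            parts.append(html.escape(ch))
    return "".join(parts)

print(image_substitute("foo@example.com"))
```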
There are many more techniques that can be used to protect web pages from email harvesting. You can find more on the next page: Prevent email address harvesting (part 2).