Email Regex

When it comes to data entry, you can perform validation using Regular Expressions which defines a pattern/structure. Certain identifiers have rules, for example; an English postcode (I think) is 2 letters, 1 or 2 numbers, usually formatted with a space, then a number, followed by 2 letters. eg  HX1 1QG.

Software developers often come up with convoluted rules to try and define what an email address looks like, but I think they are inconsistent even between email providers that certain special characters can be excluded.

In our software, a developer had decided to change the regular expression in our email address validation. It was complex before, and it looked more complex after the new developer’s changes.

This was what it was:

return (Regex.IsMatch(emailAddress, @"^[A-Za-z0-9_\.\-&\+~]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}$"));

And he changed it to this:

return (Regex.IsMatch(emailAddress, @"^([\w-]+(?:\.[\w-]+)*(?:\'[\w-]+)*)@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$"));

Now, glancing at that, I have no idea what the advantage is. Some developers just assumed it was a good change and approved the change. What I noticed is that the end part was changed from [a-zA-Z] to [a-z], which means the last part of the email eg “.com” can only be written in lowercase, but previously we accepted “.COM“. As far as I am aware, email addresses don’t care about casing, so although it would be nice if users only entered them in lowercase, it seems an unnecessary restriction. Was it intentional? What else in this validation is different?

If he had unit tests to show the valid emails, and prove his change wasn’t flagging valid emails as invalid; then I would be more comfortable approving it. But there were no unit tests, and I didn’t understand the change.

Another developer pointed out that the advice for emails is usually just to go for the simple *@*.* validation which translates to; “it must have  some characters, followed by an @ sign, more characters, then a period followed by some characters“.

Email validation is surprisingly complicated, so it’s best having a simple restriction just to filter out data-entry errors. Having complex restrictions can then exclude people’s valid email addresses and you definitely don’t want to do that!

Leave a comment