What the hell will I learn?
This is a beginner-to-advanced guide. You will learn how to use regular expressions, from basic to extremely complicated ones, and how to take advantage of them in JavaScript.
What is a regular expression?
A regular expression is a tool for matching patterns in text. Expressions can get extremely complex, and a well-written one can save the programmer hours of work.
OK, enough foreplay, let's jump in!
Regular expressions in JavaScript are called like this:
string.match(/expression/modifier);
string.replace(/expression/modifier,replacement);
/
is the start and end of a regular expression (no quotes). If your match has a / in it you will have to put a backslash in front of it: \/.
expression
your actual regular expression
modifiers
the modifiers you want your regular expression to use.
replacement
the string that will replace whatever the regular expression matches.
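To make the two call forms above concrete, here is a minimal sketch. The sample sentence and the variable names are made up for illustration.

```javascript
// Sample text and variable names are made up for this illustration.
const sentence = "I like my cat, and my cat likes me.";

// match() without a modifier returns the first match (or null if there is none).
const firstMatch = sentence.match(/cat/);
console.info(firstMatch[0]);    // the matched text
console.info(firstMatch.index); // where the match starts in the string

// replace() returns a new string; the original string is left untouched.
const replaced = sentence.replace(/cat/, "dog");
console.info(replaced);

// A literal / inside the expression has to be escaped as \/.
const pathParts = "home/user/docs".replace(/\//, " > ");
console.info(pathParts);
```

Without a modifier only the first cat and the first / are touched.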
Modifiers
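This guide covers the three classic JavaScript modifiers (also called flags), which define how the regular expression acts and what rules it follows; newer JavaScript engines add a few more. Modifiers are written in lowercase after the closing /, can be combined (for example gi), and are not required.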
g
tells the regular expression to keep matching or replacing after the first match. With match() you get back an array holding every full match (sub-matches, covered later, are left out in this mode). When used for a replacement it will replace all matches.
i
makes the regular expression case-insensitive, which means CAT will match cat. By default regular expressions are case-sensitive.
m
changes how the regular expression works with multi-line strings (lines separated by \n or \r\n). The caret (^, start) and dollar ($, end) match at each line instead of only at the start and end of the whole string (you will learn more about the caret and dollar).
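Here is a small sketch of what each modifier changes. The sample strings and variable names are made up for illustration.

```javascript
// Sample strings are made up; modifiers are written in lowercase after the closing /.
const pets = "Cat, cat, CAT";

// No modifier: only the first case-sensitive match is found.
const plain = pets.match(/cat/);

// g: every match is collected into a flat array.
const globalOnly = pets.match(/cat/g);

// g and i combined: every match, ignoring case.
const globalInsensitive = pets.match(/cat/gi);

// g in a replacement: all matches are replaced, not just the first.
const replacedAll = pets.replace(/cat/gi, "dog");

// ^ is the caret. Without m it only matches at the very start of the string,
// with m it matches at the start of every line.
const lines = "one\ntwo\nthree";
const firstLetterOfString = lines.match(/^\w/g);
const firstLetterOfEachLine = lines.match(/^\w/gm);

console.info(plain[0], globalOnly, globalInsensitive, replacedAll);
console.info(firstLetterOfString, firstLetterOfEachLine);
```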
Expressions
An expression is a pattern that will be applied to match the subject string.
Character Classes
Character classes start with a backslash, followed by the letter of the actual character class. So to match any digit you could use \d.
s
whitespace
S
anything except a whitespace
d
digit
D
anything except a digit
w
word character (a letter, a digit or an underscore)
W
anything except a word character
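x
not a character class in JavaScript: \xhh matches the character with the hexadecimal code hh (so \x41 matches A). To match a hexadecimal digit use the range [0-9a-fA-F] (ranges are covered below).
o
not a character class in JavaScript either: \o just matches a literal o. To match an octal digit use the range [0-7].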
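A quick sketch of the most common classes in action. The sample text and variable names are made up for illustration.

```javascript
// The sample text is made up for this illustration.
const sample = "Room 42, floor 7";

const firstDigit = sample.match(/\d/)[0];       // first digit in the text
const allDigits = sample.match(/\d/g);          // every single digit
const noWhitespace = sample.replace(/\s/g, ""); // strip all whitespace
const nonWordChars = sample.match(/\W/g);       // spaces and the comma
const keptDigits = sample.replace(/\D/g, "");   // remove everything that is not a digit

console.info(firstDigit, allDigits, noWhitespace, nonWordChars, keptDigits);
```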
Quantifiers
Quantifiers are placed after a character class or character and apply only to that one item. A simple example is /cat{2}/: the quantifier is attached to the character t, so this only matches catt. You can also give a range: {2,5} would match t two to five times, and {5,} matches only if five or more t's exist. There are three more types of quantifiers:
*
0 or more, same as {0,}
+
1 or more, same as {1,}
?
0 or 1. This one's a bit different: it's basically a conditional, telling the regex that the item doesn't have to be there, but if it is, it should be matched.
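The quantifiers above can be compared side by side on one made-up test string. Every expression attaches the quantifier to the t only, so c and a always have to be there exactly once.

```javascript
// The test string is made up for this illustration.
const text = "ca cat catt cattt";

const star = text.match(/cat*/g);          // t zero or more times
const plus = text.match(/cat+/g);          // t one or more times
const optional = text.match(/cat?/g);      // t zero or one time
const exactlyTwo = text.match(/cat{2}/g);  // t exactly twice
const twoOrMore = text.match(/cat{2,}/g);  // t at least twice
const oneToTwo = text.match(/cat{1,2}/g);  // t once or twice

console.info(star, plus, optional, exactlyTwo, twoOrMore, oneToTwo);
```

Note that /cat{2}/ also finds catt inside cattt, because nothing in the expression says the match has to stop at the end of the word.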
Ranges
Ranges are sets of characters to match, and they are surrounded by [ and ]. A range matches a single character out of the set, so [abc] would match a, b or c. The cool thing about ranges is you can write [a-z], which matches any letter from a to z; same thing with numbers, [0-9] matches any digit. Ranges can also be placed next to each other like this: [a-z0-9], which matches any lower-case letter or digit. If you set the i modifier it matches lower-case and upper-case letters, but you can also write [a-zA-Z] instead of adding a modifier. Here's a fun one: put a ^ right after the opening [ and the range matches everything except what you declared, so [^0-9] matches anything except digits.
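A short sketch of the range forms described above, run against a made-up string:

```javascript
// The sample string is made up for this illustration.
const code = "Order A7-b9";

const abc = code.match(/[abc]/g);              // only a, b or c
const lowerOrDigit = code.match(/[a-z0-9]/g);  // lower-case letters and digits
const withModifier = code.match(/[a-z0-9]/gi); // same range, case-insensitive
const withoutModifier = code.match(/[a-zA-Z0-9]/g); // upper case spelled out instead
const onlyDigitsLeft = code.replace(/[^0-9]/g, ""); // drop everything except digits

console.info(abc, lowerOrDigit, withModifier, withoutModifier, onlyDigitsLeft);
```

The i modifier and the spelled-out [a-zA-Z0-9] give the same result here.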
Strings
Strings in regular expressions are just literal text, so /cat/ matches cat. Remember that a regular expression can be a combination of ranges, strings, and groups. So when we combine a string and a range, /[sfbc]at/ matches sat, fat, cat and bat.
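As a quick check of the combined expression, here it is run over a made-up sentence with the g modifier so every hit is collected:

```javascript
// The sentence is made up for this illustration.
const rhyme = "the fat cat sat on a bat near a hat";

// One character from the range, followed by the literal string "at".
const rhymingWords = rhyme.match(/[sfbc]at/g);

console.info(rhymingWords); // hat is skipped because h is not in the range
```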
Groups
Groups can contain strings, ranges and character classes, and they are surrounded by ( and ). A group becomes conditional when you place a ? after it, so /(cat)?/ will always match, even if there is no cat. One thing I haven't jumped into yet is sub-matches: anything you place in a group becomes a sub-match and is accessible in the result array. If you don't want a group to count as a sub-match, place ?: right after the opening parenthesis, like this: /(?:c[sfbc]at)/.
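A minimal sketch of both group behaviours, using made-up input strings: a conditional group, and the difference between a normal group and one that starts with ?:.

```javascript
// Input strings and variable names are made up for this illustration.

// A conditional group matches even when its content is missing.
const hit = "cat".match(/(cat)?/);  // full match "cat", group holds "cat"
const miss = "dog".match(/(cat)?/); // still a match, but an empty one

// A normal group is stored in the result array...
const capturing = "bat".match(/([sfbc])at/);
// ...while a group starting with ?: is matched but not stored.
const nonCapturing = "bat".match(/(?:[sfbc])at/);

console.info(hit, miss, capturing, nonCapturing);
```

When an optional group does not take part in the match, its slot in the array is undefined.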
Sub-Matches
Sub-matches are basically your groups after they have been matched. Let's say you're matching /(c)(a)(t)(?:s)?/. This has four groups, and one of them is marked as non-capturing via ?:, so it results in three sub-matches. They sit at indexes 1 to 3 of the result array, because index 0 holds the full match. So here's an example:
javascript Code:
const teststring = "the cats looked like they were on crack!";
// match() returns the full match at index 0, then one entry per capturing group
const matches = teststring.match(/(c)(a)(t)(?:s)?/);
for (let i = 0; i < matches.length; i++) {
  console.info(matches[i]);
}
So what will happen here is (the s is part of the full match but gets no entry of its own):
[0] = cats
[1] = c
[2] = a
[3] = t
So as you can see, you can now access any of the sub-matches.
Now that you understand the tools, let's put them to practical use!
I will guide you through a couple of obstacles and show you how regular expressions can handle them.
Emails!
email_input = "email@domain.type";
OK, simple enough: you have three strings and two separators (@ and .), that's the pattern. You get to decide if you want to define a set of rules for the email, or let it be whatever the hell the user enters.
/[a-z0-9]+@[a-z0-9]+\.[a-z0-9]{2,4}/i
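This will match emails that don't use any special characters. But that's not good enough... we want to match emails with some special characters, such as _ and -. Remember to escape special characters in ranges (the - needs it because it otherwise marks a range like a-z; escaping the _ is harmless here but not required).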
/[a-z0-9\_\-]+@[a-z0-9\_\-]+\.[a-z0-9]{2,4}/i
This would match most everyday emails, but there's always someone with an unusual address that can't be processed... So the less strict way to do this is:
/[^\@\.]+@[^\@\.]+\.[a-z0-9]+/i
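This matches pretty much anything made of some text, an @, more text, a dot and an ending, so virtually every email gets through.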
javascript Code:
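const email_input = "email@domain.type";
// prints the match array, or null when the input does not fit the pattern
console.info(email_input.match(/[a-z0-9\_\-]+@[a-z0-9\_\-]+\.[a-z0-9]{2,4}/i));
If you remove the . or @ you will notice that it fails (match returns null) :D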
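One thing to keep in mind: the expressions above are happy as soon as some part of the input looks like an email. The sketch below is one possible way to check the whole input instead, by wrapping the second pattern in the caret and dollar. The helper name looksLikeEmail and the anchors are additions for illustration, not part of the patterns above.

```javascript
// looksLikeEmail is a hypothetical helper name, not part of any library.
// The ^ and $ anchors are an addition to the pattern from the text: they force
// the whole input to fit the pattern instead of just some part of it.
function looksLikeEmail(input) {
  return /^[a-z0-9\_\-]+@[a-z0-9\_\-]+\.[a-z0-9]{2,4}$/i.test(input);
}

// The unanchored pattern from the text, for comparison.
const unanchored = /[a-z0-9\_\-]+@[a-z0-9\_\-]+\.[a-z0-9]{2,4}/i;

console.info(looksLikeEmail("email@domain.type"));         // whole input fits
console.info(looksLikeEmail("see email@domain.type now")); // extra text around it
console.info("see email@domain.type now".match(unanchored)[0]);
console.info("email@domaintype".match(unanchored));        // no dot, so no match
```

test() is the regular expression's own method and simply answers true or false, which is handy when you don't need the match array.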
Phone Number
OK, a phone number. Let's assume the user needs to actually enter the - separators and the ideal input looks like this:
phone_input = "1-310-674-6669";
So the pattern is eleven digits and three separators.
/\d-\d{3}-\d{3}-\d{4}/
The above matches any phone number that is properly typed in. But let's say the user shouldn't have to type the first digit, and they also don't need the area code or the separators. We need to add conditionals and some groups.
/(\d)?-?(\d{3})?-?(\d{3})-?(\d{4})/
So match indexes 3 and 4 will always contain the local number, and if it was entered, index 2 will hold the area code (index 1 is the leading digit).
javascript Code:
const phone_input = "1-310-674-6669";
// prints the full match followed by the four groups
console.info(phone_input.match(/(\d)?-?(\d{3})?-?(\d{3})-?(\d{4})/));
So here are the possible types of input that work:
1-310-674-6669
13106746669
310-674-6669
6746669
674-6669
You can then run some validation on the resulting array and notify the user that they entered the number improperly, or tidy their phone number up when they leave the field.
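The tidy-up step could look roughly like this. It is only a sketch: tidyPhone is a made-up helper name, the dash-separated output format is an assumption, and the pattern is used exactly as given above.

```javascript
// tidyPhone is a hypothetical helper; the dash-separated output is an assumption.
function tidyPhone(input) {
  const parts = input.match(/(\d)?-?(\d{3})?-?(\d{3})-?(\d{4})/);
  if (parts === null) {
    return null; // nothing that looks like a phone number was entered
  }
  // parts[1] = leading digit, parts[2] = area code, parts[3] and parts[4] = the number.
  // Optional groups that did not take part in the match are undefined.
  const pieces = [parts[1], parts[2], parts[3], parts[4]];
  return pieces.filter((piece) => piece !== undefined).join("-");
}

console.info(tidyPhone("13106746669"));
console.info(tidyPhone("6746669"));
console.info(tidyPhone("not a number"));
```

A null result is the cue to tell the user the number was entered improperly.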
That's the tutorial. Remember to use Firefox with Firebug! You may also check out my other regular expression tutorials for various programming/scripting languages.