<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-31023685</id><updated>2011-10-17T15:15:40.536Z</updated><title type='text'>Just a blog.</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://graemenail.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/31023685/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://graemenail.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Graeme</name><uri>http://www.blogger.com/profile/12481487643805553074</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>1</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-31023685.post-7552504145684087959</id><published>2007-07-21T20:47:00.001Z</published><updated>2007-11-02T10:33:16.592Z</updated><title type='text'>PHP Regular Expressions</title><content type='html'>&lt;span style="font-family:arial;"&gt;&lt;span style="font-size:180%;"&gt;PHP Regular Expression Reference&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span&gt;This reference is an accumulation of things I've learnt from the sites listed at the bottom and through my own 
learning. I recommend using this program to try things out.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://weitz.de/regex-coach/"&gt;http://weitz.de/regex-coach/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Thanks.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size:130%;"&gt;Identifying Positions&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The caret symbol matches the beginning of a string, while the dollar symbol matches the end of it.&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;^&lt;/span&gt;Hello/ - This would match a string beginning with "Hello".&lt;br /&gt;/Goodbye&lt;span style="font-weight: bold;"&gt;$&lt;/span&gt;/ - This would match a string ending with "Goodbye".&lt;br /&gt;&lt;br /&gt;These symbols can also be used together.&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;^&lt;/span&gt;Hello Goodbye&lt;span style="font-weight: bold;"&gt;$&lt;/span&gt;/ - This matches the exact string "Hello Goodbye".&lt;br /&gt;&lt;br /&gt;However, the caret and dollar symbols behave differently when the multiline pattern modifier (m) is used.&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;^&lt;/span&gt;Hello&lt;span style="font-weight: bold;"&gt;$&lt;/span&gt;\n&lt;span style="font-weight: bold;"&gt;^&lt;/span&gt;Goodbye&lt;span style="font-weight: bold;"&gt;$&lt;/span&gt;/m - This would match the string:&lt;br /&gt;"Hello&lt;br /&gt;Goodbye"&lt;br /&gt;&lt;br /&gt;With this modifier, the two symbols match at the start and end of every line, not only at the start and end of the whole string. 
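&lt;br /&gt;&lt;br /&gt;To experiment with the caret, dollar and m modifier, here is a minimal sketch in Python 3. It is not PHP. It uses Python's re module as a stand-in for preg_match(), on the assumption that both engines treat ^, $ and the multiline flag the same way for these particular patterns. Python's re.M flag plays the role of /m.&lt;br /&gt;&lt;pre&gt;
```python
import re

# Assumption: Python's re module stands in for PHP's preg_match() here;
# for these patterns ^, $ and the multiline flag behave the same way.
text = "Hello\nGoodbye"

# Without the multiline flag, ^ and $ anchor to the whole string, so $
# cannot match in the middle of the text, right after "Hello".
whole_string = re.search(r"^Hello$", text)

# re.M is the equivalent of PHP's /m modifier: ^ and $ now also match
# at the start and end of every line.
per_line = re.search(r"^Hello$", text, re.M)
both_lines = re.search(r"^Hello$\n^Goodbye$", text, re.M)

print(whole_string is None)       # True: no match without re.M
print(per_line.group(0))          # Hello
print(repr(both_lines.group(0)))  # 'Hello\nGoodbye'
```
&lt;/pre&gt;In PHP, the last pattern would be written as '/^Hello$\n^Goodbye$/m' and passed to preg_match().&lt;br /&gt;&lt;br /&gt;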
To anchor a pattern to the whole string even when the m modifier is used, use "\A" and "\Z" instead. "\A" matches only at the very start of the string, and "\Z" only at the very end (or just before a final newline).&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;\A&lt;/span&gt;Hello Goodbye&lt;span style="font-weight: bold;"&gt;\Z&lt;/span&gt;/ - This matches the string "Hello Goodbye".&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;/&lt;span style="font-weight: bold;"&gt;\A&lt;/span&gt;Hello&lt;span style="font-weight: bold;"&gt;\Z\n\A&lt;/span&gt;Goodbye&lt;span style="font-weight: bold;"&gt;\Z&lt;/span&gt;/m - This would not match the string below, because "\A" and "\Z" are not affected by the m modifier:&lt;br /&gt;"Hello&lt;br /&gt;Goodbye"&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Identifying Characters&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Note that these are case &lt;span style="font-weight: bold;"&gt;sensitive&lt;/span&gt;. For example, \d and \D mean opposite things.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\b - Word boundary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This allows us to distinguish whole words. Take, for instance, the strings "This is invalid" and "This is valid". A word boundary is the position between a word character (a letter, digit or underscore) and a non-word character, or at the start or end of the string next to a word character.&lt;br /&gt;&lt;br /&gt;/valid/ - A search for "valid" would match in both strings, but by using word boundaries we can differentiate between the words "invalid" and "valid".&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;\b&lt;/span&gt;valid&lt;span style="font-weight: bold;"&gt;\b&lt;/span&gt;/ - This would only match the string "This is valid". Notice that the word boundary is used on both sides. If it were only used before "valid", words such as "validate" would be matched as well.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\B - Non-word boundary&lt;/span&gt;&lt;br /&gt;This is the opposite of a word boundary. 
In the string "This is a word", a non-word boundary can be thought of as the position between "T" and "h", or between any two characters of the same word.&lt;br /&gt;&lt;br /&gt;/T&lt;span style="font-weight: bold;"&gt;\B&lt;/span&gt;his/ - This would match "This" in the string "This is a word".&lt;br /&gt;&lt;br /&gt;/This&lt;span style="font-weight: bold;"&gt;\B&lt;/span&gt;/ - Here "s" is followed by a space, which is a non-word character. The position after "This" is therefore a word boundary, so this pattern does not match.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\d - Numerical character&lt;/span&gt;&lt;br /&gt;A single digit character (0 to 9).&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;\d&lt;/span&gt;/ - This would match any single digit from 0 to 9.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;\D - Non-numerical character&lt;/span&gt;&lt;br /&gt;A single character which is not a digit.&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;\D&lt;/span&gt;/ - This would match any single character except a digit.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\n - Newline character&lt;/span&gt;&lt;br /&gt;A single newline character. ASCII Number 10.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;\r - Carriage return character&lt;/span&gt;&lt;br /&gt;A single carriage return character. 
ASCII Number 13.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\s - Whitespace character&lt;/span&gt;&lt;br /&gt;A single whitespace character, such as a space, tab, newline or carriage return.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\S - Non-whitespace character&lt;/span&gt;&lt;br /&gt;A single character that is not whitespace.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\t - Tab character&lt;/span&gt;&lt;br /&gt;A single tab character. ASCII Number 9.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\w - Word character&lt;/span&gt;&lt;br /&gt;A single word character: any letter of the alphabet, any digit from 0 to 9, or the underscore.&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;\w&lt;/span&gt;/ - This would match characters such as "a", "x", "4" or "_".&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;\W - Non-word character&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;A single character that is not a word character, such as a space or a punctuation mark.&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;\W&lt;/span&gt;/ - This would not match characters such as "a", "x", "4" or "_".&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;. 
- Dot character&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The dot matches any single character except the newline character.&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;.&lt;/span&gt;/ - This would match any single character except a newline.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size:130%;"&gt;Repeating Characters&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;Single characters are rarely searched for on their own. More often we want a run of characters of a certain length.&lt;br /&gt;&lt;br /&gt;/\w&lt;span style="font-weight: bold;"&gt;{4}&lt;/span&gt;/ - This would match exactly 4 consecutive word characters. Note that it also matches the first 4 characters of a longer word. To match only whole 4-letter words, add word boundaries: /\b\w{4}\b/.&lt;br /&gt;&lt;br /&gt;On other occasions we may have a minimum length.&lt;br /&gt;&lt;br /&gt;/\w&lt;span style="font-weight: bold;"&gt;{4,}&lt;/span&gt;/ - This would match 4 or more consecutive word characters.&lt;br /&gt;&lt;br /&gt;On other occasions we may have a maximum length.&lt;br /&gt;&lt;br /&gt;/\w&lt;span style="font-weight: bold;"&gt;{0,4}&lt;/span&gt;/ - This would match between 0 and 4 consecutive word characters (including an empty match).&lt;br /&gt;&lt;br /&gt;With a slight variation of the one above, we can search for a length within a set range.&lt;br /&gt;&lt;br /&gt;/\w&lt;span style="font-weight: bold;"&gt;{2,4}&lt;/span&gt;/ - This would match between 2 and 4 consecutive word characters.&lt;br /&gt;&lt;br /&gt;Write quantifiers without spaces inside the braces. Some regex engines, including older PCRE versions used by PHP, do not recognise {2, 4} as a quantifier and match it as literal text.&lt;br /&gt;&lt;br /&gt;There are also shorthand symbols for the most common repetitions.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;*         &lt;/span&gt;This is the same as using &lt;span style="font-weight: bold;"&gt;{0,}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;+&lt;/span&gt;        This is the same as using &lt;span style="font-weight: bold;"&gt;{1,}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;?          
&lt;/span&gt;This is the same as using &lt;span style="font-weight: bold;"&gt;{0,1}&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size:130%;"&gt;&lt;br /&gt;&lt;br /&gt;The OR operator&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;The OR operator, written as a vertical bar, matches either the pattern on its left or the pattern on its right.&lt;br /&gt;&lt;br /&gt;/Hello&lt;span style="font-weight: bold;"&gt;|&lt;/span&gt;Goodbye/ - This will match the string "Hello" or the string "Goodbye".&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Sub patterns&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Characters can be grouped together in brackets to create more complex patterns.&lt;br /&gt;&lt;br /&gt;/\w{4}&lt;span style="font-weight: bold;"&gt;(&lt;/span&gt;\d{2}\w{2}&lt;span style="font-weight: bold;"&gt;)&lt;/span&gt;?/ - This pattern searches for 4 word characters, optionally followed by 2 digits and 2 more word characters. The ? after the closing bracket makes the whole sub pattern optional.&lt;br /&gt;&lt;br /&gt;By default, each sub pattern is captured: functions such as preg_match() place the text it matched into the array of matches, after the full match. 
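&lt;br /&gt;&lt;br /&gt;The OR operator and the sub pattern above can be tried out with a minimal sketch in Python 3. It is not PHP. Python's re module stands in for preg_match(), on the assumption that both engines agree for these patterns. The sample strings are made up for illustration.&lt;br /&gt;&lt;pre&gt;
```python
import re

# Assumption: Python's re module stands in for PHP's preg_match() here;
# the sample strings are made up for illustration.

# Alternation: the engine tries "Hello" and, if that fails, "Goodbye".
greetings = re.findall(r"Hello|Goodbye", "Hello, and then Goodbye")
print(greetings)  # ['Hello', 'Goodbye']

# One capturing sub pattern, made optional by the trailing ?.
pattern = r"\w{4}(\d{2}\w{2})?"

with_group = re.match(pattern, "abcd12xy")
print(with_group.group(0), with_group.group(1))  # abcd12xy 12xy

# Without the optional part the match still succeeds, but the
# sub pattern captured nothing, so group 1 is None.
without_group = re.match(pattern, "abcd")
print(without_group.group(0), without_group.group(1))  # abcd None
```
&lt;/pre&gt;Group 0 is always the whole match, and each captured sub pattern follows it. This mirrors the way preg_match() fills its array of matches.&lt;br /&gt;&lt;br /&gt;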
To stop a sub pattern from being captured, place ?: immediately after its opening bracket.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;/\w{4}(&lt;span style="font-weight: bold;"&gt;?:&lt;/span&gt;\d{2}\w{2})?/ - This matches the same strings as before, but the sub pattern is not stored in the array.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Character Classes&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;A character class, written in square brackets, matches any single character from a list or range that you specify.&lt;br /&gt;Inside a class, the backslash, closing square bracket, caret and hyphen can have special meanings. Escape them with a backslash when you want to match them literally.&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;[&lt;/span&gt;abc&lt;span style="font-weight: bold;"&gt;]&lt;/span&gt;/ - This will match any one of the characters "a", "b" or "c".&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;[&lt;/span&gt;a-c&lt;span style="font-weight: bold;"&gt;]&lt;/span&gt;/ - This matches the same three characters. A range is a shorter way of writing a long list.&lt;br /&gt;&lt;br /&gt;However, you should be careful when using a range.&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;[&lt;/span&gt;A-z&lt;span style="font-weight: bold;"&gt;]&lt;/span&gt;/ - This seems harmless, but ranges follow character codes. The ASCII characters between "Z" and "a" include the square brackets, backslash, caret, underscore and backtick, so these non-alphabetic characters are matched too. To prevent this, specify two ranges:&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-weight: bold;"&gt;[&lt;/span&gt;A-Za-z&lt;span style="font-weight: bold;"&gt;]&lt;/span&gt;/ - This overcomes the problem.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=";font-family:arial;font-size:100%;"  &gt;&lt;span style=";font-family:arial;font-size:100%;"  &gt;When it is unescaped and placed first in a class, the caret has a special function. 
It negates the character class.&lt;br /&gt;&lt;br /&gt;/[&lt;span style="font-weight: bold;"&gt;^&lt;/span&gt;0-9]/ - This would match a single character that is not a digit.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Pattern Modifiers&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;Modifiers alter the behaviour of the pattern. They are placed after the closing delimiter.&lt;br /&gt;&lt;br /&gt;/&lt;span style="font-style: italic;"&gt;pattern&lt;/span&gt;/&lt;span style="font-weight: bold;"&gt;i&lt;/span&gt; &lt;/span&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;Here&lt;/span&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;&lt;span style="font-weight: bold;"&gt; i&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt; is the modifier. Several modifiers can be combined, as in /pattern/im.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;&lt;span style="font-weight: bold;"&gt;i  - Case insensitive&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;With this modifier, the whole pattern is matched case insensitively, so /hello/i matches both "Hello" and "HELLO".&lt;/span&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;m - Multiple Lines&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;&lt;span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;span&gt;When 
this modifier is active, the string is treated as multiple lines. The caret and dollar symbols then match at the start and end of each line, not only at the start and end of the whole string. This is the situation, described under Identifying Positions, in which the caret and dollar symbols act differently.&lt;/span&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;&lt;span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;&lt;span&gt;&lt;br /&gt;s - Dot All&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;span&gt;With this modifier, the dot character also matches the newline character.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Resources&lt;/span&gt;&lt;br /&gt;&lt;a href="http://www.regular-expressions.info/tutorial.html"&gt;http://www.regular-expressions.info/tutorial.html&lt;/a&gt;&lt;br /&gt;&lt;a href="http://weblogtoolscollection.com/regex/regex.php"&gt;http://weblogtoolscollection.com/regex/regex.php&lt;/a&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://graemenail.blogspot.com/feeds/7552504145684087959/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=31023685&amp;postID=7552504145684087959' title='94 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/31023685/posts/default/7552504145684087959'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/31023685/posts/default/7552504145684087959'/><link rel='alternate' type='text/html' href='http://graemenail.blogspot.com/2007/07/php-regular-expressions.html' title='PHP Regular 
Expressions'/><author><name>Graeme</name><uri>http://www.blogger.com/profile/12481487643805553074</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>94</thr:total></entry></feed>
