Difference between revisions of "Regular Expressions"

From Kolmafia
imported>StDoodle
m (added category link-back)
imported>Bale
(beginning. much more to do.)
Revision as of 04:08, 7 May 2010

This is just a stub. A place to put some real information later...

Regular expressions in ASH are mostly wrappers around Java's java.util.regex package. You can find information about that here: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

There's a good resource for the regexp language here: http://www.regular-expressions.info/

Awesome tools for testing regexps here:

  • http://gskinner.com/RegExr/
  • http://www.fileformat.info/tool/regex.htm

Introduction

Regular expressions, or regexes, are a language designed for writing very precise patterns for searching strings. The regex language has wildcards for virtually every pattern of characters you might want to search for. Only the most common forms of regexes are described on this page. For more detail, search the internet, where you will find many thorough resources on the subject. This writer points the student at this tutorial in particular: http://www.regular-expressions.info/tutorial.html


Commonly used Regular Expressions

Literal Characters

A character will match the first instance of itself in a string.

  • E.g. "a" will match the first a in "Jack is a dull boy."
  • E.g. "cat" is a set of three literal characters which will find a match in "about cats and dogs."
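
Since ASH regexes are described above as wrappers for java.util.regex, the two literal examples can be tried directly in Java. This is only a minimal sketch: the class name LiteralMatchDemo and the helper firstMatchStart are made up for illustration and are not part of KoLmafia or ASH.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralMatchDemo {
    // Hypothetical helper: returns the index where the first match of
    // regex starts in input, or -1 if the pattern is not found at all.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // "a" matches the a in "Jack", not the later standalone "a".
        System.out.println(firstMatchStart("a", "Jack is a dull boy."));    // prints 1
        // "cat" matches the first three letters of "cats".
        System.out.println(firstMatchStart("cat", "about cats and dogs.")); // prints 6
    }
}
```

Note that find() looks for the pattern anywhere in the string, which is the "first instance" behaviour described above.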

Special Characters

It's often more interesting to search for less specific patterns than literal characters. A number of characters are reserved for this purpose. These special characters are often called "metacharacters". If you want to use any of these characters as a literal in a regex, you need to escape it with a backslash.

  • Opening and closing square brackets [ and ]
  • Backslash \
    Used to employ a special character as a literal. E.g. to match "1+1=2", the correct regex is "1\+1=2". Otherwise, the plus sign will have a special meaning.
  • Asterisk or star *
    The asterisk attempts to match the preceding token zero or more times.
  • Plus sign +
    The plus attempts to match the preceding token one or more times.
  • Caret ^
  • Dollar sign $
  • Period or dot .
  • Vertical bar or pipe symbol |
  • Question mark ?
    The question mark makes the preceding token in the regular expression optional. E.g. "colou?r" matches both "colour" and "color".
  • Opening and closing round brackets ( and )
  • Opening and closing braces { and }
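
The escaping and repetition rules described in the list can be checked with a few lines of Java, again assuming plain java.util.regex rather than ASH itself. The class name QuantifierDemo, the helper matchesWhole and the "ab*c" / "ab+c" patterns are invented for this sketch. One Java-specific wrinkle: inside a Java string literal the backslash must itself be doubled, so the regex 1\+1=2 is written "1\\+1=2".

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: true only if regex matches the ENTIRE input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Escaped plus: matches the literal text 1+1=2.
        System.out.println(matchesWhole("1\\+1=2", "1+1=2"));  // true
        // Unescaped plus means "one or more 1s", so the literal text fails...
        System.out.println(matchesWhole("1+1=2", "1+1=2"));    // false
        // ...while a run of 1s followed by =2 succeeds.
        System.out.println(matchesWhole("1+1=2", "111=2"));    // true

        // Question mark: the preceding u is optional.
        System.out.println(matchesWhole("colou?r", "color"));  // true
        System.out.println(matchesWhole("colou?r", "colour")); // true

        // Asterisk allows zero b's; plus requires at least one.
        System.out.println(matchesWhole("ab*c", "ac"));        // true
        System.out.println(matchesWhole("ab+c", "ac"));        // false
        System.out.println(matchesWhole("ab+c", "abbbc"));     // true
    }
}
```

Pattern.matches is used here instead of find() so that each result depends on the whole string, which makes the difference between * and + easy to see.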