Regular Expressions: Difference between revisions

From Kolmafia
imported>Bale
beginning. much more to do.
imported>Bale
a little more

Revision as of 04:14, 7 May 2010

This is just a stub. A place to put some real information later...

Regular expressions in ASH are mostly wrappers around Java's java.util.regex package. You can find information about that package here: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

There's a good resource for regexp language here: http://www.regular-expressions.info/

Awesome tools for testing regexp here:


Introduction

Regular expressions, or regexes, are a language for writing very precise patterns to search strings. The regex language has wildcards for virtually every pattern of characters you might want to search for. Only some of the most common forms of regexes are described on this page. For more detail, search the internet, where you will find many thorough resources on the subject. This writer will point the student at this tutorial in particular.


Commonly used Regular Expressions

Literal Characters

A character will match the first instance of itself in a string.

  • E.g. a will match the first a in "Jack is a dull boy."
  • E.g. cat is a set of three literal characters which will find a match in "about cats and dogs."
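
The two literal examples above can be tried directly against java.util.regex, the package that ASH wraps. This is a minimal sketch in plain Java rather than ASH; the class name LiteralMatchSketch and the helper firstMatchStart are made up for illustration, and ASH is assumed (not verified here) to behave the same way for these patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralMatchSketch {
    // Hypothetical helper: index where the first match of regex begins
    // in text, or -1 if the pattern is not found anywhere.
    static int firstMatchStart(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // The literal "a" matches the first a, the second character of "Jack".
        System.out.println(firstMatchStart("a", "Jack is a dull boy."));    // prints 1
        // The three literals "cat" must appear together, so the match is inside "cats".
        System.out.println(firstMatchStart("cat", "about cats and dogs.")); // prints 6
    }
}
```

Note that the lone a at the start of "about" does not count as a match for cat, because all three literal characters have to appear in sequence.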

Special Characters

It's often more interesting to search for less specific patterns than literal characters. A number of characters are reserved for this purpose. These special characters are often called "metacharacters". If you want to use any of these characters as a literal in a regex, you need to escape it with a backslash.

  • Opening and closing square bracket [ and ]
  • Backslash \
    Used to grant special meaning to a normally literal character or employ a special character as a literal. E.g. to match "1+1=2", the correct regex is "1\+1=2". Otherwise, the plus sign will have a special meaning.
  • Question mark ?
    The question mark makes the preceding token in the regular expression optional. E.g. "colou?r" matches both "colour" and "color".
  • Asterisk or star *
    The asterisk attempts to match the preceding token zero or more times.
  • Plus sign +
    The plus attempts to match the preceding token one or more times.
  • Period or dot .
    Matches any character except for line breaks.
  • Caret ^
    Matches the beginning of the string only. E.g. "^the" will match only the first word in "the way of the world."
  • Dollar sign $
    Matches the end of the string only. E.g. "dog$" will match only the last word in "dog eat dog".
  • Vertical bar or pipe symbol |
  • Opening and closing round brackets ( and )
  • Opening and closing braces { and }
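
The metacharacters described in the list above can be checked with a short program. This is a sketch in plain Java against java.util.regex, not ASH code; the class MetacharacterSketch and its helpers found and count are hypothetical names chosen for this example. It uses Java's default matching mode, in which the dot does not match line breaks and ^ and $ anchor to the whole string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharacterSketch {
    // Hypothetical helper: true if regex matches somewhere inside text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Hypothetical helper: number of non-overlapping matches of regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Backslash: in Java source the regex 1\+1=2 is written "1\\+1=2".
        System.out.println(found("1\\+1=2", "1+1=2"));           // true
        // Unescaped, the plus means "one or more 1s", so "1+1=2" is not matched.
        System.out.println(found("1+1=2", "1+1=2"));             // false

        // Question mark: the u is optional, so both spellings match.
        System.out.println(count("colou?r", "colour color"));    // 2

        // Asterisk allows zero b's; plus requires at least one.
        System.out.println(found("ab*c", "ac"));                 // true
        System.out.println(found("ab+c", "ac"));                 // false

        // Dot: any character except a line break.
        System.out.println(found("c.t", "cat"));                 // true
        System.out.println(found("c.t", "c\nt"));                // false

        // Caret and dollar sign anchor the match to the start or end of the string.
        System.out.println(count("^the", "the way of the world.")); // 1
        System.out.println(count("dog$", "dog eat dog"));           // 1
    }
}
```

Without the anchors, the and dog would each be found twice in their example strings; the caret and dollar sign narrow that to the single occurrence at the start or end.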