Regular Expressions

From Kolmafia
Revision as of 21:44, 7 May 2010 by StDoodle


Regular expressions in ASH are mostly wrappers for the Java java.util.regex package. You can find information about that package here: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

There's a good resource for regexp language here: http://www.regular-expressions.info/

There are some awesome tools for testing regexps here:


Introduction

Regular expressions, or regexes, are a language designed for creating very explicit patterns for searching strings. The regex language has wildcards for virtually every pattern of characters you might want to search for. Only some of the most commonly used forms of regexes are described on this page. For more details, search the internet, where you will find many detailed resources on the subject. This writer points the student to this tutorial in particular.


Commonly used Regular Expressions

Literal Characters

A character will match the first instance of itself in a string.

  • E.g. a will match the first a in "Jack is a dull boy.", which is the a in "Jack".
  • E.g. cat is a set of three literal characters, which will find a match in "about cats and dogs." (the first three letters of "cats").
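
Since ASH regexes wrap java.util.regex, the literal-character examples can be checked with a minimal Java sketch (this is Java, not ASH; the class and method names are hypothetical). It reports where the first match starts, which shows that a search stops at the first occurrence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: demonstrates literal-character matching with java.util.regex.
// LiteralMatchDemo and firstMatchIndex are hypothetical names for illustration.
final class LiteralMatchDemo {
    // Returns the index where the first match of regex starts in text, or -1 if none.
    static int firstMatchIndex(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // "a" matches the first a, which is the one inside "Jack".
        System.out.println(firstMatchIndex("a", "Jack is a dull boy."));   // 1
        // "cat" matches the first three letters of "cats".
        System.out.println(firstMatchIndex("cat", "about cats and dogs.")); // 6
        // No "cow" anywhere, so there is no match.
        System.out.println(firstMatchIndex("cow", "about cats and dogs.")); // -1
    }
}
```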

Special Characters

It's often more interesting to search for less specific patterns than literal characters. A number of characters are reserved for this purpose; these special characters are often called "metacharacters". If you want to use any of these characters as a literal in a regex, you need to escape it with a backslash.
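
A minimal Java sketch of escaping, assuming the java.util.regex behavior that ASH wraps (the class and method names are hypothetical). Note that in Java source code the regex backslash itself must be doubled inside a string literal, so the regex 1\+1=2 is written "1\\+1=2". Pattern.quote is a real java.util.regex method that escapes a whole string at once.

```java
import java.util.regex.Pattern;

// Sketch only: shows why metacharacters must be escaped to be matched literally.
// EscapeDemo and containsMatch are hypothetical names for illustration.
final class EscapeDemo {
    // True if regex matches somewhere inside text.
    static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unescaped, "1+" means "one or more 1s", so this pattern wants text like "11=2".
        System.out.println(containsMatch("1+1=2", "1+1=2"));   // false
        System.out.println(containsMatch("1+1=2", "11=2"));    // true
        // Escaped with a backslash (doubled in the Java literal), + is literal.
        System.out.println(containsMatch("1\\+1=2", "1+1=2")); // true
        // Pattern.quote escapes every metacharacter in the string.
        System.out.println(containsMatch(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```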

  • Opening and closing square bracket [ and ]
    Used to create "character sets" to match only one of several characters.
    E.g. "gr[ae]y" will match either "gray" or "grey", but it will not match "graey".
  • Backslash \
    Used to give special meaning to a normally literal character, or to use a special character as a literal.
    E.g. if you want to find the beginning of a word, the combination \b will match a word boundary.
    E.g. to match "1+1=2", the correct regex is "1\+1=2". Otherwise, the plus sign will have a special meaning.
  • Question mark ?
    The question mark makes the preceding token in the regular expression optional.
    E.g. "colou?r" matches both "colour" and "color".
  • Asterisk or star *
    The asterisk attempts to match the preceding token zero or more times.
  • Plus sign +
    The plus attempts to match the preceding token one or more times.
  • Period or dot .
    Matches any character except for line breaks.
  • Caret ^
    Matches the beginning of the string only.
    E.g. "^the" will match only the first word in "the way of the world."
  • Dollar sign $
    Matches the end of the string only.
    E.g. "dog$" will match only the last word in "dog eat dog".
  • Opening and closing round brackets ( and )
    Used for grouping, allowing a regex operator (like +) to be applied to the entire group. It also creates a backreference that stores the text matched by the group.
  • Opening and closing braces { and }
    This is a limited repetition operator: {min,max} matches the preceding token at least min and at most max times.
    E.g. "\b[1-9][0-9]{2,4}\b" matches a number between 100 and 99999. (\b matches a word boundary.)
  • Vertical bar or pipe symbol |
    This is an "or" operator to match one of several possibilities.
    E.g. "\b(cat|dog|fish)\b" will match either "cat", "dog" or "fish".
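
The examples in the list above can be checked with a short Java sketch, assuming ASH passes patterns through to java.util.regex with default flags (the class and method names are hypothetical). wholeMatch requires the entire input to match, while firstIndex searches for the first match anywhere. Every regex backslash is doubled inside the Java string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: checks the metacharacter examples with java.util.regex (default flags).
// MetacharacterDemo, wholeMatch and firstIndex are hypothetical names for illustration.
final class MetacharacterDemo {
    // True if the entire input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Start index of the first match of the regex in the input, or -1 if there is none.
    static int firstIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // [ ] character set: exactly one of the listed characters.
        System.out.println(wholeMatch("gr[ae]y", "grey"));          // true
        System.out.println(wholeMatch("gr[ae]y", "graey"));         // false
        // ? optional token.
        System.out.println(wholeMatch("colou?r", "color"));         // true
        // * zero or more, + one or more.
        System.out.println(wholeMatch("ab*c", "ac"));               // true
        System.out.println(wholeMatch("ab+c", "ac"));               // false
        // . any character except a line break.
        System.out.println(wholeMatch("c.t", "cut"));               // true
        System.out.println(wholeMatch("c.t", "c\nt"));              // false
        // ^ and $ anchor to the start and end of the string.
        System.out.println(firstIndex("^the", "the way of the world.")); // 0
        System.out.println(firstIndex("dog$", "dog eat dog"));           // 8
        // ( ) groups tokens so + repeats the whole group.
        System.out.println(wholeMatch("(ab)+", "ababab"));          // true
        // { } limited repetition: 3 to 5 digits, i.e. 100 to 99999.
        System.out.println(firstIndex("\\b[1-9][0-9]{2,4}\\b", "42 or 1234")); // 6
        System.out.println(wholeMatch("\\b[1-9][0-9]{2,4}\\b", "100000"));     // false
        // | alternation; \b stops "dog" or "fish" matching inside "dogfish".
        System.out.println(firstIndex("\\b(cat|dog|fish)\\b", "a dogfish and a fish")); // 16
    }
}
```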

Using Regexes in KolMafia

Regular expressions in ASH are wrappers for the Java java.util.regex package. You can find detailed information about that in this Java Tutorial. Only the highlights will be described in this section.
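
Because ASH regexes wrap java.util.regex, the core pattern is a Matcher that is advanced with find() and read with group(). ASH's matcher functions (for example create_matcher(), find() and group()) are assumed here to mirror these Java methods; check the ASH function reference for their exact signatures. The Java sketch below is not ASH code, and its class name, method names and sample text are hypothetical. It collects a capturing group from every match and uses the backreference \1 to spot a doubled word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: the find()/group() loop from java.util.regex that ASH regexes wrap.
// FindLoopDemo, allGroupOne, hasDoubledWord and the sample text are hypothetical.
final class FindLoopDemo {
    // Collects capturing group 1 from every match of regex in text.
    static List<String> allGroupOne(String regex, String text) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {           // advance to the next match; false when none remain
            results.add(m.group(1)); // text captured by the first ( ) group
        }
        return results;
    }

    // True if some word appears twice in a row; \1 refers back to what group 1 matched.
    static boolean hasDoubledWord(String text) {
        return Pattern.compile("\\b(\\w+) \\1\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(allGroupOne("(\\d+) meat", "You gain 25 meat. You gain 100 meat.")); // [25, 100]
        System.out.println(hasDoubledWord("Jack is a a dull boy.")); // true
        System.out.println(hasDoubledWord("Jack is a dull boy."));   // false
    }
}
```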