Talk:String Handling Routines

From Kolmafia
Revision as of 21:11, 6 May 2010 by imported>Bale (→‎Regular expressions)

So wtf does group_string actually do? The linked "descriptive" post has an utterly unhelpful example. Has anyone ever used it for anything?

It groups a string into a map using a regular expression. To understand the function, you need to know two things: 1. What maps are and how they are used. 2. What regular expressions are and how to create them.

Using the original post:

FUNCTION DEFINITION: string [int,int] group_string( string source, string regex )
EXAMPLE: string [int,int] test = group_string( "This is a test", "([a-z]+) " );

Example Breakdown:
string [int,int]: Define a map. Two-dimensional. The indices are integers. The data is stored as a string.
test: Define the map with the name test.
group_string: Call the function.
"This is a test": Feeding the function a sample string.
"([a-z]+) ": Your regular expression.

Regular expressions deal with pattern matching. You want the function to find a particular pattern. The function then returns that pattern, or the stuff before it, or the stuff after it, or splits them apart, or squeezes them together. So what does this regular expression look for?
The parentheses (): Tell the function this is a group of characters.
[a-z]: Tells us they will be lower case letters.
+: Tells us to look for one or more characters.
That space between the ) and the closing ": Tells us the pattern ends in a space.

Thus, reading down the string:

T = Does not match [a-z]; it is a capital letter.
h = Matches [a-z]. Starting group.
i = Matches [a-z].
s = Matches [a-z].
(space) = Matches space. First group found, and is "his ".

i = Matches [a-z]. Starting group.
s = Matches [a-z].
(space) = Matches space. Second group found, and is "is ".

a = Matches [a-z]. Starting group.
(space) = Matches space. Third group found, and is "a ".

t = Matches [a-z]. Starting group.
e = Matches [a-z].
s = Matches [a-z].
t = Matches [a-z].
End of line, with no trailing space to finish the pattern. No more matches. Stop.

Thus, trusting the post, the map would be:

test[0][0] => "his "
test[0][1] => "his"
test[1][0] => "is "
test[1][1] => "is"
test[2][0] => "a "
test[2][1] => "a"
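
For anyone who wants to check that result without firing up KoLmafia, here is a rough Python sketch that mimics the layout described above (first index = which match, second index = which group, with group 0 being the whole match). It is not KoLmafia's code: the Python group_string below is a hypothetical stand-in written for illustration, and it assumes the pattern is [a-z]+ inside parentheses followed by a single space, which is what the breakdown describes.

```python
import re

def group_string(source, regex):
    """Hypothetical stand-in for ASH's group_string, for illustration only.

    Returns a nested dict shaped like the map above: result[match][group].
    """
    result = {}
    for i, m in enumerate(re.finditer(regex, source)):
        # Group 0 is the whole match; groups 1..n are the parenthesized parts.
        result[i] = {g: m.group(g) for g in range(m.re.groups + 1)}
    return result

# Assumed pattern: one or more lower case letters, captured, then a space.
test = group_string("This is a test", "([a-z]+) ")

for i, groups in test.items():
    for g, text in groups.items():
        # repr() keeps the trailing space in the whole match visible.
        print(f"test[{i}][{g}] => {text!r}")
```

Python's re module and the Java regex engine agree on a pattern this simple, but they are not identical in general, so treat this as a sanity check rather than a reference.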

I personally haven't used it. It would be useful when parsing a page by hand.


Regular expressions

Finally! As of this moment, every single function has a page on this wiki except for the regular expression functions. (Much cheering!) Part of the problem with making pages for them is that a whole ream of background information is necessary to use them. I can see several approaches that we can take to this. I hope that we can discuss which tack to take:

  1. Create a new category and page for regular expressions where we discuss how to use them in detail.
  2. At the top of the Regular expressions section on this page, we post a link to another site that discusses how to use them. Obviously this is easiest. Then we assume that information is understood by the reader.
  3. Assume that the nature of regular expressions and how to use regular expression functions are both already known, and just discuss the specifics of each function.

Personally, I favor creating a page for regular expressions which starts with a link to a site that explains how to create regular expressions. Then the page explains how they can be used in ash. --Bale 10:01, 6 May 2010 (UTC)

I'm leaning toward not adding in another category on the main page; it's getting cluttered as-is. However, I'm 100% behind the idea of linking to a dedicated page with more info & links. Users comfortable with regex can go to the strings page and just follow links. Others can go to the regex page first. However, I'm not a programmer, so I don't know if most would consider regex to be a sub-set of string handling, or its own category. I'd go by whatever is considered "standard" for that.

If we go with complete separation, I'd probably want to have a link to said page on the string handling page, and perhaps not even include it on the Main Page. I dunno. Honestly, I'm fine leaving this to whoever feels comfortable enough with regex to add said function pages.

Whichever approach is taken, I agree that we shouldn't include all of the background info on each page. Matche(r)s, groups, etc. should be briefly defined on their specific pages, but otherwise left to a general description page or assumed to be known with references provided (on the category / sub-category page). Explaining each concept on every page is a bit absurd. (It would be like giving a definition of string on every function page that accepts a parameter of or returns a string; we have other "general info" pages for this reason.)

Also, congrats & thanks on reaching this major milestone! --StDoodle (#1059825) 15:48, 6 May 2010 (UTC)

  • Good points. Regexps are a subset of string handling, so we can keep them on that page but, if we decide to have regexp information on this site, link to a page of regexp information at the top of the regexp page. That would be a pretty good solution. How much detail should we go into on the regexp page? Just basic information and a few links for the user to learn more? Or just assume that they are understood as well as string? --Bale 21:11, 6 May 2010 (UTC)