Talk:String Handling Routines: Difference between revisions


Revision as of 04:12, 7 May 2010

So wtf does group_string actually do? The linked "descriptive" post has an utterly unhelpful example. Has anyone ever used it for anything?

It groups a string into a map using a regular expression. To understand the function, you need to know two things: what maps are and how they are used, and what regular expressions are and how to create them.

Using the original post:

FUNCTION DEFINITION: string [int,int] group_string( string source, string regex )

EXAMPLE: string [int,int] test = group_string( "This is a test", "([a-z]+) " );

Example Breakdown:

  • string [int,int]: Defines a map. It is two-dimensional, the indices are integers, and the data is stored as strings.
  • test: Names the map test.
  • group_string: Calls the function.
  • "This is a test": The sample string fed to the function.
  • "([a-z]+) ": Your regular expression.

Regular expressions deal with pattern matching. You want the function to find a particular pattern. A regular expression function then returns that pattern, or the stuff before it, or the stuff after it, or splits them apart, or squeezes them together. So what does this regular expression look for?

  • The parentheses (): Tell the function that this is a group of characters.
  • [a-z]: Tells us they will be lower-case letters.
  • +: Tells us to look for one or more of those characters.
  • The space between the ) and the ": Tells us the pattern ends in a space.
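To see what each piece of the pattern contributes, here is a minimal sketch in Java. It is Java rather than ash because, as noted elsewhere on this page, ash regex is a wrapper for Java regex. It assumes the pattern is "([a-z]+) ", with the + quantifier, and the class and helper names (PatternPiecesSketch, firstGroup) are made up for illustration; they are not part of ash or KoLmafia.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternPiecesSketch {
    // Hypothetical helper: returns the given group of the FIRST match of
    // regex in source, or null when the pattern does not match at all.
    static String firstGroup(String source, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(source);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String source = "This is a test";

        // [a-z] alone: a single lower-case letter.
        System.out.println("[" + firstGroup(source, "[a-z]", 0) + "]");      // prints [h]

        // Adding +: one or more lower-case letters in a row.
        System.out.println("[" + firstGroup(source, "[a-z]+", 0) + "]");     // prints [his]

        // Adding the trailing space: the run must be followed by a space.
        System.out.println("[" + firstGroup(source, "[a-z]+ ", 0) + "]");    // prints [his ]

        // Adding parentheses: group 1 holds only what is inside them.
        System.out.println("[" + firstGroup(source, "([a-z]+) ", 1) + "]");  // prints [his]
    }
}
```

The square brackets in the output are only there to make the trailing space visible.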

Thus, reading along the string:

  • T = Does not match [a-z]; it is a capital letter.
  • h = Matches [a-z]. Starting group.
  • i = Matches [a-z].
  • s = Matches [a-z].
  • (space) = Matches the space. First group found, and is "his ".
  • i = Matches [a-z]. Starting group.
  • s = Matches [a-z].
  • (space) = Matches the space. Second group found, and is "is ".
  • a = Matches [a-z]. Starting group.
  • (space) = Matches the space. Third group found, and is "a ".
  • t = Matches [a-z]. Starting group.
  • e = Matches [a-z].
  • s = Matches [a-z].
  • t = Matches [a-z].
  • End of line. There is no trailing space, so "test" does not complete the pattern. No more matches. Stop.

Thus, trusting the post, the map would be:

test[0][0] => "his "
test[0][1] => "his"
test[1][0] => "is "
test[1][1] => "is"
test[2][0] => "a "
test[2][1] => "a"
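That map can be checked against Java's regex classes, which ash regex reportedly wraps. The sketch below is an assumption about how group_string behaves, based only on the description above, and is not KoLmafia's actual source: it gives one row per successive match, with column 0 holding the whole match and column 1 the first parenthesized group. The names GroupStringSketch and groupString are hypothetical, and the pattern is assumed to be "([a-z]+) ".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupStringSketch {
    // Hypothetical stand-in for ash's group_string:
    // result.get(m).get(g) plays the role of test[m][g] in the map above.
    static List<List<String>> groupString(String source, String regex) {
        List<List<String>> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(source);
        while (matcher.find()) {                      // one row per successive match
            List<String> groups = new ArrayList<>();
            for (int g = 0; g <= matcher.groupCount(); g++) {
                groups.add(matcher.group(g));         // group 0 is the whole match
            }
            result.add(groups);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> test = groupString("This is a test", "([a-z]+) ");
        for (int m = 0; m < test.size(); m++) {
            for (int g = 0; g < test.get(m).size(); g++) {
                System.out.println("test[" + m + "][" + g + "] => \"" + test.get(m).get(g) + "\"");
            }
        }
    }
}
```

Run as written, this prints the same six entries as the map above. The final word "test" produces no row because no space follows it.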

I personally haven't used it. It would be useful for parsing a page by hand.


Regular expressions

Finally! As of this moment, every single function has a page on this wiki except for the regular expression functions. (Much cheering!) Part of the problem with making pages for them is that a whole ream of background information is necessary to use them. I can see several approaches that we can take to this. I hope that we can discuss which tack to take:

  1. Create a new category and page for regular expressions where we discuss how to use them in detail.
  2. At the top of the Regular expressions section on this page, we post a link to another site that discusses how to use them. Obviously this is easiest. Then we assume that information is understood by the reader.
  3. Assume that both the nature of regular expressions and the use of regular expression functions are already known, and just discuss the specifics of each function.

Personally, I favor creating a page for regular expressions which starts with a link to a site that explains how to create regular expressions. Then the page explains how they can be used in ash. --Bale 10:01, 6 May 2010 (UTC)

I'm leaning toward not adding another category on the main page; it's getting cluttered as-is. However, I'm 100% for the idea of linking to a dedicated page with more info & links. Users comfortable with regex can go to the strings page and just follow links. Others can go to the regex page first. However, I'm not a programmer, so I don't know if most would consider regex to be a subset of string handling or its own category. I'd go by whatever is considered "standard" for that.

If we go with complete separation, I'd probably want to have a link to said page on the string handling page, and perhaps not even include it on the Main Page. I dunno. Honestly, I'm fine leaving this to whoever feels comfortable enough with regex to add said function pages.

Whichever approach is taken, I agree that we shouldn't include all of the background info on each page. Matche(r)s, groups, etc. should be briefly defined on their specific pages, but otherwise left to a general description page or assumed to be known with references provided (on the category / sub-category page). Explaining each concept on every page is a bit absurd. (It would be like giving a definition of string on every function page that accepts a parameter of or returns a string; we have other "general info" pages for this reason.)

Also, congrats & thanks on reaching this major milestone! --StDoodle (#1059825) 15:48, 6 May 2010 (UTC)

  • Good points. Regular expressions are a subset of string handling, so we can keep them on that page but link to a page of regexp information at the top of the regexp page, if we decide to have regexp information on this site. That would be a pretty good solution. How much detail should we go into on the regexp page? Just basic information and a few links for the user to learn more? Or just assume that they are understood as well as string? --Bale 21:11, 6 May 2010 (UTC)


The way you're currently going with it is a.o.k. by me. I wouldn't add too much to the dedicated page; I'd prefer additional tutorials & reference material to "original" content, for the most part. Mostly 'cause I see people fitting into the following groups re: regex:

  • Those who will never get it; it doesn't matter what we do for this group
  • Those who already get it; again, doesn't matter what we do
  • Those who are capable of getting it, given adequate info
    • Adequate is, for most of the people in this category, going to be far more than we really want to get into (I know some people who are very comfortable with regex who STILL keep a cheat-sheet or bookmark for some stuff)
    • I don't want to spend 1k wiki-hours (that's an official metric now, dontchaknow?) on stuff that's been done well elsewhere when there's a lot of mafia (non-ash) stuff left to document; priorities, bang-for-your-buck, etc.

So yeah, keep up what you're doing basically. Flesh out quick coverage of the basics on the Regex page, add more links if you find ones that may also be helpful, but beyond that, don't worry too much about it. --StDoodle (#1059825) 04:09, 7 May 2010 (UTC)

Edit to add:

There is one minor additional note. As you note on the Regex page & I've seen elsewhere, ash regex is just a wrapper for java regex. As such, there's absolutely no reason to cover the same ground (I'm fairly sure java has better documentation resources than mafia :P ) EXCEPT it might be nice to give some "advanced info" that points out exactly how the ash equivalents map to their java counterparts. This isn't high-priority, but is the only thing beyond links and very basic coverage that I can see being worthwhile. --StDoodle (#1059825) 04:12, 7 May 2010 (UTC)