
Regex for Parsing Simple Text-Based Datafile

Can anyone give me a hand with a touch of regex?

I'm reading in a list of "locations" for a simple text adventure (the kind that were so popular back in the day). However, I'm unsure how to parse the input.

The locations all follow the format:

<location_name>, [<item>]
    [direction, location_name]

Such as:

Albus Square, Flowers, Traffic Cone
    NORTH, Franklandclaw Lecture Theatre
    WEST, Library of Enchanted Books
    SOUTH, Furnesspuff College

Library of Enchanted Books
    EAST, Albus Square
    UP, Reading Room

(Subsequent locations are separated by a blank line.)

I'm storing these as Location objects with the structure:

public class Location {

    private String name;

    private Map<Direction, Location> links;

    private List<Item> items;

}

I use a method to retrieve the data from a URL and create the Location objects from the text it reads, but I'm completely stuck on how to do this. I think regex would help. Can anyone lend me a much-needed hand?

You don't want to use a text-only format for this:

  • What happens when you have more than a single flower item? Are they all the same? Can't an adventurer collect a bouquet by picking single flowers at several locations?

  • There will probably be several rooms with the same name ("cellar", "street corner"), i.e. filler rooms which add to the atmosphere but nothing to the game. They don't get a description of their own, though. How do you keep them apart?

  • What if a name contains a comma?

  • Eventually, you'll want to use Unicode for foreign names or formatting instructions.

Since this is structured data which can contain lots of odd cases, I suggest using XML for this:

<locations>
    <location>
        <name>Albus Square</name>
        <summary>Short description for returning adventurer</summary>
        <description>Long text here ... with formatting, etc.</description>
        <items>
            <item>Flowers</item>
            <item>Traffic Cone</item>
        </items>
        <directions>
            <north>Franklandclaw Lecture Theatre</north>
            <west>Library of Enchanted Books</west>
            <south>Furnesspuff College</south>
        </directions>
    </location>
    <location>
        <name>Library of Enchanted Books</name>
        <directions>
            <east>Albus Square</east>
            <up>Reading Room</up>
        </directions>
    </location>
</locations>

This allows for much greater flexibility and solves a lot of issues, such as formatting description text and handling Unicode characters. You can also have more than one item or location with the same name by referring to them by ID (a number) instead of by name.

Use JDOM or DecentXML to parse the game config.
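
If you'd rather not pull in a library, here is a rough sketch using the DOM parser that ships with the JDK (javax.xml.parsers) instead of JDOM or DecentXML. The class and method names (XmlLocations, readLinks) are made up for illustration, and it only reads the name and directions elements from the layout above, so treat it as a starting point rather than a finished loader.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library.
class XmlLocations {

    // For each location name, returns a map from direction tag ("north", "up", ...)
    // to the name of the location it leads to.
    static Map<String, Map<String, String>> readLinks(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid location XML", e);
        }

        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        NodeList locations = doc.getElementsByTagName("location");
        for (int i = 0; i < locations.getLength(); i++) {
            Element loc = (Element) locations.item(i);
            String name = loc.getElementsByTagName("name").item(0).getTextContent().trim();

            Map<String, String> links = new LinkedHashMap<>();
            NodeList dirs = loc.getElementsByTagName("directions");
            if (dirs.getLength() > 0) {
                // Each child element of <directions> is one exit; its tag is the direction.
                NodeList children = dirs.item(0).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        links.put(child.getNodeName(), child.getTextContent().trim());
                    }
                }
            }
            result.put(name, links);
        }
        return result;
    }
}
```

Calling XmlLocations.readLinks(xmlText) on a well-formed version of the file above would give you one entry per location, which you can then turn into Location objects in a second pass.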

Agree w/ willcodejavaforfood: regex could be used, but it isn't a big help here.

Sounds like you just need a little algorithm help (sloppy p-code follows)...

currloc = null
while( line from file )
    if line is blank
        continue    // blank lines only separate locations
    else if line begins w/ whitespace
        (dir, loc) = split( trim( line ), ", " )
        add dir, loc to currloc
    else
        newlocdata = split( line, ", " )
        currloc = new location named newlocdata[0]
        for i = 1 to size( newlocdata ) - 1
            item = newlocdata[i]
            add item to currloc
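
For what it's worth, here is roughly how that loop might look in Java. This is only a sketch under a few assumptions: the Direction enum is invented (the question doesn't show it), Location is a cut-down stand-in for the asker's class with items stored as plain strings, and names are assumed not to contain commas. Locations are looked up by name and created on first mention, so a link can point at a room that is defined later in the file.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical enum; the question does not show its Direction type.
enum Direction { NORTH, SOUTH, EAST, WEST, UP, DOWN }

// Simplified stand-in for the question's Location class (items are plain strings here).
class Location {
    final String name;
    final Map<Direction, Location> links = new EnumMap<>(Direction.class);
    final List<String> items = new ArrayList<>();

    Location(String name) { this.name = name; }
}

class LocationParser {

    // Parses the whole data file; returns locations keyed by name, in order of first mention.
    static Map<String, Location> parse(String text) {
        Map<String, Location> byName = new LinkedHashMap<>();
        Location current = null;
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // blank lines only separate locations
            }
            String[] parts = line.trim().split("\\s*,\\s*");
            if (Character.isWhitespace(line.charAt(0))) {
                // Indented line: "DIRECTION, target name"
                if (current == null || parts.length != 2) {
                    throw new IllegalArgumentException("Bad link line: " + line);
                }
                Direction dir = Direction.valueOf(parts[0].toUpperCase());
                current.links.put(dir, byName.computeIfAbsent(parts[1], Location::new));
            } else {
                // Unindented line: "name[, item, item ...]"
                current = byName.computeIfAbsent(parts[0], Location::new);
                for (int i = 1; i < parts.length; i++) {
                    current.items.add(parts[i]);
                }
            }
        }
        return byName;
    }
}
```

Because every name goes through the same map, the Location stored under WEST for Albus Square is the very same object that later gets the library's own links filled in.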

Can't get my head into Java-mode right now, so here's some pseudo-code that should do it:

Data = MyString.split('\n\n++\s*+');

for ( i=0 ; i<Data.length ; i++ )
{
    CurLocation = Data[i].split('\n\s*+');

    LocationInfo = CurLocation[0].split(',\s*+');

    LocationName = LocationInfo[0];

    for ( n=1 ; n<LocationInfo.length ; n++ )
    {
        Items[n-1] = LocationInfo[n];
    }


    for ( n=1 ; n<CurLocation.length ; n++ )
    {
        DirectionInfo = CurLocation[n].split(',\s*+');

        DirectionName = DirectionInfo[0];

        for ( x=1 ; x<DirectionInfo.length ; x++ )
        {
            DirectionLocation[x-1] = DirectionInfo[x];
        }

    }


}
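
In case it helps, the same split-on-regex idea could be written in Java along these lines. It is a sketch, not a drop-in: the class name SplitParser and the String[][] return shape are my own invention, and I've swapped the patterns above for Java's \R (any line break) and \h (horizontal whitespace) so that blank separator lines may contain stray spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the split-based approach.
class SplitParser {

    // Returns one String[][] per location:
    // row 0 is {name, item, item, ...}; each later row is {direction, target name}.
    static List<String[][]> parse(String text) {
        List<String[][]> result = new ArrayList<>();
        // A line break followed by one or more blank lines separates locations.
        for (String block : text.trim().split("\\R(?:\\h*\\R)+")) {
            // Within a block, split into lines and drop the indentation.
            String[] lines = block.split("\\R\\s*");
            String[][] rows = new String[lines.length][];
            for (int i = 0; i < lines.length; i++) {
                rows[i] = lines[i].trim().split("\\s*,\\s*");
            }
            result.add(rows);
        }
        return result;
    }
}
```

From there, rows[0][0] is the location name, the rest of rows[0] are its items, and every other row is a direction/target pair.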

Can you change the format of the data? That format is clunky. I suspect that you're busy reinventing the square wheel... This screams "Just use XML" to me.

I think using XML is overkill (shooting sparrows with cannons) while regexps are "underkill" (using too weak a tool, like scrubbing floors with a toothbrush).

The right balance sounds like "the .ini format" or "mail headers with sections". For Python there are library docs at http://docs.python.org/library/configparser.html .

A brief example:

[albus_square]
name: Albus Square
items: Flowers, Traffic Cone
north: lecture_theatre
west: library_enchanted_books
south: furnesspuff_college

I'd assume there's a Java library for this format. As another poster has pointed out, you might have name collisions, so I took the liberty of adding a "name:" field. The name in the square brackets would be the unique identifier.
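
I haven't checked which Java library would be the best fit, but the format is simple enough that a hand-rolled reader is only a few lines. The sketch below is exactly that and nothing more: IniSketch is a made-up name, it only understands [section] headers and "key: value" lines as in the example above, and it treats lines starting with # or ; as comments (my assumption).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal reader for the sectioned "key: value" layout shown above.
class IniSketch {

    // Maps each [section] identifier to its key/value pairs, in file order.
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // Start a new section; the bracketed text is the unique identifier.
                current = new LinkedHashMap<>();
                sections.put(line.substring(1, line.length() - 1).trim(), current);
                continue;
            }
            int sep = line.indexOf(':');
            if (current == null || sep < 0) {
                throw new IllegalArgumentException("Unexpected line: " + raw);
            }
            // Split on the first colon only, so values may themselves contain colons.
            current.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return sections;
    }
}
```

The "items" value would still need a split on commas, and the direction values are section identifiers to be resolved once every section has been read.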
