
Regex for Parsing Simple Text-Based Datafile

Can anyone give me a hand with a touch of regex?

I'm reading in a list of "locations" for a simple text adventure (the kind that were so popular back in the day). However, I'm unsure how to parse the input.

The locations all follow the format:

<location_name>, [<item>]
    [direction, location_name]

For example:

Albus Square, Flowers, Traffic Cone
    NORTH, Franklandclaw Lecture Theatre
    WEST, Library of Enchanted Books
    SOUTH, Furnesspuff College

Library of Enchanted Books
    EAST, Albus Square
    UP, Reading Room

(Subsequent locations are separated by a blank line.)

I'm storing these as Location objects with the structure:

public class Location {

    private String name;

    private Map<Direction, Location> links;

    private List<Item> items;

}

I use a method to retrieve the data from a URL and create the Location objects from the text that was read, but I'm completely stuck on how to do this. I think regex would help. Can anyone lend me a much-needed hand?

You don't want to use a text-only format for this:

  • What happens when you have more than a single flower item? Are they all the same? Can't an adventurer collect a bouquet by picking single flowers at several locations?

  • There will probably be several rooms with the same name ("cellar", "street corner"), i.e. filler rooms which add to the atmosphere but nothing to the game. They don't get a description of their own, though. How do you keep them apart?

  • What if a name contains a comma?

  • Eventually, you'll want to use Unicode for foreign names or formatting instructions.

Since this is structured data which can contain lots of odd cases, I suggest using XML for this:

<locations>
    <location>
        <name>Albus Square</name>
        <summary>Short description for returning adventurer</summary>
        <description>Long text here ... with formatting, etc.</description>
        <items>
            <item>Flowers</item>
            <item>Traffic Cone</item>
        </items>
        <directions>
            <north>Franklandclaw Lecture Theatre</north>
            <west>Library of Enchanted Books</west>
            <south>Furnesspuff College</south>
        </directions>
    </location>
    <location>
        <name>Library of Enchanted Books</name>
        <directions>
            <east>Albus Square</east>
            <up>Reading Room</up>
        </directions>
    </location>
</locations>

This allows for much greater flexibility and solves a lot of issues such as formatting description text, Unicode characters, etc. Plus, you can have more than one item or location with the same name by referring to them by IDs (numbers) instead of text.

Use JDOM or DecentXML to parse the game config.
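If you'd rather not add a dependency, the DOM parser that ships with the JDK can read the same layout. What follows is only a rough sketch using that built-in parser instead of JDOM or DecentXML; the class name XmlLocationReader and the method readExits are made up for illustration, and it assumes the file is well-formed and that location names are unique (with IDs you would key the map on the ID instead).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the original post.
class XmlLocationReader {

    /** Returns location name -> (direction -> target location name). */
    static Map<String, Map<String, String>> readExits(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse location XML", e);
        }

        Map<String, Map<String, String>> exitsByLocation = new LinkedHashMap<>();
        NodeList locations = doc.getElementsByTagName("location");
        for (int i = 0; i < locations.getLength(); i++) {
            Element location = (Element) locations.item(i);
            String name = location.getElementsByTagName("name").item(0).getTextContent().trim();

            Map<String, String> exits = new LinkedHashMap<>();
            NodeList directions = location.getElementsByTagName("directions");
            if (directions.getLength() > 0) {
                // Each child element of <directions> is named after a direction.
                NodeList children = directions.item(0).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        exits.put(child.getNodeName(), child.getTextContent().trim());
                    }
                }
            }
            exitsByLocation.put(name, exits);
        }
        return exitsByLocation;
    }
}
```

Calling XmlLocationReader.readExits(xmlText) gives you the exits as plain strings; items, summaries and descriptions can be read the same way with getElementsByTagName.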

I agree with willcodejavaforfood: regex could be used, but it isn't a big boost here.

Sounds like you just need a little algorithm help (sloppy pseudocode follows)...

currloc = null
while( line from file )
    if line is blank
        currloc = null
    else if line begins w/ whitespace
        (dir, loc) = split( line, ", " )
        add dir, loc to currloc
    else
        newlocdata = split( line, ", " )
        currloc = newlocdata[0]
        for i = 1 to size( newlocdata ) - 1
            item = newlocdata[i]
            add item to currloc
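In Java, that loop might look roughly like the sketch below. ParsedLocation and LocationFileParser are made-up names, and exits are kept as plain strings on the assumption that you resolve them into Location references in a second pass, once every location has been read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical intermediate form: exits stay as names until every location is known. */
class ParsedLocation {
    final String name;
    final List<String> items = new ArrayList<>();
    final Map<String, String> exits = new LinkedHashMap<>(); // direction -> target location name

    ParsedLocation(String name) {
        this.name = name;
    }
}

class LocationFileParser {

    /** Parses the whole data file, one line at a time. */
    static List<ParsedLocation> parse(String text) {
        List<ParsedLocation> result = new ArrayList<>();
        ParsedLocation current = null;
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                current = null; // a blank line ends the current location
                continue;
            }
            String[] parts = line.trim().split("\\s*,\\s*");
            if (Character.isWhitespace(line.charAt(0))) {
                // Indented line: "DIRECTION, location name"
                if (current == null || parts.length != 2) {
                    throw new IllegalArgumentException("Unexpected exit line: " + line);
                }
                current.exits.put(parts[0], parts[1]);
            } else {
                // Header line: "location name[, item]..."
                current = new ParsedLocation(parts[0]);
                for (int i = 1; i < parts.length; i++) {
                    current.items.add(parts[i]);
                }
                result.add(current);
            }
        }
        return result;
    }
}
```

Keeping the exits as names first sidesteps the problem that "Reading Room" may be referenced before (or without) being defined; the second pass can report such dangling names in one place.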

Can't get my head into Java mode right now, so here's some pseudo-code that should do it:

Data = MyString.split('\n\n++\s*+');

for ( i=0 ; i<Data.length ; i++ )
{
    CurLocation = Data[i].split('\n\s*+');

    LocationInfo = CurLocation[0].split(',\s*+');

    LocationName = LocationInfo[0];

    for ( n=1 ; n<LocationInfo.length ; n++ )
    {
        Items[n-1] = LocationInfo[n];
    }


    for ( n=1 ; n<CurLocation.length ; n++ )
    {
        DirectionInfo = CurLocation[n].split(',\s*+');

        DirectionName = DirectionInfo[0];

        for ( x=1 ; x<DirectionInfo.length ; x++ )
        {
            DirectionLocation[x-1] = DirectionInfo[x];
        }

    }


}
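For what it's worth, the same split-based idea could be written in Java along these lines. This is only a sketch: LocationBlock and SplitParser are invented names, the regexes are my own simplification of the ones above, and it assumes every indented line really has the form "DIRECTION, name".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for one parsed block. */
class LocationBlock {
    String name;
    List<String> items = new ArrayList<>();
    Map<String, String> exits = new LinkedHashMap<>(); // direction -> target location name
}

class SplitParser {

    static List<LocationBlock> parse(String text) {
        List<LocationBlock> blocks = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return blocks;
        }
        // Locations are separated by one or more blank lines.
        for (String chunk : text.trim().split("\\R\\s*\\R")) {
            // A line break plus the indentation after it separates two lines.
            String[] lines = chunk.split("\\R\\s*");
            String[] header = lines[0].split("\\s*,\\s*");

            LocationBlock block = new LocationBlock();
            block.name = header[0];
            block.items.addAll(Arrays.asList(header).subList(1, header.length));

            for (int n = 1; n < lines.length; n++) {
                // Limit 2: everything after the first comma is the target name.
                String[] exit = lines[n].split("\\s*,\\s*", 2);
                block.exits.put(exit[0], exit[1]);
            }
            blocks.add(block);
        }
        return blocks;
    }
}
```

Note that a malformed exit line without a comma would fail with an ArrayIndexOutOfBoundsException here, so real code would want a check before reading exit[1].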

Can you change the format of the data? That format is clunky. I suspect that you're busy reinventing the square wheel... This screams "Just use XML" to me.

I think using XML is overkill (shooting sparrows with cannons) while regexps are "underkill" (using too weak a tool, like scrubbing floors with a toothbrush).

The right balance sounds like "the .ini format" or "mail headers with sections". For Python, there are library docs at http://docs.python.org/library/configparser.html .

A brief example:

[albus_square]
name: Albus Square
items: Flowers, Traffic Cone
north: lecture_theatre
west: library_enchanted_books
south: furnesspuff_college

I'd assume there's a Java library for this format. As another poster has pointed out, you might have name collisions, so I took the liberty of adding a "name:" field. The name in the square brackets would be the unique identifier.
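Even without a library, a format this small is easy to read by hand. Here is a minimal sketch, assuming only "[section]" headers and "key: value" lines as in the example; IniLikeReader is a made-up name, and treating lines that start with "#" or ";" as comments is my own assumption rather than something the example shows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hand-rolled reader, not an existing library.
class IniLikeReader {

    /** Returns section id -> (key -> value). */
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // skip blank lines and (assumed) comment lines
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                String id = line.substring(1, line.length() - 1).trim();
                if (sections.containsKey(id)) {
                    throw new IllegalArgumentException("Duplicate section: " + id);
                }
                current = new LinkedHashMap<>();
                sections.put(id, current);
                continue;
            }
            int colon = line.indexOf(':');
            if (current == null || colon < 0) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            // Split on the first colon only, so values may contain colons.
            current.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return sections;
    }
}
```

Because the section id is the unique key, two rooms can share the same "name:" value without clashing, and the direction entries can point at ids rather than display names.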
