Populate 2D array from input text file

I have a data corpus full of instances of the following form:

'be in'('force', 'the closed area').
'advise'('coxswains', 'mr mak').
'be'('a good', 'restricted area').
'establish from'('person \'s id', 'the other').

I want to read this data from a .txt file and populate a 2D array with only the information inside the single quotes, i.e.

be in          [0][0], force         [0][1], the closed area [0][2]
advise         [1][0], coxswains     [1][1], mr mak          [1][2]
be             [2][0], a good        [2][1], restricted area [2][2]
establish from [3][0], person \'s id [3][1], the other       [3][2]

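^ The array indices are shown here only as conceptual references. As I said above, only the information inside the single quotes is wanted: for example, index [0][0] would hold be in, and index [3][1] would hold person \'s id.

However, as the example at index [3][1] shows, a single quote may be preceded by a backslash, in which case it should not be interpreted as a delimiter.

This is what I have so far: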

BufferedReader br_0 = new BufferedReader(new FileReader("/home/matthias/Workbench/SUTD/2_January/Prolog/horn_data_test.pl"));
    String line_0;
    while ((line_0 = br_0.readLine()) != null) 
    {

        String[] items = line_0.split("'");
        String[][] dataArray = new String [3][262978];
        int i;
        for (String item : items) 
        {
            for (i = 0; i<items.length; i++)
            {
                if (i == 0) 
                {
                    System.out.println("first arg: " + items[i]);
                } 
                if (i == 1) 
                {
                    System.out.println("first arg: " + items[i]);
                }
                if (i == 2)
                {
                    System.out.println("second arg: " + items[i]);
                }
            }
        }           
    }
    br_0.close();

I know I need something like:

if (the character under consideration == ' && the one before it is not \)
put it into first index, etc. etc. 

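But how do I make it stop before the next delimiter? What is the best way to populate this array? The input file is large, so I am trying to optimize for efficiency.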
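
The pseudocode above can be turned into a single pass over each line. The following is only a sketch of that idea, not a tuned implementation: the class name QuoteScanner and the method fields are made up for illustration, and it assumes a quote is escaped exactly when the character directly before it is a backslash.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteScanner {

    // Returns the text found between unescaped single quotes in one line.
    // A quote directly preceded by a backslash is kept as part of the field.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        StringBuilder current = null;                // null means "outside quotes"
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean escaped = i > 0 && line.charAt(i - 1) == '\\';
            if (c == '\'' && !escaped) {
                if (current == null) {
                    current = new StringBuilder();   // opening delimiter
                } else {
                    out.add(current.toString());     // closing delimiter
                    current = null;
                }
            } else if (current != null) {
                current.append(c);                   // inside a quoted field
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "\\'" in Java source is a real backslash followed by a quote.
        String line = "'establish from'('person \\'s id', 'the other').";
        System.out.println(fields(line));
    }
}
```

The scan stops at the next delimiter simply because an unescaped quote toggles between "inside" and "outside" a field; nothing is appended while outside.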

You can use a regex with Pattern and Matcher like this:

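import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {

        // Note: inside a Java string literal, \' is just a single quote, so the
        // last sample contains no backslash. Lines read from a file keep theirs.
        String[] stringArr = { "'be in'('force', 'the closed area').",
                "'advise'('coxswains', 'mr mak').",
                "'be'('a good', 'restricted area').",
                "'establish from'('person \'s id', 'the other')." };
        // A quote counts as a closing quote only if no letter follows it directly.
        Pattern p = Pattern.compile("'(.*?)'(?![a-zA-Z])");
        String[][] arr = new String[stringArr.length][3];
        for (int i = 0; i < stringArr.length; i++) {
            Matcher m = p.matcher(stringArr[i]);
            int j = 0;
            // Stop after three fields so a malformed line cannot overflow the row.
            while (j < 3 && m.find()) {
                arr[i][j++] = m.group(1);
            }
        }

        for (int k = 0; k < arr.length; k++) {
            for (int j = 0; j < arr[k].length; j++) {
                System.out.println("arr[" + k + "][" + j + "] " + arr[k][j]);
            }
        }
    }
}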

Output:

arr[0][0] be in
arr[0][1] force
arr[0][2] the closed area
arr[1][0] advise
arr[1][1] coxswains
arr[1][2] mr mak
arr[2][0] be
arr[2][1] a good
arr[2][2] restricted area
arr[3][0] establish from
arr[3][1] person 's id
arr[3][2] the other
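
Since the question reads from a large file rather than a hard-coded array, here is a rough sketch of the same pattern applied line by line. The names CorpusLoader and load are hypothetical, and skipping lines that do not yield exactly three fields is an assumption about the desired behaviour. Rows are collected in a list so the line count does not need to be known in advance, and the pattern is compiled once rather than per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CorpusLoader {

    // Same pattern as in the answer above, compiled once and reused.
    private static final Pattern QUOTED = Pattern.compile("'(.*?)'(?![a-zA-Z])");

    // Reads every line and returns one String[] per line that contains
    // exactly three quoted fields; other lines are skipped (an assumption).
    static String[][] load(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = QUOTED.matcher(line);
                List<String> fields = new ArrayList<>(3);
                while (m.find()) {
                    fields.add(m.group(1));
                }
                if (fields.size() == 3) {
                    rows.add(fields.toArray(new String[0]));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader on the real corpus file.
        String sample = "'be in'('force', 'the closed area').\n"
                      + "'establish from'('person \\'s id', 'the other').\n";
        for (String[] row : load(new StringReader(sample))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Note that this yields a row-major array (rows[line][field]), matching the indices in the question. Because the sample here contains a real backslash, as the file would, the second field of the last row comes back as person \'s id with the backslash still in place.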

You can use this regex to match single-quoted strings that may contain escaped quotes:

'(.*?)(?<!\\)'

Use matcher.group(1) to get the string inside the quotes.
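
As a small illustration of how that regex might be written in Java source, the sketch below doubles each regex backslash inside the string literal. The class name LookbehindDemo and the helper fields are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {

    // The regex '(.*?)(?<!\\)' as a Java literal: the regex token \\
    // (one literal backslash) becomes \\\\ in source code.
    private static final Pattern QUOTED = Pattern.compile("'(.*?)(?<!\\\\)'");

    // Collects group(1) of every match, i.e. the text inside each pair of quotes.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // "\\'" in Java source is a real backslash followed by a quote.
        String line = "'establish from'('person \\'s id', 'the other').";
        for (String field : fields(line)) {
            System.out.println(field);
        }
    }
}
```

The negative lookbehind only checks the single character before a quote, so a field that ends in a literal backslash would not be closed where expected. If that can occur in the data, a different pattern or a character scan may be safer.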
