
Reading data from a text file, a badly written text file

I'm writing a program that reads data from rows in a text file. The problem is that the file is not well formatted, and writing a parser for it has been confusing.

Here are two such rows. For both I can get the address, latitude and longitude, but for the second one I cannot get the price or size(s). The error I keep getting is a string index out of bounds exception of -41 (seriously).

|12091805|,|0|,|DETAILS|,||,||,|Latitude:54.593406, Longitude:-5.934344 <b >Unit 8 Great Northern Mall Great Victoria Street Belfast Down<//b><p><p><p>Price : 150,000<p>Size: 2,411 Sq Feet  ()<p>Rent : 50,500 Per Annum<p><p>Text<p><p>|,||,||

|15961081|,|0|,|DETAILS|,||,||,|<p>Latitude:54.593406, Longitude:-5.934344   <b>3-5 Market Street Lurgan BT66</b> </p>  <p> </p>  <p> </p>  <p>   Price : &pound;250,000 </p>  <p>   Size: 0.173 acres (0.07ha) </p>  <p> </p>  <p>   Text </p>  <p> </p>  <p>  Text </p>  <p> </p>  <p>   Text </p>  <p> </p>  <p> </p>|,||,||

The rows are a lot longer, but I have replaced the paragraphs with the word Text for now.

And no, I cannot rewrite the text file. Any pointers would be appreciated.

if (s.contains("Price"))
{
    int pstart = 0;
    int pend = 0;

    if (s.contains("<p>Size"))
    {

        //if has pound symbol
        if (s.contains("&pound;"))
        {
            String[] str = s.split("&pound;");
            StringBuilder bs = new StringBuilder();
            for (String st : str)
            {
                bs.append(st);
            }

            pstart = bs.indexOf("Price") + 8;
            pend = bs.indexOf("</p>") - 1;
        }
        else
        {
            pstart = s.indexOf("Price") + 8;
            pend = s.indexOf("<p>Size");
        }

        String sp = s.substring(pstart, pend);

        String[] spl = sp.split(",");
        StringBuilder build = new StringBuilder();
        for (String st : spl)
        {
            build.append(st);
            f = build.toString();
        }
        in = Integer.parseInt(f);
        p.setPrice(in);
    }
    else
    {
        if (s.contains("&pound;"))
        {
            String[] str = s.split("&pound;");
            StringBuilder bs = new StringBuilder();
            for (String st : str)
            {
                bs.append(st);
            }

            pstart = bs.indexOf("Price : ");
            pend = bs.indexOf("</p>") - 1;
        }
        else
        {
            pstart = s.indexOf("Price") + 8;
            pend = s.indexOf("<p>Size");
        }

        String sp = s.substring(pstart, pend);

        String[] spl = sp.split(",");
        StringBuilder build = new StringBuilder();
        for (String st : spl)
        {
            build.append(st);
            f = build.toString();
        }
        in = Integer.parseInt(f);
        p.setPrice(in);
    }
}

// if has size property
if (s.contains("Size"))
{
    //if in acres
    if (s.contains("acres"))
    {
        int sstart = s.indexOf("Size:") + 6;
        int send = s.indexOf("acres") - 1;

        String sp = s.substring(sstart, send);
        double d = Double.parseDouble(sp);

        p.setSized(d);

    }

    if (s.contains("()"))
    {
        int sstart = s.indexOf("Size:") + 6;

        int send = s.indexOf("Sq") - 2;

        String sp = s.substring(sstart, send);

        if (sp.contains("-") && sp.contains(","))
        {
            String[] spl = sp.split("-|,");

            StringBuilder str = new StringBuilder();
            str.append(spl[0] + spl[1]);

            StringBuilder str2 = new StringBuilder(0);
            str2.append(spl[2] + spl[3]);

            String s1 = str.toString();
            int i = Integer.parseInt(s1);
            p.setSize(i);

            String s2 = str2.toString();
            i = Integer.parseInt(s2);
            p.setSize2(i);
        }

        if (sp.contains("-"))
        {
            String[] spl = sp.split("-");

            int one = Integer.parseInt(spl[0]);

            p.setSize(one);

            int two = Integer.parseInt(spl[1]);

            p.setSize2(two);

        }
        else if (!(sp.contains("-")))
        {
            if (sp.contains(","))
            {
                String[] spl = sp.split(",");
                StringBuilder build = new StringBuilder();
                for (String st : spl)
                {
                    build.append(st);
                    f = build.toString();
                }
                in = Integer.parseInt(f);
                p.setSize(in);
            }
            else
            {
                p.setSize(Integer.parseInt(sp));
            }

        }

    }

}
v.add(p);
p = new Property();

I'd use regular expressions, the following should point you in the right direction:

Pattern pricePattern = Pattern.compile("Price\\s*:\\s*(&pound;)?([0-9,.]+)"); 
Pattern sqFeetPattern = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*Sq"); 
Pattern acresPattern = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*acres\\s*\\(([0-9,.]+)ha\\)"); 

NumberFormat nf = NumberFormat.getNumberInstance();
nf.setGroupingUsed(true);

BufferedReader r = new BufferedReader(inputFileReader);
String line;
while ((line = r.readLine()) != null) {
    Matcher m = pricePattern.matcher(line);
    if (m.find()) {
        int price = nf.parse(m.group(2)).intValue();
        System.out.println("Price: " + price);
    }
    m = sqFeetPattern.matcher(line);
    if (m.find()) {
        int sqFeet = nf.parse(m.group(1)).intValue();
        System.out.println("Sq Feet: " + sqFeet);
    }
    m = acresPattern.matcher(line);
    if (m.find()) {
        float acres = nf.parse(m.group(1)).floatValue();
        float ha = nf.parse(m.group(2)).floatValue();
        System.out.println("Acres: " + acres + " ha: " + ha);
    }
}

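Note: inputFileReader is assumed to be a FileReader (or a similar Reader) opened on your file. nf.parse throws the checked ParseException, so the surrounding method needs to catch or declare it.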
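To try the patterns without a file, here is a minimal self-contained sketch that runs them over shortened versions of the two sample rows. The class name ListingRegexDemo and the helper firstNumber are made up for illustration. It also assumes Locale.UK, so that the comma is read as a grouping separator whatever the machine's default locale is, and it simplifies the acres pattern to capture only the acres figure.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the original post.
final class ListingRegexDemo {
    static final Pattern PRICE = Pattern.compile("Price\\s*:\\s*(?:&pound;)?([0-9,.]+)");
    static final Pattern SQ_FEET = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*Sq");
    static final Pattern ACRES = Pattern.compile("Size\\s*:\\s*([0-9,.]+)\\s*acres");

    // Returns the first number captured by the pattern, or null if it does not match.
    static Number firstNumber(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Assumption: UK formatting, i.e. ',' groups thousands and '.' is the decimal point.
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.UK);
        try {
            return nf.parse(m.group(1));
        } catch (ParseException e) {
            // Wrap the checked exception so callers are not forced to declare it.
            throw new IllegalArgumentException("Not a number: " + m.group(1), e);
        }
    }

    public static void main(String[] args) {
        // Shortened copies of the DETAILS field from the two sample rows.
        String row1 = "<p>Price : 150,000<p>Size: 2,411 Sq Feet  ()<p>Rent : 50,500 Per Annum<p>";
        String row2 = "<p>   Price : &pound;250,000 </p>  <p>   Size: 0.173 acres (0.07ha) </p>";

        for (String row : new String[] {row1, row2}) {
            System.out.println("Price:   " + firstNumber(PRICE, row));
            System.out.println("Sq feet: " + firstNumber(SQ_FEET, row));
            System.out.println("Acres:   " + firstNumber(ACRES, row));
        }
    }
}
```

Because each pattern is anchored on its own label (Price, Size) rather than on the position of the nearest </p> or <p>Size, it does not matter whether the row uses closed paragraphs, extra spaces, or the &pound; entity.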

The approach I would take is:

  1. Read the line of text
  2. Decode the line of text. It looks like HTML markup, so convert escaped characters (&pound;, for example) to the equivalent text character and filter out HTML markup (<p> etc.)
  3. Perform extraction of data on the cleaned up data using Regular Expressions
  4. Process data
  5. Next line or end.

For step 2, I am thinking of something like the following, so that you strip all of the HTML markup out of the string before splitting it on the field separator (|):

Remove HTML tags from a String
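As a rough sketch of step 2, the following strips anything that looks like a tag, decodes the &pound; entity, and then splits the row into fields. The class RowCleaner and its method names are hypothetical. The regex-based tag removal is only an assumption that works for simple markup like the sample rows (a real HTML parser would be more robust), and the split assumes that the sequence |,| never occurs inside a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative, not from the original post.
final class RowCleaner {
    // Replaces anything shaped like <...> with a space, including odd tags such as <b > or <//b>.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Decodes only the entities seen in the sample data; extend as needed.
    static String decodeEntities(String text) {
        return text.replace("&pound;", "\u00A3").replace("&amp;", "&");
    }

    // Strips tags, decodes entities, and collapses runs of whitespace.
    static String clean(String row) {
        return decodeEntities(stripTags(row)).replaceAll("\\s+", " ").trim();
    }

    // Cleans the row first, then splits it on the |,| field separator.
    static List<String> splitFields(String row) {
        String body = clean(row);
        if (body.startsWith("|")) {
            body = body.substring(1);
        }
        if (body.endsWith("|")) {
            body = body.substring(0, body.length() - 1);
        }
        List<String> fields = new ArrayList<>();
        // Limit -1 keeps trailing empty fields such as the final || entries.
        for (String field : body.split("\\|,\\|", -1)) {
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String row = "|15961081|,|0|,|DETAILS|,||,||,|<p>   Price : &pound;250,000 </p>"
                + "  <p>   Size: 0.173 acres (0.07ha) </p>|,||,||";
        List<String> fields = splitFields(row);
        System.out.println(fields.size() + " fields");
        System.out.println(fields.get(5));
    }
}
```

Once the markup is gone, the extraction in step 3 only has to deal with plain text such as "Price : £250,000 Size: 0.173 acres (0.07ha)", which keeps the regular expressions short.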
