Java: How to remove all characters in a String except a-z, digits and German characters

Question

I am working on a Spring MVC application into which we are currently integrating OCR functionality. OCR engines tend to emit stray characters for misdetections and when there is an image in the background. After processing the image we get reasonably good data, but some errors remain. We would like to post-process the output as follows:

Remove all single characters from the output String.
Remove any and all characters other than A-Z, a-z and the German characters, i.e. äöü, ÄÖÜ, ß.
Spaces and digits should be left untouched.

Code:

    File imageFile = new File(fileLocation);

    // Redraw the image as pure black and white before running OCR
    BufferedImage img = ImageIO.read(imageFile);
    BufferedImage blackNWhite = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
    Graphics2D graphics = blackNWhite.createGraphics();
    graphics.drawImage(img, 0, 0, null);
    String blackAndWhiteImage = zipLocation + new BigInteger(130, random).toString(32) + ".png";
    File outputfile = new File(blackAndWhiteImage);
    ImageIO.write(blackNWhite, "png", outputfile);

    ITesseract instance = new Tesseract();
    // Point to one folder above tessdata directory, must contain training data
    instance.setDatapath("/usr/share/tesseract-ocr/");
    // ISO 639-3 standard
    instance.setLanguage("deu");
    String result = instance.doOCR(outputfile);
    //System.out.println(result);
    result = result.replaceAll("\\P{ASCII}", "");
    System.out.println("Result is " + result);
    return result;

Thank you.

Update

Stray characters left behind by the regex:

 |
| '(°Ul") 
_} °
=# '
( )
...................................__+_......_._._.__._._._+._._.

Answer 1

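Ad. 1.
result = result.replaceAll("\\s[a-zA-ZöÖäÄüÜß]\\s", "");
Ad. 2.
result = result.replaceAll("[^a-zA-ZöÖäÄüÜß]", "");
Strings are immutable, so the return value has to be assigned back. Note that the second pattern also strips digits and whitespace; add 0-9 and \s to the character class to keep them.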
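One thing to watch with the single-character step: a pattern that matches the whitespace on both sides of the letter also deletes that whitespace, so "Haus a Tür" would collapse to "HausTür", and a lone letter at the start or end of the text is never matched. A possible alternative, sketched below, is to check the neighbours with lookarounds instead of consuming them. The class and method names are made up for illustration, and the German letters are written as Unicode escapes purely so the file compiles regardless of source encoding; they are equivalent to the literal characters.

```java
final class LoneLetterFilter {
    // German letters as Unicode escapes (a/o/u umlauts in both cases, sharp s),
    // so the source compiles regardless of the file encoding.
    private static final String LETTERS =
            "a-zA-Z\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";

    // One letter with whitespace or a string boundary on each side.
    // The lookarounds inspect the neighbours without consuming them.
    private static final String LONE_LETTER =
            "(?<!\\S)[" + LETTERS + "](?!\\S)";

    static String removeLoneLetters(String text) {
        return text.replaceAll(LONE_LETTER, "");
    }

    public static void main(String[] args) {
        // The lone "a" is removed, both surrounding spaces stay.
        System.out.println("[" + removeLoneLetters("Haus a Tuer") + "]");
        // Adjacent lone letters are all removed, the digit is left alone.
        System.out.println("[" + removeLoneLetters("a b 7 c") + "]");
    }
}
```

Because the spaces are kept, this leaves runs of two or more spaces behind; whether to collapse them afterwards depends on what the downstream code expects.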

Answer 2

This is the regex I finally used to solve the problem:

result = result.replaceAll("[^a-zA-Z0-9öÖäÄüÜß@\\s]", "");

Thank you.
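For anyone landing here later, this is a minimal sketch of how the character filter and the single-character rule from the question could be combined into one helper. It is not the code from the application: the names OcrTextCleaner and clean are hypothetical, and filtering first, then dropping lone letters, then collapsing spaces is an assumed order, chosen so that stray punctuation next to a single letter does not hide it. The patterns are precompiled because String.replaceAll compiles its regex on every call. German letters are again written as Unicode escapes to keep the source encoding-independent.

```java
import java.util.regex.Pattern;

final class OcrTextCleaner {
    // a/o/u umlauts in both cases plus sharp s, as Unicode escapes.
    private static final String GERMAN = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";

    // Anything that is not a letter, digit, '@', German letter or whitespace.
    private static final Pattern DISALLOWED =
            Pattern.compile("[^a-zA-Z0-9" + GERMAN + "@\\s]");

    // A single letter with whitespace or a string boundary on both sides.
    private static final Pattern LONE_LETTER =
            Pattern.compile("(?<!\\S)[a-zA-Z" + GERMAN + "](?!\\S)");

    // Two or more spaces/tabs in a row (line breaks are left alone).
    private static final Pattern SPACE_RUN = Pattern.compile("[ \\t]{2,}");

    static String clean(String ocrOutput) {
        String text = DISALLOWED.matcher(ocrOutput).replaceAll("");
        text = LONE_LETTER.matcher(text).replaceAll("");
        text = SPACE_RUN.matcher(text).replaceAll(" ");
        return text.trim();
    }

    public static void main(String[] args) {
        // Sample input mixing stray OCR symbols with real words and digits.
        String raw = "| '(\u00b0Ul\") Stra\u00dfe 12 _} x M\u00fcnchen";
        System.out.println(clean(raw));
    }
}
```

With the sample input in main this prints "Ul Straße 12 München": the symbols and the lone "x" are gone and the digits are kept, but a two-letter misdetection such as "Ul" still gets through, so a character filter alone will not catch every OCR error.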
