I am extracting data from tables within a Microsoft Word document (.doc).
The data extracts fine, but at the end of each extracted value (from each cell) there is a non-printable ^G character which is seriously messing with further processing. I can only see it when I paste the console output into my text editor (TextMate).
What's the best way to remove this using regex? Is this a Unicode character? I can't find any reference to ^G non-printable characters. I assume it's an end-of-cell character. To be honest I would rather get rid of all non-printable characters, but at the moment this is the only one that is causing me any problems, so either solution will do.
To be honest I would rather get rid of all non-printable characters
You may use:
input = input.replaceAll("\\P{Print}", "");
in Java to remove all non-printable characters.
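\\p{Print} matches all printable characters and \\P{Print} does the reverse by matching all non-printable characters. Note that by default Java treats \p{Print} as a POSIX class covering printable ASCII only, so the call above also strips accented letters and any other non-ASCII text. Prefixing the pattern with the (?U) flag (UNICODE_CHARACTER_CLASS) makes the class Unicode-aware.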
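As for the ^G itself: that is caret notation for the BEL control character (U+0007), so it can also be targeted directly. The sketch below shows three options side by side; the class and method names (CellTextCleaner, stripBell, stripControl, stripNonPrintable) are made up for illustration, and it assumes the extracted cell values are already available as Java strings.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names here are illustrative, not from any library.
final class CellTextCleaner {
    // ^G is caret notation for BEL, code point U+0007.
    private static final Pattern BELL = Pattern.compile("\\x07");

    // POSIX control characters: U+0000..U+001F and U+007F.
    // This also removes tabs and line breaks.
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    // (?U) enables UNICODE_CHARACTER_CLASS, so non-ASCII letters
    // count as printable and are kept.
    private static final Pattern NON_PRINTABLE = Pattern.compile("(?U)\\P{Print}");

    // Removes only the BEL character.
    static String stripBell(String s) {
        return BELL.matcher(s).replaceAll("");
    }

    // Removes every ASCII control character.
    static String stripControl(String s) {
        return CONTROL.matcher(s).replaceAll("");
    }

    // Removes everything that is not printable, Unicode-aware.
    static String stripNonPrintable(String s) {
        return NON_PRINTABLE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String cell = "Caf\u00e9 total\u0007"; // sample cell value ending in BEL
        System.out.println(stripBell(cell));
        System.out.println(stripNonPrintable(cell));
    }
}
```

If only the trailing ^G is causing trouble, stripBell is the narrowest fix; the other two are broader and will also drop tabs and newlines, which may or may not be wanted.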