Getting meaningful text from java.io.Reader

I'm writing a program that uses another company's library to download some reports from their website. I want to parse these reports before I write them to a file, because if they match certain criteria, I want to disregard them.

The problem is that their method, download(), returns a java.io.Reader. The only method available to me is

int read(char[] cbuf);

Printing this array out gives me meaningless characters. I want to be able to identify what character set I'm working with, or convert it to a byte array, but I can't figure out how to do it. I've tried

//retrievedFile is my Reader object
char[] cbuf = new char[2048];
int numChars = retrievedFile.read(cbuf);
//I've tried other character sets, too
new String(cbuf).getBytes("UTF-8");

and I'm afraid to downcast to a more useful reader because I can't know for sure whether it will work. Any suggestions?

EDIT

When I say it prints out "meaningless characters", I don't mean that it looks like the example given by Jon Skeet. It's really hard to describe because I'm not at my machine right now, but I think it's an encoding issue. The characters seem to have indentation and structure similar to the look of the reports. I'll try these suggestions as soon as I get back on Tuesday (I'm only an intern, so I haven't bothered with setting up a remote account or anything).

Try this:

BufferedReader in = new BufferedReader(retrievedFile);
String line = null;
StringBuilder rslt = new StringBuilder();
while ((line = in.readLine()) != null) {
    // readLine() strips the line terminator, so add one back
    rslt.append(line).append('\n');
}
System.out.println(rslt.toString());

Don't cast the Reader to any class, because you don't know its real type. Instead, use BufferedReader and pass the Reader into it. BufferedReader takes any subclass of java.io.Reader as its argument, so it is safe to use.

Printing out the char[] itself will probably give you something like:

[C@1c8825a5

That's just the normal output of calling toString on a char array in Java. It sounds like you want to convert it into a String, which you can do with the String(char[]) constructor. Here's some sample code:

public class Test {
    public static void main(String[] args) {
        char[] chars = "hello".toCharArray();
        System.out.println((Object) chars);

        String text = new String(chars);
        System.out.println(text);
    }
}

On the other hand, java.io.Reader doesn't have a read method returning a char[]. It has methods which either return a single character at a time, or (more usefully) accept a char[] to fill with data and return the number of characters read. This is actually what your sample code shows. You just need to use the char array and the number of characters read to create the new String. For example:

char[] buffer = new char[4096];
int charsRead = reader.read(buffer);
String text = new String(buffer, 0, charsRead);

However, note that a single read call may not return all the data in one go. You could read it line by line using BufferedReader, or loop to fetch all of the information. Guava contains useful code in its CharStreams class. For example:

String allText = CharStreams.toString(reader);

or

List<String> lines = CharStreams.readLines(reader);
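
If adding Guava is not an option, a rough JDK-only equivalent of CharStreams.toString might look like the sketch below. It assumes Java 10 or later (for Reader.transferTo). The class and method names are made up for illustration, and a StringReader stands in for the Reader returned by download().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

class ReaderToString {
    // Drains the reader into a String using only the JDK (assumes Java 10+).
    static String readAll(Reader reader) throws IOException {
        StringWriter out = new StringWriter();
        // Copies all remaining chars; transferTo does not close the reader.
        reader.transferTo(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in for the Reader returned by download().
        try (Reader reader = new StringReader("line one\nline two\n")) {
            System.out.print(readAll(reader));
        }
    }
}
```

Unlike the line-based approaches, this keeps the original line terminators untouched.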

What meaningless chars does it give? Probably null chars, because you don't read all the chars from the reader, but at most 2048 chars, and you ignore the value returned by the read method (which tells you how many chars were actually read).

If you want to read the whole thing into a String, you'll have to loop until the returned value is negative, and append the chars read at each iteration (from 0 to numChars) to a StringBuilder.

char[] cbuf = new char[2048];
StringBuilder builder = new StringBuilder();
int numChars;
while ((numChars = reader.read(cbuf)) >= 0) {
    builder.append(cbuf, 0, numChars);
}
String s = builder.toString();
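
To see where the null chars come from, here is a small sketch that builds a String from a partially filled buffer in two ways. The class and method names are hypothetical, and a StringReader stands in for the library's Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class PartialBufferDemo {
    // Builds a String from the whole buffer, ignoring how many chars were read.
    static String wholeBuffer(Reader reader, int size) throws IOException {
        char[] cbuf = new char[size];
        reader.read(cbuf);
        return new String(cbuf);
    }

    // Builds a String from only the chars that read() actually filled in.
    static String filledPart(Reader reader, int size) throws IOException {
        char[] cbuf = new char[size];
        int numChars = reader.read(cbuf);
        return numChars < 0 ? "" : new String(cbuf, 0, numChars);
    }

    public static void main(String[] args) throws IOException {
        String wrong = wholeBuffer(new StringReader("hello"), 16);
        String right = filledPart(new StringReader("hello"), 16);
        System.out.println(wrong.length()); // 16: "hello" plus 11 '\u0000' chars
        System.out.println(right.length()); // 5
    }
}
```

The unused tail of the array keeps its default '\u0000' values, which is what shows up as junk when the whole buffer is printed.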

As an alternative, you can read a string from a java.io.Reader with java.util.Scanner inside a try-with-resources statement, which automatically closes the reader.

Here is an example:

Reader in = ...
try (Scanner scanner = new Scanner(in).useDelimiter("\\Z")) {
    String text = scanner.next();
    ... // Do something with text
}

In this situation the call to scanner.next() reads everything as a single token, because the delimiter "\\Z" matches at the end of the input (before a final line terminator, if any). Note that next() throws NoSuchElementException if the reader is empty.

The following one-liner will also read the whole text but will not close the reader:

String text = new Scanner(in).useDelimiter("\\Z").next();
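
Since Scanner.next() throws NoSuchElementException when there is no token, a slightly more defensive variant could check hasNext() first. This is only a sketch: the class and method names are invented, and a StringReader stands in for the real Reader.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

class ScannerReadAll {
    // Reads the reader's content as one token; returns "" when there is nothing to read.
    static String readAll(Reader in) {
        // Closing the Scanner also closes the underlying Reader.
        try (Scanner scanner = new Scanner(in).useDelimiter("\\Z")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        // StringReader is a stand-in for the Reader returned by download().
        System.out.println(readAll(new StringReader("Report body")));
        System.out.println(readAll(new StringReader("")).isEmpty());
    }
}
```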

Since Java 1.8, you can use the BufferedReader.lines() method, which returns a Stream<String>.

So this code will return the whole content, with the lines joined by a custom line separator, "\n":

String content = new BufferedReader(reader)
    .lines()
    .collect(Collectors.joining("\n"));
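
Tying this back to the question, the joined content can then be checked before anything is written to a file. The sketch below is only an illustration: the "NO DATA" marker is an invented criterion, the class and method names are hypothetical, and a StringReader stands in for the Reader returned by download().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.stream.Collectors;

class ReportFilter {
    // Hypothetical criterion: reports containing this text are disregarded.
    static final String SKIP_MARKER = "NO DATA";

    // Reads all lines and joins them with "\n"; closes the reader when done.
    static String readContent(Reader reader) throws IOException {
        try (BufferedReader in = new BufferedReader(reader)) {
            return in.lines().collect(Collectors.joining("\n"));
        }
    }

    // Decides whether a report should be kept, based on the assumed marker.
    static boolean shouldKeep(String content) {
        return !content.contains(SKIP_MARKER);
    }

    public static void main(String[] args) throws IOException {
        String content = readContent(new StringReader("Report 7\nTotal: 3\n"));
        if (shouldKeep(content)) {
            System.out.println(content); // a real program would write the file here
        }
    }
}
```

Joining with "\n" drops the terminator after the last line, which usually does not matter for this kind of check.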

Wrap it in something more useful, such as a BufferedReader. (A StringReader will not help here: it reads from a String rather than wrapping another Reader.)

http://docs.oracle.com/javase/6/docs/api/

Since the file is a text file, create a BufferedReader from the Reader and read it line by line; this should help you make better sense of it.
