
Trying to read binary file as text but scanner stops at first line

I'm trying to read a binary file, but my program stops at the first line. I think it's because of the strange characters in the file. I just want to extract some file paths from it. Is there a way to do this?

public static void main(String[] args) throws IOException
{

    Scanner readF = new Scanner(new File("D:\\CurrentDatabase_372.txt"));
    String line = null;
    String newLine = System.getProperty("line.separator");
    FileWriter writeF = new FileWriter("D:\\Songs.txt");

    while (readF.hasNext())
    {
        line = readF.nextLine();

        if (line.contains("D:\\") && line.contains(".mp3"))
        {
            writeF.write(line.substring(line.indexOf("D:\\"), line.indexOf(".mp3") + 4) + newLine);
        }
    }

    readF.close();
    writeF.close();
}

The file starts like this:

pppppamepD:\Music\Korn\Untouchables\03     Blame.mp3pmp3pmp3pKornpMetalpKornpUntouchablespKornpUntouchables*;*KornpKornpKornUntouchables003pMetalKornUntouchables003pBlameKornUntouchables003pKornKornUntouchables003pMP3pppppCpppÀppp@ppøp·pppŸú#pdppppppòrSpUpppppp€ppªp8›qpppppppppppp,’ppÒppp’ÍpET?ppppppôpp¼}`Ñ#ãâK†¡H¤*(DppppppppppppppppuÞѤéú:M®$@]jkÝW0ÛœFµú½XVNp`w—wâÊp:ºŽwâÊpppp8Npdpp¡pp{)pppppppppppppppppyY:¸[ªA¥Bi   `Û¯pppppppppppp2pppppppppppppppppppppppppppppppppppp¿ÞpAppppppp€ppp€;€?€CpCpC€H€N€S€`€e€y€~p~p~€’€«€Ê€â€Hollow LifepD:\Musica\Korn\Untouchables\04 Hollow Life.mp3pmp3pmp3pKornpMetalpKornpUntouchablespKornpUntouchables*;*KornpKornpKornUntouchables004pMetalKornUntouchables004pHollow LifeKornUntouchables004pKornKornUntouchables004pMP3pppppCpppÀHppppppøp¸pppǺxp‰ppppppòrSpUpppppp€ppªp8›qpppppppppppp,’ppÒpppŠºppppppppppôpp¼}`Ñ#ãâK†¡H¤*(DpppppppppppppppppãG#™R‚CA—®þ^bN °mbŽ‚^¨pG¦sp;5p5ÓÐùšwâÊp
)ŽwâÊpppp8Npdpp!cpp{pppppppppppppppppyY:¸[ªA¥Bi `ۯǺxp‰pppppp2pppppppppppppppppppppppppppppppppppp¿

I want to extract file paths like "D:\\Music\\Korn\\Untouchables\\03 Blame.mp3".

You cannot use a line-oriented scanner to read binary files. You have no guarantee that the binary file even has "lines" delimited by newline characters. For example, what would your scanner do if there were TWO files matching the pattern "D:\\.*.mp3" with no intervening newline? You would extract everything between the first "D:\\" and the last ".mp3", with all the garbage in between. Extracting file names from a non-delimited stream such as this requires a different strategy.
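To make the "first D:\\ to last .mp3" problem concrete, here is a small sketch. The sample string and the class name GreedyDemo are made up for illustration. It compares a greedy pattern with a reluctant one on two paths that share a "line":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns every match of the given regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two paths with junk in between and no newline.
        String data = "xxD:\\a\\1.mp3pmp3pKornpD:\\b\\2.mp3pp";

        // Greedy ".*" runs to the LAST ".mp3", swallowing the junk.
        // prints [D:\a\1.mp3pmp3pKornpD:\b\2.mp3]
        System.out.println(findAll("D:\\\\.*\\.mp3", data));

        // Reluctant ".*?" stops at the NEAREST ".mp3".
        // prints [D:\a\1.mp3, D:\b\2.mp3]
        System.out.println(findAll("D:\\\\.*?\\.mp3", data));
    }
}
```

Even the reluctant form only helps once the data is available as a single string, which a line-oriented scanner does not give you for binary input.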

If I were writing this, I'd use a relatively simple finite-state recognizer that processes characters one at a time. When it encounters a "D" it starts saving characters, checking each character to ensure that it matches the required pattern, and ending when it sees the "3" in ".mp3". If at any point it detects a character that doesn't fit, it resets and continues looking.
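A minimal sketch of such a recognizer follows. It is one possible reading of the idea, not a definitive implementation. The class name Mp3PathRecognizer, the 260-character cap, the set of characters rejected as "not part of a path", and the built-in sample are all assumptions. It also assumes the paths are stored as single-byte text, always start with "D:\" and end in lowercase ".mp3".

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Mp3PathRecognizer {
    private static final String PREFIX = "D:\\";
    private static final String SUFFIX = ".mp3";
    private static final int MAX_LEN = 260; // assumed cap on path length

    // Made-up sample used when no file name is given.
    private static final byte[] SAMPLE =
        "ppD:\\Music\\03 Blame.mp3pmp3p\u0000D:\\Musica\\04 Hollow Life.mp3pp"
            .getBytes(StandardCharsets.ISO_8859_1);

    private final StringBuilder buf = new StringBuilder();
    final List<String> paths = new ArrayList<>();

    // Assumption: control characters and < > : " | ? * cannot occur in a path.
    private static boolean isPathChar(char c) {
        return c >= 0x20 && c != 0x7F && "<>:\"|?*".indexOf(c) < 0;
    }

    // Feed one byte; each byte value is treated as one character.
    void feed(int b) {
        char c = (char) (b & 0xFF);
        if (buf.length() < PREFIX.length()) {
            // State 1: still matching the "D:\" prefix.
            if (c == PREFIX.charAt(buf.length())) {
                buf.append(c);
            } else {
                restart(c);
            }
        } else if (isPathChar(c) && buf.length() < MAX_LEN) {
            // State 2: collecting path characters until ".mp3" appears.
            buf.append(c);
            if (buf.indexOf(SUFFIX, buf.length() - SUFFIX.length()) >= 0) {
                paths.add(buf.toString());
                buf.setLength(0);
            }
        } else {
            restart(c);
        }
    }

    // Reset after a mismatch, keeping text that may begin a new "D:\".
    private void restart(char c) {
        boolean prevWasD = buf.length() > 0 && buf.charAt(buf.length() - 1) == 'D';
        buf.setLength(0);
        if (c == 'D') {
            buf.append(c);
        } else if (c == ':' && prevWasD) {
            buf.append("D:");
        }
    }

    // Convenience wrapper for data that is already in memory.
    static List<String> scan(byte[] data) {
        Mp3PathRecognizer r = new Mp3PathRecognizer();
        for (byte b : data) {
            r.feed(b);
        }
        return r.paths;
    }

    public static void main(String[] args) throws IOException {
        Mp3PathRecognizer r = new Mp3PathRecognizer();
        try (InputStream in = args.length > 0
                ? new BufferedInputStream(new FileInputStream(args[0]))
                : new ByteArrayInputStream(SAMPLE)) {
            int b;
            while ((b = in.read()) != -1) {
                r.feed(b);
            }
        }
        r.paths.forEach(System.out::println);
    }
}
```

Passing a file name as the first argument scans that file; with no argument it scans the built-in sample. Because it reads one byte at a time, memory use stays constant however large the file is.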

EDIT: If the files to be processed are small (less than 50 MB or so), you could load the entire file into memory, which would make scanning simpler.

As was said, since it is a binary file you can't use a Scanner or other character-based readers. You could use a regular FileInputStream to read the actual raw bytes of the file. Java's String class has a constructor that will take an array of bytes and turn them into a string. You can then search that string for the file name(s). This may work if you just use the default character set.

String(byte[]): http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html
FileInputStream for reading bytes: http://download.oracle.com/javase/tutorial/essential/io/bytestreams.html
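A rough sketch of the in-memory approach, assuming the file fits in memory and the paths are stored as single-byte text. Two things here are substitutions rather than part of the answer above. It uses Files.readAllBytes (Java 7+) instead of a hand-written FileInputStream loop. It also decodes with ISO-8859-1 rather than the default character set, because ISO-8859-1 maps every byte to exactly one character, so no byte is dropped or replaced. The class name InMemoryPathSearch, the excluded-character set in the pattern, and the sample bytes are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InMemoryPathSearch {
    // "D:\" followed by as few path-like characters as possible, then ".mp3".
    // Assumption: control characters and < > : " | ? * never occur in a path.
    private static final Pattern MP3_PATH =
        Pattern.compile("D:\\\\[^\\x00-\\x1F<>:\"|?*]*?\\.mp3");

    // Made-up sample used when no file name is given.
    private static final byte[] SAMPLE =
        "pp\u00C0D:\\Music\\03 Blame.mp3pmp3\u0000D:\\Musica\\04 Hollow Life.mp3pp"
            .getBytes(StandardCharsets.ISO_8859_1);

    static List<String> extract(byte[] raw) {
        // One byte becomes one char, so offsets and content are preserved.
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        List<String> paths = new ArrayList<>();
        Matcher m = MP3_PATH.matcher(text);
        while (m.find()) {
            paths.add(m.group());
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = args.length > 0
            ? Files.readAllBytes(Paths.get(args[0]))
            : SAMPLE;
        for (String path : extract(raw)) {
            System.out.println(path);
        }
    }
}
```

If the file names can contain characters outside Latin-1, the extracted strings would need to be re-decoded with whatever encoding the database actually uses, which the question does not say.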

Use hasNextLine() instead of hasNext() in the while loop check.

while (readF.hasNextLine()) {
    String line = readF.nextLine();
    // Your code
}
