How can I read characters until a specific one in Java?
I want to read a few words from a file. I didn't find any method to do this, so I decided to read char by char, but I need to stop at the spaces to store the word I've read in my array and move on to the next one.
I'm making an external sorting application, which is why I have a memory limitation. Because of that, I can't just use readLine() and then split(); I need control over what I read.
The read() method returns an int, and I have no idea how to make read() return a char and stop reading after a space.
This is my code so far:
protected static String[] readWords(String arqName, int amountOfWords) throws IOException {
    FileReader arq = new FileReader(arqName);
    BufferedReader lerArq = new BufferedReader(arq);
    String[] words = new String[amountOfWords];
    for (int i = 0; i < amountOfWords; i++) {
        //words[i] = lerArq.read();
    }
    return words;
}
Edit 1: I used a Scanner and its next() method, and it worked. The Scanner is initialized in main.
static String[] readWords(int amountOfWords, Scanner leitor) throws IOException {
    String[] words = new String[amountOfWords];
    for (int i = 0; i < amountOfWords; i++) {
        words[i] = leitor.next();
    }
    return words;
}
Maybe this will be helpful.
It's not a problem to use read(). Just cast the result to a character:
...
for (int i = 0; i < memTam; i++) {
    // this should work. you will get the actual character
    int current = lerArq.read();
    if (current != -1) {
        char c = (char) current;
        // then you can do what you need with this character
    }
}
...
The method returns the character read, as an integer in the range 0 to 65535, or -1 if the end of the stream has been reached.
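To make the return-value contract above concrete, here is a minimal sketch. The class name ReadReturnsInt and the helper readAll are made up for illustration, and a StringReader stands in for the file so the snippet runs on its own. It checks the int against -1 before casting, because the cast alone cannot tell end-of-stream apart from a real character:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class, not part of the original question or answer.
class ReadReturnsInt {

    // Reads every character from the reader and returns them as one String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        int current;
        // read() yields 0..65535 for a character and -1 at end of stream,
        // so the comparison is done on the int, before the cast.
        while ((current = reader.read()) != -1) {
            out.append((char) current);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("two words"))); // two words

        // Casting -1 straight to char silently produces the char 65535.
        int eof = -1;
        char c = (char) eof;
        System.out.println((int) c); // 65535
    }
}
```

This is why read() is declared to return int rather than char: the extra value -1 needs room outside the 0 to 65535 character range.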
I won't add a lot of theory about encodings, how they are handled in Java, etc., because I am not aware of some very low-level details. I have a basic high-level understanding of how it works.
Every single key on your keyboard has a number associated with it. Every single character that you type can be translated into a decimal number. For example, A becomes the number 65. This is a standard and it is globally recognized.
At this point, I hope you can agree it's not that weird that the read() method returns a number and not the actual character :)
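As a quick illustration of that character-to-number mapping, the sketch below converts in both directions. The class CharCodes and its two helpers are hypothetical names used only for this example:

```java
// Hypothetical demo class showing that a Java char is just a number.
class CharCodes {

    // Returns the numeric code of a character (widening char -> int).
    static int codeOf(char c) {
        return c;
    }

    // Returns the character for a numeric code in the range 0..65535.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A')); // 65
        System.out.println(codeOf(' ')); // 32
        System.out.println(charOf(97));  // a
    }
}
```

Since the space character is simply the number 32, comparing the int returned by read() with ' ' works without any cast.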
There is something called the ASCII table, which lists those codes (numbers) for all the keys on your keyboard. Here it is, just to show how it looks:
Dec  Char                         Dec  Char   Dec  Char  Dec  Char
---  ---------------------------  ---  -----  ---  ----  ---  ----
  0  NUL (null)                    32  SPACE   64  @      96  `
  1  SOH (start of heading)        33  !       65  A      97  a
  2  STX (start of text)           34  "       66  B      98  b
  3  ETX (end of text)             35  #       67  C      99  c
  4  EOT (end of transmission)     36  $       68  D     100  d
  5  ENQ (enquiry)                 37  %       69  E     101  e
  6  ACK (acknowledge)             38  &       70  F     102  f
  7  BEL (bell)                    39  '       71  G     103  g
  8  BS (backspace)                40  (       72  H     104  h
  9  TAB (horizontal tab)          41  )       73  I     105  i
 10  LF (NL line feed, new line)   42  *       74  J     106  j
 11  VT (vertical tab)             43  +       75  K     107  k
 12  FF (NP form feed, new page)   44  ,       76  L     108  l
 13  CR (carriage return)          45  -       77  M     109  m
 14  SO (shift out)                46  .       78  N     110  n
 15  SI (shift in)                 47  /       79  O     111  o
 16  DLE (data link escape)        48  0       80  P     112  p
 17  DC1 (device control 1)        49  1       81  Q     113  q
 18  DC2 (device control 2)        50  2       82  R     114  r
 19  DC3 (device control 3)        51  3       83  S     115  s
 20  DC4 (device control 4)        52  4       84  T     116  t
 21  NAK (negative acknowledge)    53  5       85  U     117  u
 22  SYN (synchronous idle)        54  6       86  V     118  v
 23  ETB (end of trans. block)     55  7       87  W     119  w
 24  CAN (cancel)                  56  8       88  X     120  x
 25  EM (end of medium)            57  9       89  Y     121  y
 26  SUB (substitute)              58  :       90  Z     122  z
 27  ESC (escape)                  59  ;       91  [     123  {
 28  FS (file separator)           60  <       92  \     124  |
 29  GS (group separator)          61  =       93  ]     125  }
 30  RS (record separator)         62  >       94  ^     126  ~
 31  US (unit separator)           63  ?       95  _     127  DEL
So, imagine you have a .txt file with some text: all the letters have corresponding numbers.
The problem with ASCII is that ASCII defines 128 characters, which map to the numbers 0–127 (all of the upper-case letters, lower-case letters, 0-9 digits and a few more symbols).
But there are many more different characters/symbols in the world (different alphabets, emoji, etc.), so there has to be another encoding system to represent them all.
It is called Unicode. For characters whose codes are 0-127, Unicode is exactly the same as ASCII. But in general, Unicode can represent a much wider range of symbols.
In Java, the char data type (and therefore the value that a Character object encapsulates) is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. You can check more details in the Character javadoc. In other words, all Strings in Java are represented in UTF-16.
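One practical consequence of the 16-bit char, sketched below under the assumption that your input may contain characters outside the original 16-bit range: such a character is stored as two chars (a surrogate pair), so a Reader hands it over in two read() calls. The class Utf16Demo and its helpers are illustrative names only:

```java
// Hypothetical demo class contrasting chars (UTF-16 units) with code points.
class Utf16Demo {

    // Number of 16-bit char values in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points (user-visible characters, roughly).
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String letter = "A";
        // U+1F600 (an emoji), written with escapes to avoid source-encoding issues.
        String emoji = "\uD83D\uDE00";

        System.out.println(charCount(letter) + " " + codePointCount(letter)); // 1 1
        System.out.println(charCount(emoji) + " " + codePointCount(emoji));   // 2 1
        System.out.println(Character.isHighSurrogate(emoji.charAt(0)));       // true
    }
}
```

For plain ASCII word files this makes no difference, since every such character fits in a single char.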
I hope that, after this long story, it makes some sense why you get numbers when you read, and that you can cast them to type char. And again, it's just a kind of high-level overview. Happy Coding :)
If you want to read it char by char (so you have more control over what you want to store and what you don't), you could try something like this:
import java.io.BufferedReader;
import java.io.IOException;
[...]
public static String readNextWord(BufferedReader reader) throws IOException {
    StringBuilder builder = new StringBuilder();
    int currentData;
    do {
        currentData = reader.read();
        if (currentData < 0) {
            // End of stream: return the pending word, or null if there is none
            if (builder.length() == 0) {
                return null;
            }
            else {
                return builder.toString();
            }
        }
        else if (currentData != ' ') {
            /* Since you're talking about words, here you can apply
             * a filter to ignore chars like ',', '.', '\n', etc. */
            builder.append((char) currentData);
        }
    } while (currentData != ' ' || builder.length() == 0);
    return builder.toString();
}
And then call it like this:
String[] words = new String[amountOfWordsToRead];
for (int i = 0; i < amountOfWordsToRead; i++) {
    words[i] = readNextWord(yourBufferedReader);
}
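Note that readNextWord as written only stops at the space character, so a line break ends up inside a word. If your file separates words with newlines or tabs as well, one possible variation (a sketch, not the original answer's code) is to treat any whitespace as a separator using Character.isWhitespace. The class name WordReader is made up, and a StringReader replaces the file so the example is self-contained:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical variant of readNextWord that splits on any whitespace.
class WordReader {

    // Returns the next whitespace-delimited word, or null at end of stream.
    static String readNextWord(BufferedReader reader) throws IOException {
        StringBuilder builder = new StringBuilder();
        int current;
        while ((current = reader.read()) != -1) {
            if (Character.isWhitespace(current)) {
                if (builder.length() > 0) {
                    break; // whitespace after some characters ends the word
                }
                // otherwise: leading whitespace, keep skipping
            } else {
                builder.append((char) current);
            }
        }
        return builder.length() == 0 ? null : builder.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader =
                new BufferedReader(new StringReader("alpha  beta\ngamma"));
        String word;
        while ((word = readNextWord(reader)) != null) {
            System.out.println(word); // alpha, beta, gamma (one per line)
        }
    }
}
```

Only one word is held in memory at a time, which fits the memory constraint of an external sort.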