
How can I read characters until a specific one in Java?

I want to read a few words from a file. I didn't find any method that does this, so I decided to read char by char, but I need to stop at each space so I can store the word I just read in my array and move on to the next one.

I'm writing an external sorting application, so I have a memory limit. That means I can't just use readLine() and then split(); I need control over how much I read.

The read() method returns an int, and I have no idea how to make read() give me a char and stop reading after a space.

This is my code so far:

protected static String [] readWords(String arqName, int amountOfWords) throws IOException {
    FileReader arq = new FileReader(arqName);
    BufferedReader lerArq = new BufferedReader(arq);

    String[] words = new String[amountOfWords];

    for (int i = 0; i < amountOfWords; i++){
        //words[i] = lerArq.read();
    }

    return words;
}

Edit 1: I used a Scanner and its next() method, and it worked. The Scanner is initialized in Main.

static String [] readWords(int amountOfWords, Scanner leitor) throws IOException {
    String[] words= new String[amountOfWords];

    for (int i = 0; i < amountOfWords; i++){
        words[i] = leitor.next();
    }

    return words;
}

Maybe this will be helpful.

It's not a problem to use read(). Just cast the result to a char:

...
for (int i = 0; i < memTam; i++) {
      // this should work. you will get the actual character
      int current = lerArq.read();
      if (current != -1) {
          char c = (char) current;
          // then you can do what you need with this character
      }
}
...

The method returns the character read, as an integer in the range 0 to 65535, or -1 if the end of the stream has been reached.
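To make the -1 convention concrete, here is a minimal sketch of a loop that reads until the end of the stream. The class name ReadUntilEnd and the helper readAll are made up for illustration, and a StringReader stands in for the file reader so the example runs without a file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadUntilEnd {
    // Hypothetical helper: reads every character from the reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int current;
        // read() returns -1 at end of stream; any other value fits in a char.
        while ((current = reader.read()) != -1) {
            sb.append((char) current);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader is an assumption here; a BufferedReader over a file works the same way.
        System.out.println(readAll(new StringReader("two words")));
    }
}
```

The return type has to be int rather than char because -1 must be distinguishable from every valid character value.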

I won't add a lot of theory about encodings and how they are handled in Java, because I am not aware of some of the very low-level details. I have a basic, high-level understanding of how it works.

Every character you can type has a number associated with it, so any character can be translated into a decimal number. For example, A becomes the number 65. This mapping is a standard and is recognized globally.

At this point, I hope you can agree it's not that weird that the read() method returns a number and not the actual character :)
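As a small illustration of that mapping, the sketch below converts between a char and its numeric code in Java. The class name CharCodes and both helper methods are hypothetical and exist only for this example.

```java
public class CharCodes {
    // Widening conversion: a char converts to int without a cast.
    static int codeOf(char c) {
        return c;
    }

    // Narrowing conversion: an int needs an explicit cast to become a char.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));  // prints 65
        System.out.println(charOf(97));   // prints a
        System.out.println(codeOf(' '));  // prints 32
    }
}
```

Because a space is just the code 32, comparing the int returned by read() against ' ' works without casting first.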

There is something called the ASCII table, which lists those codes (numbers) for the basic characters on your keyboard.

Here it is, just to show how it looks:

Dec  Char                           Dec  Char     Dec  Char     Dec  Char
---------                           ---------     ---------     ----------
  0  NUL (null)                      32  SPACE     64  @         96  `
  1  SOH (start of heading)          33  !         65  A         97  a
  2  STX (start of text)             34  "         66  B         98  b
  3  ETX (end of text)               35  #         67  C         99  c
  4  EOT (end of transmission)       36  $         68  D        100  d
  5  ENQ (enquiry)                   37  %         69  E        101  e
  6  ACK (acknowledge)               38  &         70  F        102  f
  7  BEL (bell)                      39  '         71  G        103  g
  8  BS  (backspace)                 40  (         72  H        104  h
  9  TAB (horizontal tab)            41  )         73  I        105  i
 10  LF  (NL line feed, new line)    42  *         74  J        106  j
 11  VT  (vertical tab)              43  +         75  K        107  k
 12  FF  (NP form feed, new page)    44  ,         76  L        108  l
 13  CR  (carriage return)           45  -         77  M        109  m
 14  SO  (shift out)                 46  .         78  N        110  n
 15  SI  (shift in)                  47  /         79  O        111  o
 16  DLE (data link escape)          48  0         80  P        112  p
 17  DC1 (device control 1)          49  1         81  Q        113  q
 18  DC2 (device control 2)          50  2         82  R        114  r
 19  DC3 (device control 3)          51  3         83  S        115  s
 20  DC4 (device control 4)          52  4         84  T        116  t
 21  NAK (negative acknowledge)      53  5         85  U        117  u
 22  SYN (synchronous idle)          54  6         86  V        118  v
 23  ETB (end of trans. block)       55  7         87  W        119  w
 24  CAN (cancel)                    56  8         88  X        120  x
 25  EM  (end of medium)             57  9         89  Y        121  y
 26  SUB (substitute)                58  :         90  Z        122  z
 27  ESC (escape)                    59  ;         91  [        123  {
 28  FS  (file separator)            60  <         92  \        124  |
 29  GS  (group separator)           61  =         93  ]        125  }
 30  RS  (record separator)          62  >         94  ^        126  ~
 31  US  (unit separator)            63  ?         95  _        127  DEL

So, imagine you have a .txt file with some text: every letter in it has a corresponding number.

The problem with ASCII is that it defines only 128 characters, which map to the numbers 0–127 (all of the upper-case letters, lower-case letters, the digits 0-9 and a few more symbols).

But there are many more characters and symbols in the world (different alphabets, emoji, etc.), so there has to be another encoding system to represent them all.

It is called Unicode. For characters whose codes are 0-127, Unicode is exactly the same as ASCII. But in general, Unicode can represent a much wider range of symbols.

In Java, the char data type (and therefore the value that a Character object encapsulates) is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. You can check more details in the Character javadoc. In other words, all Strings in Java are represented in UTF-16.
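One consequence of the 16-bit char is that a character outside the original 16-bit range, such as an emoji, takes two char values (a surrogate pair) in a Java String. The sketch below shows this; the class name Utf16Demo and its helpers are made up for illustration. As far as I understand, a Reader's read() likewise hands back one 16-bit unit per call, so such a character would arrive over two calls.

```java
public class Utf16Demo {
    // Number of 16-bit char units in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points (actual characters) in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String letter = "A";
        String emoji = "\uD83D\uDE00"; // U+1F600, written as its surrogate pair

        System.out.println(charCount(letter));      // prints 1
        System.out.println(charCount(emoji));       // prints 2
        System.out.println(codePointCount(emoji));  // prints 1
    }
}
```

For plain word splitting on spaces this rarely matters, since a space is never part of a surrogate pair.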

I hope that, after this long story, it makes some sense why you get numbers when you read, and why you can cast them to type char. And again, this is just a high-level overview. Happy Coding :)

If you want to read it char by char (so you have more control over what you store and what you don't), you could try something like this:

import java.io.BufferedReader;
import java.io.IOException;

[...]

public static String readNextWord(BufferedReader reader) throws IOException {
    StringBuilder builder = new StringBuilder();

    int currentData;

    do {
        currentData = reader.read();

        if(currentData < 0) {
            if(builder.length() == 0) {
                return null;
            }
            else {
                return builder.toString();
            }
        }
        else if(currentData != ' ') {
            /* Since you're talking about words, here you can apply
             * a filter to ignore chars like ',', '.', '\n', etc. */

            builder.append((char) currentData);
        }

    } while (currentData != ' ' || builder.length() == 0);

    return builder.toString();
}

And then call it like this:

String[] words = new String[amountOfWordsToRead];

for (int i = 0; i < amountOfWordsToRead; i++){
    words [i] = readNextWord(yourBufferedReader);
}
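The version above only treats ' ' as a separator, so words separated by a newline or a tab would be glued together. Here is a hedged variant that uses Character.isWhitespace instead. The class name WordReader is made up for this sketch, and a StringReader stands in for the file so it runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class WordReader {
    // Returns the next whitespace-delimited word, or null at end of stream.
    static String readNextWord(BufferedReader reader) throws IOException {
        StringBuilder builder = new StringBuilder();
        int current;

        while ((current = reader.read()) != -1) {
            if (Character.isWhitespace(current)) {
                if (builder.length() > 0) {
                    break; // whitespace after a word: the word is complete
                }
                // leading whitespace before the word: skip it
            } else {
                builder.append((char) current);
            }
        }

        return builder.length() == 0 ? null : builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in the real program this would wrap a FileReader.
        BufferedReader reader = new BufferedReader(new StringReader("  hello\nworld\tfoo "));
        String word;
        while ((word = readNextWord(reader)) != null) {
            System.out.println(word);
        }
    }
}
```

Only one word is held in memory at a time, which fits the memory limit mentioned in the question.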
