Reading into a string from a file, but any text after space on a line removed?

I have a large text file with lines such as:

citybred JJ 
Brestowe NNP 
STARS NNP NNS
negative JJ NN
investors NNS NNPS
mountain NN 

My objective is to keep only the first word of each line, with no surrounding spaces, and convert it to lowercase. For example:

citybred 
brestowe
stars
negative
investors
mountain

This is what should be returned for the text above.

Any help?

Current code:

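import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Scanner;

public class FileLinkList
{
    public static void main(String args[]) throws IOException {
        String content = new String();
        int count = 1; // node counter used when printing
        File file = new File("abc.txt");
        LinkedList<String> list = new LinkedList<String>();

        try {
            Scanner sc = new Scanner(new FileInputStream(file));
            while (sc.hasNextLine()) {
                // Each line is stored whole, tags included
                content = sc.nextLine();
                list.add(content);
            }
            sc.close();
        } catch (FileNotFoundException fnf) {
            fnf.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("\nProgram terminated Safely...");
        }

        Collections.reverse(list);
        Iterator<String> i = list.iterator();
        while (i.hasNext()) {
            System.out.print("Node " + (count++) + " : ");
            System.out.println(i.next());
        }
    }
}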

If your token and its POS tag are separated by a space:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;

public class FileLinkList {

    public static void main(String[] args) {

        BufferedReader br = null;
        LinkedList<String> list = new LinkedList<String>();
        String word;
        try {
            String sCurrentLine;
            br = new BufferedReader(new FileReader("LEXICON.txt"));
            while ((sCurrentLine = br.readLine()) != null) {
                System.out.println(sCurrentLine);
                // Keep only the text before the first space, lowercased
                word = sCurrentLine.trim().split(" ")[0];
                list.add(word.toLowerCase());
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (br != null)
                    br.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
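
The split on a single space above works for the sample data. As a minimal sketch, assuming the real file may also contain tabs, repeated spaces, or blank lines, the same idea can be wrapped in a small helper that splits on any run of whitespace. The class name FirstWordExtractor and its method names are hypothetical, chosen only for this illustration, and Locale.ROOT is used so the lowercase result does not depend on the machine's default locale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FirstWordExtractor {

    // Returns the lowercased first token of a line, or null if the line is blank.
    static String firstWordLower(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        // "\\s+" matches one or more whitespace characters (spaces, tabs, ...)
        return trimmed.split("\\s+")[0].toLowerCase(Locale.ROOT);
    }

    // Applies firstWordLower to every line and skips blank lines.
    static List<String> firstWords(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String word = firstWordLower(line);
            if (word != null) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("citybred JJ ", "STARS\tNNP NNS", "   ", "mountain NN ");
        System.out.println(firstWords(sample)); // prints [citybred, stars, mountain]
    }
}
```

In the reading loop, list.add(word.toLowerCase()) could then be replaced by a call to this helper followed by a null check.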

Add the following:

content = sc.nextLine();
// Trim first so a leading space does not produce an empty first token,
// then split on runs of whitespace
String[] tokens = content.trim().split("\\s+");
// You can add some validations here...
String word = tokens[0].toLowerCase();

Try this:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.LinkedList;
import java.util.Scanner;

public class FileLinkList {
    public static void main(String args[]) throws IOException {
        String content = new String();
        int count = 1;
        File file = new File("abc.txt");
        LinkedList<String> list = new LinkedList<String>();

        try {
            Scanner sc = new Scanner(new FileInputStream(file));
            while (sc.hasNextLine()) {
                content = sc.nextLine();
                // Skip empty lines, then keep the lowercased first word
                if (content != null && content.length() > 0) {
                    list.add(content.trim().split(" ")[0].toLowerCase());
                }
            }
            sc.close();
        } catch (FileNotFoundException fnf) {
            fnf.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("\nProgram terminated Safely...");
        }

        for (String listItem : list) {
            System.out.println(listItem);
        }
    }
}

With Apache Commons IO it is much simpler to read a file into a list of Strings.

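import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.FileUtils;

// readLines throws IOException, so call this from a method that declares or handles it
List<String> lines = FileUtils.readLines(new File("abc.txt"));
List<String> firstWords = new ArrayList<>();
for (String line : lines) {
  String firstWord = line.split(" ")[0].toLowerCase();
  firstWords.add(firstWord);
}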
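
If adding a third-party dependency is not an option, roughly the same thing can be done with the standard library alone. The following is a sketch, not a drop-in replacement: it assumes Java 8 or later and a UTF-8 encoded file, and the class name FirstWordsNio is made up for this example. It reads the lines lazily with Files.lines, drops blank lines, and collects the lowercased first token of each remaining line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FirstWordsNio {

    // Maps a stream of raw lines to their lowercased first words, skipping blank lines.
    static List<String> firstWords(Stream<String> lines) {
        return lines
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split("\\s+")[0].toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Reads the file (assumed UTF-8) and closes it when done via try-with-resources.
    static List<String> firstWords(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return firstWords(lines);
        }
    }

    public static void main(String[] args) throws IOException {
        // "abc.txt" is the file name used in the question
        firstWords(Paths.get("abc.txt")).forEach(System.out::println);
    }
}
```

Keeping the stream-based overload separate from the file-based one makes the word extraction easy to try on in-memory data without touching the disk.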
