Read a txt file, search for some words and replace them from a HashMap of other words, and preserve the punctuation and uppercase

I have a HashMap with overused words as keys and their replacements as values. Here are some entries from the map:

[ amazing:astonishing interesting:intriguing literally:frankly nice:pleasant hard:taxing change:transform... ]

I have to implement a class that searches for overused words in a given text file and replaces them with better choices.

OLD text file:

" "Amazing" is really the best way I can think of to describe it. Literally, it is hard to express how much I liked it. It was amazingly NICE,..,. Good, not bad. I wouldn't change a bit of it. Please, be nice and help me fix my writing!! b BB bbb Bb B."

NEW text file:

" "Astonishing" is really the best way I can think of to describe it. Frankly, it is taxing to express how much I liked it. It was amazingly PLEASANT,..,! Superior, not inferior. I wouldn't transform a bit of it. Please, be pleasant and help me fix my writing!! cat BB bbb Bb CAT "

  • TextImprover must preserve the punctuation of the input file.
  • Assume all words in the input file are either in all lower case, leading upper case, or all caps.

I have implemented the first function, which reads a txt file and builds a map of overused words:

public class TextImprover {

    private HashMap<String, String> wordMap ;

    /**
     * Constructor
     * 
     * @param wordMapFileName   name of the file containing the over-used words and their replacements
     */
    public TextImprover(String wordMapFileName) { 
        this.wordMap = new HashMap<String,String>();
        try {
        BufferedReader br = new BufferedReader(new FileReader(wordMapFileName));
        String line ;
        while((line = br.readLine())!= null) {
            String[] wordLine = line.split("\t");
            //System.out.println(wordLine[1]);
            String overUsedWord = wordLine[0].trim();
            String replaceWord = wordLine[1].trim();
            
            wordMap.put(overUsedWord, replaceWord);
        }
        br.close();
            
        }catch(FileNotFoundException e){
            System.out.println("File: "+ wordMapFileName + " not found");   
        }catch (IOException e1) {
            System.out.println(e1.getMessage());
        }
    }

I need this function:

/**
     * Replaces all of the over-used words in the given file with better words, based on the word map
     * used to create this TextImprover
     * 
     * @param fileName  name of the file containing the text to be improved
     */
    public void improveText(String fileName) {
        try {
            BufferedReader br = new BufferedReader(new FileReader(fileName));
            String line ;
            while((line = br.readLine())!= null) {
                String[] lineWords = line.split(" ");
                // The code I'm struggling with
            }
            br.close();
                
            }catch(FileNotFoundException e){
                System.out.println("File: "+ fileName + " not found");  
            }catch (IOException e1) {
                System.out.println(e1.getMessage());
            }

    }

Thank you for your help.

Instead of the split method, which also uses a regular expression for splitting, I would use the regular expression [a-zA-Z]+ in the "usual" way to find the next word in your input. (The "usual" way is with a Pattern and a Matcher.) Because only the letters are matched, punctuation and whitespace between the words are left untouched.
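To illustrate the Pattern/Matcher approach, here is a minimal sketch that collects every word a line contains. The class name WordFinderSketch and the method findWords are hypothetical names chosen for this example, not part of your assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class WordFinderSketch {
    // Compiled once and reused for every line.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    // Returns every run of ASCII letters in the line, in order of appearance.
    static List<String> findWords(String line) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(line);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [It, was, amazingly, NICE, Good, not, bad]
        System.out.println(findWords("It was amazingly NICE,..,. Good, not bad."));
    }
}
```

Note that with this pattern an apostrophe ends a word, so "wouldn't" is seen as "wouldn" and "t". That should be harmless as long as the map keys consist of letters only.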

Then you would use the Matcher.replaceAll(Function<MatchResult,String> replacer) method (available since Java 9), which passes each match to your function. There you can fetch the replacement from the map and decide whether to convert it to all upper case or title case (only the first character upper case).
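The case decision could look roughly like the sketch below. It assumes, as the question states, that every word is all lower case, leading upper case, or all caps, and it treats a single capital letter such as "B" as all caps, which matches the "B" to "CAT" line in your expected output. CaseMatchingSketch and applyCase are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class CaseMatchingSketch {
    // Re-cases the replacement so it follows the casing of the original word.
    static String applyCase(String original, String replacement) {
        if (original.isEmpty() || replacement.isEmpty()) {
            return replacement;
        }
        // Check all caps first, so a one-letter word like "B" counts as all caps.
        if (original.equals(original.toUpperCase(Locale.ROOT))) {
            return replacement.toUpperCase(Locale.ROOT);
        }
        // Leading upper case: capitalize only the first character.
        if (Character.isUpperCase(original.charAt(0))) {
            return Character.toUpperCase(replacement.charAt(0)) + replacement.substring(1);
        }
        // All lower case: use the replacement as stored in the map.
        return replacement;
    }

    public static void main(String[] args) {
        System.out.println(applyCase("NICE", "pleasant"));       // prints PLEASANT
        System.out.println(applyCase("Amazing", "astonishing")); // prints Astonishing
        System.out.println(applyCase("hard", "taxing"));         // prints taxing
    }
}
```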

The equivalent of the code you posted (without the actual inner replacement logic, but with that part made easier) would look like this:

Pattern pattern = Pattern.compile("[a-zA-Z]+"); // best outside the while loop!

// From here replaces your String[] lineWords = line.split(" "); inside the loop
Matcher matcher = pattern.matcher(line);
String result = matcher.replaceAll(match -> {

    String word = match.group();
    // TODO: find out if word is "ALL CAPS" or "Title Case"
    // TODO: get replacement from map - don't forget to convert the input to the map toLowerCase()
    String replacement = word; // placeholder: keeps the word unchanged until the TODOs above are implemented
    return replacement;
});

// here your result contains the whole line with all replacements.
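For completeness, here is one way the TODOs might be filled in. This is a sketch under assumptions rather than a definitive solution: the class ImproveTextSketch and the methods improveLine and improveFile are hypothetical names, the map keys are assumed to be lower case, and the result is written to a separate output file because the question does not say where the improved text should go.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only. Requires Java 9 or later.
final class ImproveTextSketch {
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    // Replaces every over-used word in one line, keeping its casing and all punctuation.
    static String improveLine(String line, Map<String, String> wordMap) {
        return WORD.matcher(line).replaceAll(match -> {
            String word = match.group();
            // Assumption: the map keys are stored in lower case.
            String replacement = wordMap.get(word.toLowerCase(Locale.ROOT));
            if (replacement == null || replacement.isEmpty()) {
                return word; // not an over-used word: keep it as it is
            }
            if (word.equals(word.toUpperCase(Locale.ROOT))) {
                replacement = replacement.toUpperCase(Locale.ROOT);
            } else if (Character.isUpperCase(word.charAt(0))) {
                replacement = Character.toUpperCase(replacement.charAt(0)) + replacement.substring(1);
            }
            // Escapes '$' and '\' so the replacement is inserted literally.
            return Matcher.quoteReplacement(replacement);
        });
    }

    // Reads the input file, improves each line, and writes the result to the output file.
    static void improveFile(Path input, Path output, Map<String, String> wordMap) throws IOException {
        List<String> improved = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            improved.add(improveLine(line, wordMap));
        }
        Files.write(output, improved, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> wordMap = new HashMap<>();
        wordMap.put("amazing", "astonishing");
        wordMap.put("hard", "taxing");
        wordMap.put("nice", "pleasant");
        // prints "Astonishing" is really taxing. It was amazingly PLEASANT,..,.
        System.out.println(improveLine("\"Amazing\" is really hard. It was amazingly NICE,..,.", wordMap));
    }
}
```

Since "amazingly" is matched as a whole word and is not a key in the map, it stays unchanged, which agrees with the expected output in the question.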
