Java 8 Search ArrayList with Streams algorithm failing

We are using a Stream to search an ArrayList of strings. The dictionary file is sorted and contains 307107 words, all in lower case.
We use findFirst to look for a match for the text in a TextArea.
As long as the misspelling occurs beyond the third character, the search gives favorable results.
If the misspelled word is something like "Charriage", the results are nowhere close to a match.
The obvious goal is to get as close to the correct word as possible without having to look at an enormous number of words.

Here is the text we are testing:
Tak acheive it hommaker and aparent as Chariage NOT ME Charriag add missing vowel to Cjarroage

We have made some major changes to the stream search filters, with reasonable improvements.
We have edited the posted code to include ONLY the part where the search is failing,
and below that, the code changes made to the stream filters.
Before the code change, if the searchString had a misspelled char at position 1, no results were found in the dictionary. The new search filters fixed that.
We also added more search information by increasing the number of chars used for endsWith.
So what is still failing? If the searchString (the misspelled word) is missing a char at the end of the word and also has an incorrect char anywhere from position 1 to 4, the search fails.
We are working on adding and removing chars, but we are not sure this is a workable solution.
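
For reference, the "adding and removing chars" idea can be sketched roughly as follows. This is only an illustration under assumptions: the class name EditCandidatesSketch, the helper oneCharFixes, and the three-word stand-in dictionary are hypothetical, and the real word list would be loaded from the dictionary file into a HashSet so that each lookup is cheap.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class EditCandidatesSketch {

    // Hypothetical helper: every string reachable from 'word' by deleting one char
    // or inserting one lower-case letter, kept only if the dictionary contains it.
    static Set<String> oneCharFixes(String word, Set<String> dictionary) {
        Set<String> hits = new TreeSet<>();
        // Remove one character at each position.
        for (int i = 0; i < word.length(); i++) {
            String candidate = word.substring(0, i) + word.substring(i + 1);
            if (dictionary.contains(candidate)) {
                hits.add(candidate);
            }
        }
        // Insert one letter at each position, including the very end.
        for (int i = 0; i <= word.length(); i++) {
            for (char c = 'a'; c <= 'z'; c++) {
                String candidate = word.substring(0, i) + c + word.substring(i);
                if (dictionary.contains(candidate)) {
                    hits.add(candidate);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Tiny stand-in dictionary; the real one would come from the word file.
        Set<String> dictionary = new HashSet<>(Arrays.asList("carriage", "homemaker", "apparent"));
        System.out.println(oneCharFixes("charriage", dictionary));
        System.out.println(oneCharFixes("hommaker", dictionary));
    }
}
```

A single pass like this only covers words that are one insertion or one deletion away. A word such as "charriag", which needs both a deletion and an insertion, returns nothing, which matches the failing case described above.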

Comments or code will be greatly appreciated. If you would like the complete project, we will post it on GitHub; just ask in the comments.

The question is still: how do we fix this search filter when multiple chars are missing from the misspelled word?

After multiple hours of searching for a FREE txt dictionary, this is one of the best.
A side-bar fact: it has 115726 words that are longer than 5 characters and end in a vowel. That means it has 252234 words with no vowel at the end.
Does that mean we have a 32% chance of fixing the issue by adding a vowel to the end of the searchString? NOT a question, just an odd fact!
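
A count like the one in this aside could be reproduced with a stream along these lines. This is a minimal sketch, not the code used to get the figures above: the class name VowelEndingSketch, both helper methods, and the five-word sample list are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class VowelEndingSketch {

    // Counts words longer than minLength whose last character is a, e, i, o or u.
    static long countVowelEnding(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() > minLength)
                .filter(w -> "aeiou".indexOf(w.charAt(w.length() - 1)) >= 0)
                .count();
    }

    // Share of 'part' within 'part + rest', expressed as a percentage.
    static double percentage(long part, long rest) {
        return 100.0 * part / (part + rest);
    }

    public static void main(String[] args) {
        // Small stand-in list; the real input would be the full word file.
        List<String> sample = Arrays.asList("carriage", "homemaker", "gorilla", "zoril", "area");
        System.out.println(countVowelEnding(sample, 5));
        System.out.printf("%.1f%%%n", percentage(115726, 252234));
    }
}
```

With the two counts quoted above, the percentage comes out a little under 32%.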

Here is a link to the dictionary: words_alpha.txt. Download it and place the file on the C drive at C:/A_WORDS/words_alpha.txt.

Code Before Changes

}if(found != true){

    lvListView.setStyle("-fx-font-size:18.0;-fx-background-color: white;-fx-font-weight:bold;");
    for(int indexSC = 0; indexSC < simpleArray.length;indexSC++){

    String NewSS = txtMonitor.getText().toLowerCase();

    if(NewSS.contains(" ")||(NewSS.matches("[%&/0-9]"))){
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText(NOT+" Not in Dictionary");
        txaML.appendText(NOT+" Not in Dictionary");
        onCheckSpelling();
        return;
    }

    int a = NewSS.length();
    int Z;
    if(a == 0){// manage CR test with two CR's
        Z = 0;
    }else if(a == 3){
        Z = 3;
    }else if(a > 3 && a < 5){
        Z = 4;
    }else if(a >= 5 && a < 8){
        Z = 4;
    }else{
        Z = 5;
    }

    System.out.println("!!!! NewSS "+NewSS+" a "+a+" ZZ "+Z);

    if(Z == 0){// Manage CR in TextArea
        noClose = true;
        strSF = "AA";
        String NOT = txtMonitor.getText().toLowerCase();
        //txtTo.setText("Word NOT in Dictionary");// DO NO SEARCH
        //txtTest.setText("Word NOT in Dictionaary");
        txtTest.setText("Just a Space");
        onCheckSpelling();   
    }else{
        txtTest.setText("");
        txaML.clear();
        txtTest.setText("Word NOT in Dictionaary");
        txaML.appendText("Word NOT in Dictionaary");
        String strS = searchString.substring(0,Z).toLowerCase();
        strSF = strS; 
    }
    // array & list use in stream to add results to ComboBox
    List<String> cs = Arrays.asList(simpleArray);
    ArrayList<String> list = new ArrayList<>();

    cs.stream().filter(s -> s.startsWith(strSF))
      //.forEach(System.out::println); 
    .forEach(list :: add);   

    for(int X = 0; X < list.size();X++){
    String A = (String) list.get(X);  

Improved New Code

        }if(found != true){

    for(int indexSC = 0; indexSC < simpleArray.length;indexSC++){

    String NewSS = txtMonitor.getText().toLowerCase();
    if(NewSS.contains(" ")||(NewSS.matches("[%&/0-9]"))){
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText(NOT+" Not in Dictionary");

        onCheckSpelling();
        return;
    }
    int a = NewSS.length();
    int Z;
    if(a == 0){// manage CR test with two CR's
        Z = 0;
    }else if(a == 3){
        Z = 3;
    }else if(a > 3 && a < 5){
        Z = 4;
    }else if(a >= 5 && a < 8){
        Z = 4;
    }else{
        Z = 5;
    }

    if(Z == 0){// Manage CR
        noClose = true;
        strSF = "AA";
        String NOT = txtMonitor.getText().toLowerCase();
        txtTest.setText("Just a Space");
        onCheckSpelling();

    }else{
        txtTest.setText("");
        txtTest.setText("Word NOT in Dictionaary");
        String strS = searchString.substring(0,Z).toLowerCase();
        strSF = strS; 
    }
    ArrayList list = new ArrayList<>(); 
    List<String> cs = Arrays.asList(simpleArray);
    // array list & list used in stream foreach filter results added to ComboBox
    // Code below provides variables for refined search
    int W = txtMonitor.getText().length();

    String nF = txtMonitor.getText().substring(0, 1).toLowerCase();

    String nE = txtMonitor.getText().substring(W - 2, W);
    if(W > 7){
    nM = txtMonitor.getText().substring(W-5, W);
    System.out.println("%%%%%%%% nE "+nE+" nF "+nF+" nM = "+nM);
    }else{
    nM = txtMonitor.getText().substring(W-1, W);   
    System.out.println("%%%%%%%% nE "+nE+" nF "+nF+" nM = "+nM);
    }

    cs.stream().filter(s -> s.startsWith(strSF)
            || s.startsWith(nF, 0)
            && s.length()<= W+2
            && s.endsWith(nE)
            && s.startsWith(nF)
            && s.contains(nM)) 
    .forEach(list :: add);

    for(int X = 0; X < list.size();X++){
    String A = (String) list.get(X);
    sort(list);

    cboSelect.setStyle("-fx-font-weight:bold;-fx-font-size:18.0;");
    cboSelect.getItems().add(A);
    }// Add search results to cboSelect
    break;
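
One detail of the filter above that is easy to miss: in Java, && binds more tightly than ||, so the expression is read as startsWith(strSF) || (everything else joined by &&). Any word that starts with strSF therefore passes without the length and endsWith checks being applied. The sketch below shows the difference on a tiny list. The parameter names (prefix, first, end, maxLen) are stand-ins for strSF, nF, nE and W+2, the contains(nM) test is left out for brevity, and whether the parenthesised grouping is what the search should do is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterPrecedenceSketch {

    // Same shape as the filter above: Java groups this as
    // startsWith(prefix) || (startsWith(first) && length check && endsWith(end)).
    static List<String> asWritten(List<String> words, String prefix, String first,
                                  String end, int maxLen) {
        return words.stream()
                .filter(s -> s.startsWith(prefix)
                        || s.startsWith(first)
                        && s.length() <= maxLen
                        && s.endsWith(end))
                .collect(Collectors.toList());
    }

    // Explicit parentheses: the length and suffix checks now apply to both prefix tests.
    static List<String> grouped(List<String> words, String prefix, String first,
                                String end, int maxLen) {
        return words.stream()
                .filter(s -> (s.startsWith(prefix) || s.startsWith(first))
                        && s.length() <= maxLen
                        && s.endsWith(end))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("carriage", "carrion", "charge", "cab");
        System.out.println(asWritten(words, "carr", "c", "ge", 10));
        System.out.println(grouped(words, "carr", "c", "ge", 10));
    }
}
```

In the first version "carrion" is kept because it starts with "carr", even though it does not end in "ge". In the second version it is filtered out.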

Here is a screenshot of the FXML file. The controls are named the same as in our code, with the exception of the ComboBox.
[Screenshot: FXML layout]

I am adding a JavaFX answer. This app uses Levenshtein Distance. You have to click Check Spelling to start. You can select a word from the list to replace the word currently being checked. I noticed that Levenshtein Distance returns lots of words, so you might want to find other ways to reduce the list even more.

Main

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javafx.application.Application;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.ListView;
import javafx.scene.control.TextArea;
import javafx.scene.control.TextField;
import javafx.scene.layout.VBox;
import javafx.stage.Stage;

public class App extends Application
{

    public static void main(String[] args)
    {
        launch(args);
    }

    TextArea taWords = new TextArea("Tak Carrage thiss on hoemaker answe");
    TextField tfCurrentWordBeingChecked = new TextField();
    //TextField tfMisspelledWord = new TextField();
    ListView<String> lvReplacementWords = new ListView<>();
    TextField tfReplacementWord = new TextField();

    Button btnCheckSpelling = new Button("Check Spelling");
    Button btnReplaceWord = new Button("Replace Word");

    List<String> wordList = new ArrayList<>();
    List<String> returnList = new ArrayList<>();
    HandleLevenshteinDistance handleLevenshteinDistance = new HandleLevenshteinDistance();
    ObservableList<String> listViewData = FXCollections.observableArrayList();

    @Override
    public void start(Stage primaryStage)
    {
        setupListView();
        handleBtnCheckSpelling();
        handleBtnReplaceWord();

        VBox root = new VBox(taWords, tfCurrentWordBeingChecked, lvReplacementWords, tfReplacementWord, btnCheckSpelling, btnReplaceWord);
        root.setSpacing(5);
        Scene scene = new Scene(root);
        primaryStage.setScene(scene);
        primaryStage.show();
    }

    public void handleBtnCheckSpelling()
    {
        btnCheckSpelling.setOnAction(actionEvent -> {
            if (btnCheckSpelling.getText().equals("Check Spelling")) {
                wordList = new ArrayList<>(Arrays.asList(taWords.getText().split(" ")));
                returnList = new ArrayList<>(Arrays.asList(taWords.getText().split(" ")));
                loadWord();
                btnCheckSpelling.setText("Check Next Word");
            }
            else if (btnCheckSpelling.getText().equals("Check Next Word")) {
                loadWord();
            }
        });
    }

    public void handleBtnReplaceWord()
    {
        btnReplaceWord.setOnAction(actionEvent -> {
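            int indexOfWordToReplace = returnList.indexOf(tfCurrentWordBeingChecked.getText());
            if (indexOfWordToReplace < 0) {
                return; // nothing to replace: the current word is not in the list
            }
            returnList.set(indexOfWordToReplace, tfReplacementWord.getText());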
            taWords.setText(String.join(" ", returnList));
            btnCheckSpelling.fire();
        });
    }

    public void setupListView()
    {
        lvReplacementWords.setItems(listViewData);
        lvReplacementWords.getSelectionModel().selectedItemProperty().addListener((obs, oldSelection, newSelection) -> {
            tfReplacementWord.setText(newSelection);
        });
    }

    private void loadWord()
    {
        if (wordList.size() > 0) {
            tfCurrentWordBeingChecked.setText(wordList.get(0));
            wordList.remove(0);
            showPotentialCorrectSpellings();
        }
    }

    private void showPotentialCorrectSpellings()
    {
        List<String> potentialCorrentSpellings = handleLevenshteinDistance.getPotentialCorretSpellings(tfCurrentWordBeingChecked.getText().trim());
        listViewData.setAll(potentialCorrentSpellings);
    }
}

CustomWord Class

/**
 *
 * @author blj0011
 */
public class CustomWord
{

    private int distance;
    private String word;

    public CustomWord(int distance, String word)
    {
        this.distance = distance;
        this.word = word;
    }

    public String getWord()
    {
        return word;
    }

    public void setWord(String word)
    {
        this.word = word;
    }

    public int getDistance()
    {
        return distance;
    }

    public void setDistance(int distance)
    {
        this.distance = distance;
    }

    @Override
    public String toString()
    {
        return "CustomWord{" + "distance=" + distance + ", word=" + word + '}';
    }
}

HandleLevenshteinDistance Class

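// Imports assumed for this class: Apache Commons IO (FileUtils) and
// Apache Commons Text (LevenshteinDistance) must be on the classpath.
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.commons.io.FileUtils;
import org.apache.commons.text.similarity.LevenshteinDistance;

/**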
 *
 * @author blj0011
 */
public class HandleLevenshteinDistance
{

    private List<String> dictionary = new ArrayList<>();

    public HandleLevenshteinDistance()
    {
        try {
            //Load the dictionary from file.
            //See if the dictionary file exists. If it doesn't, download it from GitHub.
            File file = new File("alpha.txt");
            if (!file.exists()) {
                FileUtils.copyURLToFile(
                        new URL("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"),
                        new File("alpha.txt"),
                        5000,
                        5000);
            }

            //Load file content to a List of Strings
            dictionary = FileUtils.readLines(file, Charset.forName("UTF8"));
        }
        catch (IOException ex) {
            ex.printStackTrace();
        }

    }

    public List<String> getPotentialCorretSpellings(String misspelledWord)
    {
        LevenshteinDistance levenshteinDistance = new LevenshteinDistance();
        List<CustomWord> customWords = new ArrayList<>();

        dictionary.stream().forEach((wordInDictionary) -> {
            int distance = levenshteinDistance.apply(misspelledWord, wordInDictionary);
            if (distance <= 2) {
                customWords.add(new CustomWord(distance, wordInDictionary));
            }
        });

        Collections.sort(customWords, (CustomWord o1, CustomWord o2) -> o1.getDistance() - o2.getDistance());

        List<String> returnList = new ArrayList<>();
        customWords.forEach((item) -> {
            System.out.println(item.getDistance() + " - " + item.getWord());
            returnList.add(item.getWord());
        });

        return returnList;
    }
}
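
Since the list of matches can get long, one possible way to cut it down is to keep only the N closest words. The sketch below is an assumption about how that could look, not part of the app above: the class name SuggestionLimiterSketch and the method closest are hypothetical, and it takes a map of word to distance in place of the CustomWord list. The sample distances are the ones implied by the "zorilta" output shown elsewhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SuggestionLimiterSketch {

    // Hypothetical helper: given word -> edit distance, keep the 'limit' closest words.
    // Ties on distance are broken alphabetically so the order is stable.
    static List<String> closest(Map<String, Integer> distances, int limit) {
        return distances.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> distances = new LinkedHashMap<>();
        distances.put("gorilla", 2);
        distances.put("zoril", 2);
        distances.put("zorilla", 1);
        distances.put("corita", 2);
        System.out.println(closest(distances, 3));
    }
}
```

The same sorted-then-limit idea could be applied directly to the customWords list before it is copied into returnList.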

You just needed to go a little further out into the dictionary.
We are sure you were getting a lot of suggested words from the dictionary?
We tested your code and sometimes it found 3000 or more possible matches. WOW!
So here is the BIG improvement. It still needs a lot of testing; we used this line for our tests, with 100% favorable results.

Tske Charriage to hommaker and hommake as hommaer

Our fear is that if the speller really butchers the word, this improvement might not solve that degree of misspelling.
We are sure you know that if the first letter is wrong, this will not work,
like zenophobe for xenophobe.
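
For comparison, an edit-distance check does not depend on the first letter being right. Here is a minimal sketch of the standard dynamic-programming Levenshtein calculation in plain Java. The class and method names are made up for illustration, and this is not the Apache Commons implementation used in the other answer.

```java
class EditDistanceSketch {

    // Dynamic-programming Levenshtein distance that keeps only two rows of the table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows so 'prev' always holds the row just completed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("zenophobe", "xenophobe"));
        System.out.println(distance("charriag", "carriage"));
    }
}
```

Here "zenophobe" is a single substitution away from "xenophobe", so a threshold of distance <= 1 would still find it even though the first letter is wrong.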

Here is the BIG improvement tada

     cs.stream().filter(s -> s.startsWith(strSF)
            || s.startsWith(nF, 0)
            && s.length() > 1 && s.length() <= W+3 // <== HERE
            && s.endsWith(nE)
            && s.startsWith(nF)
            && s.contains(nM)) 
    .forEach(list :: add); 

You can send the check to my address 55 48 196 195

This question is a possible duplicate: Search suggestion in strings

I think you should be using something similar to Levenshtein Distance or Jaro Winkler Distance. If you can use Apache Commons, I would suggest Apache Commons Lang, which has an implementation of Levenshtein Distance. The example below demos this implementation. If you set the threshold to (distance <= 2), you will potentially get more results.

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.StringUtils;

/**
 *
 * @author blj0011
 */
public class Main
{

    public static void main(String[] args)
    {
        try {
            System.out.println("Hello World!");
            File file = new File("alpha.txt");
            if (!file.exists()) {
                FileUtils.copyURLToFile(
                        new URL("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"),
                        new File("alpha.txt"),
                        5000,
                        5000);
            }

            List<String> lines = FileUtils.readLines(file, Charset.forName("UTF8"));
            //lines.forEach(System.out::println);

            lines.stream().forEach(line -> {
                int distance = StringUtils.getLevenshteinDistance(line, "zorilta");
                //System.out.println(line + ": " + distance);
                if (distance <= 1) {
                    System.out.println("Did you mean: " + line);
                }
            });

        }
        catch (IOException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}

Output distance <= 1

Building JavaTestingGround 1.0
------------------------------------------------------------------------

--- exec-maven-plugin:1.5.0:exec (default-cli) @ JavaTestingGround ---
Hello World!
Did you mean: zorilla
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 1.329 s
Finished at: 2019-11-01T11:02:48-05:00
Final Memory: 7M/30M

Distance <= 2

Hello World!
Did you mean: corita
Did you mean: gorilla
Did you mean: zoril
Did you mean: zorilla
Did you mean: zorillas
Did you mean: zorille
Did you mean: zorillo
Did you mean: zorils
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 1.501 s
Finished at: 2019-11-01T14:03:33-05:00
Final Memory: 7M/34M
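
A cheap way to skip most of the dictionary before computing any distance: two words whose lengths differ by more than the threshold can never be within that many edits of each other, because every extra character costs at least one insertion or deletion. A rough sketch of such a pre-filter follows. The class name LengthPrefilterSketch and the method withinLengthWindow are hypothetical, and the word list is a stand-in for the real dictionary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LengthPrefilterSketch {

    // Keeps only words whose length is within maxDistance of the query's length.
    // Words outside that window cannot be within maxDistance edits of the query.
    static List<String> withinLengthWindow(List<String> words, String query, int maxDistance) {
        return words.stream()
                .filter(w -> Math.abs(w.length() - query.length()) <= maxDistance)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zoril", "zorilla", "zorillas", "zo", "zorillalike");
        System.out.println(withinLengthWindow(words, "zorilta", 2));
    }
}
```

The Levenshtein call would then run only on the words that survive this filter.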

See the possible duplicate for more details about Levenshtein Distance.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 