
Longest Most Common Substring Based on Whole-Word Phrases

I've been researching this topic for a while and can't quite crack it. I've found plenty of useful solutions online that solve this problem at the character level, but how would you solve it based on whole-word phrases, so that the result never starts or ends with a partial word?

For example, given an array of strings, the output would be the longest, most common whole-word phrase that is contained in most (not all) of the strings in the array.
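One way to frame the whole-word version: split each string into words, generate every contiguous run of words (a word n-gram), count how many strings contain each phrase at least once, and keep the longest phrase whose count meets a threshold. This is a minimal sketch under several assumptions: "most" is a caller-supplied minimum number of strings (for example a strict majority), a word is a run of letters, digits, or apostrophes compared case-insensitively, and ties in length go to the phrase found in more strings, then alphabetically. `CommonPhraseFinder`, `words`, and `longestCommonPhrase` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: longest whole-word phrase shared by at least minDocs strings.
class CommonPhraseFinder {

    // Lowercases and splits into words; assumes a word is a run of letters,
    // digits, or apostrophes, so punctuation never ends up inside a phrase.
    static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        for (String w : s.toLowerCase().split("[^\\p{L}\\p{Nd}']+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Returns the longest phrase (counted in words) occurring in at least minDocs
    // strings. Ties: more strings wins, then alphabetical order (deterministic).
    // Returns "" if no phrase reaches the threshold.
    static String longestCommonPhrase(String[] docs, int minDocs) {
        // phrase -> number of distinct strings containing it
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            List<String> ws = words(doc);
            Set<String> seen = new HashSet<>(); // count each phrase once per string
            for (int i = 0; i < ws.size(); i++) {
                for (int j = i + 1; j <= ws.size(); j++) {
                    seen.add(String.join(" ", ws.subList(i, j)));
                }
            }
            for (String phrase : seen) {
                docFreq.merge(phrase, 1, Integer::sum);
            }
        }

        String best = "";
        int bestLen = 0;
        int bestFreq = 0;
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            String phrase = e.getKey();
            int freq = e.getValue();
            if (freq < minDocs) {
                continue;
            }
            int len = phrase.split(" ").length;
            boolean better = len > bestLen
                    || (len == bestLen && freq > bestFreq)
                    || (len == bestLen && freq == bestFreq && phrase.compareTo(best) < 0);
            if (better) {
                best = phrase;
                bestLen = len;
                bestFreq = freq;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] titles = {
            "The quick brown fox jumps",
            "A quick brown fox sleeps",
            "Quick brown foxes run",
            "The lazy dog"
        };
        int majority = titles.length / 2 + 1; // "most" taken as a strict majority: 3 of 4
        System.out.println(longestCommonPhrase(titles, majority)); // prints "quick brown"
    }
}
```

Because "foxes" is its own word, it does not count as an occurrence of "fox", which is exactly the partial-word match a character-based search cannot rule out. A string of n words yields O(n²) phrases, so this is comfortable for sentences or titles; much longer texts may call for a more efficient structure.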

The example below is the closest I've found so far, but it only works about half of the time and returns partial-word results, which isn't what I'm after. I'm sure someone has solved this one before.

// Finds the stem (longest common substring, by characters)
// shared by every string in the array
public static String findstem(String[] arr)
{
    int n = arr.length;
    if (n == 0)
        return "";

    // Take the first string from the array as the reference
    String s = arr[0];
    int len = s.length();

    String res = "";

    for (int i = 0; i < len; i++) {
        for (int j = i + 1; j <= len; j++) {

            // Generate every possible substring of the reference string s
            String stem = s.substring(i, j);

            // Check whether the generated stem is common to all strings
            int k;
            for (k = 1; k < n; k++) {
                if (!arr[k].contains(stem))
                    break;
            }

            // Keep the stem if it appears in all strings and is
            // longer than the current result
            if (k == n && res.length() < stem.length())
                res = stem;
        }
    }

    return res;
}

// Driver code: prints "grace"
public static void main(String[] args)
{
    String[] arr = { "grace", "graceful", "disgraceful", "gracefully" };
    String stems = findstem(arr);
    System.out.println(stems);
}
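If you want to keep the "common to every string" rule of findstem and only change what counts as a match, a smaller modification is to build candidate stems from whole words of arr[0] and look for them in the other strings with a space on each side, so a match can only start and end on a word boundary. This sketch assumes words are separated by whitespace and does not strip punctuation; `WholeWordStem`, `pad`, and `findWholeWordStem` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical variant of findstem that only accepts whole-word stems.
class WholeWordStem {

    // Collapses runs of whitespace and adds a space at each end, so that
    // contains(" " + stem + " ") can only match on word boundaries.
    // Assumption: words are whitespace-separated; punctuation is kept as-is.
    static String pad(String s) {
        return " " + String.join(" ", s.trim().split("\\s+")) + " ";
    }

    // Longest run of whole words from arr[0] that also appears, as whole words,
    // in every other string of the array ("" if there is none).
    static String findWholeWordStem(String[] arr) {
        if (arr.length == 0) {
            return "";
        }
        String[] words = arr[0].trim().split("\\s+");
        String[] padded = new String[arr.length];
        for (int k = 0; k < arr.length; k++) {
            padded[k] = pad(arr[k]);
        }

        String res = "";
        for (int i = 0; i < words.length; i++) {
            for (int j = i + 1; j <= words.length; j++) {
                // Candidate stem = words i..j-1 of the reference string
                String stem = String.join(" ", Arrays.copyOfRange(words, i, j));
                boolean inAll = true;
                for (int k = 1; k < arr.length && inAll; k++) {
                    inAll = padded[k].contains(" " + stem + " ");
                }
                // Same rule as findstem: keep the longest stem (in characters)
                if (inAll && stem.length() > res.length()) {
                    res = stem;
                }
            }
        }
        return res;
    }

    public static void main(String[] args) {
        String[] arr = { "grace", "graceful", "disgraceful", "gracefully" };
        System.out.println("[" + findWholeWordStem(arr) + "]"); // prints "[]"

        String[] phrases = { "grace under fire", "saving grace under pressure", "grace under fire again" };
        System.out.println(findWholeWordStem(phrases)); // prints "grace under"
    }
}
```

With the driver's array this returns an empty string, because "grace" appears as a whole word only in the first string; findstem returns "grace" because it matches inside "graceful".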

Does this do what you intended? It simply checks whether any word in the array is a substring of every word, itself included.

If you want to check for real-word substrings, you would need to reference a dictionary, which would be very time-consuming.
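A minimal sketch of the dictionary idea, assuming the word list is already loaded into an in-memory `Set<String>` (how you load it is up to you): generate substrings of the first string as findstem does, skip any that are not dictionary words, and keep the longest one that every string contains. With a hash-based set each lookup is cheap, so most of the cost is loading the word list and generating the O(L²) substrings. `DictionaryStem` and `longestCommonDictionaryWord` are hypothetical names, and the tiny set in `main` only stands in for a real dictionary.

```java
import java.util.Set;

// Hypothetical helper: longest common substring that is also a dictionary word.
class DictionaryStem {

    static String longestCommonDictionaryWord(String[] arr, Set<String> dictionary) {
        if (arr.length == 0) {
            return "";
        }
        String s = arr[0]; // reference string, as in findstem
        String res = "";
        for (int i = 0; i < s.length(); i++) {
            for (int j = i + 1; j <= s.length(); j++) {
                String stem = s.substring(i, j);
                // Cheap checks first: must beat the current best
                // and must be a real word according to the dictionary
                if (stem.length() <= res.length() || !dictionary.contains(stem)) {
                    continue;
                }
                boolean inAll = true;
                for (String other : arr) {
                    if (!other.contains(stem)) {
                        inAll = false;
                        break;
                    }
                }
                if (inAll) {
                    res = stem;
                }
            }
        }
        return res;
    }

    public static void main(String[] args) {
        // Tiny stand-in for a real word list loaded from a file
        Set<String> dictionary = Set.of("grace", "race", "ace", "disgrace");
        String[] arr = { "grace", "graceful", "disgraceful", "gracefully" };
        System.out.println(longestCommonDictionaryWord(arr, dictionary)); // prints "grace"
    }
}
```

This only checks that the common substring is itself a real word; it can still match inside a longer word, as "grace" does inside "disgraceful".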

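        String[] arr = { "grace", "graceful", "disgraceful", "gracefully" };

        // Find the first word that is a substring of every word in the array
        // (each word trivially contains itself)
        String save = "";
        for (int i = 0; i < arr.length; i++) {
            int count = 0;
            for (int k = 0; k < arr.length; k++) {
                if (arr[k].contains(arr[i])) {
                    count++;
                }
            }
            // Only keep the word if every string contains it
            if (count == arr.length) {
                save = arr[i];
                break;
            }
        }

        System.out.println(save);   // prints "grace"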
