
How can this Java code be improved to find a sub-string in a string?

I was recently asked to submit a solution to a problem for a job.

Problem: Find a sub-string in a string.

Input: "Little star's deep dish pizza sure is fantastic."  
Search: "deep dish pizza"  
Output: "Little star's [[HIGHLIGHT]]deep dish pizza[[ENDHIGHLIGHT]] sure is fantastic."

Note that the highlighter doesn't have to have the exact same result on this example, since you are defining what a good snippet is and return the most relevant snippet with the query terms highlighted.

The most important requirement was to write it as I would write production code.

My solution was not accepted. How could I have improved it? I know I could have used:

  1. Knuth–Morris–Pratt algorithm
  2. Regex (could I?)

My questions:

  1. What do tech companies take into consideration when they review code for a job? I submitted the code the same day; does that help in any way?

  2. One of the comments pointed out that it looks like school code rather than production code. How? Any suggestions?

My solution:
FindSubString.java

/**
 * FindSubString.java: Find sub-string in a given query
 * 
 * @author zengr
 * @version 1.0
 */

public class FindSubstring {
    private static final String startHighlight = "[[HIGHLIGHT]]";
    private static final String endHighlight = "[[ENDHIGHLIGHT]]";

    /**
     * Find sub-string in a given query
     * 
     * @param inputQuery: A string data type (input Query)
     * @param highlightDoc: A string data type (pattern to match)
     * @return inputQuery: A String data type.
     */
    public String findSubstringInQuery(String inputQuery, String highlightDoc) {
        try {

            highlightDoc = highlightDoc.trim();

            if (inputQuery.toLowerCase().indexOf(highlightDoc.toLowerCase()) >= 0) {
                // update query if exact doc exists
                inputQuery = updateString(inputQuery, highlightDoc);
            }

            else {
                // If exact doc is not in the query then break it up
                String[] docArray = highlightDoc.split(" ");

                for (int i = 0; i < docArray.length; i++) {
                    if (inputQuery.toLowerCase().indexOf(docArray[i].toLowerCase()) > 0) {
                        inputQuery = updateString(inputQuery, docArray[i]);
                    }
                }
            }
        } catch (NullPointerException ex) {
            // Ideally log this exception
            System.out.println("Null pointer exception caught: " + ex.toString());
        }

        return inputQuery;
    }

    /**
     * Update the query with the highlighted doc
     * 
     * @param inputQuery: A String data type (Query to update)
     * @param highlightDoc: A String data type (pattern around which to update)
     * @return inputQuery: A String data type.
     */
    private String updateString(String inputQuery, String highlightDoc) {
        int startIndex = 0;
        int endIndex = 0;

        String lowerCaseDoc = highlightDoc.toLowerCase();
        String lowerCaseQuery = inputQuery.toLowerCase();

        // get index of the words to highlight
        startIndex = lowerCaseQuery.indexOf(lowerCaseDoc);
        endIndex = lowerCaseDoc.length() + startIndex;

        // Get the highlighted doc
        String resultHighlightDoc = highlightString(highlightDoc);

        // Update the original query
        return inputQuery = inputQuery.substring(0, startIndex - 1) + resultHighlightDoc + inputQuery.substring(endIndex, inputQuery.length());
    }

    /**
     * Highlight the doc
     * 
     * @param inputString: A string data type (value to be highlighted)
     * @return highlightedString: A String data type.
     */
    private String highlightString(String inputString) {
        String highlightedString = null;

        highlightedString = " " + startHighlight + inputString + endHighlight;

        return highlightedString;
    }
}

TestClass.java

/**
 * TestClass.java: jUnit test class to test FindSubString.java
 * 
 * @author zengr
 * @version 1.0
 */

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

public class TestClass extends TestCase
{
    private FindSubstring simpleObj = null;
    private String originalQuery = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";

    public TestClass(String name) {
        super(name);
    }

    public void setUp() { 
        simpleObj = new FindSubstring();
    }

    public static Test suite(){

        TestSuite suite = new TestSuite();
        suite.addTest(new TestClass("findSubstringtNameCorrect1Test"));
        suite.addTest(new TestClass("findSubstringtNameCorrect2Test"));
        suite.addTest(new TestClass("findSubstringtNameCorrect3Test"));
        suite.addTest(new TestClass("findSubstringtNameIncorrect1Test"));
        suite.addTest(new TestClass("findSubstringtNameNullTest"));

        return suite;
    }

    public void findSubstringtNameCorrect1Test() throws Exception
    {
        String expectedOutput = "I like fish. Little star's deep [[HIGHLIGHT]]dish pizza[[ENDHIGHLIGHT]] sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "dish pizza"));
    }

    public void findSubstringtNameCorrect2Test() throws Exception 
    {
        String expectedOutput = "I like fish. Little star's [[HIGHLIGHT]]deep dish pizza[[ENDHIGHLIGHT]] sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "deep dish pizza"));
    }

    public void findSubstringtNameCorrect3Test() throws Exception 
    {
        String expectedOutput = "Hello [[HIGHLIGHT]]how[[ENDHIGHLIGHT]] are [[HIGHLIGHT]]you[[ENDHIGHLIGHT]]r?";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery("Hello how are your?", "how you"));
    }

    public void findSubstringtNameIncorrect1Test() throws Exception 
    {
        String expectedOutput = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "I love Ruby too"));
    }

    public void findSubstringtNameNullTest() throws Exception
    {
        String expectedOutput = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, null));

    }
}

A few comments:

  • You only highlight the first occurrence of the search string.
  • You assume that lower case matching is fine. Unless this was specified as a requirement, it might be better to provide two methods, one that respects case and one that ignores case.
  • I would probably check the given parameters and throw an NPE if either of them were null. This would be the first thing my method did. I would clearly document this behaviour in the javadoc.
  • Your method name is bad; findSubstringInQuery's main task isn't to find, it is to highlight, and the inQuery part is superfluous. Just call the method highlight, or maybe highlightIgnoreCase if you are going to have a highlight that respects case.
  • Your method parameter names are bad. I've looked at your method signature 10 times and still have to look at the method body to remind myself which arg is the search term and which is the text to search. Call them searchTerm and text.
  • Production code doesn't use the default package.
  • Production code doesn't use System.out.println().
  • Your javadoc needs improving; it needs to tell the user everything they need to know about the code.
  • I would consider using static methods for a class with no class variables.
  • I would also consider allowing the user to specify their own start and end highlighting markers (I wouldn't use static methods if I did this).
  • I wouldn't trim() unless this was specified as a requirement. If I did, then obviously this behaviour would be documented in the javadoc.

I wouldn't worry about the algorithm used for searching. Knuth-Morris-Pratt looks good, but they shouldn't expect you to know about it and implement it unless the job spec specifically asked for experience or expertise in string searching.
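The review points above could be put together roughly as follows. This is only a sketch under stated assumptions: the class name Highlighter, its constructor, and the choice to highlight every non-overlapping occurrence are illustrative and not part of the original task. The package declaration is left out only to keep the example self-contained.

```java
import java.util.Objects;

/** Hypothetical sketch: wraps occurrences of a search term in caller-supplied markers. */
final class Highlighter {
    private final String startMarker;
    private final String endMarker;

    /** @throws NullPointerException if either marker is null */
    Highlighter(String startMarker, String endMarker) {
        this.startMarker = Objects.requireNonNull(startMarker, "startMarker");
        this.endMarker = Objects.requireNonNull(endMarker, "endMarker");
    }

    /** Highlights every case-sensitive occurrence of searchTerm in text. */
    String highlight(String text, String searchTerm) {
        return highlight(text, searchTerm, false);
    }

    /** Highlights every occurrence of searchTerm in text, ignoring case. */
    String highlightIgnoreCase(String text, String searchTerm) {
        return highlight(text, searchTerm, true);
    }

    private String highlight(String text, String searchTerm, boolean ignoreCase) {
        // Parameter checks come first, as suggested in the comments above.
        Objects.requireNonNull(text, "text");
        Objects.requireNonNull(searchTerm, "searchTerm");
        if (searchTerm.isEmpty()) {
            return text; // assumption: an empty term highlights nothing
        }
        StringBuilder result = new StringBuilder(text.length());
        int termLength = searchTerm.length();
        int copiedUpTo = 0; // end of the part of text already copied to result
        int i = 0;
        while (i <= text.length() - termLength) {
            // regionMatches compares in place, so the original casing of text is kept.
            if (text.regionMatches(ignoreCase, i, searchTerm, 0, termLength)) {
                result.append(text, copiedUpTo, i)
                      .append(startMarker)
                      .append(text, i, i + termLength)
                      .append(endMarker);
                i += termLength;
                copiedUpTo = i;
            } else {
                i++;
            }
        }
        return result.append(text, copiedUpTo, text.length()).toString();
    }
}
```

With new Highlighter("[[HIGHLIGHT]]", "[[ENDHIGHLIGHT]]"), calling highlight on the example input with the search term "deep dish pizza" should give the output shown in the question. The text is never trimmed or lower-cased, so the returned string differs from the input only by the inserted markers.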

If this code was submitted to me for review, this is what I would think:

  • The code is overly verbose and complex where it does not need to be. The whole thing can be done in a small method in ten lines of code, including all sanity checks. This method should probably be static.
  • Your code is doing things that (I assume) were not asked for. You were asked to search for a substring in a string. If it is not found, then that's fine; there is no need to split the substring into words and search for each individual word.
  • Unless they asked you to remove leading and trailing whitespace, I would not have called trim().
  • I would not have included the calls to toLowerCase() unless explicitly asked for, although I would have added a comment stating that they could be added if needed. In any case, even if the search is meant to be case insensitive, there are too many redundant calls to toLowerCase() in your code.
  • You should not need to catch NullPointerException; instead, you should ensure that this exception is never thrown.
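As a rough illustration of the small static method described in this list, here is a minimal sketch. The class name SimpleHighlight and the decision to return the input unchanged for null or empty arguments are assumptions; the reviewer only says the exception should never be thrown, not how.

```java
/** Hypothetical sketch of a minimal, case-sensitive, first-occurrence highlighter. */
final class SimpleHighlight {
    private SimpleHighlight() {
    }

    static String highlight(String text, String searchTerm) {
        // Sanity checks up front, so no NullPointerException can arise later.
        if (text == null || searchTerm == null || searchTerm.isEmpty()) {
            return text; // assumption: nothing to search means nothing to change
        }
        int start = text.indexOf(searchTerm); // case-sensitive; no trim(), no toLowerCase()
        if (start < 0) {
            return text; // not found: return the input as-is, no word splitting
        }
        int end = start + searchTerm.length();
        return text.substring(0, start)
                + "[[HIGHLIGHT]]" + text.substring(start, end) + "[[ENDHIGHLIGHT]]"
                + text.substring(end);
    }
}
```

Unlike the submitted updateString, this uses substring(0, start) rather than substring(0, startIndex - 1), so it does not depend on the match being preceded by a space.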

It sounds like you missed the point of the problem. The original problem statement says:

Note that the highlighter doesn't have to have the exact same result on this example, since you are defining what a good snippet is and return the most relevant snippet with the query terms highlighted.

It sounds like they wanted you to identify a good snippet to return, not just to highlight words in the original input. If the input was long, you'd want to return a smaller snippet of text with the highlighted words.

One possible approach would be:

  • Find the smallest snippet of text that contains all inputs (or as many of the inputs as possible).
  • Expand the snippet to include the enclosing sentences if possible.
  • If the result is greater than a max length, say 100, prune the snippet, possibly by inserting an ellipsis somewhere within.

Things like good sentence identification, word stemming, and spelling correction might also be within scope, especially if you are allowed to use third-party libraries.
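A rough sketch of the second and third steps of that approach, for a single exact phrase only, might look like this. It does not attempt the first step (the smallest window over several terms), and everything in it is an assumption made for illustration: the class name SnippetExtractor, the naive rule that '.', '!' and '?' end a sentence, returning an empty string when there is no match, and not counting the "..." markers towards maxLength.

```java
import java.util.Objects;

/** Hypothetical sketch: returns the sentence around the first match, pruned to a maximum length. */
final class SnippetExtractor {
    private static final String ELLIPSIS = "...";

    private SnippetExtractor() {
    }

    static String snippet(String text, String phrase, int maxLength) {
        Objects.requireNonNull(text, "text");
        Objects.requireNonNull(phrase, "phrase");
        int matchStart = phrase.isEmpty() ? -1 : text.indexOf(phrase);
        if (matchStart < 0) {
            return ""; // assumption: no match means no snippet
        }
        int matchEnd = matchStart + phrase.length();

        // Expand left to just after the previous sentence terminator (or the text start),
        // then skip the whitespace that separates the two sentences.
        int start = matchStart;
        while (start > 0 && !isTerminator(text.charAt(start - 1))) {
            start--;
        }
        while (start < matchStart && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        // Expand right until the snippet ends with a sentence terminator (or the text ends).
        int end = matchEnd;
        while (end < text.length() && !isTerminator(text.charAt(end - 1))) {
            end++;
        }

        String sentence = text.substring(start, end);
        if (sentence.length() <= maxLength) {
            return sentence;
        }

        // Prune: keep the phrase roughly centred and mark each cut with an ellipsis.
        int phraseAt = matchStart - start;
        int room = Math.max(0, maxLength - phrase.length());
        int from = Math.max(0, phraseAt - room / 2);
        int to = Math.min(sentence.length(), from + Math.max(maxLength, phrase.length()));
        return (from > 0 ? ELLIPSIS : "")
                + sentence.substring(from, to)
                + (to < sentence.length() ? ELLIPSIS : "");
    }

    // Naive sentence boundary rule; abbreviations such as "e.g." will be split wrongly.
    private static boolean isTerminator(char c) {
        return c == '.' || c == '!' || c == '?';
    }
}
```

For the test input in the question and the phrase "deep dish pizza" with a maxLength of 100, this should return only the middle sentence. A more careful version could replace the terminator scan with java.text.BreakIterator.getSentenceInstance() from the JDK, or with a third-party sentence splitter.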

I don't know if I'm missing something, but I'd write a simple block of code using indexOf() and other string methods. Depending on the definition of the problem, I'd probably use the StringBuilder.insert() method to inject the highlighting. I wouldn't be looking to do tokenising and looping as you have done here. I'd justify it by saying that it's the simplest solution to the problem as specified. KISS is the best approach to open questions like this.
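A minimal sketch of the indexOf() plus StringBuilder.insert() idea could look like the following. The class name InsertHighlighter and the choice to mark every occurrence rather than only the first are assumptions; null arguments are not handled here.

```java
/** Hypothetical sketch: injects markers with StringBuilder.insert(). */
final class InsertHighlighter {
    private static final String START = "[[HIGHLIGHT]]";
    private static final String END = "[[ENDHIGHLIGHT]]";

    private InsertHighlighter() {
    }

    static String highlight(String text, String searchTerm) {
        if (searchTerm.isEmpty()) {
            return text; // assumption: an empty term highlights nothing
        }
        StringBuilder sb = new StringBuilder(text);
        int from = sb.indexOf(searchTerm);
        while (from >= 0) {
            sb.insert(from, START);
            // The match has shifted right by the length of the start marker.
            int matchEnd = from + START.length() + searchTerm.length();
            sb.insert(matchEnd, END);
            // Resume after the end marker, so inserted markers are never searched.
            from = sb.indexOf(searchTerm, matchEnd + END.length());
        }
        return sb.toString();
    }
}
```

Because each search resumes after the marker that was just inserted, a search term such as "HIGHLIGHT" matches only text that was in the original input, not the markers.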

Although they gave you the inputs and outputs, did they specify what was to happen if the inputs changed and there was no match?

I also notice the catching of a null pointer. That is a good idea if you know where it's likely to occur and intend to do something about it. But as you are just logging, perhaps you should have done a more general catch. For example, can your code trigger an IndexOutOfBoundsException? So I'd be looking to catch Exception or Throwable.

The other thing I'd be asking for is a definition of what they consider "production code". At first glance it sounds simple enough, but my experience is that it can be interpreted in many different ways.

The real problem is that they will be expecting certain things, and you don't know what they are. So you code what works for you and hope that it matches what they expect.

They seem to emphasize that you can define what a good output is, so perhaps how you do the parsing is not what they want to know. Maybe they want you to realize that marking the text in the string with a marker is not a very good solution. If the result is to be used in the program for further processing, something like this might be more appropriate:

class MarkedText {
    String highlightDoc;
    String inputQuery;
    List<Range> ranges;
}

class Range {
    int offset;
    int length;
}

public MarkedText findSubstringInQuery(String inputQuery, String highlightDoc) {
    [...]
}
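To make the idea concrete, here is one way the range-based result could be filled in and later rendered. This is not the elided method body above; the names MatchRange, RangeFinder, findRanges and render are hypothetical, and the sketch assumes case-sensitive, non-overlapping matches of the whole search term and non-null arguments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value type: one match, described by its position in the original text. */
final class MatchRange {
    final int offset;
    final int length;

    MatchRange(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical sketch: finds match ranges first, and renders markers as a separate step. */
final class RangeFinder {
    private RangeFinder() {
    }

    /** Returns the ranges of all non-overlapping occurrences of searchTerm, in order. */
    static List<MatchRange> findRanges(String text, String searchTerm) {
        List<MatchRange> ranges = new ArrayList<>();
        if (searchTerm.isEmpty()) {
            return ranges;
        }
        int from = text.indexOf(searchTerm);
        while (from >= 0) {
            ranges.add(new MatchRange(from, searchTerm.length()));
            from = text.indexOf(searchTerm, from + searchTerm.length());
        }
        return ranges;
    }

    /** Renders text with each range wrapped in the given markers; ranges must be sorted. */
    static String render(String text, List<MatchRange> ranges, String start, String end) {
        StringBuilder out = new StringBuilder(text.length());
        int copiedUpTo = 0;
        for (MatchRange r : ranges) {
            out.append(text, copiedUpTo, r.offset)
               .append(start)
               .append(text, r.offset, r.offset + r.length)
               .append(end);
            copiedUpTo = r.offset + r.length;
        }
        return out.append(text, copiedUpTo, text.length()).toString();
    }
}
```

Keeping the ranges separate from the text means the same search result can be rendered with "[[HIGHLIGHT]]" markers, HTML tags, or no markers at all, and the original string is never modified.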

Do you mean to match on partial words in the case where you break up the query? You also aren't accounting for the possibility that the search string contains the word "HIGHLIGHT".

For example, try this:

Input: "Little star's deep dish pizza sure is fantastic."
Search: "a HIGHLIGHT"
Output: (probably not what you want)
