
What is the fastest way to find the last part of a url path?

I have a URL such as:

"http://www.someco.com/news/2016-01-03/waterloo-station"

The url never contains a query string.

What is the cleanest way to extract the String "waterloo-station" ?

Of course I can use the following code:

url.substring(url.lastIndexOf('/') + 1)

but I am not completely happy with it, because it has to search for the last index and then take the substring. I am wondering if there is a better way (using a regular expression?) to get the same result in a single step.

Of course the solution should be significantly faster when executed billions of times.

I do not think that it can be improved. In short, the search for the last index is a simple operation that is implemented with a fast algorithm directly in the String class, and it would be difficult for a regular expression to be as fast as this. The second access to the String, as you can see, could hardly cost less: it is just the initialisation of the new String.

It could have been faster if there were a dedicated method implemented directly in the String class.
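To make the comparison concrete, here is a minimal sketch that puts the lastIndexOf/substring approach next to one possible regular expression. The class name LastSegmentDemo, the method names and the pattern are assumptions chosen for illustration; they are not from the question, and the sketch makes no timing claims.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class LastSegmentDemo {

    // Compiled once; recompiling the pattern on every call would add further cost.
    private static final Pattern LAST_SEGMENT = Pattern.compile("[^/]*$");

    // The approach from the question: one backward scan, then one substring.
    static String bySubstring(String url) {
        // lastIndexOf returns -1 when there is no '/', so the whole string comes back.
        return url.substring(url.lastIndexOf('/') + 1);
    }

    // A regex alternative: match the trailing run of non-slash characters.
    static String byRegex(String url) {
        Matcher m = LAST_SEGMENT.matcher(url);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String url = "http://www.someco.com/news/2016-01-03/waterloo-station";
        System.out.println(bySubstring(url));
        System.out.println(byRegex(url));
    }
}
```

Both methods return "waterloo-station" for the URL from the question. The regex version still has to run the matcher over the string and then build the result, so it does not remove a step; it only hides the two steps behind a more general engine.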

If you want more details, you can look at the code in the JDK yourself. It is copied here for your convenience.

The following code is the implementation of the method lastIndexOf() in my JDK:

public int lastIndexOf(int ch, int fromIndex) {
    int min = offset;
    char v[] = value;

    int i = offset + ((fromIndex >= count) ? count - 1 : fromIndex);

    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // handle most cases here (ch is a BMP code point or a
        // negative value (invalid code point))
        for (; i >= min ; i--) {
            if (v[i] == ch) {
                return i - offset;
            }
        }
        return -1;
    }

    int max = offset + count;
    if (ch <= Character.MAX_CODE_POINT) {
        // handle supplementary characters here
        char[] surrogates = Character.toChars(ch);
        for (; i >= min; i--) {
            if (v[i] == surrogates[0]) {
                if (i + 1 == max) {
                    break;
                }
                if (v[i+1] == surrogates[1]) {
                    return i - offset;
                }
            }
        }
    }
    return -1;
}

Being implemented directly in the String class, it has access to its private members:

/** The value is used for character storage. */
private final char value[];

/** The offset is the first index of the storage that is used. */
private final int offset;

/** The count is the number of characters in the String. */
private final int count;

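It scans the character array directly and does not create any substrings. At the same time, the substring method is very fast in this JDK because it does not create a new array of char; it simply creates a new String object with a different offset and count. (This applies to older JDKs such as the one quoted here: since Java 7 update 6, String no longer has the offset and count fields, and substring copies the characters.)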

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}
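The two mechanisms above, a backward scan over the character array and a substring that shares that array, can be sketched in a small standalone class. SharedSlice and all of its members are hypothetical names for illustration only. It is a simplified model of the layout quoted above (BMP characters only), not the real java.lang.String.

```java
// Hypothetical, simplified model of the String layout quoted above.
final class SharedSlice {
    private final char[] value;  // character storage, possibly shared
    private final int offset;    // first index of the storage that is used
    private final int count;     // number of characters in this slice

    SharedSlice(String s) {
        this(0, s.length(), s.toCharArray());
    }

    // Shares the given array instead of copying it.
    private SharedSlice(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Scan backwards from the last character; return -1 if ch is absent.
    int lastIndexOf(char ch) {
        for (int i = offset + count - 1; i >= offset; i--) {
            if (value[i] == ch) {
                return i - offset;
            }
        }
        return -1;
    }

    // No characters are copied: the new slice points into the same array.
    SharedSlice substring(int beginIndex) {
        if (beginIndex < 0 || beginIndex > count) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        return new SharedSlice(offset + beginIndex, count - beginIndex, value);
    }

    // True when both slices are backed by the very same array.
    boolean sharesStorageWith(SharedSlice other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

For example, with `SharedSlice url = new SharedSlice("http://www.someco.com/news/2016-01-03/waterloo-station")`, the call `url.substring(url.lastIndexOf('/') + 1)` yields a slice whose toString() is "waterloo-station" and which shares its storage with url.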

String.valueOf(Paths.get(file).getFileName())

Not sure if it is the fastest way to get the "filename", but it is pretty simple and fast:

var url = "http://www.someco.com/news/2016-01-03/waterloo-station";
var fileName = Path.of(new URI(url).getPath()).getFileName();
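As a self-contained variant of the snippet above, the following sketch wraps the same idea in a method that returns a String. The class name UriFileNameDemo and the method name fileName are hypothetical. It uses URI.create instead of new URI(url), an assumption made here so that no checked URISyntaxException has to be handled; a malformed URL then raises IllegalArgumentException.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original answer.
class UriFileNameDemo {

    // Returns the last element of the URL's path, or "" if there is none.
    static String fileName(String url) {
        // getPath() drops the scheme and host, e.g. "/news/2016-01-03/waterloo-station".
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        // getFileName() returns null for a path with no elements, such as "/".
        Path name = Path.of(path).getFileName();
        return (name == null) ? "" : name.toString();
    }

    public static void main(String[] args) {
        System.out.println(fileName("http://www.someco.com/news/2016-01-03/waterloo-station"));
    }
}
```

Compared with lastIndexOf and substring, this parses the whole URL and builds a Path object, so it is more about readability than raw speed.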
