
Convert url to a safe filename

I have a list of URLs whose HTML I am saving, and I want each filename to be the URL it came from.

Is there a built-in method in Guava that can ensure the filename is safe to save?

It's not very clear what you mean by "safe to save." You could use CharMatcher.matchesAllOf to ensure the URL only contains specific safe characters, or in Guava 14, which will come out in a few weeks, you could use BaseEncoding.base64Url() to base64-encode the URL to a definitely safe string.
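The base64 idea can be sketched as follows. To keep the example free of dependencies, this uses the JDK's own java.util.Base64 URL-safe encoder (Java 8+) in place of Guava's BaseEncoding.base64Url(); the class and method names (UrlFilenames, toFilename, toUrl) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class UrlFilenames {
    // Encodes a URL into a name that uses only A-Z, a-z, 0-9, '-' and '_'.
    // Padding ('=') is dropped; the JDK decoder does not require it.
    static String toFilename(String url) {
        return Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(url.getBytes(StandardCharsets.UTF_8));
    }

    // Recovers the original URL from a name produced by toFilename.
    static String toUrl(String filename) {
        byte[] bytes = Base64.getUrlDecoder().decode(filename);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "http://example.com/page?id=1&sort=asc";
        String name = toFilename(url);
        System.out.println(name);        // the encoded, filesystem-safe name
        System.out.println(toUrl(name)); // the original URL again
    }
}
```

Two caveats apply to this approach. The encoded name is roughly a third longer than the URL, so long URLs may exceed a filesystem's name-length limit. Base64 is also case-sensitive, so on a case-insensitive filesystem two different URLs could in principle end up with names that differ only by case.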

To answer your literal question, no, there is no such built-in method. In fact, the closest one I could find, com.google.common.io.Files.simplifyPath(), specifically comes with the warning that it might "not always match the behavior of the filesystem."

The CharMatcher idea that Louis came up with is a good one. For more on that, see Guava's wiki and its Javadoc. It should be relatively simple to build your own matcher based on your specific file naming rules.

Here's an example assuming you're using Windows/NTFS. On NTFS:

File and directory names can be up to 255 characters long, including any extensions. Names preserve case, but are not case sensitive. NTFS makes no distinction of filenames based on case. Names can contain any characters except for the following:

 ? " / \ < > * | :

On Windows, Microsoft recommends avoiding all of those, as well as characters with values from 0 to 31, inclusive. So you might end up with something like this:

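import com.google.common.base.CharMatcher;

public boolean isSafeFilename(String url) {
    // Matches any character that NTFS allows in a name.
    CharMatcher ntfsMatcher = CharMatcher.noneOf("?\"/\\<>*|:");
    char zero = 0;
    char thirtyOne = 31;
    // Matches the control characters 0-31, which Windows disallows.
    CharMatcher controlChars = CharMatcher.inRange(zero, thirtyOne);
    // A safe character is allowed by NTFS and is not a control character.
    CharMatcher ntfsWindows = ntfsMatcher.and(controlChars.negate());

    return ntfsWindows.matchesAllOf(url);
}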
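If you would rather repair a name than merely reject it, the same rules can drive a replacement pass. The following is a minimal sketch in plain Java, without Guava; the class name FilenameSanitizer, the method sanitize, and the choice of '_' as the replacement character are assumptions for illustration. It applies the forbidden-character list and the 255-character limit quoted above.

```java
final class FilenameSanitizer {
    // Characters NTFS forbids in file and directory names.
    private static final String FORBIDDEN = "?\"/\\<>*|:";
    // Maximum name length on NTFS, including any extension.
    private static final int MAX_LENGTH = 255;

    // Replaces forbidden and control characters (0-31) with '_'
    // and truncates the result to MAX_LENGTH characters.
    static String sanitize(String url) {
        StringBuilder sb = new StringBuilder(Math.min(url.length(), MAX_LENGTH));
        for (int i = 0; i < url.length() && sb.length() < MAX_LENGTH; i++) {
            char c = url.charAt(i);
            boolean unsafe = c <= 31 || FORBIDDEN.indexOf(c) >= 0;
            sb.append(unsafe ? '_' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "http___example.com_a_b=1"
        System.out.println(sanitize("http://example.com/a?b=1"));
    }
}
```

Note that this mapping is lossy: distinct URLs can collapse to the same name after replacement or truncation, so it may need a disambiguating suffix if collisions matter. It also does not handle names that Windows reserves, such as CON or NUL.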


 