Cleaning a file name in Java

I want to write a script that will clean up the names of my .mp3 files. I was able to write a few lines that change the name, but I want an automatic script that removes all the undesired characters ($, %, _, !, ?, 7, etc.) and renames the file to the format "Artist - Song".

    File file = new File("C://Users//nikita//Desktop//$%#Artis8t_-_35&Son5g.mp3");
    String Original = file.toString();
    String New = "Code to change 'Original' to 'Artist - Song'";
    File file2 = new File("C://Users//nikita//Desktop//" + New + ".mp3");
    file.renameTo(file2);

I feel like I should make a list of all the unwanted characters, run the String against that list, and erase every listed character, but I am not sure how to do it.

String test = "$%$#Arti56st_-_54^So65ng.mp3";

Edit 1:

When I try using the replace method, it still doesn't change the name.

String test = "$%$#Arti56st_-_54^So65ng.mp3";
System.out.println("Original: " + test);
test.replace( "[0-9]%#&\\$", "");
System.out.println("New:      " + test);

The code above returns the following output:

Original: $%$#Arti56st_-_54^So65ng.mp3
New:      $%$#Arti56st_-_54^So65ng.mp3

I'd suggest something like this:

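import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns "Artist - Song" (without the ".mp3" extension) for a name like "Artist-Song.mp3".
public static String sanitizeFilename(String original) {
    // Greedy match: everything before the last '-' is the artist, the rest up to ".mp3" is the song.
    Pattern p = Pattern.compile("(.*)-(.*)\\.mp3");
    Matcher m = p.matcher(original);

    if (m.matches()) {
        // Whitelist: keep only ASCII letters and spaces.
        String artist = m.group(1).replaceAll("[^a-zA-Z ]", "");
        String song = m.group(2).replaceAll("[^a-zA-Z ]", "");

        return String.format("%s - %s", artist, song);
    } else {
        throw new IllegalArgumentException("Failed to match filename: " + original);
    }
}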

(Edit: changed the whitelist regex to exclude digits and underscores.)
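To make the whitelist idea concrete, here is a minimal sketch comparing the two approaches on the same input. The class name `WhitelistVsBlacklist` and both helper methods are made up for illustration; the whitelist pattern is the letters-and-spaces one, and the blacklist pattern is just one possible list of unwanted characters.

```java
final class WhitelistVsBlacklist {
    // Blacklist: remove only the characters we thought of in advance.
    static String blacklist(String s) {
        return s.replaceAll("[0-9%#&$]", "");
    }

    // Whitelist: remove everything that is not an ASCII letter or a space.
    static String whitelist(String s) {
        return s.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        String input = "54^So65ng~";
        System.out.println(blacklist(input)); // prints "^Song~" ('^' and '~' were not on the list)
        System.out.println(whitelist(input)); // prints "Song"
    }
}
```

The blacklist lets through any character nobody listed, while the whitelist only needs to know what a valid name looks like.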

Two points in particular. First, when sanitizing strings, it's a good idea to whitelist the permitted characters rather than blacklist the ones you want to exclude, so you won't be surprised by edge cases later. (You may want a less restrictive whitelist than the one I've used here, but it's easy to vary.) Second, it's a good idea to handle the case where the filename doesn't match the expected pattern. If your code comes across something other than an MP3, how would you like it to respond? Here I've thrown an exception, so the calling code can catch it and handle it appropriately.
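As a rough sketch of what the calling code could look like, the following renames every matching file in a directory and skips anything that doesn't fit the pattern. The class `Mp3Renamer` and the method `renameAll` are hypothetical names, the `trim()` calls are my own addition to avoid doubled spaces, and the choice to skip non-matching files (rather than abort) is an assumption about what you want.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Mp3Renamer {
    private static final Pattern MP3_NAME = Pattern.compile("(.*)-(.*)\\.mp3");

    // Returns "Artist - Song" or throws if the name is not "<artist>-<song>.mp3".
    static String sanitizeFilename(String original) {
        Matcher m = MP3_NAME.matcher(original);
        if (!m.matches()) {
            throw new IllegalArgumentException("Failed to match filename: " + original);
        }
        // trim() is an addition here so "Artist - Song.mp3" stays unchanged.
        String artist = m.group(1).replaceAll("[^a-zA-Z ]", "").trim();
        String song = m.group(2).replaceAll("[^a-zA-Z ]", "").trim();
        return String.format("%s - %s", artist, song);
    }

    // Renames all matching files in dir and returns how many were renamed.
    // Files.move throws FileAlreadyExistsException if the target name is taken.
    static int renameAll(Path dir) throws IOException {
        // Collect first, so we don't rename entries while still iterating the directory.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        int renamed = 0;
        for (Path p : files) {
            try {
                String clean = sanitizeFilename(p.getFileName().toString());
                Path target = p.resolveSibling(clean + ".mp3");
                if (!target.equals(p)) {
                    Files.move(p, target);
                    renamed++;
                }
            } catch (IllegalArgumentException e) {
                // Not an "<artist>-<song>.mp3" file: report it and carry on.
                System.err.println("Skipping " + p + ": " + e.getMessage());
            }
        }
        return renamed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java Mp3Renamer <directory>");
            return;
        }
        System.out.println("Renamed " + renameAll(Paths.get(args[0])) + " file(s)");
    }
}
```

Using `Files.move` instead of `File.renameTo` means a failed rename raises an exception rather than silently returning false.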

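String cleaned = original.replaceAll("[0-9%#&$]", "");

This should remove almost all the characters you don't want. Note that replaceAll takes a regular expression, whereas replace treats its first argument as literal text, and that the result has to be assigned because Java strings are immutable. Applied to the full file name, this pattern also removes the 3 in .mp3, so run it on the name without the extension.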

Or you can come up with your own regex:

https://docs.oracle.com/javase/tutorial/essential/regex/
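For what it's worth, here is a small sketch of why the attempt in the question prints the same string twice, and one way to get the characters removed. `ReplaceDemo` and `stripUnwanted` are hypothetical names, and the character list (digits plus % # & $ ^) is only an example; adjust it to taste.

```java
final class ReplaceDemo {
    // Strips digits and % # & $ ^ from the base name, leaving the extension alone.
    static String stripUnwanted(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        String ext = dot < 0 ? "" : fileName.substring(dot);
        // A character class: any single one of these characters is removed.
        return base.replaceAll("[0-9%#&$^]", "") + ext;
    }

    public static void main(String[] args) {
        String test = "$%$#Arti56st_-_54^So65ng.mp3";

        // Two problems: the result is discarded (strings are immutable), and
        // replace() looks for the literal text "[0-9]%#&\$", which never occurs.
        test.replace("[0-9]%#&\\$", "");
        System.out.println(test); // prints "$%$#Arti56st_-_54^So65ng.mp3"

        String cleaned = stripUnwanted(test);
        System.out.println(cleaned); // prints "Artist_-_Song.mp3"
    }
}
```

The underscores survive because they are not in the example character list, which is the kind of edge case a whitelist avoids.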
