
Regular expression to remove a file's extension

I am in need of a regular expression that can remove the extension of a filename, returning only the name of the file.

Here are some examples of inputs and outputs:

myfile.png     -> myfile
myfile.png.jpg -> myfile.png

I can obviously do this manually (i.e., by removing everything from the last dot onward), but I'm sure there is a regular expression that can do this by itself.

Just for the record, I am doing this in JavaScript.

Just for completeness: how could this be achieved without regular expressions?

var input = 'myfile.png';
// substring() clamps a negative end (-1 when there is no dot) to 0, giving ""
var output = input.substring(0, input.lastIndexOf('.')) || input;

The || input takes care of the case where lastIndexOf() returns -1 (no dot in the name). You see, it's still a one-liner.
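
For reuse, the one-liner can be wrapped in a small function. This is only a sketch: removeExtension is a hypothetical name, and the explicit dot > 0 check is a more verbose spelling of the || input fallback. A lastIndexOf() result of -1 (no dot) or 0 (only a leading dot, as in ".htaccess") leaves the name unchanged, because in both cases the one-liner's substring is empty.

```javascript
// Hypothetical helper, equivalent to: input.substring(0, input.lastIndexOf('.')) || input
function removeExtension(input) {
  const dot = input.lastIndexOf('.');
  // -1: no period at all; 0: only a leading period (e.g. ".htaccess").
  // In both cases the one-liner's substring is "", so "|| input" returns the input.
  return dot > 0 ? input.slice(0, dot) : input;
}

console.log(removeExtension('myfile.png'));     // "myfile"
console.log(removeExtension('myfile.png.jpg')); // "myfile.png"
console.log(removeExtension('.htaccess'));      // ".htaccess"
```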

/(.*)\.[^.]+$/

The result will be in the first capture group. However, it's probably more efficient to just find the position of the rightmost period and take everything before it, without using a regex.
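
A minimal sketch of reading that first capture group with String.prototype.match(); stripExtension is a hypothetical helper name. match() returns null when the string has no ".ext" at the end, so the sketch falls back to the original name. Note that for ".htaccess" the (.*) group is allowed to be empty, so this pattern yields an empty name.

```javascript
// Hypothetical helper: strip the extension using capture group 1 of /(.*)\.[^.]+$/
function stripExtension(filename) {
  const m = filename.match(/(.*)\.[^.]+$/);
  // No match (e.g. "file" has no period): keep the name unchanged.
  return m ? m[1] : filename;
}

console.log(stripExtension('myfile.png'));     // "myfile"
console.log(stripExtension('myfile.png.jpg')); // "myfile.png"
console.log(stripExtension('file'));           // "file"
```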

/^(.+)(\.[^ .]+)?$/

Test cases where this works and other patterns fail:

  • ".htaccess" (leading period)
  • "file" (no file extension)
  • "send to mrs." (no extension, but ends in an abbreviation)
  • "version 1.2 of project" (no extension, yet still contains a period)

The common thread above is, of course, "malformed" file extensions. But you always have to think about those corner cases. :P

Test cases where this fails:

  • "version 1.2" (no file extension, but "appears" to have one)
  • "name.tar.gz" (if you view this as a "compound extension" and wanted it split into "name" and ".tar.gz")

How to handle these is problematic and best decided on a project-specific basis.

The regular expression that matches the extension is:

/\.[^.]*$/

It finds a period character (\.), followed by zero or more characters that are not periods ([^.]*), followed by the end of the string ($). Replacing that match with an empty string removes the extension.

 console.log("aaa.bbb.ccc".replace(/\.[^.]*$/, '')); // "aaa.bbb"

/^(.+)(\.[^ .]+)?$/

The above pattern is wrong: it will always include the extension too. That's because (.+) is greedy and the (\.[^ .]+) group is optional, so the JavaScript regex engine successfully matches the entire string with (.+) and lets the optional group match nothing: http://cl.ly/image/3G1I3h3M2Q0M
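
The greedy behaviour is easy to check directly. The sketch below compares the pattern as written with a non-greedy variant, /^(.+?)(\.[^ .]+)?$/; that lazy version is an assumed fix, not one given in the answers above. Making the first group lazy lets the optional extension group claim the last ".ext" when one ends the string, while the leading-period, no-extension and abbreviation cases from the list above still come back whole. baseName is a hypothetical helper name.

```javascript
// Pattern as written: greedy (.+) swallows the whole string.
const greedy = /^(.+)(\.[^ .]+)?$/;
// Assumed fix: lazy (.+?) stops as soon as the optional extension group can finish the match.
const lazy = /^(.+?)(\.[^ .]+)?$/;

// Hypothetical helper: return capture group 1, or the input if there is no match
// (both patterns only fail to match the empty string).
function baseName(re, filename) {
  const m = filename.match(re);
  return m ? m[1] : filename;
}

const samples = ['myfile.png', 'myfile.png.jpg', '.htaccess', 'file', 'send to mrs.', 'version 1.2 of project'];
for (const name of samples) {
  console.log(JSON.stringify(name), 'greedy:', baseName(greedy, name), 'lazy:', baseName(lazy, name));
}
```

As the answer that introduced the pattern notes, "version 1.2" is still split into "version 1" and ".2" by the lazy variant.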


Here's my tested regexp solution.

The pattern matches the last element of the path, with or without an extension, capturing the file name (without extension) and the extension separately. Both slash and backslash are treated as separators.

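var path = "c:\\some.path/subfolder/file.ext"; // "\\" is one backslash; "\s" would be just "s"
var m = path.match(/([^:\\/]*?)(?:\.([^ :\\/.]*))?$/);
var fileName = (m === null) ? "" : m[1];         // "file" (m[0] is the whole match, "file.ext")
var fileExt  = (m === null) ? "" : (m[2] || ""); // "ext" (m[2] is undefined when there is no extension)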

Dissection of the above pattern:

([^:\\/]*?)  // match any character, except slashes and colon, 0-or-more times,
             // make the token non-greedy so that the regex engine
             // will try to match the next token (the file extension)
             // capture the file name token to subpattern \1

(?:\.        // match the '.' but don't capture it
([^ :\\/.]*) // match file extension
             // ensure that the last element of the path is matched by prohibiting slashes
             // capture the file extension token to subpattern \2
)?$          // the whole file extension is optional

http://cl.ly/image/3t3N413g3K09

http://www.gethifi.com/tools/regex

This covers most of the cases mentioned by @RogerPate and handles full paths too. The exceptions are ".htaccess", which yields an empty name, and "send to mrs.", which loses its trailing period.
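
To see what the path pattern returns for a few inputs, it can be wrapped in a helper. This is a sketch: splitFileName is a hypothetical name and the regex is copied unchanged from the answer. Because the pattern can match an empty string at the end of the input, match() does not return null here, and group 2 is undefined when there is no extension at all.

```javascript
// Hypothetical helper wrapping the path pattern from the answer above.
function splitFileName(path) {
  const m = path.match(/([^:\\/]*?)(?:\.([^ :\\/.]*))?$/);
  // m[1]: last path element without its extension; m[2]: extension, or undefined if none.
  return { name: m ? m[1] : '', ext: m && m[2] !== undefined ? m[2] : '' };
}

console.log(splitFileName('c:\\some.path/subfolder/file.ext')); // { name: 'file', ext: 'ext' }
console.log(splitFileName('C:\\docs\\report.final.docx'));      // { name: 'report.final', ext: 'docx' }
console.log(splitFileName('.htaccess'));                        // { name: '', ext: 'htaccess' }
```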

Another no-regex way of doing it (the "opposite" of @Rahul's version: it doesn't use pop() to remove the extension).

It doesn't need to refer to the variable twice, so it's easier to inline.

filename.split('.').slice(0, -1).join('.')
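
Two details of this approach are easy to miss. Array.prototype.join() without an argument joins with a comma, so the '.' separator has to be passed explicitly. And a name without any dot splits into a single part, which slice(0, -1) then discards, leaving an empty string. The sketch below guards against the second case; withoutExtension is a hypothetical helper name.

```javascript
// Default separator of join() is ",", not "."
console.log('myfile.png.jpg'.split('.').slice(0, -1).join()); // "myfile,png"

// Hypothetical helper: split/slice/join with an explicit separator and a no-dot guard.
function withoutExtension(filename) {
  const parts = filename.split('.');
  // With no '.', split() yields one part and slice(0, -1) would leave nothing.
  return parts.length > 1 ? parts.slice(0, -1).join('.') : filename;
}

console.log(withoutExtension('myfile.png.jpg')); // "myfile.png"
console.log(withoutExtension('file'));           // "file"
```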

This will do it as well :)

'myfile.png.jpg'.split('.').reverse().slice(1).reverse().join('.');

I'd stick to the regexp though... =P

  var parts = filename.split('.'); parts.pop(); // pop() drops the last segment (the extension)
  return parts.join('.');

This will make your wish come true, though not the regular-expression way.

In JavaScript you can call the string replace() method, which replaces text based on a regular expression.

This regular expression matches the whole line: the greedy (.*) group captures everything up to the last period, and \..* matches that period and everything after it. Replacing the match with the first capture group ($1) therefore removes the last period and everything after it.

/^(.*)\..*$/
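
A minimal sketch of that replacement: "$1" in the replacement string refers to the first capture group, so the whole match is replaced by the text before the last period. When the string contains no period the pattern does not match and replace() returns the string unchanged.

```javascript
// Replace the whole string with capture group 1 (everything before the last period).
const name = 'myfile.png.jpg'.replace(/^(.*)\..*$/, '$1');
console.log(name); // "myfile.png"

// No period: no match, so the string comes back unchanged.
const plain = 'file'.replace(/^(.*)\..*$/, '$1');
console.log(plain); // "file"
```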

How to implement the replacement is shown in this Stack Overflow question:

Javascript regex question

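Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.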

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM