JavaScript regex to extract the filename from a Content-Disposition header

Question

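The Content-Disposition header contains a filename that is usually easy to extract, but the value is sometimes wrapped in double quotes, sometimes unquoted, and there may be other variants as well. Can someone write a regex that works in all of these cases?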

Content-Disposition: attachment; filename=content.txt

Here are some possible target strings:

attachment; filename=content.txt
attachment; filename*=UTF-8''filename.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
attachment; filename="omáèka.jpg"
and some other combinations might also be there

Answer 1

You could try something in this spirit:

filename[^;=\n]*=((['"]).*?\2|[^;\n]*)

filename      # match filename, followed by
[^;=\n]*      # anything but a ;, a = or a newline
=
(             # first capturing group
    (['"])    # either single or double quote, put it in capturing group 2
    .*?       # anything up until the first...
    \2        # matching quote (single if we found single, double if we find double)
|             # OR
    [^;\n]*   # anything but a ; or a newline
)

Your filename is in the first capturing group: http://regex101.com/r/hJ7tS6
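As a rough illustration of how this pattern could be used from JavaScript, here is a minimal sketch. The helper name `extractFilename` is hypothetical and not part of the answer. Note that for quoted values group 1 still includes the quotes, so the sketch strips them, and that a `filename*` value keeps its `UTF-8''` prefix with this pattern.

```javascript
// Regex from the answer above: group 1 is the raw value, group 2 the quote character (if any).
const ANSWER1_RE = /filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/;

// Hypothetical helper: returns the filename value without surrounding quotes, or null.
function extractFilename(header) {
  const match = ANSWER1_RE.exec(header);
  if (!match) return null;
  let value = match[1];
  // When the value was quoted, group 2 is set; drop the quote from both ends.
  if (match[2]) value = value.slice(1, -1);
  return value;
}

console.log(extractFilename('attachment; filename=content.txt'));      // content.txt
console.log(extractFilename('attachment; filename="EURO rates"'));     // EURO rates
console.log(extractFilename("attachment; filename*=UTF-8''filename.txt")); // UTF-8''filename.txt
```

Only the first `filename` parameter in the header is returned, because the regex has no `g` flag.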

Answer 2

Slightly modified to match my use case (strips all quotes and the UTF marker):

filename\*?=['"]?(?:UTF-\d['"]*)?([^;\r\n"']*)['"]?;?

https://regex101.com/r/UhCzyI/3
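A small sketch of this pattern in use might look like the following. The `i` flag is an addition made here (an assumption, not part of the answer) so that a lowercase `utf-8''` prefix is skipped too, and the helper name `bareFilename` is hypothetical.

```javascript
// Regex from the answer above, with the `i` flag added for this sketch
// so that "utf-8''" is treated the same as "UTF-8''".
const ANSWER2_RE = /filename\*?=['"]?(?:UTF-\d['"]*)?([^;\r\n"']*)['"]?;?/i;

// Hypothetical helper: group 1 already excludes quotes and the UTF marker.
function bareFilename(header) {
  const match = ANSWER2_RE.exec(header);
  return match ? match[1] : null;
}

console.log(bareFilename('attachment; filename="omáèka.jpg"'));               // omáèka.jpg
console.log(bareFilename("attachment; filename*=utf-8''%e2%82%ac%20rates")); // %e2%82%ac%20rates
```

Because the character class excludes both quote characters, a name that itself contains an apostrophe would be cut short at that point.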

Answer 3

/filename[^;=\n]*=(?:(\\?['"])(.*?)\1|(?:[^\s]+'.*?')?([^;\n]*))/i

https://regex101.com/r/hJ7tS6/51

Edit: you can also use this parser: https://github.com/Rob--W/open-in-browser/blob/master/extension/content-disposition.js
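The answer does not say which group holds the result. Reading the pattern, group 2 appears to capture quoted values and group 3 unquoted ones (after an optional `charset'lang'` prefix). The sketch below relies on that reading, and the helper name `filenameFromGroups` is hypothetical.

```javascript
// Regex from the answer above.
const ANSWER3_RE = /filename[^;=\n]*=(?:(\\?['"])(.*?)\1|(?:[^\s]+'.*?')?([^;\n]*))/i;

// Hypothetical helper: pick whichever of the two filename groups took part in the match.
function filenameFromGroups(header) {
  const match = ANSWER3_RE.exec(header);
  if (!match) return null;
  // Group 2 is set for quoted values, group 3 for unquoted ones.
  return match[2] ?? match[3];
}

console.log(filenameFromGroups('attachment; filename="EURO rates"'));          // EURO rates
console.log(filenameFromGroups("attachment; filename*=UTF-8''filename.txt")); // filename.txt
```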

Answer 4

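Disclaimer: the following answer only works with PCRE-style engines (e.g. Python / PHP). If you have to use JavaScript, use Robin's answer.

This modified version of Robin's regex strips the quotes: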

filename[^;\n=]*=(['\"])*(?:utf-8\'\')?(.*)(?(1)\1|)

filename        # match filename, followed by
[^;=\n]*        # anything but a ;, a = or a newline
=
(['"])*         # either single or double quote, put it in capturing group 1
(?:utf-8\'\')?  # removes the utf-8 part from the match
(.*)            # second capturing group, will contain the filename
(?(1)\1|)       # if clause: if first capturing group is not empty,
                # match it again (the quotes), else match nothing

https://regex101.com/r/hJ7tS6/28

The filename is in the second capturing group.
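The disclaimer can be checked directly: JavaScript's regex grammar has no conditional construct `(?(1)...)`, so a pattern that uses it is rejected at compile time. The sketch below uses a hypothetical helper `compiles` to show this, next to the alternation-based pattern from Robin's answer, which is accepted.

```javascript
// Hypothetical helper: true if the source compiles as a JavaScript regex.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// String.raw keeps the backslashes exactly as written.
const conditional = String.raw`filename[^;\n=]*=(['"])*(?:utf-8'')?(.*)(?(1)\1|)`;
const alternation = String.raw`filename[^;=\n]*=((['"]).*?\2|[^;\n]*)`;

console.log(compiles(conditional)); // false
console.log(compiles(alternation)); // true
```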

Answer 5

Here is my regex. It works in JavaScript.

filename\*?=((['"])[\s\S]*?\2|[^;\n]*)

I use this in my project.
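When a header carries both `filename` and `filename*`, RFC 6266 recommends preferring `filename*`. One way to build that on top of this pattern is sketched below. The `g` flag, the extra capture for the optional `*` (which shifts the backreference to `\3`), and the helper name `pickFilename` are all additions made for this sketch, and it assumes the extended value is UTF-8 percent-encoded.

```javascript
// The answer's regex with the `g` flag and a capture around the optional `*`.
// Groups: 1 = "*" or "", 2 = raw value, 3 = quote character (if any).
const PARAM_RE = /filename(\*?)=((['"])[\s\S]*?\3|[^;\n]*)/g;

// Hypothetical helper: prefer a decoded filename* value over a plain filename.
function pickFilename(header) {
  let plain = null;
  let extended = null;
  for (const match of header.matchAll(PARAM_RE)) {
    let value = match[2];
    if (match[3]) value = value.slice(1, -1); // strip the surrounding quotes
    if (match[1]) {
      // filename*=charset'language'percent-encoded-value
      const encoded = value.slice(value.lastIndexOf("'") + 1);
      try {
        extended = decodeURIComponent(encoded); // assumes UTF-8
      } catch {
        extended = encoded; // malformed escapes: keep the raw text
      }
    } else {
      plain = value;
    }
  }
  return extended ?? plain;
}

console.log(pickFilename('attachment; filename="EURO rates"; filename*=utf-8\'\'%e2%82%ac%20rates')); // € rates
```

Charsets other than UTF-8 are not handled here; for those, a full parser such as the one linked in Answer 3 is likely a better fit.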

Answer 6

filename[^;\n]*=(UTF-\d['"]*)?((['"]).*?[.]$\2|[^;\n]*)?

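I have upgraded Robin's solution to do two more things:

Capture the filename even if it is wrapped in escaped double quotes.
Capture the UTF-8'' part as a separate group.

This is an ECMAScript solution.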

https://regex101.com/r/7Csdp4/3/

Answer 7

I made a regex that finds these names using a named group, filename:

/(?<=filename(?:=|\*=(?:[\w\-]+'')))["']?(?<filename>[^"';\n]+)["']?/g

const regex = /(?<=filename(?:=|\*=(?:[\w\-]+'')))["']?(?<filename>[^"';\n]+)["']?/g

// Sample headers, one per line
const filenames = `
attachment; filename=content.txt
attachment; filename*=UTF-8''filename.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
attachment; filename="omáèka.jpg"
`

function logMatches() {
  const array = []
  filenames.split("\n").forEach(line => {
    if (!line.trim()) return // skip blank lines
    const matches = line.matchAll(regex)
    const groups = Array.from(matches).map(match => match?.groups?.filename)
    // A line with a single match yields a string, otherwise an array of names
    array.push(groups.length === 1 ? groups[0] : groups)
  })
  console.log(array)
}

logMatches()

7 answers

Answer 1: score 24, accepted, 2014-04-14 08:00:18
Answer 2: score 7, 2018-10-10 10:28:07
Answer 3: score 6, 2017-10-29 11:06:23
Answer 4: score 3, 2016-09-30 21:37:29
Answer 5: score 0, 2017-10-27 03:25:33
Answer 6: score 0, 2019-08-21 10:05:26
Answer 7: score 0, 2021-12-15 17:09:47
