
JavaScript regex for extracting filename from Content-Disposition header

The Content-Disposition header contains a filename that is usually easy to extract, but the value is sometimes wrapped in double quotes, sometimes unquoted, and there are probably other variants too. Can someone write a regex that works in all of these cases?

Content-Disposition: attachment; filename=content.txt

Here are some of the possible target strings:

attachment; filename=content.txt
attachment; filename*=UTF-8''filename.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
attachment; filename="omáèka.jpg"
Other combinations may also occur.

You could try something in this spirit:

filename[^;=\n]*=((['"]).*?\2|[^;\n]*)

filename      # match filename, followed by
[^;=\n]*      # anything but a ;, a = or a newline
=
(             # first capturing group
    (['"])    # either single or double quote, put it in capturing group 2
    .*?       # anything up until the first...
    \2        # matching quote (single if we found single, double if we find double)
|             # OR
    [^;\n]*   # anything but a ; or a newline
)

Your filename is in the first capturing group: http://regex101.com/r/hJ7tS6
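As a rough illustration of how the pattern above might be used from JavaScript, the sketch below wraps it in a small helper. The name `extractRawFilename` is made up for this example. Note that group 1 is returned untouched, so quoted values keep their quotes and a `filename*` value keeps its `UTF-8''` prefix.

```javascript
// The pattern from the answer above, written as a JavaScript regex literal.
const FILENAME_RE = /filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function extractRawFilename(header) {
  const match = FILENAME_RE.exec(header);
  return match ? match[1] : null;
}

console.log(extractRawFilename('attachment; filename=content.txt'));
console.log(extractRawFilename('attachment; filename="EURO rates"'));
console.log(extractRawFilename("attachment; filename*=UTF-8''filename.txt"));
```

Only the first `filename` parameter in the header is matched; a header that carries both `filename` and `filename*` would need a global search or a second pass.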

Slightly modified to match my use case (strips all quotes and UTF tags):

filename\*?=['"]?(?:UTF-\d['"]*)?([^;\r\n"']*)['"]?;?

https://regex101.com/r/UhCzyI/3
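A minimal sketch of this quote-stripping variant in use, assuming the pattern is written with single backslashes as a regex literal. The helper name `extractBareFilename` is hypothetical.

```javascript
// Quote- and UTF-tag-stripping variant; the filename lands in group 1.
const STRIPPING_RE = /filename\*?=['"]?(?:UTF-\d['"]*)?([^;\r\n"']*)['"]?;?/;

// Hypothetical helper: returns the bare filename, or null when nothing matches.
function extractBareFilename(header) {
  const match = STRIPPING_RE.exec(header);
  return match ? match[1] : null;
}

console.log(extractBareFilename('attachment; filename=content.txt'));
console.log(extractBareFilename('attachment; filename="EURO rates"'));
console.log(extractBareFilename("attachment; filename*=UTF-8''filename.txt"));
```

As written, the pattern is case-sensitive, so a lowercase `utf-8''` prefix is not stripped; adding the `i` flag is one possible adjustment.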

/filename[^;=\n]*=(?:(\\?['"])(.*?)\1|(?:[^\s]+'.*?')?([^;\n]*))/i

https://regex101.com/r/hJ7tS6/51

Edit: You can also use this parser: https://github.com/Rob--W/open-in-browser/blob/master/extension/content-disposition.js
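With the case-insensitive pattern in this answer, the filename ends up in group 2 when the value is quoted and in group 3 otherwise. A small sketch of picking the right group follows; the helper name `pickFilename` is an assumption made for the example.

```javascript
// Group 1: opening quote (optionally backslash-escaped)
// Group 2: content of a quoted value
// Group 3: unquoted value, after an optional charset'lang' prefix
const GROUPED_RE = /filename[^;=\n]*=(?:(\\?['"])(.*?)\1|(?:[^\s]+'.*?')?([^;\n]*))/i;

// Hypothetical helper: returns whichever of group 2 or group 3 took part in the match.
function pickFilename(header) {
  const match = GROUPED_RE.exec(header);
  if (!match) return null;
  return match[2] !== undefined ? match[2] : match[3];
}

console.log(pickFilename('attachment; filename=content.txt'));
console.log(pickFilename('attachment; filename="EURO rates"'));
console.log(pickFilename("attachment; filename*=UTF-8''filename.txt"));
```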

Disclaimer: the following answer relies on a conditional group, so it only works in PCRE-style engines (e.g. Python / PHP). If you have to use JavaScript, use Robin's answer.

This modified version of Robin's regex strips the quotes:

filename[^;\n=]*=(['\"])*(.*)(?(1)\1|)

filename        # match filename, followed by
[^;=\n]*        # anything but a ;, a = or a newline
=
(['"])*         # either single or double quote, put it in capturing group 1
(?:utf-8\'\')?  # removes the utf-8 part from the match
(.*)            # second capturing group, will contain the filename
(?(1)\1|)       # if clause: if first capturing group is not empty,
                # match it again (the quotes), else match nothing

https://regex101.com/r/hJ7tS6/28

The filename is in the second capturing group.
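JavaScript regular expressions have no conditional groups, so the pattern above cannot be used there directly. One possible workaround, sketched here with Robin's pattern and a hypothetical helper named `extractUnquotedFilename`, is to match first and strip the surrounding quotes in code afterwards.

```javascript
// Robin's pattern: group 1 is the raw value, group 2 is the opening quote (if any).
const QUOTED_OR_BARE_RE = /filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/;

// Hypothetical helper: removes the surrounding quotes only when the quoted branch matched.
function extractUnquotedFilename(header) {
  const match = QUOTED_OR_BARE_RE.exec(header);
  if (!match) return null;
  const value = match[1];
  // match[2] is undefined for unquoted values, so those are returned unchanged.
  return match[2] ? value.slice(1, -1) : value;
}

console.log(extractUnquotedFilename('attachment; filename="omáèka.jpg"'));
console.log(extractUnquotedFilename('attachment; filename=content.txt'));
```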

Here is my regular expression. It works in JavaScript.

filename\*?=((['"])[\s\S]*?\2|[^;\n]*)

I used this in my project:

filename[^;\n]*=(UTF-\d['"]*)?((['"]).*?[.]$\2|[^;\n]*)?

I have upgraded Robin's solution to do two more things:

  1. Capture the filename even if it has escaped double quotes.

  2. Capture the UTF-8'' part as a separate group.

This is an ECMAScript solution.

https://regex101.com/r/7Csdp4/3/
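The regex for this answer is only available behind the link, so the sketch below does not reproduce it. It uses a simpler pattern of my own (an assumption) to illustrate the second point: capturing the charset of a `filename*` value as its own group, then percent-decoding the rest. The helper name `extractExtendedFilename` is hypothetical.

```javascript
// Assumed pattern, not the linked one: group 1 = charset, group 2 = percent-encoded value.
// The optional language tag between the two single quotes is skipped.
const EXT_FILENAME_RE = /filename\*=([\w-]+)'[^']*'([^;\n]*)/i;

// Hypothetical helper: returns { charset, value } or null when there is no filename* parameter.
function extractExtendedFilename(header) {
  const match = EXT_FILENAME_RE.exec(header);
  if (!match) return null;
  const charset = match[1].toLowerCase();
  // decodeURIComponent always decodes as UTF-8 and throws URIError on malformed input,
  // so values in other charsets are returned undecoded here.
  const value = charset === 'utf-8' ? decodeURIComponent(match[2]) : match[2];
  return { charset, value };
}

console.log(extractExtendedFilename("attachment; filename=\"EURO rates\"; filename*=utf-8''%e2%82%ac%20rates"));
```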

I made a regex that finds these names using a named group, filename:

/(?<=filename(?:=|\*=(?:[\w\-]+'')))["']?(?<filename>[^"';\n]+)["']?/g

const regex = /(?<=filename(?:=|\*=(?:[\w\-]+'')))["']?(?<filename>[^"';\n]+)["']?/g

// One sample header value per line
const filenames = `
attachment; filename=content.txt
attachment; filename*=UTF-8''filename.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
attachment; filename="omáèka.jpg"
`

function logMatches() {
  const array = new Array
  filenames.split("\n").forEach(line => {
    if (!line.trim()) return // skip blank lines
    const matches = line.matchAll(regex)
    const groups = Array.from(matches).map(match => match?.groups?.filename)
    // A single match is stored as a string, several matches as an array
    array.push(groups.length === 1 ? groups[0] : groups)
  })
  console.log(array)
}

logMatches()
