The Content-Disposition header contains a filename that can be easily extracted, but sometimes the filename is wrapped in double quotes, sometimes it has no quotes, and there are probably other variants too. Can someone write a regex that works in all cases?
Content-Disposition: attachment; filename=content.txt
Here are some of the possible target strings:
attachment; filename=content.txt
attachment; filename*=UTF-8''filename.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
attachment; filename="omáèka.jpg"
and other combinations may also occur.
You could try something in this spirit:
filename[^;=\n]*=((['"]).*?\2|[^;\n]*)
filename # match filename, followed by
[^;=\n]* # anything but a ;, a = or a newline
=
( # first capturing group
(['"]) # either single or double quote, put it in capturing group 2
.*? # anything up until the first...
\2 # matching quote (single if we found single, double if we found double)
| # OR
[^;\n]* # anything but a ; or a newline
)
Your filename is in the first capturing group (including the surrounding quotes, if it was quoted): http://regex101.com/r/hJ7tS6
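A minimal JavaScript sketch of using this regex, assuming a runtime with String.prototype.matchAll (ES2020). The helper extractFilenames is hypothetical; because group 1 keeps the surrounding quotes, the sketch drops them whenever group 2 (the opening quote) matched.

```javascript
// Robin's regex, with the global flag added so every filename parameter is found.
const FILENAME_RE = /filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/g;

// Hypothetical helper: returns every filename value found in a header line.
function extractFilenames(header) {
  const results = [];
  for (const match of header.matchAll(FILENAME_RE)) {
    let value = match[1];
    // Group 2 is only set when the value was quoted; group 1 then starts and ends with that quote.
    if (match[2]) {
      value = value.slice(1, -1);
    }
    results.push(value);
  }
  return results;
}

console.log(extractFilenames("attachment; filename=\"EURO rates\"; filename*=utf-8''%e2%82%ac%20rates"));
```

For the EURO rates header this yields EURO rates and utf-8''%e2%82%ac%20rates: the charset prefix and the percent-encoding of the filename* value are left for the caller to handle.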
Slightly modified to match my use case (strips all quotes and UTF tags):
filename\*?=['"]?(?:UTF-\d['"]*)?([^;\r\n"']*)['"]?;?
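A short JavaScript sketch of this variant, where group 1 already comes back without quotes or charset prefix. Two assumptions are added here: the g flag, to collect every parameter, and the i flag, because without it the (?:UTF-\d['"]*)? part does not skip a lowercase utf-8'' prefix such as the one in the EURO rates example. The helper name extractFilenames is hypothetical.

```javascript
// The regex from this answer; the g and i flags are additions of this sketch.
const FILENAME_RE = /filename\*?=['"]?(?:UTF-\d['"]*)?([^;\r\n"']*)['"]?;?/gi;

// Hypothetical helper: group 1 already has the quotes and the UTF-8 prefix removed.
function extractFilenames(header) {
  return Array.from(header.matchAll(FILENAME_RE), (match) => match[1]);
}

console.log(extractFilenames("attachment; filename*=UTF-8''filename.txt"));
```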
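This version also skips a charset prefix such as UTF-8'' and tolerates quotes escaped with a backslash; the filename is in group 2 when it was quoted and in group 3 otherwise:
/filename[^;=\n]*=(?:(\\?['"])(.*?)\1|(?:[^\s]+'.*?')?([^;\n]*))/i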
https://regex101.com/r/hJ7tS6/51
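A minimal JavaScript sketch of reading the result of this regex: take group 2 if the value was quoted, otherwise group 3. Without the g flag only the first filename parameter is returned. The helper firstFilename is hypothetical.

```javascript
// The regex from this answer: quoted values land in group 2, unquoted ones in group 3.
const FILENAME_RE = /filename[^;=\n]*=(?:(\\?['"])(.*?)\1|(?:[^\s]+'.*?')?([^;\n]*))/i;

// Hypothetical helper: returns the first filename value, or null if there is none.
function firstFilename(header) {
  const match = FILENAME_RE.exec(header);
  if (!match) return null;
  // Group 2 is undefined when the quoted alternative did not match.
  return match[2] !== undefined ? match[2] : match[3];
}

console.log(firstFilename("attachment; filename*=UTF-8''filename.txt"));
```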
Edit: you can also use this parser: https://github.com/Rob--W/open-in-browser/blob/master/extension/content-disposition.js
Disclaimer: the following answer relies on conditional groups, so it only works with engines that support them, such as PCRE (e.g. PHP) or Python's re module. If you have to use JavaScript, use Robin's answer.
This modified version of Robin's regex strips the quotes:
filename[^;\n=]*=(['"])*(?:utf-8'')?(.*)(?(1)\1|)
filename # match filename, followed by
[^;=\n]* # anything but a ;, a = or a newline
=
(['"])* # either single or double quote, put it in capturing group 1
(?:utf-8\'\')? # removes the utf-8 part from the match
(.*) # second capturing group, will contain the filename
(?(1)\1|) # if clause: if first capturing group is not empty,
# match it again (the quotes), else match nothing
https://regex101.com/r/hJ7tS6/28
The filename is in the second capturing group.
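JavaScript has no (?(1)...) conditional groups. One rough way to emulate this pattern, sketched here as an assumption rather than as the answer's method, is to split it into two alternatives: a quoted value (group 2, closed by the same quote) or an unquoted one (group 3). The i flag is an addition so that an uppercase UTF-8'' prefix is stripped too, and firstFilename is a hypothetical helper. Like the original, (.*) is greedy, so a quoted value ends at the last matching quote on the line.

```javascript
// Two alternatives instead of a conditional: quoted value (groups 1 and 2) or unquoted value (group 3).
const FILENAME_RE = /filename[^;\n=]*=(?:(['"])(?:utf-8'')?(.*)\1|(?:utf-8'')?(.*))/i;

// Hypothetical helper: returns the first filename value, or null if there is none.
function firstFilename(header) {
  const match = FILENAME_RE.exec(header);
  if (!match) return null;
  // Group 1 (the opening quote) is only set when the quoted alternative matched.
  return match[1] ? match[2] : match[3];
}

console.log(firstFilename('attachment; filename="omáèka.jpg"'));
```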
Here is my regular expression. It works in JavaScript:
filename\*?=((['"])[\s\S]*?\2|[^;\n]*)
I used this in my project.
filename[^;\n]*=(UTF-\d['"]*)?((['"]).*?[.]$\2|[^;\n]*)?
I have upgraded Robin's solution to do two more things:
This is an ECMAScript solution.
I made a regex that captures these names in a named group called filename:
/(?<=filename(?:=|\*=(?:[\w\-]+'')))["']?(?<filename>[^"';\n]+)["']?/g
const regex = /(?<=filename(?:=|\*=(?:[\w\-]+'')))["']?(?<filename>[^"';\n]+)["']?/g

const filenames = `
attachment; filename=content.txt
attachment; filename*=UTF-8''filename.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
attachment; filename="omáèka.jpg"
`

function logMatches(){
  const array = new Array
  filenames.split("\n").forEach(line => {
    if(!line.trim()) return
    const matches = line.matchAll(regex)
    const groups = Array.from(matches).map(match => match?.groups?.filename)
    array.push(groups.length === 1 ? groups[0] : groups)
  })
  console.log(array)
}

logMatches()
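The filename* values extracted above are still percent-encoded (%e2%82%ac%20rates). A minimal sketch, assuming the extended value follows the charset'language'value form of RFC 5987 and that only UTF-8 needs decoding: it prefers filename* (as RFC 6266 recommends when both parameters are present), decodes it with decodeURIComponent, and otherwise falls back to the plain filename parameter. The helper pickFilename is hypothetical.

```javascript
// Hypothetical helper: prefers a UTF-8 filename* value, falls back to filename.
function pickFilename(header) {
  // charset'language'value, e.g. utf-8''%e2%82%ac%20rates
  const extended = /filename\*=([\w-]+)'[^']*'([^;\s]+)/i.exec(header);
  if (extended && extended[1].toLowerCase() === "utf-8") {
    try {
      return decodeURIComponent(extended[2]);
    } catch {
      // Malformed percent-encoding: fall back to the plain parameter below.
    }
  }
  // Plain filename= parameter, quoted (group 1) or unquoted (group 2).
  const plain = /filename=(?:"([^"]*)"|([^;]*))/i.exec(header);
  if (!plain) return null;
  return plain[1] !== undefined ? plain[1] : plain[2].trim();
}

console.log(pickFilename("attachment; filename=\"EURO rates\"; filename*=utf-8''%e2%82%ac%20rates"));
```

Other charsets (for example iso-8859-1) are deliberately not decoded in this sketch; such headers fall back to the plain filename parameter, or null if there is none.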