I am trying to process the names of the files my users upload. I want to support all valid UTF-8 characters except those that might pose a problem for display on an HTML page, access from a CLI, or storage and retrieval on a filesystem.
I came up with the following lenient function, and I'm wondering whether it is safe enough to use. I use prepared statements for all database queries and I always HTML-encode my output, but I would still like to know that this is a well-thought-out approach.
// $filename = $_FILES['file']['name'];
$filename = 'Filename 123;".\'"."la\l[a]*(/.jpg
∮ E⋅da = Q, n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),
ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B),
2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm
sfajs,-=[];\',./09μετράει
าวนั้นเป็นชน
Καλημέρα κόσμε, コンニチハ
()_+{}|":?><';
// Replace symbols, punctuation, and ASCII control characters like \n or [BEL]
$filename = preg_replace('~[\p{S}\p{P}\p{C}]+~u', ' ', $filename);
Is this approach safe for me, and suitable for my users?
To clarify, I do not use the filename as the name of the file on the filesystem. I generate a unique hash and use that. I just need to save the original name for the user's benefit, since that is how they recognize their files. A SHA1 hash or UUID doesn't mean a thing to them.
The very first thing you need to do is to check your input is UTF-8.
mb_internal_encoding and mb_check_encoding are your friends.
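As a rough sketch of that check, something like the following could reject names that are not valid UTF-8 before any regex runs. The function name `validUtf8OrNull` is made up for illustration, and the code assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not from the original post): returns the name
// unchanged when it is valid UTF-8, or null when it is not.
// Assumes the mbstring extension is available.
function validUtf8OrNull(string $name): ?string
{
    return mb_check_encoding($name, 'UTF-8') ? $name : null;
}
```

A caller would then treat a `null` result as a rejected upload name, for example `validUtf8OrNull($_FILES['file']['name'])`. Checking first also matters because `preg_replace` with the `u` modifier returns `null` on malformed UTF-8 input.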
You are using a blacklist, when it's good security practice to use a whitelist of allowed input.
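A whitelist version of the question's regex might look like the sketch below. The set of allowed characters (letters, combining marks, numbers, space, dot, underscore, hyphen) is my assumption, not something the question specifies, and `whitelistFilename` is a hypothetical name. Marks (`\p{M}`) are kept so that scripts such as Thai retain their vowel signs.

```php
<?php
// Hypothetical helper: replace every run of characters that is NOT on the
// allow-list with a single space. The allow-list itself is an assumption:
// letters (\p{L}), combining marks (\p{M}), numbers (\p{N}), space, dot,
// underscore and hyphen.
// Returns null if preg_replace fails, e.g. on malformed UTF-8 input.
function whitelistFilename(string $name): ?string
{
    return preg_replace('~[^\p{L}\p{M}\p{N} ._-]+~u', ' ', $name);
}
```

For instance, `whitelistFilename('a;b"c.jpg')` would replace the `;` and the `"` with spaces while leaving the letters and the `.jpg` extension alone.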
Edit after the clarification:
You should be safe. Remember to filter Lm and No as well if you don't want to summon Zalgo.
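Extending the question's pattern with those two categories could look roughly like this; `stripExtended` is a hypothetical name. Note that this sketch only adds `\p{Lm}` (modifier letters) and `\p{No}` (other numbers, such as superscripts). Stacked combining diacritics fall under `\p{M}`, which this pattern does not touch, so whether to filter or limit those is a separate decision.

```php
<?php
// Hypothetical helper: the question's blacklist plus modifier letters
// (\p{Lm}) and "other" numbers (\p{No}). Each run of matching characters
// is replaced with a single space.
// Returns null if preg_replace fails, e.g. on malformed UTF-8 input.
function stripExtended(string $name): ?string
{
    return preg_replace('~[\p{S}\p{P}\p{C}\p{Lm}\p{No}]+~u', ' ', $name);
}
```

With this, a superscript two (U+00B2, category No) or a modifier letter small h (U+02B0, category Lm) inside a name would be replaced by a space, just like punctuation and symbols.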