Regex for UTF-8 valid filenames

I am trying to process the names of the files my users upload. I want to support all valid UTF-8 characters except those that might pose a problem for display on an HTML webpage, access over a CLI, or storage and retrieval on a filesystem.

Anyway, I came up with the following lenient function and I'm wondering if it's safe enough to use. I use prepared statements for all database queries and I always HTML-encode my output, but I would still like to know whether this is a well-thought-through approach.

// $filename = $_FILES['file']['name'];

$filename = 'Filename 123;".\'"."la\l[a]*(/.jpg
∮ E⋅da = Q,  n → ∞, ∑ f(i) = ∏ g(i), ∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),
  ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ, ⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (A ⇔ B),
  2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm
sfajs,-=[];\',./09μετράει
าวนั้นเป็นชน
Καλημέρα κόσμε, コンニチハ
()_+{}|":?><';


// Replace symbols (S), punctuation (P), and "other" characters (C), which include control characters like \n or [BEL]
$filename = preg_replace('~[\p{S}\p{P}\p{C}]+~u', ' ', $filename);

Is this approach safe for me, and suitable for my users?

Update

To clarify, I do not use the filename for the name of the file on the filesystem. I generate a unique hash and use that. I just need to save the original name for the users' benefit, since that is how they recognize their files. A SHA1 hash or UUID doesn't mean a thing to them.
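For illustration, the split described in this update could look roughly like the sketch below. The helper name `buildUploadRecord` and the array keys are made up for the example, and `random_bytes` stands in for whatever hash or UUID scheme is actually used; only the regex comes from the question.

```php
<?php
// Hypothetical helper: returns the on-disk name and the display name separately.
function buildUploadRecord(string $originalName): array
{
    // Random 128-bit identifier as hex; never derived from user input.
    // (Assumption: stands in for the "unique hash" mentioned above.)
    $storageName = bin2hex(random_bytes(16));

    // Same replacement as in the question. preg_replace returns null on
    // failure (for example invalid UTF-8 with the /u modifier), hence "?? ''".
    $displayName = preg_replace('~[\p{S}\p{P}\p{C}]+~u', ' ', $originalName) ?? '';

    // Collapse the runs of whitespace left behind by the replacement.
    $displayName = trim(preg_replace('~\s+~u', ' ', $displayName) ?? '');

    return ['storage_name' => $storageName, 'display_name' => $displayName];
}

$record = buildUploadRecord('report (final).jpg');
echo htmlspecialchars($record['display_name'], ENT_QUOTES, 'UTF-8'), "\n";
```

Note that the dot before the extension counts as punctuation, so with this pattern `report (final).jpg` is displayed as `report final jpg`.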

The very first thing you need to do is check that your input is UTF-8.

mb_internal_encoding and mb_check_encoding are your friends.
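A minimal sketch of that check, assuming the mbstring extension is loaded. The wrapper `isValidUtf8Name` and the sample names are only for illustration:

```php
<?php
mb_internal_encoding('UTF-8');

// Hypothetical wrapper: true only if $name is well-formed UTF-8.
function isValidUtf8Name(string $name): bool
{
    return mb_check_encoding($name, 'UTF-8');
}

$good = 'Filename 123.jpg';
$bad  = "caf\xE9.jpg"; // ISO-8859-1 "é": a lone 0xE9 byte is not valid UTF-8

var_dump(isValidUtf8Name($good)); // bool(true)
var_dump(isValidUtf8Name($bad));  // bool(false)
```

The check matters for the regex too: with the `u` modifier, `preg_replace` returns `null` when the subject is not valid UTF-8, so an unchecked name could silently turn into an empty value.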

You are using a blacklist, whereas good security practice is to use a whitelist of allowed input.
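As a rough idea of what a whitelist version might look like: the sketch below keeps letters, combining marks, decimal digits, spaces, dots, underscores and hyphens, and replaces everything else. That particular set of allowed characters is an assumption to adjust, and `allowlistName` is a made-up name.

```php
<?php
// Hypothetical helper: keep only an explicit list of allowed characters.
function allowlistName(string $name): string
{
    // Allowed: letters (L), combining marks (M), decimal digits (Nd),
    // space, dot, underscore, hyphen. Anything else becomes a space.
    $clean = preg_replace('~[^\p{L}\p{M}\p{Nd} ._-]+~u', ' ', $name);
    if ($clean === null) {
        return ''; // not valid UTF-8 (or another PCRE error)
    }

    // Collapse the double spaces the replacement can leave behind.
    return trim(preg_replace('~ {2,}~', ' ', $clean));
}

echo allowlistName('Filename 123;".jpg'), "\n";
echo allowlistName('コンニチハ.jpg'), "\n";
```

Unlike the pattern in the question, this keeps the dot before the extension, and anything not explicitly listed is dropped by default.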

Edit after the clarification:

You should be safe. Remember to filter Lm and No as well if you don't want to summon Zalgo.
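A sketch of the extended pattern, with `\p{Lm}` (modifier letters) and `\p{No}` (other numbers, such as superscript and subscript digits) added to the classes from the question. The second step is an extra assumption, not part of the advice above: the stacked marks typical of Zalgo text are combining marks (category M), and stripping those outright would damage scripts like Thai, so the example only caps long runs of them. The cap of two and the name `filterExtended` are arbitrary.

```php
<?php
// Hypothetical helper extending the question's pattern.
function filterExtended(string $name): string
{
    // Question's classes plus Lm and No.
    $name = preg_replace('~[\p{S}\p{P}\p{C}\p{Lm}\p{No}]+~u', ' ', $name) ?? '';

    // Assumption: allow at most two consecutive combining marks and drop
    // the rest of the run, instead of removing combining marks entirely.
    $name = preg_replace('~(\p{M}{2})\p{M}+~u', '$1', $name) ?? '';

    return trim($name);
}

// U+00B2 SUPERSCRIPT TWO is in category No, so it is replaced by a space.
echo filterExtended("x\u{00B2}y"), "\n";

// Five stacked U+0301 COMBINING ACUTE ACCENT marks are cut down to two.
echo filterExtended('a' . str_repeat("\u{0301}", 5) . 'b'), "\n";
```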
