Using a regular expression to remove all characters except Chinese characters?

Question

I have a string containing a sentence written in Chinese.

It contains Chinese characters plus filler such as spaces, commas, and exclamation marks, all encoded in UTF-8.

With a Latin-1 string, I could use preg_replace with [a-zA-Z] to clean it and remove the filler.

How can I keep only the Chinese "alphabet" characters in a Chinese string while removing all the filler?

Answer 1

According to this document, here are the Unicode ranges of Chinese characters:

Table 12-2. Blocks Containing Han Ideographs

Block                                     Range          Comment
CJK Unified Ideographs                    4E00–9FFF      Common
CJK Unified Ideographs Extension A        3400–4DBF      Rare
CJK Unified Ideographs Extension B        20000–2A6DF    Rare, historic
CJK Unified Ideographs Extension C        2A700–2B73F    Rare, historic
CJK Unified Ideographs Extension D        2B740–2B81F    Uncommon, some in current use
CJK Compatibility Ideographs              F900–FAFF      Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement   2F800–2FA1F    Unifiable variants

You could use it like this:

preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', '', $string);

or

preg_replace('/\P{Han}+/u', '', $string);

where \P is the negation of \p.
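As a rough illustration of the \P{Han} approach, the sketch below wraps the call in a small helper. The function name keepHan and the sample string are made up for this example; the null check is there because preg_replace returns null when it fails, for instance on invalid UTF-8 input.

```php
<?php
// Hypothetical helper (name is an assumption): keep only Han-script characters.
function keepHan(string $text): string
{
    // \P{Han} matches any code point that is NOT in the Han script;
    // the /u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/\P{Han}+/u', '', $text);

    // preg_replace returns null on error (e.g. malformed UTF-8).
    return $result ?? '';
}

// Illustrative input: Han characters mixed with Latin text and punctuation.
echo keepHan('你好, world! 这是 test。'), "\n"; // prints 你好这是
```

Note that CJK punctuation such as 。 belongs to the Common script rather than Han, so it is removed along with the ASCII filler.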

See here for all the Unicode scripts.
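If explicit ranges are preferred over the script property, the blocks from Table 12-2 can be combined into one negated character class. This is only a sketch: the helper name keepCjkBlocks is hypothetical, and the list covers just the blocks in the table above, so it may miss ideographs in blocks the table does not list.

```php
<?php
// Hypothetical helper (name is an assumption): keep only code points that
// fall inside the blocks listed in Table 12-2.
function keepCjkBlocks(string $text): string
{
    // PCRE writes code points as \x{HHHH}; with /u, values above FFFF are allowed.
    $pattern = '/[^'
        . '\x{4E00}-\x{9FFF}'    // CJK Unified Ideographs
        . '\x{3400}-\x{4DBF}'    // Extension A
        . '\x{20000}-\x{2A6DF}'  // Extension B
        . '\x{2A700}-\x{2B73F}'  // Extension C
        . '\x{2B740}-\x{2B81F}'  // Extension D
        . '\x{F900}-\x{FAFF}'    // CJK Compatibility Ideographs
        . '\x{2F800}-\x{2FA1F}'  // CJK Compatibility Ideographs Supplement
        . ']+/u';

    // preg_replace returns null on error (e.g. malformed UTF-8).
    return preg_replace($pattern, '', $text) ?? '';
}

// "\u{20000}" is the first code point of Extension B (PHP 7+ escape syntax).
echo keepCjkBlocks("漢字, \u{20000} abc"), "\n";
```

Here the Extension B character survives alongside 漢字, while the comma, spaces, and Latin letters are stripped.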

Answer 2

Hope this is useful to you.

str1 = Regex.Replace(str1, @"[^\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD]", "");

Answer 1: 7, accepted, 2012-01-24 15:35:03

Answer 2: 0, 2021-11-09 13:44:20
