
Remove non-ASCII characters from an XML file in C#

I am trying to write a program that opens an XML file containing non-ASCII characters, replaces those characters with spaces, and then saves and closes the file.

That's basically it: open the file, remove all the non-ASCII characters, and save/close the file.

Here is my code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.IO;
using System.Text.RegularExpressions;

namespace RemoveSpecial
{
    class Program
    {
        static void Main(string[] args)
        {
            string pth_input = string.Empty;
            string pth_output = string.Empty;
            for (int i = 1; i < args.Length; i++)
            {
                //input one
                string p_input = args[0];
                pth_input = p_input;
                pth_input = pth_input.Replace(@"\", @"\\");


                //output
                string p_output = args[2];
                pth_output = p_output;
                pth_output = pth_output.Replace(@"\", @"\\");
            }

            //s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);



            string lx;

            using (StreamReader sr = new StreamReader(pth_input))
            {
                using (StreamWriter x = new StreamWriter(pth_output))
                {
                    while ((lx = sr.ReadLine()) != null)
                    {
                        string text = sr.ReadToEnd();

                        Regex.Replace(text, @"[^\u0000-\u007F]+", "", RegexOptions.Compiled);
                        x.Write(text);
                    } sr.Close();

                }
            }


        }


    }
}

Thanks in advance, guys.

According to the documentation, the first string is an input parameter. It is not passed by reference, so the call could not change it anyway (and .NET strings are immutable). The result of the replacement is in the return value, like so:

var result = Regex.Replace(text, @"[^\u0000-\u007F]+", "", RegexOptions.Compiled);
x.Write(result);
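To see why the original call had no visible effect, here is a small self-contained sketch. It uses C# 9+ top-level statements, and the sample string is made up for illustration. The input string stays untouched, and only the return value carries the cleaned text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input containing two non-ASCII characters (é and ï).
string text = "caf\u00e9 na\u00efve";

// Discarding the return value, as in the question: 'text' is not modified.
Regex.Replace(text, @"[^\u0000-\u007F]+", "");
Console.WriteLine(text == "caf\u00e9 na\u00efve"); // True

// Capturing the return value is what actually yields the cleaned string.
string result = Regex.Replace(text, @"[^\u0000-\u007F]+", "");
Console.WriteLine(result); // caf nave
```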

Note that RegexOptions.Compiled might decrease performance here. Compiling the pattern to IL has an up-front cost, which pays off only if you reuse the same regular expression instance on many strings. You can still get that benefit by creating the Regex instance once, outside of the loop:

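// Create the Regex once; the compiled instance is then reused for every line.
var regex = new Regex(@"[^\u0000-\u007F]+", RegexOptions.Compiled);

using (var sr = new StreamReader(pth_input))
{
    using (var x = new StreamWriter(pth_output))
    {
        string lx;
        // Clean each line as it is read. Calling ReadToEnd() after ReadLine()
        // would silently drop the first line (usually the <?xml ...?> declaration).
        while ((lx = sr.ReadLine()) != null)
        {
            var result = regex.Replace(lx, String.Empty);
            x.WriteLine(result);
        }
    }
}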
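Putting the pieces together, here is a minimal sketch of the whole program as the question describes it. It is not the answerer's code, and it rests on a few assumptions:

- The input path is args[0] and the output path is args[1]. The question's loop reads args[2], which throws IndexOutOfRangeException when only two arguments are passed.
- The file fits in memory.
- Each non-ASCII character should become one space, as the question asks, rather than be deleted.

The helper name ReplaceNonAscii is hypothetical.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Matches ONE character outside the 7-bit ASCII range (U+0000 to U+007F).
// No '+' quantifier, so each non-ASCII character is replaced by its own space.
var nonAscii = new Regex(@"[^\u0000-\u007F]");

// Hypothetical helper: returns a new string; the input is never modified.
string ReplaceNonAscii(string input) => nonAscii.Replace(input, " ");

// Assumed usage: RemoveSpecial <input.xml> <output.xml>
// Paths from the command line are not C# string literals, so backslashes
// need no escaping (the question's Replace(@"\", @"\\") is unnecessary).
if (args.Length >= 2)
{
    // Reads the entire file, including its first line, before writing anything.
    string text = File.ReadAllText(args[0]);
    File.WriteAllText(args[1], ReplaceNonAscii(text));
}
else
{
    Console.Error.WriteLine("usage: RemoveSpecial <input.xml> <output.xml>");
}
```

File.ReadAllText finishes reading before File.WriteAllText opens the output, so passing the same path twice overwrites the file in place. Characters outside the Basic Multilingual Plane, such as emoji, are stored as two UTF-16 code units in a .NET string, so each one becomes two spaces.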

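Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.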

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM