Parsing invalid characters to XML

The idea of the application is simple: it is given a path and writes the path of each file under it into an XML file. The problem I am facing is that a file name can contain invalid characters, which makes the application stop working. Here is the code I use to write the file information into XML:

    // the collecting details method
    private void Get_Properties(string path)
    {
        // Load the XML File
        XmlDocument xml = new XmlDocument();
        xml.Load("Details.xml");

        foreach (string eachfile in Files)
        {
            try
            {
                FileInfo Info = new FileInfo(eachfile);

                toolStripStatusLabel1.Text = "Adding : " + Info.Name;

                // Create the <File> element that holds this file's details
                XmlElement ROOT = xml.CreateElement("File");

                if (checkBox1.Checked)
                {
                    XmlElement FileName = xml.CreateElement("FileName");
                    FileName.InnerText = Info.Name;
                    ROOT.AppendChild(FileName);
                }

                if (checkBox2.Checked)
                {
                    XmlElement FilePath = xml.CreateElement("FilePath");
                    FilePath.InnerText = Info.FullName;
                    ROOT.AppendChild(FilePath);
                }

                if (checkBox3.Checked)
                {
                    XmlElement ModificationDate = xml.CreateElement("ModificationDate");
                    // LastWriteTime is the modification time; LastAccessTime is the last read
                    string lastModification = Info.LastWriteTime.ToString();
                    ModificationDate.InnerText = lastModification;
                    ROOT.AppendChild(ModificationDate);
                }

                if (checkBox4.Checked)
                {
                    XmlElement CreationDate = xml.CreateElement("CreationDate");
                    string Creation = Info.CreationTime.ToString();
                    CreationDate.InnerText = Creation;
                    ROOT.AppendChild(CreationDate);
                }

                if (checkBox5.Checked)
                {
                    XmlElement Size = xml.CreateElement("Size");
                    Size.InnerText = Info.Length.ToString() + " Bytes";
                    ROOT.AppendChild(Size);
                }

                xml.DocumentElement.InsertAfter(ROOT, xml.DocumentElement.LastChild);

                // +1 step in progressbar
                toolStripProgressBar1.PerformStep();
                success_counter++;
                Thread.Sleep(10);
            }
            catch (Exception)
            {
                // Count the failure and keep going; the exception details are not recorded
                toolStripProgressBar1.PerformStep();

                error_counter++;
            }
        }

        toolStripStatusLabel1.Text = "Now Writing the Details File";

        xml.Save("Details.xml");

        toolStripStatusLabel1.Text = success_counter + " Items has been added and "+ error_counter +" Items has Failed , Total Files Processed ("+Files.Count+")";

        Files.Clear();
    }

Here is what the XML looks like after the details are generated:

<?xml version="1.0" encoding="utf-8"?>
 <Files>
  <File>
    <FileName>binkw32.dll</FileName>
    <FilePath>D:\ALL DLLS\binkw32.dll</FilePath>
    <ModificationDate>3/31/2012 5:13:56 AM</ModificationDate>
    <CreationDate>3/31/2012 5:13:56 AM</CreationDate>
    <Size>286208 Bytes</Size>
  </File>
 <File>

Examples of file names I would like to write to XML without issues:

BX]GC^O^_nI_C{jv_rbp&1b_H âo&psolher d) doိiniᖭ

icon_Áq偩侉₳㪏ံ ぞ鵃_䑋屡1]

MAnaFor줡

EDIT [PROBLEM SOLVED]

All I had to do was: (1) convert the file name to UTF-8 bytes, and (2) convert the UTF-8 bytes back to a string.

Here is the method:

byte[] FilestoBytes = System.Text.Encoding.UTF8.GetBytes(Info.Name);
string utf8 = System.Text.Encoding.UTF8.GetString(FilestoBytes);
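
A possible explanation of why this round trip helps, assuming the failing names contained unpaired surrogate code units: `Encoding.UTF8` substitutes U+FFFD for a code unit it cannot encode, so the string that comes back is well-formed UTF-16. Control characters below U+0020 pass through unchanged, so the round trip does not cover that case. A minimal sketch (the class and method names are illustrative, not from the original code):

```csharp
using System.Text;

public static class Utf8RoundTrip
{
    // Illustrative helper: encodes to UTF-8 and decodes again.
    // Unpaired surrogates come back as U+FFFD (the replacement character).
    public static string Normalize(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

In the loop above this would be used as `FileName.InnerText = Utf8RoundTrip.Normalize(Info.Name);`.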

It's not clear which of your characters you're having problems with. So long as you use the XML API (instead of trying to write the XML out directly yourself) you should be fine with any valid text (broken surrogate pairs would probably cause an issue), but what won't be valid is Unicode code points below space (U+0020), aside from tab, carriage return and line feed. They're simply not catered for in XML.
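
As a rough sketch of that rule, a file name could be filtered before it is assigned to `InnerText`. The helper below is hypothetical (the name `StripInvalidXmlChars` is not part of any library) and assumes that dropping the offending characters is acceptable; it removes control characters other than tab, carriage return and line feed, unpaired surrogates, and U+FFFE/U+FFFF:

```csharp
using System.Text;

public static class XmlTextSanitizer
{
    // Hypothetical helper: keeps only UTF-16 code units that form
    // characters allowed by the XML 1.0 Char production.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c))
            {
                // Keep a high surrogate only when a low surrogate follows it.
                if (i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
                {
                    sb.Append(c).Append(text[i + 1]);
                    i++; // the low surrogate has been consumed
                }
            }
            else if (char.IsLowSurrogate(c))
            {
                // Unpaired low surrogate: drop it.
            }
            else if (c == '\t' || c == '\n' || c == '\r' || (c >= 0x20 && c <= 0xFFFD))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, `FileName.InnerText = XmlTextSanitizer.StripInvalidXmlChars(Info.Name);` would keep names such as the ones in the question intact apart from any characters XML cannot represent.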

The XML is probably malformed. XML files cannot contain certain characters unless they are escaped. For example, this is not valid:

<dummy>You & Me</dummy>

Instead you should use:

<dummy>You &amp; Me</dummy>

The characters that must be escaped in XML text are &, < and > (as well as " or ' inside attribute values).

On Windows file systems, of these only & and ' can appear in a file name (<, > and " are not allowed in file names).

When saving the XML you can escape these characters. For example, & must be written as &amp;
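
Note that when the text is set through the XML API, as the question's code does with `InnerText`, this escaping is applied by the API itself. A small sketch to illustrate (the class and method names here are made up for the example):

```csharp
using System.Xml;

public static class EscapingDemo
{
    // Illustrative helper: builds a single element and returns its markup.
    // InnerText takes the raw text; the serializer escapes it on output.
    public static string BuildElement(string name, string text)
    {
        var doc = new XmlDocument();
        XmlElement element = doc.CreateElement(name);
        element.InnerText = text;
        return element.OuterXml;
    }
}
```

Calling `EscapingDemo.BuildElement("dummy", "You & Me")` returns `<dummy>You &amp; Me</dummy>`, so manual escaping is only needed when the XML is written out as plain strings.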
