How to remove backslashes in a string

I get a string from a website called "willekeurigwoord.nl", which means "random word". When I get the string from the site with HtmlAgilityPack, it is formatted like "\n\t\t\tkegelvrucht\r\n \t\n\t\t".

So the word that I get is "kegelvrucht", but before and after the word there are backslashes. When I try to remove them, they are ignored, even when I put "@" in front of the string or use double backslashes ("\\").

So my question is, how do I remove the \ in my string?

I have already tried everything in the commented-out lines.

    private string RandomWordOnline() //Get the word online
    {
        //get string from the HTML page with HtmlAgilityPack
        var webGet = new HtmlWeb();
        var doc = webGet.Load("http://www.willekeurigwoord.nl/");
        String word = doc.DocumentNode.SelectSingleNode("//h1").InnerText;

        //word = word.Replace(@"\", "");            
        //word = @word.Trim(new char[] {' ','\\'});
        //word = word.Substring(8, word.Length - 13);
        //word = word.Substring(0, 13);

        //trying to remove backslash, does not work
        for (int i = 0; i < word.Length; i++)
        {

            char chrWord = Convert.ToChar(word.Substring(i, 1));
            char backslash = Convert.ToChar(@"\");
            if (chrWord == backslash)
            {
                word = word.Remove(i, 1);
            }

        }

        return word;           
    }

Those backslashes are not in the string; they are just how tabs, carriage returns and line feeds are displayed as escape sequences. For example, a string which Visual Studio shows as \t\t\n\n is only 4 characters long, not 8.
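
To check this for yourself, here is a minimal sketch. The class name EscapeDemo is made up for illustration, and the sample value is copied from the question rather than fetched from the site. It prints the real length of an escaped string and looks for a literal backslash:

```csharp
using System;

public static class EscapeDemo
{
    // Sample value copied from the question. In C# source code,
    // "\n", "\t" and "\r" each denote a single character.
    public const string Raw = "\n\t\t\tkegelvrucht\r\n \t\n\t\t";

    public static void Main()
    {
        Console.WriteLine("\t\t\n\n".Length);   // 4
        Console.WriteLine(Raw.Contains("\\"));  // False: there is no backslash character in the string
        Console.WriteLine(Raw.Trim());          // kegelvrucht
    }
}
```

Because the string contains no backslash character at all, Replace(@"\", "") and the character-by-character loop in the question have nothing to match.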

You can get rid of them just like this:

    var webGet = new HtmlWeb();
    var doc = webGet.Load("http://www.willekeurigwoord.nl/");
    String word = doc.DocumentNode.SelectSingleNode("//h1").InnerText;
    string fixedWord = word.Trim();

Trim removes all whitespace surrounding your text, including tabs and newlines. If you only want to remove specific characters, or need to remove them from the middle of the string, you need to do something like this:

    string fixedWord = word.Replace("\t", "").Replace("\n", "").Replace("\r", "").Trim();
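
The difference between the two approaches only shows up when whitespace sits inside the text. The following is a rough sketch on a hypothetical input with a tab in the middle of the word; the class and method names are invented for illustration:

```csharp
using System;

public static class WhitespaceCleanup
{
    // Strips only leading and trailing whitespace.
    public static string TrimEnds(string s) => s.Trim();

    // Removes every tab, line feed and carriage return anywhere in the
    // string, then trims any remaining whitespace at the ends.
    public static string RemoveControlWhitespace(string s) =>
        s.Replace("\t", "").Replace("\n", "").Replace("\r", "").Trim();

    public static void Main()
    {
        // Hypothetical input: note the tab between "kegel" and "vrucht".
        string sample = "\n\tkegel\tvrucht\r\n";

        Console.WriteLine(TrimEnds(sample) == "kegel\tvrucht");              // True: the inner tab survives
        Console.WriteLine(RemoveControlWhitespace(sample) == "kegelvrucht"); // True
    }
}
```

For the word returned by the site in the question, both give the same result, so Trim alone is enough there.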

Just call Trim() on your string:

    string cleaned = word.Trim();

It will remove all leading and trailing whitespace, which includes all of the characters you want removed.

Probably a C# String expert will know the answer you are looking for. But this is a great example of where post-C languages make things harder. Your \ is probably being taken as an escape character by the compiler, so the code never sees it at run time.

By the way, "word" is a poor choice for an identifier because in some languages and APIs it names a type (16 bits wide or something similar).

In C, you just go through the string character by character and copy each one into a new string unless it is '\\'. (I didn't test/debug this, and you need to add bounds checking unless you know the sizes of all the strings.)

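    i = j = 0;

    while (strIn[i] != '\0') {      /* stop at the terminating NUL */
        if (strIn[i] != '\\') {     /* copy everything except backslashes */
            strOut[j++] = strIn[i];
        }
        i++;
    }
    strOut[j] = '\0';               /* terminate the output string */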

(If that sounds like extra work, know that at run time, your C# is doing that anyway, and hiding the required interaction with the memory manager from you so you don't know why your program runs slowly.)
