
Check if a string ends with another string or a part of another string

I'd like to know whether there is a solution to the problem described in the title.

Example:

In my project I have to parse a lot of messages. These messages contain formatting characters like "\n" or "\r". Each message always ends with the author's name as a signature.

Now I want to remove the signature from each message. The problem is that the end of a message could look like:

  • \r\n\rDaniel Walters\n\r\n
  • \n\r\n\r\n\rDaniel

or something else.

The problem is that I don't know how to identify these varying endings. I tried removing only the trailing "\n\r\n" sequences by calling string.EndsWith() in a loop, but that strips only the final newlines and leaves "\r\n\rDaniel Walters" behind. Then I tried removing the author's name (which I parse in an earlier step), but that doesn't work either: sometimes the parsed author is "Daniel Walters" while the signature is only "Daniel".

Any ideas how to solve this? Is there an easier, smarter solution than looping through the string?

You can build a regular expression that matches the first name, an optional last name, and the whitespace before and after them at the end of the string, and replace that match with an empty string.

Example:

using System.Text.RegularExpressions;

string message = "So long and thanks for all the fish  \t\t\r Arthur \t Dent  \r\r\n  ";
string firstName = "Arthur";
string lastName = "Dent";

string pattern = "\\s+" + Regex.Escape(firstName) + "(\\s+" + Regex.Escape(lastName) + ")?\\s*$";

message = Regex.Replace(message, pattern, String.Empty);

(Yes, I know it was really the dolphins saying that.)
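To cover the case from the question where the parsed author is "Daniel Walters" but the signature is only "Daniel", the same pattern can be built from the full author string. This is a minimal sketch: `StripSignature` is a hypothetical helper, and it assumes the first word of the author string is the first name and everything after it is the optional part. Like the pattern above, it requires at least one whitespace character before the name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper. Assumption: the first word of `author` is the first name
// and the remaining words (if any) form an optional last name in the signature.
static string StripSignature(string message, string author)
{
    string[] names = author.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    if (names.Length == 0)
        return message; // no name to look for

    string first = Regex.Escape(names[0]);
    // The remaining name parts may be separated by any whitespace in the message.
    string rest = string.Join(@"\s+", names.Skip(1).Select(n => Regex.Escape(n)));

    // Whitespace, first name, optional rest of the name, trailing whitespace, end of string.
    string pattern = rest.Length == 0
        ? @"\s+" + first + @"\s*$"
        : @"\s+" + first + @"(\s+" + rest + @")?\s*$";

    return Regex.Replace(message, pattern, string.Empty);
}

Console.WriteLine(StripSignature("Hello\r\n\rDaniel Walters\n\r\n", "Daniel Walters")); // Hello
Console.WriteLine(StripSignature("Hello\n\r\n\r\n\rDaniel", "Daniel Walters"));         // Hello
```

Both endings from the question are removed with the same call, because the last name is optional in the generated pattern.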

You'll have to determine what "looks like" a signature. Are there specific criteria that always apply?

  • Always preceded by at least 3 newline characters (\r or \n)
  • Starts with a capital letter
  • Has no following text

A regex like this might work for those criteria:

/[\r\n]{3,}[A-Z][\w ]+[\r\n]*$/

Adjust according to your needs.

Edited to add: This should match the last "paragraph" of a document.

/([\r\n]+[\w ]+[\r\n]*)$/
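For comparison, the criteria listed above could be expressed in .NET syntax roughly like this. It is a sketch: the three-newline threshold and the capital-letter rule are only the assumed criteria from the list, and `RemoveSignature` is a hypothetical helper name. Anchoring with `\z` (absolute end of input) makes the "has no following text" rule explicit.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed criteria: at least three newline characters, then a line that starts
// with a capital letter and contains only word characters and spaces, then
// optional trailing newlines, then the absolute end of the input (\z).
var signature = new Regex(@"[\r\n]{3,}[A-Z][\w ]+[\r\n]*\z");

// Hypothetical helper: deletes the matched signature, if any.
string RemoveSignature(string message) => signature.Replace(message, string.Empty);

Console.WriteLine(RemoveSignature("Hello\r\n\rDaniel Walters\n\r\n")); // Hello
Console.WriteLine(RemoveSignature("Hello\n\r\n\r\n\rDaniel"));          // Hello
// Only two newline characters before the name: no match, message unchanged.
Console.WriteLine(RemoveSignature("Hello\r\nDaniel") == "Hello\r\nDaniel"); // True
```

Unlike the name-based regex, this version does not need the parsed author at all, at the cost of relying on the layout rules holding for every message.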

A different approach is to split the message at the newline characters, dropping the empty entries, and then reassemble it without the last line, assuming the signature is always on the last line.

using System;
using System.Linq;

string removeLastLine = "Text on the first line\r\nText on the second line\rText on the third line\r\n\rDaniel Walters\n\r\n";
string[] lines = removeLastLine.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
// Keep every line except the last one (the signature)
lines = lines.Take(lines.Length - 1).ToArray();
string result = string.Join(Environment.NewLine, lines);
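The split-and-join version normalizes every line break to `Environment.NewLine` and drops blank lines. If the original formatting should survive, one option is to cut the string at its last line break instead. A minimal sketch under the same assumption (the signature is the last non-empty line); `RemoveLastLine` is a hypothetical name.

```csharp
using System;

// Hypothetical helper: removes the last non-empty line while keeping the
// original line breaks of everything before it.
static string RemoveLastLine(string message)
{
    char[] newlines = { '\r', '\n' };

    // Drop trailing line breaks so the signature becomes the final line.
    string trimmed = message.TrimEnd(newlines);

    // Find the line break just before the signature.
    int lastBreak = trimmed.LastIndexOfAny(newlines);
    if (lastBreak < 0)
        return string.Empty; // single-line message: the whole text is the "last line"

    // Cut there and drop the line breaks that separated body and signature.
    return trimmed.Substring(0, lastBreak).TrimEnd(newlines);
}

string sample = "Text on the first line\r\nText on the second line\rText on the third line\r\n\rDaniel Walters\n\r\n";
Console.WriteLine(RemoveLastLine(sample));
```

The mixed "\r\n" and "\r" breaks inside the body are returned exactly as they appeared in the input.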

You could try something like the following (untested):

string str = "\r\n\rDaniel Walters\n\r\n";
while (str.EndsWith("\r") || str.EndsWith("\n"))
{
  // "\r" and "\n" are both one character long, so dropping the last character removes either
  str = str.Substring(0, str.Length - 1);
}
while (str.StartsWith("\r") || str.StartsWith("\n"))
{
  // Likewise, drop the first character
  str = str.Substring(1);
}

You can also do this, though I'm not sure whether your pattern changes. This returns "Daniel Walters":

string replaceStr = "\r\n\rDaniel Walters\n\r\n";
replaceStr = replaceStr.TrimStart(new char[] { '\r', '\n' });
replaceStr = replaceStr.TrimEnd(new char[] { '\r', '\n' });

Or, if you want to use the Trim() method, you can do the following:

string replaceStr = "\r\n\rDaniel Walters\n\r\n";
replaceStr = replaceStr.Trim();
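The two variants are not equivalent: the parameterless `Trim()` strips all leading and trailing whitespace (spaces and tabs too), while `Trim('\r', '\n')` strips only the listed characters, which does the same job as the `TrimStart`/`TrimEnd` pair in a single call. A small sketch with a made-up input that has a space next to the name:

```csharp
using System;

// Made-up input with spaces next to the name as well as line breaks.
string raw = " \r\nDaniel Walters \n";

// Parameterless Trim(): removes every leading/trailing whitespace character.
string allWhitespace = raw.Trim();

// Trim(params char[]): removes only '\r' and '\n', so the spaces survive here
// and the leading space also stops the '\r\n' after it from being removed.
string newlinesOnly = raw.Trim('\r', '\n');

Console.WriteLine($"[{allWhitespace}]");                    // [Daniel Walters]
Console.WriteLine(newlinesOnly == " \r\nDaniel Walters "); // True
```

For the question's input, which has no spaces at the edges, both calls return "Daniel Walters".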
