Get a substring from a document with Regex

I'm not very experienced with Regex and hope to get some help from you guys:

I've got a String like this:

"... p.msochpdefault
{mso-style-name:msochpdefault;} ..."

I don't know what comes before or after this part of the string, and I don't know the content between the curly brackets.

I've tried this, but it matches up to the last ";}" in the file, and the result does not contain "p.msochpdefault":

string match = Regex.Match(str, @"p.msochpdefault(.+);}", RegexOptions.Singleline).Groups[1].Value;

How can I extract this in the right way?

There are some issues with your regex:

p.MsoNormal(.+);}

  1. You are searching for p.MsoNormal, not for p.msochpdefault.
  2. You have to escape the dot, otherwise it will match any character. In a verbatim string (@"...") that is p\.MsoNormal or p\.msochpdefault.
  3. The term .+ requires at least one character between p\.MsoNormal and ;}. If there may be none, it should be .*
  4. You are using greedy evaluation, which is the reason you catch the last instance of ;}. You have to use lazy evaluation: .+? instead of .+, and .*? instead of .*. That will stop at the first ;}, not the last.
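
Put together, the points above could look like the following minimal sketch. The input string is a made-up sample with a second rule appended (p.other is hypothetical), and the code is written as top-level statements, which assumes C# 9 or later. It also shows why the original result did not contain "p.msochpdefault": Groups[1] holds only the captured part, while Value (or Groups[0]) holds the whole match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input; the real document has unknown text before and after.
string input = "... p.msochpdefault\n{mso-style-name:msochpdefault;} ... p.other\n{color:red;} ...";

// \.    matches a literal dot instead of any character.
// (.*?) is lazy, so the match stops at the first ";}" rather than the last.
// RegexOptions.Singleline lets '.' also match the line break after the selector.
Match m = Regex.Match(input, @"p\.msochpdefault(.*?);}", RegexOptions.Singleline);

if (m.Success)
{
    Console.WriteLine(m.Value);           // whole match, including "p.msochpdefault"
    Console.WriteLine(m.Groups[1].Value); // only the text captured by (.*?)
}
```

If the whole block is what you need, read m.Value; if only the part after the selector is needed, read m.Groups[1].Value.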

I would recommend you try a regex evaluator. There are many online, including free ones. With such a tool you can try your regex and revise it if it doesn't work.

This seems to work for you:

string str = "... p.msochpdefault\n{ mso - style - name:msochpdefault; } ...";
string match = Regex.Match(str, @"{.*.}").Groups[0].Value;
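
Note that {.*.} on its own is greedy and is not tied to the selector name, so with several rules on one line it would run to the last closing brace. A possible variant, sketched here under the assumption that the block contains no nested braces, anchors on p.msochpdefault and uses a negated character class. The sample input (including p.other) is hypothetical, and the code assumes C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with a second rule after the one we want.
string str = "... p.msochpdefault\n{mso-style-name:msochpdefault;} p.other\n{color:red;} ...";

// \s*   skips the line break between the selector and the opening brace.
// [^}]* matches anything except '}', so the match ends at the first closing brace
//       and no Singleline option is needed.
Match rule = Regex.Match(str, @"p\.msochpdefault\s*\{([^}]*)\}");

string whole = rule.Value;           // selector plus the braces and their content
string body = rule.Groups[1].Value;  // only the content between the braces

Console.WriteLine(whole);
Console.WriteLine(body);
```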

One tip is to use an online regex tester to try things out until the expression is correct and working.
