
What is a more elegant way to parse this string?

I have a task where I need to parse C# scripts, look for a certain method attribute, and extract parts of it. I wonder if there is a more elegant way than what I do now. The attribute looks like this:

[Info("Title", "Author", "5.2.5", ResourceId = 819)]

Here is what I do:

// foreach line in script
if (line.Contains("[Info(") && line.Contains("ResourceId"))
{
    var _attributes = line
        .Replace(" ", "")
        .Replace("\"", "")
        .Replace("[Info(", "")
        .Replace(")]", "")
        .Replace("ResourceId=", "")
        .Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
    // Do stuff with _attributes[0], _attributes[1], etc.
    break;
}

The easiest solution nowadays would be to use Roslyn. You can parse the code, find actual attributes (rather than things that look like the attribute you're looking for), and handle them all in a way that's C#-proper.

Here's a simple example:

var infoAttributes = CSharpSyntaxTree.ParseText(@"
namespace MyNamespace
{
    public class SomeClass
    {
        const string SomeConstant = ""Hi!"";

        [Info(""Some book"", ""Ray Brandenburg"", ""5.2.5"", ResourceId = 819)]
        public void SomeMethod()
        {

        }

        [InfoAttribute(SomeConstant, 42, ""Banana"")]
        public void SomeMethod2()
        {

        }

        // [Info(""Not going to happen"", ""Hilary Clinton"", ""1.2.0"")]
        public void SomeMethod3()
        {

        }
    }
}
")
.GetRoot()
.DescendantNodes()
.OfType<AttributeSyntax>()
.Where(i => i.Name.ToString() == "Info" || i.Name.ToString() == "InfoAttribute")
.Where
(
  i => 
    i.ArgumentList.Arguments.Count(j => j.NameEquals == null) == 3 
    && i.ArgumentList.Arguments[0].GetFirstToken().IsKind(SyntaxKind.StringLiteralToken)
    && i.ArgumentList.Arguments[1].GetFirstToken().IsKind(SyntaxKind.StringLiteralToken)
    && i.ArgumentList.Arguments[2].GetFirstToken().IsKind(SyntaxKind.StringLiteralToken)
)
.Select
(
  i =>
  new 
  {
    Title = (string)i.ArgumentList.Arguments[0].GetFirstToken().Value,
    Author = (string)i.ArgumentList.Arguments[1].GetFirstToken().Value,
    Version = (string)i.ArgumentList.Arguments[2].GetFirstToken().Value,
    ResourceId = 
      i.ArgumentList.Arguments
       .Where(j => j.NameEquals != null && j.NameEquals.Name.ToString() == "ResourceId")
       .Select(j => j.ChildNodes().Skip(1).First().GetFirstToken().Value.ToString())
       .FirstOrDefault()
  }
);

infoAttributes.Dump();

At this level, the code only parses the source text. To keep things simple, I added defensive clauses so that it only accepts attributes whose three positional arguments are string literals; you'll probably want to turn the rejected cases into warnings to be handled manually. The code correctly handles trivia (e.g. whitespace), code that looks like an attribute declaration but isn't, comments, and plenty of other possible issues. The remaining simplifying assumption is that the values must be literals. The example will only find one Info attribute: the one on SomeMethod2 uses a constant and a different constructor overload, and the one on SomeMethod3 is commented out.

The next level is creating a compilation from this syntax tree. That's a bit more involved, but it lets you treat everything as real C# code: for example, the attribute on SomeMethod2 will resolve SomeConstant correctly. Of course, if you really want to be 100% correct, this requires gathering all the dependencies etc., which sounds like overkill. Unless this is a real problem in your code, warnings should do fine for the outliers. If local constants are used often in your code, expanding the code to handle a local literal constant is still pretty easy.
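As a rough illustration of that second level, the sketch below builds a compilation and asks the semantic model for the constant value of an attribute's first argument. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced; the names ConstantResolver, ResolveFirstArgument and "ScriptAnalysis" are made up for this example, and only the core library is added as a reference, so anything that depends on other assemblies would not resolve.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, not part of Roslyn.
public static class ConstantResolver
{
    // Returns the compile-time constant value of the first argument of the
    // first Info/InfoAttribute usage in the source, or null if it is not constant.
    public static object ResolveFirstArgument(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // Assumption: referencing only the core library is enough for the script.
        CSharpCompilation compilation = CSharpCompilation.Create(
            "ScriptAnalysis",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

        SemanticModel model = compilation.GetSemanticModel(tree);

        AttributeSyntax attribute = tree.GetRoot()
            .DescendantNodes()
            .OfType<AttributeSyntax>()
            .First(a => a.Name.ToString() == "Info" || a.Name.ToString() == "InfoAttribute");

        // Unlike the token-based approach, this also works for identifiers
        // that refer to constants, such as SomeConstant.
        ExpressionSyntax firstArgument = attribute.ArgumentList.Arguments[0].Expression;
        Optional<object> value = model.GetConstantValue(firstArgument);
        return value.HasValue ? value.Value : null;
    }
}
```

For a script that declares const string SomeConstant = "Hi!" and uses [Info(SomeConstant, ...)], this should hand back the string behind the constant rather than the identifier text.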

As a disclaimer, this surely isn't the best way to do the parsing using Roslyn. It's just the first thing that came to mind, and it took only a little while to get going. I'm still finding better ways of dealing with Roslyn pretty much every day :)

If for some reason what @Luaan suggests cannot be done, you can use an expression such as this: \[Info\("(.+?)", "(.+?)", "([\d.]+)", ResourceId\s*=\s*(\d+)\)\] to match and extract the values you are after.

An example is available here.

EDIT: As pointed out by @Evk, this expression will also match commented attributes. If this is not something which you are after, please let me know.

EDIT: As per your query, you would need to use something like this: \[Info\("(.+?)", "(.+?)", "?([\d.]+)"?, ResourceId\s*=\s*(\d+)\)\]. In this case, each quotation mark around the third argument is followed by the ? character, which tells the engine that the quotation mark might not be there. An example is available here.
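For completeness, here is a minimal sketch of how that expression might be used from C#. The class and method names (InfoAttributeParser, TryParse) are invented for this example, and the check that skips lines starting with // is my own simple guard against commented-out attributes; it does not handle block comments or attributes that span several lines.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built around the expression from this answer.
public static class InfoAttributeParser
{
    // Verbatim string: backslashes are written once, quotes are doubled.
    private static readonly Regex InfoPattern = new Regex(
        @"\[Info\(""(.+?)"", ""(.+?)"", ""?([\d.]+)""?, ResourceId\s*=\s*(\d+)\)\]");

    // Returns the extracted parts, or null if the line does not match.
    public static (string Title, string Author, string Version, string ResourceId)? TryParse(string line)
    {
        // Assumption: a line that starts with // is a commented-out attribute.
        if (line.TrimStart().StartsWith("//", StringComparison.Ordinal))
        {
            return null;
        }

        Match match = InfoPattern.Match(line);
        if (!match.Success)
        {
            return null;
        }

        return (match.Groups[1].Value,
                match.Groups[2].Value,
                match.Groups[3].Value,
                match.Groups[4].Value);
    }
}
```

Calling InfoAttributeParser.TryParse(line) inside the existing foreach loop would replace the chain of Replace and Split calls from the question.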
