Is this an efficient way to parse a large FIX-formatted string?

Question

I need to parse a FIX message as fast as possible.

My approach is below. I keep a reference to the FIX string in FP_FixString, then work through it sequentially, getting tag values as I need them.

The FIX message is large (approximately 4,000 characters, with about 100 repeating groups of 3 tags that I need to extract).

Am I doing this as efficiently as possible from a string building and parsing perspective?

using System.IO;
using System.IO.Compression;

public class FixParser
{
        string FP_FixString;
        int FP_m;
        int FP_Lastm;

        public void Go()
        {
            using (StreamReader reader = new StreamReader(new GZipStream(File.OpenRead(@"L:\Logs\FIX.4.2-D.messages.current.log.20140512T.gz"), CompressionMode.Decompress)))
            {
                string line = "";
                while ( (line = reader.ReadLine()) != null)
                {

                    InitiateFixParse(ref line);

                    string Symbol;
                    if (!GetTagString(55, out Symbol))
                        return;

                    //DO ALL OTHER PROCESSING OF TAGS HERE
                }
            }
        }


        public void InitiateFixParse(ref string fixString)
        {
            FP_Lastm = fixString.Length - 1;
            FP_FixString = fixString;
            FP_m = 0;
        }

       public bool IsEndMark()
        {
            if (FP_m>=FP_Lastm || FP_FixString[FP_m].Equals('\x01'))
                return true;

            return false;
        }

        public bool NextTag(out int Tag, out string ValueString)
        {
            Tag = 0;
            ValueString = "";
            if(FP_m>=FP_Lastm)
                return false;

            string TagString = "";
            bool GettingTag=true;
            while (!IsEndMark())
            {
                if (FP_FixString[FP_m].Equals('='))
                {
                    GettingTag = false;
                    FP_m++;
                }
                if(GettingTag)
                    TagString = TagString + FP_FixString[FP_m];
                else
                    ValueString = ValueString + FP_FixString[FP_m];
                FP_m++;
            }

            Tag=int.Parse(TagString);
            FP_m++; //Start of next Tag

            return true;
        }

        public bool GetTagString(int Tag, out string ValueString)
        {
            //bool FountIt = false;
            int aTag;
            string aValueString;
            while (NextTag(out aTag, out aValueString))
            {
                if (aTag == Tag)
                {
                    ValueString = aValueString;
                    return true;
                }
            }

            ValueString = "";
            return false;
        }
}

Answer 1

My only suggestion at first glance is to replace the string concatenations that are done in loops, such as TagString = TagString + FP_FixString[FP_m];, with StringBuilders, i.e.

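StringBuilder sb = new StringBuilder();
while (!IsEndMark())
{
    sb.Append(FP_FixString[FP_m]); // the method is Append (PascalCase) in C#
    FP_m++;                        // advance the cursor, otherwise the loop never ends
}
TagString = sb.ToString();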

as StringBuilder is much more efficient than concatenation in loops. The usual caveats apply. I agree with @DumbCoder that profiling is a good idea.
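To make the profiling advice concrete, here is a minimal timing sketch. The class name ConcatBenchmark and its methods are hypothetical and not part of the original code; they only compare the two string-building styles on a string of roughly the size mentioned in the question. A Stopwatch loop like this gives a rough indication only, so treat it as a starting point rather than a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatBenchmark
{
    // Builds a string of `length` characters by repeated concatenation.
    public static string BuildWithConcat(int length)
    {
        string s = "";
        for (int i = 0; i < length; i++)
            s = s + 'x';          // allocates a new string on every iteration
        return s;
    }

    // Builds the same string with a single StringBuilder.
    public static string BuildWithBuilder(int length)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append('x');
        return sb.ToString();
    }

    // Returns the elapsed milliseconds for `iterations` calls of `build`.
    public static long Time(Func<int, string> build, int length, int iterations)
    {
        build(length);            // warm-up call so JIT compilation is not measured
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            build(length);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For example, ConcatBenchmark.Time(ConcatBenchmark.BuildWithConcat, 4000, 1000) and the same call with BuildWithBuilder can be printed side by side; the actual numbers depend on the machine and runtime.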

As an aside, it may be better to store a FIXML representation of the message and use XPath to extract the data if possible (i.e. if you aren't trying to parse automatically created logs), as it can help to solve some repeating-group issues, or rather it did for me here!
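As a rough illustration of the XPath idea, the sketch below pulls one attribute out of every entry of a repeating group. The XML is a hypothetical, simplified stand-in: the element and attribute names (Message, Entry, Px) are assumptions made for this example, and real FIXML uses schema-defined names and an XML namespace, which would require an XmlNamespaceManager.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class FixmlSketch
{
    // Hypothetical sample; not the real FIXML schema.
    public const string Sample =
        "<Message>" +
        "<Entry Type='0' Px='10.5' Sz='100'/>" +
        "<Entry Type='1' Px='10.6' Sz='200'/>" +
        "</Message>";

    // Returns the Px attribute of every Entry element, in document order.
    public static List<string> ExtractPrices(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var prices = new List<string>();
        // One XPath expression selects the attribute across all repeating entries.
        foreach (XmlNode node in doc.SelectNodes("/Message/Entry/@Px"))
            prices.Add(node.Value);
        return prices;
    }
}
```

Calling FixmlSketch.ExtractPrices(FixmlSketch.Sample) returns the two Px values. XML parsing is usually slower than scanning the raw tag=value string, so this trades speed for convenience.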

Answer 2

If you are dealing with very large files, you want to avoid GC pressure as much as possible, as already mentioned. StringBuilder helps, but you are still creating a StringBuilder instance. Some quick wins:

if (GettingTag)
{
    //TagString = TagString + FP_FixString[FP_m];
    Tag = Tag * 10 + ((byte)FP_FixString[FP_m] -48);
}// This reduces time by about 40%

//Tag = int.Parse(TagString);

The code below reduces the time even further (to 50% of the original):

int valueStart = 0;
while (!IsEndMark())
{
    if (FP_FixString[FP_m].Equals('='))
    {
        GettingTag = false;
        FP_m++;
        valueStart = FP_m;
    }
    if (GettingTag)
    {
        //TagString = TagString + FP_FixString[FP_m];
        Tag = Tag * 10 + ((byte)FP_FixString[FP_m] -48);
    }
    //else
        //ValueString = ValueString + FP_FixString[FP_m];
    FP_m++;
}
ValueString = FP_FixString.Substring(valueStart, FP_m - valueStart);

//Tag = int.Parse(TagString);
FP_m++; //Start of next Tag

I haven't looked at the code further.
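The same idea (accumulate the tag as an integer, take the value with a single Substring) can also be written with IndexOf, which avoids the per-character state flag and copes with values that contain '='. This is a sketch under assumptions, not the answer's code: the class FixScan and its Fields method are hypothetical names, tags are assumed to be plain ASCII digits, and every field is assumed to be a well-formed tag=value pair terminated by SOH.

```csharp
using System;
using System.Collections.Generic;

public static class FixScan
{
    private const char Soh = '\u0001';   // FIX field delimiter

    // Walks the message once, yielding a (tag, value) pair for each field.
    public static IEnumerable<KeyValuePair<int, string>> Fields(string msg)
    {
        int pos = 0;
        while (pos < msg.Length)
        {
            int eq = msg.IndexOf('=', pos);
            if (eq < 0) yield break;             // no further tag=value pair

            int end = msg.IndexOf(Soh, eq + 1);
            if (end < 0) end = msg.Length;       // tolerate a missing trailing SOH

            // Accumulate the tag digits without allocating a string.
            int tag = 0;
            for (int i = pos; i < eq; i++)
                tag = tag * 10 + (msg[i] - '0');

            // The only allocation per field is the value substring.
            string value = msg.Substring(eq + 1, end - eq - 1);
            yield return new KeyValuePair<int, string>(tag, value);

            pos = end + 1;                       // start of the next tag
        }
    }
}
```

A caller interested in a repeating group can iterate Fields(line) once and collect the three tags it needs as they appear, instead of rescanning the string for each tag.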

Answer 1
1 2014-05-21 09:35:56

Answer 2
0 Accepted 2014-05-23 14:20:49