C# - Implementing Markdown to Word (OpenXML)

I'm trying to implement my own version of Markdown for creating Word documents in a C# application. For bold/italic/underline I am going to use ** / ` / _ respectively. I have created something that parses combinations of ** to output bold text by extracting a match and using something like this:

RunProperties rPr2 = new RunProperties();
rPr2.Append(new Bold() { Val = new OnOffValue(true) });

Run run2 = new Run();
run2.Append(rPr2);
run2.Append(new Text(extractedString));
p.Append(run2);

My issue arises when I combine the three formats, as I think I would have to work out every formatting combination and split the text into separate runs (bold runs, bold italic runs, underline runs, bold underline runs, etc.). I want my program to be able to handle something like this:

**_Lorem ipsum_** (creates bold & underlined run)

`Lorem ipsum` dolor sit amet, **consectetur _adipiscing_ elit**. 
_Praesent `feugiat` velit_ sed tellus convallis, **non `rhoncus** tortor` auctor.

Basically, I want it to handle any mix of the styles you could throw at it. However, if I am generating these runs programmatically, I need to work out all the formatting before putting the text into runs. Should I handle this with an array of character indexes for each style and merge them into one big list of styles (I'm not sure exactly how I would do this)?

My final question: does something like this already exist? If it does, I have been unable to find it (Markdown to Word).

I think you'll have to split your text into parts according to their formatting and add each part to the document with the correct formatting, like here: http://msdn.microsoft.com/en-us/library/office/gg278312.aspx

So

**non `rhoncus** tortor` will become: "non "{bold}, "rhoncus"{bold,italic}, " tortor"{italic}

I think that will be easier than making several passes over the text. You don't even have to parse the entire document up front. Just parse as you go, and after each "change" in the formatting, write to the docx.
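Writing one part with its formatting could look roughly like the sketch below. It assumes the same OpenXML SDK types the question already uses (DocumentFormat.OpenXml.Wordprocessing); RunWriter and AppendRun are hypothetical names made up for illustration, and single underline is an assumed choice.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RunWriter
{
    // Appends one run to the paragraph with any combination of the three formats.
    // AppendRun is a hypothetical helper name, not part of the SDK.
    public static Run AppendRun(Paragraph p, string text, bool bold, bool italic, bool underline)
    {
        var rPr = new RunProperties();
        if (bold) rPr.Append(new Bold());
        if (italic) rPr.Append(new Italic());
        if (underline) rPr.Append(new Underline { Val = UnderlineValues.Single });

        var run = new Run();
        run.Append(rPr);
        // Preserve keeps leading/trailing spaces such as the one in "non "
        run.Append(new Text(text) { Space = SpaceProcessingModeValues.Preserve });
        p.Append(run);
        return run;
    }
}
```

With something like this there is no need to enumerate "bold runs", "bold italic runs" and so on; the three flags cover every combination.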

Another thought: if all you're creating is simple text, it might be even simpler to generate the OpenXML markup yourself. Your data is very structured, so it should be easy enough to create XML from it.
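As a rough idea of what generating the markup by hand involves, here is a minimal sketch that builds a single run element with System.Xml.Linq. RunXml and BuildRun are hypothetical names, and single underline is an assumed choice. It only produces the run; a real .docx still needs the surrounding document part and the zip package around it.

```csharp
using System;
using System.Xml.Linq;

public static class RunXml
{
    // WordprocessingML main namespace
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a <w:r> element; BuildRun is a hypothetical helper name
    public static XElement BuildRun(string text, bool bold, bool italic, bool underline)
    {
        var props = new XElement(W + "rPr");
        if (bold) props.Add(new XElement(W + "b"));
        if (italic) props.Add(new XElement(W + "i"));
        if (underline) props.Add(new XElement(W + "u", new XAttribute(W + "val", "single")));

        var run = new XElement(W + "r");
        if (props.HasElements) run.Add(props);
        // xml:space="preserve" keeps leading and trailing spaces in the run
        run.Add(new XElement(W + "t",
            new XAttribute(XNamespace.Xml + "space", "preserve"), text));
        return run;
    }
}
```

For example, RunXml.BuildRun("rhoncus", true, true, false) gives a run whose properties contain a w:b and a w:i element.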

Here's a simple algorithm to do what I propose:

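    // Requires: using System; using System.Collections.Generic; using System.Text;
    // The members below are assumed to sit inside a single class.

    // These are the different formattings you have
    public enum Formatings
    {
        Bold, Italic, Underline, Undefined
    }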

    // This will store the current format
    private Dictionary<Formatings, bool> m_CurrentFormat;

    // This will store which string translates into which format
    private Dictionary<string, Formatings> m_FormatingEncoding;


    public void Init()
    {
        m_CurrentFormat = new Dictionary<Formatings, bool>();
        foreach (Formatings format in Enum.GetValues(typeof(Formatings)))
        {
            m_CurrentFormat.Add(format, false);
        }

        // Delimiters as given in the question: ** bold, ` italic, _ underline
        m_FormatingEncoding = new Dictionary<string, Formatings>
                                  {{"**", Formatings.Bold}, {"`", Formatings.Italic}, {"_", Formatings.Underline}};
    }

    public void ParseFormattedText(string p_text)
    {
        StringBuilder currentWordBuilder = new StringBuilder();
        int currentIndex = 0;

        while (currentIndex < p_text.Length)
        {
            Formatings currentFormatSymbol;
            int shift;
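            if (IsFormatSymbol(p_text, currentIndex, out currentFormatSymbol, out shift))
            {
                // This is the current word you need to insert
                string currentWord = currentWordBuilder.ToString();

                // This is the current formatting status --> m_CurrentFormat
                // This is where you can insert your code and add the word you want to the .docx

                currentWordBuilder = new StringBuilder();
                currentIndex += shift;
                m_CurrentFormat[currentFormatSymbol] = !m_CurrentFormat[currentFormatSymbol];

                // Re-check the new position: it may be another delimiter
                // or already past the end of the text
                continue;
            }

            currentWordBuilder.Append(p_text[currentIndex]);
            currentIndex++;
        }

        // Text after the last delimiter is still in the builder;
        // add it to the .docx here as well, using m_CurrentFormat
        string lastWord = currentWordBuilder.ToString();
    }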

    // Checks if the current position is the beginning of a format symbol
    // if true - p_currentFormatSymbol will be the discovered format delimiter
    // and p_shift will denote its length
    private bool IsFormatSymbol(string p_text, int p_currentIndex, out Formatings p_currentFormatSymbol, out int p_shift)
    {
        // This is a trivial solution, you can do better if you need
        // Take at most 2 characters so the last character of the text does not throw
        string substring = p_text.Substring(p_currentIndex, Math.Min(2, p_text.Length - p_currentIndex));
        foreach (var formatString in m_FormatingEncoding.Keys)
        {
            if (substring.StartsWith(formatString))
            {
                p_shift = formatString.Length;
                p_currentFormatSymbol = m_FormatingEncoding[formatString];
                return true;
            }
        }

        p_shift = -1;
        p_currentFormatSymbol = Formatings.Undefined;
        return false;
    }
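To see the toggle idea end to end, here is a self-contained sketch that returns the parts instead of writing them to a document. It is a variation on the algorithm above, not the same code: TextStyle, InlineMarkup and Split are hypothetical names, the delimiters are the ones from the question, and it assumes every delimiter simply toggles its style (no escaping, no check for unclosed delimiters).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

[Flags]
public enum TextStyle { None = 0, Bold = 1, Italic = 2, Underline = 4 }

public static class InlineMarkup
{
    // Delimiters from the question
    private static readonly (string Token, TextStyle Style)[] Delimiters =
    {
        ("**", TextStyle.Bold), ("`", TextStyle.Italic), ("_", TextStyle.Underline)
    };

    // Splits text into (text, style) parts; each delimiter toggles its style
    public static List<(string Text, TextStyle Style)> Split(string input)
    {
        var segments = new List<(string Text, TextStyle Style)>();
        var buffer = new StringBuilder();
        var current = TextStyle.None;
        int i = 0;

        while (i < input.Length)
        {
            bool matched = false;
            foreach (var (token, style) in Delimiters)
            {
                if (i + token.Length > input.Length ||
                    string.CompareOrdinal(input, i, token, 0, token.Length) != 0)
                    continue;

                // Flush the text collected so far under the old style
                if (buffer.Length > 0)
                {
                    segments.Add((buffer.ToString(), current));
                    buffer.Clear();
                }
                current ^= style;   // toggle the style on or off
                i += token.Length;  // skip the delimiter itself
                matched = true;
                break;
            }
            if (!matched) buffer.Append(input[i++]);
        }

        // Flush whatever follows the last delimiter
        if (buffer.Length > 0) segments.Add((buffer.ToString(), current));
        return segments;
    }
}
```

Calling InlineMarkup.Split("**non `rhoncus** tortor`") yields three parts: "non " (Bold), "rhoncus" (Bold, Italic) and " tortor" (Italic). Each part can then become one run. Note that a bare _ inside a word (for example snake_case) would also toggle underline, so real input probably needs an escaping rule.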
