Split string by capital letters (excluding hyphenated words)

I am using WatiN to click links in the browser based on a string I pull off the site.

The problem is that some of the text it pulls spans multiple lines, so adjacent words are joined without a space. For example, "Ultra Corrosion-Resistant Coated Alloy Steel" becomes "Ultra Corrosion-ResistantCoated Alloy Steel".

I am trying to split the string at every capital letter except those that follow a hyphen, so that I can start searching for links by portions of the string.

This is what I have so far:

    types = doc.DocumentNode.SelectNodes("//h3[@class='AbbrPrsnttn_PrsnttnNm']");
    foreach (HtmlNode type in types)
    {
        desc = type.InnerText.CleanText();

        if (browser.Div(Find.ById("ProdPrsnttnGrpCntnr")).Element(Find.ByText(desc)).Exists)
        {
            browser.Div(Find.ById("ProdPrsnttnGrpCntnr")).Element(Find.ByText(desc)).Click();
            System.Threading.Thread.Sleep(5000);
            types = doc.DocumentNode.SelectNodes("//h3[@class='AbbrPrsnttn_PrsnttnNm']");
            doc2.LoadHtml(browser.Html);
            partTable = doc2.DocumentNode.SelectSingleNode("//div[@class='ItmTblGrp']");

            MineNext(doc, doc2, browser, typeUrl, types, desc, partTable);
        }
        else
        {
            // No link matches the full text, so break it into parts to search with
            split = desc.Split(new Char[] { ' ' });
        }
    }

Here is an example of how it might be achieved:

Updated to also separate numbers.

using System;
using System.Text;

namespace SplitOnUppercase
{
    class Program
    {
        static void Main()
        {
            const string text = "Test42-10 UltraCorrosion-ResistantCoated Alloy-SteelNumberTest42";
            var result = new StringBuilder(text.Length);
            for (var i = 0; i < text.Length - 1; i++)
            {
                result.Append(text[i]);
                if (text[i] != ' ' && text[i] != '-' && (char.IsUpper(text[i + 1]) || !char.IsDigit(text[i]) && char.IsDigit(text[i + 1])))
                    result.Append(' ');
            }
            result.Append(text[text.Length - 1]);

            Console.WriteLine(result);
        }
    }
}

You can use Char.IsUpper('C') to find the indexes at which to split.

I'm pretty sure String.Split(Char[]) is case sensitive, but I can't test this at the moment. I'm not at a computer where I can test or write C#, but this should work logically. There are probably a lot of syntax errors in this.

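char[] splitChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray(); // what the string will be split by
string desc = inputString; // input string
// Note: String.Split discards the delimiters, so the capital letters
// themselves are missing from the resulting pieces.
string[] splitByCapital = desc.Split(splitChars);
string[] output = new string[splitByCapital.Length];
for (int i = 0; i < splitByCapital.Length; i++)
{
    // i > 0 guards against reading splitByCapital[-1] on the first piece
    if (i > 0 && splitByCapital[i].Contains("-"))
    {
        output[i] = splitByCapital[i] + splitByCapital[i - 1];
    }
    else
    {
        output[i] = splitByCapital[i];
    }
}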
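
The Char.IsUpper idea mentioned above can also be applied directly, without String.Split, so that no characters are lost. The following is only a minimal sketch under one assumption: a split should happen wherever an uppercase letter directly follows a lowercase letter, which leaves capitals after a hyphen or a space alone. The class and method names (UpperIndexSplitter, Split) are made up for illustration and are not from the original answer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, named for illustration only.
public static class UpperIndexSplitter
{
    // Cuts the text before every uppercase letter that directly follows
    // a lowercase letter. A capital after '-' or ' ' does not trigger a cut.
    public static List<string> Split(string text)
    {
        var parts = new List<string>();
        int start = 0;
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]) && char.IsLower(text[i - 1]))
            {
                parts.Add(text.Substring(start, i - start));
                start = i;
            }
        }
        // Whatever remains after the last cut is the final piece.
        parts.Add(text.Substring(start));
        return parts;
    }
}
```

Under that assumption, UpperIndexSplitter.Split("Corrosion-ResistantCoated") should give the two pieces "Corrosion-Resistant" and "Coated". Existing spaces are left inside the pieces, so a further Split(' ') may still be needed.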

Here is an example I put together using LINQ. There are probably many ways to improve this.

public static string TransformLinqExample(this string toTransform)
{
    string answer = toTransform
        .ToCharArray()
        .Select(c => new string(c, 1))
        .Aggregate((a, c) => a += (CapitalLetters.Contains(c) && c.IsUpper() && !a.EndsWith("-") && !a.EndsWith(" ")) ? " " + c : "" + c);
    return answer;
}

Here's a complete example.

using System;
using System.Linq;

namespace SplitProblem
{
    public static class StringAndCharExtensions
    {
        const string CapitalLetters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        public static bool IsUpper(this string theChar)
        {
            return theChar.ToUpper() == theChar;
        }
        public static string TransformLinqExample(this string toTransform)
        {
            string answer = toTransform
                .ToCharArray()
                .Select(c => new string(c, 1))
                .Aggregate((a, c) => a += (CapitalLetters.Contains(c) && c.IsUpper() && !a.EndsWith("-") && !a.EndsWith(" ")) ? " " + c : "" + c);
            return answer;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            string toSplit = "Ultra12.4 34.2 Corrosion-ResistantCoated 18-6 AlloySteel";
            string transformed = toSplit.TransformLinqExample();
            Console.WriteLine("{0}\n\n", transformed);

            foreach (var part in transformed.Split(' '))
            {
                Console.WriteLine(part);
            }
            Console.ReadLine();
        }
    }
}

The easiest way is to use a regular expression to split on capital letters. The following code does not split the string on special characters or numbers; to do that, update the regex pattern to include them.

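    using System.Text;
    using System.Text.RegularExpressions;

    var inputString = "AnyStringThatYouWantToSplitOnCap";
    var pattern = "[A-Z][a-z]+"; // one capital letter followed by lowercase letters
    Regex regex = new Regex(pattern);
    var matches = regex.Matches(inputString);
    StringBuilder value = new StringBuilder();
    foreach (Match item in matches)
    {
        // each matched word is appended with a trailing space
        value.AppendFormat("{0} ", item.Value);
    }
    // value: "Any String That You Want To Split On Cap " (note the trailing space)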
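
For the hyphenated input in the question, a pattern that only matches capitalized words would leave out the hyphens. One possible alternative, shown here as a sketch rather than a tested solution for the real site data, is a zero-width pattern that matches the position between a lowercase letter and an uppercase letter and inserts a space there. It assumes plain ASCII letters. The names CapitalSplitter, InsertSpaces and SplitWords are hypothetical and not part of any answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class CapitalSplitter
{
    // Zero-width position: a lowercase letter behind, an uppercase letter ahead.
    // A capital that follows '-' or ' ' has no lowercase letter behind it,
    // so hyphenated words are left intact.
    private static readonly Regex Boundary = new Regex("(?<=[a-z])(?=[A-Z])");

    // Puts a space at every boundary found by the pattern.
    public static string InsertSpaces(string text)
    {
        return Boundary.Replace(text, " ");
    }

    // Inserts the spaces, then splits on them, dropping empty entries.
    public static string[] SplitWords(string text)
    {
        return InsertSpaces(text).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With the question's example, CapitalSplitter.InsertSpaces("Ultra Corrosion-ResistantCoated Alloy Steel") should return "Ultra Corrosion-Resistant Coated Alloy Steel", and SplitWords gives the individual words to search links with. Digits are not handled; the pattern would need extending for inputs such as "Test42".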
