Split string by capital letters (excluding hyphenated words)

I am using WatiN to click links in the browser based on strings I pull off the site.

The problem is that some of the text it pulls spans multiple lines, so the lines get combined into one word. Example: "Ultra Corrosion-Resistant Coated Alloy Steel" becomes "Ultra Corrosion-ResistantCoated Alloy Steel".

I am trying to split the string at every capital letter except those that follow a hyphen, so that I can start searching for links by portions of the string.

This is what I have so far:

    types = doc.DocumentNode.SelectNodes("//h3[@class='AbbrPrsnttn_PrsnttnNm']");
    foreach (HtmlNode type in types)
    {
        desc = type.InnerText.CleanText();

        if (browser.Div(Find.ById("ProdPrsnttnGrpCntnr")).Element(Find.ByText(desc)).Exists)
        {
            browser.Div(Find.ById("ProdPrsnttnGrpCntnr")).Element(Find.ByText(desc)).Click();
            System.Threading.Thread.Sleep(5000);
            types = doc.DocumentNode.SelectNodes("//h3[@class='AbbrPrsnttn_PrsnttnNm']");
            doc2.LoadHtml(browser.Html);
            partTable = doc2.DocumentNode.SelectSingleNode("//div[@class='ItmTblGrp']");

            MineNext(doc, doc2, browser, typeUrl, types, desc, partTable);
        }
        else
        {
            split = desc.Split(new Char[] { ' ' });
        }
    }

Here is an example of how it might be achieved:

Updated to also separate numbers.

using System;
using System.Text;

namespace SplitOnUppercase
{
    class Program
    {
        static void Main()
        {
            const string text = "Test42-10 UltraCorrosion-ResistantCoated Alloy-SteelNumberTest42";
            var result = new StringBuilder(text.Length);
            for (var i = 0; i < text.Length - 1; i++)
            {
                result.Append(text[i]);
                if (text[i] != ' ' && text[i] != '-' && (char.IsUpper(text[i + 1]) || !char.IsDigit(text[i]) && char.IsDigit(text[i + 1])))
                    result.Append(' ');
            }
            result.Append(text[text.Length - 1]);

            Console.WriteLine(result);
        }
    }
}

You can use Char.IsUpper (for example, Char.IsUpper('C')) to find the indices at which to split.
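A minimal sketch of that idea, assuming the goal is to break before each capital letter unless it follows a hyphen or whitespace. The `CapitalSplitter` class and its `SplitIndices` and `InsertSpaces` methods are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CapitalSplitter
{
    // Returns the indices where a new word starts: an uppercase letter that is
    // not the first character and does not follow a hyphen or whitespace.
    public static List<int> SplitIndices(string text)
    {
        var indices = new List<int>();
        for (int i = 1; i < text.Length; i++)
        {
            char previous = text[i - 1];
            if (char.IsUpper(text[i]) && previous != '-' && !char.IsWhiteSpace(previous))
            {
                indices.Add(i);
            }
        }
        return indices;
    }

    // Inserts a space at each split index, working from the end of the
    // string so that the earlier indices stay valid.
    public static string InsertSpaces(string text)
    {
        List<int> indices = SplitIndices(text);
        for (int k = indices.Count - 1; k >= 0; k--)
        {
            text = text.Insert(indices[k], " ");
        }
        return text;
    }
}
```

With the string from the question, `CapitalSplitter.InsertSpaces("Ultra Corrosion-ResistantCoated Alloy Steel")` yields "Ultra Corrosion-Resistant Coated Alloy Steel". Note that runs of capitals such as "USA" would be separated letter by letter under this rule.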

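String.Split(Char[]) is case-sensitive, but it also discards the delimiter characters, so splitting directly on capital letters would lose the letters themselves. I'm not at a computer where I can test or write C# at the moment, but logically it should work to walk the string and start a new piece at each capital letter that does not follow a hyphen:

// Requires: using System.Collections.Generic; using System.Text;
string desc = inputString; // input string
var output = new List<string>(); // the pieces, split at capital letters
var current = new StringBuilder();
for (int i = 0; i < desc.Length; i++)
{
    bool afterHyphen = i > 0 && desc[i - 1] == '-';
    if (char.IsUpper(desc[i]) && !afterHyphen && current.Length > 0)
    {
        // A capital letter that is not hyphenated closes the previous piece.
        output.Add(current.ToString().Trim());
        current.Clear();
    }
    current.Append(desc[i]);
}
if (current.Length > 0)
{
    output.Add(current.ToString().Trim());
}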

Here is an example I put together using LINQ. There are probably many ways to improve this.

public static string TransformLinqExample(this string toTransform)
{
    string answer = toTransform
        .ToCharArray()
        .Select(c => new string(c, 1))
        .Aggregate((a, c) => a += (CapitalLetters.Contains(c) && c.IsUpper() && !a.EndsWith("-") && !a.EndsWith(" ")) ? " " + c : "" + c);
    return answer;
}

Here's a complete example.

using System;
using System.Linq;

namespace SplitProblem
{
    public static class StringAndCharExtensions
    {
        const string CapitalLetters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        public static bool IsUpper(this string theChar)
        {
            return theChar.ToUpper() == theChar;
        }
        public static string TransformLinqExample(this string toTransform)
        {
            string answer = toTransform
                .ToCharArray()
                .Select(c => new string(c, 1))
                .Aggregate((a, c) => a += (CapitalLetters.Contains(c) && c.IsUpper() && !a.EndsWith("-") && !a.EndsWith(" ")) ? " " + c : "" + c);
            return answer;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            string toSplit = "Ultra12.4 34.2 Corrosion-ResistantCoated 18-6 AlloySteel";
            string transformed = toSplit.TransformLinqExample();
            Console.WriteLine("{0}\n\n", transformed);

            foreach (var part in transformed.Split(' '))
            {
                Console.WriteLine(part);
            }
            Console.ReadLine();
        }
    }
}

The easiest way is to use a regular expression to split on capital letters. The following code only matches a capital letter followed by lowercase letters, so digits, hyphens, and other special characters are dropped from the output; to keep them, extend the regex pattern to include those characters.

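    // Requires: using System.Text; using System.Text.RegularExpressions;
    var inputString = "AnyStringThatYouWantToSplitOnCap";
    var pattern = "[A-Z][a-z]+"; // one capital letter followed by lowercase letters
    Regex regex = new Regex(pattern);
    var matches = regex.Matches(inputString);
    StringBuilder value = new StringBuilder();
    foreach (Match item in matches)
    {
        value.AppendFormat("{0} ", item.Value);
    }
    // value now holds "Any String That You Want To Split On Cap " (note the trailing space)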
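
Since the question asks to keep hyphenated words together, one alternative is to insert spaces with a lookaround-based pattern rather than collecting matches. This is a minimal sketch, assuming only ASCII capitals matter; the `RegexCapitalSplitter` class and its method names are hypothetical and not part of any answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCapitalSplitter
{
    // Zero-width match: a position preceded by any character other than
    // whitespace or a hyphen, and followed by an uppercase ASCII letter.
    private static readonly Regex Boundary = new Regex(@"(?<=[^\s\-])(?=[A-Z])");

    // Inserts a space at every boundary; no characters are removed.
    public static string InsertSpaces(string input)
    {
        return Boundary.Replace(input, " ");
    }

    // Splits the spaced-out string into its words, skipping empty entries.
    public static string[] SplitWords(string input)
    {
        return InsertSpaces(input).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Because the lookbehind requires a preceding character, the first letter of the string never triggers a split, and a capital directly after a hyphen or a space is left alone. For the question's input, `InsertSpaces` gives "Ultra Corrosion-Resistant Coated Alloy Steel".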

