
In Swift, how can I generate an array of substrings from a larger string?

I have an HTML string where I'm trying to generate an array of all substring instances that occur between two sets of characters.

My string looks something like this:

<h2>The Phantom Menace</h2>
<p>Two Jedi escape a hostile blockade to find allies and come across a young boy who may bring balance to the Force, but the long dormant Sith resurface to claim their original glory.</p>
<h2>Attack of the Clones</h2>
<p>Ten years after initially meeting, Anakin Skywalker shares a forbidden romance with Padmé Amidala, while Obi-Wan Kenobi investigates an assassination attempt on the senator and discovers a secret clone army crafted for the Jedi.</p>
<h2>Revenge of the Sith</h2>
<p>Three years into the Clone Wars, the Jedi rescue Palpatine from Count Dooku. As Obi-Wan pursues a new threat, Anakin acts as a double agent between the Jedi Council and Palpatine and is lured into a sinister plan to rule the galaxy.</p>
<h2>A New Hope</h2>
<p>Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a Wookiee and two droids to save the galaxy from the Empire's world-destroying battle station, while also attempting to rescue Princess Leia from the mysterious Darth Vader.</p>
<h2>The Empire Strikes Back</h2>
<p>After the Rebels are brutally overpowered by the Empire on the ice planet Hoth, Luke Skywalker begins Jedi training with Yoda, while his friends are pursued by Darth Vader and a bounty hunter named Boba Fett all over the galaxy.</p>
<h2>Return of the Jedi</h2>
<p>After a daring mission to rescue Han Solo from Jabba the Hutt, the Rebels dispatch to Endor to destroy the second Death Star. Meanwhile, Luke struggles to help Darth Vader back from the dark side without falling into the Emperor's trap.</p>
<h2>The Force Awakens</h2>
<p>As a new threat to the galaxy rises, Rey, a desert scavenger, and Finn, an ex-stormtrooper, must join Han Solo and Chewbacca to search for the one hope of restoring peace.</p>
<h2>The Last Jedi</h2>
<p>Rey develops her newly discovered abilities with the guidance of Luke Skywalker, who is unsettled by the strength of her powers. Meanwhile, the Resistance prepares for battle with the First Order.</p>
<h2>The Rise of Skywalker</h2>
<p>The surviving members of the resistance face the First Order once again, and the legendary conflict between the Jedi and the Sith reaches its peak bringing the Skywalker saga to its end.</p>

I want to create an array of the substrings that appear between <h2> and </h2>, to get the following result:

["The Phantom Menace", "Attack of the Clones", "Revenge of the Sith", "A New Hope", "The Empire Strikes Back", "Return of the Jedi", "The Force Awakens", "The Last Jedi", "The Rise of Skywalker"]

Is there a variation of this code where I can input the range between the tags?

let titles = htmlInput.components(separatedBy:"<h2>")

This returns an array with elements like this:

"The Phantom Menace

Two Jedi escape a hostile blockade to find allies and come across a young boy who may bring balance to the Force, but the long dormant Sith resurface to claim their original glory.

"

Any help would be welcome.

Thanks

As mentioned in the comments, using an XMLParser is a good idea here. Create your XMLParser and set its delegate to an instance of a class you define. That class must conform to XMLParserDelegate, which is a protocol, so it also has to inherit from NSObject. In the delegate you need two functions:

public func parser(_ parser: XMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [String : String]) {
    lastTag = elementName
}

and

/// When there is text found between tags, add it to the array.
public func parser(_ parser: XMLParser, foundCharacters string: String) {
    let text = string.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
    if !text.isEmpty && lastTag == "h2" {
        h2Array.append(text)
    }
}

Finally, the delegate needs two private variables (var lastTag: String and var h2Array: [String]) and a getter for h2Array, so that you can read the result wherever you need it.

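Here is how you then use the parser. Note that XMLParser expects well-formed XML with a single root element, so a fragment like the one in the question may first need to be wrapped in an enclosing element: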

let parser = XMLParser(data: htmlString.data(using: .utf8) ?? Data())
let parserDelegate = MyParserDelegate()
parser.delegate = parserDelegate
parser.parse()
let h2Array = parserDelegate.getterForTheArray() // this one needs to be defined

For func parser(_ parser: XMLParser, foundCharacters string: String), you should also take this note from its documentation into account:

The parser object may send the delegate several parser(_:foundCharacters:) messages to report the characters of an element. Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.

That means you might need to adapt my solution so that a title is never split, leaving two halves of one string in your array instead of the whole one.
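One way to handle that caveat is to accumulate the characters in a buffer and only append to the array when the closing tag arrives. The following is a minimal sketch, not a definitive implementation: the class name H2Collector, its titles property and the <root> wrapper are assumptions made for illustration, and it only works if the HTML is also well-formed XML (no unclosed tags, no HTML-only entities such as &nbsp;).

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical delegate that collects the text of every <h2> element.
final class H2Collector: NSObject, XMLParserDelegate {
    private(set) var titles: [String] = []
    private var buffer = ""
    private var insideH2 = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String]) {
        if elementName == "h2" {
            insideH2 = true
            buffer = ""
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times per element, so append instead of assigning.
        if insideH2 { buffer += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "h2" {
            insideH2 = false
            let text = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
            if !text.isEmpty { titles.append(text) }
        }
    }
}

// Shortened sample input in the shape of the question's HTML.
let html = """
<h2>The Phantom Menace</h2>
<p>Two Jedi escape a hostile blockade.</p>
<h2>Attack of the Clones</h2>
<p>Ten years after initially meeting.</p>
"""

// Assumption: wrap the fragment in one root element, because XML allows only one.
let data = Data("<root>\(html)</root>".utf8)
let parser = XMLParser(data: data)
let collector = H2Collector()
parser.delegate = collector
parser.parse()
let titles = collector.titles
print(titles)
```

Keeping the flag and the buffer inside the delegate means the calling code only has to read titles once parse() returns.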

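You can use the regular expression (?<=<h2>)(.*?)(?=</h2>). The lookbehind and lookahead keep the tags out of the match, so each match is only the text between them.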

Example:

let input: String = ...
let expr = "(?<=<h2>)(.*?)(?=</h2>)"

do {
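    let regex = try NSRegularExpression(pattern: expr)
    // NSRegularExpression works with NSRange, so bridge to NSString for lengths and substrings.
    let nsString = input as NSString
    let results = regex.matches(in: input, range: NSRange(location: 0, length: nsString.length))
    // Each match covers only the text between <h2> and </h2>.
    print(results.map { nsString.substring(with: $0.range) })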
} catch let error {
    print("invalid regex: \(error.localizedDescription)")
}
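To answer the part of the question about passing in the tags, the same idea can be wrapped in a small function that takes both delimiters. This is only a sketch under some assumptions: the function name substrings(in:between:and:) is made up for illustration, and it uses a capture group instead of the lookarounds above. The delimiters are escaped with NSRegularExpression.escapedPattern(for:) so they are matched literally.

```swift
import Foundation

/// Hypothetical helper: returns every substring found between `start` and `end`.
func substrings(in input: String, between start: String, and end: String) -> [String] {
    // Escape the delimiters so characters such as "." or "(" are matched literally.
    let pattern = NSRegularExpression.escapedPattern(for: start)
        + "(.*?)"
        + NSRegularExpression.escapedPattern(for: end)
    // .dotMatchesLineSeparators lets a single match span several lines.
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.dotMatchesLineSeparators]) else {
        return []
    }
    let fullRange = NSRange(input.startIndex..., in: input)
    return regex.matches(in: input, range: fullRange).compactMap { match in
        // Capture group 1 is the text between the two delimiters.
        Range(match.range(at: 1), in: input).map { String(input[$0]) }
    }
}

// Shortened sample input in the shape of the question's HTML.
let sample = "<h2>A New Hope</h2>\n<p>Luke Skywalker joins forces.</p>\n<h2>The Empire Strikes Back</h2>"
let headings = substrings(in: sample, between: "<h2>", and: "</h2>")
let paragraphs = substrings(in: sample, between: "<p>", and: "</p>")
print(headings)
print(paragraphs)
```

Like any regex over HTML, this only matches tags written exactly as given, so an opening tag with attributes (for example <h2 class="title">) would not be found.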
