
How do I remove a word before a dash using javascript?

I need to remove all words before the dash at the beginning of each sentence. Some sentences do not have words before dashes and dashes within the long sentence need to stay. Here is an example:

How do I change these strings:

PARIS — President Nicolas Sarkozy, running from behind for reelection...

GAZA CITY —Cross-border fighting between Gaza and Israel...

CARURU, Colombia — Quite suddenly, the endless green of Amazonian forest...

A year after an earthquake and tsunami devastated Japan's northeastern coast...

Into these strings:

President Nicolas Sarkozy, running from behind for reelection...

Cross-border fighting between Gaza and Israel...

Quite suddenly, the endless green of Amazonian forest...

A year after an earthquake and tsunami devastated Japan's northeastern coast...

How can I accomplish this with javascript (or php if javascript doesn't allow it)?

This is a pretty straightforward regex problem, but geez, it's not as straightforward as all the other answers assume. A few points:

  • Regex is the right choice - the split and substr answers won't deal with the leading space, and can't distinguish between a dateline with a dash at the beginning of a sentence, and a dash in the middle of your text content. Any option you use ought to be able to deal with content like: "President Nicolas Sarkozy — running from behind for reelection — came to Paris today..." as well as the options you suggest.

  • It's tricky to automatically recognize that my test sentence above doesn't have a dateline. Almost all the answers so far use the single description: any number of arbitrary characters, followed by a dash. That's insufficient for a test sentence like the one above.

  • You'll get better results by adding a few more rules, like: fewer than X characters, located at the beginning of the string, followed by a dash, optionally followed by an arbitrary number of spaces, followed by a capital letter. Even this won't work correctly with "President Sarkozy — Carla Bruni's husband...", but you're going to have to assume that this edge case is sufficiently rare to ignore.

All of which gives you a function like this:

function removeDateline(str) {
    return str.replace(/^[^—]{3,75}—\s*(?=[A-Z])/, "");
}

Breaking it down:

  • ^ - must occur at the beginning of the string.
  • [^—]{3,75} - between 3 and 75 characters other than a dash.
  • — - the dash itself.
  • \s* - optional whitespace.
  • (?=[A-Z]) - lookahead - the next character must be a capital letter.

Usage:

var s = "PARIS — President Nicolas Sarkozy, running from behind for reelection...";
removeDateline(s); // "President Nicolas Sarkozy, running from behind for reelection..."

s = "PARIS — President Nicolas Sarkozy — running from behind for reelection...";
removeDateline(s);  // "President Nicolas Sarkozy — running from behind for reelection..."

s = "CARURU, Colombia — Quite suddenly, the endless green of Amazonian forest...";
removeDateline(s); // "Quite suddenly, the endless green of Amazonian forest..."
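
The cases that motivated the extra rules can be checked the same way. This is only a quick sketch: the function is repeated so the snippet runs on its own, and the variable names noDash and midDash are just illustrative. The last case is the known false positive mentioned above.

```javascript
function removeDateline(str) {
    return str.replace(/^[^—]{3,75}—\s*(?=[A-Z])/, "");
}

// No dateline at all: the string comes back unchanged.
var noDash = "A year after an earthquake and tsunami devastated Japan's northeastern coast...";
console.log(removeDateline(noDash) === noDash); // true

// Dash in the middle of the sentence, followed by a lowercase letter: unchanged.
var midDash = "President Nicolas Sarkozy — running from behind for reelection — came to Paris today...";
console.log(removeDateline(midDash) === midDash); // true

// No space after the dash, as in the GAZA CITY example: still stripped.
console.log(removeDateline("GAZA CITY —Cross-border fighting between Gaza and Israel..."));
// "Cross-border fighting between Gaza and Israel..."

// Known false positive: a capital letter right after a mid-sentence dash.
console.log(removeDateline("President Sarkozy — Carla Bruni's husband..."));
// "Carla Bruni's husband..."
```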

If each sentence can be separated from the others you can use a regexp. Like this example:

var s = "PARIS — President Nicolas Sarkozy, running from behind for reelection...";
function removeWord(str)
{
    return str.replace(/^[^—]+—[\s]*/, "");
}
alert(removeWord(s));
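
One caveat, sketched below: this pattern has no length limit and no check on what follows the dash, so it also strips the start of a sentence that has no dateline but contains a dash later on. The second test string is the one suggested earlier in this thread; console.log is used here instead of alert so the snippet also runs outside a browser.

```javascript
function removeWord(str) {
    return str.replace(/^[^—]+—[\s]*/, "");
}

// Works for a plain dateline.
console.log(removeWord("PARIS — President Nicolas Sarkozy, running from behind for reelection..."));
// "President Nicolas Sarkozy, running from behind for reelection..."

// No dateline, but a dash mid-sentence: everything up to the first dash is lost.
console.log(removeWord("President Nicolas Sarkozy — running from behind for reelection — came to Paris today..."));
// "running from behind for reelection — came to Paris today..."
```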

PHP

$x = "PARIS — President Nicolas Sarkozy, running from behind for reelection...";
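$pos = strpos($x, "—");
// Skip past the dash (multi-byte in UTF-8) and any whitespace after it; leave strings without a dash unchanged.
$var = ($pos === false) ? $x : ltrim(substr($x, $pos + strlen("—")));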

In the most basic example:

var str = "PARIS - President Nicolas Sarkozy, running from behind for reelection.";
alert(str.split('-')[1]); // outputs: " President Nicolas Sarkozy, running from behind for reelection." (note the leading space)

Depending on your actual document structure, there may be ways to loop through the content to speed up this type of operation.
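
As one possible illustration of looping through content, assume the paragraphs are already available as an array of strings. The paragraphs array and the DATELINE name below are hypothetical, and the regex is the dateline pattern from earlier in this thread rather than the split approach. Each entry could then be cleaned with Array.prototype.map:

```javascript
// Hypothetical input: one string per paragraph.
var paragraphs = [
    "PARIS — President Nicolas Sarkozy, running from behind for reelection...",
    "GAZA CITY —Cross-border fighting between Gaza and Israel...",
    "CARURU, Colombia — Quite suddenly, the endless green of Amazonian forest...",
    "A year after an earthquake and tsunami devastated Japan's northeastern coast..."
];

// Define the pattern once and reuse it for every paragraph.
var DATELINE = /^[^—]{3,75}—\s*(?=[A-Z])/;

// Paragraphs without a dateline pass through unchanged.
var cleaned = paragraphs.map(function (p) {
    return p.replace(DATELINE, "");
});

console.log(cleaned.join("\n"));
```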

