
JavaScript regex: get the word at a particular line and character number

Given a chunk of text (imagine a page from a book), how can I get the word at a particular line and character number?

For example, find and return the word at line 3, character 7: "just".

var text = "Lorem ispum dolar\n\
Si emit I dont know latin\n\
Really just making this up as I go\n\
Ok this should be enough for us to work on.\n";

JSFiddle to try code on: http://jsfiddle.net/xa9xS/709/

You can use something like (?:.*\n){2}.{6}\s+(\w+), which skips 2 lines and 6 characters and then captures the next word, i.e. the word at line 2+1, character 6+1.

Edit: Figured I'd make it a bit more robust. The above fails to match anything if you provide a character index in the middle of a word. The following will skip ahead until the start of the next word before it starts capturing: (?:.*\n){2}.{6}.*?\b(\w+)\b

PS: At the time of writing, regex in JavaScript didn't support lookbehind (it was only added later, in ES2018), so skipping back to the start of the word is quite a bit trickier.

Edit2: Making string.replace work requires us to capture the other parts of the string. This seems to do the trick: text.replace(/((?:.*\n){2}(?:.{6}.*?))\b(\w+)\b((?:.*\n?)*)/g, "$1[the-replacement]$3"), but it does complicate things. It might be better to use the more direct approach in this case. Simplicity is king!
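
As a rough sketch of the replace idea: since String.prototype.replace leaves everything outside the match untouched, it may be enough to capture only the prefix and drop the trailing group and the g flag. The helper name replaceWordAt and its 1-based parameters are assumptions for illustration, and the ^ anchor is an addition so that the line count always starts at the top of the text.

```javascript
// Hypothetical helper (not from the original answer): replaces the first whole
// word found at or after a 1-based line and character number.
function replaceWordAt(text, lineNumber, charNumber, replacement) {
  var pattern = new RegExp(
    "^((?:.*\\n){" + (lineNumber - 1) + "}" + // skip the preceding lines
    ".{" + (charNumber - 1) + "}.*?)" +        // skip the preceding chars, then advance lazily
    "\\b\\w+\\b"                               // the word that gets replaced
  );
  // A function replacement keeps "$" in `replacement` from being read as a group reference.
  return text.replace(pattern, function (whole, prefix) {
    return prefix + replacement;
  });
}

var sample = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";

var replaced = replaceWordAt(sample, 3, 7, "[the-replacement]");
console.log(replaced.split("\n")[2]); // Really [the-replacement] making this up as I go
```

If the requested line does not exist, the pattern does not match and the text comes back unchanged.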

window.example_text = "Lorem ispum dolar\n\
Si emit I dont know latin\n\
Really just making this up as I go\n\
Ok this should be enough for us to work on.\n";

var lineNumber = 3;
var charNumber = 7;

var match = (example_text.split("\n")[lineNumber - 1]).substr(charNumber).split(/\s/)[0];
console.log(match);

http://jsfiddle.net/2DFhM/1/
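
The direct approach also makes it fairly easy to handle a character number that lands in the middle of a word, which is the awkward case for the regex. Below is a minimal sketch, assuming 1-based line and character numbers and whitespace-delimited words; the function name wordAt is made up for this example.

```javascript
// Hypothetical helper: returns the whitespace-delimited word at a 1-based
// line and character number, or null if the position does not exist.
function wordAt(text, lineNumber, charNumber) {
  var line = text.split("\n")[lineNumber - 1];
  if (line === undefined) return null;

  var index = charNumber - 1; // convert to a 0-based index
  if (index < 0 || index >= line.length) return null;

  // If the position is on whitespace, move forward to the next word.
  while (index < line.length && /\s/.test(line[index])) index++;
  if (index === line.length) return null;

  // Expand in both directions until whitespace or the line edge.
  var start = index, end = index;
  while (start > 0 && !/\s/.test(line[start - 1])) start--;
  while (end < line.length && !/\s/.test(line[end])) end++;
  return line.slice(start, end);
}

var sample = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";

console.log(wordAt(sample, 3, 7)); // just  (character 7 is the space before it)
console.log(wordAt(sample, 3, 9)); // just  (character 9 is the "u" inside it)
```

Note that punctuation stays attached to the word here (the last word of line 4 comes back as "on."), since only whitespace is treated as a separator.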

Use this regex:

^(?:.*(?:\r?\n)*){2}.{6}\W+(\w+)

Explanation

  • The ^ anchor asserts that we are at the beginning of the string
  • To get to line 3, we need to skip two lines
  • Our line skipper is (?:.*(?:\r?\n)*){2}, matching any chars that are not line breaks, then line breaks
  • .{6} eats up the first six chars
  • There is no word starting at character 7, so we are going to match the next word:
  • \W+ matches any non-word chars
  • (\w+) captures word chars to Group 1
  • We retrieve the match from Group 1

In JS:

var myregex = /^(?:.*[\r\n]*){2}.{6}\W+(\w+)/;
var matchArray = myregex.exec(yourString); // yourString is the text to search
var thematch;
if (matchArray != null) {
    thematch = matchArray[1]; // Group 1 holds the captured word
} else {
    thematch = "";
}
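
To use the same idea for an arbitrary line and character, the two counts can be spliced into the pattern with new RegExp. This is only a sketch: the name nextWordAfter is hypothetical, and it deliberately requires exactly one line break per skipped line (\r?\n instead of an optional run of breaks). That is an assumption made here so that blank lines count as lines and backtracking cannot slide into an earlier line.

```javascript
// Hypothetical helper: builds the pattern for a 1-based line and character
// number and returns the next word after that position ("" if none).
function nextWordAfter(text, lineNumber, charNumber) {
  var pattern = new RegExp(
    "^(?:.*\\r?\\n){" + (lineNumber - 1) + "}" + // skip the preceding lines
    ".{" + (charNumber - 1) + "}" +               // eat the preceding chars
    "\\W+(\\w+)"                                  // non-word chars, then the word in Group 1
  );
  var matchArray = pattern.exec(text);
  return matchArray ? matchArray[1] : "";
}

var sample = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";

console.log(nextWordAfter(sample, 3, 7)); // just
```

As in the explanation above, the position has to sit on a non-word character; a position inside a word yields no match (an empty string here).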

Probably too late now lol, lots of good answers but here goes for the sake of completeness:

I made this regexp here: http://regex101.com/r/nF2vX8/1

(?:.*\n.*){2}^(?:.{7})(\w*\W)

and here's a solution in javascript:

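var line_number = 3, char_number = 7; // inputs from the question
var index_left = 0, index_right = 0, stringy = "";
// walk forward line by line; afterwards index_left and index_right bracket the target line
for (; line_number-- > 0;){
    index_left = index_right;
    index_right = example_text.indexOf("\n", index_right) + 1;
}

stringy = example_text.substring(index_left, index_right-1);

// cut the line at the first space after the character, then keep what follows the last space
index_left = stringy.indexOf(" ", char_number+1);
stringy = stringy.substring(0, index_left);
index_left = stringy.lastIndexOf(" ", index_left);
stringy = stringy.substring(index_left+1);

console.log(stringy);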

and the fiddle for the js: http://jsfiddle.net/xa9xS/714/

It mangles line_number, but that's easy to fix by copying the value first, and I'm too bored to do it now :P
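
For what it's worth, here is one way the same indexOf walk might look as a function that leaves its arguments alone. It is a sketch, not the fiddle's code: the name wordByIndexOf is made up, the line number is taken as 1-based and the character number is used the same way as char_number above, and the handling of a missing line or a word with no space after it is an added assumption.

```javascript
// Hypothetical helper: same indexOf/lastIndexOf idea, without mutating inputs.
function wordByIndexOf(text, lineNumber, charNumber) {
  // Find where the requested line starts.
  var lineStart = 0;
  for (var remaining = lineNumber - 1; remaining > 0; remaining--) {
    var newline = text.indexOf("\n", lineStart);
    if (newline === -1) return null; // fewer lines than requested
    lineStart = newline + 1;
  }
  var lineEnd = text.indexOf("\n", lineStart);
  if (lineEnd === -1) lineEnd = text.length;
  var line = text.substring(lineStart, lineEnd);

  // The word ends at the first space after the position (or at the line end).
  var wordEnd = line.indexOf(" ", charNumber + 1);
  if (wordEnd === -1) wordEnd = line.length;
  // It starts right after the last space before that point.
  var wordStart = line.lastIndexOf(" ", wordEnd - 1) + 1;
  return line.substring(wordStart, wordEnd);
}

var sample = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";

console.log(wordByIndexOf(sample, 3, 7)); // just
```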
