Extract a substring between two words from a string

Question

I have the following string:

string = "asflkjsdhlkjsdhglk<body>Iwant\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"

I would like to extract the string between the two <body> tags. The result I am looking for is:

substring = "<body>Iwant\to+extr@ctth!sstr|ng<body>"

Note that the substring between the two <body> tags can contain letters, numbers, punctuation and special characters.

Is there an easy way of doing this? Thank you!

Answer 1

Here is the regular-expression approach:

regmatches(string, regexpr('<body>.+<body>', string))

Answer 2

regex = '<body>.+?<body>'

You want the non-greedy quantifier ( .+? ), so that the match stops at the first closing <body> tag instead of stretching across as many <body> tags as possible.
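To make the difference concrete, here is a small sketch in R comparing the greedy and non-greedy patterns. The input string `s` is a made-up example with two tagged segments (the string in the question has only one, so both patterns would return the same result there), and `perl = TRUE` is an assumption chosen so the quantifier semantics are those of PCRE.

```r
# Hypothetical input with two tagged segments (not the string from the question).
s <- "aa<body>first<body>bb<body>second<body>cc"

# Greedy: .+ extends to the last <body> in the string.
greedy <- regmatches(s, regexpr("<body>.+<body>", s, perl = TRUE))

# Non-greedy: .+? stops at the first <body> that follows the opening one.
lazy <- regmatches(s, regexpr("<body>.+?<body>", s, perl = TRUE))

print(greedy)
print(lazy)
```

The greedy result spans both segments, while the non-greedy result contains only the first one.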

If you're solely using a regex with no auxiliary functions, you're going to need a capturing group to extract what is required, i.e.:

regex = '(<body>.+?<body>)'
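In R, one way to read a capturing group back out is `regexec()` together with `regmatches()`. The following is only a sketch: the input `s` is invented, the group is placed around the inner text rather than the whole match to show both results, and the `perl = TRUE` argument to `regexec()` assumes R 3.3.0 or later.

```r
# Hypothetical input (not the string from the question).
s <- "xx<body>inner text<body>yy"

# regexec() reports the full match plus each capturing group;
# regmatches() turns those positions into substrings.
m <- regmatches(s, regexec("<body>(.+?)<body>", s, perl = TRUE))[[1]]

with_tags <- m[1]  # the full match, tags included
inner     <- m[2]  # capturing group 1, tags excluded

print(with_tags)
print(inner)
```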

Answer 3

strsplit() should help you:

> string = "asflkjsdhlkjsdhglk<body>Iwant\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"
> x = strsplit(string, '<body>', fixed = FALSE, perl = FALSE, useBytes = FALSE)
> x
[[1]]
[1] "asflkjsdhlkjsdhglk"         "Iwant\to+extr@ctth!sstr|ng" "sdgdfsghsghsgh"
> x[[1]][2]
[1] "Iwant\to+extr@ctth!sstr|ng"

Of course, this gives you all three parts of the string and does not include the tag.
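If the tags are needed in the result, they can be pasted back on after splitting. A minimal sketch, assuming the tag appears exactly twice and using a made-up input `s`; `fixed = TRUE` is a choice made here so that the delimiter is treated as a literal string rather than a regex.

```r
# Hypothetical input (not the string from the question).
s <- "xx<body>inner text<body>yy"

# Split on the literal tag; the second piece is the text between the tags.
parts <- strsplit(s, "<body>", fixed = TRUE)[[1]]
inner <- parts[2]

# Re-attach the tags to get the form asked for in the question.
with_tags <- paste0("<body>", inner, "<body>")

print(with_tags)
```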

Answer 4

I believe that Matthew's and Steve's answers are both acceptable. Here is another solution:

string = "asflkjsdhlkjsdhglk<body>Iwant\\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"
regmatches(string, regexpr('<body>.+<body>', string))
output = sub(".*(<body>.+<body>).*", "\\1", string)
print(output)
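One practical difference between the two calls is what happens when the tags are missing. The sketch below uses a hypothetical input `s_none` without any `<body>` tag: `regmatches()` returns an empty character vector, whereas `sub()` returns the input unchanged, so a `sub()` result may need an extra check.

```r
# Hypothetical input that contains no <body> tags.
s_none <- "no tags here"

# regmatches() drops non-matches, so the result has length zero.
via_regmatches <- regmatches(s_none, regexpr("<body>.+<body>", s_none))

# sub() leaves the string untouched when the pattern does not match.
via_sub <- sub(".*(<body>.+<body>).*", "\\1", s_none)

print(length(via_regmatches))
print(via_sub)
```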

4 answers

solution1
7 2013-11-26 18:08:39

solution2
6 2013-11-26 18:16:10

solution3
2 2013-11-26 18:05:44

solution4
0 2017-08-27 07:00:19
