
Regular expression not matching

I'm trying to write a small program that extracts information from a website. I only want the text that lies between two strings, "ORIGIN" and "//". I'm not getting any errors in the code, but for some reason nothing is printed to the screen. Could someone point out what I'm doing wrong?

import java.io.IOException;
import java.io.PrintStream; 
import java.io.FileOutputStream;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.util.regex.*;


class main {
    public static void main(String[] args) throws IOException {

        Document doc = Jsoup.connect("http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=293762&db=nuccore&dopt=genbank&extrafeat=976&fmt_mask=0&retmode=html&withmarkup=on&log$=seqview&maxplex=3&maxdownloadsize=1000000").get();

        String text = doc.text();
        String pattern1 = "ORIGIN";  
        String pattern2 = "//";
        String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);

        Pattern pattern = Pattern.compile(regexString, Pattern.MULTILINE); 
        Matcher matcher = pattern.matcher(text);


        while (matcher.find()) {
            String textInBetween = matcher.group(1); 
        }

        Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group(1));
        }

    }
}

You need to use the DOTALL flag so that the dot also matches any newline characters between the two strings:

Pattern pattern = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + 
                            Pattern.quote(pattern2), Pattern.DOTALL);

You have to compile the patterns with the DOTALL modifier:

Pattern pattern = Pattern.compile(regexString, Pattern.MULTILINE | Pattern.DOTALL); 
Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2), Pattern.DOTALL);

This modifier allows the period (.) to match every character, including newlines. Without it, the dot matches every character except line terminators. Pattern.MULTILINE on its own does not help here: it only changes where ^ and $ match, not what the dot matches.
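To see the difference in isolation, here is a minimal sketch that runs the same pattern with no flags, with MULTILINE, and with DOTALL. The class name DotAllDemo, the helper between, and the sample text are made up for illustration; the sample is a stand-in for the page text, not real data from the site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {

    // Hypothetical helper: collects every non-greedy capture found between
    // the literal strings start and end, using the given Pattern flags.
    public static List<String> between(String text, String start, String end, int flags) {
        Pattern p = Pattern.compile(
                Pattern.quote(start) + "(.*?)" + Pattern.quote(end), flags);
        Matcher m = p.matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample with newlines between the two markers.
        String text = "LOCUS demo\nORIGIN\n 1 acgt\n 5 ttga\n//\n";

        // No flags: the dot stops at the first newline, so "//" is never reached.
        System.out.println("default:   " + between(text, "ORIGIN", "//", 0).size());

        // MULTILINE only affects ^ and $, so the result is the same.
        System.out.println("MULTILINE: " + between(text, "ORIGIN", "//", Pattern.MULTILINE).size());

        // DOTALL lets the dot cross line terminators, so the block is captured.
        System.out.println("DOTALL:    " + between(text, "ORIGIN", "//", Pattern.DOTALL).size());
    }
}
```

On this sample the first two calls find nothing and only the DOTALL call returns a match. The same effect can be had without a flag argument by prefixing the regex with the embedded flag (?s).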
