
How to get substring from string without split?

String str = "internet address : http://test.com Click this!";

I want to get "http://test.com", so I wrote this:

String[] split = str.split(" ");
for ( int i = 0 ; i < split.length ; i++ ) {
    if ( split[i].contains("http://") ) {
        return split[i];
    }
}

But I think this is inefficient. Is there an easier way to get it?

Assuming you always have the same format (some text : URL more text), this can work:

public static void main(String[] args) {
    String str = "internet address : http://test.com Click this!";
    // Drop everything before the URL.
    String first = str.substring(str.indexOf("http://"));
    // Cut at the first space after the URL (assumes more text follows it).
    String second = first.substring(0, first.indexOf(" "));
    System.out.println(second);
}

A regex, as suggested in another answer, is the better option.

Usually, this is done either with a regular expression or with indexOf and substring.

With a regular expression, this can be done like this:

    // This is using a VERY simplified regular expression
    String str = "internet address : http://test.com Click this!";
    Pattern pattern = Pattern.compile("https?://[\\w.]*");
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        System.out.println(matcher.group(0));
    }

You can read here why it's simplified: https://mathiasbynens.be/demo/url-regex - tl;dr: the problem with URLs is that so many different patterns are valid.

With split, there is a way that uses Java's URL class:

    String[] split = str.split(" ");

    for (String value : split) {
        try {
            URL uri = new URL(value);
            System.out.println(value);
        } catch (MalformedURLException e) {
            // no valid url
        }
    }

You can check how the URL class validates its input in the OpenJDK source.
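As a side note, the `URL(String)` constructor is deprecated as of Java 20. A minimal sketch of the same token-by-token idea going through `java.net.URI` instead is below; the class name `UriTokenScanner` and the helper `firstUrl` are hypothetical names chosen for illustration, not part of any library.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Optional;

final class UriTokenScanner {

    // Hypothetical helper: returns the first whitespace-separated token
    // that parses as an absolute URL with a protocol the JDK knows about.
    static Optional<String> firstUrl(String text) {
        for (String token : text.split("\\s+")) {
            try {
                // URI.create rejects syntactically invalid input;
                // toURL rejects relative URIs and unknown protocols.
                URL url = URI.create(token).toURL();
                return Optional.of(url.toString());
            } catch (IllegalArgumentException | MalformedURLException e) {
                // Not an absolute, well-formed URL; try the next token.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String str = "internet address : http://test.com Click this!";
        System.out.println(firstUrl(str).orElse("no URL found"));
    }
}
```

Note that this accepts any protocol the JDK has a handler for, so a token such as a `mailto:` address would also pass; add a check on `url.getProtocol()` if only http and https are wanted.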

My attempt with a regex:

String regex = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*)";
String str = "internet address : http://test.com Click this!";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
    System.out.println(matcher.group(0));
}

Result:

http://test.com

source: here

Find the http:// in the string, then look backwards and forwards for the nearest space:

int pos = str.indexOf("http://");
if (pos >= 0) {
  // Look backwards for space; start just past it (0 if there is none).
  int start = str.lastIndexOf(' ', pos) + 1;

  // Look forwards for space.
  int end = str.indexOf(' ', pos + "http://".length());
  if (end < 0) end = str.length();

  return str.substring(start, end);
}
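For completeness, here is a self-contained sketch of this approach as a method, covering the cases where the URL sits at the very start or end of the string or is missing altogether. The class name `UrlByIndex` and the method `extract` are hypothetical, and returning null for "not found" is just one possible convention.

```java
final class UrlByIndex {

    // Hypothetical helper: returns the space-delimited token that contains
    // "http://", or null when the string has no such token.
    static String extract(String str) {
        int pos = str.indexOf("http://");
        if (pos < 0) {
            return null;
        }
        // lastIndexOf returns -1 when no space precedes the URL, so +1 gives 0;
        // otherwise the token starts one character past that space.
        int start = str.lastIndexOf(' ', pos) + 1;
        // The token ends at the next space, or at the end of the string.
        int end = str.indexOf(' ', pos);
        if (end < 0) {
            end = str.length();
        }
        return str.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extract("internet address : http://test.com Click this!"));
        System.out.println(extract("http://test.com"));
        System.out.println(extract("no link"));
    }
}
```

No regex is compiled and no array is allocated, which makes this the lightest of the options when the URL is known to be delimited by spaces.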

It is not clear whether the structure of the input string is constant; however, I would do something like this:

    String str = "internet address : http://test.com Click this!";
    // get the index of the first letter of the URL (assumes one is present)
    int urlStart = str.indexOf("http://");
    System.out.println(urlStart);
    // get the index of the first space after the URL,
    // or the end of the string if nothing follows it
    int urlEnd = str.indexOf(" ", urlStart);
    if (urlEnd < 0) urlEnd = str.length();
    System.out.println(urlEnd);
    // get the substring of the URL
    String urlString = str.substring(urlStart, urlEnd);
    System.out.println(urlString);

I just put together a quick solution for this. It should work for you.

package Main.Kunal;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class URLOutOfString {

    public static void main(String[] args) {
        String str = "internet address : http://test.com Click this!, internet address : http://tes1t.com Click this!";
        List<String> result= new ArrayList<>();
        int counter = 0;
        final Pattern urlPattern = Pattern.compile(
                "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)"
                        + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*"
                        + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

        Matcher matcher = urlPattern.matcher(str);

        while (matcher.find()) {
            result.add(str.substring(matcher.start(1), matcher.end()));
            counter++;
        }

        System.out.println(result);

    }

}

This will find all URLs in your string and add them to an ArrayList. You can use the list however your business needs require.
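If Java 9 or later is available, collecting every match can also be written more compactly with `Matcher.results()`. The sketch below uses a deliberately loose pattern (a scheme followed by non-space characters), which is an assumption made for brevity and not the pattern from this answer; the class name `AllUrls` and the method `findAll` are hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class AllUrls {

    // Loose pattern, assumed for this sketch: "http://" or "https://"
    // followed by one or more non-whitespace characters.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://\\S+");

    // Hypothetical helper: returns every match in order of appearance.
    static List<String> findAll(String text) {
        return URL_PATTERN.matcher(text).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "internet address : http://test.com Click this!, "
                + "internet address : http://tes1t.com Click this!";
        System.out.println(findAll(str));
    }
}
```

Because the pattern stops only at whitespace, punctuation that directly follows a URL (for example a trailing comma) would be included in the match, so trim it or tighten the pattern if the input can contain that.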

You could use a regex for it:

String str = "internet address : http://test.com Click this!";
Pattern pattern = Pattern.compile("((http|https)\\S*)");
Matcher matcher = pattern.matcher(str);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}
