
Java String - See if a string contains only numbers and characters, not words?

I have an array of strings that I load throughout my application, and it contains different words. I have a simple if statement to see whether a string contains letters or numbers but not words.

I mean I only want words like AB2CD5X, and I want to remove everything else, such as Hello 3, 3 word, or any other token that is an English word. Is it possible to keep only alphanumeric words and exclude those that contain a real English word?

I know how to check whether a string contains alphanumeric characters:

Pattern p = Pattern.compile("[\\p{Alnum},.']*");

I also know:

 // String.contains() matches literal text, not a regex, so a regex test needs matches()
 if (string.matches(".*[a-zA-Z].*") || string.matches(".*[0-9].*"))
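To see why the Pattern above is not enough on its own, here is a small sketch (the class and method names are illustrative only). It reuses the exact pattern from the question with Matcher.matches(), which requires the whole string to fit. The pattern only tests which characters appear, so it rejects "Hello 3" because of the space, but it still accepts a plain English word such as "Hello".

```java
import java.util.regex.Pattern;

public class AlnumCheck {
    // Pattern from the question: zero or more letters, digits, commas, periods or apostrophes
    private static final Pattern ALNUM = Pattern.compile("[\\p{Alnum},.']*");

    // matcher(...).matches() requires the WHOLE string to fit the pattern
    static boolean isAlnum(String s) {
        return ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAlnum("AB2CD5X")); // true
        System.out.println(isAlnum("Hello 3")); // false: the space is not in the character class
        System.out.println(isAlnum("Hello"));   // true: a plain English word also passes
    }
}
```

Character-class checks alone cannot tell AB2CD5X apart from a real word, which is why the answers below bring in a dictionary or extra conditions.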

What you need is a dictionary of English words. Then you scan your input and check whether each token exists in your dictionary. You can find text files of dictionary entries online, such as in the Jazzy spellchecker. You might also check "Dictionary text file".

Here is some sample code that assumes your dictionary is a plain UTF-8 text file with exactly one (lowercase) word per line:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

public class NonWordFilter {

    public static void main(String[] args) throws IOException {
        final Set<String> dictionary = loadDictionary();
        final String text = loadInput();
        final List<String> output = new ArrayList<>();
        // by default, Scanner splits on whitespace
        final Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            final String token = scanner.next().toLowerCase();
            // keep only tokens that are not in the dictionary
            if (!dictionary.contains(token)) output.add(token);
        }
        System.out.println(output);
    }

    private static String loadInput() {
        return "This is a 5gse5qs sample f5qzd fbswx test";
    }

    private static Set<String> loadDictionary() throws IOException {
        final Set<String> dictionaryWords = new HashSet<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("path_to_your_flat_dic_file"), StandardCharsets.UTF_8)) {
            String line;
            // one word per line; trim() ignores stray surrounding whitespace
            while ((line = reader.readLine()) != null) dictionaryWords.add(line.trim());
        }
        return dictionaryWords;
    }
}
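A dictionary lookup by itself would also keep tokens like "fbswx" that are not words but contain no digit. A minimal sketch of combining it with the letters-and-digits condition from the question could look like this. It assumes an in-memory Set stands in for the dictionary file, and the class name and tiny word list are hypothetical. Set.of requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;

public class CodeTokenFilter {

    // Keeps tokens that are NOT dictionary words and contain at least one letter and one digit.
    static List<String> extractCodes(String text, Set<String> dictionary) {
        List<String> codes = new ArrayList<>();
        Scanner scanner = new Scanner(text); // splits on whitespace by default
        while (scanner.hasNext()) {
            String token = scanner.next();
            boolean isWord = dictionary.contains(token.toLowerCase(Locale.ROOT));
            boolean hasLetter = token.matches(".*\\p{Alpha}.*");
            boolean hasDigit = token.matches(".*\\d.*");
            if (!isWord && hasLetter && hasDigit) codes.add(token); // original case is kept
        }
        scanner.close();
        return codes;
    }

    public static void main(String[] args) {
        // Tiny in-memory stand-in for a real dictionary file (hypothetical contents)
        Set<String> dictionary = Set.of("this", "is", "a", "sample", "test", "hello", "word");
        System.out.println(extractCodes("This is a 5gse5qs sample f5qzd fbswx test", dictionary));
        // prints [5gse5qs, f5qzd]
    }
}
```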

If you need more accurate results, you need to extract the stems of your words. See Apache's Lucene and EnglishStemmer.
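The reason stems matter is that a flat word list may contain "test" but not "tests" or "testing". The sketch below illustrates the idea with a deliberately crude suffix-stripping check. It is NOT Lucene's EnglishStemmer or its API. The suffix list and class name are hypothetical, and a real stemmer handles far more cases.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NaiveStemLookup {

    // Crude, hypothetical suffix list; a real stemmer (e.g. Lucene's EnglishStemmer) does much more
    private static final List<String> SUFFIXES = List.of("ing", "ed", "es", "s");

    // True if the token, or the token with one common suffix removed, is in the dictionary
    static boolean isDictionaryWord(String token, Set<String> dictionary) {
        String lower = token.toLowerCase(Locale.ROOT);
        if (dictionary.contains(lower)) return true;
        for (String suffix : SUFFIXES) {
            // require a stem of at least 3 characters to avoid silly matches
            if (lower.length() > suffix.length() + 2 && lower.endsWith(suffix)) {
                String stem = lower.substring(0, lower.length() - suffix.length());
                if (dictionary.contains(stem)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("test", "word", "load");
        System.out.println(isDictionaryWord("Tests", dictionary));   // true ("tests" -> "test")
        System.out.println(isDictionaryWord("loading", dictionary)); // true ("loading" -> "load")
        System.out.println(isDictionaryWord("f5qzd", dictionary));   // false
    }
}
```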

You can use Cambridge Dictionaries to check whether a token is a real (human) word. If you find a valid dictionary word, you can skip it.

As the documentation says, to use the library you need to initialize a request handler and an API object:

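// Apache HttpClient 4.x classes; DefaultHttpClient and ThreadSafeClientConnManager are deprecated in later 4.x releases
DefaultHttpClient httpClient = new DefaultHttpClient(new ThreadSafeClientConnManager());
// baseUrl and accessKey are placeholders: supply your own API endpoint and access key
SkPublishAPI api = new SkPublishAPI(baseUrl + "/api/v1", accessKey, httpClient);
api.setRequestHandler(new SkPublishAPI.RequestHandler() {
    // called before each GET request: log the URI and ask for JSON responses
    public void prepareGetRequest(HttpGet request) {
        System.out.println(request.getURI());
        request.setHeader("Accept", "application/json");
    }
});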

To use the "api" object:

      try {
          System.out.println("*** Dictionaries");
          JSONArray dictionaries = new JSONArray(api.getDictionaries());
          System.out.println(dictionaries);

          JSONObject dict = dictionaries.getJSONObject(0);
          System.out.println(dict);
          String dictCode = dict.getString("dictionaryCode");

          System.out.println("*** Search");
          System.out.println("*** Result list");
          JSONObject results = new JSONObject(api.search(dictCode, "ca", 1, 1));
          System.out.println(results);
          System.out.println("*** Spell checking");
          JSONObject spellResults = new JSONObject(api.didYouMean(dictCode, "dorg", 3));
          System.out.println(spellResults);
          System.out.println("*** Best matching");
          JSONObject bestMatch = new JSONObject(api.searchFirst(dictCode, "ca", "html"));
          System.out.println(bestMatch);

          System.out.println("*** Nearby Entries");
          JSONObject nearbyEntries = new JSONObject(api.getNearbyEntries(dictCode,
                  bestMatch.getString("entryId"), 3));
          System.out.println(nearbyEntries);
      } catch (Exception e) {
          e.printStackTrace();
      }

ANTLR might help you. ANTLR stands for ANother Tool for Language Recognition.

Hibernate uses ANTLR to parse its query language, HQL (with keywords such as SELECT and FROM).

// String.contains() matches literal text, not a regex, so a regex test needs matches()
if (string.matches(".*[a-zA-Z].*") || string.matches(".*[0-9].*"))

I think this is a good starting point, but since you're looking for strings that contain both letters and numbers, you might want:

if (string.matches(".*[a-zA-Z].*") && string.matches(".*[0-9].*"))

I guess you might also want to check whether there are spaces, right? Spaces could indicate separate words or a sequence like 3 word. So in the end you could use:

if (string.matches(".*[a-zA-Z].*") && string.matches(".*[0-9].*") && !string.contains(" "))

Hope this helps.
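The three conditions in this answer (has a letter, has a digit, no space) can also be folded into a single regex with lookaheads. The following is a sketch, with an illustrative class name. It is slightly stricter than the combined if statement, because the body of the pattern allows only ASCII letters and digits, so punctuation is rejected too. Note that it still accepts a real word glued to a digit, such as "Hello3".

```java
import java.util.regex.Pattern;

public class MixedTokenCheck {

    // Two lookaheads demand at least one letter and one digit; the body allows only
    // ASCII letters and digits, so spaces (and punctuation) are rejected as well.
    private static final Pattern MIXED =
            Pattern.compile("(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+");

    static boolean isMixedCode(String s) {
        return MIXED.matcher(s).matches(); // matches() anchors to the whole string
    }

    public static void main(String[] args) {
        System.out.println(isMixedCode("AB2CD5X")); // true
        System.out.println(isMixedCode("Hello 3")); // false: contains a space
        System.out.println(isMixedCode("3 word"));  // false: contains a space
        System.out.println(isMixedCode("Hello"));   // false: no digit
        System.out.println(isMixedCode("Hello3"));  // true: a word glued to a digit still passes
    }
}
```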

You may try this:

First, tokenize the string using StringTokenizer with the default delimiters. For each token, if it contains only digits or only letters, discard it; the remaining tokens are the words that contain a combination of digits and letters. You can use regular expressions to identify tokens made up of only digits or only letters.
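A minimal sketch of the steps above might look like the following (class and method names are illustrative). StringTokenizer's default delimiters are space, tab, newline, carriage return and form feed. Anything that is neither all digits nor all letters is kept, so a token with punctuation such as "hello," would also survive. StringTokenizer is a legacy class; String.split("\\s+") is the usual modern alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerFilter {

    static List<String> mixedTokens(String text) {
        List<String> kept = new ArrayList<>();
        // Default delimiters: space, tab, newline, carriage return, form feed
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            boolean onlyDigits = token.matches("[0-9]+");
            boolean onlyLetters = token.matches("[a-zA-Z]+");
            // Discard pure-number and pure-letter tokens; keep everything else
            if (!onlyDigits && !onlyLetters) kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(mixedTokens("Hello 3 AB2CD5X word 3 f5qzd"));
        // prints [AB2CD5X, f5qzd]
    }
}
```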
