How to find a string which is case-sensitive and ignore it in Java

I have a text file (T1.txt) containing a few strings. Two of them are the same except for letter case. I have to ignore one of those two and return the rest.

e.g. ABCD, XYZ, pqrs, aBCd.

I am using a Set to return the strings, but how can I ignore the duplicate and return only one of the two strings (either ABCD or aBCd)?

public static Set findDuplicates(File inputFile)
{
    FileInputStream fis = null;
    BufferedInputStream bis = null;
    DataInputStream dis = null;
    Set<String> set = new HashSet<String>();
    ArrayList<String> inpArrayList = new ArrayList<String>();

    try {
        fis = new FileInputStream(inputFile);
        bis = new BufferedInputStream(fis);
        dis = new DataInputStream(bis);

        while (dis.available() != 0)
        {
            inpArrayList.add(dis.readLine());
        }

        for (int i = 0; i < inpArrayList.size(); i++)
        {
            if (!set.contains(inpArrayList.get(i)))
                set.add(inpArrayList.get(i));
        }
    }
    catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println(" set" + set);
    return set;
}

The returned set should contain only XYZ, pqrs, and either aBCd or ABCD, but not both.

Thanks, Ramm

Create a HashMap, using currentString.toLowerCase() as the key and the original string as the value. Two strings that differ only in case will then have the same key. Because the stored value is the original string, printing the values gives you one of the original spellings rather than an all-lower-case version.
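
A minimal sketch of that idea, under a few assumptions that are not in the answer: the class and method names are made up for illustration, a LinkedHashMap is swapped in for a plain HashMap so the output order is predictable, and Locale.ROOT is passed to toLowerCase() so the result does not depend on the default locale.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; the name is illustrative only.
class LowerCaseKeyDedup {

    // Returns one original spelling per case-insensitive key (the first one seen).
    static Collection<String> dedupe(List<String> input) {
        // LinkedHashMap is an assumption here: it keeps insertion order.
        Map<String, String> byLowerCase = new LinkedHashMap<>();
        for (String s : input) {
            // Key: lower-cased form. Value: the original string.
            byLowerCase.putIfAbsent(s.toLowerCase(Locale.ROOT), s);
        }
        return byLowerCase.values();
    }

    public static void main(String[] args) {
        // Prints [ABCD, XYZ, pqrs]
        System.out.println(dedupe(Arrays.asList("ABCD", "XYZ", "pqrs", "aBCd")));
    }
}
```

With put() instead of putIfAbsent(), the last spelling seen would win instead of the first; either satisfies the question, which accepts ABCD or aBCd.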

You could use a TreeSet with the String.CASE_INSENSITIVE_ORDER comparator, which I find more elegant than the suggested HashMap solutions:

Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
set.add("abc");
set.add("AbC");
set.add("aBc");
set.add("DEF");
System.out.println(set); // => "[abc, DEF]"

Note that iterating through this set gives you the elements in case-insensitive lexicographical order. If you want to preserve the insertion order as well, I'd maintain a List on the side like this:

Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
List<String> inOrder = new ArrayList<String>();
// when adding stuff inside your loop:
if (set.add(someString)) { // returns true if it was added to the set
    inOrder.add(someString);
}
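
Putting the TreeSet and the side List together with the file-reading loop from the question might look like the sketch below. The class name, the method name uniqueLines, and the use of a BufferedReader in place of the question's DataInputStream are all assumptions made for illustration; a StringReader stands in for the real file so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class; names are illustrative only.
class CaseInsensitiveLines {

    // Reads lines and keeps the first spelling of each case-insensitive duplicate,
    // in the order the lines were read.
    static List<String> uniqueLines(Reader source) throws IOException {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        List<String> inOrder = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // add() returns false if an equal-ignoring-case string is already present
                if (seen.add(line)) {
                    inOrder.add(line);
                }
            }
        }
        return inOrder;
    }

    public static void main(String[] args) throws IOException {
        // For a real file, pass new java.io.FileReader("T1.txt") instead.
        // Prints [ABCD, XYZ, pqrs]
        System.out.println(uniqueLines(new StringReader("ABCD\nXYZ\npqrs\naBCd\n")));
    }
}
```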

inpArrayList.add(dis.readLine().toLowerCase());

Adding this line (in place of the original inpArrayList.add call) should work.

You can use the old trick of calling .toLowerCase() on each string before putting it in the set.

And if you want to keep the original case, change to a HashMap that maps the lower-case form to the original form, then iterate over the values.

Convert every string to lowercase before inserting it into the set, and the set will take care of uniqueness for you.

(If you also need to preserve the case of the input, i.e. returning abcd for AbCd is not acceptable, then you need a second set that stores the lower-case variants, and you check that second set to decide whether or not to add each string to the result set. Same principle, but one more step to program.)
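
The two-set variant could be sketched roughly as follows. The class and method names are hypothetical, and LinkedHashSet and Locale.ROOT are choices made here for predictable output rather than anything the answer specifies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical class; names are illustrative only.
class SeenSetDedup {

    static Set<String> dedupe(List<String> input) {
        // Second set: holds only the lower-case variants, used purely for the check.
        Set<String> seenLowerCase = new HashSet<>();
        // Result set: holds the strings with their original case.
        Set<String> result = new LinkedHashSet<>();
        for (String s : input) {
            // add() returns true only the first time a lower-case form is seen
            if (seenLowerCase.add(s.toLowerCase(Locale.ROOT))) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [ABCD, XYZ, pqrs]
        System.out.println(dedupe(Arrays.asList("ABCD", "XYZ", "pqrs", "aBCd")));
    }
}
```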

Just store your strings in upper case in your set, before storing them in your ArrayList result.

If you can't add a string to the set (because it already exists), don't store it in the ArrayList.

As said above, I did something similar earlier this week. You can do something like this (just adjust it to your code):

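// Maps the lower-case form of each token to the first original spelling seen.
HashMap<String, String> set = new HashMap<String, String>();

// tokenizer is assumed to be a java.util.StringTokenizer over the input text
while (tokenizer.hasMoreTokens())
{
    String element = tokenizer.nextToken();
    String lowerCaseElement = element.toLowerCase();
    // Check the lower-case key, so that "ABCD" and "aBCd" collide
    if (!set.containsKey(lowerCaseElement))
    {
        set.put(lowerCaseElement, element);
    }
}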

At the end, the values of the map 'set' will contain what you need.

How about using a HashMap, with the key generated by your own hash function? The hash function would return the string in lowercase.

Shash

If the case of the output is not important, you could use a custom FilterInputStream to do the conversion.

    bis = new BufferedInputStream(fis);
    fltis = new LowerCaseInputStream(bis);
    dis = new DataInputStream(fltis);

An example of LowerCaseInputStream comes from here.
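
Since the linked example is not reproduced here, the following is only a sketch of what such a filter might look like, not the linked implementation. It assumes single-byte, ASCII-compatible input and lower-cases only the bytes for A to Z; the demo class and its readAllLowerCased helper are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class LowerCaseInputStreamDemo {

    // Reads the whole text through the filter and returns what came out.
    static String readAllLowerCased(String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new LowerCaseInputStream(new ByteArrayInputStream(bytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        // Prints abcd xyz pqrs abcd
        System.out.println(readAllLowerCased("ABCD XYZ pqrs aBCd"));
    }
}

// Sketch of a lower-casing filter; assumes ASCII-compatible single-byte input.
class LowerCaseInputStream extends FilterInputStream {

    LowerCaseInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == -1 ? c : lower(c);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            b[i] = (byte) lower(b[i]);
        }
        return n;
    }

    // Maps 'A'..'Z' to 'a'..'z' and leaves every other value unchanged.
    private static int lower(int c) {
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
    }
}
```

Because every string is lower-cased before it reaches the set, this approach returns abcd rather than ABCD or aBCd, which is why it only fits when the case of the output does not matter.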
