
String matching with maximum number of occurrences

I have the long string below, and there are about 1000 lines like it in a text file. I want to calculate how often each date occurs in that file. Any idea how I can do that?

{ "interaction":{"author":{"id":"53914918","link":"http:\\/\\/twitter.com\\/53914918","name":"ITTIA","username":"s8c"},"content":"RT @fubarista: After thousands of years of wars I am not an optimist about peace. The US economy is totally reliant on war. It is the on ...","created_at":"Sun, 10 Jul 2011 08:22:16 +0100","id":"1e0aac556a44a400e07497f48f024000","link":"http:\\/\\/twitter.com\\/s8c\\/statuses\\/89957594197803008","schema":{"version":2},"source":"oauth:258901","type":"twitter","tags":["attretail"]},"language":{"confidence":100,"tag":"en"},"salience":{"content":{"sentiment":4}},"twitter":{"created_at":"Sun, 10 Jul 2011 08:22:16 +0100","id":"89957594197803008","mentions":["fubarista"],"source":"oauth:258901","text":"RT @fubarista: After thousands of years of wars I am not an optimist about peace. The US economy is totally reliant on war. It is the on ...","user":{"created_at":"Mon, 05 Jan 2009 14:01:11 +0000","geo_enabled":false,"id":53914918,"id_str":"53914918","lang":"en","location":"Mouth of the abyss","name":"ITTIA","screen_name":"s8c","time_zone":"London","url":"https:\\/\\/thepiratebay.se"}}} "interaction":{"author":{"id":"53914918","link":"http:\\/\\/twitter.com\\/53914918","name":"ITTIA","username":"s8c"},"content":"RT @fubarista: After thousands of years of wars I am not an optimist about peace. The US economy is totally reliant on war. It is the on ...","created_at":"Sun, 10 Jul 2011 08:22:16 +0100","id":"1e0aac556a44a400e07497f48f024000","link":"http:\\/\\/twitter.com\\/s8c\\/statuses\\/89957594197803008","schema":{"version":2},"source":"oauth:258901","type":"twitter","tags":["attretail"]},"language":{"confidence":100,"tag":"en"},"salience":{"content":{"sentiment":4}},"twitter":{"created_at":"Sun, 10 Jul 2011 08:22:16 +0100","id":"89957594197803008","mentions":["fubarista"],"source":"oauth:258901","text":"RT @fubarista: After thousands of years of wars I am not an optimist about peace. The US economy is totally reliant on war. 
It is the on ...","user":{"created_at":"Mon, 05 Jan 2009 14:01:11 +0000","geo_enabled":false,"id":53914918,"id_str":"53914918","lang":"en","location":"Mouth of the abyss","name":"ITTIA","screen_name":"s8c","time_zone":"London","url":"https:\\/\\/thepiratebay.se"}}}

Use the RandomAccessFile and BufferedReader classes to read the data in parts; you can then use string parsing to count the frequency of each date.
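The line-by-line idea can be sketched roughly as follows. This is only an illustration under a few assumptions: each record sits on its own line, the first "created_at" value on a line is the one of interest, and the date is the first 16 characters of that value (as in "Sun, 10 Jul 2011"). The class name LineDateCounter, the method countDates, and the sample records in main are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

public class LineDateCounter {
    private static final String KEY = "\"created_at\":\"";
    private static final int DATE_LENGTH = 16; // length of "Sun, 10 Jul 2011"

    // Reads one line at a time, so the whole file never has to sit in memory.
    public static Map<String, Integer> countDates(BufferedReader reader) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int start = line.indexOf(KEY); // first "created_at" on the line
                if (start < 0) continue;       // no date on this line
                start += KEY.length();
                int end = line.indexOf('"', start);
                if (end < 0 || end - start < DATE_LENGTH) continue; // malformed value
                String date = line.substring(start, start + DATE_LENGTH);
                counts.merge(date, 1, Integer::sum); // insert 1 or add 1
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample records; a real run would wrap a FileReader instead.
        String sample = "{\"created_at\":\"Sun, 10 Jul 2011 08:22:16 +0100\"}\n"
                + "{\"created_at\":\"Sun, 10 Jul 2011 09:15:00 +0100\"}\n"
                + "{\"created_at\":\"Mon, 11 Jul 2011 10:00:00 +0100\"}\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(countDates(reader));
        }
    }
}
```

For a real file, the StringReader would be swapped for a FileReader on the actual path. Plain substring parsing like this is brittle if the field order or formatting changes.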

Each date follows a stable pattern, such as \\d\\d (Jan|Feb|...) 20\\d\\d, so you can extract the dates with regular expressions (the Pattern class in Java). Then use a HashMap whose key is the date found, and increment the stored count each time that date appears. Sorry for not including code, but I hope that helps :)
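A minimal sketch of that regex-plus-HashMap idea might look like this. The class name RegexDateCounter and the method countDates are hypothetical, and the sample text in main reuses two "created_at" values from the question. Note that this counts every date found in the text, including the user's account creation date, so a stricter pattern or a JSON parser may be needed to count only tweet dates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDateCounter {
    // Two-digit day, abbreviated month, year 20xx, e.g. "10 Jul 2011".
    private static final Pattern DATE = Pattern.compile(
            "\\d\\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) 20\\d\\d");

    public static Map<String, Integer> countDates(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            // Key is the matched date; merge inserts 1 or adds 1 to the old count.
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "\"created_at\":\"Sun, 10 Jul 2011 08:22:16 +0100\" "
                + "\"created_at\":\"Sun, 10 Jul 2011 08:22:16 +0100\" "
                + "\"created_at\":\"Mon, 05 Jan 2009 14:01:11 +0000\"";
        countDates(text).forEach((date, n) -> System.out.println(date + " -> " + n));
    }
}
```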

I think it's a JSON string, so you should parse it instead of pattern matching. See this example HERE.

Copy the required string into test.txt and place it on the C: drive. Here is working code; I used the Pattern and Matcher classes.

In Pattern I supplied the pattern for the date you were asking about. You can check the pattern here:

"(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[,] \\d\\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \\d\\d\\d\\d"

Check the code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test {
    public static void main(String[] args) throws Exception {
        // Read the whole file into memory; StringBuilder avoids the
        // quadratic cost of repeated String concatenation.
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader("c:\\test.txt"))) {
            int i;
            while ((i = br.read()) != -1) {
                sb.append((char) i);
            }
        }
        String s = sb.toString();
        System.out.println(s);

        // Matches dates such as "Sun, 10 Jul 2011".
        Pattern p = Pattern.compile(
                "(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[,] \\d\\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \\d\\d\\d\\d");

        Matcher m = p.matcher(s);
        int count = 0; // total number of matches, not a per-date tally
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println(m.group()); // the matched date text
        }
    }
}

An extremely good description is available here: Link 1 and Link 2.

Your input string is in JSON format, so I suggest using a JSON parser, which makes the parsing a lot easier and, more importantly, more robust. It may take a few minutes to get into JSON parsing, but it will be worth it.

After that, look up the "created_at" fields. Create a Map with the date as key and its count as value, and write something like:

import java.util.HashMap;
import java.util.Map;

int estimatedSize = 500; // initial capacity hint to reduce HashMap resizing
Map<String, Integer> myMap = new HashMap<>(estimatedSize);
String[] dates = {}; // placeholder: fill this with the parsed "created_at" values
for (String nextDate : dates) {
    Integer oldCount = myMap.get(nextDate);
    if (oldCount == null) { // first occurrence of this date
        myMap.put(nextDate, 1);
    } else { // seen before: increment the count
        myMap.put(nextDate, oldCount + 1);
    }
}
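Since the question title asks about the maximum number of occurrences, one possible follow-up is to scan the finished map for the entry with the largest count. The sketch below assumes a Map<String, Integer> like the one built above; the class name MostFrequentDate, the method mostFrequent, and the sample dates are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MostFrequentDate {
    // Returns the key with the largest count, or null if the map is empty.
    // On a tie, whichever entry the map iterates first wins.
    public static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample dates standing in for the parsed values.
        String[] dates = {"Sun, 10 Jul 2011", "Mon, 11 Jul 2011", "Sun, 10 Jul 2011"};
        Map<String, Integer> counts = new HashMap<>();
        for (String d : dates) {
            counts.merge(d, 1, Integer::sum);
        }
        System.out.println(mostFrequent(counts)); // prints "Sun, 10 Jul 2011"
    }
}
```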
