
Most suitable data structure for unique PDF uploading

I've been given a university assignment: store PDF documents efficiently in a PDF store, keeping each document's content only once (uploading the same file several times must not duplicate its content).

The method to implement is store(String title, File pdfFile).

Example 1:

"Fast Cars", fastcars.pdf
"Even Faster Cars", fastcars.pdf
"Not So Fast Cars", cars.pdf
"Slow Cars", slowcars.pdf

Expected result: the store should have a size of 3 and contain fastcars.pdf, cars.pdf and slowcars.pdf.

Example 2:

"Fast Cars", fastcars.pdf
"Even Faster Cars", fastcars.pdf
"Fast Cars", sportscars.pdf
"Even Faster Cars", sportscars.pdf

Expected result: the store should have a size of 1 and contain only sportscars.pdf.

My idea is to hash the PDF's content and possibly use a HashMap that maps the content digest to a random integer, which is in turn mapped to the PDF title. Would that work?
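A minimal sketch of the content-hashing step, assuming SHA-256 from java.security.MessageDigest as the digest. The class name ContentHasher and the method sha256Hex are hypothetical and not part of the assignment:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the assignment's API.
class ContentHasher {

    /** Returns the SHA-256 digest of the file's bytes as a lowercase hex string. */
    static String sha256Hex(File pdfFile) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // Read the file in chunks so a large PDF is never held in memory as a whole.
        try (InputStream in = Files.newInputStream(pdfFile.toPath())) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        // Hex-encode the 32 digest bytes so the result can serve as a map key.
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two files with identical bytes produce the same string regardless of their names, so the digest can act as the key that identifies a piece of content.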

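The tricky part is satisfying Example 2, where storing a file under an existing title apparently replaces the file previously stored under that title.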

Which data structure would you recommend for doing this efficiently, and what approach would you take?

Thanks in advance

I read the input from the console.

Test case 1, input:

  FastCars fastcars.pdf
  EvenFasterCars fastcars.pdf
  NotSoFastCars cars.pdf
  SlowCars slowcars.pdf

Output:

  slowcars.pdf
  fastcars.pdf
  cars.pdf

Test case 2, input:

  FastCars fastcars.pdf
  EvenFasterCars fastcars.pdf
  FastCars sportscars.pdf
  EvenFasterCars sportscars.pdf

Output:

  sportscars.pdf

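import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

// The class name is arbitrary; it only makes the snippet compile on its own.
public class PdfStoreDemo {

    public static void main(String[] args) throws Exception {

        // title -> file name; putting an existing title again replaces its file
        Map<String, String> titleToFile = new HashMap<>();
        // every file name seen so far; used below to print each file only once
        Map<String, String> fileToTitle = new HashMap<>();

        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // Both test cases consist of exactly four "title fileName" lines.
        for (int i = 0; i < 4; i++) {
            // trim() and "\\s+" tolerate leading and repeated whitespace
            String[] input = br.readLine().trim().split("\\s+");
            String title = input[0];
            String fileName = input[1];
            titleToFile.put(title, fileName);
            fileToTitle.put(fileName, title);
        }

        // Print each file that is still referenced by a title, exactly once.
        // HashMap iteration order is unspecified, so the output order may vary.
        for (String fileName : titleToFile.values()) {
            if (fileToTitle.remove(fileName) != null) {
                System.out.println(fileName);
            }
        }
    }
}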
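The same two-map idea can be sketched as a reusable store that counts how many titles refer to each piece of content and drops content once no title refers to it. This is only a sketch under assumptions: DedupStore, contentKey, size() and contents() are hypothetical names, and contentKey stands in for a content digest (for example a SHA-256 hex string) computed elsewhere, so the signature differs from the assignment's store(String title, File pdfFile):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; contentKey is assumed to be a digest of the PDF's bytes.
class DedupStore {
    // title -> key of the content currently stored under that title
    private final Map<String, String> titleToKey = new HashMap<>();
    // content key -> number of titles that currently refer to it
    private final Map<String, Integer> keyRefCount = new HashMap<>();

    /** Stores content under a title, replacing whatever the title referred to before. */
    void store(String title, String contentKey) {
        String previous = titleToKey.put(title, contentKey);
        if (contentKey.equals(previous)) {
            return; // same title and same content: nothing changes
        }
        keyRefCount.merge(contentKey, 1, Integer::sum);
        if (previous != null) {
            // Returning null removes the entry, so unreferenced content disappears.
            keyRefCount.computeIfPresent(previous, (key, n) -> n == 1 ? null : n - 1);
        }
    }

    /** Number of distinct pieces of content still referenced by some title. */
    int size() {
        return keyRefCount.size();
    }

    Set<String> contents() {
        return new HashSet<>(keyRefCount.keySet());
    }

    public static void main(String[] args) {
        // Example 2 from the question, with made-up keys in place of real digests.
        DedupStore store = new DedupStore();
        store.store("Fast Cars", "digest-of-fastcars.pdf");
        store.store("Even Faster Cars", "digest-of-fastcars.pdf");
        store.store("Fast Cars", "digest-of-sportscars.pdf");
        store.store("Even Faster Cars", "digest-of-sportscars.pdf");
        System.out.println(store.size() + " " + store.contents());
        // prints: 1 [digest-of-sportscars.pdf]
    }
}
```

Each store call costs a constant number of hash-map operations on average. A real store would presumably keep a third map from content key to the PDF bytes or file location, with entries removed at the same point where the reference count reaches zero.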

Every conforming PDF file should carry a file identifier (the ID entry in the file trailer) as part of its metadata, although older PDF versions make the entry optional. You might want to just use that string as the file name. Most PDF libraries give easy access to this metadata.
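For illustration only, here is a naive way to peek at that identifier without a PDF library. It assumes the identifier appears in plain text as an /ID array of hex strings, such as /ID [<0a1b...> <0d4e...>], which is common but not guaranteed. PdfIdSniffer and firstId are hypothetical names, and a real PDF library would be the more robust choice:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a plain-text scan, not a PDF parser.
class PdfIdSniffer {
    // Captures the first hex string of an /ID array, e.g. /ID [<ab12> <cd34>]
    private static final Pattern ID_ARRAY =
            Pattern.compile("/ID\\s*\\[\\s*<([0-9A-Fa-f]+)>");

    /** Returns the first /ID hex string found in the bytes, lowercased, if any. */
    static Optional<String> firstId(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is preserved.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = ID_ARRAY.matcher(text);
        if (m.find()) {
            return Optional.of(m.group(1).toLowerCase(Locale.ROOT));
        }
        return Optional.empty(); // no identifier found in this form
    }
}
```

When the result is empty, falling back to a digest of the file's bytes is one option.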
