Comparing elements from two lists

I have two lists of book titles from two different bookstores. The same book can appear in both lists, but the titles are written differently, e.g. "For example" vs. "For - example": they refer to the same book, but the strings are not equal.

That's why I wrote a stream that purifies the titles in a list (it removes spaces and special characters) so that matching titles become equal. After the stream, both of the titles above look like "Forexample", so they are now equal.

private List<String> purifyListOfTitles(List<Book> listToPurify) {
    return listToPurify
            .stream()
            .map(Book::getTitle)
            .map(title -> title.replaceAll("[^A-Za-z]+", ""))
            .collect(Collectors.toList());
}
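One thing to watch: the regex above only strips non-letters and does not change case, so "sza . la" and "szaLa" purify to "szala" and "szaLa", which String.equals treats as different. Below is a minimal sketch of a purification that also lower-cases, assuming case-insensitive matching is what is wanted. The class name TitlePurifier and the method purify are hypothetical and work on plain strings rather than on Book.

```java
import java.util.Locale;

class TitlePurifier {
    // Hypothetical helper: keeps ASCII letters only, then lower-cases the result.
    static String purify(String title) {
        return title.replaceAll("[^A-Za-z]+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Both spellings reduce to the same key.
        System.out.println(purify("For - example"));
        System.out.println(purify("sza . la").equals(purify("szaLa")));
    }
}
```

In the stream from the question this would amount to adding .toLowerCase(Locale.ROOT) after the replaceAll call.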

The problem is that I want ONE map containing each original title and the number of occurrences of that book (at most 2, default 1). I've written an algorithm that compares the titles and adds each title from the first bookstore to the map, but I also need to add the titles from the second bookstore, and I don't know how to get them.

To make it clear...

I compare each title from the first bookstore with every title from the second bookstore and add 1 for each match. When the inner loop ends, I add the current title from the first bookstore to the map together with its number of occurrences. But what about titles from the second bookstore that occur only once? I know the index of the current title from the first bookstore, so I can get the original (unpurified) title from the original list with .get(i), but I don't know the index of the current title from the second bookstore, so I can't look up its original title.

The only solution I see is to first compare each title from the first bookstore with every title from the second, and then each title from the second with every title from the first, but that is not an optimal solution... or to somehow unpurify the list.

To sum up, the map contains only titles from the first bookstore; how can I add the titles from the second bookstore that were omitted? I want the original titles in the map (e.g. the purified title is houseisbig, but the original is House - is big)! I compare using the purified lists but add the original titles.

The class:

package bookstore.scraper.rankingsystem;

import bookstore.scraper.Bookstore;
import bookstore.scraper.book.Book;
import bookstore.scraper.book.scrapingtypeservice.CategorizedBookService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import static java.util.stream.Collectors.toMap;

@Slf4j
@Component
public class CategorizedBooksRankingService {

    private final CategorizedBookService categorizedBookService;

    @Autowired
    public CategorizedBooksRankingService(CategorizedBookService categorizedBookService) {
        this.categorizedBookService = categorizedBookService;
    }

    public Map<String, Integer> getRankingForCategory(String category) {
        Map<Bookstore, List<Book>> bookstoreWith15CategorizedBooks = chooseGetterImplementationByCategory(category);

        List<Book> merlinBooks = bookstoreWith15CategorizedBooks.get(Bookstore.MERLIN);
        List<Book> empikBooks = bookstoreWith15CategorizedBooks.get(Bookstore.EMPIK);

        List<String> purifiedMerlinBookTitles = purifyListOfTitles(merlinBooks);
        List<String> purifiedEmpikBookTitles = purifyListOfTitles(empikBooks);

        Map<String, Integer> bookTitleWithOccurrencesNumber =
                prepareTitleAndOccurrencesMap(merlinBooks, empikBooks, purifiedMerlinBookTitles, purifiedEmpikBookTitles);

        return getSortedLinkedHashMappedByValue(bookTitleWithOccurrencesNumber);
    }

    private Map<String, Integer> prepareTitleAndOccurrencesMap(List<Book> merlinBooks, List<Book> empikBooks, List<String> purifiedMerlinBookTitles, List<String> purifiedEmpikBookTitles) {
        Map<String, Integer> bookTitleWithOccurrencesNumber = new LinkedHashMap<>();

        int occurrencesOfIteratedBook;
        String iteratedMerlinTitle;

        for (int i = 0; i < purifiedMerlinBookTitles.size(); i++) {
            occurrencesOfIteratedBook = 1;
            iteratedMerlinTitle = purifiedMerlinBookTitles.get(i);
            for (String iteratedEmpikTitle : purifiedEmpikBookTitles) {

                if (iteratedMerlinTitle.equals(iteratedEmpikTitle))
                    occurrencesOfIteratedBook++;
            }
            bookTitleWithOccurrencesNumber.put(merlinBooks.get(i).getTitle(), occurrencesOfIteratedBook);
            // how do I add to this map the titles from the second bookstore that match no title from the first?
        }
        return bookTitleWithOccurrencesNumber;
    }

    private List<String> purifyListOfTitles(List<Book> listToPurify) {
        return listToPurify
                .stream()
                .map(Book::getTitle)
                .map(title -> title.replaceAll("[^A-Za-z]+", ""))
                .collect(Collectors.toList());
    }

    private Map<String, Integer> getSortedLinkedHashMappedByValue(Map<String, Integer> mapToSort) {
        return mapToSort.entrySet()
                .stream()
                .sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
                .collect(
                        toMap(Map.Entry::getKey, Map.Entry::getValue, (e1, e2) -> e2,
                                LinkedHashMap::new));
    }

    private Map<Bookstore, List<Book>> chooseGetterImplementationByCategory(String category) {
        if (category.equals("crimes"))
            return categorizedBookService.get15BooksFromCrimeCategory();
        if (category.equals("romances"))
            return categorizedBookService.get15BooksFromRomanceCategory();
        if (category.equals("fantasies"))
            return categorizedBookService.get15BooksFromFantasyCategory();
        if (category.equals("guides"))
            return categorizedBookService.get15BooksFromGuidesCategory();
        if (category.equals("biographies"))
            return categorizedBookService.get15BooksFromBiographiesCategory();
        else {
            log.error(category + " is invalid category");
            throw new IllegalArgumentException();
        }
    }
}

Example:

Book a = new Book.BookBuilder().withTitle("To - jest haha").build();
Book b = new Book.BookBuilder().withTitle("Bubu").build();
Book c = new Book.BookBuilder().withTitle("Kiki").build();
Book d = new Book.BookBuilder().withTitle("sza . la").build();

Book e = new Book.BookBuilder().withTitle("Tojest haha").build();
Book f = new Book.BookBuilder().withTitle("bam").build();
Book g = new Book.BookBuilder().withTitle("zzz").build();
Book h = new Book.BookBuilder().withTitle("szaLa").build();

List<Book> list1 = new ArrayList<>();
list1.add(a);
list1.add(b);
list1.add(c);
list1.add(d);

List<Book> list2 = new ArrayList<>();
list2.add(e);
list2.add(f);
list2.add(g);
list2.add(h);

Map<String, Long> z = countBooksByTitle(list1, list2);

z map contains: {sza . la =2, Bubu=1, zzz=1, Kiki=1, bam=1, To - jest haha =2}

I got 2 lists
...
I want to get ONE map that will consist title and number of occurrences of book

You can do that in a single stream chain:

private Map<String, Long> countBooksByTitle(List<Book> list1, List<Book> list2) {
    return Stream.concat(list1.stream(), list2.stream())
            .map(book -> book.getTitle().replaceAll("[^A-Za-z]+", ""))
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}

Note that the count could theoretically be higher than 2 if a list has two or more different books whose titles map to the same compact text. E.g., since you only keep letters, Streams for dummies 1 and Streams for dummies 2 would count as 2 books titled Streamsfordummies.
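The collision described above can be reproduced with a small sketch. It assumes plain title strings instead of Book objects, and the class name CompactTitleCollision and method countByCompactTitle are hypothetical; the regex and the collector are the ones from the answer's method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CompactTitleCollision {
    // Counts titles after reducing each one to its ASCII letters only.
    static Map<String, Long> countByCompactTitle(List<String> titles) {
        return titles.stream()
                .map(title -> title.replaceAll("[^A-Za-z]+", ""))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Two volumes from one store plus one volume from the other store:
        // the digits are stripped, so all three share the key "Streamsfordummies".
        List<String> titles = Arrays.asList(
                "Streams for dummies 1",
                "Streams for dummies 2",
                "Streams for dummies 2");
        System.out.println(countByCompactTitle(titles));
    }
}
```

Keeping digits in the purified key, as the update further down does, avoids this particular collision.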


UPDATE

To retain the original title, create a helper class that compares by purified title but keeps the original title. First build a map keyed by that class, then unwrap each key back to its original title.

In the code below, the purification has been modified to retain digits too, and to eliminate accents while retaining the letter, e.g. bé -> be, whereas the question's code would eliminate the letter, bé -> b. That way bé and bá won't compare equal.
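To make the difference between the two purifications concrete, here is a small sketch that runs both on the same accented title. The class name AccentStripping and both method names are hypothetical; the bodies use the regex from the question and the Normalizer-based expression from the helper class. The accented letters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.text.Normalizer;
import java.util.Locale;

class AccentStripping {
    // Purification as in the question: anything that is not an ASCII letter is dropped,
    // so an accented letter disappears completely.
    static String lettersOnly(String title) {
        return title.replaceAll("[^A-Za-z]+", "");
    }

    // NFD splits an accented letter into base letter + combining mark; the regex then
    // removes the mark (and spaces, punctuation) but keeps the base letter and digits.
    static String withoutAccents(String title) {
        return Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\P{Alnum}+", "")
                .toLowerCase(Locale.US);
    }

    public static void main(String[] args) {
        String a = "sz\u00e1La";     // "szaLa" with an acute accent on the first a
        String b = "sza . l\u00e4";  // "sza . la" with a diaeresis on the last a
        System.out.println(lettersOnly(a));     // accented letter is lost
        System.out.println(withoutAccents(a));  // base letter is kept
        System.out.println(withoutAccents(a).equals(withoutAccents(b)));
    }
}
```

Letters that have no canonical decomposition, such as the Polish ł, are not split by NFD and would still be removed by this regex.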

Since the counting code is mapping the key/value pair anyway, the value is mapped from Long to Integer too, just to show it can be done. The resulting map has also been modified to be sorted by title.

Helper Class

import java.text.Normalizer;
import java.util.Locale;

public final class PurifiedTitle implements Comparable<PurifiedTitle> {
    private final String original;
    private final String purified;
    public PurifiedTitle(String title) {
        this.original = title;
        // Purified string has only lowercase letters and digits,
        // with no accents on the letters
        this.purified = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\P{Alnum}+", "")
                .toLowerCase(Locale.US);
    }
    @Override
    public String toString() {
        return this.original;
    }
    @Override
    public int compareTo(PurifiedTitle that) {
        return this.purified.compareTo(that.purified);
    }
    @Override
    public boolean equals(Object obj) {
        if (! (obj instanceof PurifiedTitle))
            return false;
        PurifiedTitle that = (PurifiedTitle) obj;
        return this.purified.equals(that.purified);
    }
    @Override
    public int hashCode() {
        return this.purified.hashCode();
    }
}

Updated Counting Method

private static Map<String, Integer> countBooksByTitle(List<Book> list1, List<Book> list2) {
    Collator collator = Collator.getInstance(Locale.US);
    collator.setStrength(Collator.PRIMARY);
    return Stream.concat(list1.stream(), list2.stream())
            .collect(Collectors.groupingBy(book -> new PurifiedTitle(book.getTitle()),
                                           Collectors.counting()))
            .entrySet().stream()
            .collect(Collectors.toMap(e -> e.getKey().toString(),
                                      e -> e.getValue().intValue(),
                                      Integer::sum,
                                      () -> new TreeMap<>(collator)));
}

Test

List<Book> list1 = Arrays.asList(
        new Book("To - jest haha"),
        new Book("Bubû"),
        new Book("Kiki"),
        new Book("bam 2"),
        new Book("sza . lä"));
List<Book> list2 = Arrays.asList(
        new Book("Tojest haha"),
        new Book("bam 1"),
        new Book("zzz"),
        new Book("száLa"));
System.out.println(countBooksByTitle(list1, list2));

Output

{bam 1=1, bam 2=1, Bubû=1, Kiki=1, sza . lä=2, To - jest haha=2, zzz=1}
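If a dedicated key class feels like too much, the same idea (group by the purified title, report the first original spelling seen) can be sketched with two plain maps. This is an alternative written for illustration, not the code from the answer: the class name FirstTitleCounter, the method names, the use of title strings instead of Book objects, and the lower-casing letters-only purify are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

class FirstTitleCounter {
    // Assumed purification: ASCII letters only, lower-cased.
    static String purify(String title) {
        return title.replaceAll("[^A-Za-z]+", "").toLowerCase(Locale.ROOT);
    }

    static Map<String, Integer> countKeepingFirstTitle(List<String> first, List<String> second) {
        Map<String, String> originalByKey = new LinkedHashMap<>(); // purified -> first original seen
        Map<String, Integer> countByKey = new HashMap<>();         // purified -> occurrences
        Stream.concat(first.stream(), second.stream()).forEach(title -> {
            String key = purify(title);
            originalByKey.putIfAbsent(key, title);
            countByKey.merge(key, 1, Integer::sum);
        });
        // Unwrap: report each group under its original title, in encounter order.
        Map<String, Integer> result = new LinkedHashMap<>();
        originalByKey.forEach((key, original) -> result.put(original, countByKey.get(key)));
        return result;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("To - jest haha", "Bubu", "Kiki", "sza . la");
        List<String> list2 = Arrays.asList("Tojest haha", "bam", "zzz", "szaLa");
        // prints {To - jest haha=2, Bubu=1, Kiki=1, sza . la=2, bam=1, zzz=1}
        System.out.println(countKeepingFirstTitle(list1, list2));
    }
}
```

The helper-class version keeps the purification rule in one place and lets the key be reused in sets or sorted collections; the two-map version trades that for less code.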

A possible solution with minimal impact on your algorithm: remove titles from the second list as soon as they match a title from the first list.

That way, the second list contains only unmatched books after the for loop, and you can add all of them to the map with occurrences = 1.

Use an Iterator so that you can remove items while traversing the list.

    for (int i = 0; i < purifiedMerlinBookTitles.size(); i++) {
        occurrencesOfIteratedBook = 1;
        iteratedMerlinTitle = purifiedMerlinBookTitles.get(i);
        Iterator<String> it = purifiedEmpikBookTitles.iterator();
        while (it.hasNext()) {
            String iteratedEmpikTitle = it.next();
            if (iteratedMerlinTitle.equals(iteratedEmpikTitle)) {
                occurrencesOfIteratedBook++;
                it.remove();
            }
        }
        bookTitleWithOccurrencesNumber.put(merlinBooks.get(i).getTitle(), occurrencesOfIteratedBook);
    }
    // At this point purifiedEmpikBookTitles contains only unmatched titles (in purified form, not the originals)
    purifiedEmpikBookTitles.forEach(title -> bookTitleWithOccurrencesNumber.put(title, 1));
    return bookTitleWithOccurrencesNumber;
}
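Since the leftover entries in the loop above are purified strings, the map would show them in purified form. One way to keep the original spelling is to leave the second list intact and remember which positions were matched. The following is a rough sketch under several assumptions: it works on title strings rather than Book objects, the names UnmatchedTitleCounter and countWithUnmatched are hypothetical, and purify lower-cases in addition to stripping non-letters.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

class UnmatchedTitleCounter {
    // Assumed purification: ASCII letters only, lower-cased.
    static String purify(String title) {
        return title.replaceAll("[^A-Za-z]+", "").toLowerCase(Locale.ROOT);
    }

    static Map<String, Integer> countWithUnmatched(List<String> firstTitles, List<String> secondTitles) {
        List<String> purifiedSecond = secondTitles.stream()
                .map(UnmatchedTitleCounter::purify)
                .collect(Collectors.toList());
        boolean[] matched = new boolean[secondTitles.size()]; // marks matched positions instead of removing
        Map<String, Integer> result = new LinkedHashMap<>();

        for (String title : firstTitles) {
            String key = purify(title);
            int occurrences = 1;
            for (int j = 0; j < purifiedSecond.size(); j++) {
                if (!matched[j] && key.equals(purifiedSecond.get(j))) {
                    occurrences++;
                    matched[j] = true;
                }
            }
            result.put(title, occurrences);
        }
        // Same index in both lists, so the original title of each unmatched entry is available.
        for (int j = 0; j < secondTitles.size(); j++) {
            if (!matched[j]) {
                result.put(secondTitles.get(j), 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> merlin = Arrays.asList("To - jest haha", "Bubu", "Kiki", "sza . la");
        List<String> empik = Arrays.asList("Tojest haha", "bam", "zzz", "szaLa");
        System.out.println(countWithUnmatched(merlin, empik));
    }
}
```

The boolean array plays the role of it.remove(): a matched entry is skipped on later passes, but its index still lines up with the list of original titles.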
