
How to search large amount of data based on tags?

I'm planning to create an application to sort and view the photos and images I have.

I want to give the program a list of folders (with subfolders) to handle and tag images with multiple, custom tags as I go through them. If I then enter one, or multiple, tags in a search bar I want all images with that tag to appear in a panel.

The go-to approach would be SQL, but I don't want to have a SQL server running in the background. I want the program to be fully portable, so just the exe and maybe a small number of files it creates.

I thought I would create a tree where every node is a folder and the leaves are the images. I would then add the tags of the leaves to their parent node and cascade that upwards, so that the root node has a list of all the tags. This should allow for a fast search, since any subtree whose node lacks the tag can be skipped, and the tree could be built quickly in parallel.

But before I start to work on such a tree I wondered if there is already something like this, or if there is a better approach?

Just to make it clear, I'm talking about multiple tags here, so a Dictionary won't work.

Tags by definition are unique and so cry out to be indexed and sorted.

A Dictionary<Tag, ImageCollection>. Why not? It seems ideal for looking up images by tag.

A Dictionary<Image, TagCollection> as well. This is the reverse lookup of the above. You don't want to go through dictionary values to get at keys.
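As a rough illustration of the two-dictionary idea, here is a minimal sketch. The TagIndex class, its method names, and the use of plain strings for tags and image paths are all assumptions made for brevity, not part of any library. It also assumes that a search for several tags means "images that carry all of them", which is done by intersecting the per-tag sets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index; tags and image paths are plain strings here.
public class TagIndex
{
    // tag -> images carrying that tag
    private readonly Dictionary<string, HashSet<string>> _imagesByTag =
        new Dictionary<string, HashSet<string>>();

    // image -> tags on that image (the reverse lookup)
    private readonly Dictionary<string, HashSet<string>> _tagsByImage =
        new Dictionary<string, HashSet<string>>();

    public void AddTag(string imagePath, string tag)
    {
        GetOrCreate(_imagesByTag, tag).Add(imagePath);
        GetOrCreate(_tagsByImage, imagePath).Add(tag);
    }

    public IEnumerable<string> TagsOf(string imagePath)
    {
        return _tagsByImage.TryGetValue(imagePath, out var tags)
            ? tags
            : Enumerable.Empty<string>();
    }

    // Images that carry every one of the given tags (AND search, an assumption).
    public ISet<string> Search(params string[] tags)
    {
        HashSet<string> result = null;
        foreach (var tag in tags)
        {
            if (!_imagesByTag.TryGetValue(tag, out var images))
                return new HashSet<string>();          // unknown tag: nothing can match
            if (result == null)
                result = new HashSet<string>(images);  // copy, so the index is not mutated
            else
                result.IntersectWith(images);
        }
        return result ?? new HashSet<string>();
    }

    private static HashSet<string> GetOrCreate(
        Dictionary<string, HashSet<string>> map, string key)
    {
        if (!map.TryGetValue(key, out var set))
        {
            set = new HashSet<string>();
            map[key] = set;
        }
        return set;
    }
}
```

Each tag lookup is a single hash probe, so the cost of a search is dominated by the size of the sets being intersected. Persisting the index to disk is left out here.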

Create custom classes: Tag, Image, TagCollection, ImageCollection. Then override Equals and GetHashCode, and implement IComparable. This lets the built-in .NET collections hash, sort, and search your types correctly. Many collection "Find" methods take delegates for customized searching. Be sure to read the MSDN documentation.
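A minimal sketch of what such a Tag class might look like follows. The choice to compare names case-insensitively and to trim whitespace is an assumption for illustration; pick whatever notion of "same tag" suits your application.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Tag type: two tags are the same if their names match, ignoring case.
public sealed class Tag : IEquatable<Tag>, IComparable<Tag>
{
    public string Name { get; }

    public Tag(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("Tag name must not be empty.", nameof(name));
        Name = name.Trim();
    }

    public bool Equals(Tag other) =>
        other != null && StringComparer.OrdinalIgnoreCase.Equals(Name, other.Name);

    public override bool Equals(object obj) => Equals(obj as Tag);

    // Must agree with Equals: equal tags have to produce the same hash code.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(Name);

    // Used by sorted collections and by List<T>.Sort().
    public int CompareTo(Tag other) =>
        other == null ? 1 : StringComparer.OrdinalIgnoreCase.Compare(Name, other.Name);

    public override string ToString() => Name;
}
```

With this in place, a Tag works as a Dictionary key or HashSet element, and a SortedSet<Tag> keeps tags in alphabetical order without a separate comparer.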

I think this could constitute the core structure. For any given query, starting with initial fetches from these structures should be pretty quick. And yielding custom collections will help too.

There is nothing wrong with a mix of LINQ and "traditional" coding. I expect that in any case you're better off with indexed/sorted tags.

Here's how I'd handle it.

First, use SQLite. It's a single-DLL distribution: a lightweight, very fast, and impressively capable database designed to be embedded in exactly this type of application. A database is a far better approach than trying to persist trees to files (the issue with custom persistence isn't that the idea in itself is bad, but rather that there are a dozen edge cases it will need to handle that you're not likely to have thought of, whereas a database has them covered automatically).

Second, set up some POCOs for your media and your tags. Something like this:

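using System.Collections.Generic;

// ImageFormat and VideoFormat are placeholders for your own enum types.

// Must be public, because the public Image and Video classes derive from it.
public abstract class Media
{
    public int ID {get;set;}            // primary key, used by the query below
    public string Filename {get;set;}

    public virtual ICollection<Tag> Tags {get;set;}
}

public class Image : Media
{
    public ImageFormat Format {get;set;}
    public int ResX {get;set;}
    public int ResY {get;set;}  // or whatever
}

public class Video : Media
{
    public VideoFormat Format {get;set;}
    public int Bitrate {get;set;}
}

public class Tag
{
    public int ID {get;set;}            // primary key
    public string Name {get;set;}

    public virtual ICollection<Media> Media {get;set;}
}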

This forms the basis for all of your MVVM stuff (you're using MVVM with WPF, right?)

Use Entity Framework for your data access (persistence and querying).

With that, you can do something like this to query your items:

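public IEnumerable<Media> SearchByTags(List<Tag> tags)
{
    // Hand the provider plain strings: a list of primitives can be translated
    // to SQL, whereas a join against in-memory Tag objects cannot.
    var names = tags.Select(t => t.Name).ToList();

    // mt.ID is the media key and mt.Name the tag name in the junction table.
    var q = from m in _context.Media
            join mt in _context.MediaTags on m.ID equals mt.ID
            where names.Contains(mt.Name)
            select m;

    // Media matching several of the tags would otherwise be returned once per match.
    return q.Distinct();
}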

That will convert to a relatively optimized database query and get you a list of applicable media based on the tags you want to search by. Feed this list back to your presentation (MVVM) layer and build your tree from the results.

(This assumes that you have a table of Media, a table of Tags, and a junction/bridge table of MediaTags. I've left many details out and this is very much aircode, but as a general concept, I think it works just fine.)
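To make the three-table layout concrete, here is a small sketch that uses plain in-memory lists in place of the database, so it runs with LINQ to Objects and needs no Entity Framework or SQLite. The row classes, their column names (MediaID, TagID), and the TagSearch.MatchAll helper are hypothetical. Unlike a plain join, which returns media carrying any of the tags, this variant keeps only media carrying all of them, by grouping the matches per media item and comparing the count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shapes for the three tables; plain lists stand in for the database.
public class MediaRow { public int ID; public string Filename; }
public class TagRow { public int ID; public string Name; }
public class MediaTagRow { public int MediaID; public int TagID; }

public static class TagSearch
{
    // Returns the media rows that carry every requested tag name.
    public static List<MediaRow> MatchAll(
        IEnumerable<MediaRow> media,
        IEnumerable<TagRow> tags,
        IEnumerable<MediaTagRow> mediaTags,
        IEnumerable<string> wanted)
    {
        var names = wanted.Distinct().ToList();
        if (names.Count == 0) return new List<MediaRow>();

        var q = from m in media
                join mt in mediaTags on m.ID equals mt.MediaID
                join t in tags on mt.TagID equals t.ID
                where names.Contains(t.Name)
                group t.Name by m into g
                // A media row qualifies only if it matched all distinct names.
                where g.Distinct().Count() == names.Count
                select g.Key;

        return q.ToList();
    }
}
```

Whether the same query shape translates cleanly to SQL depends on the provider and version you use, so treat this as a description of the logic rather than a drop-in Entity Framework query.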


 