How to search a large amount of data based on tags?

I'm planning to create an application to sort and view the photos and images I have.

I want to give the program a list of folders (with subfolders) to handle, and tag images with multiple custom tags as I go through them. If I then enter one or more tags in a search bar, I want all images with those tags to appear in a panel.

The go-to approach would be SQL, but I don't want to have a SQL server running in the background. I want the program to be fully portable, so just the exe and maybe a small number of files it creates.

I thought I would create a tree where every node is a folder and the leaves are the images. I would then add the tags of the leaves to the parent node and cascade that upwards, so that the root node has a list of all the tags. This should allow for a fast search and, with parallelisation, a fast build of the tree.
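A minimal sketch of the tree described above, to make the idea concrete. The names `FolderNode`, `Cascade`, and `Search` are hypothetical, tags are plain strings, and a search is assumed to mean "images that carry every requested tag". The cascaded tag set is only used to skip folders that cannot contain a match.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: one node per folder, images are the leaves.
public class FolderNode
{
    public string Name { get; }
    public List<FolderNode> Folders { get; } = new List<FolderNode>();

    // Leaves: image file name -> the tags on that image.
    public Dictionary<string, HashSet<string>> Images { get; } =
        new Dictionary<string, HashSet<string>>();

    // Union of every tag found in this folder or anywhere below it.
    public HashSet<string> AllTags { get; } = new HashSet<string>();

    public FolderNode(string name) { Name = name; }

    // Recomputes AllTags bottom-up; call it after tags have changed.
    public void Cascade()
    {
        AllTags.Clear();
        foreach (var tags in Images.Values) AllTags.UnionWith(tags);
        foreach (var folder in Folders)
        {
            folder.Cascade();
            AllTags.UnionWith(folder.AllTags);
        }
    }

    // Adds the paths of images carrying every wanted tag to results.
    public void Search(ISet<string> wanted, string parentPath, List<string> results)
    {
        // Prune: if a tag is missing from the union, no image below can match.
        if (!AllTags.IsSupersetOf(wanted)) return;

        var here = parentPath.Length == 0 ? Name : parentPath + "/" + Name;
        foreach (var image in Images)
        {
            if (image.Value.IsSupersetOf(wanted)) results.Add(here + "/" + image.Key);
        }
        foreach (var folder in Folders) folder.Search(wanted, here, results);
    }
}
```

Note that the pruning only helps when tags are concentrated in a few folders. If most folders contain most tags, the search still visits nearly every image.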

But before I start to work on such a tree I wondered if there is already something like this, or if there is a better approach?

Just to make it clear, I'm talking about multiple tags here, so a Dictionary won't work.

Tags by definition are unique and so cry out to be indexed and sorted.

A Dictionary<Tag,ImageCollection>. Why not? Seems ideal for tags.

A Dictionary<Image, TagCollection>. The reverse reference of the above. You don't want to try going through dictionary values to get at keys.
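A rough sketch of how the two dictionaries could work together for multi-tag searches. Everything here is an assumption for illustration: the class name `TagIndex`, the use of plain strings for tags and image paths instead of custom classes, and case-insensitive matching. A search for several tags is taken to mean "images that carry all of them", which becomes a set intersection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical two-way index: tag -> image paths, and image path -> tags.
public class TagIndex
{
    private readonly Dictionary<string, HashSet<string>> _imagesByTag =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<string, HashSet<string>> _tagsByImage =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Records the tag in both directions so neither lookup has to scan values.
    public void AddTag(string imagePath, string tag)
    {
        GetOrAdd(_imagesByTag, tag).Add(imagePath);
        GetOrAdd(_tagsByImage, imagePath).Add(tag);
    }

    // Returns the images that carry every one of the requested tags.
    public ISet<string> Search(params string[] tags)
    {
        HashSet<string> result = null;
        foreach (var tag in tags)
        {
            HashSet<string> images;
            if (!_imagesByTag.TryGetValue(tag, out images))
                return new HashSet<string>();   // unknown tag: nothing can match

            if (result == null)
                result = new HashSet<string>(images, StringComparer.OrdinalIgnoreCase);
            else
                result.IntersectWith(images);
        }
        return result ?? new HashSet<string>();
    }

    // Reverse lookup: the tags on one image.
    public IEnumerable<string> TagsOf(string imagePath)
    {
        HashSet<string> tags;
        return _tagsByImage.TryGetValue(imagePath, out tags) ? tags : Enumerable.Empty<string>();
    }

    private static HashSet<string> GetOrAdd(Dictionary<string, HashSet<string>> map, string key)
    {
        HashSet<string> set;
        if (!map.TryGetValue(key, out set))
        {
            set = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
            map[key] = set;
        }
        return set;
    }
}
```

For instance, after `AddTag("a.jpg", "beach")`, `AddTag("a.jpg", "sunset")`, and `AddTag("b.jpg", "beach")`, calling `Search("beach", "sunset")` yields a set containing only `a.jpg`.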

Create custom classes: Tag, Image, TagCollection, ImageCollection. Then override Equals and GetHashCode, and implement IComparable. This lets the built-in .NET indexing, sorting, and searching work correctly with your types. Many collection "Find" methods take delegates for customized searching. Be sure to read the MSDN documentation.
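As an illustration of the overrides mentioned above, here is one way a `Tag` class might look. This is a sketch under assumptions, not a prescribed design: it treats two tags as equal when their names match ignoring case, and it trims whitespace in the constructor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Tag type: equality and ordering are based on the name, ignoring case.
public sealed class Tag : IEquatable<Tag>, IComparable<Tag>
{
    public string Name { get; }

    public Tag(string name)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));
        Name = name.Trim();
    }

    public bool Equals(Tag other) =>
        other != null && StringComparer.OrdinalIgnoreCase.Equals(Name, other.Name);

    public override bool Equals(object obj) => Equals(obj as Tag);

    // Must agree with Equals, so it uses the same case-insensitive comparer.
    public override int GetHashCode() => StringComparer.OrdinalIgnoreCase.GetHashCode(Name);

    // Used by List<T>.Sort, SortedSet<T>, and similar; a null tag sorts first.
    public int CompareTo(Tag other) =>
        other == null ? 1 : StringComparer.OrdinalIgnoreCase.Compare(Name, other.Name);

    public override string ToString() => Name;
}
```

With these in place, a `Tag` can be used directly as a dictionary key or in a `HashSet<Tag>`, and `new Tag("Beach")` and `new Tag("beach")` land in the same bucket.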

I think this could constitute the core structure. For any given query, starting with initial fetches from these structures should be pretty quick. And yielding custom collections will help too.

There is nothing wrong with a mix of LINQ and "traditional" coding. I expect that in any case you're better off with indexed/sorted tags.

Here's how I'd handle it.

First, use SQLite. It's a single-DLL distribution: a lightweight, superfast, and impressively capable database whose sole purpose is to be embedded in these types of applications. A database is a far better approach than trying to persist trees to files (the issue with custom persistence isn't that the idea in itself is bad, but rather that there are a dozen edge cases it will need to handle that you're not likely to have thought of, whereas a database has them covered automatically).

Second, set up some POCOs for your media and your tags. Something like this:

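using System.Collections.Generic;

// Base type for anything that can be tagged. It must be public because the
// public Image and Video classes derive from it.
public abstract class Media
{
    public int ID {get;set;}   // primary key, used by the query joins
    public string Filename {get;set;}

    public virtual ICollection<Tag> Tags {get;set;}
}

// ImageFormat and VideoFormat are placeholder types (for example, your own enums).
public class Image : Media
{
    public ImageFormat Format {get;set;}
    public int ResX {get;set;}
    public int ResY {get;set;}  // or whatever
}

public class Video : Media
{
    public VideoFormat Format {get;set;}
    public int Bitrate {get;set;}
}

public class Tag
{
    public string Name {get;set;}

    // Navigation property back to the tagged media (many-to-many).
    public virtual ICollection<Media> Media {get;set;}
}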

This forms the basis for all of your MVVM stuff (you're using MVVM with WPF, right?)

Use Entity Framework for your data access (persistence and querying).

With that, you can do something like this to query your items:

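// Requires: using System.Collections.Generic; using System.Linq;
public IEnumerable<Media> SearchByTags(List<Tag> tags) {

    // Pass plain strings so the LINQ provider can translate the filter to SQL IN (...).
    var names = tags.Select(t => t.Name).ToList();

    var q = from m in _context.Media
            join mt in _context.MediaTags on m.ID equals mt.ID   // mt.ID: media key in the junction row
            where names.Contains(mt.Name)                        // mt.Name: tag name in the junction row
            select m;

    // A media item matching several tags would otherwise appear once per tag.
    return q.Distinct();
}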

That will convert to a relatively optimized database query and get you a list of applicable media based on the tags that you want to search by. Feed this list back to your presentation (MVVM) layer and build your tree from the results.

(This assumes that you have a table of Media, a table of Tags, and a junction/bridge table of MediaTags. I've left many details out and this is very much air code, but as a general concept, I think it works just fine.)
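The query above returns media that match any of the given tags. If a search for several tags should instead return only media carrying all of them, one common approach is to group the junction rows by media and keep the groups whose number of distinct matching tags equals the number of requested tags. The sketch below shows that logic on in-memory rows only. `MediaTagRow`, `MediaId`, `TagName`, and `MediaWithAllTags` are hypothetical names, and whether a given Entity Framework provider translates the same shape to SQL is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a junction-table row: one row per (media, tag) pair.
public class MediaTagRow
{
    public int MediaId { get; set; }
    public string TagName { get; set; }
}

public static class TagSearch
{
    // Returns the IDs of media that carry every requested tag, in ascending order.
    public static List<int> MediaWithAllTags(IEnumerable<MediaTagRow> rows, IEnumerable<string> tags)
    {
        var wanted = new HashSet<string>(tags, StringComparer.OrdinalIgnoreCase);
        if (wanted.Count == 0) return new List<int>();

        return rows
            .Where(r => wanted.Contains(r.TagName))
            .GroupBy(r => r.MediaId)
            // Count distinct tag names so a duplicated row cannot fake a match.
            .Where(g => g.Select(r => r.TagName)
                         .Distinct(StringComparer.OrdinalIgnoreCase)
                         .Count() == wanted.Count)
            .Select(g => g.Key)
            .OrderBy(id => id)
            .ToList();
    }
}
```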
