
What's the most efficient way to parse these 500k rows?

I currently have a database full of ACL entries, which looks like this:

[Screenshot: ACL database]

I need to go through and parse out the difference between a root node (like \\chrlcltsvr02\AYY_LMO\ClientServices) and its child nodes (e.g. \\chrlcltsvr02\AYY_LMO\ClientServices\Client1).

I've attempted to do this in C# code using an ORM and raw T-SQL, like so (I do in fact know that opening a session per row is a horrible idea):

foreach (string path in distinctPaths)
{
    using (session = sessionFactory.OpenSession())
    {
        string query;

        query = String.Format("SELECT DISTINCT UserGroup, AllowDeny, Permissions FROM FilerACLs WHERE FullPath LIKE '{0}'", path.Replace("'", "''"));

        var parentACLs = session.CreateSQLQuery(query).SetResultTransformer(Transformers.AliasToBean<ShareACLEntry>()).List<ShareACLEntry>();

        query = String.Format("SELECT DISTINCT UserGroup, AllowDeny, Permissions FROM FilerACLs WHERE FullPath LIKE '{0}\\%'", path.Replace("'", "''"));

        var childACLs = session.CreateSQLQuery(query).SetResultTransformer(Transformers.AliasToBean<ShareACLEntry>()).List<ShareACLEntry>();

        if (childACLs.Except(parentACLs, comparer).ToList().Count > 0)
            Console.WriteLine("{0} has diffs!", path);
    }
}

Finally, I compare the resulting data to see whether the child nodes differ from the root node.

By differ, I mean that if I have an ACL for group "CLT-AD\Full Access Shared-CHRL" with allowed full control on the parent node and not on the child node, I'd like to note that the ACL exists on the child but not the parent.

Unfortunately, this process is far too slow to parse the 500k rows in any decent amount of time.

I'd like to know if anyone has an idea for efficiently determining whether there are differences in the data, be it using T-SQL directly, a SQL CLR function, or a better algorithm in general.

Please let me know if clarification is required.

Thanks!

EDIT

Since I've gotten a fair amount of hate on this question, let me re-clarify exactly what I'm looking for, minus the failed approaches I've outlined above.

I recently performed a scan against ~1,000 shared folders on a Windows server. This scan recursed from the top-level directory all the way down the folder hierarchy and, for each folder, recorded a row for each ACL.

The database therefore looks like the screenshot above.

What I need to do is pull a report from this database which details the difference (or even whether there is any) between the ACLs recorded from a top-level directory and the ACLs recorded for any directory under this top-level directory.

Hopefully that makes more sense.

Here is some T-SQL:

DECLARE @parentFullPath NVARCHAR(260) = N'\\chrlcltsvr02\AYY_LMO\ClientServices';
DECLARE @childFullPath NVARCHAR(260) = N'\\chrlcltsvr02\AYY_LMO\ClientServices\Client1';

SELECT
            [UserGroup],
            [AllowDeny],
            [Permissions]
    FROM
            [ACLs]
    WHERE
            [FullPath] = @childFullPath
EXCEPT
SELECT
            [UserGroup],
            [AllowDeny],
            [Permissions]
    FROM
            [ACLs]
    WHERE
            [FullPath] = @parentFullPath;

It may or may not do what you require; it's hard to tell.

To find all the parent-child pairs:

WITH [paths] AS (
SELECT
             [FullPath]
    FROM
             [ACLs]
    GROUP BY
             [FullPath])
SELECT
            [P].[FullPath] [ParentFullPath],
            [C].[FullPath] [ChildFullPath]
    FROM
            [paths] [P]
        JOIN
            [paths] [C]
                ON
                        [C].[FullPath] <> [P].[FullPath]
                    AND
                        -- Append the separator so that ...\Client1 is not treated as a parent of ...\Client10.
                        CHARINDEX([P].[FullPath] + N'\', [C].[FullPath]) = 1;

So you could in fact do it all at once, something like this:

WITH [paths] AS (
SELECT
             [FullPath]
    FROM
             [ACLs]
    GROUP BY
             [FullPath])
SELECT
            [PC].[ParentFullPath],
            [PC].[ChildFullPath],
            [D].[UserGroup],
            [D].[AllowDeny],
            [D].[Permissions]
    FROM (
            SELECT
                        [P].[FullPath] [ParentFullPath],
                        [C].[FullPath] [ChildFullPath]
                FROM
                        [paths] [P]
                    JOIN
                        [paths] [C]
                            ON
                                    [C].[FullPath] <> [P].[FullPath]
                                AND
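                                    -- Append the separator so that ...\Client1 is not treated as a parent of ...\Client10.
                                    -- (No semicolon here: the derived table is still open.)
                                    CHARINDEX([P].[FullPath] + N'\', [C].[FullPath]) = 1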
        ) [PC]
CROSS APPLY
(
SELECT
            [UserGroup],
            [AllowDeny],
            [Permissions]
    FROM
            [ACLs]
    WHERE
            [FullPath] = [PC].[ChildFullPath]
EXCEPT
SELECT
            [UserGroup],
            [AllowDeny],
            [Permissions]
    FROM
            [ACLs]
    WHERE
            [FullPath] = [PC].[ParentFullPath]
) [D];

Ultimately, if you want this code to run efficiently, you'll need to normalize your schema somewhat. As long as the parent-child relationship exists only by inference through string comparison, this will be a relatively slow operation.
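As a rough illustration of what "normalizing somewhat" could look like, here is a minimal sketch. The table names [Folders] and [FolderACLs], the key columns, and all column types are assumptions made up for this example; nothing in the question confirms them. The idea is only that once each folder stores an explicit key for its parent, the comparison becomes an equality join on integers rather than a string match.

```sql
-- Hypothetical normalized layout: one row per folder, with an explicit parent key.
CREATE TABLE [Folders] (
    [FolderId]       INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    [ParentFolderId] INT NULL REFERENCES [Folders] ([FolderId]),
    [FullPath]       NVARCHAR(260) NOT NULL UNIQUE
);

-- ACL rows point at a folder by key instead of repeating the path string.
-- Column types are guesses; adjust them to match the real data.
CREATE TABLE [FolderACLs] (
    [FolderId]    INT NOT NULL REFERENCES [Folders] ([FolderId]),
    [UserGroup]   NVARCHAR(256) NOT NULL,
    [AllowDeny]   NVARCHAR(10)  NOT NULL,
    [Permissions] NVARCHAR(256) NOT NULL
);

CREATE INDEX [IX_FolderACLs_FolderId] ON [FolderACLs] ([FolderId]);

-- ACL entries present on a child folder but missing from its immediate parent.
SELECT
            [C].[FullPath] AS [ChildFullPath],
            [CA].[UserGroup],
            [CA].[AllowDeny],
            [CA].[Permissions]
    FROM
            [Folders] [C]
        JOIN
            [FolderACLs] [CA]
                ON [CA].[FolderId] = [C].[FolderId]
    WHERE
            [C].[ParentFolderId] IS NOT NULL
        AND
            NOT EXISTS (
                SELECT 1
                    FROM [FolderACLs] [PA]
                    WHERE [PA].[FolderId]    = [C].[ParentFolderId]
                      AND [PA].[UserGroup]   = [CA].[UserGroup]
                      AND [PA].[AllowDeny]   = [CA].[AllowDeny]
                      AND [PA].[Permissions] = [CA].[Permissions]);
```

This compares each folder with its immediate parent. If the report should compare every descendant with the top-level directory instead, a root key could be stored on each folder row in the same way and used in place of [ParentFolderId].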

If you want to do it for the whole list in one go, you might write a SQL expression (for example, using SUBSTRING()) to play the role of get_parent_path_from_child_path and run the following SQL structure. It is not clear from your question how to separate parent from child in the general case, so I am just giving you skeleton code.

(
SELECT -- parents
            [UserGroup],
            [AllowDeny],
            [Permissions],
            [FullPath] AS parent_path
    FROM
            [ACLs]
    -- WHERE <add a filter for parents here>

EXCEPT -- T-SQL spells MINUS as EXCEPT
SELECT -- children
            [UserGroup],
            [AllowDeny],
            [Permissions],
            get_parent_path_from_child_path([FullPath]) AS parent_path -- placeholder: substitute your own expression
    FROM
            [ACLs]
    -- WHERE <add a filter for children here>

)
UNION
(
SELECT -- children
            [UserGroup],
            [AllowDeny],
            [Permissions],
            get_parent_path_from_child_path([FullPath]) AS parent_path -- placeholder: substitute your own expression
    FROM
            [ACLs]
    -- WHERE <add a filter for children here>
EXCEPT
SELECT -- parents
            [UserGroup],
            [AllowDeny],
            [Permissions],
            [FullPath] AS parent_path
    FROM
            [ACLs]
    -- WHERE <add a filter for parents here>

)
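One possible expression for the get_parent_path_from_child_path placeholder is sketched below. It assumes that "parent" means the immediate parent folder and that paths are stored without a trailing backslash; neither is confirmed by the question. The sample paths and the @paths table variable are made up for the demonstration.

```sql
-- Hypothetical sample paths; not taken from the real data.
DECLARE @paths TABLE ([FullPath] NVARCHAR(260));

INSERT INTO @paths ([FullPath]) VALUES
    (N'\\server\share\Dept'),
    (N'\\server\share\Dept\Client1');

SELECT
            [FullPath],
            -- Find the last backslash by searching the reversed string,
            -- then keep everything to the left of it.
            LEFT([FullPath],
                 LEN([FullPath]) - CHARINDEX(N'\', REVERSE([FullPath]))) AS [ParentPath]
    FROM
            @paths;
```

For the two sample rows this yields \\server\share and \\server\share\Dept respectively. A path with no backslash at all comes back unchanged, because CHARINDEX returns 0 in that case.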
