
Programmatically extracting relationships between tables in an RDBMS without foreign keys?

I'm reverse engineering the relationships between a medium-sized set of tables (50+) in an Oracle database that has no foreign keys defined between the tables. I can count (somewhat) on being able to match column names across tables. For example, a column named "SomeDescriptiveName" probably means the same thing across the set of tables.

What I would like is a better way of extracting some set of relationships from those matching column names than manually going through the tables one by one. I could do something with Java DatabaseMetaData methods, but this seems like one of those tasks that someone has probably had to script before. Maybe extract the column names with Perl or some other scripting language, use the column names as hash keys, and add tables to an array pointed to by each hash key?

Does anyone have any tips or suggestions that might make this simpler or provide a good starting point? It's an ugly need; if foreign keys had already been defined, understanding the relationships would have been much easier.

Thanks.

You pretty much wrote the answer in your question.

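# @tables is assumed to hold table objects that provide ->name and
# ->columns (a list of column names); building it is left to the reader.
my %column_tables;
foreach my $table (@tables) {
    foreach my $column ($table->columns) {
        # %column_tables is a hash of array refs, so index it with {}.
        push @{ $column_tables{$column} }, $table;
    }
}
print "Likely foreign key relationships:\n";
foreach my $column (sort keys %column_tables) {
    my @sharing = @{ $column_tables{$column} };
    # A column that occurs in a single table cannot link two tables.
    next
        if @sharing < 2;
    print $column, ': ';
    foreach my $table (@sharing) {
        print $table->name, ' ';
    }
    print "\n";
}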
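
The snippet above leaves open how the list of tables is populated. As a rough sketch of the same grouping in Python, assume the (table, column) pairs have already been fetched, for example with `SELECT table_name, column_name FROM all_tab_columns` on Oracle. The function name `shared_columns` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def shared_columns(pairs):
    """Map each column name to the sorted tables that contain it.

    `pairs` is an iterable of (table_name, column_name) tuples. Only
    columns that appear in two or more tables are returned.
    """
    tables_by_column = defaultdict(set)
    for table, column in pairs:
        # Oracle stores unquoted identifiers in upper case; normalizing
        # here is an assumption that makes the match case-insensitive.
        tables_by_column[column.upper()].add(table.upper())
    return {
        column: sorted(tables)
        for column, tables in tables_by_column.items()
        if len(tables) >= 2
    }


# Hypothetical rows standing in for the data dictionary query result.
SAMPLE_PAIRS = [
    ("CUSTOMER", "CUSTOMER_ID"),
    ("CUSTOMER", "NAME"),
    ("ORDERS", "ORDER_ID"),
    ("ORDERS", "CUSTOMER_ID"),
    ("INVOICE", "CUSTOMER_ID"),
]

if __name__ == "__main__":
    for column, tables in sorted(shared_columns(SAMPLE_PAIRS).items()):
        print(f"{column}: {' '.join(tables)}")
```

Using a set per column means a table is listed once even if the input contains duplicate rows.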

You can use a combination of three (or four) approaches, depending on how obfuscated the schema is:

  • dynamic methods
    • observation:
      • enable tracing in the RDBMS (or ODBC layer), then
      • perform various activities in the application (ideally record creation), then
      • identify which tables were altered in tight sequence, and with what column-value pairs
      • values occurring in more than one column during the sequence interval may indicate a foreign key relationship
  • static methods (just analyzing existing data, no need to have a running application)
    • nomenclature: try to infer relationships from column names
    • statistical: look at the minimum/maximum (and possibly the average) of unique values in all numerical columns, and attempt to perform a match
    • code reverse engineering: your last resort (unless dealing with scripts) - not for the faint of heart :)
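
The statistical idea in the list above could be sketched roughly as follows. This is only an illustration against an in-memory SQLite database, not Oracle, and the table names, column names, and the `could_reference` rule (child range inside parent range, no more distinct values than the parent) are assumptions chosen for the example.

```python
import sqlite3


def column_profile(conn, table, column):
    """Return (min, max, distinct_count) over the non-NULL values of a column."""
    # Identifiers cannot be bound as parameters; they are quoted here and
    # are assumed to come from the catalog, not from user input.
    sql = (
        f'SELECT MIN("{column}"), MAX("{column}"), '
        f'COUNT(DISTINCT "{column}") FROM "{table}"'
    )
    return conn.execute(sql).fetchone()


def could_reference(child, parent):
    """Range test: a child column can only reference a parent column if its
    values lie inside the parent's range and it has no more distinct values
    than the parent. Passing the test is a hint, not proof."""
    c_min, c_max, c_distinct = child
    p_min, p_max, p_distinct = parent
    if c_distinct == 0 or p_distinct == 0:
        return False  # an empty column tells us nothing
    return p_min <= c_min and c_max <= p_max and c_distinct <= p_distinct


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER, name TEXT);
        CREATE TABLE orders (order_no INTEGER, cust INTEGER);
        INSERT INTO customer VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO orders VALUES (100, 1), (101, 3), (102, 3);
        """
    )
    parent = column_profile(conn, "customer", "id")
    child = column_profile(conn, "orders", "cust")
    other = column_profile(conn, "orders", "order_no")
    conn.close()
    return could_reference(child, parent), could_reference(other, parent)


if __name__ == "__main__":
    print(demo())
```

A range test like this is cheap because it needs one aggregate query per column, but densely numbered surrogate keys will produce many false positives, so it is probably best used to shortlist pairs for a stricter check.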

My strategy would be to use the Oracle system catalog to find columns that are the same in column name and data type but different in table name. I would also check which of the two columns is part of a table's primary or unique key.

Here's a query that may be close to doing this, but I don't have an Oracle instance handy to test it:

-- Pair every column (col1) with a same-named, same-typed column (col2)
-- that belongs to a primary key ('P') or unique ('U') constraint.
SELECT col1.table_name || '.' || col1.column_name || ' -> '
    || col2.table_name || '.' || col2.column_name AS relationship
FROM all_tab_columns col1
  JOIN all_tab_columns col2
    ON (col1.column_name = col2.column_name
    AND col1.data_type = col2.data_type)
  JOIN all_cons_columns cc
    ON (col2.table_name = cc.table_name
    AND col2.column_name = cc.column_name)
  JOIN all_constraints con
    ON (cc.constraint_name = con.constraint_name
    AND cc.table_name = con.table_name
    AND con.constraint_type IN ('P', 'U'))
WHERE col1.table_name != col2.table_name;

Of course, this won't catch columns that are related but have different names.
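
Since the query above needs an Oracle instance, here is a rough sketch of the same idea that can be run anywhere, using SQLite as a stand-in. It reads the catalog through `PRAGMA table_info`, then pairs columns that share a name and declared type where the target column is part of a primary key. Unique constraints are not covered, and the sample schema and function names are invented for the example.

```python
import sqlite3


def load_catalog(conn):
    """Return (table, column, declared_type, is_pk) rows for all user tables."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    catalog = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, decl_type, _notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            catalog.append((table, name.upper(), decl_type.upper(), pk > 0))
    return catalog


def candidate_relationships(catalog):
    """Pair each column with same-named, same-typed key columns in other tables."""
    found = []
    for c_table, c_name, c_type, _c_is_pk in catalog:
        for p_table, p_name, p_type, p_is_pk in catalog:
            if (
                c_table != p_table
                and c_name == p_name
                and c_type == p_type
                and p_is_pk
            ):
                found.append(f"{c_table}.{c_name} -> {p_table}.{p_name}")
    return sorted(found)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, note TEXT);
        CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY,
                              customer_id INTEGER, name INTEGER);
        """
    )
    result = candidate_relationships(load_catalog(conn))
    conn.close()
    return result


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Requiring the target side to be a key column gives each candidate a direction, which plain name matching does not.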

This is an interesting question. The approach I took was a brute-force search for columns that matched types and values for a small sample set. You'll probably have to tweak the heuristics to get good results for your schema. I ran this on a schema that didn't use auto-incremented keys and it worked well. The code is written for MySQL, but it's very easy to adapt to Oracle.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:mysql:host=localhost;database=SCHEMA", "USER", "PASS");

my @list;
foreach my $table (show_tables()) {
    foreach my $column (show_columns($table)) {
        push @list, { table => $table, column => $column };
    }
}

foreach my $m (@list) {
    my @match;
    foreach my $f (@list) {
        if (($m->{table} ne $f->{table}) &&
            ($m->{column}{type} eq $f->{column}{type}) &&
            (samples_found($m->{table}, $m->{column}{name}, $f->{column}{samples})))
        {
            # For better confidence, add other heuristics such as
            # joining the tables and verifying that every value
            # appears in the master. Also it may be useful to exclude
            # columns in large tables without an index although that
            # heuristic may fail for composite keys.
            #
            # Heuristics such as columns having the same name are too
            # brittle for many of the schemas I've worked with. It may
            # be too much to even require identical types.

            push @match, "$f->{table}.$f->{column}{name}";
        }
    }
    if (@match) {
        print "$m->{table}.$m->{column}{name} $m->{column}{type} <-- @match\n";
    }
}

$dbh->disconnect();

exit;

sub show_tables {
    my $result = query("show tables");
    return ($result) ? @$result : ();
}

sub show_columns {
    my ($table) = @_;
    my $result = query("desc $table");
    my @columns;
    if ($result) {
        @columns = map {
            { name => $_->[0],
              type => $_->[1],
              samples => query("select distinct $_->[0] from $table limit 10") }
        } @$result;
    }
    return @columns;
}

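sub samples_found {
    my ($table, $column, $samples) = @_;
    # An empty sample set (for example, from an empty table) would
    # otherwise match every column, so treat it as no match.
    return 0 unless $samples && @$samples;
    foreach my $v (@$samples) {
        my $result = query("select count(1) from $table where $column=?", $v);
        if (!$result || $result->[0] == 0) {
            return 0;
        }
    }
    return 1;
}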

sub query {
    my ($sql, @binding) = @_;
    my $result = $dbh->selectall_arrayref($sql, undef, @binding);
    if ($result && $result->[0] && @{$result->[0]} == 1) {
        foreach my $row (@$result) {
            $row = $row->[0];
        }
    }
    return $result;
}
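
The comment in the code above suggests joining the tables and verifying that every value appears in the master. A minimal sketch of that check, written in Python against an in-memory SQLite database rather than MySQL or Oracle, could look like this. The function name `orphan_count` and the sample tables are assumptions for illustration.

```python
import sqlite3


def orphan_count(conn, child_table, child_col, parent_table, parent_col):
    """Count distinct non-NULL child values that have no match in the parent."""
    # Identifiers are quoted and assumed to come from the catalog,
    # not from user input, since they cannot be bound as parameters.
    sql = (
        f'SELECT COUNT(*) FROM ('
        f'SELECT DISTINCT c."{child_col}" FROM "{child_table}" AS c '
        f'WHERE c."{child_col}" IS NOT NULL AND NOT EXISTS ('
        f'SELECT 1 FROM "{parent_table}" AS p '
        f'WHERE p."{parent_col}" = c."{child_col}"))'
    )
    return conn.execute(sql).fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER, name TEXT);
        CREATE TABLE orders (order_no INTEGER, cust INTEGER);
        INSERT INTO customer VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO orders VALUES (100, 1), (101, 3), (102, 3), (103, NULL);
        """
    )
    plausible = orphan_count(conn, "orders", "cust", "customer", "id")
    implausible = orphan_count(conn, "orders", "order_no", "customer", "id")
    conn.close()
    return plausible, implausible


if __name__ == "__main__":
    print(demo())
```

Zero orphans is a necessary condition for a foreign key, not a sufficient one, and a child column that is entirely NULL also yields zero, so this works best as a filter on candidates produced by the sampling pass.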
