Programmatically extracting relationships between tables in an RDBMS without foreign keys?

Question

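I'm reverse engineering the relationships between a medium-sized number of tables (50+) in an Oracle database where no foreign keys are defined between the tables. I can count (somewhat) on being able to match column names across tables. For example, a column named "SomeDescriptiveName" is probably the same across the whole set of tables.

What I would like is a better way of extracting a set of relationships based on those matching column names than going through the tables one by one by hand. I could do something with the Java DatabaseMetaData methods, but this seems like one of those tasks that someone has probably had to script before. Maybe extract the column names with Perl or some other scripting language, use the column names as hash keys, and add the tables to an array pointed to by each hash key?

Does anyone have tips or suggestions that might make this simpler or provide a good starting point? It's an ugly need; if foreign keys had already been defined, understanding the relationships would be much easier.

Thanks.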

Answer 1

You pretty much wrote the answer in your question.

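# Sketch: assumes @tables holds table objects that provide name() and
# columns(), where columns() returns the column names as strings.
my %column_tables;
foreach my $table (@tables) {
    foreach my $column ($table->columns) {
        # Hash lookup needs braces: $column_tables{$column}, not [$column].
        push @{ $column_tables{$column} }, $table;
    }
}
print "Likely foreign key relationships:\n";
foreach my $column (keys %column_tables) {
    my @sharing = @{ $column_tables{$column} };
    # A column that appears in only one table links nothing.
    next
        if @sharing < 2;
    print $column, ': ';
    foreach my $table (@sharing) {
        print $table->name, ' ';
    }
    print "\n";
}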
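
The snippet above leaves open where the table objects come from. As a rough sketch of the same grouping idea, the following Python assumes you have already fetched (TABLE_NAME, COLUMN_NAME) pairs, for instance from Oracle's ALL_TAB_COLUMNS view. The function name shared_columns and the sample rows are made up for illustration.

```python
from collections import defaultdict


def shared_columns(rows):
    """Group table names by column name; keep columns seen in 2+ tables."""
    tables_by_column = defaultdict(set)
    for table_name, column_name in rows:
        # Upper-case the key so differently cased names still group together.
        tables_by_column[column_name.upper()].add(table_name)
    return {
        column: sorted(tables)
        for column, tables in tables_by_column.items()
        if len(tables) > 1
    }


# Hypothetical catalog rows: (table_name, column_name).
rows = [
    ("ORDERS", "ORDER_ID"),
    ("ORDERS", "CUSTOMER_ID"),
    ("CUSTOMERS", "CUSTOMER_ID"),
    ("CUSTOMERS", "NAME"),
    ("ORDER_LINES", "ORDER_ID"),
]
shared = shared_columns(rows)
for column, tables in sorted(shared.items()):
    print(f"{column}: {' '.join(tables)}")
```

Columns that occur in only one table (NAME here) are dropped, so what remains is the list of candidate join columns to inspect by hand.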

Answer 2

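You can use a combination of three (or four) approaches, depending on how obfuscated the schema is:

Dynamic methods
- Observation:
  - enable tracing in the RDBMS (or the ODBC layer), then
  - perform various activities in the application (ideally record creation), then
  - identify which tables were altered in tight sequence, and with which column-value pairs
  - values that occur in more than one column during the sequence interval may indicate a foreign key relationship
Static methods (analyzing existing data only, no need for a running application)
- Nomenclature: try to infer relationships from column names
- Statistical: look at the minimum/maximum (and possibly the average) of the unique values in all numeric columns, and attempt to match them up
- Code reverse engineering: your last resort (unless you are dealing with scripts), and not for the faint of heart :)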
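
The statistical approach in the list above could look roughly like the sketch below. It uses Python with an in-memory SQLite database purely so that it runs on its own; the schema (customer, invoice) and the function names are invented, and on Oracle the column list would come from the system catalog instead of PRAGMA table_info. The rule applied here is one possible reading of "attempt to match": a column is a candidate reference if its value range fits inside the range of a column in another table whose values are all unique.

```python
import sqlite3


def numeric_columns(conn):
    """Return (table, column) for every column declared INTEGER."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    found = []
    for table in tables:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        for row in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            if row[2].upper() == "INTEGER":
                found.append((table, row[1]))
    return found


def column_stats(conn, table, column):
    """Return (min, max, distinct count, non-null count) for one column."""
    return conn.execute(
        f'SELECT MIN("{column}"), MAX("{column}"), '
        f'COUNT(DISTINCT "{column}"), COUNT("{column}") FROM "{table}"'
    ).fetchone()


def candidate_pairs(conn):
    """Pair columns whose value range fits inside a unique column's range."""
    stats = {tc: column_stats(conn, *tc) for tc in numeric_columns(conn)}
    pairs = []
    for (ptab, pcol), (pmin, pmax, pdistinct, pcount) in stats.items():
        if pcount == 0 or pdistinct != pcount:
            continue  # the referenced side should look like a key
        for (ctab, ccol), (cmin, cmax, _, ccount) in stats.items():
            if ctab == ptab or ccount == 0:
                continue
            if pmin <= cmin and cmax <= pmax:
                pairs.append((f"{ctab}.{ccol}", f"{ptab}.{pcol}"))
    return sorted(pairs)


# Hypothetical demo schema with no declared foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_no INTEGER, name TEXT);
    CREATE TABLE invoice (inv_no INTEGER, buyer INTEGER);
    INSERT INTO customer VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO invoice VALUES (100, 1), (101, 3), (102, 3);
""")
pairs = candidate_pairs(conn)
for child, parent in pairs:
    print(f"{child} -> {parent}")
```

Range containment is a weak signal on its own, especially with small auto-incremented keys that all start at 1, so treat the output as a shortlist to confirm by other means.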

Answer 3

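My strategy would be to use the Oracle system catalog to find columns that have the same name and data type but belong to different tables, where one of the two columns is also part of its table's primary key or a unique key.

Here is a query that should come close to doing this, but I don't have an Oracle instance handy to test it: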

SELECT col1.table_name || '.' || col1.column_name || ' -> ' 
    || col2.table_name || '.' || col2.column_name
FROM all_tab_columns col1 
  JOIN all_tab_columns col2
    ON (col1.column_name = col2.column_name 
    AND col1.data_type = col2.data_type)
  JOIN all_cons_columns cc
    ON (col2.table_name = cc.table_name 
    AND col2.column_name = cc.column_name)
  JOIN all_constraints con
    ON (cc.constraint_name = con.constraint_name 
    AND cc.table_name = con.table_name 
    AND con.constraint_type IN ('P', 'U'))
WHERE col1.table_name != col2.table_name;

Of course, this won't catch columns that are related but have different names.

Answer 4

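This is an interesting question. The approach I took was a brute-force search for columns whose types and values match, using a small sample set. You will probably have to tweak the heuristics to get good results for your schema. I ran this on a schema that didn't use auto-incremented keys and it worked well. The code is written for MySQL, but it is easy to adapt to Oracle.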

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:mysql:host=localhost;database=SCHEMA", "USER", "PASS");

my @list;
foreach my $table (show_tables()) {
    foreach my $column (show_columns($table)) {
        push @list, { table => $table, column => $column };
    }
}

foreach my $m (@list) {
    my @match;
    foreach my $f (@list) {
        if (($m->{table} ne $f->{table}) &&
            ($m->{column}{type} eq $f->{column}{type}) &&
            (samples_found($m->{table}, $m->{column}{name}, $f->{column}{samples})))
        {
            # For better confidence, add other heuristics such as
            # joining the tables and verifying that every value
            # appears in the master. Also it may be useful to exclude
            # columns in large tables without an index although that
            # heuristic may fail for composite keys.
            #
            # Heuristics such as columns having the same name are too
            # brittle for many of the schemas I've worked with. It may
            # be too much to even require identical types.

            push @match, "$f->{table}.$f->{column}{name}";
        }
    }
    if (@match) {
        print "$m->{table}.$m->{column}{name} $m->{column}{type} <-- @match\n";
    }
}

$dbh->disconnect();

exit;

sub show_tables {
    my $result = query("show tables");
    return ($result) ? @$result : ();
}

sub show_columns {
    my ($table) = @_;
    my $result = query("desc $table");
    my @columns;
    if ($result) {
        @columns = map {
            { name => $_->[0],
              type => $_->[1],
              samples => query("select distinct $_->[0] from $table limit 10") }
        } @$result;
    }
    return @columns;
}

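sub samples_found {
    my ($table, $column, $samples) = @_;
    # Ignore NULL samples: "col = NULL" never matches in SQL.
    my @values = grep { defined } @{ $samples || [] };
    # With no sample values the test would pass vacuously, so reject.
    return 0 unless @values;
    foreach my $v (@values) {
        my $result = query("select count(1) from $table where $column=?", $v);
        if (!$result || $result->[0] == 0) {
            return 0;
        }
    }
    return 1;
}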

sub query {
    my ($sql, @binding) = @_;
    my $result = $dbh->selectall_arrayref($sql, undef, @binding);
    if ($result && $result->[0] && @{$result->[0]} == 1) {
        foreach my $row (@$result) {
            $row = $row->[0];
        }
    }
    return $result;
}
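
The comments in the script suggest a stronger check: join the tables and verify that every value appears in the master. A minimal sketch of that check follows, written in Python against an in-memory SQLite database only so that it is runnable as-is. The function name orphan_count and the department/employee tables are hypothetical, and the identifiers are interpolated directly into the SQL, which is only acceptable because they come from the catalog rather than from user input.

```python
import sqlite3


def orphan_count(conn, child, child_col, master, master_col):
    """Count non-null child values that have no matching row in the master."""
    sql = (
        f'SELECT COUNT(*) FROM "{child}" c '
        f'LEFT JOIN "{master}" m ON c."{child_col}" = m."{master_col}" '
        f'WHERE c."{child_col}" IS NOT NULL AND m."{master_col}" IS NULL'
    )
    return conn.execute(sql).fetchone()[0]


# Hypothetical demo data: every employee.dept value exists in department,
# but department 30 has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER, title TEXT);
    CREATE TABLE employee (emp_id INTEGER, dept INTEGER);
    INSERT INTO department VALUES (10, 'ops'), (20, 'dev'), (30, 'qa');
    INSERT INTO employee VALUES (1, 10), (2, 20), (3, 20), (4, NULL);
""")
forward = orphan_count(conn, "employee", "dept", "department", "dept_id")
reverse = orphan_count(conn, "department", "dept_id", "employee", "dept")
print(forward, reverse)
```

A count of zero in one direction and a non-zero count in the other hints at which table is the master. On large tables this join can be expensive, which is one reason the script samples only a few values first.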

4 answers

Answer 1
1 2009-02-27 20:00:21

Answer 2
1 2009-02-27 23:19:53

Answer 3
1 2009-02-28 01:55:15

Answer 4
0 2009-02-28 05:03:19
