
Is there a way to store database modifications with a versioning feature (for eventual version comparison)?

I'm working on a project where users can upload Excel files into a MySQL database. Those files are the main source of our data, as they come directly from the contractors working with the company. They contain a large number of rows (23,000 on average per file), with 100 columns per row!

The problem I am currently facing is that the same file can be changed by someone (either the contractor or the company). When it is re-uploaded, my system should detect the changes, update the actual data, and record the action (the fact that a cell went from one value to another: oldValue -> newValue), so that we can go back and run a version comparison (e.g. 3 re-uploads = 3 versions), for example oldValue of version 1 vs. newValue of version 5.
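The detection step described above can be sketched as a cell-by-cell comparison of the stored row against the re-uploaded row. This is only a minimal illustration: `diffRow` and the sample column names are hypothetical, and it assumes both rows are available as associative arrays keyed by column name.

```php
<?php
// Hypothetical helper: returns the cells that differ between the stored row
// and the re-uploaded row, keyed by column name.
function diffRow(array $storedRow, array $uploadedRow): array
{
    $changes = [];
    foreach ($uploadedRow as $column => $newValue) {
        $oldValue = $storedRow[$column] ?? null;
        // Strict comparison: "1" vs 1 or "" vs null count as changes.
        // Loosen this if the Excel reader and MySQL return different types.
        if ($oldValue !== $newValue) {
            $changes[$column] = ['oldValue' => $oldValue, 'newValue' => $newValue];
        }
    }
    return $changes;
}

// Sample data (made up for illustration)
$stored   = ['id' => 15, 'name' => 'some', 'address' => 'old street'];
$uploaded = ['id' => 15, 'name' => 'some', 'address' => 'new street'];

$changes = diffRow($stored, $uploaded);
// Only the address cell is reported, with its old and new value
```

Each entry of the result maps directly onto one oldValue -> newValue record, so an unchanged row produces an empty array and nothing needs to be saved for it.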

I developed a small mechanism for saving the changes: one table stores the imports (each time a user imports a file, a new row is inserted into this table), and another table stores the actual changes.

Versioning data

I save the id of the row that changed, as well as the name of the table where the actual data was modified (uploading a file results in insertions into multiple tables, so whenever a change occurs, I need to know in which table it happened). I also save the new value and the old value, which will help me restore the "archived data".

  • To restore a version: SELECT * FROM Archive WHERE idImport = ${versionNumber}
  • To restore a version for one row: SELECT * FROM Archive WHERE idImport = ${versionNumber} AND rowId = ${rowId}
  • To restore all versions for one row: SELECT * FROM Archive WHERE rowId = ${rowId}
  • To restore the versions for one table: SELECT * FROM Archive WHERE tableName = ${table}
  • Etc.
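The queries above return change records rather than a restored row, so one more step is needed. One possible way to do it, sketched under assumptions: start from the current row and undo every change made after the wanted version, newest first. `restoreRow` is a hypothetical helper, the archive entries are assumed to carry `idImport`, `rowId`, `fieldName`, `oldValue` and `newValue`, and the array is assumed to be already filtered to a single table.

```php
<?php
// Hypothetical helper: rebuilds one row as it looked right after import
// $version by undoing every later change, newest first.
function restoreRow(array $currentRow, array $archive, int $rowId, int $version): array
{
    // Keep only the changes made to this row after the wanted version
    $later = array_filter($archive, function (array $change) use ($rowId, $version): bool {
        return $change['rowId'] === $rowId && $change['idImport'] > $version;
    });
    // Newest import first, so the oldest "oldValue" is applied last and wins
    usort($later, function (array $a, array $b): int {
        return $b['idImport'] <=> $a['idImport'];
    });
    foreach ($later as $change) {
        $currentRow[$change['fieldName']] = $change['oldValue'];
    }
    return $currentRow;
}

// Sample archive (made up): the name changed in import 2, the address in import 5
$archive = [
    ['idImport' => 2, 'rowId' => 15, 'fieldName' => 'name',    'oldValue' => 'some',       'newValue' => 'other'],
    ['idImport' => 5, 'rowId' => 15, 'fieldName' => 'address', 'oldValue' => 'old street', 'newValue' => 'new street'],
];
$current = ['id' => 15, 'name' => 'other', 'address' => 'new street'];

$atVersion2 = restoreRow($current, $archive, 15, 2); // only the address is rolled back
$atVersion1 = restoreRow($current, $archive, 15, 1); // name and address are rolled back
```

This only works if every change to the row is recorded in the archive. A change made outside the import path would make the restored row wrong.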

With this structure, I'm struggling to restore a version or to run a comparison between two versions, which makes me think I've come up with the wrong approach, since it makes the job so hard. Has anyone done this before, and what would a good approach look like?

Cases where things get really messy:

  • The rows that changed in one version might not have changed in the other version (I am working on a "time machine" that searches the other versions when this happens).
  • The rows have changed in both versions, but not in the same fields. Say we have a user table, and the data of the user with id 15 changed in the 2nd and 5th uploads. In the second version only the name was changed, but in the fifth version the address was changed. When comparing these two versions, we run into a problem constructing our data array: name goes from "some" -> NULL (the name was never NULL; there was simply no name change in the 5th version) and address goes from NULL -> "some", which is obviously wrong.

My current approach (PHP)

<?php
// Join the two record sets and compare them
$dataArray = [];
foreach ($firstRecord as $frecord) {

  // Retrieve the first record's fields that have changed
  $fFields = $frecord->fieldName;

  // Check whether the same record has changed in the second version as well.
  // array_search() cannot match an id against a list of objects, so scan for it.
  $sId = false;
  foreach ($secondRecord as $key => $candidate) {
      if ($candidate->idRecord == $frecord->idRecord) {
          $sId = $key;
          break;
      }
  }
  // Compare with false explicitly: key 0 is a valid match
  if ($sId !== false) {
      $srecord = $secondRecord[$sId];
      // Retrieve the second record's fields that have changed
      $sFields = $srecord->fieldName;
      // Start from an empty delta so fields do not leak between records
      $deltaRow = [];
      // Compare the two records' fields
      foreach ($fFields as $fField) {
          $sfId = array_search($fField, $sFields, true);
          // The same field of the same record was changed in both versions (perfect case)
          if ($sfId !== false) {
              $deltaRow[$fField]["oldValue"] = $frecord->deltaValue;
              $deltaRow[$fField]["newValue"] = $srecord->deltaValue;
              // Remove the checked field from the second version's list to avoid re-checking
              unset($sFields[$sfId]);
          }
          // The field changed in V1 was not found in V2 -> look up a value
          else {
              $deltaRow[$fField]["oldValue"] = $frecord->deltaValue;
              $deltaRow[$fField]["newValue"] = $this->valueLookUp();
          }
      }
      $dataArray[] = $deltaRow;
      // Remove the checked record from the second version set to avoid re-checking
      unset($secondRecord[$sId]);
  }
}

I don't know how to deal with that. As I said, I'm working on a value lookup algorithm, so that when no data is found in a version, I try to find it in the versions between these two and can construct my data array. I would be very happy if anyone could give some hints, ideas, or improvements so I can go further with this.
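One way such a value lookup could work, sketched under assumptions rather than taken from the code above: for each field, take the newValue of the latest change at or before the requested version, and if there is none, the oldValue of the first change after it. `valueAsOf` and `compareVersions` are hypothetical names, and the archive entries are assumed to carry `idImport`, `rowId`, `fieldName`, `oldValue` and `newValue`.

```php
<?php
// Hypothetical lookup: value of one field right after import $version.
// $changes holds all log entries for one row and one field, sorted by idImport ascending.
function valueAsOf(array $changes, int $version)
{
    $value = null;
    $found = false;
    foreach ($changes as $change) {
        if ($change['idImport'] <= $version) {
            $value = $change['newValue']; // latest change at or before $version
            $found = true;
        } elseif (!$found) {
            return $change['oldValue'];   // nothing earlier: value before the first later change
        } else {
            break;
        }
    }
    return $value;
}

// Hypothetical comparison: fields of one row whose value differs between two versions
function compareVersions(array $archive, int $rowId, int $from, int $to): array
{
    // Group this row's changes by field
    $byField = [];
    foreach ($archive as $change) {
        if ($change['rowId'] === $rowId) {
            $byField[$change['fieldName']][] = $change;
        }
    }
    $delta = [];
    foreach ($byField as $field => $changes) {
        usort($changes, function (array $a, array $b): int {
            return $a['idImport'] <=> $b['idImport'];
        });
        $old = valueAsOf($changes, $from);
        $new = valueAsOf($changes, $to);
        if ($old !== $new) {
            $delta[$field] = ['oldValue' => $old, 'newValue' => $new];
        }
    }
    return $delta;
}

// The user-15 case: name changed in import 2, address in import 5 (sample values made up)
$archive = [
    ['idImport' => 2, 'rowId' => 15, 'fieldName' => 'name',    'oldValue' => 'some',       'newValue' => 'other'],
    ['idImport' => 5, 'rowId' => 15, 'fieldName' => 'address', 'oldValue' => 'old street', 'newValue' => 'new street'],
];

$v2VsV5 = compareVersions($archive, 15, 2, 5); // only the address differs
$v1VsV5 = compareVersions($archive, 15, 1, 5); // name and address differ
```

With this lookup, the name is carried forward from import 2 to import 5 instead of turning into NULL. Fields that never appear in the archive never changed, so they can be read from the live table.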

Thank you!

Is there a way to store database modifications with a versioning feature (for eventual version comparison)?

What constitutes versioning depends on the database itself and how you make use of it.

As far as a relational database is concerned (e.g. MariaDB), this boils down to the so-called normal forms, which are numbered.

In "Database Normalization: 5th Normal Form and Beyond" you can find the following guidance:

Beyond 5th normal form you enter the heady realms of domain key normal form, a kind of theoretical ideal. Its practical use to a database designer os [sic!] similar to that of infinity to a bookkeeper - i.e. it exists in theory but is not going to be used in practice. Even the most demanding owner is not going to expect that of the bookkeeper!

One strategy for stepping into these realms is to reach the 5th normal form first (do this in theory only, by going through all the normal forms and studying database normalization).

Additionally, you can set up versioning outside of, and in addition to, the database itself, e.g. by creating your own versioning system. Reading about what you can do with normalization will help you find better ways to decide how to structure and handle the database data for your versioning needs.
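As one illustration of what a self-made versioning system could look like (an assumption for the sake of example, not something the question's schema already does), each import could store a full copy of every row it changed, so that a comparison becomes a plain lookup of two rows. `RowVersionStore` is a hypothetical in-memory stand-in for such a table.

```php
<?php
// Hypothetical in-memory stand-in for a "row versions" table:
// one full copy of a row per import in which that row changed.
final class RowVersionStore
{
    /** @var array rowId => [idImport => row] */
    private $versions = [];

    // Stores a new version only when the row differs from the latest one known at that import
    public function save(int $rowId, int $idImport, array $row): bool
    {
        if ($this->asOf($rowId, $idImport) === $row) {
            return false;
        }
        $this->versions[$rowId][$idImport] = $row;
        ksort($this->versions[$rowId]);
        return true;
    }

    // Returns the row as it looked right after import $idImport, or null if it did not exist yet
    public function asOf(int $rowId, int $idImport): ?array
    {
        $found = null;
        foreach ($this->versions[$rowId] ?? [] as $version => $row) {
            if ($version > $idImport) {
                break;
            }
            $found = $row;
        }
        return $found;
    }
}

// Sample data (made up for illustration)
$store = new RowVersionStore();
$store->save(15, 1, ['name' => 'some',  'address' => 'old street']);
$store->save(15, 2, ['name' => 'other', 'address' => 'old street']);
$unchanged = $store->save(15, 3, ['name' => 'other', 'address' => 'old street']); // nothing stored
$store->save(15, 5, ['name' => 'other', 'address' => 'new street']);

$v2 = $store->asOf(15, 2);
$v5 = $store->asOf(15, 5);
// array_diff_assoc keeps the entries of $v5 whose value differs from $v2
$changed = array_diff_assoc($v5, $v2);
```

The trade-off is storage: with 100 columns per row, a full copy per changed row costs far more than a cell-level log, but no value lookup across intermediate versions is needed.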

However, as written, it depends on what you want and need, so no straightforward "code" answer can be given to such a general question.
