简体   繁体   中英

Compare the data of two csv file PHP

As of now this is my code. I got stuck on how to look up/compare the data of payment.csv to transactions.csv.

<?php
        //this shows the name, email, deposit date and amount from payment.csv

          $data = file("data/payment.csv");
          foreach ($data as $deposit){
            $depositarray = explode(",", $deposit);
            $depositlist = $depositarray;
            $name = $depositlist[0];
            $email = $depositlist[1];
            $depositdate = $depositlist[9]; 
            $depositamount = $depositlist[10];

            //echo $depositamount;
        }

        //this shows the payment date and amount from transaction.csv

        $databank = file("datas/transactions.csv");
          foreach ($databank as $payment){
            $paymentarray = explode(",", $payment);
            $paymentlist = $paymentarray;

            $paymentdate = $paymentlist[0];
            $paymentamount =  $paymentlist[5];

            //echo $paymentamount;

        }
?>

Example:

From payment.csv , every ( $depositdate && $depositamount ) will compare to transactions.csv ( $paymentdate && $paymentamount ).

Every row with matched will save to an array then display on table later. If not matched save to an array and display later.

Would anyone help me or give me an idea to accomplish this? Just to display all rows with matched data.

or just like this: 在此处输入图片说明

This should be the output when a transaction have multiple matched 在此处输入图片说明

The requirements: Find and record, transaction records that have a match with a corresponding master record.

A list of all the transaction records that match a master record must be kept.

The comparison is based on 'date' and amount. (updated to easily allow any values from the current row, as you pass an array of 'column names' to use.)

The issue is that this can get quite expensive if the arrays are not sorted by the keys you want to compare.

One approach is to generate a 'key' that is unique for each 'data key fields' but is easy to generate and is a fixed size to make comparisons easy.

Then use these 'keys' to generate a lookup array of 'generated key' to the original record.

This saves having to provide sorted data on the fields you want to compare. However, the resulting lookup array must fit in memory.

I decided to use an MD5 hash for the concatenated data keys. The chances of collisions are not important in this application. And MD5 is excellent at generating unique hashes. It is also quick.

Working code at eval.in

Source Code

The class that does the work:

// ---------------------------------------------------------------------------------
class HashMatch {

   /*
    * Generate a MD5 hash for each master and tranasaction using some
    * of the data fields as the string to be hashed.
    */ 

    /**
    * Master source records
    * 
    * @var array 
    */
    private $master = null;

    /**
    * Transaction Source records must have the same field names as the master
    * of the indexes that are used to generate the MD5 hash
    * 
    * @var array 
    */
    private $transaction  = null;

    /**
    * The generated MD5 hash is the key in the Master source records.
    * 
    * Each record has a list of other Master Record Ids that also have the same hash 
    * 
    * @var array
    */
    private $hashMaster = array();

    /**
    * The generated MD5 hash is the key in the Transaction source records.
    * 
    * Each record has a list of other Transaction Record Ids that also have the same hash 
    * 
    * @var array
    */
    private $hashTransaction = array();

    /**
    * Specify which index names to use from the supplied data record arrays
    * to generate the MD5 hash with.
    * 
    * @var array 
    */
    private $keyNames = array();

    /**
    * Generate a MD5 hash for each master and transaction using some
    * of the data fields as the string to be hashed.
    * 
    * You can pass an array of field names to used to generate the key.
    * 
    * This allows any records to be used in this class as you just provide
    * the li9st of names to generate the MD5 hash
    *  
    * 
    * @param array $master
    * @param array $transaction
    * @param array $keyNames
    * 
    * @return void
    */    
    public function __construct(array $master, 
                                array $transaction, 
                                array $keyNames = array('when', 'amount')) 
    {
        $this->master = $master;
        $this->transaction  = $transaction;
        $this->keyNames = $keyNames; 
    } 

    /**
    * Generate all the Hashes and store all the matching details
    * 
    * @return bool
    */    
    public function generateMatches()
    {
        $this->processMaster();
        $this->processTransaction();
        return !empty($this->hashMaster) && !empty($this->hashTransaction);
    }

    /**
    * Generate a list of MD5 hashes as a key  
    * 
    * Keep a list of other master records with the same hash 
    *  
    * @return void
    */    
    public function processMaster()
    {
        foreach ($this->master as $recordId => $data) {

            $hash = $this->generateHash($data);
            if (empty($this->hashMaster[$hash])) { // add it...
                $this->hashMaster[$hash]['masterId'] = $recordId;
                $this->hashMaster[$hash]['matchIds'] = array($recordId);
            }            
            else { // is a duplicate so add to the match list
                $this->hashMaster[$hash]['matchIds'][] = $recordId;
            }
        }
    }

    /**
    * Generate a list of MD5 hashes as a key for the Transaction source  
    *   
    * Match the hashes against the master list and record if there is a match
    * 
    * @return void
    */
    public function processTransaction()
    {        
        foreach ($this->transaction as $recordId => $data) {
            $hash = $this->generateHash($data);
            if (empty($this->hashMaster[$hash])) { // skip this record
               continue;
            }

            // record a match with the master
            if (empty($this->hashTransaction[$hash])) { // new record
                $this->hashTransaction[$hash]['masterId'] = $this->hashMaster[$hash]['masterId'];
                $this->hashTransaction[$hash]['matchIds']  = array();
            }

            // add to the list of matches
            $this->hashTransaction[$hash]['matchIds'][] = $recordId;
        }
    }

    /**
    * Return Master MD5 list 
    * 
    * The keys are unique, however there are extra values:
    *   
    *   'masterId'  ==> The first record in the array with this key
    * 
    *   'matchIds'  ==> A *complete* list of all the master records that have this key.
    *                   Yes, it includes itself, this allows you to just use this list
    *                   when reporting.
    * 
    * @return array
    */
    public function getHashMasterList()
    {
        return $this->hashMaster;
    }

    /**
    * Return Master MD5 list with more that one matching master
    * 
    * i.e. duplicate master records with the same hash
    * 
    * @return array
    */
    public function getHashMatchedMasterList()
    {
        $out = array();
        foreach ($this->hashMaster as $key => $item) {
            if (count($item['matchIds']) >= 2) {
                $out[$key] = $item; 
            }
        }
        return $out;
    }

    /**
    * All the transactions  that matched a master record
    * 
    * @return array
    */
    public function getHashTransactionList()
    {
        return $this->hashTransaction;
    }

    /**
    * given a master hash then return the details as:
    * 
    * i.e. this converts a hash key back into source records for processing.
    * 
    * 1) A list of matching master records 
    * 
    *    e.g. $out['master'][] ...  
    *    
    * 
    * 2) A list of matching transaction records 
    * 
    *    e.g. $out['transaction'][] ...   
    * 
    * @param string $hash
    * 
    * @return array
    */
    public function getMatchedRecords($hash)
    {
        $out = array('key'         => $hash,
                      'master'      => array(),
                      'transaction' => array(),
                     );

        if (!empty($this->hashMaster[$hash])) { // just in case is invalid hash
            foreach ($this->hashMaster[$hash]['matchIds'] as $recordId) {
                $out['master'][] = $this->master[$recordId];
            }
        }

        if (!empty($this->hashTransaction[$hash])) {
            foreach ($this->hashTransaction[$hash]['matchIds'] as $recordId) {
                $out['transaction'][] = $this->transaction[$recordId];
            }
        }

        return $out;
    }

    /**
    * Generate an MD5 hash from the required fields in the data record 
    * The columns to use will have been passed in the constructor
    * and found in '$keyNames'
    * 
    * It is so you don't have to edit anything to use this class
    * 
    * @param  array  $row
    * 
    * @return string
    */
    public function generateHash($row) 
    {
        $text = '';
        foreach ($this->keyNames as $name) {
            $text .= $row[$name];
        } 
        return Md5($text);
    }   
}

Explanation...

later....

The code to run it:

// !!!! You can pass the names of the fields to be used to generate the key 
$match = new HashMatch($master, 
                       $transaction, 
                       array('whenDone', 'amount'));
$match->generateMatches();


// print output...
echo '<pre>Hash Master Records with multiple Matching Masters ... ', PHP_EOL;
    print_r($match->getHashMatchedMasterList());
echo '</pre>';    

Output:

Matching Master to Transaction... 
Array
(
    [key] => 296099e19b77aad413600a1e2f2cb3cd
    [master] => Array
        (
            [0] => Array
                (
                    [name] => John Matched
                    [whenDone] => 2016-04-01
                    [amount] => 12345
                    [email] => johnMatched@y.com
                )

            [1] => Array
                (
                    [name] => Jane Matched
                    [whenDone] => 2016-04-01
                    [amount] => 12345
                    [email] => janeMatched@y.com
                )

        )

    [transaction] => Array
        (
            [0] => Array
                (
                    [name] => John Doe
                    [whenDone] => 2016-04-01
                    [amount] => 12345
                    [email] => johndoe@y.com
                )

            [1] => Array
                (
                    [name] => micky mean
                    [whenDone] => 2016-04-01
                    [amount] => 12345
                    [email] => mickym@y.com
                )
        )
)

Test Data

$master[]      = array('name' => 'First last',     'whenDone' => '2016-03-03', 'amount' => 12000,  'email' => 'sample@y.com', );
$master[]      = array('name' => 'John Matched',   'whenDone' => '2016-04-01', 'amount' => 12345,  'email' => 'johnMatched@y.com');
$master[]      = array('name' => 'Jane Unmatched', 'whenDone' => '2016-05-02', 'amount' => 12345,  'email' => 'janeUnmatched@y.com');
$master[]      = array('name' => 'Jane Matched',   'whenDone' => '2016-04-01', 'amount' => 12345,  'email' => 'janeMatched@y.com');

$transaction[] = array('name' => 'Mary Lamb',      'whenDone' => '2016-03-04', 'amount' => 12000,  'email'  => 'maryl@y.com');
$transaction[] = array('name' => 'John Doe',       'whenDone' => '2016-04-01', 'amount' => 12345,  'email' => 'johndoe@y.com');
$transaction[] = array('name' => 'micky mean',     'whenDone' => '2016-04-01', 'amount' => 12345,  'email'  => 'mickym@y.com');

According to @Ryan Vincent's comment:

<?php
  $masterData = array();

  //this shows the name, email, deposit date and amount from payment.csv
  $data = file("data/payment.csv");
  foreach ($data as $deposit){
    $depositarray = explode(",", $deposit);
    $key = md5($depositlist[9] . $depositlist[10]); //date + amount
    $depositlist = $depositarray;

    $masterData[$key]['payment'] = array(
      'name' => $depositlist[0],
      'email' => $depositlist[1],
      'depositdate' => $depositlist[9],
      'depositamount' => $depositlist[10]
    );
  }

  //this shows the payment date and amount from transaction.csv
  $databank = file("datas/transactions.csv");
  foreach ($databank as $payment){
    $paymentarray = explode(",", $payment);

    $key = md5($paymentlist[0] . $paymentlist[5]); //date + amount

    $masterData[$key]['transaction'] = array(
      'paymentdate' => $paymentlist[0],
      'paymentamount' =>  $paymentlist[5]
    );
  }
?>

Now you have an array $masterData which have all data with same date and amount under same key.

But I still do not think that this list is good for anything because you do not know which payment belongs to which transaction because date and amount could be same.

However, if you do now a check like:

<?php
  foreach($masterData as $data) {
    echo explode(',', $data[0]);

    if(count($data) == 2) {

      echo explode(',', $data[1]) . ', matched';
    }

    echo '<br/>';
  }
?>

you should have each line with your data and at the end of line a matched when there was a transaction.

But like I said - since people could have a transaction with same amount on same date, you do not know which transaction belongs to which person.

First of all, if it's valid csv-files, you should parse the data via fgetcsv() instead of relying upon that there's no comma inside of a string.

You could create an index-array for deposit-date while reading csv1, and simply lookup this index while reading csv2. If there's matching dates, compare the amounts and proceed further.

something like this:

// read csv1, store in array and create index
$data = array();
$fh = fopen($csv1, 'r');
while($row = fgetcsv($fh)) {
  $data[] = $row;
  $val = $row[$interestd_in];
  $key = count($data) - 1;
  $idx[$val][] = $key;      // array containing all indices
}
fclose($fh);

// read csv2, lookup in index and process further
$fh = fopen($csv2, 'r');
while($row2 = fgetcsv($fh)) {
  $val = $row2[$interest2];
  if(!empty($idx[$val])) {
    foreach($idx[$val] as $key) {
      $row1 = $data[$key];
      /*
    do your further comparisons of these 2 data-lines
    and output, if matches found
       */
    }
  }
}
fclose($fh);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM