
Date filter for inserting into DB

I have built a simple crawler for one of our clients. I am facing issues with duplicate entries in the database.

Basically, I crawl a website that lists a lot of houses for sale and extract the address, postcode, town, price and status of each one.

When inserting into the database, I also generate a creation_time.

The reason is that the same house CAN appear again if it was first INSERTED at least 2 years ago. So one house can be in the database twice, as long as its creation dates are at least 2 years apart.

<?php
    // Compare the freshly scraped $houses with the houses already stored in the DB.
    // Note: the mysql_* functions were removed in PHP 7.0; on newer PHP use mysqli or PDO.
    $now = time(); // assumption: $now is the current Unix timestamp (not shown in the original)

    $query = mysql_query("SELECT street, postcode, town, price, status, creation_time, print_status FROM house"); // Selecting the table

    if (!$query) {
        die('Invalid query: ' . mysql_error()); // checking for errors
    }

    while ($row = mysql_fetch_array($query)) {
        // Convert the creation time to Unix time once per DB row, not once per scraped house
        $creation_time_u = strtotime($row['creation_time']);
        $life_time = strtotime('+2 years', $creation_time_u); // creation time + 2 years

        // Iterate with the array key so unset() removes the house that actually matched
        // (a counter that is only incremented outside this loop always points at index 0)
        foreach ($houses as $key => $house) {
            if ($row['street'] == $house[0]
                && $row['postcode'] == $house[1]
                && $row['town'] == $house[2]
                && $life_time >= $now) {
                unset($houses[$key]); // stored less than 2 years ago: do not insert again
            }
        }
    }

    $houses = array_values($houses); // re-index once, after all removals
?>
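A counter like `$c` that is reset to 0 for every database row and only incremented after the `foreach` makes `unset($houses[$c])` always remove index 0, not the house that matched, so the real duplicate survives and is inserted again. The sketch below reproduces this on plain arrays; `remove_with_counter`, `remove_with_key` and the sample addresses are hypothetical, and the scraped houses are assumed to be `[street, postcode, town]` as in the question.

```php
<?php
// Hypothetical reproduction: compares removal by a stale counter vs. by the array key.

// Mimics the counter handling: $c stays 0 inside the loop.
function remove_with_counter(array $houses, string $street): array
{
    $c = 0; // never incremented inside the foreach
    foreach ($houses as $house) {
        if ($house[0] === $street) {
            unset($houses[$c]); // always removes index 0, whichever house matched
        }
    }
    return array_values($houses);
}

// Uses the key of the matching element instead.
function remove_with_key(array $houses, string $street): array
{
    foreach ($houses as $key => $house) {
        if ($house[0] === $street) {
            unset($houses[$key]); // removes the element that actually matched
        }
    }
    return array_values($houses);
}

// Hypothetical sample data: the stored house matches the scraped house at index 2.
$scraped = [
    ['Main St 1', '1000AA', 'Town'],
    ['Main St 2', '1000AB', 'Town'],
    ['Main St 3', '1000AC', 'Town'],
];

print_r(array_column(remove_with_counter($scraped, 'Main St 3'), 0)); // Main St 2, Main St 3
print_r(array_column(remove_with_key($scraped, 'Main St 3'), 0));     // Main St 1, Main St 2
```

With the counter, "Main St 1" disappears while the duplicate "Main St 3" is kept.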

After this, I insert the remaining $houses into the database and then print them. That is the next step, but it is kind of irrelevant here.

So, I don't know exactly what is going wrong. If I run it twice in a row, it doesn't insert duplicate entries, but if I run it the next day or so, it inserts the same entry a second time. Here is an example of what I found in the database:
[Screenshot of the duplicate entries in the database]

So yeah, I have spent too much time looking at this code and can't figure out why my filter isn't working. I suspect it has to do with how I am handling time, but I'm not completely sure.
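Since the time handling is the main suspect, the 2-year check can be tested on its own. This is a minimal sketch assuming `creation_time` is stored as a MySQL `DATETIME` string; `created_within_two_years` is a hypothetical helper, and `$now` is passed in explicitly because the snippet does not show where it is set. If both calls below behave as expected, the time arithmetic is probably not what lets duplicates through.

```php
<?php
// Sketch: the 2-year rule from the question, checked in isolation.
// Assumes creation_time looks like '2023-06-01 00:00:00' (MySQL DATETIME).
function created_within_two_years(string $creationTime, int $now): bool
{
    $created = strtotime($creationTime);
    if ($created === false) {
        // strtotime() returns false for unparseable input; fail loudly instead of comparing false
        throw new InvalidArgumentException("Unparseable creation_time: $creationTime");
    }
    $lifeTime = strtotime('+2 years', $created); // creation time + 2 years
    return $lifeTime >= $now; // still inside the window: the house counts as a duplicate
}

$now = strtotime('2024-06-01 00:00:00');
var_dump(created_within_two_years('2023-06-01 00:00:00', $now)); // bool(true)
var_dump(created_within_two_years('2021-06-01 00:00:00', $now)); // bool(false)
```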

Please advise!

Instead of calculating the time interval in PHP, you should select the relevant houses in your SQL query (see the MySQL documentation for DATE_ADD):

 SELECT DISTINCT
      a.street, a.postcode, a.town, a.price, a.status, a.creation_time, a.print_status
 FROM house AS a
 JOIN house AS b
      ON a.street = b.street
      AND a.postcode = b.postcode
      AND a.town = b.town
      AND a.creation_time > b.creation_time -- b is an earlier entry for the same house
 WHERE
      a.creation_time < DATE_ADD(b.creation_time, INTERVAL 2 YEAR) -- select duplicates
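One way the SQL-side date filter could be applied before inserting: let MySQL return only houses created in the last two years, then drop scraped houses that match one of them through a keyed lookup instead of a nested loop. This is a sketch under assumptions, not the answer's exact method: `RECENT_HOUSES_SQL`, `house_key` and `remove_recent_duplicates` are hypothetical names, and scraped houses are assumed to be `[street, postcode, town, ...]` as in the question.

```php
<?php
// Sketch: MySQL does the date filtering; PHP only checks (street, postcode, town).
// Valid MySQL; shown as a string and not executed in this snippet.
const RECENT_HOUSES_SQL = "SELECT street, postcode, town FROM house
    WHERE creation_time >= DATE_SUB(NOW(), INTERVAL 2 YEAR)";

// Hypothetical helper: one lookup key per (street, postcode, town) triple.
function house_key(string $street, string $postcode, string $town): string
{
    // "\0" is used as a separator that is unlikely to appear in address data
    return $street . "\0" . $postcode . "\0" . $town;
}

// $recentRows: associative rows returned by RECENT_HOUSES_SQL
// $houses: scraped houses as [street, postcode, town, ...]
function remove_recent_duplicates(array $recentRows, array $houses): array
{
    $seen = [];
    foreach ($recentRows as $row) {
        $seen[house_key($row['street'], $row['postcode'], $row['town'])] = true;
    }

    $fresh = [];
    foreach ($houses as $house) {
        // keep only houses with no entry in the last two years
        if (!isset($seen[house_key($house[0], $house[1], $house[2])])) {
            $fresh[] = $house;
        }
    }
    return $fresh;
}
```

With a PDO connection (a hypothetical `$pdo`), the rows could be loaded via `$pdo->query(RECENT_HOUSES_SQL)->fetchAll(PDO::FETCH_ASSOC)`. Building a new `$fresh` array also avoids the index bookkeeping that `unset()` inside the loop requires.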
