
SQL syntax error with parsed text from html page

I know this problem has been asked many times... I know the basics of PHP and MySQL. I'm trying to set up a webpage tracker. I managed to detect changes using the md5 function, and I'd like to go further and see what has changed. I can parse the links of a webpage, and I would like to store them in a database to compare them later with the content of the same page.

Here is my code:

$website = "http://www.example.com"; // file_get_contents() needs the scheme to fetch a URL
$input = file_get_contents($website) or die("Could not access file: $website");
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
$final = ""; // initialize before appending with .=
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $final .= $match[3] . "<br>"; // $match[3] is the link text
    }
}

$oldchecksum_text = "INSERT INTO websites (website, hash, text) VALUES ('$website', '$newchecksum', '$final')";

if (mysqli_query($conn, $oldchecksum_text)){
    echo "New record created successfully";
} else {
    echo "Erreur: "  . "<br>" . $conn->error;
}

Basically, everything works, except that the SQL query fails with a "syntax error on line 1". The problem comes from the parsed text: if I replace the variable with a single word, or a long string of letters, it works perfectly.

I tried replacing ' with `, but that didn't change anything.

Here is the definition of the column: name text, type longtext, collation utf8_general_ci.

I don't really know what to do anymore... Thanks for your help!

The parsed text most likely contains a single quote, which ends the SQL string literal early. The best solution is to use a prepared statement:

$oldchecksum_text = "INSERT INTO websites (website, hash, text) VALUES (?, ?, ?)";
$stmt = mysqli_prepare($conn, $oldchecksum_text);
mysqli_stmt_bind_param($stmt, "sss", $website, $newchecksum, $final);
if (mysqli_stmt_execute($stmt)) {
    echo "New record created successfully";
} else {
    echo "Erreur: <br>" . $conn->error;
}

If there's some reason you can't do this, use mysqli_real_escape_string to escape variables before substituting them into a query.
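To make the failure concrete, here is a small sketch that needs no database. The sample values are hypothetical stand-ins for the scraped data, and buildEscapedInsert is an illustrative helper name, not part of the original code. The first part prints the query as MySQL would receive it; the function shows roughly how the escaping fallback could look once a live connection is available.

```php
<?php
// Hypothetical sample values; the real $final comes from the scraped page.
$website     = "http://www.example.com";
$newchecksum = md5("sample page body");
$final       = "Today's news<br>";

// Same interpolation as in the question: the apostrophe in $final
// closes the SQL string literal early, so MySQL reports a syntax error.
$broken = "INSERT INTO websites (website, hash, text) "
        . "VALUES ('$website', '$newchecksum', '$final')";
echo $broken, PHP_EOL;

// Fallback sketch (illustrative helper): escape every value with the
// open connection before building the query string.
// Not called here because it needs a real mysqli connection.
function buildEscapedInsert(mysqli $conn, string $website, string $hash, string $text): string
{
    $w = mysqli_real_escape_string($conn, $website);
    $h = mysqli_real_escape_string($conn, $hash);
    $t = mysqli_real_escape_string($conn, $text);
    return "INSERT INTO websites (website, hash, text) VALUES ('$w', '$h', '$t')";
}
```

Note that the escaping has to happen before the values are placed into the query string; escaping a variable after the string has been built has no effect on the query.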

I eventually looked at other methods, and it finally works. I changed the way I parse the links; maybe that is the key?

function getLinks($link)
{
    /*** return array ***/
    $ret = array();

    /*** a new DOM object ***/
    $dom = new DOMDocument();

    /*** remove silly white space (set before loading so it can take effect) ***/
    $dom->preserveWhiteSpace = false;

    /*** get the HTML (suppress parser warnings on malformed markup) ***/
    @$dom->loadHTML(file_get_contents($link));

    /*** get the links from the HTML ***/
    $links = $dom->getElementsByTagName('a');

    /*** loop over the links, keyed by href ***/
    foreach ($links as $tag)
    {
        /*** guard against <a> elements that have no child nodes ***/
        $first = $tag->childNodes->item(0);
        $ret[$tag->getAttribute('href')] = $first ? $first->nodeValue : '';
    }

    return $ret;
}


/*** a link to search ***/
$link = $website;

/*** get the links ***/
$urls = getLinks($link);

/*** check for results ***/
$final = "";
if (count($urls) > 0)
{
    foreach ($urls as $key => $value)
    {
        $final .= $key . '<br>';
    }
}
else
{
    echo "No links found at $link";
}


/*** set the connection charset before sending any data ***/
mysqli_set_charset($conn, "utf8");

/*** use placeholders instead of interpolated values; bound parameters ***/
/*** must not also go through mysqli_real_escape_string, otherwise the ***/
/*** added backslashes are stored in the table ***/
$oldchecksum_text = "INSERT INTO websites (website, hash, text) VALUES (?, ?, ?)";

echo $final;

$stmt = mysqli_prepare($conn, $oldchecksum_text);
mysqli_stmt_bind_param($stmt, "sss", $website, $newchecksum, $final);
if (mysqli_stmt_execute($stmt)) {
    echo "New record created successfully";
} else {
    echo "Erreur: <br>" . $conn->error;
}

Thanks for your help. I had never heard of mysqli_real_escape_string before, but it's quite interesting. At least I learned something new today :)
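For the original goal of seeing what has changed between two visits, one possible approach is to split the stored text back into individual links and compare the two lists. This is only a sketch: splitLinks is a hypothetical helper, the sample strings stand in for the stored row and a fresh scrape, and it assumes the links were joined with '<br>' as the separator.

```php
<?php
// Hypothetical helper: split the stored "<br>"-separated text back into links.
function splitLinks(string $stored): array
{
    // Drop the empty piece produced by the trailing separator.
    return array_values(array_filter(explode('<br>', $stored), 'strlen'));
}

// Hypothetical sample data standing in for the stored row and a fresh scrape.
$oldLinks = splitLinks('/home<br>/about<br>/contact<br>');
$newLinks = splitLinks('/home<br>/contact<br>/blog<br>');

// Links present now but not before, and the other way around.
$added   = array_values(array_diff($newLinks, $oldLinks));
$removed = array_values(array_diff($oldLinks, $newLinks));

echo "Added: ", implode(', ', $added), PHP_EOL;     // prints "Added: /blog"
echo "Removed: ", implode(', ', $removed), PHP_EOL; // prints "Removed: /about"
```

Comparing lists this way reports which links appeared or disappeared, which an md5 checksum alone cannot do.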
