How to scrape a <script type="text/javascript"> tag in PHP?

My question is: how can I scrape this tag?
<script type="text/javascript">
var BCData = {"csrf_token":"686611cabde717e63c8ad811ac28ff1a2566168df14ec1439799dbfc0569f2c8","product_attributes":{"purchasable":true,"purchasing_message":null,"sku":"STICKER_PACK","upc":null,"stock":null,"instock":true,"stock_message":null,"weight":null,"base":false,"image":null,"price":{"without_tax":{"formatted":"$3.99","value":3.99,"currency":"USD"},"tax_label":"Tax"},"out_of_stock_behavior":"label_option","out_of_stock_message":"Out of stock","available_modifier_values":[],"available_variant_values":[7375],"in_stock_attributes":[7375],"selected_attributes":[]}};
</script>
What I want to extract is the value of csrf_token, that is, 686611cabde717e63c8ad811ac28ff1a2566168df14ec1439799dbfc0569f2c8.

I already tried the code below, but did not get the result I expected:

$ch = curl_init();
curl_setopt($ch,CURLOPT_URL, '$url');
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36');
curl_setopt($ch,CURLOPT_HTTPHEADER,array("accept-language: es-419,es;q=0.9"));
curl_setopt($ch,CURLOPT_TIMEOUT, 10);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
preg_match_all('(<script type="text/javascript">
var BCData = {"csrf_token":\"(.*)\","product_attributes":{"purchasable":true,"purchasing_message":null,"sku":"STICKER_PACK","upc":null,"stock":null,"instock":true,"stock_message":null,"weight":null,"base":false,"image":null,"price":{"without_tax":{"formatted":"$3.99","value":3.99,"currency":"USD"},"tax_label":"Tax"},"out_of_stock_behavior":"label_option","out_of_stock_message":"Out of stock","available_modifier_values":[],"available_variant_values":[7375],"in_stock_attributes":[7375],"selected_attributes":[]}};</script>)siU', $result, $matches1);
$titulo = $matches1[1][0];
echo $titulo;
I can't get the result.

You can probably grab the variable BCData and then decode it as JSON:

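// Capture the object literal assigned to BCData, up to the first "};"
if (preg_match('/var\s+BCData\s*=\s*({.*?});/s', $result, $matches)) {
    // Decode to an associative array; json_decode() returns null on invalid JSON
    $data = json_decode($matches[1], true);
    if (is_array($data) && isset($data['csrf_token'])) {
        echo $data['csrf_token'];
    }
}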

This assumes that the value assigned inside the script tag is valid JSON, which seems to be true now, but may not be true forever.
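If the page ever stops emitting strict JSON there, json_decode() returns null by default rather than raising an error. A minimal sketch of guarding against that, assuming PHP 7.3+ for JSON_THROW_ON_ERROR; the helper name extractCsrfToken is hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: returns the token, or null when BCData is
// missing or its value is not valid JSON.
function extractCsrfToken(string $html): ?string
{
    // Capture the object literal assigned to BCData, up to the first "};".
    if (!preg_match('/var\s+BCData\s*=\s*({.*?});/s', $html, $m)) {
        return null;
    }

    try {
        // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException instead of returning null.
        $data = json_decode($m[1], true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        return null;
    }

    return is_array($data) && isset($data['csrf_token']) && is_string($data['csrf_token'])
        ? $data['csrf_token']
        : null;
}

// Valid JSON: the token is returned.
var_dump(extractCsrfToken('<script>var BCData = {"csrf_token":"abc123"};</script>'));
// A JavaScript object literal with unquoted keys is not JSON: null is returned.
var_dump(extractCsrfToken('<script>var BCData = {csrf_token: "abc123"};</script>'));
```

Returning null in both failure cases lets the caller decide whether to retry, log, or fall back to another extraction method.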


For reliability, the whole HTML document should be parsed by a DOM parser to isolate the <script> node.

Then use regex to carve out the JSON string. The m modifier makes ^ match the start of a line and $ match the end of a line. \K restarts the fullstring match so that no capture groups are needed.

Then, for reliability, parse the JSON string and access the desired value by key.

Code:

$html = <<<HTML
<script type="text/javascript">
var BCData = {"csrf_token":"686611cabde717e63c8ad811ac28ff1a2566168df14ec1439799dbfc0569f2c8","product_attributes":{"purchasable":true,"purchasing_message":null,"sku":"STICKER_PACK","upc":null,"stock":null,"instock":true,"stock_message":null,"weight":null,"base":false,"image":null,"price":{"without_tax":{"formatted":"$3.99","value":3.99,"currency":"USD"},"tax_label":"Tax"},"out_of_stock_behavior":"label_option","out_of_stock_message":"Out of stock","available_modifier_values":[],"available_variant_values":[7375],"in_stock_attributes":[7375],"selected_attributes":[]}};
</script>
HTML;

echo preg_match(
         '~^var BCData = \K.*(?=;$)~m',
         $html,
         $match
     )
     ? json_decode($match[0])->csrf_token
     : 'pattern found no match';

Output:

686611cabde717e63c8ad811ac28ff1a2566168df14ec1439799dbfc0569f2c8

Admittedly, I don't know how the input string may vary, so I can only build a pattern for the string provided.
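The snippet above runs the regex over the raw string and leaves out the DOM-parsing step recommended at the start of this answer. A minimal sketch of that step, assuming PHP's bundled DOM extension (DOMDocument) is available and using an abbreviated copy of the sample markup; the variable names are illustrative:

```php
<?php
// Abbreviated sample page; in practice this would be the cURL response body.
$html = <<<'HTML'
<html><head>
<script type="text/javascript">
var BCData = {"csrf_token":"686611cabde717e63c8ad811ac28ff1a2566168df14ec1439799dbfc0569f2c8","product_attributes":{"sku":"STICKER_PACK"}};
</script>
</head><body></body></html>
HTML;

$dom = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$token = null;
foreach ($dom->getElementsByTagName('script') as $script) {
    // Only the script whose text assigns BCData will match.
    if (preg_match('~^var BCData = \K.*(?=;$)~m', $script->nodeValue, $match)) {
        $data  = json_decode($match[0]);
        $token = $data->csrf_token ?? null;
        break;
    }
}

echo $token ?? 'BCData script not found';
```

Restricting the regex to the text of each <script> node means markup elsewhere in the page cannot produce a false match.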

The simplest expression to extract the CSRF token from the page:

# matches all occurrences of the format of the CSRF token
if (preg_match_all('/[a-f0-9]{64}/', $string, $matches))
{
    # should equal the value of the transmitted CSRF
    print_r($matches[0][0]);
}

This specifically matches instances of the "csrf_token":"..." portion of the JSON and extracts the token value in a named group:


// Match all occurrences
if (preg_match_all('/\"csrf_token\"\s?\:\s?\"(?<csrf>[a-f0-9]{64})\"/', $string, $matches)) {

    // One or more token matches extracted from the JSON
    print_r($matches['csrf']);

}
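One difference between the two patterns is worth seeing side by side: the bare [a-f0-9]{64} pattern matches any run of 64 lowercase hex characters, so an unrelated hash earlier in the page would come first. A sketch with a made-up page (the data-hash attribute and both hex values are invented for illustration):

```php
<?php
// Made-up page: an unrelated 64-character hex string appears before the token.
$otherHash = str_repeat('ab', 32);   // 64 hex characters, not the token
$token     = str_repeat('0f', 32);   // stand-in for the real CSRF token
$string    = '<div data-hash="' . $otherHash . '"></div>'
           . '<script>var BCData = {"csrf_token":"' . $token . '"};</script>';

// Bare pattern: the first hit is whichever 64-character hex run comes first.
preg_match_all('/[a-f0-9]{64}/', $string, $loose);

// Keyed pattern: only a value that follows "csrf_token": can match.
preg_match_all('/"csrf_token"\s?:\s?"(?<csrf>[a-f0-9]{64})"/', $string, $keyed);

var_dump($loose[0][0] === $token);      // bool(false)
var_dump($keyed['csrf'][0] === $token); // bool(true)
```

On a page that contains other hashes, the keyed pattern is therefore the safer of the two.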

