Using Goutte with Symfony2 in Controller
I'm trying to scrape a page and I'm not very familiar with PHP frameworks, so I've been trying to learn Symfony2. I have it up and running, and now I'm trying to use Goutte. It's installed in the vendor folder, and I have a bundle I'm using for my scraping project.
The question is: is it good practice to do scraping from a Controller? And how? I have searched forever and cannot figure out how to use Goutte from a bundle, since it's buried deep within the file structure.
<?php
namespace ontf\scraperBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Goutte\Client;

class ThingController extends Controller
{
    public function somethingAction($something)
    {
        $client = new Client();
        $crawler = $client->request('GET', 'http://www.symfony.com/blog/');
        echo $crawler->text();

        return $this->render('scraperBundle:Thing:index.html.twig');
        // return $this->render('scraperBundle:Thing:index.html.twig', array(
        //     'something' => $something
        // ));
    }
}
I'm not sure I have heard of "good practices" as far as scraping goes, but you may be able to find some in the book PHP Architect's Guide to Web Scraping with PHP.
These are some guidelines I have used in my own projects:
php app/console scraper:run example.com --env=prod --no-debug
Here app/console is where the Symfony2 console application lives, scraper:run is the name of your command, example.com is an argument indicating the page you want to scrape, and --env=prod --no-debug are the flags you should use to run in production. See the code below for an example.
Ontf/ScraperBundle/Resources/services.yml
services:
    goutte_client:
        class: Goutte\Client
    scraperCommand:
        class: Ontf\ScraperBundle\Command\ScraperCommand
        arguments: ["@goutte_client"]
        tags:
            - { name: console.command }
And your command should look something like this:
<?php
// Ontf/ScraperBundle/Command/ScraperCommand.php
namespace Ontf\ScraperBundle\Command;

use Goutte\Client;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

// Not abstract: the container has to instantiate this class as a service.
class ScraperCommand extends Command
{
    private $client;

    public function __construct(Client $client)
    {
        $this->client = $client;

        // Command::__construct() calls configure() to register the name and arguments.
        parent::__construct();
    }

    protected function configure()
    {
        $this
            ->setName('scraper:run')
            ->setDescription('Run Goutte Scraper.')
            ->addArgument(
                'url',
                InputArgument::REQUIRED,
                'URL you want to scrape.'
            );
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $url = $input->getArgument('url');
        $crawler = $this->client->request('GET', $url);

        // Write through the console output instead of echo.
        $output->writeln($crawler->text());
    }
}
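The service definition and the constructor above amount to plain constructor injection: the container creates the Goutte client and passes it in, rather than the command creating it itself. As a rough sketch of that wiring outside Symfony (the class names TextScraper, FakeClient and FakePage are hypothetical and not part of Goutte or the bundle), the fakes below only mimic the two calls the command relies on, request() and text():

```php
<?php
// Hypothetical stand-in for the crawler object that request() returns.
class FakePage
{
    private $text;

    public function __construct($text)
    {
        $this->text = $text;
    }

    public function text()
    {
        return $this->text;
    }
}

// Hypothetical stand-in for Goutte\Client: records requests instead of sending them.
class FakeClient
{
    public $requests = array();

    public function request($method, $url)
    {
        $this->requests[] = array($method, $url);

        return new FakePage('Hello from ' . $url);
    }
}

// Same shape as the command: the client is injected, not created inside the class.
class TextScraper
{
    private $client;

    public function __construct($client)
    {
        $this->client = $client;
    }

    public function fetchText($url)
    {
        return $this->client->request('GET', $url)->text();
    }
}

$client = new FakeClient();
$scraper = new TextScraper($client);
echo $scraper->fetchText('http://example.com/'), PHP_EOL;
```

Swapping FakeClient for a real Goutte\Client should leave TextScraper unchanged, which is the main reason to inject the dependency instead of calling new Client() inside the class.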
You should use a Symfony Controller if you want to return a response, e.g. HTML output.
If you only need the functionality for calculating or storing data in a database, you should create a Service class that represents the functionality of your crawler, e.g.
use Goutte\Client;

class CrawlerService
{
    // Fetch the page at $url and return its text content.
    public function getText($url)
    {
        $client = new Client();
        $crawler = $client->request('GET', $url);

        return $crawler->text();
    }
}
To execute it, I would use a Console Command.
If you want to return a Response, use a Controller.
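A minimal sketch of the controller side, assuming CrawlerService has been registered in the container under the hypothetical id crawler_service and that the template prints a hypothetical text variable. The scraped text is handed to the template rather than echoed, so it ends up inside the returned Response:

```php
<?php
namespace ontf\scraperBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;

class ThingController extends Controller
{
    public function somethingAction($something)
    {
        // Assumption: "crawler_service" is a hypothetical service id that
        // resolves to an instance of CrawlerService.
        $text = $this->get('crawler_service')->getText('http://www.symfony.com/blog/');

        // Pass the scraped text to the view instead of echoing it; render()
        // returns the Response object that the action must return.
        return $this->render('scraperBundle:Thing:index.html.twig', array(
            'something' => $something,
            'text' => $text, // hypothetical template variable
        ));
    }
}
```

This keeps the controller thin: it only decides what to show, while the scraping itself stays in the service and can be reused from a console command.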