
preg_match_all regex

I'm having issues using a regex to grab the HTML contained in a certain span. I'm trying to get the text "safeytrfyh is available!" from NameMC.com, in order to make a fast checker that checks whether the usernames in a pre-specified list are available, instead of constantly typing in each username and clicking check.

An example page you can use is https://namemc.com/u/safeytrfyh. I'm using cURL for this:

<?php
//Urls to scrape from.
$URLs = array();
$URLs[] = 'https://namemc.com/u/safeytrfyh';
$working = '';

//Curl scraper.
foreach($URLs as $URL){
$ch     = curl_init();
curl_setopt($ch, CURLOPT_URL, $URL);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);        
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$page = curl_exec($ch);
$accounts = array();
preg_match_all('#<div><span[^>]*>(.*?)</span></div>#',$page,$accounts);
foreach($accounts[0] as $account){
    $working .= ''.$account.''. PHP_EOL . '';
}
}

//Put the scraped check into the new .txt file.
file_put_contents('accounts.txt', $working, FILE_APPEND);
?>

The usually simpler, if less efficient, approach is to traverse the HTML structure with a neat frontend such as QueryPath, e.g. qp($html)->find(".alert-danger .alert-link")->text(). That actually looks less reliable for this concrete task, though.
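If QueryPath is not installed, roughly the same traversal can be sketched with PHP's built-in DOMDocument and DOMXPath. The markup in $html below is hypothetical, modelled only on the selector above; the real page may well be structured differently, so treat this as an illustration under that assumption rather than a working scraper.

```php
<?php
// Hypothetical markup (an assumption), shaped to fit ".alert-danger .alert-link".
$html = '<div class="alert alert-danger">'
      . '<a class="alert-link" href="/u/safeytrfyh">safeytrfyh</a> is available!'
      . '</div>';

// Collect libxml warnings internally instead of printing them;
// real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// XPath equivalent of the CSS selector ".alert-danger .alert-link":
// match each class as a whole word inside the class attribute.
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " alert-danger ")]'
       . '//*[contains(concat(" ", normalize-space(@class), " "), " alert-link ")]';

// Gather the text of every matching element.
$names = [];
foreach ($xpath->query($query) as $node) {
    $names[] = trim($node->textContent);
}

print_r($names);
```

The class test is written with concat() and contains() because XPath 1.0 has no direct notion of a CSS class; a plain @class="alert-danger" comparison would miss elements that carry more than one class.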

Now if for some reason you don't want to look at the HTML source and adapt your regex, or don't know how placeholders work, then a simpler alternative is to just match the raw text:

$text = strip_tags($html);
preg_match_all("/(\w+) \s+ is \s+ available/x", $text, $matches);

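Here \w+ matches word characters and \s+ matches whitespace. The /x flag makes the regex engine ignore literal whitespace in the pattern, so the spaces are there only for readability.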
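As a self-contained illustration of the raw-text approach, here is a small sketch run against a made-up HTML fragment. The markup in $html is an assumption and not copied from the real page; only the "is available" wording comes from the question.

```php
<?php
// Hypothetical fragment; the actual page markup may differ.
$html = '<div class="alert"><span>'
      . '<a href="/u/safeytrfyh">safeytrfyh</a> is available!'
      . '</span></div>';

// Drop all tags so only the visible text remains.
$text = strip_tags($html);

// With /x, literal spaces in the pattern are ignored,
// so real whitespace has to be matched explicitly with \s+.
preg_match_all('/(\w+) \s+ is \s+ available/x', $text, $matches);

// $matches[0] holds the full matches, $matches[1] the captured usernames.
$available = $matches[1];

print_r($available);
```

Reading from $matches[1] rather than $matches[0] returns just the username instead of the whole matched phrase.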

You can load the page into a DOM object and then extract whatever you want with XPath, for example:

    <?php
    $url = "http://stackoverflow.com/";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0); // Disables certificate checks for https pages; use only when testing on localhost
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0); // Keep the response headers out of the HTML passed to the parser
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $page = curl_exec($ch); // Can echo to check the page
    curl_close($ch);

    $dom = new DOMDocument();
    @$dom->loadHTML($page); // @ suppresses warnings about malformed HTML
    $xpath = new DOMXPath($dom);
    $query21 = '//div[@id="question-mini-list"]//h3//a[@class="question-hyperlink"]';
    $nodes21 = $xpath->query($query21);

    $title = "questions.txt";
    $file_title = fopen($title, 'w');

    foreach ($nodes21 as $node21) {
        $tit = trim($node21->nodeValue); // Question heading
        fwrite($file_title, $tit . "\r\n");
    }
    fclose($file_title);
    ?>

Output (written to questions.txt):

    I have an araay in one file and i want to find the size of it in another file using “sizeof” , i dont want to use any extra variables?
    No Activity found to handle Intent act=android.intent.action.VIEW when trying to play an audio file
    Naming a variable dynamically in Ruby
    uanble to use Bootstrap Notify with angular js in mvc application
    How do I combine a bootstrap carousel with a sidebar menu?
    Stop pausing when mouse hover -Full Slider
    How to Let Recordset #2 in the Same Position as the Similar Recordset#1
    Bash backup script. Read list of files. [OS X]
    extracting multiple columns from mt0 in hspice simulations using awk command
    Can't invoke *method=* type methods in instance_eval
    Couldnt understand the Array behavior in ruby
    Could not connect to sql server using msado15.dll in c++
    Slick 3.0.0 AutoIncrement Composite Key
    Swift Error type 'usersVC' does not conform to protocol 'UITableViewDataSource'
    Installation Error Unknown Failure
    Is it possible to 'emulate' a regular post that loads a new page in angularjs? or plain java as a backup?
    puppet file protocol handle throws Could not evaluate
    Hazard of load address in mips
    how to post multipal files to a url from jscript?
    How to organize the viewmodel of tableview with section in reactiveUI
    CQRS with legacy MSSQL database
    Should I use Blob storage or Azure VM storage for files?
    Copy cell content from a column to another column in matlab
    How do I debug a crash on iOS device from a crash log
    Combobox in windows phone 8.1 not showing 4th and 5th element in emulator
    How to add padding in printing table in F#?
    I don't understand the SpriteAccessor class (Universal Tween Engine)
    mule reliable pattern with file streaming and JMS
    How to tell Faraday to preserve hashbang in site URL?
    maven-license-plugin by mycila (replacing license header)
    Customise `JOptionPane.YES_NO_OPTION`
    AWS: Boto SQS writing isn't saving
    Android expandable listview always scrolls down to bottom
    Inconsistency in TypeConverter behavior?
    Using function as prototype
    Adjust width of inline buttons automatically based on parent width
    GetWeek of Month, Week starts from Monday
    Has anybody tried to recreate UITableViewController with static cells?
    Why shows --“cannot pass objects of non-trivially-copyable type”?
    Search and update a string in a text file in JAVA
    What is Countdown Latch in Java MultiThreading?
    Slim Framework with ORM (Eloquent) connect multiple db
    Why isn't the frame centred in this GUI program when it is run?
    Custom Logout Handler Not Working Grails
    Response to post request to AWS “breaks the pipe”, cannot read
    how to set focus to a SearchBox control in windows 8.1 store app?
    Removing a word from after a string
    need to generate css from scss file on windows 8.1 using gruntjs compass
    Arduino YUN - complex JSON response
    How to use expandable list view in the following scenario
    Unique DB entry to the user
    R : Save big objects to disk then only load parts of them
    What is wrong based on these dbus system bus log files?
    NLP Shift reduce parser is throwing null pointer Exception for Sentiment calculation
    Excel VBA - Combine Rows with duplicate values, merge cells if different
    what's TransactionID and RowID and Roll Point size in InnoDB
    File associations in vscode
    Difference Between IEnumerable Model and Model
    efficient way of passing Data between Matlab functions
    Open new Form in same window silverlight app via c#?
    Hibernate configuration to create hbm and POJO
    FTP Client gives “ECONNREFUSED - Connection refused by server”
    Timer in Selective Repeat ARQ
    Can TXL be used for code clone detection
    MATLAB - Callback after reparenting
    Asynchronous execution with datastax mapper
    Stopping gobbler threads in blocking reads on Process InputStream
    how to get gabor filter image using opencv?
    WebView shows source html with loadDataWithBaseURL, not rendered view
    git merge forked repo to local repo
    Scrapy (Python): Iterating over 'next' page without multiple functions
    android:uiOption=“SplitActionBarWhenNarrow” does not work
    md5 hash a large file incrementally?
    Instagram relationship request endpoint registration issue
    cuda calc distance of two points
    How to share contents of ListView row on facebook in Android?
    how will the socket act when the receiving speed is larger than process speed
    cannot see particle (cocos2d-x 3.5 with Particle Designer2)
    Couldn't find FoodObject without an ID
    CardView and RecyclerView divider behaviour
    Verification google play purchase from server side
    dyld: Symbol not found: _iconv when using javac to compile on MacOS
    R not producing a figure in jupyter (IPython notebook)
    Entity Framework 6 update a table and insert into foreign key related tables
    I have integrated CLIPS with VC++(MFC), why there are some function does't execute,such as “strcmp”
    Using SelectBoxIt in AngularJS Directive
    Where is the Google Information Rights Management API?
    Open Graph in Laravel 5
    CodeIgniter 3 Unable to locate the model you have specified
    how to have a static url for shopify oauth?
    Use AnnotationReader under namespace
    No such .h file or directory(Android, Cocos2d-x, NDK)
    Getting total sum of rows and adding and removing rows using knockoutjs
    Dynamic default value for Kendo Grid
    Ruby's class expression---how is it different from `Class.new`?
    socket.emit is not working in mobile chrome (but it works in incognito mode)


 