Check used files in directories with PHP

I took over a project from another programmer. Unfortunately, the directories are also full of test files, and I don't know which ones are actually in use. I'd like to filter them out by looking at the links in the files.

Doing it by hand would take a long time. I wrote some code, but it does not list all the files in use.

Cleaning up the root directory would be enough.

Thank you for your advice!

$dir = '/public_html/';   // directory to scan
$files = scandir($dir);

$hrefs = array();

// tags that can reference other files, and the attribute holding the reference
$sources = array('a' => 'href', 'form' => 'action', 'img' => 'src');

// real-world markup is rarely valid; keep parser warnings out of the output
libxml_use_internal_errors(true);

foreach ($files as $file) {

   $info = pathinfo($file);
   // files without an extension have no "extension" key
   if (isset($info["extension"]) && $info["extension"] == "php") {

      // scandir() returns bare names, so prepend the directory
      $php = file_get_contents($dir . $file);
      if ($php === false || trim($php) === '') {
          continue;   // unreadable or empty file, nothing to parse
      }

      $dom = new DOMDocument();
      $dom->loadHTML($php);
      libxml_clear_errors();

      foreach ($sources as $name => $attr) {
          $tags = $dom->getElementsByTagName($name);
          foreach ($tags as $tag) {
              // drop query strings such as "page.php?id=1" before taking the file name
              $path = parse_url($tag->getAttribute($attr), PHP_URL_PATH);
              $href = basename((string) $path);
              if (is_file($dir . $href) && !in_array($href, $hrefs)) {
                  $hrefs[] = $href;
              }
          }
      }

   }

}

// with true as the second argument print_r() returns the string instead of printing it
print_r($hrefs);

I quickly put the following together to scan a directory and its sub-directories and list the links, form actions, and image and iframe sources found in each file. It might be of use.

error_reporting( E_ALL );
ini_set( 'display_errors', 1 );
set_time_limit( 60 );

/* edit to suit. Choose directory, file extensions and exclusions */
$config=(object)array(
    'directory'     =>  __DIR__,
    'extensions'    =>  array( 'php', 'html', 'htm' ),
    'exclusions'    =>  array(
        'bookmarks_11_01_2019.html',
        'bookmarks_05_01_2019.html'
    )
);


function getnodes($type,$attr){
    /*
        helper function to get $type elements 
        and return attribute $attr
    */
    global $dom;
    global $info;
    global $ext;

    $col=$dom->getElementsByTagName( $type );
    $tmp=[];
    if( $col->length > 0 ){
        foreach( $col as $node ){
            $tmp[]=array(
                $attr   =>  $node->getAttribute( $attr ),
                'file'  =>  $info->getFileName(),
                'dir'   =>  $info->getPathInfo()->getRealPath(),
                'type'  =>  $type,
                'ext'   =>  $ext
            );
        }
    }
    return $tmp;
}


libxml_use_internal_errors( true );
$dom=new DOMDocument;
$links=[];  

/* create the recursive iterators */
$dirItr=new RecursiveDirectoryIterator( $config->directory, RecursiveDirectoryIterator::KEY_AS_PATHNAME );
foreach( new RecursiveIteratorIterator( $dirItr, RecursiveIteratorIterator::CHILD_FIRST ) as $obj => $info ) {
    if( $info->isFile() ){

        $ext = pathinfo( $info->getFileName(), PATHINFO_EXTENSION );

        /* only scan files of specified extensions that are not in the exclusions list */
        if( in_array( $ext, $config->extensions ) && !in_array( $info->getFileName(), $config->exclusions ) ){
            /* load a new file into DOMDocument */
            $dom->loadHTMLFile( $info->getPathName() );

            /* ignore errors */
            libxml_clear_errors();

            /* find elements that may be of interest */
            $links=array_merge( 
                $links,
                getnodes( 'a', 'href' ),
                getnodes( 'form', 'action' ),
                getnodes( 'img', 'src' ),
                getnodes( 'iframe', 'src' )
            );
        }
    }
}

/* display scan results*/
printf( '<pre>%s</pre>', print_r( $links, true ) );
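
To turn the scan results into a clean-up list, the collected links can be compared against the files on disk. Below is a minimal sketch, assuming records shaped like the ones in $links above. The function findUnreferenced() and the sample data are hypothetical, and matching is by file name only, so identically named files in different directories are not told apart.

```php
<?php
/**
 * Return the files on disk that no scanned link points at.
 *
 * @param array $links  records shaped like the scan results: the attribute
 *                      value sits under 'href', 'action' or 'src'
 * @param array $files  file names (or paths) found on disk
 */
function findUnreferenced(array $links, array $files): array
{
    $used = [];
    foreach ($links as $link) {
        foreach (['href', 'action', 'src'] as $attr) {
            if (empty($link[$attr])) {
                continue;
            }
            // drop any query string or fragment, keep only the file name
            $path = parse_url($link[$attr], PHP_URL_PATH);
            if (is_string($path) && $path !== '') {
                $used[basename($path)] = true;
            }
        }
    }

    // keep only the files whose name never appeared in a link
    return array_values(array_filter($files, function ($file) use ($used) {
        return !isset($used[basename($file)]);
    }));
}

// hypothetical sample data, for illustration only
$links = [
    ['href' => 'contact.php?id=3', 'file' => 'index.php', 'type' => 'a'],
    ['src'  => 'img/logo.png',     'file' => 'index.php', 'type' => 'img'],
];
$files = ['index.php', 'contact.php', 'test_old.php', 'img/logo.png'];

print_r(findUnreferenced($links, $files));
```

With the sample data this lists index.php and test_old.php. Entry points such as index.php are usually not linked from anywhere, and files pulled in through include/require or JavaScript do not show up as DOM links either, so the result is better treated as a list of candidates to review than as files that are safe to delete.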
