I took over a project from another programmer. Unfortunately, the directories are full of test files too, and I don't know which ones are actually in use. I'd like to filter these out by looking at the links in the files.
Doing this by hand would take a long time. I wrote some code, but it did not list all the files in use.
Cleaning up the root directory would be enough.
Thank you for your advice!
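$dir = '/public_html/';
$files = scandir($dir);
$hrefs = array();
// collect libxml warnings about imperfect markup instead of printing them
libxml_use_internal_errors(true);
foreach ($files as $file) {
    $info = pathinfo($file);
    // files without an extension have no "extension" key
    if (isset($info["extension"]) && $info["extension"] == "php") {
        // scandir() returns bare file names, so prepend the directory
        $php = file_get_contents($dir . $file);
        if ($php === false || trim($php) === '') {
            continue;
        }
        $dom = new DOMDocument();
        $dom->loadHTML($php);
        libxml_clear_errors();
        $tags = $dom->getElementsByTagName('a');
        foreach ($tags as $tag) {
            $href = $tag->getAttribute('href');
            // drop any query string or fragment before taking the file name
            $href = basename((string) parse_url($href, PHP_URL_PATH));
            if (is_file($dir . $href) && !in_array($href, $hrefs)) {
                $hrefs[] = $href;
            }
        }
        $tags = $dom->getElementsByTagName('form');
        foreach ($tags as $tag) {
            $href = $tag->getAttribute('action');
            $href = basename((string) parse_url($href, PHP_URL_PATH));
            if (is_file($dir . $href) && !in_array($href, $hrefs)) {
                $hrefs[] = $href;
            }
        }
        $tags = $dom->getElementsByTagName('img');
        foreach ($tags as $tag) {
            $href = $tag->getAttribute('src');
            $href = basename((string) parse_url($href, PHP_URL_PATH));
            if (is_file($dir . $href) && !in_array($href, $hrefs)) {
                $hrefs[] = $href;
            }
        }
    }
}
// without a second argument print_r() prints the array instead of returning a string
print_r($hrefs);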
I quickly put the following together. It scans a directory and its sub-directories and lists the link targets found inside each matching file; it might be of use.
error_reporting( E_ALL );
ini_set( 'display_errors', 1 );
set_time_limit( 60 );

/* edit to suit. Choose directory, file extensions and exclusions */
$config=(object)array(
    'directory'  => __DIR__,
    'extensions' => array( 'php', 'html', 'htm' ),
    'exclusions' => array(
        'bookmarks_11_01_2019.html',
        'bookmarks_05_01_2019.html'
    )
);

function getnodes($type,$attr){
    /*
        helper function to get $type elements
        and return attribute $attr
    */
    global $dom;
    global $info;
    global $ext;

    $col=$dom->getElementsByTagName( $type );
    $tmp=[];
    if( $col->length > 0 ){
        foreach( $col as $node ){
            $tmp[]=array(
                $attr  => $node->getAttribute( $attr ),
                'file' => $info->getFileName(),
                'dir'  => $info->getPathInfo()->getRealPath(),
                'type' => $type,
                'ext'  => $ext
            );
        }
    }
    return $tmp;
}

libxml_use_internal_errors( true );
$dom=new DOMDocument;
$links=[];

/* create the recursive iterators */
$dirItr=new RecursiveDirectoryIterator( $config->directory, RecursiveDirectoryIterator::KEY_AS_PATHNAME );
foreach( new RecursiveIteratorIterator( $dirItr, RecursiveIteratorIterator::CHILD_FIRST ) as $obj => $info ) {
    if( $info->isFile() ){
        $ext = pathinfo( $info->getFileName(), PATHINFO_EXTENSION );

        /* only scan files of specified extensions that are not in the exclusions list */
        if( in_array( $ext, $config->extensions ) && !in_array( $info->getFileName(), $config->exclusions ) ){
            /* load a new file into DOMDocument */
            $dom->loadHTMLFile( $info->getPathName() );
            /* ignore errors */
            libxml_clear_errors();

            /* find elements that may be of interest */
            $links=array_merge(
                $links,
                getnodes( 'a', 'href' ),
                getnodes( 'form', 'action' ),
                getnodes( 'img', 'src' ),
                getnodes( 'iframe', 'src' )
            );
        }
    }
}

/* display scan results */
printf( '<pre>%s</pre>', print_r( $links, true ) );
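Since the goal is to clean up the root directory, the scan results could then be compared against the files that actually exist there. What follows is only a rough sketch under a few assumptions: `findUnreferencedFiles()` is a hypothetical helper name, the `$links` entries are expected in the shape that `getnodes()` builds (one of `href`, `action` or `src`, plus `dir`), and root-relative links such as `/img/logo.png` are assumed to resolve against the scanned directory.

```php
<?php
/*
    Hypothetical helper, not part of the scan code above.
    Returns the names of files in $directory that no entry in $links points to.
*/
function findUnreferencedFiles( string $directory, array $links ): array {
    $root = realpath( $directory );
    if( $root === false ) return [];

    /* build a set of absolute paths that are referenced somewhere */
    $used = [];
    foreach( $links as $link ){
        foreach( array( 'href', 'action', 'src' ) as $attr ){
            if( empty( $link[ $attr ] ) ) continue;

            /* strip query string and fragment */
            $target = preg_replace( '/[?#].*$/', '', $link[ $attr ] );

            /* skip empty targets, external URLs and schemes such as mailto: or javascript: */
            if( $target === '' || preg_match( '#^([a-z][a-z0-9+.-]*:|//)#i', $target ) ) continue;

            /* assumption: "/x" is relative to $root, anything else to the linking file's directory */
            $base = ( $target[0] === '/' ) ? $root : ( $link['dir'] ?? $root );
            $path = realpath( $base . '/' . ltrim( $target, '/' ) );

            if( $path !== false && is_file( $path ) ) $used[ $path ] = true;
        }
    }

    /* every file directly inside $root that was never referenced */
    $unused = [];
    foreach( scandir( $root ) as $name ){
        $path = $root . DIRECTORY_SEPARATOR . $name;
        if( is_file( $path ) && !isset( $used[ realpath( $path ) ] ) ) $unused[] = $name;
    }
    return $unused;
}
```

With the `$links` array from the scan, `print_r( findUnreferencedFiles( $config->directory, $links ) );` would print the candidates. Treat the result as a list to review rather than a list to delete: entry points such as an index page that nothing links to will show up, and so will files that are only pulled in through `include`/`require` or referenced from CSS or JavaScript, because the scan does not look at those.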