How to Parse All Links From a Page With PHP Using DOM Technology

By | December 17, 2008

Today I will share a simple script that extracts all URLs from a single page. If you don't like regular expressions, you don't have to deal with them here: the DOM extension built into PHP 5 lets you do this in just a few lines of code, without any special knowledge. Here is the solution.

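<?php
// Collect libxml parse warnings instead of printing them
// (safer than error_reporting(0), which hides every error)
libxml_use_internal_errors(true);

$url = "http://mail.ru";
// Fetching a URL this way requires allow_url_fopen to be enabled
$content = file_get_contents($url);
$allurl = ""; // initialise before appending with .=

// loadHTML() rejects an empty string, so check the download first
if ($content !== false && $content !== "")
{
    $dom = new DOMDocument();
    if ($dom->loadHTML($content))
    {
        // Every <a> element on the page
        $as = $dom->getElementsByTagName("a");
        foreach ($as as $a)
        {
            // getAttribute() returns "" when the href is missing
            $allurl .= $a->getAttribute("href") . "\r\n";
        }
    }
}
libxml_clear_errors();

echo $allurl;
?>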

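Everything is quite simple, so I don't think much explanation is necessary. The script collects the href attribute of every <a> tag on a single page. Relative links such as "/about" are returned as they appear in the markup, not resolved against the page URL. The only limitation is that the page you're parsing needs reasonably well-formed HTML: loadHTML() tolerates many common mistakes, but badly broken markup may not parse as expected. If you know a simpler way to do this, you're welcome to share it.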
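One possibly simpler variation is a minimal sketch, not a drop-in replacement: it assumes you already have the HTML as a string (so it can be tried without a network request) and uses DOMXPath to select the href attributes directly, which also skips <a> tags that have no href at all. The function name extractHrefs() is hypothetical. It also restores the previous libxml error setting afterwards, so the helper doesn't change error handling for the rest of your script.

```php
<?php
// Hypothetical helper: returns the href value of every <a> element
// that actually has an href attribute, in document order.
function extractHrefs(string $html): array
{
    if ($html === "") {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings instead of emitting them,
    // remembering the previous setting so it can be restored.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // One XPath query selects the href attribute nodes directly;
    // <a> elements without an href are never matched.
    $xpath = new DOMXPath($dom);
    $hrefs = [];
    foreach ($xpath->query("//a/@href") as $attr) {
        $hrefs[] = $attr->value;
    }
    return $hrefs;
}

// Usage: the markup below is deliberately sloppy (unclosed <p>,
// unquoted attribute), yet libxml's HTML parser still recovers the links.
$html = '<p>Start <a href="/about">About</a><p><a href=contact.html>Contact</a> <a name="top">no href</a>';
echo implode("\n", extractHrefs($html)), "\n";
```

To fetch a live page, you would pass the result of file_get_contents($url) to extractHrefs() after checking that the download succeeded.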