Regular Expression to Extract all E-mail Addresses from a File With PHP

October 8, 2008

Sometimes you need to extract data from text files: e-mail addresses, passwords, or just some simple tags. Whatever it is, regular expressions are usually your best choice. I will show you a PHP script that extracts valid e-mail addresses from a text file.

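<?php
// Input file to scan and output file that collects the addresses.
$fs = fopen("best.txt", "r");
$f3 = fopen("clean.txt", "a");

// Dots are escaped so they match a literal "." and not any character.
// The long top-level domains come first so "info" is not cut down to "inf".
$pattern = '/[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.(aero|coop|info|museum|name|[a-zA-Z]{2,3}|[0-9]{1,3})/';

// Read line by line so big files never have to fit in memory.
while (($gan = fgets($fs)) !== false) {
    // Write only when the line actually contains an address.
    if (preg_match($pattern, $gan, $matches)) {
        fwrite($f3, trim($matches[0]) . "\r\n");
    }
}
fclose($f3);
fclose($fs);
?>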

best.txt is the input file that contains the e-mail addresses, possibly mixed with other text. clean.txt will contain e-mail addresses only, one per line. We check every line of best.txt against a regular expression that represents a valid e-mail pattern. That pattern is the first argument to preg_match(), and I don’t think it is necessary to explain what it means. If you’re familiar with regular expressions, you’ll be able to modify it to find more specific e-mail addresses. If not, you can use this example as it is and you will find that it really works. Keep in mind that preg_match() stops at the first match, so only one address is taken from each line. There are some programs on the net that do the same thing, but they work under Windows and can’t process big files.
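If a single line can hold more than one address, preg_match_all() is the usual way to collect every match instead of just the first. The sketch below is only an illustration: the helper name extract_emails() and the sample line are made up for this example, and the pattern is the one from the script, written with escaped dots and with the longer top-level domains listed first.

```php
<?php
// Address pattern from the script; escaped dots and the reordered
// top-level domains are assumptions of this sketch.
$pattern = '/[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.(aero|coop|info|museum|name|[a-zA-Z]{2,3}|[0-9]{1,3})/';

// Hypothetical helper: returns every address found in one line of text.
function extract_emails($line, $pattern)
{
    // preg_match_all() stores all full-pattern matches in $matches[0].
    preg_match_all($pattern, $line, $matches);
    return $matches[0];
}

// Made-up sample line containing two addresses.
$line = "Write to john.doe@example.com or sales@shop.info, thanks.";
foreach (extract_emails($line, $pattern) as $email) {
    echo $email, "\n";
}
```

Inside the while loop of the script, the same call could replace preg_match(), with one fwrite() per returned address.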

This script can work with big files because it reads the input one line at a time. Just don’t forget to set the time limit to 0 (I have this set in my php.ini; the directive is max_execution_time). Happy parsing! :)
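If you would rather not touch php.ini, the limit can also be lifted from inside the script itself. A minimal sketch, assuming these lines go at the very top of the extraction script:

```php
<?php
// Lift the execution time limit for this run only (0 means no limit).
set_time_limit(0);

// Read the value back; ini_get() returns settings as strings.
$limit = ini_get('max_execution_time');
```

This affects only the current run, so other scripts on the same server keep their normal limit.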
