
Regular Expression to Parse Text Between Simple Tags (XML)

September 6th, 2008

It is often necessary to extract text from a variable that contains HTML or XML code. I’ve created a simple regular expression that extracts all text between given tags into an array. It is a PHP solution, though the regular expression itself should work in other programming languages that support Perl-compatible syntax.

preg_match_all("/<tag>(.*?)<\/tag>/", $source, $results);

This construction will create an array with the extracted data. All you need to do is change “tag” to any tag you like. The pattern was written to parse XML files, but it will also work for simple HTML tags without attributes.
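Instead of editing the pattern by hand for each tag, the tag name can be kept in a variable. The following is only a minimal sketch, not part of the original solution: the variables $tag, $quoted and $pattern and the sample $source string are assumptions made for illustration. preg_quote is used so that any special characters in the tag name are escaped before they reach the pattern.

```php
<?php
// Hypothetical sample input; replace with your own XML or HTML string.
$source = '<title>First</title><item>skip</item><title>Second</title>';

// Hypothetical variable holding the name of the tag to extract.
$tag = 'title';

// Escape the tag name (including the "/" delimiter) before building the pattern.
$quoted  = preg_quote($tag, '/');
$pattern = '/<' . $quoted . '>(.*?)<\/' . $quoted . '>/';

preg_match_all($pattern, $source, $results);

// $results[1] holds the text found between the opening and closing tags.
print_r($results[1]);
```

As with the original expression, this sketch only handles tags without attributes.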

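The function above will extract all occurrences of the regular expression match. $results will contain an array with the extracted values: $results[0] holds the full matches, including the tags, and $results[1] holds only the text captured between the tags. Please run var_dump to check what’s in this array.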
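To make the layout of the result array concrete, here is a small sketch that runs the expression on a sample string and dumps both parts of the result. The sample $source and the $count variable are assumptions added for illustration.

```php
<?php
// Hypothetical sample input for illustration.
$source = '<tag>one</tag> <tag>two</tag>';

// preg_match_all returns the number of full pattern matches it found.
$count = preg_match_all("/<tag>(.*?)<\/tag>/", $source, $results);

// $results[0]: the full matches, tags included.
var_dump($results[0]);

// $results[1]: the text captured by the (.*?) group, tags excluded.
var_dump($results[1]);
```

If the text between the tags can span several lines, note that "." does not match a newline by default; adding the s modifier after the closing delimiter (for example "/<tag>(.*?)<\/tag>/s") changes that.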

  1. Tester
    November 20th, 2008 at 20:57 | #1

    (.*?) does not work. It gives entire tag not value of the tag. For example

    text = “one”

    Here I wish result “one” not “one”

  2. admin
    November 21st, 2008 at 05:26 | #2

    I don’t quite understand your question: do you need to avoid getting results with quotes? Did you perform a var_dump of your resulting array? Can I have a more specific example of the text you’re trying to parse?
