I seem to be having some trouble writing a regular expression parser for a PHP application I'm developing. I'd like to be able to parse a Wikipedia page and have an expression match each link that points to another Wikipedia page. I've written this regular expression:
Code:
/<a href=\"http:\/\/en.wikipedia.org\/wiki\/(.*)\"/
but it only matches the first link on each line of the HTML file. Any idea why this is?

Code:
/<a href=\"http:\/\/en.wikipedia.org\/wiki\/(.*)\"/ig
Hmm, g doesn't seem to be a valid regex modifier in PHP... What were you attempting to do? I do see the goal of the i modifier (case-insensitivity), but Wikipedia seems to be consistent with its URL case.
'g' is the global flag; it means the regex will match every instance of the specified pattern rather than just the first. I can't remember the g flag not working for me. What's your full line of PHP?
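For what it's worth, PHP's preg functions don't take a g modifier; as far as I know, whether you get one match or all of them depends on which function you call. A minimal sketch of the difference (the sample string and variable names are made up for illustration):

```php
<?php
// Hypothetical sample input: two links on one line.
$html = '<a href="/wiki/PHP">PHP</a> and <a href="/wiki/Perl">Perl</a>';

// preg_match() stops after the first match.
$first = preg_match('#/wiki/(\w+)#', $html, $m);

// preg_match_all() keeps going, which is the closest thing to a "g" flag.
$all = preg_match_all('#/wiki/(\w+)#', $html, $matches);

// preg_replace() replaces every match unless you pass a limit.
$replaced = preg_replace('#/wiki/(\w+)#', '/page/$1', $html);

echo "preg_match found $first, preg_match_all found $all\n";
echo $replaced, "\n";
```

So a preg_replace call without a limit should already behave "globally", which suggests the problem is in the pattern rather than a missing flag.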

Code:
echo preg_replace("/<a href=\"http:\/\/en.wikipedia.org\/wiki\/(.*)\"/i", "<a href=\"index.php?date=" . $_GET["date"] . "&title=$1\"", $original);
The odd thing is, I run it once for the whole document, not once per line, yet it replaces exactly one instance of the pattern per line.
Your regular expression (.*) is greedy: it matches as much as it can, so it runs on to the last double quote on the line. Make it non-greedy (.*?) instead.

Code:
echo preg_replace(
    '#<a href="http://en.wikipedia.org/wiki/(.*?)"#i',
    // No quote after date=, so the href value stays open until the quote after $1.
    '<a href="index.php?date=' . $_GET["date"] . '&title=$1"',
    $source
);
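To see why the greedy version replaces only one link per line, here is a small sketch. It assumes a made-up line containing two links; the dot does not match a newline by default, so the greedy capture is confined to one line but swallows everything up to that line's last double quote.

```php
<?php
// Hypothetical sample: two Wikipedia links on the same line.
$line = '<a href="http://en.wikipedia.org/wiki/Foo">Foo</a> '
      . '<a href="http://en.wikipedia.org/wiki/Bar">Bar</a>';

// Greedy: (.*) runs to the LAST double quote on the line.
preg_match_all('#<a href="http://en\.wikipedia\.org/wiki/(.*)"#', $line, $greedy);

// Non-greedy: (.*?) stops at the FIRST double quote after each title.
preg_match_all('#<a href="http://en\.wikipedia\.org/wiki/(.*?)"#', $line, $lazy);

print_r($greedy[1]); // one capture that spans both links
print_r($lazy[1]);   // one capture per link
```

The greedy pattern finds a single match that covers both anchors, so there is only one thing for preg_replace to replace on that line.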
Thanks Ben, it works like a charm! :)
That was going to be my next suggestion, thanks Ben. :) I learned something today too; I would usually do something like [^"]* to replace .*.
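For comparison, a rough sketch of the [^"]* approach, again on made-up sample input. The negated character class matches any run of characters that are not a double quote, so it cannot run past the end of the href value no matter how many links share a line.

```php
<?php
// Hypothetical sample input: two Wikipedia links on one line, one other link on the next.
$html = '<a href="http://en.wikipedia.org/wiki/Foo">Foo</a> '
      . '<a href="http://en.wikipedia.org/wiki/Bar">Bar</a>' . "\n"
      . '<a href="http://example.com/">elsewhere</a>';

// ([^"]*) captures everything up to, but not including, the next double quote.
preg_match_all('#<a href="http://en\.wikipedia\.org/wiki/([^"]*)"#i', $html, $m);

print_r($m[1]); // the two Wikipedia titles; the example.com link is skipped
```

Either form should work here; the negated class just states the stopping condition explicitly instead of relying on the quantifier being lazy.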
  