JANUARY 2005
VOLUME IV - ISSUE 1
www.phparch.com
The Magazine For PHP Professionals
TABLE OF CONTENTS

Departments

Editorial: Hello, Goodbye
What’s New
Test Pattern: Shedding a Tier by Marcus Baker
Product Review: Will the Best PHP IDE Please Stand Up? by Peter B. MacIntyre
Security Corner: SQL Injection by Chris Shiflett
Tips & Tricks: Javascript Remote Scripting with PHP by John W. Holmes
exit(0);: The PHP Security Saga by Marco Tabini

Features

Generating OpenOffice.org Documents with PHP by Bård Farstad
Where in the World was that Photo Taken? by Ron Goff
Transliteration with PHP by Derick Rethans
Akaar Being Smarty with Smarty! by Chirag Ahmedabadi
Iterators in PHP5 by Rami Kayyali
January 2005
●
PHP Architect
●
www.phparch.com
EDITORIAL
Hello, Goodbye

A new year is upon us, and quite a few interesting things have already happened. We just published our first book, for example. The Zend PHP Certification Practice Test Book, which I co-wrote with John Coggeshall, has just been unleashed on the PHP community with (if I may unleash some personal pride) extremely good results. In a separate (but far more important) piece of news, PHP was named “language of the year 2004” by a site that tracks language usage in the development community. PHP 5 continues to plow along quickly and efficiently, with a new point release scheduled soon that will introduce some much-anticipated new functionality.

However you look at it, 2005 is poised to be a marquee year for PHP. There is so much going on that I can hardly keep my head around it and continue my daily activity here at php|a headquarters (although I must say that the two weeks of vacation I took around Christmas were very helpful in getting my head wrapped around doing absolutely nothing. What a pleasant change of pace…). On the other hand, 2004 was, in many ways, a marquee year for PHP as well. This just highlights the positive direction that the language is taking, shaped in many ways by the vast amount of work that everyone in the community (even those who can just be found complaining on the mailing lists) has put into defining its goals and needs.

Here at php|a, there are three important news items that I want to share with you this month. First of all, John W. Holmes, who has taken care of our Tips & Tricks column since the very first issue, is leaving us. John is a Captain in the U.S. Army, and his “day job” is keeping him way too busy to deal with such a demanding column on a monthly basis. I know you expected me to say this, but I really, really enjoyed working with John. He has the (unfortunately rare) ability to be technically accurate and linguistically clear: his writings were always as pleasant to read as they were to edit. Luckily, the T&T column is far from over, but more about that will have to wait for another editorial. Thank you, John, and Godspeed.

Just to stay on the editorial front, we have a new column starting this month, titled Test Pattern and penned by Marcus Baker. In his column, Marcus will be dealing with the issue of proper software design as applied to PHP development, from patterns, to tiering, to testing. The goal of this column is to challenge you, dear readers, to write code that is not just more efficient but also more beautiful: to make every single one of your applications a little work of art that is well thought out, properly designed and flawlessly executed. Marcus has an awesome job ahead of him, but, then again, he is an awesome fellow, so I’m sure that you’ll enjoy his writings.

Finally, you may have heard about some recent security issues that have struck both PHP and PHP-based applications, notably the popular forum software phpBB. The reaction to these issues has been less than stellar, in my humble opinion,
Publisher: Marco Tabini
Editorial Team: Arbi Arzoumani, Peter MacIntyre, Eddie Peloke
Graphics & Layout: Arbi Arzoumani
Managing Editor: Emanuela Corso
News Editor: Leslie Hill, [email protected]
Authors: Marcus Baker, Bård Farstad, Ron Goff, Rami Kayyali, Andreas Koepke, Peter MacIntyre, Derick Rethans, Chirag Ahmedabadi

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards of use of the information contained herein or in all associated material.

Contact Information:
General mailbox: [email protected]
Editorial: [email protected]
Subscriptions: [email protected]
Sales & advertising: [email protected]
Technical support: [email protected]

Copyright © 2003-2004 Marco Tabini & Associates, Inc. All Rights Reserved
NEW STUFF
What’s New!
php|architect launches php|tropics 2005

Ever wonder what it's like to learn PHP in paradise? Well, this year we've decided to give you a chance to find out! We're proud to announce php|tropics 2005, a new conference that will take place from May 11 to 15 at the Moon Palace Resort in Cancun, Mexico. The Moon Palace is an all-inclusive (yes, we said all-inclusive!) resort with over 100 acres of ground and 3,000 ft. of private beach, as well as excellent state-of-the-art meeting facilities. As always, we've planned an in-depth set of tracks for you, combined with a generous amount of downtime for your enjoyment (and your family's, if you can take them along with you). We even have a very special early-bird fee in effect for a limited time only. For more information, go to http://www.phparch.com/tropics.
Zend Technologies Unveils Integrated Software Platform

Zend has announced the unveiling of Zend Platform 1.1.

Zend Technologies, Inc., creator and ongoing innovator of PHP, products and services supporting the development, deployment and management of PHP-based applications, today unveiled Zend Platform 1.1. The newest member in the Zend family of products is the first integrated software platform that supports the reliability, scalability and interoperability requirements of business critical PHP applications. The product was developed based on direct feedback from hundreds of Zend customers currently using Zend products to develop and manage corporate applications, and is currently in use at Zend customer sites. Zend Platform adds a wide range of new functionality that speeds time to production and improves end user satisfaction by increasing the overall performance of enterprise applications. Zend Platform 1.1 is available immediately.

"As PHP matures and evolves, the need for an integrated solution for building and deploying business critical applications becomes more relevant," said Pamela Roussos, vice president of marketing at Zend. "Zend Platform is the first comprehensive lifecycle management solution for PHP users and is the only next generation infrastructure product that directly supports the development and deployment of business critical enterprise PHP applications. The feedback from customers was critical in our development of this solution, and directly addresses the needs in our user community."

For more information visit: http://www.zend.com
The Zend PHP Certification Practice Test Book is now available!

We're happy to announce that, after many months of hard work, the Zend PHP Certification Practice Test Book, written by John Coggeshall and Marco Tabini, is now available for sale from our website and most book sellers worldwide! The book provides 200 questions designed as a learning and practice tool for the Zend PHP Certification exam. Each question has been written and edited by four members of the Zend Education Board, the very same group who prepared the exam. The questions, which cover every topic in the exam, come with a detailed answer that explains not only the correct choice, but also the question's intention, pitfalls and the best strategy for tackling similar topics during the exam. For more information, visit http://www.phparch.com/cert/mock_testing.php
struts4php 1.0.4_20041212

struts4php.org has announced the latest release of struts4php. The developers have released a new version of the well-known framework with numerous bug fixes and improvements. The development team has also launched a new website for better information and communication. Also new is the option to download the current version of struts4php as a PEAR package from http://www.struts4php.org/pear/struts4php-current.tgz. For more information visit: http://www.struts4php.org
PHP 4.3.10 and 5.0.3 Released

The PHP Development Team would like to announce the immediate release of PHP 4.3.10 and PHP 5.0.3. These are maintenance releases that, in addition to non-critical bug fixes, address several very serious security issues. All users of PHP are strongly encouraged to upgrade to one of these releases as soon as possible. For changes since PHP 4.3.9, please consult the PHP 4 ChangeLog. For changes since PHP 5.0.2, please consult the PHP 5 ChangeLog. For more information, visit http://www.php.net
Check out some of the hottest new releases from PEAR.
PEAR 1.3.4

PEAR 1.3.4 fixes a serious problem caused by a bug in all versions of PHP that caused multiple registration of the shutdown function of PEAR.php, makes the pear help listing more useful by putting the how-to-use info at the bottom of the listing, and includes several bug fixes.

Net_Monitor 0.0.7

A unified interface for checking the availability of services on external servers and sending meaningful alerts through a variety of media if a service becomes unavailable.

I18Nv2 0.10.0

This package provides basic support for localizing your application, such as locale-based formatting of dates, numbers and currencies. Besides that, it attempts to provide an OS-independent way to setlocale() and aims to provide language, country and currency names translated into many languages.

Net_FTP 1.3.0RC2

Net_FTP allows you to communicate with FTP servers in a more comfortable way than the native FTP functions of PHP do. The class implements everything natively supported by PHP and adds features like recursive uploading and downloading, directory creation and chmodding. It also implements an observer pattern to allow, for example, the display of a progress bar.

PHP_Fork 0.2.0

The PHP_Fork class is a wrapper around pcntl_fork() with a Java-like API. In practice, you extend this class and redefine the run() method. [see basic example] This way, PHP developers can enclose logic in a class that extends PHP_Fork, then execute the start() method, which forks a child process. Communication with the forked process is ensured by using a shared memory segment; by using a user-defined signal and this shared memory, developers can access child-process methods that return a serializable variable. The shared variable space can be accessed with two methods:

• void setVariable($name, $value)
• mixed getVariable($name)

$name must be a valid PHP variable name; $value must be a variable or a serializable object.

Resources (db connections, streams, etc.) cannot be serialized and so they’re not correctly handled. Requires a PHP build with --enable-cli --with-pcntl --enable-shmop. Only runs on *NIX systems, because Windows lacks the pcntl extension.
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.
WinBinder 0.34.117

WinBinder is a new extension that allows PHP programmers to build native Windows applications. It wraps the Windows API in a lightweight, easy-to-use library so that program creation is quick and straightforward.

mailparse 2.1

Mailparse is an extension for parsing and working with email messages. It can deal with rfc822 and rfc2045 (MIME) compliant messages.

newt 0.3

PHP-NEWT is a PHP language extension for the RedHat Newt library, a terminal-based window and widget library for writing applications with a user-friendly interface. Once this extension is enabled in PHP, it provides the use of Newt widgets, such as windows, buttons, checkboxes, radiobuttons, labels, editboxes, scrolls, textareas, scales, etc. Use of this extension is very similar to the original Newt API for the C programming language.

ssh2 0.4.1

Provides bindings to the functions of libssh2, which implements the SSH2 protocol. libssh2 is available from http://www.sourceforge.net/projects/libssh2

eAccelerator 0.9.1

eAccelerator announces their latest release, 0.9.1. What is it? The eAccelerator sourceforge page describes it as: “a further development from mmcache PHP Accelerator & Encoder. It increases [the] performance of PHP scripts by caching them in compiled state, so that the overhead of compiling is almost completely eliminated.” Get more information from http://eaccelerator.sourceforge.net/Home.

PHP awarded programming language of 2004

A new post on php.net announces that PHP has been named programming language of the year. PHP has been awarded the Programming Language of 2004, according to the TIOBE Programming Community Index. This index uses information collected from the popular search engines, and is based on the world-wide availability of skilled engineers, courses and third party vendors. Congratulations to us all! For more information visit: www.php.net
EDITORIAL
Hello, Goodbye (continued)

with the blame bucket being passed around several hands rather than focusing everyone’s energy on ensuring not only that bugs would be fixed, but that people would be properly informed and made aware of them. For my part, I’ve decided that we should help with what we know how to do best: informing people. On January 1st (talk about a New Year’s resolution!), we started a new mailing list dedicated exclusively to PHP security. You can read more about it in this month’s exit(0) column (which you’ll find at the end of the magazine), so I won’t bore you here by duplicating the details. I simply hope that the mailing list will help everyone keep security more in check; PHP has become so popular that we can no longer afford to hide beneath the folds of the Web and hope that no-one will find out about any of our weaknesses. They will, and we must be ready to deal with the consequences.

Until next month, happy readings!
MySQL Query Browser 1.1.5

MySQL.com announces the latest release of the MySQL Query Browser. MySQL.com claims: “MySQL Query Browser is the easiest visual tool for creating, executing, and optimizing SQL queries for your MySQL Database Server. The MySQL Query Browser gives you a complete set of drag-and-drop tools to visually build, analyze and manage your queries.” For more information or to download, visit: http://www.mysql.com/products/query-browser

phpMyFAQ 1.5.0 Alpha 2

phpmyfaq.de announces “The second alpha version of phpMyFAQ 1.5.0 is available. This version is PHP5 compatible and introduces a faster template engine. LDAP support is now a selectable option and the traditional Chinese and Japanese language files were updated. Beside some code improvements we fixed a lot of bugs. Do not use this version in production systems, but test this version and report bugs!” Get the latest info from phpmyfaq.de.
FEATURE
Generating OpenOffice.org Documents with PHP
by Bård Farstad

As you might know, OpenOffice.org is getting more and more users. This article will show you how you can generate documents from PHP that the Writer component of OpenOffice.org can read. It’s a follow-up to the author’s previous article, which appeared in the October 2004 issue of php|architect and dealt with extracting information from OpenOffice documents.
Getting started

Before we get started with the actual coding, we need to get an overview of the format we are working with. OpenOffice.org documents are XML files stored inside ZIP archives. In the list below, you can see the directory structure of the files inside an OpenOffice.org document. When you unzip this document, you will simply get some plain XML files and directories.

.
|-- META-INF
|   `-- manifest.xml
|-- content.xml
|-- meta.xml
|-- mimetype
|-- settings.xml
`-- styles.xml
Of course, in order to generate an OpenOffice.org document, we need to perform this process in reverse and create a ZIP file from the XML files that we create. In this article, I will show you exactly how you can use PHP to produce all the different files required to form a valid OpenOffice.org Writer document. The files shown in the example above are the bare minimum required to make up a document that OpenOffice.org will recognize as valid. If you, for example, have embedded an image inside your document,
then this is also stored as a separate file in a folder named Pictures (not shown above).

The content.xml file contains the actual text in your document. Headers, paragraphs, lists and tables are recorded in an XML format for content that is well documented, unlike the formats used by competitors like Microsoft Word. This is the file on which this article will focus for most of the time.

The styles.xml file contains the definition of the fonts, colors, sizes and other stylistic elements used by the document. To draw a parallel with HTML, this would be the equivalent of a CSS file, while content.xml would be the equivalent of the HTML document to which the stylesheet applies.

The meta.xml file contains “meta” information about the document. In this file, you can, for example, find the name of the document, its keywords and statistics like the total number of words that it contains. You can also store meta information, like Dublin Core, in meta.xml.
REQUIREMENTS
PHP: 4.3+
OS: Any
Other Software: N/A
Code Directory: office
Dublin Core is a metadata standard that defines a generic set of attributes used to describe a specific piece of information.

The settings.xml file is actually intended for the OpenOffice.org editor itself. It’s used to store GUI settings and, since this has nothing to do with content, we are not going to look into how we can store arbitrary settings in this file.

The manifest.xml file is a simple XML file that contains a reference to each file that makes up the document. The type of the document is stored in the mimetype file, which contains a single line with the MIME type of the document, such as “application/vnd.sun.xml.writer” for a Writer file.

Getting the Content Right

Since the most important part of our document is (obviously) its content, I will start by showing you how we can generate a minimal content.xml file. As you can see in Listing 1, after the document type definitions we get to the main element, office:document-content, which, in turn, contains optional elements like office:script, office:font-decls and office:automatic-styles. In our example, they are empty.

The office:body element contains the document’s contents themselves, but, before we get into its details, we should also mention the text:sequence-decls element, which is used for numbering items in the document and defining in which order different items are numbered. In our sample document, I have just supplied the default order. Immediately after the sequence declaration, we insert our actual content in the office:body element. In this minimal document, I have just added a small paragraph that displays the text “Hello World!”

Paragraphs and Inline Styles

The most common element added to a document is a simple paragraph. In the code snippet below, you can see the basic syntax of a minimal paragraph:

<text:p text:style-name="Standard">Hello World!</text:p>

Notice that the namespace prefix text is used. This prefix is used for all textual elements in the document, which makes it very simple to extract all textual information from it. The paragraph is also styled with the Standard style. The definition of this style can be found in the styles.xml file.

If you need to add whitespace in your text, you can use the text:s element. This is the spacing element: you define how many characters this space takes up in the text:c attribute. See the example below for a simple space definition which takes up 4 characters:

<text:s text:c="4"/>
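To tie the paragraph and spacing elements together, here is a minimal PHP sketch that builds a text:p element from a plain string. The helper names ooEncodeText() and ooParagraph() are made up for illustration, the code is written for current PHP versions rather than PHP 4, and the choice to keep the first space of a run literal and fold the rest into a text:s element is an assumption about one reasonable encoding, not something the article prescribes.

```php
<?php
// Sketch only: helper names are hypothetical, not taken from the article.

/**
 * Escape text for XML and turn every run of two or more spaces into one
 * literal space followed by a text:s element covering the rest of the run.
 */
function ooEncodeText(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return preg_replace_callback(
        '/ {2,}/',
        function (array $match): string {
            $extra = strlen($match[0]) - 1; // the first space stays literal
            return ' <text:s text:c="' . $extra . '"/>';
        },
        $escaped
    );
}

/** Wrap text in a text:p element that uses the given paragraph style. */
function ooParagraph(string $text, string $style = 'Standard'): string
{
    $styleAttr = htmlspecialchars($style, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return '<text:p text:style-name="' . $styleAttr . '">'
        . ooEncodeText($text)
        . '</text:p>';
}

echo ooParagraph('Hello World!'), "\n";
echo ooParagraph('a    b'), "\n";
```

Escaping the text before it goes into the XML matters here: an unescaped "<" or "&" in user-supplied text would produce a content.xml that OpenOffice.org cannot parse.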
[Listing 1: minimal content.xml with a single “Hello World!” paragraph]

Inside paragraphs, you normally have formatting such as bold, italic and underline. In OpenOffice.org, these styles do not have matching tags of their own; instead, a text:span tag is used, to which different styles are attached. Each unique span is given a style name with the text:style-name attribute. The definition of these styles is actually found in the content.xml file, under the office:automatic-styles element. To make some text bold, we therefore first need to create a definition of the style, for example like the one shown below:

<style:style style:name="T1" style:family="text">
  <style:properties fo:font-weight="bold"/>
</style:style>
Every style that is part of the automatic styles group is defined with the style:style element and given a unique name. Its properties are set using the style:properties element, that is, the same way as all styles are defined in styles.xml. In our style, we simply defined the fo:font-weight to be bold. This is the basic definition of bold text.

Once we have created a style definition for it, we can use the T1 style to mark bold text as such. We simply add a span element around the text we want to affect and set the style name to T1. Below, you see an example of how we mark text as being bold in our content:

<text:p text:style-name="Standard">Here are some <text:span text:style-name="T1">bold</text:span> text.</text:p>
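The two halves of this mechanism, the automatic style definition and the span that refers to it, can be generated with a pair of small helpers. The sketch below is only an illustration: the function names ooTextStyle() and ooSpan() are hypothetical, and the style:family="text" attribute is an assumption based on how OpenOffice.org 1.x writes character styles.

```php
<?php
// Sketch only: function names are hypothetical.

/** Build an automatic text style definition, e.g. T1 => bold. */
function ooTextStyle(string $name, array $properties): string
{
    $attrs = '';
    foreach ($properties as $attribute => $value) {
        $attrs .= ' ' . $attribute . '="'
            . htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8') . '"';
    }

    // style:family="text" is assumed here (character-level style).
    return '<style:style style:name="' . $name . '" style:family="text">'
        . '<style:properties' . $attrs . '/>'
        . '</style:style>';
}

/** Wrap a piece of text in a span that refers to a named automatic style. */
function ooSpan(string $text, string $styleName): string
{
    return '<text:span text:style-name="' . $styleName . '">'
        . htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8')
        . '</text:span>';
}

// The definition belongs inside office:automatic-styles ...
$boldStyle = ooTextStyle('T1', ['fo:font-weight' => 'bold']);

// ... while the span goes into a paragraph in office:body.
$paragraph = '<text:p text:style-name="Standard">Here are some '
    . ooSpan('bold', 'T1') . ' text.</text:p>';

echo $boldStyle, "\n", $paragraph, "\n";
```

Keeping the style name in one place (here the string 'T1') is the part that is easy to get wrong by hand: a span that refers to an undefined style is silently rendered without formatting.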
Headers

Headers are important parts of any document, as they are used to structure content. They are defined at the same level as paragraphs using the text:h element. As with all other text elements, you define the style of the element separately. In addition, you need to define the level of the header using the text:level attribute. Here is an example of a Level-1 header element; to create a header at another level, you simply change the level and style attributes:

<text:h text:style-name="Heading 1" text:level="1">A header</text:h>
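Since the level and the style name change together, a small helper can keep them in sync. This is a sketch under two assumptions that do not come from the article: the function name ooHeading() is invented, and the style names follow the "Heading N" pattern, with levels limited to 1 through 10.

```php
<?php
// Sketch only: ooHeading() is a hypothetical helper.

/** Build a text:h element whose style name follows the heading level. */
function ooHeading(string $text, int $level = 1): string
{
    // Assumed range of outline levels.
    if ($level < 1 || $level > 10) {
        throw new InvalidArgumentException('Heading level must be between 1 and 10');
    }

    return '<text:h text:style-name="Heading ' . $level . '" text:level="' . $level . '">'
        . htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8')
        . '</text:h>';
}

echo ooHeading('A header'), "\n";
echo ooHeading('A sub-header', 2), "\n";
```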
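Images

Of course, any nice-looking document contains images. The OpenOffice.org document format is a collection of files, and images are no exception. You need to put any image you want displayed in your documents in a subdirectory called Pictures. This image, in turn, needs to be referenced in the content.xml file. You can place images inside paragraphs using the draw:image element. The XML code below shows a typical image element:

<draw:image draw:style-name="fr1" draw:name="Graphic1" text:anchor-type="paragraph" svg:width="4inch" svg:height="3inch" draw:z-index="0" xlink:href="#Pictures/myimage.jpg" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>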
For a simple centered image, we do not have to worry about all the attributes of this node, but what we do need to take care of is the svg:width and svg:height attributes, which define how large the image is when it is displayed in the document. We also need to supply the path to the image, which is relative to the root of your document package; this is done with the xlink:href attribute.

In order to calculate the size of the image, we need to translate pixels into inches. To do so, we must find the size of the image in pixels, which can be done (for example) using the getimagesize() standard PHP function. We also need the DPI (dots-per-inch) setting for the image. For example, 75 is a common setting for low-resolution images. If you use a high quality printer or publisher, images usually need to be at least 300 DPI. Once we know how many dots (which in our case is the same as pixels) we have, we can easily calculate the size of the image in inches. The formula for this is:

Size in inches = pixel width / DPI
For example, a 300-pixel-wide image printed at 75 DPI will be 300 / 75 = 4 inches wide. The same calculation is used for the height. In PHP, the size calculation can be done as shown below.

// Path to the image we want to embed
$fileName = "/path/to/myimage.jpg";
// getimagesize() returns the width and height in pixels at indexes 0 and 1
$sizeArray = getimagesize( $fileName );
// Convert pixels to inches, assuming 75 DPI
$width = $sizeArray[0] / 75;
$height = $sizeArray[1] / 75;
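The same calculation can be wrapped in reusable functions with the DPI as a parameter and some error handling. This is a sketch for current PHP versions; the function names pixelsToInches() and imageSizeInInches() are invented for the example, and 75 DPI is kept as the default only because the article uses it.

```php
<?php
// Sketch only: function names are hypothetical.

/** Size in inches = pixels / DPI. */
function pixelsToInches(int $pixels, int $dpi = 75): float
{
    if ($dpi <= 0) {
        throw new InvalidArgumentException('DPI must be a positive number');
    }

    return $pixels / $dpi;
}

/** Return [width, height] of an image file in inches at the given DPI. */
function imageSizeInInches(string $fileName, int $dpi = 75): array
{
    $size = getimagesize($fileName);
    if ($size === false) {
        throw new RuntimeException('Cannot read image size of ' . $fileName);
    }

    return [pixelsToInches($size[0], $dpi), pixelsToInches($size[1], $dpi)];
}

// The worked example from the text: 300 pixels at 75 DPI.
echo pixelsToInches(300, 75), "\n";
```

With the result in hand, the svg:width and svg:height attributes can be written as the number followed by the unit, for example "4inch".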
Note that this is just one example. You can also use the EXIF extension to retrieve the number of DPI.

Lists

You may also want to have some lists inside your document. These are placed at the same level as paragraphs and headers in the content XML file. You can have both unordered and ordered lists using the text:unordered-list and text:ordered-list elements respectively. The list contains one or more text:list-item elements, which, in turn, enclose the content for each list item: normally just a paragraph with some text inside, but (at least in theory) as complex as you need it to be. The XML snippet below shows an unordered list containing two elements. You can see that lists, like all other block elements, are also styled with the text:style-name attribute. OpenOffice.org normally uses L1, L2, and so on for naming list styles, but that’s just an arbitrary convention you can choose to ignore in favour of your own flavour if you like.

<text:unordered-list text:style-name="L1">
  <text:list-item>
    <text:p text:style-name="Standard">Item text</text:p>
  </text:list-item>
  <text:list-item>
    <text:p text:style-name="Standard">Second item</text:p>
  </text:list-item>
</text:unordered-list>
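A list like the one above maps naturally onto a PHP array. The following sketch generates either kind of list from an array of strings; the function name ooList() and the default style names are assumptions made for the example, and each item is limited to a single paragraph for simplicity.

```php
<?php
// Sketch only: ooList() is a hypothetical helper.

/**
 * Build an ordered or unordered list in which each item holds exactly
 * one paragraph of text.
 */
function ooList(
    array $items,
    bool $ordered = false,
    string $listStyle = 'L1',
    string $paragraphStyle = 'Standard'
): string {
    $tag = $ordered ? 'text:ordered-list' : 'text:unordered-list';

    $xml = '<' . $tag . ' text:style-name="' . $listStyle . '">';
    foreach ($items as $item) {
        $xml .= '<text:list-item>'
            . '<text:p text:style-name="' . $paragraphStyle . '">'
            . htmlspecialchars((string) $item, ENT_QUOTES | ENT_XML1, 'UTF-8')
            . '</text:p>'
            . '</text:list-item>';
    }

    return $xml . '</' . $tag . '>';
}

echo ooList(['Item text', 'Second item']), "\n";
```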
Writing the XML file

We have now looked at how we can build the content.xml file, which is the most important portion of our document. In fact, if we do not care about what styles are used, the only thing we have to worry about is the content file, and we can use just a set of standard templates for the remainder of the document components.

In my previous article about OpenOffice.org and PHP, I showed how you can use PHP to parse the XML files using a standard PHP DOM XML parser and XSLT transformations. We can also use a DOM XML library to generate our XML documents, but, in this article, I will use a much simpler approach: we’ll just write the XML text directly to a file. In my production code, I also do not use a DOM library to generate the XML, since doing so would make the code quite a bit slower. My code, in fact, is as simple as the one shown in the snippets below. Essentially, I place the XML code making up our content.xml file in a variable and then write it to disk. In the code snippet below, for example, I named the variable $contentXML:

$contentXML = "
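The approach described above, building the XML in a string variable and writing it to disk, might look like the following minimal sketch. It is not the author's listing: the function name writeContentXml() is invented, the code targets current PHP versions, and the namespace URIs are the OpenOffice.org 1.x ones as assumed here, so they are worth checking against a document saved by OpenOffice.org itself. The document type declaration and the text:sequence-decls element are left out to keep the example short.

```php
<?php
// Sketch only: a pared-down content.xml, written to a temporary directory.
// Namespace URIs are assumptions; compare with a real document before use.
$contentXML = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style"
    xmlns:text="http://openoffice.org/2000/text"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    office:class="text" office:version="1.0">
  <office:script/>
  <office:font-decls/>
  <office:automatic-styles/>
  <office:body>
    <text:p text:style-name="Standard">Hello World!</text:p>
  </office:body>
</office:document-content>
XML;

/** Write the XML to content.xml inside $directory and return the full path. */
function writeContentXml(string $contentXML, string $directory): string
{
    $path = rtrim($directory, '/\\') . DIRECTORY_SEPARATOR . 'content.xml';

    if (file_put_contents($path, $contentXML) === false) {
        throw new RuntimeException('Could not write ' . $path);
    }

    return $path;
}

$path = writeContentXml($contentXML, sys_get_temp_dir());
echo 'Wrote ', strlen($contentXML), ' bytes to ', $path, "\n";
```

Writing plain strings keeps the code fast and simple, at the price of having to escape every piece of dynamic text yourself before it is placed into the XML.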
Ron is the technical director/senior programmer for Conveyor Group (http://www.conveyorgroup.com), a Southern-California based web development firm. His responsibilities include technology development, programming, IT and network management, strategic research, server systems management (webmaster), and website projects leader.
To discuss this article: http://forums.phparch.com/193
TEST PATTERN
Shedding a Tier
by Marcus Baker
The four-layer, or four-tier, architecture is an enterprise development classic. The trouble is that, for small projects (or big simple ones) it is complete overkill. What happens when we try to simplify this layering?
Layering is essential. The only way our rather feeble brains can cope with software development at all is by a process of divide and conquer. This is because bugs are easy to fix once you find them, but finding them is the problem. If we can make a part of our code completely unaware of the other parts, we know for sure that any errors in it are local.

Layering is the grandest expression of divide-and-conquer: it divides our entire application into a very few pieces and declares that each one can only be influenced by itself and, at most, one other. In particular, each layer can only see the next one down. This is easy to understand and works well.

It’s no surprise, then, that this technique has been applied to complex enterprise applications and that there are lots of layered systems to choose from. It also means, unfortunately, that terminology has suffered. Layers are sometimes called logical tiers, or just tiers. You also see texts where “tiering” or layering is described as the separation of hardware, that is, the use of multiple servers. Faced with this confusion and the need to fit an explanation into a single article, I am going to have to punt. My preferred solution in this arena is four layers, so I’ll take as my starting point the one used by Eric Evans in “Domain Driven Design” (published by Addison Wesley).
Then we’ll prod and poke it.

REQUIREMENTS
PHP: 4.x+
OS: Any
Other Software: None
Code Directory: tier

The Four-Layer Architecture

As you can see in Figure 1, the layers in our model are presentation, application, domain and infrastructure. If you are not used to UML, then the tabbed boxes are packages; they are, basically, big dollops of code. The arrows show visibility, so that the application layer is blissfully unaware of the presentation layer, for example.

To demonstrate the way the layers work, I am going to use the very trivial example of a contact manager. Firstly, let’s see what the presentation code would look like for the single task of e-mailing someone:

[Listing: top-level page script; only its page text survives: “Mail sent”, “Message sent to”]
The method and style of interaction, or, in this case, the lack thereof, is what makes up the presentation layer. If you can imagine changing the way the application is used (for example, switching to a GUI or a web services API), then anything that would change must go into this layer. That’s actually a lot of stuff: it naturally includes JavaScript, CSS, form parameters and the HTML, but it also includes sessions and maintaining authorization. After all, these will be different for, say, a desktop application compared with a web one.

The presentation layer is allowed to interrogate the application one, here represented by the Community class. Let’s look at that next:

class Community {
    ...
    function mail($name, $title, $message) {
        $finder = new PersonFinder();
        $person = &$finder->findByName($name);
        $contact = &$person->getEmail();
        return $contact->send($title, $message);
    }
}
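To make the visibility rule concrete, here is a self-contained sketch in which the presentation code talks only to Community, and Community talks only to domain objects. It is an illustration, not the column's code: it is written for current PHP versions, the domain classes are in-memory stand-ins, renderMailPage() is an invented name, and, unlike the class above, Community receives its PersonFinder through the constructor so that the example can run without a database.

```php
<?php
// Illustrative sketch; every class body here is a hypothetical stand-in.

// --- Domain layer (in-memory stand-ins) ---
class Contact
{
    public $sent = [];   // records messages instead of really e-mailing
    private $address;

    public function __construct(string $address)
    {
        $this->address = $address;
    }

    public function send(string $title, string $message): bool
    {
        $this->sent[] = ['to' => $this->address, 'title' => $title, 'message' => $message];
        return true;
    }
}

class Person
{
    private $email;

    public function __construct(Contact $email)
    {
        $this->email = $email;
    }

    public function getEmail(): Contact
    {
        return $this->email;
    }
}

class PersonFinder
{
    private $people;

    public function __construct(array $people)
    {
        $this->people = $people;
    }

    public function findByName(string $name): ?Person
    {
        return $this->people[$name] ?? null;
    }
}

// --- Application layer: tells the story, knows nothing about HTML ---
class Community
{
    private $finder;

    public function __construct(PersonFinder $finder)
    {
        $this->finder = $finder;
    }

    public function mail(string $name, string $title, string $message): bool
    {
        $person = $this->finder->findByName($name);
        if ($person === null) {
            return false;
        }
        return $person->getEmail()->send($title, $message);
    }
}

// --- Presentation layer: request parameters in, HTML out ---
function renderMailPage(Community $community, array $request): string
{
    $name = (string) ($request['name'] ?? '');
    $sent = $community->mail(
        $name,
        (string) ($request['title'] ?? ''),
        (string) ($request['message'] ?? '')
    );

    $heading = $sent ? 'Mail sent' : 'Mail not sent';
    $body = $sent
        ? 'Message sent to ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
        : 'No such person';

    return "<html><head><title>$heading</title></head><body>$body</body></html>";
}

$fredsEmail = new Contact('fred@example.com');
$community = new Community(new PersonFinder(['Fred' => new Person($fredsEmail)]));
echo renderMailPage($community, ['name' => 'Fred', 'title' => 'Hi', 'message' => 'Hello']), "\n";
```

Note that renderMailPage() never touches Person or Contact directly; swapping the HTML page for a command-line tool would only mean replacing that one function.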
I don’t have the space to build a complete four-layer application at this point, so I am going to have to illustrate the ideas with code fragments from now on.
The application layer is the glue that binds all of the components together. It’s all about actions written in a language that the business stakeholders would understand. The domain objects contain the more innate business rules. An example of domain knowledge is how the e-mail is sent. The application layer knows nothing of this process, extra headers, formatting, and so on. It just kicks off the domain code.

What makes something an application object and what makes it a domain object is subtle. The distinction comes about because applications change more frequently, often in response to what users want from the business. The knowledge of the business domain itself is acquired more slowly and with a lot more effort. In fact, so much effort goes into this process of discussion between the developers and stakeholders that it has a name of its own: “knowledge crunching.” By contrast, the application code should tell a simple story of what is going on. In our example, this boils down to finding a person, getting a contact point from them and finally sending the message. The grammar of that sentence is English, but the grammar of our code snippet is PHP syntax.

In our example, I am choosing the Community class to be part of the application layer, but I would expect classes like Person to be used in several applications within an organization. Because of this, I think it’s safe to assume that Person is a domain layer object. Let’s look at a domain object next:

    class Person {
        ...
        function &getEmail() {
            $query = new SelectQuery('contacts');
            $query->mustEqual('owner', $this->id);
            $query->mustEqual('media', 'email');
            $query->orderBy('preference');
            $rows = &$this->connection->select($query);
            if ($row = $rows->next()) {
                return new Contact($row);
            }
        }
    }

There are business rules, even here in this trivial example. Ordering by preference means that we are taking the first of a possible list of contact points. Because there are other ways to contact our people, we had to specify a media, in this case e-mail. Unlike the application layer example, we have some clutter caused by the database access. We’ll take a broom to this in a little while.

As we descend to the lowest infrastructure level, we start to get to the nitty-gritty. The code the domain object is using is stuff that could be common to any organisation: library code, if you will. Here is some infrastructure code:

    class Connection {
        ...
        function &select($query) {
            $sql = $query->asString();
            $result = mysql_query($sql, $this->connection_id);
            if (! $result) {
                return false;
            }
            return new ResultSet($result);
        }
    }
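The SelectQuery object that both classes rely on is not shown in the article. As a minimal sketch of what it might look like, assuming only the method names seen above (mustEqual, orderBy, asString) and guessing everything else, including the use of addslashes() as a placeholder for proper driver escaping:

```php
<?php
// Hypothetical query builder. Only the method names come from the article;
// the bodies are illustrative guesses.
class SelectQuery
{
    private $table;
    private $conditions = array();
    private $order = null;

    public function __construct($table)
    {
        $this->table = $table;
    }

    // Record an equality condition; the value is quoted when the SQL is built.
    public function mustEqual($column, $value)
    {
        $this->conditions[$column] = $value;
    }

    public function orderBy($column)
    {
        $this->order = $column;
    }

    // Build the SQL text. addslashes() is a stand-in only: real code should
    // use the escaping or parameter binding of its database driver.
    public function asString()
    {
        $sql = 'SELECT * FROM ' . $this->table;
        $clauses = array();
        foreach ($this->conditions as $column => $value) {
            $clauses[] = $column . " = '" . addslashes((string) $value) . "'";
        }
        if ($clauses) {
            $sql .= ' WHERE ' . implode(' AND ', $clauses);
        }
        if ($this->order !== null) {
            $sql .= ' ORDER BY ' . $this->order;
        }
        return $sql;
    }
}

$query = new SelectQuery('contacts');
$query->mustEqual('owner', 42);
$query->mustEqual('media', 'email');
$query->orderBy('preference');
echo $query->asString(), "\n";
```

Keeping the SQL text behind asString() is what lets the domain object state its rules (owner, media, preference) without writing any SQL itself.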
If you are like me, then you have written this type of code many times. More likely, you have had the good sense to use one of the many free libraries instead. You are probably thinking that all of these little classes and files could be replaced with a single top level script that would be a whole lot simpler. That would be a good point, and for such a simple task I would have a hard time disagreeing with you. The four-layer architecture only really comes into play once the job starts to get complicated. For smaller projects, we can simplify to taste, so let’s look at some shortcuts.

Merging Application and Presentation

The blending of the layers can be seen in its most naive form like so:

Mail sent
Message sent to

All we have done is take the code in the old application class and paste it straight into our top level script. It’s the simplest way to combine the layers and, in fact, you often see this approach hidden behind a template engine:
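As a rough illustration of that arrangement, here is a minimal sketch in which a hypothetical render_template() helper stands in for a real template engine. The helper, the {name} placeholder syntax and the "name" request parameter are all assumptions made for this example:

```php
<?php
// Hypothetical stand-in for a template engine: replaces {key} placeholders.
function render_template($template, $values)
{
    $pairs = array();
    foreach ($values as $key => $value) {
        // Escape on output so request data cannot inject markup.
        $pairs['{' . $key . '}'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    return strtr($template, $pairs);
}

// Top level script: request handling and application logic still live here.
$name = isset($_GET['name']) ? $_GET['name'] : 'nobody';

// Only the markup has moved out into the template.
$page = "<h1>Mail sent</h1>\n<p>Message sent to {name}</p>\n";
echo render_template($page, array('name' => $name));
```

The markup is now editable on its own, but the script still reads the request and decides what to show.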
All a template engine really does is separate the visual formatting from the remainder of the application; it does not, by itself, separate all of the presentation logic. We still have the $_GET array in our code, for instance. This choice is great for separating the HTML so that it can be edited by graphic designers. It doesn’t manage to free you of the navigation and form handling. However, this is usually fine if you are just building web applications and are changing only the look and feel.

On the positive side, this approach is often good enough for standalone applications. It is also well understood, especially within the PHP community, and is a quick way to turn an HTML mock up into a working system. The downside is that it will be hard to integrate into other applications and much harder to test. Because the application code here lives in scripts, it will have to be tested by looking at web pages, altogether a rather coarse approach. It works well for a small project, but this is about as much of a hole as I like to dig before I get nervous. The warning signs are tricky bugs with things like security and also excessive duplication of code across the top level scripts.

Figure 1

Merging Application and Domain layers

Because of the very slight difference between these two layers, it is common to merge them into a single one. That becomes one of the classic three-layer architectures. There is no difference in the code; it is just that the Community class is declared to be in the domain layer. You can actually merge the application and domain layers with few ill effects, but with one caution. The symptom to watch out for is domain layer objects that are difficult to test because of configuration. Because domain objects are the part of the business rules you would like to reuse across the organisation, you don’t want them tied to a specific server. This may happen because they have fixed paths for files, or perhaps resources such as database passwords need to be global. This is a sign of future trouble. These kinds of decisions most definitely belong in the application layer and there is a big win in passing all of this into the domain
objects as parameters. I would split them into two camps if you have a lot of server-specific configuration. It’s no fun searching hard drives for missing files.

Purifying the Domain Layer

Objects representing the business domain will probably have to be saved to a database, a process called persistence. The so-called ActiveRecord pattern is the simplest way to make objects persistent. If the infrastructure layer is very primitive, then the domain objects have to do a lot of work communicating with the database. The ActiveRecord pattern is really no pattern at all: the domain object will handle all of this work itself, although you may be able to factor some of it out with inheritance. The earlier Person class is an ActiveRecord. Although it has some help from the infrastructure classes, the metaphor is still one of database rows. This extra translation effort to go from a tabular database view to an object view is called object/relational impedance, and that’s not so nice when mixed in with your business code (you may want to read up on Rick Morris’ articles on this topic that appeared in the August 2004 and November 2004 issues of php|a).

Now, a full discussion of persistence patterns is a book in itself (e.g. Nock), but pushing out the database code comes down to two basic ideas: external mappers and internal accessors. The DataAccessor pattern, also called DataAccessObject or DAO, wraps all of the database code into a single object that the domain object can call. For example, when saving data:

    class Person {
        ...
        function save() {
            return $this->accessor->save();
        }
    }
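To make the delegation concrete, here is a minimal sketch of how such an accessor could sit behind Person. The PersonAccessor class, its get/set methods and the static array standing in for a database table are all hypothetical; only the save() delegation comes from the text:

```php
<?php
// Hypothetical accessor: holds plain data and hides where it is stored.
// A static array stands in for the database table in this sketch.
class PersonAccessor
{
    public static $table = array();   // fake "people" table, keyed by id
    private $id;
    private $fields = array();

    public function __construct($id)
    {
        $this->id = $id;
        if (isset(self::$table[$id])) {
            $this->fields = self::$table[$id];
        }
    }

    public function get($name)
    {
        return isset($this->fields[$name]) ? $this->fields[$name] : null;
    }

    public function set($name, $value)
    {
        $this->fields[$name] = $value;
    }

    public function save()
    {
        self::$table[$this->id] = $this->fields;
        return true;
    }
}

class Person
{
    private $accessor;

    public function __construct($id)
    {
        // The domain object creates and owns its accessor.
        $this->accessor = new PersonAccessor($id);
    }

    public function rename($name)
    {
        // The business rule lives here, not in the accessor.
        if (trim($name) === '') {
            return false;
        }
        $this->accessor->set('name', trim($name));
        return true;
    }

    public function getName()
    {
        return $this->accessor->get('name');
    }

    public function save()
    {
        return $this->accessor->save();
    }
}

$person = new Person(7);
$person->rename('  Fred ');
$person->save();

$reloaded = new Person(7);
echo $reloaded->getName(), "\n";
```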
This separation is invisible to the outside world. The domain object, here our earlier Person, is in charge of creating and using the accessor. Note that the accessor just deals with database data and only has getters and setters. The data coming back could be other objects or arrays of data. It doesn’t have to correspond to a single row on the database and this can do a lot to clean up the domain layer code: it’s the very simple approach of delegating to an internal object to do all of the dirty work.

The opposite approach is the DataMapper pattern. With this method, we gut the domain objects of all of the database code and use another separate class to do the work:

    $mapper = new PersonMapper();
    $mapper->save($person);
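A minimal sketch of what could sit behind those two lines, assuming hypothetical getId()/getName() getters on Person and a plain array inside the mapper standing in for the real table:

```php
<?php
// Domain object: no database code at all, just data and getters.
class Person
{
    private $id;
    private $name;

    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId()
    {
        return $this->id;
    }

    public function getName()
    {
        return $this->name;
    }
}

// Hypothetical mapper: the only class that knows how a Person is stored.
class PersonMapper
{
    private $rows = array();   // stand-in for a database table

    public function save($person)
    {
        // The mapper reads the object through its getters.
        $this->rows[$person->getId()] = array('name' => $person->getName());
        return true;
    }

    public function find($id)
    {
        if (!isset($this->rows[$id])) {
            return null;
        }
        return new Person($id, $this->rows[$id]['name']);
    }
}

$person = new Person(7, 'Fred');
$mapper = new PersonMapper();
$mapper->save($person);
$copy = $mapper->find(7);
```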
The Person just sits back passively, while the mapper interrogates it. The Person class will have to have sufficient accessors for the mapper to read all of the data in the object. Otherwise, it cannot be extracted and saved out. Either the application code, or an extra domain object, has the burden of creating and calling the mappers, but this system has the big advantage that the business objects can be created without thinking about the database at all. Changing databases is easier too, because you can use the domain objects without modification.

PHP is, at last, starting to see some libraries emerge to ease the workload for saving objects. In order of sophistication, they include PEAR::DB_DataObject, Propel and MetaStorage.

Removing the Domain Layer

So much has revolved around the business domain layer up to now that it may seem rather strange that it could be removed. What does an application look like without any business logic? Well, you can still use the database operations of creating, reading, updating and destroying, or CRUD for short, and you will also get mainly tabular data back. The end result is just a simple reporting application, but these are common in the PHP world.

Although limited in applicability, there is a way to make applications of this type spectacularly quick to write. Apart from dispatching queries, they only have to deal with a single type of object, namely the set of results returned from a query. The class is usually named RecordSet or something similar and, for memory efficiency, it is usually implemented as some type of iterator, that is, you read a row at a time. As there is only one type of object to display, it is easy to build a library of display components, usually called widgets or controls, to work with it. These can range from simple drop-down list widgets right up to elaborate editable table widgets.

To show what this looks like, imagine we are going to display a table of people. Here is a possible code fragment for the presentation layer:

My Friends
Here $people is the RecordSet. Behind the scenes, the widget will call next() on the RecordSet, but we don’t have to be aware of any of that. We just throw the results at it and it gets on with it. The code has a simple declarative feel of “just do it.” Once we descend to the application layer, things get trickier:

    class Community {
        ...
        function &findByCategory($category) {
            $query = new SelectQuery('people');
            $query->mustEqual('category', $category);
            return $this->connection->select($query);
        }
    }
Looks easy, and it is, but that’s only because we are playing to this system’s strength by simply displaying tables. What happens if we have to do some calculations on the columns or add other external data into the output? Because the row data is only actually loaded on demand, we have two options.

The first option is to pull all of the data out, perform our calculations and then create a new RecordSet from scratch. We then pass that back instead. That’s OK for small amounts of data, but messy. Notice that the RecordSet is a key abstraction here. Because these objects don’t have to come from a database, we are free to build or intercept them and so squeeze in an additional logical layer. Only in so doing can we justify calling this another form of three-layer architecture.

The other option is to run our code as the rows are fetched. As the widgets are going to do all of the work, we have to modify the next() call. We could inherit from the RecordSet, but it is preferable to wrap it in a class that looks identical from the outside and passes the calls on to the real RecordSet underneath. This trick is called the Decorator pattern or, in this context, usually a filter. The code then looks like this:
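As a rough sketch of the filter idea, assuming a hypothetical array-backed RecordSet whose next() returns a row or false, and guessing at the internals of a WithEmailsAsLinks class:

```php
<?php
// Hypothetical array-backed RecordSet; next() returns a row or false.
class RecordSet
{
    private $rows;
    private $position = 0;

    public function __construct($rows)
    {
        $this->rows = array_values($rows);
    }

    public function next()
    {
        if ($this->position >= count($this->rows)) {
            return false;
        }
        return $this->rows[$this->position++];
    }
}

// Filter (Decorator): same next() interface, rewrites each row as it passes.
class WithEmailsAsLinks
{
    private $records;

    public function __construct($records)
    {
        $this->records = $records;
    }

    public function next()
    {
        $row = $this->records->next();
        if ($row === false) {
            return false;
        }
        $email = htmlspecialchars($row['email'], ENT_QUOTES, 'UTF-8');
        $row['email'] = '<a href="mailto:' . $email . '">' . $email . '</a>';
        return $row;
    }
}

$people = new WithEmailsAsLinks(new RecordSet(array(
    array('name' => 'Fred', 'email' => 'fred@example.com'),
)));
$first = $people->next();
echo $first['email'], "\n";
```

Because the wrapper exposes the same next() method, a widget written against RecordSet cannot tell that a filter has been slipped in.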
My Friends

By writing our WithEmailsAsLinks filter, we can later intercept the next() call to manipulate the rows as they pass through. I am imagining that the “email” field would be converted to an HTML anchor tag on each next() call. For big lumps of tabular data, this is a common technique. It has the added benefit that the same filters can be used again and again over an application. They are difficult to work with for tricky domain logic, though, and ridiculous overkill if you mostly fetch a single row or object at a time.

Faced with this constraining style, one solution is to move all of the complex business logic into the database as stored procedures or triggers and use PHP as a presentation tier only. This then becomes a database-driven application. If all of your information is stored on a relational database, the application does not change too much and the skills are available, then this is also a tried and tested option.

Most of the decisions so far have been easy to back out of, but the decision to go the RecordSet route rather than the domain layer route is more of a fork in the road. If you will be dealing with mostly tabular data and the bulk of your system is database-driven, then the RecordSet model is probably the way to go. If, on the other hand, you are frequently working with single complicated items or managing information from more than just databases, go with the domain model. In going that route, the simplest first split will be straight down the middle, namely get the domain layer away from the presentation. That should give our brains a fighting chance.

Further Reading

The essential enterprise patterns book is “Patterns of Enterprise Application Architecture” by Martin Fowler (Addison-Wesley). Most of the patterns described here come from that book. In the same vein, but limited to persistence mechanisms, is “Data Access Patterns” by Clifton Nock (also Addison-Wesley). The persistence libraries mentioned are: PEAR DB_DataObject at pear.php.net/package/DB_DataObject, Propel at propel.phpdb.org/wiki/ and MetaStorage at www.meta-language.net/metastorage.html.
About the Author
Marcus Baker is a senior software developer at Wordtracker and part time web development consultant. His website is at http://www.lastcraft.com/. Marcus is also a co-founder of the PHPLondon organization.
To Discuss this article:
http://forums.phparch.com/194
FEATURE
Transliteration with PHP
by Derick Rethans
There are a couple of different methods of converting characters to other characters. Transliteration is the process of converting a specific character to different characters or groups of characters. Examples of transliteration are converting the Norwegian “å” to “aa” (ligature normalization), “ç” to “c” (diacritical removal), “ÿ” to “Ÿ” (changing case), “Ю” to “YU” (Cyrillic to Latin transliteration) and “©” to “(c)” (special decomposition). For each of those conversions, special filters can be used, and the order of filters is important too. For example, you will want to run a ligature normalization filter before the diacritical removal filter so that “å” does not become “a”, but “aa” as Norwegian people would expect. As you can imagine, the definition of some of those filters can be pretty large, especially the Han to Pin Yin transliteration because of the great number of Chinese characters.

Transliteration from one script to another will most likely never be one hundred percent accurate, as the way characters are transliterated to the Latin script is sometimes affected by country, but most often just by the person who does the transliteration. Therefore, transliteration can only achieve an approximation of a script when we transliterate texts.

Why Is This Needed at All?

You might be wondering why one would need a method or an extension to transliterate characters from
one character set to another one, but there are a couple of situations where this is really useful. One example is a content management system where you would want to create a URL path out of the title of a document. A first method would be to simply conduct the following steps:

• Convert the title of the document to lower case characters
• Replace all characters not in the range a-z0-9 with an underscore
• Remove underscores at the beginning and end of the generated title
• Collapse multiple underscores in a row into one

As an example, the title “42: The answer to life, and everything.” would first become “42: the answer to life, and everything.”, then “42__the_answer_to_life__and_everything_” and, finally, “42_the_answer_to_life_and_everything,” which is a suitable name to use in a URL. This algorithm works fine for English text, but, if the title of the document had
REQUIREMENTS
PHP: 4.3+
OS: N/A
Other Software: “translit” PECL extension
Code Directory: trans
contained the word “français,” for example, the final result would have been “fran_ais,” which is no longer representative of the document’s title. For different scripts, such as Cyrillic or Japanese Katakana, this is obviously not going to be useful at all.

In eZ publish, URLs are not the only things that need some form of “mangling” in order to create a usable string. Other items include identifiers for attributes (fields) in a content object, searching and generated package names, and so on. Each of those cases might need different rules for creating a usable string as representation for the items. For example, if you are normalizing a string for a search engine, you might want to retain spaces, while they should be removed if you are preparing a string for use in a URL. Other uses for strings might not even allow underscores at all.

The Translit Extension

One possibility is to implement these filters with PHP code, though this is not very fast. It is what you had to do before the translit extension existed. The translit extension makes it possible to apply filters on strings of text to perform different transliteration rules. The extension provides two functions only: transliterate_filters_get and transliterate. The first function returns an array with all available filters, while the second one provides the functionality needed to apply transliteration filters to strings. Installing the translit extension can be done by simply running:

    pear install http://pecl.php.net/get/translit
This will install the latest version of the transliteration extension, which, at the time of this writing, was beta version 0.5. In order for this to work, you do need a correct build environment for PECL extensions; this includes “fitting” versions of the autotools: autoconf 2.13, libtool 1.4.3, and automake 1.4-p6 or similar. Newer versions might also work, but they can throw quite a few warnings. If this is the case, you should downgrade your autotools to the versions I just mentioned. You can also get some information from the PHP manual by visiting this URL: http://www.php.net/manual/en/install.pecl.php .

Another installation dependency of the translit extension is the iconv extension, which you either need to have compiled into PHP (Unix) or loaded by specifying it in your php.ini file with an extension= line before the translit extension. When the extension is installed and enabled in php.ini, you can use the transliterate_filters_get function to see if everything is working:
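One way to run that check is sketched below, with a guard so that the script also reports when the extension is missing. The helper name available_translit_filters is hypothetical; transliterate_filters_get is the function named in the text:

```php
<?php
// Returns the filter names reported by the extension, or an empty array
// when the translit extension is not loaded.
function available_translit_filters()
{
    if (!function_exists('transliterate_filters_get')) {
        return array();
    }
    return transliterate_filters_get();
}

$filters = available_translit_filters();
if ($filters) {
    print_r($filters);
} else {
    echo "translit extension not loaded\n";
}
```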
This should return all supported filters.

Character Sets and Unicode

The extension needs to deal with a lot of different character sets (e.g. Latin, Greek, Cyrillic, and so on). Because none of the normal 8-bit character sets, or the Chinese Big5, are compatible with each other, the extension uses Unicode characters to perform its transformations. To implement efficient filters, the transliteration extension does not use UTF-8 encoding internally, as doing so would require too much overhead when parsing the string each time; instead, it uses UCS-2, which always stores one Unicode character as two bytes. This makes it possible to perform integer arithmetic on the characters, allowing for very fast filtering. In return, this means that you need to convert your data if you want to transform strings encoded in character sets other than UCS-2. Fortunately, the transliterate function allows you to specify the input and output character sets for the transliteration. This is where the iconv extension, on which the transliteration extension depends, comes into play.

In Listing 1, for example, we execute the normalize_ligature filter on the string “Vær så god.” resulting in “Vaer saa god.” As you can see, the function is easy to use: the first parameter is simply the string that you want to execute a filter on, while the second parameter is an array containing the filters that you want to execute. The third and fourth parameters are the character set of the incoming data and outgoing data respectively. Because the second parameter is an array of filters, you can execute multiple filters with the same function call and be sure that they are executed in the order in which they appear in the array you pass.
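To illustrate why the order of the filter array matters, here is a pure PHP sketch that does not use the extension at all. The two small mapping tables are hypothetical stand-ins named after the real filters, and the character set names in the final comment are assumed spellings:

```php
<?php
// Hypothetical stand-ins for two filters; the real extension uses far
// larger mapping tables and works on UCS-2 internally.
$filterMaps = array(
    'normalize_ligature' => array('æ' => 'ae', 'å' => 'aa'),
    'diacritical_remove' => array('å' => 'a', 'é' => 'e', 'ç' => 'c'),
);

// Apply the named filters strictly in array order.
function apply_filters($string, $filters, $maps)
{
    foreach ($filters as $name) {
        $string = strtr($string, $maps[$name]);
    }
    return $string;
}

$right = apply_filters('Vær så god.', array('normalize_ligature', 'diacritical_remove'), $filterMaps);
$wrong = apply_filters('Vær så god.', array('diacritical_remove', 'normalize_ligature'), $filterMaps);
echo $right, "\n", $wrong, "\n";

// With the extension loaded, the call described in the text would look like:
// transliterate('Vær så god.', array('normalize_ligature'), 'utf-8', 'utf-8');
```

Running the ligature filter first keeps “å” as “aa”; running it second is too late, because the diacritical filter has already reduced it to “a”.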
Transliteration Filters - Latin

Currently, the transliteration extension provides support for different groups of scripts, and each of those scripts has different filters. For the Latin script, the group of filters consists of diacritical_remove, lowercase_latin, normalize_ligature and uppercase_latin. Not only do those four filters deal with the Basic Latin and Latin-1 Supplement Unicode blocks, they also support Latin Extended-A and Latin Extended-B. This means that the diacritical_remove filter will be able to remove diacritical signs for all Latin and Latin-like characters available in Unicode. The extension will therefore correctly convert all characters in the string “ ” to uppercase and then remove all the diacritical signs from it. Listing 2 illustrates this.

A couple of additional filters are required to be able to generate URL-safe names from article titles. For this, we need to follow these steps:

• Expand all ligatures (the normalize_ligature filter, “å” to “aa”)
• Remove all remaining diacritical signs (the diacritical_remove filter, “é” to “e”)
• Convert the string to lowercase (the lowercase_latin filter, “FoO” to “foo”)
• Normalize all punctuation so that the remove_punctuation filter can remove it (the normalize_punctuation filter, “—” to “-”)
• Remove all punctuation (the remove_punctuation filter)
• Replace all spaces by underscores (the spaces_to_underscore filter)
• Remove underscores at the beginning and at the end, as well as multiple occurrences of underscores inside the string (the compact_underscores filter, “_foo__42__” to “foo_42”)

It is again fairly trivial to execute all those filters on the string from which we want to create a fitting URL-safe name, as you can see in Listing 3. In line 2, we define our string “Politiet: - Økseangrepet var planlagt.\n”, and in lines 3 to 7 our filter array. Line 8 then executes all the filters on the string, treating incoming data as UTF-8, but producing 7-bit ASCII data as output. In our case, we want 7-bit ASCII as output because this filters out all other scripts. Line 9 instructs vim, my editor of choice, to treat the file as UTF-8 data.

Greek

The technique above works very well for Latin-based languages, but as soon as a different writing system is used, it fails miserably. Imagine a Greek string like “ .” If we execute the same filters as for Latin strings, the result will be “____”, which is, of course, completely useless. Hence, we have to transliterate this Greek text to the Latin script first, which is what the greek_transliterate filter does. In Listing 4, you can see that by prepending the greek_transliterate filter, the output of the transliteration process is something we can use as part of a URL, although it might not be a 100% correct transliteration: “eygenikhe_mas_phhile_kai_sympatrihote.”

There are two more filters for the Greek script: one, greek_uppercase, will convert all lowercase letters to uppercase, while the other, greek_lowercase, will convert all uppercase letters to lowercase. Listing 5 shows both filters in action.

Cyrillic

The same ideas apply to the Cyrillic script as to the Greek script. Unlike the Greek script, which is currently only used in Greece, the Cyrillic script is used for several languages. There are indeed some differences in preferences in those countries on how to transliterate Cyrillic into Latin. Therefore, the transliteration extension does not only contain a generic cyrillic_transliterate filter, but also a cyrillic_transliterate_bulgarian filter that changes some of the transliterations for specific letters. In the future, extra filters for even more languages may be added, but for now there is only one, specific to the Bulgarian language. In Listing 6, you can see the different filters for the Cyrillic script in action; notice how the “ ” is transliterated differently in Bulgarian compared to the default transliteration rules for the Cyrillic script. Naturally, there are more characters that differ than only this one.

Hebrew

This is a very interesting one, as it has no concept of
upper and lower case characters. Accordingly, there is only one filter related to the Hebrew script: hebrew_transliterate. Listing 7 shows the filter in action. From the string “ ” it will produce “mobrq_yskn_t_mdynot”. One other interesting feature of Hebrew is that most writing doesn’t seem to use vowels at all.

Asian Scripts

This is where all the real fun begins, at least for someone who doesn’t know anything about Asian languages. Asian languages usually don’t conform to the “we use letters to represent text” rule. For example, Chinese uses ideograms, while Japanese uses those in addition to two other scripts, and Korean uses combined letters as characters. Another point is that Chinese doesn’t use any spaces between words, which makes it really hard to come up with a sensible Romanization strategy.

For now, the transliteration extension implements a few filters related to CJK (Chinese, Japanese, Korean) scripts. The first one, hangul_to_jamo, converts the combined Korean syllables (Hangul) back into letters (Jamo). The Unicode character set supports both the combined syllables and the separate letters that form the combined syllables. Fortunately, this conversion can be done in a very algorithmic way. The second Korean-related filter, jamo_transliterate, Romanizes the separate Jamo characters into their Latin script equivalents. Listing 8 shows the first step of converting Hangul to Jamo, and further steps to create a URL-usable string again. The output is:
nba_magnakaneun_nba_kwanjung_seonghuirongngeuro_churjangjeongji (63)
Unless you happen to know Korean, you can’t really tell the difference between the first and second string, although their respective lengths clearly show that one exists.

The Chinese script works in a totally different way again, as each ideogram “picture” usually forms one word, but in some cases multiple ideograms form a word. If you want to figure this out, you either need a very advanced algorithm, or a large dictionary containing the Chinese words (or, as a third possibility, a friend who speaks Chinese and has lots of time on his hands). As that is virtually impossible to achieve, the transliteration extension’s han_transliterate filter will instead add a space between each ideogram. Listing 9 shows the filter in action, and the results of the transliteration steps are shown below:
Of course, you will once again need to use some more filters to produce a string formatted so that it can be used in a URL.

Other Filters

There are more filters available in the transliteration extension; they are mostly normalization or general purpose filters. For example, normalize_superscript_numbers, normalize_subscript_numbers and normalize_numbers convert all the available Unicode characters in each of those character groups to their Latin representation. The string “ ” will be converted to “46 * 42 = 172032” using the first transliteration from Listing 10. The normalize_superscript and normalize_subscript filters add some more transformations for non-numeric characters to normal-case characters. There are also two decomposition filters. The first one, decompose_special, converts the special characters ©, ®, «, ± and » into ASCII sequences such as (c) and (r). The second one, decompose_currency, converts currency signs into their ASCII equivalents. An example of this is shown in the second transliteration in Listing 10.

How Does the Extension Work?

The transliteration extension uses different filters to perform different tasks. Most of those filters are not written by hand in C, as that would require way too much time. Instead, the transliteration extension uses mapping files (in the data/ directory) that are converted to C code by a PHP script, convert.php. Each mapping file contains rules for a character, a set of characters or a range of characters. In the diacritical_remove.tr file, for example, we find the following rules:

    U+00FF = U+0079 # y
    U+00F9 - U+00FC = "u"
    U+0100,U+0102,U+0104 = U+0041 # A
The first rule simply states that the character with Unicode value 0x00FF should be remapped to character 0x0079. Instead of using a Unicode value for the result, you can also simply use the characters (if they are US-ASCII) as a string literal. You can see this in the second rule, which maps all characters in the range 0x00F9 to 0x00FC to the character “u”. The third rule says that each of the characters in the set 0x0100, 0x0102, 0x0104 should be mapped to “A”. There are a few other types of rules:

    U+0450 > "ie"
    Even U+1E02-U+1E07 = "B"
    Odd U+1E02-U+1E07 = "b"
    U+5416,U+5475,U+9312,U+9515,U+963F > U+0101,U+0020
    U+0400 - U+040F += U+0050
    U+2074 - U+2079 -= U+2040
The first one maps one Unicode character to multiple characters; the second one maps all even Unicode values in the range 0x1E02 to 0x1E07 to the character “B”; and the third rule maps all odd values in the same range to the character “b”. The fourth rule maps a set of Unicode values to a string of multiple other Unicode values. In the example, each of the five characters on the left side will be replaced by U+0101, followed by a space. The fifth rule adds 0x50 positions to the Unicode values in the range 0x0400 to 0x040F; and the sixth rule subtracts 0x2040 positions from the characters in the range on the left side. All .tr files have the following structure:

    filter_name:
    rules
    another_filter_name:
    rules
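To make the rule types concrete, here is a hypothetical PHP rendering of several of the example rules, working on integer code points. It is only a sketch of what the rules mean: the extension itself compiles its .tr files into C lookup tables, and the function name map_code_point is invented for this example. The “+=” rule is left out because its result lies outside ASCII and would need encoding.

```php
<?php
// Returns the ASCII replacement for a code point, or null when none of the
// example rules applies (the caller would then leave the character alone).
function map_code_point($cp)
{
    if ($cp === 0x00FF) {                        // U+00FF = U+0079
        return 'y';
    }
    if ($cp >= 0x00F9 && $cp <= 0x00FC) {        // U+00F9 - U+00FC = "u"
        return 'u';
    }
    if (in_array($cp, array(0x0100, 0x0102, 0x0104), true)) {
        return 'A';                              // set of characters = U+0041
    }
    if ($cp === 0x0450) {                        // U+0450 > "ie"
        return 'ie';
    }
    if ($cp >= 0x1E02 && $cp <= 0x1E07) {        // Even = "B", Odd = "b"
        return ($cp % 2 === 0) ? 'B' : 'b';
    }
    if ($cp >= 0x2074 && $cp <= 0x2079) {        // U+2074 - U+2079 -= U+2040
        return chr($cp - 0x2040);
    }
    return null;
}

foreach (array(0x00FA, 0x1E02, 0x1E03, 0x2079) as $cp) {
    printf("U+%04X -> %s\n", $cp, map_code_point($cp));
}
```

The even/odd and subtraction rules are where the two-byte internal representation pays off: they are plain integer arithmetic on the code point.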
There are a few special rules that make overriding previously-defined rules easier. The cyrillic_transliterate.tr file shows them in action:

    cyrillic_transliterate:
    U+0416 > "ZH"
    cyrillic_transliterate_bulgarian:
    #pragma INCLUDE cyrillic_transliterate
    #pragma OVERRIDE 1
    U+0416 > "J"

The first line starts the filter cyrillic_transliterate, which in our simple example defines one rule only (it maps “Ж” to “ZH”). The second filter includes the previous one with the #pragma INCLUDE cyrillic_transliterate directive and then turns off the protection against multiple rules for the same character (#pragma OVERRIDE 1). Next, the filter overrides the definition for “Ж” and maps it to “J” for Bulgarian instead.

After these translation files are created, the make shell script in the data/ directory is run to create all the necessary C files. In case a new file is added, the name of the new C file (which matches the name of the .tr file) should also be added to the config.m4 file (config.w32 for the Windows folks) in the root of the source for the extension, so that the PHP build system can pick it up when you run phpize.

The C files that are generated from the .tr files by the conversion script usually consist of mapping tables which map Unicode characters to actions. The actions that are recorded are: 0 (do nothing), 1 (replace with one other character), 2 (expand to multiple characters), 3 (remove), 4 (transpose up) and 5 (transpose down). Each of these actions has another table defining the result of the transliteration of a specific character to something else. All map tables are only 256 positions large, and the code to select the correct table is again generated automatically from the .tr files. This ensures optimal memory size and performance.

What’s Next?

The current extension, although working, still misses transliterations for many scripts and other filters. These should be added, but, as I am not familiar with those scripts myself, it would not be an easy task for me. Because making new filters is relatively easy, I hope that other people will want to contribute their filters and correct or extend the current ones. Furthermore, some more optimizations regarding the memory footprint of some filters could be made.

An addition to the extension might be filter groups, so that you only have to use the “clean_url” filter, or the “search_words” filter, which, in turn, define a list of filters to execute in sequence. This would avoid having to pass large arrays of filters in your code,

About the Author
Derick Rethans provides solutions for Internet-related problems. He has contributed in a number of ways to the PHP project, including writing the mcrypt extension, providing new code and bug fixes and leading the QA team. He now works as a developer for eZ systems (http://www.ez.no). In his spare time, he likes to work on SRM: Script Running Machine (http://www.vl-srm.net/) and Xdebug (http://www.xdebug.org), watch movies and travel. You can reach him at [email protected].
To Discuss this article:
http://forums.phparch.com/195
FEATURE
Akaar: Being Smarty with Smarty!
by Chirag Ahmedabadi

Wouldn’t it be great to free up the development time you spend on mundane, repetitious tasks such as form generation and client-side input validation, and really concentrate on the fun and interesting side of PHP development? Whether you use standard blocks of code or customized blocks from your own library, intelligent use of these assets in a template-based environment can save you tons of time. This article describes a few concepts that can reduce this repetitive coding dramatically while really accelerating your development.
REQUIREMENTS
PHP: 4.3+
OS: Linux or Windows
Other Software: Smarty 2.6.0 or higher, Akaar PHP Library
Code Directory: akaar

If you were to prepare “tomato soup” for your dinner and by mistake you forgot to add “salt,” what would you normally do? Would you prepare your soup again or would you just add salt to the one you’ve already made? Of course, you would normally just add salt to your soup. This is a pretty simple example, but it makes an interesting analogy if you compare the simple task of preparing your next meal to computer programming. From a programming perspective, we quite often prepare the soup again and again, when all we really have to do is add a grain of salt to our PHP scripts.

Think of all of the various situations where you have to go back to add something, over and over. A good case in point is a registration form on your site where you ask your users for a lot of information. As a practical example, let us say that the username or email address provided by the user is already stored in your database and, therefore, you would like to ask him or her to “please select another username.” In another case, you may present a checkbox to accept the Terms and Conditions of your website and output an alert message if, for some reason, the user decides not to check it. How would you go about taking care of these tasks from a programming perspective?

In all cases, you will probably also want all the information provided by the user to be saved and displayed again if validation of the form fails, so that he or she won’t have to re-enter the data from scratch. Obviously, an easy and accurate user experience like this is an important way of retaining visitors. Since these are necessary tasks, however rudimentary, they need to be addressed correctly.

In the normal “reinvent the wheel” fashion, where we rely on custom-written PHP scripts rather than a template-based architecture, we would just hard-code our HTML form and place an echo statement in the value property of each HTML tag. If we had a select box on the form, then we would add some logic to match its options against the user’s selection. The same process would apply for the additional coding necessary to handle multiple select boxes, checkboxes, radio buttons, and so on. If you are using a template-based system, like Smarty for example, then you will normally assign the values submitted by the user to a set of template variables. In these cases, the template system requires a bit more coding compared to orthodox PHP scripting. For me, when working with template systems, the most boring and time-consuming task is writing the code that puts the values entered by the user back into form elements after validation.

So, back to the soup! Adding salt to it is not a complicated task, and neither should be returning user information to a form. Since we already have all the information submitted by the user, it seems a little unreasonable that we would be required to echo or reparse this same data all over again. We just need to add the salt, rather than starting from scratch and running through the whole process of making our soup once again.

In this article, I will explain a few approaches that let you “just add the salt” to your soup. You will eliminate the need to prepare it again, as these suggestions will help you save time and speed up your development. I have named this approach “Akaar.” It is based on the Smarty Template system. All of the points discussed in this article implicitly assume that you have a working Smarty development environment, and it would also
be useful if you had some background with programming Smarty itself. The Akaar system is a bundle of classes that make use of Smarty’s plugin infrastructure, with sample scripts that provide all the necessary files to take advantage of its functionality.

The BackFire Component (Just Add the ‘Salt’)

An integral component of the Akaar system is what I call “BackFire.” If we consider the example above, where registration form data needs to be returned to the form, BackFire does a wonderful job and lets you take care of the problem with a single line of code. In a nutshell, BackFire provides the ability to regenerate your template with all the values filled in, as posted by the user, without the need for any extra coding or manipulation.

In the example shown in Listing 1 (found in the code package), we have a template called index.tpl (based on Smarty, of course!) that produces a page used for user registration. This is basically a simple HTML form where we ask the user to enter some basic information, such as Email Address, First Name, Last Name, User ID, and Password.

Let Us Submit This Form

Let us assume that, as is often the case with commercial websites, it is essential for your user to
accept your Terms and Conditions by checking the appropriate checkbox (terms). If the user does not check this box, we need to take care of two things. First, we need to present a message to the user, alerting him or her that checking the T&C box is required in order to proceed. In addition, we need to retain the data that was originally filled in and return it to the form. With “BackFire,” you can perform both these tasks elegantly with a single line of code, as shown in Listing 2. The form is returned to the user with all of the original data intact, ready for editing or correction. No additional coding is required, as the Akaar module and the BackFire component handle the entire task. Additionally, you do not have to bother with user selections in select boxes (like Country) or what the user selected on a radio button (like Gender).

This example registration form is a standard Smarty template file that is called by the index.php script. The resulting HTML form is then submitted to the index1.php file, where, if the user has not accepted the Terms and Conditions by selecting the appropriate checkbox, we reuse the same template and take advantage of BackFire to handle the failed validation. If the user has accepted the Terms and Conditions, we move straight on to the welcome.tpl file. Listing 3 shows the code for index.php, while Listing 4 shows the code for index1.php (in which our form is returned to the user through BackFire).

How Can You Use BackFire?

This is an interesting question, and one with many possible answers. The salt and soup analogy provides one example, but there are many other possible ingredients for which this principle works just as well.
If you look at the source code of the index1.php file in Listing 4, you will notice how simple it is to use BackFire. You just need to call one method—a fairly straightforward approach by any measure. However, there are a couple of points that you must be aware of before using BackFire. All the HTML elements assembled for BackFire processing must be formatted by appending the string PHP at the end of the appropriate HTML tag. This suffix needs to be placed just before the closing tag, as shown in the example below:
It is not essential that you use only this particular suffix; you could use just about any combination of characters. In fact, you could even use your name instead of PHP. All you need to do is modify the variable $PHPHtmlTag in the BackFire_class.php file. Changing the value of this variable from PHP to your own custom word effectively lets you define your own suffix. Just remember to use this unique word on any line where you need to call the BackFire process.

This syntax and its proper usage are necessary for several reasons. First, it reduces overhead, because the script does not process HTML elements that are unrelated to BackFire and the task at hand. For example, we probably do not want our scripts to “remember” the value of a password field when a form is submitted, so we simply prevent BackFire from processing such elements by omitting the suffix. An additional, but equally important, rule is that you
must have the value property set in your HTML form input element. Without this property, the process will not work. There is one exception here: in the case of a select box or textarea, BackFire does not require the value property to be set in order to function properly.

The final requirement, which may sound quite obvious, is that your script must use the same template every time you display the form. In other words, when the user is redirected to a BackFire-managed page for validation, you must use the same template that you used to collect the information in the first place. I think this is a natural approach: if you need to add salt to your tomato soup, you’re not going to add it to the roast chicken instead! In the example discussed here, the registration form is the index.tpl file; we use this same template file with Smarty’s display method in the action script where we validate the Terms and Conditions.
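To make these rules a little more tangible, here is a toy re-implementation of the idea in plain PHP. It is only a sketch and not Akaar’s BackFire code: the function name refillMarkedInputs is invented, and the sketch assumes that a field opts in by carrying a bare PHP marker just before the closing > of its input tag and that it has a value attribute to overwrite.

```php
<?php
// Toy illustration only: this is NOT Akaar's BackFire implementation.
// Assumptions: a field opts in with a bare marker word (default "PHP") just
// before the closing ">" of its <input> tag, and it has a value="" attribute.

function refillMarkedInputs(string $html, array $posted, string $marker = 'PHP'): string
{
    // Match only <input ...> tags that end with the marker word.
    $pattern = '/<input\b([^>]*?)\s+' . preg_quote($marker, '/') . '\s*>/i';

    return preg_replace_callback($pattern, function (array $m) use ($posted) {
        $attrs = $m[1];

        // Without a name, or without a submitted value, just drop the marker.
        if (!preg_match('/\bname="([^"]*)"/i', $attrs, $name)
            || !array_key_exists($name[1], $posted)) {
            return '<input' . $attrs . '>';
        }

        $value = htmlspecialchars((string) $posted[$name[1]], ENT_QUOTES);

        // Overwrite the existing value attribute (it must be present).
        $attrs = preg_replace_callback('/\bvalue="[^"]*"/i', function () use ($value) {
            return 'value="' . $value . '"';
        }, $attrs, 1);

        return '<input' . $attrs . '>';
    }, $html);
}

// The e-mail field carries the marker; the password field deliberately does not.
$template = '<input type="text" name="email" value="" PHP>' . "\n"
          . '<input type="password" name="pass" value="">';

echo refillMarkedInputs($template, ['email' => 'a@example.com', 'pass' => 'secret']), "\n";
```

Only the marked field is refilled; the password field is left empty because it has no marker, which mirrors the reasoning given above for omitting the suffix.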
Adding More “Taste” to Your Soup

Now, let’s add a little more taste to the soup to make it a little more interesting. I call this approach the “Akaar Value Insertion.” It provides additional flexibility and is a simple way to assign values to your HTML elements. This feature is also part of Akaar’s BackFire class.

To understand how this all works, just think of a case where you might have an ‘edit profile’ form in your members area. When your user clicks on this, you want to display a form with all the information properly pre-filled; of course, you will fetch this data from the database. Usually, we would program this functionality by assigning a value to each form element in the HTML code, for instance by echoing the stored first name into the value property of a text box.

That is the normal way to handle a text box. If we need a select box, we may have to handle it differently, and the same goes for a radio button, a check box, and so on. All of these require unique handling and repetitive, laborious coding.
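The conventional, per-element approach described above might look like the following sketch. The variable names and sample values are made up for illustration; in a real ‘edit profile’ page they would come from the database.

```php
<?php
// Conventional approach: every element type needs its own hand-written logic.
// The values below are hypothetical stand-ins for data fetched from a database.
$FirstName = 'Ada';
$Gender    = 'F';
$Country   = 'IN';

$countries = ['IN' => 'India', 'US' => 'United States'];

// Text box: echo the stored value into the value attribute.
$html = '<input type="text" name="FirstName" value="'
      . htmlspecialchars($FirstName, ENT_QUOTES) . '">' . "\n";

// Radio buttons: compare each option and add "checked" by hand.
foreach (['M' => 'Male', 'F' => 'Female'] as $code => $label) {
    $checked = ($Gender === $code) ? ' checked' : '';
    $html .= '<input type="radio" name="Gender" value="' . $code . '"'
           . $checked . '> ' . $label . "\n";
}

// Select box: compare each option and add "selected" by hand.
$html .= '<select name="Country">' . "\n";
foreach ($countries as $code => $label) {
    $selected = ($Country === $code) ? ' selected' : '';
    $html .= '  <option value="' . $code . '"' . $selected . '>'
           . $label . '</option>' . "\n";
}
$html .= '</select>' . "\n";

echo $html;
```

Three element types already need three different pieces of logic, which is exactly the repetition this section is about.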
Wouldn’t it be better if you had a single way of handling this type of coding, regardless of what kind of HTML element you had to deal with? How about a consistent single call to place a value in any checkbox, list box, radio or textarea element? Here, Akaar comes to the rescue. The Akaar module provides a single method to handle this function, and does not require writing any specific input-value code in your template or in the value property of your HTML code. You do this simply by calling the setValue method. For example, in the case above your code would look something like:

$obj->setValue("FirstName", $FirstName);
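How a single entry point like this can serve several element types is easiest to see in a small experiment. The class below is a hypothetical sketch, not Akaar’s code: ValueInserter and its apply method are invented names, and it only understands input tags (text-like fields, radio buttons and checkboxes), deciding what to do from each tag’s type attribute.

```php
<?php
// Hypothetical sketch, not Akaar's own code: values are registered with one
// setValue() call per field and applied to the rendered HTML afterwards.
class ValueInserter
{
    private $values = [];

    public function setValue($name, $value)
    {
        $this->values[$name] = (string) $value;
    }

    // Visit every <input> tag in the HTML and fill it according to its type.
    public function apply($html)
    {
        return preg_replace_callback('/<input\b[^>]*>/i', function ($m) {
            return $this->fillInput($m[0]);
        }, $html);
    }

    private function fillInput($tag)
    {
        $name = $this->attribute($tag, 'name');
        if ($name === null || !isset($this->values[$name])) {
            return $tag; // no value registered for this field
        }
        $value = $this->values[$name];
        $type  = strtolower((string) $this->attribute($tag, 'type'));

        if ($type === 'radio' || $type === 'checkbox') {
            // Tick only the element whose own value matches.
            if ($this->attribute($tag, 'value') !== $value) {
                return $tag;
            }
            return substr($tag, 0, -1) . ' checked>';
        }

        // Text-like inputs: overwrite the existing value attribute.
        $escaped = htmlspecialchars($value, ENT_QUOTES);
        return preg_replace_callback('/\bvalue="[^"]*"/i', function () use ($escaped) {
            return 'value="' . $escaped . '"';
        }, $tag, 1);
    }

    private function attribute($tag, $name)
    {
        $pattern = '/\b' . preg_quote($name, '/') . '="([^"]*)"/i';
        return preg_match($pattern, $tag, $m) ? $m[1] : null;
    }
}

$form = new ValueInserter();
$form->setValue('FirstName', 'Ada');
$form->setValue('Gender', 'F');

echo $form->apply(
    '<input type="text" name="FirstName" value="">' . "\n" .
    '<input type="radio" name="Gender" value="M">' . "\n" .
    '<input type="radio" name="Gender" value="F">'
), "\n";
```

The calling code never mentions the element type; the decision is made while the HTML is post-processed. Select boxes and textareas would need further branches that this sketch leaves out.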
Now, your FirstName can be any HTML element: a checkbox, a textbox, a radio button, a textarea, a select box or even a multiple-select box. Akaar takes care of everything for you. The code in Listing 5 illustrates an optimized process of pulling information from your database and presenting it on an HTML form for the user to edit. The form entry information for the name textbox, the gender radio button and the country select box is all handled by Akaar.

Client Side Validation (Covering The Soup!)

Client-side validation (using JavaScript) is also a very important part of any web application. At the same time, however, it tends to become tedious and repetitive very quickly, particularly when you have to perform the same kind of validation over and over. Most programmers have found their own way of dealing with this monotonous task; some use “generic” JavaScript handlers, while others use libraries with their own customizations sprinkled in, depending on their requirements.

Akaar provides an easy and elegant method of implementing JavaScript validation, based on Smarty’s plugin infrastructure, with some help from the client Validation class. I received the inspiration for this from ASP.NET’s validation controls and thought it was a cool idea that should be included in my own work. The obvious advantage of using Akaar is that you are not required to write any JavaScript validation code. Your creative designers can even take care of this aspect of your applications without your help, all the while providing a wider range of features to your end users. An additional benefit is that you can set up any JavaScript validation from within your PHP script itself.
So, what do I mean by “Covering the Soup?” With reference to the same example I spoke about on the registration form before, we can use Client Side validation here as well for checking for user input on required fields, validating email addresses, and other data entry elements. It all becomes very easy.
For example, if we have a textbox on our form where the user can enter an email address and we wish to validate his or her input to ensure it is a valid email address, we can do this simply by placing one more line of code in our HTML source, as illustrated below:

{ClientValidation VFor="vEmail" VType="ValidateBlank" VMsg="Please Enter Your Email Address" VMethod="LayerAlert"}
Figure 1

Parameter: Description

VFor: The name of the element that you want to validate.

VType: The type of validation that you wish to perform, such as validation for blank or validation for date. You must use the name given in parentheses in the list of available validation types; in other words, the name of the function as it appears in the JavaScript validation file. If you use a name for which there is no corresponding function in your JavaScript file, the browser will report a JavaScript error on the page.

VMsg: The message that you wish to show to your user if the validation fails.

Extra: This optional parameter can be used if you have to pass additional data to your function; take a look at the password validation example for more information.

VMethod: The type of alert or display that you wish to show to your user when validation fails. Akaar provides three methods for this (the default is Alert):
1. "Alert" is a simple JavaScript alert.
2. "LayerAlert" is a somewhat fancier alert message, with a couple of options.
3. "Display" displays the validation text to the user and does not show any alert. This method uses a layer and can have some "funny" results if you are not careful with the positioning of your tags.
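Because ClientValidation is invoked as a Smarty template function, it helps to see what such a function looks like on the PHP side. The sketch below follows the Smarty 2 convention of a function named smarty_function_<name> that receives the tag’s attributes as an array and returns the text to insert into the page. What Akaar actually emits is not shown in this article, so the generated script and the registerValidation JavaScript helper are purely hypothetical.

```php
<?php
// Hypothetical sketch of a Smarty 2 template-function plugin. By convention
// such a function is named smarty_function_<name>($params, &$smarty), and
// whatever it returns is inserted into the rendered page.
// registerValidation() is an invented JavaScript helper, not part of Akaar.
function smarty_function_ClientValidation($params, &$smarty)
{
    // "Alert" is the default display method, as stated in Figure 1.
    $defaults = ['VFor' => '', 'VType' => '', 'VMsg' => '', 'VMethod' => 'Alert', 'Extra' => ''];
    $p = array_merge($defaults, $params);

    if ($p['VFor'] === '' || $p['VType'] === '') {
        return ''; // nothing to validate
    }

    // json_encode() turns each PHP string into a quoted JavaScript literal.
    $args = array_map('json_encode', [$p['VFor'], $p['VType'], $p['VMsg'], $p['VMethod'], $p['Extra']]);

    return '<script type="text/javascript">registerValidation('
         . implode(', ', $args) . ');</script>';
}

// Stand-in for the Smarty object that a real template call would pass in.
$smarty = null;

echo smarty_function_ClientValidation(
    [
        'VFor'    => 'vEmail',
        'VType'   => 'ValidateBlank',
        'VMsg'    => 'Please Enter Your Email Address',
        'VMethod' => 'LayerAlert',
    ],
    $smarty
), "\n";
```

The attribute names in the template tag arrive as the keys of $params, which is why a designer can add a validation rule without touching any PHP or JavaScript.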
This is a typical Smarty function call (to ClientValidation) with a few parameters, as described in Figure 1. As illustrated in this figure, you can have three different types of message handlers to communicate with your users during validation: Alert, LayerAlert and Display.

“Akaar” is a Sanskrit (Indian) word that means “shape” or “structure.”

Alert is a standard JavaScript alert. LayerAlert is a nicer-looking alert with some attractive effects (for IE 6), and is based on the QLib library (http://qlib.quazzle.com). Display is a layer-based message shown by the browser. With this method, you see the error/notice message in the same spot where you placed the code for validation. As described above, when you use Display, you must take care where you place the Smarty validation function (ClientValidation) in your HTML code.

Currently, the library supports the following types of automated validation:

• Validation for blank (ValidateBlank)
• Validation for password (ValidatePassword)
• Validation of a US ZIP code (ValidateUSZip)
• Validation for date (ValidateDate)
• Validation for number (ValidateNumber)
• Validation for only alpha-numeric (ValidateAlphaNumeric)
These are the current built-in validation routines. You are not limited to these, as you can easily create your own libraries for additional validation tasks. Listing 1 illustrates several different combinations that we can use.

An Interesting Twist

Akaar’s client-side validation feature does provide an interesting twist: you can set up client-side validations from your PHP scripts themselves, without adding anything to your template file. Therefore, if your creative designer does not know JavaScript, and even if you do not know that much about it yourself, validation is still available, waiting to be called from within your PHP code!

This is how you go about doing this. Let’s use the same example as above for first name and let’s suppose
that you have the following input box in your template file:
If you wish to check for a “blank” entry, you can easily do so from your PHP script file only by calling the following method:
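As a rough illustration of the shape of such a call, the sketch below uses a stand-in class rather than the real Akaar class, whose internals are not shown here. The class name AkaarStandIn, the field name FirstName and the message text are assumptions; only the method name SetValidation and the order of its four arguments come from the article.

```php
<?php
// Stand-in for illustration only; the real Akaar class ships with the library.
class AkaarStandIn
{
    public $validations = [];

    // Argument order as described in the article:
    // element name, validation type, message, display method.
    public function SetValidation($element, $type, $message, $method = 'Alert')
    {
        $this->validations[] = compact('element', 'type', 'message', 'method');
    }
}

$obj = new AkaarStandIn();

// Ask for a "blank" check on a (hypothetical) FirstName input box.
$obj->SetValidation('FirstName', 'ValidateBlank', 'Please enter your first name', 'Alert');

print_r($obj->validations);
```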
Here, $obj is an object of class Akaar and the SetValidation method takes the same parameters that we explained in Figure 1. The first parameter defines the HTML control to validate. The second parameter defines the type of validation we wish to perform. The third parameter defines the message to be displayed and, finally, the last parameter defines the method (VMethod) used to notify the user.

How Do You Create or Add Your Own JavaScript Validation?

As I explained, we can easily create client-side validation without coding any JavaScript. However, the few default validation features provided by the library are probably not going to be sufficient for all applications, and you’re likely to run into the need for more. Luckily, we can easily add our own validation functions to the standard validation script available with Akaar. The best way to find out how is to “learn by example” by looking at the JavaScript file provided with the Akaar distribution. You will find the main JavaScript file, called AkarValidation.js, within the js folder.

Let us create a sample function to better illustrate how this works. We’ll put together a new function for validating email addresses entered by the end user. The code in Listing 6 illustrates the “orthodox” JavaScript code for this; it’s a bit old-school, but it does its job. We can now add our new function to AkarValidation.js in any syntactically valid location. We just need to modify the original JavaScript function a little bit; you can see the end result in Listing 7.

If you look at the modified code, you should notice some changes to the original function that I showed you in Listing 6. This is because there are a few rules that we must keep in mind while working with Akaar Validation:

• We need to create a function with a logical name, so that the framework can pick it up. Our function must also have at least one parameter called Element. This is used to identify the particular HTML element to which the validation applies. We can then gain access to the value to validate by using this parameter, for example document.forms[0].elements[Element].value.
• In the function, we should not make any call to the alert function; we can only return either true (if the validation was successful) or false (if it wasn’t).

Other than these two simple rules, you are free to implement your function any which way you want. The important point is that Akaar will always pass the name of the element in the first parameter and expect a Boolean return value. The following code shows the standard syntax to use in your HTML file for client-side validation:

{ClientValidation VFor="" VType="" VMsg="" VMethod=""}

With our newly-created function for validating e-mail addresses, if we have an input box in our HTML form called email, we can add validation to it by using the following syntax:

{ClientValidation VFor="email" VType="ValidateEmail" VMsg="Please Enter Valid Email Address" VMethod="Alert"}

In some specific cases, you may require extra parameters for your JavaScript validation functions. For example, password validation is handled by a simple function that compares the values of two password boxes. If you do not enter the same password in both, it outputs an error message. If you look at Listing 7, you can see that we need to match the password entered by the user with the text entered in the “confirm password” box. In order to do this, we need to pass the name of the other password input box so that we can perform our comparison. We do this by passing the names of the two elements in the Extra parameter.

Limitations

Currently, you can use this validation routine for textboxes, textareas and select boxes only. Multiple-select boxes are not yet supported. Of course, you can always create additional functions to handle validation for other elements and extend the library for your own needs.

Akaar “should” work with template files containing multiple forms. However, I recommend a bit of extra care to make sure that everything works as needed. We are still working to determine the best solution for multiple forms on single templates.

These are the techniques and code sets that I have designed to address common development issues that we all face. I am always interested in creative and elegant solutions to common problems and welcome input from the PHP community. If you have ideas on improvements or optimizations, I’d love to hear from you.
About the Author
Chirag is a Web Application Developer who has been working with PHP since the year 2000. He is also the webmaster and owner of a community website for PHP developers in India called www.phpindia.net. Currently, he works as a Project Manager for one of the leading PHP development organizations in India, Indianic Infotech Ltd. (www.indianic.com). You can reach him at [email protected].

To Discuss this article:
http://forums.phparch.com/196
Any more, and we’d have to take the exam for you! We’re proud to announce the publication of The Zend PHP Certification Practice Test Book, a new manual designed specifically to help candidates who are preparing for the Zend Certification Exam. Available in both PDF and Print
Written and edited by four members of the Zend Education Board, the same body that prepared the exam itself, and officially sanctioned by Zend Technologies, this book contains 200 questions that cover every topic in the exam. Each question comes with a detailed answer that not only provides the best choice, but also explains the relevant theory and the reason why a question is structured in a particular way. The Zend PHP Certification Practice Test Book is available now directly from php|architect, from most online retailers (such as Amazon.com and BarnesandNoble.com) and at bookstores throughout the world.
Get your copy today at http://www.phparch.com/cert/mock_testing.php
PRODUCT REVIEW
Will the Best PHP IDE Please Stand Up?
by Peter B. MacIntyre
This month, I thought I would start off the New Year with a bit of a treat for you all. I will be comparing and contrasting some of the front-runners in the PHP IDE race. I will provide comments on each product I review here and also present a comparison chart for them all. Of course, I may have missed a few of the PHP IDEs that are currently on the market (EnginSite claims there are at least fifty-five of them), but the ones that I have chosen to compare are a good cross section. I will also give my opinions on the top three and give them the inaugural php|architect gold, silver, and bronze product review awards.

The evaluations are all based on a Yes/No mark for each feature that was evaluated. Now, this is not an overly scientific evaluation, and there is also room for the “gut feelings” that one often gets from just “using” a software product. Of course, your results may differ from mine and I may have missed a feature or two, so please only take this as an educated and guided review of these products rather than the Ultimate Guide to Buying an IDE™.

So let’s get started… the products in the competition are: Zend Studio, Davor’s PHP Constructor, EnginSite’s Editor, Maguma’s Workbench, NuSphere’s PhpED, and WaterProof’s PHPEdit. The evaluation chart is shown in Figure 1. As you can see, there are three products that come to the top of the heap: Zend Studio, NuSphere PhpED, and Maguma Workbench. Zend, of course, has the inside track because it is intimately involved with the PHP project itself and, therefore, has an advantage in knowing the inner workings of the product like few others. That being said, I was quite impressed with the capabilities that the other two products provide even without the perceived benefit that Zend possesses.

PhpED by NuSphere has database integration (the only one of the six that has it, although Zend has announced it for its beta version of Zend Studio 4) and, based on that feature alone, allows you to build web forms directly from a database table schema (with the option of data validation built right in). This combined feature can be very powerful if used in the right way. Although I have seen it in other places, the Code Snippet feature in Maguma’s Workbench was well developed and quite useful. It allows the user to save pieces of code that are used often and insert them into multiple files whenever they are required.

Even though I reviewed the EnginSite Editor separately in last month’s issue, that review did not compare it to other IDEs like this article does, and in this light, I found it a little lacking. Don’t get me
wrong, EnginSite has some nice features and is an up-and-comer to be sure, but their product is not yet mature. PHPEdit was the bottom dweller here, and the only redeeming quality that I could find was its Code Beautifier. The people at Waterproof Software have a leak in their boat and should be working to patch that up.

All the products that I reviewed have their own unique features, but in the versions that I reviewed I would have to give the overall “gold” to NuSphere’s PhpED product. Even though it comes in as the highest priced IDE, it is quite mature, very stable, and currently has product features (database integration) for which the others will have to play catch-up. The silver is awarded to Zend Studio, while the bronze goes to the Workbench product from Maguma.

NOTE: Honorable mention goes to Davor's Constructor as being a product that is not claiming to be a major PHP IDE player, and yet has enough decent functions built into it (the only one of the six to have a WYSIWYG HTML designer) to make it almost reach the same level.

Figure 1

Features evaluated (Yes/No): Installation Wizard, Comfortable IDE Navigation, Context Sensitive Help, Good Startup Speed, FTP Connection, WYSIWYG HTML, Internal Web Browser, Class / Function Browser, Color Coded Editor, Debugger, Multiple Project Editing, Split Screen Editor, Database Integration, CVS (Group Development), Project Management, Profiler (Code Efficiency Test), Code Analyzer, Demo Version, Linux Version, Windows Version, MAC Version.

Product              Version Compared  Base Price (US$)  Web Site       Overall Score
NuSphere PhpED       3.3.3             $299.00           nusphere.com   Gold
Zend Studio          3.5.2             $249.00           zend.com       Silver
Maguma Workbench     2.1.0             $265.98           maguma.com     Bronze
Davor's Constructor  2004-12-15        $150.00           pleskina.com   Honourable Mention
EnginSite Editor     2.3.5             $69.00            enginsite.com  -
PHPEdit              1.0.4             $100.38           waterproof.fr  -

About the Author
Peter MacIntyre lives and works in Prince Edward Island, Canada. He has been an editor with php|architect since September 2003. Peter’s web site is at http://paladin-bs.com
FEATURE
Iterators in PHP5
by Rami Kayyali
When PHP5 came out, a buzz started about Design Patterns and their application, how to use them, what exactly are “patterns,” and why they are so important. Today, we take a look at one of the famous patterns, the Iterator.
So what exactly are design patterns? If you’ve been reading php|a for a while, you have probably already heard of words like Singleton, Observer, Factory, and so on. These are only fancy names for widely-accepted solutions to problems that face developers over and over again. Design patterns only outline solutions; they don’t discuss implementation details or language features. In fact, they don’t even have to be object-oriented (although most of them are).

The Iterator Pattern, which is the subject of this article, is one of the most widely used patterns. In theory (and this is where all the intimidating confusion comes from) it’s a standard mechanism to access the elements of an object sequentially, without having to expose its underlying representation. Definitions like the one above can be difficult to grasp at first; it cost me some long and tedious conversations to get the idea behind certain patterns. We, however, will focus on practice instead of theory, so you’ll be able to learn by example: the best kind of learning there is!

The Iterator

Loops are so common that most of us don’t even think about them any longer; we loop over arrays, over a database result, over files in a directory, but no two of those loops look the same (see Listing 1). We need to have an agreement on how to loop over collections and
data structures; this is where the Iterator Pattern comes to the rescue. An Iterator is an object that knows how to traverse data structures, from simple arrays to complex N-ary trees. It hides the implementation details and provides a standard interface for us to use, typically with at least one method to retrieve the next element (that is, moving in one direction). Other iterators might provide methods to retrieve elements in more than one direction, or have different ordering schemes.

Listing 2 shows an example interface that can be used as our iterator. Having this interface available, we can now create classes that implement SimpleIterator, like a SimpleDirectoryIterator (Listing 3), which we can use to loop over files in a directory.

“But why the hassle,” you may ask. It isn’t as if we “invented” something ground-breaking here. Well, the idea is to make all loops look the same, no matter what we are looping over, be it an array, an XML collection, or LDAP listings.
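A minimal sketch of what such an interface and a directory-backed implementation could look like follows. The names SimpleIterator and SimpleDirectoryIterator come from the text, but the method names hasNext and next, and the decision to read the directory with scandir, are assumptions made for this illustration.

```php
<?php
// Sketch only: the method names and the scandir()-based implementation are
// assumptions for illustration.
interface SimpleIterator
{
    /** Is there another element to fetch? */
    public function hasNext();

    /** Return the next element and move forward. */
    public function next();
}

class SimpleDirectoryIterator implements SimpleIterator
{
    private $entries  = [];
    private $position = 0;

    public function __construct($path)
    {
        // scandir() returns the entry names, including "." and "..".
        $entries = scandir($path);
        if ($entries !== false) {
            $this->entries = array_values(array_diff($entries, ['.', '..']));
        }
    }

    public function hasNext()
    {
        return $this->position < count($this->entries);
    }

    public function next()
    {
        return $this->entries[$this->position++];
    }
}

// The calling loop looks the same whatever the iterator walks over.
$it = new SimpleDirectoryIterator(__DIR__);
while ($it->hasNext()) {
    echo $it->next(), "\n";
}
```

Any other class implementing SimpleIterator, for example one wrapping an array or a database result, could be dropped into the same while loop without changing it.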
REQUIREMENTS
PHP: 5.0
OS: Any
Other Software: None
Code Directory: iterator
The only problem is that if, as developers, we have to implement all iterators ourselves (like we just did), we lose the whole idea of having an agreement. Different iterators shouldn’t look different. Fortunately, PHP5 already comes with a great extension that saves us the trouble, called the Standard PHP Library (or SPL for the acronym-oriented).

SPL (Standard PHP Library)

The SPL extension didn’t get as much buzz as PHP5’s new XML and OOP features, but that didn’t stop it from making it into the final distribution, so you can almost always rely on SPL being available if you’re using PHP 5. This collection of interfaces and classes gives PHP developers a chance to finally agree on something and stop reinventing the wheel every other project. It’s currently focused on solving one problem: iterators. Only, SPL does it with a twist by letting us use iterators in plain old foreach loops, so we can stop worrying about what goes where.

The code in Listing 3 shows how we would normally loop over a directory using SimpleIterator. It might look OK in other languages, but to those of us with a strong PHP background, the code isn’t really natural. With SPL’s DirectoryIterator class, we can loop over the same directory as if it were an array, using nothing more than one of PHP’s most widely used features: good ole foreach (see Listing 4). Essentially, a directory is an array of file names, so why not make it act like one? And if we could make DirectoryIterator act like an array, why not make all iterators act the same? That’s the idea behind SPL.
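As a rough illustration of a directory behaving like an array, the sketch below loops over SPL's DirectoryIterator with foreach. The scratch directory and file names are made up so the example can run anywhere, and it is written against a current PHP release rather than 5.0.

```php
<?php
// Build a throwaway directory so the example is self-contained.
$dir = rtrim(sys_get_temp_dir(), '/\\') . '/spl_demo_' . uniqid();
mkdir($dir);
touch($dir . '/notes.txt');
touch($dir . '/photo.jpg');

$names = array();
foreach (new DirectoryIterator($dir) as $file) {
    if ($file->isDot()) {
        continue; // skip "." and ".."
    }
    $names[] = $file->getFilename();
}
sort($names); // directory order is not guaranteed

echo implode(', ', $names), "\n";

// Clean up the scratch files.
unlink($dir . '/notes.txt');
unlink($dir . '/photo.jpg');
rmdir($dir);
```

Each $file is an object with helper methods such as isDot() and getFilename(), so there is no handle to open, rewind or close by hand.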
Listing 1
Listing 2

Listing 3

<?php
class SimpleDirectoryIterator implements SimpleIterator {
    private $dir;

    public function __construct($path) {
        $this->dir = opendir($path);
    }

    public function getNext() {
        return readdir($this->dir);
    }

    public function reset() {
        rewinddir($this->dir);
    }

    public function __destruct() {
        closedir($this->dir);
    }
}

/* Usage: */
$dir = new SimpleDirectoryIterator('/tmp');
// Compare against false so that a file named "0" does not end the loop
while (($file = $dir->getNext()) !== false) {
    // Do something with $file
}
?>

Listing 4

SPL’s abilities don’t stop at making iterators available for foreach constructs; they go as far as chaining multiple iterators together, limiting results, filtering unwanted ones, iterating over recursive structures and even implementing custom iterators (see the reference at the end of the article for a complete overview).

Implementing iterators isn’t a difficult task at all. There’s a limited set of methods that SPL interfaces define: implement those and you end up with an iterator that you can plug into foreach. Listing 5 is an example of how we would implement a SmartDirectoryIterator that tries to be a little smarter (as the name implies) and stores more information about files in each iteration than our original attempt, such as their size, last modification date, and so on. The implementation isn’t perfect, though; it’s only an example that illustrates SPL’s possibilities and how you can make them work for you. Feeling adventurous yet? Let’s take it one step further, then.

Listing 5

Filtering the Unwanted

One of SPL’s important features is the ability to filter results based on a chosen set of criteria. Of course, we don’t write standalone rules to filter our results; instead, we create a class that extends the built-in FilterIterator and define only one method, named accept(), which, in turn, takes care of “making” the rules and deciding whether to accept a certain element or not. Let’s say that we want our directory iterator to drop all hidden files (files starting with a dot in a UNIX-like operating system), or maybe drop certain extensions, or files that contain obscene language. With SPL, it’s as simple as using an if check.

Listing 6 illustrates how this can be done. ExtensionFilter is a class that accepts two arguments when instantiated: a DirectoryIterator object and the extension to be filtered out. It calls its parent’s constructor (FilterIterator::__construct) to make sure everything works fine, and then implements the accept() method, which decides when files are to be dropped.
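The filtering approach described above might be sketched as follows. This is not the article's Listing 6: the class reuses the ExtensionFilter name, but its body, the ArrayIterator input and the sample file names are assumptions chosen so the result is predictable, and the return type on accept() targets current PHP versions.

```php
<?php
// Sketch of a FilterIterator subclass that drops one file extension.
class ExtensionFilter extends FilterIterator
{
    private $extension;

    public function __construct(Iterator $files, $extension)
    {
        parent::__construct($files);
        $this->extension = strtolower($extension);
    }

    // Return true to keep the current element, false to drop it.
    public function accept(): bool
    {
        // Casting to string works for plain names and for SplFileInfo objects.
        $name = (string) $this->current();
        $ext  = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        return $ext !== $this->extension;
    }
}

// A fixed list of names stands in for a real directory here.
$files = new ArrayIterator(array('a.txt', 'b.bz2', 'c.TXT', 'd.php'));

$kept = array();
foreach (new ExtensionFilter($files, 'txt') as $name) {
    $kept[] = $name;
}
echo implode(', ', $kept), "\n";
```

All of the decision logic lives in accept(); the foreach loop that consumes the filter never needs to know that anything was removed.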
Listing 7
… an iterator, a starting position and a number of elements to return; LimitIterator takes care of the rest. Of course, you can also “pipe” it through another FilterIterator that’s passed through a FilterIterator that’s passed through... you get the idea! Listing 7 is an example of such usage: it gets the first ten files and then filters them by the bz2 extension.
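A small sketch of this kind of chaining, under stated assumptions: the file names are invented, Bz2Only is a hypothetical filter that keeps names ending in .bz2, and the window is four elements rather than ten to keep the example short. LimitIterator's constructor takes the iterator to wrap, an offset and a count.

```php
<?php
// Hypothetical filter used only for this illustration.
class Bz2Only extends FilterIterator
{
    public function accept(): bool
    {
        return substr((string) $this->current(), -4) === '.bz2';
    }
}

$files = new ArrayIterator(array(
    'a.bz2', 'b.txt', 'c.bz2', 'd.txt', 'e.bz2', 'f.bz2',
));

// Skip the first element, then expose at most four:
// b.txt, c.bz2, d.txt, e.bz2
$window = new LimitIterator($files, 1, 4);

// Pipe the limited window through the filter.
$result = array();
foreach (new Bz2Only($window) as $name) {
    $result[] = $name;
}
echo implode(', ', $result), "\n";
```

The order of wrapping matters: limiting first and filtering second, as here, can yield fewer than four results, whereas filtering first and limiting second would return the first four matches.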
Recurse Thyself

Not everyone likes recursion. Recursive functions can be cumbersome to code, because they need extra care for terminating conditions. For instance, a common recursive function that every developer has somewhere in their library is a reader that lists all the files in a directory and in its subdirectories. Without SPL, we would have to write code similar to that in Listing 8: as you can see, it’s not the most intuitive code you can come across, and it takes a good eye to spot the recursive call in the middle.

Luckily for us, SPL comes with built-in recursive iterators to simplify this kind of task. The RecursiveDirectoryIterator can be used to accomplish the same task, with less and better-looking code. Of course, it can’t operate on its own (technically it can, but not recursively); that’s why there’s a RecursiveIteratorIterator (yes, you’ve read it right, it’s really called that). RecursiveIteratorIterator is essentially an Iterator, except it knows how to iterate over classes that implement RecursiveIterator, like our RecursiveDirectoryIterator. The code in Listing 9 outputs the same list of directories as the non-Iterator code in Listing 8, except that the listing is reduced to only three lines.

I can already hear a “Wow. But hey, how does the magic happen?” coming from you. Let’s take a look under the hood.
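Here is a sketch of how RecursiveIteratorIterator and RecursiveDirectoryIterator fit together. The directory tree is created on the fly and its names are made up; the FilesystemIterator::SKIP_DOTS flag belongs to later PHP releases than the one this article targets, so treat this as an illustration for a current PHP version.

```php
<?php
// Create a small scratch tree: two levels of subdirectories, three files.
$root = rtrim(sys_get_temp_dir(), '/\\') . '/spl_tree_' . uniqid();
mkdir($root . '/docs/drafts', 0777, true);
touch($root . '/readme.txt');
touch($root . '/docs/guide.txt');
touch($root . '/docs/drafts/todo.txt');

// The outer iterator flattens the tree; by default it yields leaves only.
$tree = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
);

$found = array();
foreach ($tree as $file) {
    // Store paths relative to $root, using forward slashes.
    $relative = substr($file->getPathname(), strlen($root) + 1);
    $found[] = str_replace('\\', '/', $relative);
}
sort($found);
echo implode("\n", $found), "\n";

// Clean up: visit children before their parents so directories are empty.
$cleanup = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $entry) {
    if ($entry->isDir()) {
        rmdir($entry->getPathname());
    } else {
        unlink($entry->getPathname());
    }
}
rmdir($root);
```

Note that there is no recursive function anywhere in the script; the descent into subdirectories happens inside the outer iterator.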
Listing 8

Listing 9

Implementing Recursive Iterators

At first glance, it might seem difficult to implement the RecursiveIterator interface. I can’t exactly say it’s a no-brainer, as it does need some careful coding, but, in the end, there’s really not much to it. By definition, RecursiveIterator is an Iterator; in fact, it only adds two new methods, hasChildren() and getChildren(). These methods are used by the RecursiveIteratorIterator to figure out what to do next. If $object->hasChildren() returns true, the iterator loops over the children and calls the same method for each of them, moving deeper until it has reached a child with no children (a leaf). Then, it moves on to the next child on the previous level (or depth), and continues in the same way.

As an example, we’re going to implement a recursive directory iterator. In order to do that, we need a flat directory iterator first, one that doesn’t go into subdirectories. Let’s call it DirIterator to avoid a name clash with DirectoryIterator. There are five methods that we should implement in order to make it a “real” iterator: rewind(), which resets the iterator to the beginning; next(), which moves the iterator to the next item; key(), which returns the key (usually just an index) of the current item; current(), which returns the actual item; and valid(), which tells whether there are any items left in the list. Listing 10 is an implementation of DirIterator, which is used inside a foreach loop, just like all our iterators before.

Listing 10

Now it’s time for us to make it recurse. Since a RecursiveIterator is already an Iterator, we don’t need to start from scratch: by simply extending DirIterator and adding hasChildren() and getChildren(), we’ll get our own RecursiveDirIterator. In Listing 11, RecursiveDirIterator::hasChildren() returns true only if a file is a directory and isn’t a “dot” entry (“.” or “..”); otherwise PHP would try to open the current directory time and again until it dies with an error because of a stack overflow, a typical result of a poorly thought-out recursive function.

Now, when RecursiveDirIterator says it hasChildren(), getChildren() is called and returns a new instance of RecursiveDirIterator (a child iterator, if you want), only this iterator’s path is a subdirectory. Think of it as calling a function within itself, except in a way that is much easier to code than the
traditional approach. Note that, unlike with recursive functions, we don’t need to keep track of the current path, what we have read already, and so on. RecursiveIteratorIterator takes care of all that by itself.

Listing 11

Turning Objects into Iterators

One interesting interface in SPL is IteratorAggregate. It only has one method to implement, getIterator(), and once it’s implemented, any object can be “iterable.” But why is this useful? After all, we can directly implement Iterator, and, so far, we seem to have been able to get along just fine with it. The problem lies in the fact that role separation is important in object-oriented programming. Not every object is an iterator, but most objects can provide iterators for their internal data.

Take, for example, a class that represents a student’s profile with some personal information (first name, last name, and so on). A student isn’t an iterator (that is, the two are not related by an is-a relationship), but the information inside it can be iterated through and, therefore, we should be able to get a list of a student’s information without having to literally loop over the student himself (see Listing 12). Suppose that this information is stored in an associative array. With IteratorAggregate available, all we need is to add one method to the student class and let it return an iterator. Now we can use an instance of Student in a foreach loop without having to implement all five methods of Iterator, since implementing them would be both logically and practically incorrect.

Listing 12

You’re probably wondering why we used an ArrayObject to wrap the $info array in getIterator(). By definition, SPL (and thus PHP) expects getIterator() to return a Traversable object, that is, an Iterator or another IteratorAggregate. It doesn’t care how that object represents its data, whether in an array, an XML file, or an SQL database. It only cares about what data is provided and it knows how to ask for it. Therefore, SPL defines an ArrayObject class (itself an IteratorAggregate, and therefore Traversable) which wraps around an array and is used in getIterator(). However, there’s a subtle difference here. On one hand, looping over an array looks exactly the same as looping over an ArrayObject. On the other hand, we’re technically looping over ArrayObject’s properties and their values rather than an array’s keys and values. Yes, that’s right: PHP 5 can take any object and loop over its properties in a foreach loop.

An Iterator… Without an Iterator!

PHP5’s default behaviour is to list an object’s properties when you stick it into foreach. That’s great stuff, but when PHP can’t access a property because it isn’t public (that is, it is private or protected), it just skips it. This doesn’t really mean that we can’t turn that object into an iterator; we’ll just have to do it from the inside. Take the code in Listing 13, for example: the showInfo() method uses $this to loop over the object; naturally, $this has access to all the properties of the object, so it doesn’t skip any of them.

Listing 13

One Big List

Suppose you have a list of different iterators for different things, like a QueryIterator, DirectoryIterator and SimpleXMLIterator (and maybe five or ten more), and you really need to use all of them in one go. Now, this doesn’t mean that you need to write a foreach and loop through every one of them (no pun intended). SPL provides us with an AppendIterator (also known as CombinedIterator). AppendIterator takes a list of iterators (or classes that implement any Iterator) and loops over them as if they were one huge list. Just call append(), pass it an iterator and place the AppendIterator in foreach, and everything will be taken care of automatically for you.

How does it work? SPL has an interface, called OuterIterator, which is used to wrap iterators. Although we’d rarely need to implement it, since so much of it is automated, we can use it as an easy way of looping through multiple iterators in one go.
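The sketch below puts two of these ideas side by side: a hand-written iterator that implements the five Iterator methods, and an AppendIterator that joins it to a built-in ArrayIterator. NumberRange is a hypothetical class invented for the example, and the return types (including mixed) assume PHP 8 rather than the 5.0 release discussed here.

```php
<?php
// Hypothetical hand-written iterator: counts from $start up to $end.
class NumberRange implements Iterator
{
    private $start;
    private $end;
    private $current;

    public function __construct(int $start, int $end)
    {
        $this->start   = $start;
        $this->end     = $end;
        $this->current = $start;
    }

    // Reset to the first element.
    public function rewind(): void
    {
        $this->current = $this->start;
    }

    // Are there any elements left?
    public function valid(): bool
    {
        return $this->current <= $this->end;
    }

    // The element itself.
    public function current(): mixed
    {
        return $this->current;
    }

    // A zero-based index for the element.
    public function key(): mixed
    {
        return $this->current - $this->start;
    }

    // Advance to the next element.
    public function next(): void
    {
        $this->current++;
    }
}

// Join two unrelated iterators into one list.
$all = new AppendIterator();
$all->append(new NumberRange(1, 3));
$all->append(new ArrayIterator(array('a', 'b')));

$values = array();
foreach ($all as $value) {
    $values[] = $value;
}
echo implode(', ', $values), "\n";
```

Keys restart for each inner iterator, so collecting values with iterator_to_array() would overwrite entries unless its second argument is set to false; the explicit loop above avoids that surprise.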
Each time it moves from one inner iterator to another, OuterIterator starts acting like it; we can call the inner iterator’s methods and expect an overloaded __call() in the outer iterator to invoke the corresponding method automatically. A little warning though: AppendIterator has yet to be released in a “stable” version of PHP (5.0.3 as of the time of this writing), so you might have to wait until the next release, or check out the latest version of SPL from CVS.

SimpleXML Iteration

PHP 5 comes with SimpleXML, a library that simplifies almost all XML-related operations, from loading, to parsing, to iteration. However, SimpleXML’s native iteration isn’t performed through SPL; it’s closer to the way our first SimpleIterator worked: a SimpleXML object simply provides the methods required for iteration. SPL makes SimpleXML even simpler by having SimpleXMLIterator. Note that this iterator doesn’t come with SimpleXML, so if you compile PHP with SimpleXML but without SPL, you won’t be able to use it. SimpleXMLIterator is a recursive iterator, although you don’t have to use it recursively if you have a flat-listed XML file.

Take, for example, the XML cookbook in Listing 14, which contains a list of recipes with their names, ratings and ingredients. We can load the whole file and pass it to an instance of SimpleXMLIterator and it will take care of the rest (see Listing 15). There’s no need for hair-pulling sessions trying to figure out how the SAX parser works, what DOM is, or otherwise trying to deal with fancy acronyms. This iterator returns a SimpleXML object on each loop, making it more convenient to access XML elements. As with all iterators, you can filter the results, limit them by offset and length, display only parents, and so on. Of course, you can pass a SimpleXMLIterator to a RecursiveIteratorIterator, but in most applications I came across, XML was used to store simple data like configuration files, so recursive iteration is probably not what you’re looking for.

Listing 14
Chocolate Muffins; Yummy; Flour, Cocoa, Sugar
Caramel Cake; Sweet; Caramel, Flour, lots of love

Listing 15

Reference

Here’s a list of the most important interfaces and classes available with SPL.

Interfaces:
• Traversable: An internal interface that should not be implemented directly; it’s the base interface for all iterators.
• Iterator: Defines how to iterate forward over a collection.
• SeekableIterator: Same as Iterator, except it can seek to a position in a collection. It can be used with LimitIterator to efficiently move to an offset.
• RecursiveIterator: Defines how to traverse tree-like collections.
• IteratorAggregate: Creates an external iterator for an object.

Outer iterators. These classes wrap around iterators and traverse them in different manners:
• RecursiveIteratorIterator: Takes a class that implements RecursiveIterator and loops over it, going deeper with each child.
• FilterIterator: Filters results based on accept() criteria.
• LimitIterator: Limits results based on an offset and a length.
• InfiniteIterator: Takes an iterator and traverses it forever!
• AppendIterator: Combines a number of iterators and converts them to one big list.
• CachingIterator: Pre-fetches the next element in order to know if an iterator has more elements.
• ParentIterator: Filters out leaves (elements with no children) so that only parents are fetched.
• ArrayObject: Converts an array into an iterator object.

Useful Iterators. These classes are either built-in or come with SPL as examples in external files.
• DirectoryIterator: Used to loop over files in a directory and provides some useful methods like isDot() and isFile().
• RecursiveDirectoryIterator: Same as above, only it traverses subdirectories.
• DirectoryTree: An iterator that doesn’t show dot-directories (“.” and “..”).
• FindFile: Base class for searching files in a certain path.
• RegexFindFile: Same as above except it takes a regular expression rather than a file name to search for.
• SimpleXMLIterator: As shown above, iterates over an XML tree returning SimpleXML objects.
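To round off the SimpleXMLIterator entry, here is a minimal sketch of iterating over a cookbook like the one described earlier. The element names (cookbook, recipe, name, rating, ingredients) are assumptions; only the recipe text itself comes from the article.

```php
<?php
// Tag names are hypothetical; the recipe text follows the article's cookbook.
$xml = <<<XML
<?xml version="1.0"?>
<cookbook>
  <recipe>
    <name>Chocolate Muffins</name>
    <rating>Yummy</rating>
    <ingredients>Flour, Cocoa, Sugar</ingredients>
  </recipe>
  <recipe>
    <name>Caramel Cake</name>
    <rating>Sweet</rating>
    <ingredients>Caramel, Flour, lots of love</ingredients>
  </recipe>
</cookbook>
XML;

$cookbook = new SimpleXMLIterator($xml);

$summary = array();
foreach ($cookbook as $tag => $recipe) {
    // $tag is the element name ("recipe"); $recipe is a SimpleXML object,
    // so child elements are reachable as properties.
    $summary[] = $recipe->name . ' (' . $recipe->rating . ')';
}
echo implode("\n", $summary), "\n";
```

Only the top-level children are visited here; wrapping $cookbook in a RecursiveIteratorIterator would descend into name, rating and ingredients as well.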
Final Words

I’ve spent a couple of months making actual use of SPL, and I have to say I am very impressed. There are a lot of libraries that can be converted into easier, more understandable and, more importantly, more maintainable classes using iterators. Knowing that I’m not limited by SPL, that I can create my own iterators and that my colleagues don’t have to worry about the way they work, was a big advantage in terms of productivity. I think SPL is a big step forward for PHP. It makes real use of interfaces, defines certain standards that everyone should follow, and shows that PHP can be a language for the enterprise. Thanks to Marcus Boerger for making it happen!
About the Author
Rami has been a PHP developer for more than four years. When he’s not consuming coffee, he plays the piano and tries not to think of code. He’s currently developing Codeflakes (www.codeflakes.com), a PHP framework for large-scale applications. You can visit his website at www.ramikayyali.com.
To Discuss this article:
http://forums.phparch.com/197
Security Corner

SQL Injection
by Chris Shiflett

Welcome to another edition of Security Corner. This month’s topic is SQL injection, a style of attack that frequents the minds of PHP developers, but for which there is a shortage of good documentation. Most Web applications interact with a database, and the data stored therein frequently originates from users. Thus, when creating an SQL statement, a developer may use client data in its construction. A typical SQL injection attack exploits this scenario by attempting to send valid SQL as unexpected values of GET and POST data. This is why an SQL injection vulnerability is almost always the fault of poor data filtering, and this fact cannot be stressed enough.

This article explains SQL injection by looking at a few example attacks and then introducing some simple and effective methods for prevention. By applying these best practices, you can practically eliminate SQL injection from your list of security concerns.
For a moment, place yourself in the role of an attacker. Your goal is initially simple: to get any unexpected SQL statement executed by the database. You’re only looking to get something to work, because that will reveal the fact that the application either completely fails to filter data or that there are weaknesses in the data filtering logic. You have as many chances as you want, and you have a lot of information to work with. For example, consider the simple registration form shown in Figure 1. In order to get more information about this form, you view the source:
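<form method="POST">
<p>Username: <input type="text" name="reg_username" /></p>
<p>Email: <input type="text" name="reg_email" /></p>
<p><input type="submit" value="Register" /></p>
</form>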
REQUIREMENTS
PHP: 4.0.3
OS: Any
Other Software: None

You can already make a very educated guess about the type of SQL statement that this application will construct. It will most likely be an INSERT statement that uses $_POST['reg_username'] and $_POST['reg_email']. You can also make a guess about the naming convention used in the database, because it possibly matches the names used in the HTML form. Because this form is for registration, there is also likely to be a password generated and included in the query. From all of this, you guess that the following construction is performed:

$sql = "INSERT INTO users (reg_username, reg_password, reg_email)
        VALUES ('{$_POST['reg_username']}', '$reg_password', '{$_POST['reg_email']}')";
Assuming this guess is correct, what can you do to manipulate this query? Imagine sending the following username:

bad_guy', 'mypass', ''), ('good_guy

If [email protected] is given for reg_email, and the password generated by the application is 12345, then the SQL statement becomes the following:

INSERT INTO users (reg_username, reg_password, reg_email)
VALUES ('bad_guy', 'mypass', ''), ('good_guy', '12345', '[email protected]')
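The attack can be reproduced without a database, simply by building the string the way the guessed code would. In the sketch below the $post array stands in for $_POST, and the email address is a placeholder chosen for the example.

```php
<?php
// Simulated request data; in a real request this would arrive in $_POST.
$post = array(
    'reg_username' => "bad_guy', 'mypass', ''), ('good_guy",
    'reg_email'    => 'user@example.org', // placeholder address
);
$reg_password = '12345'; // the password generated by the application

// The vulnerable construction: client data interpolated without filtering.
$sql = "INSERT INTO users (reg_username, reg_password, reg_email)
        VALUES ('{$post['reg_username']}', '$reg_password', '{$post['reg_email']}')";

echo $sql, "\n";

// Each row in the VALUES list opens with "('", so counting those
// shows how many rows the statement would insert.
$rows = substr_count($sql, "('");
echo "Rows to be inserted: $rows\n";
```

Nothing in the statement is syntactically wrong, which is exactly the problem: the database has no way to tell that the second row was supplied by the client.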
This statement creates two accounts: good_guy with a valid email address and bad_guy with no email address. Because reg_email is valid, if the application emails the password for the good_guy account, it will arrive safely. You already know the password for the bad_guy account, because you set it yourself. Thus, by sending a specially crafted username, you have created two accounts that you can perhaps use for further malicious activity. You can use the good_guy account to investigate the application and learn how it works (a valid account might be required to access certain parts). With the bad_guy account (which is also a valid account), you can launch additional attacks with your heightened privilege without the risk of losing your real account if something goes wrong (the bad_guy account is disposable). More importantly, if this is successful (no error is given by the application, and you can log in as bad_guy), it sufficiently proves that there is very poor data filtering, if any at all.

Figure 1

You may be wanting more examples of SQL injection attacks, so I will demonstrate another style. Keep in mind that creativity plays a large role, as is the case for most styles of attack. In the example just explained, the attack is limited by the type of query (INSERT) and the placement of the client data. Other types of queries present new opportunities, and the best practices mentioned in this article prevent practically all SQL injection attacks.

WHERE Hacking

The WHERE clause is used to restrict the records that a particular query matches. For a SELECT statement, it determines the records that are returned. For an UPDATE statement, it determines the records that are altered. For a DELETE statement, it determines the records that are deleted. If a user can manipulate the WHERE clause, there are a lot of opportunities to make drastic changes: selecting, updating, and deleting arbitrary records in the database. Imagine a SELECT statement intended to fetch all credit card numbers for the current user:

$sql = "SELECT card_num, card_name, card_expiry
        FROM credit_cards
        WHERE username = '{$_GET['username']}'";
In this particular case, the application might not even be soliciting the username, but rather providing it in a link:

<a href="/account.php?username=shiflett">View Credit Card(s) for Your Account</a>
Because a user can have multiple cards, the application loops through the results, displaying the card number, the name on the card, and the card’s expiration date for each one. Imagine a user who visits the following URL:

/account.php?username=shiflett%27+OR+username+%3D+%27lerdorf

This submits the following value for the username:

shiflett' OR username = 'lerdorf

If used in the previous SQL statement, the following is the result:

SELECT card_num, card_name, card_expiry
FROM credit_cards
WHERE username = 'shiflett' OR username = 'lerdorf'
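The decoding step can be checked in isolation. This sketch uses parse_str() to decode the query string in the same way PHP populates $_GET; the $get variable is a stand-in introduced for the example, and magic_quotes is assumed to be off.

```php
<?php
// The query string from the crafted URL.
$query = 'username=shiflett%27+OR+username+%3D+%27lerdorf';

// parse_str() URL-decodes the value: %27 becomes ', + becomes a space,
// and %3D becomes =.
parse_str($query, $get);
echo $get['username'], "\n";

// The vulnerable construction, with the decoded value dropped straight in.
$sql = "SELECT card_num, card_name, card_expiry
        FROM credit_cards
        WHERE username = '{$get['username']}'";
echo $sql, "\n";
```

The two single quotes supplied by the attacker pair up with the two written by the developer, so the resulting statement is perfectly well formed.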
Now the user sees a list of all credit cards belonging to either shiflett or lerdorf. This is a pretty major security vulnerability. Of course, a larger vulnerability exists in this particular example, because a user can arbitrarily pass any username on the URL. In addition, a username that causes the WHERE clause to match all records can be used:

shiflett' OR username = username
Imagine if this particular username were actually stored in the database (using a previous SQL injection attack) and used as the attacker’s username by the application. Every query that uses a WHERE clause to restrict results to the user’s own record could then match additional records (or all records). This is not only extremely dangerous, but it also makes further attacks very convenient.

Data Filtering

Note: The methods to be described assume that magic_quotes is disabled. If magic_quotes is enabled, you can use the fix_magic_quotes() function that is listed at http://phundamentals.nyphp.org/storingretrieving.
There are best practices that you should follow to prevent SQL injection attacks, and these offer a very high level of protection. The most important step is to filter all data that comes from the client. This includes $_GET, $_POST, $_COOKIE, and $_FILES. To help clarify this, consider the following HTML form:

<form method="POST">
<select name="color">
<option value="red">red</option>
<option value="green">green</option>
<option value="blue">blue</option>
</select>
<input type="submit" />
</form>

Clearly, the expected values are red, green, and blue. So, the data filtering should verify this:

$clean = array();
$valid_colors = array('red', 'green', 'blue');
if (!in_array($_POST['color'], $valid_colors)) {
    /* Display error */
} else {
    $clean['color'] = $_POST['color'];
}

This code uses a separate array ($clean) to store the filtered data. It is a good idea to choose a naming convention that will help you identify potentially tainted data. In this example, $clean['color'] (if it exists) can be trusted to contain a valid color, because $clean is first initialized and then only set to $_POST['color'] if the value passes the validation.

Another option for a set of expected values is to use a switch statement:

$clean = array();
switch ($_POST['color']) {
    case 'red':
    case 'green':
    case 'blue':
        $clean['color'] = $_POST['color'];
        break;
    default:
        /* Display error */
        break;
}

For numeric data, the is_numeric() function is a good choice. Filtering other types of data can be more difficult, but regular expressions can be very helpful. For example, my favorite validation logic for an email address is as follows:

$email = '[email protected]';
$clean = array();
$email_pattern = '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i';
if (!preg_match($email_pattern, $email)) {
    /* Display error */
} else {
    $clean['email'] = $email;
}

The two most important points for data filtering are:

1. Only accept valid data rather than trying to prevent invalid data.
2. Choose a naming convention that helps you distinguish tainted data from filtered data.

Escaping Data

With properly filtered data, you’re already pretty well protected against malicious attacks. The only remaining step is to escape it such that the format of the data doesn’t accidentally interfere with the format of the SQL statement. If you are using MySQL, this simply requires you to pass all user input through mysql_escape_string() prior to use:

$clean['color'] = mysql_escape_string($clean['color']);
$sql = "... {$clean['color']} ...";
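Before any escaping happens, the filtering rules can be gathered into one place. The sketch below is one way to do that; filter_client_data() is a hypothetical helper name, the sample input is invented, and the color list and email pattern are the ones used in the filtering examples.

```php
<?php
// Hypothetical helper: returns only the fields that pass validation.
function filter_client_data(array $input)
{
    $clean = array();

    // Whitelist: accept a color only if it is one of the expected values.
    $valid_colors = array('red', 'green', 'blue');
    if (isset($input['color'])
        && in_array($input['color'], $valid_colors, true)) {
        $clean['color'] = $input['color'];
    }

    // Pattern: accept an email address only if it matches the expected shape.
    $email_pattern = '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i';
    if (isset($input['email'])
        && is_string($input['email'])
        && preg_match($email_pattern, $input['email'])) {
        $clean['email'] = $input['email'];
    }

    return $clean;
}

// Simulated request: a hostile color value and a well-formed address.
$tainted = array(
    'color' => "red' OR 'x' = 'x",
    'email' => 'someone@example.org',
);

$clean = filter_client_data($tainted);
var_dump($clean);
```

Anything that fails a check is simply absent from $clean, so later code that reads only from $clean cannot pick up an unfiltered value by accident.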
In this case, assuming $clean['color'] comes from the previous example, we can be sure that the data only contains alphabetic characters. However, it is a good habit to always escape data. This practice will help you avoid forgetting this crucial step.

Until Next Time...

Preventing SQL injection is easy, yet it remains one of the most common PHP application vulnerabilities. Hopefully you will now always perform the following two steps:

1. Filter data from the client
2. Escape data used in SQL

Of course, you should always filter client data, so the only new step is to escape data before you use it in an SQL statement. If you use MySQL, this only requires a function call to mysql_escape_string(). There is a helpful resource located at http://phundamentals.nyphp.org/storingretrieving that explains this second step in much more detail (focus on the section regarding data storage).

I hope that you are now protected against SQL injection attacks and can prevent such vulnerabilities in your applications. Until next month, be safe.

About the Author
Chris Shiflett is a frequent contributor to the PHP community and one of the leading security experts in the field. His solutions to security problems are often used as points of reference, and these solutions are showcased in his talks at conferences such as ApacheCon and the O’Reilly Open Source Convention, and his articles in publications such as PHP Magazine and php|architect. Security Corner, his monthly column for php|architect, is the industry’s first and foremost PHP security column. Chris is the author of the HTTP Developer’s Handbook (Sams), a coauthor of the Zend PHP Certification Study Guide (Sams), and is currently writing PHP Security (O’Reilly). As a member of the Zend Education Advisory Board, he is also one of the authors of the Zend Certification. In his spare time, he is leading an effort to create a PHP community site at PHPCommunity.org. You can contact him at [email protected] or visit his Web site at http://shiflett.org/.
Tips & Tricks

Javascript Remote Scripting with PHP
by John W. Holmes
There have been many times when I wished that I could combine the power of PHP with JavaScript. When I signed up for a Gmail account, I was very interested in how they accomplished the slick user interface, using JavaScript interacting with server-side code to give a real application feel to the site. So began my quest to develop a simple solution for PHP that could be easily used by the masses.
The first part of the process is figuring out how to get JavaScript to request a file on the server. I played around with many ideas and technologies trying to figure this out. I wanted something that was not complex and could be easily reused and adapted for all sorts of uses. The solution that I ended up using meets all the above criteria. The PHP file is called by adding a new script tag to the head of the HTML document. Now, adding a new script tag to the document head each time the function is called will quickly result in a large number of unnecessary script tags. After realizing this, I changed the function so that it automatically cleans up the previous script tag, leaving the document uncluttered by useless data.
The PHP script is passed information from the JavaScript function via the query string. The information needed by the function is the filename of the PHP script, the parameters to pass to the script and the callback function. The latter is used at the end of the returned JavaScript: it is the function that handles the information returned by the PHP script.

    function jsrs_call_server(scriptname, params, callback) {
        var head = document.getElementsByTagName('head').item(0);
        // Remove the script tag left behind by the previous call, if any
        var oldhead = document.getElementById('lastloaded');
        if (oldhead) head.removeChild(oldhead);
        // Loading the new script tag is what triggers the request to the server
        var script = document.createElement('script');
        script.src = scriptname + "?callback=" + callback + "&" + params;
        script.type = 'text/javascript';
        script.defer = true;
        script.id = 'lastloaded';
        void(head.appendChild(script));
    }

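The function above appends params to the query string exactly as given, so a value containing an ampersand or a space would corrupt the request. What follows is a minimal sketch of a helper that escapes each value first. The name jsrs_build_url and the idea of passing the parameters as an object are assumptions made for illustration; neither is part of the original function.

```javascript
// Hypothetical helper (not from the article): builds the URL that
// jsrs_call_server() would assign to script.src, escaping every value.
function jsrs_build_url(scriptname, params, callback) {
    var pairs = ['callback=' + encodeURIComponent(callback)];
    for (var key in params) {
        // Skip anything inherited through the prototype chain
        if (Object.prototype.hasOwnProperty.call(params, key)) {
            pairs.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
        }
    }
    return scriptname + '?' + pairs.join('&');
}
```

With this in place, jsrs_build_url('getdata.php', {myid: 'a&b'}, 'populate_values') gives getdata.php?callback=populate_values&myid=a%26b, which the server can decode without mistaking the ampersand for a separator.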
The function above can be called either from another function or by any event (onClick(), onChange(), and so on). The example I am using in this article is composed of two linked select boxes. When an item is selected from the first select box, the JavaScript is called to populate the second select box. The following function is called by the onChange event and is passed the value of the selected item from the select box.

    function get_second_values(myID) {
        var strScriptName = "getdata.php";       // PHP script to request
        var strParams = "myid=" + myID;          // query string parameters
        var strCallBack = "populate_values";     // function that receives the result
        jsrs_call_server(strScriptName, strParams, strCallBack);
    }

Now that we have the client successfully making calls to the server, the majority of the hard work is complete. From here, it’s just a matter of writing the necessary PHP scripts to handle the information and then coming up with the correct JavaScript. The following code example is a script that will read a
file based on information passed through the query string. Once it has opened the file, its contents are put into an array, which is then passed to the callback function for processing. The header function is used to ensure the document is not cached and the content type is correct.
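Whatever the PHP script does internally, what the browser receives back is a single JavaScript statement that calls the callback with the data. The shape of that output can be sketched in JavaScript. The name build_response is hypothetical, and both the use of JSON.stringify for quoting and the identifier check on the callback name are assumptions; the article's PHP script is not known to do either.

```javascript
// Hypothetical sketch of the text the server is expected to send back.
function build_response(callback, values) {
    // Accept only a plain identifier, since the name arrives in the query string
    if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callback)) {
        throw new Error('invalid callback name');
    }
    // JSON.stringify quotes and escapes the strings in the array
    return callback + '(' + JSON.stringify(values) + ');';
}
```

For a callback of populate_values and the values Red and Green, the result is populate_values(["Red","Green"]); which the browser runs as soon as the script tag has loaded. Checking the callback name matters because it is echoed straight into executable code.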
As mentioned earlier, this example uses two select boxes. The callback function I have included uses a JavaScript array to populate the second select box. Naturally, what the callback function does depends on what you are using it for. In my case, it looks like this:

    function populate_values(arrValues) {
        objSelect=document.form.two;
        objSelect.options.length = 0;
        for (n=0;n
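The listing above breaks off inside the loop. One plausible way to finish it is sketched below, on the assumption that arrValues is an array of strings and that each string serves as both the label and the value of an option. The helper fill_select and its makeOption argument are hypothetical; they exist only so that the loop can be exercised outside a browser.

```javascript
// Hypothetical completion of the truncated listing, not the author's code.
function fill_select(objSelect, arrValues, makeOption) {
    objSelect.options.length = 0;                 // empty the list first
    for (var n = 0; n < arrValues.length; n++) {
        // Assumption: the same string is used as label and as value
        objSelect.options[n] = makeOption(arrValues[n], arrValues[n]);
    }
}

// In a browser, the callback would hand over the real form element
function populate_values(arrValues) {
    fill_select(document.form.two, arrValues, function (text, value) {
        return new Option(text, value);
    });
}
```

Keeping the loop separate from the lookup of document.form.two means the same code can fill any select box, which fits the article's goal of something easily reused.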
Listing 3
If you want to get into the advanced features of spreadsheets, you’re left to discover how to use COM or figure out the file formats for everything. Luckily, though, other ingenious programmers have already figured this out for you and made their code available online. Isn’t open source software great? One spreadsheet writer class that was recently recommended to readers of the php-general mailing list is the PEAR package Spreadsheet_Excel_Writer (SEW), which can be found at: http://pear.php.net/package/Spreadsheet_Excel_Writer
While learning the API of this class and implementing it may be more difficult at first than the other methods, it offers you interfaces to many more spreadsheet features than any other approach we have seen so far. Listing 3 shows a simple example from the SEW documentation of how to create a spreadsheet. The class also provides methods to implement spreadsheet features such as text decoration and alignment, formulas, multiple worksheets, page properties, notes, images, zoom and many other things. You can send the file directly to the user, as the earlier examples do, or save it to the server. OpenOffice.org doesn’t have any trouble opening Excel files, either, so you’re not locked into using Microsoft Excel as your spreadsheet program. Depending on your needs and how fancy a spreadsheet you need to deliver to your users, either the CSV method, the HTML table method or the Excel_Writer class can hopefully solve your problems. If you have a better solution than the ones presented here or any other PHP tips to solve your troubles, send them to
[email protected] and get them published here. One other quick tip, which has been mentioned before but is related to the above, has to do with Internet
Explorer downloading files in specific instances. If you are using SSL and sessions, Internet Explorer will report that it can’t find the file to download when you use any of the above methods. The problem is that the default session cache limiter, nocache, sends headers that forbid caching, which causes Internet Explorer not to download the file. The fix is either to change the session.cache_limiter setting in php.ini to public or to place session_cache_limiter('public'); in your code before you call session_start().
My Last Tip
This will be my last Tips and Tricks article for php|architect. Unfortunately, the demands of my full-time job have taken away from the time I can dedicate to writing my own PHP programs, helping others, and finding the latest tips and tricks out there. I’m confident someone else will pick up the reins, though. I want to thank Marco and the rest of the php|architect team for putting up with me over the past couple of years and wish them the best in the future. I’ll still be around, of course. You can always find me trying to
help (or confuse?) people on the php-general mailing list or in the forums on http://www.devshed.com. Happy New Year!
About the Author
Chris Shiflett is a frequent contributor to the PHP community and one of the leading security experts in the field. His solutions to security problems are often used as points of reference, and these solutions are showcased in his talks at conferences such as ApacheCon and the O'Reilly Open Source Convention, his answers to questions on mailing lists such as PHP-General and NYPHP-Talk, and his articles in publications such as PHP Magazine and php|architect. Security Corner, his new monthly column for php|architect, is the industry's first and foremost PHP security column. Chris is the author of the HTTP Developer's Handbook (Sams Publishing) and is currently writing PHP Security (O'Reilly and Associates). In order to help bolster the strength of the PHP community, he is also leading an effort to create a PHP community site at PHPCommunity.org. You can contact him at
[email protected] or visit his Web site at http://shiflett.org/.
Available Right At Your Desk All our classes take place entirely through the Internet and feature a real, live instructor that interacts with each student through voice or real-time messaging.
What You Get Your Own Web Sandbox Our No-hassle Refund Policy Smaller Classes = Better Learning
Curriculum The training program closely follows the certification guide— as it was built by some of its very same authors.
Sign-up and Save! For a limited time, you can get over $300 US in savings just by signing up for our training program! New classes start every three weeks!
http://www.phparch.com/cert
EXIT(0);
The PHP Security Saga
by Marco Tabini
Christmas is usually a time for rest, relaxation and spending time with the family (which always ends up being anything but rest and relaxation), but this past holiday season had the PHP community sweating bullets as two important security issues were uncovered. First, a problem with the unserialization code in PHP 4.3.x prompted the PHP development team to release version 4.3.10, which, in turn, caused problems with the acceleration code published by Zend Technologies. In a separate (but much more sensational) incident, a bug in the popular phpBB forums package was successfully exploited by a worm that used Google to find phpBB-powered sites and deface them. Meanwhile, these problems were reported on traditional security-oriented mailing lists days (and, in some cases, two full weeks) after they first became publicly known.
Call me Italian, but I see a huge (and very troublesome) disconnect here: on one hand, the devastation brought about by the Santy worm has clearly shown that PHP is so widely used that, when something goes wrong, the ripple effect is serious enough for even the mainstream media (traditionally lethargic about things that really matter to IT professionals) to become aware of it. On the other, the IT community at large seems to maintain its odious attitude that PHP is just a “script kiddie tool.” This is costing real companies real money and, if left unchecked, is bound to eventually come back and affect PHP’s reputation as well. Over the last few years, with security becoming an ever more
real and present danger, the IT industry has managed to build a solid infrastructure that everyone takes advantage of for discovering, reporting and fixing security issues, and for disseminating the related fixes in an orderly and safe way. PHP lacks such a mechanism, as demonstrated not only by the amount of chaos that surrounded the latest security problems, but also by the fact that not everyone was in the loop and took them seriously. There is no way to make these problems go away for good. People can complain about writing good code all they want, but the truth is that bugs happen, no matter how careful one is. It’s easy to attribute phpBB’s security hole to sloppy programming, but that doesn’t solve the problem at its roots; the fact that one person made a mistake while writing a particular line of code doesn’t
necessarily detract from the application’s overall quality—at least as long as problems are corrected in a timely manner when they are discovered and reported. The real problem in this case was the fact that the security issue was downplayed, or at least misinterpreted, with catastrophic results. Again, I don’t fault the phpBB team for this; I fault the lack of a place where security issues can be discussed and communicated in a timely manner. There is just too much white noise on the “generic” security mailing lists—let’s face it: not everyone has the time to sift through the latest Internet Explorer exploits to find the information that is highly relevant to PHP. To help solve this problem in my own small way, I’ve started a new mailing list dedicated exclusively to PHP and related applications. My initial idea was to set up a separate website for it, but I figured that (a) people want to be informed about problems, not go looking for them and (b) php|a already has a website and all the infrastructure needed to run a mailing list, and I’m not in the business of reinventing the wheel
every time I have a new braindump. Thus was born phpsec, the PHP Security Mailing List. The idea behind this list is to provide the missing link between security and the PHP community at large. If phpsec had been around when the phpBB bug came around and we had done a good job of it, you’d have known right away, and possibly would have been able to take the necessary steps to protect yourself. In setting up the list, we made a few important decisions:
• First, the list is moderated. This ensures the best possible signal-to-noise ratio (and also helps keep spam off it). The moderation is limited to ensuring that the posts remain on-topic and follow the rules.
• The list is also completely automated using the ezmlm mailing list manager. You can unsubscribe at any time without having to rely on a human being to take care of things for you.
• The list posts news about security-related issues in developing, deploying and maintaining PHP and applications that integrate with or are related to it. It is not a general-help list or a place for personal interaction outside of its main topic. Again, this is simply a way to keep the noise low and the signal high.
• The moderators will only report to the list security issues that are either in the open (reported publicly on a website or on another mailing list) or that relate to packages whose maintainers have been properly informed and given the opportunity to correct their software. After all, we are in the business of informing people, not creating panic!
With these simple rules in mind, the mailing list should provide a simple mechanism for people to stay informed, and to inform others, of any new problem that arises within PHP. If you think that the PHP world doesn’t really have that many problems, think again: in the list’s first week of existence, eight different vulnerabilities were reported to it, and we’re just getting started. To sign up for the list and to find out more about it, you can visit the phpsec homepage at: http://www.phparch.com/phpsec
php|a
Can’t stop thinking about PHP? Write for us! Visit us at http://www.phparch.com/writeforus.php