TABLE OF CONTENTS

Departments
Editorial: To XML, or Not to XML!
What’s New!
Product Review: webEdition by Peter B. MacIntyre
Product Review: DHTML Menu Studio by Peter B. MacIntyre
Security Corner: Ideology
exit(0); A Look at PHP in Government by Andi Gutmans and Marco Tabini

Features
Object-oriented vs. Relational Part II: Managing Distractions and Reverse-Engineering Abstractions by Rick Morris
Developing a PHP - XML Generator by Man-ping Grace Chau
PHP 5 & XML by Ilia Alshanetsky
Practical Caching for the PHP Developer by Allen Smithee
Hiding Your Sins
Secure Your System with a Port Security Guard by Ron Goff

November 2004 ● PHP Architect ● www.phparch.com
EDITORIAL RANTS
To XML, or Not to XML!
I have a confession to make. I’m an XML-phobe. I know, in today’s society of political correctness and respect for other cultures, that’s nothing short of inexcusable, but what can I say? I’m getting old and, therefore, cranky.

When I first heard of XML a while back, I couldn’t help but think of it as the answer to a question nobody asked. This feeling was, in fact, augmented by the continuous misuse of this outrageously verbose format in all sorts of places where it really didn’t belong. Have a configuration file? Let’s make it XML. Need to store data? Why use a database? We have XML!

At the time, I was part of a team whose job was interfacing a Microsoft-based web system to a legacy mainframe system, which was no walk in the park, as everything had to take place through a carefully-choreographed exchange of text files. I remember that, at some point, the irony of it all struck me: here was a perfect scenario in which being able to use XML would have been, at last, beneficial (although manipulating text files with ASP was about as pleasant as stapling your lips together) and, of course, there was no way I could have used it, because it would have required too much work on the mainframe side. On the other hand, I had to use XML for pretty much half of the configuration files on the Windows machine. Lovely.

As time has gone by, the XML craze has somewhat faded to a more reasonable level: we now use it for its intended purpose (the structured communication of data between two or more heterogeneous systems) more often than not. It has become the basis of XHTML, whose importance is, in my opinion, not in the beauty of a “well-designed” web page, but in the fact that any robot should be able to parse XHTML, thus paving the way for better search engines, more focused online advertising, and so on.
Still, despite (or, more probably, because of) the fact that XML was designed to be a human-readable format, manipulating an XML document “by hand” is not fun, regardless of how good your particular platform is at handling strings and arrays. What it takes is a strong set of tools designed specifically to parse, modify and produce XML documents, and capable of interfacing with the underlying platform in a way that hides the minutiae of the language from the developers.

PHP 4 had what could be considered a germinal XML infrastructure, but with PHP 5 we now have a robust toolset (despite a few kinks here and there) that can be used to perform all sorts of XML manipulations, and which will be indispensable to the introduction of PHP in an enterprise environment that requires interoperation between a variety of different systems.

I had a taste of just how important this is to our readers during php|works, when I sneaked into Ilia Alshanetsky’s session on PHP 5 and XML… and had to leave because I couldn’t get a seat. Once the conference was over, therefore, I had to ask Ilia if he wanted to write a comparable article for the magazine; the result is this month’s main piece. As usual, Ilia came through in style with an article that explores every single facet of the functionality available in PHP 5 (with a quick look at what was available in PHP 4), with plenty of practical examples and real-world suggestions. I hope you will find it as useful as I did, and that it will help you in your projects.

Until next month, happy readings!
php|architect TM
Volume III - Issue 11 November, 2004
Publisher: Marco Tabini
Editorial Team: Arbi Arzoumani, Peter MacIntyre, Eddie Peloke
Graphics & Layout: Arbi Arzoumani
Managing Editor: Emanuela Corso
Director of Marketing: J. Scott Johnson, [email protected]
Account Executive: Shelley Johnston, [email protected]
Authors: Ilia Alshanetsky, Ron Goff, Man-ping Grace Chau, Peter MacIntyre, Rick Morris, Chris Shiflett, Allen Smithee

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards of use of the information contained herein or in all associated material.

Contact Information:
General mailbox: [email protected]
Editorial: [email protected]
Subscriptions: [email protected]
Sales & advertising: [email protected]
Technical support: [email protected]

Copyright © 2003-2004 Marco Tabini & Associates, Inc. — All Rights Reserved
NEW STUFF
What’s New!

Zend announces the PHP 5 Coding Contest Winners

Zend has released the names of the winners of the PHP 5 Coding Contest!!

“It really wasn’t that easy to choose between the top applications; there are quite a few that ended up in the top 20 or so that could just have easily been in the top 6. Without your input, we’d still be arguing over them! A special mention goes to MyObjects*, a project that provides its own persistent object library and tools for generating classes directly from a MySQL database. A minor coding style issue was all that prevented the project from being one of the top prizewinners. The voters liked it too, and it ended up coming in 7th place. Keep an eye out for the author, Erdinc Yilmazel of Turkey - we’d put money on his winning next time, if there’s a next time! Another special mention goes to Hive**, which came in 41st because nobody in the public domain voted for it. We disagreed - it ranked 3rd in the judges list so we’ve scrambled around to find a judges prize for the author, Robert Janeczek. Ironically, Robert describes Hive as ‘a low-level version of the PRADO project’... Our judges and the public agreed over PRADO***, which won outright. All we need to do now is get a laptop to Qiang Xue, the author of the winning application, and then we can sit around in the office drinking too much caffeine and playing hangman with a clear conscience.”

For more information, and to try the winning software for yourself, visit http://www.zend.com/php5/contest/contest.php

* [MyObjects] http://www.zend.com/php5/contest/contest.php?id=126&single=1
** [Hive] http://www.zend.com/php5/contest/contest.php?id=138&single=1
*** [PRADO] http://www.zend.com/php5/contest/contest.php?id=36&single=1

PHPX 3.5.4 Released

A quick note from phpvolcano.com announces the latest version of PHPX:

“After much too long in waiting, 3.5.4 can be downloaded. There are a lot of new features, including a guestbook and shoutbox, user groups have been added and stuff that I dont feel like putting here too! PHPX is a constantly evolving and changing Content Management System (CMS). PHPX is highly customizable and high powered all in one system. PHPX provides content management combined with the power of a portal by including in the core package modules such as FAQ, polls, and forums. PHPX uses dynamic-template-design, what this means is that you have the power to control what your site will look like. Themes are included, but not required. You can create the page however you want, and PHPX will just insert code where you want it. No more 3 columns if you don’t want it! Written in the powerful server language, PHP, and utilizing the amazingly fast and secure database MySQL, PHPX is a great solution for all size website communities, at the best price possible...free!”
Zend Encoder for MAC OSX Now Available

Zend has announced the release of Zend Encoder for Mac OS X.

“The Zend Encoder is the recognized industry standard in PHP intellectual property protection. The Zend Encoder allows an unlimited number of PHP applications to be distributed, while ensuring your investment and source code are protected from copyright infringement. Independent Software Vendors (ISV’s) and Professional Service Providers (PSP’s) rely on the Zend Encoder to deliver their exclusive and commercial PHP applications to customers without revealing their valuable intellectual property. By protecting their PHP applications, these and other enterprises expand distribution and increase revenue. The Zend Encoder compiles and converts plain-text PHP scripts into a platform-independent binary format known as a ‘Zend Intermediate Code’ file. These encoded binary files are the ones that are distributed instead of the human-readable PHP files. The performance of the encoded PHP application is completely unaffected! The Zend Optimizer, a free download, is the run-time environment that enables end-users to transparently execute these files as if they were regular PHP scripts. The Zend Optimizer not only provides an additional level of increased security against reverse engineering, it also improves performance speed.”

For more information visit: www.zend.com
MySQL Version 4.1 Certified as Production Ready

MySQL.com has announced that version 4.1 of its database management system is now production-ready for large-scale enterprise deployment:

“MySQL AB, developer of the world’s most popular open source database, today announced the general availability of MySQL® 4.1. Certified by the company as production-ready for large-scale enterprise deployment, this significant upgrade to the MySQL database server features advanced querying capabilities through subqueries, faster and more secure client-server communication, new installation and configuration tools, and support for international character sets and geographic data. MySQL 4.1 can be downloaded now at http://dev.mysql.com/.”
Apache HTTP Server 1.3.33 Released

From Apache.org:

“The Apache Software Foundation and The Apache HTTP Server Project are pleased to announce the release of version 1.3.33 of the Apache HTTP Server (“Apache”). This Announcement notes the significant changes in 1.3.33 as compared to 1.3.31 (1.3.32 was not formally released). The Announcement is also available in German and Japanese. This version of Apache is principally a bug and security fix release. A partial summary of the bug fixes is given at the end of this document. A full listing of changes can be found in the CHANGES file. Of particular note is that 1.3.33 addresses and fixes 2 potential security issues: CAN-2004-0940* Fix potential buffer overflow with escaped characters in SSI tag string. And CAN-2004-0492** Reject responses from a remote server if sent an invalid (negative) Content-Length. We consider Apache 1.3.33 to be the best version of Apache 1.3 available and we strongly recommend that users of older versions, especially of the 1.1.x and 1.2.x family, upgrade as soon as possible. No further releases will be made in the 1.2.x family.”

Apache 1.3.33 is available for download from http://httpd.apache.org/download.cgi

* http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-0940
** http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2004-0492
PostNuke Security Alert

Postnuke.com has posted a new security alert regarding a hack into the ZIP archive of PostNuke .750:

“We discovered last night that http://downloads.postnuke.com was the target of a malicious attack and files in the ZIP archive of PostNuke .750 were changed. Immediately upon discovering this all links to the downloads section were removed and on Tuesday the 26th at 8:30 GMT the original download package was restored. Our investigations so far have revealed the attack was initiated on Sunday, 24.Oct, at 23:50 (11:50 PM) GMT. The attacker used an exploit in the download management software pafiledb to change the download address of PostNuke-0.750.zip to point to a compromised archive. We must stress this is a security compromise of paFileDB and has nothing to do with the PostNuke application. Note, if you downloaded the tar.gz archive you are not affected so you do nothing, only those who downloaded the zip version were affected and must take immediate action as detailed below. The changes made by the hackers were in two places. First, during the installation routine all data submitted (this includes the server, the database credentials, the admin name and password) is sent to a different server. Second, in one file there was code allowing a malicious user to execute any shell command on the web server. As noted before, immediate action is required from everyone who downloaded the .zip package between Sunday (24.Oct) at 23:50 GMT until Tuesday (26.Oct) at 8:30 GMT.”

For more information visit: news.postnuke.com
Take Advantage of Marco’s Wonky Math and Save Up to $80

Our fall/winter 2004 subscription campaign is in full effect, and this year we have some great offers for all our subscribers, regardless of whether you’re becoming a member of our family for the first time or you’ve been looking forward to your copy of php|a since the very beginning. Signing up for a php|a subscription (or adding another 12 months to your existing one) right now means some great offers, which include:

• An $80 discount on the Zend Certification Guide
• A free 64MB USB Memory Key, complete with the php|architect logo

For more information, visit our website at http://www.phparch.com/wonky
Errata

In last month’s Tips & Tricks column, John mentioned that a reader had pointed out a small flaw in his random-line access algorithm. In true php|architect fashion, we misspelled his name, quoting him as Chris Cowell, while his real name is Chris Dowell. Sorry, Chris!
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

WinBinder 0.27.093
WinBinder is an extension that allows PHP programmers to build native Windows applications. It wraps a limited but important subset of the Windows API in a lightweight, easy-to-use library so that program creation is quick and straightforward.

Parsekit 1.0
This package provides a userspace interpretation of the opcodes generated by the Zend engine compiler built into PHP. This extension is meant for development and debug purposes only and contains some code which is potentially non-threadsafe.

Imlib2 0.1
imlib2 is a very fast image manipulation library, but without the support for as many image formats as other libraries such as imagemagick. You will need the imlib2 library from http://sourceforge.net/projects/enlightenment/ in order to compile this extension. This extension is experimental. It's been tested on a number of Linux installs, but nothing else. Please report any bugs to the maintainer!

Translit 0.1
This extension allows you to transliterate text in non-Latin characters (such as Chinese, Cyrillic, Greek etc) to Latin characters. Besides the transliteration, the extension also contains filters to convert to upper- and lower-case words in Latin, Cyrillic and Greek, and perform special forms of transliteration, such as converting ligatures such as the Norwegian "Æ" to "ae," as well as normalizing punctuation and spacing.
Check out some of the hottest new releases from PEAR.

PHPUnit 2.1.2
PHPUnit is a regression testing framework used by the developer who implements unit tests in PHP.

DB_odbtp 1.0.2
DB_odbtp is a PEAR DB driver that uses the ODBTP extension to connect to a database. It can be used to remotely access any Win32-ODBC accessible database from any platform.

Mail_mbox 0.3.0
This extension can split messages inside an Mbox, return the number of messages, return, update or remove a specific message, or add a message to it.

I18Nv2 0.9.1
This package provides basic support to localize your application, like locale based formatting of dates, numbers and currencies. It also attempts to provide an OS-independent way to setlocale() and aims to provide language and country names translated into many languages. It provides these classes:

• I18Nv2 - OS-independent (Linux/Win32) setlocale(), other utilities
• I18Nv2_Locale - locale based formatter
• I18Nv2_Negotiator - HTTP negotiation of preferred language and charset
• I18Nv2_Country - multilingual list of country names
• I18Nv2_Language - multilingual list of language names
• I18Nv2_Charset - list of common and not so common charsets and aliases
• I18Nv2_AreaCode - list of international area codes (phone)

Stream_Var 1.0
Stream_Var can be registered as a stream with stream_register_wrapper() and allows stream-based access to variables in any scope. Arrays are treated as directories, so it's possible to replace temporary directories and files in your application with variables.
FEATURE

Managing Distractions and Reverse-Engineering Abstractions
Object-oriented vs. Relational Part II

by Rick Morris

To many object-oriented developers, there is a sense that the relational model for data management is at odds with the concepts of object-oriented development. Are these views justified? Are they practical? In part II of this two-part series, we build a simple example of an object-oriented application that derives business logic from the database without object/relational mapping.
Distractions Abound

Apparently, a few of our loyal readers have been wondering just when Part II of this series would come out, or whether it had vanished into the /dev/null bitbucket. Please bear with me; since writing that first article I have gone through three South Florida hurricanes and the birth of my second child (a boy!). So, suffice it to say that there have been a few distractions in my life.

But, in the best tradition of making lemons into lemonade, let’s think about what distractions can teach us. Foremost, I would have to say that they highlight the need to think clearly and accomplish as much as possible in the shortest possible time (and if my parents had known when I was 16 that I would grow up to write a line like that, they would truly believe in miracles).

With that in mind, let’s think about how databases help us manage distractions. I have a very small synopsis of what the relational model of data affords us: the ability to name things in order to control them. That may seem a tad simplistic, but think about it: relational databases give us the ability to apply a distinct and clear name with a value attached to everything and, most importantly, they don’t require any other mechanism than a simple declaration of the named element and its attributes in order to retrieve the associated value. In other words, you don’t need to think about “the information in column #2 of row #34,” or “the eye_color field is in column #4 of this table.” Instead, the mechanism for accessing your data simply is the statement of what you are looking for. “Get all eye_color values for employees born before 1968” can be directly translated into SELECT eye_color FROM employee WHERE employee.birthday < '1968-01-01';. The fact that most database systems also allow you to ask for the data on row #34 and the value in column #2 should be regarded as a nuisance rather than a feature, if you truly follow the principles of relational database design.

Distractions also teach us that abstraction helps us manage the organization of our lives. If we had to think about every single step we took, we would be pretty frustrated by the end of the day, but our bodies allow us to abstract that into a general concept called “walking.” Thus, rather than thinking about moving our feet up and down in the exact steps we will have to take in order to get to the refrigerator, we just decide to walk to the refrigerator, and let the abstraction take over (believe me, with a wife, a 4-year-old and a newborn, I have been doing a lot of walking to the refrigerator lately). In the same way, both relational database systems and object-oriented programming allow us to create abstractions that save us time in the future. They just do so in different ways.

REQUIREMENTS
PHP: 4.x+
OS: Any
Other software: PostgreSQL version 7.4+
Code Directory: relational

What’s wrong with this picture?

As we discussed in Part I, one of the biggest distractions in modern programming is the “object-relational impedance mismatch.” For example, let’s look at a basic SQL query and online form that a PHP developer’s apprentice might create. Our developer has been wise in taking advantage of abstraction, so she uses the PEAR HTML_QuickForm class to generate the form for a simple user signup script (see Listing 1). It is, undoubtedly, a nice choice, and kudos to our aspiring guru for taking the time to read, instead of just hacking a quick-n-dirty HTML form with interspersed PHP. We will, for the moment, assume that $db is an instance of a database abstraction class with nice methods like create_entry() and last_error().

This example might look nice, clean and readable, really a pleasure to look at if you are tired of the spaghetti code littering many web-based applications. So what’s wrong with it? Well, just think about what has to be done if you have to modify the database: your developer will have to modify every script that interacts with the table and other DB structures that have been changed. Amazingly, this is the expected (and accepted) state of affairs for most application programming out there, especially in the area of web-based software.

The benefits of the relational model are often seen to be at odds with the capabilities of Object-oriented Programming. While the relational model is based on set-oriented thinking, which presents a unified logical method of accessing and modifying all kinds of data, object-oriented programming provides a metaphorical approach to data and essentially uses custom functions to bundle behaviors with it. While at first it seems logical that one would just develop objects in parallel with the tables in one’s logical database design, we saw that this leads to a fundamental problem: you inevitably end up duplicating effort, as you slavishly model one environment to follow the other.

Really, the best parallel to objects and behaviors in the relational world could be datatypes and operators. Think about it. The real objection is that datatypes are seen as “scalar” and “primitives,” but there is actually nothing in the relational model itself that prevents datatypes/domains from being complex, composite, or even customized by the user, along with customized operators. This begs the question, however, since most database systems don’t have a truly comprehensive way to deal with custom datatypes. However, we can at least take a step in that direction with the use of domains.
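To make the parallel between datatypes/operators and objects/behaviors a little more concrete, here is a minimal sketch of how a single domain might be mirrored by one small PHP class. It is not one of the article's listings: the class name ZipcodeDomain, the five-digit rule, and the method names are all assumptions chosen for illustration, and the CREATE DOMAIN statement in the comment is only an example of what the database side could look like.

```php
<?php
// Hypothetical sketch: one PHP class paralleling one database domain.
// Assumed database-side counterpart (illustrative only):
//   CREATE DOMAIN zipcode AS text CHECK (VALUE ~ '^[0-9]{5}$');

class ZipcodeDomain
{
    /** @var string */
    private $value;

    public function __construct($value)
    {
        // A domain is the set of all permitted values, so membership
        // is checked once, when the value is created.
        if (!self::contains($value)) {
            throw new InvalidArgumentException('Not a valid zipcode');
        }
        $this->value = $value;
    }

    // Membership test. The "exactly five digits" rule is an assumption.
    public static function contains($value)
    {
        return is_string($value) && preg_match('/\A[0-9]{5}\z/', $value) === 1;
    }

    // An "operator" defined on the domain: equality of two zipcodes.
    public function equals(ZipcodeDomain $other)
    {
        return $this->value === $other->value;
    }

    public function __toString()
    {
        return $this->value;
    }
}
```

The point of the sketch is only that the class follows the domain, not any particular table: every column declared with that domain could share the same validation and behavior.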
A Note on Terminology:

Since this is an attempt at a serious look at what can be accomplished by programming in accordance with the relational model, we will use relational terminology. Rather than “datatype”, we will refer to “domain” and, rather than “table,” we will use the term “relation”. There is a good reason for this: we should be able to treat custom domains the same way as baseline scalar datatypes like Int and, in fact, a domain is simply the “set of all possible values” for a certain term (think about your high school algebra). Also, a table is a relation, but so is a view (in fact, so is the result set from any query or stored procedure). The point here is that there is a decided advantage to our programming front-end if we don’t make distinctions between datatypes and custom domains, or between tables and views. Your application front-end has no real need to know the underlying design of your database, but simply what is pre

Listing 1
rather than inheritance” to automatically “know” what to do when presented with a specific table, query, or other database construct, given that you already have a class that parallels each datatype? Are these just rhetorical questions? Not really. If you think this through a little, you’ll find that the concept we presented in the last article hints exactly at this type of solution. In other words, rather than the peculiarly brittle approach of creating a parallel class for every table, we instead opt to model orthogonal concerns, leaving us much freer to change one aspect of the system without changing another. Now, if we want to change the way our application deals with autonumbered primary key columns, for example, we just have to change it in one place, instead of many. Adding validation capabilities to our front-end is not such a frustrating experience anymore. If you use domains, you have even more control. For example, if you decide that wherever the “zipcode” domain is used as a column definition, you want the application to present it in the Courier font, it is a one-
Listing 4
base system. It deals with your database at the microscopic level. What about dealing with your database on the macroscopic level? What if you could just “throw” a table name at a certain class or group of classes, and have the application just read the table metadata, instantiate the right objects for each datatype in a column, and automatically build your form?

If this sounds like a too-good-to-be-true theoretical approach, just think about it for a moment: if you have a class for every datatype, as well as classes to handle all the “mechanical” aspects of connecting to the database, querying, retrieving data, and so on, then all you need is a class or set of functions for retrieving table/view metadata and some methods for tying these classes together into a logical interface for the database.

Finally, we will have a class called RelGui (Listing 4) to handle the display and user interaction. RelGui can inherit from some existing classes, such as PEAR’s HTML_QuickForm and HTML_Table (and many additional helpful PEAR classes, if you want it to). The whole idea here is that your entire application can pass every database interaction through a dispatch file that looks essentially like the one you see in Listing 5, with the classes in the listings mentioned above. In addition, the code in Listing 6 ties them together as required files. Of course, as I mentioned earlier, PEAR’s HTML_Table and HTML_QuickForm are required.
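The idea of throwing a table name at a class and letting metadata drive the form can be sketched in a few lines. This is a simplified illustration, not the article's Relation or RelGui code: the metadata array only imitates the general shape reported by a call such as pg_meta_data() (column name mapped to details that include a 'type' entry), and the Domain_* class names and the field format are assumptions made up for this example.

```php
<?php
// Hypothetical sketch of metadata-driven form building.
// Class names and the returned field format are illustrative assumptions.

abstract class DomainBase
{
    // Presentation hints that a GUI layer could pick up.
    public function htmlAttributes()
    {
        return array();
    }

    // Default widget for any column type.
    public function widget()
    {
        return 'text';
    }
}

class Domain_int4 extends DomainBase
{
    public function htmlAttributes()
    {
        return array('size' => 11, 'style' => 'text-align: right');
    }
}

class Domain_bool extends DomainBase
{
    public function widget()
    {
        return 'checkbox';
    }
}

class Domain_default extends DomainBase
{
}

// Build one field description per column, choosing the class by type name.
function buildFields(array $meta)
{
    $fields = array();
    foreach ($meta as $column => $info) {
        $class = 'Domain_' . $info['type'];
        if (!class_exists($class)) {
            $class = 'Domain_default';   // unknown types fall back to plain text
        }
        $domain = new $class();          // variable class name instantiation
        $fields[$column] = array(
            'widget'     => $domain->widget(),
            'attributes' => $domain->htmlAttributes(),
        );
    }
    return $fields;
}

// Stand-in for metadata that would normally come from the database.
$meta = array(
    'id'     => array('type' => 'int4'),
    'active' => array('type' => 'bool'),
    'name'   => array('type' => 'varchar'),
);
$fields = buildFields($meta);
```

Changing how every integer column is presented then means editing Domain_int4 once, rather than touching each script that happens to display an integer.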
Briefly, here is what is going on in these files:

• When a browser request is made to Listing 5, at a minimum, the GET parameters for relname and mode need to be passed as part of the URL. This tells the program what relation to deal with, and how we want to interact with the relation. Valid modes (for now) are list, view, edit, and search. In “list” mode, the table will be queried, and the data output as rows.
• When the user clicks an “edit” link on one of the rows, the page re-draws itself in “edit” mode, using URL-encoded arguments passed in the link to provide enough information for the WHERE clause in SQL. A form is then drawn, based on the relation metadata (column definitions) and the values for each column in the fetched row. The user can then edit the row and submit the new values.

Listing 6

To look a little deeper, the code in Listing 5 first initializes a new Relation object, based on the relname parameter, and then passes a database connection identifier to it. The Relation object prepares the metadata for the given table/view. Then, the code initializes a new RelGui object and passes the Relation object to its constructor. The appropriate mode is passed as well, so that RelGui can automatically know how to display the data. If the mode is list, for example, then the page looks like Figure 1. If the mode is edit, RelGui automatically knows to create an editable form (as you can see in Figure 2).

Figure 1

Figure 2

Inside Listing 3, you will see that, after extracting the basic metadata with pg_meta_data(), we loop through the basic information to first create the related object for each domain, and then run some more sophisticated queries in PostgreSQL to get further metadata. We take advantage of PHP’s variable classname calling on lines 68 and 75 of Listing 4 to create our domain/datatype objects, thus allowing RelGui to derive hints on presentation, and other aspects of the GUI. Of course, it is clear that we could do this without creating the custom RelDomain classes, but then we would lose much of the ability to add extra behavior to different column types.

To further illustrate the idea of how we can affect any area of the system through this, uncomment $this->setCssStyle(array('color'=>'red')) in the rel_int class stored on line 43 of Listing 2. Now, any integer type column will display in red (the new style will be added to the fourth parameter of HTML_QuickForm’s addElement call in Listing 4). There are many more things that can be added this way, if you want to play around with this concept, including additional HTML attributes, Javascript event handlers, and so on.

What About MySQL?
You might be wondering how you can implement these concepts if you are
November 2004
●
PHP Architect
●
www.phparch.com
FEATURE
Object-oriented vs. Relational Part II
using MySQL. While it is true that MySQL offers far less functionality than other DBMSs (it lacks views, constraints, and domains), it is precisely our sort of framework that can help overcome some of these difficulties. For example, if you exercise a certain amount of discipline in table design, you can “fake” domains by naming columns after PHP classes you have created, and use the column names as you reverse-engineer your tables. This way, if you have a column named “zipcode”, you can assume that it corresponds to your rel_zipcode class in PHP. You could also compensate for the lack of views by creating a class that maintains a lookup table of names, associates them with stored queries, and uses the metadata functions in the Relation class to return all the column definitions for these “views”. Similarly, you could maintain a lookup list of constraints expressed in PHP which apply to certain tables, column names, and so forth. The same applies even more to SQLite and other lightweight database engines. It’s a win-win situation.

Garbage Collection: A Few Final Notes

Of course, a perfectly valid complaint about this sort of approach is “what about efficiency?” Don’t all these extra classes and methods present a serious performance problem? It’s possible. As a general rule, hard-coding can lead to increased performance, and that would apply to how we build our forms and interact with the database. As usual, however, this comes at the price of flexibility and manageability (or “developer performance”). Which is cheaper, computer performance or your performance as a programmer? If you want these advanced capabilities but are worried about performance, consider three things. First, the code presented here is not optimized; there should be plenty of ways, especially with PHP 5, to get better performance.
Second, there are several good PHP optimization systems, such as the Zend Optimizer and Zend Accelerator, as well as the ionCube PHP Accelerator and the PECL APC extension. Third, computer hardware is really cheap these days. It would probably cost less than a week’s wages to upgrade a server significantly in performance.

Therefore, performance concerns notwithstanding, in the end we have an interesting situation. We are using quite a few of the concepts of object-oriented programming (inheritance, encapsulation, aggregation, and composition) without once infringing on the database’s handling of business logic. Now I agree that this small coding example is incomplete (it is up to the reader, for example, to implement the insert/update/search modes), and the data presentation leaves much to be desired. In order to really create a seamless whole between the hardcore logic management in the database and the interactivity/presentation of the application layer, one would probably need ten
Listing 4: Continued from page 11
101         $colnum = 0;
102         $rownum = 0;
103         $this->gui->setHeaderContents($rownum, $colnum, ''); //empty space
104         $colnum++;
105         foreach($this->metadata as $colname => $colinfo)
106         {
107
108             $this->gui->setHeaderContents($rownum, $colnum, $colname);
109             $colnum++;
110         }
111
112         //create data rows
113
114         $rownum = 1;
115         while($row = $this->relation->fetch_row())
116         {
117             $colnum = 0;
118             $escaped_args = urlencode(serialize($row)); //arguments to uniquely identify a row
119             $this->gui->setCellContents($rownum, $colnum, "<a href=\"?relname={$this->relation->relname}&mode=edit&arguments={$escaped_args}\">edit</a>"); //link markup partly lost in extraction; the href prefix is an assumption
120             $colnum++;
121             foreach($row as $colname => $colval)
122             {
123                 $this->gui->setCellContents($rownum, $colnum, $colval);
124                 $colnum++;
125             }
126
127             $rownum++;
128         }
129
130         echo $this->gui->toHTML();
131     }
132     else
133     {
134         //in all other cases there should be only one row returned
135
136         $this->gui->addElement('header', '', "Edit {$this->relation->relname}");
137
138         $row = $this->relation->fetch_row();
139
140
141         foreach($row as $colname => $colval)
142         {
143
144             $html_attribs = array("value"=>$colval);
145
146             //additional HTML attribs are set here
147             if(is_array($this->data[$colname]->css_attribs))
148             {
149                 $html_attribs = array_merge($html_attribs, $this->data[$colname]->css_attribs);
150             }
151             if(is_array($this->data[$colname]->css_class))
152             {
153                 $html_attribs = array_merge($html_attribs, $this->data[$colname]->css_class);
154             }
155
156             //set max size attribute for textboxes
157             if($this->metadata[$colname]['len'] > 0) {
158
159                 $html_attribs = array_merge($html_attribs, array('maxlength' => $this->metadata[$colname]['len']));
160             }
161
162             $this->gui->addElement('text', $colname, $colname, $html_attribs);
163         }
164
165         $this->gui->addElement('submit', 'btnSubmit', 'Submit');
166
167         $this->gui->display();
168
169     }
170
171 }
172
173
174 }
times as much code as we have here. This article, however, provides a basic prototype of how to handle this sort of thing. Again, PEAR can come to our rescue in handling more sophisticated presentation and interaction needs, with classes such as the following:
• DB_QueryTool
• DB_Table
• DB_DataObject
• DB_DataObject_FormBuilder
• SQL_Parser
• HTML_Table_Matrix
• HTML_Select
• HTML_Table_Sortable
• Structures_DataGrid
With these classes and the concepts presented in this article, one could construct a very sophisticated set of tools, not only for an application framework, but even for an application development framework. Think about the possibilities of creating something that would be like a web-based version of commercial systems like Access, Paradox, or FileMaker Pro. In one sense, our tools could be even better, because the forms could correctly adjust themselves to changes in the database, which is something even the commercial tools mentioned above don’t really handle properly. Where this sort of approach can really shine, though, is in the creation of large enterprise applications, where there might be several development teams, database administrators, and sysadmins. Imagine being on the phone with the database administrator, saying “sure, go ahead and make that change, then see what the form looks like”, and picturing the look on his face as he hears you!
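The “fake domain” convention suggested in the MySQL section, combined with the variable class names used to create domain objects, could be sketched roughly as follows. This is only an illustration under stated assumptions: rel_default, the validate() method, and the domainFor() helper are hypothetical names, not part of the article’s listings; only the rel_zipcode naming comes from the text.

```php
<?php
// Hypothetical generic domain; stands in for the article's RelDomain classes.
class rel_default
{
    public $css_attribs = array();

    public function validate($value)
    {
        return true; // accept anything by default
    }
}

// Hypothetical domain for a column named "zipcode".
class rel_zipcode extends rel_default
{
    public function validate($value)
    {
        // Assumes five-digit US ZIP codes, purely for illustration.
        return preg_match('/^\d{5}$/', $value) === 1;
    }
}

// Pick a domain object from a column name using a variable class name.
function domainFor($colname)
{
    $class = 'rel_' . strtolower($colname);
    if (!class_exists($class)) {
        $class = 'rel_default'; // unknown columns fall back to the generic domain
    }
    return new $class();
}

$zip = domainFor('zipcode');
var_dump($zip->validate('33301'));          // bool(true)
var_dump($zip->validate('abc'));            // bool(false)
var_dump(get_class(domainFor('nickname'))); // falls back to rel_default
```

The fallback class keeps the reverse-engineering loop from failing on columns that have no dedicated domain, at the cost of silently treating them as plain text.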
About the Author
Rick Morris heads application development for MOS Imaging Systems in Miami, Florida (www.mos.us, www.netcompass.us). He lives near Fort Lauderdale, Florida with his wife, 2 children, and the world’s laziest cat.
To Discuss this article: http://forums.phparch.com/181
Developing a PHP - XML Generator
F E A T U R E
by Man-ping Grace Chau
Developing a UI that manipulates large data sets is difficult. Developing a UI dynamically upon user request is even more difficult. This article discusses the difficulties in developing a complex UI, as demonstrated by the XML/XSL generator made in the Infospheres Lab at the California Institute of Technology. It also shows how to generate JavaScript dynamically at run time and how to represent a complex DOM tree with a simple HTML data structure. Furthermore, the article discusses the optimization of input validation and demonstrates several error checking/correction algorithms. The application is developed in PHP5, with extensive use of the new XML library.
Introduction

Most XML generators are desktop applications: you can drag and drop beautiful GUI nodes to build a DOM document. The XML generator introduced here is a web application. All the richness that you can find in a desktop application is absent: the form only refreshes upon submit, a lot more code is needed to make a dynamic UI, HTML cannot support ‘complex’ data structures like DOM, the browser cannot maintain state, and so on. Would a web XML generator implemented in PHP be able to overcome all these constraints?

This generator was developed in the Infospheres Lab at the California Institute of Technology to help users generate state/transformation files for a crisis management system. The states inside the system are described by XML files. During an event, an agent changes its state in ways described by XSL files. For more details about the crisis management system, you can refer to the Infospheres web site at www.infospheres.caltech.edu.

Making this XML generator a web application has the advantage of being lightweight: the client only needs an Internet connection and a browser to use the service, so it is platform independent. The server can also keep track of all the files generated and save copies. As with all web applications, efficiency is always the first priority to ensure performance and reduce server workload.

Let’s understand how it works by looking at a fire accident scenario. The initial state is ‘at peace’: state name – normal; dead – 0; injury – 0; police – sleep; fire station – sleep; and so on. By putting all these values
into the generator (Figure 1), the corresponding XML file (Figure 2) is generated. A message will be delivered to the system when a fire breaks out, indicating the ‘fire agent’ should change its state. The transformation rule for this transition (e.g.: if the number of injuries stated in the message is greater than four, the state name should change to ‘critical’) is specified in the XSL generator (Figures 3 and 4) to generate the corresponding XSL file (Figure 5). The flowchart of the entire generator is shown in Figure 6. As you can see, different XSL files will be invoked upon different events; for example, handletimeout.xsl will be invoked during timeout.

UI that represents a changing dataset with complex data structure

Initially, there are only four nodes under the root node in the XML generator (Figure 7). A new input form is made when the user modifies this structure (Figures 8 and 9). The user interface is dynamically generated after every refresh/submit, based on the different ‘states’ (which consist of datasets). In addition, we have to maintain ‘state’ change, which is based on the previous ‘state’ and the user’s modifications. You should immediately think of MVC (Model/View/Controller)
REQUIREMENTS PHP: 5.x OS: Any Other: None
Code Directory: xmlgen
Figure 1
Figure 6
Figure 2
Figure 3
Figure 4
Figure 5
Figure 7
Figure 8
Figure 9
Figure 11
Figure 10
Figure 12
here: the Model encapsulates the ‘states’; the Controller interprets the user’s input, modifies the ‘states’ in the Model and chooses a new View; the View generates the UI based on the datasets in the Model (Figure 10). The main focus here is the datasets: in order to make an efficient Model/View/Controller, the dataset should be easy to manipulate. There are already many articles about the MVC design pattern; for more information, you can refer to the book Design Patterns by Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, or to Jason Sweat’s articles in php|architect.

By just looking at the form, it seems that appending a child is as simple as adding one more row to the table. It is, indeed, easy to manipulate a table, but this is only half of the picture. This table actually represents a DOM tree and the rows are the child nodes at different levels (you can see in Figure 11 how the node level is indicated in the input table). Thus, the dataset should be a DOM document, with its structure specified by the user.

The problem is that HTML does not support complex data structures like DOM, and there is no way to POST a DOM node. The ‘most complex’ data structure supported by HTML POST is an array. To solve this problem, we retrieve the user’s input in such a way that it can be mapped to the DOM document inside the Model. Even if you ignore the previous problem, imagine how complicated it would be to generate the HTML input table from a DOM document.

Therefore, let’s look at the problem in another way: as mentioned, the most complicated data structure supported by POST is an array; is it possible to switch the dataset from a DOM document to an array? This may seem quite difficult, but PHP makes it less so. PHP is a scripting language, which means no compilation is needed.
It supports many more powerful operations on arrays than most compiled languages, such as the ability to change the size of an array at run time and better support for associative arrays. An
XML structure like the one in Figures 12 and 13 can be represented by an array as shown in Figure 14. To store the nodes’ values for generating a new file or form, we use variables with names like $attribute_2 (here, the attribute of the second child of the root node), which can easily be mapped to the array. This is necessary, as the array cannot store the values and the structure at the same time (you can easily determine this by looking at the ‘parent node’ in the array). After a POST, the values can be retrieved simply as $HTTP_POST_VARS[$variable_name], where the variable names are derived from the array keys as we loop through the DOM-structure array.

This approach solves two problems at the same time without a complex algorithm: it is much easier to generate the input table from an array (that represents the structure) and variables (that store the node values) (Figure 15, HTML of input table), and both simple variables and arrays can be POSTed to notify the server of the changing structure and node content. The algorithm for generating the XML file and retrieving user input will be described in the following sections, when we discuss the View and Controller. As we will see, this ‘data structure’ also helps improve efficiency.

Arrays in PHP are more useful than in other languages, since the structure can be changed during runtime. You can easily devise a program that utilizes arrays, such as a computer purchasing application. Let’s apply the MVC design pattern again: the Model is initialized with all the computer models available; based on the Model, the View generates an input form to accept purchasing information for each computer. When the form is submitted, the Controller interprets the input and notifies the Model about this additional information. Then, based on the changed Model, a new View selected by the Controller generates the validation form or another more complicated input form.
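As a rough sketch of the structure-array idea described above, the following walks a nested array and derives one flat field name per node from the array keys, in the spirit of names like $attribute_2. The sample structure, the tag_ prefix, and the fieldNames() helper are assumptions made for illustration; they are not taken from the generator’s code.

```php
<?php
// Hypothetical structure array: nested arrays are parent nodes, null marks a leaf.
$structure = array(
    0 => array(0 => null, 1 => null),
    1 => null,
    2 => array(0 => array(0 => null)),
);

// Derive one flat field name per node from the path of array keys.
function fieldNames(array $info, $prefix = '')
{
    $names = array();
    foreach ($info as $key => $value) {
        $names[] = 'tag_' . $prefix . $key;
        if (is_array($value)) {
            // Children extend the prefix with the parent's key, e.g. "2_" then "2_0_".
            $names = array_merge($names, fieldNames($value, $prefix . $key . '_'));
        }
    }
    return $names;
}

print_r(fieldNames($structure));
// tag_0, tag_0_0, tag_0_1, tag_1, tag_2, tag_2_0, tag_2_0_0
```

Because every name is computed from the keys alone, the same loop can both print the input table and read the submitted values back, without searching.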
In the purchasing example, an array is used because it can simplify the mapping between the purchasing information/other more complex data
Figure 13
Figure 14
Figure 15
Figure 16
and each computer, as demonstrated in Figures 16 (array with computer models), 17 (array with buyer’s name), and 18 (array with extra hardware). This mapping need not be one-to-one; a many-to-one relationship is also possible by using nested arrays. This only works because we can change the array size at run time, since the amount of information is not known in advance. It helps a lot in terms of efficiency, because no searching is needed during mapping and most algorithms based on this ‘data structure’ run in time linear in the number of items in the arrays. The same approach is applied in making the XSL generator: the Model is initialized with the schema of the XML file and the corresponding transformation rules are stored in arrays as mentioned above. In dealing with a changing dataset, the MVC design pattern is the only choice for a web application. On the other hand, we need to take care of several things when we choose a way to store complex data: the efficiency of manipulating the data, the ease of generating a user interface from the data, whether HTML is able to transmit those data, the ease of performing
Figure 17
Figure 18
Listing 1
ing in the invoker (perhaps based on the input parameters) to decide what function to call on command. However, this exposes the command object interface and renders the Command pattern ineffective. Another approach is to pass in function handlers. It is easy to create a function handler in PHP because it is only a string containing the function name, and the function can be called with $object->$function_name() (an example is drawLevel in Listing 3, which prints the node level and is called by the tableData function in Listing 4). This does not involve any ‘complex concept’ like function pointers in C and C++. In this case, the invoker is completely ignorant of the command object. The main difference between rendering the file and the UI is that we have to perform extra error checking when the former is being handled. For example,
Figure 23
Figure 24
Listing 4
Listing 5
Listing 6
adding a node without appending it to any parent in a DOM tree is invalid. In this case, the node is valid when we merely look at the node content, but when we look at the tree structure as a whole, the node is an error. This kind of checking is unnecessary if we store the values in a DOM document, but we store them separately here. Thus, we need the functions getCNode (Listing 5) and getMNode (Listing 2). These functions are responsible for error checking and for rendering the child or parent node using the DOM functions in PHP5. The DOM functions throw exceptions for invalid expressions, thus ensuring the correctness of the node content. In addition, getCNode() and getMNode() return NULL when reading a node without a tag name; this causes all the node’s children to be ignored, because they do not have a parent. With the checking performed by the Controller when appending a number of children to the tree, the validity of the file can be guaranteed.

However, there are cases where checking cannot be accomplished by just looking at the current value; other entries also need to be taken into account. This is inefficient, because some values may be visited multiple times for checking and the running time may become exponential. If we apply this method to the XSL generator, it does not even work: for example, if we try to identify whether a node is a value-of node or a text node by looking at the node value (value-of node in Figure 21; text node in Figure 22). This is error prone, because an expression like //level+1 seems to be a value-of node, but may actually be a text node (using a strange expression). The most robust solution is to require the user to specify the types and store them in another array, which again can be mapped to the original set of data easily. This approach can be further applied to mark whether a field is used, whether the node is a child or parent, and so on.
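The two checks described above (skipping entries that cannot form a node, and keeping user-declared types in a parallel array) might look roughly like this with PHP 5’s DOM extension. It is a sketch under assumptions: the sample arrays, the agent root element, and the buildNode() helper are hypothetical and do not reproduce getCNode() or getMNode().

```php
<?php
// Hypothetical parallel arrays, all keyed by the same node number.
$tags   = array(0 => 'state',  1 => 'level',     2 => '',       3 => '1bad');
$values = array(0 => 'normal', 1 => '//level+1', 2 => 'orphan', 3 => 'x');
$types  = array(0 => 'text',   1 => 'value-of',  2 => 'text',   3 => 'text');

define('XSL_NS', 'http://www.w3.org/1999/XSL/Transform');

// Returns a DOM element, or null when the entry cannot form a valid node.
function buildNode(DOMDocument $doc, $tag, $value, $type)
{
    if ($tag === '') {
        return null;                       // no tag name: skip this entry
    }
    try {
        $node = $doc->createElement($tag); // throws DOMException on an invalid name
    } catch (DOMException $e) {
        return null;
    }
    if ($type === 'value-of') {
        // The user declared this value to be an expression, so no guessing is needed.
        $valueOf = $doc->createElementNS(XSL_NS, 'xsl:value-of');
        $valueOf->setAttribute('select', $value);
        $node->appendChild($valueOf);
    } else {
        $node->appendChild($doc->createTextNode($value));
    }
    return $node;
}

$doc     = new DOMDocument('1.0', 'UTF-8');
$root    = $doc->appendChild($doc->createElement('agent'));
$skipped = array();

foreach ($tags as $key => $tag) {
    // The type is looked up by key, so each entry is visited exactly once.
    $node = buildNode($doc, $tag, $values[$key], $types[$key]);
    if ($node === null) {
        $skipped[] = $key;
        continue;
    }
    $root->appendChild($node);
}

$doc->formatOutput = true;
echo $doc->saveXML();
```

Here entries 2 and 3 are skipped (an empty tag name and an invalid one), while entry 1 is emitted as a value-of node only because its type says so.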
Although more resources are needed, this ensures that no node is visited more than once for error checking. In addition, it helps simplify the algorithm for building the UI/file, as all related information can be found easily without searching (Listing 6).

Input processing

The Controller is responsible for handling the user’s input according to the MVC design pattern. In order to design an efficient Controller, we must first understand what data is transmitted to the server in a PHP web application. The GUI in a desktop application usually has event handling to invoke programs, for example in MFC or Windows Forms in .NET. However, in a web application, there is no way that you can ask a text box to invoke a function on the server. The only way for users to communicate with the server is by submitting the whole form, which means the server will be sent all
Figure 25
Figure 26
Figure 27
Figure 31
Figure 32
Figure 28
Figure 29
Figure 30
the GUI information. However, the server usually does not want that much: fields that have not been modified do not contain any useful information. Sending all of them to the server wastes bandwidth and adds extra workload to the server, as it needs to search for specific information. If the data is sent in this way, it is difficult to develop an efficient Controller, as the information to be processed will be overwhelming. Thus, before we design the Controller, let’s see what we can do to restrict the amount of data sent.

Let’s look at the XSL generator again (Figure 23). When we click “Go!”, we only want to submit the ‘number of items’ for that particular table. Only that number is needed in drawing a new table, so sending any more data would be a waste. The only solution is to make a form that only contains the ‘number of items’ and the “Go!” button; thus, there will be multiple forms in a single page. However, this method does not work when the input fields are interleaved, as the contents of a single form must be grouped together. This is the case in the XSL generator: the input fields of XSL node values are interleaved with the ‘number of items’ text boxes (Figure 24), so the input fields for XSL nodes cannot be put in the same form. We can, however, use JavaScript to group these entries. In this example, these input fields actually do not belong to any form. The values in the fields are extracted by JavaScript during submit and placed into a separate form, in the corresponding hidden fields (Figure 25). One caution: the JavaScript fails if it operates on nonexistent fields. Thus, the JavaScript must also be dynamically generated if the form is dynamically generated; we discuss this in the next section. We are able to limit the amount of data sent in the above example, so the Controller does not have to search for the information.
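The hidden-field technique above might be generated from PHP along these lines. This is a minimal sketch, not the generator’s code: the collectorForm() function, the h_ id prefix, the field ids, and the draw.php action are all hypothetical, and json_encode() requires PHP 5.2 or later.

```php
<?php
// Emit a form of hidden fields plus the JavaScript that fills them on submit.
// $fieldIds must list the ids of the free-standing inputs rendered on this page,
// so the script never touches a field that does not exist.
function collectorForm(array $fieldIds, $action)
{
    $html = '<form method="post" action="' . htmlspecialchars($action)
          . '" onsubmit="return collectFields();">' . "\n";
    $js = "function collectFields() {\n";

    foreach ($fieldIds as $id) {
        $hiddenId = 'h_' . $id;
        $html .= '  <input type="hidden" name="' . htmlspecialchars($id)
               . '" id="' . htmlspecialchars($hiddenId) . '">' . "\n";
        // json_encode() quotes the ids safely for use inside JavaScript.
        $js .= '  document.getElementById(' . json_encode($hiddenId) . ').value = '
             . 'document.getElementById(' . json_encode($id) . ").value;\n";
    }

    $html .= '  <input type="submit" value="Go!">' . "\n</form>\n";
    $js   .= "  return true;\n}\n";

    return $html . "<script type=\"text/javascript\">\n" . $js . "</script>\n";
}

echo collectorForm(array('items_3', 'node_3_0'), 'draw.php');
```

Since the script and the hidden fields come from the same list, they cannot drift apart when the set of inputs changes between requests.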
However, in some cases, most data in the form are needed and therefore it is wasteful to write so much extra code to restrict the amount of data sent. In these situations, we need to figure out how to process the data efficiently. Let’s look at the XML generator as an example. When we append children to a node, all data in the form except the “number of children to append” for other nodes are needed (Figure 26). The user’s input will be in one of the “number of children to append” fields and we need to find it. One way is to walk through all of them and search for the one that contains a number. The running time here is linear in the number of nodes. Can it be more efficient? Yes, if we make use of the data sent. The button data is also sent at submission time. Interestingly, when a button name is specified as an array (e.g. button[0][0][1][0]), the submitted data is an array as well (Figure 27). We can find out what button the user has pressed and retrieve the corresponding “number of children” without visiting all the input fields. Listing 7 shows how to find the node and modify the “DOM array tree”. The running time becomes linear in the depth of the tree, which is much more efficient. In addition, we perform error checking to discover whether the user has entered any malformed input (like negative numbers or letters). The original ‘DOM array’ remains untouched in such a case.

Listing 7
Listing 8
Listing 9
Listing 10

Dynamic UI

As mentioned in the last section, JavaScript can help restrict the amount of data sent to the server. In addition, it can produce a truly dynamic user interface. With JavaScript, the change in user interface is performed by the browser instead of the server. More specifically, we can specify a function and ‘relate’ it to a control so that the browser will invoke the function when that control is triggered. For more detail see http://www.w3schools.com/, where you can find lots of good tutorials on JavaScript. Let us examine the XSL generator to see how JavaScript can produce a dynamic UI. This is the expression generator (Figure 28). It can generate simple expressions as shown in Figure 29, or
Listing 11
Listing 12
complex expressions as shown in Figure 30. The dynamic UI part is in adding and removing terms with parentheses, changing the type of expression (Figure 31) and switching between self-defined values and system-provided variables (Figure 32). In this example, not only is a dynamic UI rendered, but bandwidth is also saved and error checking is performed: instead of sending the menu values one by one, they are combined into a string, and the expression is checked for validity before being sent to the server, by the toString function shown in Listing 8.

However, there is no formal way to generate the JavaScript using PHP at runtime. We need to do so because the content of our client-side script cannot be fixed when the data is encapsulated in the PHP script. In this example, the number of input fields that the JavaScript needs to operate on is not fixed until the XML file is parsed. If the scripting mechanism is carelessly implemented, everything turns into a big bowl of spaghetti code, and changing either the server- or client-side scripts becomes extremely difficult. Therefore, let’s solve it step by step.

The first thing to do in separating the JavaScript code is to encapsulate it into PHP functions and put it into a single class, which I called CommonJS. Some functions can be hardcoded (Listing 9), while others need to get parameters from the function callers (Listing 10). Next, we separate the code and define a class to call the functions, called Expression here. Expression gets all the parameters required by the CommonJS functions from different parts of the system, and calls the functions from CommonJS. Thus, the CommonJS interface is only exposed to Expression. The Bridge pattern is again applied: the implementation part is in CommonJS and the logic part is in Expression; this way, changing either of those will be easy (in most cases, we are just changing the implementation part). So far, so good.
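The split between a class that holds the JavaScript and a class that supplies its parameters could be sketched as follows. The class names CommonJS and Expression come from the article, but every method, the generated JavaScript, and the field ids below are assumptions made for illustration; they are not the contents of Listings 9 and 10. json_encode() requires PHP 5.2 or later.

```php
<?php
// Hypothetical stand-in for CommonJS: each method returns JavaScript source.
class CommonJS
{
    // Hardcoded script: needs no parameters.
    public function trimHelper()
    {
        return 'function trimValue(s) { return s.replace(/^\s+|\s+$/g, ""); }' . "\n";
    }

    // Parameterized script: the field ids are only known at run time.
    public function expressionFor(array $fieldIds)
    {
        return "function toExpression() {\n"
             . '  var ids = ' . json_encode(array_values($fieldIds)) . ";\n"
             . "  var parts = [];\n"
             . "  for (var i = 0; i < ids.length; i++) {\n"
             . "    parts.push(trimValue(document.getElementById(ids[i]).value));\n"
             . "  }\n"
             . "  return parts.join(' ');\n"
             . "}\n";
    }
}

// Hypothetical stand-in for Expression: it knows the parameters, not the JavaScript.
class Expression
{
    private $js;
    private $fieldIds;

    public function __construct(CommonJS $js, array $fieldIds)
    {
        $this->js       = $js;
        $this->fieldIds = $fieldIds;
    }

    public function script()
    {
        return "<script type=\"text/javascript\">\n"
             . $this->js->trimHelper()
             . $this->js->expressionFor($this->fieldIds)
             . "</script>\n";
    }
}

$expr = new Expression(new CommonJS(), array('term_0', 'op_0', 'term_1'));
echo $expr->script();
```

Changing how the JavaScript is written only touches CommonJS, while changing which fields take part only touches the caller, which is the separation the paragraph above is after.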
However, there are two problems:
• What happens if there is more than one object that wants to call the JavaScript functions? These objects may call the same function twice, passing different parameters, and the JavaScript will crash.
Listing 13
• How can Expression get the parameters from the whole system? If this is implemented carelessly, Expression may know about the interfaces of most parts of the system, and we would need to rewrite Expression whenever more data is added.

The solution to the first problem consists of introducing a new class, JSDrawer, to be the only function caller of CommonJS (instead of Expression). It retrieves function handlers and parameters from different objects and merges them before calling the functions from CommonJS (Listing 11). All functions are called exactly once. The function handlers (Listing 12) are again only strings that contain the function names. In this case, no matter how many objects need to call the JavaScript functions, it can be done through JSDrawer, as long as these objects have the same functions to provide the function handlers and parameters.

The solution to problem number two is to require the system to put data in a centralized object, named DataSet in my case. This way, Expression will be ‘fed’ by DataSet for the parameters and remain unaware of the rest of the system. In the previous case, if Expression asks the system ‘explicitly’ for data, it has to be rewritten when more data from other parts of the system is included. However, this is not necessary with DataSet; storing all data collectively in this structure also allows passing the data from page to page by serialization and session mechanisms (Listing 13), and
Listing 1: Continued from page 21

            $new = $this->file.",".$string;
            file_put_contents("files/agents.dat", $new);
        }
        return $this->statement;
    }

    /**
     * The recursive function to append the children
     **/
    function xmlBuild($info, $bean, $mother, &$motherNode)
    {
        //$info is the structure-array
        //$bean contains XML data
        //$mother is the strings indicating node level
        //$motherNode is the parent DOM node
        foreach ($info as $key=>$value)
        {
            if (is_array($value))
            {
                $this->domWriter->setData($bean, $mother, $key, false);
                $result = $this->domWriter->getMNode();
                if ($result != NULL)
                {
                    $motherNode->appendChild($result[0]);
                    $this->expression->setData($result[1], $mother, $key, false);
                    $this->statement .= $this->table->rowStart();
                    $this->statement .= $this->table->tableData($this->expression, "drawLevel", 1, 1);
                    $this->statement .= $this->table->tableData($this->expression, "drawTag", 1, 1);
                    $this->statement .= $this->table->tableInfo("N/A", 1, 1, NULL);
                    $this->statement .= $this->table->tableData($this->expression, "drawAttr", 1, 1);
                    $this->statement .= $this->table->tableData($this->expression, "drawAttrV", 1, 1);
                    $this->statement .= $this->table->rowEnd(true);
                    $newMother = $mother.$key."_";
                    $this->xmlBuild($value, $bean, $newMother, $result[0]);
                }
            }
            else
            {
                $this->domWriter->setData($bean, $mother, $key, true);
                $result = $this->domWriter->getCNode();
                if ($result != NULL)
                {
                    $motherNode->appendChild($result[0]);
                    $this->expression->setData($result[1], $mother, $key, true);
                    $this->statement .= $this->table->rowStart();
                    $this->statement .= $this->table->tableData($this->expression, "drawLevel", 1, 1);
                    $this->statement .= $this->table->tableData($this->expression, "drawTag", 1, 1);
                    $this->statement .= $this->table->tableData($this->expression, "drawValue", 1, 1);
                    $this->statement .= $this->table->tableData($this->expression, "drawAttr", 1, 1);
                    $this->statement .= $this->table->tableData($this->expression, "drawAttrV", 1, 1);
                    $this->statement .= $this->table->rowEnd(true);
                }
            }
        }
    }
?>
introduces one more way of storing data: storing the serialized DataSet in a file.

Review

There are many more constraints in implementing the user interface for a web application than for a desktop application. As we walk through examples of how to display complex data from changing datasets, how to generate a UI, how to process users’ input and how to produce a dynamic UI, we see that the approaches are completely different. Besides functionality, performance and flexibility are also important: the server will be easily overloaded if the code is inefficient, and the UI of a web application changes often, so the cost of redevelopment should be minimized. The XML/XSL generator can be found online at http://unity.cs.caltech.edu/~mpchau13/start.php. It is still in the prototype stage, and you are, of course, welcome to report any bugs you encounter.

Acknowledgements

I would like to thank Prof. K. Mani Chandy and Dr. Daniel M. Zimmerman for their unfailing guidance and support, and for the freedom to conduct the research. Thanks also to Mr. Elliott Michael Karpilovsky, Mr.
November 2004
●
PHP Architect
●
www.phparch.com
Jonathan Lurie, Mr. Shaun Lee and Mr. Siu-on Chan for their valuable comments and advice. This project is financially supported by the Hong Kong Chapter of the Caltech Alumni Association.
About the Author
Grace is a year-three undergraduate student at the Chinese University of Hong Kong, majoring in information engineering. She earned her expenses by developing and maintaining over ten systems in PHP for the whole school, including all kinds of attendance record systems, as well as school project databases. This XML generator is part of her Summer Undergraduate Research Fellowship at the California Institute of Technology.
To Discuss this article: http://forums.phparch.com/182
PHP 5 & XML
FEATURE
by Ilia Alshanetsky
When most people talk about PHP 5, they tend to concentrate on the many improvements in the way it works with objects, which is certainly a significant component of the new release. This often leaves equally important improvements, such as the changes made to the way PHP works with XML, neglected. This article’s goal is to familiarize you with how XML is handled by PHP 5 and how these changes make working with XML much easier.
REQUIREMENTS
PHP: 5.x
OS: Any
Other: N/A
Code Directory: php5xml

One of the changes introduced in PHP 5 in regard to XML handling is a change in the underlying library that is used to parse XML documents. In PHP 4, the Expat library was used to parse XML documents when the standard XML parsing extension was used. To ensure that the XML extension would be enabled by default, the Expat library was bundled with PHP, meaning that you’d always have the necessary dependencies. The fact that the library had a PHP-friendly license and was quite small helped cement this decision. Furthermore, new versions of the library were few and far between, making the job of ensuring that the latest stable release was always bundled with PHP fairly simple.

While this convenience was nice, it did cause a number of problems. First of all, bundled libraries are built into PHP directly during the compilation process, meaning that, whenever you wanted to upgrade the library, you would need to recompile your entire PHP build. Moreover, if PHP was used as a module (with Apache, for example), the library could end up conflicting with another Expat library variant used by another piece of software, such as an Apache 2 server. You could compile PHP against an external Expat library by specifying the --with-expat-dir flag, but very few people actually did so.

Another problem with Expat is that its development cycle was quite slow and new functionality was very rarely, if at all, added, meaning that the capabilities of the library were somewhat limited. Other XML extensions, such as DOMXML, used a different XML parsing library, Libxml2, which meant that you could end up having PHP use two libraries that duplicate each other’s functionality, which is never a good idea.

Libxml2: The Swiss Army Knife of XML Libraries

With the impending release of PHP 5, in order to unify the underlying XML parsers, all extensions that work with XML were made to use the Libxml2 library, which is an XML C parser and toolkit developed for the Gnome project. This library is a very popular XML parser and virtually every modern *NIX distribution has it installed by default. It is being very actively developed and, consequently, supports all of the new and advanced XML features, such as DTD validation, XPath, HTML support, and so on.

After much talk on the PHP development list, the decision was made not to bundle the library, since it is quite burdensome (as a source package, at least, it’s almost as large as PHP itself) and its frequent releases would make bundling it a maintenance nightmare. However, because the library can be found on virtually any system (for Win32 builds, the libxml2.dll file is included with PHP), this is not much of a problem and actually allows easy upgrades of the library without having to recompile PHP, since the library is linked dynamically.

Libxml2 happens to support an Expat-emulation mode, which makes it behave in an almost identical manner to the original Expat library. This means that, for the most part, the behavior of the XML extension based on Expat in PHP 4 is not different from the one in PHP 5. The new library is significantly faster in parsing XML documents, however, especially when DOM trees are built. This said, when using SAX, in some cases the original Expat library will be a little faster, which is why people who only use SAX can still compile the XML extension in PHP 5 against the Expat library by specifying the --with-libexpat-dir configuration directive. Keep in mind, however, that, since the Expat library itself is no longer bundled, if you choose to go this route, you will need to make sure that it is separately installed on your system before you configure and compile PHP.

The Traditional Way

The “oldest” way of parsing XML documents via the XML extension, using the xml_parser_* family of functions, didn’t change much between PHP 4 and 5. The only changes are caused by slight incompatibilities between the underlying XML libraries, which, for the most part, are fairly minor and hard to trigger and will not, therefore, affect most users.

This parsing methodology uses the Simple API for XML (SAX), which is an event-driven interface in which the parser invokes one of several methods supplied by the caller. “Events” include recognizing an XML tag, finding an error, encountering a reference to an external entity, or processing a DTD specification.

The advantage of this particular approach is that it does not require you to have the complete document in memory. This is particularly good for large XML files that would require immense amounts of memory if they were loaded in full. It also allows you to parse XML documents as you receive them from external sources (such as an HTTP feed) rather than having to wait for the entire document to be retrieved before getting to the parsing stage.
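As a minimal sketch of this event-driven model, the code below feeds a small document to the parser in 16-byte pieces, the way data might arrive from a remote feed. The TagCollector class, its method names and the sample document are assumptions made for illustration; only the xml_* functions come from the XML extension. The handlers keep their state in class properties and collect only leaf tags.

```php
<?php
// Hypothetical collector class: its methods serve as the SAX event handlers.
class TagCollector
{
    public $final = array();   // finished tags, keyed by tag name
    private $tmp = null;       // the tag currently being collected

    public function start($parser, $name, $attr)
    {
        // A new tag begins: remember its name and attributes, start with empty data.
        $this->tmp = array('name' => $name, 'attr' => $attr, 'data' => '');
    }

    public function data($parser, $text)
    {
        // Append rather than assign: text may be delivered in several pieces.
        if ($this->tmp !== null) {
            $this->tmp['data'] .= $text;
        }
    }

    public function end($parser, $name)
    {
        // Store the tag only if it is the one being collected (a leaf tag).
        if ($this->tmp !== null && $this->tmp['name'] === $name) {
            $this->final[$name][] = array(
                'attr' => $this->tmp['attr'],
                'data' => $this->tmp['data'],
            );
        }
        $this->tmp = null;
    }
}

// Sample document, assumed for this sketch.
$xml = '<?xml version="1.0"?><list><item id="1">One</item><item id="2">Two</item></list>';

$collector = new TagCollector();
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false); // keep tag names as written
xml_set_element_handler($parser, array($collector, 'start'), array($collector, 'end'));
xml_set_character_data_handler($parser, array($collector, 'data'));

// Feed the document in small pieces, as if it were arriving over the network.
$chunks = str_split($xml, 16);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    // The third argument tells the parser whether this is the final piece.
    if (!xml_parse($parser, $chunk, $i === $last)) {
        throw new Exception(xml_error_string(xml_get_error_code($parser)));
    }
}

print_r($collector->final);
```

At no point does the script hold a parsed tree of the whole document; it keeps only the tag in progress and whatever the handlers chose to store.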
Because the parser does not work with the entire document, it cannot perform parsing optimizations that result from knowing the document’s structure. This means that, for small XML documents that do not require much memory, using DOM to create a DOM tree would actually be faster.

To handle the SAX events in PHP, you need to specify a function or a class method that will be used to handle each particular event. At minimum, you need to handle three events: tag start, tag end and tag data, as shown in Listing 1.

The parsing process does not return any data; the job of collecting data and returning it in some way is left to the event handler functions. Those functions will need to determine what tag they are operating on and what event is being processed, and use global variables or class properties to store the data while the document is being parsed. You can see an example of this in Listing 2. Here, the tag_start_start function creates a temporary array variable and stores the name of the current tag, including its attributes (if available), into it. It also initializes the third array element, which will later be used to store the tag’s data. If the tag contains any attributes, they will be stored inside the $attr variable as an array in which every element constitutes an attribute whose name is stored in the key. If no attributes are available, the variable will contain an empty array.

Listing 3 shows the data handling function, which simply appends any data passed to it to the data element of our temporary array. It is important to append the data, rather than to assign it directly, since if the tag’s data is very large or contains new lines it may be returned in portions. Thus, appending ensures that all of the data for the tag will be fetched, rather than just the last portion.

When the end of a tag is encountered, the XML parser will call our tag_end_func function, shown in Listing 4. Here, we place the data inside the $final global variable, which at the end of the parsing process will contain the data from the XML document. The tag name is used as the key of the return array and itself points to an array of values, since a tag is likely to occur more than once. The value stored contains all the attributes of a tag and their values. Once the data has been stored, the temporary $tmp_data array is destroyed, so that its values will not conflict with the next tag.

When Simple and XML Meet

As you can imagine, this does not make for a very simple implementation, which is why very few developers actually like SAX. Unfortunately, in PHP 4 the alternative, the DOM extension, had, until very recently, a variety of serious problems, which prevented it from being a true alternative to XML/SAX. Additionally, because PHP code is heavily involved in the parsing process, SAX does not make for a very fast parser, since every tag results in at least three PHP function calls. So, it is hardly surprising that developers spent the time needed to improve this process for PHP 5!

One of the new extensions designed specifically for fast and simple parsing of XML documents is SimpleXML. This extension relies heavily on the new object-oriented features of PHP 5 to make the process of parsing XML documents as trivial as possible. Unlike the XML extension, this extension uses the Document Object Model (DOM) approach to parsing documents. This means that, in order to process a document, the entire contents of the document must be retrieved, and only then will a DOM tree representing the document be created in memory.
This is quite fast for small documents, which don’t require too much memory and can be quickly retrieved even when they are stored on remote servers. However, when dealing with an XML document whose size is in the tens of megabytes, this may become quite slow due to the significant memory requirements. Fortunately, not many people encounter such large XML files, so for the most part this is a perfectly acceptable limitation.
Parsing a document with SimpleXML could not be any simpler (no pun intended). As you can see in Listing 5, it only takes a single function call, which takes a single parameter that contains the XML data to be parsed. This data can come from a local or remote file or even a text string. The return value of SimpleXML’s simplexml_load_*() functions is an object that represents the supplied data. You can now access that data as if it were a regular PHP object, with each XML field stored in a separate property of matching name. The attributes become the array keys of the XML field to which they belong, as shown in Listing 6.

In the event that there are multiple instances of a particular field, you can use the foreach() language construct to iterate through the XML elements as if they were an array and retrieve the data in the form of an object. It should be noted that while foreach() iterations will work, you cannot use most of PHP’s array functions on objects created by SimpleXML.

In the example shown in Listing 7, another OO feature of PHP 5 is being used: while $a is actually an object, SimpleXML implements __toString(), which allows you to “print” the object and get the value of the data stored inside it. In PHP 4, this would output some useless text, completely irrelevant to the needed output. This is one of the reasons why SimpleXML cannot be back-ported to PHP 4, where the Zend Engine 1 simply lacks the functionality this new extension needs.

Another neat feature of SimpleXML is the ability to access a particular element of the XML document without having to iterate your way to it. In fact, this can be done in more than one way. The simplest and fastest approach consists of using indirect object references to simply access the desired element from the DOM tree. The example that you can see in Listing 8 will make PHP print the value of the second instance of a repeated element, which is contained within a parent element that is itself a child of another element. SimpleXML is intelligent enough to figure out that, when the array key is numeric, I am actually attempting to access a particular element rather than one of its attributes.

The one thing to be aware of is that, if you access a non-existent element, SimpleXML will not print any warning message; your output will simply be NULL. Therefore, in order to verify that the element exists, you should always use the empty() construct. Even if, for whatever reason, the value of the element is blank, you’ll still have an object as the result, meaning that empty() will return false, indicating that the element that you’ve tried to access does in fact exist.

XPath

Another way to retrieve a particular element from a DOM tree is by using the XPath query language that SimpleXML supports. XPath is a language, ratified by the W3C, for addressing parts of an XML document. This language allows accessing an XML document as if it were a file system and each element were either a file or a directory. It also supports searching for a particular element inside the document regardless of its location.

The example in Listing 9 will access the text contents of the second element; please note that in XPath, unlike what happens with PHP, the element count starts from one and not zero. The output of the xpath() method is going to be an array of matching entries, unless nothing can be matched, in which case a Boolean value of False will be returned. Each element of the array is an object containing the value of the matching element; in this case, the value will be “Two”. As an alternative, you can try searching an XML document for an element whose attribute value matches the one you are looking for, as shown in Listing 10.
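The access styles described in this section can be sketched together as follows. The sample document and the variable names are assumptions made for illustration. The existence check uses isset(), which is a choice of this sketch rather than the empty() test suggested above.

```php
<?php
// Sample document, assumed for this sketch.
$xml = '<?xml version="1.0"?>
<doc>
  <section>
    <item id="1">One</item>
    <item id="2">Two</item>
    <item id="3">Three</item>
  </section>
</doc>';

$sx = simplexml_load_string($xml);

// Repeated elements can be iterated as if they were an array.
$values = array();
foreach ($sx->section->item as $item) {
    // Casting to string yields the element text; a string key reads an attribute.
    $values[] = (string) $item . ':' . $item['id'];
}

// A numeric key selects an element by position (zero-based in PHP).
$second = (string) $sx->section->item[1];

// A string key on the same element selects one of its attributes.
$secondId = (string) $sx->section->item[1]['id'];

// isset() reports whether an element is present without raising a warning.
$hasMissing = isset($sx->section->nothing);

// XPath positions are one-based, so item[2] is also the second item.
$byPosition = $sx->xpath('/doc/section/item[2]');

// Search anywhere in the document for an item whose id attribute equals 3.
$byAttribute = $sx->xpath('//item[@id="3"]');

echo implode(', ', $values), "\n";
echo $second, ' ', $secondId, ' ', (string) $byPosition[0], ' ', (string) $byAttribute[0], "\n";
```

Direct property access and the XPath query reach the same second item; the direct form avoids a search through the document when the location is already known.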
In this case, SimpleXML will use libxml2 facilities to search for an element inside the document and isolate the element (or elements) that contain the id attribute with a value equal to 3. The one thing to remember when using the XPath query language is that searching through an XML document, especially a large one, is not a very fast operation. Therefore, if you know the location of the data, it would be far more efficient to reference it directly from the DOM tree object.

Aside from data retrieval, SimpleXML can also be used to modify existing XML documents. However, you can only modify existing elements and not add new ones. The process of modifying the data is quite simple; all you do is assign an alternate string or integer value to the property (element value) or the array key (attribute). The modified document can then be returned as an XML string, as shown in Listing 11.

The Document Object Model Extension

While SimpleXML cannot add new elements to an existing document, you can convert SimpleXML objects to DOM-extension objects by calling the dom_import_simplexml function. The DOM extension can then be used to perform any number of modifications to the converted object and, if necessary, convert it back to a SimpleXML object via the simplexml_import_dom function. This is an advantage of having both extensions use a DOM tree to represent the XML document while utilizing the same underlying library.

This brings us to another new XML extension in PHP 5: the DOM extension, which is a complete rewrite of the DOMXML extension that was available in PHP 4. The new extension, like SimpleXML, makes use of the new PHP 5 OO features and addresses some of the limitations found in the older code, as well as introducing new functionality. Most importantly, it suffers from none of the problems that plagued the DOMXML extension in PHP 4, such as memory leaks. The new interface makes it much simpler to work with XML documents, which you can create, modify, join and even validate. The extension can also work with HTML documents, even if they are not well formed, which gives you the ability to easily parse HTML without writing a custom parser.

While the DOM extension can be used to parse documents, the process is not quite as simple as it is with SimpleXML, which is why interoperability with the latter is a very handy feature. The extension is primarily intended for manipulating or creating XML documents rather than just parsing them.

As you can see in Listing 12, the first step when working with the DOM extension is the instantiation of the domDocument object. After that, you can load an existing XML document from a string by calling the loadXML method or from a file with the load method. If you are working with HTML, you should use different methods, since HTML does not use syntactical rules as strict as XML’s (XHTML does, but then again you can parse an XHTML document as if it were written in XML, of course). The methods you want to use in this case are loadHTML() and loadHTMLFile(), for loading data from strings and files respectively. The loading process is not only going to retrieve the document, but also parse it and generate a DOM tree that can then be used to access the underlying data.

Speaking of the data, this is fetched by calling the getElementsByTagName method, which takes an element name and returns a DOMNodeList object that can be iterated through, just like in SimpleXML. The variables resulting from the iteration are DOMElement objects whose properties contain information about the element they represent. For example, the value of the element is available through the nodeValue property, while other properties offer information about the children, siblings and parents of the current node, its namespace information, and so on.

If an element contains an attribute, the latter can be retrieved by calling the getAttribute method of the DOMElement object. This method will always return a string containing the value of the specified attribute. If the attribute does not exist, an empty string will be returned, so you should check the output with empty() if you want to validate an attribute’s availability. Alternatively, you can use the hasAttribute() method of DOMElement to check if an attribute is available (Listing 13).

The DOM extension also allows access to an XML document via iteration through the internal DOM tree that was created based on its contents. This can be very useful if you want to build a visual representation of the structure and data of a particular XML document, which can then be used to determine what information is available, just like I did in Listing 14.

If traversing the DOM tree is not your cup of tea, the DOM extension, like SimpleXML, also supports the XPath query language, which provides a much simpler mechanism for accessing XML data in a random fashion. The neat thing about XPath in DOM is that it can be used to access HTML elements in documents that are not well formed, which gives you a very simple shortcut to getting data from HTML pages.

As you can see in Listing 15, unlike what happens with SimpleXML, the DOM extension requires a DomXpath object to be instantiated based on the DOM tree object in order to be able to run XPath queries. The queries themselves are executed by calling the query method of this object, which takes the standard XPath query as its sole parameter.
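A minimal sketch of reading data with the DOM extension, covering both element lookup and an XPath query against lenient HTML. The two sample documents and the variable names are assumptions made for illustration; the HTML string stands in for a page that would normally be fetched from a server.

```php
<?php
// Load an XML string and build the DOM tree (sample data assumed).
$dom = new DOMDocument();
$dom->loadXML('<?xml version="1.0"?><xml><a id="1">One</a><a>Two</a></xml>');

// getElementsByTagName() returns a DOMNodeList that foreach can walk.
$found = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    $found[] = array(
        'value' => $node->nodeValue,             // text content of the element
        'hasId' => $node->hasAttribute('id'),    // explicit availability check
        'id'    => $node->getAttribute('id'),    // empty string when absent
    );
}

// HTML that is not well formed: the <p> and <br> tags are never closed.
$page = '<html><head><title>Example page</title></head>'
      . '<body><p>Unclosed paragraph<br></body></html>';

$html = new DOMDocument();
@$html->loadHTML($page);   // @ hides the parser warnings about sloppy markup

// XPath queries need a DOMXPath object built from the document.
$xpath = new DOMXPath($html);
$titles = $xpath->query('//title');        // DOMNodeList of matches
$title = $titles->item(0)->nodeValue;      // first (and only) match

print_r($found);
echo $title, "\n";
```

The second element has no id attribute, so getAttribute() returns an empty string for it while hasAttribute() returns false, which is the difference the text above points out.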
In this case, I want to search for a title element anywhere on the page and retrieve its text value. The result is another object, of type DOMNodeList, which contains the items matching the query. For simplicity’s sake, I indirectly reference the title of the page, which will reside in the first element; had there been more than one title element, I could have iterated through the node list with foreach(), or assigned the object returned by query() to a variable that would be used elsewhere. The resulting item is a DOMText object, with a value property containing the text title of PHP.net’s front page.

Aside from reading documents, the DOM extension can also be used to validate them, which can be a useful tool for checking any XML you generate. The validation routines can perform simple XML validation, to verify that the document is properly structured, contains the appropriate DTDs and does not include any invalid characters. Self-contained validation is performed by calling the validate method of the DomDocument object, which returns a Boolean value indicating whether or not a document is valid.

You can also validate a document against a specified schema by calling the schemaValidateSource method of DomDocument, which takes a string argument containing the schema. Alternatively, you can load the schema from a file using the schemaValidate method. Both of these methods will return a Boolean value indicating whether the previously loaded document is compliant with the specified schema. You can also perform RelaxNG validation based on a string or a file using the relaxNGValidateSource and relaxNGValidate methods respectively; as with the rest of the validation functions, the return value will be a Boolean indicating the document’s validity.

If you take a look at Listing 16, you’ll notice that before the document is validated, the track_errors INI option is enabled. The effect of this setting is that any error messages raised from that point on will be stored inside the $php_errormsg variable. This is a very important step, since the validation methods will output XML parsing errors in the form of PHP warning messages. By enabling error tracking, you give your script the means to access this information. You will also have noticed that I placed the error blocking operator @ in front of my call to the validate method. In most cases, when performing validation there is no need to have errors printed to the screen or logged to a file, which is why these two tricks are used.

The real strength of the DOM extension is not in reading XML documents, or even validating them, but rather in creating, modifying and even merging them. Take a look, for example, at Listing 17. As with other DOM operations, the first step is the initialization of the domDocument object and the loading of an XML data string or file.

Then, the good stuff starts: first of all, we need to create a new element by calling the createElement method, which takes, as its only parameter, the name of the element. The result of this operation is a DOMElement object, to which we can now add attributes via the setAttribute method, which takes two parameters: the attribute’s name and its value.

To add the actual content, a text node needs to be created by calling the createTextNode method of the domDocument object; this method takes the desired text value as its only argument. The resulting DOMText object can then be appended to the element we’ve created by calling appendChild(). Naturally, the child elements do not have to be text; they can just as easily be other elements with child elements of their own. Once element b is complete, it is time to add it to the DOM tree.
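The steps just described can be sketched as follows. The starting document, the attribute name and the text value are assumptions made for illustration; the element names xml and b follow the article’s description.

```php
<?php
// Load the existing document (sample data assumed for this sketch).
$dom = new DOMDocument();
$dom->loadXML('<?xml version="1.0"?><xml><a>One</a></xml>');

// Create the new element and give it an attribute.
$b = $dom->createElement('b');
$b->setAttribute('id', '2');

// Text content is a node of its own, appended to the new element.
$b->appendChild($dom->createTextNode('Two'));

// Attach the finished element under the root element of the document.
$dom->documentElement->appendChild($b);

// Serialize the modified tree back to a string.
$out = $dom->saveXML();
echo $out;
```

Until appendChild() is called on documentElement, the new element belongs to the document but is not yet part of its tree, so it would not appear in the saved output.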
The element should be a child of the <xml> element, which is why appendChild() is called from the documentElement object and not directly from the domDocument object. At this time, the modified XML can be returned in the form of a string by calling the saveXML() method, or written to a file by calling save(), which takes the destination filename as a parameter.

Of course, you are not limited to modifying existing XML documents; it does not take much to make the DOM extension generate a completely new XML document. The primary difference is that now it is up to you to create the root node, assuming one is needed.

In Listing 18, you’ll notice that, when creating a new document, I pass an argument to the constructor for my domDocument object; this argument is optional and is used to specify the XML version for this document. Generally speaking, it is not necessary to use it; however, well-formed XML documents should explicitly specify their version. This option will do just that for the document generated by our little script. Next, we create a root element; this will be the document’s primary element and will be parent (or ancestor) to any other element stored within it. A child element is then created and a text value added to it. Now, the DOM tree is built by appending the child element a to its parent element; the root element, in turn, is appended to the document itself, creating the desired XML structure, in which the root element contains the child element a with the text value “Test”.
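A minimal sketch of building such a document from scratch. The root element name is an assumption made for illustration, since the original listing is not reproduced here; the child element a and its text value follow the description above.

```php
<?php
// The constructor argument sets the version written in the XML declaration.
$dom = new DOMDocument('1.0');

// Create the root element ("root" is a hypothetical name for this sketch).
$root = $dom->createElement('root');

// Create the child element and give it a text value.
$a = $dom->createElement('a');
$a->appendChild($dom->createTextNode('Test'));

// Build the tree bottom-up: child into root, root into the document.
$root->appendChild($a);
$dom->appendChild($root);

$out = $dom->saveXML();
echo $out;
```

The serialized result consists of the XML declaration followed by the root element wrapping the a element and its text.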
Templating with XSL

Given the flexibility of the DOM extension, it is really unsurprising that the extension is used as the engine of two other XML extensions introduced by PHP 5. One of those is the XSL extension, which is a complete rewrite of the XSLT extension found in PHP 4. XSLT can be (simplistically) defined as an XML-based templating language, where the XSLT style sheet, an XML file itself, defines how the data should be output and another XML file is used as the data source.

The PHP 5 XSLT interface supports the EXSLT extensions (www.exslt.org), which improve on the basic functionality defined by the XSLT spec. However, this functionality is not always available and the xsl_xsltprocessor_has_exslt_support() function should be used to determine whether your PHP install supports it. The extension’s abilities include support for transformation to HTML, XML and even DOM. The latter allows for quick validation of the generated output before delivery without having to re-parse the XML data. The interface of the extension is quite simple, but it should be noted that it is not compatible with the PHP 4 XSLT extension.

The first step consists of using the DOM extension to parse both the data source XML file and the XSL style sheet, as shown in Listing 19.

Now that the two DOM objects containing the data are available, the XSLT class, XsltProcessor, can be instantiated and the style sheet loaded by calling the importStylesheet method of this object. At this point, all you need to decide is what format your desired output will be in and call the appropriate method with the XML data object as the parameter. HTML and XML outputs can be generated by calling the transformToXML method (as I do, for example, in Listing 20), while a domDocument object can be retrieved by calling transformToDoc(). The output can be written directly to a file or a remote source such as an FTP site by calling the transformToUri method, whose second parameter specifies the path where the output should be saved.

Other features of the XSL extension include the ability to allow the XSLT style sheet to specify PHP functions to be executed when particular elements or data of the source XML document are encountered. Because of the inherent danger of allowing PHP execution, you need to explicitly enable it by calling the registerPHPFunctions method. Generally speaking, enabling this functionality is highly inadvisable for externally supplied XSLT style sheets, as it could be a source of security vulnerabilities if the feed is compromised, not to mention the performance impact that it could have.

The extension also supports a procedural interface for those who don’t like to use objects, which offers the same functionality without all the OOP candy.
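A minimal sketch of the transformation flow described above. The data document and the style sheet are assumptions made for illustration, and the code is wrapped in a check so that it does nothing on installations where the XSL extension is not loaded.

```php
<?php
$html = null;      // string result of transformToXML()
$liCount = null;   // number of <li> elements in the transformToDoc() result

if (class_exists('XSLTProcessor')) {
    // Data source (sample data assumed for this sketch).
    $data = new DOMDocument();
    $data->loadXML('<list><item>One</item><item>Two</item></list>');

    // Style sheet: turns each <item> into an HTML list entry.
    $style = new DOMDocument();
    $style->loadXML(
        '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
        . '<xsl:output method="html"/>'
        . '<xsl:template match="/list">'
        . '<ul><xsl:for-each select="item"><li><xsl:value-of select="."/></li></xsl:for-each></ul>'
        . '</xsl:template>'
        . '</xsl:stylesheet>'
    );

    // Load the style sheet into the processor, then transform the data.
    $proc = new XSLTProcessor();
    $proc->importStylesheet($style);

    // Result as a string.
    $html = $proc->transformToXML($data);

    // Result as a DOM document, which can be inspected without re-parsing.
    $doc = $proc->transformToDoc($data);
    $liCount = $doc->getElementsByTagName('li')->length;

    echo $html;
}
```

The same processor object can be reused for any number of data documents once the style sheet has been imported.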
And Then There Was SOAP

The last, but certainly not least, XML extension is the SOAP extension, which, like the XSL extension, uses DOM as its underlying XML-handling engine. This extension provides a very simple and convenient interface for communicating with, and even creating, SOAP web services. It is a big improvement over the PHP-based userland classes that, before the advent of PHP 5, were the only way to work with SOAP. This is a significant development, as SOAP web services are very common and many big data providers, such as Amazon and Google, tend to use them as their preferred external API interfaces.

The improvements that you can expect from using the extension compared to the old userland code range from increased speed and stability to increased functionality. While the extension itself is still considered experimental and every release since its introduction has fixed a few bugs, it is rapidly moving towards stability, and increased acceptance will only accelerate this process. Perhaps the best feature of this extension is that it makes working with the normally complex SOAP protocol extremely easy, even for someone who has never used it before.

Most SOAP web services offer a WSDL file that provides a description of the methods provided by the service, what arguments are accepted by those methods and, finally, what type of output you can expect. When PHP’s SoapClient class is instantiated with the WSDL path as the parameter, it will parse this file and make the methods supported by this web service available via the newly created object. Because the WSDL provides information about the supported arguments, it allows PHP to validate the data passed to the methods and prevent invalid input from ever going out over the wire.

As you can see in Listing 21, using the SOAP extension is trivial. In this instance, a WSDL downloaded from www.xmethods.net is used to describe a web service that provides stock quotes. Technically, I could have accessed the WSDL file directly from the Xmethods website, which provides information about a large number of SOAP web services. For the sake of performance, however, it is better to use a local WSDL file to reduce network activity. Once the SoapClient object is created, I can use it to call methods provided by the web service, getQuote() in this case.

Because parsing WSDL can be a rather slow process and these files rarely change, the SOAP extension provides a caching mechanism that will save the parsed web service definition for further runs. The caching can be enabled by setting the soap.wsdl_cache_enabled php.ini entry to 1. Further settings control the cache age (soap.wsdl_cache_ttl) and allow you to specify where the cached data should be stored on disk (soap.wsdl_cache_dir). Because it leads to significant speed improvements, this functionality is enabled by default and WSDL files will be cached for a period of one day inside the /tmp directory (or its equivalent on your operating system).

The creation of a SOAP server is not much more difficult than initializing a SOAP client, as long as you don’t need to generate a WSDL file. At this time, the extension does not support any functionality that would automate this process, meaning that you’d need to write it manually or find a PHP class capable of doing so on your behalf. Fortunately, you can make do without WSDL, and PHP’s SOAP extension will happily let you create and connect to SOAP services without it.

The first step in the creation of a SOAP server is specifying a class whose methods will be offered through the web service. As you can see in Listing 22, after doing so we create the server by instantiating an object of the SoapServer class. If we had a WSDL definition, we could pass it as the first argument to the constructor; however, because we don’t, we simply set it to NULL. As the second parameter, an option array is used to specify the mandatory URI for the web service; then our class, filesystem, is set as the method provider by calling the setClass method of $soap_s. At this point, all that’s left to do is to call the handle method, which will make the web service process requests sent via POST by reading the HTTP_RAW_POST_DATA variable. If the data is coming from a different source, then it will need to be supplied to handle() in the form of a string argument.
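A minimal sketch of a server without WSDL that receives its request as a string argument to handle(), as described above. The FileSystem class, the urn:example-filesystem URI, the dir parameter name and the hand-written request envelope are all assumptions made for illustration; a real request would normally be produced by a SOAP client. The code does nothing when the SOAP extension is not loaded.

```php
<?php
// Hypothetical provider class; its public methods become the service's methods.
class FileSystem
{
    public function ls($dir)
    {
        // Returns a fixed string so that the sketch has no side effects.
        return 'listing of ' . $dir;
    }
}

// A hand-written SOAP 1.1 request calling ls("/tmp") (assumed for this sketch).
$request = '<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns1="urn:example-filesystem"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <ns1:ls>
      <dir xsi:type="xsd:string">/tmp</dir>
    </ns1:ls>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>';

$response = null;

if (extension_loaded('soap')) {
    // No WSDL: the first argument is null and the URI option is mandatory.
    $soap_s = new SoapServer(null, array('uri' => 'urn:example-filesystem'));
    $soap_s->setClass('FileSystem');

    // handle() prints the response, so output buffering is used to capture it.
    ob_start();
    $soap_s->handle($request);
    $response = ob_get_clean();

    echo $response;
}
```

In a deployed script the call would simply be handle() with no argument, leaving the extension to read the POST body itself.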
The client for our WSDL-less service is just slightly different, to account for the fact that, as a provider, we don’t offer a detailed description of our web service. Naturally, in this case PHP will also be unable to validate the data sent to the web service.
Listing 21
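As a rough illustration of the WSDL-based approach described in the text (this is not the article's original listing), a client might look something like the sketch below. The helper name fetch_quote(), the WSDL file name and the use of sys_get_temp_dir() for the cache directory are assumptions made for the example; getQuote() is the method name mentioned in the text, and the three soap.wsdl_cache_* settings are the php.ini entries discussed above, set here to their documented defaults.

```php
<?php
// Hypothetical helper: ask a WSDL-described stock-quote service for a price.
// Returns the service's answer, or null if the call cannot be made.
function fetch_quote($wsdlPath, $symbol)
{
    // Bail out early if the SOAP extension or the local WSDL copy is missing.
    if (!class_exists('SoapClient') || !is_readable($wsdlPath)) {
        return null;
    }

    // WSDL cache settings discussed in the text (these mirror the defaults).
    ini_set('soap.wsdl_cache_enabled', '1');
    ini_set('soap.wsdl_cache_ttl', '86400');            // one day, in seconds
    ini_set('soap.wsdl_cache_dir', sys_get_temp_dir()); // assumption: system temp dir

    try {
        // Passing the WSDL path makes the service's methods callable on $client.
        $client = new SoapClient($wsdlPath);
        return $client->getQuote($symbol);
    } catch (SoapFault $fault) {
        // Unparsable WSDL, network trouble or a fault returned by the service.
        return null;
    }
}
```

A call such as fetch_quote(__DIR__ . '/stockquote.wsdl', 'IBM') would then return whatever the service reports, assuming a WSDL file with that (made-up) name sits next to the script.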
Listing 22
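The server setup described in the text could be sketched roughly as follows (again, not the article's original listing). The class name filesystem, the ls method, the $soap_s variable and the NULL first argument come from the text; the body of ls(), the build_soap_server() wrapper and the urn:example-filesystem URI are assumptions made purely for illustration.

```php
<?php
// Class whose public methods are offered through the web service.
class filesystem
{
    // Return the names of the entries in $dir, or an empty array on failure.
    // (The method body is an assumption; the text only names the method.)
    public function ls($dir)
    {
        $entries = is_dir($dir) ? scandir($dir) : false;
        if ($entries === false) {
            return array();
        }
        // Drop the "." and ".." pseudo-entries and reindex from zero.
        return array_values(array_diff($entries, array('.', '..')));
    }
}

// Build a server without a WSDL: the first argument is NULL and the option
// array must carry the 'uri' entry. The URI below is a made-up placeholder.
function build_soap_server()
{
    $soap_s = new SoapServer(null, array('uri' => 'urn:example-filesystem'));
    $soap_s->setClass('filesystem');
    return $soap_s;
}

// In the script that answers requests you would then call:
//     build_soap_server()->handle();
```

Exposing a directory listing to remote callers is only a toy example; a real service would restrict which paths may be listed.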
Listing 23
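For a service that publishes no WSDL, a client along the lines the article describes might be sketched like this (not the original listing). SoapClient's non-WSDL mode needs both a location and a uri option; the helper names, the parameter name "dir" and the example URL used in the usage note are assumptions.

```php
<?php
// Create a client for a service that offers no WSDL description.
// In this mode the 'location' and 'uri' options are both mandatory.
function build_soap_client($location, $uri)
{
    if (!class_exists('SoapClient')) {
        return null; // SOAP extension not available
    }
    return new SoapClient(null, array('location' => $location, 'uri' => $uri));
}

// Call the remote ls method, encoding the argument by hand with SoapParam.
function remote_ls($client, $dir)
{
    if ($client === null) {
        return null;
    }
    try {
        // SoapParam takes the value first, then the parameter name.
        // "dir" is an assumed name; use whatever the server expects.
        return $client->ls(new SoapParam($dir, 'dir'));
    } catch (SoapFault $fault) {
        return null; // server unreachable or the call was rejected
    }
}
```

With a server listening at a hypothetical http://localhost/soap_server.php, remote_ls(build_soap_client('http://localhost/soap_server.php', 'urn:example-filesystem'), '/tmp') would return the server's answer. The uri value has to match the one the server was created with.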
As you can see in Listing 23, the first parameter to the constructor of SoapClient is again set to NULL because there is no WSDL file available; it is therefore up to the option array to supply the necessary information about the SOAP server. The two mandatory values in this case are the URI, which must be the same as the one set by the server, and a location, which tells the client where to find the server. Lack of WSDL also means that we have no information about the type of the parameters, so we cannot rely on automatic parameter encoding. Therefore, we need to manually encode the parameters by using the SoapParam class, which is instantiated with two arguments, corresponding to the value and name of the variable to be sent to the server. The resulting SoapParam object can then be passed to the ls method (supposedly) provided by the server, which, when executed, will return the desired output (hopefully).
But Wait, There’s More!
Aside from the extensions I mentioned in this article, further work on XML parsing is in progress inside the PECL repository. Two noteworthy projects currently under way are XMLReader and XMLWriter, which are geared towards providing an extremely efficient way of writing and reading XML files. Perhaps, at some point in the future, they will replace the legacy XML extension that is still by far the most common way of working with XML.
This concludes a not-so-brief overview of the XML extensions in PHP 5. Hopefully, if you’re planning to start a new PHP 5 project, this new functionality will come in handy, while if you’re planning to port an existing application from PHP 4 to PHP 5, it might be just the incentive you need.
About the Author
Ilia Alshanetsky is a senior software engineer at Advanced Internet Designs Inc, a company specializing in the development of web-based solutions such as FUDforum, a high-performance open source bulletin board. He has contributed in a number of ways to the PHP project, including writing code for the SQLite, Shmop, StatGrab and other extensions, providing countless bug fixes and serving as the release manager of the PHP 4.3.X series. These days, Ilia can be found giving talks and writing on a variety of PHP topics. You can reach him at [email protected]
To discuss this article: http://forums.phparch.com/183
F E A T U R E
Hiding Your Sins
Practical Caching for the PHP Developer
by Allen Smithee
After a code accelerator (also called an opcode cache), caching is one of the easiest ways to make an existing website faster: it saves the result of a time-consuming operation in a temporary storage medium, from which it can be retrieved until a particular event takes place. This article illustrates a practical approach that requires no external dependencies.
No matter how many times I see a discussion about performance on a PHP forum (or, for that matter, on most developer forums), I never seem to find anybody who offers caching as a simple solution to the problem of a script that takes a long time to execute. Naturally, optimization is the best solution (and, often, the only one) but, let’s face it, developers don’t always have the time to put out all the fires at once. If something can be accelerated by introducing a simple cache, it should be: that will give you the time to either look after more urgent matters or, more importantly, optimize your code carefully. Optimization is a bit of a dark art under the best of circumstances, and can rapidly turn into an international disaster if performed in a hurry.
Also, I like to think of caching as a solution to a developer’s problem, rather than a website’s. Let me explain: your site can be slow either because your scripts inherently take a large amount of time to execute (relatively speaking, of course), or because the traffic that it receives is higher than what it can manage. If your site has slowed down because of a surge in traffic (what I would call a website problem) and your scripts are well-optimized, then caching won’t do much for you, except, perhaps, help you by decreasing the number of times you hit external bottlenecks, such as a database. You should, instead, turn to other options, such as opcode caches and optimizers and, ultimately, more hardware to share the load across multiple machines.
If, on the other hand, your scripts are not well-optimized, as is often the case with pretty much any website, caching can lend a helping hand by buying you
the time you need to figure out what’s wrong with your code without management constantly screaming in your ears. In some cases, your scripts may be slow even if it isn’t your fault. For example, if one of your pages depends on retrieving information from an external source, such as a third-party advertising agency, there isn’t much you can do to improve the agency’s performance, and caching the content may be the way to go.
Before I get too deep into the article, I should point out that there are some excellent facilities that you can use to add caching capabilities to your project without requiring a special package for the purpose. PEAR::Cache and PEAR::Cache_Lite are two excellent examples. You may also want to check out Bruno Pedro’s excellent article, Caching Techniques for the PHP Developer, which appeared in the March 2004 issue of php|architect; Bruno covers a wider variety of caching techniques than I will in this article, and his code makes good use of PEAR’s libraries.
The Anti-PEAR
Although I love PEAR, I can’t use it in every project, and I certainly don’t feel like introducing it into an existing one for the sole purpose of providing a caching
REQUIREMENTS
PHP: Any
OS: Any
Other: Shared Memory (shm) extension
Code Directory: sins
mechanism. Given that I usually find myself adding caching to a website because it’s exhibiting some sort of problem, I prefer to come up with a simple, self-contained library of my own.
The basics of caching are quite simple. A cache is, simply, a mechanism capable of intercepting the output of a particular action (such as running a script), saving it in a temporary location that can be accessed rapidly and then using the saved data during subsequent requests until a particular event occurs that resets the entire mechanism. Most people seem to think that a cache should be reset based on a time limit, but this doesn’t necessarily have to be the case. For example, if you’re caching data based on the contents of a file, you may want to check that file for any changes and reset your cache only when they occur. Better yet, you can combine the two things and only check whether your file has changed every so often; this will likely make your mechanism perform better, since accessing the disk is not a trivial operation in terms of speed.
Perhaps the most important aspect of a cache is where it stores its data. Since one usually implements a caching mechanism to improve performance, it follows that one would want to choose a temporary storage medium that’s as fast as possible. RAM is definitely fast, and it is made convenient by the shared-memory (SHMOP) extension that is part of PHP. Unfortunately, RAM is also a very limited resource and SHMOP is not supported on Windows unless you’re running PHP as a module from within a web server (be it Apache or IIS). Therefore, we’ll need a backup mechanism to be triggered under two circumstances: either when we’re running on Windows or when we are in a situation where conserving memory is of critical importance. I’m sure that the words “file on disk” have come to mind right away and, in a way, that’s neither uncommon nor a bad idea.
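The idea of combining the two reset strategies, checking the source file for changes but only every so often, can be sketched as follows. This is a minimal illustration, not part of the article's library: the function name, the $meta array layout and the $checkEvery parameter are all assumptions.

```php
<?php
// Decide whether a cached copy is still usable.
//   $meta['created'] : timestamp at which the cached copy was built
//   $meta['checked'] : timestamp of the last look at the source file
//   $checkEvery      : minimum number of seconds between two disk checks
//   $now             : current time (injectable to make testing easier)
// All of these names are illustrative assumptions.
function cache_is_fresh(array &$meta, $sourceFile, $checkEvery, $now = null)
{
    $now = ($now === null) ? time() : $now;

    // Still inside the check interval: trust the cache, skip the disk access.
    if ($now - $meta['checked'] < $checkEvery) {
        return true;
    }

    // Interval elapsed: look at the file and remember when we did so.
    $meta['checked'] = $now;
    clearstatcache();
    $mtime = @filemtime($sourceFile);

    // A missing file, or one modified after the cache was built, means stale.
    return $mtime !== false && $mtime <= $meta['created'];
}
```

The trade-off is that a change to the file can go unnoticed for up to $checkEvery seconds, in exchange for far fewer trips to the disk.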
Having to access a file on a disk may seem like a bit of an expensive proposition—at least as far as performance goes—but, if your operating system is properly tuned, it will automatically compensate for any inefficiency. It all boils down to this: if the script that requires your cache is run very frequently, then the OS will likely cache the file in which the data is stored, thus making accessing it faster. If it isn’t, then the impact of loading the file from disk will be minimal anyway and, therefore, it shouldn’t present much of a problem.
I know, this is a bit of an oversimplification because it doesn’t take into consideration other factors, such as just how busy your server is independently of your caching mechanism, or how many of your scripts use the latter. Remember, however, that I am presenting caching here as a quick-fix solution until you get to the root of your performance problem and solve it by way of optimization. If you find yourself forced to use lots of cached material all over your website, you have a systemic problem that requires a well-thought-out solution: either you eliminate the need for the cache, or you come up with a careful arrangement that takes full advantage of its capabilities without risking the server’s overall ability to respond to client requests.
Let me give an example. Suppose your website pulls ads from an external source and, for some reason (for example, because the ad agency’s system generates different ads for different sections of your website), these ads have to be retrieved by your script as it’s generating each page, as opposed to simply passing along to the client some JavaScript code that does the job. Now, this is clearly one of those examples in which the use of a cache does not depend on you making any mistakes: if the ad agency’s code is slow, or if the connectivity between your servers and theirs is anything but perfect, fetching the ads at every single page impression will introduce a slowdown that no optimization short of a caching mechanism on your part will reduce. Therefore, since you can’t kick the agency’s programmer into writing faster code, you decide to implement a caching mechanism, and immediately find yourself facing another dilemma: where should everything be cached? If you only had a single group of ads for every page on your site, you could simply cache it into shared memory and be done with it.
If, on the other hand, the ads potentially change with every page on the site, you have to consider that caching every single group of ads could easily hog your RAM and take away precious memory from those processes that need it most, like, say, your web server itself. Even if you work exclusively on a UNIX box, therefore, it’s not a bad idea for a caching library to support both a memory-based and a disk-based storage medium. That way, you can use RAM for your most frequently used cached objects, and files for those that are needed less often.
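To make the ad example concrete, here is a minimal sketch of a disk-based cache that keeps one file per site section, so that many different ad groups do not compete for RAM. It is not the article's library: the function name cached_ads(), the file naming scheme, the five-minute default lifetime and the $fetch callback (standing in for the slow call to the ad agency) are all assumptions.

```php
<?php
// Return the ad block for $section, calling the slow $fetch callback only
// when no sufficiently recent copy exists on disk.
function cached_ads($section, $fetch, $cacheDir, $ttl = 300)
{
    // One cache file per section; md5() keeps the file name filesystem-safe.
    $file = rtrim($cacheDir, '/\\') . '/ads_' . md5($section) . '.cache';

    clearstatcache();
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $data = file_get_contents($file);
        if ($data !== false) {
            return $data; // cache hit
        }
    }

    // Cache miss or expired copy: do the expensive retrieval.
    $data = call_user_func($fetch, $section);

    // Write to a temporary file first, then rename it into place, so that
    // concurrent requests never read a half-written cache file.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    if (file_put_contents($tmp, $data) !== false) {
        rename($tmp, $file);
    }

    return $data;
}
```

A page script would pass in whatever code currently talks to the agency, for instance cached_ads('sports', 'fetch_ads_from_agency', '/tmp/adcache'), where both the function name and the directory are made up for this example.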
Getting Started
PEAR’s caching mechanism works by caching the output of a function or an entire page. That’s undoubtedly useful, but I always like to have a little more granularity than that. It’s often possible to narrow a problem down to a few specific lines of code, rather than an entire function, and being able to isolate that trouble spot can mean the difference between an efficient cache and adding more problems to the pot. For example, if the function that causes problems produces a combination of data that changes very often (such as what happens when you display a user’s login name) and other data that takes a long time to generate (such as ads from a third-party source), caching the function’s output is simply not possible, unless you want User A to see User B’s name in his pages.
My solution to this problem is to simply provide a caching mechanism that is completely agnostic of the source of its data. At its most basic, our cache will, therefore, work on variables. As you can see from Listing 1, our core functionality is provided by two functions, cache_cache() and cache_get_cached_version(). As you can imagine, the former inserts the contents of a variable inside the cache, while the latter attempts to retrieve them.
The functions are pretty simple, mostly because they just act as stubs for the functions that perform the actual work associated with each specific storage medium. There are, however, a couple of things that are worth noting. First, the “preferred” medium for each particular caching operation is determined by the first letter of the ID associated with each cached variable. For example, if we use an ID of F-ADPAGE1, the resulting data will be cached in a file on disk, while M-ADPAGE1 would cause the same data to be stored in a shared memory segment.
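The ID convention just described can be illustrated with a small dispatch helper. This is only a sketch of the idea, not the code from Listing 1: the function name cache_medium_for(), the exact pattern it accepts and the $shmAvailable parameter are assumptions. It also falls back to disk when shared-memory functions are missing, which is one way to cope with installations that lack the extension.

```php
<?php
// Work out which storage medium an ID such as "F-ADPAGE1" or "M-ADPAGE1"
// asks for. Returns 'file', 'memory', or false for a malformed ID.
//   $shmAvailable : whether shared memory can be used; when null, this is
//                   detected by looking for the shmop functions.
function cache_medium_for($id, $shmAvailable = null)
{
    // Strict check: one letter (F or M), a dash, then a non-empty name.
    // The set of characters allowed in the name is an assumption.
    if (!is_string($id) || !preg_match('/^([FM])-[A-Za-z0-9_]+$/', $id, $m)) {
        return false;
    }

    if ($shmAvailable === null) {
        $shmAvailable = function_exists('shmop_open');
    }

    // "M" asks for shared memory, but only if the installation supports it;
    // otherwise the request is quietly redirected to disk.
    if ($m[1] === 'M' && $shmAvailable) {
        return 'memory';
    }
    return 'file';
}
```

The storing and retrieving functions would call a helper like this first and then hand the data to the matching backend.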
This may look like a bit of a hack, but it isn’t if you consider that this caching library is designed as a drop-in addition to an existing website, where you want to have a quick mechanism for implementing a cache. Therefore, the fewer lines of code you have to type in, the easier it’s going to be to remove them once you’re done with your optimizations and you no longer need the cache. The dependence on a particular string format for the choice of caching media is also somewhat mitigated by the fact that the system performs rather strict error checking before allowing anything to happen.
Speaking of hacks, you’ll note a couple of lines at the beginning of each function that check whether the PHP installation on which the library is running supports shared-memory functions and, if it doesn’t, “hijack” the caching medium specification and force it to disk. This is, undoubtedly, a hack: the program overrides the developer and performs actions he or she is not expecting. However, if you’re dropping the library into an existing project, the SHM extension may not be compiled into the local PHP installation (and probably
Listing 1
Listing 2