CONTENTS

Features
XMLWriter: Simplify XML creation, by ROBERT RICHARDS
Database Design Demystified: A step-by-step walk through the database design process, by DEB WHITTEN
PHP Clustering on Linux - Part 1: Scaling your apps to new levels, by JOSEPH H. KOUYOUMJIAN
Using the PEAR Installer: Distribute your code in convenient packages, by TOBIAS SCHLITT

Columns
EDITORIAL
news
TEST PATTERN: The Attributes of Properties, by JEFF MOORE
SECURITY CORNER: All Your Session are Belong to Us!, by ILIA ALSHANETSKY
PRODUCT REVIEW: CodeCharge 3.0: Yet Another IDE?, by PETER MacINTYRE
exit(0);: Richard, You Just Scare Us!, by MARCO TABINI
Download this month’s code at: http://www.phparch.com/code/
EDITORIAL

Grown Up

If you didn’t make it to php|tek in Orlando at the end of April, I genuinely feel bad for you. You missed what turned out to be an excellent PHP conference: world-class speakers, groundbreaking information on upcoming PHP-related topics, and best of all, face time with PHP experts.

Every time I attend a PHP conference (I try to get out to a few each year), I’m reminded of how beneficial it is to sit down with the gurus who develop our favourite web tools, and talk shop. There’s definitely a certain value to having a face-to-face conversation with someone (one of a small group; I’d guess 20 people in the world) who simply gets Zend Engine internals. That kind of forum simply isn’t possible on a mailing list, by chat or even on a conference call. It’s far easier for a flamewar to erupt on the Internals list when the participants are $someLargeNumber kilometers away, and the receivers of a complaint can’t discern, from the provided text, the spirit in which it was made.

Not to mention that it was fun. We had a number of great surprises in store for attendees, the highlight of which, at least in my opinion, was the Thursday-night cocktail party. I believe that our attendees were the first people in the world to ever see the PHP logo manifested in ice. Yes, ice. What a great time.

The educational parts of the conference were excellent, too. Our theme was $build->deploy->scale(); As PHP matures, and finds itself in more and more, larger and larger organizations, it is expected to fill holes that Rasmus never envisioned, back in 1995. Fortunately for us, PHP has adapted well to its changing role in the Web community. It has gone from a small set of convenient tools for hosting a personal home page, to serving millions (billions, even!) of page views for large companies and organizations like Yahoo! and Wikimedia. PHP has been called on to scale in new ways.

This leads us to our series on Linux Clustering with PHP. As your applications increase in size and traffic, and as you and your organizations succeed, you’ll outgrow your hardware. A few years ago, the solution to this problem might’ve been to jump on the enterprise-behemoth bandwagon, but PHP scales just as well, at a fraction of the cost to both develop and deploy. If you doubt me, take a look at Wikipedia, which according to Alexa has become the 12th most trafficked site on the Internet, and it’s fully invested in PHP.

Back to the conference: if you missed php|tek in Orlando, don’t fret too much. php|works/db|works is coming up in September, in Toronto. See you there!

Volume 5 - Issue 5
Publisher Marco Tabini
Editor-in-Chief Sean Coates
Editorial Team Arbi Arzoumani Steph Fox Peter MacIntyre Eddie Peloke
Graphics & Layout Aleksandar Ilievski
Managing Editor Emanuela Corso
News Editor Leslie Hill
Authors Ilia Alshanetsky, Joseph H. Kouyoumjian, Peter B. MacIntyre, Jeff Moore, Robert Richards, Tobias Schlitt, Marco Tabini, Deb Whitten

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards of use of the information contained herein or in all associated material. php|architect, php|a, the php|architect logo, Marco Tabini & Associates, Inc. and the Mta Logo are trademarks of Marco Tabini & Associates, Inc.

Printed in Canada
Copyright © 2003-2006 Marco Tabini & Associates, Inc. All Rights Reserved
news

PHPX 3.5.10
The PHPX team is pleased to announce their latest release, 3.5.10. What is PHPX? PHPx.com describes it as “a constantly evolving and changing Content Management System (CMS).” This latest version brings with it some exciting new features. Image Verification on registration, Cookie Authentication on Vox Box, faster Forum Search Feature, Trash (Deleted Item Recovery), Forum Ranks are fixed for images now and several more. Please keep the suggestions and bug reports coming! So go ahead and cruise over to the Project Page and download it! For more information visit: www.phpx.org
eZ Components 1.1beta1
eZ Systems has been hard at work on their PHP 5 enterprise platform, eZ Components. Tobias Schlitt posted, announcing the release of eZ Components 1.1beta1, well on the way to a new feature release: “Yesterday we released eZ components version 1.1beta1, a first step to the next feature release of eZ components. The first release does not contain all enhancements, yet, so a 2nd beta will definitely follow. Beside that, the Template engine will be released in about a week, which is IMHO the most awaited component for 1.1. Several great enhancements are already included in 1.1beta1, like sub-select and multijoin support for the query builder in our Database component, parsing mails with the Mail component and automatic help text generation for the ConsoleTools. We also have 2 new components: File - which will provide convenience methods for dealing with files in PHP - and SystemInformation, which can be used to determine information about the underlying system that an application gets installed on.” You can get the latest beta release at: http://ez.no/community/news/ez_components_1_1beta1
Zend Framework 0.1.3 Released
A new preview release of the Zend Framework is now available for download from the newly designed Zend Framework site. The Zend Framework has been moving quickly towards its 1.0 release. The release notes show fixes to many of the components, and Zend_Search_Lucene being promoted from the incubator:
• Zend_Filter is* methods return strictly TRUE or FALSE. (Chris)
• Zend_InputFilter has test* methods for retrieving valid data. (Chris)
• Fixed bug in Zend_View_Abstract::__isset(). Reported by James Simmons. (Mike)
• Zend_Db_Adapter_Pdo_Mysql::limit() now compatible with MySQL versions prior to 4.0. Reported by Greg Neustaetter (Mike)
• Fixed bug in Zend_Controller_Dispatcher_Token::setParams(). Reported by Rob Allen. (Chris)
• Fixed bug in Zend_Log::log(). Reported by Mislav. (Mike)
• Updated Zend_Filter::isFloat() and Zend_Filter::isInt() to respect locale. (Chris)
• Improved Zend_Db_Adapter_Pdo_Mssql contributed by Rob Allen. (Mike)
• Fixed bug in Zend_Controller_Dispatcher::_formatName. Reported by Arpad Ray. (Chris)
• Zend::dump() now works from CLI (Rob Allen)
• Improved support for XMLRPC namespaces (Mike, Chuck)
• Registry can now be tested with Zend::isRegistered (Shekar C. Reddy, Mike)
• Zend_Search_Lucene promoted from incubator (Alex)
• Zend_Cache has been accepted to the incubator (Fabien, Mislav)
• Zend_Json testing expanded; covers all major cases (Matthew)
• Fixed Zend_Json encoding of empty values (Matthew, Davey)
• Fixed Zend_Json encoding of associative arrays (Matthew, Davey)
• Fixed Zend_Json encoding of numeric indices in associative arrays (Matthew)
• Removed formatting (newlines, tabs) from Zend_Json encoding methods (Matthew)
• Fixed escaping in Zend_Json_Encoder (Matthew)
• Zend_HttpClient moved to Zend_Http_Client (Mike)
• Zend_Console_Args in the incubator but not yet refactored (Jason Garber)
• Zend_Mail enhancements in the incubator by Austria Telekom (Nico, Clez)
• Zend_Service classes no longer subclass Zend_Service_Rest (Davey, Andi, et al)
• Zend_Service classes now use new Zend_InputFilter (Davey)
• Fixed bug in Zend_Service_Amazon::itemLookup() (Davey)
• Fixed bug in Zend_Service_Flickr::userSearch() (Davey)
• Fixed bug in Zend_Uri_Http::__construct(). Reported by Adrian Gheorghe. (Mike)
• Improved some not wellformed PDF processing with Zend_Pdf. (Alexander)
• Minor Zend_Pdf documentation fixes. (Alexander)
• Fixed Zend_Pdf processing of inherited page attributes. (Alexander)
• Fixed Zend_Pdf umlauts support for standard fonts. (Alexander)

You can get the latest release at: http://framework.zend.com/download
onPHP 0.2.13 & 0.4.3
onPHP.org announces the latest release of their “mature GPL’ed multi-purpose object-oriented PHP framework.” The release comes in two versions, 0.2.13 and 0.4.3. Check out onPHP.org for more information.
PHP Object Generator 2.0.1
Looking for something to generate your OO code? Check out PHP Object Generator. According to phpobjectgenerator.com, “PHP Object Generator, (POG) is an open source PHP code generator which automatically generates clean & tested Object Oriented code for your PHP 4/PHP 5 application. Over the years, we realized that a large portion of a PHP programmer’s time is wasted on repetitive coding of the Database Access Layer of an application simply because different applications require different objects. By generating PHP objects with integrated CRUD methods, POG gives you a head start in any project. The time you save can be spent on more interesting areas of your project.”
Volume 5 Issue 5 • php|architect •7
Check out the hottest new releases from PEAR.

DB_QueryTool 1.0.3
This package is an OO-abstraction to SQL. It provides methods such as setWhere, setOrder, setGroup, setJoin, etc. to easily build queries. It also provides an easy to learn interface that interacts nicely with HTML-forms using arrays that contain the column data, that shall be updated/added in a DB. This package bases on an SQL-Builder which lets you easily build SQL-Statements and execute them.

Log 1.9.5
The Log framework provides an abstracted logging system. It supports logging to console, file, syslog, SQL, Sqlite, mail, and mcal targets. It also provides a subject observer mechanism.

Image_Puzzle 0.2.1
PEAR::Image_Puzzle divides an image into puzzle pieces.
• Provides a few edge styles to generate puzzle pieces
• Allow saving each piece to a separate file
• Allow getting information about each piece’s coordinates, relative to the original image

Validate_ptBR 0.5.3
Package contains locale validation for ptBR such as:
• Postal Code
• CNPJ
• CPF
• Region (brazilian states)
• Phone Number
• Vehicle’s plate

DB_Table 1.3.0
Builds on PEAR DB to abstract data types and automate table creation, data validation, insert, update, delete, and select; combines these with PEAR HTML_QuickForm to automatically generate input forms that match the table column definitions.

HTML_Table 1.7.0
The PEAR::HTML_Table package provides methods for easy and efficient design of HTML tables.
• Lots of customization options.
• Tables can be modified at any time.
• The logic is the same as standard HTML editors.
• Handles col and rowspan.
• PHP code is shorter, easier to read and to maintain.
• Table options can be reused.
For auto filling of data and such then check out http://pear.php.net/package/HTML_Table_Matrix

Mail 1.1.10
PEAR’s Mail package defines an interface for implementing mailers under the PEAR hierarchy. It also provides supporting functions useful to multiple mailer backends. Currently supported backends include: PHP’s native mail() function, sendmail, and SMTP. This package also provides a RFC822 email address list validation utility class.

PEAR_PackageUpdate 0.4.3
PEAR_PackageUpdate (PPU) is designed to allow developers to easily include auto updating features for other packages and PEAR installable applications. PPU will check to see if a new version of a package is available and then ask the user if they would like to update the package. PPU uses PEAR to communicate with the channel server and to execute the update. PPU allows the end user to take some control over when they are notified about new releases. The PPU Preferences allow a user to tell PPU not to ask about certain types of releases (bug fixes, minor releases, etc.), not to ask about certain release states (devel, alpha, etc.), not to ask until the next release or not to ask again. PPU is just an engine for package updating. It should not be used directly. Instead one of the driver packages such as PEAR_PackageUpdate_Gtk2 should be used depending on the application or other package.

Console_ProgressBar 0.4.0beta
Console_ProgressBar allows you to display progress bars in your terminal. You can use it for displaying the status of downloads or other tasks that take longer periods of time.

Looking for a new PHP extension? Check out some of the latest offerings from PECL.

zip 1.3.1
Zip is an extension to create and read zip files.

hash 1.3
Native implementations of common message digest algorithms using a generic factory method.

mqseries 0.10.0
This package provides support for IBM Websphere MQ (MQSeries).

pecl_http 1.0.0RC3
This HTTP extension aims to provide a convenient and powerful set of functionality for one of PHPs major applications. It eases handling of HTTP urls, dates, redirects, headers and messages, provides means for negotiation of clients preferred language and charset, as well as a convenient way to send any arbitrary data with caching and resuming capabilities. It provides powerful request functionality, if built with CURL support. Parallel requests are available for PHP-5 and greater. PHP-5 classes: HttpUtil, HttpMessage, HttpRequest, HttpRequestPool, HttpDeflateStream, HttpInflateStream, HttpQueryString. PHP-5.1 classes: HttpResponse
FEATURE
PHP 5 introduced a number of ways of working with XML. The only problem, however, was that in order to create XML, you were either required to master the complexities of DOM or fall back to the old standby of manually creating XML using strings. Each of these methods has its drawbacks, which eventually led to the creation of the XMLWriter extension.
XMLWriter
Simplify XML creation
by ROBERT RICHARDS

XML, no matter what you may think of the technology, is not something that is going to be going away any time soon. Every day you see more and more RSS and Atom feeds popping up, content being delivered through podcasts, and APIs being exposed through web services. If you haven’t worked with XML by now, it’s just a matter of time before you get assigned a project that requires you to create XML documents. Before thinking about trying to tackle the complexity of working with DOM, or possibly using SimpleXML (yes, it is now possible to create documents with it, to a certain extent), or even thinking about manually creating XML using strings, you owe it to yourself to check out XMLWriter; it may be exactly what you are looking for.

XMLWriter is a fast, simple and lightweight API for creating XML documents, and assuring that they are well-formed. Modeled after the C# implementation of the XmlTextWriter and XmlWriter classes, the XMLWriter extension streams the serialized document directly to a URI, using PHP streams, or to a buffer, allowing the XML to be returned as a string. The API is also easy and intuitive to use. Not only can XMLWriter ensure that the XML is well formed (and I stress can here, as you will see further in the article), but the methods it offers are used in much the same way you would logically think about creating a document.
PHP: 4.3+ SOFTWARE: libxml2-2.6.0+ LINKS: http://www.xmlsoft.org; http://pecl.php.net/package/xmlwriter
CODE DIRECTORY: xmlwriter TO DISCUSS THIS ARTICLE VISIT: http://forum.phparch.com/302
The Original Problem

A few years ago, I was tasked with creating an export tool to dump hierarchical data from a database into XML documents. This was not a simple task because the structure of the database and exported data was complex. It was not a simple dump into one XML document but into multiple documents. It needed to be done in a single process. It needed to be done quickly, and finally, system resource usage had to be kept at the absolute minimum. Before you bring up the “Hardware is cheap” line, which I hear quite often, I know there are more people than just myself who work at small and/or startup companies where expenses are kept to an absolute minimum. Much of the hardware I had to work with, for behind-the-scenes processes at least, consisted of old workstations “converted” into servers. It’s the old mentality of working with what you’ve got, and that is where ingenuity is found.

Back then, in the pre-PHP 5 world, I had the choice of using domxml or creating XML manually, using strings. To put it in more of a modern day perspective, I will explain the issues I ran into in the context of DOM, SimpleXML and strings within PHP 5.

Being an XML guy myself, I of course attempted to first solve this problem using DOM. Given a document only containing elements and attributes and the learning curve required to use DOM, you may lean towards attempting this with SimpleXML using its new features found in PHP 5.1.3. In either case, the output I needed to create contained a couple hundred XML documents each time the export took place, and this is where the problems with these parsers began. Both DOM and SimpleXML are tree-based parsers, so creating a serialized document requires that the document first be built as an in-memory tree and then serialized, creating additional overhead in both the amount of memory and time required for the process. To keep memory manageable, the process consisted of creating the document, serializing it, and then, using the memory management of XML documents in PHP 5, unloading the document from memory. Having to do this a couple hundred times, the processing time was just not acceptable.

The next attempt at a solution was to manually create the XML using strings. While memory usage and processing time were better than the first attempt, a whole new set of problems appeared. First, there was no way to determine if the XML being created was well formed. DOM at least issues errors when attempting to create invalid XML. Handling special characters using a function like htmlspecialchars() only gets you so far. Element and attribute names also have their own eccentricities that must be taken into account. Add to this the creation of a complex document. All elements must be properly closed, so having to keep track of every single opening and closing tag became a real nightmare.
XMLWriter is Born

With PHP 5 in its early stages of development and libxml2 being used as the basis for all of the XML based extensions, I came across the XMLWriter API that gave me hope that the task at hand was actually achievable. Just like using strings with fwrite(), the XML is sent directly to the output destination, so the memory constraints were not a problem. Dealing with special characters was no longer an issue because text content is automatically escaped. Trying to create elements and attributes with invalid names results in errors that can be handled, preventing the creation of a badly formed document. One of the nice features of the extension is the fact that no matter how deep within the document you are, with a single function call, every single open element tag can be properly closed. As if all these features were not enough, to top it all off, being a native C extension, it is very fast at creating documents.

Now that you know all the benefits this extension has to offer, you are probably wondering how to get it installed. For versions of PHP less than 5.1.2, XMLWriter is found in the PECL repository and can be installed using either pear install xmlwriter or pecl install xmlwriter, depending on the version of PEAR you have installed. Windows users can simply obtain the pre-compiled binary from http://pecl4win.php.net/ext.php/php_xmlwriter.dll and enable it within the php.ini file. If you happen to be running PHP 5.1.2 or greater, you will be happy to know that XMLWriter has been added to the core PHP distribution and enabled by default, so unless it has been explicitly disabled, it should already be available to you.

Before we jump right into actually using the extension, there is one more piece of information you should be aware of. Because this extension was originally written for PHP 4.3.x and later expanded for PHP 5, it has a procedural interface for PHP 4.3+ (this also includes versions of PHP 5) as well as an object oriented (OO) interface only available when running PHP 5+. Both styles will be demonstrated within this article, so if you happen to still be running PHP 4.x, all examples written using the OO style must be converted to procedural style, which is not difficult at all, before they will run on your system. With all the background and formalities now behind us, let’s finally take a look at XMLWriter in action.

Setting up the Writer

The first step to creating a document is the creation of the XMLWriter resource or object, to be known as the writer. How it is created depends upon whether the output is being sent to a URI, $writer = xmlwriter_open_uri($uri), or buffered in memory, $writer = xmlwriter_open_memory(). When working with the object oriented interface under PHP 5.x, there is a slight difference, as the XMLWriter object must first be instantiated and then the destination output set. For example, the following snippet of code is used to create the object and specify an in-memory buffer.

    /* Creation of buffered XMLWriter object under PHP 5.x */
    $writer = new XMLWriter();
    $writer->openMemory();

Once the writer has been properly initialized, there is one more aspect you can control, and that is whether or not the writer performs automatic formatting, as well as the character string used for the indentations. Although the XMLWriter extension only requires libxml2 2.6.0, this formatting feature requires version 2.6.5; otherwise the functions are not available from the extension.

By default, the writer does not perform any automatic formatting. The XML is produced exactly as you tell it to be created. Although this output is not easily readable by a human, there are no potential concerns about whitespace handling when it is parsed by an XML parser. In most cases, this is not an issue, so many opt to have the XML automatically formatted using the xmlwriter_set_indent() function. It simply takes a boolean that turns this feature on or off; formatting is off by default. This behavior may be changed at any time while creating the XML, but for output consistency it is best to set this feature once, prior to writing any output, and leave it alone.

When enabled, the default behavior is to add a line feed and indent, based on how deep the tag is within the document, using a single space as the indentation string. While this does make it easier to read the serialized document, a single space is not always enough. Compare the XML from Listing 1 to that of Listing 2. The document in Listing 1 uses the default indenting, while the document in Listing 2 uses three spaces for indenting. Even though the documents are not very complex, it is definitely easier to read and discern the position of the elements in Listing 2. You may not want to use spaces at all, but rather tabs. The string used for indenting is changed using the xmlwriter_set_indent_string() function. For example, to enable formatting and use a tab character for indenting, you would simply need to call the following when using the procedural interface:

    /* Enable formatting with tabs using procedural interface */
    xmlwriter_set_indent($xw, TRUE);
    xmlwriter_set_indent_string($xw, "\t");
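The setup steps described in this section can be sketched end to end as follows. This is only a minimal sketch, assuming PHP 5.1.2 or later with the bundled extension built against libxml2 2.6.5+; the element names root and child and the temporary file used as the URI destination are illustrative choices, not taken from the article's listings.

```php
<?php
// Procedural interface: buffer the XML in memory.
$xw = xmlwriter_open_memory();
xmlwriter_set_indent($xw, true);            // turn automatic formatting on
xmlwriter_set_indent_string($xw, "\t");     // indent with a tab instead of one space
xmlwriter_start_element($xw, 'root');       // illustrative element names
xmlwriter_start_element($xw, 'child');
xmlwriter_end_element($xw);                 // closes child
xmlwriter_end_element($xw);                 // closes root
$buffered = xmlwriter_output_memory($xw);   // fetch the buffer as a string

// OO interface: stream the XML to a URI (here, a temporary file).
$path = tempnam(sys_get_temp_dir(), 'xw');  // hypothetical destination
$writer = new XMLWriter();
$writer->openUri($path);
$writer->startElement('root');
$writer->endElement();
$writer->flush();                           // push pending output to the destination
$streamed = file_get_contents($path);
unlink($path);
```

The memory writer hands the document back as a string, while the URI writer never holds more than its pending output, which is the property that matters when exporting many documents in one process.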
LISTING 1

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
     <element1>
      <element2 num="1">
       <element3>Special chars: &amp; &lt; &gt; &quot;</element3>
      </element2>
     </element1>
    </root>

LISTING 2

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
       <element1>
          <element2 num="1">
             <element3>Special chars: &amp; &lt; &gt; &quot;</element3>
          </element2>
       </element1>
    </root>

Procedural vs. Object Oriented API

I mentioned earlier that XMLWriter contains both procedural and OO interfaces, although the OO API is only available when using PHP 5+. The choice of interface to use is completely up to you. Neither offers more or less functionality than the other, so the choice of which one to use is really only dependent upon the style of coding you prefer and how portable you need your code to be. For example, using the OO interface limits your code to PHP 5+, while the same code written using the procedural interface will work for PHP 4.3+. The only limiting factor you may run into is the version of libxml2 installed on the target systems, as some functionality requires certain versions of libxml2. When we come across any of these functions within this article, it will be noted if a certain version of libxml2 other than 2.6.0 is required.

Once you choose a style, determined by how you set up the original writer, you cannot change to the other style while working with the writer. If, by chance, you decide to change styles after writing your code, the good news is that it is not difficult to quickly make the change. Other than changing how the writer is initially created, the remaining changes all follow the same set of simple rules:
• Using the procedural style, all function names begin with xmlwriter_. With the OO interface, this is replaced with the instantiated object and the method operator.
• All remaining underscore characters are removed from the function name, with the following character changed to uppercase.
• The first parameter for the procedural function calls is the writer itself. For the OO interface, simply remove this parameter, leaving the remaining parameters as they are.

Consider the functions used to set up formatting. You have already seen these written using procedural style. Had the variable $xw been initialized using OO syntax by calling $xw = new XMLWriter();, formatting would have been set up using the following snippet of code:

    /* Enable formatting with tabs using the OO interface */
    $xw->setIndent(TRUE);
    $xw->setIndentString("\t");
As you can see, the difference between the two code snippets is very minor. This not only makes it easy to change code written in one style to another, but once you know one interface, it is very simple to write code in the other style.
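To make the conversion rules concrete, the sketch below writes the same small fragment with both interfaces and compares the results. It assumes PHP 5+ so that both styles are available; the element name note and the attribute id are made up for illustration.

```php
<?php
// Procedural style: every function starts with xmlwriter_ and takes the writer first.
$xw = xmlwriter_open_memory();
xmlwriter_start_element($xw, 'note');        // illustrative element name
xmlwriter_write_attribute($xw, 'id', '1');   // illustrative attribute
xmlwriter_text($xw, 'hello');
xmlwriter_end_element($xw);
$procedural = xmlwriter_output_memory($xw);

// OO style: drop the prefix, camel-case the rest, drop the writer argument.
$w = new XMLWriter();
$w->openMemory();
$w->startElement('note');
$w->writeAttribute('id', '1');
$w->text('hello');
$w->endElement();
$oo = $w->outputMemory();

// Both writers produce the same serialized fragment.
var_dump($procedural === $oo);
```

Each procedural call maps to exactly one method call, so converting existing code is a mechanical rename rather than a redesign.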
LISTING 3 (listing body not preserved; see the xmlwriter code directory)

Creating a Simple XML Document
Anyone who has worked with XML before knows that while an XML declaration is not required for a document, it is highly encouraged, just like a text declaration for external subsets. These declarations allow you to define the version of XML being used, the encoding, and in the case of an XML declaration, whether or not the document is standalone; indicating if the document utilizes an external subset or not. This information allows an XML parser to understand how the document was originally created. Regardless of the type of declaration, each of these can be created using the xmlwriter_start_document() function. The example in Listing 3 demonstrates the creation of the XML shown in Listing 2. Although it is still a simple example, it should give you an idea of how the functionality starts to all fit together. The writer is first set up, identifying how the output is to be handled. In this case, we will be working with a memory buffer, so
Volume 5 Issue 5 • php|architect • 12
XMLWriter output is available by string access. The document is meant to be easily read by a human, so next, formatting is enabled using three spaces for indenting. To aid parsers in understanding how the document was created, an XML declaration is added, noting the use of XML 1.0 and UTF8 encoding.
elements, comments or processing instructions, a single function—xmlwriter_write_element()—can be used to simplify the element and content creation. This function performs three actions in one. The starting element3 tag is created, the supplied text content is written for the element, and finally the element3 tag is closed.
XMLWriter contains both procedural and OO interfaces, although the OO API is only available when using PHP 5+ The document does not contain a DTD or any other information within the prologue, so the next step is to create the document element root using the xmlwriter_start_element() function. This function simply creates the start tag for an element, allowing for the addition of attributes or child content. In this case there are no attributes, so we move to creating its content. Since automatic formatting has been enabled, the whitespace shown in Listing 2 is insignificant to us, meaning we are only concerned with child content, which in this case is simply the element element1. At this point, the writer still has the root start tag open, so a subsequent call to xmlwriter_start_element() for element1, causes the writer to move to within the content of root to create the element1 start tag. Following this logic, element2 is created the same way within the content of element1. This new element, however, also contains the num attribute. Due to the fact that the writer was instructed only to create the start tag for element2, the element is currently still open, allowing for the addition of attributes. Thinking logically about document creation, you would say “I want to create an attribute with a specific value”. Following this logic, it is programatically created by calling the xmlwriter_write_attribute() function, simply passing the name, num, and the value, 1, as the parameters. The element tag remains open, so any number of attributes could be added, but for the document in Listing 2, there is only a single attribute, so at this point we finally reach the insertion of element3. This element is slightly different due to the fact it is the first element in the document containing any significant text content. Because the content contains only text and no other type of markup, such as other
You may be wondering about the text used for the content. The characters used, &, <, > and ", are all special characters in XML. In most cases, they must be escaped as they are not valid as plain text content. In Listing 3, the string Special chars: & < > " was passed as the text content for element3, but looking at the document in Listing 2, it appears as Special chars: &amp; &lt; &gt; &quot;. XMLWriter automatically escapes content passed in as text. This is an important feature to be aware of. When using XMLWriter to create a document, there is no need to call a function like htmlspecialchars() when creating text content from a source like a database, because the escaping is performed for you automatically. This may not always be how you want content written and, as we will see in a more complex document later in this article, there are ways to write content without any automatic encoding, but this should only be done if you are absolutely sure about the content of your source data.

At this point, the only remaining action to complete the document creation is to close all start tags that are still open. For each of the elements still open, consisting of element2, element1 and root, the xmlwriter_end_element() function is called. Each time this function is called, the currently opened element within the stack is closed and the writer positions itself at the same level within the document as the starting tag for the element it just closed. As will be demonstrated later, this provides the capability for creating complex documents where element content may contain multiple elements and/or mixed content. For example, right after closing element2, you could easily create an element named element2a, which would be a sibling of element2, by calling xmlwriter_start_element() prior to closing element1.

LISTING 4

Volume 5 Issue 5 • php|architect • 13
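The walk-through above can be sketched with the procedural interface as follows. This is a minimal illustration written for this purpose, not the article's Listing 3 (which did not survive); the XML version, the encoding and the variable names are assumptions.

```php
<?php
// Create a writer that builds the document in memory.
$writer = xmlwriter_open_memory();
xmlwriter_set_indent($writer, true);              // automatic formatting
xmlwriter_start_document($writer, '1.0', 'UTF-8');

xmlwriter_start_element($writer, 'root');         // <root>
xmlwriter_start_element($writer, 'element1');     //   <element1>
xmlwriter_start_element($writer, 'element2');     //     <element2 ...
xmlwriter_write_attribute($writer, 'num', '1');   //     start tag still open

// Start tag, escaped text content and end tag in a single call.
xmlwriter_write_element($writer, 'element3', 'Special chars: & < > "');

xmlwriter_end_element($writer);                   // closes element2
xmlwriter_end_element($writer);                   // closes element1
xmlwriter_end_element($writer);                   // closes root
xmlwriter_end_document($writer);

$xml = xmlwriter_output_memory($writer);
echo $xml;
```

Note that the text passed to xmlwriter_write_element() is given unescaped; the writer takes care of turning the special characters into entity references.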
A Complex Document

Now that you have seen how to create a simple document, let’s take a look at some additional functionality provided by XMLWriter that allows for the creation of more complex documents. Take a look at the code in Listing 4, written using the OO interface, and the output it produces in Listing 5. Although the created document is similar to that from Listing 2, you should notice the introduction of namespaced elements and attributes as well as the mixed content contained within element2.

LISTING 5

LISTING 6

The first noticeable difference is the document element. In this case, root is a namespaced element created using the startElementNs() method. This method takes the prefix, local element name and namespace URI. From the output produced in Listing 5, you can see that not only is root created with a fully qualified name consisting of prefix and local name, but the namespace declaration has also been added to the output. By using the namespaced methods/functions, XMLWriter automatically takes care of adding the namespace declarations. This being the case, you may be curious about the xmlns attribute that is added to this element. Here, we are defining a default namespace on this element, yet it is not being used by the element directly. Because of this, it must be added through the use of the writeAttribute() method.

There are a few issues when creating a namespaced document with XMLWriter due to some bugs in the libxml2 library. For example, both element1 and element2 reside in the default namespace, yet when they are created in Listing 4, they are created as regular elements. Were you to use the namespaced methods here, the namespace would be declared again on the new element, even though it is currently in scope. This is not technically incorrect since the document still means the same thing, but the problem with the namespace declaration being added whenever namespaced functions are used is exacerbated when an element is created in a namespace and an attribute in the same namespace is added to the element.
Each of these calls would end up creating the same namespace declaration on an element, which is invalid. Due to this current problem in libxml2, which may be fixed in its next release, be very cautious when using namespaced functionality. As you can see in Listing 4, once the namespace declarations were created, the regular non-namespace-aware functions were used to create elements even though in reality they do belong to specific namespaces. It is also the reason the num attribute was created within its own urn:attributes namespace using the writeAttributeNs() method, which is only available when built against libxml2-2.6.17+. Had it been created in an existing namespace and prefix, the output in Listing 5 would definitely have been a bit confusing without having read this first.
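The approach described above can be sketched as follows. This is not the article's Listing 4; the myns prefix matches the text, but the namespace URIs urn:example:myns and urn:example:default and the a prefix are placeholders invented for the illustration. Only urn:attributes comes from the article.

```php
<?php
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');

// Namespaced document element: prefix, local name, namespace URI.
// The writer adds the xmlns:myns declaration on its own.
$writer->startElementNs('myns', 'root', 'urn:example:myns');

// The default namespace is not used by root itself, so it is
// declared by hand as a plain attribute.
$writer->writeAttribute('xmlns', 'urn:example:default');

// Children are created with the non-namespaced methods so that the
// declaration already in scope is not emitted again.
$writer->startElement('element1');
$writer->startElement('element2');

// The attribute lives in its own namespace (prefix "a" is an assumption).
$writer->writeAttributeNs('a', 'num', 'urn:attributes', '1');
$writer->text('content');

$writer->endDocument();
$xml = $writer->outputMemory();
echo $xml;
```

The exact position of the generated namespace declarations inside each start tag depends on the libxml2 version in use, so the output is best inspected rather than assumed.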
Now that you hopefully understand some of the issues you might face working with namespaces, we can move on to the content of element2. Even though the document is complex, with mixed content being used here, creating this mixed content is pretty straightforward. Text is created using the text() method and follows the same escaping rules as you saw when writing text content to an element. It is interspersed with the creation of element3, element4 and a comment, created using the writeComment() method. Again, just like text, the contents of a comment are automatically escaped.

Earlier in this article, I mentioned that it is possible to create content without having the auto-escaping occur. This functionality, unfortunately, is not implemented within PHP 5.1.x, but will be available once PHP 5.2 is released. Compare the following two ways of creating an element with text content. The first uses the method you are currently familiar with and auto-escapes text, while the other can be used to write raw data to the element content.

    /* Create auto-escaped text */
    $writer->writeElement('element3', 'Special chars: &');

    /* Create raw text in PHP 5.2 */
    $writer->startElement('element3');
    $writer->writeRaw('Special chars: &amp;');
    $writer->endElement();
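To make the difference concrete, here is a small self-contained sketch that writes the same kind of element both ways and keeps the results in variables. The sample strings are made up for the illustration; writeRaw() requires PHP 5.2 or later.

```php
<?php
// Escaped: the writer converts special characters for us.
$escapedWriter = new XMLWriter();
$escapedWriter->openMemory();
$escapedWriter->writeElement('element3', 'Tom & Jerry');
$escaped = $escapedWriter->outputMemory();
// $escaped is: <element3>Tom &amp; Jerry</element3>

// Raw: the string is copied to the output untouched, so it must
// already be well-formed XML. Here it contains real markup.
$rawWriter = new XMLWriter();
$rawWriter->openMemory();
$rawWriter->startElement('element3');
$rawWriter->writeRaw('<b>Tom</b> &amp; Jerry');
$rawWriter->endElement();
$raw = $rawWriter->outputMemory();
// $raw is: <element3><b>Tom</b> &amp; Jerry</element3>
```

Passing untrusted data to writeRaw() would let that data inject arbitrary markup, which is why the raw variant is only appropriate for content you fully control.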
The only question you might have about the code in Listing 4 is probably due to the fact that the elements that were started earlier in the script were never terminated. Instead of the endElement() calls for element2, element1 and myns:root, there is a single call to endDocument(). This is a magic method and extremely helpful when creating a complex document that is many levels deep. When you have finished writing all necessary content within a document, rather than having to remember where in the stack you are, a simple call to the endDocument() method closes all open attributes and elements in the proper order, ensuring that the document is well-formed. It is not required, but highly recommended to always call this method even if you explicitly end every start tag. It cannot hurt and will catch anything you may have possibly missed.
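A minimal sketch of this behaviour, using element names borrowed from the earlier examples: three elements are opened and none is closed explicitly, yet the resulting document is well-formed.

```php
<?php
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');

$writer->startElement('root');
$writer->startElement('element1');
$writer->startElement('element2');
$writer->writeAttribute('num', '1');
$writer->text('deeply nested');

// No endElement() calls: endDocument() unwinds the whole stack.
$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;
```

The closing tags appear in reverse order of opening, innermost first.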
Working with Output and Streams

In all of the code listings so far, you have seen a yet-unexplained function: flush(). It instructs the writer to output any information that may currently be held within the internal buffer. Its behavior depends upon whether the writer was set up to return strings, by using xmlwriter_open_memory(), or to output the data to a URI, by using xmlwriter_open_uri(). The flush() function accepts a single optional parameter that only has an effect when working with strings. It is a boolean, with a default value of TRUE, which is used to indicate whether or not the internal buffer should be cleared when the string is output.

For example, take a look at the code in Listing 6 and the output produced in Listing 7. The buffer starts out empty, so the first call to flush() simply returns the document as it has been created to this point. The value TRUE has been passed to the function, so after returning the output, the internal buffer is cleared. This is clearly demonstrated by the second call to flush(). The data returned here is the portion of the document that has been created from the point after the initial flush() call was made. This time, however, FALSE is passed to the function, instructing the writer to keep the current state of its internal buffer intact. As you can see from the output of the final call to flush(), the data from the previous call is maintained with the additional document structure appended.

When sending output to a URI, which is handled through the use of PHP streams, flush() works along the same principles, though it functions slightly differently than when used for strings. As with strings, the document is created within an internal buffer as the writer is used, but once the buffer reaches a certain size, which depends upon the underlying libxml2 library but is typically no more than 4000 characters, it is automatically sent to the stream write handler and the writer’s internal buffer is cleared. For example, if you have written a custom protocol handler and were using it to handle the writer’s output, unless explicitly instructed to write data, your handler’s stream_write() method would be called at least every 4000 characters to handle the data. I indicated “unless explicitly instructed” for a reason.
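The in-memory behaviour of the optional parameter can be sketched as follows. This is an illustration in the spirit of Listing 6, not a reproduction of it; the element names and text are invented.

```php
<?php
$writer = new XMLWriter();
$writer->openMemory();

$writer->startElement('root');
$writer->startElement('child');

// TRUE: return what has been written so far and clear the buffer.
// The start tag of child is still open at this point.
$first = $writer->flush(true);

$writer->text('some text');

// FALSE: return the buffer but keep its contents.
$second = $writer->flush(false);

$writer->endElement();   // closes child
$writer->endElement();   // closes root

// Default (TRUE): returns everything since the last clearing flush,
// which includes the data already seen in $second.
$third = $writer->flush();

echo $first, "\n", $second, "\n", $third, "\n";
```

Because the first flush cleared the buffer, the complete document is the concatenation of $first and $third; $second is merely a snapshot of the beginning of $third.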
The flush() function can be used to force the writer to send any data that is currently held within its buffer, regardless of how much the buffer is holding. The example in Listing 8 creates the same document as was created in Listing 6. This time, however, the writer is writing to php://output. For anyone unfamiliar with this, php://output is a built-in output stream that writes to the PHP output buffer, just as you would if you had called echo or print. As the document is being created, the writer is forced to send the data to the output handler every time flush() is called. Rather than returning a string, as you saw in Listing 6, flush() in this case returns the number of bytes that were written to the stream. You may also notice that the optional parameter was not used when flushing the data. When using streams, the parameter has no meaning because as data is written to the stream,
LISTING 7

LISTING 8

LISTING 9

    Output 70 bytes written.
    Final Output 30 bytes written

LISTING 10

LISTING 10 (CONT’D)

            if (! empty($strLink))
                $this->writeElement("link", $strLink);

            /* "$" restored here; the parameter name is assumed from the
               naming of the other arguments */
            $this->writeElement("description", $strDescription);

            if (! empty($strGenerator)) {
                $this->writeElement("generator", $strGenerator);
            }

            $this->writeElement("lastBuildDate", date(DATE_RFC822));
        }

        /* Create an item within the feed */
        public function item($strTitle, $strContent, $strLink, $timeStamp) {
            $this->startElement("item"); /* Open item element tag */

            $this->writeElement("title", $strTitle);
            if (! empty($strLink))
                $this->writeElement("link", $strLink);
            $this->writeElement("description", $strContent);
            if (empty($timeStamp))
                $timeStamp = time();
            $this->writeElement("pubDate", date(DATE_RFC822, $timeStamp));

            $this->endElement(); /* Close the item element tag */
        }

        /* Ensure document is properly closed and flush any remaining
           content from the buffer */
        public function close() {
            $this->endDocument();
            $this->flush();
        }
    }

    /* Instantiate the rssWriter and set formatting to use 3 spaces.
       The document is streamed to the PHP output buffer.
       It could just as easily be sent to a file (filename) or
       remote URI (http://...) */
    $writer = new rssWriter("php://output", TRUE, "   ");

    /* Create the header */
    $writer->header('My RSS feed', 'http://www.example.org/rss',
                    'This is my example 2.0 RSS feed using XMLWriter',
                    'PHP rssWriter');

    /* Create a timestamp for the first entry created Apr 1, 2006 */
    $entryDate = mktime(0, 0, 0, 4, 1, 2006);
    $writer->item('First Entry',
                  'This is the content of my first entry & it has no Link',
                  NULL, $entryDate);

    /* Create a timestamp for the second entry created Apr 2, 2006 */
    $entryDate = mktime(0, 0, 0, 4, 2, 2006);
    $writer->item('My Second Entry',
                  'A "cool" & easy way to create RSS feeds',
                  'http://www.example.org/123456', $entryDate);

    $writer->close();
    ?>
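The stream behaviour described under “Working with Output and Streams” can be sketched like this. It is not the article's Listing 8; the element names are invented, and output buffering (ob_start() and ob_get_clean()) is used here only so that the streamed data can be inspected afterwards.

```php
<?php
ob_start();                        // capture what reaches php://output

$writer = new XMLWriter();
$writer->openUri('php://output');  // stream instead of memory

$writer->startElement('root');
$writer->text('hi');

// With a stream, flush() pushes the buffered data to the stream and
// returns a byte count rather than a string.
$firstBytes = $writer->flush();

$writer->endElement();
$secondBytes = $writer->flush();

$captured = ob_get_clean();        // everything the writer streamed
```

In a real script the ob_start() and ob_get_clean() calls would be dropped and the XML would go straight to the client, piece by piece, each time flush() is called.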
TEST PATTERN: The Attributes of Properties

Is this scenario a bit esoteric to worry about? Perhaps. However, it is an illustration of one way in which coupling the implementation of attributes together in a single method can reduce the maintainability of a class. The solution? A concerned class author can implement base __get() and __set() methods that do nothing but raise an “attribute not found” exception. Thus, subclass authors will know to call the parental methods and the class reserves the right to add attributes in the future.

One additional thing to be aware of is that __get() and __set() are limited with respect to attribute visibility. The method-based attribute implementation allows attributes to have any visibility that a method can have. In fact, getters and setters can even declare different levels of visibility. For example, an attribute’s getter might be public, while its setter may remain protected. Attributes implemented with __get() and __set() are always public. No matter what visibility modifier is used with these methods, they are always called when a property is not found. Even if the method’s visibility declaration were respected, coupling all of the attributes together would mean that they would all have to share the same visibility anyway. Public it is.

One last __get() and __set() caveat. These methods are only called when an instance-level property is not found. That means that class (static) properties do not trigger these methods. There is no corresponding property-not-found handler for static properties. Class attributes must be implemented with some other technique. Of course, the getXXX() and setXXX() style works fine for static attributes.
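A minimal sketch of the defensive base class suggested above follows. The class names AttributeBase and Rectangle, the storage property names and the exception message are assumptions made for the illustration, not code from the article's listings.

```php
<?php
// Base class whose handlers do nothing but reject unknown attributes.
class AttributeBase
{
    public function __get($name)
    {
        throw new Exception("Attribute not found: $name");
    }

    public function __set($name, $value)
    {
        throw new Exception("Attribute not found: $name");
    }
}

// A subclass handles its own attributes and defers everything else
// to the parent, so the base class can grow attributes later.
class Rectangle extends AttributeBase
{
    protected $w = 0;
    protected $h = 0;

    public function __get($name)
    {
        switch ($name) {
            case 'width':  return $this->w;
            case 'height': return $this->h;
            case 'area':   return $this->w * $this->h;
            default:       return parent::__get($name);
        }
    }

    public function __set($name, $value)
    {
        switch ($name) {
            case 'width':  $this->w = (int) $value; break;
            case 'height': $this->h = (int) $value; break;
            default:       parent::__set($name, $value);
        }
    }
}

$rect = new Rectangle();
$rect->width = 3;
$rect->height = 4;
$area = $rect->area;               // computed, read-only attribute

try {
    $rect->depth = 1;              // not an attribute of Rectangle
    $unknownRejected = false;
} catch (Exception $e) {
    $unknownRejected = true;
}
```

Because every default branch calls the parent, a misspelled attribute name fails loudly instead of silently creating a new public property.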
There are other possible implementations for our __get() and __set() methods beyond the switch statement. In fact, we can combine elements of our getXXX() and setXXX() accessor method style of declaring attributes with our property-not-found handler style of declaring properties. Listing 4 implements a PropertySupport base class which defines __get() and __set() methods which dispatch to getXXX() and setXXX() methods. The Rectangle subclass is then free to define its getter and setter methods, just as in Listing 1. However, this time, these accessor methods need not be used with method calls, but rather can be used with the property dereferencing notation. The magic method dispatcher understands our getXXX() and setXXX() naming convention and can dispatch to the appropriate accessor method. Why would you do this? This implementation is nearly the worst of both worlds. It has all of the limitations of __get() and __set() along with all of the verbosity of declaring individual getter and setter methods.

LISTING 3

LISTING 4

Accessor Methods are Evil?

Last month, we covered a couple of arguments from the “accessor methods are evil” school of thought. We went into depth on the breaking-encapsulation argument, but glossed over the verbosity argument. I’d like to return to the verbosity issue. Accessor methods don’t tend to do much. They tend to be simple methods that can be tedious to declare and they add to the conceptual weight of a class. Some people don’t feel they are worth the space they take up. Now, if you are not a fan of declaring these simple accessor methods, you might think, “Hey, my common dispatchers can do that for me, too.” Indeed, they can.

Listing 5 introduces yet another naming convention, a variation on Listing 4, where an attribute is mapped to a protected storage property of the same name prefixed by an underscore. This cannot be a private storage variable because __get() is not defined in the same class as the storage property. The implementation of Rectangle in this version is positively terse. One thing to watch out for in this implementation is that it can lead to brittle inheritance hierarchies because there is no equivalent of parent::getXXX() or parent::setXXX() in this style. Neither of these implementations, Listing 4 or Listing 5, is very helpful in adding new properties to an existing class by subclassing, because of the need to inherit from PropertySupport. Of course, the methods in this implementation are fairly small and they could be copied and pasted into a new subclass, but that’s distasteful in its own way.

Becoming Self Aware

The reflection API in PHP allows an object to be inspected at runtime. We can determine almost anything that PHP knows about an object, including information about its properties. This meta-data is useful in creating generalized tools that work with object instances. In fact, our examples in Listing 5 require meta-data to work; they just don’t use the reflection API to get it. Instead they call method_exists() and property_exists(). Additionally, some programming tools use static meta-data about the program. This information is determined from parsing the source code files, rather than inspecting the information at runtime. A good example of this kind of tool is PhpDocumentor.

Unfortunately, there is no single convention in PHP for how to associate accessor methods with attributes. Tools that rely on meta-data don’t see these as attributes at all. They see that there is a property named _height, but don’t understand the naming convention that turns this into height via __get(). Or, they see a method named setHeight(), but don’t register that as an accessor method for the height attribute. Of course, it’s possible to write tools that make these assumptions. You can wrap the reflection API with your own API that understands your own conventions for accessor methods. Unfortunately, until the standard reflection API becomes aware of one of these conventions, these home-grown tools will generally not be interoperable. This accessor method Tower of Babel contributes to the “not invented here” syndrome in the PHP community and makes it more difficult for tools, libraries and frameworks to work together.

Fortunately, it looks like this issue is scheduled to be addressed in PHP 6. My hope is that the eventual meta-data solution will tackle both the dynamic use case and the static use case. An example of the dynamic use case would be mirroring the fields of a database record as properties of an object, such as in an active record or ORM implementation. Another example of the dynamic use case for meta-data would be creating a proxy for an object and mirroring its attributes and properties via __get(). The key aspect of the dynamic use case is that the properties and attributes are not declared on the class itself. The dynamic use case is runtime only. The static use case is what we have been discussing in the last two articles. This is where a class deals with attributes or properties that are specifically declared in source code and possibly associated with accessor methods.

FIGURE 1: Property and Attribute Access Benchmarks

    Benchmark                                Time (seconds)   Relative
    A. Standard property access                   1.603          100%
    B. Simple getter method                       6.178          385%
    C. __get with switch (first property)        16.809         1049%
    D. __get with switch (fifth property)        28.448         1775%
    E. __get with accessor dispatch              35.858         2237%
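As an illustration of wrapping the reflection API with a convention-aware helper, here is a minimal sketch. The function name findAttributes(), the sample Rectangle class and the “read/write” labels are all hypothetical; the only convention assumed is the getXXX()/setXXX() naming discussed in this column.

```php
<?php
// Derive attribute names from public, non-static getXXX() methods that
// take no required arguments, and note whether a matching setXXX() exists.
function findAttributes($object)
{
    $class = new ReflectionClass($object);
    $attributes = array();

    foreach ($class->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        $name = $method->getName();
        if ($method->isStatic()
            || strlen($name) <= 3
            || strncmp($name, 'get', 3) !== 0
            || $method->getNumberOfRequiredParameters() > 0) {
            continue;
        }
        $suffix = substr($name, 3);                 // e.g. "Height"
        $attributes[lcfirst($suffix)] =
            $class->hasMethod('set' . $suffix) ? 'read/write' : 'read-only';
    }

    ksort($attributes);
    return $attributes;
}

// Sample class following the accessor naming convention.
class Rectangle
{
    private $width = 0;
    private $height = 0;

    public function getWidth()         { return $this->width; }
    public function setWidth($value)   { $this->width = $value; }
    public function getHeight()        { return $this->height; }
    public function setHeight($value)  { $this->height = $value; }
    public function getArea()          { return $this->width * $this->height; }
}

$attributes = findAttributes(new Rectangle());
```

A tool built this way only understands one convention, which is exactly the interoperability problem described above: a class using underscore-prefixed storage and __get() would appear to have no attributes at all.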
Benchmarks

Everybody loves benchmarks, and I’ve saved those for last. Figure 1 benchmarks the methods of implementing attributes that we’ve discussed so far. These benchmarks were performed on a 300MHz G3 running PHP 5.1.2, hardly a barn-burning server. The benchmark performed each operation one million times.

Benchmark A is used as a baseline. It measures the speed of standard PHP property access. One million property accesses in 1.6 seconds on a dog-slow machine. That’s pretty good. Even more interesting, when I ran this test on PHP 5.0, I found that PHP 5.1 has made significant performance gains in property access. Kudos to the PHP development team.

Benchmark B tests a simple getter method, such as that in Listing 1. There is more overhead with a method call, making getter methods three times slower than standard property access, but this idiom proves to be simple, consistent, powerful and relatively fast. Oh, if it weren’t for the lure of the object operator (->) notation…

Benchmarks C and D test the __get() method with a switch statement, as in Listing 2. The difference between the two benchmarks is in the number of cases in the switch statement. Benchmark C succeeds on the first case, while D succeeds on the fifth case. As you can see, the number of cases in the switch statement has an impact on performance. As with any switch statement, you want the most common cases at the beginning. This attribute implementation is 10 to 17 times slower than standard property access, an order of magnitude.

Benchmark E tests our dispatching attribute implementation from Listing 5. Unlike the switch statement implementation, this implementation should normally run in constant time. Unfortunately, that constant time is much slower than the average case in the switch statement implementation, being about 22 times slower than a standard property access.

As with any micro-benchmarks, you have to take these with a grain of salt. 22 times slower might seem like a lot, but these are still relatively fast operations. In absolute time, on a decent server, they probably don’t account for that much of a real application’s overall runtime. What I am saying is: don’t let performance be an excuse not to use attributes.

How do you Implement Attributes?

There are many ways to implement attributes. I didn’t even cover the third major technique, using get($name). Stop by the php|arch forums and share your implementation.
JEFF MOORE learned to program in the 80s, worked on ERP systems in the 90s and is devoting this decade to PHP. Jeff does freelance programming, works on the open source framework WACT and occasionally posts to his blog at http://www.procata.com/blog.
SECURITY CORNER

All Your Session Are Belong to Us!

by ILIA ALSHANETSKY

A person’s identity is a precious thing; it defines who we are and ultimately makes each person unique. In the offline world, the identity of a person is often represented by various documents, ranging from social security numbers to passports, credit cards and so on. For the most part, people try to keep this information secret to prevent outsiders from taking over their identity. When it comes to the online world, the situation is much simpler: the “identity” of a person is often nothing more than a short stream of bytes passed via URLs or cookies, known as sessions. Take these away, and your favourite store will not recognize you anymore. Worse yet, give it to another user and all of a sudden you have a case of multiple personalities. In this article we will focus on the various techniques that can be used to compromise sessions, and review some defenses against them in the hope of securing our users’ identities.

TO DISCUSS THIS ARTICLE VISIT: http://forum.phparch.com/307

Before getting into the actual process of compromising and securing sessions, let’s take a moment to review the concept of sessions, where they live, and why we need them in the first place. Let’s start with an answer to “why do we need sessions?” Imagine an e-commerce site. You visit the site, select a product that you want to buy and are ready to go through the process of purchasing it, which involves entering your address and payment information. Because the HTTP protocol is stateless, there is no inherent way for the Web site that you are using to keep track of you. Each request is independent and totally unrelated to the previous request, as far as the server is concerned. There is no continuous interaction between the user and the server, as you’d encounter in a non-Web application. That said, the store still needs to keep track of which products you selected, your address, and other details related to your purchase. This is where sessions come in; sessions provide the means of keeping track of users as they go through the site.

“How can a session be maintained in a stateless protocol,” you ask? Well, there are two ways to go about it: one involves cookies and the other involves passing a session identifier via URLs and form submissions. The most prevalent way is to use cookies to store the session identifier, simply because it is very easy to do and does not require every link to be modified to contain the session id.
Cookies

Cookies are little bits of text that the server asks the browser to store, and if the browser agrees to accept the cookie, it will send the text back with each request made to the same site. Before you ask: no, you cannot set cookies for other sites, unless you encounter a particularly old and buggy browser. The cookie itself has a defined lifetime, which can be as short as a second, or as long as several years. It should be noted that even when a cookie expires, the browser does not remove it immediately, but waits for the browser to be closed before it is discarded. This is why many applications set cookies with an expiry date in the past, so that the cookies are removed immediately when a browser is closed. This is done to make sure that if users forget to log out, their session will still be terminated as long as they close the browser. The trick is particularly handy for users at public terminals, such as computers at a library or an Internet café, to ensure that the next user is not left with an open gateway to the original user’s account.

If you are using PHP’s session extension this is done for you automatically, where the cookie’s expiry time is set to 19 Nov 1981 08:52:00, the birth date of the extension’s developer. To make use of this trick when setting cookies manually, simply pass a UNIX timestamp of a past date, such as 317278800, as the third parameter to the setcookie() function.

    // cookie name, cookie value, expiry time
    setcookie("session", "ABC123", 317278800);
One common mistake that people make when trying to use this trick is that they use the current time, as returned by the time() function, as the expiry time. The issue here is that many people’s computer clocks are slow, often by 10-15 minutes, and the time of deletion, as determined by the browser, is based on the time on the user’s computer. This means that if the user’s clock is behind the server’s clock, and the user spent only a few minutes on the site before closing the browser, the expiry time was not reached and the cookie is not removed. Subsequently, when the browser is reopened, possibly by the next user, there is still an active cookie from the previous session. This is why it is a good idea to set the cookie expiry time several years in the past, just to be on the safe side.

The problem with cookies is that in order for them to work, the browser has to agree to accept them, and sometimes the browser will refuse to do so. More often than not, this is due to heightened security settings that try to prevent user tracking through cookies by having the browser reject them. However, there are valid situations when the user needs to be kept track of, our e-commerce site being one such example, so an alternative method of tracking sessions was needed.
URL Sessions

This is where URL sessions come in. Rather than passing session data via potentially unreliable cookies, the session id is appended to every single link on the site and is also stored in hidden fields for form submissions. This, of course, makes for very long and ugly links that look like foo.php?PHPSESSID=b2682f0bc108d7b1afe02629ee8f2961. This does, however, solve the problem of cookie rejection, and provides a reliable way of keeping track of the user, independently of the browser’s settings. It also presents a bit of a challenge for developers, as they now need to ensure that every link and every form submission carries the session id. Otherwise, when the user clicks on the link or submits the form, they’ll be logged out from the site, since there is no longer an identifying mark found inside their request. Oh well, who said Web development was supposed to be easy?
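To show what “appending the session id to every link” involves, here is a minimal sketch of a helper that does it for a single URL. The function name addSessionId() is hypothetical and the sketch only covers simple links. PHP’s session extension can rewrite links automatically when its session.use_trans_sid setting is enabled, so hand-written code like this is for illustration only.

```php
<?php
// Append a session parameter to a URL, respecting an existing
// query string and keeping any #fragment at the end.
function addSessionId($url, $sessionName, $sessionId)
{
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    $separator = (strpos($url, '?') === false) ? '?' : '&';

    return $url . $separator
         . rawurlencode($sessionName) . '=' . rawurlencode($sessionId)
         . $fragment;
}

$plain = addSessionId('foo.php', 'PHPSESSID', 'b2682f0bc108d7b1afe02629ee8f2961');
$withQuery = addSessionId('foo.php?page=2#top', 'PHPSESSID', 'abc123');
```

When the result is written into an HTML attribute it still needs to be passed through htmlspecialchars(), since the & separator is a special character in HTML.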
Session ID

When it comes to the session identifier, ideally it consists of a sufficiently unique and unpredictable value that differs from user to user. This way, a hacker cannot simply guess a user’s session id and assume the victim’s identity. In many cases, the session id is a 32-character hexadecimal string that allows for a huge number of possible values. As you can probably guess, the shorter the string, the fewer the possibilities, which increases the chance of duplicates and simplifies the process of guessing the session id. That said, even an 8-character hexadecimal string would allow for over 4.2 billion unique combinations, which is more than enough for all but the largest of sites. On the flip side, if the id is too long it becomes impractical to send via URLs and increases the chance of it being corrupted in some way. Thus, the 32-character id strikes a good balance between security and length when it comes to session identifiers.
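The figures above follow directly from the fact that each hexadecimal character carries 4 bits. A short worked example, with a helper name invented for the illustration:

```php
<?php
// Number of distinct ids that a hexadecimal string of a given length can encode.
function hexKeyspace($length)
{
    return pow(16, $length);
}

$eightChars = hexKeyspace(8);      // 16^8 = 4,294,967,296 ("over 4.2 billion")
$bitsIn32Chars = 32 * 4;           // a 32-character id carries 128 bits
$thirtyTwoChars = hexKeyspace(32); // 16^32 = 2^128, returned as a float
```

The keyspace only helps if the ids are unpredictable: a long id derived from guessable input offers far fewer effective possibilities than its length suggests.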
a consistent length of 32 characters). Technically speaking, we don’t need the MD5, but it does mask the source of the values from external users, making it far more difficult to reverse engineer the algorithm. Even if a hacker knew the algorithm to get at someone’s session, they would need to know the precise time when the
You can not set cookies for other sites, unless you encounter a particularly old and buggy browser. Perhaps the most important part, when generating a session id is making sure it is truly random. Even a length of 100 characters won’t save you if the values are predictable. To give you a real-life example, there was a certain site that generated session ids by taking the sha1 checksum of the user id. User ids, being sequential numbers, are easy to predict and determine via trial and error, despite being 40 bytes long (hexadecimal representation of a 160 bit SHA1 hash), the ids were easily predictable. Somehow, one of the users noticed this and decided to have a bit of “fun” by accessing other people’s accounts and posting messages as those users. To take over a user identity all he had to do was figure out their user id, generate a SHA1 of it, and then send the value as the session id. Needless to say there were plenty of red-faced developers when this particular trick was discovered. When generating a session id, my recommendation is to take a combination of values that would ensure uniqueness. For example, the current date, a random number and the PID would be an excellent combination. $cur_time = gettimeofday(); // initialize random # generator mt_srand($cur_time[‘sec’] + $cur_time[‘usec’]); $sess_id = md5($cur_time[‘sec’].mt_rand(). $cur_time[‘usec’]. (getmypid()*mt_rand()));
The above example takes a high precision timestamp (seconds & microseconds), combines them with a random number generated using the Mersenne Twister algorithm, and adds the product of the PID and another random number. The end result is an MD5 hash (which guarantees
session was created, down to the microsecond, the id of the web server that served the request and predict two different random numbers. In other words, this is next to impossible, which is precisely what we want. In theory we are done; after all with a non-guessable session id, session takeover should not be possible, right? Well, if this was the case, I suspect this article would not be needed, but alas things are not as simple in practice.
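On PHP 7 and later, the same goal can be reached with less machinery. The sketch below is not the listing from this column: it assumes `random_bytes()` is available, and `make_session_id()` is a hypothetical helper name chosen for illustration. It draws the id straight from the operating system's cryptographically secure generator instead of a seeded Mersenne Twister, and 16 random bytes, hex-encoded, give the same 32-character length discussed above.

```php
<?php
// Hypothetical helper (not from the article): build a 32-character
// session id from 16 bytes of OS-provided randomness.
// Assumes PHP 7.0+, where random_bytes() exists.
function make_session_id(): string
{
    // random_bytes() throws an Exception when no secure source is
    // available, so a predictable fallback is never used silently.
    return bin2hex(random_bytes(16));
}

$sess_id = make_session_id();
echo strlen($sess_id), "\n"; // prints 32
```

Because nothing here depends on the clock or the PID, there is no seed for an attacker to reconstruct; whether that matters for a given site is a judgment call, but it removes one thing to reason about.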
The Sub-Domain Problem

Let’s take the cookie-based session for example; we know that an external site cannot read another site’s cookie, but what about sub-domains? In most instances when a cookie is set, it is set for an entire domain, so for www.php.net, the cookie domain would be .php.net; the extra dot at the beginning is needed for some browsers to accept the cookie. The idea behind the extra dot was to distinguish between php.net and notphp.net; this security feature is something you can thank the Netscape developers for.

The problem with cookies is that they are domain wide, meaning that any sub-domain can potentially access and even create cookies of the parent domain. This creates a possible risk for sites offering sub-domains to other users, which is a frequent case with hosting companies that give a temporary sub-domain to clients who do not have a domain of their own. This means that if we have foo.example.com and the person controlling that sub-domain is malicious, all he’d need to do in order to read cookies belonging to example.com is get the cookie holders to visit his sub-domain. This is a fairly easy task, which may be as trivial as posting a message on example.com’s forum with an image linked from foo.example.com. The image itself is a small PHP script that logs all of the cookies and their values, subsequently giving the hacker access to other people’s accounts. It does not even have to be a malicious external user; someone’s sub-domain may have a cross-site scripting issue that would permit injection of JavaScript, which can then retrieve the cookies and send them to an external site.

Volume 5 Issue 5 • php|architect • 54

So what is the solution? A partial solution is to always set a directory limit for a cookie. If the application is installed inside the /app/ directory, set the cookie’s path to /app/. This tells the browser to send the cookie only when requesting files on the site’s domain or sub-domain whose path falls under /app/. This, however, is only a limited solution, since there is nothing to prevent the malicious user from creating an app directory under their sub-domain as well.

To address this problem when setting the cookie, we always want to use the full hostname, including the sub-domain. For example, when setting the cookie for www.example.com, the cookie domain would be the full www.example.com rather than just example.com. This tells the browser that the created cookie is specific to www.example.com and should not be accessible from any other sub-domain. This also prevents cookie conflicts between multiple copies of the same application installed on different sub-domains, where the cookie name is the same.

    // Potentially unsafe cookie creation: visible to every sub-domain of me.ca
    // (an expiry of 0 means the cookie lasts until the browser is closed)
    setcookie("session_id", "DE4D…", 0, "/", ".me.ca");
    // A much safer version: limited to one host and one directory
    // (an expiry of 0 means the cookie lasts until the browser is closed)
    setcookie("session_id", "B34F…", 0, "/ecom/", "www.me.ca");

Keep in mind that if your sub-domains have sub-domains themselves, such as foo.www.example.com, you get back to the same kind of problem. So, to ensure that problems do not happen, you probably want to restrict yourself to just a single sub-domain level.

Browser Session Storage Issues

Another cookie-based exploit lies on the user’s own computer. When it comes to storing cookies, browsers are notoriously unsafe. They store the data inside plain text files, which can be read by anyone who bothers to look. In addition to the cookie, these files also contain the domain from which the cookie originated, pretty much giving the malicious user a blueprint of where to go and what to do in order to obtain access to another user’s account. This is primarily an issue for public machines, where different people may be using the same computer at different times. It is one of the reasons why it is so important that no sensitive data, such as logins and passwords, is ever stored inside cookies; otherwise it is clearly exposed to any user of the computer.

Unfortunately, no matter what we do on the server side, we cannot make the browser encrypt the cookie file(s), so the only means of protection is to shorten the expiry times in the hope that by the time the next user comes along, the session will no longer be active. The reason for this is that, for the most part, people don’t bother to log out of Web applications; they merely close the browser, so we need to make sure the cookie will be gone when that happens.

An additional solution, which slightly mars the user experience, is to add a password-based authentication step before any critical action can take place. Thus, if the malicious user gains access to someone else’s account, their access is limited to read-only operations, as every write operation will require knowledge of the password. This, however, can get pretty annoying in applications where write operations are common, so an intermediate solution is required.

A more user-friendly solution can come in the form of a secondary, timed validation block that allows the session to be used for write operations for a short span of time, controlled by the server. The functionality is very similar to the one you may have seen on eBay, where you can go to the site and browse your profile, but to bid, you need to re-enter your login and password. Once that is done, you have a few minutes to do your bidding or profile changes, after which you are prompted to reauthenticate.

    session_start();
    // ask for credentials again if no write window is open or it has run out
    if (empty($_SESSION['expiry']) || $_SESSION['expiry'] < time()) {
        // login_prompt() is the application's own re-authentication routine (not shown)
        if (login_prompt() == TRUE) {
            $_SESSION['expiry'] = time() + 300;
        }
    }

In the above example, the PHP session is used to store a special “expiry” key. This key, when available, contains a UNIX timestamp that indicates when the permissions will expire. As long as the expiry time is set and is later than the present time, modifications can be applied without further authentication. On the other hand, if the expiry is not set or has passed according to the server (which is outside of the user’s control), the user is prompted for authentication data on any update operation. If their credentials match, they are given another 300 seconds (5 minutes) of authorized time. The idea behind this trick is to further reduce the window of opportunity during which the session id can be used without valid credentials.
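The timed window is easier to test when the decision is separated from the prompt. The following is only a sketch under stated assumptions: `write_allowed()`, `grant_write_window()` and `WRITE_WINDOW` are hypothetical names, `$session` stands in for `$_SESSION`, and the current time is passed in explicitly so the logic can be exercised without a real session or a login form.

```php
<?php
// Hypothetical helpers illustrating the timed write window described above.
// $session stands in for $_SESSION; $now is injectable for testing.

const WRITE_WINDOW = 300; // seconds of authorized time, as in the example above

function write_allowed(array $session, int $now): bool
{
    // Allowed only while an expiry is set and has not yet passed.
    return !empty($session['expiry']) && $session['expiry'] >= $now;
}

function grant_write_window(array &$session, int $now): void
{
    // Call this only after the user's credentials have been re-checked.
    $session['expiry'] = $now + WRITE_WINDOW;
}

$session = [];
$now = time();
var_dump(write_allowed($session, $now));       // no window open yet
grant_write_window($session, $now);
var_dump(write_allowed($session, $now + 299)); // still inside the window
var_dump(write_allowed($session, $now + 301)); // window has passed
```

In an application, a write handler would call `write_allowed($_SESSION, time())` first and fall back to the re-authentication prompt when it returns false.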
Invisible Session Theft

Perhaps the biggest problem that affects cookie-based sessions is that the attacker does not even need to know the session id to perform actions as the cookie holder. Suppose for a second that we have a Web management control panel where a user can be granted administrative privileges by having their ID passed to a certain script. Needless to say, this functionality is exposed only to an admin user, but we’ll bypass that restriction in a moment. The first thing we need to do is create an image tag that links to the administrative page with our user id, and then do a bit of social engineering to get the actual admin’s attention.

One trick involves posting a particularly offensive message that would attract moderation staff; another could be in the form of a support e-mail to the site staff… the possibilities are endless. When someone opens the page with this image tag on it, the browser will dutifully make the request for the image, inadvertently executing the script, which will attempt to grant admin access to the user with an id of 78. If the person seeing the message has the right to grant permissions on the site, the hack will work. If they don’t, there is always the next user. The hack works because the browser does not care how the user got to the URL when sending the cookie, only that cookie restrictions such as domain and path were matched. The hack itself is fairly hard to spot since, at most, there will be a small broken image icon on the page.

The first step to prevent such attacks is to avoid using GET requests for the purpose of executing system changes. If you have a page that alters the data in some way, make it a form where the data is submitted via POST. This seemingly trivial change makes it far more difficult to execute a request as a privileged user: you cannot just embed a POST request within an image tag. To execute the script, the attacker will either need to make use of cross-site scripting, or perform an even more clever social engineering trick to convince a user to submit an arbitrary form. This, however, does not solve the problem entirely; it only makes the hack much more difficult to execute.

To solve the problem properly, we need to make sure that knowing the session id does not allow the hacker to get their foot in the door. One solution is the repeated password technique, explained above, but this just about kills the user experience on sites where “write” operations are common. Another solution involves the use of sequence keys; the key itself is not passed via a cookie but rather via a URL or a form submission. This key changes on every single request. Without this key, the session id does not get you anything. The key itself can be generated in the exact same manner as the session id, which makes key creation a trivial process. When a hacker embeds a link or even a form, the simple fact that the user’s cookie is passed along does not get the action to execute without knowing the key. Since the key is random and continually changing, it cannot be predicted or easily stolen by the attacker.

    // act only when the key sent with the form matches the one stored in the session;
    // the !empty() checks keep an unset key from "matching" an empty submission
    if (!empty($_POST['make_admin']) && !empty($_POST['SK'])
        && !empty($_SESSION['skey']) && $_SESSION['skey'] === $_POST['SK']) {
        make_admin($_POST['make_admin']);
    }
    // regenerate the key on every request, whatever the outcome
    $_SESSION['skey'] = make_new_sequence_key();

In the above example, before the action of making the user an admin is executed, the sequence key associated with the session is compared to the one passed via POST. Only if the two match will the action be performed. Regardless of what happened, on each request the sequence key is regenerated and all links and forms are embedded with the new value. The key generation function is identical to the bit of code we’ve used before to generate session ids. The reason that we want to change the key on every request is to ensure that if the key is captured in any way it will already be invalid, or will become invalid as soon as the user clicks on the next link, shrinking the window of opportunity for the hacker.
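For readers on newer PHP versions, the check can be packaged so that missing values are rejected explicitly. This is a hedged sketch rather than the column's own code: `sequence_key_matches()` is a hypothetical helper, `$session` and `$post` stand in for `$_SESSION` and `$_POST`, the key generator here uses `random_bytes()` (PHP 7.0+) instead of the timestamp-based routine shown earlier, and `hash_equals()` (PHP 5.6+) is used for the comparison.

```php
<?php
// Hypothetical sketch of the sequence key check; names are illustrative
// and $session / $post stand in for $_SESSION / $_POST.

function make_new_sequence_key(): string
{
    return bin2hex(random_bytes(16)); // assumes PHP 7.0+
}

function sequence_key_matches(array $session, array $post): bool
{
    // Reject outright when either side is missing or not a string,
    // so an unset key can never "match" an empty submission.
    if (empty($session['skey']) || empty($post['SK']) || !is_string($post['SK'])) {
        return false;
    }
    // hash_equals() compares the two strings in constant time.
    return hash_equals($session['skey'], $post['SK']);
}

$session = ['skey' => make_new_sequence_key()];
$post    = ['SK' => $session['skey'], 'make_admin' => '78'];

if (sequence_key_matches($session, $post)) {
    echo "key accepted\n";
}
// Rotate the key on every request, whether or not the check passed.
$session['skey'] = make_new_sequence_key();
```

A forged request that carries only the victim's cookie arrives without a valid `SK` value, so the helper returns false and the privileged action is skipped.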
Next Month…

In next month’s column, we’ll dive further into URL-based sessions, exploring the dangers that lie in passing the session identifier as part of the URL.

ILIA ALSHANETSKY is the principal of Advanced Internet Designs Inc., a company specializing in security auditing, performance analysis and application development. He is an active member of PHP’s quality assurance team, with hundreds of bug fixes to his name as well as a sizeable number of performance tweaks and features. Ilia is a regular speaker at PHP-related conferences worldwide and the author of php|architect’s Guide to PHP Security as well as many magazine publications. He also maintains an active blog at http://ilia.ws, which is filled with tips and tricks on how to get the most out of PHP.
PRODUCT REVIEW: CodeCharge 3.0
PRODUCT REVIEW
CodeCharge 3.0 Yet Another IDE?
by PETER B. MacINTYRE
This month I am reviewing another IDE for PHP. I wanted to look at this one because I have a need to find the “Perfect PHP IDE,” and I have yet to find one. There are limitations to this one as well, the first being that it is Windows-only. The plus side (if you consider it a plus) is that it can generate code for many web development platforms, namely ASP.NET (C# & VB), ASP (VBScript), Java (JSP or Servlets), ColdFusion 4.0, PHP, and Perl. Naturally, this review will focus on the PHP side of the fence. Keeping with tradition, here is what the yessoftware web site has to say about the product:

CodeCharge Studio is the most productive solution for visually creating database-driven Web applications with minimal amount of coding. The support for virtually all databases, web servers and web technologies makes CodeCharge Studio one of a kind. It is a complete solution for Web development.

Well, I have to protest against the claim of it being a complete solution, but let’s get into the review before I focus too much on the down side. There are two versions of this product: one that you can use for a yearly cost (a little cheaper in the short term) and another that is a little more expensive, for “perpetual” use. The perpetual license is the one to choose if you are committed to using this tool as your long-term development environment.
PHP: 4+, 5+ (MySQL, plus other ODBC databases)
PRODUCT VERSION: 3.0
O/S: Windows 95, 98, ME, NT4, 2000 or XP
PRICING: $279.95 (non-perpetual, per user per year); $499.95 US (perpetual, per user)
LINK: http://www.yessoftware.com/index2.php
Getting Started

The install was clean and uneventful. Many ODBC drivers are installed at the same time, as well as a few sample MS Access database files. Icons are created on the desktop and you are ready to explore. Figure 1 shows one of the screens involved in the installation process. I show this because I was suitably impressed with the way the installation was done. Many install processes pay little attention to the presentation of this step of the product’s life. First impressions being what they are, I thought it worth a mention here.

The basic layout of this IDE is very well done and very functional right away. It took me only a short time to become familiar with the general use of the product. Figure 2 shows the appearance of the IDE upon initial start up. CodeCharge Studio is project-based and therefore has a project management panel on the left side of the IDE.
FIGURE 1
This is useful in many aspects of the development process, as you will see as we go through the review.

The first thing that I did was check the type of documentation that was available for this product, and oddly enough there was plenty. I downloaded and printed the 150-page Getting Started guide, and began reading. This is an excellent guide that helps you get acquainted with the tool quickly and efficiently. I only got into trouble a few times and had to call the technical support line just once. The technical support was excellent. My issue was solved in about 5 minutes, and had to do with me getting a little confused about the process of connecting to a data source, but more on that later.

So, you have to start with a project definition. Figure 3 shows the screen that helps you with that process. Here, I have the list pulled down showing what optional languages I have at my disposal. Apparently, you can build an entire Web site with one language, then change the language option and have the entire web application re-deployed in that other language without further work. I did not attempt this, but I have faith that it is possible with this product. Once you select your language and set the other project directives, you are ready to start designing.

FIGURE 2
FIGURE 3
Digging Deeper

I followed the tutorial fairly closely at first and then I started wandering off on my own, much like being on a field trip to a museum: once you get the idea of how things work, you want to explore a little on your own. So, off I went. What I discovered is that this is indeed a fairly involved tool. It is quite powerful, as well. The power comes primarily from its ability to generate (Object Oriented) code for you; all you have to do is design the underlying database and then design the web pages. This power is known as a “builder,” and it is probably the best form designer I have ever come across. The builder will take a connection to a data source and allow you, the designer, to build forms, reports, or data entry pages based on that data source. I tried out 4 of the 9 builders and was suitably impressed with all of them. Figure 4 shows the fourth step in the process of designing an editable data grid and Figure 5 shows the end result.

FIGURE 4

FIGURE 5

There are a few caveats here that you should know about. First, when you are setting up your server connections within this environment, you will see that the default is an ASP/IIS environment, so be sure to adjust this setting before you try to look at any live pages from within the IDE. Also, when trying to connect to a local MySQL server, you need to download and install the MySQL ODBC tool from mysql.com, so CodeCharge can make the connections. If you look at the code that this IDE generates, you will see that it is very Object based; Figure 6 shows a sample. So, even though CodeCharge allows you to add your own code, you would be well served to make sure you spend some time getting familiar with their Object Oriented writing style. Once this is accomplished, you should have no problems at all with extending the code that this generator builds for you.

FIGURE 6

One final concern that I came across when dealing with a MySQL data source is that there is no rendering for the DATETIME type, only DATE. This is the only mismatch that I saw, but there may be others: TINYINT and SMALLINT come to mind.
Summary

After looking at this product for a few days, I have to summarize that I was very impressed with how it was able to communicate with such a diverse range of data sources. It was able to turn those data definitions into forms, and convert between scripting languages on the fly. There are, however, some shortfalls that I haven’t yet mentioned. First, there is no debugger, which is a must for a modern-day IDE (the documentation tries to put some salve on that by directing the programmer to echo variables out to the browser). Second, there was no menu designer; even though the Application builder generates a menu for you, it is just a table with ref links in it (to be fair, I did not explore this too deeply).

On the plus side, the web site, documentation, and technical support were pristine! Any problems that I had, whether due to CodeCharge itself or caused by my own confusion, were readily resolved. I am still looking for that perfect PHP IDE, however: one that has connections to the database, form designers, a debugger, and a menu designer. CodeCharge has come close to the zenith on this, but it is not quite there. Maybe somewhere, some day in the not too distant future…
I give this product 3.5 stars out of a possible 5.

PETER MACINTYRE lives and works in Prince Edward Island, Canada. He has been an editor with php|architect since September 2003. Peter is a Zend Certified Engineer. Peter’s web site is at http://www.paladin-bs.com.
exit(0);

Richard, You Just Scare Us!

by MARCO TABINI
Not long ago, I had a discussion with a friend of mine who runs a small publishing house. He deals with engineering publications and, much like me, he has a bit of a fixation for automation. Anybody who has ever published a book (without losing his or her shirt over it) will tell you that there is only so much money that can be invested into it, and that the money needs to be divided among a rather large number of pockets: the author’s (who, let’s face it, wouldn’t exactly mind being paid), the editors’ and the typesetters’.

Of these three major expenses in the production of a book, typesetting (or layout) is the most frustrating one. From a technical standpoint, it adds nothing to a book’s contents. Of course, it makes the book fit for human consumption, but, in general, the amount of creativity that an artist can place in a technical book is limited by the fact that the reader expects consistency rather than eye-catching graphics. Therefore, layout work on a technical book tends to become very dull and repetitive.

Dull and repetitive maybe, but still long and expensive. This is not just because someone has to sit down and use the composition software to ensure that someone will be able to read the book without suffering brain damage, but also, and especially, because layout is a perfect opportunity to introduce errors into the book that will be incredibly difficult to catch. In the vast majority of cases, the person who is doing the layout of a technical book will be an expert in typography, not in whatever the specific topic of the book is. Thus, unless you happen to be publishing a book on typography, the finished manuscript needs to go back to the author and editor for a final review, introducing yet more expense and delays. Much like me (and, I suspect, the remainder of the book publishing community), my friend wishes that layout could be completely automated, and taken out of the hands of human beings.
Unlike me, he’s actually managed to come up with a pretty neat system that essentially streamlines the entire process: the author turns his or her chapters in to the editor in a special marked-up text format (Docbook, for example), the editor manages them directly in a CVS repository and, when everything is ready, they are converted over to LaTeX format for immediate conversion to PDF, a format from which they can be sent directly to the printers.

The advantages of this approach are far-reaching: the human element has been taken out of the equation where it is more likely to cause damage; costs have been reduced and production times sped up; galleys of the book can be produced at a moment’s notice; the same source can be used to output multiple formats, all the while maintaining a consistent level of linguistic and typographical accuracy.

“That’s great,” I told my friend. “You should tell people about how your system works!”

“I can’t,” he replied, “I am scared of the GPL.”

You see, all the programs he uses in his workflow are GPL’ed, and he, together with about 50% of the civilized world, is having a hard time figuring out how the license affects the software he’s written. It should be noted that, at least in this specific case, the problem is not one of intellectual property. My friend isn’t the greedy, power-hungry CEO of a Multinational Corporation of Evil (if that’s not trademarked, I’m claiming rights to it right here and now). In fact, he’s just written a collection of shell scripts that call up different applications, some of which are distributed under the GPL, which are responsible for the actual typesetting work.

His problem is that he just doesn’t understand how the GPL affects his business. He is genuinely concerned that the fact that he is using GPL software may somehow affect his ability to retain copyright on his books. I wish I could tell him that it’s all “nonsense,” but, to be honest, it’s a bit difficult to tell. Is a book that is laid out with GPL software a derivative work of the product? From the height of my ignorance of the law, I’d say it’s just a data file, although you couldn’t have produced it without the program in question.
The fact remains, however, that the GPL is vague and confusing enough that people will either ignore it at their peril, or be scared of it and try hard to stay away from it, which may result in an effect that is opposite to the license’s goal: instead of fostering the development of free software, it forces people to hide ideas and programs that could potentially benefit the world at large. If you’re writing a new application, do yourself (and your users) a favour: either dual-license it, so that someone who wants a commercial license can steer clear of the GPL, or ditch the GPL altogether and use one of the “truly free” OSS licenses, like the BSD license.