VOLUME III - ISSUE 2
FEBRUARY 2004
The Magazine For PHP Professionals
Profiling PHP: Understand and optimize your code
Writing an SMS Gateway with PHP and Gnokii
Extending PHP: Handling PHP Arrays from C
Caching Techniques for the PHP Developer
The Need for Speed: Writing More Efficient PHP Scripts
Offline News Management with PHP-GTK
Get Ready for php|cruise (March 1st - March 5th 2004): see inside for details
Plus: Tips & Tricks, Security Corner, Product Reviews and much more...
www.phparch.com
In partnership with Zend Technologies
Zend Studio 3.0 is the official PHP IDE of php|cruise
We’ve got you covered, from port to sockets.
php|cruise
Port Canaveral • Coco Cay • Nassau
March 1st - March 5th 2004
Signup deadline: Feb 15, 2004
ENJOY LEARNING PHP IN A FUN AND EXCITING ENVIRONMENT AND SAVE A BUNDLE!
Visit us at www.phparch.com/cruise for more details.
Features:
• Andrei Zmievski - Andrei's Regex Clinic
• James Cox - XML for the Masses
• Wez Furlong - Extending PHP
• Stuart Herbert - Safe and Advanced Error Handling in PHP5
• Peter James - mod_rewrite: From Zero to Hero
• George Schlossnagle - Profiling PHP
• Ilia Alshanetsky - Programming Web Services
• John Coggeshall - Mastering PDFLib
• Jason Sweat - Data Caching Techniques
Plus: stream socket programming, debugging techniques, writing high-performance code, data mining, PHP 101, safe and advanced error handling in PHP5, programming Smarty, and much, much more!
                   php|cruise     Traditional PHP Conference*
Conference Pass    $899.99**      $1,150.00
Hotel              Included       $400.00
Meals              Included***    $200.00
Totals             $899.99        $1,750.00

You Save $850
* Based on average of two major PHP conferences
** Based on interior stateroom, double occupancy
*** Alcohol and carbonated beverages not included
TABLE OF CONTENTS
php|architect

Departments
5    Editorial
6    What's New!
37   Product Review: SQLyog
58   Product Review: 2003 Quebec PHP Conference DVD, by Marco Tabini
61   Security Corner, by Chris Shiflett
65   Tips & Tricks, by John W. Holmes
68   exit(0); Why Can't We All Just Get Along?, by Marco Tabini

Features
9    Write SMS Applications With PHP and Gnokii, by Eric Persson
16   Offline Content Management with PHP-GTK, by Morgan Tocker
23   Writing PHP Extensions: Managing Arrays, by Wez Furlong
28   The Need For Speed: Optimizing your PHP Applications, by Ilia Alshanetsky
41   Profiling PHP Applications, by George Schlossnagle
51   Caching Techniques for the PHP Developer, by Bruno Pedro
February 2004 ● PHP Architect ● www.phparch.com
You’ll never know what we’ll come up with next
NEW!
Existing subscribers can upgrade to the Print edition and save! Log in to your account for more details.
php|architect
Visit: http://www.phparch.com/print for more information or to subscribe online.
The Magazine For PHP Professionals
php|architect Subscription Dept.
P.O. Box 54526
1771 Avenue Road
Toronto, ON M5M 4N5
Canada

Choose a Subscription type:
Canada/USA: $83.99 CAD ($59.99 US*)
International Surface: $111.99 CAD ($79.99 US*)
International Air: $125.99 CAD ($89.99 US*)
Combo edition add-on (print + PDF edition)**: $14.00 CAD ($10.00 US)

Name: ____________________________________________
Address: _________________________________________
City: _____________________________________________
State/Province: ____________________________________
ZIP/Postal Code: ___________________________________
Country: ___________________________________________
Payment type: VISA / Mastercard / American Express
Credit Card Number: ________________________________
Expiration Date: _____________________________________
E-mail address: ______________________________________
Phone Number: ____________________________________
Signature: _________________________________________
Date: _____________________________________________

Your charge will appear under the name "Marco Tabini & Associates, Inc." Please allow 4 to 6 weeks for your subscription to be established and your first issue to be mailed to you. *US pricing is approximate and for illustration purposes only.

*By signing this order form, you agree that we will charge your account in Canadian dollars for the "CAD" amounts indicated above. Because of fluctuations in the exchange rates, the actual amount charged in your currency on your credit card statement may vary slightly. **Offer available only in conjunction with the purchase of a print subscription.

To subscribe via snail mail, please detach or copy this form, fill it out, and mail it to the address above or fax it to +1-416-630-5057.
EDITORIAL RANTS

Welcome to the February 2004 issue of php|architect. As I write this, I'm sitting in my office (about forty degrees Celsius warmer than outside and, therefore, a much better place to work in than the local park), suffering from an awful cold, next to a collection of (clean) tissues discreetly stashed on my desk, ready for use. As you can expect, I'm not particularly happy about either fact. Make that three facts: the cold outside, the cold in my body, and the fact that I'm sitting in an office when I could really be somewhere else, far away from anything that even remotely resembles a computer. Incidentally, with php|cruise coming at the beginning of March, I should hopefully be able to get rid of at least two of these problems, and I'm still working on finding a way to avoid computers during that trip.

But I ramble, a clear sign that the cold medicine is wearing off. Let me instead tell you something about this month's issue.

With the popularity that PHP enjoys nowadays comes the fact that it is used as the backbone of more and more high-traffic sites. A simple consequence of this is that an increasing number of developers are "hitting the wall" and finally feeling the limits of what the "let's just do it in PHP" approach can do. Building a website is always a high-wire balance of budgeting, respecting deadlines and writing the best code possible, but there's nothing quite as bad as finding out that the way you've done things is incapable of meeting the demands of your website. By the time you realize that you have a problem, it's usually too late to think about a solution short of calling your travel agent and inquiring about that non-extradition country you heard of.

Therefore, this month we dedicate a fair amount of room to the performance management of PHP applications. George Schlossnagle's article, based on an excerpt from his latest book, published by SAMS, talks about profiling, a concept that I have very rarely seen associated with PHP applications. Profiling takes the guesswork out of understanding where the bottlenecks in your application are, allowing you to focus on finding the best possible resolution. The problem with profiling is that it only allows you to identify the problems, not solve them. Luckily, Ilia Alshanetsky and Bruno Pedro offer two other excellent articles on improving the performance of PHP without affecting the code itself (if you can, why not avoid the risk of introducing even more bugs?). While Ilia focuses on ways to make the PHP interpreter itself run faster, Bruno examines the topic of caching at both the network and the script level.

This month we also start a new column, Security Corner, written by Chris Shiflett. The number of security advisories, patches, break-ins and source-code thefts that we see reported in the media every day has

Continued on page 8...
php|architect
Volume III - Issue 2
February 2004

Publisher: Marco Tabini
Editorial Team: Arbi Arzoumani, Peter MacIntyre, Eddie Peloke
Graphics & Layout: Arbi Arzoumani
Managing Editor: Emanuela Corso
Director of Marketing: J. Scott Johnson, [email protected]
Account Executive: Shelley Johnston, [email protected]
Authors: Ilia Alshanetsky, Wez Furlong, John Holmes, Bruno Pedro, Eric Persson, George Schlossnagle, Chris Shiflett, Morgan Tocker

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in any associated material.

Contact Information:
General mailbox: [email protected]
Editorial: [email protected]
Subscriptions: [email protected]
Sales & advertising: [email protected]
Technical support: [email protected]

Copyright © 2003-2004 Marco Tabini & Associates, Inc. All Rights Reserved.
NEW STUFF
What’s New!
PHP 4.3.5 RC1
PHP.net has announced the release of PHP 4.3.5 RC1 for testing. This is the first release candidate and should have a very low number of problems and/or bugs. Nevertheless, please download it and test it as much as possible on real-life applications to uncover any remaining issues. A list of changes can be found in the NEWS file. For more information visit: http://qa.php.net/

PHP Community Logo Contest
Following Chris Shiflett's recent announcement of the PHP Community Site, he is holding a contest to find a logo that embodies the spirit of the PHP community. Everyone is welcome to participate, and you can submit as many entries as you like. Please send all entries to [email protected] and include the name with which you want to be credited. The contest ends 29 Feb 2004, and php|architect is offering a free PDF subscription to the winner. For updated news about the contest, as well as a chance to view the current entries, visit: http://www.phpcommunity.org/logos/
Good luck to all who enter!

Zend Studio 3.0.2
Zend has announced the release of the Zend Studio 3.0.2 client. What's new? Zend.com lists some of the bug fixes as:
• ZDE didn't load when using a new keymap config from an older version.
• Save As Project didn't always work.
• Server Center activator tried to open the wrong URL.
• .js files were not opened with JavaScript highlighting.
• Shift-Delete and Shift-Backspace didn't work properly.
• Find&Replace was very slow under Linux.
• Add Comment sometimes erroneously commented out a line that wasn't selected.
• Added a configurable limit for the number of displayed syntax errors.
There have also been improvements to the debugger, code completion, code analyzer and IE toolbar, as well as some Mac OS X changes. Get more information from Zend.com.
MySQL Administrator
MySQL.org announces: MySQL Administrator is a powerful new visual administration console that makes it significantly easier to administer your MySQL servers and gives you better visibility into how your databases are operating. MySQL Administrator integrates database management and maintenance into a single, seamless environment, with a clear and intuitive graphical user interface. Now you can easily perform all the command line operations visually, including configuring servers, administering users, dynamically monitoring database health, and more. Get more information from: http://www.mysql.com/products/administrator/index.html
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

opendirectory 0.2.2
Open Directory is a directory service architecture whose programming interface provides a centralized way for applications and services to retrieve information stored in directories. The Open Directory architecture consists of the DirectoryServices daemon, which receives Open Directory client API calls and sends them to the appropriate Open Directory plug-in.

statgrab 0.1
libstatgrab is a library that provides a common interface for retrieving a variety of system statistics on a number of *NIX-like systems. This extension allows you to call the functions made available by the libstatgrab library.

Sasl 0.1.0
SASL is the Simple Authentication and Security Layer (as defined by RFC 2222). It provides a system for adding pluggable authentication support to connection-based protocols. The SASL Extension for PHP makes the Cyrus SASL library functions available to PHP. It aims to provide a 1-to-1 wrapper around the SASL library to provide the greatest amount of implementation flexibility. To that end, it is possible to build both a client-side and a server-side SASL implementation entirely in PHP.

SQLite 1.0.2
SQLite is a C library that implements an embeddable SQL database engine. Programs that link with the SQLite library can have SQL database access without running a separate RDBMS process. This extension allows you to access SQLite databases from within PHP. Windows binary available from: http://snaps.php.net/win32/PECL_STABLE/php_sqlite.dll

Check out some of the hottest new releases from PEAR.

DB 1.6.0 RC4
DB is a database abstraction layer providing:
• an OO-style query API
• a DSN (data source name) format for specifying database servers
• prepare/execute (bind) emulation for databases that don't support it natively
• a result object for each query response
• compatibility with PHP 4 and PHP 5
• much more...
DB layers itself on top of PHP's existing database extensions. The currently supported extensions are: dbase, fbsql, interbase, informix, msql, mssql, mysql, mysqli, oci8, odbc, pgsql, sqlite and sybase (DB-style interfaces to LDAP servers and MS ADO (using COM) are also available from a separate package).

System_ProcWatch 0.4
With this package, you can monitor running processes based upon an XML configuration file, XML string, INI file or an array where you define patterns, conditions and actions.

Net_IMAP 0.7
Provides an implementation of the IMAP4rev1 protocol using PEAR's Net_Socket and the optional Auth_SASL class.

XML_Beautifier 1.1
XML_Beautifier will add indentation and line breaks to your XML files, replace all entities, format your comments and make your document easier to read. You can influence the way your document is beautified with several options.
PHPWeather 2.2.1
PHP Weather announces the release of version 2.2.1. PHP Weather makes it easy to show the current weather on your webpage. All you need is a local airport that issues special weather reports called METARs. The reports are updated once or twice an hour. Get more information from: http://sourceforge.net/projects/phpweather/

PHPEclipse Debugger
PHPEclipse adds PHP support to the Eclipse IDE Framework. This snapshot introduces the first version of the PHPEclipse debugger plugin. For more information visit: http://www.phpeclipse.de

MySQL and Zend Working Together
From Zend and MySQL: the two companies have joined forces to strengthen open source Web development. MySQL AB, developer of the world's most popular open source database, and Zend Technologies, designers of the PHP Web scripting engine, today announced a partnership to simplify and improve productivity in developing and deploying Web applications with open source technologies. Through the alliance, the companies are improving compatibility and integration between the MySQL database and Zend's PHP products to make it easier for businesses to use complete open source solutions, such as the popular LAMP (Linux, Apache, MySQL and PHP) software stack. As part of the partnership, MySQL AB and Zend are offering partner products to their respective customers, enabling easier product procurement and deployment for Web application infrastructures. The companies will also commit development resources to design product integration and compatibility modules for both vendors' platforms. For more information visit: www.zend.com

SAXY 0.3
SAXY is a Simple API for XML (SAX) XML parser for PHP 4. It is lightweight, fast, and modeled on the methods of the Expat parser for compatibility. The primary goal of SAXY is to provide PHP developers with an alternative to Expat that is written purely in PHP. Since SAXY is not an extension, it should run on any Web hosting platform with PHP 4 and above installed. This release allows CDATASection tags to be preserved, rather than converted to Text Nodes. For more information visit: http://www.engageinteractive.com/saxy/
php|a
Editorial: Continued from page 5

convinced us that, at the very least, one should be able to protect one's own sites from malicious usage, in the hope that all the other companies we rely on to maintain their software will do so in a serious way.

Finally, we bring you three more articles that, we hope, will tickle your fancy. The first one, written by Eric Persson, shows you how you can build an SMS gateway using PHP and a few other inexpensive components. SMS is not yet very popular here in North America but, judging from the number of people I see glued to their cell phones whenever I visit my native Italy, it is very widely used in Europe.

In his article on offline news management, Morgan Tocker writes about how PHP-GTK, that most hidden of PHP gems, can be used to improve content management by providing a proper GUI application that doesn't require you to completely rewrite all your code.

Last but not least, Wez Furlong picks up where his article from last month left off and delves into the deep bowels of the Zend Engine to show you how a PHP extension written in C can manipulate PHP arrays. It's not quite as easy as from a script... but close enough once you know what you're doing.

Well, that's it for this month. By the time I write my next editorial, I plan to be either boasting about my suntan or complaining about sunburn. Either way, you can expect me to report on our adventure on the high seas. Until then, happy reading!
Write SMS Applications With PHP and Gnokii
FEATURE
by Eric Persson
SMS (short for Short Message Service) is the standard used by cellular phone networks worldwide to allow their customers to exchange small text messages using their handsets. Despite its limitations, SMS is very popular with cell phone users, and it has rapidly become a widely used bridge between the Internet and mobile users.
REQUIREMENTS
PHP: 4.1 or higher
OS: Unix/Linux
Applications: Gnokii (http://www.Gnokii.org)
Code: http://code.phparch.com/20/3
Code Directory: sms-gnokii

Despite the fact that it sounds like some mysterious Italian pasta, Gnokii is really just a project aimed at developing tools and drivers for Nokia mobile phones, that is, software that makes it possible to control a Nokia phone physically connected to your server via a serial port. Gnokii works like the Nokia Data Suite, which is shipped with more advanced models from Nokia: you can use it to send SMS messages, edit contacts and so on, pretty much everything you normally do with your thumb on the phone's keypad.

Gnokii itself is composed of many tools, including a set of GUI applications that facilitate the remote operation of the telephone; we are really only interested in a small subset of these tools called smsd, or SMS daemon, which provides an interface for rapid access to the phone's SMS capabilities. With the SMS daemon up and running, we can use PHP to interact with the phone, send and receive SMS messages and, of course, build whatever logic we need based on the content of the messages that we receive and send.

In short, my goal with this article is to show you how to configure software and hardware so that you can get the same kind of service you would normally obtain from a big company selling mobile services like SMS gateways, but at a fraction of the price.

Major Components of the Final Application
The final application that we will create throughout this article is a simple SMS server that awaits a message from a user and acts on its contents. It is made up of three major components:
• A Nokia cell phone, which must be connected properly to the server.
• The smsd application from the Gnokii package, which must, of course, be compiled and configured correctly.
• The PHP scripts that provide the actual server functionality.

The flow of the application will be as follows:
• The user sends an SMS message to the server.
• The smsd daemon picks it up and automatically puts it into its database.
• Our server scans smsd's database periodically for new messages.
• When a new message arrives, its contents are examined and the server acts on them, for example by replying to the user with another message.
Hardware needed
When it comes to cellular communications, the bad thing about hardware is that it often costs a lot of money, but the goal of this project is precisely to provide a low-cost alternative, so the expenses associated with it should be quite reasonable. What you'll need in terms of hardware is a Nokia phone and a serial cable to hook it up to your server. I will, of course, assume that you already have a server and that it is capable of running the Gnokii tools and PHP.

In my environment, I have used a Nokia 3310, which is quite new but not very expensive, and works perfectly for my needs. There are no "official" connection cables available for the 3310, but a UK company called Cellsavers (http://www.cellsavers.co.uk) has come up with a very ingenious serial cable with a connector that fits behind the battery on the phone. For those who don't know, there are four metal pins there that are probably used by Nokia to install software and perform other programming on the phone, and the nice folks at Cellsavers managed to figure out how to use them to control the phone through a serial port. There might be other companies supplying the same type of product, but I have not seen any around.

Another important note about the hardware is that you will need a battery charger for the phone. One often comes with the package; you can plug it in and leave the phone on forever without having to worry about the battery.

Installing Gnokii and smsd
Before you start installing Gnokii and smsd, make sure you have MySQL installed and working properly on your server. Installing Gnokii is quite straightforward: it involves little more than the usual configure, make, make install steps. However, there are some configuration options that I find important. The first might be a matter of taste, but I like to place everything belonging to Gnokii in /usr/local/Gnokii. Therefore, I will pass --prefix=/usr/local/Gnokii to the configure script.
Next, the --without-x configuration switch indicates that we will not need the xgnokii GUI application to send SMS messages and manage the phone. If you want to take a look at the graphical tools, you can of course skip this parameter, but on a Unix server, where you normally do not have the X Window System installed, you'll get a whole lot of errors if you do so. The last parameter is --enable-security, which turns on a number of security-related features in the package, like the ability to change the PIN. I find them useful, so I usually turn them on. The resulting configure line is as follows:

./configure --prefix=/usr/local/Gnokii --without-x --enable-security
Wez Furlong is the Technical Director of The Brain Room Ltd., where he uses PHP not only for the web, but also as an embedded script engine for Linux and Windows applications and systems. Wez is a Core Developer of PHP, having contributed SQLite, COM/.Net, ActivePHP, mailparse and the Streams API (and more), and is the "King" of PECL, PHP's Extension Community Library. His consulting firm can be reached at http://www.thebrainroom.net.
To Discuss this article: http://forums.phparch.com/124
The Need For Speed: Optimizing your PHP Applications
FEATURE
by Ilia Alshanetsky
The ever-growing popularity of the web is putting increasing stress on the software and hardware used to power the common website. This article will help you combat growing server loads and increase your web serving capacity without resorting to costly hardware upgrades.
Before starting on our quest for performance, let me pass along a small word of caution. Making your applications faster is certainly a noble goal but, unfortunately, it will often require a fair bit of time and frequently expose or introduce bugs. It is absolutely critical that you do not begin optimization prematurely, as doing so will virtually guarantee that deadlines will be missed and that the likelihood of ending up with a working program will be slim. Only optimize your applications once the code has been completely written, tested and deemed acceptable, and always set specific performance levels you seek to attain. Without a specific goal, you can just keep on optimizing forever, as there will always be some other tricks and tune-ups you could apply.

Now that we've gotten the standard optimization disclaimer out of the way, let's get to the fun part: doing the actual work. While you can certainly gain significant performance increases from optimizing your PHP code, this is usually the type of optimization you want to leave until the very end, when all other options are exhausted. Optimizing the actual script can be a fairly drawn-out process, and there is always a risk of breaking working code. Whenever possible, it is better to optimize things outside of your code that will have a positive impact on the performance of your applications. As you can probably guess, the focus of this article will be optimizations that do not require code modification and still make your PHP applications run much faster.
Getting Started
The first step consists of optimizing the PHP executable itself, which will make all the scripts executed by it run faster. This can be done by making your C compiler, such as gcc, work harder when compiling PHP and tune the binary executable it generates for maximum performance. This optimization is performed by passing several settings to the compiler via the CFLAGS environment variable. This variable, in turn, is read by the configure script, which passes these values on to the compiler at build time. It is important to note that while I am mentioning these options only in the context of PHP, these optimization flags are applicable to all parts of the system, and the more efficient the system, the faster it will be able to run everything, including your PHP applications. Below is an example of a modified PHP build procedure that leaves room for compile-time tuning:

export CFLAGS="-O3 -msse -mmmx -march=pentium3 \
    -mcpu=pentium3 -mfpmath=sse -funroll-loops"
./configure
make
make install
REQUIREMENTS
PHP: 4.1+
OS: N/A
Applications (optional): Turck MMCache, APC, PHP Accelerator, Zend Cache
What do these options do? The first one, -O3, sets the level of optimization the compiler should use. Normally, PHP uses only -O2, which is considered to be "safe", as too much optimization can cause stability issues. However, given the evolution of compilers, -O3 is, in my experience, just as safe, and many projects have already adopted it as their default optimization level. The main difference between the two is that -O3 enables function inlining, which allows the compiler to optimize away some function calls by replacing each call with a copy of the called function's code. Another optimization technique enabled by -O3 is register renaming, which allows the compiler to take advantage of unused registers for various tasks; this is very handy on modern processors with large numbers of registers that are frequently left unused.

The downside of -O3 is that it makes the generated code nearly impossible to debug, since the register rearrangement prevents a valid backtrace from being generated in the event of a crash. However, since you should not encounter crashes in a production environment, this is an acceptable loss in most situations.

In our compilation script above, we have a set of options that tell the compiler in a fair bit of detail what processor the server has and what features it supports. This allows the compiler to apply various tricks and optimizations that are specific to a particular CPU (a Pentium III in our case). This is not normally done when producing binaries for distribution, since the goal there is to generate portable code that can run on as many CPU models of a particular architecture as possible. Of course, enabling CPU-specific targeting means that the portability of the generated binary will be limited to a single processor type. For example, code tailored for the Pentium III via the -march and -mcpu switches (such as the one in my example) will not work on older Pentiums or on AMD processors that lack the Pentium III's SSE instructions.
If you are compiling PHP for a server farm that uses all types of CPUs, you may not want to use CPU tailoring options, as they would require you to compile a separate PHP executable for every CPU type. The other three options, -msse, -mmmx and -mfpmath=sse, indicate that my processor supports these extended instruction sets and tell the compiler it should try to use them to generate more efficient code. SSE and MMX are primarily math-related instruction sets, and their usage can significantly accelerate any mathematical operations the underlying C code needs to perform. The last option I specify, -funroll-loops, tells the compiler that it should unroll small loops, that is, replace the loop with repeated copies of its body. The effect is a reduction in the number of instructions the processor needs to execute, since the loop's counter updates and branches are no longer needed.
The trade-off is that the resulting binary will be slightly larger since, instead of a single instance of the code in the loop, you'll now have that code repeated as many times as the loop would have run.

Configuring PHP Properly
Now that we have set our compiler options, let's review the configuration of PHP itself, as that, too, can have a significant impact on performance. In most cases, PHP is used for serving web pages, usually as an Apache server module. The standard approach is to compile PHP as a shared Apache module that the web server then loads on startup. This is the recommended approach, as it allows for easy PHP upgrades that do not require recompilation of Apache. However, it is most definitely not the most performance-friendly approach. When generating a dynamically loadable module, the linker will add a series of hooks to allow the module to be loaded, which, among other things, does not allow the compiler to optimize the generated code to the fullest. The end result is that the compiled PHP executable is anywhere between 10% and 25% slower than it would be had it been compiled statically into Apache.

# PHP configure line
./configure --with-apache=/path/to/apache_source
# Apache configure line
./configure --activate-module=src/modules/php4/libphp4.a
The configuration procedure above will compile PHP directly into Apache, making PHP part of the Apache server executable. As you can imagine, this means that upgrading either Apache or PHP will require you to recompile both packages. However, given the infrequent releases of both projects and their relatively quick compilation, the extended build procedure is more than made up for by the performance increase.

You can offset the longer compilation time caused by static compilation by reducing the number of extensions PHP compiles, and that will also increase performance. By default, PHP compiles a number of extensions that you may never use and that, in the end, only increase the size of your PHP binary, causing it to use more memory. Worse yet, some extensions will initialize various buffers and parameters on every request, slowing down the data serving process. You should try to compile only the extensions you need and disable extensions that you do not intend to use.

./configure \
    --disable-all \
    --disable-cgi \
    --disable-cli \
    --with-apache=/path/to/apache_source \
    --enable-session \
    --with-pcre-regex \
    --with-pgsql
The example above uses the --disable-all configuration flag to disable, in one go, all extensions that are enabled by default, saving you the time needed to find each default extension and disable it individually. It will also automatically disable any extensions that become enabled by default in the future, without requiring you to revisit the configuration by hand. The --disable-cgi and --disable-cli flags explicitly disable the generation of the CGI and CLI SAPIs, whose compilation is not turned off by --disable-all. Since only the Apache SAPI is needed, there is no reason to waste time building binaries that will not be used. Once all the unneeded SAPIs and extensions have been disabled, the needed extensions are enabled and the compilation process can begin.

The end result is a smaller binary, which is especially important for SAPIs such as CGI and CLI, where the startup cost is paid on every request. A smaller binary loads that much faster, allowing it to get to code processing sooner. More importantly, unneeded initializations will not be performed, making PHP work faster in all instances, regardless of the underlying SAPI.

Optimizing the INI File

With the PHP configuration and compilation out of the way, it's time to turn to the php.ini configuration directives, which can also be used to improve the overall performance of your scripts. I'll begin with the register_globals option, which has been off by default since PHP 4.2.0. However, many people still have it enabled, since their configuration was never updated as they upgraded their versions of PHP. This option makes PHP register a potentially large number of variables based on user and system input, and it also makes certain security exploits possible. It is recommended to keep this option off and use the readily available superglobals to access the data passed by the user through GET and POST queries or browser cookies.
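As a minimal sketch of what "use the superglobals" looks like in practice, the snippet below reads request data explicitly instead of relying on variables created by register_globals. The input_value() helper is a hypothetical name used only for this example, and the code targets current PHP versions rather than the 4.x series discussed here (the array type hint requires PHP 5.1 or later).

```php
<?php
// Sketch: with register_globals off, read user input explicitly from the
// superglobals rather than relying on automatically created globals.
// input_value() is a hypothetical helper, not a PHP built-in.
function input_value(array $source, $name, $default = null)
{
    return isset($source[$name]) ? $source[$name] : $default;
}

// Instead of trusting a global $page created by register_globals,
// take it from $_GET and cast it to the type you expect.
$page = (int) input_value($_GET, 'page', 1);

// Cookie and POST data are read the same way.
$theme = input_value($_COOKIE, 'theme', 'default');

echo "page=$page theme=$theme\n";
```

Because every value comes from a named source, it is always clear whether a variable originated in the query string, a form post, or a cookie.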
You can further optimize the process of creating variables based on user input by changing the variables_order directive. It indicates which sources of client-generated information should be used to populate the superglobals, as well as the order in which they should be considered when building $_REQUEST, which combines the contents of the other superglobals. By default, this option has a value of EGPCS, meaning that data from the environment (E), the user's GET (G), POST (P) and cookie (C) input, and the server (S) is stored. Storing and creating array elements inside superglobals can take a hefty amount of memory and has a negative impact on performance, since the process is repeated on every single request. Therefore, you can improve the overall performance of your system by reducing the number of superglobals that are populated. In most situations, this means that you can set variables_order to just GPC, so that only the data passed by the user in GET/POST queries or through cookies is stored inside superglobals. The result is a much faster input parsing procedure and a smaller memory footprint. If you need environment or server parameters, you can fetch them individually with the getenv() function, which only costs time when you actually call it rather than on every request.

Beyond the standard superglobals, PHP also creates special variables ($argc and $argv) that store the arguments passed to a script on the command line. In a web environment, your PHP scripts will never be passed arguments in this manner, so creating those variables is unnecessary. You should disable register_argc_argv, the PHP setting responsible for creating these variables, to further speed up your scripts. Keep in mind that, if you use the CLI SAPI, you will need to leave this option enabled; otherwise, your scripts will not be able to retrieve arguments passed to them via the command line.

When parsing user input with magic_quotes_gpc enabled, PHP automatically escapes the data to prevent the user from injecting special characters that could cause unexpected behavior in certain portions of your scripts. This automation is not always needed, since not all user data ends up in a context where special characters can cause trouble. It is better to disable it by turning off the magic_quotes_gpc directive and manually escaping the data as needed with addslashes(), or with whatever escaping function is most appropriate for the situation.
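To illustrate escaping on demand, here is a small sketch, assuming magic_quotes_gpc is off, that escapes the same user input differently depending on where it ends up. The $input array is a stand-in for a superglobal such as $_GET, and escapeshellarg(), which quotes a single argument, is used here as the argument-level counterpart of escapeshellcmd().

```php
<?php
// Sketch: escape only where the data is used, with a function suited to each
// context, instead of having magic_quotes_gpc escape every input up front.
// $input stands in for a superglobal such as $_GET (an assumption).
$input = array('name' => "O'Reilly & <Sons>", 'file' => "report; rm -rf /");

// HTML output: convert &, <, > and both quote styles to entities.
$html = htmlspecialchars($input['name'], ENT_QUOTES);

// Shell command: quote the whole argument so it is passed as a single word.
$cmd = 'ls -l ' . escapeshellarg($input['file']);

// Plain backslash escaping, as magic_quotes_gpc would have applied to everything.
$slashed = addslashes($input['name']);

echo $html, "\n", $cmd, "\n", $slashed, "\n";
```

On Windows, escapeshellarg() wraps the argument in double quotes rather than single quotes, so the exact command string differs by platform.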
In some cases, you need escaping functions specifically tailored to secure data in a particular context, such as escapeshellcmd() for command lines and mysql_escape_string() for MySQL queries. The advantages of doing your own escaping are numerous. First of all, you only escape what you need, reducing the amount of time PHP spends parsing user input. You also save memory, since the escaping process allocates twice as much memory to store an escaped string as it would for an unescaped one. Moreover, you end up with a better-designed application that does not depend on a particular server configuration and works securely in an environment where magic_quotes_gpc is disabled.

Beyond variable creation, there are a number of other INI settings that are important for optimization purposes. By default, every PHP response includes an X-Powered-By header, which reveals what version of PHP you are running. For the purposes of rendering the page, this header is completely useless and, unless the user fetches the headers manually, it will never even be visible. In fact, just about the only people who can make use of this field are those trying to compromise your system, who need to determine what software it runs. It would be prudent, therefore, to disable this header by setting the expose_php directive to off. Not only will this make a potential attacker's job more difficult, but it will also save a little bandwidth and slightly increase performance by not sending useless data to your clients.

Speaking of sending data across the wire to your users, this is another area where proper INI configuration can be of much use. By default, PHP sends data to the user as soon as your script outputs it, resulting in many write operations, each sending a small bit of data to the socket. This can become quite slow, especially for large pages, since many system calls must be performed to write the data, and at least some browsers will re-render the page each time a small chunk of data is received, making the user's experience less than pleasant. The alternative is to buffer the data in memory and send it in large chunks, reducing the number of writes to the socket and potentially speeding up the rendering time on the client. Output buffering is enabled and controlled via the output_buffering directive, which lets you specify how big the memory buffer used to store a script's output should be. Ideally, you want this buffer to be about the same size as the average page you send to your clients; this way, your average script's output can be sent across the wire in one large chunk.
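The effect of the buffer size can be sketched by attaching a counting output handler with ob_start(), whose second argument flushes the buffer whenever it reaches the given size, roughly what the output_buffering directive does for a whole script. count_flushes() is a hypothetical helper; its handler discards the output instead of sending it, and the closure requires PHP 5.3 or later.

```php
<?php
// Sketch: count how many separate writes reach the "client" for a given
// buffer size. The handler stands in for the socket: it counts each chunk
// PHP hands it and returns '' so nothing is actually printed.
// count_flushes() is a hypothetical helper.
function count_flushes($pieces, $pieceSize, $chunkSize)
{
    $flushes = 0;
    $handler = function ($buffer, $phase) use (&$flushes) {
        if ($buffer !== '') {
            $flushes++;          // one "write" to the client
        }
        return '';               // swallow the output; we only count
    };

    // $chunkSize = 0 buffers everything until the end; a positive value
    // flushes whenever the buffer grows to at least that many bytes.
    ob_start($handler, $chunkSize);
    for ($i = 0; $i < $pieces; $i++) {
        echo str_repeat('x', $pieceSize);   // e.g. one echo per table row
    }
    ob_end_flush();
    return $flushes;
}

// 1,000 echo calls of 100 bytes each (about 100 KB of output).
printf("100-byte buffer: %d writes\n", count_flushes(1000, 100, 100));
printf("4 KB buffer:     %d writes\n", count_flushes(1000, 100, 4096));
printf("one big buffer:  %d writes\n", count_flushes(1000, 100, 0));
```

A tiny buffer turns nearly every echo into its own write, while a buffer large enough for the whole page collapses them into a single write.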
Figure 1
At the same time, you should be careful not to create overly large buffers, as each PHP instance will have a buffer of its own; with many instances running at the same time, this can add up to quite a few megabytes, potentially exhausting all available memory.

Another solution that can accelerate the process of sending data to the user is compression. PHP supports a GZIP-compressed output buffer handler that compresses the data sent to the user in a format automatically recognized by most modern browsers. For users with compatible browsers, compression can reduce the size of a page many times over. The decrease in page size is especially helpful for users with slow connections, for whom this technique can shave several seconds off the time it takes to load each page. In addition, faster data transmission frees server processes earlier, which, in turn, lets your server handle a greater number of requests in any given timespan. Another pleasant side effect, most noticeable on high-traffic sites, is a reduced bandwidth bill; I have seen bandwidth usage cut by as much as 40-50% simply by introducing compression. Better yet, this feature requires no code modification: it can be enabled by setting output_handler to ob_gzhandler in the php.ini file. Alternatively, you can enable it for individual virtual hosts inside httpd.conf or for specific directories via .htaccess, or even by calling ob_start('ob_gzhandler') at the top of scripts that output large quantities of text. Keep in mind, however, that compressing the data requires CPU power and will slightly increase the server load. In most cases, though, the benefits of faster-loading pages, reduced bandwidth usage and fewer busy server processes will outweigh the inevitable slight increase in CPU usage.
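Before enabling compression site-wide, you may want a rough idea of the savings for your own pages. The sketch below, assuming the zlib extension is available (the same extension provides ob_gzhandler), compresses a sample page with gzencode() and reports the difference. gzip_savings() is a hypothetical helper, and the repetitive table used here will compress far better than a typical page.

```php
<?php
// Sketch: estimate how much gzip compression would save on a page body.
// gzip_savings() is a hypothetical helper, not a PHP built-in.
function gzip_savings($body, $level = 6)
{
    $original = strlen($body);
    if ($original === 0) {
        // Nothing to compress; avoid dividing by zero below.
        return array('original' => 0, 'compressed' => 0, 'saved_pct' => 0.0);
    }
    $compressed = strlen(gzencode($body, $level));
    return array(
        'original'   => $original,
        'compressed' => $compressed,
        'saved_pct'  => 100 * (1 - $compressed / $original),
    );
}

// Build a repetitive HTML table; real pages compress less predictably.
$rows = '';
for ($i = 0; $i < 200; $i++) {
    $rows .= "<tr><td class=\"cell\">Row $i</td><td>Some text</td></tr>\n";
}
$html = "<html><body><table>\n" . $rows . "</table></body></html>";

$stats = gzip_savings($html);
printf("%d bytes -> %d bytes (%.1f%% saved)\n",
       $stats['original'], $stats['compressed'], $stats['saved_pct']);
```

In production, ob_gzhandler only compresses when the browser's Accept-Encoding request header says it accepts gzip or deflate, so clients that cannot decompress still receive plain output.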
On occasion, you may find yourself using PHP not only to send data but also to retrieve it from a remote source (for example, when implementing a network client, such as an e-mail application that has to retrieve messages from an IMAP server). In these situations, it is important to keep in mind that the Internet is not a local storage medium, and getting data out of it can be quite slow. You probably don't want to spend too much time waiting for the external source to respond to your query, or you may run the risk of bogging down your whole server. To prevent endless waiting, use the default_socket_timeout setting, which defines how many seconds PHP should wait before giving up on fetching data from a remote source. This is especially important in a web environment, since while your script is waiting for data, its web server instance cannot be used to serve other requests, potentially requiring the creation of additional processes and resulting in an increased server load.

In addition to remote sockets, you are likely to be working with local sockets in the form of database connections. Tuning your connection parameters is an important step that prevents connection overload, which can cause a performance drop and refused connections leading to broken pages. I recommend using the max_links and max_persistent options that exist for most database interfaces (for example, mysql.max_links and mysql.max_persistent) to specify how many connections PHP may keep open at any one time. By default, these options are set to -1 (unlimited), which in most situations is not a good idea, since it could lead to PHP trying to open more connections than your database server can handle. This setting is especially important when using persistent connections, which in an Apache environment will soon result in each child process keeping its own connection open to the database. It is absolutely critical to have strict controls that prevent persistent connections from taking up all available database sockets and causing the database server to refuse all other connections. In many instances (for example, if you run a shared host), it may be prudent to disable persistent connections altogether via the allow_persistent directive (mysql.allow_persistent, for instance). This will automatically convert all attempts to open persistent connections into regular connections and help prevent a possible overload on your server.

PHP's INI settings include several directives that limit the operations that PHP can perform, such as the ability to access and manipulate files and the amount of memory allocated by the interpreter.
These settings are quite useful in a shared environment, where you want to keep a tight leash on your users to ensure that they are not abusing the system. In a dedicated environment, where you control most (if not all) of the PHP code executed by the interpreter, they only serve to slow down often-used functionality. Thus, for performance reasons, it is better not to use the safe_mode, open_basedir and memory_limit directives in dedicated environments; the checks PHP performs to enforce them are quite expensive and can lead to significant performance losses.

Beyond the Configuration

Besides compiler optimizations and configuration tuning, there are several other methods that can improve the performance of PHP applications without requiring you to dabble in the application's source code. The first and foremost of these tools is an opcode cache, sometimes referred to as a "PHP compiler", although that term is really a misnomer. Under normal circumstances, before a PHP script can be run, it must first be parsed and converted to a series of instructions (opcodes) that the Zend Engine can understand. This is a fairly fast process, but in large scripts with many include files it can take up a significant amount of time. Even in smaller applications, reading the PHP script from disk and parsing it every single time before execution adds up. It is also quite wasteful, since for the most part scripts rarely change between executions and there is really no need to parse the code from scratch every time.

This is where an opcode cache comes in. Instead of repeating the parsing, the cache stores the generated instructions in shared memory (or on disk), so that further access to the script does not require reparsing. Additionally, because the opcodes are often stored directly in memory, file system operations are reduced to a simple check to determine whether or not the script has changed since it was cached, further improving performance. Most opcode cache implementations (and there are several of them on the market nowadays) go even further and actually optimize the opcodes before storing them. During the traditional compilation process, the PHP parser tries to generate opcodes quickly and does not always produce the most efficient instructions for the Zend Engine to execute. With an opcode cache, since the parsing is only done once, it makes sense to spend some time analyzing the generated opcodes and optimizing them so that their execution can be as fast as possible. The end result is that, with an opcode cache in place, you may see your PHP's performance improve anywhere from 40% to 600%.

As far as opcode caching products go, for the most part all available solutions offer about the same level of performance, with some minor differences. My current favorite is Turck MMCache (http://turckmmcache.sourceforge.net/), which was originally developed by Dmitry Stogov.
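The "has the script changed since it was cached?" check can be sketched in userland PHP. This only illustrates the validation logic and is not how any real opcode cache is implemented: CompileCache is a hypothetical class, and php_strip_whitespace(), which runs the source through PHP's lexer, stands in for the expensive compile step.

```php
<?php
// Userland sketch of opcode-cache style validation: do the expensive per-file
// work once, then reuse the result until the file's modification time changes.
// CompileCache is hypothetical; php_strip_whitespace() stands in for "compile".
class CompileCache
{
    private $entries = array();
    public $misses = 0;

    public function get($path)
    {
        clearstatcache();            // don't let PHP's stat cache hide changes
        $mtime = filemtime($path);   // the single cheap check per access
        if (isset($this->entries[$path])
            && $this->entries[$path]['mtime'] === $mtime) {
            return $this->entries[$path]['code'];   // hit: no re-parsing
        }
        $this->misses++;             // miss: redo the expensive step
        $code = php_strip_whitespace($path);
        $this->entries[$path] = array('mtime' => $mtime, 'code' => $code);
        return $code;
    }
}

$file = tempnam(sys_get_temp_dir(), 'cc');
file_put_contents($file, "<?php\n// a comment\necho 1 + 2;\n");

$cache = new CompileCache();
$cache->get($file);              // miss: first access
$cache->get($file);              // hit: file unchanged
touch($file, time() + 10);       // simulate an edit by bumping the mtime
$cache->get($file);              // miss: file changed
echo $cache->misses, "\n";       // prints 2
unlink($file);
```

Unlike this sketch, whose cache lives only as long as a single script, products such as MMCache keep compiled opcodes in shared memory so that every Apache child can reuse them.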
MMCache comes with a particularly efficient opcode caching mechanism and a powerful optimizer that in most cases allows you to squeeze out a few extra requests per second compared to its competition. It also includes a few other features, such as an in-memory session handler and a content caching mechanism, which can be used to further improve the performance of your PHP applications. Unfortunately, Dmitry is currently unable to dedicate time to the project, and the development of MMCache has stalled. However, a number of volunteers have promised to continue maintaining the project and will hopefully pick up where Dmitry left off.

The Zend Performance Suite (ZPS) is a commercial PHP acceleration package offered by Zend that also implements an opcode cache and an optimizer, as well as content caching capabilities. The big plus of ZPS is that it is designed with both experienced and novice users in mind and provides a very powerful and user-friendly interface to its components. This is especially useful when configuring content caching, which in MMCache, for example, can require a bit of manual labor and testing. However, unlike MMCache, ZPS is not free. Its licensing starts at about $499 per server, which may put it out of the price range of small site operators.

Aside from ZPS, there is also APC, an Open Source project that has made big strides in the past year. Its performance is similar to that of ZPS and MMCache, but the lack of a good optimizer makes it a little slower in certain situations. Given its active development, however, there is little doubt that it will eventually be able to match the capabilities of the other implementations. I should also mention the ionCube PHP Accelerator, which was one of the original free opcode cache implementations. It still works quite well with the PHP 4.3 series, but has not had any visible new development in over a year and, consequently, does not perform as well as MMCache or APC in most situations.

A Hidden Cache

Regardless of whether or not an opcode cache is used, most scripts will still perform a fair number of file system operations. These can become a major bottleneck because, while processor and memory speeds keep increasing, hard drives remain comparatively slow. It does not take much to reach the maximum read or write speed of a drive, which is usually just a few dozen megabytes per second. For ultimate performance, it is best to eliminate file system operations altogether. While this may seem like an impossible goal, a wonderful invention called a "ramdisk" makes it attainable without much effort.
A ramdisk is really little more than the emulation of a hard drive in memory; as far as programs (including your PHP scripts) are concerned, it appears to be just another run-of-the-mill disk partition. However, the data written to a ramdisk is actually stored directly in the system's memory, where data throughput is measured in hundreds of megabytes per second. Nearly all operating systems support ramdisks, but Linux goes a step further and allows a ramdisk to be bound to a physical drive or directory. This means that, while you get all the benefits of writing and reading data to memory, you also do not risk losing that data in the event of a system crash or reboot, since the kernel will automatically synchronize it back to the physical drive as needed. Incidentally, it's also very easy to turn on this feature; all you need is someone with root access and a few spare minutes:

    mount --bind -t tmpfs /tmp /tmp
    mount --bind -t tmpfs /home/webroot /home/webroot
The example above binds two commonly used directories: the temporary directory (frequently used for session storage and other common operations) and the directory where the web site files can be found. The end result is that virtually all file operations commonly performed by PHP are accelerated through the reduction in file I/O overhead. At the same time, reliability is not sacrificed for the sake of performance, making this an ideal solution even for the most demanding of websites. The only downside of this speed-up is that the ramdisk uses your memory and, therefore, binding large directories can eat up quite a bit of space that would otherwise be available to your applications. Thus, you need to exercise a bit of caution to ensure that directories mapped to ramdisks do not end up consuming all available memory and force the operating system to use its much slower swap facilities.

And We Didn't Even Touch a Line of Code!

As you have probably realized by now, there are many ways to improve the speed of PHP applications without having to perform potentially dangerous code changes. Equally important is the fact that these changes, for the most part, require very little time to implement and can result in massive performance improvements. This does not mean that you should abandon the practice of optimizing the code itself, which is, of course, an important tool for making your applications faster. However, when time is of the essence and the pressure is on, it is always good to know a few tricks to make the code run faster without having to tinker with it.
About the Author
Ilia Alshanetsky is an active member of the PHP development team and is the current release manager of PHP 4.3.X. Ilia is also the principal developer of FUDforum (http://fud.prohost.org/forum/), an open source bulletin board, and a contributor to several other projects. He can be reached at [email protected].
To Discuss this article: http://forums.phparch.com/128
SQLyog
PRODUCT REVIEW
www.Webyog.com by Eddie Peloke
I have never been a fan of administering MySQL databases via the command line. The output of queries is difficult to read and, unless you are a MySQL expert, you need to keep a manual at your side for all the necessary syntax. For that reason, as soon as I began developing with MySQL, I looked for a GUI-based administration tool to speed up the development process. I eventually found MySQL Front and have used it ever since; I have tried many other administration tools over the past few years, but MySQL Front has always been my favourite. However, all that changed as soon as I had the opportunity to use SQLyog. SQLyog is a MySQL GUI tool from Webyog.com, which describes it as an “easy to use, compact and very fast graphical tool to manage your MySQL database anywhere in the world.” Let’s see how it does in this part of the world.

The Details

The SQLyog version I reviewed is 3.63, tested on a Windows machine. SQLyog is currently not available for Linux or Macintosh but, not to worry, a product called SQLyog Max is in the works and will include Linux and Mac support. SQLyog includes a very wide array of functionality that is certain to make even the most hard-core command-line fans happy:
QUICK FACTS
Description: SQLyog is a very fast, compact and simple to use GUI tool to manage your MySQL server. The software is primarily for the users who work with MySQL during the development process. Like MySQL, SQLyog also follows the principle of the 14th century philosopher monk Occam. We follow his rule known as Occam’s razor: No complexity beyond what is necessary.
Supported platforms:
• Windows: 98, 2000, XP
Price:
• 1-9 licenses: $49/license
• 10-49 licenses: $39/license
• Site License: $695
• Site License (Educational / Non Profit Organizations): $395
Download Page: http://www.webyog.com/sqlyog/download.html
Product Homepage: http://www.webyog.com/sqlyog/index.php
• It is compatible with MySQL 4.1, fully InnoDB compliant and supports very fast data retrieval operations.
• It can import data from an ODBC source, with the option to import data through a query. It is also capable of copying entire databases from one server to another.
• A schema and data synchronization tool is included to provide manual replication of database contents.
• You can use it to schedule various jobs for automatic execution at a later date.
• It provides fast client-side sorting and filtering.
• It can execute multiple queries returning thousands of rows per result set. It is written entirely in C/C++ against the Win32 API, using the native MySQL C API.
• You can drop all tables of a database with a single click.
• It allows you to edit BLOBs, with support for Bitmap/GIF/JPEG formats.
• You can profile queries for performance analysis.
• Despite being a GUI tool, it is very keyboard friendly: you can access 99% of SQLyog’s features with the keyboard.
• It allows access to MySQL’s running statistics, and can view and kill other users’ processes. You can also perform table diagnostics (check, optimize, repair, analyze).
• You can use it to change table types to ISAM, MyISAM, MERGE, HEAP, InnoDB and BDB.

Now that we have the details out of the way, let’s load it up and see exactly what SQLyog can do. For those of you playing along at home, a 30-day trial version of SQLyog is available for download from Webyog’s website. Within five minutes of finishing my download, I had SQLyog installed and was working with my databases.

The Layout

SQLyog’s layout is very clean and uncluttered. The application consists of three main working panes. The left-hand pane gives you a tree view of your databases. You can expand each database to show tables, columns and indexes. Several right-click options are available, depending on where you are in the tree. With a database selected, you can right-click to alter the table structure, manage indexes, manage relationships, import data, export data, view data, and so on, while highlighting an individual column gives you the right-click option to drop the column or manage its indexes. All of these options are also available via the application’s main menu, but I found it convenient to have them as right-click pop-up menus.

Figure 1

The top right-hand pane is the query pane, where you can type your select queries, table alterations, insert statements, and so on. Unfortunately, the query pane highlights only the basic MySQL syntax, not your table or column names. This isn’t a necessity, of course, but I have found it helpful in other database tools such as Toad for Oracle. The query pane does, however, offer some nice right-click options; one of the most interesting I found was the use of templates. SQLyog comes with a list of predefined MySQL statements such as ‘alter table’, ‘create indexes’, and others; clicking on any of the templates will pop the statement into the query editor, where you can then simply add your parameters, such as table name, columns to be affected, and so on. Once executed, the results appear in a nice “Excel-style” tabular pane with column headers, as is the case with most GUI database tools. The interesting thing about SQLyog’s results pane is that it does more than simply show you the query output: you can view any messages returned by MySQL, view the objects connected to a selected table or database, and view your query history.

Can We See the Menu Please?

SQLyog’s menu contains many of the same items that can be accessed via right-clicks or through the top toolbar, so you can perform a variety of operations, like executing queries, table diagnostics, and structure synchronization, from either place. What I like most about the menu and toolbar is their nice, clean, well-organized layout. I was able to quickly find the tools and options needed for just about any database-related task. Some of the more interesting items in the menu are the Database Synchronization tool, the Structure Synchronization tool, SQLyog's Job Agent and the HTML Schema tool. As their names imply, the database and structure synchronization tools let you synchronize the data or structure of two MySQL databases. These can be very helpful if you are working with separate development and production databases. Along with the synchronization tools, SQLyog ships with the SQLyog Job Agent (available free for Linux users), which allows you to schedule your synchronization tasks. This can be very helpful if you need to synchronize databases on a regular basis.
Figure 2
The HTML Schema tool is another nice feature, as it allows you to quickly create an HTML representation of your database. The generated schema shows table structures with columns and indexes, and contains hyperlinks that let you quickly jump to specific tables.

Moving Data In and Out

Another nice thing about SQLyog is its array of options for moving data in and out of your databases. You can export data as batch scripts, and export query results as XML, CSV (you can select the terminator) or HTML. Choosing the HTML option will create an HTML page with all of your data presented in HTML tables. The import functions allow you to run batch files, import from CSV and, of course, execute SQL scripts. SQLyog also contains an "Import Wizard," which lets you import data from other ODBC data sources, such as Oracle, MSSQL, DB2 and Access. If you're porting your data to MySQL from external sources, the wizard can help you cut down the migration time significantly.
The Future

The future of SQLyog looks bright, thanks to the announcement of an SQLyog Max release sometime in mid-2004. SQLyog Max will be a complete rewrite of SQLyog to support multiple operating systems, including Windows, Linux, Mac and *nix. According to Webyog, there are several new features to look out for in the upcoming release:
• Full Multithread Support: multiple queries can be executed simultaneously, and queries can be terminated in the middle of execution.
• Unicode and Internationalization: fully Unicode compliant, the new version will display Unicode data (MySQL 4.1) correctly. SQLyog will be available in multiple languages.
• High Performance Editor: the new, highly scalable editor will allow the editing of very large files without loss in performance, as well as provide support for ToolTips, autocompletion and syntax highlighting for a variety of languages, such as PHP, HTML, XML, Perl and Python.
• Tabbed Interface: a new, Visual Studio .NET-like tabbed interface will support multiple documents and allow you to get rid of modal dialogs in important operations like Data Editing and Table Structure Editing.
• A more polished interface, with new icons in menus and dialog windows, as well as an improved toolbar and context-sensitive help.
• SQLyog Max will also be compatible with the latest version of MySQL and with the new Stored Procedure Editor in MySQL 5.x.

Figure 3
What I Didn't Like

I have not really found anything yet that I don't like about SQLyog. Many of the items on my 'would be nice to have' list should be addressed with the release of SQLyog Max; the E-R diagramming tool and the ability to start and stop MySQL from SQLyog Max will be welcome additions. The one main area that could be improved, however, is the help system. Right now, the help simply consists of HTML files, which do a good job of explaining SQLyog but do not function well as a reference when you're in a bind. It would be nice to see a searchable-index approach, as well as more MySQL-related help.
Along with the new features, SQLyog Max will receive a few enhancements over SQLyog that should help make it a well-rounded database tool. These include a 'Query Builder', an Entity-Relationship diagramming tool, and the ability to shut down and start up MySQL, which should provide the shot in the arm that SQLyog Max needs to become a top MySQL GUI administration tool.
Conclusion

Overall, I really like SQLyog. I use it daily in my PHP/MySQL development, and it is now my primary database tool. It is easy enough to use for the beginner, but has enough options for the professional MySQL developer. The lack of some advanced features, like the diagramming tool, keeps it from getting five stars, but it is a workhorse that I will not give up easily.
What I Liked

There are many things I liked about SQLyog. It is very easy to use, and I found it quick and responsive. The menu is nicely laid out and organized, with many of the options only a click away. Options like the synchronization tools and table diagnostics (you can choose to optimize, check, analyze, or repair your selected tables) are nice features that can make a developer's life much easier. Out of all the tools I have tried, this has become my MySQL tool of choice.
php|a
Figure 4
February 2004
●
PHP Architect
●
www.phparch.com
40
Profiling PHP Applications
FEATURE
by George Schlossnagle
If you program PHP professionally, there is little doubt that, at some point, you will need to improve the performance of an application. If you work on a high-traffic site, this might be a daily or weekly endeavor for you; if your projects are mainly intranet ones, the need may arise less frequently. At some point, though, most applications need to be "retuned" in order to perform as you want them to.

When I'm giving presentations on performance tuning PHP applications, I like to make the distinction between tuning tools and diagnostic techniques. Among the tuning tools are caching methodologies, system-level tunings, database query optimization, and improved algorithm design. I like to think of these techniques as elements of a toolbox, in the way that a hammer, a torque wrench, and a screwdriver are elements of a handyman's toolbox. Just as you can't change a tire with a hammer, you can't address a database issue by improving a set of regular expressions. Without a good toolset, it's impossible to fix problems; without the ability to apply the right tool to the job, the tools are equally worthless.

In automobile maintenance, choosing the right tool is a combination of experience and diagnostic insight. Even simple problems benefit from diagnostic techniques. If I have a flat tire, I may be able to patch it, but I need to know where to apply the patch. More complex problems require deeper diagnostics. If my acceleration is sluggish, I could simply guess at the problem and swap out engine parts until performance is acceptable. That method is costly in both time and materials. A much better solution is to run an engine diagnostic test to determine the malfunctioning part.
Software applications are in general much more complex than a car's engine, yet I often see even experienced developers choosing to make "educated" guesses about the location of performance deficiencies. During the spring of 2003, the php.net Web sites experienced some extreme slowdowns. Inspection of the Apache Web server logs quickly indicated that the search pages were to blame. However, instead of profiling to find the specific source of the slowdown within those pages, random guessing was used to try to solve the issue. The result was that a problem that should have had a one-hour fix dragged on for days as "solutions" were implemented that did nothing to address the core problem.

Thinking that you can spot the critical inefficiency in a large application by intuition alone is almost always pure hubris. Much as I would not trust a mechanic who claims to know what is wrong with my car without running diagnostic tests, or a doctor who claims to know the source of my illness without performing tests, I am inherently skeptical of any programmer who claims to know the source of an application slowdown but does not profile the code.

This article focuses on using the APD profiler for PHP to profile code. APD is a Zend extension, meaning that it hooks deep into PHP itself to get accurate and low-cost performance measurements. Although products like Xdebug and DBG provide some profiling capabilities, APD offers the most comprehensive ones.

REQUIREMENTS
PHP: 4.x or 5.x
OS: Any
Applications: N/A
Code: http://code.phparch.com/20/1
Code Directory: profile
Installing and Using APD
APD is part of PECL and can thus be installed with the PEAR installer:

# pear install apd

After APD is installed, you should enable it by setting the following in your php.ini file:

zend_extension=/path/to/apd.so
apd.dumpdir=/tmp/traces

APD works by dumping trace files that can be post-processed with the bundled pprofp trace-processing tool. These traces are dumped into apd.dumpdir under the name pprof.pid, where pid is the process ID of the process that dumped the trace. To cause a script to be traced, you simply need to call this function when you want tracing to start (usually at the top of the script):

apd_set_pprof_trace();

APD works by logging the following events while a script runs:
• When a function is entered.
• When a function is exited.
• When a file is included or required.

Also, whenever a function return is registered, APD checkpoints a set of internal counters and notes how much they have advanced since the previous checkpoint. Three counters are tracked:
• Real Time (a.k.a. wall-clock time): the actual amount of real time passed.
• User Time: the amount of time spent executing user code on the CPU.
• System Time: the amount of time spent in operating system kernel-level calls.

After a trace file has been generated, you analyze it with the pprofp script. pprofp implements a number of sorting and display options that let you look at a script's behavior in several different ways from a single trace file. Here is the list of options to pprofp:

pprofp
Sort options
  -a  Sort by alphabetic names of subroutines.
  -l  Sort by number of calls to subroutines.
  -r  Sort by real time spent in subroutines.
  -R  Sort by real time spent in subroutines (inclusive of child calls).
  -s  Sort by system time spent in subroutines.
  -S  Sort by system time spent in subroutines (inclusive of child calls).
  -u  Sort by user time spent in subroutines.
  -U  Sort by user time spent in subroutines (inclusive of child calls).
  -v  Sort by average amount of time spent in subroutines.
  -z  Sort by user+system time spent in subroutines. (default)
Display options
  -c  Display real time elapsed alongside call tree.
  -i  Suppress reporting for PHP built-in functions.
  -m  Display file/line locations in traces.
  -O  Specifies maximum number of subroutines to display. (default 15)
  -t  Display compressed call tree.
  -T  Display uncompressed call tree.
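The paired sort flags (-r/-R, -s/-S, -u/-U) differ only in whether a function's time includes the time of the functions it calls. Exclusive time is simply inclusive time minus the inclusive time of the function's direct children. The PHP sketch below computes both figures from a small call tree; the tree layout, the collectTimes() helper, and the millisecond values are hypothetical and made up for illustration, and this is not the format APD uses for its trace files.

```php
<?php
// Hypothetical call tree with made-up times in milliseconds.
// Each node has a name, an inclusive (total) time, and its child calls.
$tree = [
    'name' => 'main', 'inclusive' => 100, 'children' => [
        ['name' => 'hello', 'inclusive' => 70, 'children' => [
            ['name' => 'sleep', 'inclusive' => 65, 'children' => []],
        ]],
        ['name' => 'goodbye', 'inclusive' => 5, 'children' => []],
    ],
];

// Walk the tree and accumulate per-function inclusive and exclusive time.
// Exclusive time = inclusive time minus the inclusive time of direct children.
// (A recursive function would be double-counted in 'inclusive' here;
// a real profiler has to treat that case specially.)
function collectTimes(array $node, array &$out): void
{
    $childTotal = 0;
    foreach ($node['children'] as $child) {
        $childTotal += $child['inclusive'];
        collectTimes($child, $out);
    }
    $name = $node['name'];
    $out[$name] ??= ['inclusive' => 0, 'exclusive' => 0];
    $out[$name]['inclusive'] += $node['inclusive'];
    $out[$name]['exclusive'] += $node['inclusive'] - $childTotal;
}

$times = [];
collectTimes($tree, $times);

// Rank by exclusive time (analogous to the lowercase sort flags) ...
$byExclusive = $times;
uasort($byExclusive, fn($a, $b) => $b['exclusive'] <=> $a['exclusive']);

// ... and by inclusive time (analogous to the uppercase sort flags).
$byInclusive = $times;
uasort($byInclusive, fn($a, $b) => $b['inclusive'] <=> $a['inclusive']);

foreach ($byExclusive as $name => $t) {
    printf("%-8s exclusive %3d ms  inclusive %3d ms\n", $name, $t['exclusive'], $t['inclusive']);
}
```

In this made-up tree, ranking by exclusive time puts sleep() first, because that is where the time is actually spent, while ranking by inclusive time puts hello() ahead of sleep(), because hello() is charged for everything it calls. The inclusive view helps you find the high-level routine worth optimizing; the exclusive view points at the leaf call doing the work.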
The -t and -T options, which display a call tree for the script, and the full set of sort options are particularly interesting. As indicated, the sort options allow functions to be sorted either by the time spent in that function exclusively (that is, not including any time spent in child function calls) or by time spent inclusive of child calls. In general, sorting on real elapsed time (using -r and -R) is most useful because it is the amount of time a visitor to the page actually experiences. This measurement includes time spent idling in database access calls waiting for responses and time spent in any other blocking operations. Although identifying these bottlenecks is useful, you might also want to evaluate the performance of your raw code without counting time spent waiting on input/output (I/O). For this, the -z and -Z options are useful because they sort only on time spent on the CPU.

A Tracing Example
To see exactly what APD generates, you can run it on the simple script shown in Listing 1.

Listing 1

Figure 1 shows the results of running pprofp with -r on the resulting trace. The results are not surprising, of course: sleep() takes roughly 1 second to complete.
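A script with the shape described here can be sketched as follows. This is an illustrative reconstruction, not the original Listing 1: the function bodies and the 'World' argument are assumptions, and the function_exists() guard is an addition so the file also runs where APD is not loaded. With APD enabled, apd_set_pprof_trace() starts writing a pprof.pid trace file into apd.dumpdir.

```php
<?php
// Illustrative sketch only (not the original Listing 1).
// apd_set_pprof_trace() exists only when the APD extension is loaded,
// so the call is guarded to keep the script runnable without it.
if (function_exists('apd_set_pprof_trace')) {
    apd_set_pprof_trace(); // start tracing as early as possible
}

// hello() blocks for about one second in sleep(), so in a trace sleep()
// appears as a child call of hello() and dominates the real time.
function hello(string $name): string
{
    sleep(1);
    return "Hello $name\n";
}

// goodbye() does almost no work and should barely register.
function goodbye(string $name): string
{
    return "Goodbye $name\n";
}

echo hello('World');
echo goodbye('World');
```

Each function runs exactly once, so a -r report on such a trace would be expected to rank sleep() first, with hello() and goodbye() contributing almost nothing of their own.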
In fact, sleep() runs slightly longer than 1 second; this imprecision is typical of the sleep function in many languages, and you should use usleep() if you need finer-grained timing. The hello() and goodbye() functions are both quite fast. All the functions were executed a single time, and the total script execution time was 1.0214 seconds.

To generate a full call tree, you can run pprofp with the -Tcm options. This generates a full call tree, with cumulative times and file/line locations for each function call. Figure 2 shows the resulting output. Note that in the call tree, sleep() is indented because it is a child call of hello().

Profiling a Larger Application
Now that you understand the basics of using APD, let's employ it on a larger project. Serendipity is open-source weblog software written entirely in PHP. Although it is most commonly used for the weblogs of private individuals, Serendipity was designed with large, multi-user environments in mind, and it supports an unlimited number of authors. In this sense, Serendipity is an ideal starting point for a community-based Web site that offers weblogs to its users. As far as features go, Serendipity is ready for that sort of high-volume environment, but the code should first be audited to make sure it will scale well, and a profiler is perfect for this sort of analysis.

One of the great things about profiling tools is that they give you easy insight into any code base, even one you might be unfamiliar with. By identifying bottlenecks and pinpointing their locations in code, APD allows you to quickly focus your attention on trouble spots.

A good place to start is profiling the front page of the weblog. To do this, index.php is modified to dump a trace. Because the weblog is live, you do not want to generate a slew of trace files by profiling every page hit, so you can wrap the profile call to make sure it is called only if you manually pass PROFILE=1 in the URL: