VOLUME III - ISSUE 9
SEPTEMBER 2004
www.phparch.com
The Magazine For PHP Professionals
Plus: Tips & Tricks, Product Reviews, Security Corner and much more...
TABLE OF CONTENTS

Features
  Debugging Questions and Xdebug Answers, by Derick Rethans
  Open != $Free, by Mark Evans
  File Fixing Robots, by Ron Goff
  The Ultimate PHP 5 Shopping Cart, Part 2: Taking the Customer’s Money, by Eric David Wiener
  Cheap Manpower: Calling External Programs from PHP Scripts, by Michal Wojciechowski

Departments
  Editorial: Welcome to the Family!
  What’s New!
  Product Review: IonCube Encoder Windows GUI, by Peter B. MacIntyre
  Security Corner: Secure Design, by Chris Shiflett
  Tips & Tricks, by John W. Holmes
  exit(0); Exam Under the Microscope, by Andi Gutmans and Marco Tabini
September 2004
●
PHP Architect
●
www.phparch.com
EDITORIAL

Welcome to the family!

If you are reading this, there’s a chance that you picked up your copy of php|architect from your local bookstore or specialty newsstand. If so, let me welcome you to our family. If you’ve been with us for a while, or if you are reading our PDF edition, welcome back!

Although this is our first issue to hit the stands, php|a has been around for almost two full years, all the while publishing hundreds of in-depth articles conceived specifically for the professional PHP developer who is looking for intermediate- to advanced-level techniques and programming methodologies. If I can give you my highly unbiased opinion, php|a is the best PHP magazine in the world!

You have probably noticed that php|architect is a bit different from the other magazines that sit on the stands. For one thing, there are hardly any advertisements in our magazine. This is by design: we think our readers prefer to read articles rather than commercial messages, and publishing the kind of material that makes up our magazine takes up a lot of space anyway.

Our independence from advertisers also means that the articles you find in our magazine are generally unbiased and focused on the technical merits of every solution that they describe. You’ll find little or no fluff here, and if something doesn’t work, you’ll hear about it in no uncertain terms.

Our website, at www.phparch.com, is a great place to learn what’s going on in the PHP community and discuss your problems (and your ideas) with fellow PHP enthusiasts. We publish PHP-related news items every day; if you have a news aggregator, you can even subscribe to our RSS feed and have the news delivered to your desktop in real time. Our forums are frequented by lots of PHP-loving folks, and most of our authors love to discuss their articles with our readers through them.

In this issue, you will find a number of great articles on topics ranging from debugging your scripts using Derick Rethans’ popular Xdebug extension to writing automated robots that keep an eye on and repair your filesystems. We always try to publish articles that are relevant to your day-to-day PHP experience, sometimes even if you don’t know about them yet! If you have any questions or are curious about php|a, feel free to drop us an e-mail! Until next month, happy reading!
Publisher: Marco Tabini
Editorial Team: Arbi Arzoumani, Peter MacIntyre, Eddie Peloke
Graphics & Layout: Arbi Arzoumani
Managing Editor: Emanuela Corso
Director of Marketing: J. Scott Johnson
Account Executive: Shelley Johnston
Authors: Mark Evans, Ron Goff, John Holmes, Peter B. MacIntyre, Derick Rethans, Chris Shiflett, Eric David Wiener, Michal Wojciechowski

php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in all associated material.

Copyright © 2003-2004 Marco Tabini & Associates, Inc. All Rights Reserved.
NEW STUFF

What’s New!

PHP 4.3.9RC2 released!
PHP.net is proud to announce the release of 4.3.9 RC2: “This is the last release candidate before the final release and should have a very low number of problems and/or bugs. Nevertheless, please download and test it as much as possible on real-life applications to uncover any remaining issues.” Some changes include:
• Implemented periodic PCRE compiled regexp cache cleanup, to avoid memory exhaustion.
• Fixed a file-descriptor leak with phpinfo() and other ‘special’ URLs.
• Fixed bug #29594 (Use PHP’s own tmpfile() implementation).
• More...
Get the latest release from php.net.

PHP-Nuke 7.5 Final
The latest release of PHP-Nuke is out! This version includes a totally new modularized administration and multiple-administrator management. Several bugs have been fixed, cosmetic changes applied, some HTML problems taken care of, almost all variable validation implemented and much, much more. Additionally, PHP-Nuke version 7.4 has been released for free to the public. For more information visit: http://phpnuke.org/
PHP Live! 2.7
The latest version of PHP Live! is out. PHP Live! 2.7 is Web-based live chat support software, written in PHP and MySQL. It enables live help and live customer support communication directly from your website, and provides real-time sales and/or customer service support from any computer, anywhere! Some of the features include:
• Real-time Chats
• Email Transcripts
• Ad Tracking (Track’it) (popular!)
• PUSH HTML pages (popular!)
• Spam Blocking (New!)
• Visitor Traffic Monitor (popular!)
• Hidden Departments
• Time Zone Setup
• and more!
For more information visit: www.phplivesupport.com
Back-End CMS
Back-End.org announces: “Back-End allows even the nontechnical to manage any website easily, through a web browser, on any operating system. Fast, flexible and easy to understand, Back-End puts you in charge of your site, saving your organization time and money in the process.” What’s new in this release? “This is our first xhtml release, but there are still a few more bugs to squish before every page is xhtml 1.0 strict compliant. We’ve also tightened out our security to ensure that all user data is filtered properly.” Get more information from back-end.org

PHPtags CMS 0.93
What is it? “PHPtags CMS is a content management framework for entering data and managing content that is then deployed on Web sites using the PHPtags custom tags engine. It is a membership authentication system that is especially useful for setting up member-only login-required sections of Web sites. It contains numerous useful tags that make it easier to build intelligent forms, do database queries, build templates, export data, cloak email addresses, edit pages online, and create popups. You can also write your own tags.” Get more info from the PHPTags homepage: http://www.smkelly.com/smkelly/products/phptags/

phpMyBackupPro 1.0
phpMyBackupPro is a web-based MySQL backup program, written in PHP. You can schedule backups, email them or upload them using FTP, and back up your data, your structure or both. You can gzip your backups if you wish and can include ‘drop table if exists’ commands. A database summary and online help are available. All of this is combined with an easy-to-use web-based interface and an easy install and configuration. For more information visit: www2.fht-esslingen.de
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

esmtp 0.3.0
Esmtp is a wrapper for an SMTP client library based on the libESMTP library (http://www.stafford.uklinux.net/libesmtp/).

memcache 1.3
Memcached is a caching daemon designed especially for dynamic web applications to decrease database load by storing objects in memory. This extension allows you to work with memcached through handy OO and procedural interfaces.

parsekit 0.3.1
Provides a userspace interpretation of the opcodes generated by the Zend engine compiler built into PHP. This extension is meant for development and debug purposes only and contains some code which is potentially non-threadsafe.

newt 0.1
PHP-NEWT is a PHP language extension for the RedHat Newt library, a terminal-based window and widget library for writing applications with a user-friendly interface. Once this extension is enabled in PHP, it provides the use of Newt widgets, such as windows, buttons, checkboxes, radiobuttons, labels, editboxes, scrolls, textareas, scales, etc. Use of this extension is very similar to the original Newt API for the C programming language.

huffman 0.2.0
Huffman compression belongs to a family of algorithms with a variable codeword length. That means that individual symbols (characters in a text file, for instance) are replaced by bit sequences of differing lengths. Symbols that occur a lot in a file are given a short sequence, while others that are seldom used get a longer bit sequence.
ZPS 4.0.2 is Now Available
Zend has announced the latest release of Zend Performance Suite with Full PHP 5 Support! “Zend Performance Suite enables customers to decrease hardware costs while improving performance, protect content assets from piracy, and ensure a better end-user experience.” Some of the highlights of ZPS 4.0.2 include:
• NEW - Full PHP 5 Support!
• Unparalleled server performance increase - up to 25X increase in server throughput:
  - Reduce the load on the server
  - Decrease page response time
  - Decrease load on Database Backend
  - Increase concurrent users to be served
• No code intervention necessary
• Flexible configuration of caching conditions
• Dramatic cost savings, with fast ROI payback
• See the results yourself with the built-in testing capability
• Ease of use and straightforward deployment
• API functions for personalization
For more information visit: www.zend.com
Check out some of the hottest new releases from PEAR.

Gtk_ScrollingLabel 0.3.0beta1
This is a class to encapsulate the functionality needed for a scrolling GTK label. This class provides a simple, easy to understand API for setting up and controlling the label. It allows you to scroll in either direction, start and stop the scroll, pause and resume the scroll, get and set the text, and set display properties of the text.

Image_Text 0.5.2beta1
Image_Text provides a comfortable interface to text manipulation in GD images. Besides common Freetype2 functionality, it is capable of handling text elements in a graphic- or office-tool like way. For example, it allows the alignment of text inside a text box, rotation (around the top left corner of a text box or its center point) and the automatic measurement of the optimal font size for a given text box.

HTTP_Download 0.10.0
HTTP_Download provides an interface to easily send hidden files or any arbitrary data to the client through HTTP. It features HTTP caching, compression, ranges (partial downloads and resuming) and a throttling mechanism.

Date_Holidays 0.10.0
Date_Holidays helps you calculate the dates and names of holidays and other special celebrations. The calculation is driver-based, so it is easy to add new drivers that calculate a country’s holidays. The class’ methods can be used to get a holiday’s date and name in various languages.

I18N_UnicodeString 0.1.0
Provides a method of storing and manipulating multibyte strings in PHP without using ext/mbstring. Also allows conversion between various methods of storing Unicode in 8-bit strings, like UTF-8 and HTML entities.
Debugging Questions and Xdebug Answers
FEATURE

by Derick Rethans

Despite its relatively young age, Xdebug is a very popular extension for PHP 4 and 5 that adds a great set of functions to the standard interpreter to debug, profile and analyze PHP scripts. In this article, I will provide a Q&A session for PHP developers who want to know how Xdebug's advanced features can make debugging, profiling and testing problems easier to solve. Here we go...

Why do I Need a Debugger?

Q: I have heard many times that a debugger is a required tool to debug an application, but I don’t really agree, as there are already some functions in PHP to help you with debugging, such as echo() and print_r().

A: I beg to differ and will try to convince you during the rest of this Q&A session that those two functions are nowhere close to the tools you need to debug a complex PHP application. As this article is about debugging, most of the answers below will make use of functionality provided by Xdebug, an Open Source (BSD-style) licensed debugger for PHP. Most of the examples are written on a Unix-like Operating System, but they should work as well on a Windows platform, except that they might require some changes in path and/or file names. The examples were all run with PHP 5.0.2-dev and Xdebug 2.0.0beta1, but should work on any PHP version after 4.3.0, although the output may differ slightly. Go ahead with your questions!

How do I Install Xdebug?

Q: Before I ask questions about a few problems that I have while debugging applications, I would like to know how I can install Xdebug, as I assume that for most of the examples in this article to work I will need to have it up and running.

A: That is correct. If you’re on a Unix-like system, I would recommend getting the source code from the Xdebug website (xdebug.org) and following the instructions on the Xdebug documentation page (xdebug.org/install.php). This should not take more than three minutes if you have a correctly set-up PHP installation. Depending on your distribution, you might need to install a php-dev package, or something like that, in order to get all the files required for a successful compilation of a PHP extension. Debian users can install the “php-dev” package through apt-get, for example. If you still fail to install Xdebug, your distribution might not provide a “correctly” set-up PHP installation and your best shot is to ask on the mailing list.

For installation on a Windows system, please follow the instructions for precompiled modules on the installation documentation page. These are very straightforward, as you don’t have to compile the extension at all. Xdebug is also available through the Windows PECL snapshots for PHP 5.0.x (snaps.php.net/win32/PECL_5_0/php_xdebug.dll) and for PHP 5.1.x (snaps.php.net/win32/PECL_UNSTABLE/php_xdebug.dll), and through the PECL distribution system (just execute pear install xdebug from a shell). Make sure you install at least version 2.0.0beta1.
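Once the extension is installed, it helps to confirm that PHP actually loads it before trying the examples. The following is a minimal sketch of such a check; the helper name report_xdebug_status() is hypothetical and not part of Xdebug, while extension_loaded() and phpversion() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of Xdebug): reports whether the
// extension is loaded in the PHP binary that runs this script.
function report_xdebug_status()
{
    if (extension_loaded('xdebug')) {
        // phpversion() with an extension name returns that extension's version.
        return 'Xdebug ' . phpversion('xdebug') . ' is loaded';
    }
    return 'Xdebug is not loaded; check the extension line in php.ini';
}

echo report_xdebug_status(), "\n";
```

Running this both from the command line and through the web server is worthwhile, since the two may read different php.ini files.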
REQUIREMENTS
PHP: 4.3.0+ (4.3.6+ recommended) or 5.0.0+
OS: Any
Other software: Xdebug 2.0 beta 1 or higher, KCacheGrind recommended
Code Directory: xdebug
Why Does PHP Crash?

Q: Sometimes I get no output in the browser for a script and, when I look at the Apache error log, I see something like “notice child pid 1048 exit signal Segmentation fault (11)”. I know this is not a good thing, but how can I find out what happened?

A: When a crash occurs in a PHP script, people are always ready to blame a bug in the PHP interpreter but, most of the time, that is not really the case. There is one occasion in which PHP is “allowed” to crash, and that is when a script has an infinite recursion loop in its code and a stack overflow occurs. PHP does not protect against those errors itself because it is very hard to determine how many levels can be executed before the stack (a limited-size memory structure belonging to a program) overflows and PHP crashes. Without any clue as to where in a complex script this might be, the simple debugging functions, echo() and print_r(), will not be of much help in determining the cause of the crash.

Xdebug does protect against this infinite recursion problem by aborting the script when you reach 100 nested function calls. A nested function call is defined as a function calling another function; Listing 1 shows an example of this. It is, of course, possible to change this default limit of 100 to something else if your scripts so require; you can do that with the xdebug.max_nesting_level php.ini setting. When the configured maximum nesting level is reached, Xdebug aborts the script with the message “Fatal error: Maximum function nesting level of ‘3’ reached, aborting! in on line ,” followed by a stack trace. This is a lot more useful than just crashing your web server.

Where Did The Error Occur?

Q: When receiving a warning or error like:

Warning: chdir(): No such file or directory (errno 2) in include/dir.php on line 8123

I know on which line the error occurred, but how do I find out which function called the function that raised the error?

A: With PHP you can create your own error handler and use the debug_backtrace() function to show a stack trace that gives you exactly this information. See Listing 2 for an example of how to do this. When you run this script, it will output something similar to Figure 1. As you see, it’s quite some work to implement something as basic as showing a stack trace when an error occurs, and the debug_backtrace() function did not even exist before PHP 4.3. Around the time of the 4.2.x releases, I started working on Xdebug, and the display of the stack when an error occurred was the second feature that I implemented. Shortly after this, the PHP development team added a similar function to PHP 5, and later backported it to PHP 4.3.

Figure 1

When Xdebug is loaded, the stack will be dumped for you automatically as soon as an error (or notice or warning) occurs. Parameters to functions are only shown as part of the stack trace when the xdebug.collect_params php.ini setting is set to one. Variables in the scope of the “highest” function are only shown when xdebug.show_local_vars is enabled. The result of this same script (without the set_error_handler() call) is shown in Figure 2.

How Can I Debug Request Variables?

Q: Whenever I get an error, I would like to see information from the superglobals, such as $_POST, together with the other debugging information. Is there a way to do that easily?

A: Xdebug can automatically show information from the superglobals whenever an error occurs, very similar to the way it displays local variables as described in the previous section. To configure this feature, you need to make certain settings in php.ini. For each superglobal, there is a specific setting, for example xdebug.dump.POST for the $_POST array; all other superglobals (COOKIE, ENV, FILES, GET, REQUEST, SERVER and SESSION) are supported in a similar fashion. The value of each setting is a comma-separated list of the indices of the variables that you want to show; make sure that you do not use any spaces in the setting’s value. In case you want to see all the elements of a specific superglobal, you can use the special wildcard value *. For example, if you use the settings illustrated in Listing 3, Xdebug will show information about all COOKIE variables, the POST variables login and password, and the SESSION variables id, login and hash on each error. An example of what this might look like is shown in Figure 3. (You can find the files for “creating” this error in the superglobals/ directory in the downloadable files accompanying this issue.)

Listing 3
xdebug.dump.COOKIE=*
xdebug.dump.POST=login,password
xdebug.dump.SESSION=id,login,hash

How do I “Pretty Print” Variables For Debugging?

Q: Whenever I use print_r() or var_dump() to display variables while debugging, I have to surround the function call with <pre>...</pre> in order to make the output somewhat readable. Even then, it’s often hard to figure out what I’m seeing.

A: Xdebug implements its own variable display function, xdebug_var_dump(), to display variables. When this new function is used with the Command Line Interface (CLI) of PHP, the format is only a bit different from PHP’s var_dump() function, but when you use it through your web browser, the function will display a variable laid out in proper HTML and colour-coded depending on its type. Besides this new function, Xdebug will also override PHP’s var_dump() function so that your scripts will require no changes to make use of Xdebug’s improved variable display capabilities. Figure 4 shows how Xdebug’s variable display function shows the complex variable from Listing 4.
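To make the contrast concrete, here is a small sketch of the plain-PHP approach the question describes: capturing var_dump() output and wrapping it in <pre> tags by hand. The $order array and the dump_in_pre() helper are hypothetical illustrations and are not taken from Listing 4. With Xdebug loaded, calling var_dump() directly should make a wrapper like this unnecessary.

```php
<?php
// Hypothetical sample data standing in for a "complex variable".
$order = array(
    'id'       => 42,
    'customer' => array('login' => 'alice', 'active' => true),
    'items'    => array(
        array('sku' => 'A-1', 'price' => 9.95),
        array('sku' => 'B-7', 'price' => 12.50),
    ),
    'coupon'   => null,
);

// Hypothetical helper: captures the var_dump() output in a buffer
// and wraps it in <pre> tags so a browser keeps the line breaks.
function dump_in_pre($value)
{
    ob_start();
    var_dump($value);
    return '<pre>' . ob_get_clean() . '</pre>';
}

echo dump_in_pre($order), "\n";
```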
Figure 4
Figure 2
Figure 3

How Much Time Does my Script Take?

Q: I want to know how long my script, or a part of my script, takes to execute. How can I do that?

A: In PHP, you can do this by using the microtime() function, but this is not entirely trivial, as you can see in Listing 5. Because microtime() returns the time in a strange format (the microseconds part expressed as a fraction of a second, followed by a space and the number of seconds), you have to create a helper function to generate a floating point number from it. Defining this function and executing it takes time too, so your measurements are less accurate than they could have been.

You can see in the listing that the second line that prints the number of seconds taken to execute the script makes use of the Xdebug function xdebug_time_index(), which always returns the time since the request for the script was made through the browser. That moment is often earlier than the moment at which the code you wrote starts to execute, which is why you see a difference in time between the “old” method and the Xdebug method. If you want to measure how much time a specific section of your script takes, the difference between the two methods would be marginal, but if you use Xdebug’s xdebug_time_index() function you still save time because you don’t have to write your equivalent of the get_time() function. The measurement will be a bit more accurate, too.

What is The Memory Usage of my Script?

Q: I am concerned about the memory usage of my script; how can I find out how much it uses overall, and what its peak memory usage is?

A: When I added to Xdebug the functionality to measure the memory usage at any given point during the script’s execution, as well as the peak memory usage, PHP itself did not have any functions to retrieve this information. Nowadays, PHP provides the memory_get_usage() function for this purpose. Xdebug’s equivalent is the xdebug_memory_usage() function, which retrieves the amount of script memory currently in use. However, Xdebug has an additional function, xdebug_peak_memory_usage(), that can be used to retrieve information about the script’s peak memory usage.
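The two answers above can be sketched in one short script. This is an illustration under stated assumptions, not a reproduction of Listing 5: the get_time() helper follows the description in the text, the md5() loop is a made-up workload, and the Xdebug calls are guarded with function_exists() so the script also runs where the extension is absent.

```php
<?php
// Helper of the kind the text describes: turns microtime()'s
// "fraction seconds" string into a single floating point number.
function get_time()
{
    list($fraction, $seconds) = explode(' ', microtime());
    return (float) $seconds + (float) $fraction;
}

$start = get_time();

// Hypothetical workload to measure.
$data = array();
for ($i = 0; $i < 10000; $i++) {
    $data[] = md5((string) $i);
}

$elapsed = get_time() - $start;
printf("Section took %.6f seconds\n", $elapsed);

// With Xdebug loaded, the same questions can be asked without a helper.
if (function_exists('xdebug_time_index')) {
    printf("Time since script start: %.6f seconds\n", xdebug_time_index());
    printf("Memory now: %d bytes, peak: %d bytes\n",
        xdebug_memory_usage(), xdebug_peak_memory_usage());
} else {
    printf("Memory now (plain PHP): %d bytes\n", memory_get_usage());
}
```

The section timing uses two get_time() calls, so the cost of the helper itself is part of the measurement, which is the inaccuracy the answer mentions.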
Both functions are only available when PHP is compiled
the directory where the trace file should be written to; this is, by default, the /tmp directory. If you’re running your scripts on Windows, you will have to change this setting, as /tmp obviously does not exist (unless you create a directory called “tmp” in your main drive). The format of the filename of the trace file can be manipulated by tweaking the xdebug.trace_output_name setting. When this is set to crc32, the name of the trace file has the format trace., followed by a crc32 checksum of the current working directory, followed by the .xt extension. If the setting has any other value, the middle part (that is, the crc32 checksum of the current working directory) is replaced by the ID of the current process (PID). Lastly, the xdebug.trace_options setting configures various options for creating traces. At the moment, the only available option is 1, which will cause Xdebug to open the trace files in “append” mode rather than “overwrite” mode.

The settings below will start automatic tracing of scripts and cause Xdebug to write the results to the /tmp/xdebug-traces directory in append mode, where the filename consists of trace. plus the PID of the script, followed by the .xt extension. The xdebug.collect_params and xdebug.collect_return settings control whether parameters to function calls and their return values should be included in the trace file.

xdebug.auto_trace = 1
xdebug.collect_params = 1
xdebug.collect_return = 1
xdebug.trace_output_dir = /tmp/xdebug-traces
xdebug.trace_output_name = pid
xdebug.trace_options = 1
The second way of starting a trace is by calling the xdebug_start_trace() function in your script. This function accepts two parameters: the first is the base name
Listing 7 1
Figure 6
September 2004
●
PHP Architect
●
www.phparch.com
14
FEATURE
Debugging Questions and Xdebug Answers
for the trace file’s location and the second optional parameter configures the options for the trace output. Because Xdebug automatically appends “.xt” to the base name of the trace file, there is a function, called xdebug_get_tracefile_name(), that returns the full filename of the trace file. xdebug_start_trace() supports two options: XDEBUG_TRACE_APPEND, which will create the trace file in append mode, and XDEBUG_TRACE_COMPUTERIZED, which will change the format of the trace file (more about that later). To stop a manually started trace, you can use the xdebug_stop_trace() function, which does not accept any parameters. Please do keep in mind that there can only be one trace (manually or automatic) running at any given time. The script in Listing 7 is started with the php.ini settings described above. When the script is run, the trace file that is generated is shown in Figure 6. The colourcoded formatting comes from a VIM syntax file called xt.vim, which instructs VIM to highlight the trace file according to a set of special rules. The xt.vim file can be found in the Xdebug source package and can be easily installed. Please refer to the Xdebug documentation at http://xdebug.org/install.php#vim for instructions on how to configure VIM to use this syntax-high-
lighting file. The trace file shows the following information: the start and end time of the trace (the first and the last line) and, for each function call, the current time index since the beginning of the script, the amount of memory in use at that moment, the function name, its parameters and the location from where the function was called, all in one line. Nested function calls are indented, so that you can immediately determine the order in which the execution takes place. For each corresponding function call, there is also a line with the return value of the function. In the trace from Figure 6, you can see that the function find_files_recursive() (line 3) was called with the parameters /tmp and null. This function called the glob() function with, as a parameter, the string /tmp/*. The glob() function returned an array. Then, the find_files_recursive() function called the is_dir() function for each element in the array. And, when that function returned True, the find_files_recursive() was called again. While this trace is very easy to read for humans like you and me, it is not very easy to parse for programs. This is why there is a second format for trace files, called the “computerized format,” which you can see in
Figure 7
September 2004
●
PHP Architect
●
www.phparch.com
15
FEATURE
Debugging Questions and Xdebug Answers
Figure 7. The information shown in this format is very much the same, except that the fields are all separated by tabs and that there is no indentation for the function names. Instead, the first column of the file consists of the “nesting” level, which, in combination with the second column (that shows the function number) can be used to re-create the same format as the human-readable output. The third field is a number; zero means the start of the function call and one the end of the function call. Together with the time index in the fourth field, it is easy to calculate how long each function call takes—and it is also not very hard to subtract the time spent in calling other functions from this function. The fifth column shows the memory usage, except this value is also shown when the function exists. This is followed, in order, by: the name of the function, whether it is a user-defined function (one) or a PHP internal function (zero), the filename from which the function was called and the line number from which the function was called. Currently, there is only one tool that is able to read the trace files in “computerized format”—Xgui—but this tool is still under development and not yet publicly available. However, once we release it we’ll make both
a Windows and Linux version available (and, indeed, provide support for any other system on which the Qt libraries can be compiled and the necessary build tools are available). Figure 8 shows the trace file from Figure 7 displayed with this tool. In the screenshot, you can see three columns; the first one displays a tree of functions, which shows immediately which function called which. By default, the whole tree is expanded so that all function calls are shown. The second column shows the difference in memory usage between the start and the end of the function call, including all memory allocated in the functions that each listed function called. The last column shows the time used for each function call, including calls to its children. Besides the fact that this tool is not yet available, it is also not very complete. In the future, we plan to add a lot of functionality to it, including features to highlight the slowest functions. Stay tuned to the Xdebug site (http://xdebug.org) for more information.

Do I Use All the Code in My Scripts?

Q: While developing a test suite for my web application’s functions I want to know if all the code I have written is covered by my tests. How do I do that?

A: A technique for finding out which code is actually run in your script is called “Code Coverage.” Xdebug can record which lines of your scripts contain statements that were executed. In order to take advantage of it, you need to make sure that the xdebug.extended_info setting in your php.ini file is set to 1; otherwise, Xdebug cannot gather the information it requires. Listing 8 shows a very simple and basic way of running
unit tests while also checking if all lines in the function-to-be-tested are actually executed. The script expects the following directory structure:

./listing8.php
./tests/basic.t
./tests/advanced.t
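To make the idea of per-line coverage data concrete, here is a minimal sketch (not the article's Listing 8) of how hit counts for one file could be paired with that file's source lines. The helper name annotate_coverage() and the sample data are made up for illustration. The sketch assumes PHP 7.1 or later, and it assumes the per-file coverage data maps 1-based line numbers to hit counts, as described in this article.

```php
<?php
/**
 * Hypothetical helper: pair each source line with its hit count.
 *
 * $sourceLines is what file() would return for the script under test;
 * $lineHits maps 1-based line numbers to hit counts (assumed shape).
 */
function annotate_coverage(array $sourceLines, array $lineHits): array
{
    $report = [];
    foreach ($sourceLines as $index => $text) {
        $lineNo = $index + 1; // file() yields a 0-based array; coverage lines are 1-based
        $hits = $lineHits[$lineNo] ?? 0;
        $marker = $hits > 0 ? $hits . 'x' : '-';
        $report[] = sprintf('%3d %4s  %s', $lineNo, $marker, rtrim($text));
    }
    return $report;
}

// Made-up sample standing in for file('tests/bigassfunction.php') and for
// that file's entry in the array returned by xdebug_get_code_coverage().
$sourceLines = ["<?php\n", "function f(\$x) {\n", "    return \$x * 2;\n", "}\n"];
$lineHits = [3 => 2];

echo implode("\n", annotate_coverage($sourceLines, $lineHits)), "\n";
```

Lines without an entry in the coverage data are marked with a dash rather than dropped, which makes untested code easier to spot than a listing of covered lines alone.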
The *.t files are the test files, each of which is responsible for running one test case. You can find basic.t in Listing 9, and the script that this test case uses (tests/bigassfunction.php) can be found in Listing 10. When you open Listing 8 in your browser, it will generate a list (A) of all the test files that are available and display each of them as a button (inlay image in Figure 9). When you click on a button, the system will run the corresponding test file. Below “(B),” you can see that the script uses xdebug_start_code_coverage() to start collecting information on code coverage. After the test (Listing 9) is run, it uses xdebug_get_code_coverage() to retrieve a two-dimensional array in which the first dimension is the filename and the second dimension the line number. Each value is the number of times that specific line was touched during the execution of the test. After the information is extracted from Xdebug, the xdebug_stop_code_coverage() function is called to stop gathering information. In section (C), we single out the file we’re interested in (returned by the test file in the $test variable). Then, at last, in section (D) we read the test file, split it into lines with the file() function, loop through it and display the lines that exist in the code coverage data contained in the $clean variable. We also display the number of “hits” on a specific line. The full result of the display is shown in Figure 9.

Sometimes, the Zend Engine plays a few tricks that make some lines not show up in the code coverage information, thus making Xdebug’s output inaccurate. The reasons why this happens bear some further investigation, and we plan to address them in future versions of Xdebug.

How Many Functions do I Call?

Q: For optimization reasons, I am interested in how many functions a specific part of my script calls. How can I find out?

A: Xdebug has a special function for this, called xdebug_get_function_count().
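As a rough sketch of how the counter can be used to measure one section of a script (this is not the article's Listing 11), the snippet below takes a reading before and after the section and subtracts one extra call for the second reading itself. The helper calls_between() is a hypothetical name. The Xdebug call is guarded with function_exists(), so the script still runs where the extension is not loaded. The sketch assumes PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: number of calls made between two counter readings.
 * The second reading counts its own xdebug_get_function_count() call,
 * so one extra call is subtracted.
 */
function calls_between(int $start, int $end): int
{
    return $end - ($start + 1);
}

if (function_exists('xdebug_get_function_count')) {
    $start = xdebug_get_function_count();
    $sorted = array_reverse(range(1, 10)); // the section being measured
    $made = calls_between($start, xdebug_get_function_count());
    echo "calls in section: $made\n";      // the value depends on the Xdebug and PHP versions
}

// Readings of 1 and 4 (the figures quoted for Listing 12) leave two calls in between.
echo calls_between(1, 4), "\n"; // prints 2
```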
This function always returns the number of functions called since the start of the script, but you can, of course, do a few simple calculations to figure out how many calls a specific section of your script makes. In Listing 11, we calculate how many functions are called in our find_files_recursive() function from Listing 7. Before we start the main find_files_recursive() function, we store the number of function calls up to that point in the $start variable and then, when the function ends,
we calculate the number of function calls by subtracting the value of $start + 1 from the current function call count. We subtract the extra “1” because the call to xdebug_get_function_count() is added to the function count as well. You should also note that include(), require(), include_once(), require_once() and eval() add to the function count as well, even though they are not real functions (they are language constructs). This is why the script in Listing 12 outputs “1,4”: function 1 is the first xdebug_get_function_count(), “function” 2 is eval(), function 3 is abs() and function 4 is the second xdebug_get_function_count() call.

How do I Analyze the Performance of My Script?

Q: My script is slow and I’d like to know why. Xdebug doesn’t have anything to help me here, does it?

A: But of course it does! Xdebug has functionality to analyze the performance of your script; it’s called “profiling.” I have to admit, however, that in the Xdebug 1.x series profiling was in a pretty bad state, as it introduced too much overhead and thus invalidated the profiling results, particularly when dealing with the larger applications that you usually want to profile. Xdebug 2 has a whole new profiling concept in which Xdebug only generates a profiler information file. This is similar to the function tracing feature, except that the format of the file is fully focused on recording profiling data in the most efficient way. The file format of the profiler information file is the same format that cachegrind (a memory management and profiling tool for C applications on Linux) uses, since there is a great tool for visualizing profiler information
files in this format: KCacheGrind. However, this tool requires KDE libraries, which you normally only find on Linux machines. You do not have to run KDE as your desktop environment; just having the libraries installed is enough. Debian users can install KCacheGrind with apt-get install kcachegrind. In the future, I plan to introduce more profiling output formats in Xdebug.

Setting up profiling requires a number of php.ini settings:

xdebug.profiler_enable=1
xdebug.extended_info=0
xdebug.remote_enable=0
xdebug.auto_trace=0
xdebug.profiler_output_dir=/tmp
xdebug.profiler_output_name=crc32
The first one is obviously the one that activates Xdebug’s profiling functionality, but the second one is more mysterious. While you need to have this setting enabled for most other features in Xdebug (such as Code Coverage and Remote Debugging), it is usually a good idea to turn it off while profiling code. When this setting is set to one, it will instruct the PHP parser to generate extra instructions for the PHP executor so that
it is possible to add a hook after each statement. This is needed in order to be able to set breakpoints on arbitrary lines. However, because extra instructions are generated, the code that PHP internally needs to execute will be about 33% larger—and that, of course, makes your script a bit slower. Because mixing profiling together with the other features in Xdebug doesn’t really make much sense—who would want to debug a script while also profiling it?—it is not much of a problem to have to turn off this setting. In the future, Xdebug will most likely do this for you automatically when you enable profiling in your php.ini file. The next two options turn off the generation of trace files and the remote debugger—the goal of this is, again, to make the script run as “naturally” as possible; both options would dramatically slow down execution if they were still enabled. The last options serve the same purpose as when you are generating trace files: the first one selects the directory in which to place the generated profiler information file and the second one selects its filename. crc32 will cause Xdebug to include the CRC32 checksum of the current working directory into the filename, while any other value will cause it to include the PID of the running PHP process into the filename. All files generated by the profiler have a name in
the following format: cachegrind.out.. This is required in order for KCacheGrind to open the profiler information file.

Let’s start with profiling a small example script, which you can find in Listing 13. After I ran the script, a 4kB profiling information file appeared in the /tmp directory, which, when opened in KCacheGrind (and with a bit of tweaking of the layout), caused a display like the one in Figure 10 to appear. Although I won’t cover all the features that KCacheGrind offers (you can find more information on this on the Xdebug documentation page at http://xdebug.org/docs-profiling2.php), I will describe what you see in the screenshot.

On the left side of the screen, you will find a “flat profile” that lists all the functions that were called in the script. The first column shows what percentage of the time was spent in the listed function, including calls to other functions. The second column shows only the time spent in the listed function itself, excluding calls to other functions. The third column shows how often a specific function was called and the fourth column lists the function’s name. Xdebug prefixes the names of internal PHP functions such as array_map() with php:: in order to be able to tell them apart from user-defined functions; included files are listed as functions as well, with names like include::/path/to/included_file.php. The last column (not in the screenshot) displays the filename in which the function was defined.

The right side of the screen is split into two columns, each of which, in turn, is split into two rows. Each column shows information about a specific function, in our case the pseudo-function {main} on the left and the do_convert() function on the right. In the left column, I selected the Call Map tab, which displays a map where a larger area represents more time spent in a function. As you can see here, the execution of the iconv() function took 76.61% of the total execution time of the script. The maps are stacked and, when you move your mouse pointer over an area, you see a simple stack trace describing the percentage of time spent in that function.

The top half of the right column shows the annotated source code of the selected function. For each function call within it, KCacheGrind shows you the number of calls to the function and the total time spent in it while being called from the selected function. In our case, there was just one call to array_map(). The numbers don’t add up to 100% because the calling of the function itself generates some overhead as well, and non-function calls (such as language constructs like if and foreach) use up time, too. It’s obvious that the source code for internal PHP functions cannot be displayed, as there is simply no source code available for them!

On the lower left column, I selected the Calls tab, which shows all the functions that were called from the selected function, {main} in our case. The lower right column shows not only the functions that were called directly, but also those that were called through functions that were called from the selected function. As that sentence is probably very hard to understand (how much wood would a woodchuck chuck…), I’ll give you an example. The distance column describes how far from the selected function the listed function was called. Everything with a distance of 1 was called directly by the selected function. In the screenshot, this only applies to the array_map() function. Because there is only one function with a distance of 1, we are sure that the function with a distance of 2 was called by it. In our case, the convert_word() function was indeed called as a callback through the array_map() function. Similarly, because there is also only one function with a distance of 2, we can conclude that all functions with a distance of 3 were called by it. In our case, this applies to the iconv() function.

With a more advanced script as your guinea pig, it would have been possible for a specific function to have multiple distance values; for example, strlen() could have been called directly by the selected function, and also through a different function first. In that case, KCacheGrind will display something like 1-2(2) as a distance value, where the 1-2 describes the range of distances and the (2) describes the median distance, which is where most of the functions were called from. KCacheGrind has plenty of other features, and the best way to get to know them is just to play around with it.
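The “distance” idea can be illustrated with a small sketch that walks a call graph breadth-first. This is not how KCacheGrind itself is implemented. The call_distances() helper and the hand-written $callGraph array are hypothetical, with the graph mirroring the call chain described for the example script. The sketch reports only the shortest distance for each function, whereas KCacheGrind can show a range. It assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: shortest call distance from $root to every function
 * reachable from it, found with a breadth-first walk of the call graph.
 * $callGraph maps a caller's name to the list of functions it calls.
 */
function call_distances(array $callGraph, string $root): array
{
    $distances = [];
    $seen = [$root => true];
    $queue = [[$root, 0]];

    while ($queue) {
        [$name, $depth] = array_shift($queue);
        foreach ($callGraph[$name] ?? [] as $callee) {
            if (isset($seen[$callee])) {
                continue; // already reached through a shorter (or equal) path
            }
            $seen[$callee] = true;
            $distances[$callee] = $depth + 1;
            $queue[] = [$callee, $depth + 1];
        }
    }
    return $distances;
}

// Made-up graph following the chain described in the text.
$callGraph = [
    '{main}'         => ['php::array_map'],
    'php::array_map' => ['convert_word'],
    'convert_word'   => ['php::iconv'],
];

foreach (call_distances($callGraph, '{main}') as $function => $distance) {
    echo "$distance  $function\n";
}
```

With this graph, array_map() comes out at distance 1, convert_word() at 2 and iconv() at 3, matching the reading of the Calls tab given above.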
How do I Analyze my Script While it is Running?

Q: I have a problem somewhere in my script, and I’d like to know what is going on. Of course, I can add var_dump() calls everywhere, but that is not very effective. Any idea for a better method?

A: Instead of modifying your code, Xdebug offers a non-intrusive way of debugging your application. With the built-in “remote” debugger, you can tell Xdebug to connect to a debug client as soon as a request is made
through the web server. Again, there are a few php.ini settings involved. The best combination is listed below:

xdebug.remote_enable=1
xdebug.remote_handler=dbgp
xdebug.remote_mode=req
xdebug.remote_host=localhost
xdebug.remote_port=9000
xdebug.extended_info=1
All of these are default settings, except for xdebug.remote_enable, which defaults to zero. Xdebug understands multiple protocols for its debugging interface. There is the gdb protocol, which makes it compatible with the popular GNU debugger that most open-source developers have probably used at some point to debug their C applications, as well as dbgp, a brand new language-agnostic debugger protocol that I developed together with Shane Caraveo at ActiveState for adding debugging support to their Komodo IDE. Although the DBGp protocol is a bit harder for a human being to use than the GDB protocol, it is much more powerful and makes more effective debugging possible, and that is why we will be using it in our next example. Normally, you would use a GUI client to manage your debugging activities, but there is only a very simple client bundled with Xdebug. The protocol uses plain ASCII commands to the debugger in the form command -a optiona -b optionb, very much like the arguments you would pass to programs on the command line. The answer returned by Xdebug is always an XML packet.
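The command format is simple enough to sketch in a few lines. The dbgp_command() helper below is hypothetical and only formats the text of a command in the “command -a optiona -b optionb” shape; it does not open a connection or read Xdebug's XML reply. The command and flag names in the usage lines (breakpoint_set, run, -i, -t, -n) reflect my reading of the DBGp specification and should be checked against it. Values containing spaces would need quoting, which this sketch does not handle. It assumes PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: format a debugger command as
 * "command -a optiona -b optionb".
 * $options maps a one-letter flag to its value, in the order given.
 */
function dbgp_command(string $name, array $options = []): string
{
    $parts = [$name];
    foreach ($options as $flag => $value) {
        $parts[] = '-' . $flag;
        $parts[] = (string) $value;
    }
    return implode(' ', $parts);
}

// Assumed examples: set a line breakpoint on line 6, then continue execution.
echo dbgp_command('breakpoint_set', ['i' => 1, 't' => 'line', 'n' => 6]), "\n";
// prints: breakpoint_set -i 1 -t line -n 6
echo dbgp_command('run', ['i' => 2]), "\n";
// prints: run -i 2
```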
First of all, we need to compile the debugclient application (at least version 0.8), which you can find in the debugclient/ subdirectory of the Xdebug source package. There is no binary client for Windows at this moment (unless you compile under cygwin). It is a good idea to install libedit (apt-get install libedit2 libedit-dev for Debian users), as this adds a history buffer to the client. You can issue the following commands to compile debugclient and install it in /usr/local/bin:

./buildconf
./configure --with-libedit
make
make install
Now, whenever you run the debugclient command, it will wait until Xdebug connects to it. To start debugging, all you have to do is point your browser to the script you want to debug and add this query parameter to the URL:

?XDEBUG_SESSION_START=application
For example:

http://ez34/index.php/galleries?XDEBUG_SESSION_START=phpa
In your running debugclient client, you will now see that Xdebug connected to the client and has sent you the packet. The important elements of this packet are the fileurl attribute, which contains the file that was requested, and the idekey attribute, which contains the value of the XDEBUG_SESSION_START request variable. From now on, it is possible to issue the commands that are described in the DBGp protocol specification (http://xdebug.org/docs-dbgp.php). This protocol is so elaborate that it is unwise to go into much detail here. I included a reformatted trace of a simple debugging session in Listing 14 (the source code for the script being debugged is, once again, the code from Listing 13).

After Xdebug sees the special XDEBUG_SESSION_START request variable, it will set a cookie, so further requests do not need changes to the URL query string. Xdebug will continue connecting to debugclient unless you either remove the cookie yourself, or use another special request variable, XDEBUG_SESSION_STOP. It is, of course, also possible to debug command-line PHP scripts. Since it is not possible there to add an extra request variable to the “URL,” you need to tell Xdebug that it needs to contact debugclient in a different way, that is, by setting an environment variable as follows:

export XDEBUG_CONFIG="idekey=phpa"
Xdebug will stop trying to connect to the debug client once you remove the environment variable. Since parsing XML by hand to debug an application is not very efficient, it is much better to use a graphical client to help you with your debugging. The XML protocol specification is pretty straightforward and the debug protocol is already implemented in Komodo 3
and in Maguma Workbench 2.1, which will be released soon. Unfortunately, those two products are commercial (though they might have a trial version) and the only free debug client for Xdebug that I know of, Weaverslave (http://weaverslave.ws), only implements the older GDB protocol. In case you want to help out writing a cool interface (especially one running on Linux) for the DBGp protocol, feel free to drop me a mail; I will gladly help you if you have any questions about the protocol specification.

Where Can I Set All The Different Options?

Q: I read about the xdebug.* php.ini settings here, but where exactly can I set those?

A: They will all work from within the php.ini file itself, and all options except xdebug.default_enable and xdebug.extended_info can be set in an .htaccess file. Also, all options except these two, the xdebug.profiler* settings and xdebug.remote_enable can be set from within your script itself. However, you should keep in mind that some of these might not have the desired effect when set from within your script with ini_set(), as most of Xdebug’s features are already activated before your script starts. I usually use an .htaccess file
for configuring Xdebug on a per-project basis.

Anything Left?

Q: This all sounded very interesting, are there any famous last words left?

A: No famous last words, but if you still have questions I would like to point you to the Xdebug website (http://xdebug.org), which provides extensive documentation about Xdebug’s features. If you still have problems, feel free to write to the
[email protected] mailing list.
About the Author
Derick Rethans provides solutions for Internet related problems. He has contributed in a number of ways to the PHP project, including the mcrypt extension, bug fixes, additions and leading the QA team. He now works as a developer for eZ systems A.S.. In his spare time he likes to work on SRM: Script Running Machine and Xdebug, watch movies and travel. You can reach him at
[email protected] To Discuss this article: http://forums.phparch.com/170
Open != $Free
F E A T U R E
by Mark Evans
Why do users of open-source software sometimes get disillusioned with the concept of open-source? How can users and developers work better together? In this article, I will address the issue of supporting open-source applications from both a user’s and developer’s perspective and will provide suggested guidelines on how the users and developers can help each other to make each project a success.
As an open-source developer, I often see requests from users for fixes to bugs or for new features to be added to any given project. These requests can go unanswered for a few days, and sometimes even a few weeks or months. This leaves some users frustrated and angry. Often, the user is contacted by a developer not connected to the project who offers to help fix the bug or add the feature that they require, but that comes with a price tag. This immediately brings the follow-up questions of… “Hang on... The software is free, right? So why do we have to pay for something which everyone else will be able to get for free?” Or “If we pay for a new feature to be added, can we keep that feature to ourselves since we paid for it to be implemented?”

For the users to understand these issues, they need to understand what the term “free software” means, the motivation of the open-source developer and the spirit that powers the open-source community. The best analogy of what the term “Free Software” means comes from the Free Software Foundation’s founder Richard Stallman, who describes it as a matter of liberty, not price. To understand the concept, you should think of “free” as in “free speech,” not as in “free beer.”

The essence of open-source development is not about what the developers and users can do for themselves, but what they can do to benefit the community as a whole. If all users that have paid for a new feature or a bug fix to be completed had then kept these to themselves, the project would not have evolved into what it is today. In fact, it might even cease being actively developed due to lack of support for (and
from) the community. The evolutionary process of an open-source project is only guaranteed as long as someone adds new features to it (and corrects existing deficiencies), whether someone is paying for the work or not and, therefore, it is in everyone’s interest that all changes be made available to the community as a whole.

Commercialism & Open-source

There is a popular misconception that open-source is the antithesis of commercialism and that companies are not allowed to make money from an open-source project. If this were true, Red Hat Corporation or SuSE would not be in existence today. For companies that wish to sell or use open-source software to help promote their business, it is essential to provide active and public support for the developers and the community. Companies that don’t do this can be categorized as freeloaders, and will often find that their competitors’ products are recommended over their own when discussed in open-source arenas. There are a number of ways through which companies can support open-source projects; some of these include financial donations, donation of merchandise,
REQUIREMENTS PHP: N/A OS: N/A Other Software: N/A Code Directory: N/A
promotion of the project in trade publications or at trade shows and the sponsorship of full-time developers or the provision of equipment or software to help the project be as efficient as possible.

The User’s Needs

Users need software that is easy to install, easy to use, bug free and secure and has all of the features that “they” want. In reality, this is not possible, whether the software is open or closed source, since the needs of each user are likely to be quite different from those of the next. Thus, a happy middle ground has to be found between the needs of the user base and the aims of the project. Most open-source projects plan for a lean code base that satisfies the needs of as many users as possible; they then try to make it easy for the users themselves to extend the features of the software with a minimum of programming effort, using either a plug-in module architecture or a programming interface that is easy to understand and build upon.

Users also crave constant updates with new features and firm deadlines as to when these updates will be released. Many times, I have seen comments on mailing lists or in support forums saying something to the effect of “Thanks for version x… now when is version x.1 due to be released?” With development driven by the motivation of the developers and the support of a community of users that is constantly evolving, it is impossible for a project to predict release dates with any accuracy. This generally isn’t a problem when a project first starts out, as the software is simple and the user base is small. As the software becomes more successful and the user base and number of features increase, the time needed for development increases accordingly. This causes a conflict between users, who want the new release “yesterday,” and the project managers, who want the release to be as stable and secure as possible.
It is not unusual for projects to have a one- or two-year development cycle between stable releases; this is a constant source of frustration for users, as many don’t realize the effort needed to keep a project active in terms of both development and management.

When a user finds that a feature they need is missing, the first place they normally turn to is the project’s mailing list or support forums, where they ask why the feature isn’t available. A common response from other users is that everyone has access to the source code and can write whatever additional functionality is required themselves, and then contribute it back to the community. In reality, most users don’t have the time or the ability to write code and their reply often boils down to “If I were a programmer I would, but I am not.” This causes the user to be reliant on the core developers taking the time to add the new functionality or on external developers. If the core developers are unable to help the user with the missing feature due to lack of time, or if they feel that the feature does not fit into the aims of the
project, then the users have no choice but to rely on an external developer to create the code that they need. This raises a number of issues, the first being that the new feature is unlikely to make it into the core application in the same timeframe as if it were developed by a core developer, if it makes it at all. The user would also have to rely on the coding ability of the external developer and trust them to follow the programming standards set out by the project leaders, as they are unlikely to integrate a feature that would require hours of programming effort just to bring it up to their coding conventions. If the feature isn’t integrated into the core code base, the user then needs to think about how it will be maintained. It is likely that, as the project develops, changes will be needed to external contributions to ensure compatibility with future releases. Whatever path they choose to develop it, it is important that the users contribute the new feature back to the community; this will benefit the users themselves, as it will allow other developers to help maintain the contribution when changes to the core application occur. It also encourages new users to contribute features back to the community, and this can help build a great community spirit.

The Developer’s Needs

A number of today’s open-source developers started out as users searching for software to help them with a specific task or problem; they either found no existing projects, projects that don’t work or a project that is poorly managed and maintained. After getting frustrated at the lack of responses to bug reports and feature requests, they begin on the road to open-source development by either starting a new project, joining an existing development team or forking the initial project if they don’t agree with its direction. Some developers believe that programming is not something that can be easily taught and relate programming ability to either an art or a science.
Others believe that developing software is a gift, and that this gift only has a value when it is shared with others. This value is increased when the software is widely used, and only when the users can see the creative effort that has been put into the project rather than just the results output to a screen. Developers crave the satisfaction of writing that perfect function or streamlining that troublesome piece of code. Few could easily describe the joy and achievement felt after they complete a new feature or find a solution to a bug that has been the source of trouble for hours or even days. In fact, some developers define their machismo or intellectual abilities by the speed and efficiency of the code they create (whether they are willing to admit it or not).

One of the hardest issues for any open-source developer to overcome is the discipline of maintaining the code once the initial fun of creating it has evaporated. Since it’s a challenge they have already mastered, they have the urge to move on to the next fun thing. To make an open-source project successful, however, bugs must be fixed, code improved and maintained and the user base actively supported. Without this, the project is very likely to disappear without leaving a trace.

There have been many studies that show open-source programmers often have loyalty to a project that goes beyond purely financial compensation. If you ask open-source programmers why they write free software, many are at a loss for words. Some of the leading figures in the open-source arena have their own way of describing why developers do it: Richard Stallman says that “People will program because it’s fun” and Eric Raymond calls it “scratching an itch.” However, one must wonder whether developers still think it’s “fun,” and still want to scratch that itch, once emails start rolling in from users who demand that the software be fixed or a new feature added “now” but don’t want to pay anything for it.

Developers can also get frustrated when working on something that they consider to be fun. When incomplete or inaccurate bug reports are submitted that give little or no information as to how the bug can be recreated or as to the environment the user is using, or when the developer asks the user for more information to help track the bug only to get little or no response, things start going south pretty quickly. How questions are asked in forums and on support mailing lists is also a source of frustration for developers. Messages to mailing lists or posts in the support forums with titles like “Please Help” or “Desperately need help NOW” almost beg to be ignored, at least as far as the developers are concerned. This, in turn, frustrates the user as they get little or no help.
Your Project Needs You

Many users of open-source software are unsure of how they can help support a project and many don’t understand that contributing and supporting a project is not purely about financial rewards. So, what can users do to make the project more productive, more successful and of greater benefit to the community as a whole?

Financial: There are a number of reasons why a user of open-source software should feel a moral obligation to contribute to a project’s progress or support. The developers are contributing to the users’ business or interests, and it is both fair and in the long-term interest of the users to provide them with support, whether it’s financial, in the donation of code improvements, or just in helping out with bug reports or answering support questions on mailing lists. However, this obligation should not be turned into a requirement, as this would destroy
the basis for the moral obligation and would quickly cause resentment within the community. The project should receive contributions because it deserves them based on the usefulness of the software, development of new features and user assistance, and not because it demands them.

Support: Support questions can distract the developers from the job of actually developing code, especially when they are not asked in the most efficient way and require a lot of work to come to a positive resolution. As a user gets more familiar with a particular project, he will have the ability to answer many of the questions that are asked by first-timers. If possible, users should always try to answer support questions on the mailing list or forums. This not only helps support the project, but also earns them “good karma” points from other users in the community. Users can also learn to ask support questions in a better way. For example, a set of guidelines can be put into place to help users write “good” questions the first time around:

• Use meaningful and specific subject headers.
• Make it easy to reply: don’t ask the developers to send the reply to a different email account, as this increases the time needed to answer a question and reduces the likelihood that you will get a response.
• Send questions to mailing lists using plain text, as many developers do not accept HTML emails.
• Be precise and provide as much information about your problem as you can.
• Don’t use words like “Urgent” or “Response Needed ASAP,” even if your problem is urgent, as this can be considered rude.
• Be courteous. It never hurts to say “please” or “thank you.”
• Follow up with a solution if one is found, to help others that may have the same problem and will stumble upon your conversation thread on the mailing lists.

Bug Reports: The main aim of a bug report is to enable a developer to reproduce the problem so that the bug can be tracked down and squashed in the smallest amount of time possible.
Bug reports that are vague “it doesn’t work” laments are unlikely to be looked into by a developer in a timely manner. If, as a user, you have the ability to write code, then try to find fixes for bugs that are open and have not been assigned to a developer. If you find a fix, you can then pass it on to the project developers for inclusion into the software. This is a great way to learn how the software works and will also earn you lots of goodwill from others in the community and from the developers themselves. Here are some tips on how to report bugs effectively:

• Describe the symptoms of your bug carefully and clearly.
• Describe the environment in which it occurs. Provide as much detail on the operating system and related system information as possible.
• Describe any research you did to try to fix the bug before you posted the report.
• Don’t provide information if you are unsure whether it’s correct; many developers spend hours looking for a bug in a particular version of PHP only to find that the user reported the wrong version when the bug report was submitted.

Documentation: No one likes to write documentation, least of all the developers themselves. It is well known that many open-source projects tend to have a major problem with providing any kind of decent documentation. The most common response to this complaint is “if you need documentation then write it!” Without any “primer” to start with, however, the learning curve is sometimes too steep for new users. If you have the ability to write documentation, don’t hesitate to offer your services. After all, some documentation is better than no documentation, and your efforts may prompt others to help create a more complete document base starting from your work.

Your Users Need You

It isn’t all a one-way street: there are a number of things the projects can do to help reduce the frustration of users and encourage them to provide both financial and non-financial contributions.
Aims & Goals: Projects need to make sure that their aims and goals are clearly set out from the start, so that users know exactly what their direction is going to be. It is important not to deviate from these goals without explaining to the users the reason for the deviation.

Progress Reports: Frequent progress reports are needed so that people can see how actively the project is being developed. These reports need to be accurate and honest and should not promise deadlines or release dates that are not realistic. In the eyes of many users it is better to not provide a release date at all than to promise a release and then miss it. Outstanding tasks that need to be completed before the next release becomes available also need to be clearly visible to the users, so that they can gauge for themselves how much work is left to be done.

Releases: Many users get frustrated when the time span between releases is large. I think that it is generally a better idea for open-source projects to plan for frequent releases with fewer new features than to aim for a large number of new features that require years to develop. Releases should also be made in a controlled fashion, and not rushed out based purely on user demand. This ensures that only stable releases are supplied to users, which, in turn, will promote an influx of contributions from the community.

Merchandise: Something many users are happy to purchase is official merchandise related to a project. Some good contenders in this area are versions of the software on an official CD or in a boxed set, T-shirts and printed manuals. This is a great way for a project to find the financial support it needs to continue development.

Conclusion

The penetration of open-source software into all aspects of our day-to-day lives can only benefit us all. However, today’s society still encourages the accumulation of wealth and, while this is the case, the progress of open source will be held back unless we take into consideration the fact that developers need to pay their rent, too, while at the same time recognizing the essential freedom that open source represents. Organizations such as the Free Software Foundation are helping promote the benefits of open source to companies and governments. It’s now the turn of the users of open-source software to spread the word.
About the Author
Mark Evans is an IT consultant. He is also a member of the osCommerce Development Team and an active promoter of the benefits of open-source software for both businesses and consumers. He currently resides in Spain and can be contacted via his website, http://www.openphpsolutions.com, or by email: [email protected]

To Discuss this article: http://forums.phparch.com/173
File Fixing Robots
F E A T U R E
by Ron Goff
Have you ever suffered a defacement of your website, or had a fellow employee delete an important file from your server by mistake, without knowing about it until you got a phone call telling you to fix the server now? Then you may benefit from one of these robots.
It’s 6:00 AM, my phone is ringing, and I am coming out of a great dream to hear on the other end of the line that “our website has been defaced, the home page is pointing to some weird site. Get in here now, or our clients are going to be calling and complaining if we don’t change it.” That’s a sadly common scenario for anyone responsible for maintaining a web server. It is very discouraging when a defacement or deletion of files takes place, especially when it could easily have been taken care of by employing a fixer robot.

Now let’s look at the same situation with a fixer robot in place and watching your home page. The hacker gains access to your server and replaces the index file, then leaves thinking they have just pulled off one of the greatest accomplishments of their life. A few minutes later, the fixer bot starts doing its rounds and checks the integrity of the files on its list. Upon running across the newly replaced file, it immediately determines that the file’s thumbprint doesn’t match the original print stored in a database and promptly replaces the hacker-posted page. With the robot working, you will not get any angry phone calls in the wee hours of the morning and, when you check your email later in the day, you will find a warning from your bot letting you know that it replaced a file, so that you can promptly fix the security hole. I don’t know about you, but that sounds much better than the first scenario to me.

A fixer robot is not a replacement for, or an enhancement to, the security of your server. Rather, it acts as a band-aid or buffer until you can administer a fix, and it is also a safety net against the accidental deletion of important files. Its job is, essentially, to keep an eye on things, making sure that everything is working as it should. If something should go wrong, it will let you know and will fix the problem for you. Think of it as someone who is watching your files 24/7 and knows what to do if something happens, like an auto mechanic who does nothing but follow you around checking your car’s fluids all day. Now, wouldn’t that be great! I know I could have used one of these bots many times with my own server troubles. This type of bot could also be easily modified for other jobs, such as intrusion detection or across-the-network backups, but we will discuss that a little later on.

REQUIREMENTS
PHP: 4.1.2+
OS: Linux Redhat 7.2
Other: Crontab, lynx
Code Directory: robot

How It Works

I know the idea of a robot lurking around inside your server may sound strange and dangerous, but here is a definition of a robot: a program that automatically performs “some action” without user intervention. This is exactly what we want: a program that constantly checks important files for us automatically, without us having to worry about it. Your time could be better spent writing new code instead of keeping an eye on your old files. The robot is a combination of PHP, cron and a MySQL database. You can use any other database, or even work from a text file (it is really up to you) but, in our example, we will use MySQL, since it’s the one most developers will be acquainted with.

The basic logic here is that you will list all the files that you want to be watched in a database. The program will initially record a value using md5_file() to get an MD5 hash, basically a thumbprint, of the entire contents of a file, rather than just the filename and last-modification date. It will also copy that file to a secret location specified by you, and append a secret extension to the MD5 hash to come up with a completely new filename that will look like the following:

/secret_directory/69e4a39b4b83c2c3bcd10ad730b4ea1b.sec
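The first-run step described above can be sketched roughly as follows. This is only a minimal sketch, not the article's own listing: the function names `secret_backup_path()` and `backup_watched_file()` are hypothetical, and it assumes the secret extension is stored without a leading dot.

```php
<?php
// Hypothetical helper: builds <secret_dir>/<md5 hash>.<secret_ext>.
// Assumes $secretExt is stored without a leading dot.
function secret_backup_path($secretDir, $md5, $secretExt)
{
    return rtrim($secretDir, '/') . '/' . $md5 . '.' . $secretExt;
}

// Hypothetical helper: fingerprints a watched file and copies it to the
// secret location. Returns the MD5 hash (the value to store in the
// database), or false if the file cannot be read or copied.
function backup_watched_file($sourceFile, $secretDir, $secretExt)
{
    if (!is_readable($sourceFile)) {
        return false;
    }
    $md5 = md5_file($sourceFile);
    $backup = secret_backup_path($secretDir, $md5, $secretExt);
    return copy($sourceFile, $backup) ? $md5 : false;
}
```

Because the backup's name is derived from the hash and the secret extension, nothing in it gives away the original filename.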
The bot will use the “secret” file to replace a file that suddenly changes without reason. Naturally, the bot will only record this information and create a backup of the original file once. If you do need to replace a file yourself because, for example, you have created a new version or fixed a bug in one of your scripts, you will have to clear the stored MD5 value in the database, so that the next time the robot runs it will not replace your new file but will instead create a new thumbprint and a new “secret” backup copy.

Note: The md5_file function works in much the same way as md5() does. md5() takes the content of a string and returns a 32-character value made up of hexadecimal digits. The difference is that md5_file() creates its hash, or thumbprint, from the contents of a file instead of from the contents of a string variable. If you md5_file() the same file at a later time, you will receive the same value if the file has not been changed or modified. If it has been changed even slightly, the MD5 value will not match the previously stored value. It is next to impossible to get the same MD5 value after you change a file or string.
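To make the note concrete, here is a small illustration of that behaviour. The temporary file and its contents are made up for the example.

```php
<?php
// Write a small file, fingerprint it, then change a single character.
$file = tempnam(sys_get_temp_dir(), 'md5demo');   // throwaway file for the demo
file_put_contents($file, "<h1>Welcome</h1>\n");

$before = md5_file($file);

// md5_file() hashes the file's contents, so it matches md5() of the same bytes.
$same_as_string = ($before === md5("<h1>Welcome</h1>\n"));

// Hashing the unchanged file again gives the same value...
$unchanged = ($before === md5_file($file));

// ...while a one-character edit gives a different one.
file_put_contents($file, "<h1>Welcome!</h1>\n");
$after = md5_file($file);
$changed = ($before !== $after);

printf("before: %s\nafter:  %s\n", $before, $after);
unlink($file);
```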
Every hour, the robot will check the files using md5_file() for integrity and will replace every file whose new hash does not match the original MD5 value stored in the database. Naturally, you can change the interval between runs of the bot, just in case you’re worried about the extra load that running it will place on your servers. Most of a web application’s files (the actual PHP files) rarely change; the information that does change
is usually stored safely in a database. For example, forums, banner ad administration and content management systems (CMS) programs all use a database or text file to store their ever-changing content. This makes fixer bots viable. A fixer bot would not be a good idea for a static file that is always updated by hand or a log file being constantly updated by the web server; in fact, it would be more of a hassle than a help. On the other hand, index pages or configuration files that don’t change could be good candidates for the watchful eye of the fixer robot.

Another bot-friendly scenario is working in a multiuser environment. It’s not uncommon for an important file to be deleted because another user does not see a reason for its existence. This could really damage your applications and upset your clients, which is something you definitely want to avoid. This has happened to me a few times, and I had no idea of the problem until I went to use a program and my code said it couldn’t find a file. Very frustrating.

Normally, you will want your bot to only check for files in specific directories. There are also some ways to write a robot that could check files across your entire server (for example, to protect yourself from rootkits) but there are some permissions issues related to that, so we will review this type of option a little later.

Database Setup

There are only two tables in the database for this program. One is named config_fix and the other rep_wat. config_fix will contain the email and secret information, while rep_wat will contain all the information for your files. I would definitely suggest using phpMyAdmin, a great open-source project for administering a MySQL database, since the bot has no administrative interface. In the config_fix table, which you can create with the SQL code shown in Listing 1, you will only want to create one record with these values:

• Set the email field to the email address you want the messages sent to, e.g.: [email protected].
• secret_dir is the path to the directory you have created on the server to store the “secret” versions of your files, like /usr/local/secret_dir/.
Listing 1

CREATE TABLE `config_fix` (
  `email` text NOT NULL,
  `secret_dir` text NOT NULL,
  `secret_ext` text NOT NULL
) TYPE=MyISAM;
• secret_ext is the secret extension you will use on the backed-up file, something like SGT, for example. This will be used for every file that is to be backed up. The program replaces both the filename and the extension, so that a malicious user cannot simply search for an identical filename or extension.

The structure of the rep_wat table is shown in Listing 2. You will only need to set a few of the fields in the rep_wat table to get the bot working:

• file_path is the path to the file that the bot is supposed to watch, something like /home/httpd/vhosts/yoursite.com/httpdocs. Note that you can watch any file on the server if the user account under which PHP runs has sufficient permissions to access it.
• main_file is the file that the robot will actually watch. This field would contain something like index.php.

Now that you have all that set, let’s look at the code and what it will be doing.

On to the Code

The source code for our bot, which actually fits in a single file called fixerbot.php, is shown in Listing 3. The script starts by retrieving all the “secret” information and the email address that error messages should be sent to, for later use in the program. I use the connect.php include for my MySQL connection details. I didn’t include the listing for that here, since connecting to a database is trivial and you will have to add your own functionality anyway.

Note that on lines 13-21, we check to see if the version of PHP under which we are running provides the md5_file function natively. Older versions of the interpreter will not include it (the function was introduced with PHP 4.2.0). Therefore, we simply define our own version of it which, although it won’t be as efficient as the built-in version, which is written entirely in C, will at least provide the functionality we need.
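A compatibility check of this kind might look something like the sketch below. It is not the code from Listing 3: the name `fallback_md5_file()` is hypothetical and is kept separate from `md5_file()` so that the fallback can be tried out on any PHP version. It reads the file with file(), which is the approach the article mentions for older interpreters.

```php
<?php
// Hypothetical fallback: reads the whole file with file() and hashes the
// joined lines. file() keeps the line endings, so joining the lines
// reproduces the original contents.
function fallback_md5_file($path)
{
    $lines = @file($path);
    if ($lines === false) {
        return false;   // file missing or unreadable
    }
    return md5(implode('', $lines));
}

// Only define md5_file() when the interpreter does not already provide it
// (that is, on PHP versions older than 4.2.0).
if (!function_exists('md5_file')) {
    function md5_file($path)
    {
        return fallback_md5_file($path);
    }
}
```

Since the whole file is held in memory while it is hashed, this fallback is best suited to small files.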
For newer versions of PHP in which the function is available natively, we do not do anything and use the built-in functionality instead. Line 23 prints out a welcome banner. Once the bot runs under cron, this information will be stored in the cron log that is sent over to you (assuming, of course, that you decide to let it send information over via email).

Listing 2

CREATE TABLE `rep_wat` (
  `file_path` text NOT NULL,
  `md5_value` text NOT NULL,
  `time` timestamp(14) NOT NULL,
  `id` int(11) NOT NULL auto_increment,
  `main_file` text NOT NULL,
  KEY `id` (`id`)
) TYPE=MyISAM AUTO_INCREMENT=1 ;
Next, on lines 25-46 we proceed to create backups and MD5 checksums for all those files that have not yet been processed. It’s important to do so at this stage, so that later on, when we perform the actual checks on file integrity, we do not have to make a distinction between those files that have been backed up and those that haven’t. As you can see, files that are not backed up are those that do not yet have an MD5 hash. You could perform additional checks if you wanted to, but that’s not really necessary if you have a reasonable amount of control over the machine (and you need that in order to use cron anyway).

To create the backup, we start by looping through all the results from the database query. On lines 30-34, we build a complete path to the source file in $file_to_MD5, then pass it to the md5_file function. Next, we create the “secret” filename and store it in $secret_file_copy. We print out the name of the files we are trying to copy and then attempt to perform the copy itself, printing out either a success or failure message.

Lines 48 through 73 contain a similar loop, except this time we are checking files for which we already have a secret backup and an MD5 hash. These are extracted from the database using a query that is, essentially, the exact opposite of the previous one. With this information firmly in hand, it’s time to make the rounds and compare each file to the MD5 value that has been stored for it. We start by determining a file’s original filename and its current MD5 hash. If the hash corresponds to the one we have cached in the database, we simply print out a notice and move on to the next file. If it doesn’t, then it follows that the file has been modified and we need to overwrite it with the “secret” copy. To do so, we use the copy function, making sure that
we can actually copy the file and, if that’s not the case, print out an error message. If the copy is successful, we also send an e-mail message out to the address that we stored in the $email_user variable at the beginning of the script.

Setting Up Crontab

In order for the bot to run unattended, you will need to set up cron to execute it at specific intervals. Crontab is a program you can use to schedule the execution of a program at certain times. It is specific to Unix-like operating systems, but on a Windows machine you’d be able to use the system’s own task scheduler instead. In this example, we will set things up so that the bot script will be accessed through a web server. This is not necessary if you have the CLI version of PHP available which, in fact, would give you a higher degree of flexibility when it comes to user impersonation. However, the CLI version of PHP is not installed on all servers, and the chances that you will have a web server available are much higher.

There are, clearly, a number of precautions that you need to take. You don’t want anybody from the outside to be able to access your bot, as doing so would give them precious information about your filesystem. Thus, the first step will be to protect the bot script so that it can only be accessed locally. You can do so using any of the Apache configuration facilities, such as httpd.conf or an .htaccess file.

In order for cron to be able to run the bot, you will also need to have lynx, the text-based web browser, installed on your server (wget will do just as well if you have it). Lynx can be used both on Linux and Windows and is a handy little text-only browser. I would run the robot every hour, but you could run it every minute, every day or just every Friday; whenever you want, really. In my case, the cron entry that runs it looks like this:

05 * * * * lynx -dump http://yourwebsite.com/robots/fixer_bot.php
Where:

• 05 is the minute I would like it run
• the first * is the hour position; an asterisk means “every hour”
• the second * is the day of the month; an asterisk here means “every day”
• the third * is the month (set to “every month”)
• the fourth * is the day of the week (again, set to “every day”)
Listing 3
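As a rough illustration of the integrity check that the code walkthrough describes, here is a minimal sketch. It is not the author's Listing 3: the function names are hypothetical, the rows are passed in as a plain array shaped like the rep_wat table rather than read from MySQL, and the backup naming (`<secret_dir>/<md5>.<ext>`, with the extension stored without a dot) is an assumption.

```php
<?php
// Hypothetical helper, not the article's code. Compares a watched file with
// its stored hash and restores it from the secret copy when they differ.
// Returns 'ok', 'restored' or 'failed'.
function check_watched_file($sourceFile, $storedMd5, $secretDir, $secretExt)
{
    // A missing file counts as modified, which also covers accidental deletions.
    if (is_readable($sourceFile) && md5_file($sourceFile) === $storedMd5) {
        return 'ok';
    }
    // Assumed naming scheme: <secret_dir>/<md5>.<ext>
    $backup = rtrim($secretDir, '/') . '/' . $storedMd5 . '.' . $secretExt;
    if (is_readable($backup) && copy($backup, $sourceFile)) {
        return 'restored';   // a real bot would send its warning e-mail here
    }
    return 'failed';
}

// Runs the check over rows shaped like the rep_wat table and returns a
// report keyed by the full path of each watched file.
function run_checks($rows, $secretDir, $secretExt)
{
    $report = array();
    foreach ($rows as $row) {
        $source = rtrim($row['file_path'], '/') . '/' . $row['main_file'];
        $report[$source] = check_watched_file(
            $source, $row['md5_value'], $secretDir, $secretExt
        );
    }
    return $report;
}
```

Leaving out the copy() call and only reporting the mismatch would give the watch-only behaviour that the intrusion-detection variation discussed later in the article relies on.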
This will make cron execute the bot at five minutes past every hour: for example, at 12:05, 1:05, 2:05 and so on. Once you enter this command in your cron (you can
use crontab -e at the shell to edit your crontab file), your robot is powered up. It will start checking the files that you listed in the database and will continue to check them indefinitely (or, at least, while the cron daemon is running). You can now rest a little easier because your robot has taken on all the hard work of watching your precious files!

Let’s Review

Once you list all your files in the database, you can just relax and let the robot do all the work for you. It will locate the files, get an MD5 hash for each one of them, copy and rename them to a secret location and keep a watchful eye on them for changes, sending you a warning if something does change. Easy, eh? Now, if I could only make a robot that takes out my trash…

With a slight rewrite of the robot, you could transform the fixer bot into a fixer server that logs in from server to server using FTP, checking on files, recording thumbprints and backing files up remotely. You could then keep an eye on multiple servers from a central location and keep important files for your applications and websites up and running. This would be a great thing to have in a busy web server network.

Another variation that comes to mind is an intrusion-detection bot, or early warning system, which I mentioned before. An intrusion detection system watches key files (such as operating system files) for changes. If a change takes place, it will alert you to check the server and verify who changed the file in question. This can be very helpful in stopping malicious activity on your server before it is too late. Being able to react in a timely fashion when a hacker has penetrated your server is a great advantage, and it certainly beats waiting for something to go wrong before fixing it. Here is a simple way to use the code in Listing 3, with a simple modification, as an intrusion detection robot.
Instead of copying the files and getting the md5_file hash, you could just record the hash values of all important system files without copying them to the secret location. If one of those files on the database list is unexpectedly modified, the bot will tell you there might be an intruder and then you can take the appropriate action. You will definitely want to keep in mind that keeping tabs on constantly-changing files (like logfiles) is not a good idea, as the bot will try to warn you continuously until you update the database, thus raising a lot of false alarms and becoming ineffective (ever hear of the boy who cried wolf?). Keep in mind that something like this intrusion detector, or even a cross-site file fixer, will require you to
change some security settings in your php.ini file (specifically, open_basedir), so be sure to review the security issues involved. The open_basedir setting protects the server’s file system from unauthorized access in a multiuser environment by allowing each web host user access to only their own folders. It might not be a great idea to change it on a web-hosting machine serving multiple clients, since this would open up access to all system files. Some hosting setups, on the other hand, do allow open_basedir to be set on a per-domain basis, usually through a virtual host configuration file. This would allow just one domain access to all the files on the server. Naturally, you will need full control of a server to be able to build this type of robot.

One other benefit if you do decide to change the value of open_basedir is that you will be able to place the robot and the backup files anywhere you like on the server. This, in turn, will protect the file fixer from any unwanted deletions or changes by hackers or users who only have access to the web services. Also, if you can change the open_basedir option in the php.ini file or through a virtual host configuration file, you will be able to have the fixer robot check any file on the server, rather than just web files.

If you run multiple domains, you could also offer this type of protection to your users as an add-on service. You could even give your users access to an administration module for the robot, so that they can select some files to protect as well. There are many directions in which you could take the robot; it is really up to you.

Another idea to think about (without getting too paranoid about it) is a fixer bot for the fixer bot. Say a hacker or fellow employee does find the fixer bot (program) and deletes it. Well, we could set up another
bot that keeps its eye on the original file fixer and will fix it if it is modified: something like a backup of a backup, to be used only in extreme situations.

One final note concerns something I came across when executing md5_file() on an older version of PHP (4.1 and lower). Since 4.1 doesn’t support md5_file(), I had to use my own function and, with large files, the file() function may not work too well, particularly considering the amount of memory that you will be consuming. A possible solution (assuming you can’t or don’t want to upgrade your version of PHP) is to use fopen() to open the file and then examine it one small chunk at a time, carrying your hash along at every iteration and making it part of the string that is md5()’ed. This way, your results won’t lose accuracy and you won’t need to use excessive amounts of memory.

Conclusion

Now, I am sure you can see some situations in which the file fixer robot could be stopped or disabled by a hacker or fellow employee, if the person happened to know that a file fixing robot was even being used. However, as a day-to-day, entry-level security measure it works quite nicely and may save you a few headaches, while giving you a little time to apply an actual fix to a security issue. Just remember to review and research any security risk you might run into if you want to perform cross-file fixing or intrusion detection.

I like the idea of robot programs and I try to implement them whenever I am assigned repetitive tasks. It frees up my time, and bots never sleep: they constantly do their job without a single complaint. You can save yourself a lot of time by reviewing your mundane server tasks and developing your own robots to take some of the load off you and free up your time for more important work. Best of all, well-written bots will only cost you the initial time required to set them up. Hopefully, this idea will help with the annoyances of running a server and keeping programs working.
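The chunk-at-a-time idea from the final note above could be sketched as follows. This is an assumption-laden illustration rather than the author's function: `chained_md5_file()` and its default chunk size are hypothetical, and the value it produces is not the one md5_file() would return for the same file.

```php
<?php
// Hypothetical helper: fingerprints a file in fixed-size chunks so the whole
// file never has to sit in memory. The running hash is prepended to each new
// chunk before hashing again.
// Note: the result is NOT the value md5_file() returns; it is only comparable
// with fingerprints produced by this same function and chunk size.
function chained_md5_file($path, $chunkSize = 8192)
{
    $fp = @fopen($path, 'rb');
    if ($fp === false) {
        return false;   // file missing or unreadable
    }
    $hash = '';
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false) {
            fclose($fp);
            return false;
        }
        $hash = md5($hash . $chunk);
    }
    fclose($fp);
    return $hash;
}
```

Thumbprints stored with this function have to be checked with it as well, using the same chunk size. On PHP 5.1.2 and later, the hash extension's hash_init(), hash_update() and hash_final() functions can compute a regular MD5 incrementally instead.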
About the Author
Ron is the technical director/senior programmer for Conveyor Group, a Southern California-based web development firm. His responsibilities include technology development, programming, IT and network management, strategic research, server systems management (webmaster), and leading website projects.
To Discuss this article: http://forums.phparch.com/171
IonCube Encoder Windows GUI Product Review
P R O D U C T   R E V I E W
by Peter B. MacIntyre
The product I’ll review for you this month is a new graphical user interface (GUI) that manages the workings of the IonCube Encoder software suite. It took me a little while to understand the benefits of encrypting my already safe (or so I thought) PHP code. Now, it is true that once PHP code is interpreted on a server, only its output (typically plain HTML) reaches the browser. But what happens when you have PHP code (be it a script or an entire application suite) that you want to turn around and sell to multiple customers, or that you want to make available on a limited basis (for example, on a yearly basis) and on which you want to be able to retain full copyright? Those are situations in which you will want to encrypt or encode your source code to protect the intellectual property and expertise that is contained within your product. In this review I will be covering two aspects of the product: the GUI itself and what features it possesses, and the distribution of the encoded files and what options are available in that vein.

First Things First

What we need to look at here first is the Windows GUI for this product. After a straightforward and painless installation, I opened the product up. Figure 1 shows the initial appearance of this tool. At first glance there is not much to look at here; however, you must keep in mind the overall point of this tool, which is to assist you in encrypting what you have already created in PHP in the simplest way possible. This tool also has a “wizard” feature that you
can access to help you with determining the location of the source files and the placement of the results of the encryption process. The wizard is launched by clicking on the second toolbar item. The steps within the wizard are simple: it will ask you where your source files are located, what they are called (and whether there are any that you want excluded, copied, and so on) and where you would like the resulting encrypted files to be stored. This is a very basic approach to the overall process of encryption, so the wizard, in my opinion, is a great way to start. Once you start using this GUI tool for serious encoding, however, the wizard will no longer be necessary. Once you have completed the wizard, you will want to encode the files that you designated for that purpose. Clicking on the green arrow on the toolbar (third from the end) will start
the encoding process; the results for the example files that I was playing with are shown in Figure 2. This shows the results of the encoding and what files were affected, as well as any error messages that may have occurred. The code that is generated by the encryption process is also interesting to see. Take a look at Figure 3, which shows the PHP code before and after encoding.

Now, what makes this tool so much more valuable is the set of options that become available once the novice user gets past the basics of the wizard. These are located under the Project Settings window. Figures 4 and 5 show just two of the most important ones. Figure 4 shows the options for what is possible on the target side of the encryption process. Decisions like sending all the encrypted entries to a zip file, then sending that zip file to an FTP site, and bundling the loader (more on the loader later in this review) in with the encoded package are all accessible from here. These options are neat and powerful at the same time, and will be great time savers if you want to use them.

The main feature in Figure 5 is the setting of environment variables. You can use the ones that are supplied here, or you can add your own variables to this list. You could add a copyright line, for example. Each of these environment settings will then be used in the headers of the encrypted files, so that if the files are ever opened, the first thing the viewer will see is this system environment information. If you do have settings created under this tab, their respective values will be requested once you OK the settings window as a whole.

Another very important option that is available in this settings window is under the Restrictions tab. Here, you can set when the encrypted file expires. You can set it to last for a certain number of days or you can set it to expire on a specified date. This is all very handy if you are sending out a demo of your software, for example. All these settings can then be saved as a project so that, once you have everything set up the way you want, you can save the many options once and then regenerate the files at a later time if so desired.

A neat feature of this tool is that, if you are looking at the project file in Windows Explorer, you can start the encryption process without having to open up the GUI application. Figure 6 shows you that once you locate the
Figure 4
September 2004
IonCube Encoder
PHP Architect
●
www.phparch.com
38
PRODUCT REVIEW project file, you simple open the pop-up menu with the right-mouse key and select the “Encode Project” option. I mentioned that there is another side to this tool that should be discussed as well. That would be what is required on the receiving side of the equation. What does the customer need in order to actually use your encoded files or application? The answer is that you need the “Loader,” a small application that can be acquired from the IonCube web site. Naturally, PHP supports several different platforms, and so does the Loader: there are versions for Windows, Linux, Solaris— to name a few. Once you download the Loader from the ionCube website, there are specific setup instructions for each operating environment. Just look into the ZIP or tar file that you download and the readme file will tell you all you need to know. In Windows, for example, it is just a matter of taking a DLL and modifying the php.ini file with the [zend_extension_ts] directive. I think that this Encoder has great value in the business world. The tool itself is well written and has many different options. It is easy to learn and generates encrypted files at break-neck speeds. The on-line tutorials are also quite valuable and helpful.
I would recommend this product to any commercial PHP developer who wants to distribute their applications in a safe and, more importantly, secure way.
Figure 6
About the Author
Peter MacIntyre lives and works in Prince Edward Island, Canada. He has been an editor with php|architect since September 2003. Peter's web site is at: paladin-bs.com
The Ultimate PHP 5 Shopping Cart Part 2: Taking the Customer’s Money
FEATURE
by Eric David Wiener

Imagine a user-friendly shopping cart that allows the customer to make instantaneous order adjustments without waiting for the page to refresh, queries UPS for all the available shipping rates and charges the customer’s credit card upon checkout. That same cart also handles SSL security automatically, keeps working regardless of browser and is XHTML 1.1, Section 508 and W3C WAI compliant.
Last month, we created the first two elements of our ultimate shopping cart: the product adjuster and the cart-contents displayer. The product adjuster adds and removes items from the cart and adjusts the item quantities. The cart-contents displayer returns a list of the products in the cart along with characteristics such as name, description and price. In addition, we created the shipping class, which retrieves the exact shipping rates from UPS, and the constantData class, which contains two associative arrays populated with constant data. The first array maps the US state abbreviations to their proper names and the second maps month numbers to month names. This month we will complete the cart by creating the checkout element, which gathers customer data such as name, address and credit card number, then charges the credit card and stores the order in the database.

Credit Card Processing

There are many ways to accept credit cards online, such as PayPal, third-party billing companies and getting your own merchant account and gateway. For an established business, the ideal option is to use your own merchant account and gateway. Some of the benefits of this method are significantly lower fees, total control over the transactions, the quickest transfer of funds and the ability to completely integrate the customer interface into your own web site. The cons, on the other hand, are relatively insignificant to an established business entity but may deter the casual home-based business. A merchant account will
usually require being linked to a business checking account and, in order to open one, you will need to own a corporation or other legal business entity. In addition, some merchant account providers will charge start-up fees and statement fees, and have a minimum monthly discount rate that they collect. The discount rate, also known as the buy rate, is the percentage that the merchant account provider takes from each purchase. Despite the choice of wording, you are not getting any sort of discount; in actuality, the merchant account provider profits at the expense of the merchant. The statement fee is essentially just a monthly fee: a statement will typically be generated monthly, even if you have no transactions. On occasion, you will also find a merchant account charging a transaction fee, which is a flat rate charged each time an authorization request is submitted or a refund is granted. Keep in mind that transaction fees apply to charges, credits and even unsuccessful attempts to charge a card that does not validate, perhaps due to insufficient funds or because it was reported lost; regardless of the reason, the merchant gets charged the transaction fee. If you’re wondering why the merchant bank makes you pay a transaction fee even if a credit card operation is declined, their explanation is likely to be that “those are the ones you want to know about anyway.” Make no mistake about it: the entire credit card industry is skewed against the merchant and, if you’re not careful, the risks of running an online business can be extremely significant.

REQUIREMENTS
PHP: 5.0
OS: Any
Applications: MySQL & OpenSSL required
Code Directory: cart
The Merchant Account

Since online credit card processing does not require any card-swiping equipment like a traditional card-present business does, the merchant account is really just a number that allows the credit card gateway to transfer funds to your bank account from the customer’s credit cards (and vice versa). Think of the merchant account number as an abstraction layer to your business checking account number. Most merchant accounts will automatically grant the ability to process Visa and MasterCard but, in order to process Discover, American Express and others, you will need to send in an application to each card company directly. This is as easy as filling out a form and any good merchant account provider will help you do this. From a programming perspective, the company that provides the merchant account is irrelevant, as long as they have a payment network that is served by the gateway.
The Credit Card Gateway

The credit card gateway is basically a computerized replacement for the card-swiping equipment. The gateway is an online service that receives a request to do a credit card transaction, forwards the request to the credit card clearinghouse, returns a response code and, if the transaction is successful, instructs the financial institutions to transfer funds from the credit card account of the customer to the checking account of the merchant. Note that “online” does not necessarily mean that the gateway is Internet-based (although, these days, most are). The card-swiping terminal that sits at your corner grocery store connects to a gateway using a modem through a phone line, for example. And, in the old days, Internet-based stores used to do the same!

These days, AuthorizeNet is the most popular gateway by a landslide and, therefore, we will use it in this article. Technically, you could use any gateway that you like, although you will need to adjust the connection string along with the request and response data as needed.

Charging a credit card is usually a three-step process entailing authorization, capture and settlement. Authorization is the act of checking the validity of the card and verifying that the account has sufficient funds to cover the transaction. A capture is the act of designating the funds for transfer from the customer to the merchant. Settlement sends all the transactions that were captured to the financial institutions for fulfillment. Settlement is usually done only once a day in a batch process containing all the captured transactions, and it is completely automated in most credit card gateway systems. Depending on the time of day set to batch out all of the captured transactions for settlement, the transfer of funds may be delayed by up to 24 hours, but the delay also gives the merchant up to 24 hours to void a transaction before any funds are transferred. After a transaction gets settled, you can still give a refund for up to 120 days, but you can no longer void the transaction. A typical transaction for a purchase will include both authorization and capture at the same time.
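The void and refund rules just described can be summarized in a few lines of PHP. This is only a sketch for illustration: the state names and the reversal_option() function are hypothetical and are not part of any gateway's interface, and the 120-day limit is simply the figure quoted above.

```php
<?php
// Hypothetical transaction states, named for illustration only; they are not
// taken from any gateway API.
define('TXN_AUTHORIZED', 'authorized'); // card checked, funds verified
define('TXN_CAPTURED',   'captured');   // funds designated for transfer
define('TXN_SETTLED',    'settled');    // batch sent to the financial institutions

// Returns how a charge can still be reversed, following the rules in the
// text: a void is possible until settlement, a refund for 120 days after it.
function reversal_option($state, $daysSinceSettlement = 0)
{
    if ($state === TXN_AUTHORIZED || $state === TXN_CAPTURED) {
        return 'void';
    }
    if ($state === TXN_SETTLED && $daysSinceSettlement <= 120) {
        return 'refund';
    }
    return 'none';
}

echo reversal_option(TXN_CAPTURED), "\n";      // not settled yet
echo reversal_option(TXN_SETTLED, 30), "\n";   // settled 30 days ago
echo reversal_option(TXN_SETTLED, 200), "\n";  // settled 200 days ago
```

A real integration would take the transaction state from the gateway's own reports rather than tracking it locally.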
Listing 1
Fun with Math

Since a transaction fee is charged every time the gateway receives an authorization request, regardless of whether the card clears, has insufficient funds or is of a card type that your merchant account is not set up to accept, it is important to verify the validity of the card number before attempting to authorize a charge. At the very least, you’ll weed out people who mistype their card number or intentionally type in incorrect information, without having to spend the money and time waiting for the gateway to do it for you.

Before we begin examining the verification algorithm, however, it is helpful to understand the rationale behind the digits appearing on a credit card. Your credit card number is not a string of arbitrary random digits, but rather a compilation of three parts: the credit card identifier, the account number and a check digit. The first few digits identify the type of card: for instance, Discover cards all begin with 6011, Visa cards begin with the number four and a Diners Club card will have its first three digits fall in the range between 300 and 305. The digits that follow the identifier, up to the second-to-last digit, make up the account number. The last digit is called the check digit and is generated by the Mod10 algorithm, also known as the “Luhn” algorithm.
The Mod10 algorithm can be calculated with rudimentary math skills. Starting from the right-most digit on the credit card, add up every other digit going from right to left. If your card number were 6011012345678907, you would add up 0+1+1+3+5+7+9+7, which equals 33. Next, each of the remaining digits is multiplied by two and, if the resulting value is a two-digit number, those two digits are then added together. For example, 2 is doubled into 4, while 7 is doubled into 14 and then the 1 and 4 are added together making 5, so essentially 7 doubles to become 5. The remaining digits in our example, 6 1 0 2 4 6 8 0, become 3 2 0 4 8 3 7 0. Now we add up those numbers, 3+2+0+4+8+3+7+0, which gives 27, and add that to our original sum of 33 for a total of 60. For the card to be valid, the resulting number must be a multiple of ten; in our example, the value of 60 shows that the card is valid. With this number mapping algorithm, we start off
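The Mod10 calculation walked through above can be sketched in PHP as follows. This is a minimal illustration, not the mod10 method of the article's cc class: the function name mod10_valid() is hypothetical, and stripping non-digit characters from the input is an assumption of this sketch.

```php
<?php
// Mod10 (Luhn) check as described in the text. The function name is
// illustrative; the article's own cc class is not reproduced here.
function mod10_valid($number)
{
    // Keep digits only, so input typed with spaces or dashes also works.
    $digits = preg_replace('/\D/', '', $number);
    if ($digits === '') {
        return false;
    }

    $sum = 0;
    $length = strlen($digits);
    for ($i = 0; $i < $length; $i++) {
        // Walk from the right-most digit to the left.
        $digit = (int) $digits[$length - 1 - $i];
        if ($i % 2 === 1) {
            // Every second digit is doubled; a two-digit result has its
            // digits added together, which is the same as subtracting 9.
            $digit *= 2;
            if ($digit > 9) {
                $digit -= 9;
            }
        }
        $sum += $digit;
    }

    // The card number passes when the total is a multiple of ten.
    return $sum % 10 === 0;
}

var_dump(mod10_valid('6011012345678907')); // the worked example: total is 60
var_dump(mod10_valid('6011012345678908')); // same number with the last digit changed
```

Passing this check only shows that the number is well formed; whether the account exists and has funds is still decided by the gateway.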
Listing 2
has 10 digits and the billing ZIP code has 5. The reason that we only check the billing ZIP code and not the shipping ZIP code is that, if the shipping ZIP code were invalid, UPS would not accept it and an error would be thrown stating that there is no service available for it. We then use a regular expression to check for a valid email address on line 51. Following that, we validate the credit card number with the mod10 method of the cc class that we created earlier. We also verify on line 55 that the expiration date has not yet passed by concatenating the current year and month into a single four-digit number, doing the same to the submitted year and month and checking which value is greater.

Later in the code, we will want to escape any characters in the user data that cannot be directly entered into a SQL string. We will be using array_walk() to pass each element in the $_POST array through the mysql_escape_string() PHP function. In order to do that, we need a version of mysql_escape_string() that works by reference, so we make a function called mysql_escape_string_by_reference() that does exactly that. We will also need to trim the white space off the data, so we create a trim_by_reference() function as well. We immediately pass the entire $_POST array through trim_by_reference() via array_walk(). This gets rid of the leading and trailing white space on all the submitted fields in one easy step.

If the magic_quotes_gpc setting in the php.ini file is set to On, the values in the $_POST array will have addslashes()-style escaping applied to them at each phase of the shopping cart, leading to some very messy values. To counteract this, we create a stripslashes_by_reference() function and then on line 71 we check whether magic_quotes_gpc is enabled with the get_magic_quotes_gpc() PHP function; if it is, we pass the $_POST array through stripslashes_by_reference() via array_walk().
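The by-reference callbacks and the expiration check described above might look roughly like this. It is a sketch under stated assumptions, not the article's listing: card_expired() and its parameter names are hypothetical, two-digit years are assumed because the text speaks of a four-digit number, a card is assumed to remain valid during its expiry month, and a plain array stands in for $_POST so that the code runs on its own.

```php
<?php
// Callback for array_walk(): the value is taken by reference so that the
// trimmed string replaces the original array element.
function trim_by_reference(&$value, $key)
{
    $value = trim($value);
}

// Expiration check as described in the text: year and month are joined into
// one number (YYMM) and compared with the same number for the current date.
// Two-digit years and the parameter names are assumptions of this sketch.
function card_expired($expYear, $expMonth, $nowYear, $nowMonth)
{
    $expires = (int) sprintf('%02d%02d', $expYear, $expMonth);
    $current = (int) sprintf('%02d%02d', $nowYear, $nowMonth);
    return $expires < $current;
}

// Stand-in for $_POST so the sketch can run on its own.
$form = array('name' => '  John Foo ', 'billingzip' => ' 10001');
array_walk($form, 'trim_by_reference');

var_dump($form['name']);
var_dump(card_expired(4, 8, 4, 9)); // expired 08/04, checked in 09/04
var_dump(card_expired(4, 9, 4, 9)); // expiry month is the current month
```

The same pattern applies to the escaping and slash-stripping helpers: each wraps one string function and assigns the result back to the referenced value. The sketch assumes every submitted field is a plain string.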
We will need to retrieve the shipping rates from UPS regardless of which phase of the checkout process the customer is in, as long as the shipping ZIP code has been set, so we create a new instance of the shipping class on line 74 and set the basic properties that are defined in the global.php file: xmlaccesskey, userid, password and originzip. The weight variable comes from the inclusion of showcart.php, which calculates the total weight for the order. We set the shippingzip property to the value in the shippingzip session and then we attempt to get the rates via the getRates method. If we get a return value of false (line 81), we add the UPS error message string stored in the errormessage property to our error array.

If the user has already chosen a shipping rate, signifying that they are at least in the second step of checkout, the shipping method will be set in the ServiceCode variable of the $_POST array. If we know which shipping method has been chosen, we can calculate the tax and total for the order. First we create local variables for the service code and shipping cost (lines 85 and 86), which we will need later. Then, we add the shipping rate to the subtotal and save the result to a variable suitably named total. On line 88, we check whether the billing state is the same as the state that the merchant operates out of; if it is, we must charge sales tax, so we divide the tax rate by 100 to convert the percentage to a fraction, multiply by the total, round the result to two decimal places and save it to the taxcost variable. If it is an out-of-state sale, taxcost is set to zero. Finally, taxcost is added to the total.

Taking Action

The action variable is set to default to showform on line 92; this variable will be used shortly to determine the course of action to take. Before we are sure of what to
Listing 3
function datacopy() {
  if (document.getElementById('samedata').checked) {
    var inputtags = document.body.getElementsByTagName('input');
    var inputtagslength = inputtags.length;
    var billingobject;
    for(var x=0; x
Eric David Wiener applies his web solutions expertise at IPRO where he is the Webmaster. IPRO is New York State's healthcare quality improvement organization and the leading QIO in the US. Eric's consulting experience focuses on business management and technology, providing clients with interactive solutions in the areas of e-commerce, e-learning, and accessibility. You may see his work at www.dynamicink.com and contact him at
[email protected]
To Discuss this article: http://forums.phparch.com/174
Cheap Manpower Calling External Programs from PHP Scripts
FEATURE
by Michal Wojciechowski
The ability to launch external programs from PHP scripts enables the developer to extend the functionality of an application beyond the capabilities of PHP functions and extensions—all this with little effort and in a short time.
One of the biggest advantages of PHP is its flexibility, achieved through the wide range of built-in functions, extensions, and PEAR packages available to the developer. The functionality provided by these mechanisms makes PHP suitable for many different needs. Most common tasks can be accomplished with basic functions and extensions, while other applications may require additional packages. However, from time to time all developers find themselves faced with a problem for which existing and readily available solutions are not sufficient. In some cases, necessity becomes the mother of invention and the developer comes up with a new solution (which he will hopefully share with the community). Sometimes, however, there’s no time (or money) for invention, for example when a project you’re working on is due the next day and a major feature is still missing. You need a working solution, and you need it immediately.

In at least some cases like this, existing software may come to the rescue. Many tasks that are still hard to accomplish directly from within PHP are easily taken care of by applications that have been around for years. Piggybacking on these programs is usually much simpler than attempting to implement the same functionality in PHP (which, in many cases, may seem like reinventing the wheel). PHP provides several core functions for executing programs on the server’s system. The developer’s task is reduced to using these functions to create a kind of interface to call the desired application and collect the results, which is pretty much what Unix applications have been
doing for years. This can usually be achieved with a few lines of code, while a solution implemented directly in PHP may require hundreds.

REQUIREMENTS
PHP: 4.3.x
OS: Unix, Linux
Other: txt2html, wvHtml, unrtf
Code Directory: manpower

Program Execution Functions

Program execution functions are part of the PHP core, so no extensions are required to use them (which is great if you’re running on a shared host). Be aware, however, that safe mode severely limits which programs you can execute, as we’ll see later on. The functions can be divided into two groups. The first contains functions that execute a program and return (or display) its output and exit code:

• exec() - executes a command and returns the last line of its output
• passthru() - displays the raw output of the command
• shell_exec() - returns the output as a string (equivalent to the backtick operator)
• system() - displays the output, does a flush after each line

These functions can be described as “fire and forget”: you execute a program and wait until it finishes its operation, not worrying much about what the program does in the meantime. There’s no interaction between your PHP code and the running program while it is being executed. When the function returns, the program has already terminated. The following example illustrates a typical usage of one of these functions:

  system("uname", $ret);
  if ($ret != 0) {
    print "uname failed\n";
  }
The system() function executes the uname system command, which, in a Unix-like environment, displays the server’s operating system type (e.g., Linux or FreeBSD). The exit code is stored in the $ret variable; a value other than zero here indicates an error.

There is also a group of program execution functions that operate in a slightly different way: they make it possible for a script to “spawn” a process and communicate with it:

• popen() - starts a program and opens its input or output as a pipe
• pclose() - closes a pipe opened with popen()
• proc_open() - starts a program and opens file pointers for its input and output
• proc_close() - closes a process started with proc_open() and returns its exit code
• proc_terminate() - kills a process started with proc_open() (introduced in PHP 5)

The popen() and proc_open() functions start a new process that runs in parallel with the PHP code. You can interact with the running program by writing to its input and reading from its output. The ability to communicate with a running program makes these functions especially useful for working with interactive applications that expect user input.

Listing 1 shows an example of PHP code that launches gnuplot, a function plotting program, and uses it to create a graph of the sine function in PNG format. In the example, we need bi-directional communication with gnuplot (we’ll be writing commands to its input and reading results from its output), so we use the proc_open() function; popen() only allows for either reading from or writing to the process, but not both. The function sets up the $pipes array for input and output. Commands can then be sent to $pipes[0] (which points to gnuplot’s standard input) with the fwrite() function. In turn, gnuplot writes the image data back to its standard output, which we read using fread() and pass to the client’s browser with the appropriate HTTP header (Content-Type: image/png).

Listing 1

This example does not really show off the unique functionality provided by external programs, since it’s not particularly hard to create a graph in PHP and using gnuplot for this task doesn’t give any bonus; the point was only to show interaction between PHP and a running program. We will now present a more interesting and practical example of using external programs to make a hard task easy.

Story of an On-line Story Archive

Imagine the following scenario: our mission is to build an on-line story archive, where amateur writers can publish their stories, making them available for the visitors’ reading pleasure. The stories will be uploaded using a standard HTML form-based file upload mechanism. To make the archive more user-friendly, we have decided to support a number of file formats for the submitted files (TXT, DOC and RTF), so that the author isn’t required to convert his story to a specific format before sending it in.
On the other hand, we would also like to make the archive accessible to any visitor, regardless of whether he or she can open a document saved in a specific format. Therefore, we need to convert each file to a format readable by every browser, which is, of course, HTML. While converting a text file to HTML is a trivial task (simply placing the file contents inside ... would do), converting DOCs and RTFs isn’t. So far, there are no extensions or packages capable of performing such a conversion, and doing it from scratch would be a complex and time-consuming task.

DOC, RTF, and several other file formats widely used by Microsoft Windows users have been a thorn in the side of the Unix user community for years. Until recently, few or no applications were able to handle these formats in a usable way. However, as a half solution, programmers have created many conversion programs capable of turning DOC/RTF documents into TXT, POD, HTML, TeX and other “Unix-friendly” formats. Examples of such utilities include antiword, catdoc, word2x and unrtf, just to name a few. Thanks to the existence of program execution functions and conversion utilities, our task can be greatly simplified to just launching the appropriate conversion program.

First, we need to choose the software to use. As we have already said, we want to support the TXT, DOC and RTF formats (we could extend this list to other document types but, for the sake of simplicity, we’ll stick with these three). Each conversion program must be capable of producing HTML output, as that’s how we want the stories to be kept in the archive. A quick search, followed by several tests, enabled me to come up with the following selections:

• txt2html - the name says it all: it is a text-to-HTML converter
• wvHtml - one of the scripts of the wv utilities suite; it converts DOCs to HTML
• unrtf - an RTF file converter, capable of producing HTML output

To keep our work simple, we will not implement any user account management: no registration or login will be required to submit a story. We’ll keep the stories in a basic MySQL database; the database creation script is shown in Listing 2. We’ll use a single table, which will hold information about the story’s author, its title, and the HTML file containing the story itself.

What we need now is a form that enables the user to upload his story. We’ll provide a very simple form with two text input boxes and a file selection element, laid out nicely in a table as shown in Listing 3. Two important parts of the form are the enctype attribute and the MAX_FILE_SIZE hidden input element. The former determines the mechanism used to encode the form’s contents; we use multipart/form-data, as that is the appropriate enctype for file upload.
The latter sets the maximum size of uploaded file that will be accepted by the PHP script. We set it to 128 kilobytes (131072 bytes). The finished form is shown in Figure 1.

Listing 2
CREATE DATABASE stories;
USE stories;
GRANT ALL ON stories.* TO 'stories'@'localhost' IDENTIFIED BY 'rocket43';
CREATE TABLE stories (
  author TEXT NOT NULL,
  title TEXT NOT NULL,
  file TEXT NOT NULL
);

If the uploaded file is larger than the maximum size we specified, an error occurs. Therefore, the first thing we do before processing the file is check for errors:

  if ($_FILES['file']['error'] == 1 || $_FILES['file']['error'] == 2) {
    $error = "File too big";
  }
An error value of 1 means that the file size exceeds the upload_max_filesize directive specified in php.ini, while a value of 2 indicates that the size exceeds our MAX_FILE_SIZE set in the form. The $error variable will be used in our script to track and report errors.

We will now build the target HTML filename, based on the author and title values provided by the user. To avoid spaces in the filename, we replace them with underscores. The resulting filename will have the form author_title.html; for example, if the author introduced himself as John Foo, and his story was titled “Story of the Quick Brown Fox,” then the file will be named John_Foo_Story_of_the_Quick_Brown_Fox.html. Since the filename is based on user input and will be used in a shell command, we need to make sure it will be safe. For that purpose, we use escapeshellarg() to escape any “dangerous” characters:

  $author = str_replace(' ', '_', $_POST['author']);
  $title = str_replace(' ', '_', $_POST['title']);
  $target = escapeshellarg($author.'_'.$title.'.html');
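As a more conservative variant of the step above, the filename could also be restricted to a whitelist of characters before it is used. The sketch below is an assumption added for illustration and is not the article's code: story_filename() is a hypothetical helper, and the choice to allow only letters, digits, underscores and hyphens is arbitrary.

```php
<?php
// Stricter variant of the filename step: an assumption of this sketch, not
// the article's code. Anything other than letters, digits, '_' and '-' is
// dropped, so '/', '..' and shell metacharacters never reach the filename.
function story_filename($author, $title)
{
    $name = str_replace(' ', '_', $author . '_' . $title);
    $name = preg_replace('/[^A-Za-z0-9_-]/', '', $name);
    return $name . '.html';
}

echo story_filename('John Foo', 'Story of the Quick Brown Fox'), "\n";
echo story_filename('../../etc', 'passwd; rm -rf /'), "\n";
```

The result can still be passed through escapeshellarg() as shown above. A whitelist like this also discards accented characters, and two different inputs can map to the same name, so a real archive would need to check for empty or duplicate filenames.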
We now need to determine the format of the uploaded file. There are several methods for doing this, such as using a “magic” file recognition utility (for example, file) or checking the MIME type. The former is most accurate, but requires some additional work, while the latter only requires that the appropriate

Listing 3
This code uses the cpp2html utility to display the specified C source file ($file) as HTML. The author assumes that all source files are stored in the sources directory. The files can be accessed simply using a URL like source.php?file=helloworld.c.

Besides being very useful, however, the script is also very dangerous. It doesn’t take a genius to discover that this script can be used to read virtually any file on the server’s filesystem (actually, any file that the HTTP server has read access to). A malicious user could type a URL like source.php?file=../../../../etc/passwd and see the contents of the /etc/passwd file. The script assumes that the accessed file is located in the sources directory, but does nothing to ensure that it really is.

Scary? Wait, there’s more to come. The Unix shell has the ability to execute a sequence of commands separated by semicolons. Since program execution functions make use of the shell, this method works for them as well. An attacker could supply a URL of the form source.php?file=;/sbin/ifconfig, which expands to the command line /usr/bin/cpp2html -d < sources/;/sbin/ifconfig. The shell interprets it as two subsequent commands, and first executes cpp2html, then ifconfig. The attacker gets the results of both commands; in our case, he manages to view the configuration of system network interfaces returned by ifconfig.

The ability to execute arbitrary commands is the first step for an intruder to successfully compromise your system. For a skilled attacker, gaining more privileges could be just a matter of time.

The problem here is caused by the fact that certain symbols, like .. or ;, have special meaning to the shell and should be considered dangerous when executing commands from PHP code. Allowing them in parameters provided by users is asking for trouble. A rule of thumb is to avoid relying on user-supplied values used as command line arguments or path elements. If you really have to, be doubly sure that there are no dangerous characters in the command line or that they have been properly escaped; the escapeshellcmd() and escapeshellarg() functions will help you with that.
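One way to apply this rule of thumb to the source viewer is to validate the file parameter before it gets anywhere near a command line. The following is a sketch under assumptions rather than a fix of the original script: safe_source_argument() is a hypothetical helper, the whitelist pattern (plain .c and .h names) is an arbitrary choice, and the sources directory name is taken from the text.

```php
<?php
// Validates a user-supplied file name and returns it as a quoted shell
// argument inside the sources directory, or null when it is not acceptable.
// The function name and the whitelist pattern are assumptions of this sketch.
function safe_source_argument($file)
{
    // basename() drops any directory part, so ../../etc/passwd becomes passwd.
    $name = basename($file);

    // Reject anything that carried a path, and accept only plain C source
    // and header names such as helloworld.c.
    if ($name !== $file || !preg_match('/^[A-Za-z0-9_-]+\.[ch]$/', $name)) {
        return null;
    }

    // escapeshellarg() quotes the path so the shell sees a single argument.
    return escapeshellarg('sources/' . $name);
}

var_dump(safe_source_argument('helloworld.c'));
var_dump(safe_source_argument('../../../../etc/passwd'));
var_dump(safe_source_argument(';/sbin/ifconfig'));
```

Only when the function returns a non-null value would the script go on to append it to the converter's command line; both attack URLs from the text are rejected before any command is built.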
“External programs inherit their execution environment from the script that called them...”

Safe Mode

For a higher level of security, you might want to enable PHP safe mode on your web server. Safe mode places several restrictions on program execution functions. First of all, when it is enabled, the functions can only execute programs located in the directory specified in the safe_mode_exec_dir configuration directive. Also, the shell_exec() function is disabled (as well as its equivalent backtick operator). Other commands are restricted to have only one argument passed to the executed program. If more than one word follows the program name in the command line, they are treated as one quoted argument, e.g., cat foo; cat bar becomes cat 'foo; cat bar'.

There are also restrictions with respect to environment variables. The script is allowed to set only those variables whose names begin with prefixes specified in the safe_mode_allowed_env_vars configuration directive. In addition, safe_mode_protected_env_vars contains a list of variables that cannot be changed at all.

Chroot Environment

If you are even more concerned about the security of your web server when calling external programs, you might consider running it in a chroot environment. Chroot creates a separate directory structure for an application, preventing it from accessing the root filesystem. Even if an attacker succeeded in compromising the web server, the impact of his actions would be limited to the chroot environment.
Setting up a chroot environment requires some effort from the system administrator. The environment needs to be properly configured to run the web server software as well as the additional programs. The administrator must ensure that the programs can be successfully launched, are able to access the required libraries, can create temporary files (if necessary), and so forth.

External Programs Security

As a final note on security, be aware that, in addition to the security of the web server and the PHP code, you need to take care of the security of the external programs as well. A vulnerability in an application executed from PHP code might be even more dangerous to the server’s security than an insecure script. This puts a higher level of responsibility on both the developer and the server administrator. Always make sure that you are running the latest (secure) versions of the programs called from your PHP scripts.

Summary

Piggybacking on existing programs is a very effective method of implementing functionality that is hard to accomplish the “traditional” way by using standard PHP functions and extensions. It makes it possible to develop a working solution in a short time and with little effort. One of the greatest advantages of using existing software is its maturity: programs that have been around for years are less likely to cause any trouble than freshly developed solutions. This can save hours (or even days) of testing to ensure that some component of your application actually works the way you want.

However, it’s important to keep in mind that this technique should be used only when it is really necessary. Do not use an external program when there is a function or extension that can do the job as well. Using external applications usually reduces the portability of your code (especially if you use system-specific programs) and requires more attention to system security.
About the Author
Michal Wojciechowski is a student at the Warsaw University of Technology in Poland and a freelance PHP developer. You can contact him at
[email protected].
To Discuss this article: http://forums.phparch.com/172
Security Corner
Secure Design
by Chris Shiflett

Welcome to another edition of Security Corner. This month’s topic is secure design, the application architecture that provides the foundation for secure development. May’s column on data filtering touched on this topic a bit, and it’s something that is sure to appear in this column again.

Design has always been a controversial topic, but only because developers tend to be very loyal to their own discoveries, ideas, and approaches. Thus, discussing software design can spawn debates rivaled only by coding standards discussions, text editor opinions, and programming language choices.

With this in mind, please feel free to contact me to suggest different approaches than the ones written here. Like any other developer, I’m always interested in learning new techniques, and I’d be happy to do a few case studies of any particularly sound designs.

In order to demonstrate some common approaches, I describe two different overall methods of organizing your applications: the dispatch method and the include method.
Dispatch Method
A good software design should help developers ensure that data filtering cannot be bypassed, guarantee that tainted data cannot be mistaken for clean data, and identify the origin of data. Without these characteristics, a developer’s task is more difficult and prone to errors. Just as complexity in an application leads to more bugs, the lack of a sound design leads to more security vulnerabilities.

One popular design that embodies each of these characteristics is the dispatch method. The approach is to have a single PHP script available directly from the Web (via URL). Everything else is a module included with include or require as needed. This method usually requires that a GET variable be passed along with every URL, identifying the task. This GET variable can be considered the replacement for the script name that would be used in a more simplistic design.
For example: http://example.org/dispatch.php?task=print_form
The file dispatch.php is the only file within document root. This allows a developer to do two important things:

1. Implement some global security measures at the top of dispatch.php and be assured that these measures cannot be bypassed, and
2. Easily see that data filtering takes place when necessary by focusing on the control flow of a specific task.

I have developed applications using this approach with great success. As a developer, I especially appreciate its simplicity. Over-architected applications tend to solve problems that don’t exist, and the added complexity is rarely worth it.

To further illustrate this approach, consider the example dispatch.php script given in Listing 1. Keeping in mind that dispatch.php is the only resource available from the Web, it should be clear that the design of this application ensures that any global security measures taken at the top cannot be bypassed. It also lets a developer easily see the control flow for a specific task. For example, instead of glancing through a lot of code, it is easy to see that end.inc is only displayed to a user when $form_valid is true, and because it is initialized as false just before process.inc is included, it is clear that the logic within process.inc must set it to true; otherwise the form is displayed again (presumably with appropriate error messages). It is also impossible to access end.inc otherwise, because it is not available from the Web (it is not within the document root).

[Listing 1: dispatch.php]

[Listing 2: security.inc]

In order to keep dispatch.php as simple as possible, I recommend only adding logic that is important to the control flow of the application and putting everything else in modules. It can also help your module organization to classify the latter in three categories: display, logic, and database queries. These two things make dispatch.php very easy to follow.

Note: If you use a directory index file such as index.php (instead of dispatch.php), you can use URLs such as http://example.org/?task=print_form. You can also use the Apache ForceType directive or mod_rewrite to accommodate URLs such as http://example.org/app-name/print-form.
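The control flow described for dispatch.php can be sketched roughly as follows. This is a minimal illustration and not the column's Listing 1: the dispatch() function, the $process callback (which stands in for including process.inc), and the index.inc fallback are assumptions made so that the example runs on its own. Closures require PHP 5.3 or later.

```php
<?php
// Sketch only: returns the name of the module that would be included.
// $process stands in for process.inc and reports whether filtering passed.
function dispatch(array $get, callable $process)
{
    // Global security measures would go here; no request can skip them,
    // because this script is the only entry point.

    $task = isset($get['task']) ? $get['task'] : '';

    switch ($task) {
        case 'print_form':
            return 'form.inc';

        case 'process_form':
            $form_valid = false;          // pessimistic default
            if ($process() === true) {    // only the filtering logic can flip it
                $form_valid = true;
            }
            return $form_valid ? 'end.inc' : 'form.inc';

        default:
            return 'index.inc';           // unknown tasks get a harmless page
    }
}

echo dispatch(array('task' => 'print_form'), function () { return false; }), "\n";
// prints: form.inc
```

In a real application the returned name would be passed to include, with the modules stored outside document root. Returning a name instead of including a file is only a convenience that keeps the sketch self-contained.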
Include Method
An almost opposite approach is to have a single module that is included at the top of every public script (those within the document root). This module is responsible for all global security measures, such as ensuring that data filtering cannot be bypassed. Listing 2 gives a simplistic example of such a script, security.inc.

This example demonstrates the handling of form submissions, although this is only one of the types of tasks that can be performed here. The process.inc script is where the data filtering takes place, and security.inc makes sure that it always executes when a form meets the minimum criteria for testing (which is that it contains only expected data). This is done by adding a hidden form variable to every form that identifies it (or using any approach that can be used to distinguish forms) and then comparing form fields with what is expected.

Listing 3 shows an example of a form that identifies itself as login and adheres to the checks from the example security.inc script shown in Listing 2.

Note: Use the auto_prepend_file directive to ensure that security.inc is not accidentally left out.
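The field comparison described for security.inc can be sketched along these lines. This is only an illustration of the idea and not the column's Listing 2: the function name form_is_expected(), the hidden field name form, and the list of login fields are assumptions.

```php
<?php
// Sketch only: decides whether a submission contains exactly the fields
// that the form it claims to be is allowed to send.
function form_is_expected(array $post)
{
    // Every known form identifies itself through a hidden "form" field.
    $expected = array(
        'login' => array('form', 'username', 'password'),
    );

    if (!isset($post['form'])
        || !is_string($post['form'])
        || !isset($expected[$post['form']])) {
        return false;   // missing or unknown form identifier
    }

    $allowed = $expected[$post['form']];
    $sent = array_keys($post);

    // Compare the field names regardless of the order they arrived in.
    sort($allowed);
    sort($sent);

    return $allowed === $sent;
}

var_dump(form_is_expected(array(
    'form'     => 'login',
    'username' => 'chris',
    'password' => 'secret',
)));
// bool(true)
```

A prepended security.inc could call form_is_expected($_POST) and include process.inc only when it returns true, so that a request with missing or extra fields never reaches the filtering logic in an unexpected shape.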
Naming Conventions
A topic worth revisiting here is that of naming conventions. However you decide to name your variables, make sure that you choose a method that will not make it easy to mistakenly use tainted data. One approach is to rename any variable that passes through data filtering to something that distinguishes it as being clean.

For example, Listing 4 demonstrates testing the validity of an email address. The $clean['email'] variable will either not exist, or it will contain a valid email address. With this approach, you can safely use any variable within the $clean array in your programming logic, and the worst-case scenario is that you reference an undefined variable. You’ll catch these types of errors with your error reporting (a future topic for Security Corner), and the impact is much less severe anyway.
Note: If you place your data filtering in a separate module (such as process.inc as mentioned in Listing 2), it is important to initialize your $clean array in the parent script and to be sure that no path through your logic bypasses this initialization.
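The $clean convention can be sketched as follows. This is not the column's Listing 4: the filter_form_data() function is hypothetical, and the pattern is a deliberately simple assumption, not a complete email validator.

```php
<?php
// Sketch only: copies a value into $clean only after it passes a check.
function filter_form_data(array $input)
{
    // Initialized unconditionally, so $clean can never carry tainted data
    // left over from somewhere else.
    $clean = array();

    // \z anchors at the very end of the string, so a trailing newline fails.
    $email_pattern = '/^[^@\s<&>]+@([-a-z0-9]+\.)+[a-z]{2,}\z/i';

    if (isset($input['email'])
        && is_string($input['email'])
        && preg_match($email_pattern, $input['email'])) {
        $clean['email'] = $input['email'];
    }

    return $clean;
}

$clean = filter_form_data(array('email' => 'user@example.org'));
echo isset($clean['email']) ? $clean['email'] : 'no valid email', "\n";
// prints: user@example.org
```

Code that runs later reads only from $clean and never from the raw input. If filtering rejected the value, the key is simply absent, which is the harmless worst case described above.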
[Listing 3: login form with Username and Password fields]
Until Next Time...
Of the two general approaches I discussed, the dispatch method is my favourite. The main reason for my preference is that it leverages existing mechanisms that have been proven reliable, such as the fact that only files within document root can be accessed from the Web. Another benefit is that it relies less on the developer remembering something (although the auto_prepend_file directive can assure us that a file such as security.inc is always prepended).

Again, if you have a particularly secure design that you wouldn’t mind sharing with your fellow readers, please let me know. I’ll be happy to have a look and provide a thorough analysis in the hopes that everyone benefits.

You should now have the tools you need to add security precautions to your next application design. Just be sure that your approach satisfies the three important characteristics I mentioned at the beginning: help developers ensure that data filtering cannot be bypassed, ensure that tainted data cannot be mistaken for clean data, and identify the origin of data. These things help a security-conscious developer focus on the important issues. Until next month, be safe.
About the Author
Chris Shiflett is a frequent contributor to the PHP community and one of the leading security experts in the field. His solutions to security problems are often used as points of reference, and these solutions are showcased in his talks at conferences such as ApacheCon and the O’Reilly Open Source Convention, and in his articles in publications such as PHP Magazine and php|architect. “Security Corner,” his monthly column for php|architect, is the industry’s first and foremost PHP security column. Chris is the author of the HTTP Developer’s Handbook (Sams), a coauthor of the Zend PHP Certification Study Guide (Sams), and is currently writing PHP Security (O’Reilly). As a member of the Zend PHP Education Advisory Board, he is one of the authors of the Zend PHP Certification. He is also leading an effort to create a PHP community site at PHPCommunity.org. You can contact him at
[email protected] or visit his Web site at http://shiflett.org/.
[Listing 4: testing the validity of an email address]
Tips & Tricks By John W. Holmes
RANDOM LINES
I recently came across the need to grab random lines from a file for a project I was working on. Okay, I thought, I’ve done this before. A simple call to file(), array_rand() and a small loop will get me the lines I need, right? I soon realized that I couldn’t have been more wrong.

So, to start off, let’s say that for small files, the file/array_rand method isn’t bad. If you’ve got a file of 10 lines, then it’s twice as fast as the second method I’ll soon get to. Using file() fails to scale very well, though, as you can imagine, when the files start getting larger. This is because you’re reading the entire file into memory when you may only want to grab a couple of lines from it. Not only is this slower than other methods, but it’s now eating up your memory by the handful.

To put this into perspective, the file I was looking to grab lines from is a word list consisting of around 113,000 words, each on its own line. You can see that this isn’t something that you want to be loading up into memory with file(). This is especially the case if the script is going on any server that’ll actually see some traffic, versus your test rig at home.

The other limitation that you’ll eventually run into, besides memory, is that array_rand() has an upper limit of 32,767. This is also the upper limit of rand(), as reported by getrandmax(). So, out of my 113,000 lines, two-thirds of the time the function would just return 32,767, which doesn’t make for a very random line selector. Of course, I could use mt_rand(), which has an upper limit of 2147483647, but then there would still be the memory issues to deal with.

Obviously, these functions weren’t meant for files or arrays of this size, so I went on to search for a faster algorithm to get me my random lines. I knew I’d be opening the file with fopen() and using fseek() to move around looking for lines, but wasn’t sure of the exact sequence. Luckily I stumbled across an algorithm on a web site (it was actually an ASP site, but don’t tell anyone) that gave me what I needed.

The code in Listing 1 shows how you can open the file and search for and return a certain number of random lines. This code was developed to return a two-word code that could be used as a secret invitation code to participate in a survey, but can be easily adapted for any use.

The function is actually quite simple in how it works (and, as you know, if it’s simple and it works, it must be brilliant). It opens the file and determines its size. It then picks a random number between zero and the file size and uses fseek() to get to that position. Then, it begins looping and seeking backwards until it hits a newline character or the beginning of the file. Once it reaches either one, it uses fgets() to grab the current line and return that as a random word. Depending upon how many lines (or words) you need from the file, the process repeats and joins everything together before it’s returned.

One thing to note is how I said this is to generate a “code” and didn’t say “password”. Even though there are 113,000 words to choose from in my case,
this method would still be a bad password generator. It would be trivial for a malicious user to loop through all of the word combinations looking for matches. If you’re generating random passwords, then it’s still better to use a loop that selects random uppercase and lowercase letters, numbers, and special characters of a good length.

WORD LIST SOURCE
Anyone looking to create a similar code generator based on a list of words can get an incredible word list from Project Gutenberg at www.gutenberg.net/etext/3201. The Moby Word Lists file available for download contains public domain files that have a variety of words separated into different categories. There are over 354,000 single words, 256,000 compound words, 113,000 words legal for crossword puzzles (such as Scrabble™, the page says, so you can finally code that official Scrabble word checker!), 21,000 names, 10,000 place names in the U.S., the complete works of Shakespeare and the entire U.S. Constitution, plus a few more files. These files may not help speed up the payroll system you’re designing, but if you ever come across the need for a large list of words and names to select from, you’ll be thankful that these are in the public domain.
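Coming back to the password advice for a moment, a loop of that kind might look like the sketch below. The random_password() function name, the character set, and the default length are assumptions for illustration. The sketch uses random_int(), which is only available in PHP 7 and later, because it draws from a cryptographically secure source; on older versions mt_rand() would be the nearest substitute, without that guarantee.

```php
<?php
// Sketch only: builds a password one random character at a time.
function random_password($length = 12)
{
    // Single quotes keep "$" from being treated as a variable.
    $chars = 'abcdefghijklmnopqrstuvwxyz'
           . 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
           . '0123456789'
           . '!@#$%^&*-_=+';
    $max = strlen($chars) - 1;

    $password = '';
    for ($i = 0; $i < $length; $i++) {
        // Pick one character at a uniformly random position.
        $password .= $chars[random_int(0, $max)];
    }

    return $password;
}

echo strlen(random_password(16)), "\n";
// prints: 16
```

Each character is drawn independently from 74 possibilities, so the number of candidates grows with every added character. The two-word code has a fixed, much smaller set of combinations.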
Listing 1
define('WORDCODE_NUMWORDS', 2);
define('WORDCODE_SEPERATOR', '-');

// Returns WORDCODE_NUMWORDS random lines from $file, joined by the separator.
function _getWordCode($file)
{
    $chosenwords = array();
    $fp = fopen($file, 'r');
    $fsize = filesize($file);
    if ($fp === false || $fsize < 1) {
        return false;
    }
    for ($x = 0; $x < WORDCODE_NUMWORDS; $x++) {
        // Pick a random byte offset inside the file.
        $pos = mt_rand(0, $fsize - 1);
        // Walk backwards until the previous byte is a newline
        // or the beginning of the file is reached.
        while ($pos > 0) {
            fseek($fp, $pos - 1);
            if (fgetc($fp) === "\n") {
                break;
            }
            $pos--;
        }
        // $pos is now the first byte of a line; read that line.
        fseek($fp, $pos);
        $chosenwords[] = trim(fgets($fp));
    }
    fclose($fp);
    return implode(WORDCODE_SEPERATOR, $chosenwords);
}

PHP EDITOR REVIEW
I know, I can hear the collective moans over another editor review. In fact, I heard them from my editor (the human kind) when I brought the subject up, so I’ll make this a short tip. We all have our favorite editors, no doubt, and if yours does what you want, then continue to use it. The editor I want to mention is something that fills a small requirement that I had recently.

I’m often switching between computers at work that I don’t have any kind of administrator access to, so I can’t just install my favorite editor on each computer I sit down at. So, I went out in search of a small editor that I could carry on my USB thumb drive and that didn’t require any installation. I figured I’d have to settle for something with all of the features of Notepad in order to meet my requirements, but, luckily, I was pointed to an editor named SciTE. You can check it out yourself at http://www.scintilla.org/SciTE.html, but I’m going to tell you a little about it as well.

First of all, it met my requirements exactly. The editor is offered in two versions, one version a “full” download of 600K that comes with everything in separate files and a second version that has everything available in a single 360K executable. It doesn’t require any installation, so it’s simply double-click and go.

What about the features, you ask? It blows Notepad out of the water and matches many of the multi-megabyte-download-requiring-admin-installation editors out there. It’s got syntax highlighting, which is one of the first things to look for. Search and replace with regular expressions is something that always comes in handy and is implemented along with “find in files”. Set a buffer value in the configuration file and you can open multiple files in tabs, as in many browsers.
It certainly doesn’t end there, either. Other features include brace and parenthesis matching, code folding (allowing you to hide the code within functions or methods, for example), auto-complete and code hinting (although those two will require some work on your own to enable and get set up). You can also configure the path to the CLI version of PHP and run your PHP scripts from SciTE to check for errors. The program will take you to the file and line number of any standard errors that are reported.

Want to export your magic function to a PDF file and share it with others? Exporting to PDF, HTML, RTF, LaTeX and XML is only a couple of clicks away. Using the configuration file options, you can color single-quoted and double-quoted strings different colors as a reminder of which one you’re working with. Variables within single-quoted strings are not highlighted like variables in double-quoted strings, so you have a visual cue that the variable will not be evaluated. There are a ton of different configuration options you can set that are described in great detail in the documentation (http://scintilla.sourceforge.net/SciTEDoc.html). You can essentially customize any part of the program to your liking.

Looking back, I think I said something about keeping this short in the beginning, so I had better cut myself off and let you see the rest for yourself. If you already have a favorite editor, I’m looking to change it. The SciTE editor fills the niche of a small, powerful, feature-rich editor that’ll match almost any other editor out there. It also fits easily on a USB thumb drive, which is how I take it everywhere I go now. Go check it out for yourself.

GOT SOMETHING BETTER?
We’ve done the editor comparisons to death across the PHP community, I’m sure, but if you have a better suggestion for a small powerful editor, visit the php|architect forums at http://www.phparch.com/discuss/ and let me know in the Tips & Tricks forum. Or, if you have a better algorithm for getting random lines out of a file or a better algorithm for anything in general, come let the community know in the forums where we can discuss and learn from it. As alway
About the Author
John Holmes is a Captain in the U.S. Army and a freelance PHP and MySQL programmer. He has been programming in PHP for over 4 years and loves every minute of it. He is currently serving at Ft. Gordon, Georgia as a Company Commander with his wife and two sons.
exit(0);
Exam Under the Microscope
by Andi Gutmans and Marco Tabini
Marco’s Braindump
The recent introduction of the Zend Certification Exam has been the topic of several news items, as well as posts on our forums and on the mailing lists. Although I must say that, as far as I’ve been able to see, the reception has, for the most part, been good, some people just seem to think that the certification program is nothing more than a power (and money) grab from Zend.

As one of the people who helped write the certification exam, I can hardly promise you that my opinion on the matter is entirely unbiased. I could, however, tell you that of the ten new things that are proposed to me every day, the certification is one of the few that I actually decided to pick up and help with. That is not because I consider my time so precious to the world-at-large that Zend and the community should feel honoured that I did, but because my time is limited and I did consider the certification program a worthy project that I could support wholeheartedly, and my interest goes beyond its purely financial aspects. You’ve been warned, and you be the judge.
In my opinion, it is extremely important that we all understand one thing: the certification is not conceived as a tool for technical people. I don’t know how many times I’ve heard someone say (or write) something to the effect of “why should I get certified? I know PHP, and the fact that someone is certified or not wouldn’t affect my hiring him or not—I would simply judge his or her technical abilities for myself.” That’s great, except that it is a very deeply flawed way of thinking—slightly self-centered, if you ask me. In the real world—the one in which most people live and have to pay their mortgage and eat their meals—things are a bit different. The person who makes decisions is often not someone who has the technical knowledge required to make them—as sad as it may be, it’s a fact that the average business person does not know PHP. Like it or hate it, this means that that person lacks the tools required to make a good technical decision, including whether or not to hire a programmer, either as an employee or as a contractor.
From their point of view, experience alone cannot be a reference point. I have been taking pictures semiprofessionally for almost ten years now, and yet I doubt that any professional photographer would hire me given the opportunity to examine my work (the words “don’t quit your day job” come to mind). I could, however, go to someone who does not have the experience to judge my work technically, show them my ten best pictures, and they would have to make a decision based on my ten years of experience, without being able to tell whether I can produce output of consistent quality.

Much like a degree, a certification tells a prospective employer that you have been able to overcome a set of hurdles (seventy questions in ninety minutes in our case) and successfully meet a specific set of requirements. Naturally, a certificate is not on the same level as a degree, much like a degree is not on the same level as actually knowing what you’re doing. Still, they both have the intrinsic value given them by the level of knowledge that is required to obtain them, and
while obtaining a degree is undoubtedly more difficult than getting Zend-certified, the latter exam is much more specific to PHP than anything you’re likely to do in college.

Besides this, I have to admit that I took a sort-of perverse pleasure in seeing people come by the Zend booth (where I was a guest) at a recent conference, claiming that they “knew PHP inside out” and then failing to answer a reasonably simple question. My pleasure, just so that we understand each other, is not of the “told you so” kind: everybody makes mistakes, and God knows I wake up in the morning wondering about what I’ll do wrong each day. The pleasure is in seeing reality dawn on the faces of these people when they start wondering whether there is any value in the certification program after all. It’s too bad that, while they had an opportunity to face the problem in first person, so many others will simply dismiss the certification program as yet another attempt at commercializing PHP.

Andi’s Thoughts
Such an interesting topic! As I do share some of the reasoning Marco mentioned, I hope my part of the column won’t be too repetitive. If it is, then I apologize in advance. Like Marco, I might be a bit biased, being co-founder of Zend and also a member of the Zend Certification Committee, but I never write something I don’t believe in, so, again, you be the judge.

There has always been a lot of talk about what is good for PHP and whether the involvement of commercial companies such as Zend benefits the PHP community. The way I see it, at the end of the day, most non-hobbyist developers in the PHP community have to make a living, and preferably not “just a living,” but a decent one, too. I have met many PHP enthusiasts who have had to develop JSP or ASP to make a living. Now why is that? Because, historically, there have
been more jobs, and better paying jobs, in these technologies. If we try and understand why this is the case, I am sure most people reading php|architect will agree that it’s not due to PHP’s inferiority as a technology. On the contrary, we hear of far more success stories about sites moving from JSP/ASP to PHP than the other way around.

So what is the reason behind this inequality? I think there are many, but probably a major one is the current perception of PHP in the commercial market. Many decision makers have seen PHP as a hobbyist language that is not ready for prime-time, or even as a great language that is just not a safe choice because, without strong commercial backing, betting on such a technology might be risky. The end result is that large employers are still hesitant in choosing PHP and/or they see PHP developers as mere “scripters” and not real software engineers and architects. This leads to fewer available PHP jobs and lower salaries for the PHP developer.

For this reason and others, it is so important to have companies like Zend who help evangelize PHP in the commercial market. The first step is to convince these companies that PHP is a great technology that easily competes with JSP/ASP and is up to prime-time, enterprise-grade applications. Such market education is difficult and requires many efforts to reach the places where decision makers get their information. Although the Netcraft numbers of PHP’s install base are amazing, most such decision makers don’t hang out on their site or visit php.net to see our impressive graphs.

If employers actually have decided to look at PHP in a serious manner, they start asking themselves what anybody would qualify as very reasonable questions. Who will support us and hold our hands in this endeavor? Where do we get commercially-backed development and production tools from? Most importantly, not only how do we
find PHP developers, but also how do we know they are actually up to the job? I think it is clear that the Zend Certification Exam is up to this job. Having such a certification is the key to making employers feel comfortable in hiring PHP developers and ultimately pushing the wider proliferation of PHP.

As Zend recognized the strategic importance of having a certification program for PHP as a technology, we called upon a selection of the leaders of the PHP community to help us make the certification into an exam whose rationale doesn’t only sound good in writing (like the *cough* Microsoft certifications), but an exam which truly tests the skills a successful PHP developer requires. In my opinion, having a high quality certification backed by PHP experts is something PHP has needed for a long time. The logistics required to set up such a certification exam, which include market education, availability of three thousand test centers around the world and other important tasks, can only be successfully performed by a commercial company that can invest the amount of resources needed to get such an endeavor up and running.

There is no doubt that Zend’s motivation is the ongoing proliferation of PHP and, therefore, advancing Zend’s business. However, it all comes back to the fact that PHP’s ongoing success is a success for every person that makes (or would like to make) a living developing or working with the best web scripting language available today. Looking back at the last year, with the growing adoption of PHP in larger commercial companies and the growing number of available PHP jobs, I think all of this hard work seems to be making a change in the market.