Programming Python, 3rd Edition, by Mark Lutz. Publisher: O'Reilly. Pub Date: August 2006. Print ISBN-10: 0-596-00925-9. Print ISBN-13: 978-0-596-00925-0. Pages: 1596.
Already the industry standard for Python users, Programming Python from O'Reilly just got even better. This third edition has been updated to reflect current best practices and the abundance of changes introduced by the latest version of the language, Python 2.5.
Whether you're a novice or an advanced practitioner, you'll find this refreshed book more than lives up to its reputation. Programming Python, Third Edition teaches you the right way to code. It explains Python language syntax and programming techniques in a clear and concise manner, with numerous examples that illustrate both correct usage and common idioms. By reading this comprehensive guide, you'll learn how to apply Python in real-world problem domains such as:
- GUI programming
- Internet scripting
- Parallel processing
- Database management
- Networked applications
Programming Python, Third Edition covers each of these target domains gradually, beginning with in-depth discussions of core concepts and then progressing toward complete programs. Large examples do appear, but only after you've learned
enough to understand their techniques and code.
Along the way, you'll also learn how to use the Python language in realistically scaled programs--concepts such as Object Oriented Programming (OOP) and code reuse are recurring side themes throughout this text. If you're interested in Python programming, then this O'Reilly classic needs to be within arm's reach. The wealth of practical advice, snippets of code and patterns of program design can all be put into use on a daily basis--making your life easier and more productive.
Reviews of the second edition:
"...about as comprehensive as any book can be." --Dr. Dobb's Journal
"If the language had manuals, they would undoubtedly be the texts from O'Reilly...'Learning Python' and 'Programming Python' are definitive treatments." --SD Times
Table of Contents
Copyright
Foreword
Preface
Part I: The Beginning
  Chapter 1. Introducing Python
    Section 1.1. "And Now for Something Completely Different"
    Section 1.2. Python Philosophy 101
    Section 1.3. The Life of Python
    Section 1.4. Signs of the Python Times
    Section 1.5. The Compulsory Features List
    Section 1.6. What's Python Good For?
    Section 1.7. What's Python Not Good For?
    Section 1.8. Truth in Advertising
  Chapter 2. A Sneak Preview
    Section 2.1. "Programming Python: The Short Story"
    Section 2.2. The Task
    Section 2.3. Step 1: Representing Records
    Section 2.4. Step 2: Storing Records Persistently
    Section 2.5. Step 3: Stepping Up to OOP
    Section 2.6. Step 4: Adding Console Interaction
    Section 2.7. Step 5: Adding a GUI
    Section 2.8. Step 6: Adding a Web Interface
    Section 2.9. The End of the Demo
Part II: System Programming
  Chapter 3. System Tools
    Section 3.1. "The os.path to Knowledge"
    Section 3.2. System Scripting Overview
    Section 3.3. Introducing the sys Module
    Section 3.4. Introducing the os Module
    Section 3.5. Script Execution Context
    Section 3.6. Current Working Directory
    Section 3.7. Command-Line Arguments
    Section 3.8. Shell Environment Variables
    Section 3.9. Standard Streams
  Chapter 4. File and Directory Tools
    Section 4.1. "Erase Your Hard Drive in Five Easy Steps!"
    Section 4.2. File Tools
    Section 4.3. Directory Tools
  Chapter 5. Parallel System Tools
    Section 5.1. "Telling the Monkeys What to Do"
    Section 5.2. Forking Processes
    Section 5.3. Threads
    Section 5.4. Program Exits
    Section 5.5. Interprocess Communication
    Section 5.6. Pipes
    Section 5.7. Signals
    Section 5.8. Other Ways to Start Programs
    Section 5.9. A Portable Program-Launch Framework
    Section 5.10. Other System Tools
  Chapter 6. System Examples: Utilities
    Section 6.1. "Splits and Joins and Alien Invasions"
    Section 6.2. Splitting and Joining Files
    Section 6.3. Generating Forward-Link Web Pages
    Section 6.4. A Regression Test Script
    Section 6.5. Packing and Unpacking Files
    Section 6.6. Automated Program Launchers
  Chapter 7. System Examples: Directories
    Section 7.1. "The Greps of Wrath"
    Section 7.2. Fixing DOS Line Ends
    Section 7.3. Fixing DOS Filenames
    Section 7.4. Searching Directory Trees
    Section 7.5. Visitor: Walking Trees Generically
    Section 7.6. Copying Directory Trees
    Section 7.7. Deleting Directory Trees
    Section 7.8. Comparing Directory Trees
Part III: GUI Programming
  Chapter 8. Graphical User Interfaces
    Section 8.1. "Here's Looking at You, Kid"
    Section 8.2. Python GUI Development Options
    Section 8.3. Tkinter Overview
    Section 8.4. Climbing the GUI Learning Curve
    Section 8.5. Tkinter Coding Basics
    Section 8.6. Tkinter Coding Alternatives
    Section 8.7. Adding Buttons and Callbacks
    Section 8.8. Adding User-Defined Callback Handlers
    Section 8.9. Adding Multiple Widgets
    Section 8.10. Customizing Widgets with Classes
    Section 8.11. Reusable GUI Components with Classes
    Section 8.12. The End of the Tutorial
    Section 8.13. Python/Tkinter for Tcl/Tk Converts
  Chapter 9. A Tkinter Tour, Part 1
    Section 9.1. "Widgets and Gadgets and GUIs, Oh My!"
    Section 9.2. Configuring Widget Appearance
    Section 9.3. Top-Level Windows
    Section 9.4. Dialogs
    Section 9.5. Binding Events
    Section 9.6. Message and Entry
    Section 9.7. Checkbutton, Radiobutton, and Scale
    Section 9.8. Running GUI Code Three Ways
    Section 9.9. Images
    Section 9.10. Viewing and Processing Images with PIL
  Chapter 10. A Tkinter Tour, Part 2
    Section 10.1. "On Today's Menu: Spam, Spam, and Spam"
    Section 10.2. Menus
    Section 10.3. Listboxes and Scrollbars
    Section 10.4. Text
    Section 10.5. Canvas
    Section 10.6. Grids
    Section 10.7. Time Tools, Threads, and Animation
    Section 10.8. The End of the Tour
    Section 10.9. The PyDemos and PyGadgets Launchers
  Chapter 11. GUI Coding Techniques
    Section 11.1. "Building a Better Mouse Trap"
    Section 11.2. GuiMixin: Common Tool Mixin Classes
    Section 11.3. GuiMaker: Automating Menus and Toolbars
    Section 11.4. ShellGui: GUIs for Command-Line Tools
    Section 11.5. GuiStreams: Redirecting Streams to Widgets
    Section 11.6. Reloading Callback Handlers Dynamically
    Section 11.7. Wrapping Up Top-Level Window Interfaces
    Section 11.8. GUIs, Threads, and Queues
    Section 11.9. More Ways to Add GUIs to Non-GUI Code
  Chapter 12. Complete GUI Programs
    Section 12.1. "Python, Open Source, and Camaros"
    Section 12.2. PyEdit: A Text Editor Program/Object
    Section 12.3. PyPhoto: An Image Viewer and Resizer
    Section 12.4. PyView: An Image and Notes Slideshow
    Section 12.5. PyDraw: Painting and Moving Graphics
    Section 12.6. PyClock: An Analog/Digital Clock Widget
    Section 12.7. PyToe: A Tic-Tac-Toe Game Widget
    Section 12.8. Where to Go from Here
Part IV: Internet Programming
  Chapter 13. Network Scripting
    Section 13.1. "Tune In, Log On, and Drop Out"
    Section 13.2. Plumbing the Internet
    Section 13.3. Socket Programming
    Section 13.4. Handling Multiple Clients
    Section 13.5. A Simple Python File Server
  Chapter 14. Client-Side Scripting
    Section 14.1. "Socket to Me!"
    Section 14.2. FTP: Transferring Files over the Net
    Section 14.3. Processing Internet Email
    Section 14.4. POP: Fetching Email
    Section 14.5. SMTP: Sending Email
    Section 14.6. email: Parsing and Composing Mails
    Section 14.7. pymail: A Console-Based Email Client
    Section 14.8. The mailtools Utility Package
    Section 14.9. NNTP: Accessing Newsgroups
    Section 14.10. HTTP: Accessing Web Sites
    Section 14.11. Module urllib Revisited
    Section 14.12. Other Client-Side Scripting Options
  Chapter 15. The PyMailGUI Client
    Section 15.1. "Use the Source, Luke"
    Section 15.2. A PyMailGUI Demo
    Section 15.3. PyMailGUI Implementation
  Chapter 16. Server-Side Scripting
    Section 16.1. "Oh What a Tangled Web We Weave"
    Section 16.2. What's a Server-Side CGI Script?
    Section 16.3. Running Server-Side Examples
    Section 16.4. Climbing the CGI Learning Curve
    Section 16.5. Saving State Information in CGI Scripts
    Section 16.6. The Hello World Selector
    Section 16.7. Refactoring Code for Maintainability
    Section 16.8. More on HTML and URL Escapes
    Section 16.9. Transferring Files to Clients and Servers
  Chapter 17. The PyMailCGI Server
    Section 17.1. "Things to Do When Visiting Chicago"
    Section 17.2. The PyMailCGI Web Site
    Section 17.3. The Root Page
    Section 17.4. Sending Mail by SMTP
    Section 17.5. Reading POP Email
    Section 17.6. Processing Fetched Mail
    Section 17.7. Utility Modules
    Section 17.8. CGI Script Trade-Offs
  Chapter 18. Advanced Internet Topics
    Section 18.1. "Surfing on the Shoulders of Giants"
    Section 18.2. Zope: A Web Application Framework
    Section 18.3. HTMLgen: Web Pages from Objects
    Section 18.4. Jython: Python for Java
    Section 18.5. Grail: A Python-Based Web Browser
    Section 18.6. XML Processing Tools
    Section 18.7. Windows Web Scripting Extensions
    Section 18.8. Python Server Pages
    Section 18.9. Rolling Your Own Servers in Python
    Section 18.10. And Other Cool Stuff
Part V: Tools and Techniques
  Chapter 19. Databases and Persistence
    Section 19.1. "Give Me an Order of Persistence, but Hold the Pickles"
    Section 19.2. Persistence Options in Python
    Section 19.3. DBM Files
    Section 19.4. Pickled Objects
    Section 19.5. Shelve Files
    Section 19.6. The ZODB Object-Oriented Database
    Section 19.7. SQL Database Interfaces
    Section 19.8. PyForm: A Persistent Object Viewer
  Chapter 20. Data Structures
    Section 20.1. "Roses Are Red, Violets Are Blue; Lists Are Mutable, and So Is Set Foo"
    Section 20.2. Implementing Stacks
    Section 20.3. Implementing Sets
    Section 20.4. Subclassing Built-In Types
    Section 20.5. Binary Search Trees
    Section 20.6. Graph Searching
    Section 20.7. Reversing Sequences
    Section 20.8. Permuting Sequences
    Section 20.9. Sorting Sequences
    Section 20.10. Data Structures Versus Python Built-Ins
    Section 20.11. PyTree: A Generic Tree Object Viewer
  Chapter 21. Text and Language
    Section 21.1. "See Jack Hack. Hack, Jack, Hack"
    Section 21.2. Strategies for Parsing Text in Python
    Section 21.3. String Method Utilities
    Section 21.4. Regular Expression Pattern Matching
    Section 21.5. Advanced Language Tools
    Section 21.6. Handcoded Parsers
    Section 21.7. PyCalc: A Calculator Program/Object
Part VI: Integration
  Chapter 22. Extending Python
    Section 22.1. "I Am Lost at C"
    Section 22.2. Integration Modes
    Section 22.3. C Extensions Overview
    Section 22.4. A Simple C Extension Module
    Section 22.5. Extension Module Details
    Section 22.6. The SWIG Integration Code Generator
    Section 22.7. Wrapping C Environment Calls
    Section 22.8. A C Extension Module String Stack
    Section 22.9. A C Extension Type String Stack
    Section 22.10. Wrapping C++ Classes with SWIG
    Section 22.11. Other Extending Tools
  Chapter 23. Embedding Python
    Section 23.1. "Add Python. Mix Well. Repeat."
    Section 23.2. C Embedding API Overview
    Section 23.3. Basic Embedding Techniques
    Section 23.4. Registering Callback Handler Objects
    Section 23.5. Using Python Classes in C
    Section 23.6. A High-Level Embedding API: ppembed
    Section 23.7. Other Integration Topics
Part VIII: The End
  Chapter 24. Conclusion: Python and the Development Cycle
    Section 24.1. "That's the End of the Book, Now Here's the Meaning of Life"
    Section 24.2. "Something's Wrong with the Way We Program Computers"
    Section 24.3. The "Gilligan Factor"
    Section 24.4. Doing the Right Thing
    Section 24.5. Enter Python
    Section 24.6. But What About That Bottleneck?
    Section 24.7. On Sinking the Titanic
    Section 24.8. So What's "Python: The Sequel"?
    Section 24.9. In the Final Analysis . . .
    Section 24.10. Postscript to the Second Edition (2000)
    Section 24.11. Postscript to the Third Edition (2006)
About the Author
Colophon
Index
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Programming Python, the image of an African rock python, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN-10: 0-596-00925-9
ISBN-13: 978-0-596-00925-0
Foreword

How Time Flies!

Ten years ago I completed the foreword for the first edition of this book. Python 1.3 was current then, and 1.4 was in beta. I wrote about Python's origins and philosophy, and about how its first six years changed my life. Python was still mostly a one-man show at the time, and I only mentioned other contributors and the Python community in one paragraph near the end.

Five years later the second edition came out, much improved and quite a bit heftier, and I wrote a new foreword. Python 2.0 was hot then, and the main topic of the foreword was evolution of the language. Python 2.0 added a lot of new features, and many were concerned that the pace of change would be unsustainable for the users of the language. I addressed this by promising feature-by-feature backward compatibility for several releases and by regulating change through a community process using Python Enhancement Proposals (PEPs).

By then, Python's development had become truly community-driven, with many developers (besides myself) having commit privileges into the source tree. This move toward community responsibility has continued ever since. My own role has become more limited over time, though I have not yet been reduced to playing a purely ceremonial function like that of the Dutch Queen.

Perhaps the biggest change in recent years is the establishment of the Python Software Foundation (PSF), a non-profit organization that formally owns and manages the rights to the Python source code and owns the Python trademark. Its board and members (helped by many nonmember volunteers) also offer many services to the Python community, from the Python.org web site and mailing lists to the yearly Python Conference. Membership in the PSF is by invitation only, but donations are always welcome (and tax-deductible, at least in the U.S.). The PSF does not directly control Python's development, nor do the developers have to obey any rules set by the PSF.
Rather, it's the other way around: active Python developers make up the majority of the PSF's membership. This arrangement, together with the open source nature of Python's source code license, ensures that Python will continue to serve the goals of its users and developers.
Coming Attractions

What developments can Python users expect to see in the coming years? Python 3000, which is referred to in the foreword to the second edition as "intentionally vaporware," will see the light of day after all as Python 3.0. After half a decade of talk, it's finally time to start doing something about it. I've created a branch of the 2.5 source tree, and, along with a handful of developers, I'm working on transforming the code base into my vision for Python 3000. At the same time, I'm working with the community on a detailed definition of Python 3000; there's a new mailing list dedicated to Python 3000 and a series of PEPs, starting with PEP 3000.

This work is still in the early stages. Some changes, such as removing classic classes and string exceptions, adopting Unicode as the only character type, and changing integer division so that 1/2
returns 0.5 instead of truncating toward zero, have been planned for years. But many other changes are still being hotly debated, and new features are being proposed almost daily. I see my own role in this debate as a force of moderation: there are many more good ideas than could possibly be implemented in the given time, and, taken together, they would change the language so much that it would be unrecognizable. My goal for Python 3000 is to fix some of my oldest design mistakes, especially the ones that can't be fixed without breaking backward compatibility. That alone will be a huge task. For example, a consequence of the choice to use Unicode everywhere is the need for a total rewrite of the standard I/O library and a new data type to represent binary ("noncharacter") data, dubbed "bytes."

The biggest potential danger for Python 3000 is that of an "accidental paradigm shift": a change, or perhaps a small set of changes that weren't considered together, that would unintentionally cause a huge change to the way people program in Python. For example, adding optional static type checking to the language could easily have the effect of turning Python into "Java without braces," which is definitely not what most users would like to see happen! For this reason, I am making it my personal responsibility to guide the Python 3000 development process. The new language should continue to represent my own esthetics for language design, not a design-by-committee compromise or a radical departure from today's Python. And if we don't get everything right, well, there's always Python 4000....

The timeline for 3.0 is roughly as follows: I expect the first alpha release in about a year and the first production release a year later. I expect that it will then take another year to shake out various usability issues and get major third-party packages ported, and, finally, another year to gain widespread user acceptance.
So, Mark should have about three to four years before he'll have to start the next revision of this book. To learn more about Python 3000 and how we plan to help users convert their code, start by reading PEP 3000. (To find PEP 3000 online, search for it in Google.)

In the meantime, Python 2.x is not dead yet. Python 2.5 will be released around the same time as this book (it's in late alpha as I am writing this). Python's normal release cycle produces a new release every 12 to 18 months. I fully expect version 2.6 to see the light of day while Python 3000 is still in alpha, and it's likely that 2.7 will be released around the same time as 3.0 (and that more users will download 2.7 than 3.0). A 2.8 release is quite likely; such a release might back-port certain Python 3.0 features (while maintaining backward compatibility with 2.7) in order to help users migrate code. A 2.9 release might happen, depending on demand. But in any case, 2.10 will be right out!

(If you're not familiar with Python's release culture, releases like 2.4 and 2.5 are referred to as "major releases." There are also "bug-fix releases," such as 2.4.3. Bug-fix releases are just that: they fix bugs and, otherwise, maintain strict backward and forward compatibility within the same major release. Major releases introduce new features and maintain backward compatibility with at least one or two previous major releases, and, in most cases, many more than that. There's no specific name for "earth-shattering" releases like 3.0, since they happen so rarely.)
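With hindsight the foreword did not have, the two long-planned changes it describes (true division and a separate binary data type) landed in Python 3.0 essentially as sketched. A minimal demonstration, using Python 3 as it actually shipped:

```python
# True division: in Python 3, / always yields the true result,
# while // is the explicit truncating (floor) division operator.
print(1 / 2)     # 0.5, no longer truncated to 0
print(1 // 2)    # 0, the old integer-division behavior, on request

# Text versus binary data: str holds Unicode text; the new "bytes"
# type mentioned above holds raw, noncharacter data.
text = "spam"
raw = text.encode("utf-8")       # str -> bytes
assert isinstance(raw, bytes)
assert raw.decode("utf-8") == text
```

Classic classes and string exceptions were likewise removed in 3.0, completing the list of changes given above.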
Concluding Remarks

Programming Python was the first or second book on Python ever published, and it's the only one of the early batch to endure to this day. I thank its author, Mark Lutz, for his unceasing efforts in keeping the book up-to-date, and its publisher, O'Reilly, for keeping the page count constant for this edition. Some of my fondest memories are of the book's first editor, the late Frank Willison. Without Frank's inspiration and support, the first two editions would never have been. He would be proud of this third
edition. I must end in a fine tradition, with one of my favorite Monty Python quotes: "Take it away, Eric the orchestra leader!"

Guido van Rossum
Belmont, California, May 2006
Foreword to the Second Edition (2001)

Less than five years ago, I wrote the Foreword for the first edition of Programming Python. Since then, the book has changed about as much as the language and the Python community! I no longer feel the need to defend Python: the statistics and developments listed in Mark's Preface speak for themselves.

In the past year, Python has made great strides. We released Python 2.0, a big step forward, with new standard library features such as Unicode and XML support, and several new syntactic constructs, including augmented assignment: you can now write x += 1 instead of x = x+1. A few people wondered what the big deal was (answer: instead of x, imagine dict[key] or list[index]), but overall this was a big hit with those users who were already used to augmented assignment in other languages. Less warm was the welcome for the extended print statement, print>>file, a shortcut for printing to a different file object than standard output. Personally, it's the Python 2.0 feature I use most frequently, but most people who opened their mouths about it found it an abomination. The discussion thread on the newsgroup berating this simple language extension was one of the longest ever, apart from the never-ending Python versus Perl thread.

Which brings me to the next topic. (No, not Python versus Perl. There are better places to pick a fight than a Foreword.) I mean the speed of Python's evolution, a topic dear to the heart of the author of this book. Every time I add a feature to Python, another patch of Mark's hair turns gray; there goes another chapter out of date! Especially the slew of new features added to Python 2.0, which appeared just as he was working on this second edition, made him worry: what if Python 2.1 added as many new things? The book would be out of date as soon as it was published! Relax, Mark. Python will continue to evolve, but I promise that I won't remove things that are in active use!
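The two Python 2.0 additions mentioned above can be sketched briefly. Note that the print>>file statement itself is Python 2 only; the example below shows the equivalent keyword-argument form that Python 3 later adopted in its place:

```python
# Augmented assignment: the payoff shows once the target is more
# than a bare name, since it need only be spelled out once.
counts = {"spam": 0}
counts["spam"] = counts["spam"] + 1   # the pre-2.0 spelling
counts["spam"] += 1                   # the 2.0 shorthand
assert counts["spam"] == 2

# Redirecting print to a file object. Python 2.0 wrote this as
# "print >> f, 'log line'"; Python 3 replaced the syntax with a
# file= keyword argument on the print function.
import io
f = io.StringIO()                     # any writable file object works
print("log line", file=f)
assert f.getvalue() == "log line\n"
```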
For example, there was a lot of worry about the string module. Now that string objects have methods, the string module is mostly redundant. I wish I could declare it obsolete (or deprecated) to encourage Python programmers to start using string methods instead. But given that a large majority of existing Python code, even many standard library modules, imports the string module, this change is obviously not going to happen overnight. The first likely opportunity to remove the string module will be when we introduce Python 3000; and even at that point, there will probably be a string module in the backwards compatibility library for use with old code.

Python 3000?! Yes, that's the nickname for the next generation of the Python interpreter. The name may be considered a pun on Windows 2000, or a reference to Mystery Science Theater 3000, a suitably Pythonesque TV show with a cult following. When will Python 3000 be released? Not for a loooooong time, although you won't quite have to wait until the year 3000.

Originally, Python 3000 was intended to be a complete rewrite and redesign of the language. It would allow me to make incompatible changes in order to fix problems with the language design that weren't solvable in a backwards compatible way. The current plan, however, is that the necessary changes will be introduced gradually into the current Python 2.x line of development, with a clear transition path that includes a period of backwards compatibility support.
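The string-module migration described above is easy to illustrate. In Python 2 one could write string.upper(s) after importing the string module; the method spellings below are what replaced that function-based API (which was indeed dropped by Python 3, just as predicted):

```python
# String methods, the replacements for the old string-module functions:
# string.upper(s) -> s.upper(), string.split(s) -> s.split(), and so on.
s = "spam and eggs"
assert s.upper() == "SPAM AND EGGS"
assert s.split() == ["spam", "and", "eggs"]
assert s.replace("spam", "toast") == "toast and eggs"
```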
Take, for example, integer division. In line with C, Python currently defines x/y with two integer arguments to have an integer result. In other words, 1/2 yields 0! While most dyed-in-the-wool programmers expect this, it's a continuing source of confusion for newbies, who make up an ever-larger fraction of the (exponentially growing) Python user population. From a numerical perspective, it really makes more sense for the / operator to yield the same value regardless of the type of the operands: after all, that's what all other numeric operators do. But we can't simply change Python so that 1/2 yields 0.5, because (like removing the string module) it would break too much existing code. What to do? The solution, too complex to describe here in detail, will have to span several Python releases, and involves gradually increasing pressure on Python programmers (first through documentation, then through deprecation warnings, and eventually through errors) to change their code. By the way, a framework for issuing warnings will be introduced as part of Python 2.1. Sorry, Mark!

So don't expect the announcement of the release of Python 3000 any time soon. Instead, one day you may find that you are already using Python 3000, only it won't be called that, but rather something like Python 2.8.7. And most of what you've learned in this book will still apply! Still, in the meantime, references to Python 3000 will abound; just know that this is intentionally vaporware in the purest sense of the word. Rather than worry about Python 3000, continue to use and learn more about the Python version that you do have.

I'd like to say a few words about Python's current development model. Until early 2000, there were hundreds of contributors to Python, but essentially all contributions had to go through my inbox. To propose a change to Python, you would mail me a context diff, which I would apply to my work version of Python, and if I liked it, I would check it into my CVS source tree.
(CVS is a source code version management system, and the subject of several books.) Bug reports followed the same path, except I also ended up having to come up with the patch. Clearly, with the increasing number of contributions, my inbox became a bottleneck. What to do? Fortunately, Python wasn't the only open source project with this problem, and a few smart people at VA Linux came up with a solution: SourceForge! This is a dynamic web site with a complete set of distributed project management tools available: a public CVS repository, mailing lists (using Mailman, a very popular Python application!), discussion forums, bug and patch managers, and a download area, all made available to any open source project for the asking.

We currently have a development group of 30 volunteers with SourceForge checkin privileges, and a development mailing list comprising twice as many folks. The privileged volunteers have all sworn their allegiance to the BDFL (Benevolent Dictator For Life, that's me :-). Introduction of major new features is regulated via a lightweight system of proposals and feedback called Python Enhancement Proposals (PEPs). Our PEP system proved so successful that it was copied almost verbatim by the Tcl community when they made a similar transition from Cathedral to Bazaar.

So, it is with confidence in Python's future that I give the floor to Mark Lutz. Excellent job, Mark. And to finish with my favorite Monty Python quote: Take it away, Eric, the orchestra leader!

Guido van Rossum
Reston, Virginia, January 2001
Foreword from the First Edition (1996)

As Python's creator, I'd like to say a few words about its origins, adding a bit of personal philosophy. Over six years ago, in December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office (a government-run research lab in
Amsterdam) would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to UNIX/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus).

Today, I can safely say that Python has changed my life. I have moved to a different continent. I spend my working days developing large systems in Python, when I'm not hacking on Python or answering Python-related email. There are Python T-shirts, workshops, mailing lists, a newsgroup, and now a book. Frankly, my only unfulfilled wish right now is to have my picture on the front page of the New York Times. But before I get carried away daydreaming, here are a few tidbits from Python's past.

It all started with ABC, a wonderful teaching language that I had helped create in the early eighties. It was an incredibly elegant and powerful language aimed at nonprofessional programmers. Despite all its elegance and power and the availability of a free implementation, ABC never became popular in the UNIX/C world. I can only speculate about the reasons, but here's a likely one: the difficulty of adding new "primitive" operations to ABC. It was a monolithic closed system, with only the most basic I/O operations: read a string from the console, write a string to the console. I decided not to repeat this mistake in Python.

Besides this intention, I had a number of other ideas for a language that improved upon ABC, and was eager to try them out. For instance, ABC's powerful data types turned out to be less efficient than we hoped. There was too much emphasis on theoretically optimal algorithms, and not enough tuning for common cases. I also felt that some of ABC's features, aimed at novice programmers, were less desirable for the (then!) intended audience of experienced UNIX/C programmers.
For instance: ABC's idiosyncratic syntax (all uppercase keywords!), some terminology (for example, "how-to" instead of "procedure"); and the integrated structured editor, which its users almost universally hated. Python would rely more on the UNIX infrastructure and conventions, without being UNIX-bound. And in fact, the first implementation was done on a Macintosh.

As it turned out, Python is remarkably free from many of the hang-ups of conventional programming languages. This is perhaps due to my choice of examples: besides ABC, my main influence was Modula-3. This is another language with remarkable elegance and power, designed by a small, strong-willed team (most of whom I had met during a summer internship at DEC's Systems Research Center in Palo Alto). Imagine what Python would have looked like if I had modeled it after the UNIX shell and C instead! (Yes, I borrowed from C too, but only its least controversial features, in my desire to please the UNIX/C audience.)

Any individual creation has its idiosyncrasies, and occasionally its creator has to justify them. Perhaps Python's most controversial feature is its use of indentation for statement grouping, which derives directly from ABC. It is one of the language's features that is dearest to my heart. It makes Python code more readable in two ways. First, the use of indentation reduces visual clutter and makes programs shorter, thus reducing the attention span needed to take in a basic unit of code. Second, it allows the programmer less freedom in formatting, thereby enabling a more uniform style, which makes it easier to read someone else's code. (Compare, for instance, the three or four different conventions for the placement of braces in C, each with strong proponents.)

This emphasis on readability is no accident. As an object-oriented language, Python aims to encourage the creation of reusable code.
Even if we all wrote perfect documentation all of the time, code can hardly be considered reusable if it's not readable. Many of Python's features, in addition to its use of indentation, conspire to make Python code highly readable. This reflects the philosophy of ABC, which was intended to teach programming in its purest form, and therefore placed a high value on clarity. Readability is often enhanced by reducing unnecessary variability. When possible, there's a single, obvious way to code a particular construct. This reduces the number of choices facing the programmer who is writing the code, and increases the chance that it will appear familiar to a second
programmer reading it. Yet another contribution to Python's readability is the choice to use punctuation mostly in a conservative, conventional manner. Most operator symbols are familiar to anyone with even a vague recollection of high school math, and no new meanings have to be learned for comic strip curse characters like @&$!.

I will gladly admit that Python is not the fastest running scripting language. It is a good runner-up, though. With ever-increasing hardware speed, the accumulated running time of a program during its lifetime is often negligible compared to the programmer time needed to write and debug it. This, of course, is where the real time savings can be made. While this is hard to assess objectively, Python is considered a winner in coding time by most programmers who have tried it. In addition, many consider using Python a pleasure; a better recommendation is hard to imagine.

I am solely responsible for Python's strengths and shortcomings, even when some of the code has been written by others. However, its success is the product of a community, starting with Python's early adopters who picked it up when I first published it on the Net, and who spread the word about it in their own environment. They sent me their praise, criticism, feature requests, code contributions, and personal revelations via email. They were willing to discuss every aspect of Python in the mailing list that I soon set up, and to educate me or nudge me in the right direction where my initial intuition failed me.

There have been too many contributors to thank individually. I'll make one exception, however: this book's author was one of Python's early adopters and evangelists. With this book's publication, his longstanding wish (and mine!) of having a more accessible description of Python than the standard set of manuals has been fulfilled. But enough rambling.
I highly recommend this book to anyone interested in learning Python, whether for personal improvement or as a career enhancement. Take it away, Eric, the orchestra leader! (If you don't understand this last sentence, you haven't watched enough Monty Python reruns.) Guido van Rossum Reston, Virginia, May 1996
Preface

"And Now for Something Completely Different . . . Again"

This book teaches application-level programming with Python. That is, it is about what you can do with the language once you've mastered its fundamentals. By reading this book, you will learn to use Python in some of its most common roles: to build GUIs, web sites, networked tools, scripting interfaces, system administration programs, database and text processing utilities, and more. Along the way, you will also learn how to use the Python language in realistically scaled programs; concepts such as object-oriented programming (OOP) and code reuse are recurring side themes throughout this text. And you will gain enough information to further explore the application domains introduced in the book, as well as to explore others.
About This Book

Now that I've told you what this book is, I should tell you what it is not. First of all, this book is not a reference manual. Although the index can be used to hunt for information, this text is not a dry collection of facts; it is designed to be read. And while many larger examples are presented along the way, this book is also not just a collection of minimally documented code samples.

Rather, this book is a tutorial that teaches the most common Python application domains from the ground up. It covers each of Python's target domains gradually, beginning with in-depth discussions of core concepts in each domain, before progressing toward complete programs. Large examples do appear, but only after you've learned enough to understand their techniques and code. For example, network scripting begins with coverage of network basics and protocols and progresses through sockets, client-side tools, HTML and CGI fundamentals, and web frameworks. GUI programming gets a similarly gentle presentation, with one introductory and two tutorial chapters, before reaching larger, complete programs. And system interfaces are explored carefully before being applied in real and useful scripts.

In a sense, this book is to application-level programming what the book Learning Python is to the core Python language: a learning resource that makes no assumptions about your prior experience in the domains it covers. Because of this focus, this book is designed to be a natural follow-up to the core language material in Learning Python and a next step on the way to mastering the many facets of Python programming.

In deference to all the topic suggestions I have received over the years, I should also point out that this book is not intended to be an in-depth look at specific systems or tools. With perhaps one million Python users in the world today, it would be impossible to cover in a useful way every Python-related system that is of interest to users.
Instead, this book is designed as a tutorial for readers new to the application domains covered. The web chapters, for instance, focus on core web scripting ideas, such as server-side scripts and state
retention options, not on specific systems, such as SOAP, Twisted, and Plone. By reading this book, you will gain the groundwork necessary to move on to more specific tools such as these in the domains that interest you.
About This Edition

To some extent, this edition's structure is a result of this book's history. The first edition of this book, written in 1995 and 1996, was the first book project to present the Python language. Its focus was broad. It covered the core Python language, and it briefly introduced selected application domains. Over time, the core language and reference material in the first edition evolved into the more focused books Learning Python and Python Pocket Reference.

Given that evolution, the second edition of this book, written from 1999 to 2000, was an almost completely new book on advanced Python topics. Its content was an expanded and more complete version of the first edition's application domain material, designed to be an application-level follow-up to the core language material in Learning Python, and supplemented by the reference material in Python Pocket Reference. The second edition focused on application libraries and tools rather than on the Python language itself, and it was oriented toward the practical needs of real developers and real tasks: GUIs, web sites, databases, text processing, and so on.

This third edition, which I wrote in 2005 and 2006, is exactly like the second in its scope and focus, but it has been updated to reflect Python version 2.4, and to be compatible with the upcoming Python 2.5. It is a minor update, and it retains the second edition's design and scope as well as much of its original material. However, its code and descriptions have been updated to incorporate both recent changes in the Python language and current best practices in Python programming.
Python Changes

You'll find that new language features such as string methods, enclosing-function scope references, list comprehensions, and new standard library tools, such as the email package, have been integrated throughout this edition. Smaller code changes, for instance replacing apply calls and exc_type usage with the newer func(*args) and exc_info(), have been applied globally as well (and show up surprisingly often, because this book is concerned with building general tools). All string-based, user-defined exceptions are now class-based, too; string exceptions appeared half a dozen times in the book's examples, but are documented as deprecated today. This is usually just a matter of changing to class MyExc(Exception): pass, though, in one case, exception constructor arguments must be extracted manually with the instance's args attribute. The backquote form `X` also became repr(X) across all examples, and I've replaced some appearances of while 1: with the newer and more mnemonic while True:, though either form works as advertised, and C programmers often find the former a natural pattern. Hopefully, these changes will future-proof the examples for as long as possible; be sure to watch the updates page described later for future Python changes.

One futurisms note: some purists might notice that I have not made all classes in this book derive from object to turn on new-style class features (e.g., class MyClass(object)). This is partly because the programs here don't employ the new-style model's slightly modified search pattern or advanced extensions. This is also because Python's creator, Guido van Rossum, told me that he believes this derivation will not be required in Python 3.0; standalone classes will simply be new-style too, automatically (in fact, the new-style class distinction is really just a temporary regression due to its incompatible search order in particular rare, multiple-inheritance trees).
This is impossible to predict with certainty, of course, and Python 3.0 might abandon compatibility in other ways that break some examples in this book. Be sure to both watch for 3.0 release notes and keep an eye on this book's updates page over time.
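The idiom substitutions just described can be collected in one short sketch (generic names such as add and MyExc are illustrative, not from the book's examples, and the code is written in a form that also runs on current Pythons):

```python
# apply(func, args) became func(*args)
def add(a, b):
    return a + b
args = (3, 4)
result = add(*args)               # formerly: apply(add, args)

# String exceptions became class-based exceptions
class MyExc(Exception):
    pass                          # formerly: myExc = 'my error string'
try:
    raise MyExc('spam')
except MyExc as exc:
    detail = exc.args[0]          # constructor arguments via the args attribute

# Backquotes became repr(), and while 1: became while True:
text = repr([1, 2])               # formerly: `[1, 2]`
count = 0
while True:                       # formerly: while 1:
    count += 1
    if count == 3:
        break
```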
Example Changes

You'll also notice that many of the second edition's larger examples have been upgraded substantially, especially the two larger GUI and CGI email-based examples (which are arguably the implicit goals of much of the book). For instance:

- The PyMailGUI email client is a complete rewrite and now supports sending and receiving attachments, offline viewing from mail save files, true transfer thread overlap, header-only fetches and mail caches, auto-open of attachments, detection of server inbox message number synchronization errors, and more.

- The PyMailCGI email web site was also augmented to support sending and receiving mail attachments, locate an email's main text intelligently, minimize mail fetches to run more efficiently, and use the PyCrypto extension for password encryption.

- The PyEdit text editor has grown a font dialog; unlimited undo and redo; a configuration module for fonts, colors, and sizes; intelligent modified tests on quit, open, new, and run; and case-insensitive searches.

- PyPhoto, a new, major example in Chapter 12, implements an image viewer GUI with Tkinter and the optional PIL extension. It supports cached image thumbnails, image resizing, saving images to files, and a variety of image formats thanks to PIL.

- PyClock has incorporated a countdown timer and a custom window icon; PyCalc has various cosmetic and functionality upgrades; and PyDemos now automatically pops up examples' source files.

In addition to the enhanced and new, major examples, you'll also find many other examples that demonstrate new and advanced topics such as thread queues.
Topic Changes

In addition to example changes, new topics have been added throughout. Among these are the following:

- Part II, System Programming, looks at the struct, mimetools, and StringIO modules and has been updated for newer tools such as file iterators.

- Part III, GUI Programming, has fresh coverage of threading and queues, the PIL imaging library, and techniques for linking a separately spawned GUI with pipes and sockets.

- Part IV, Internet Programming, now uses the new email package; covers running a web server on your local machine for CGI scripts; has substantially more on cookies, Zope, and XML parsing; and uses the PyCrypto encryption toolkit.

- Chapter 19, Databases and Persistence, has new ZODB examples and much-expanded coverage of the SQL API, including dozens of new pages on using MySQL and ZODB.

- Chapter 21, Text and Language, has a new, gentler introduction to pattern matching and mentions Python 2.4 templates.
- Chapter 22, Extending Python, now introduces Distutils and includes overviews of Pyrex, SIP, ctypes, Boost.Python, and CXX, in addition to offering updated SWIG coverage.

Beyond these specific kinds of changes, some material has been reorganized to simplify the overall structure. For example, a few chapters have been split up to make them less challenging; appendixes have been removed to save space (references are available separately); and the PyErrata web site example chapter has been removed (it didn't present many new concepts, so we've made it and its code available in the book's examples distribution as optional reading). You'll also find a new "Sneak Preview" chapter for readers in a hurry, a throwback to the first edition. This chapter takes a single example from command line to GUI to web site, and introduces Python and its libraries along the way.
Focus Unchanged

Fundamentally, though, this edition, like the second, is still focused on ways to use Python rather than on the language itself. Python development concepts are explored along the way; in fact, they really become meaningful only in the context of larger examples like those in this edition. Code structure and reuse, for instance, are put into practice by refactoring and reusing examples throughout the book. But in general, this text assumes that you already have at least a passing acquaintance with Python language fundamentals, and it moves on to present the rest of the Python story: its application to real tasks. If you find code in this book confusing, I encourage you to read Learning Python as a prelude to this text.

In the remainder of this preface, I'll explain some of the rationales for this design, describe the structure of this edition in more detail, and give a brief overview of how to use the Python programs shipped in the book examples package.
This Book's Motivation

Over the 10 years since the first edition of this book was written, Python has transitioned from an emerging language that was of interest primarily to pioneers to a widely accepted tool used by programmers for day-to-day development tasks. Along the way, the Python audience has changed as well, and this book has been refocused with this new readership in mind. You will find that it is a nuts-and-bolts text, geared less toward introducing and popularizing the language and more toward showing you how to apply Python for realistically scaled programming tasks.
So, What's Python?

If you are looking for a concise definition of this book's topic, try this: Python is a general-purpose, open source computer programming language optimized for quality, productivity, portability, and integration. It is used by hundreds of thousands of developers around the world in areas such as Internet scripting, systems programming, user interfaces, product customization, and more.

As a popular programming language that shrinks development time, Python is used in a wide variety of products and roles. Counted among its current user base are Google, Industrial Light & Magic, ESRI, the BitTorrent file sharing system, NASA's Jet Propulsion Lab, and the U.S. National Weather Service. Python's application domains range from system administration, web site development, cell phone scripting, and education to hardware testing, investment analysis, computer games, and spacecraft control.

Among other things, Python sports OOP; a remarkably simple, readable, and maintainable syntax; integration with C components; and a vast collection of precoded interfaces and utilities. Its tool set makes it a flexible and agile language, ideal for both quick tactical tasks and longer-range strategic application development efforts. Although it is a general-purpose language, Python is often called a scripting language because it makes it easy to utilize and direct other software components. Perhaps Python's best asset is simply that it makes software development more rapid and enjoyable. To truly understand how, read on; we'll expand on these concepts in the next chapter.
Since writing the first edition, I have also had the opportunity to teach Python classes in the U.S. and abroad some 170 times as of mid-2006, and this book reflects feedback garnered from these training sessions. The application domain examples, for instance, reflect interests and queries common among the thousands of students I have introduced to Python. Teaching Python to workers in the trenches, many of whom are now compelled to use Python on the job, also inspired a new level of practicality that you will notice in this book's examples and topics.

Other book examples are simply the result of me having fun programming Python. Yes, fun. One of the most common remarks I hear from Python newcomers is that Python is actually enjoyable to use; it is able to both kindle the excitement of programming among beginners and rekindle that excitement among those who have toiled for years with more demanding tools. When you can code as fast as you can think, programming becomes a very different proposition and feels more like pleasure than work.

As you will see in this book, Python makes it easy to play with advanced but practical tools such as threads, sockets, GUIs, web sites, and OOP, areas that can be both tedious and daunting in traditional languages such as C and C++. It enables things you may not have considered or attempted with other tools. Frankly, even after 14 years as a bona fide Pythonista, I still find programming most enjoyable when I do it in Python. Python is a remarkably productive and flexible language, and witnessing its application firsthand is an aesthetic pleasure. I hope this edition, as much as the two before it, will both demonstrate how to reap Python's productivity benefits and communicate some of the excitement to be found in this rewarding tool.
This Book's Structure

Although code examples are reused throughout the book and later chapters build upon material in earlier chapters (e.g., GUIs), topics in this book are covered fairly independently and are grouped together in different parts. If a particular domain's part doesn't interest you, you can generally skip ahead to a part that does. As a result, it's not too much of a stretch to consider this edition as akin to four or five books in one. Its top-level structure underscores its application-topics focus (see the Table of Contents for a more fine-grained look at the book's structure):
Part I, The Beginning

I start off with an overview of some of the main ideas behind Python and a quick sneak-preview chapter to whet your appetite. The sneak preview doesn't teach much, but it serves as an introduction and demo for some of the topics to come, and as a refresher for core Python concepts such as OOP.
Part II, System Programming

This section explores the system-level interfaces in Python as well as their realistic applications. We'll look at topics such as threading, directory walkers, processes, environment variables, and streams, and we will apply such tools to common system administration tasks such as directory searchers and file splitters.
Part III, GUI Programming

In this section, you'll learn how to build portable GUIs with Python. The Tkinter toolkit is covered from the ground up as you move from basics to techniques to constructing complete programs. You'll build text editors, clocks, and more in this part. GUIs also show up throughout the rest of the book, and they often reuse some of the tools you'll build here.
Part IV, Internet Programming

In this section, you'll learn all about using Python on the Internet. I begin with network basics and sockets, move through client-side tools like FTP and email, and end up using server-side tools to implement interactive web sites. Along the way, I'll contrast different ways to move bits around the Web with Python. You'll code GUI and web-based email programs, for example, to help underscore trade-offs between client- and server-side techniques. A final chapter in this part surveys more advanced toolkits and techniques for Internet-related application development: Zope, Jython, XML, and the like.
Part V, Tools and Techniques

This part is a collection of tool topics that span application domains: database interfaces and object persistence, text and language processing, and data structure implementation. You'll build GUIs here for browsing databases, viewing data structures, and performing calculations.
Part VI, Integration
This part of the book looks at the interfaces available for mixing Python with programs written in C and C++. These interfaces allow Python to script existing libraries and to serve as an embedded customization tool. As you'll see, by combining Python with compiled languages, programs can be both flexible and efficient.
Part VII, The End

Finally, I'll wrap up with a conclusion that looks at some of the implications of Python's scripting role.

Two notes about the structure: first of all, don't let these titles fool you. Although most have to do with application topics, Python language features and general design concepts are still explored along the way, in the context of real-world goals. Secondly, readers who use Python as a standalone tool can safely skip the integration part, though I still recommend a quick glance. C programming isn't nearly as fun or as easy as Python programming is. Yet, because integration is central to Python's role as a scripting tool, a cursory understanding can be useful, regardless of whether you do integrating, scripting, or both.
This Edition's Design

The best way to get a feel for any book is to read it, of course. But especially for people who are familiar with the prior edition, this section summarizes what is new this time around.
It's Been Updated for Python 2.4 (and 2.5)

All of the example code has been upgraded to use the latest features of the Python language and its standard library. Python is still largely compatible with the code in the first two editions, but recent language additions such as nested scopes and list comprehensions simplify many coding tasks. For instance, default arguments are no longer required to pass objects into most lambda expressions, and the new email package greatly simplifies the tasks of parsing and adding email attachments. See the Python changes list earlier in this chapter for more on this subject.

Although the GUI examples in this book required almost no code changes, they have been updated to run on Tk 8.4, the library used by Python 2.4 as its standard portable GUI toolkit. Among other things, the latest Tk allows window icons to be set by the program. Although begun under 2.4, this edition is also compatible with the upcoming Python 2.5 release.
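The lambda point can be sketched in a few lines (make_adder is a hypothetical example for illustration, not code from the book):

```python
def make_adder(n):
    # Before nested scopes (standard as of Python 2.2), the enclosing value
    # had to be smuggled in with a default argument: lambda x, n=n: x + n
    return lambda x: x + n    # the lambda now sees n in the enclosing scope

add5 = make_adder(5)
print(add5(3))                # prints 8
```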
It's Been Reorganized

A few chapters have been moved to make the flow more logical; for example, the sections on files and directories and the PyMailGUI example are now in chapters of their own. In addition, all appendixes were cut (this book is neither a reference nor a Python changes log), and a new initial preview chapter was added to introduce topics explored throughout the book. As mentioned earlier, in deference to space, one second-edition chapter, the one on the PyErrata web site, has been cut in this edition. Its main, unique topics on state retention have been incorporated into other chapters. The original source code for the PyErrata site still appears in the book's examples package, as supplemental reading.[*]
[*] I regret cutting this chapter, but new material was added, and as you can tell, this is already a substantial book. As my first editor, Frank Willison, famously said when the second edition came out, if this book were run over by a truck, it would do damage....
It Covers New Topics

You'll find much-expanded coverage of Zope, the ZODB database, threading tools and techniques including the queue module, SQL interfaces, XML parsing, and more. See the example and topic changes lists provided earlier for additional details. Most of the new or expanded topics are a result of the evolution of common practice in the Python world. While this book doesn't address core language evolution directly (the basics of new language tools such as list comprehensions are the domain of the text Learning Python), it does employ such additions throughout its examples.
It's Still Mostly Platform-Neutral

Except for some C integration examples, the majority of the programs in this edition were developed on Windows XP computers, with an eye toward portability to Linux and other platforms. In fact, some of the examples were born of my desire to provide portable Python equivalents of tools missing on Windows (e.g., file splitters). When programs are shown in action, it's usually on Windows; they are demonstrated on the Linux platform only if they exercise Unix-specific interfaces. This is not meant as a political statement; it is mostly a function of the fact that I wrote this book with Microsoft Word. When time is tight, it's more convenient to run scripts on the same platform as your publishing tools than to frequently switch platforms.

Luckily, because Python has become so portable, the underlying operating system is largely irrelevant to developers. Python, its libraries, and its Tkinter GUI framework work extremely well on all major platforms today. Where platform issues do come into play, though, I've made the examples as platform-neutral as possible, and I point out platform-specific issues along the way. Generally speaking, most of the scripts should work unchanged on common Python platforms. For instance, all the GUI examples were tested on both Windows (ME, XP) and Linux (KDE, Gnome), and most of the command-line and thread examples were developed on Windows but work on Linux too. Because Python's system interfaces are built to be portable, this is easier than it may sound; it's largely automatic.

On the other hand, this book does delve into platform-specific topics where appropriate. For instance, there is coverage of many Windows-specific topics: Active Scripting, COM, program launch options, and so on. Linux and Unix readers will also find material geared toward their platforms: forks, pipes, and the like.
C integration code platform issues

The one place where readers may still catch a glimpse of platform biases is in the Python/C integration examples. For simplicity, the C compilation details covered in this text are still somewhat Unix/Linux-oriented. One can make a reasonable case for such a focus: not only does Linux come with C compilers, but the Unix development environment it provides grew up around that language. On standard Windows, the C code shown in this book will work, but you may need to use different build procedures (they vary per Windows compiler, some of which are very similar to Linux compilers).

In fact, for this third edition of the book, many of the C integration examples were run on the Cygwin system, not on Linux. Cygwin provides a complete, Unix-like environment and library for Windows. It includes C development tools, command-line utilities, and a version of Python that supports Unix tools not present in the standard Windows Python, including process forks and fifos. Unlike Linux, because it runs on Windows, Cygwin does not require a complete operating system installation (see
http://www.cygwin.com). Cygwin has a GPL-style, open source license that requires giving away code (more on this later in the book). If you do not wish to download and install Cygwin, you may have to translate some of the C integration build files for your platform; the standard C development concepts apply, but on standard Windows you'll need to adapt them for your C compiler. O'Reilly has published an outstanding text, Python Programming on Win32, that covers Windows-specific Python topics like this, and it should help address any disparity you may find here.
It's Still Focused for a More Advanced Audience

Becoming proficient in Python involves two distinct tasks: learning the core language itself, and then learning how to apply it in applications. This book addresses the latter (and larger) of these tasks by presenting Python libraries, tools, and programming techniques. Learning Python syntax and datatypes is an important first step, and a prerequisite to this book. Very soon after you've learned how to slice a list, though, you'll find yourself wanting to do real things, like writing scripts to compare file directories, responding to user requests on the Internet, displaying images in a window, reading email, and so on. Most of the day-to-day action is in applying the language, not the language itself. That's what this book is for. It covers libraries and tools beyond the core language, which become paramount when you begin writing real applications. It also addresses larger software design issues such as reusability and OOP, which can be illustrated only in the context of realistically scaled programs. Because it assumes you already know Python, this is a somewhat advanced text; again, if you find yourself lost, you might do well to learn the core language from other resources before returning here.
It's Still Example-Oriented

Although this book teaches concepts before applying them, it still contains many larger working programs that tie together concepts presented earlier in the book and demonstrate how to use Python for realistically scaled tasks. Among them:
PyEdit
A Python/Tk text-file editor object and program
PyView
A photo image and note-file slideshow
PyDraw
A paint program for drawing and moving image objects
PyTree
A tree data structure drawing program
PyClock
A Python/Tk analog and digital clock widget
PyToe
An AI-powered graphical tic-tac-toe program
PyForm
A persistent object table browser
PyCalc
A calculator widget in Python/Tk
PyMailGUI
A Python/Tkinter POP and SMTP email client
PyFtp
A simple Python/Tk file-transfer GUI
PyMailCGI
A web-based email client interface
PyPhoto
A new thumbnail picture viewer with resizing and saves

See the earlier example changes list for more about how some of these have mutated in this edition. Besides the major examples listed here, there are also mixed-mode C integration examples (e.g., callback registration and class object processing); SWIG examples (with and without "shadow" classes for C++); more Internet examples (FTP upload and download scripts, NNTP and HTTP examples, email tools, and socket and select module examples); many examples of Python threads and thread queues; and coverage of Jython, HTMLgen, Zope, COM, XML parsing, and Python ZODB and MySQL database interfaces. In addition, as mentioned earlier, the second edition's PyErrata web site example appears in the examples distribution.
But It's Still Not a Reference Manual

This edition, like the first, is still more of a tutorial than a reference manual (despite sharing a title pattern with a popular Perl reference text). This book aims to teach, not to document. You can use its table of contents and index to track down specifics, and the new structure helps make this easy to do. But this edition is still designed to be used in conjunction with, rather than to replace, Python reference manuals. Because Python's manuals are free, well written, available online, and change frequently, it would be folly to devote space to parroting their content. For an exhaustive list of all tools available in the Python system, consult other books (e.g., O'Reilly's Python Pocket Reference and Python in a Nutshell) or the standard manuals at Python's web site (see http://www.python.org/doc).
Using the Book's Examples

Because examples are central to the structure of this book, I want to briefly describe how to use them here. In general, though, see the following text files in the examples directory for more details:
README-root.txt
Package structure notes
PP3E\README-PP3E.txt
General usage notes

Of these, the README-PP3E.txt file is the most informative. In addition, the PP3E\Config directory contains low-level configuration file examples for Windows and Linux, which may or may not be applicable to your usage. I give an overview of some setup details here, but the preceding files give the complete description.
The Book Examples Tree

In a sense, the directory containing the book's examples is itself a fairly sophisticated Python software system, and the examples within it have been upgraded structurally in a number of important ways:
Examples directory tree: a package
The entire examples distribution has been organized as one Python module package to facilitate cross-directory imports and avoid name clashes with other Python code installed on your computer. All cross-directory imports in book examples are package imports, relative to the examples root directory.
Using directory paths in import statements (instead of a complex PYTHONPATH) also tends to make it easier to tell where modules come from. Moreover, you now need to add only one directory to your PYTHONPATH search-path setting for the entire book examples tree: the directory containing the PP3E examples root directory. To reuse code in this book within your own applications, simply import through the PP3E package root (e.g., from PP3E.Launcher import which, or import PP3E.Gui.Tools.threadtools).
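To make the package-import scheme concrete, here is a small self-contained sketch. The DemoPkg directory tree it builds on the fly is a hypothetical stand-in for the book's PP3E tree, not part of the distribution:

```python
# Sketch of cross-directory package imports relative to an examples root.
# DemoPkg, Tools, and helpers are hypothetical names used for illustration.
import os
import sys
import tempfile

root = tempfile.mkdtemp()                       # stand-in for the examples root dir
pkg = os.path.join(root, "DemoPkg", "Tools")
os.makedirs(pkg)

# A directory becomes a package when it contains an __init__ module
open(os.path.join(root, "DemoPkg", "__init__.py"), "w").close()
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "helpers.py"), "w") as f:
    f.write("def which():\n    return 'found'\n")

sys.path.append(root)                           # one entry, like one PYTHONPATH dir
from DemoPkg.Tools.helpers import which         # package import across directories
print(which())                                  # prints: found
```

The same idea scales to the book's tree: with the directory containing PP3E on your module search path, an import such as from PP3E.Launcher import which resolves the same way.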
Example filenames
Module names are now descriptive and of arbitrary length (I punted on 8.3 DOS compatibility long ago), and any remaining all-uppercase filenames are long gone.
Example listing titles
Labels of example listings give the full directory pathname of the example's source file to help you locate it in the examples distribution. For instance, an example source-code file whose name is given as Example N-M: PP3E\Internet\Ftp\sousa.py refers to the file sousa.py in the PP3E\Internet\Ftp subdirectory of the examples directory. The examples directory is the directory containing the top-level PP3E directory of the book examples tree. The examples tree is simply the Examples directory of the book examples distribution, described further in the next section.
Example command lines
Similarly, command lines give their directory context. For example, when a command line is shown typed after a system prompt, as in ...\PP3E\System\Streams>, it is really to be typed at a system command-line prompt, while working in the PP3E\System\Streams subdirectory in your examples directory. Unix and Linux users: think / when you see \ in filename paths.
Example launchers
Because it's just plain fun to click on things right away, there are also self-configuring demo launcher programs (described later), to give you a quick look at Python scripts in action with minimal configuration requirements. You can generally run them straight from the examples package without any configuration.
The Book Examples Distribution Package
You can find the book examples distribution package on the book's web page at O'Reilly's web site, http://www.oreilly.com/catalog/python3/. The book examples directory is located in the PP3E subdirectory of the topmost Examples directory in the package; that is, Examples\PP3E on Windows and Examples/PP3E on Linux. If you've copied the examples to your machine, the examples directory is wherever you copied the PP3E root directory. Example titles reflect this tree's structure. For instance, an example title of PP3E\Preview\mod.py refers to the Examples\PP3E\Preview\mod.py file at the top level of the book examples distribution package. You can run most of the examples from within the package directly, but if you obtained them on a CD, you'll want to copy them to a writable medium such as your hard drive to make changes, and to allow Python to save .pyc compiled bytecode files for quicker startups. See the example package's
top-level README file for more details, or browse the examples directory in your favorite file explorer for a quick tour. Depending on how often the book's distribution package is maintained, it may also contain extra open source packages such as the latest releases of Python, the SWIG code generator, and Windows extensions, but you can always find up-to-date releases of Python and other packages on the Web (see Python's web site, http://www.python.org, or search the Web). In fact, you should; most likely, the Web will very quickly become more current than any extra software included in the book's package.
Running Examples: The Short Story
Now the fun stuff: if you want to see some Python examples right away, do this:
1. Install Python from the book's distribution package or from Python's web site (http://www.python.org), unless it is already present on your computer. If you use Linux or a recent Macintosh, Python is probably already installed. On Windows, click on the name of the Python self-installer program and do a default install (click Yes or Next in response to every prompt). On other systems, see the README file.
2. Start one of the following self-configuring scripts located in the top-level Examples\PP3E directory of the book examples package. Either click on their icons in your file explorer, or run them from your system prompt (e.g., a Windows console box, or Linux xterm) using command lines of the form python scriptname (you may need to use the full path to python if it's not implicit on your system):
Launch_PyDemos.pyw The main Python/Tk demo launcher toolbar
Launch_PyGadgets_bar.pyw A Python/Tk utilities launcher bar
Launch_PyGadgets.py Starts the standard Python/Tk utilities
LaunchBrowser.py Opens the web examples index page in your web browser
The Launch_* scripts start Python programs portably[*] and require only that Python be installed; you don't need to set any environment variables first to run them. LaunchBrowser will work if it can find a web browser on your machine even if you don't have an Internet link (though some Internet examples won't work completely without a live link). [*]
All the demo and launcher scripts are written portably but are known to work only on Windows and Linux at the time of this writing; they may require minor changes on other platforms. Apologies if you're using a platform that I could not test: Tk runs on Windows, X Windows, and Macs; Python itself runs on everything from PDAs, iPods, and cell phones to real-time systems, mainframes, and
supercomputers; and my advance for writing this book wasn't as big as you may think.
The demo launchers also include a number of web-based programs that use a web browser for their interface. When run, these programs launch a locally running web server coded in Python (we'll meet this server script later in this book). Although these programs can run on a remote server too, they still require a local Python installation to be used with a server running on your machine.
Running Examples: The Details
This section goes into a few additional details about running the book's example programs. If you're in a hurry, feel free to skip this and run the programs yourself now.
Demos and gadgets
To help organize the book's examples, I've provided a demo launcher program GUI, PyDemos2.pyw, in the top-level PP3E directory of the examples distribution. Figure P-1 shows PyDemos in action on Windows after pressing a few buttons. We'll meet in this text all the programs shown in the figure. The launcher bar itself appears on the top right of the screen; with it, you can run most of the major graphical examples in the book with a mouse click, and view their source code in pop-up windows. The demo launcher bar can also be used to start major Internet book examples if a browser can be located on your machine and a Python-coded server can be started.
Figure P-1. The PyDemos launcher with gadgets and demos
Besides launching demos, the PyDemos source code provides pointers to major examples in the examples tree; see its code for details. You'll also find automated build scripts for the Python/C integration examples in the Integration examples directory, which serve as indexes to major C examples. I've also included a top-level program called PyGadgets.py, and its relative, PyGadgets_bar.pyw, to launch some of the more useful GUI book examples for real use instead of demonstration (mostly, the programs I use; configure as desired). Run PyGadgets_bar to see how it looks; it's a simple row of buttons that pop up many of the same programs shown in Figure P-1, but for everyday use, not for demonstrations. All of its programs are presented in this book as well and are included in the examples distribution package. See the end of Chapter 10 for more on PyDemos and PyGadgets.
Setup requirements
Most demos and gadgets require a Python with Tkinter GUI support, but that is the default configuration for Python out-of-the-box on Windows. As a result, most of the demos and gadgets should "just work" on Windows. On some other platforms, you may need to install or enable Tkinter for your Python; try it and see: if you get an error about Tkinter not being found, you'll need to configure it. If it's not already present, Tkinter support can be had freely on the Web for all major platforms (more on this in the GUI part of this book, but search the Web with Google for quick pointers).
Two external dependency notes: PyPhoto will not run without PIL, and PyMailCGI runs without PyCrypto but uses it if installed. Both PIL and PyCrypto are open source third-party extension packages, but must be installed in addition to Python. Some book examples use additional third-party tools (for instance, ZODB and MySQL in the database chapter), but these are not launched from the demos and gadgets interfaces. To run the files listed in the preceding section directly, you'll also need to set up your Python module search path, typically with your PYTHONPATH environment variable or a .pth file. The book examples tree ships as a simple directory and does not use Python's Distutils scripts to install itself in your Python's site-packages directory (this system works well for packaged software, but can add extra steps for viewing book examples). If you want to run a collection of Python demos from the book right away, though, and you don't want to bother with setting up your environment first, simply run these self-launching utility scripts in the PP3E directory instead:
Launch_PyDemos.pyw
Launch_PyGadgets_bar.pyw
Launch_PyGadgets.py
These Python-coded launcher scripts assume Python has already been installed, but will automatically find your Python executable and the book examples distribution and set up your Python module and system search paths as needed to run their programs. You can probably run these launcher scripts by simply clicking on their names in a file explorer, and you should be able to run them directly from the book's examples package tree (you can read more about these scripts in Part II of the book).
Web-based examples
Beginning with this edition of the book, its browser-based Internet examples are not installed on a remote server. Instead, we'll be using a Python-coded web server running locally to test these examples. If you launch this server, though, you can test-drive the browser-based examples too. You can find more on this in the Internet section of this book. For a quick look, though, PyDemos attempts to launch both a web server and a web browser on your machine automatically for the major example web pages. You start the browser by running the LaunchBrowser.py script in the examples root directory. That script tries to find a usable browser on your machine, with generally good results; see the script for more details if it fails. The server is implemented by a Python script, assuming you have permission to run an HTTP server on your machine (you generally do on Windows). Provided the server starts and LaunchBrowser can find a browser on your machine, some demo buttons will pop up web pages automatically. Figure P-2, for example, shows the web examples index page running under a local server and the Firefox browser.
Figure P-2. The PyInternetDemos web page
Clicking this page's links runs various server-side Python CGI scripts presented in the book. Of special interest, the getfile.html link on this page allows you to view the source code of any other file in the book's web server directory: HTML code, Python CGI scripts, and so on; see Chapter 16 for details.
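The local-server-plus-browser arrangement can be sketched roughly as follows, using Python 3 module names (code of the book's Python 2.5 era spells these BaseHTTPServer, urllib, and so on); the book's actual server and LaunchBrowser scripts are more elaborate than this:

```python
# Minimal sketch: serve the current directory over HTTP locally, then
# fetch a page the way a browser would. Not the book's actual scripts.
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.request import urlopen

server = HTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)   # port 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = urlopen("http://127.0.0.1:%d/" % port).read()   # stand-in for a browser request
print(len(reply) > 0)                                    # prints: True

# A launcher script would hand the same URL to a real browser instead:
# import webbrowser; webbrowser.open("http://127.0.0.1:%d/" % port)
server.shutdown()
```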
Top-level programs
To summarize, here is what you'll find in the top-level Examples\PP3E directory of the book's examples package:
PyDemos.pyw Button bar for starting major GUI and Internet examples in demo mode
PyGadgets_bar.pyw
Button bar for starting GUIs in PyGadgets on demand
PyGadgets.py Starts programs in nondemo mode for regular use
Launch_*.py* Starts the PyDemos and PyGadgets programs using Launcher.py to autoconfigure search paths (run these for a quick look)
LaunchBrowser.py Opens example web pages with an automatically located web browser
Launcher.py Utility used to start programs without environment settings; it finds Python, sets PYTHONPATH, and spawns Python programs
You'll also find subdirectories for examples from each major topic area of the book. In addition, the top-level PP3E\PyTools directory contains Python-coded command-line utilities for converting line feeds in all example text files to DOS or Unix format (useful if they look odd in your text editor); making all example files writable (useful if you drag-and-drop off a CD on some platforms); deleting old .pyc bytecode files in the tree; and more. Again, see the example directory's README-PP3E.txt file for more details on all example issues.
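A heavily simplified sketch of what a launcher along these lines must do: pick an interpreter, extend the child's module search path, and spawn the target script. The file and directory names below are illustrative only and are not taken from the book's Launcher.py:

```python
# Illustrative launcher: run a script with an extra directory on its
# import path, the way a self-configuring launcher might.
import os
import subprocess
import sys
import tempfile

def launch(script_path, extra_path_dir):
    env = os.environ.copy()
    env["PYTHONPATH"] = extra_path_dir + os.pathsep + env.get("PYTHONPATH", "")
    # sys.executable is this interpreter; a real launcher may search for one
    return subprocess.call([sys.executable, script_path], env=env)

# Demonstrate with throwaway files: a module in lib/, a script that needs it
root = tempfile.mkdtemp()
lib = os.path.join(root, "lib")
os.makedirs(lib)
with open(os.path.join(lib, "extra.py"), "w") as f:
    f.write("MESSAGE = 'configured'\n")
script = os.path.join(root, "main.py")
with open(script, "w") as f:
    f.write("import extra\nassert extra.MESSAGE == 'configured'\n")

status = launch(script, lib)    # child imports extra via PYTHONPATH
print(status)                   # prints: 0
```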
Conventions Used in This Book
The following font conventions are used in this book:
Italic Used for file and directory names, to emphasize new terms when first introduced, and for some comments within code sections
Constant width Used for code listings and to designate modules, methods, options, classes, functions, statements, programs, objects, and HTML tags
Constant width bold Used in code sections to show user input
Constant width italic
Used to mark replaceables
This icon designates a note related to the nearby text.
This icon designates a warning related to the nearby text.
Safari® Enabled
When you see a Safari® Enabled icon on the cover of your favorite technology book, that means the book is available online through the O'Reilly Network Safari Bookshelf. Safari offers a solution that's better than e-books. It's a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://safari.oreilly.com.
Where to Look for Updates
As before, updates, corrections, and supplements for this book will be maintained at the author's web site, http://www.rmi.net/~lutz. Look for the third edition's link on that page for all supplemental information related to this version of the book. As for the first two editions, I will also be maintaining a log on this web site of Python changes over time, which you should consider a supplemental appendix to this text. O'Reilly's web site, http://www.oreilly.com, also has an errata report system, and you should consider the union of these two lists to be the official word on book bugs and updates.
Contacting O'Reilly
You can also address comments and questions about this book to the publisher:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States and Canada)
707-827-7000 (international/local)
707-829-0104 (fax)
O'Reilly has a web page for this book, which lists errata, examples, and any additional information. You can access this page at:
http://www.oreilly.com/catalog/python3 To comment or ask technical questions about this book, send email to:
[email protected] For more information about books, conferences, software, Resource Centers, and the O'Reilly Network, see the O'Reilly web site at: http://www.oreilly.com
Using Code Examples
This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission. We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Programming Python, Third Edition, by Mark Lutz. Copyright 2006 O'Reilly Media, Inc., 978-0-596-00925-0."
Acknowledgments
In closing, I would like to extend appreciation to a few of the people who helped in some way during all the incarnations of this book project: To this book's first editor, the late Frank Willison, for the early years. To this book's later editors, for tolerating my nondeterministic schedule: Laura Lewin on the second edition, Jonathan Gennick on the third edition, and Mary O'Brien at the end. To the people who took part in a technical review of an early draft of this edition: Fredrik Lundh, Valentino Volonghi, Anna Ravenscroft, and Kyle VanderBeek. To Python creator Guido van Rossum, for making this stuff fun again. To Tim O'Reilly and the staff of O'Reilly, both for producing this book and for supporting open source software in general. To the Python community at large, for quality, simplicity, diligence, and humor. To C++, for frustrating me enough to compel me toward Python; I think I'd rather flip burgers than go back :-). To the thousands of students of the 170 Python classes I have taught so far, for your feedback on Python in general, and its applications. You taught me how to teach. To the scores of readers who took the time to send me comments about the first two editions of
this book. Your opinions helped shape this book's evolution. And finally, a few personal notes of thanks. To all the friends I've met on the training trail, for hospitality. To my mom, for wisdom and courage. To OQO, for toys. To my brothers and sister, for old days. To Guinness, for the beer in Dublin. To Birgit, for inspiration and spleenwurst. And to my children, Michael, Samantha, and Roxanne, for hope. Mark Lutz April 2006 Somewhere in Colorado, or an airport near you
Part I: The Beginning
This part of the book gets things started by introducing the Python language and taking us on a quick tour of some of the most common ways it is applied.
Chapter 1
Here, we'll take a "behind the scenes" look at Python and its world by presenting some of its history, its major uses, and the common reasons people choose it for their projects. This is essentially a management-level, nontechnical introduction to Python.
Chapter 2
This chapter uses a simple example, recording information about people, to briefly introduce some of the major Python application domains we'll be studying in this book. We'll migrate the same example through multiple steps. Along the way, we'll meet databases, GUIs, web sites, and more. This is something of a demo chapter, designed to pique your interest. We won't learn much here, but we'll have a chance to see Python in action before digging into the details. This chapter also serves as a review of some core language ideas you should be familiar with before starting this book, such as data representation and object-oriented programming (OOP). The point of this part of the book is not to give you an in-depth look at Python, but just to let you sample its application. It will also provide you with a grounding in Python's broader goals and purpose.
Chapter 1. Introducing Python
Section 1.1. "And Now for Something Completely Different"
Section 1.2. Python Philosophy 101
Section 1.3. The Life of Python
Section 1.4. Signs of the Python Times
Section 1.5. The Compulsory Features List
Section 1.6. What's Python Good For?
Section 1.7. What's Python Not Good For?
Section 1.8. Truth in Advertising
1.1. "And Now for Something Completely Different"
This book is about using Python, an easy-to-use, flexible, object-oriented, mature, popular, and open source[*] programming language designed to optimize development speed. Although it is completely general purpose, Python is often called a scripting language, partly because of its sheer ease of use and partly because it is commonly used to orchestrate or "glue" other software components in an application. Python is also commonly known as a high-level language, because it automates most low-level tasks that programmers must handle manually in traditional languages such as C. [*]
Open source systems are sometimes called freeware, in that their source code is freely distributed and community controlled. Don't let that concept fool you, though; with roughly 1 million users in that community today, Python is very well supported. For more information on open source, see http://opensource.org.
If you are new to Python, chances are you've heard about the language somewhere but are not quite sure what it is about. To help you get started, this chapter provides a general introduction to Python's features and roles. Most of it will make more sense once you have seen real Python programs, but let's first take a quick pass over the forest before wandering among the trees. In this chapter, we'll explore Python's philosophy, its history, and some of its most prominent benefits and uses, before digging into the details.
1.2. Python Philosophy 101
In the Preface, I mentioned that Python emphasizes concepts such as quality, productivity, portability, and integration. Since these four terms summarize most of the reasons for using Python, I'd like to define them in a bit more detail.
Software quality
Python makes it easy to write software that can be understood, reused, and modified. It was deliberately designed to raise development quality expectations in the scripting world. Python's clear syntax and coherent design, for example, almost force programmers to write readable code, a critical feature for software that may be changed or reused by others in the future. Of equal importance, because the Python language tries to do better, so too do Python developers and the Python community at large. In the Python world, one finds a refreshing focus on quality concepts such as simplicity, explicitness, and readability, ideas often given little more than a passing glance in some camps. (For more on this Python-inspired mindset, see the sidebar "The Python 'Secret Handshake'," near the end of this chapter.) The Python language really does look like it was designed and not accumulated. It has an orthogonal, explicit, and minimalist design that makes code easy to understand and easy to predict. Python approaches complexity by providing a simple core language and splitting application-specific tools into a large set of modular library components. As a popular slogan attests, the result is that Python "fits your brain": it's possible to use the language without constantly flipping through reference manuals. This design makes Python ideal as a customization language for nonexperts. Perhaps most important is that by limiting the number of possible interactions in your code, Python reduces both program complexity and the potential for bugs. Besides being well designed, Python is also well tooled for modern software methodologies such as structured, modular, and object-oriented design, which allow code to be written once and reused many times. In fact, due to the inherent power and flexibility of the language, writing high-quality Python components that may be applied in multiple contexts is almost automatic.
Developer productivity
Python is optimized for speed of development. It's easy to write programs fast in Python, because the interpreter handles details you must code explicitly in more complex, lower-level languages. Things such as type declarations, storage layout, memory management, common task implementations, and build procedures are nowhere to be found in Python scripts. In fact, programs written in Python are typically one-third to one-fifth as large as they would be in a language like C++ or Java, and these ratios directly correlate to improved programmer speed. Because of Python's high-level design, Python developers have less to code, less to debug, and less to maintain.
The result is a remarkably flexible and agile language, useful for both quick tactical tasks such as testing and system administration, as well as larger and long-term strategic projects employing design and analysis tools. Today, developers use Python for everything from five-line scripts to systems composed of more than 1 million lines of Python code (including IronPort's email security products suite). Its tool set allows it to scale up as needed. In both modes, Python programmers gain a crucial development speed advantage because of the language itself, as well as its library of precoded tools. For instance, the lack of type declarations alone accounts for much of the conciseness and flexibility of Python code: because code is not restricted to a specific type, it is generally applicable to many types. Any object with a compatible interface will do. And although Python is dynamically typed (types are tracked automatically instead of being declared), it is still strongly typed: every operation is sanity checked as your program runs. Odd type combinations are errors in Python, not invocations of arbitrary magic. But fast initial development is only one component of productivity. In the real world, programmers must write code both for a computer to execute and for other programmers to read and maintain. Because Python's syntax resembles executable pseudocode, it yields programs that are easy to understand, change, and use long after they have been written. In addition, Python supports (but does not impose) advanced code reuse paradigms such as object-oriented programming, which further boost developer productivity and shrink development time.
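The typing claims above are easy to demonstrate in a few lines: a generic function works on any objects with a compatible interface, while odd mixes raise errors rather than guessing:

```python
# Dynamic typing in action: no declarations, so one function serves
# many object types, as long as they support the operations used.
def total(sequence, start):
    result = start
    for item in sequence:
        result = result + item      # any object with a compatible "+" will do
    return result

print(total([1, 2, 3], 0))          # prints: 6
print(total(["sp", "am"], ""))      # prints: spam

# But Python is still strongly typed: mismatches are errors, not magic
try:
    total([1, 2, 3], "")            # str + int is rejected at runtime
except TypeError:
    print("type mismatch rejected")
```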
Program portability
Most Python programs run without modification on nearly every computer system in use today: on Windows, Linux, Macintosh, and everything from IBM mainframes and Cray supercomputers to real-time systems and handheld PDAs. Python programs even run on more exotic devices such as game consoles, cell phones, and the Apple iPod. Although some platforms offer nonportable extensions, the core Python language and libraries are largely platform neutral and provide tools for dealing with platform differences when they arise. For example, most Python scripts developed on Windows, Linux, or Macintosh will generally run on the other two platforms immediately; simply copy the script's source code over to the other platforms. Moreover, a GUI program written with Python's standard Tkinter library will run on the X Windows system, Microsoft Windows, and the Macintosh, with native look-and-feel on each and without modifying the program's source code. Alternative toolkits such as wxPython and PyQt offer similar GUI portability.
Component integration
Python is not a closed box: it is also designed to be integrated with other tools. Programs written in Python can be easily mixed with and can script (i.e., direct) other components of a system. This makes Python ideal as a control language and as a customization tool. When programs are augmented with a Python layer, their end users can configure and tailor them, without shipping the system's entire source code. More specifically, today Python scripts can call out to existing C and C++ libraries; use Java classes; integrate with COM, .NET, and CORBA components; communicate with other components over network protocols such as sockets, HTTP, XML-RPC, and SOAP; and more. In addition, programs written in other languages can just as easily run Python scripts by calling C and Java API functions, accessing Python-coded COM and network servers, and so on. Python allows developers to open up their products to customization in a variety of ways.
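As one small, concrete taste of the integration story, here is a sketch of a Python-coded XML-RPC server and client talking over HTTP. For demonstration both run in a single process, and the module names are Python 3's (code of the book's 2.5 era uses SimpleXMLRPCServer and xmlrpclib instead):

```python
# One process hosts a tiny XML-RPC service; another component (here,
# just a client object in the same process) calls it over HTTP.
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda x, y: x + y, "add")   # expose one function
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy("http://127.0.0.1:%d" % server.server_address[1])
result = proxy.add(2, 3)    # remote call, marshaled as XML over HTTP
print(result)               # prints: 5
server.shutdown()
```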
In an era of increasingly short development schedules, faster machines, and heterogeneous applications, these strengths have proven to be powerful allies to hundreds of thousands of developers, in both small and large development projects. Naturally, there are other aspects of Python that attract developers, such as its simple learning curve for developers and users alike, vast libraries of precoded tools to minimize upfront development, and a completely free nature that cuts product development and deployment costs. Python's open source nature, for instance, means that it is controlled by its users, not by a financially vested company. To put that more forcefully, because Python's implementation is freely available, Python programmers can never be held hostage by a software vendor. Unlike commercial tools, Python can never be arbitrarily discontinued. Access to source code liberates programmers and provides a final form of documentation. At the end of the day, though, Python's productivity focus is perhaps its most attractive and defining quality. As I started writing the second edition of this book in the Internet bubble era of 1999, the main problem facing the software development world was not just writing programs quickly, but finding developers with the time to write programs at all. As I write this third edition in the postboom era of 2005, it is perhaps more common for programmers to be called on to accomplish the same tasks as before, but with fewer resources. In both scenarios, developers' time is paramount; in fact, it's usually much more critical than raw execution speed, especially given the speed of today's computers. As a language optimized for developer productivity, Python seems to be the right answer to the questions asked by the development world. It allows programmers to accomplish more in less time.
Not only can Python developers implement systems quickly, but the resulting systems will be reusable, maintainable by others, portable across platforms, and easily integrated with other application components.
Why Not Just Use C or C++?
I'm asked this question quite often, and if you're new to the scripting languages domain, you might be puzzling over this question yourself. After all, C runs very fast and is widely available. So how did Python become so popular? The short story, one we'll see in action firsthand in this book, is that people use scripting languages rather than compiled languages like C and C++ because scripting languages are orders of magnitude easier and quicker to use. Python can be used in long-term strategic roles too, but unlike compiled languages, it also works well in quick, tactical mode. As an added benefit, the resulting systems you build are easier to change and reuse over time. This is especially true in the web domain, for example, where text processing is so central, change is a constant, and development speed can make or break a project. In domains like these: Python's string objects and pattern matching make text processing a breeze: there is no need to limit the size of strings, and tasks like searching, splitting, concatenation, and slicing are trivial. In C, such tasks can be tedious, because everything is constrained by a type and a size. Python's general support for data structures helps here too: you just type a complex
nested dictionary literal, for example, and Python builds it. There is no need to lay out memory, allocate and free space, and so on. The Python language itself is much simpler to code. Because you don't declare types, for instance, your code not only becomes shorter, but also can be applied and reused in a much wider range of contexts. When there is less to code, programming is quicker. And the runtime error checking provided by scripting languages like Python makes it easier to find and fix bugs. Just as important is that a vast collection of free, web-related software is available for Python programmers to use: everything from the client and server-side protocol modules in the standard library, to third-party web application frameworks such as Zope, Plone, CherryPy, Django, and Webware. These greatly simplify the task of building enterprise-level web sites. In other domains, the same factors apply but with different available tool sets. In fact, after you use Python for awhile, you'll probably find that it enables things that you would have never considered doing in a compiled language because they would have been too difficult. Network scripting, GUIs, multitasking, and so on, can be cumbersome in C but are easy in Python. The bottom line is that C is just too complex, rigid, and slow, especially for web work. In such a dynamic domain, you need the flexibility and rapid development of a scripting language like Python. Compiled languages can run faster (depending on the sort of code you run), but speed of development tends to overshadow speed of execution on the Web. You should be warned, though: once you start using Python, you may never want to go back.
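The sidebar's claims about strings and data structures are easy to see in a few lines; the record below is a made-up example, not one from the book:

```python
# Unbounded strings with trivial split/join/slice operations...
line = "Lutz,Mark,Colorado"
fields = line.split(",")            # no declared sizes or types anywhere
print(fields[1])                    # prints: Mark
print("-".join(fields)[:9])         # prints: Lutz-Mark

# ...and a nested dictionary literal built with no memory management at all
record = {
    "name": {"first": "Mark", "last": "Lutz"},
    "jobs": ["trainer", "author"],
}
record["jobs"].append("consultant")
print(record["name"]["last"], len(record["jobs"]))   # prints: Lutz 3
```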
1.3. The Life of Python

Python was invented around 1990 by Guido van Rossum, when he was at CWI in Amsterdam. It is named after the BBC comedy series Monty Python's Flying Circus, of which Guido is a fan (see this chapter's sidebar "What's in a Name?"). Guido was also involved with the Amoeba distributed operating system and the ABC language. In fact, his original motivation for creating Python was to create an advanced scripting language for the Amoeba system. Moreover, Python borrowed many of the usability-study-inspired ideas in ABC, but added practicality in the form of libraries, datatypes, external interfaces, and more.

The net effect was that Python's design turned out to be general enough to address a wide variety of domains. It is now used in increasingly diverse roles by hundreds of thousands of engineers around the world. Companies use Python today in commercial products for tasks as diverse as web site construction, hardware testing, numeric analysis, customizing C++ and Java class libraries, movie animation, and much more (more on roles in the next section). In fact, because Python is a completely general-purpose language, its target domains are limited only by the scope of computers in general.

Since it first appeared on the public domain scene in 1991, Python has continued to attract a loyal following, and in 1994 it spawned a dedicated Internet newsgroup, comp.lang.python. As the first edition of this book was being written in 1995, Python's home page debuted on the Web at http://www.python.org, still the official place to find all things Python. A supplemental site, the Vaults of Parnassus, serves as a library of third-party extensions for Python application development (see http://www.vex.net/parnassus). More recently, the Python Package Index site (PyPI, at http://www.python.org/pypi, also known as the "Python Cheese Shop") began providing a comprehensive and automated catalog of third-party Python packages.
To help manage Python's growth, organizations that are aimed at supporting Python developers have taken shape over the years: among them, the now defunct Python Software Activity (PSA) was formed to help facilitate Python conferences and web sites, and the Python Consortium was formed by organizations interested in helping to foster Python's growth. More recently, the Python Software Foundation (PSF) was formed to own the intellectual property of Python and coordinate community activities, and the Python Business Forum (PBF) nonprofit group addresses the needs of companies whose businesses are based on Python. Additional resources are available for Python training, consulting, and other services. Today, Guido is employed by Google, the web search-engine maker and a major Python user, and he devotes a portion of his time to Python. A handful of key Python developers are also employed by Zope Corporation, home to the Python-based Zope web application toolkit (see http://www.zope.org and Chapter 18; Zope is also the basis of the Plone web content management system). However, the Python language is owned and managed by an independent body, and it remains a true open source, community-driven, and self-organizing system. Hundreds, if not thousands, of individuals contribute to Python's development, following a now formal Python Enhancement Proposal (PEP) procedure and coordinating their efforts online. Other companies have Python efforts underway as well. For instance, ActiveState and PythonWare develop Python tools, O'Reilly (the publisher of this book) and the Python community organize annual Python conferences (OSCON, PyCon, and EuroPython), and O'Reilly manages a supplemental Python web site (see the O'Reilly Network's Python DevCenter at http://www.oreillynet.com/python). Although the world of professional organizations and companies changes more frequently than do
published books, the Python language will undoubtedly continue to meet the needs of its user community.
1.4. Signs of the Python Times

It's been an exciting decade in the Python world. Since I wrote the first edition of this book in 1995 and 1996, Python has grown from a new kid on the scripting-languages block to an established and widely used tool in companies around the world. In fact, today the real question is not who is using Python, but who is not. Python is now used in some fashion in almost every software organization, whether as a tactical tool for quick tasks or an implementation language for longer-range strategic projects.

Although measuring the popularity of an open source, freely distributed tool such as Python is not always easy (there are no licenses to be tallied), most available statistics reveal exponential growth in Python's popularity over the last decade. Among the most recent signs of Python's explosive growth are:
Users

In 1999, one leading industry observer suggested that, based on various statistics, there were as many as 300,000 Python users worldwide. Other estimates are still more optimistic. In early 2000, for instance, the Python web site was already on track to service 500,000 new Python interpreter downloads by year end in addition to other Python distribution media. Python is also a standard preinstalled item on Linux, Macintosh, and some Windows computers today and is embedded in various applications and hardware.

Today, the best estimates, based on developer surveys and network activity, suggest that there are likely between 750,000 and 1 million Python users worldwide. A better estimate is impossible because of Python's open source nature, but Python clearly enjoys a large and active user community.
Applications

Real organizations have adopted Python and Python-focused systems for real projects. It has been used to:

Animate movies (Industrial Light & Magic, Sony Pictures Imageworks, Disney, Pixar)
Perform searches on the Internet (Google, Infoseek)
Script GIS mapping products (ESRI)
Distribute content downloads on the Internet (BitTorrent)
Predict the weather (U.S. National Weather Service, NOAA)
Test computer hardware (Seagate, Intel, Hewlett-Packard, Micron, KLA)
Do numeric analysis (NASA, Los Alamos National Laboratory, Lawrence Livermore National Laboratory, Fermi)
Perform cryptography and stock market analysis (NSA, Getco)
Script games and graphics (Origin, Corel, Blender, PyGame)
Navigate spacecraft and control experiments (Jet Propulsion Laboratory)
Serve up maps and directories on the Web (Yahoo!)
Guide users through Linux installation and maintenance (Red Hat)
Implement web sites and content (Disney, JPL, Zope, Plone, Twisted)
Design missile defense systems (Lockheed Martin)
Manage mail lists (Mailman)
Deliver eGreeting cards (American Greetings)
Implement Personal Information Managers (Chandler)
...and much more.[*]

Some of the Python-based systems in the preceding list are very popular in their own right. For example, the widely used Google search engine, arguably responsible for much of the Web's success, makes heavy use of the Python language and is likely the most successful server-side application of Python so far. And in the latest release of its popular ArcGIS geographical information system (GIS), ESRI has begun recommending Python as the scripting tool for customization and automation to its reported 1 million licensees. [*]
See http://www.python.org/moin/OrganizationsUsingPython or search Python.org (http://www.python.org/about/success) for more examples of Python-based applications. Some companies don't disclose their Python use for competitive reasons, though many eventually become known when one of their web pages crashes and displays a Python error message in a browser. O'Reilly has also published a list of Python success stories derived from a list of testimonials maintained by people interested in Python advocacy; see the advocacy group's list at http://www.pythonology.com/success.
Of special note, BitTorrent, a distributed file-sharing system written in Python, is likely the most successful client-side Python program to date. It already records 42 million lifetime downloads on SourceForge.net as this chapter is being written, and it is listed as the number three package for all-time top downloads at that site (this does not include the roughly 2 million new downloads per month, or alternative clients that embed the BitTorrent Python backend). In addition, a late 2004 Reuters report noted that more than one-third of the Internet's traffic was based on BitTorrent. Per other reports, BitTorrent accounted for 53 percent of all peer-to-peer (P2P) Internet traffic in mid-2004, and P2P traffic may be two-thirds of all Internet traffic today.
Books

When I started the first edition of this book in 1995, no Python books were available. As I wrote the second edition of this book in 2000, more than a dozen were available, with almost that many more on the way. And as I write this third edition in 2005, far more than 50 Python books are on the market, not counting non-English translations (a simple search for "Python programming" books currently yields 91 hits on Amazon.com). Some of these books are focused on a particular domain such as Windows or the Web, and some are available in German, French, Japanese, and other language editions.
Domains

Python has grown to embrace Microsoft Windows developers, with support for .NET, COM, and
Active Scripting; Java developers, with the Jython Java-based implementation of the language; Mac OS X developers, with integration of tools such as Cocoa and standard inclusion in the Mac OS; and web developers, with a variety of toolkits such as Zope and Plone. As we'll see in this book, the COM support allows Python scripts to be both a server and a client of components and to interface with Microsoft Office products; Active Scripting allows Python code to be embedded in HTML web page code and run on either clients or servers. The Jython system compiles Python scripts to Java Virtual Machine (JVM) code so that they can be run in Java-aware systems and seamlessly integrate Java class libraries for use by Python code. As an open source tool for simplifying web site construction, the Python-based Zope web application framework discussed in this edition has also captured the attention of webmasters and CGI coders. Dynamic behavior in Zope web sites is scripted with Python and rendered with a server-side templating system. By using a workflow model, the Plone web content management system, based on Zope and Python, also allows webmasters to delegate the management of web site content to people who produce the content. Other toolkits, such as Django, Twisted, CherryPy, and Webware, similarly support network-based applications.
Compilers

As I write this third edition, two Python compilers are under development for the Microsoft .NET framework and C# language environment: independent implementations of the Python language that provide seamless .NET integration for Python scripts. For instance, the new IronPython implementation of Python for .NET and Mono compiles Python code for use in the .NET runtime environment (and is currently being developed in part by Microsoft employees). It promises to be a new, alternative implementation of Python, along with the standard C-based Python and the Jython Java-based implementation mentioned in the prior section. Other systems, such as the Psyco just-in-time bytecode compiler and the PyPy project, which may subsume it, promise substantial speedups for Python programs. See this chapter's sidebar "How Python Runs Your Code" for more details on program execution and compilers.
Newsgroup

User traffic on the main Python Internet newsgroup, comp.lang.python, has risen dramatically too. For instance, according to Yahoo! Groups (see http://groups.yahoo.com/group/pythonlist), 76 articles were posted on that list in January 1994 and 2,678 in January 2000, a 35-fold increase. Later months were busier still (e.g., 4,226 articles during June 2000, and 7,675 in February 2003, roughly 275 per day), and growth has been generally constant since the list's inception.

Python Internet newsgroup user traffic, along with all other user-base figures cited in this chapter, is likely to have increased by the time you read this text. But even at current traffic rates, Python forums are easily busy enough to consume the full-time attention of anyone with full-time attention to give. Other online forums, such as weblogs (blogs), host additional Python-oriented discussions.
Conferences

There are now two or more annual Python conferences in the U.S., including the annual PyCon event, organized by the Python community, and the Python conference held as part of the
Open Source Convention, organized by O'Reilly. Attendance at Python conferences roughly doubled every year in their initial years. At least two conferences are also now held in Europe each year, including EuroPython and PythonUK. Furthermore, there is now a PyCon conference in Brazil, and conferences have also been held in other places around the world.
Press

Python is regularly featured in industry publications. In fact, since 1995, Python creator Guido van Rossum has appeared on the cover of prominent tech magazines such as Linux Journal and Dr. Dobb's Journal; the latter publication gave him a programming excellence award for Python. Linux Journal also published a special Python supplement with its May 2000 issue, and a Python-specific magazine, PyZine, was started up recently.
Group therapy

Regional Python user groups have begun springing up at numerous sites in the U.S. and abroad, including Oregon, San Francisco, Washington D.C., Colorado, Italy, Korea, and England. Such groups work on Python-related enhancements, organize Python events, and more.
Services

On the pragmatics front, commercial support, consulting, prepackaged distributions, and professional training for Python are now readily available from a variety of sources. For instance, the Python interpreter can be obtained on CDs and packages sold by various companies (including ActiveState), and Python usually comes prebuilt and free with most Linux and recent Macintosh operating systems. In addition, there are now two primary sites for finding third-party add-ons for Python programming: the Vaults of Parnassus and PyPI (see http://www.python.org for links).
Jobs

It's now possible to make money as a Python programmer (without having to resort to writing large, seminal books). As I write this book, the Python job board at http://www.python.org/Jobs.html lists some 60 companies seeking Python programmers in the U.S. and abroad, in a wide variety of domains. Searches for Python at popular employment sites such as Monster.com and Dice.com yield hundreds of hits for Python-related jobs. And according to one report, the number of Python jobs available in the Silicon Valley area increased 400 percent to 600 percent in the year ending in mid-2005. Not that anyone should switch jobs, of course, but it's nice to know that you can now make a living by using a language that also happens to be fun.
Tools

Python has also played host to numerous tool development efforts. Among the most prominent are the Software Carpentry project, which developed new core software tools in Python; ActiveState, which provides a set of Windows- and Linux-focused Python development products; the Eclipse development environment; and PythonWare, which offers a handful of Python tools.
Education

Python has also begun attracting the attention of educators, many of whom see Python as the "Pascal of the 2000s": an ideal language for teaching programming due to its simplicity and structure. Part of this appeal was spawned by Guido van Rossum's proposed Computer Programming for Everybody (CP4E) project, aimed at making Python the language of choice for first-time programmers worldwide. CP4E itself is now defunct, but an active Python Special Interest Group (SIG) has been formed to address education-related topics. Regardless of any particular initiative's outcome, Python promises to make programming more accessible to the masses. As people grow tired of clicking preprogrammed links, they may evolve from computer users to computer scripters.
1.4.1. Recent Growth (As of 2005, at Least)

As I was writing this third edition, I found that all signs pointed toward continued growth in the Python world:

Python.org traffic had increased 30 percent for the year that ended in March 2005.

PyCon conference attendance essentially doubled, increasing to 400-500 attendees in 2005 compared to 200-300 in 2004.

Python 2.4 was given a Jolt productivity award in early 2005 by Software Development Magazine.

Per a survey conducted by InfoWorld, Python popularity nearly doubled in 2004 (usage by developers grew to 14 percent in late 2004, versus 8 percent in the prior year; another survey in the same period measured Python use to be roughly 16 percent).

Based on the InfoWorld survey and the number of all developers, the Python user base is now estimated to be from 750,000 to 1 million worldwide.

Google, maker of the leading web search engine, launched an open source code site whose initially featured components were mostly Python code.

The IronPython port being developed in part by Microsoft reported an 80 percent performance boost over the standard C-based Python 2.4 release on some benchmarks.

As mentioned, the number of Python jobs available in Silicon Valley has reportedly increased by a factor of 4 to 6.

A web site that automatically tracks the frequency of references to programming languages in online forums found that Python chatter more than doubled between 2004 and 2005. This site also found that among scripting languages, only Python traffic showed the early stages of a rapid growth curve.

According to an article by O'Reilly, industry-wide book sales data shows that the Python book market grew to two-thirds the size of the Perl book market as of April 2005. Two years earlier, the Python book market was approximately one-sixth the size of the Perl book market. (Perl is an older scripting language optimized for text processing tasks, which some see as being in competition with Python for mindshare.)
In other words, it's not 1995 anymore. Much of the preceding list was unimaginable when the first edition of this book was conceived. Naturally, this list is doomed to be out-of-date even before this
book hits the shelves, but it is nonetheless representative of the sorts of milestones that have occurred over the last five years and will continue to occur for years to come. As a language optimized to address the productivity demands of today's software world, Python's best is undoubtedly yet to come.
What's in a Name? Python gets its name from the 1970s British TV comedy series Monty Python's Flying Circus. According to Python folklore, Guido van Rossum, Python's creator, was watching reruns of the show at about the same time he needed a name for a new language he was developing. And as they say in show business, "the rest is history." Because of this heritage, references to the comedy group's work often show up in examples and discussion. For instance, the words spam, lumberjack, and shrubbery have a special connotation to Python users, and confrontations are sometimes referred to as "The Spanish Inquisition." As a rule, if a Python user starts using phrases that have no relation to reality, they're probably borrowed from the Monty Python series or movies. Some of these phrases might even pop up in this book. You don't have to run out and rent The Meaning of Life or The Holy Grail to do useful work in Python, of course, but it can't hurt. While "Python" turned out to be a distinctive name, it has also had some interesting side effects. For instance, when the Python newsgroup, comp.lang.python, came online in 1994, its first few weeks of activity were almost entirely taken up by people wanting to discuss topics from the TV show. More recently, a special Python supplement in the Linux Journal magazine featured photos of Guido garbed in an obligatory "nice red uniform." Python's news list still receives an occasional post from fans of the show. For instance, one poster innocently offered to swap Monty Python scripts with other fans. Had he known the nature of the forum, he might have at least mentioned whether they ran on Windows or Unix.
1.5. The Compulsory Features List

One way to describe a language is by listing its features. Of course, this will be more meaningful after you've seen Python in action; the best I can do now is speak in the abstract. And it's really how Python's features work together that makes Python what it is. But looking at some of Python's attributes may help define it; Table 1-1 lists some of the common reasons cited for Python's appeal.
Table 1-1. Python language features

No manual compile or link steps: Rapid development cycle turnaround
No type declarations: Simpler, shorter, and more flexible programs
Automatic memory management: Garbage collection avoids bookkeeping code and errors
High-level datatypes and operations: Fast development using built-in object types
Object-oriented programming: Code reuse; C++, Java, COM, and .NET integration
Embedding and extending in C: Optimization, customization, legacy code, system "glue"
Classes, modules, exceptions: Modular "programming-in-the-large" support for large-scale projects
A simple, clear syntax and design: Readability, maintainability, ease of learning, less potential for bugs
Dynamic loading of C modules: Simplified extensions, smaller binary files
Dynamic reloading of Python modules: Programs can be modified without stopping
Universal "first-class" object model: Fewer restrictions, code flexibility
Runtime program construction: Handles unforeseen needs, end-user coding
Interactive, dynamic nature: Incremental development and testing
Access to interpreter information: Metaprogramming, introspective objects
Wide interpreter portability: Cross-platform programming without per-program ports
Compilation to portable bytecode: Execution speed, portability
Standard portable GUI framework: Tkinter scripts run on X, Windows, and Macs; alternatives: wxPython, PyQt, etc.
Standard Internet protocol support: Easy access to email, FTP, HTTP, CGI, Telnet, etc.
Standard portable system calls: Platform-neutral system scripting and system administration
Built-in and third-party libraries: Vast collection of precoded software components
True open source software: May be freely embedded and shipped
To be fair, Python is really a conglomeration of features borrowed from other languages and combined into a coherent whole. It includes elements taken from C, C++, Modula-3, ABC, Icon, and others. For instance, Python's modules came from Modula and its slicing operation from Icon (as far as anyone can seem to remember, at least). And because of Guido's background, Python borrows many of ABC's ideas but adds practical features of its own, such as support for C-coded extensions. To many, Python's feature combination seems to be "just right": it combines remarkable power with a readable syntax and coherent design.
1.6. What's Python Good For?

Because Python is used in a wide variety of ways, it's almost impossible to give an authoritative answer to this question. As a general-purpose language, Python can be used for almost anything computers are capable of. Its feature set applies to both rapid and longer-term development modes. And from an abstract perspective, any project that can benefit from the inclusion of a language optimized for speed of development is a good target Python application domain. Given the ever-shrinking schedules in software development, this is a very broad category.

A more specific answer is less easy to formulate. For instance, some use Python as an embedded extension language, and others use it exclusively as a standalone programming tool. To some extent, this entire book will answer this very question: it explores some of Python's most common roles. For now, here's a summary of some of the more common ways Python is being applied today:
System utilities

Portable command-line tools, testing, system administration scripts

Internet scripting

CGI web sites, Java applets, XML, email, Zope/Plone, CherryPy, Webware, Twisted

GUIs

With tools such as Tk, wxPython, Qt, Gtk, PythonCard, Dabo, Swing, Anygui

Distributed programming

With client/server APIs like CORBA, CGI, COM, .NET, SOAP, XML-RPC

Rapid prototyping/development

Tactical run-once programs or deliverable prototypes
Language-based modules
Replacing special-purpose parsers with Python
And more

Image processing, numeric programming, gaming, AI, etc.

On the other hand, Python is not really tied to any particular application area. For example, Python's integration support makes it useful for almost any system that can benefit from a frontend, programmable interface. In abstract terms, Python provides services that span domains. It is all of the things described in the following list.
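The "language-based modules" role deserves a quick illustration: instead of writing a special-purpose parser for a configuration format, Python source can serve as the configuration language itself. The file name and settings below are hypothetical, made up for this sketch:

```python
# Sketch: Python code as a configuration file, replacing a custom parser.
import os
import tempfile

config_source = """
host = 'localhost'
port = 8080
debug = True
"""

# Write the "config file" to a temporary location for the demo.
path = os.path.join(tempfile.mkdtemp(), "settings.py")
with open(path, "w") as f:
    f.write(config_source)

# Executing the file yields a namespace of settings; no parser needed.
settings = {}
with open(path) as f:
    exec(f.read(), settings)

print(settings["host"], settings["port"])   # prints: localhost 8080
```

The full Python language is then available in configuration files for free: comments, arithmetic, conditionals, and so on, which is precisely the appeal of reusing a general language over inventing a "little" one.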
"Buses Considered Harmful" The PSA organization described earlier was originally formed in response to an early thread on the Python newsgroup that posed the semiserious question: "What would happen if Guido was hit by a bus?" The more recent PSF group has been tasked to address similar questions. These days, Python creator Guido van Rossum is still the ultimate arbiter of proposed Python changes. He was officially anointed the BDFLBenevolent Dictator For Lifeof Python, at the first Python conference and still makes final yes and no decisions on language changes (and usually says no: a good thing in the programming languages domain, because Python tends to change slowly and in backward-compatible ways). But Python's user base helps support the language, work on extensions, fix bugs, and so on. It is a true community project. In fact, Python development is now a completely open processanyone can inspect the latest source-code files or submit patches by visiting a web site (see http://www.python.org for details). As an open source package, Python development is really in the hands of a very large cast of developers working in concert around the world. Given Python's popularity, bus attacks seem less threatening now than they once did; of course, I can't speak for Guido.
A dynamic programming language, ideal for situations in which a compile/link step is either impossible (on-site customization) or inconvenient (prototyping, rapid development, system utilities)

A powerful but simple programming language designed for development speed, ideal for situations in which the complexity of larger languages can be a liability (prototyping, end-user coding, time to market)

A generalized language tool, ideal for situations in which we might otherwise need to invent and implement yet another "little language" (programmable system interfaces, configuration tools)

Given these general properties, you can apply Python to any area you're interested in by extending it with domain libraries, embedding it in an application, or using it all by itself. For instance, Python's role as a system tools language is due as much to its built-in interfaces to operating system services as to the language itself.

In fact, because Python was built with integration in mind, it has naturally given rise to a growing
library of extensions and tools, available as off-the-shelf components to Python developers. Table 1-2 names just a few as a random sample (with apologies to the very many systems omitted here). You can find more about most of these components in this book, on Python's web site, at the Vaults of Parnassus and PyPI web sites mentioned earlier in this chapter, and by a simple Google web search.
How Python Runs Your Code

Today, Python is "interpreted" in the same way Java is: Python source code is automatically compiled (translated) to an intermediate and platform-neutral form called bytecode, which is then executed by the Python virtual machine (that is, the Python runtime system). Translation to bytecode happens when a module is first imported, and it is avoided when possible to speed program startup: bytecode is automatically saved in .pyc files and, unless you change the corresponding source file, loaded directly the next time your program runs.

This bytecode compilation model makes Python scripts portable and faster than a pure interpreter that runs raw source code lines. But it also makes Python slower than true compilers that translate source code to binary machine code. Bytecode is not machine code and is ultimately run by the Python (or other) virtual machine program, not directly by your computer's hardware.

Keep in mind, though, that some of these details are specific to the standard Python implementation. For instance, the Jython system compiles Python scripts to Java bytecode, and the IronPython implementation compiles Python source code to the bytecode used by the C#/.NET environment. In addition, Python compiler-related projects have been spawned in the past and will likely continue into the future. For more details on this front, see the following:

The Psyco just-in-time compiler for Python, which replaces portions of a running program's bytecode with optimized binary machine code tailored to specific datatypes. Psyco can speed Python programs by any factor from 2 to 100. The high end is more likely for heavily algorithmic code, whereas I/O-bound programs don't improve as much. (In my own experience, a 3x-5x speedup is common for typical programs, amazing for a simple install.)

A related project, PyPy, which aims to reimplement the Python virtual machine to better support optimizations.
The PyPy project may incorporate and subsume Psyco's techniques.

The Parrot project, which seeks to develop a bytecode and virtual machine that will be shared by many languages, including Python.

The Installer, Py2Exe, and Freeze systems, which package Python programs as standalone executables known as "frozen binaries": a combination of your bytecode and the Python virtual machine. Frozen binaries do not require that Python be installed on the receiving end.

Other program distribution formats, including zip archives (with modules automatically extracted on imports); Python eggs (an emerging package format); Distutils (an installation script system); and encrypted bytecode (for instance, using PyCrypto and the import hooks).

The emerging Shed Skin system, which translates Python source code to C++. This
system assumes that your code will not use the full range of Python's dynamic typing, but this constraint allows highly efficient code to be generated, which is by some accounts faster than Psyco and much faster than standard Python. Shed Skin's own website reports speedups of 12 and 45 times faster on average than Psyco and standard CPython, respectively, though results can vary greatly. Psyco may provide a simpler optimization path for some programs than linked-in C libraries, especially for algorithm-intensive code. Although Python's extreme dynamic nature makes compilation complex (the behavior of "x + 1" cannot be easily predicted until runtime), a future optimizing Python compiler might also make many of the performance notes in this chapter moot points.
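The bytecode model described in this sidebar can be observed directly with standard library tools; this is a minimal sketch using the dis and py_compile modules (the function and module names are invented for illustration):

```python
# dis shows the virtual machine instructions a function compiles to,
# and py_compile performs the same source-to-bytecode step that happens
# automatically on import, saving the result to a .pyc file.
import dis
import os
import py_compile
import tempfile

def add_one(x):
    return x + 1

dis.dis(add_one)        # prints instructions such as LOAD_FAST

# Write a tiny module and compile it to cached bytecode explicitly.
path = os.path.join(tempfile.mkdtemp(), "example.py")
with open(path, "w") as f:
    f.write("print('hello')\n")
pyc_path = py_compile.compile(path)
print(pyc_path)         # location of the saved bytecode file
```

The exact instruction names and .pyc locations vary across Python releases, but the two-step model (source to bytecode, bytecode to the virtual machine) is the same.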
Table 1-2. Popular Python domains, tools, and extensions Domain
Tools and extensions
Systems programming: support for all common system-level tools
1.7. What's Python Not Good For?

To be fair again, some tasks are outside of Python's scope. Like all dynamic interpreted languages, Python, as currently implemented, isn't generally as fast or efficient as static, compiled languages such as C (see the earlier sidebar, "How Python Runs Your Code," for the technical story). At least when nontypical benchmarks are compared line for line, Python code runs more slowly than C code. Whether you will ever care about this difference in execution speed depends upon the sorts of applications you will write.

In many domains, the difference doesn't matter at all; for programs that spend most of their time interacting with users or transferring data over networks, Python is usually more than adequate to meet the performance needs of the entire application by itself. Moreover, most realistic Python programs tend to run very near the speed of the C language anyhow. Because system interactions such as accessing files or creating GUIs are implemented by linked-in C language code in the standard implementation, typical Python programs are often nearly as fast as equivalent C language programs. In fact, because Python programs use highly optimized data structures and libraries, they are sometimes quicker than C programs that must implement such tools manually.

In some domains, however, efficiency is still a main priority. Programs that spend most of their time in intense number crunching, for example, will usually be slower in Python than in fully compiled languages. Because it is interpreted today, Python alone usually isn't the best tool for the delivery of such performance-critical components. Instead, computationally intensive operations can be implemented as compiled extensions to Python and coded in a low-level language such as C. Python can't be used as the sole implementation language for such components, but it works well as a frontend scripting interface to them.
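The point about linked-in C code is easy to demonstrate without writing any extensions: built-ins such as sum run as compiled C inside the standard interpreter, so they typically outrun the equivalent loop written in Python bytecode. A rough illustrative sketch (absolute timings vary by machine):

```python
# Compare a hand-written Python loop against the C-implemented built-in.
import timeit

data = list(range(100000))

def manual_sum(seq):
    total = 0
    for x in seq:        # each iteration runs as Python bytecode
        total += x
    return total

builtin = sum(data)      # the same reduction, implemented in C
manual = manual_sum(data)
assert builtin == manual # identical results, different speeds

loop_time = timeit.timeit(lambda: manual_sum(data), number=20)
c_time = timeit.timeit(lambda: sum(data), number=20)
print("Python loop: %.4fs  built-in sum: %.4fs" % (loop_time, c_time))
```

On most machines the built-in wins by a wide margin, which is why typical Python programs, leaning on such library code, often land much closer to C speed than a line-for-line comparison would suggest.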
For example, numerical programming and image processing support has been added to Python by combining optimized extensions with a Python language interface. In such a system, once the optimized extensions have been developed, most of the programming occurs at the simpler level of Python scripting. The net result is a numerical programming tool that's both efficient and easy to use. The NumPy extension (and its NumArray and ScientificPython relatives), for instance, adds vector processing to Python, turning it into what has been called an open source equivalent to Matlab.

Python can also still serve as a prototyping tool in such domains. Systems may be implemented in Python first and later moved whole or piecemeal into a language such as C for delivery. C and Python have distinct strengths and roles; a hybrid approach using C for compute-intensive modules and Python for prototyping and frontend interfaces can leverage the benefits of both.

In some sense, Python solves the efficiency/flexibility trade-off by not solving it at all. It provides a language optimized for ease of use, along with tools needed to integrate with other languages. By combining components written in Python with compiled languages such as C and C++, developers may select an appropriate mix of usability and performance for each particular application. On a more fundamental level, while it's unlikely that it will ever be as fast as C, Python's speed of development is at least as important as C's speed of execution in most modern software projects.
1.8. Truth in Advertising

In this book's conclusion, after we've had a chance to study Python in action, we will return to some of the bigger ideas introduced in this chapter. I want to point out up front, though, that my background is in computer science, not marketing. I plan to be brutally honest in this book, both about Python's features and about its downsides. Despite the fact that Python is one of the most easy-to-use and flexible programming languages ever created, there are indeed some pitfalls, which we will not gloss over in this book.

Let's start now. One of the first pitfalls you should know about, and a common remark made by Python newcomers, is this: Python makes it incredibly easy to quickly throw together a bad design. For some, it seems a genuine problem. Because developing programs in Python is so simple and fast compared with using traditional languages, it's easy to get wrapped up in the act of programming itself and pay less attention to the problem you are really trying to solve.

If you haven't done any Python development yet, you'll find that it is an incremental, interactive, and rapid experience that encourages experimentation. In fact, Python can be downright seductive: so much so that you may need to consciously resist the temptation to quickly implement a program in Python that works, is loaded with features, and is arguably "cool," but that leaves you as far from a maintainable implementation of your original conception as you were when you started.

The natural delays built into compiled language development (fixing compiler error messages, linking libraries, and the like) aren't there in Python to apply the brakes. In fact, it's not uncommon for a Python program to run the first time you try it; there is much less syntax and there are far fewer procedures to get in your way. This isn't necessarily all bad, of course. In most cases, the early designs that you throw together fast are steppingstones to better designs that you later keep.
That is the nature of prototyping, after all, and often the reality of programming under tight schedules. But you should be warned: even with a rapid development language such as Python, there is no substitute for brains; it's always best to think before you start typing code. To date, at least, no computer programming language has managed to make "wetware" obsolete.
The Python "Secret Handshake"

I've been involved with Python for some 14 years now as of this writing, and I have seen it grow from an obscure language into one that is used in some fashion in almost every development organization. It has been a fun ride. But looking back over the years, it seems to me that if Python truly has a single legacy, it is simply that Python has made quality a more central focus in the development world. It was almost inevitable. A language that requires its users to line up code for readability can't help but make people raise questions about good software practice in general.

Probably nothing summarizes this aspect of Python life better than the standard library's this module, a sort of Easter egg in Python written by Python core developer Tim Peters, which captures much of the design philosophy behind the language. To see this for yourself, go to any Python interactive prompt and import the module (naturally, it's available on all platforms):
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>>
Worth special mention, the "Explicit is better than implicit" rule has become known as "EIBTI" in the Python world: one of Python's defining ideas, and one of its sharpest contrasts with other languages. As anyone who has worked in this field for more than a few years can attest, magic and engineering do not mix. Python has not always followed all of these guidelines, of course, but it comes very close. And if Python's main contribution to the software world is getting people to think about such things, it seems like a win. Besides, it looked great on the T-shirt.
Chapter 2. A Sneak Preview

Section 2.1. "Programming Python: The Short Story"
Section 2.2. The Task
Section 2.3. Step 1: Representing Records
Section 2.4. Step 2: Storing Records Persistently
Section 2.5. Step 3: Stepping Up to OOP
Section 2.6. Step 4: Adding Console Interaction
Section 2.7. Step 5: Adding a GUI
Section 2.8. Step 6: Adding a Web Interface
Section 2.9. The End of the Demo
2.1. "Programming Python: The Short Story"

If you are like most people, when you pick up a book as large as this one, you'd like to know a little about what you're going to be learning before you roll up your sleeves. That's what this chapter is for: it provides a demonstration of some of the kinds of things you can do with Python, before getting into the details. You won't learn much here, and if you're looking for explanations of the tools and techniques applied in this chapter, you'll have to read on to later parts of the book. The point here is just to whet your appetite, review a few Python basics, and preview some of the topics to come.

To do this, I'll pick a fairly simple application task (constructing a database of records) and migrate it through multiple steps: interactive coding, command-line tools, console interfaces, GUIs, and simple web-based interfaces. Along the way, we'll also peek at concepts such as data representation, object persistence, and object-oriented programming (OOP); I'll mention some alternatives that we'll revisit later in the book; and I'll review some core Python ideas that you should be aware of before reading this book. Ultimately, we'll wind up with a database of Python class instances, which can be browsed and changed from a variety of interfaces.

I'll cover additional topics in this book, of course, but the techniques you will see here are representative of some of the domains we'll explore later. And again, if you don't completely understand the programs in this chapter, don't worry; you shouldn't, not yet anyway. This is just a Python demo. We'll fill in the details soon enough. For now, let's start off with a bit of fun.
2.2. The Task

Imagine, if you will, that you need to keep track of information about people for some reason; maybe you want to store an address book on your computer, or perhaps you need to keep track of employees in a small business. For whatever reason, you want to write a program that keeps track of details about these people. In other words, you want to keep records in a database: to permanently store lists of people's attributes on your computer.

Naturally, there are off-the-shelf programs for managing databases like these. By writing a program for this task yourself, however, you'll have complete control over its operation; you can add code for special cases and behaviors that precoded software may not have anticipated. You won't have to install and learn to use yet another database product. And you won't be at the mercy of a software vendor to fix bugs or add new features. You decide to write a Python program to manage your people.
2.3. Step 1: Representing Records

If we're going to store records in a database, the first step is probably deciding what those records will look like. There are a variety of ways to represent information about people in the Python language. Built-in object types such as lists and dictionaries are often sufficient, especially if we don't care about processing the data we store.
2.3.1. Using Lists

Lists, for example, can collect attributes about people in a positionally ordered way. Start up your Python interactive interpreter and type the following two statements (this works in the IDLE GUI, after typing python at a shell prompt, and so on, and the >>> characters are Python's prompt; if you've never run Python code this way before, see an introductory resource such as O'Reilly's Learning Python for help with getting started):
>>> bob = ['Bob Smith', 42, 30000, 'software'] >>> sue = ['Sue Jones', 45, 40000, 'music']
We've just made two records, albeit simple ones, to represent two people, Bob and Sue (my apologies if you really are Bob or Sue, generically or otherwise[*]). Each record is a list of four properties: name, age, pay, and job field. To access these fields, we simply index by position (the result is in parentheses here because it is a tuple of two results):

[*] No, I'm serious. For an example I present in Python classes I teach, I had for many years regularly used the name "Bob Smith," age 40.5, and jobs "developer" and "manager" as a supposedly fictitious database record, until a recent class in Chicago, where I met a student named Bob Smith who was 40.5 and was a developer and manager. The world is stranger than it seems.
>>> bob[0], sue[2]               # fetch name, pay
('Bob Smith', 40000)
Processing records is easy with this representation; we just use list operations. For example, we can extract a last name by splitting the name field on blanks and grabbing the last part, and we may give someone a raise by changing their list in-place:
>>> bob[0].split( )[-1]          # what's bob's last name?
'Smith'
>>> sue[2] *= 1.25               # give sue a 25% raise
>>> sue
['Sue Jones', 45, 50000.0, 'music']
The last-name expression here proceeds from left to right: we fetch Bob's name, split it into a list of substrings around spaces, and index his last name (run it one step at a time to see how).
2.3.1.1. A database list

Of course, what we really have at this point is just two variables, not a database; to collect Bob and Sue into a unit, we might simply stuff them into another list:
>>> people = [bob, sue]
>>> for person in people:
...     print person
...
['Bob Smith', 42, 30000, 'software']
['Sue Jones', 45, 50000.0, 'music']
Now, the people list represents our database. We can fetch specific records by their relative positions and process them one at a time, in loops:
>>> people[1][0]
'Sue Jones'
>>> for person in people:
...     print person[0].split( )[-1]     # print last names
...     person[2] *= 1.20                # give each a 20% raise
...
Smith
Jones
>>> for person in people:
...     print person[2]                  # check new pay
...
36000.0
60000.0
Now that we have a list, we can also collect values from records using some of Python's more powerful iteration tools, such as list comprehensions, maps, and generator expressions:
>>> pays = [person[2] for person in people]      # collect all pay
>>> pays
[36000.0, 60000.0]

>>> pays = map((lambda x: x[2]), people)         # ditto
>>> pays
[36000.0, 60000.0]

>>> sum(person[2] for person in people)          # generator expression sum (2.4)
96000.0
To add a record to the database, the usual list operations, such as append and extend, will suffice:
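For instance, appending one record and extending with several more might look like the following sketch (the extra people here are hypothetical, invented for illustration):

```python
bob = ['Bob Smith', 42, 30000, 'software']
sue = ['Sue Jones', 45, 40000, 'music']
people = [bob, sue]

people.append(['Tom', 50, 0, None])           # add one record in-place
people.extend([['Ann', 35, 60000, 'music'],   # add each record from another list
               ['Ken', 28, 35000, 'hardware']])

print(len(people))       # 5 records now
print(people[2][0])      # the first record we added
```

Both calls change the list in-place, so the database grows without any copying on our part.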
Lists work for our people database, and they might be sufficient for some programs, but they suffer from a few major flaws. For one thing, Bob and Sue, at this point, are just fleeting objects in memory that will disappear once we exit Python. For another, every time we want to extract a last name or give a raise, we'll have to repeat the kinds of code we just typed; that could become a problem if we ever change the way those operations work, since we may have to update many places in our code. We'll address these issues in a few moments.
2.3.1.2. Field labels

Perhaps more fundamentally, accessing fields by position in a list requires us to memorize what each position means: if you see a bit of code indexing a record on magic position 2, how can you tell it is extracting a pay? In terms of understanding the code, it might be better to associate a field name with a field value. We might try to associate names with relative positions by using the Python range built-in function, which builds a list of successive integers:
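A sketch of that idea follows; the three-field record here is assumed to match the slightly simpler records used in the next example:

```python
NAME, AGE, PAY = range(3)        # 0, 1, and 2: one index per field
bob = ['Bob Smith', 42, 10000]

print(bob[NAME])                 # same as bob[0]
print((PAY, bob[PAY]))           # the index and the field it selects
```

The record is still a plain list; only our code's references to it become mnemonic.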
This addresses readability: the three variables essentially become field names. This makes our code dependent on the field position assignments, though; we have to remember to update the range assignments whenever we change record structure. Because they are not directly associated, the names and records may become out of sync over time and require a maintenance step. Moreover, because the field names are independent variables, there is no direct mapping from a record list back to its field's names. A raw record, for instance, provides no way to label its values with field names in a formatted display. In the preceding record, without additional code, there is no path from value 42 to label AGE.

We might also try this by using lists of tuples, where the tuples record both a field name and a value; better yet, a list of lists would allow for updates (tuples are immutable). Here's what that idea translates to, with slightly simpler records:
>>> bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]] >>> sue = [['name', 'Sue Jones'], ['age', 45], ['pay', 20000]] >>> people = [bob, sue]
This really doesn't fix the problem, though, because we still have to index by position in order to fetch fields:
>>> for person in people:
...     print person[0][1], person[2][1]         # name, pay
...
Bob Smith 10000
Sue Jones 20000

>>> [person[0][1] for person in people]          # collect names
['Bob Smith', 'Sue Jones']
>>> for person in people:
...     print person[0][1].split( )[-1]          # get last names
...     person[2][1] *= 1.10                     # give a 10% raise
...
Smith
Jones
>>> for person in people:
...     print person[2]
...
['pay', 11000.0]
['pay', 22000.0]
All we've really done here is add an extra level of positional indexing. To do better, we might inspect field names in loops to find the one we want (the loop uses tuple assignment here to unpack the name/value pairs):
>>> for person in people:
...     for (name, value) in person:
...         if name == 'name':               # find a specific field
...             print value
...
Bob Smith
Sue Jones
Better yet, we can code a fetcher function to do the job for us:
>>> def field(record, label):
...     for (fname, fvalue) in record:       # find any field by name
...         if fname == label:
...             return fvalue
...
>>> field(bob, 'name')
'Bob Smith'
>>> field(sue, 'pay')
22000.0

>>> for rec in people:
...     print field(rec, 'age')              # print all ages
...
42
45
If we proceed down this path, we'll eventually wind up with a set of record interface functions that generically map field names to field data. If you've done any Python coding in the past, you probably already know that there is an easier way to code this sort of association, and you can probably guess where we're headed in the next section.
2.3.2. Using Dictionaries

The list-based record representations in the prior section work, though not without some cost in terms of performance required to search for field names (assuming you need to care about milliseconds and such). But if you already know some Python, you also know that there are more convenient ways to associate property names and values. The built-in dictionary object is a natural:
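A dictionary-based version of the two records might look like the following; the field values here are assumed to mirror the earlier list records and this chapter's later test data ('dev' and 'mus' are abbreviated job names):

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}

print(bob['age'])        # fetch a field by its name, not its position
print(sue['job'])
```

Each record now carries its own field labels, so there is nothing external to keep in sync.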
Now, Bob and Sue are objects that map field names to values automatically, and they make our code more understandable and meaningful. We don't have to remember what a numeric offset means, and we let Python search for the value associated with a field's name with its efficient dictionary indexing:
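For example, the last-name and raise operations from the list version carry over directly, but with named fields; a sketch using the same sample records:

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}

print(bob['name'].split()[-1])   # last name, fetched by key
sue['pay'] *= 1.10               # give sue a 10% raise, in-place
print(sue['pay'])
```

The indexing expressions read like what they do, which is the real payoff of the switch.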
Because fields are accessed mnemonically now, they are more meaningful to those who read your code (including you).
2.3.2.1. Other ways to make dictionaries

Dictionaries turn out to be so useful in Python programming that there are even more convenient ways to code them than the traditional literal syntax shown earlier, e.g., with keyword arguments and the type constructor:
>>> bob = dict(name='Bob Smith', age=42, pay=30000, job='dev') >>> bob {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
Other Uses for Lists

Lists are convenient any time we need an ordered container of other objects that may need to change over time. A simple way to represent matrixes in Python, for instance, is as a list of nested lists: the top list is the matrix, and the nested lists are the rows:
>>> M = [[1, 2, 3],              # 3x3, 2-dimensional
...      [4, 5, 6],
...      [7, 8, 9]]

>>> N = [[2, 2, 2],
...      [3, 3, 3],
...      [4, 4, 4]]
Now, to combine one matrix's components with another's, step over their indexes with nested loops; here's a simple pairwise multiplication:
>>> for i in range(3):
...     for j in range(3):
...         print M[i][j] * N[i][j],
...     print
...
2 4 6
12 15 18
28 32 36
To build up a new matrix with the results, we just need to create the nested list structure along the way:
>>> tbl = []
>>> for i in range(3):
...     row = []
...     for j in range(3):
...         row.append(M[i][j] * N[i][j])
...     tbl.append(row)
...
>>> tbl
[[2, 4, 6], [12, 15, 18], [28, 32, 36]]
Nested list comprehensions such as either of the following will do the same job, albeit at some cost in complexity (if you have to think hard about expressions like these, so will the next person who has to read your code!):
[[M[i][j] * N[i][j] for j in range(3)] for i in range(3)] [[x * y for x, y in zip(row1, row2)] for row1, row2 in zip(M, N)]
List comprehensions are powerful tools, provided you restrict them to simple tasks; for example, listing selected module functions, or stripping end-of-lines:
>>> import sys
>>> [x for x in dir(sys) if x.startswith('getr')]
['getrecursionlimit', 'getrefcount']

>>> lines = [line.rstrip( ) for line in open('README.txt')]
>>> lines[0]
'This is Python version 2.4 alpha 3'
If you are interested in matrix processing, also see the mathematical and scientific extensions available for Python in the public domain, such as those available through NumPy and SciPy. The code here works, but extensions provide optimized tools. NumPy, for instance, is seen by some as an open source Matlab equivalent.
>>> names = ['name', 'age', 'pay', 'job']
>>> values = ['Sue Jones', 45, 40000, 'mus']
>>> sue = dict(zip(names, values))
>>> sue
{'job': 'mus', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}
We can even make dictionaries today from a sequence of key values and an optional starting value for all the keys (handy to initialize an empty dictionary):
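That is the dictionary fromkeys call; a minimal sketch (the '?' placeholder value here is a hypothetical choice):

```python
fields = ('name', 'age', 'job', 'pay')
record = dict.fromkeys(fields, '?')      # every key starts out with the same value
print(sorted(record.items()))
```

This is handy for building an empty record skeleton before real values are filled in.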
2.3.2.2. Lists of dictionaries

Regardless of how we code them, we still need to collect our records into a database; a list does the trick again, as long as we don't require access by key:
>>> people = [bob, sue]
>>> for person in people:
...     print person['name'], person['pay']      # all name, pay
...
Bob Smith 30000
Sue Jones 44000.0

>>> for person in people:
...     if person['name'] == 'Sue Jones':        # fetch sue's pay
...         print person['pay']
...
44000.0
Iteration tools work just as well here, but we use keys rather than obscure positions (in database terms, the list comprehension and map in the following code project the database on the "name" field column):
>>> names = [person['name'] for person in people]    # collect all names
>>> names
['Bob Smith', 'Sue Jones']
>>> map((lambda rec: rec['name']), people)           # ditto
['Bob Smith', 'Sue Jones']

>>> sum(person['pay'] for person in people)          # sum all pay
74000.0
And because dictionaries are normal Python objects, these records can also be accessed and updated with normal Python syntax:
>>> for person in people:
...     print person['name'].split( )[-1]        # last name
...     person['pay'] *= 1.10                    # a 10% raise
...
Smith
Jones

>>> for person in people:
...     print person['pay']
...
33000.0
48400.0
2.3.2.3. Nested structures

Incidentally, we could avoid the last-name extraction code in the prior examples by further structuring our records. Because all of Python's compound datatypes can be nested inside each other and as deeply as we like, we can build up fairly complex information structures easily; simply type the object's syntax, and Python does all the work of building the components, linking memory structures, and later reclaiming their space. This is one of the great advantages of a scripting language such as Python. The following, for instance, represents a more structured record by nesting a dictionary, list, and tuple inside another dictionary:
>>> bob2 = {'name': {'first': 'Bob', 'last': 'Smith'},
...         'age':  42,
...         'job':  ['software', 'writing'],
...         'pay':  (40000, 50000)}

>>> bob2['name']                             # bob's full name
{'last': 'Smith', 'first': 'Bob'}
>>> bob2['name']['last']                     # bob's last name
'Smith'
>>> bob2['pay'][1]                           # bob's upper pay
50000
The name field is another dictionary here, so instead of splitting up a string, we simply index to fetch the last name. Moreover, people can have many jobs, as well as minimum and maximum pay limits. In fact, Python becomes a sort of query language in such cases; we can fetch or change nested data with the usual object operations:
>>> for job in bob2['job']:                  # all of bob's jobs
...     print job
...
software
writing

>>> bob2['job'][-1]                          # bob's last job
'writing'
>>> bob2['job'].append('janitor')            # bob gets a new job
>>> bob2
{'job': ['software', 'writing', 'janitor'], 'pay': (40000, 50000), 'age': 42,
'name': {'last': 'Smith', 'first': 'Bob'}}
It's OK to grow the nested list with append, because it is really an independent object. Such nesting can come in handy for more sophisticated applications; to keep ours simple, we'll stick to the original flat record structure.
2.3.2.4. Dictionaries of dictionaries

One last twist on our people database: we can get a little more mileage out of dictionaries here by using one to represent the database itself. That is, we can use a dictionary of dictionaries: the outer dictionary is the database, and the nested dictionaries are the records within it. Rather than a simple list of records, a dictionary-based database allows us to store and retrieve records by symbolic key:
>>> db = {}
>>> db['bob'] = bob
>>> db['sue'] = sue
>>>
>>> db['bob']['name']
'Bob Smith'
>>> db['sue']['pay'] = 50000
>>> db['sue']['pay']
50000
Notice how this structure allows us to access a record directly instead of searching for it in a loop (we get to Bob's name immediately by indexing on key 'bob'). This really is a dictionary of dictionaries, though you won't see all the gory details unless you display the database all at once:
If we still need to step through the database one record at a time, we can now rely on dictionary iterators. In recent Python releases, a dictionary iterator produces one key in a for loop each time through (in earlier releases, call the keys method explicitly in the for loop: say db.keys( ) rather than just db):
>>> for key in db:
...     print key, '=>', db[key]['name']
...
bob => Bob Smith
sue => Sue Jones

>>> for key in db:
...     print key, '=>', db[key]['pay']
...
bob => 33000.0
sue => 50000
To visit all records, either index by key as you go:
>>> for key in db:
...     print db[key]['name'].split( )[-1]
...     db[key]['pay'] *= 1.10
...
Smith
Jones
or step through the dictionary's values to access records directly:
>>> for record in db.values( ):
...     print record['pay']
...
36300.0
55000.0

>>> x = [db[key]['name'] for key in db]
>>> x
['Bob Smith', 'Sue Jones']
>>> x = [rec['name'] for rec in db.values( )]
>>> x
['Bob Smith', 'Sue Jones']
And to add a new record, simply assign it to a new key; this is just a dictionary, after all:
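For instance (the new 'tom' record here matches the test data used later in this chapter):

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}
db = {'bob': bob, 'sue': sue}

db['tom'] = {'name': 'Tom', 'age': 50, 'pay': 0, 'job': None}   # assign to a new key
print(sorted(db.keys()))
```

No append method is needed; assigning to a previously unused key grows the dictionary automatically.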
Although our database is still a transient object in memory, it turns out that this dictionary-of-dictionaries format corresponds exactly to a system that saves objects permanently: the shelve (yes, this should be shelf grammatically speaking, but the Python module name and term is shelve). To learn how, let's move on to the next section.
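As a quick preview of where we're headed, a shelve is opened and used like an on-disk dictionary of records; this is a minimal sketch under a temporary filename, not the chapter's final code:

```python
import os
import shelve
import tempfile

filename = os.path.join(tempfile.mkdtemp(), 'people-shelve')

db = shelve.open(filename)                       # create or open the shelve file
db['bob'] = {'name': 'Bob Smith', 'pay': 30000}  # store a record under a key
db.close()

db = shelve.open(filename)                       # reopen later: the record persists
print(db['bob']['name'])
db.close()
```

The indexing and assignment syntax is identical to the in-memory dictionary; only the open and close calls are new.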
2.4. Step 2: Storing Records Persistently

So far, we've settled on a dictionary-based representation for our database of records, and we've reviewed some Python data structure concepts along the way. As mentioned, though, the objects we've seen so far are temporary; they live in memory and they go away as soon as we exit Python or the Python program that created them. To make our people persistent, they need to be stored in a file of some sort.
2.4.1. Using Formatted Files

One way to keep our data around between program runs is to write all the data out to a simple text file, in a formatted way. Provided the saving and loading tools agree on the format selected, we're free to use any custom scheme we like.
2.4.1.1. Test data script

So that we don't have to keep working interactively, let's first write a script that initializes the data we are going to store (if you've done any Python work in the past, you know that the interactive prompt tends to become tedious once you leave the realm of simple one-liners). Example 2-1 creates the sort of records and database dictionary we've been working with so far, but because it is a module, we can import it repeatedly without having to retype the code each time. In a sense, this module is a database itself, but its program code format doesn't support automatic or end-user updates as is.
Other Uses for Dictionaries

Besides allowing us to associate meaningful labels with data rather than numeric positions, dictionaries are often more flexible than lists, especially when there isn't a fixed size to our problem. For instance, suppose you need to sum up columns of data stored in a text file where the number of columns is not known or fixed:
Here, we cannot preallocate a fixed-length list of sums because the number of columns may vary. Splitting on whitespace extracts the columns, and float converts to numbers, but a fixed-size list won't easily accommodate a set of sums (at least, not without extra code to manage its size). Dictionaries are more convenient here because we can use column positions as keys instead of using absolute offsets. Most of this code uses tools added to Python in the last five years; see Chapter 4 for more on file iterators, Chapter 21 for text processing and alternative summers, and the library manual for the 2.3 enumerate and 2.4 sorted functions this code uses:
>>> sums = {}
>>> for line in open('data.txt'):
...     cols = [float(col) for col in line.split( )]
...     for pos, val in enumerate(cols):
...         sums[pos] = sums.get(pos, 0.0) + val
...
>>> for key in sorted(sums):
...     print key, '=', sums[key]
...
0 = ...
1 = ...
2 = ...
3 = ...
Dictionaries are often also a handy way to represent matrixes, especially when they are mostly empty. The following two-entry dictionary, for example, suffices to represent a potentially very large three-dimensional matrix containing two nonempty valuesthe keys are coordinates and their values are data at the coordinates. You can use a similar structure to index people by their birthdays (use month, day, and year for the key), servers by their Internet Protocol (IP) numbers, and so on.
>>> D = {}
>>> D[(2, 4, 6)] = 43                    # 43 at position (2, 4, 6)
>>> D[(5, 6, 7)] = 46
>>> X, Y, Z = (5, 6, 7)
>>> D.get((X, Y, Z), 'Missing')
46
>>> D.get((0, Y, Z), 'Missing')
'Missing'
>>> D
{(2, 4, 6): 43, (5, 6, 7): 46}
Example 2-1. PP3E\Preview\initdata.py
# initialize data to be stored in files, pickles, shelves

# records
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}
tom = {'name': 'Tom',       'age': 50, 'pay': 0,     'job': None}

# database
db = {}
db['bob'] = bob
db['sue'] = sue
db['tom'] = tom

if __name__ == '__main__':               # when run as a script
    for key in db:
        print key, '=>\n ', db[key]
As usual, the __name__ test at the bottom of Example 2-1 is true only when this file is run, not when it is imported. When run as a top-level script (e.g., from a command line, via an icon click, or within the IDLE GUI), the file's self-test code under this test dumps the database's contents to the standard output stream (remember, that's what print statements do by default). Here is the script in action being run from a system command line on Windows. Type the following command in a Command Prompt window after a cd to the directory where the file is stored, and use a similar console window on other types of computers:
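Run that way, the script simply steps through the database and prints each key with its record; because dictionary key order is arbitrary here, the ordering of your output may vary from run to run. An equivalent standalone sketch of what the self-test does (with a print call for portability):

```python
# the same records and display loop as Example 2-1's self-test code
db = {
    'bob': {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'},
    'sue': {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'},
    'tom': {'name': 'Tom', 'age': 50, 'pay': 0, 'job': None},
}
for key in sorted(db):              # sorted just for a stable display
    print('%s =>' % key)
    print('  %s' % db[key])
```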
Now that we've started running script files, here are a few quick startup hints:

On some platforms, you may need to type the full directory path to the Python program on your machine, and on recent Windows systems you don't need python on the command line at all (just type the file's name to run it).

You can also run this file inside Python's standard IDLE GUI (open the file and use the Run menu in the text edit window), and in similar ways from any of the available third-party Python IDEs (e.g., Komodo, Eclipse, and the Wing IDE).

If you click the program's file icon to launch it on Windows, be sure to add a raw_input( ) call to the bottom of the script to keep the output window up. On other systems, icon clicks may require a #! line at the top and executable permission via a chmod command.

I'll assume here that you're able to run Python code one way or another. Again, if you're stuck, see
other books such as Learning Python for the full story on launching Python programs.
2.4.1.2. Data format script

Now, all we have to do is store all of this in-memory data on a file. There are a variety of ways to accomplish this; one of the most basic is to write one piece of data at a time, with separators between each that we can use to break the data apart when we reload. Example 2-2 shows one way to code this idea.
Example 2-2. PP3E\Preview\make_db_file.py
####################################################################
# save in-memory database object to a file with custom formatting;
# assume 'endrec.', 'enddb.', and '=>' are not used in the data;
# assume db is dict of dict; warning: eval can be dangerous - it
# runs strings as code; could also eval( ) record dict all at once
####################################################################

dbfilename = 'people-file'
ENDDB  = 'enddb.'
ENDREC = 'endrec.'
RECSEP = '=>'

def storeDbase(db, dbfilename=dbfilename):
    "formatted dump of database to flat file"
    dbfile = open(dbfilename, 'w')
    for key in db:
        print >> dbfile, key
        for (name, value) in db[key].items( ):
            print >> dbfile, name + RECSEP + repr(value)
        print >> dbfile, ENDREC
    print >> dbfile, ENDDB
    dbfile.close( )

def loadDbase(dbfilename=dbfilename):
    "parse data to reconstruct database"
    dbfile = open(dbfilename)
    import sys
    sys.stdin = dbfile
    db = {}
    key = raw_input( )
    while key != ENDDB:
        rec = {}
        field = raw_input( )
        while field != ENDREC:
            name, value = field.split(RECSEP)
            rec[name] = eval(value)
            field = raw_input( )
        db[key] = rec
        key = raw_input( )
    return db
if __name__ == '__main__':
    from initdata import db
    storeDbase(db)
This is a somewhat complex program, partly because it has both saving and loading logic and partly because it does its job the hard way; as we'll see in a moment, there are better ways to get objects into files than by manually formatting and parsing them. For simple tasks, though, this does work; running Example 2-2 as a script writes the database out to a flat file. It has no printed output, but we can inspect the database file interactively after this script is run, either within IDLE or from a console window where you're running these examples (as is, the database file shows up in the current working directory):
...\PP3E\Preview> python make_db_file.py
...\PP3E\Preview> python
>>> for line in open('people-file'):
...     print line,
...
bob
job=>'dev'
pay=>30000
age=>42
name=>'Bob Smith'
endrec.
sue
job=>'mus'
pay=>40000
age=>45
name=>'Sue Jones'
endrec.
tom
job=>None
pay=>0
age=>50
name=>'Tom'
endrec.
enddb.
This file is simply our database's content with added formatting. Its data originates from the test data initialization module we wrote in Example 2-1, because that is the module from which Example 2-2's self-test code imports its data. In practice, Example 2-2 itself could be imported and used to store a variety of databases and files. Notice how data to be written is formatted with the as-code repr() call and is re-created with the eval() call, which treats strings as Python code. That allows us to store and re-create things like the None object, but it is potentially unsafe; you shouldn't use eval() if you can't be sure that the database won't contain malicious code. For our purposes, however, there's probably no cause for alarm.
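The repr()/eval() round trip at the heart of this format can be seen in isolation. The following sketch (hypothetical data, written in syntax that also runs under Python 3) formats one field the way storeDbase does and parses it back the way loadDbase does:

```python
# one field of a record, formatted as storeDbase would write it
line = 'job' + '=>' + repr(None)        # produces the string "job=>None"

# parsing it back, as loadDbase does
name, value = line.split('=>')
restored = eval(value)                  # eval runs the string as code

assert name == 'job' and restored is None
```

The same caveat applies here as in the listing: eval() will happily run any Python expression, so this trick is only safe when you control the file's contents.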
2.4.1.3. Utility scripts
To test further, Example 2-3 reloads the database from a file each time it is run.
Example 2-3. PP3E\Preview\dump_db_file.py
from make_db_file import loadDbase

db = loadDbase()
for key in db:
    print key, '=>\n ', db[key]
print db['sue']['name']
And Example 2-4 makes changes by loading, updating, and storing again.
Example 2-4. PP3E\Preview\update_db_file.py
from make_db_file import loadDbase, storeDbase

db = loadDbase()
db['sue']['pay'] *= 1.10
db['tom']['name'] = 'Tom Tom'
storeDbase(db)
Here are the dump script and the update script in action at a system command line; both Sue's pay and Tom's name change between script runs. The main point to notice is that the data stays around after each script exits: our objects have become persistent simply because they are mapped to and from text files:
...\PP3E\Preview> python dump_db_file.py
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 40000, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

...\PP3E\Preview> python update_db_file.py
...\PP3E\Preview> python dump_db_file.py
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 44000.0, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom Tom'}
Sue Jones
As is, we'll have to write Python code in scripts or at the interactive command line for each specific database update we need to perform (later in this chapter, we'll do better by providing generalized console, GUI, and web-based interfaces instead). But at a basic level, our text file is a database of records. As we'll learn in the next section, though, it turns out that we've just done a lot of pointless work.
2.4.2. Using Pickle Files

The formatted file scheme of the prior section works, but it has some major limitations. For one thing, it has to read the entire database from the file just to fetch one record, and it must write the entire database back to the file after each set of updates. For another, it assumes that the data separators it writes out to the file will not appear in the data to be stored: if the characters => happen to appear in the data, for example, the scheme will fail. Perhaps worse, the formatter is already complex without being general: it is tied to the dictionary-of-dictionaries structure, and it can't handle anything else without being greatly expanded. It would be nice if a general tool existed that could translate any sort of Python data to a format that could be saved on a file in a single step. That is exactly what the Python pickle module is designed to do.

The pickle module translates an in-memory Python object into a serialized byte stream: a string of bytes that can be written to any file-like object. The pickle module also knows how to reconstruct the original object in memory, given the serialized byte stream: we get back the exact same object. In a sense, the pickle module replaces proprietary data formats; its serialized format is general and efficient enough for any program. With pickle, there is no need to manually translate objects to data when storing them persistently. The net effect is that pickling allows us to store and fetch native Python objects as they are, in a single step; we use normal Python syntax to process pickled records.

Despite what it does, the pickle module is remarkably easy to use. Example 2-5 shows how to store our records in a flat file, using pickle.
Example 2-5. PP3E\Preview\make_db_pickle.py
from initdata import db
import pickle

dbfile = open('people-pickle', 'w')
pickle.dump(db, dbfile)
dbfile.close()
When run, this script stores the entire database (the dictionary of dictionaries defined in Example 2-1) to a flat file named people-pickle in the current working directory. The pickle module handles the work of converting the object to a string. Example 2-6 shows how to access the pickled database after it has been created; we simply open the file and pass its content back to pickle to remake the object from its serialized string.
Example 2-6. PP3E\Preview\dump_db_pickle.py
import pickle

dbfile = open('people-pickle')
db = pickle.load(dbfile)
for key in db:
    print key, '=>\n ', db[key]
print db['sue']['name']
Here are these two scripts at work, at the system command line again; naturally, they can also be run in IDLE, and you can open and inspect the pickle file by running the same sort of code interactively as well:
...\PP3E\Preview> python make_db_pickle.py
...\PP3E\Preview> python dump_db_pickle.py
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 40000, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones
Updating with a pickle file is similar to updating a manually formatted file, except that Python is doing all of the formatting work for us. Example 2-7 shows how.
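The Example 2-7 listing (update_db_pickle.py) did not survive in this copy of the text. A sketch consistent with the scripts around it, adapted here to Python 3 syntax (binary file modes, which Python 3's pickle requires) and made self-contained by rebuilding the pickle file first, might look like this:

```python
import pickle

# stand-in for the database normally created by make_db_pickle.py
db = {'sue': {'name': 'Sue Jones', 'job': 'mus', 'age': 45, 'pay': 40000},
      'tom': {'name': 'Tom', 'job': None, 'age': 50, 'pay': 0}}
with open('people-pickle', 'wb') as dbfile:
    pickle.dump(db, dbfile)

# the update itself: load, change in memory, store the whole database back
with open('people-pickle', 'rb') as dbfile:
    db = pickle.load(dbfile)
db['sue']['pay'] *= 1.10
db['tom']['name'] = 'Tom Tom'
with open('people-pickle', 'wb') as dbfile:
    pickle.dump(db, dbfile)
```

Note that, as in the manually formatted scheme, the entire database is rewritten even though only two fields changed.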
Notice how the entire database is written back to the file after the records are changed in memory, just as for the manually formatted approach; this might become slow for very large databases, but we'll ignore this for the moment. Here are our update and dump scripts in action; as in the prior section, Sue's pay and Tom's name change between scripts because they are written back to a file (this time, a pickle file):
...\PP3E\Preview> python update_db_pickle.py
...\PP3E\Preview> python dump_db_pickle.py
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 44000.0, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom Tom'}
Sue Jones
As we'll learn in Chapter 19, the Python pickling system supports nearly arbitrary object types: lists, dictionaries, class instances, nested structures, and more. There, we'll also explore the faster cPickle module, as well as the pickler's binary storage protocols, which require files to be opened in binary mode; the default text protocol used in the preceding examples is slightly slower, but it generates readable ASCII data. As we'll see later in this chapter, the pickler also underlies shelves and ZODB databases, and pickled class instances provide both data and behavior for objects stored.

In fact, pickling is more general than these examples may imply. Because the pickling calls accept any object that provides an interface compatible with files, pickling and unpickling may be used to transfer native Python objects to a variety of media. Using a wrapped network socket, for instance, allows us to ship pickled Python objects across a network and provides an alternative to larger protocols such as SOAP and XML-RPC.
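Because the pickler only needs a file-like interface, anything with the right read/write methods will do as a medium. A minimal sketch, using an in-memory byte buffer (Python 3's io.BytesIO here) as a stand-in for a wrapped socket or other file-like object:

```python
import io
import pickle

# pickle to an in-memory byte buffer instead of a real file
record = {'name': 'Bob Smith', 'job': None, 'age': 42}
buf = io.BytesIO()
pickle.dump(record, buf)

# rewind the buffer and unpickle: we get back an equal object
buf.seek(0)
restored = pickle.load(buf)
assert restored == record
```

The same dump/load calls work unchanged whether buf is a disk file, a network socket wrapper, or a buffer like this one.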
2.4.3. Using Per-Record Pickle Files

As mentioned earlier, one potential disadvantage of this section's examples so far is that they may become slow for very large databases: because the entire database must be loaded and rewritten to update a single record, this approach can waste time. We could improve on this by storing each record in the database in a separate flat file. The next three examples show one way to do so; Example 2-8 stores each record in its own flat file, using each record's original key as its filename with a .pkl appended (it creates the files bob.pkl, sue.pkl, and tom.pkl in the current working directory).
Example 2-8. PP3E\Preview\make_db_pickle_recs.py
from initdata import bob, sue, tom
import pickle

for (key, record) in [('bob', bob), ('tom', tom), ('sue', sue)]:
    recfile = open(key+'.pkl', 'w')
    pickle.dump(record, recfile)
    recfile.close()
Next, Example 2-9 dumps the entire database by using the standard library's glob module to do filename expansion and thus collect all the files in this directory with a .pkl extension. To load a single record, we open its file and deserialize with pickle; we must load only one record file, though, not the entire database, to fetch one record.
Example 2-9. PP3E\Preview\dump_db_pickle_recs.py
import pickle, glob

for filename in glob.glob('*.pkl'):        # for 'bob','sue','tom'
    recfile = open(filename)
    record = pickle.load(recfile)
    print filename, '=>\n ', record

suefile = open('sue.pkl')
print pickle.load(suefile)['name']         # fetch sue's name
Finally, Example 2-10 updates the database by fetching a record from its file, changing it in memory, and then writing it back to its pickle file. This time, we have to fetch and rewrite only a single record file, not the full database, to update.
Example 2-10. PP3E\Preview\update_db_pickle_recs.py
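The Example 2-10 listing itself is missing from this copy of the text. Based on the description above, a sketch in Python 3 syntax would fetch, update, and rewrite only sue's file; the first step recreates sue.pkl (normally made by Example 2-8) so the sketch stands alone:

```python
import pickle

# recreate sue's record file so this sketch is self-contained
sue = {'name': 'Sue Jones', 'job': 'mus', 'age': 45, 'pay': 40000}
with open('sue.pkl', 'wb') as recfile:
    pickle.dump(sue, recfile)

# the per-record update: only this one file is read and rewritten
with open('sue.pkl', 'rb') as recfile:
    sue = pickle.load(recfile)
sue['pay'] *= 1.10
with open('sue.pkl', 'wb') as recfile:
    pickle.dump(sue, recfile)
```

The other record files are never touched, which is the whole point of the per-record scheme.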
Here are our file-per-record scripts in action; the results are about the same as in the prior section, but database keys become real filenames now. In a sense, the filesystem becomes our top-level dictionary: filenames provide direct access to each record.
2.4.4. Using Shelves

Pickling objects to files, as shown in the preceding section, is an optimal scheme in many applications. In fact, some applications use pickling of Python objects across network sockets as a simpler alternative to network protocols such as the SOAP and XML-RPC web services architectures (also supported by Python, but much heavier than pickle). Moreover, assuming your filesystem can handle as many files as you'll need, pickling one record per file also obviates the need to load and store the entire database for each update. If we really want keyed access to records, though, the Python standard library offers an even higher-level tool: shelves.

Shelves automatically pickle objects to and from a keyed-access filesystem. They behave much like dictionaries that must be opened, and they persist after each program exits. Because they give us key-based access to stored records, there is no need to manually manage one flat file per record: the shelve system automatically splits up stored records and fetches and updates only those records that are accessed and changed. In this way, shelves provide utility similar to per-record pickle files, but they are usually easier to code.

The shelve interface is just as simple as pickle: it is identical to dictionaries, with extra open and close calls. In fact, to your code, a shelve really does appear to be a persistent dictionary of persistent objects; Python does all the work of mapping its content to and from a file. For instance, Example 2-11 shows how to store our in-memory dictionary objects in a shelve for permanent keeping.
Example 2-11. make_db_shelve.py
from initdata import bob, sue
import shelve

db = shelve.open('people-shelve')
db['bob'] = bob
db['sue'] = sue
db.close()
This script creates one or more files in the current directory with the name people-shelve as a prefix;
you shouldn't delete these files (they are your database!), and you should be sure to use the same name in other scripts that access the shelve. Example 2-12, for instance, reopens the shelve and indexes it by key to fetch its stored records.
Example 2-12. dump_db_shelve.py
import shelve

db = shelve.open('people-shelve')
for key in db:
    print key, '=>\n ', db[key]
print db['sue']['name']
db.close()
We still have a dictionary of dictionaries here, but the top-level dictionary is really a shelve mapped onto a file. Much happens when you access a shelve's keys: it uses pickle to serialize and deserialize, and it interfaces with a keyed-access filesystem. From your perspective, though, it's just a persistent dictionary. Example 2-13 shows how to code shelve updates.
Example 2-13. update_db_shelve.py
from initdata import tom
import shelve

db = shelve.open('people-shelve')
sue = db['sue']          # fetch sue
sue['pay'] *= 1.50       # update sue
db['sue'] = sue
db['tom'] = tom          # add a new record
db.close()
Notice how this code fetches sue by key, updates in memory, and then reassigns to the key to update the shelve; this is a requirement of shelves, but not always of more advanced shelve-like systems such as ZODB (covered in Chapter 19). Also note how shelve files are explicitly closed; some underlying keyed-access filesystems may require this in order to flush output buffers after changes. Finally, here are the shelve-based scripts on the job, creating, changing, and fetching records. The records are still dictionaries, but the database is now a dictionary-like shelve which automatically retains its state in a file between program runs:
...\PP3E\Preview> python make_db_shelve.py
...\PP3E\Preview> python dump_db_shelve.py
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 40000, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
Sue Jones

...\PP3E\Preview> python update_db_shelve.py
...\PP3E\Preview> python dump_db_shelve.py
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 60000.0, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
Sue Jones
When we ran the update and dump scripts here, we added a new record for key tom and increased Sue's pay field by 50 percent. These changes are permanent because the record dictionaries are mapped to an external file by shelve. (In fact, this is a particularly good script for Sue: something she might consider scheduling to run often, using a cron job on Unix, or a Startup folder or msconfig entry on Windows.)
2.5. Step 3: Stepping Up to OOP

Let's step back for a moment and consider how far we've come. At this point, we've created a database of records: the shelve and per-record pickle file approaches of the prior section suffice for basic data storage tasks. As is, our records are represented as simple dictionaries, which provide easier-to-understand access to fields than do lists (by key, rather than by position). Dictionaries, however, still have some limitations that may become more critical as our program grows over time. For one thing, there is no central place for us to collect record-processing logic. Extracting last names and giving raises, for instance, can be accomplished with code like the following:
>>> import shelve
>>> db = shelve.open('people-shelve')
>>> bob = db['bob']
>>> bob['name'].split()[-1]        # get bob's last name
'Smith'
>>> sue = db['sue']
>>> sue['pay'] *= 1.25             # give sue a raise
>>> sue['pay']
75000.0
>>> db['sue'] = sue
>>> db.close()
This works, and it might suffice for some short programs. But if we ever need to change the way last names and raises are implemented, we might have to update this kind of code in many places in our program. In fact, even finding all such magical code snippets could be a challenge; hardcoding or cutting and pasting bits of logic redundantly like this in more than one place will almost always come back to haunt you eventually.

It would be better to somehow hide (that is, encapsulate) such bits of code. Functions in a module would allow us to implement such operations in a single place and thus avoid code redundancy, but they still wouldn't naturally associate the operations with the records themselves. What we'd like is a way to bind processing logic with the data stored in the database in order to make it easier to understand, debug, and reuse.

Another downside to using dictionaries for records is that they are difficult to expand over time. For example, suppose that the set of data fields or the procedure for giving raises is different for different kinds of people (perhaps some people get a bonus each year and some do not). If we ever need to extend our program, there is no natural way to customize simple dictionaries. For future growth, we'd also like our software to support extension and customization in a natural way. This is where Python's OOP support begins to become attractive:
Structure

With OOP, we can naturally associate processing logic with record data: classes provide both a program unit that combines logic and data in a single package and a hierarchy that allows code to be easily factored to avoid redundancy.
Encapsulation

With OOP, we can also wrap up details such as name processing and pay increases behind method functions; i.e., we are free to change method implementations without breaking their users.
Customization

And with OOP, we have a natural growth path. Classes can be extended and customized by coding new subclasses, without changing or breaking already working code. That is, under OOP, we program by customizing and reusing, not by rewriting.

OOP is an option in Python and, frankly, is sometimes better suited for strategic than for tactical tasks. It tends to work best when you have time for upfront planning: something that might be a luxury if your users have already begun storming the gates. But especially for larger systems that change over time, its code reuse and structuring advantages far outweigh its learning curve, and it can substantially cut development time. Even in our simple case, the customizability and reduced redundancy we gain from classes can be a decided advantage.
2.5.1. Using Classes

OOP is easy to use in Python, thanks largely to Python's dynamic typing model. In fact, it's so easy that we'll jump right into an example: Example 2-14 implements our database records as class instances rather than as dictionaries.
Example 2-14. PP3E\Preview\person_start.py
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age  = age
        self.pay  = pay
        self.job  = job

if __name__ == '__main__':
    bob = Person('Bob Smith', 42, 30000, 'sweng')
    sue = Person('Sue Jones', 45, 40000, 'music')
    print bob.name, sue.pay
    print bob.name.split()[-1]
    sue.pay *= 1.10
    print sue.pay
There is not much to this class: just a constructor method that fills out the instance with data passed in as arguments to the class name. It's sufficient to represent a database record, though, and it can already provide tools such as defaults for pay and job fields that dictionaries cannot. The self-test code at the bottom of this file creates two instances (records) and accesses their attributes (fields); here is this file being run under IDLE:
>>>
Bob Smith 40000
Smith
44000.0
This isn't a database yet, but we could stuff these objects into a list or dictionary as before in order to collect them as a unit:
>>> from person_start import Person
>>> bob = Person('Bob Smith', 42)
>>> sue = Person('Sue Jones', 45, 40000)
>>> people = [bob, sue]                    # a "database" list
>>> for person in people:
...     print person.name, person.pay
...
Bob Smith 0
Sue Jones 40000
>>> x = [(person.name, person.pay) for person in people]
>>> x
[('Bob Smith', 0), ('Sue Jones', 40000)]
Notice that Bob's pay defaulted to zero this time because we didn't pass in a value for that argument (maybe Sue is supporting him now?). We might also implement a class that represents the database, perhaps as a subclass of the built-in list or dictionary types, with insert and delete methods that encapsulate the way the database is implemented. We'll abandon this path for now, though, because it will be more useful to store these records persistently in a shelve, which already encapsulates stores and fetches behind an interface for us. Before we do, though, let's add some logic.
2.5.2. Adding Behavior

So far, our class is just data: it replaces dictionary keys with object attributes, but it doesn't add much to what we had before. To really leverage the power of classes, we need to add some behavior. By wrapping up bits of behavior in class method functions, we can insulate clients from changes. And by packaging methods in classes along with data, we provide a natural place for readers to look for code. In a sense, classes combine records and the programs that process those records; methods provide logic that interprets and updates the data. For instance, Example 2-15 adds the last-name and raise logic as class methods; methods use the self argument to access or update the instance (record) being processed.
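The Example 2-15 listing (person.py) is missing from this copy of the text. Judging from the imports and calls in the examples that follow, it adds lastName and giveRaise methods to the Person class; a sketch with the method bodies inferred from the surrounding text (written so it runs under Python 2 and 3 alike) might look like this:

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age = age
        self.pay = pay
        self.job = job
    def lastName(self):
        # last-name logic lives in one place now, not in every client
        return self.name.split()[-1]
    def giveRaise(self, percent):
        # raise logic, likewise centralized as a method
        self.pay *= (1.0 + percent)

# quick self-test, mirroring the earlier person_start.py session
sue = Person('Sue Jones', 45, 40000, 'music')
sue.giveRaise(.10)
assert sue.lastName() == 'Jones'
```

Clients now call sue.lastName() and sue.giveRaise(.10) instead of repeating the split and multiply expressions inline.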
The output of this script is the same as the last, but the results are being computed by methods now, not by hardcoded logic that appears redundantly wherever it is required:
>>>
Bob Smith 40000
Smith
44000.0
2.5.3. Adding Inheritance

One last enhancement to our records before they become permanent: because they are implemented as classes now, they naturally support customization through the inheritance search mechanism in Python. Example 2-16, for instance, customizes the last section's Person class in order to give a 10 percent bonus by default to managers whenever they receive a raise (any relation to practice in the real world is purely coincidental).
Example 2-16. PP3E\Preview\manager.py
from person import Person

class Manager(Person):
    def giveRaise(self, percent, bonus=0.1):
        self.pay *= (1.0 + percent + bonus)

if __name__ == '__main__':
    tom = Manager(name='Tom Doe', age=50, pay=50000)
    print tom.lastName()
    tom.giveRaise(.20)
    print tom.pay

When run under IDLE, the self-test prints:

>>>
Doe
65000.0
Here, the Manager class appears in a module of its own, but it could have been added to the person module instead (Python doesn't require just one class per file). It inherits the constructor and lastName methods from its superclass but customizes just the giveRaise method. Because this change is being added as a new subclass, the original Person class, and any objects generated from it, will continue working unchanged. Bob and Sue, for example, inherit the original raise logic, but Tom gets the custom version because of the class from which he is created. In OOP, we program by customizing, not by changing.

In fact, code that uses our objects doesn't need to be at all aware of what the raise method does; it's up to the object to do the right thing based on the class from which it is created. As long as the object supports the expected interface (here, a method called giveRaise), it will be compatible with the calling code, regardless of its specific type, and even if its method works differently than others. If you've already studied Python, you may know this behavior as polymorphism; it's a core property of the language, and it accounts for much of your code's flexibility. When the following code calls the giveRaise method, for example, what happens depends on the obj object being processed; Tom gets a 20 percent raise instead of 10 percent because of the Manager class's customization:
>>> from person import Person
>>> from manager import Manager
>>> bob = Person(name='Bob Smith', age=42, pay=10000)
>>> sue = Person(name='Sue Jones', age=45, pay=20000)
>>> tom = Manager(name='Tom Doe', age=55, pay=30000)
>>> db = [bob, sue, tom]
>>> for obj in db:
...     obj.giveRaise(.10)          # default or custom
...
>>> for obj in db:
...     print obj.lastName(), '=>', obj.pay
...
Smith => 11000.0
Jones => 22000.0
Doe => 36000.0
2.5.4. Refactoring Code

Before we move on, there are a few coding alternatives worth noting here. Most of these underscore the Python OOP model, and they serve as a quick review.
2.5.4.1. Augmenting methods

As a first alternative, notice that we have introduced some redundancy in Example 2-16: the raise calculation is now repeated in two places (in the two classes). We could also have implemented the customized Manager class by augmenting the inherited raise method instead of replacing it completely:
class Manager(Person):
    def giveRaise(self, percent, bonus=0.1):
        Person.giveRaise(self, percent + bonus)
The trick here is to call back the superclass's version of the method directly, passing in the self argument explicitly. We still redefine the method, but we simply run the general version after adding 10 percent (by default) to the passed-in percentage. This coding pattern can help reduce code redundancy (the original raise method's logic appears in only one place and so is easier to change) and is especially handy for kicking off superclass constructor methods in practice. If you've already studied Python OOP, you know that this coding scheme works because we can always call methods through either an instance or the class name. In general, the following are equivalent, and both forms may be used explicitly:
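The two equivalent call forms just described can be demonstrated with a small hypothetical class (the Greeter name and method here are illustrative only, not part of the book's examples):

```python
class Greeter:
    def greet(self, name):
        return 'hello ' + name

g = Greeter()

# calling through the instance...
a = g.greet('sue')

# ...is equivalent to calling through the class, passing self explicitly
b = Greeter.greet(g, 'sue')

assert a == b == 'hello sue'
```

The instance form is what clients normally write; the class form is what subclasses use to invoke a superclass's version of a method, as Manager.giveRaise does above.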
In fact, the first form is mapped to the second: when calling through the instance, Python determines the class by searching the inheritance tree for the method name and passes in the instance automatically. Either way, within giveRaise, self refers to the instance that is the subject of the call.
2.5.4.2. Display format

For more object-oriented fun, we could also add a few operator overloading methods to our people classes. For example, a __str__ method, shown here, could return a string to give the display format for our objects when they are printed as a whole, which is much better than the default display we get for an instance:
class Person:
    def __str__(self):
        return '<%s => %s>' % (self.__class__.__name__, self.name)

tom = Manager('Tom Jones', 50)
print tom                       # prints: <Manager => Tom Jones>
Here __class__ gives the lowest class from which self was made, even though __str__ may be inherited. The net effect is that __str__ allows us to print instances directly instead of having to print specific attributes. We could extend this __str__ to loop through the instance's __dict__ attribute dictionary to display all attributes generically. We might even code an __add__ method to make + expressions automatically call the giveRaise method. Whether we should is another question; the fact that a + expression gives a person a raise might seem more magical to the next person reading our code than it should.
2.5.4.3. Constructor customization

Finally, notice that we didn't pass the job argument when making a manager in Example 2-16; if we had, it would look like this with keyword arguments:
tom = Manager(name='Tom Doe', age=50, pay=50000, job='manager')
The reason we didn't include a job in the example is that it's redundant with the class of the object: if someone is a manager, their class should imply their job title. Instead of leaving this field blank, though, it may make more sense to provide an explicit constructor for managers, which fills in this field automatically:
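A constructor along these lines appears in Example 2-17 later in this section; pulled out on its own (with a minimal Person class included here only so the sketch is self-contained), it might look like this:

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name, self.age, self.pay, self.job = name, age, pay, job

class Manager(Person):
    def __init__(self, name, age, pay):
        # run the superclass constructor, filling in the job automatically
        Person.__init__(self, name, age, pay, 'manager')

tom = Manager('Tom Doe', 50, 50000)
assert tom.job == 'manager'
```

Callers no longer pass (or forget to pass) a job for managers; the class itself guarantees the field is consistent.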
Now when a manager is created, its job is filled in automatically. The trick here is to call the superclass's version of the method explicitly, just as we did for the giveRaise method earlier in this section; the only difference here is the unusual name for the constructor method.
2.5.4.4. Alternative classes We won't use any of this section's three extensions in later examples, but to demonstrate how they work, Example 2-17 collects these ideas in an alternative implementation of our Person classes.
Example 2-17. PP3E\Preview\people-alternative.py
""" alternative implementation of person classes data, behavior, and operator overloading """ class Person: """ a general person: data+logic """ def _ _init_ _(self, name, age, pay=0, job=None): self.name = name self.age = age self.pay = pay self.job = job def lastName(self): return self.name.split( )[-1] def giveRaise(self, percent): self.pay *= (1.0 + percent) def _ _str_ _(self): return (' %s: %s, %s>' % (self._ _class_ _._ _name_ _, self.name, self.job, self.pay)) class Manager(Person): """ a person with custom raise inherits general lastname, str """ def _ _init_ _(self, name, age, pay): Person._ _init_ _(self, name, age, pay, 'manager') def giveRaise(self, percent, bonus=0.1): Person.giveRaise(self, percent + bonus) if _ _name_ _ == '_ _main_ _': bob = Person('Bob Smith', 44) sue = Person('Sue Jones', 47, 40000, 'music') tom = Manager(name='Tom Doe', age=50, pay=50000) print sue, sue.pay, sue.lastName( ) for obj in (bob, sue, tom): obj.giveRaise(.10) # run this obj's giveRaise print obj # run common _ _str_ _ method
Notice the polymorphism in this module's self-test loop: all three objects share the constructor, last-name, and printing methods, but the raise method called depends on the class from which an instance is created. When run, Example 2-17 prints the following to standard output: the manager's job is filled in at construction, we get the new custom display format for our objects, and the new version of the manager's raise method works as before:
<Person => Sue Jones: music, 40000> 40000 Jones
<Person => Bob Smith: None, 0.0>
<Person => Sue Jones: music, 44000.0>
<Manager => Tom Doe: manager, 60000.0>
Such refactoring (restructuring) of code is common as class hierarchies grow and evolve. In fact, as is, we still can't give someone a raise if his pay is zero (Bob is out of luck); we probably need a way to set pay, too, but we'll leave such extensions for the next release. The good news is that Python's flexibility and readability make refactoring easy: it's simple and quick to restructure your code. If you haven't used the language yet, you'll find that Python development is largely an exercise in rapid, incremental, and interactive programming, which is well suited to the shifting needs of real-world projects.
2.5.5. Adding Persistence

It's time for a status update. We now have customizable implementations of our records and their processing logic, encapsulated in the form of classes. Making our class-based records persistent is a minor last step. We could store them in per-record pickle files again; a shelve-based storage medium will do just as well for our goals and is often easier to code. Example 2-18 shows how.
Example 2-18. PP3E\Preview\make_db_classes.py
import shelve
from person import Person
from manager import Manager

bob = Person('Bob Smith', 42, 30000, 'sweng')
sue = Person('Sue Jones', 45, 40000, 'music')
tom = Manager('Tom Doe', 50, 50000)

db = shelve.open('class-shelve')
db['bob'] = bob
db['sue'] = sue
db['tom'] = tom
db.close()
This file creates three class instances (two from the original class and one from its customization) and assigns them to keys in a newly created shelve file to store them permanently. In other words, it creates a shelve of class instances; to our code, the database looks just like a dictionary of class instances, but the top-level dictionary is mapped to a shelve file again. To check our work, Example 2-19 reads the shelve and prints fields of its records.
Example 2-19. PP3E\Preview\dump_db_class.py
import shelve

db = shelve.open('class-shelve')
for key in db:
    print key, '=>\n ', db[key].name, db[key].pay

bob = db['bob']
print bob.lastName()
print db['tom'].lastName()
Note that we don't need to reimport the Person class here in order to fetch its instances from the shelve or run their methods. When instances are shelved or pickled, the underlying pickling system records both instance attributes and enough information to locate their classes automatically when they are later fetched (the class's module simply has to be on the module search path when an instance is loaded). This is on purpose; because the class and its instances in the shelve are stored separately, you can change the class to modify the way stored instances are interpreted when loaded (more on this later in the book). Here is the shelve dump script running under IDLE just after creating the shelve:
>>>
tom =>
   Tom Doe 50000
bob =>
   Bob Smith 30000
sue =>
   Sue Jones 40000
Smith
Doe
As shown in Example 2-20, database updates are as simple as before, but dictionary keys become object attributes and updates are implemented by method calls, not by hardcoded logic. Notice how we still fetch, update, and reassign to keys to update the shelve.
Example 2-20. PP3E\Preview\update_db_class.py
import shelve
db = shelve.open('class-shelve')

sue = db['sue']
sue.giveRaise(.25)
db['sue'] = sue

tom = db['tom']
tom.giveRaise(.20)
db['tom'] = tom

db.close()
And last but not least, here is the dump script again after running the update script; Tom and Sue have new pay values, because these objects are now persistent in the shelve. We could also open and inspect the shelve by typing code at Python's interactive command line; despite its longevity, the shelve is just a Python object containing Python objects.
>>>
tom =>
  Tom Doe 65000.0
bob =>
  Bob Smith 30000
sue =>
  Sue Jones 50000.0
Smith
Doe
Tom and Sue both get a raise this time around, because they are persistent objects in the shelve database. Although shelves can store simpler object types such as lists and dictionaries, class instances allow us to combine both data and behavior for our stored items. In a sense, instance attributes and class methods take the place of records and processing programs in more traditional schemes.
2.5.6. Other Database Options

At this point, we have a full-fledged database system: our classes simultaneously implement record data and record processing, and they encapsulate the implementation of the behavior. And the Python pickle and shelve modules provide simple ways to store our database persistently between program executions. This is not a relational database (we store objects, not tables, and queries take the form of Python object processing code), but it is sufficient for many kinds of programs. If we need more functionality, we could migrate this application to even more powerful tools. For example, should we ever need full-blown SQL query support, there are interfaces that allow Python scripts to communicate with relational databases such as MySQL, PostgreSQL, and Oracle in portable ways. Moreover, the open source ZODB system provides a more comprehensive object database for
Python, with support for features missing in shelves, including concurrent updates, transaction commits and rollbacks, automatic updates on in-memory component changes, and more. We'll explore these more advanced third-party tools in Chapter 19. For now, let's move on to putting a good face on our system.
2.6. Step 4: Adding Console Interaction

So far, our database program consists of class instances stored in a shelve file, as coded in the preceding section. It's sufficient as a storage medium, but it requires us to run scripts from the command line or type code interactively in order to view or process its content. Improving on this is straightforward: simply code more general programs that interact with users, either from a console window or from a full-blown graphical interface.
2.6.1. A Console Shelve Interface

Let's start with something simple. The most basic kind of interface we can code would allow users to type keys and values in a console window in order to process the database (instead of writing Python program code). Example 2-21, for instance, implements a simple interactive loop that allows a user to query multiple record objects in the shelve by key.
Example 2-21. PP3E\Preview\peopleinteract_query.py
# interactive queries
import shelve

fieldnames = ('name', 'age', 'job', 'pay')
maxfield   = max(len(f) for f in fieldnames)
db = shelve.open('class-shelve')

while True:
    key = raw_input('\nKey? => ')          # key or empty line, exc at eof
    if not key: break
    try:
        record = db[key]                   # fetch by key, show in console
    except:
        print 'No such key "%s"!' % key
    else:
        for field in fieldnames:
            print field.ljust(maxfield), '=>', getattr(record, field)
This script uses getattr to fetch an object's attribute when given its name string, and the ljust left-justify method of strings to align outputs (maxfield, derived from a generator expression, is the length of the longest field name). When run, this script goes into a loop, inputting keys from the interactive user (technically, from the standard input stream, which is usually a console window) and displaying the fetched records field by field. An empty line ends the session:
Key? => sue
name => Sue Jones
age  => 45
job  => music
pay  => 40000

Key? => nobody
No such key "nobody"!

Key? =>
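The two calls at work in this script can be tried in isolation. This small sketch (with a made-up Rec object, not the book's Person class) shows how getattr fetches an attribute given its name string and how ljust pads a string to a fixed width so the columns line up:

```python
class Rec: pass                        # made-up record object for illustration
rec = Rec()
rec.name = 'Sue Jones'
rec.age  = 45

fieldnames = ('name', 'age')
maxfield   = max(len(f) for f in fieldnames)   # width of longest field name

for field in fieldnames:
    # ljust pads with spaces so the '=>' markers align in one column
    print(field.ljust(maxfield) + ' => ' + str(getattr(rec, field)))
```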
Example 2-22 goes further and allows interactive updates. For an input key, it inputs values for each field and either updates an existing record or creates a new object and stores it under the key.
Example 2-22. PP3E\Preview\peopleinteract_update.py
# interactive updates
import shelve
from person import Person

fieldnames = ('name', 'age', 'job', 'pay')

db = shelve.open('class-shelve')
while True:
    key = raw_input('\nKey? => ')
    if not key: break
    if key in db.keys():
        record = db[key]                      # update existing record
    else:                                     # or make/store new rec
        record = Person(name='?', age='?')    # eval: quote strings
    for field in fieldnames:
        currval = getattr(record, field)
        newtext = raw_input('\t[%s]=%s\n\t\tnew?=>' % (field, currval))
        if newtext:
            setattr(record, field, eval(newtext))
    db[key] = record
db.close()
Notice the use of eval in this script to convert inputs (as usual, that allows any Python object type, but it means you must quote string inputs explicitly) and the use of the setattr call to assign an attribute given its name string. When run, this script allows any number of records to be added and changed; to keep the current value of a record's field, press the Enter key when prompted for a new value:
Key? => tom
        [name]=Tom Doe
                new?=>
        [age]=55
                new?=>56
        [job]=mgr
                new?=>
        [pay]=65000.0
                new?=>90000
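The eval/setattr combination can be exercised on its own. In this sketch (again with a made-up stand-in record, and with the user's typed replies simulated as plain strings), eval converts text such as 90000 or 'mgr' to a real Python object, and setattr stores it under the named attribute:

```python
class Rec: pass                            # stand-in record for illustration
rec = Rec()

# simulated console replies: field name -> typed text (strings keep quotes)
typed = [('pay', '90000'), ('job', "'mgr'")]

for field, newtext in typed:
    setattr(rec, field, eval(newtext))     # text becomes a real object

print(rec.pay)                             # an int, not a string
print(rec.job)
```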
This script is still fairly simplistic (e.g., errors aren't handled), but using it is much easier than manually opening and modifying the shelve at the Python interactive prompt, especially for nonprogrammers. Run the query script to check your work after an update (we could combine query and update into a single script if this becomes too cumbersome, albeit at some cost in code and user-experience complexity):
Key? => tom
name => Tom Doe
age  => 56
job  => mgr
pay  => 90000

Key? => nobody
name => John Doh
age  => 55
job  => None
pay  => None

Key? =>
2.7. Step 5: Adding a GUI

The console-based interface approach of the preceding section works, and it may be sufficient for some users, assuming that they are comfortable with typing commands in a console window. With just a little extra work, though, we can add a GUI that is more modern, easier to use, less error prone, and arguably sexier.
2.7.1. GUI Basics

As we'll see later in this book, a variety of GUI toolkits and builders are available for Python programmers: Tkinter, wxPython, PyQt, PythonCard, Dabo, and more. Of these, Tkinter ships with Python, and it is something of a de facto standard. Tkinter is a lightweight toolkit and so meshes well with a scripting language such as Python; it's easy to do basic things with Tkinter, and it's straightforward to do more advanced things with extensions and OOP-based code. As an added bonus, Tkinter GUIs are portable across Windows, Linux/Unix, and Macintosh; simply copy the source code to the machine on which you wish to use your GUI. Because Tkinter is designed for scripting, coding GUIs with it is straightforward. We'll study all of its concepts and tools later in this book. But as a first example, the first program in Tkinter is just a few lines of code, as shown in Example 2-23.
Example 2-23. PP3E\Preview\tkinter001.py
from Tkinter import *
Label(text='Spam').pack()
mainloop()
This isn't the most useful GUI ever coded, but it demonstrates Tkinter basics and it builds the fully functional window shown in Figure 2-1 in just three simple lines of code. From the Tkinter module, we get widget (screen device) construction calls such as Label, geometry manager methods such as pack, widget configuration constants such as TOP and RIGHT side hints for pack, and the mainloop call, which starts event processing.
Figure 2-1. tkinter001.py window
You can launch this example in IDLE, from a console command line, or by clicking its icon, the same way you can run other Python scripts. Tkinter itself is a standard part of Python and works out of the box on Windows, though you may need to install extras on some computers (more details later in this book). It's not much more work to code a GUI that actually responds to a user: Example 2-24 implements a GUI with a button that runs the reply function each time it is pressed.
This example still isn't very sophisticated: it creates an explicit Tk main window for the application to serve as the parent container of the button, and it builds the simple window shown in Figure 2-2 (in Tkinter, containers are passed in as the first argument when making a new widget; they default to the main window). But this time, each time you click the "press" button, the program responds by running Python code that pops up the dialog window in Figure 2-3.
Figure 2-2. tkinter101.py main window
Figure 2-3. tkinter101.py common dialog pop up
Notice how the pop-up dialog looks like it should for Windows, the platform on which this screenshot was taken; Tkinter gives us a native look and feel that is appropriate for the machine on which it is
running. We can customize this GUI in many ways (e.g., by changing colors and fonts, setting window titles and icons, using photos on buttons instead of text), but part of the power of Tkinter is that we need to set only the options we are interested in tailoring.
2.7.2. Using OOP for GUIs

All of our GUI examples so far have been top-level script code with a function for handling events. In larger programs, it is often more useful to code a GUI as a subclass of the Tkinter Frame widget, a container for other widgets. Example 2-25 shows our single-button GUI recoded in this way as a class.
The button's event handler is a bound method, self.reply, an object that remembers both self and reply when later called. This example generates the same window and pop up as Example 2-24 (Figures 2-2 and 2-3); but because it is now a subclass of Frame, it automatically becomes an attachable component; i.e., we can add all of the widgets this class creates, as a package, to any other GUI, just by attaching this Frame to the GUI. Example 2-26 shows how.
This example attaches our one-button GUI to a larger window, here a Toplevel pop-up window created by the importing application and passed into the construction call as the explicit parent (you will also get a Tk main window; as we'll learn later, you always do, whether it is made explicit in your code or not). Our one-button widget package is attached to the right side of its container this time. If you run this live, you'll get the scene captured in Figure 2-4; the "press" button is our attached custom Frame.
Figure 2-4. Attaching GUIs
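Bound methods like self.reply are ordinary objects that package an instance together with a function, which is why they work as callbacks. This standalone sketch (a hypothetical stand-in class, no Tkinter required) shows that the later call needs no arguments because self is already remembered:

```python
class MyGuiSketch:                     # stand-in for the Frame subclass
    def __init__(self, label):
        self.label = label
    def reply(self):                   # the event handler method
        return 'reply from ' + self.label

gui = MyGuiSketch('press')
callback = gui.reply                   # bound method: packages gui + reply
print(callback())                      # later call supplies self implicitly
```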
Moreover, because MyGui is coded as a class, the GUI can be customized by the usual inheritance mechanism; simply define a subclass that replaces the parts that differ. The reply method, for example, can be customized this way to do something unique, as demonstrated in Example 2-27.
Example 2-27. PP3E\Preview\customizegui.py
from tkMessageBox import showinfo
from tkinter102 import MyGui

class CustomGui(MyGui):
    def reply(self):
        showinfo(title='popup', message='Ouch!')
When run, this script creates the same main window and button as the original MyGui class. But pressing its button generates a different reply, as shown in Figure 2-5, because the custom version of the reply method runs.
Figure 2-5. Customizing GUIs
Although these are still small GUIs, they illustrate some fairly large ideas. As we'll see later in the book, using OOP like this for inheritance and attachment allows us to reuse packages of widgets in other programscalculators, text editors, and the like can be customized and added as components to other GUIs easily if they are classes.
2.7.3. Getting Input from a User

As a final introductory script, Example 2-28 shows how to input data from the user in an Entry widget and display it in a pop-up dialog. The lambda it uses defers the call to the reply function so that inputs can be passed in, a common Tkinter coding pattern (we could also use ent as a global variable within reply, but that makes it less general). This example also demonstrates how to change the icon and title of a top-level window; here, the window icon file is located in the same directory as the script.
Example 2-28. PP3E\Preview\tkinter103.py
from Tkinter import *
from tkMessageBox import showinfo

def reply(name):
    showinfo(title='Reply', message='Hello %s!' % name)

top = Tk()
top.title('Echo')
top.iconbitmap('py-blue-trans-out.ico')

Label(top, text="Enter your name:").pack(side=TOP)

ent = Entry(top)
ent.pack(side=TOP)

btn = Button(top, text="Submit", command=(lambda: reply(ent.get())))
btn.pack(side=LEFT)

top.mainloop()
As is, this example is just three widgets attached to the Tk main top-level window; later we'll learn how to use nested Frame container widgets in a window like this to achieve a variety of layouts for its three widgets. Figure 2-6 gives the resulting main and pop-up windows after the Submit button is pressed (shown here running on a different Windows machine). We'll see something very similar later in this chapter, but rendered in a web browser with HTML.
Figure 2-6. Fetching input from a user
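The lambda deferral used in Example 2-28 can be demonstrated without a GUI. The lambda wraps the call so nothing runs at registration time; the widget's current content is fetched only when the callback finally fires. This sketch fakes the Entry widget with a hypothetical stand-in class:

```python
def reply(name):                          # handler that needs an argument
    return 'Hello %s!' % name

class FakeEntry:                          # stand-in for a Tkinter Entry
    def __init__(self):
        self.text = ''
    def get(self):
        return self.text

ent = FakeEntry()
callback = (lambda: reply(ent.get()))     # defer the call; no args yet

ent.text = 'Bob'                          # "typed" after registration
print(callback())                         # fetches current text when run
```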
The code we've seen so far demonstrates many of the core concepts in GUI programming, but Tkinter is much more powerful than these examples imply. There are more than 20 widgets in Tkinter and many more ways to input data from a user, including multiple-line text, drawing canvases, pulldown menus, radio and check-buttons, scroll bars, as well as other layout and event handling mechanisms. Beyond Tkinter itself, extensions such as the open source PMW and Tix libraries add additional widgets we can use in our Python Tkinter GUIs and provide an even more professional look and feel. To hint at what is to come, let's put Tkinter to work on our database of people.
2.7.4. A GUI Shelve Interface
For our database application, the first thing we probably want is a GUI for viewing the stored data, a form with field names and values, and a way to fetch records by key. It would also be useful to be able to update a record with new field values given its key and to add new records from scratch by filling out the form. To keep this simple, we'll use a single GUI for all of these tasks. Figure 2-7 shows the window we are going to code as it looks in Windows; the record for the key sue has been fetched and displayed. This record is really an instance of our class in our shelve file, but the user doesn't need to care.
Figure 2-7. peoplegui.py main display/input window
2.7.4.1. Coding the GUI

Also, to keep this simple, we'll assume that all records in the database have the same sets of fields. It would be a minor extension to generalize this for any set of fields (and come up with a general form GUI constructor tool in the process, such as this book's PyForm example), but we'll defer such evolutions to later in this book. Example 2-29 implements the GUI shown in Figure 2-7.
Example 2-29. PP3E\Preview\peoplegui.py
############################################################################
# implement a GUI for viewing/updating class instances stored in a shelve;
# the shelve lives on machine this script runs on, as 1 or more local files
############################################################################

from Tkinter import *
from tkMessageBox import showerror
import shelve

shelvename = 'class-shelve'
fieldnames = ('name', 'age', 'job', 'pay')

def makeWidgets():
    global entries
    window = Tk()
    window.title('People Shelve')
    form = Frame(window)
    labels = Frame(form)
    values = Frame(form)
    labels.pack(side=LEFT)
    values.pack(side=RIGHT)
    form.pack()
    entries = {}
    for label in ('key',) + fieldnames:
        Label(labels, text=label).pack()
        ent = Entry(values)
        ent.pack()
        entries[label] = ent
    Button(window, text="Fetch",  command=fetchRecord).pack(side=LEFT)
    Button(window, text="Update", command=updateRecord).pack(side=LEFT)
    Button(window, text="Quit",   command=window.quit).pack(side=RIGHT)
    return window

def fetchRecord():
    key = entries['key'].get()
    try:
        record = db[key]                       # fetch by key, show in GUI
    except:
        showerror(title='Error', message='No such key!')
    else:
        for field in fieldnames:
            entries[field].delete(0, END)
            entries[field].insert(0, repr(getattr(record, field)))

def updateRecord():
    key = entries['key'].get()
    if key in db.keys():
        record = db[key]                       # update existing record
    else:
        from person import Person              # make/store new one for key
        record = Person(name='?', age='?')     # eval: strings must be quoted
    for field in fieldnames:
        setattr(record, field, eval(entries[field].get()))
    db[key] = record

db = shelve.open(shelvename)
window = makeWidgets()
window.mainloop()
db.close()                            # back here after quit or window close
Notice how the end of this script opens the shelve as a global variable and starts the GUI; the shelve remains open for the lifespan of the GUI (mainloop returns only after the main window is closed). As we'll see in the next section, this state retention is very different from the web model, where each interaction is normally a standalone program. Also notice that the use of global variables makes this code simple but unusable outside the context of our database; more on this later.
2.7.4.2. Using the GUI

The GUI we're building is fairly basic, but it provides a view on the shelve file and allows us to browse and update the file without typing any code. To fetch a record from the shelve and display it on the GUI, type its key into the GUI's "key" field and click Fetch. To change a record, type into its input fields after fetching it and click Update; the values in the GUI will be written to the record in the database. And to add a new record, fill out all of the GUI's fields with new values and click Update; the new record will be added to the shelve file using the key and field inputs you provide. In other words, the GUI's fields are used for both display and input. Figure 2-8 shows the scene after adding a new record (via Update), and Figure 2-9 shows an error dialog pop up issued when users try to fetch a key that isn't present in the shelve.
Figure 2-8. peoplegui.py after adding a new persistent object
Figure 2-9. peoplegui.py common error dialog pop up
Notice how we're using repr() again to display field values fetched from the shelve and eval() to convert field values to Python objects before they are stored in the shelve. As mentioned previously, this is potentially dangerous if someone sneaks some malicious code into our shelve, but we'll finesse such concerns for now. Keep in mind, though, that this scheme means that strings must be quoted in input fields other than the key; they are assumed to be Python code. In fact, you could type an arbitrary Python expression in an input field to specify a value for an update. (Typing "Tom"*3 in the name field, for instance, would set the name to TomTomTom after an update, though this was not by design; fetch the record again to see the result.) Even though we now have a GUI for browsing and changing records, we can still check our work by interactively opening and inspecting the shelve file or by running scripts such as the dump utility in Example 2-19. Remember, despite the fact that we're now viewing records in a GUI's windows, the database is a Python shelve file containing native Python class instance objects, so any Python code can access it. Here is the dump script at work after adding and changing a few persistent objects in the GUI:
...\PP3E\Preview> python dump_db_class.py
tom =>
  Tom Doe 90000
peg =>
  1 4
tomtom =>
  Tom Tom 40000
bob =>
  Bob Smith 30000
sue =>
  Sue Jones 40000
bill =>
  bill 9999
nobody =>
  John Doh None
Smith
Doe
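The repr/eval round trip that this GUI relies on can be seen in isolation: repr renders an object as code text for display in an Entry field, and eval turns edited text back into an object, which is also why string inputs must keep their quotes:

```python
# display path: repr renders objects as code text for the Entry fields
paytext  = repr(65000.0)              # the text '65000.0'
nametext = repr('Tom Doe')            # quotes become part of the text

# update path: eval turns edited field text back into Python objects
assert eval(paytext)  == 65000.0
assert eval(nametext) == 'Tom Doe'

# any expression works, which is how "Tom"*3 became TomTomTom
print(eval('"Tom" * 3'))
```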
2.7.4.3. Future directions

Although this GUI does the job, there is plenty of room for improvement:

As coded, this GUI is a simple set of functions that share the global list of input fields (entries) and a global shelve (db). We might instead pass these two objects in as function arguments using the lambda trick of the prior section; though not crucial in a script this small, as a rule of thumb, making your external dependencies explicit makes your code both easier to understand and reusable in other contexts. We could also structure this GUI as a class to support attachment and customization, though it's unlikely that we'll need to reuse such a specific GUI (but see peoplegui_class.py in the book examples directory for a start).

More usefully, we could pass in the fieldnames tuple as an input parameter to the functions here to allow them to be used for other record types in the future. Code at the bottom of the file would similarly become a function with a passed-in shelve filename, and we would also need to pass in a new record construction call to the update function because Person could not be hardcoded. (Such generalization is beyond the scope of this preview, but see people_general.py in the book examples directory for a first implementation and the PyForm program later in this book for a more general approach.)

To make this GUI more user friendly, it might also be nice to add an index window that displays all the keys in the database in order to make browsing easier. Some sort of verification before updates might be useful as well, and Delete and Clear buttons would be simple to code. Furthermore, assuming that inputs are Python code may be more bother than it is worth; a simpler input scheme might be easier and safer. We could also support window resizing (as we'll learn, widgets can grow and shrink with the window) and provide an interface for calling class methods (as is, the pay field can be updated, but there is no way to invoke the giveRaise method).
If we plan to distribute this GUI widely, we might package it up as a standalone executable program, a frozen binary in Python terminology, using third-party tools such as Py2Exe, Installer, and Freeze (search the Web for pointers). Such a program can be run directly without installing Python on the receiving end.
We'll leave all such extensions as suggested exercises and revisit some of them later in this book. Before we move on, two notes. First, I should mention that even more graphical packages are available to Python programmers. For instance, if you need to do graphics beyond basic windows, the Tkinter Canvas widget supports freeform graphics. Third-party extensions such as Blender, OpenGL, VPython, PIL, VTK, and PyGame provide even more advanced graphics, visualization, and animation tools for use in Python scripts. Moreover, the PMW and Tix widget kits mentioned earlier extend Tkinter itself. Try the Vaults of Parnassus, PyPI, and Google for third-party graphics extensions.

And in deference to fans of other GUI toolkits such as wxPython and PyQt, I should also note that there are other GUI options to choose from and that choice is sometimes very subjective. Tkinter is shown here because it is mature, robust, fully open source, well documented, well supported, lightweight, and a standard part of Python. By most accounts, it remains the standard for building portable GUIs in Python. Other GUI toolkits for Python have pros and cons of their own, discussed later in this book. For example, some exchange simplicity for richer widget sets. By and large, though, they are variations on a theme; once you've learned one GUI toolkit, others are easy to pick up. Because of that, we'll focus fully on learning one toolkit in its entirety in this book instead of sampling many partially. Some consider web pages to be a kind of GUI as well, but you'll have to read the next and final section of this chapter to judge that for yourself.
2.8. Step 6: Adding a Web Interface

GUI interfaces are easier to use than command lines and are often all we need to simplify access to data. By making our database available on the Web, we can open it up to even wider use. Anyone with Internet access and a web browser can access the data, regardless of where they are located and which machine they are using. Anything from workstations to cell phones will suffice. Moreover, web-based interfaces require only a web browser; there is no need to install Python to access the data except on the single server machine. Although web-based approaches may sacrifice some of the utility and speed of in-process GUI toolkits, their portability gain can be compelling.

As we'll also see later in this book, there are a variety of ways to go about scripting interactive web pages of the sort we'll need in order to access our data. Basic CGI scripting is more than adequate for simple tasks like ours. For more advanced applications, toolkits and frameworks such as Zope, Plone, Twisted, CherryPy, Webware, Django, TurboGears, mod_python, and Quixote can provide tools that we would otherwise need to code from scratch. Zope, for instance, simplifies many CGI scripting tasks and provides for security, load balancing on the server, and more. For now, let's keep things simple and code a CGI script.
2.8.1. CGI Basics

CGI scripting in Python is easy as long as you already have a handle on things like HTML forms, URLs, and the client/server model of the Web (all topics we'll address in detail later in this book). Whether you're aware of all the underlying details or not, the basic interaction model is probably familiar. In a nutshell, a user visits a web site and receives a form, coded in HTML, to be filled out in her browser. After submitting the form, a script, identified within either the form or the address used to contact the server, is run on the server and produces another HTML page as a reply. Along the way, data typically passes through three programs: from the client browser, to the web server, to the CGI script, and back again to the browser. This is a natural model for the database access interaction we're after: users can submit a database key to the server and receive the corresponding record as a reply page.

We'll go into CGI basics in depth later in this book, but as a first example, let's start out with a simple interactive web page that requests, and then echoes back, a user's name in a web browser. The first page in this interaction is just an input form produced by the HTML file shown in Example 2-30. This HTML file is stored on the web server machine and is transferred to the web browser when accessed.
Example 2-30. PP3E\Preview\cgi101.html
<html>
<title>Interactive Page</title>
<body>
<form method=POST action="cgi-bin/cgi101.py">
    <P><B>Enter your name:</B>
    <P><input type=text name=user>
    <P><input type=submit>
</form>
</body></html>
Notice how this HTML form names the script that will process its input on the server in its action attribute. The input form that this code produces is shown in Figure 2-10 (shown in the open source Firefox web browser running on Windows).
Figure 2-10. cgi101.html input form page
After the input form is submitted, the script in Example 2-31 is run on the web server machine to handle the inputs and generate a reply to the browser on the client machine. Like the HTML file, this Python script resides on the same machine as the web server; it uses the cgi module to parse the form's input and insert it into the HTML reply stream, properly escaped. The cgi module gives us a dictionary-like interface to form inputs sent by the browser, and the HTML code that this script prints winds up rendering the next page on the client's browser. In the CGI world, the standard output stream is connected to the client through a socket.
Example 2-31. PP3E\Preview\cgi-bin\cgi101.py
#!/usr/bin/python
import cgi
form = cgi.FieldStorage()                      # parse form data
print "Content-type: text/html\n"              # hdr plus blank line
print "<title>Reply Page</title>"              # html reply page
if not form.has_key('user'):
    print "<h1>Who are you?</h1>"
else:
    print "<h1>Hello <i>%s</i>!</h1>" % cgi.escape(form['user'].value)
And if all goes well, we receive the reply page shown in Figure 2-11, essentially just an echo of the data we entered in the input page. The page in this figure is produced by the HTML printed by the Python CGI script running on the server. Along the way, the user's name was transferred from a client to a server and back again, potentially across networks and miles. This isn't much of a web site, of course, but the basic principles here apply, whether you're echoing inputs or doing full-blown e-whatever.
Figure 2-11. cgi101.py script reply page for input form
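Along the way, the form's field values travel to the server in an encoded query string or request body. The encoding and decoding can be sketched with the standard library; the names below are the modern Python 3 spellings (the book's Python 2 code relies on the cgi and urllib modules for the same job):

```python
# Python 3 module names shown; Python 2 used urllib and cgi instead
from urllib.parse import urlencode, parse_qs

# encode form fields the way a browser submits them
query = urlencode({'user': 'Bob Smith'})   # spaces become '+'
form  = parse_qs(query)                    # decode back to a dict of lists

print(form['user'][0])
```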
If you have trouble getting this interaction to run on Unix-like systems, you may need to modify the path to your Python in the #! line at the top of the script file and make it executable with a chmod command, but this is dependent on your web server (more on servers in the next section). Also note that the CGI script in Example 2-31 isn't printing complete HTML: the <html> and <body> tags of the static HTML file in Example 2-30 are missing. Strictly speaking, such tags should be printed, but web browsers don't mind the omissions, and this book's goal is not to teach legalistic HTML; see other resources for more on HTML.

Before moving on, it's worth taking a moment to compare this basic CGI example with the simple GUI of Example 2-28 and Figure 2-6. Here, we're running scripts on a server to generate HTML that is rendered in a web browser. In the GUI, we make calls to build the display and respond to events within a single process and on a single machine. The GUI runs multiple layers of software, but not multiple programs. By contrast, the CGI approach is much more distributed: the server, the browser, and possibly the CGI script itself run as separate programs that usually communicate over a network.

Because of such differences, the GUI model may be simpler and more direct: there is no intermediate server, replies do not require invoking a new program, no HTML needs to be generated, and the full power of a GUI toolkit is at our disposal. On the other hand, a web-based interface can be viewed in any browser on any computer and only requires Python on the server machine. And just to muddy the waters further, a GUI can also employ Python's standard library networking tools to fetch and display data from a remote server (that's how web browsers do their work). We'll revisit the tradeoffs of the GUI and CGI schemes later in this book. First, let's preview a handful of pragmatic issues related to CGI work before we apply it to our people database.
2.8.2. Running a Web Server

To run CGI scripts at all, we need a web server that will serve up our HTML and launch our Python scripts on request. The server is a required mediator between the browser and the CGI script. If you don't have an account on a machine that has such a server available, you'll want to run one of your own. We could configure and run a full-blown web server such as the open source Apache system (which, by the way, can be tailored with Python-specific support by the mod_python extension). For this chapter, however, I instead wrote a simple web server in Python using the code in Example 2-32. We'll revisit the tools used in this example later in this book. In short, because Python provides precoded support for various types of network servers, we can build a CGI-capable and portable HTTP web server in roughly 20 lines of code (including comments, whitespace, and a workaround added to force the CGI script to run in-process because of a Windows problem I ran into on two of my test machines; more on this later).

As we'll see later in this book, it's also easy to build proprietary network servers with low-level socket calls in Python, but the standard library provides canned implementations for many common server types, web based or otherwise. The SocketServer module, for instance, provides threaded and forking versions of TCP and UDP servers. Third-party systems such as Twisted provide even more implementations. For serving up web content, the standard library modules used in Example 2-32 provide what we need.
Example 2-32. PP3E\Preview\webserver.py
######################################################################
# implement HTTP web server in Python that knows how to run server-
# side CGI scripts; serves files/scripts from current working dir;
# Python scripts must be stored in webdir\cgi-bin or webdir\htbin;
######################################################################

webdir = '.'   # where your html files and cgi-bin script directory live
port   = 80    # default http://localhost/, else use http://localhost:xxxx/

import os, sys
from BaseHTTPServer import HTTPServer
from CGIHTTPServer import CGIHTTPRequestHandler

# hack for Windows: os.environ not propagated
# to subprocess by os.popen2, force in-process
if sys.platform[:3] == 'win':
    CGIHTTPRequestHandler.have_popen2 = False
    CGIHTTPRequestHandler.have_popen3 = False

os.chdir(webdir)                                      # run in HTML root dir
srvraddr = ("", port)                                 # my hostname, portnumber
srvrobj  = HTTPServer(srvraddr, CGIHTTPRequestHandler)
srvrobj.serve_forever()                               # run as perpetual demon
The classes this script uses assume that the HTML files to be served up reside in the current working directory and that the CGI scripts to be run live in a /cgi-bin or /htbin subdirectory there. We're using a /cgi-bin subdirectory for scripts, as suggested by the filename of Example 2-31. Some web servers look at filename extensions to detect CGI scripts; our script uses this subdirectory-based scheme instead. To launch the server, simply run this script (in a console window, by an icon click, or otherwise); it runs perpetually, waiting for requests to be submitted from browsers and other clients. The server listens for requests on the machine on which it runs and on the standard HTTP port number 80. To use this script to serve up other web sites, either launch it from the directory that contains your HTML files and a cgi-bin subdirectory that contains your CGI scripts, or change its webdir variable to reflect the site's root directory (it will automatically change to that directory and serve files located there).

But where in cyberspace do you actually run the server script? If you look closely enough, you'll notice that the server name in the addresses of the prior section's examples (near the top right of the browser, after the "http://") is always localhost. To keep this simple, I am running the web server on the same machine as the web browser; that's what the server name "localhost" (and the equivalent IP address "127.0.0.1") means. That is, the client and server machines are the same: the client (web browser) and server (web server) are just different processes running at the same time on the same computer. This turns out to be a great way to test CGI scripts: you can develop them on the same machine without having to transfer code back to a remote server machine after each change. Simply run this script from the directory that contains both your HTML files and a cgi-bin subdirectory for scripts and then use "http://localhost/..." 
in your browser to access your HTML and script files. Here is the trace output the web server script produces in a Windows console window that is running on the same machine as the web browser and launched from the directory where the HTML files reside:
To run this server on a different port, change the port number in the script and name it explicitly in the URL (e.g., "http://localhost:8888/"). To run this server on a remote computer, upload the HTML files and the CGI scripts' subdirectory to the remote computer, launch the server script on that machine, and replace "localhost" in the URLs with the domain name or IP address of your server machine (e.g., "http://www.myserver.com/"). When running the server remotely, all the interaction will be as shown here, but inputs and replies will be automatically shipped across network connections, not routed between programs running on the same computer. On systems that don't require custom code like the Windows workaround in our code, you can also start a CGI-capable web server by simply running the file CGIHTTPServer.py in the Python standard library (this script is located in the C:\Python24\Lib directory on Windows, for instance, under Python 2.4). This file's test code is similar to our script, but it defaults to port number 8000 unless a port number is given on the command line as an argument. In Chapter 16, we'll expand Example 2-32 to allow the directory name and port numbers to be passed in on the command line, and we'll augment the module search path for platforms where the server runs the script in-process.[*]
Technically speaking, the Windows workaround in Example 2-32 was related to a bug in the os.environ.update call, which was used by the server classes; it did not correctly update on Windows XP, but it may by the time you read this sentence. At the time of this writing, of the environment changes made by os.environ.update({'X': 'spam'}) and os.environ['Y'] = 'ni', only the second was propagated to the subprocess after an (i, o) = os.popen2('sub.py') call. This may seem obscure, but it underscores one of the nice things about having access to the source code of an open source system such as Python: I was not at the mercy of a software vendor to uncover this and provide me with a workaround.
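The Chapter 16 version of that command-line expansion isn't reproduced here, but the idea is simple enough to sketch. The following is an illustration only, not the book's code; the function name and the port-8000 default are assumptions:

```python
def parse_args(argv):
    """Return (webdir, port) from command-line words, with defaults.
    argv[1] is an optional web directory, argv[2] an optional port."""
    webdir = argv[1] if len(argv) > 1 else '.'       # default: current dir
    port = int(argv[2]) if len(argv) > 2 else 8000   # default: port 8000
    return webdir, port
```

With this, a call such as parse_args(['webserver.py', '/site', '8080']) yields ('/site', 8080), and the results can feed the webdir and port settings of Example 2-32 directly.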
2.8.3. Using Query Strings and urllib

In the basic CGI example shown earlier, we ran the Python script by filling out and submitting a form that contained the name of the script. Really, CGI scripts can be invoked in a variety of ways: either by submitting an input form as shown so far, or by sending the server an explicit URL (Internet address) string that contains inputs at the end. Such an explicit URL can be sent to a server either in or outside of a browser; in a sense, it bypasses the traditional input form page. For instance, Figure 2-12 shows the reply generated by the server after typing a URL of the following form in the address field at the top of the web browser (+ means a space here):
http://localhost/cgi-bin/cgi101.py?user=Sue+Smith
Figure 2-12. cgi101.py reply to GET-style query parameters
The inputs here, known as query parameters, show up at the end of the URL after the ?; they are not entered into a form's input fields. Adding inputs to URLs is sometimes called a GET request. Our original input form uses the POST method, which instead ships inputs in a separate step. Luckily, Python CGI scripts don't have to distinguish between the two; the cgi module's input parser handles any data submission method differences for us. It's even possible, and often useful, to submit URLs with inputs appended as query parameters completely outside any web browser. The Python urllib module, for instance, allows us to read the reply generated by a server for any valid URL. In effect, it allows us to visit a web page or invoke a CGI script from within another script; your Python code acts as the web client. Here is this module in action, run from the interactive command line:
The urllib module gives us a file-like interface to the server's reply for a URL. Notice that the output we read from the server is raw HTML code (normally rendered by a browser). We can process this text with any of Python's text-processing tools, including string methods to search and split, the re regular expression pattern-matching module, or the full-blown HTML parsing support in the standard library. When combined with such tools, the urllib module is a natural for interactive testing and custom client-side GUIs, as well as implementing automated tools such as regression testing systems for remote server-side CGI scripts.
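As a small aside, the query string at the end of such a URL need not be typed by hand; it can be built programmatically with the standard library's urlencode tool (found in the urllib module in the book's Python 2, and relocated to urllib.parse in later Pythons). A minimal sketch, using the URL of the prior example:

```python
try:
    from urllib import urlencode        # Python 2, the book's vintage
except ImportError:
    from urllib.parse import urlencode  # relocated in Python 3

# build a GET-style query string; spaces become '+'
params = urlencode({'user': 'Sue Smith'})
url = 'http://localhost/cgi-bin/cgi101.py?' + params
```

The resulting url string can then be handed to urllib's URL-opening call, or pasted into a browser's address field, with identical results.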
2.8.4. Formatting Reply Text

One last fine point: because CGI scripts use text to communicate with clients, they need to format their replies according to a set of rules. For instance, notice how Example 2-31 adds a blank line between the reply's header and its HTML by printing an explicit newline (\n) in addition to the one print adds automatically; this is a required separator. Also note how the text inserted into the HTML reply is run through the cgi.escape call, just in case the input includes a character that is special in HTML. For example, Figure 2-13 shows the reply we receive on another machine for the form input Bob </i>Smith: the </i> in the middle becomes &lt;/i&gt; in the reply, and so doesn't interfere with real HTML code (if not escaped, the rest of the name would not be italicized).
Figure 2-13. Escaping HTML characters
Escaping text like this isn't always required, but it is a good rule of thumb when its content isn't known; scripts that generate HTML have to respect its rules. As we'll see later in this book, a related call, urllib.quote, applies URL escaping rules to text. As we'll also see, larger frameworks such as Zope often handle text formatting tasks for us.
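The two escaping flavors can be tried interactively. A sketch, with a compatibility import because cgi.escape was later replaced by html.escape in Python 3 (the example input mirrors the one above):

```python
try:
    from cgi import escape              # Python 2 / early 3.X
    from urllib import quote
except ImportError:
    from html import escape             # later replacement for cgi.escape
    from urllib.parse import quote

html_text = escape('Bob </i>Smith')     # HTML rules: tags become entities
url_text = quote('Bob Smith')           # URL rules: space becomes %20
```

Here html_text is 'Bob &lt;/i&gt;Smith' and url_text is 'Bob%20Smith'; the former is safe to embed in a reply page, the latter in a URL.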
2.8.5. A Web-Based Shelve Interface

Now, to use the CGI techniques of the prior sections for our database application, we basically just need a bigger input and reply form. Figure 2-14 shows the form we'll implement for accessing our database in a web browser.
Figure 2-14. peoplecgi.html input page
2.8.5.1. Coding the web site

To implement the interaction, we'll code an initial HTML input form, as well as a Python CGI script for displaying fetch results and processing update requests. Example 2-33 shows the input form's HTML code that builds the page in Figure 2-14.
Example 2-33. PP3E\Preview\peoplecgi.html
<html>
<title>People Input Form</title>
<body>
<form method=POST action="cgi-bin/peoplecgi.py">
    <table>
    <tr><th>Key <td><input type=text name=key>
    <tr><th>Name<td><input type=text name=name>
    <tr><th>Age <td><input type=text name=age>
    <tr><th>Job <td><input type=text name=job>
    <tr><th>Pay <td><input type=text name=pay>
    </table>
    <p>
    <input type=submit value="Fetch" name=action>
    <input type=submit value="Update" name=action>
</form>
</body></html>
To handle form (and other) requests, Example 2-34 implements a Python CGI script that fetches and updates our shelve's records. It echoes back a page similar to that produced by Example 2-33, but with the form fields filled in from the attributes of actual class objects in the shelve database.
As in the GUI, the same web page is used for both displaying results and inputting updates. Unlike the GUI, this script is run anew for each step of user interaction, and it reopens the database each time (the reply page's action field is a link back to the script). The basic CGI model provides no automatic memory from page to page.
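One common workaround for that lack of memory, which we'll meet again later in the book, is to embed state in the reply page itself as hidden form fields, so that it comes back to the script with the next submission. A tiny sketch (the helper name here is hypothetical, not part of the example's code):

```python
def hidden_field(name, value):
    """Render state as a hidden form input carried to the next request."""
    return '<input type=hidden name=%s value="%s">' % (name, value)

# the generated tag is invisible on the page but round-trips with the form
state = hidden_field('key', 'bob')
```

Embedding state this way keeps the server stateless; cookies and server-side sessions are the other common alternatives.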
Example 2-34. PP3E\Preview\cgi-bin\peoplecgi.py
##########################################################################
# implement a web-based interface for viewing/updating class instances
# stored in a shelve; shelve lives on server (same machine if localhost)
##########################################################################

import cgi, shelve
form = cgi.FieldStorage( )               # parse form data (cgi.test( ) dumps inputs)
print "Content-type: text/html"          # hdr, blank line in string
shelvename = 'class-shelve'
fieldnames = ('name', 'age', 'job', 'pay')

# main html template
replyhtml = """
<html>
<title>People Input Form</title>
<body>
<form method=POST action="peoplecgi.py">
    <table>
    <tr><th>key<td><input type=text name=key value="%(key)s">
    $ROWS$
    </table>
    <p>
    <input type=submit value="Fetch" name=action>
    <input type=submit value="Update" name=action>
</form>
</body></html>
"""

# insert html for data rows at $ROWS$
rowhtml  = '<tr><th>%s<td><input type=text name=%s value="%%(%s)s">\n'
rowshtml = ''
for fieldname in fieldnames:
    rowshtml += (rowhtml % ((fieldname,) * 3))
replyhtml = replyhtml.replace('$ROWS$', rowshtml)

def htmlize(adict):
    new = adict.copy( )
    for field in fieldnames:                    # values may have &, >, etc.
        value = new[field]                      # display as code: quoted
        new[field] = cgi.escape(repr(value))    # html-escape special chars
    return new

def fetchRecord(db, form):
    try:
        key = form['key'].value
        record = db[key]
        fields = record.__dict__                # use attribute dict
        fields['key'] = key                     # to fill reply string
    except:
        fields = dict.fromkeys(fieldnames, '?')
        fields['key'] = 'Missing or invalid key!'
    return fields

def updateRecord(db, form):
    if not form.has_key('key'):
        fields = dict.fromkeys(fieldnames, '?')
        fields['key'] = 'Missing key input!'
    else:
        key = form['key'].value
        if key in db.keys( ):
            record = db[key]                    # update existing record
        else:
            from person import Person           # make/store new one for key
            record = Person(name='?', age='?')
        for field in fieldnames:                # eval: strings must be quoted
            setattr(record, field, eval(form[field].value))
        db[key] = record
        fields = record.__dict__
        fields['key'] = key
    return fields

db = shelve.open(shelvename)
action = form.has_key('action') and form['action'].value
if action == 'Fetch':
    fields = fetchRecord(db, form)
elif action == 'Update':
    fields = updateRecord(db, form)
else:
    fields = dict.fromkeys(fieldnames, '?')     # bad submit button value
    fields['key'] = 'Missing or invalid action!'
db.close( )
print replyhtml % htmlize(fields)               # fill reply from dict
This is a fairly large script, because it has to handle user inputs, interface with the database, and generate HTML for the reply page. Its behavior is fairly straightforward, though, and similar to the GUI of the prior section. The only feat of semi-magic it relies on is using a record's attribute dictionary (__dict__) as the source of values when applying string formatting to the HTML reply template string in the last line of the script. Recall that a %(key)code replacement target fetches a value by key from a dictionary:
>>> D = {'say': 5, 'get': 'shrubbery'}
>>> D['say']
5
>>> S = '%(say)s => %(get)s' % D
>>> S
'5 => shrubbery'
By using an object's attribute dictionary, we can refer to attributes by name in the format string. In fact, part of the reply template is generated by code. If its structure is confusing, simply insert statements to print replyhtml and to call sys.exit, and run from a simple command line. This is how the table's HTML in the middle of the reply is generated (slightly formatted here for readability):
<table>
<tr><th>key  <td><input type=text name=key value="%(key)s">
<tr><th>name <td><input type=text name=name value="%(name)s">
<tr><th>age  <td><input type=text name=age value="%(age)s">
<tr><th>job  <td><input type=text name=job value="%(job)s">
<tr><th>pay  <td><input type=text name=pay value="%(pay)s">
</table>
This text is then filled in with key values from the record's attribute dictionary by string formatting at the end of the script. This is done after running the dictionary through a utility to convert its values to code text with repr and escape that text per HTML conventions with cgi.escape (again, the last step isn't always required, but it's generally a good practice). These HTML reply lines could have been hardcoded in the script, but generating them from a tuple of field names is a more general approach: we can add new fields in the future without having to update the HTML template each time. Python's string processing tools make this a snap.
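The same two-step trick can be tried standalone, outside the CGI script. The following sketch uses a shortened field list and a throwaway Person class purely for illustration; note how the doubled %% in the row template survives the first formatting pass as a literal %, ready for the second pass:

```python
fieldnames = ('name', 'age')

# step 1: generate one template row per field; %%(...)s becomes %(...)s
rowhtml = '<tr><th>%s<td><input type=text name=%s value="%%(%s)s">\n'
rowshtml = ''
for field in fieldnames:
    rowshtml += rowhtml % ((field,) * 3)

# step 2: fill the generated %(name)s targets from an attribute dictionary
class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age

page = rowshtml % Person('Bob', 42).__dict__
```

After step 1, rowshtml contains a %(name)s and a %(age)s target; after step 2, page contains the rendered rows with value="Bob" and value="42" filled in by attribute name.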
2.8.5.2. Using the web site

Using the web interface is as simple as using the GUI. To fetch a record, fill in the Key field and click Fetch; the script populates the page with field data grabbed from the corresponding class instance in the shelve, as illustrated in Figure 2-15 for the key bob.
Figure 2-15. peoplecgi.py reply page
Figure 2-15 shows what happens when the key comes from the posted form. As usual, you can also invoke the CGI script by instead passing inputs on a query string at the end of the URL; Figure 2-16 shows the reply we get when accessing a URL of the following form:
Figure 2-16. peoplecgi.py reply for query parameters
As we've seen, such a URL can be submitted either within your browser, or by scripts that use tools such as the urllib module. Again, replace "localhost" with your server's domain name if you are running the script on a remote machine. To update a record, fetch it by key, enter new values in the field inputs, and click Update; the script will take the input fields and store them in the attributes of the class instance in the shelve. Figure 2-17 shows the reply we get after updating sue.
Figure 2-17. peoplecgi.py update reply
Finally, adding a record works the same as in the GUI: fill in a new key and field values and click Update; the CGI script creates a new class instance, fills out its attributes, and stores it in the shelve under the new key. There really is a class object behind the web page here, but we don't have to deal with the logic used to generate it. Figure 2-18 shows a record added to the database in this way.
Figure 2-18. peoplecgi.py after adding a new record
In principle, we could also update and add records by submitting a URL (either from a browser or from a script) such as:
Except for automated tools, though, typing such a long URL will be noticeably more difficult than filling out the input page. Here is part of the reply page generated for the "guido" record's display of Figure 2-18 (use your browser's "view page source" option to see this for yourself). Note how the < and > characters are translated to HTML escapes with cgi.escape before being inserted into the reply:
(The listing shows one generated <tr> row for each of the key, name, age, job, and pay fields, with each value quoted by repr and HTML-escaped by cgi.escape.)
python tutor0.py
Content-type: text/html

<TITLE>CGI 101</TITLE>
<H1>A First CGI script</H1>
<P>Hello, CGI World!</P>
When run by the HTTP server program on a web server machine, however, the standard output stream is tied to a socket read by the browser on the client machine. In this context, all the output is sent across the Internet to your browser. As such, it must be formatted per the browser's expectations. In particular, when the script's output reaches your browser, the first printed line is interpreted as a header, describing the text that follows. There can be more than one header line in the printed response, but there must always be a blank line between the headers and the start of the HTML code (or other data). In this script, the first header line tells the browser that the rest of the transmission is HTML text (text/html), and the newline character (\n) at the end of the first print statement generates an extra line feed in addition to the one that the print statement produces itself. The net effect is to insert a blank line after the header line. The rest of this program's output is standard HTML and is used by the browser to generate a web page on a client, exactly as if the HTML lived in a static HTML file on the server.[*]
Notice that the script does not generate the enclosing <HTML> and <BODY> tags included in the static HTML file of the prior section. As mentioned in Chapter 2, strictly speaking, it should; HTML without such tags is technically invalid. But because all commonly used browsers simply ignore the omission, we'll take some liberties with HTML syntax in this book. If you need to care about such things, consult HTML references for more formal details.
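The header-plus-blank-line convention described above can be captured in a small helper. This is a sketch for illustration only, not a function from the book's examples:

```python
def cgi_reply(body, ctype='text/html'):
    """Prefix CGI output with a content-type header and the required
    blank line, then append the reply payload."""
    return 'Content-type: %s\n\n%s' % (ctype, body)

reply = cgi_reply('<TITLE>CGI 101</TITLE>')
```

Printing the result of cgi_reply emits the header, the mandatory empty separator line, and the HTML in one step, which avoids the easy mistake of forgetting the blank line.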
CGI scripts are accessed just like HTML files: you either type the full URL of this script into your browser's address field, or click on the tutor0.py link line in the examples root page of Figure 16-1 (which follows a minimal hyperlink that resolves to the script's full URL). Figure 16-3 shows the result page generated if you point your browser at this script.
Figure 16-3. A simple web page from a CGI script
16.4.2.1. Installing CGI scripts

If you are running the local web server described at the start of this chapter, no extra installation steps are required to make this example work, and you can safely skip most of this section. If you want to put CGI scripts on another server, though, there are a few pragmatic details you may need to know about. This section provides a brief overview of common CGI configuration details for reference. Like HTML files, CGI scripts are simple text files that you can either create on your local machine and upload to the server by FTP, or write with a text editor running directly on the server machine (perhaps using a Telnet client). However, because CGI scripts are run as programs, they have some unique installation requirements that differ from simple HTML files. In particular, they usually must be stored and named specially, and they must be configured as programs that are executable by arbitrary users. Depending on your needs, CGI scripts also may require help finding imported modules and may need to be converted to the server platform's text file format after being uploaded. Let's look at each install constraint in more depth:
Directory and filename conventions

First, CGI scripts need to be placed in a directory that your web server recognizes as a program directory, and they need to be given a name that your server recognizes as a CGI script. In the local web server we're using in this chapter, scripts need to be placed in a special cgi-bin subdirectory and be named with a .py extension. On the server used for this book's second edition, CGI scripts instead were stored in the user's public_html directory just like HTML files, but they required a filename ending in .cgi, not .py. Some servers may allow other suffixes and program directories; this varies widely and can sometimes be configured per server or per user.
Execution conventions

Because they must be executed by the web server on behalf of arbitrary users on the Web, CGI script files may also need to be given executable file permissions to mark them as programs, and be made executable by others. Again, a shell command chmod 0755 filename does the trick on most servers. Under some servers, CGI scripts also need the special #! line at the top, to identify the Python interpreter that runs the file's code. The text after the #! in the first line simply gives the directory path to the Python executable on your server machine. See Chapter 3 for more details on this special first line, and be sure to check your server's conventions for more details on non-Unix platforms. Some servers may expect this line, even outside Unix. Most of the CGI scripts in this book include the #! line just in case they will ever be run on Unix-like platforms; under our locally running web server on Windows, this first line is simply ignored as a Python comment. One subtlety worth noting: as we saw earlier in the book, the special first line in executable text files can normally contain either a hardcoded path to the Python interpreter (e.g., #!/usr/bin/python) or an invocation of the env program (e.g., #!/usr/bin/env python), which deduces where Python lives from environment variable settings (i.e., your $PATH). The env trick is less useful in CGI scripts, though, because their environment settings are those of the user "nobody" (not your own), as explained in the next paragraph.
Module search path configuration (optional)

Some HTTP servers may run CGI scripts with the username "nobody" for security reasons (this limits the user's access to the server machine). That's why files you publish on the Web must have special permission settings that make them accessible to other users. It also means that some CGI scripts can't rely on the Python module search path being configured in any particular way. As we've seen, the module path is normally initialized from the user's PYTHONPATH setting and .pth files, plus defaults. But because CGI scripts are run by the user "nobody," PYTHONPATH may be arbitrary when a CGI script runs. Before you puzzle over this too hard, you should know that this is often not a concern in practice. Because Python usually searches the current directory for imported modules by default, this is not an issue if all of your scripts and any modules and packages they use are stored in your web directory. But if a module lives elsewhere, you may need to modify the sys.path list in your scripts to adjust the search path manually before imports; for instance, with sys.path.append(dirname) calls, index assignments, and so on.
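A minimal sketch of such an adjustment follows; the directory name is hypothetical, and would be replaced by wherever your shared modules actually live on the server:

```python
import sys

libdir = '/home/site/pylib'        # hypothetical shared-module directory
if libdir not in sys.path:         # make it searchable before any imports
    sys.path.insert(0, libdir)
```

Inserting at position 0 makes the directory searched first; sys.path.append would make it searched last, which is safer when you don't want to shadow standard library names.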
End-of-line conventions (optional)

On some Unix (and Linux) servers, you might also have to make sure that your script text files follow the Unix end-of-line convention (\n), not DOS (\r\n). This isn't an issue if you edit and debug right on the server (or on another Unix machine) or FTP files one by one in text mode. But if you edit and upload your scripts from a PC to a Unix server in a tar file (or in FTP binary mode), you may need to convert end-of-lines after the upload. For instance, the server that was used for the second edition of this text returns a default error page for scripts whose end-of-lines are in DOS format. See Chapter 7 for automated end-of-line converter scripts.
Unbuffered output streams (optional)

Under some servers, the print statement may buffer its output. If you have a long-running CGI script, to avoid making the user wait to see results, you may wish to manually flush your printed text (call sys.stdout.flush( )) or run your Python scripts in unbuffered mode. Recall from Chapter 5 that you can make streams unbuffered by running with the -u command-line flag or by setting your PYTHONUNBUFFERED environment variable to a nonempty value. To use -u in the CGI world, try using a first line like #!/usr/bin/python -u. In typical usage, output buffering is not usually a factor. On some servers and clients, this may be a resolution for empty reply pages, or premature end-of-script header errors: the client may time out before the buffered output stream is sent (though more commonly, these cases reflect genuine program errors in your script).

This installation process may sound a bit complex at first glance, but much of it is server-dependent, and it's not bad once you've worked through it on your own. It's only a concern at install time and can usually be automated to some extent with Python scripts run on the server. To summarize, most Python CGI scripts are text files of Python code, which:

- Are named according to your web server's conventions (e.g., file.py)
- Are stored in a directory recognized by your web server (e.g., cgi-bin/)
- Are given executable file permissions if required (e.g., chmod 755 file.py)
- May require the special #!pythonpath line at the top for some servers
- Configure sys.path only if needed to see modules in other directories
- Use Unix end-of-line conventions, if your server rejects DOS format
- Flush output buffers if required, or to send portions of the reply periodically

Even if you must use a server machine configured by someone else, most of the machine's conventions should be easy to root out during a normal debugging cycle. As usual, you should consult the conventions for any machine to which you plan to copy these example files.
16.4.2.2. Finding Python on remote servers

One last install pointer: even though Python doesn't have to be installed on any clients in the context of a server-side web application, it does have to exist on the server machine where your CGI scripts are expected to run. If you're running your own server with either the webserver.py script we met earlier, or an open source server such as Apache, this is a nonissue. But if you are using a web server that you did not configure yourself, you must be sure that Python lives on that machine. Moreover, you need to find where it is on that machine so that you can specify its path in the #! line at the top of your script. If you are not sure if or where Python lives on your server machine, here are some tips:

- Especially on Unix systems, you should first assume that Python lives in a standard place (e.g., /usr/local/bin/python): type python in a shell window and see if it works. Chances are that Python already lives on such machines. If you have Telnet access on your server, a Unix find command starting at /usr may help.
- If your server runs Linux, you're probably set to go. Python ships as a standard part of Linux distributions these days, and many web sites and Internet Service Providers (ISPs) run the Linux operating system; at such sites, Python probably already lives at /usr/bin/python.
- In other environments where you cannot control the server machine yourself, it may be harder to obtain access to an already installed Python. If so, you can relocate your site to a server that does have Python installed, talk your ISP into installing Python on the machine you're trying to use, or install Python on the server machine yourself.

If your ISP is unsympathetic to your need for Python and you are willing to relocate your site to one that is, you can find lists of Python-friendly ISPs by searching http://www.python.org. And if you choose to install Python on your server machine yourself, be sure to check out the freeze tool shipped with the Python source distribution (in the Tools directory). With freeze, you can create a single executable program file that contains the entire Python interpreter, as well as all the standard library modules. Such a frozen interpreter can be uploaded to your web account by FTP in a single step, and it won't require a full-blown Python installation on the server. Also see the public domain Installer and Py2Exe systems, which can similarly produce a frozen Python binary.
16.4.3. Adding Pictures and Generating Tables

Let's get back to writing server-side code. As anyone who's ever surfed the Web knows, web pages usually consist of more than simple text. Example 16-4 is a Python CGI script that prints an HTML <img> tag in its output to produce a graphic image in the client browser. This example isn't very Python-specific, but note that, just as for simple HTML files, the image file (ppsmall.gif, one level up from the script file) lives on and is downloaded from the server machine when the browser interprets the output of this script to render the reply page.
Example 16-4. PP3E\Internet\Web\cgi-bin\tutor1.py
#!/usr/bin/python
text = """Content-type: text/html

<TITLE>CGI 101</TITLE>
<H1>A Second CGI script</H1>
<HR>
<P>Hello, CGI World!</P>
<IMG src="../ppsmall.gif" BORDER=1 ALT=[image]>
<HR>
"""
print text
Notice the use of the triple-quoted string block here; the entire HTML string is sent to the browser in one fell swoop, with the print statement at the end. Be sure that the blank line between the Content-type header and the first HTML is truly blank in the string (it may fail in some browsers if you have any spaces or tabs on that line). If both client and server are functional, a page that looks like Figure 16-4 will be generated when this script is referenced and run.
Figure 16-4. A page with an image generated by tutor1.py
So far, our CGI scripts have been putting out canned HTML that could have just as easily been stored in an HTML file. But because CGI scripts are executable programs, they can also be used to generate HTML on the fly, dynamically; even, possibly, in response to a particular set of user inputs sent to the script. That's the whole purpose of CGI scripts, after all. Let's start using this to better advantage now, and write a Python script that builds up response HTML programmatically, listed in Example 16-5.
Example 16-5. PP3E\Internet\Web\cgi-bin\tutor2.py
#!/usr/bin/python
print """Content-type: text/html

<TITLE>CGI 101</TITLE>
<H1>A Third CGI script</H1>
<HR>
<P>Hello, CGI World!</P>
<P>
<TABLE border=1>
"""
for i in range(5):
    print "<TR>"
    for j in range(4):
        print "<TD>%d.%d</TD>" % (i, j)
    print "</TR>"
print """
</TABLE></P>
<HR>
"""
Despite all the tags, this really is Python code: the tutor2.py script uses triple-quoted strings to embed blocks of HTML again. But this time, the script also uses nested Python for loops to dynamically generate part of the HTML that is sent to the browser. Specifically, it emits HTML to lay out a two-dimensional table in the middle of a page, as shown in Figure 16-5.
Figure 16-5. A page with a table generated by tutor2.py
Each row in the table displays a "row.column" pair, as generated by the executing Python script. If you're curious how the generated HTML looks, select your browser's View Source option after you've accessed this page. It's a single HTML page composed of the HTML generated by the first print in the script, then the for loops, and finally the last print. In other words, the concatenation of this script's output is an HTML document with headers.
16.4.3.1. Table tags

The script in Example 16-5 generates HTML table tags. Again, we're not out to learn HTML here, but we'll take a quick look just so that you can make sense of this book's examples. Tables are declared by the text between <TABLE> and </TABLE> tags in HTML. Typically, a table's text in turn declares the contents of each table row between <TR> and </TR> tags and each column within a row between <TD> and </TD> tags. The loops in our script build up HTML to declare five rows of four columns each by printing the appropriate tags, with the current row and column number as column values. For instance, here is part of the script's output, defining the first two rows (to see the full output, run the script standalone from a system command line, or select your browser's View Source option):
<TR>
<TD>0.0</TD>
<TD>0.1</TD>
<TD>0.2</TD>
<TD>0.3</TD>
</TR>
<TR>
<TD>1.0</TD>
<TD>1.1</TD>
<TD>1.2</TD>
<TD>1.3</TD>
</TR>
. . .
Other table tags and options let us specify a row title (<TH>), layout borders, and so on. We'll use more table syntax to lay out forms in a uniform fashion later in this tutorial.
16.4.4. Adding User Interaction

CGI scripts are great at generating HTML on the fly like this, but they are also commonly used to implement interaction with a user typing at a web browser. As described earlier in this chapter, web interactions usually involve a two-step process and two distinct web pages: you fill out an input form page and press Submit, and a reply page eventually comes back. In between, a CGI script processes the form input.
16.4.4.1. Submission page

That description sounds simple enough, but the process of collecting user inputs requires an understanding of a special HTML tag, <form>. Let's look at the implementation of a simple web interaction to see forms at work. First, we need to define a form page for the user to fill out, as shown in Example 16-6.
Example 16-6. PP3E\Internet\Web\tutor3.html
<html>
<title>CGI 101</title>
<body>
<h1>A first user interaction: forms</h1>
<hr>
<form method=POST action="http://localhost/cgi-bin/tutor3.py">
  <p><b>Enter your name:</b>
  <p><input type=text name=user>
  <p><input type=submit>
</form>
</body></html>
tutor3.html is a simple HTML file, not a CGI script (though its contents could be printed from a script as well). When this file is accessed, all the text between its <form> and </form> tags generates the input fields and Submit button shown in Figure 16-6.
Figure 16-6. A simple form page generated by tutor3.html
16.4.4.2. More on form tags

We won't go into all the details behind coding HTML forms, but a few highlights are worth underscoring. The following occurs within a form's HTML code:
Form handler action

The form's action option gives the URL of a CGI script that will be invoked to process submitted form data. This is the link from a form to its handler program; in this case, a program called tutor3.py in the cgi-bin subdirectory of the locally running server's working directory. The action option is the equivalent of command options in Tkinter buttons: it's where a callback handler (here, a remote handler script) is registered to the browser and server.

Input fields

Input controls are specified with nested <input> tags. In this example, input tags have two key options. The type option accepts values such as text for text fields and submit for a Submit button (which sends data to the server and is labeled "Submit Query" by default). The name option is the hook used to identify the entered value by key, once all the form data reaches the server. For instance, the server-side CGI script we'll see in a moment uses the string user as a key to get the data typed into this form's text field. As we'll see in later examples, other input tag options can specify initial values (value=X), display-only mode (readonly), and so on. As we'll also see later, other input type option values may transmit hidden data that embeds state information in pages (type=hidden), reinitialize fields (type=reset), or make multiple-choice buttons (type=checkbox).

Submission method: get and post

Forms also include a method option to specify the encoding style to be used to send data over a socket to the target server machine. Here, we use the post style, which contacts the server and then ships it a stream of user input data in a separate transmission over the socket. An alternative get style ships input information to the server in a single transmission step, by appending user inputs to the query string at the end of the URL used to invoke the script, usually after a ? character. Query parameters were introduced earlier when we met URLs; we will put them to use later in this section. With get, inputs typically show up on the server in environment variables or as arguments in the command line used to start the script. With post, they must be read from standard input and decoded.
Because the get method appends inputs to URLs, it allows users to bookmark actions with parameters for later submission (e.g., a link to a retail site, together with the name of a particular item); post is generally meant for sending data that is to be submitted once (e.g., comment text). The get method is usually considered more efficient, but it may be subject to length limits in the operating system and is less secure (parameters may be recorded in server logs, for instance). post can handle larger inputs and may be more secure in some scenarios, but it requires an extra transmission. Luckily, Python's cgi module transparently handles either encoding style, so our CGI scripts don't need to know or care which is used. Notice that the action URL in this example's form spells out the full address for illustration. Because the browser remembers where the enclosing HTML page came from, it works the same with just the script's filename, as shown in Example 16-7.
Example 16-7. PP3E\Internet\Web\tutor3-minimal.html
<html>
<title>CGI 101</title>
<body>
<h1>A first user interaction: forms</h1>
<hr>
<form method=POST action="cgi-bin/tutor3.py">
  <p><b>Enter your name:</b>
  <p><input type=text name=user>
  <p><input type=submit>
</form>
</body></html>
It may help to remember that URLs embedded in form action tags and hyperlinks are directions to the browser first, not to the script. The tutor3.py script itself doesn't care which URL form is used to trigger it, minimal or complete. In fact, all parts of a URL through the script filename (and up to URL query parameters) are used in the conversation between browser and HTTP server, before a CGI script is ever spawned. As long as the browser knows which server to contact, the URL will work. On the other hand, URLs submitted outside of a page (e.g., typed into a browser's address field or sent to Python's urllib module) usually must be completely specified, because there is no notion of a prior page.
16.4.4.3. Response script

So far, we've created only a static page with an input field. But the Submit button on this page is loaded to work magic. When pressed, it triggers the possibly remote program whose URL is listed in the form's action option, and passes this program the input data typed by the user, according to the form's method encoding style option. On the server, a Python script is started to handle the form's input data while the user waits for a reply on the client, as shown in Example 16-8.
Example 16-8. PP3E\Internet\Web\cgi-bin\tutor3.py
#!/usr/bin/python
#######################################################
# runs on the server, reads form input, prints HTML;
# url=http://server-name/cgi-bin/tutor3.py
#######################################################

import cgi
form = cgi.FieldStorage()            # parse form data
print "Content-type: text/html\n"    # plus blank line

html = """
<TITLE>tutor3.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>%s</P>
<HR>"""

if not form.has_key('user'):
    print html % "Who are you?"
else:
    print html % ("Hello, %s." % form['user'].value)
As before, this Python CGI script prints HTML to generate a response page in the client's browser. But this script does a bit more: it also uses the standard cgi module to parse the input data entered by the user on the prior web page (see Figure 16-6). Luckily, this is automatic in Python: a call to the standard library cgi module's FieldStorage class automatically does all the work of extracting form data from the input stream and environment variables, regardless of how that data was passed: in a post style stream or in get style parameters appended to the URL. Inputs sent in both styles look the same to Python scripts. Scripts should call cgi.FieldStorage only once and before accessing any field values. When it is called, we get back an object that looks like a dictionary: user input fields from the form (or URL) show up as values of keys in this object. For example, in the script, form['user'] is an object whose value attribute is a string containing the text typed into the form's text field. If you flip back to the form page's HTML, you'll notice that the input field's name option was user; the name in the form's HTML has become a key we use to fetch the input's value from a dictionary. The object returned by FieldStorage supports other dictionary operations, too; for instance, the has_key method may be used to check whether a field is present in the input data. Before exiting, this script prints HTML to produce a result page that echoes back what the user typed into the form. Two string-formatting expressions (%) are used to insert the input text into a reply string, and the reply string into the triple-quoted HTML string block. The body of the script's output looks like this:
<TITLE>tutor3.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>Hello, King Arthur.</P>
<HR>
In a browser, the output is rendered into a page like the one in Figure 16-7.
Figure 16-7. tutor3.py result for parameters in a form
16.4.4.4. Passing parameters in URLs

Notice that the URL address of the script that generated this page shows up at the top of the browser. We didn't type this URL ourselves; it came from the action tag of the prior page's form HTML. However, nothing is stopping us from typing the script's URL explicitly in our browser's address field to invoke the script, just as we did for our earlier CGI script and HTML file examples. But there's a catch here: where does the input field's value come from if there is no form page? That is, if we type the CGI script's URL ourselves, how does the input field get filled in? Earlier, when we talked about URL formats, I mentioned that the get encoding scheme tacks input parameters onto the end of URLs. When we type script addresses explicitly, we can also append input values on the end of URLs, where they serve the same purpose as fields in forms. Moreover, the Python cgi module makes URL and form inputs look identical to scripts. For instance, we can skip filling out the input form page completely, and directly invoke our tutor3.py script by visiting a URL of this form (type this in your browser's address field):
http://localhost/cgi-bin/tutor3.py?user=Brian
In this URL, a value for the input named user is specified explicitly, as if the user had filled out the input page. When called this way, the only constraint is that the parameter name user must match the name expected by the script (and hardcoded in the form's HTML). We use just one parameter here, but in general, URL parameters are typically introduced with a ? and are followed by one or more name=value assignments, separated by & characters if there is more than one. Figure 16-8 shows the response page we get after typing a URL with explicit inputs.
Figure 16-8. tutor3.py result for parameters in a URL
In fact, HTML forms that specify the get encoding style also cause inputs to be added to URLs this way. Try changing Example 16-6 to use method=GET, and submit the form; the name input in the form shows up as a query parameter in the reply page address field, just like the URL we manually entered in Figure 16-8. Forms can use the post or get style. Manually typed URLs with parameters use get. Generally, any CGI script can be invoked either by filling out and submitting a form page or by passing inputs at the end of a URL. Although hand-coding parameters in URLs can become difficult for scripts that expect many complex parameters, other programs can automate the construction process. When CGI scripts are invoked with explicit input parameters this way, it's not too difficult to see their similarity to functions, albeit ones that live remotely on the Net. Passing data to scripts in URLs is similar to keyword arguments in Python functions, both operationally and syntactically. In fact, in Chapter 18 we will meet a system called Zope that makes the relationship between URLs and Python function calls even more literal (URLs become more direct function calls). Incidentally, if you clear out the name input field in the form input page (i.e., make it empty) and press Submit, the user name field becomes empty. More accurately, the browser may not send this field along with the form data at all, even though it is listed in the form layout HTML. The CGI script detects such a missing field with the dictionary has_key method and produces the page captured in Figure 16-9 in response.
Figure 16-9. An empty name field producing an error page
In general, CGI scripts must check to see whether any inputs are missing, partly because they might not be typed by a user in the form, but also because there may be no form at all; input fields might not be tacked onto the end of an explicitly typed URL. For instance, if we type the script's URL without any parameters at all (by omitting the text from the ? and beyond, and visiting http://localhost/cgi-bin/tutor3.py with an explicitly entered URL), we get this same error response page. Since we can invoke any CGI through a form or URL, scripts must anticipate both scenarios.
16.4.4.5. Testing outside browsers with the urllib module

Once we understand how to send inputs to forms as query string parameters at the end of URLs like this, the Python urllib module we met in Chapters 2 and 14 becomes even more useful. Recall that this module allows us to fetch the reply generated for any URL address. When the URL names a simple HTML file, we simply download its contents. But when it names a CGI script, the effect is to run the remote script and fetch its output. For example, we can trigger the script in Example 16-8 directly, without either going through the tutor3.html web page or typing a URL in a browser's address field:
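The interactive listing for this step is not reproduced here; a minimal sketch of the technique looks like the following. The try/except import is only there so the snippet also runs on Pythons newer than the 2.x release used in this book, and the fetch itself is commented out because it requires the local server from earlier in this chapter to be running:

```python
try:
    from urllib import urlopen          # Python 2, as used in this book
except ImportError:
    from urllib.request import urlopen  # later Pythons

# get style: the query parameter rides at the end of the URL itself
url = 'http://localhost/cgi-bin/tutor3.py?user=Brian'

# With a local server running, this fetches the script's generated reply
# HTML, the same text a browser would render:
# reply = urlopen(url).read()
```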
Recall from Chapter 14 that urllib.urlopen gives us a file object connected to the generated reply stream. Reading this file's output returns the HTML that would normally be intercepted by a web browser and rendered into a reply page. When fetched directly, the HTML reply can be parsed with Python text processing tools (e.g., string methods like split and find, the re pattern-matching module, or the htmllib HTML parsing module). Extracting text from the reply this way is sometimes informally called screen scraping: a way to use web site content in other programs. Screen scraping is an alternative to more complex web services frameworks, though a brittle one: small changes in the page's format can often break scrapers that rely on it. The reply text can also be simply inspected; urllib allows us to test CGI scripts from the Python interactive prompt or other scripts, instead of a browser. More generally, this technique allows us to use a server-side script as a sort of function call. For instance, a client-side GUI can call the CGI script and parse the generated reply page. Similarly, a CGI script that updates a database may be invoked programmatically with urllib, outside the context of an input form page. This also opens the door to automated regression testing of CGI scripts; we can invoke scripts on any remote machine, and compare their reply text to the expected output.[*] We'll see urllib in action again in later examples.
[*] If your job description includes extensive testing of server-side scripts, you may also want to explore Twill, a relatively new Python-based system that provides a little language for scripting the client-side interface to web applications. Search the Web for details.
Before we move on, here are a few advanced urllib usage notes. First, this module also supports proxies, alternative transmission modes, and more. For instance, proxies are supported transparently with environment variables or system settings, or by passing in an extra argument. Moreover, although it normally doesn't make a difference to Python scripts, it is possible to send parameters in both the get and the post submission modes described earlier with urllib. The get mode, with parameters in the query string at the end of a URL as shown in the prior listing, is used by default. To invoke post, pass parameters in as a separate argument:
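The two call styles can be sketched like this (again hedged to import from either the old or the new urllib location; the urlopen calls are commented out because they need a live local server):

```python
try:
    from urllib import urlencode, urlopen       # Python 2, as in this book
except ImportError:
    from urllib.parse import urlencode          # later Pythons
    from urllib.request import urlopen

# encode the parameters once; reused by both styles below
data = urlencode([('user', 'Brian')])           # 'user=Brian'

# get style: parameters appended to the URL itself (the default)
# reply = urlopen('http://localhost/cgi-bin/tutor3.py?' + data).read()

# post style: the encoded data is passed as a separate second argument
# reply = urlopen('http://localhost/cgi-bin/tutor3.py', data).read()
```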
Finally, if your web application depends on client-side cookies (discussed later), see also the newer module, urllib2. This module provides the same file-like urlopen interface for opening and reading from a URL, but it uses the cookielib module to automatically store cookies locally, and later return them to the server. It also supports redirection, authentication, and more; both URL modules also support secure HTTP transmissions. See the Python library manual for details. We'll explore both cookies and urllib2 later in this chapter, and introduce secure HTTP in the next.
16.4.5. Using Tables to Lay Out Forms

Now let's move on to something a bit more realistic. In most CGI applications, input pages are composed of multiple fields. When there is more than one, input labels and fields are typically laid out in a table, to give the form a well-structured appearance. The HTML file in Example 16-9 defines a form with two input fields.
Example 16-9. PP3E\Internet\Web\tutor4.html
<html>
<title>CGI 101</title>
<body>
<h1>A second user interaction: tables</h1>
<hr>
<form method=POST action="cgi-bin/tutor4.py">
  <table>
  <tr>
    <th align=right>Enter your name:
    <td><input type=text name=user>
  <tr>
    <th align=right>Enter your age:
    <td><input type=text name=age>
  <tr>
    <td colspan=2 align=center><input type=submit value="Send">
  </table>
</form>
</body></html>
The <th> tag defines a column like <td>, but also tags it as a header column, which generally means it is rendered in a bold font. By placing the input fields and labels in a table like this, we get an input page like that shown in Figure 16-10. Labels and inputs are automatically lined up vertically in columns, much as they were by the Tkinter GUI geometry managers we met earlier in this book.
Figure 16-10. A form laid out with table tags
When this form's Submit button (labeled "Send" by the page's HTML) is pressed, it causes the script in Example 16-10 to be executed on the server machine, with the inputs typed by the user.
Example 16-10. PP3E\Internet\Web\cgi-bin\tutor4.py
#!/usr/bin/python
#######################################################
# runs on the server, reads form input, prints HTML;
# URL http://server-name/cgi-bin/tutor4.py
#######################################################

import cgi, sys
sys.stderr = sys.stdout              # errors to browser
form = cgi.FieldStorage()            # parse form data
print "Content-type: text/html\n"    # plus blank line

# class dummy:
#     def __init__(self, s): self.value = s
# form = {'user': dummy('bob'), 'age': dummy('10')}

html = """
<TITLE>tutor4.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>%s</P>
<P>%s</P>
<P>%s</P>
<HR>"""

if not form.has_key('user'):
    line1 = "Who are you?"
else:
    line1 = "Hello, %s." % form['user'].value

line2 = "You're talking to a %s server." % sys.platform

line3 = ""
if form.has_key('age'):
    try:
        line3 = "Your age squared is %d!" % (int(form['age'].value) ** 2)
    except:
        line3 = "Sorry, I can't compute %s ** 2." % form['age'].value

print html % (line1, line2, line3)
The table layout comes from the HTML file, not from this Python CGI script. In fact, this script doesn't do much new; it uses string formatting to plug input values into the response page's HTML triple-quoted template string as before, this time with one line per input field. When this script is run by submitting the input form page, its output produces the new reply page shown in Figure 16-11.
Figure 16-11. Reply page generated by tutor4.py
As usual, we can pass parameters to this CGI script at the end of a URL, too. Figure 16-12 shows the page we get when passing a user and age explicitly in this URL:
Figure 16-12. Reply page from tutor4.py for parameters in URL
Notice that we have two parameters after the ? this time; we separate them with &. Also note that we've specified a blank space in the user value with +. This is a common URL encoding convention. On the server side, the + is automatically replaced with a space again. It's also part of the standard escape rule for URL strings, which we'll revisit later. Although Example 16-10 doesn't introduce much that is new about CGI itself, it does highlight a few new coding tricks worth noting, especially regarding CGI script debugging and security. Let's take a quick look.
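These + and & conventions are applied automatically by Python's own URL escaping tools, so we rarely build such strings by hand. A quick demonstration (quote_plus lives in the urllib module in this book's Python 2, and in urllib.parse in later releases):

```python
try:
    from urllib import quote_plus       # Python 2, as used in this book
except ImportError:
    from urllib.parse import quote_plus # later Pythons

s1 = quote_plus('Bob Smith')   # the space becomes a +
s2 = quote_plus('a&b=c')       # '&' and '=' are escaped as %xx codes
```

Here s1 is 'Bob+Smith' and s2 is 'a%26b%3Dc'; on the server, the cgi module reverses these escapes before our script sees the values.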
16.4.5.1. Converting strings in CGI scripts

Just for fun, the script echoes back the name of the server platform by fetching sys.platform along with the square of the age input field. Notice that the age input's value must be converted to an integer with the built-in int function; in the CGI world, all inputs arrive as strings. We could also convert to an integer with the built-in eval function. Conversion (and other) errors are trapped gracefully in a try statement to yield an error line, instead of letting our script die. You should never use eval to convert strings that were sent over the Internet, like the age field in this example, unless you can be absolutely sure that the string does not contain even potentially malicious code. For instance, if this example were available on the general Internet, it's not impossible that someone could type a value into the age field (or append an age parameter to the URL) with a value like os.system('rm *'). Given the appropriate context and process permissions, when passed to eval, such a string might delete all the files in your server script directory! Unless you run CGI scripts in processes with limited permissions and machine access, strings read off the Web can be dangerous to run as code in CGI scripting. You should never pass them to dynamic coding tools like eval and exec, or to tools that run arbitrary shell commands such as os.popen and os.system, unless you can be sure that they are safe. Always use simpler tools for numeric conversion like int and float, which recognize only numbers.
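The safe-conversion advice can be made concrete with a small sketch; the messages mirror those in Example 16-10, and the hostile string is purely illustrative:

```python
def square_age(raw):
    # int accepts only numeric strings; anything else raises ValueError
    # instead of being executed, as hostile input could be under eval
    try:
        return "Your age squared is %d!" % (int(raw) ** 2)
    except ValueError:
        return "Sorry, I can't compute %s ** 2." % raw

ok = square_age('40')                       # "Your age squared is 1600!"
bad = square_age("os.system('rm *')")       # polite error text, no damage
```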
16.4.5.2. Debugging CGI scripts

Errors happen, even in the brave new world of the Internet. Generally speaking, debugging CGI scripts can be much more difficult than debugging programs that run on your local machine. Not only do errors occur on a remote machine, but scripts generally won't run without the context implied by the CGI model. The script in Example 16-10 demonstrates the following two common debugging tricks:
Error message trapping

This script assigns sys.stderr to sys.stdout so that Python error messages wind up being displayed in the response page in the browser. Normally, Python error messages are written to stderr, which generally causes them to show up in the web server's console window or logfile. To route them to the browser, we must make stderr reference the same file object as stdout (which is connected to the browser in CGI scripts). If we don't do this assignment, Python errors, including program errors in our script, never show up in the browser.
Test case mock-up

The dummy class definition, commented out in this final version, was used to debug the script before it was installed on the Net. Besides not seeing stderr messages by default, CGI scripts also assume an enclosing context that does not exist if they are tested outside the CGI environment. For instance, if run from the system command line, this script has no form input data. Uncomment this code to test from the system command line. The dummy class masquerades as a parsed form field object, and form is assigned a dictionary containing two form field objects. The net effect is that form will be plug-and-play compatible with the result of a cgi.FieldStorage call. As usual in Python, object interfaces, not datatypes, are all we must adhere to.

Here are a few general tips for debugging your server-side CGI scripts:
Run the script from the command line

It probably won't generate HTML as is, but running it standalone will detect any syntax errors in your code. Recall that a Python command line can run source code files regardless of their extension: for example, python somescript.cgi works fine.

Assign sys.stderr to sys.stdout as early as possible in your script

This will generally make the text of Python error messages and stack dumps appear in your client browser when accessing the script, instead of the web server's console window or logs. Short of wading through server logs, or manual exception handling, this may be the only way to see the text of error messages after your script aborts.
Mock up inputs to simulate the enclosing CGI context

For instance, define classes that mimic the CGI inputs interface (as done with the dummy class in this script) so that you can view the script's output for various test cases by running it from the system command line.[*] Setting environment variables to mimic form or URL inputs sometimes helps too (we'll see how later in this chapter).

[*] This technique isn't unique to CGI scripts, by the way. In Chapter 15, we'll meet systems that embed Python code inside HTML. There is no good way to test such code outside the context of the enclosing system without extracting the embedded Python code (perhaps by using the htmllib HTML parser that comes with Python) and running it with a passed-in mock-up of the API that it will eventually use.
Call utilities to display CGI context in the browser

The cgi module includes utility functions that send a formatted dump of CGI environment variables and input values to the browser, to view in a reply page. For instance, cgi.print_form(form) prints all the input parameters sent from the client, and cgi.test() prints environment variables, the form, the directory, and more. Sometimes this is enough to resolve connection or input problems. We'll use some of these in the mailer case study in the next chapter.
Show exceptions you catch, print tracebacks

If you catch an exception that Python raises, the Python error message won't be printed to stderr (that is normal behavior). In such cases, it's up to your script to display the exception's name and value in the response page; exception details are available in the built-in sys module, from sys.exc_info(). In addition, Python's traceback module can be used to manually generate stack traces on your reply page for errors; tracebacks show source-code lines active when an exception occurred. We'll use this later in the error page in PyMailCGI (Chapter 17).
Add debugging prints

You can always insert tracing print statements in your code, just as in normal Python programs. Be sure you print the content-type header line first, though, or your prints may not show up on the reply page. In the worst case, you can also generate debugging and trace messages by opening and writing to a local text file on the server; provided you access that file later, this avoids having to format the trace messages according to HTML reply stream conventions.
Run it live

Of course, once your script is at least half working, your best bet is likely to start running it live on the server, with real inputs coming from a browser. Running a server locally on your machine, as we're doing in this chapter, can help by making changes go faster as you test.
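As a concrete illustration of the mock-up tip, here is the commented-out dummy class from Example 16-10 expanded into a runnable stand-in (the class and field names are illustrative):

```python
class Dummy:
    # masquerades as one parsed cgi.FieldStorage field object
    def __init__(self, s):
        self.value = s

# a canned form for offline testing, used instead of cgi.FieldStorage()
form = {'user': Dummy('bob'), 'age': Dummy('10')}

# the script's logic works unchanged against the mock-up
greeting = "Hello, %s." % form['user'].value
```

Because the script only accesses form via dictionary indexing and a value attribute, the mock-up is interchangeable with the real parsed-input object.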
16.4.6. Adding Common Input Devices

So far, we've been typing inputs into text fields. HTML forms support a handful of input controls (what we'd call widgets in the traditional GUI world) for collecting user inputs. Let's look at a CGI program that shows all the common input controls at once. As usual, we define both an HTML file to lay out the form page and a Python CGI script to process its inputs and generate a response. The HTML file is presented in Example 16-11.
Example 16-11. PP3E\Internet\Web\tutor5a.html
<html>
<title>CGI 101</title>
<body>
<h1>Common input devices</h1>
<hr>
<form method=POST action="cgi-bin/tutor5.py">
  <h3>Please complete the following form and click Send</h3>
  <p><table>
  <tr>
    <th align=right>Name:
    <td><input type=text name=name>
  <tr>
    <th align=right>Shoe size:
    <td><input type=radio name=shoesize value=small>Small
        <input type=radio name=shoesize value=medium>Medium
        <input type=radio name=shoesize value=large>Large
  <tr>
    <th align=right>Occupation:
    <td><select name=job>
          <option>Developer
          <option>Manager
          <option>Student
          <option>Evangelist
          <option>Other
        </select>
  <tr>
    <th align=right>Political affiliations:
    <td><input type=checkbox name=language value=Python>Pythonista
        <input type=checkbox name=language value=Perl>Perlmonger
        <input type=checkbox name=language value=Tcl>Tcler
  <tr>
    <th align=right>Comments:
    <td><textarea name=comment cols=30 rows=2>Enter text here</textarea>
  <tr>
    <td colspan=2 align=center><input type=submit value=Send>
  </table>
</form>
</body></html>
When rendered by a browser, the page in Figure 16-13 appears.
Figure 16-13. Input form page generated by tutor5a.html
This page contains a simple text field as before, but it also has radio buttons, a pull-down selection list, a set of multiple-choice checkbuttons, and a multiple-line text input area. All have a name option in the HTML file, which identifies their selected value in the data sent from client to server. When we fill out this form and click the Send submit button, the script in Example 16-12 runs on the server to process all the input data typed or selected in the form.
Example 16-12. PP3E\Internet\Web\cgi-bin\tutor5.py
#!/usr/bin/python
#######################################################
# runs on the server, reads form input, prints HTML
#######################################################

import cgi, sys
form = cgi.FieldStorage()            # parse form data
print "Content-type: text/html\n"    # plus blank line

html = """
<TITLE>tutor5.py</TITLE>
<H1>Greetings</H1>
<HR>
<H4>Your name is %(name)s</H4>
<H4>You wear rather %(shoesize)s shoes</H4>
<H4>Your current job: %(job)s</H4>
<H4>You program in %(language)s</H4>
<H4>You also said:</H4>
<P>%(comment)s</P>
<HR>"""

data = {}
for field in ('name', 'shoesize', 'job', 'language', 'comment'):
    if not form.has_key(field):
        data[field] = '(unknown)'
    else:
        if type(form[field]) != list:
            data[field] = form[field].value
        else:
            values = [x.value for x in form[field]]
            data[field] = ' and '.join(values)

print html % data
This Python script doesn't do much; it mostly just copies form field information into a dictionary called data so that it can be easily inserted into the triple-quoted response template string. A few of its techniques merit explanation:
Field validation

As usual, we need to check all expected fields to see whether they really are present in the input data, using the dictionary has_key method. Any or all of the input fields may be missing if they weren't entered on the form or appended to an explicit URL.
String formatting

We're using dictionary key references in the format string this time; recall that %(name)s means pull out the value for the key name in the data dictionary and perform a to-string conversion on its value.
Multiple-choice fields

We're also testing the type of all the expected fields' values to see whether they arrive as a list rather than the usual string. Values of multiple-choice input controls, like the language choice field in this input page, are returned from cgi.FieldStorage as a list of objects with value attributes, rather than a simple single object with a value. This script copies simple field values to the dictionary verbatim, but it uses a list comprehension to collect the value fields of multiple-choice selections, and the string join method to construct a single string with an and inserted between each selection value (e.g., Python and Tcl). The script's list comprehension is equivalent to the call map(lambda x: x.value, form[field]).[*]

[*] Two forward references are worth noting here. Besides simple strings and lists, later we'll see a third type of form input object, returned for fields that specify file uploads. The script in this example should really also escape the echoed text inserted into the HTML reply to be robust, lest it contain HTML operators. We will discuss escapes in detail later.
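The multiple-choice logic can be exercised in isolation like this; Item is a hypothetical stand-in for the objects cgi.FieldStorage returns, not a real library class:

```python
class Item:
    # stand-in for one parsed form field object with a value attribute
    def __init__(self, value):
        self.value = value

field = [Item('Python'), Item('Tcl')]        # two checkboxes selected
if isinstance(field, list):
    # collect each selection's value and join with ' and '
    result = ' and '.join([x.value for x in field])
else:
    result = field.value
```

With the two selections above, result becomes the single string 'Python and Tcl', ready to be plugged into the reply template.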
When the form page is filled out and submitted, the script creates the response shown in Figure 16-14: essentially just a formatted echo of what was sent.
Figure 16-14. Response page created by tutor5.py (1)
16.4.7. Changing Input Layouts
Suppose that you've written a system like that in the prior section, and your users, clients, and significant other start complaining that the input form is difficult to read. Don't worry. Because the CGI model naturally separates the user interface (the HTML input page definition) from the processing logic (the CGI script), it's completely painless to change the form's layout. Simply modify the HTML file; there's no need to change the CGI code at all. For instance, Example 16-13 contains a new definition of the input page that uses tables a bit differently to provide a nicer layout with borders.
Example 16-13. PP3E\Internet\Web\tutor5b.html
<html>
<title>CGI 101</title>
<body>
<h1>Common input devices: alternative layout</h1>
<!-- Use the same tutor5.py server-side script, but change the layout
     of the form itself. Notice the separation of user interface and
     processing logic here; the CGI script is independent of the HTML
     used to interact with the user/client. -->
<hr>
<form method=POST action="cgi-bin/tutor5.py">
  <h3>Please complete the following form and click Submit</h3>
  <p><table border=1>
  <tr>
    <th align=right>Name:
    <td><input type=text name=name>
  <tr>
    <th align=right>Shoe size:
    <td><input type=radio name=shoesize value=small>Small
        <input type=radio name=shoesize value=medium>Medium
        <input type=radio name=shoesize value=large>Large
  <tr>
    <th align=right>Occupation:
    <td><select name=job>
          <option>Developer
          <option>Manager
          <option>Student
          <option>Evangelist
          <option>Other
        </select>
  <tr>
    <th align=right>Political affiliations:
    <td><input type=checkbox name=language value=Python>Pythonista
        <input type=checkbox name=language value=Perl>Perlmonger
        <input type=checkbox name=language value=Tcl>Tcler
  <tr>
    <th align=right>Comments:
    <td><textarea name=comment cols=30 rows=2>Enter spam here</textarea>
  <tr>
    <td colspan=2 align=center><input type=submit value=Submit>
  </table>
</form>
</body></html>
When we visit this alternative page with a browser, we get the interface shown in Figure 16-15.
Figure 16-15. Form page created by tutor5b.html
Now, before you go blind trying to detect the differences in this and the prior HTML file, I should note that the HTML differences that produce this page are much less important than the fact that the action fields in these two pages' forms reference identical URLs. Pressing this version's Submit button triggers the exact same and totally unchanged Python CGI script again, tutor5.py (Example 16-12). That is, scripts are completely independent of both the transmission mode (URL query parameters or form fields) and the layout of the user interface used to send them information. Changes in the response page require changing the script, of course, because the HTML of the reply page is still embedded in the CGI script. But we can change the input page's HTML as much as we like without affecting the server-side Python code. Figure 16-16 shows the response page produced by the script this time around.
Figure 16-16. Response page created by tutor5.py (2)
16.4.7.1. Keeping display and logic separate

In fact, this illustrates an important point in the design of larger web sites: if we are careful to keep the HTML and script code separate, we get a useful division of display and logic; each part can be worked on independently, by people with different skill sets. Web page designers, for example, can work on the display layout, while programmers can code business logic. Although this section's example is fairly small, it already benefits from this separation for the input page. In some cases, the separation is harder to accomplish, because our example scripts embed the HTML of reply pages. With just a little more work, though, we can usually split the reply HTML off into separate files that can also be developed independently of the script's logic. The html string in tutor5.py (Example 16-12), for instance, might be stored in a text file and loaded by the script when run.

In larger systems, tools such as server-side HTML templating languages help make the division of display and logic even easier to achieve. The Zope and Python Server Pages examples we'll meet in Chapter 18, for instance, promote the separation of display and logic by providing reply page description languages that are expanded to include portions generated by separate Python program logic. In a sense, server-side templating languages embed Python in HTML (the opposite of CGI scripts, which embed HTML in Python) and may provide a cleaner division of labor, provided the Python code is kept in separate components. See Chapter 18 for more details. Similar techniques can be used to separate layout and logic in the GUIs we studied earlier in this book, but they usually require larger frameworks or models to achieve.
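The file-based split just described can be sketched in a few lines (shown here in Python 3 syntax; the book's scripts use Python 2 prints, and the file and field names below are illustrative, not from the book's examples): the reply HTML lives in its own file with placeholders, and the script only fills in the dynamic parts.

```python
# reply.html would hold the page markup with %(name)s style placeholders;
# the template is inlined here so the sketch is self-contained
template = '<html><body><h2>Hello, %(name)s!</h2></body></html>'
# in a real script: template = open('reply.html').read()

def render(fields):
    # the script's only display duty: substitute values into the template
    return template % fields

print('Content-type: text/html\n')
print(render({'name': 'Bob'}))
```

Because the markup lives in a separate file, a page designer can rework reply.html without touching the script's logic at all.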
16.4.8. Passing Parameters in Hardcoded URLs

Earlier, we passed parameters to CGI scripts by listing them at the end of a URL typed into the browser's address field, in the query string parameters part of the URL, after the ?. But there's nothing sacred about the browser's address field. In particular, nothing is stopping us from using the same URL syntax in hyperlinks that we hardcode or generate in web page definitions. For example, the web page from Example 16-14 defines three hyperlinks (the text between the <a> and </a> tags), which trigger our original tutor5.py script again (Example 16-12), but with three different precoded sets of parameters.
Example 16-14. PP3E\Internet\Web\tutor5c.html
<TITLE>CGI 101</TITLE>
<H1>Common input devices: URL parameters</H1>
This demo invokes the tutor5.py server-side script again, but hardcodes input data to the end of the script's URL, within a simple hyperlink (instead of packaging up a form's inputs). Click your browser's "show page source" button to view the links associated with each list item below.
This is really more about CGI than Python, but notice that Python's cgi module handles both this form of input (which is also produced by GET form actions), as well as POST-ed forms; they look the same to the Python CGI script. In other words, cgi module users are independent of the method used to submit data.
Also notice that URLs with appended input values like this can be generated as part of the page output by another CGI script, to direct a next user click to the right place and context; together with type 'hidden' input fields, they provide one way to save state between clicks.
Send Bob, small
Send Tom, Python
Send Evangelist, spam
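The three hyperlinks above follow this general shape (the server address and parameter values shown here are illustrative stand-ins, not the file's exact URLs): each href names the target script and appends its inputs after a ?.

```html
<ul>
  <li><a href="http://localhost/cgi-bin/tutor5.py?name=Bob&shoesize=small">
      Send Bob, small</a>
  <li><a href="http://localhost/cgi-bin/tutor5.py?name=Tom&language=Python">
      Send Tom, Python</a>
</ul>
```

Clicking such a link submits its hardcoded parameters exactly as if they had been typed after the URL in the browser's address field.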
This static HTML file defines three hyperlinks; the first two are minimal and the third is fully specified, but all work similarly (again, the target script doesn't care). When we visit this file's URL, we see the page shown in Figure 16-17. It's mostly just a page for launching canned calls to the CGI script.
Figure 16-17. Hyperlinks page created by tutor5c.html
Clicking on this page's second link creates the response page in Figure 16-18. This link invokes the CGI script, with the name parameter set to "Tom" and the language parameter set to "Python," simply because those parameters and values are hardcoded in the URL listed in the HTML for the second hyperlink. As such, hyperlinks with parameters like this are sometimes known as stateful links: they automatically direct the next script's operation. The net effect is exactly as if we had manually typed the line shown at the top of the browser in Figure 16-18.
Figure 16-18. Response page created by tutor5.py (3)
Notice that many fields are missing here; the tutor5.py script is smart enough to detect and handle missing fields and generate an unknown message in the reply page. It's also worth pointing out that we're reusing the Python CGI script again. The script itself is completely independent of both the user interface format of the submission page and the technique used to invoke it, whether from a submitted form or a hardcoded URL with query parameters. By separating such user interface details from processing logic, CGI scripts become reusable software components, at least within the context of the CGI environment.

The query parameters in the URLs embedded in Example 16-14 were hardcoded in the page's HTML. But such URLs can also be generated automatically by a CGI script as part of a reply page in order to provide inputs to the script that implements a next step in user interaction. They are a simple way for web-based applications to "remember" things for the duration of a session. Hidden form fields, up next, serve some of the same purposes.
16.4.9. Passing Parameters in Hidden Form Fields

Similar in spirit to the prior section, inputs for scripts can also be hardcoded in a page's HTML as hidden input fields. Such fields are not displayed in the page, but are transmitted back to the server when the form is submitted. Example 16-15, for instance, allows a job field to be entered, but fills in name and language parameters automatically as hidden input fields.
Example 16-15. PP3E\Internet\Web\tutor5d.html
<TITLE>CGI 101</TITLE>
<H1>Common input devices: hidden form fields</H1>
This demo invokes the tutor5.py server-side script again, but hardcodes input data in the form itself as hidden input fields, instead of as parameters at the end of URL hyperlinks. As before, the text of this form, including the hidden fields, can be generated as part of the page output by another CGI script, to pass data on to the next script on submit; hidden form fields provide another way to save state between pages.
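In outline, such a form looks like the following (the hidden field values here are illustrative placeholders; the real file presets name and language and leaves job open for typed input):

```html
<form method=post action="http://localhost/cgi-bin/tutor5.py">
  <input type=hidden name=name     value="Sue">
  <input type=hidden name=language value="Python">
  Job: <input type=text name=job>
  <input type=submit value="Submit">
</form>
```

On submission, the hidden name and language values travel to the script along with whatever the user typed into the job field.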
When Example 16-15 is opened in a browser, we get the input page in Figure 16-19.
Figure 16-19. tutor5d.html input form page
When the form is submitted, it triggers our original tutor5.py script once again (Example 16-12), but some of the inputs have been provided for us as hidden fields. The reply page is captured in Figure 16-20.
Figure 16-20. Response page created by tutor5.py (4)
Here again, we've hardcoded and embedded the inputs in the page's HTML, but such fields can also be generated on the fly as part of the reply from a CGI script. When they are, they serve as inputs for the next page, and so act as a sort of memory. To fully understand how and why this is necessary, we next need to take a short diversion into state retention alternatives.
16.5. Saving State Information in CGI Scripts

One of the most unusual aspects of the basic CGI model, and one of its starkest contrasts to the GUI programming techniques we studied in the prior part of this book, is that CGI scripts are stateless: each is a standalone program, normally run autonomously, with no knowledge of any other scripts that may run before or after. There is no notion of things such as global variables or objects that outlive a single step of interaction and retain context. Each script begins from scratch, with no memory of where the prior one left off.

This makes web servers simple and robust: a buggy CGI script won't interfere with the server process. In fact, a flaw in a CGI script generally affects only the single page it implements, not the entire web-based application. But this is a very different model from callback-handler functions in a single-process GUI, and it requires extra work to remember things longer than a single script's execution.

Lack of state retention hasn't mattered in our simple examples so far, but larger systems are usually composed of multiple user interaction steps and many scripts, and they need a way to keep track of information gathered along the way. As suggested in the last two sections, generating query parameters on URL links and hidden form fields in reply pages are two simple ways for a CGI script to pass data to the next script in the application. When clicked or submitted, such parameters send preprogrammed selection or session information back to another server-side handler script. In a sense, the content of the generated reply page itself becomes the memory space of the application.

For example, a site that lets you read your email may present you with a list of viewable email messages, implemented in HTML as a list of hyperlinks generated by another script.
Each hyperlink might include the name of the message viewer script, along with parameters identifying the selected message number, email server name, and so on: as much data as is needed to fetch the message associated with a particular link. A retail site may instead serve up a generated list of product links, each of which triggers a hardcoded hyperlink containing the product number, its price, and so on. Alternatively, the purchase page at a retail site may embed the product selected in a prior page as hidden form fields.

In fact, one of the main reasons for showing the techniques in the last two sections is that we're going to use them extensively in the larger case study in the next chapter. For example, we'll use generated stateful URLs with query parameters to implement lists of dynamically generated selections that "know" what to do when clicked. Hidden form fields will also be deployed to pass user login data to the next page's script. From a more general perspective, both techniques are ways to retain state information between pages; they can be used to direct the action of the next script to be run.

Generating URL parameters and hidden form fields works well for retaining state information across pages during a single session of interaction. Some scenarios require more, though. For instance, what if we want to remember a user's login name from session to session? Or what if we need to keep track of pages at our site visited by a user in the past? Because such information must be longer lived than the pages of a single session of interaction, query parameters and hidden form fields won't suffice. In general, there are a variety of ways to pass or retain state information between CGI script executions and across sessions of interaction:
URL query parameters: session state embedded in pages

Hidden form fields: session state embedded in pages

Cookies: smaller information stored on the client, which may span sessions

Server-side databases: larger information that might span sessions

CGI model extensions: persistent processes, session management, and so on

We'll explore most of these in later examples, but since this is a core idea in server-side scripting, let's take a brief look at each of these in turn.
16.5.1. URL Query Parameters

We met these earlier in this chapter: hardcoded URL parameters in dynamically generated hyperlinks embedded in reply web pages. By including both a processing script name and input to it, such links direct the operation of the next page when selected. The parameters are transmitted from client to server automatically, as part of a GET-style request. Coding query parameters is straightforward: print the correctly formatted URL to standard output from your CGI script as part of the reply page (albeit following some escaping conventions we'll meet later in this chapter):
The resulting URL will have enough information to direct the next script when clicked:
View 66
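Such a link can be generated with an ordinary print; here is a minimal sketch in Python 3 syntax (the script name, server name, and field names below are illustrative, not the book's):

```python
# build a "stateful" hyperlink for the reply page: the query string
# carries the context the next script run will need
script = 'onViewListLink.py'            # hypothetical next-step script
user, msgnum = 'bob', 66                # context gathered so far
link = ('<a href="http://servername/cgi-bin/%s?user=%s&mnum=%d">View %d</a>'
        % (script, user, msgnum, msgnum))
print(link)
```

When the user clicks the rendered "View 66" link, the browser sends user and mnum back to the named script automatically, as a GET-style request.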
Query parameters serve as memory, and they pass information between pages. As such, they are useful for retaining state across the pages of a single session of interaction. Since each generated URL may have different attached parameters, this scheme can provide context per user-selectable action. Each link in a list of selectable alternatives, for example, may have a different implied action coded as a different parameter value. Moreover, users can bookmark a link with parameters, in order to return to a specific state in an interaction. Because their state retention is lost when the page is abandoned, though, they are not useful for remembering state from session to session. Moreover, the data appended as URL query parameters is generally visible to users and may appear in server logfiles; in some applications, it may have to be manually encrypted to avoid display or forgery.
16.5.2. Hidden Form Input Fields

We met these in the prior section as well: hidden form input fields, which are attached to a form's data and embedded in reply web pages, but are not displayed. When the form is submitted, all the hidden fields are transmitted to the next script along with any real inputs, to serve as context. The net effect provides context for an entire input form, not a particular hyperlink. An already entered username, password, or selection, for instance, can be implied by the values of hidden fields in subsequently generated pages. In terms of code, hidden fields are generated by server-side scripts as part of the reply page's HTML, and are later returned by the client with all of the form's input data:
print '<form method=post action="%s">' % urlroot
print '<input type=hidden name=msgnum value="%s">' % msgnum
print '<input type=hidden name=user   value="%s">' % user
print '<input type=hidden name=site   value="%s">' % site
print '<input type=hidden name=pswd   value="%s">' % pswd
Like query parameters, hidden form fields can serve as a sort of memory, retaining state information from page to page. Also like query parameters, because this kind of memory is embedded in the page itself, hidden fields are useful for state retention among the pages of a single session of interaction, but not for data that spans multiple sessions. And like both query parameters and cookies (up next), hidden form fields may be visible to users: their values are displayed if the page's source HTML code is viewed. As a result, hidden form fields are not secure; encryption of the embedded data may again be required in some contexts to avoid display on the client, or forgery in form submissions.
16.5.3. HTTP "Cookies"

Cookies, an extension to the HTTP protocol underlying the web model, are a way for server-side applications to directly store information on the client computer. Because this information is not embedded in the HTML of web pages, it outlives the pages of a single session. As such, cookies are ideal for remembering things that must span sessions. Things like usernames and preferences, for example, are prime cookie candidates: they will be available the next time the client visits our site. However, because cookies may have space limitations, are seen by some as intrusive, and can be disabled by users on the client, they are not always well suited to general data storage needs. They are often best used for small pieces of noncritical cross-session state information.

Operationally, HTTP cookies are strings of information stored on the client machine and transferred between client and server in HTTP message headers. Server-side scripts generate HTTP headers to request that a cookie be stored on the client as part of the script's reply stream. Later, the client web browser generates HTTP headers that send back all the cookies matching the server and page being contacted. In effect, cookie data is embedded in the data streams much like query parameters and form fields, but is contained in HTTP headers, not in a page's HTML. Moreover, cookie data can be stored permanently on the client, and so outlives both pages and interactive sessions.

For web application developers, Python's standard library includes tools that simplify the task of sending and receiving cookies: the cookielib module does cookie handling for HTTP clients that talk to web servers, and the Cookie module simplifies the task of creating and receiving cookies on the server. Moreover, the urllib2 module has support for opening URLs with automatic cookie handling.
16.5.3.1. Creating a cookie

Web browsers such as Firefox and Internet Explorer generally handle the client side of this protocol, storing and sending cookie data. For the purpose of this chapter, we are mainly interested in cookie processing on the server. Cookies are created by sending special HTTP headers at the start of the reply stream:
Content-type: text/html
Set-Cookie: foo=bar;
...
The full format of a cookie's header is as follows:

Set-Cookie: name=value [; expires=date] [; path=pathname] [; domain=domainname] [; secure]

The domain defaults to the hostname of the server that set the cookie, and the path defaults to the path of the document or script that set the cookie; these are later matched by the client to know when to send a cookie's value back to the server. In Python, cookie creation is simple; the following code in a CGI script stores a last-visited time cookie:
import Cookie, time
cook = Cookie.SimpleCookie( )
cook['visited'] = str(time.time( ))    # a dictionary
print cook.output( )                   # "Set-Cookie: visited=1137268854.98;"
The SimpleCookie call here creates a dictionary-like cookie object whose keys are strings (the names of the cookies), and whose values are "Morsel" objects (describing the cookies' values). Morsels in turn are also dictionary-like objects, with one key per cookie property: path and domain, expires to give the cookie an expiration date (the default is the duration of the browser session), and so on. Morsels also have attributes; for instance, key and value give the name and value of the cookie, respectively. Assigning a string to a cookie key automatically creates a Morsel from the string, and the cookie object's output method returns a string suitable for use as an HTTP header (printing the object directly has the same effect, due to its __str__ operator overloading). Here is a more comprehensive example of the interface in action:
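The following short sketch exercises that interface; it is written with Python 3's http.cookies module, which provides the same SimpleCookie and Morsel interface as the Python 2 Cookie module used in this chapter:

```python
from http.cookies import SimpleCookie

cook = SimpleCookie()
cook['visited'] = '1137268854.98'       # assignment creates a Morsel
cook['visited']['path'] = '/'           # Morsel keys are cookie properties

morsel = cook['visited']
print(morsel.key, morsel.value)         # attributes: cookie name and value
print(cook.output())                    # header text; str(cook) is the same
```

The output call produces a ready-to-send "Set-Cookie:" header line, including any properties that were assigned on the Morsel.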
16.5.3.2. Receiving a cookie

Now, when the client visits the page again in the future, the cookie's data is sent back from the browser to the server in HTTP headers, in the form "Cookie: name1=value1; name2=value2 ...". For example:
Cookie: visited=1137268854.98
Roughly, the browser client returns all cookies that match the requested server's domain name and path. In the CGI script on the server, the environment variable HTTP_COOKIE contains the raw cookie header string uploaded from the client; it can be extracted in Python as follows:
import os, Cookie
cooks = Cookie.SimpleCookie(os.environ.get("HTTP_COOKIE"))
vcook = cooks.get("visited")         # a Morsel dictionary
if vcook != None:
    time = vcook.value
Here, the SimpleCookie constructor call automatically parses the passed-in cookie data string into a dictionary of Morsel objects; as usual, the dictionary get method returns a default None if a key is absent, and we use the Morsel object's value attribute to extract the cookie's value string if sent.
16.5.3.3. Using cookies in CGI scripts
To help put these pieces together, Example 16-16 lists a CGI script that stores a client-side cookie when first visited, and receives and displays it on subsequent visits.
Example 16-16. PP3E\Internet\Web\cgi-bin\cookies.py
#######################################################
# create or use a client-side cookie storing username;
# there is no input form data to parse in this example
#######################################################

import Cookie, os
cookstr  = os.environ.get("HTTP_COOKIE")
cookies  = Cookie.SimpleCookie(cookstr)
usercook = cookies.get("user")                 # fetch if sent
if usercook == None:                           # create first time
    cookies = Cookie.SimpleCookie( )           # print Set-cookie hdr
    cookies['user'] = 'Brian'
    print cookies
    greeting = '<p>His name shall be... %s</p>' % cookies
else:
    greeting = '<p>Welcome back, %s</p>' % usercook.value

print "Content-type: text/html\n"
print greeting
Assuming you are running this chapter's local web server from Example 16-1, you can invoke this script with a URL such as http://localhost/cgi-bin/cookies.py (type this in your browser's address field, or submit it interactively with the module urllib2). The first time you visit the script, it sets the cookie within its reply's headers, and you'll see a reply page with this message:
His name shall be... Set-Cookie: user=Brian;
Thereafter, revisiting the script's URL (use your browser's reload button) produces a reply page with this message:
Welcome back, Brian
This is because the client is sending the previously stored cookie value back to the script, at least until you kill and restart your web browserthe default expiration of a cookie is the end of a browsing session. In a realistic program, this sort of structure might be used by the login page of a web application; a user would need to enter his name only once per browser session.
16.5.3.4. Handling cookies with the module urllib2
As mentioned earlier, the urllib2 module provides an interface similar to urllib for reading the reply from a URL, but it uses the cookielib module to also support storing and sending cookies on the client. For example, to use it to test the last section's script, we simply need to enable its cookie handler class:
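A sketch of that enabling code follows, in Python 3 terms (urllib2 and cookielib became urllib.request and http.cookiejar in Python 3; the URL shown is illustrative):

```python
import urllib.request, http.cookiejar

jar     = http.cookiejar.CookieJar()                 # client-side cookie store
handler = urllib.request.HTTPCookieProcessor(jar)    # the cookie handler class
opener  = urllib.request.build_opener(handler)
urllib.request.install_opener(opener)                # urlopen now uses cookies

# the first fetch stores the script's cookie in jar; later fetches send it back:
# reply = urllib.request.urlopen('http://localhost/cgi-bin/cookies.py').read()
```

After install_opener, every urlopen call in the process stores and resends cookies through the jar, just as a browser would.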
This works because urllib2 mimics the cookie behavior of a web browser on the client. Just as in a browser, the cookie is deleted if you exit Python and start a new session to rerun this code. See the library manual for more on this module's interfaces.

Although easy to use, cookies have potential downsides. For one, they may be subject to size limitations (4 KB per cookie, 300 cookies total, and 20 per domain are common limits). For another, users can disable cookies in most browsers, making them less suited to critical data. Some even see them as intrusive, because they can be abused to track user behavior. Many sites simply require cookies to be turned on, finessing the issue completely. Finally, because they are transmitted over the network between client and server, cookies are still only as secure as the transmission stream itself; this may be an issue for sensitive data if the page is not using secure HTTP transmissions between client and server. We'll explore secure cookies and server concepts in the next chapter. For more details on the cookie modules and the cookie protocol in general, see Python's library manual, and search the Web for resources.
16.5.4. Server-Side Databases

For more industrial-strength state retention, Python scripts can employ full-blown database solutions in the server. We will study these options in depth in Chapter 19 of this book. Python scripts have access to a variety of server-side data stores, including flat files, persistent object pickles and shelves, object-oriented databases such as ZODB, and relational SQL-based databases such as MySQL, PostgreSQL, and Oracle. Besides data storage, such systems may provide advanced tools such as transaction commits and rollbacks, concurrent update synchronization, and more.

Full-blown databases are the ultimate storage solution. They can be used to represent state both between the pages of a single session (by tagging the data with generated per-session keys) and across multiple sessions (by storing data under per-user keys). Given a user's login name, for example, CGI scripts can fetch all of the context we have gathered in the past about that user from the server-side database. Server-side databases are ideal for storing more complex cross-session information; a shopping cart application, for instance, can record items added in the past in a server-side database.

Databases outlive both pages and sessions. Because data is kept explicitly, there is no need to embed it within the query parameters or hidden form fields of reply pages. Because the data is kept on the server, there is no need to store it on the client in cookies. And because such schemes employ general-purpose databases, they are not subject to the size constraints or optional nature of cookies. In exchange for their added utility, full-blown databases require more in terms of installation, administration, and coding. As we'll see in Chapter 19, luckily the extra coding part of that trade-off is remarkably simple in Python. Moreover, Python's database interfaces may be used in any application, web-based or otherwise.
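As a small taste of what Chapter 19 covers, the standard library's shelve module already supports simple per-user, cross-session records; the file path and record keys in this sketch are illustrative:

```python
import os, shelve, tempfile

path = os.path.join(tempfile.gettempdir(), 'userstate')   # server-side file

db = shelve.open(path)                       # one CGI request stores state
db['bob'] = {'cart': ['book', 'cd'], 'visits': 12}
db.close()

db = shelve.open(path)                       # a later request finds it again,
record = db['bob']                           # even after the first process exits
db.close()
print(record['visits'])
```

Because the shelve file lives on the server, the state survives pages, sessions, and even server restarts, with nothing embedded in URLs, forms, or cookies beyond a lookup key.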
16.5.5. Extensions to the CGI Model

Finally, there are more advanced protocols and frameworks for retaining state on the server, which we won't cover in this book. For instance, the Zope web application framework, discussed briefly in Chapter 18, provides a product interface, which allows for the construction of web-based objects that are automatically persistent. Other schemes, such as FastCGI, as well as server-specific extensions such as mod_python for Apache, may attempt to work around the autonomous, one-shot nature of CGI scripts, or otherwise extend the basic CGI model to support long-lived memory stores. For instance:

FastCGI allows web applications to run as persistent processes, which receive input data from and send reply streams to the HTTP web server over Inter-Process Communication (IPC) mechanisms such as sockets. This differs from normal CGI, which communicates inputs and outputs with environment variables, standard streams, and command-line arguments, and assumes scripts run to completion on each request. Because a FastCGI process may outlive a single page, it can retain state information from page to page, and avoids startup performance costs.

mod_python extends the open source Apache web server by embedding the Python interpreter within Apache. Python code is executed directly within the Apache server, eliminating the need to spawn external processes. This package also supports the concept of sessions, which can be used to store data between pages. Session data is locked for concurrent access and can be stored in files or in memory, depending on whether Apache is running in multiprocess or multithreaded mode. mod_python also includes web development tools, such as the Python Server Pages templating language for HTML generation (described later in this book).

Such models are not universally supported, though, and may come with some added cost in complexity; for example, they may need to synchronize access to persistent data with locks. Moreover, a failure in a FastCGI-style web application impacts the entire application, not just a single page, and things like memory leaks become much more costly. For more on persistent CGI models, and support in Python for things such as FastCGI, search the Web or consult web-specific resources.
16.5.6. Combining Techniques

Naturally, these techniques may be combined to achieve a variety of memory strategies, both for interaction sessions and for more permanent storage needs. For example:

A web application may use cookies to store a per-user or per-session key on the client, and later use that key to index into a server-side database to retrieve the user's or session's full state information.

Even for short-lived session information, URL query parameters or hidden form fields may similarly be used to pass a key identifying the session from page to page, to be used by the next script to index a server-side database.

Moreover, URL query parameters and hidden fields may be generated for temporary state memory that spans pages, even though cookies and databases are used for retention that must span sessions.

The choice of appropriate technique is driven by the application's storage needs. Although not as straightforward as the in-memory variables and objects of single-process GUI programs running on a client, with a little creativity, CGI script state retention is entirely possible.
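The first combination can be sketched in a few lines; an in-memory dictionary stands in for the server-side database, and the generated key is what would travel in a cookie (all names here are illustrative):

```python
import uuid

sessions = {}                         # stand-in for a server-side database

def new_session():
    key = uuid.uuid4().hex            # random key to send to the client
    sessions[key] = {}                # empty state record under that key
    return key

key = new_session()                   # first page: issue a cookie holding key
sessions[key]['user'] = 'Brian'       # later pages: index full state by key
print(sessions[key]['user'])
```

Only the small key crosses the network on each request; the potentially large or sensitive state record never leaves the server.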
16.6. The Hello World Selector

Let's get back to writing some code again. It's time for something a bit more useful than the examples we've seen so far (well, more entertaining, at least). This section presents a program that displays the basic syntax required by various programming languages to print the string "Hello World," the classic language benchmark. To keep it simple, this example assumes that the string is printed to the standard output stream in the selected language, not to a GUI or web page. It also gives just the output command itself, not the complete programs. The Python version happens to be a complete program, but we won't hold that against its competitors here.

Structurally, the first cut of this example consists of a main page HTML file, along with a Python-coded CGI script that is invoked by a form in the main HTML page. Because no state or database data is stored between user clicks, this is still a fairly simple example. In fact, the main HTML page implemented by Example 16-17 is mostly just one big pull-down selection list within a form.
Example 16-17. PP3E\Internet\Web\languages.html
<TITLE>Languages</TITLE>
<H1>Hello World selector</H1>
This demo shows how to display a "hello world" message in various programming languages' syntax. To keep this simple, only the output command is shown (it takes more code to make a complete program in some of these languages), and only text-based solutions are given (no GUI or HTML construction logic is included). This page is a simple HTML file; the one you see after pressing the button below is generated by a Python CGI script which runs on the server. Pointers:
To see this page's HTML, use the 'View Source' command in your browser.
To view the Python CGI script on the server, click here or here.
To see an alternative version that generates this page dynamically, click here.
Select a programming language:
<select name=language>
    <option>All
    <option>Python
    <option>Perl
    <option>Tcl
    <option>Scheme
    <option>SmallTalk
    <option>Java
    <option>C
    <option>C++
    <option>Basic
    <option>Fortran
    <option>Pascal
    <option>Other
</select>
For the moment, let's ignore some of the hyperlinks near the middle of this file; they introduce bigger concepts like file transfers and maintainability that we will explore in the next two sections. When visited with a browser, this HTML file is downloaded to the client and is rendered into the new browser page shown in Figure 16-21.
Figure 16-21. The "Hello World" main page
That widget above the Submit button is a pull-down selection list that lets you choose one of the <option> tag values in the HTML file. As usual, selecting one of these language names and pressing the Submit button at the bottom (or pressing your Enter key) sends the selected language name to an instance of the server-side CGI script program named in the form's action option. Example 16-18 contains the Python script that is run by the web server upon submission.
Example 16-18. PP3E\Internet\Web\cgi-bin\languages.py
#!/usr/bin/python
#############################################################################
# show hello world syntax for input language name; note that it uses r'...'
# raw strings so that '\n' in the table are left intact, and cgi.escape( )
# on the string so that special characters don't confuse the browser
#############################################################################
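The script's core pattern can be sketched as follows (in Python 3 terms: html.escape replaces Python 2's cgi.escape, and the table entries below are abbreviated stand-ins, not the book's full set):

```python
import html

hellos = {                          # language name -> "Hello World" code text
    'Python': 'print "Hello World"',
    'C':      'printf("Hello World\\n");',
}

def reply_for(language):
    # escape the code so characters like <, >, and " display literally in HTML
    if language in hellos:
        return '<h3>%s</h3><pre>%s</pre>' % (language,
                                             html.escape(hellos[language]))
    else:
        return "Sorry--I don't know that language"

def page(language):
    if not language or language == 'All':    # missing input: default to All
        return '\n'.join(reply_for(name) for name in sorted(hellos))
    return reply_for(language)

print('Content-type: text/html\n')
print(page('GuiDO'))
```

The two explicit checks, one for a missing language field and one for a name not in the table, are what keep the real script from dying with an exception on bad input.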
To be robust, the script checks for both cases explicitly, as all CGI scripts generally should. For instance, here is the HTML generated in response to a request for the fictitious language GuiDO (you can also see this by selecting your browser's View Source option, after typing the URL manually into your browser's address field):
If the script doesn't receive any language name input, it simply defaults to the "All" case (this can also be triggered if the URL ends with just ?language= and no language name value):
If we didn't detect these cases, chances are that our script would silently die on a Python exception and leave the user with a mostly useless half-complete page or with a default error page (we didn't assign stderr to stdout here, so no Python error message would be displayed). Figure 16-24 shows the page generated if the script is invoked with an explicit URL like this:
To test this error case interactively, the pull-down list includes an "Other" name, which produces a similar error page reply. Adding code to the script's table for the COBOL "Hello World" program is left
as an exercise for the reader.
16.7. Refactoring Code for Maintainability

Let's step back from coding details for just a moment to gain some design perspective. As we've seen, Python code, by and large, automatically lends itself to systems that are easy to read and maintain; it has a simple syntax that cuts much of the clutter of other tools. On the other hand, coding styles and program design can often affect maintainability as much as syntax. For example, the "Hello World" selector pages of the preceding section work as advertised and were very easy and fast to throw together. But as currently coded, the languages selector suffers from substantial maintainability flaws.

Imagine, for instance, that you actually take me up on that challenge posed at the end of the last section, and attempt to add another entry for COBOL. If you add COBOL to the CGI script's table, you're only half done: the list of supported languages lives redundantly in two places, in the HTML for the main page as well as in the script's syntax dictionary. Changing one does not change the other. More generally, there are a handful of ways that this program might fail the scrutiny of a rigorous code review. These are described next.
Selection list As just mentioned, the list of languages supported by this program lives in two places: the HTML file and the CGI script's table.
Field name

The field name of the input parameter, language, is hardcoded into both files as well. You might remember to change it in the other if you change it in one, but you might not.
Form mock-ups We've redundantly coded classes to mock-up form field inputs twice in this chapter already; the "dummy" class here is clearly a mechanism worth reusing.
HTML code HTML embedded in and generated by the script is sprinkled throughout the program in print statements, making it difficult to implement broad web page layout changes or to delegate web page design to nonprogrammers.

This is a short example, of course, but issues of redundancy and reuse become more acute as your scripts grow larger. As a rule of thumb, if you find yourself changing multiple source files to modify a single behavior, or if you notice that you've taken to writing programs by cut-and-paste copying of existing code, it's probably time to think about more rational program structures. To illustrate coding styles and practices that are friendlier to maintainers, let's rewrite (that is, refactor) this example to fix all of these weaknesses in a single mutation.
16.7.1. Step 1: Sharing Objects Between Pages: A New Input Form

We can remove the first two maintenance problems listed earlier with a simple transformation; the trick is to generate the main page dynamically, from an executable script, rather than from a precoded HTML file. Within a script, we can import the input field name and selection list values from a common Python module file, shared by the main and reply page generation scripts. Changing the selection list or field name in the common module changes both clients automatically. First, we move shared objects to a common module file, as shown in Example 16-19.
Example 16-19. PP3E\Internet\Web\cgi-bin\languages2common.py
########################################################
# common objects shared by main and reply page scripts;
# need change only this file to add a new language.
########################################################

inputkey = 'language'                    # input parameter name

hellos = {                               # "Hello World" program per language
    'Python':    r" print 'Hello World'                ",
    'Perl':      r' print "Hello World\n";             ',
    'Tcl':       r' puts "Hello World"                 ',
    'Scheme':    r' (display "Hello World") (newline)  ',
    'SmallTalk': r" 'Hello World' print.               ",
    'Java':      r' System.out.println("Hello World"); ',
    'C':         r' printf("Hello World\n");           ',
    'C++':       r' cout << "Hello World" << endl;     ',
    'Basic':     r' 10 PRINT "Hello World"             ',
    'Fortran':   r" print *, 'Hello World'             ",
    'Pascal':    r" WriteLn('Hello World');            "
}
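With the table factored out, a page generator can build the pull-down list from it directly. Here is a minimal sketch of the idea, not the book's actual main-page script: the inline `inputkey` and `hellos` below stand in for imports from languages2common, and the HTML shape is illustrative.

```python
# stand-ins for the shared module's objects; the real scripts would
# instead do: from languages2common import hellos, inputkey
inputkey = 'language'
hellos = {'Python': "print 'Hello World'",
          'C': 'printf("Hello World\\n");'}

# derive the selection list from the shared table, so adding a new
# language to hellos updates the main page automatically
options = ''.join('<option>%s</option>' % name for name in sorted(hellos))
selectlist = '<select name="%s">%s</select>' % (inputkey, options)
```

Because both the main page generator and the reply script read the same table, adding a language now means editing just one file.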
Since the mock-up now lives in a module, we can reuse it anytime we want to test a CGI script offline. To illustrate, the script in Example 16-22 is a rewrite of the tutor5.py example we saw earlier, using the form mock-up utility to simulate field inputs. If we had planned ahead, we could have tested the script like this without even needing to connect to the Net.
Example 16-22. PP3E\Internet\Web\cgi-bin\tutor5_mockup.py
#!/usr/bin/python
##################################################################
# run tutor5 logic with formMockup instead of cgi.FieldStorage()
# to test: python tutor5_mockup.py > temp.html, and open temp.html
##################################################################

from formMockup import formMockup
form = formMockup(name='Bob',
                  shoesize='Small',
                  language=['Python', 'C++', 'HTML'],
                  comment='ni, Ni, NI')

# rest same as original, less form assignment
Running this script from a simple command line shows us what the HTML response stream will look like:
C:\...\PP3E\Internet\Web\cgi-bin>python tutor5_mockup.py
Content-type: text/html

<TITLE>tutor5.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>Your name is Bob</P>
<P>You wear rather Small shoes</P>
<P>Your current job: (unknown)</P>
<P>You program in Python and C++ and HTML</P>
<P>You also said:</P>
<P>ni, Ni, NI</P>
<HR>
Running it live yields the page in Figure 16-26. Field inputs are hardcoded, similar in spirit to the tutor5 extension that embedded input parameters at the end of hyperlink URLs. Here, they come from form mock-up objects created in the reply script that cannot be changed without editing the script. Because Python code runs immediately, though, modifying a Python script during the debug
cycle goes as quickly as you can type.
Figure 16-26. A response page with simulated inputs
16.7.3. Step 3: Putting It All Together: A New Reply Script

There's one last step on our path to software maintenance nirvana: we must recode the reply page script itself to import the data factored out to the common module, as well as the reusable form mock-up module's tools. While we're at it, we move code into functions (in case we ever put things in this file that we'd like to import in another script), and move all HTML code to triple-quoted string blocks. The result is Example 16-23. Changing HTML is generally easier when it has been isolated in single strings like this, instead of being sprinkled throughout a program.
Example 16-23. PP3E\Internet\Web\cgi-bin\languages2reply.py
#!/usr/bin/python
#########################################################
# for easier maintenance, use HTML template strings, get
# the language table and input key from common module file,
# and get reusable form field mockup utilities module.
#########################################################

import cgi, sys
from formMockup import FieldMockup               # input field simulator
from languages2common import hellos, inputkey    # get common table, name
debugme = False

hdrhtml = """Content-type: text/html\n
<TITLE>Languages</TITLE>
<H1>Syntax</H1><HR>"""

langhtml = """
<H3>%s</H3><P><PRE>
%s
</PRE></P><BR>"""

def showHello(form):                             # HTML for one language
    choice = form[inputkey].value                # escape lang name too
    try:
        print langhtml % (cgi.escape(choice),
                          cgi.escape(hellos[choice]))
    except KeyError:
        print langhtml % (cgi.escape(choice),
                          "Sorry--I don't know that language")

def main():
    if debugme:
        form = {inputkey: FieldMockup(sys.argv[1])}   # name on cmd line
    else:
        form = cgi.FieldStorage()                     # parse real inputs

    print hdrhtml
    if not form.has_key(inputkey) or form[inputkey].value == 'All':
        for lang in hellos.keys():
            mock = {inputkey: FieldMockup(lang)}
            showHello(mock)
    else:
        showHello(form)
    print '<HR>'

if __name__ == '__main__':
    main()
When the global debugme is set to True, the script can be tested offline from a simple command line as before:
When run online, we get the same reply pages we saw for the original version of this example (we won't repeat them here again). This transformation changed the program's architecture, not its user interface. Most of the code changes in this version of the reply script are straightforward. If you test-drive these pages, the only differences you'll find are the URLs at the top of your browser (they're different files, after all), extra blank lines in the generated HTML (ignored by the browser), and a potentially different ordering of language names in the main page's pull-down selection list. This selection list ordering difference arises because this version relies on the order of the Python dictionary's keys list, not on a hardcoded list in an HTML file. Dictionaries, you'll recall, arbitrarily order entries for fast fetches; if you want the selection list to be more predictable, simply sort the keys list before iterating over it using the list sort method, or the sorted function introduced in Python 2.4:
for lang in sorted(hellos):                 # dict iterator instead of .keys()
    mock = {inputkey: FieldMockup(lang)}
    showHello(mock)
Faking Inputs with Shell Variables If you know what you're doing, you can also test CGI scripts from the command line on some platforms by setting the same environment variables that HTTP servers set, and then launching your script. For example, we might be able to pretend to be a web server by storing input parameters in the QUERY_STRING environment variable, using the same syntax we employ at the end of a URL string after the ?:
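The same idea can be sketched portably in Python rather than in a platform-specific shell; `parse_qs` here is only a stand-in for the parsing that cgi.FieldStorage performs on the QUERY_STRING variable a real server would set.

```python
import os

try:
    from urllib.parse import parse_qs    # Python 3 location
except ImportError:
    from urlparse import parse_qs        # Python 2, as in this book's era

# pretend to be a web server: store input parameters in QUERY_STRING,
# using the same name=value&... syntax found after the ? in a URL
os.environ['REQUEST_METHOD'] = 'GET'
os.environ['QUERY_STRING'] = 'language=Python&user=Bob'

params = parse_qs(os.environ['QUERY_STRING'])
# params == {'language': ['Python'], 'user': ['Bob']}
```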
Now, to make the code in Example 18-2 part of a Zope web site as an external method:
1. Create, copy, or move the module into the Zope Extensions directory. On Windows, put it in C:\Zope-Instance\Extensions.

2. Add its functions as "External Method" objects to your web site in the Zope ZMI (e.g., add them to the "/" root folder to make them visible across the entire site). To add both functions in the module, add two external methods.

Once added in the ZMI interface, the functions are external method objects in the web site tree and will be acquired (roughly, inherited) by objects lower in the tree, as well as by paths in URLs that
name the methods to operate on the path context. The end result is that the two functions in Example 18-2 become callable through the Web via URLs and from other Zope objects, such as DTML template language code and other Python code. Figure 18-1 shows one of the two functions being added in the Zope ZMI, the web-based interface used to build sites.
Figure 18-1. The Zope ZMI
The Zope site tree built in the ZMI is separate from the filesystem where the external method's module lives; here, we're adding the method to the root of the Zope site tree (the "/" folder). In Zope, your entire site is designed and maintained in the ZMI interface, and every object added in the ZMI becomes a persistent Python object in the ZODB database used to store your site. However, some components, such as external method module files, also live on the filesystem; as such, they have access to the machine at large.
18.2.3.1. Calling through the Web

Once added to the site tree, your methods are callable through the Web, using URLs that name the Zope server's hostname and port, any nested folder paths, the name of the external method as registered to Zope in the ZMI, and URL query parameters to provide inputs. Here is a URL that runs the web page fetch function in the module directly; Zope listens for HTTP requests on port number 8080 by default and is running on the local machine ("localhost") here:
Because this function returns raw text, Zope automatically renders it in the reply page stream (default reply formatting uses the Python str function). For example, Figure 18-2 shows the reply page returned by Zope for the Python home page, using the following URL in a web browser's address field (technically, the url parameter's value string should probably be escaped with urllib.quote_plus, but it works in all browsers tested as is):
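The escaping step mentioned above can be sketched as follows; the base URL and parameter name here are hypothetical, not Zope's actual registration, and the import fallback covers both the Python 2 and Python 3 locations of quote_plus.

```python
try:
    from urllib.parse import quote_plus   # Python 3 location
except ImportError:
    from urllib import quote_plus         # Python 2, as in the book

base  = 'http://localhost:8080/fetchPage'       # hypothetical method URL
param = 'http://www.python.org/index.html'      # value passed in ?url=

# quote_plus escapes characters special in URLs (: and / here) so the
# embedded URL is not misread as part of the enclosing URL's structure
url = base + '?url=' + quote_plus(param)
```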
Figure 18-2. Python home page fetched by a Zope external method
The HTML is escaped in the reply in Figure 18-2 because it is not wrapped in enclosing HTML yet; it is taken to be a string when fetched from the method directly. To make this display nicely, we need to move on to the next section.
18.2.3.2. Calling from other objects

Besides such direct URLs, Python external methods can also be referenced and called from other types of Zope objects, including Python scripts and DTML templating language code. When referenced, Zope finds the method object by acquisition (web site tree search); calls the Python function in the module file, passing in any arguments; and renders and inserts the returned result into the HTML reply stream. For instance, the following Zope Python script, fetchscript, is a Script object added in the ZMI to the site's /scripts101 folder (it can also be uploaded to the ZMI from an external file). The script becomes a persistent object in the ZODB database used by Zope; it is not stored in the Extensions directory in the filesystem. Assuming this is stored lower in the site tree than the external method, when run, it locates and invokes the code in Example 18-2:
# called from DTML or URL, calls external method
# gets external method in "/" by acquisition context
# uses FTP, returned string inserted into HTML reply

site      = 'home.rmi.net'
directory = '.'
login     = ('lutz', 'XXXXXXXX')
reply = context.fetchFtpFile(context, site, directory,
                             'mytrain.html', login, 72)
return reply
Zope Python scripts are small bits of Python code, designed for running calculations that are too complex for templating languages such as DTML, but are not complex enough to warrant an external method or other construct. Scripts generally perform simple numeric or string manipulations. Unlike external methods, scripts run in a limited secure environment and are stored in the Zope site tree. In scripts, the context variable gives access to the Zope acquisition context in which the script is being run, and other variables give access to request inputs and reply output interfaces. Similarly, the following DTML templating language method object, named fetchdtml and created in the same /scripts101 ZMI web site folder, invokes both the external method directly and the script of the prior listing. Both the script and the DTML objects themselves become addressable by direct URL or by other objects in the web site tree.
DTML combines normal HTML with DTML tags that are evaluated and expanded on the server by Zope when the enclosing page is fetched. The results of DTML tags are inserted into the HTML reply stream. The dtml-var tag, for instance, can name inline Python code to be run (expr=) in the context of the web site tree, or name another object to be looked up in the tree and called; the expression or object's result text is rendered and inserted into the reply stream HTML, replacing the entire dtml-var tag. The object called from a dtml-var tag can be another DTML templating language object, a Python script or external method object, or another object type such as an image. For example, the standard_html_header in this code references another DTML method object higher in the object tree, which in turn references an image object in the tree; by listing this in each page lower in the tree, it provides a common page header. Figure 18-3 captures the reply generated when we visit the DTML code in a web browser; the original Python external method is run twice along the way. This page is addressed by the following URL; replace the last component of this URL with fetchscript to access the Python script by direct URL (it is also run by the DTML method):
http://localhost:8080/scripts101/fetchdtml
Figure 18-3. Running DTML code that calls Python methods
In a sense, DTML embeds Python in HTML: it runs Python code in response to tags embedded in the reply page. This is essentially the opposite of the CGI scripts we met earlier, which embed HTML in Python, and it is similar to the ASP and PSP systems we'll meet later in this chapter. More important, DTML, as well as Zope's other templating language, ZPT (TAL), encourages separation of presentation and business logic. DTML presents the results of Python method and script invocations in HTML, but it doesn't know about their operation. The Python code of the script and external method objects referenced by DTML implements more complex programming tasks, but it doesn't know about the display formatting of the context in which it may be used. Where appropriate, the display and logic components can be implemented by different specialists.
18.2.4. A Simple Zope Interactive Web Site

As a final example, consider the following Zope-based web site. It consists of three Zope objects, all created and edited in the ZMI: an input page, a reply page, and a Python script used for calculations. Its input page form references the reply page object, and the reply page calls a Python script from a DTML expression. The input page is a DTML method object, created and stored as the Zope tree object /scripts101/salaryInput in the ZMI. Its form input parameters are automatically converted to float and integer objects by Zope:
<dtml-var standard_html_header>
<p>Enter job data:</p>
<form action="salaryResult" method="POST">
  Hours worked: <input type="text" name="hours:float"> <br>
  Pay per hour: <input type="text" name="rate:int"> <br>
  <input type="submit" value="Submit">
</form>
<dtml-var standard_html_footer>
The reply page, the web tree object /scripts101/salaryResult, is also a Zope DTML method object, invoked by the salaryInput page:
<dtml-var standard_html_header>
<p>Your pay this week: <b><dtml-var calculateSalary></b></p>
<dtml-var standard_html_footer>
Finally, the Python script object, added as /scripts101/calculateSalary in the ZMI, performs numeric calculations required by the reply page, which are outside the scope of DTML display code. Input parameters to this script come automatically from DTML namespaces; their names (hours, rate) may be listed in the ZMI when the script is created or by special comments at the start of the script's code. When run, this script's return value is automatically rendered by Zope and inserted in the HTML reply stream, replacing the dtml-var tag that calls the script by name.
## parameters=hours, rate
import math
if hours < 0:
    hours = 0
else:
    hours = math.floor(hours)
return hours * rate
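Because the script body is plain Python, its logic is easy to exercise outside Zope as well. Here is a standalone sketch; the function wrapper and its name are ours, not part of the Zope script object, which receives hours and rate from DTML namespaces instead.

```python
import math

def calculate_salary(hours, rate):
    # same logic as the Zope script, wrapped in an ordinary function
    # so it can be tested without the ZMI
    if hours < 0:
        hours = 0                    # negative hours count as zero
    else:
        hours = math.floor(hours)    # partial hours are not paid
    return hours * rate

result = calculate_salary(40.0, 130)    # == 5200, as in the URL test later
```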
As before, this fosters a separation of presentation and business logic: the DTML salaryResult presents the result of the Python calculateSalary, but the DTML code doesn't know about salary calculation, and the Python code doesn't know about presentation. Ideally, the two parts can be worked on independently, by people with different skill sets. This separation is especially striking when compared with classic CGI scripts, which embed and mix HTML reply code with Python code; in the Zope model, the salaryResult display is independent of the Python calculateSalary logic. In practice, more complex pages may require additional formatting logic in the templating language code (e.g., loops and tests), but the general separation still applies. Figure 18-4 captures this site's input page (it can also be displayed with the View tab in the ZMI) at the URL http://localhost:8080/scripts101/salaryInput.
Figure 18-4. Input page
Figure 18-5 shows the reply page returned when the input page is submitted. The reply page reflects the DTML code that presents the result returned by the Python script.
Figure 18-5. Reply page
We can also call the calculateSalary Python script directly by its URL, though we have to take care to convert the input arguments to their expected datatypes by using type codes after their names; Zope uses these to perform from-string conversions before the values are passed into the called object. We use these in the input fields of salaryInput as well. Alternatively, we could restructure the script to convert from strings to the expected types itself by using the REQUEST inputs object rather than declared parameters. As is, the following URL produces a page that displays just the text "5200.0", the default str rendering of the returned Python floating-point number:
The salaryResult DTML page object can be called directly by a similar URL (replace the Python script's name), though the reply is a complete web page produced by the DTML code. In fact, as seen in Figure 18-6, the Python script can also be tested within the ZMI itself: click the Test tab, and input the parameters manually. Objects can be tested this way in the ZMI, without having to type the corresponding URL in another browser window.
Figure 18-6. Testing scripts in the ZMI
As you can probably tell, in this introduction we're just scratching the surface of what Zope can do. For instance, we haven't introduced the other templating language in Zope, Zope Page Templates (ZPT), coded in Template Attribute Language (TAL). ZPT is an alternative way to describe presentation based on attributes of normal HTML tags, rather than embedded DTML tags. As such, ZPT code may be more easily handled by some HTML editors when edited outside the context of Zope. Moreover, published functions and methods can use the Zope object database to save state permanently; there are more advanced Python constructs in Zope, including Zope products; URLs can provide method context using reference paths in ways we have not mentioned here; and Zope provides additional tools such as debugging support, precoded HTTP servers for use with the ORB, and finer-grained control over responses to URL requestors. For all things Zope, visit http://www.zope.org. There, you'll find up-to-date releases, as well as documentation ranging from tutorials to references to full-blown Zope example sites.
During the lifespan of the second edition of this book, Python creator Guido van Rossum and his PythonLabs team of core Python developers were located at the Zope Corporation, home of the Zope framework introduced here. As I write this third edition, Guido has just been hired by Google, but many of the original PythonLabs team members are still at Zope.
18.3. HTMLgen: Web Pages from Objects

One of the things that makes basic CGI scripts complex is their inherent dependence on HTML: they must embed and generate legal HTML code to build user interfaces. These tasks might be easier if the syntax of HTML were somehow removed from CGI scripts and handled by an external tool. HTMLgen is a third-party Python tool designed to fill this need. With it, programs build web pages by constructing trees of Python objects that represent the desired page and "know" how to format themselves as HTML. Once constructed, the program asks the top of the Python object tree to generate HTML for itself, and out comes a complete, legally formatted HTML web page. Programs that use HTMLgen to generate pages need never deal with the syntax of HTML; instead, they can use the higher-level object model provided by HTMLgen and trust it to do the formatting step. HTMLgen may be used in any context where you need to generate HTML. It is especially suited for HTML that is generated periodically from static data, but it can also be used for HTML creation in CGI scripts (though its use in the CGI context incurs some extra speed costs). For instance, HTMLgen would be ideal if you run a nightly job to generate web pages from database contents. HTMLgen can also be used to generate documents that don't live on the Web at all; the HTML code it produces works just as well when viewed offline.
18.3.1. A Brief HTMLgen Tutorial

We can't investigate HTMLgen in depth here, but let's look at a few simple examples to sample the flavor of the system. HTMLgen is shipped as a collection of Python modules that must be installed on your machine; once it's installed and its directory is added to your module search path, simply import objects from the HTMLgen module corresponding to the tag you wish to generate, and make instances:
C:\Stuff\HTMLgen\HTMLgen>python
>>> from HTMLgen import *
>>> p = Paragraph("Making pages from objects is easy\n")
>>> p
<HTMLgen.Paragraph instance at ...>
>>> print p
<P>Making pages from objects is easy
</P>
Here, we make an HTMLgen.Paragraph object (a class instance), passing in the text to be formatted. All HTMLgen objects implement __str__ methods and can emit legal HTML code for themselves. When we print the Paragraph object, it emits an HTML paragraph construct. HTMLgen objects also define append methods, which do the right thing for the object type; Paragraphs simply add appended text to the end of the text block:
>>> p.append("Special < characters > are & escaped")
>>> print p
<P>Making pages from objects is easy
Special &lt; characters &gt; are &amp; escaped</P>
Notice that HTMLgen escaped the special characters (e.g., &lt; means <) so that they are not interpreted as HTML when the page is rendered. Hyperlinks are coded as Href objects, passing in the link's target URL and its text:

>>> h = Href('http://www.python.org', 'python')
>>> print h
<A HREF="http://www.python.org">python</A>
To generate HTML for complete pages, we create one of the HTML document objects, append its component objects, and print the document object. HTMLgen emits a complete page's code, ready to be viewed in a browser:
>>> d = SimpleDocument(title='My doc')
>>> p = Paragraph('Web pages made easy')
>>> d.append(p)
>>> d.append(h)
>>> print d
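The mechanism behind these objects can be modeled with a toy class. This is only a sketch of the idea, not HTMLgen's actual implementation: objects hold content, know how to render themselves as HTML via __str__, and support append.

```python
def escape(text):
    # minimal HTML escaping, standing in for what HTMLgen does
    # internally; & must be replaced first
    return (text.replace('&', '&amp;')
                .replace('<', '&lt;')
                .replace('>', '&gt;'))

class ToyParagraph:
    # toy model: the object renders itself as HTML when converted
    # to a string (not HTMLgen's real code)
    def __init__(self, text=''):
        self.text = text
    def append(self, more):
        self.text += more
    def __str__(self):
        return '<P>%s</P>' % escape(self.text)

p = ToyParagraph('Making pages from objects is easy\n')
p.append('Special < characters > are & escaped')
html = str(p)
```

Composite objects like SimpleDocument work the same way, rendering each appended child in turn when the document is converted to a string.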
Now, suppose we wish to parse this XML code, extracting just the ISBN numbers and titles for each book defined, and stuffing the details into a dictionary indexed by ISBN number. Python's XML parsing tools let us do this in an accurate way. Example 18-10, for instance, defines a SAX-based parsing procedure: its class implements callback methods that will be called during the parse.
Example 18-10. PP3E\Internet\Other\XML\bookhandler.py
################################
# SAX is a callback-based API
# for intercepting parser events
################################

import xml.sax.handler

class BookHandler(xml.sax.handler.ContentHandler):   # handle XML parser events
    def __init__(self):                              # a state machine model
        self.inTitle = 0
        self.mapping = {}

    def startElement(self, name, attributes):
        if name == "book":                       # on start book tag
            self.buffer = ""
            self.isbn = attributes["isbn"]       # save ISBN for dict key
        elif name == "title":                    # on start title tag
            self.inTitle = 1                     # save title text to follow

    def characters(self, data):                  # on text within tag
        if self.inTitle:                         # save text if in title
            self.buffer += data

    def endElement(self, name):
        if name == "title":                      # on end title tag
            self.inTitle = 0
            self.mapping[self.isbn] = self.buffer    # store title text in dict
The SAX model is efficient, but it is potentially confusing at first glance, because the class must keep track of where the parse currently is, using state information. For example, when the title tag is first detected, we set a state flag and initialize a buffer; as each character within the title tag is parsed, we append it to the buffer until the ending portion of the title tag is encountered. The net effect saves the title tag's content as a string. To kick off the parse, we make a parser, set its handler to the class in Example 18-10, and start the parse; as Python scans the XML file, our class's methods are called automatically as components are encountered:
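A minimal driver in this spirit can be run on its own. The sample XML text and ISBNs below are illustrative stand-ins for the book's books.xml file, and the handler is repeated so the sketch is self-contained:

```python
import pprint
import xml.sax, xml.sax.handler

class BookHandler(xml.sax.handler.ContentHandler):
    # same state-machine handler as Example 18-10
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.inTitle = 0
        self.mapping = {}
    def startElement(self, name, attributes):
        if name == "book":
            self.buffer = ""
            self.isbn = attributes["isbn"]
        elif name == "title":
            self.inTitle = 1
    def characters(self, data):
        if self.inTitle:
            self.buffer += data
    def endElement(self, name):
        if name == "title":
            self.inTitle = 0
            self.mapping[self.isbn] = self.buffer

# sample data standing in for books.xml (ISBNs are made up)
xmltext = '''<catalog>
<book isbn="1-11111-111-1"><title>Learning Python</title></book>
<book isbn="2-22222-222-2"><title>Python &amp; XML</title></book>
</catalog>'''

handler = BookHandler()
xml.sax.parseString(xmltext.encode('utf-8'), handler)
pprint.pprint(handler.mapping)
```

Note that characters may be called several times per title (entity references such as &amp; split the text), which is why the handler accumulates into a buffer rather than assigning once.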
When the parse is completed, we use the Python pprint ("pretty printer") module to display the result: the mapping dictionary object attached to our handler. Beginning with Python 2.3, the Expat parser is included with Python as the underlying parsing engine that drives the events intercepted by our class. DOM parsing is perhaps simpler to understand (we simply traverse a tree of objects after the parse), but it might be less efficient for large documents, if the document is parsed all at once ahead of time. DOM also supports random access to document parts via tree fetches; in SAX, we are limited to a single linear parse. Example 18-11 is a DOM-based equivalent to the SAX parser listed earlier.
Example 18-11. PP3E\Internet\Other\XML\dombook.py
#####################################
# DOM gives the whole document to the
# application as a traversable object
#####################################

import pprint
import xml.dom.minidom
from xml.dom.minidom import Node

doc = xml.dom.minidom.parse("books.xml")        # load doc into object
mapping = {}                                    # usually parsed up front
for node in doc.getElementsByTagName("book"):   # traverse DOM object
    isbn = node.getAttribute("isbn")            # via DOM object API
    L = node.getElementsByTagName("title")
    for node2 in L:
        title = ""
        for node3 in node2.childNodes:
            if node3.nodeType == Node.TEXT_NODE:
                title += node3.data
        mapping[isbn] = title

# mapping now has the same value as in the SAX example
pprint.pprint(mapping)
The output of this script is the same as what we generated interactively for the SAX parser; here, though, it is built up by walking the document object tree after the parse has finished using method calls and attributes defined by the cross-language DOM standard specification:
Naturally, there is much more to Python's XML support than these simple examples imply. In deference to space, though, here are pointers to XML resources in lieu of additional examples:
Standard library First, be sure to consult the Python library manual for more on the standard library's XML support tools. See the entries for xml.sax and xml.dom for more on this section's examples.
ElementTree The popular ElementTree extension provides easy-to-use tools for parsing, changing, and generating XML documents. It represents documents as a tree of Python objects (in the spirit
of HTMLgen described earlier in this chapter). As of this writing, ElementTree (and a fast C implementation of it) is still a third-party package, but its core components were scheduled to be incorporated into the standard library of Python 2.5 in six months, as the package xml.etree. See the 2.5 library manual for details.
PyXML SIG tools You can also find Python XML tools and documentation at the XML Special Interest Group (SIG) web page at http://www.python.org (click on the SIGs link near the top). This SIG is dedicated to wedding XML technologies with Python, and it publishes a free XML tools package distribution called PyXML. That package contains tools not yet part of the standard Python distribution. Much of the standard library's XML support originated in PyXML, though that package still contains tools not present in Python itself.
Third-party tools You can also find free, third-party Python support tools for XML on the Web by following links at the XML SIG's web page. Of special interest, the 4Suite package from Fourthought provides integrated tools for XML processing, including open technologies such as DOM, SAX, RDF, XSLT, XInclude, XPointer, XLink, and XPath.
Documentation O'Reilly offers a book dedicated to the subject of XML processing in Python, Python & XML, written by Christopher A. Jones and Fred L. Drake, Jr. As usual, be sure to check Python's web site or your favorite web search engine for more recent developments on this front.
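As a brief taste of the ElementTree style mentioned above, the book-title extraction might be coded as follows; the sample XML is ours, and the import fallback reflects the package's move from a third-party release into the standard library:

```python
try:
    from xml.etree import ElementTree      # standard as of Python 2.5
except ImportError:
    from elementtree import ElementTree    # earlier third-party package

# illustrative sample, standing in for books.xml
xmltext = '''<catalog>
<book isbn="1-11111-111-1"><title>Learning Python</title></book>
<book isbn="2-22222-222-2"><title>Programming Python</title></book>
</catalog>'''

root = ElementTree.fromstring(xmltext)     # parse to an element tree
mapping = dict((book.get('isbn'), book.findtext('title'))
               for book in root.findall('book'))
```

Compared with the SAX and DOM versions, there is no handler class and no manual node walking: attribute and text access are single method calls.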
18.7. Windows Web Scripting Extensions

Although this book doesn't cover the Windows-specific extensions available for Python in detail, a quick look at the Internet scripting tools available to Windows programmers is in order here. On Windows, Python can be used as a scripting language for both the Active Scripting and Active Server Pages systems, which provide client- and server-side control of HTML-based applications. More generally, Python programs can also take the role of COM and DCOM clients and servers on Windows. The .NET framework provides additional options for Python programmers. This section is largely Microsoft specific; if you are interested in portability, other systems in this chapter may address your needs better (see Jython's client-side applets, PSP's server-side scripting support, and Zope's server-side object publishing model). On the other hand, if portability isn't a concern, the following techniques provide powerful ways to script both sides of a web conversation with Python.
18.7.1. Active Scripting: Client-Side Embedding

Active Scripting, sometimes known as ActiveX Scripting or just ActiveX, is a technology that allows scripting languages to communicate with hosting applications. The hosting application provides an application-specific object model API, which exposes objects and functions for use in scripting language programs. In one of its more common roles, Active Scripting allows scripting language code embedded in HTML pages to communicate with the local web browser through an automatically exposed object model API. Internet Explorer, for instance, utilizes Active Scripting to export things such as global functions and user-interface objects for use in scripts embedded in HTML that are run on the client. With Active Scripting, Python code may be embedded in a web page's HTML between special tags; such code is executed on the client machine and serves the same roles as embedded JavaScript and VBScript.

Unfortunately, support for client-side Active Scripting with Python under Internet Explorer no longer works as I write this update using Python 2.4. It relied on the rexec module to implement security for embedded code. As mentioned earlier in this chapter, in the sidebar "The Missing rexec Section," that module was withdrawn due to vulnerabilities. Without it, the Windows extensions have no way to ensure that code embedded in web pages won't do damage on the client; the code would have full access to the client machine, including all its files and data. Instead of leaving this security hole open, Active Scripting on the client has been disabled, and Python code embedded in HTML will no longer run under Internet Explorer. I've kept a brief overview of its workings in this book, though, because this may be a temporary regression; it might be reenabled in the future if a rexec replacement arises.
Moreover, its use in restricted intranets may still be reasonable (though you'll have to modify the Windows extensions code to turn it back on).
Despite the Internet Explorer regression, Python still works as a scripting engine inside trusted hosts, including the Windows Scripting Host and ASP. In addition, JavaScript or VBScript code embedded in web pages can still invoke Python-coded COM components on the client. At present, though, Python code embedded in web pages cannot be run on the client directly, and the section you are reading is included for historic or future interest. Check up-to-date resources to see whether Active Scripting under Internet Explorer on the client has been turned back on by the time you read these words.
18.7.1.1. Active Scripting basics

Embedding Python in client-side HTML works only on machines where Python is installed and Internet Explorer is configured to know about the Python language. Because of that, this technology doesn't apply to most of the browsers in cyberspace today. On the other hand, if you can configure the machines on which a system is to be delivered, this is a nonissue. Before we get into a Python example, let's look at the way standard browser installations handle other languages embedded in HTML. By default, Internet Explorer knows about JavaScript (really, Microsoft's Jscript implementation of it) and VBScript (a Visual Basic derivative), so you can embed both of those languages in any delivery scenario. For instance, the HTML file in Example 18-12 embeds JavaScript code, the default Internet Explorer scripting language on my PC.
Example 18-12. PP3E\Internet\Other\Win\activescript-js.html
<!-- pop up 3 alert boxes while this page is being constructed on
     client side by IE; JavaScript is the default script language,
     and alert is an automatically exposed name -->

<SCRIPT Language=JavaScript>
function message(i)
{
    if (i == 2) {
        alert("Finished!");
    }
    else {
        alert("A JavaScript-generated alert => " + i);
    }
}
for (count = 0; count < 3; count += 1) {
    message(count);
}
</SCRIPT>
All the text between the <SCRIPT> and </SCRIPT> tags in this file is JavaScript code. Don't worry about its syntax; this book isn't about JavaScript. The important thing to know is how this code is used by the browser. When a browser detects a block of code like this while building up a new page, it strips out the code, locates the appropriate interpreter, tells the interpreter about global object names, and passes the code to the interpreter for execution. The global names become variables in the embedded code and provide links to browser context. For instance, the name alert in the code block refers to a global function that creates a message box. Other global names refer to objects that give access to the browser's user interface: window objects, document objects, and so on.

You can run this HTML file on the local machine by clicking on its name in a file explorer. It can also be stored on a remote server and accessed via its URL in a browser. Whichever way you start it, three pop-up alert boxes created by the embedded code appear during page construction. Figure 18-9 shows one under Internet Explorer.
Figure 18-9. Internet Explorer running embedded JavaScript code
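To make the extraction step concrete, here is a toy sketch (in modern Python syntax) of the first thing a script-aware browser does: find each SCRIPT block, note its language, and hand the body off to whatever interpreter is registered for that language. It is illustrative only; a real browser's parser handles attributes, quoting, comments, and nesting far more carefully.

```python
import re

# naive pattern: capture the Language attribute and the body of each block
SCRIPT = re.compile(r'<SCRIPT\s+Language=(\w+)>(.*?)</SCRIPT>',
                    re.IGNORECASE | re.DOTALL)

def extract_scripts(html):
    """Return (language, code) pairs stripped out of an HTML string."""
    return [(lang, code.strip()) for (lang, code) in SCRIPT.findall(html)]

page = """
<HTML><BODY>
<SCRIPT Language=JavaScript>
alert("hello");
</SCRIPT>
</BODY></HTML>
"""

for lang, code in extract_scripts(page):
    print(lang, '=>', code)    # the browser would now locate an interpreter
                               # registered for lang and pass it the code
```

The registration step described in the next section is what adds "Python" to the set of languages such a lookup can resolve.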
18.7.1.2. Embedding Python in HTML

So how about putting Python code in that page, then? Alas, we need to do a bit more first. Although Internet Explorer is language neutral in principle, it does support some languages better than others, at least today. Moreover, other browsers may be more rigid and may not support the Active Scripting concept at all.
To make the Python version work, you must do more than simply install Python on your PC. You must also install the PyWin32 package separately and run its tools to register Python to Internet Explorer. The PyWin32 package includes the win32com extensions for Python, plus the PythonWin IDE (a GUI for editing and running Python programs, written with the MFC interfaces in PyWin32) and many other Windows-specific tools not covered in this book.

In the past, registering Python for use in Internet Explorer was either an automatic side effect of installing PyWin32 or was achieved by running a script located in the Windows extensions' code in Lib\site-packages of the Python install tree, named win32comext\axscript\client\pyscript.py. Because this package changes over time, though, see its documentation for current registration details.

Once you've registered Python with Internet Explorer, Python code embedded in HTML works just like our JavaScript example: Internet Explorer presets Python global names to expose its object model and passes the embedded code to your Python interpreter for execution. Example 18-13 shows our alerts example again, programmed with embedded Python code.
Example 18-13. PP3E\Internet\Other\Win\activescript-py.html
<!-- do the same but with Python, if configured; embedded Python code
     shows three alert boxes as page is loaded; any Python code works
     here, and uses auto-imported global funcs and objects -->

<SCRIPT Language=Python>
def message(i):
    if i == 2:
        alert("Finished!")
    else:
        alert("A Python-generated alert => %d" % i)

for count in range(3):
    message(count)
</SCRIPT>
Figure 18-10 shows one of the three pop ups you should see when you open this file in Internet Explorer after installing PyWin32 and registering Python to Internet Explorer. Note that the first time you access this page, Internet Explorer may need to load Python, which could induce an apparent delay on slower machines; later accesses generally start up much faster because Python has already been loaded.
Figure 18-10. Internet Explorer running embedded Python code (currently disabled)
With a simple configuration step, Python code can be embedded in HTML and be made to run under Internet Explorer, just like JavaScript and VBScript. Although this works on only some browsers and platforms, for many applications, the portability constraint is acceptable. Active Scripting is a straightforward way to add client-side Python scripting for web browsers, especially when you can control the target delivery environment. For instance, machines running on an intranet within a company may have well-known configurations. In such scenarios, Active Scripting lets developers apply all the power of Python in their client-side scripts.
18.7.2. Active Server Pages: Server-Side Embedding

Active Server Pages (ASPs) use a similar model: Python code is embedded in the HTML that defines a web page. But ASP is a server-side technology; embedded Python code runs on the server machine and uses an object-based API to dynamically generate portions of the HTML that is ultimately sent back to the client-side browser. As we saw in the last two chapters, Python server-side CGI scripts embed and generate HTML, and they deal with raw input and output streams. By contrast, server-side ASP scripts are embedded in HTML and use a higher-level object model to get their work done.

Just like client-side Active Scripting, ASP requires you to install Python and the PyWin32 Windows extensions package. But because ASP runs embedded code on the server, you need to configure Python on only one machine. Like CGI scripts in general, this makes Python ASP scripting much more widely applicable, as you don't need Python support on every client. Moreover, because you control the content of the embedded code on your server, this is a secure configuration. Unlike CGI scripts, however, ASP requires you to run Microsoft's Internet Information Server (IIS) today.
18.7.2.1. A short ASP example

We can't discuss ASP in any real detail here, but here's an example of what an ASP file looks like when Python code is embedded:
<SCRIPT RunAt=Server Language=Python>
#
# code here is run at the server
#
</SCRIPT>
As before, code may be embedded inside SCRIPT tag pairs. This time, we tell ASP to run the code at the server with the RunAt option; if omitted, the code and its tags are passed through to the client and are run by Internet Explorer (if configured properly). ASP also recognizes code enclosed in <% and %> delimiters and allows a language to be specified for the entire page. This form is handier if there are multiple chunks of code in a page, as shown in Example 18-14.
Example 18-14. PP3E\Internet\Other\Win\asp-py.asp
However the code is marked, ASP executes it on the server after passing in a handful of named objects that the code may use to access input, output, and server context. For instance, the automatically imported Request and Response objects give access to input and output context. The code here calls a Response.Write method to send text back to the browser on the client (much like a print statement in a simple Python CGI script), as well as Request.ServerVariables to access environment variable information. To make this script run live, you'll need to place it in the proper directory on a server machine running IIS with ASP support, after installing and registering Python. Although this is a simple example, the full power of the Python language is at your disposal when it is embedded like this. By combining Python-generated content with HTML formatting, this yields a natural way to generate pages dynamically and separate logic from display. It's similar in spirit to the more platform-neutral Zope DTML idea we saw earlier, and the PSP pages we'll meet later.
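The flavor of the ASP object model can be suggested in a few lines of plain Python. The Request and Response classes below are toy stand-ins, not the real ASP interfaces; they show only the basic idea that embedded code writes through an object API instead of printing to a raw output stream as a CGI script does.

```python
# toy stand-ins for ASP's exposed objects -- illustrative only
class Request:
    def __init__(self, server_vars):
        self.ServerVariables = server_vars      # CGI-like environment data

class Response:
    def __init__(self):
        self.parts = []
    def Write(self, text):                      # like Response.Write in ASP
        self.parts.append(text)
    def render(self):
        return ''.join(self.parts)

# what an embedded server-side code block might do with these objects
request  = Request({'REMOTE_ADDR': '127.0.0.1'})
response = Response()
response.Write('<P>Hello from the server, ')
response.Write(request.ServerVariables['REMOTE_ADDR'] + '</P>')
print(response.render())
```

The hosting engine, not your code, would construct these objects and collect the written pieces into the HTTP response.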
18.7.3. The COM Connection

At their core, both Internet Explorer and IIS are based on the COM (Component Object Model) integration system; they implement their object APIs with standard COM interfaces and look to the rest of the world like any other COM object. From a broader perspective, Python can be used as both a scripting and an implementation language for any COM object. Although the COM mechanism used to run Python code embedded within HTML is automated and hidden, it can also be employed explicitly to make Python programs take the role of both COM clients and COM servers. COM is a general integration technology and is not strictly tied to Internet scripting, but a brief introduction here might help demystify some of the Active Scripting magic behind HTML embedding.
18.7.3.1. A brief introduction to COM

COM is a Microsoft technology for language-neutral component integration. It is sometimes marketed as ActiveX, partially derived from a system called Object Linking and Embedding (OLE), and it is the technological heart of the Active Scripting system we met earlier.[*] COM also sports a distributed extension known as DCOM that allows communicating objects to be run on remote machines. Implementing DCOM often simply involves running through Windows registry configuration steps to associate servers with the machines on which they run.

[*] Roughly, OLE was a precursor to COM, and Active Scripting is just a technology that defines COM interfaces for activities such as passing objects to arbitrary programming language interpreters by name. Active Scripting is not much more than COM itself with a few extensions, but acronym and buzzword overload seem to run rampant in the Windows development world. To further muddy the waters, .NET is essentially a successor to COM.
Operationally, COM defines a standard way for objects implemented in arbitrary languages to talk to each other, using a published object model. For example, COM components can be written in and used by programs written in Visual Basic, Visual C++, Delphi, PowerBuilder, and Python. Because the COM indirection layer hides the differences among all the languages involved, it's possible for Visual Basic to use an object implemented in Python, and vice versa.

Moreover, many software packages register COM interfaces to support end-user scripting. For instance, Microsoft Excel publishes an object model that allows any COM-aware scripting language to start Excel and programmatically access spreadsheet data. Similarly, Microsoft Word can be scripted through COM to automatically manipulate documents. COM's language neutrality means that programs written in any programming language with a COM interface, including Visual Basic and Python, can be used to automate Excel and Word processing.

Of most relevance to this chapter, Active Scripting also provides COM objects that allow scripts embedded in HTML to communicate with Microsoft's Internet Explorer (on the client) and IIS (on the server). Both systems register their object models with Windows such that they can be invoked from any COM-aware language. For example, when Internet Explorer extracts and executes Python code embedded in HTML, some Python variable names are automatically preset to COM object components that give access to Internet Explorer context and tools (e.g., alert in Example 18-13). Calls to such components from Python code are automatically routed through COM back to Internet Explorer.
18.7.3.2. Python COM clients

With the PyWin32 Python extension package installed, we can also write Python programs that serve as registered COM servers and clients, even if they have nothing to do with the Internet at all. For example, the Python program in Example 18-15 acts as a client to the Microsoft Word COM object.
Example 18-15. PP3E\Internet\Other\Win\comclient.py
####################################################################
# a COM client coded in Python: talk to MS-Word via its COM object
# model; uses either dynamic dispatch (runtime lookup/binding),
# or the static and faster type-library dispatch if makepy.py has
# been run; install the Windows PyWin32 extensions package to use
# this interface; Word runs hidden unless Visible is set to 1 (and
# Visible lets you watch, but impacts interactive Word sessions);
####################################################################

from sys import argv
docdir = 'C:\\temp\\'
if len(argv) == 2:
    docdir = argv[1]                        # ex: comclient.py a:\

from win32com.client import Dispatch       # early or late binding
word = Dispatch('Word.Application')        # connect/start Word
word.Visible = 1                           # else Word runs hidden

# create and save new doc file
newdoc = word.Documents.Add()              # call Word methods
spot = newdoc.Range(0, 0)
spot.InsertBefore('Hello COM client world!')   # insert some text
newdoc.SaveAs(docdir + 'pycom.doc')        # save in doc file
newdoc.SaveAs(docdir + 'copy.doc')
newdoc.Close()

# open and change a doc file
olddoc = word.Documents.Open(docdir + 'copy.doc')
finder = word.Selection.Find
finder.text = 'COM'
finder.Execute()
word.Selection.TypeText('Automation')
olddoc.Close()

# and so on: see Word's COM interface specs
This particular script starts Microsoft Word (known as Word.Application to scripting clients) if needed, and converses with it through COM. That is, calls in this script are automatically routed from Python to Microsoft Word and back. This code relies heavily on calls exported by Word, which are not described in this book. Armed with documentation for Word's object API, though, we could use such calls to write Python scripts that automate document updates, insert and replace text, create and print documents, and so on.

For instance, Figure 18-11 shows the two Word .doc files generated when the previous script is run on Windows: both are new files, and one is a copy of the other with a text replacement applied. The interaction that occurs while the script runs is more interesting. Because Word's Visible attribute is set to 1, you can actually watch Word inserting and replacing text, saving files, and so on, in response to calls in the script. (Alas, I couldn't quite figure out how to paste a movie clip in this book.)
Figure 18-11. Word files generated by Python COM client
In general, Python COM client calls may be dispatched either dynamically by runtime look-ups in the Windows registry, or statically using type libraries created by a Python utility script. These dispatch modes are sometimes called late and early dispatch binding, respectively. See PyWin32 documentation for more details. Luckily, we don't need to know which scheme will be used when we write client scripts. The Dispatch call used in Example 18-15 to connect to Word is smart enough to use static binding if server type libraries exist or to use dynamic binding if they do not. To force dynamic binding and ignore any generated type libraries, replace the first line with this:
from win32com.client.dynamic import Dispatch     # always late binding
However calls are dispatched, the Python COM interface performs all the work of locating the named server, looking up and calling the desired methods or attributes, and converting Python datatypes according to a standard type map as needed. In the context of Active Scripting, the underlying COM model works the same way, but the server is something like Internet Explorer or IIS (not Word), the set of available calls differs, and some Python variables are preassigned to COM server objects. The notions of "client" and "server" can become somewhat blurred in these scenarios, but the net result is similar.
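The spirit of late binding can be mimicked in plain Python: a name is resolved only at the moment it is used, with no compile-time knowledge of the server's interface. The proxy below is a conceptual sketch of that idea, not the actual win32com machinery, and FakeWordApp is a hypothetical stand-in for a COM server.

```python
class LateBound:
    """Toy proxy: resolve member names only when they are used,
    roughly the way COM dynamic dispatch queries the server at
    runtime instead of relying on a generated type library."""
    def __init__(self, target):
        self._target = target
    def __getattr__(self, name):
        # no static interface description: look the name up now
        try:
            return getattr(self._target, name)
        except AttributeError:
            raise AttributeError('server has no member %r' % name)

class FakeWordApp:                         # hypothetical server object
    def Quit(self):
        return 'bye'

proxy = LateBound(FakeWordApp())
print(proxy.Quit())                        # resolved at call time
```

Early (type-library) binding skips this per-call lookup, which is why it is faster; the Dispatch call chooses between the two automatically.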
18.7.3.3. Python COM servers

Python scripts can also be deployed as COM servers and can provide methods and attributes that are accessible to any COM-aware programming language or system. This topic is also too complex to cover in depth here, but exporting a Python object to COM is mostly just a matter of providing a set of class attributes to identify the server and utilizing the proper win32com registration utility calls.
Example 18-16 is a simple COM server coded in Python as a class.
Example 18-16. PP3E\Internet\Other\Win\comserver.py
################################################################
# a COM server coded in Python; the _reg_ class attributes
# give registry parameters, and others list methods and attrs;
# for this to work, you must install Python and the PyWin32
# package, this module file must live on your Python path,
# and the server must be registered to COM (see code at end);
# run pythoncom.CreateGuid() to make your own _reg_clsid_ key;
################################################################

import sys
from win32com.server.exception import COMException    # what to raise
import win32com.server.util                           # server tools

globhellos = 0

class MyServer:

    # COM info settings
    _reg_clsid_      = '{1BA63CC0-7CF8-11D4-98D8-BB74DD3DDE3C}'
    _reg_desc_       = 'Example Python Server'
    _reg_progid_     = 'PythonServers.MyServer'       # external name
    _reg_class_spec_ = 'comserver.MyServer'           # internal name
    _public_methods_ = ['Hello', 'Square']
    _public_attrs_   = ['version']

    # Python methods
    def __init__(self):
        self.version = 1.0
        self.hellos  = 0

    def Square(self, arg):                # exported methods
        return arg ** 2

    def Hello(self):
        global globhellos                 # global variables
        globhellos += 1                   # retain state, but
        self.hellos += 1                  # self vars don't
        return 'Hello COM server world [%d, %d]' % (globhellos, self.hellos)

# registration functions
def Register(pyclass=MyServer):
    from win32com.server.register import UseCommandLine
    UseCommandLine(pyclass)

def Unregister(classid=MyServer._reg_clsid_):
    from win32com.server.register import UnregisterServer
    UnregisterServer(classid)

if __name__ == '__main__':   # register server if file run or clicked
    Register()               # unregisters if --unregister cmd-line arg
As usual, this Python file must be placed in a directory on Python's module search path before it can be used by a COM client. Besides the server class itself, the file includes code at the bottom to automatically register and unregister the server to COM when the file is run:

To register a server, simply call the UseCommandLine function in the win32com.server.register package and pass in the Python server class. This function uses all the special class attribute settings to make the server known to COM. The file is set to automatically call the registration tools if it is run by itself (e.g., when clicked in a file explorer).

To unregister a server, simply pass an --unregister argument on the command line when running this file. When run this way, the script automatically calls UseCommandLine again to unregister the server; as its name implies, this function inspects command-line arguments and knows to do the right thing when --unregister is passed. You can also unregister servers explicitly with the UnregisterServer call demonstrated near the end of this script, though this is less commonly used.

Perhaps the more interesting part of this code, though, is the special class attribute assignments at the start of the Python class. These class annotations can provide server registry settings (the _reg_ attributes), accessibility constraints (the _public_ names), and more. Such attributes are specific to the Python COM framework, and their purpose is to configure the server. For example, _reg_class_spec_ is simply the Python module and class names separated by a period. If it is set, the resident Python interpreter uses this attribute to import the module and create an instance of the Python class it defines when accessed by a client.[*]

[*] But note that the _reg_class_spec_ attribute is no longer strictly needed, and omitting it avoids a number of PYTHONPATH issues. Because such settings are prone to change, you should always consult the latest Windows extensions package reference manuals for details on this and other class annotation attributes.
Other attributes may be used to identify the server in the Windows registry. The _reg_clsid_ attribute, for instance, gives a globally unique identifier (GUID) for the server and should vary in every COM server you write. In other words, don't use the value in this script. Instead, do what I did to make this ID, and paste the result returned on your machine into your script:
GUIDs are generated by running a tool shipped with the Windows extensions package: simply import and call the pythoncom.CreateGuid function and insert the returned text in the script. Windows uses the ID stamped into your network card to come up with a complex ID that is likely to be unique across servers and machines. The more symbolic program ID string, _reg_progid_, can be used by clients to name servers too, but it is not as likely to be unique.

The rest of the server class is simply pure-Python methods, which implement the exported behavior of the server; that is, things to be called or fetched from clients. Once this Python server is annotated, coded, and registered, it can be used in any COM-aware language. For instance, programs written in Visual Basic, C++, Delphi, and Python may access its public methods and attributes through COM; of course, other Python programs can also simply import this module, but the point of COM is to open up components for even wider reuse.
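The pythoncom.CreateGuid call is Windows-only, but the identifiers it returns are standard 128-bit GUIDs. For illustration, the standard library's uuid module generates values in the same layout on any platform; the braces and uppercasing below merely match the registry style used for _reg_clsid_.

```python
import uuid

# produce a registry-style GUID string, in the same format that
# pythoncom.CreateGuid() returns; uuid4 is a random 128-bit identifier
guid = '{%s}' % str(uuid.uuid4()).upper()
print(guid)    # something like {1BA63CC0-7CF8-11D4-98D8-BB74DD3DDE3C}
```

Each run yields a different value, which is the point: paste one such result into your server class and leave it fixed thereafter.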
18.7.3.3.1. Using the Python server from a Python client

Let's put this Python COM server to work. The Python script in Example 18-17 tests the server in Example 18-16 in two ways: first by simply importing and calling it directly, and then by employing Python's client-side COM interfaces that were shown earlier to invoke it less directly. When going through COM, the PythonServers.MyServer symbolic program ID we gave the server (by setting the class attribute _reg_progid_) can be used to connect to this server from any language, including Python.
Example 18-17. PP3E\Internet\Other\Win\comserver-test.py
################################################################
# test the Python-coded COM server from Python two ways
################################################################

def testViaPython():                       # test without COM
    from comserver import MyServer
    object = MyServer()                    # use Python class name
    print object.Hello()                   # works as for any class
    print object.Square(8)
    print object.version

def testViaCom():                          # test via client-side COM
    from win32com.client import Dispatch
    server = Dispatch('PythonServers.MyServer')   # use Windows registry name
    print server.Hello()                   # call public methods
    print server.Square(12)
    print server.version                   # access attributes

if __name__ == '__main__':
    testViaPython()                        # test module, server
    testViaCom()
    testViaCom()                           # COM object retains state
If we've properly configured and registered the Python COM server, we can talk to it by running this Python test script. In the following, we run the server and client files from an MS-DOS console box (though they can usually be run by mouse clicks as well). The first command runs the server file by itself to register the server to COM; the second executes the test script to exercise the server both as an imported module (testViaPython) and as a server accessed through COM (testViaCom):
C:\Python24>python d:\PP3E\Internet\Other\Win\comserver.py
Registered: PythonServers.MyServer

C:\Python24>python d:\PP3E\Internet\Other\Win\comserver-test.py
Hello COM server world [1, 1]
64
1.0
Hello COM server world [2, 1]
144
1.0
Hello COM server world [3, 1]
144
1.0

C:\Python24>python d:\PP3E\Internet\Other\Win\comserver.py --unregister
Unregistered: PythonServers.MyServer
Notice the two numbers at the end of the Hello output lines: they reflect current values of a global variable and a server instance attribute. Global variables in the server's module retain state as long as the server module is loaded; by contrast, each COM Dispatch (and Python class) call makes a new instance of the server class, and hence new instance attributes. The third command unregisters the server in COM, as a cleanup step. Interestingly, once the server has been unregistered, it's no longer usable, at least not through COM (this output has been truncated to fit here):
C:\Python24>python d:\PP3E\Internet\Other\Win\comserver-test.py
Hello COM server world [1, 1]
64
1.0
Traceback (innermost last):
  File "comserver-test.py", line 21, in ?
    testViaCom()                                   # COM object retains
  File "comserver-test.py", line 14, in testViaCom
    server = Dispatch('PythonServers.MyServer')    # use Windows register
 ...more deleted...
pywintypes.com_error: (-2147221005, 'Invalid class string', None, None)
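Registration aside, the two counters in the successful Hello replies can be explained without any COM at all. A trimmed-down, COM-free version of the server class (shown in modern Python syntax) reproduces the pattern: the module-level global survives across instances, while each new instance, like each new Dispatch, restarts its own attribute at zero.

```python
globhellos = 0                       # module level: shared by all instances

class MyServer:                      # COM-free sketch of the server class
    def __init__(self):
        self.hellos = 0              # per instance: reset for each object
    def Hello(self):
        global globhellos
        globhellos += 1
        self.hellos += 1
        return 'Hello COM server world [%d, %d]' % (globhellos, self.hellos)

print(MyServer().Hello())            # [1, 1] -- a fresh instance each time,
print(MyServer().Hello())            # [2, 1]    as with each Dispatch call
print(MyServer().Hello())            # [3, 1]
```

Hold on to a single instance instead, and both counters advance together, which is exactly what the Visual Basic client in the next section demonstrates.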
18.7.3.3.2. Using the Python server from a Visual Basic client

The comserver-test.py script just listed demonstrates how to use a Python COM server from a Python COM client. Once we've created and registered a Python COM server, though, it's available to any language that sports a COM interface. For instance, Visual Basic code can run the Python methods in Example 18-16 just as well. Example 18-18 shows the sort of code we write to access the Python server from Visual Basic. Clients coded in other languages (e.g., Delphi or Visual C++) are analogous, but syntax and instantiation calls vary.
Example 18-18. PP3E\Internet\Other\Win\comserver-test.bas
Sub runpyserver()
    ' use python server from vb client
    ' alt-f8 in word to start macro editor
    Set server = CreateObject("PythonServers.MyServer")
    hello1 = server.hello()
    square = server.square(32)
    pyattr = server.Version
    hello2 = server.hello()
    sep    = Chr(10)
    Result = hello1 & sep & square & sep & pyattr & sep & hello2
    MsgBox Result
End Sub
The real trick, if you're not a Windows developer, is how to run this code. Because Visual Basic is embedded in Microsoft Office products such as Word, one approach is to test this code in the context of those systems. Try this: start Word, and then press Alt and F8 together, and you'll wind up in the Word macro dialog. There, enter a new macro name, press Create, and you'll find yourself in a development interface where you can paste and run the VB code just shown.

Running the Visual Basic code in this context produces the Word pop-up box in Figure 18-12, showing the results of Visual Basic calls to our Python COM server, initiated by Word. Global variable and instance attribute values at the end of both Hello reply messages are the same this time, because we make only one instance of the Python server class: in Visual Basic, by calling CreateObject with the program ID of the desired server.
Figure 18-12. Visual Basic client running Python COM server in Word
If your client code runs but generates a COM error, make sure that the PyWin32 package has been installed, that the Python server module file is in a directory on Python's import search path, and that the server file has been run by itself to register the server with COM. If none of that helps, you're probably already beyond the scope of this text. Please see additional Windows programming resources for more details.
18.7.3.3.3. Using the Python server with client-side Active Scripting

Another way to kick off the Visual Basic client code is to embed it in a web page and rely on Internet Explorer to strip it out and launch it for us. In fact, even though Python cannot currently be embedded in a page's HTML directly (see the note in the earlier section "Active Scripting: Client-Side Embedding"), it is OK to embed code that runs a registered Python COM component indirectly. Example 18-19, for instance, embeds similar Visual Basic code in a web page; when this file is opened by Internet Explorer, it runs our Python COM server from Example 18-16 again, producing the page and pop up in Figure 18-13.
Figure 18-13. Internet Explorer running VBScript running Python COM server
Example 18-19. PP3E\Internet\Other\Win\comserver-test.html
<!-- Run Python COM server from VBScript embedded in HTML via IE -->

<SCRIPT Language=VBScript>
Sub runpyserver()
    ' use python server from vb client
    ' alt-f8 in word to start macro editor
    Set server = CreateObject("PythonServers.MyServer")
    hello1 = server.hello()
    square = server.square(9)
    pyattr = server.Version
    hello2 = server.hello()
    sep    = Chr(10)
    Result = hello1 & sep & square & sep & pyattr & sep & hello2
    MsgBox Result
End Sub
runpyserver()
</SCRIPT>
Trace through the calls on your own to see what is happening. An incredible amount of routing is going on here: from Internet Explorer, to Visual Basic, to Python, and back, with a control flow spanning three systems and both HTML and Python files. But with COM, it simply works. This structure opens the door to arbitrary client-side scripting from web pages with Python. Because Python COM components invoked from web pages must be manually and explicitly registered on the client, it is secure.
18.7.3.4. The bigger COM picture: DCOM

So what does writing Python COM servers have to do with the Internet motif of this chapter? After all, Python code embedded in HTML simply plays the role of COM client to Internet Explorer or IIS systems that usually run locally. Besides showing how such systems work their magic, I've presented this topic here because COM, at least in its grander world view, is also about communicating over networks.

Although we can't get into details in this text, COM's distributed extension, DCOM, makes it possible to implement Python-coded COM servers that run on machines arbitrarily remote from clients. Although largely transparent to clients, COM object calls like those in the preceding client scripts may imply network transfers of arguments and results. In such a configuration, COM may be used as a general client/server implementation model and an alternative to technologies such as Remote Procedure Calls (RPC). For some applications, this distributed object approach may even be a viable alternative to the other client- and server-side scripting tools we've studied in this part of the book.

Moreover, even when not distributed, COM is an alternative to the lower-level Python/C integration techniques we'll meet later in this book. Once its learning curve is scaled, COM is a straightforward way to integrate arbitrary components and provides a standardized way to script and reuse systems. However, COM also implies a level of dispatch indirection overhead and is a Windows-only solution at this writing. Because of that, it is generally not as fast or as portable as some of the other client/server and C integration schemes discussed in this book. The relevance of such trade-offs varies per application.

As you can probably surmise, there is much more to the Windows scripting story than we cover here. If you are interested in more details, O'Reilly's Python Programming on Win32 (by Mark Hammond and Andy Robinson) provides an in-depth presentation of these and other Windows development topics. Much of the effort that goes into writing scripts embedded in HTML involves using the exposed object model APIs, which are deliberately skipped in this book; see Windows documentation sources for more details.
The IronPython C# Python Compiler

Late-breaking news: as this third edition was being developed, Microsoft hired Python developer Jim Hugunin (who also created Jython) to work on his IronPython implementation. IronPython is a new, independent Python language implementation, like the Jython system described earlier in this chapter, but it compiles Python scripts for use in the Microsoft C# language environment and .NET Framework, a software component system based on XML that fosters cross-language interoperability. IronPython will also work on the Mono open source version of .NET. As such, it opens the door to other Python web-scripting roles and modes. If successful, this new compiler system promises to be the third complete Python implementation (along with Jython and the standard C implementation, and not counting projects such as PyPy) and an exciting development for Python in general.

As in the Jython Java-based implementation, IronPython scripts are coded using the standard Python core language presented in this text and are translated to be executed by the underlying C# system. Moreover, .NET interfaces are automatically integrated for use in Python scripts: Python classes may freely use, subclass, and act as .NET components.

Also like Jython, this new alternative implementation of Python has a specific target audience and will likely prove to be of most interest to developers concerned with C# and .NET/Mono Framework integration. However, initial performance results suggest that IronPython may compete favorably with standard C Python, since it can leverage all the work done on .NET Just-in-Time (JIT) compilers. Search the Web for up-to-date details on IronPython.

An earlier version of .NET integration, Python.NET, has similar goals, but it uses the normal Python runtime engine to allow standard Python to use .NET classes. Be sure to watch Python web forums for more developments on this front.
18.8. Python Server Pages Python Server Pages (PSP) is a server-side templating technology that embeds Python code inside HTML. PSP is a Python-based answer to other server-side embedded scripting approaches. The PSP scripting engine works much like Microsoft's ASP (described earlier) and Sun's Java Server Pages (JSP) specification. At the risk of pushing the acronym tolerance envelope, PSP has also been compared to PHP, a server-side scripting language embedded in HTML. All of these systems, including PSP, embed scripts within HTML and run them on the server to generate portions of the response stream sent back to the browser on the client. Scripts interact with an exposed object model API to get their work done, which gives access to input and output components. PSP is portable to a wide variety of platforms (whereas ASP applications generally run only on Microsoft platforms). PSP uses Python as its scripting language; by all accounts, this is a vastly more appropriate choice for scripting web sites than the Java language used in JSP. Since Python code is embedded under PSP, scripts have access to the full range of Python tools and add-ons from within PSP. We can't cover PSP in detail here; but for a quick look, Example 18-20 illustrates the structure of PSP.
Example 18-20. PP3E\Internet\Other\PSP\hello.psp
$[
# Generate a simple message page with the client's IP address
]$
<TITLE>Hello PSP World</TITLE>
$[include banner.psp]$
<H1>Hello PSP World</H1>
$[
Response.write("Hello from PSP, %s." % (Request.server["REMOTE_ADDR"]))
]$
A page like this would be installed on a PSP-aware server machine and referenced by a URL from a browser. PSP uses $[ and ]$ delimiters to enclose Python code embedded in HTML; anything outside these pairs is simply sent to the client browser, and code within these markers is executed. The first code block here is a Python comment (note the # character); the second is an include statement that simply inserts another PSP file's contents.
The third piece of embedded code is more useful. As in Active Scripting technologies, Python code embedded in HTML uses an exposed object API to interact with the execution context; in this case, the Response object is used to write output to the client's browser (much like a print in a CGI script), and Request is used to access HTTP headers for the request. The Request object also has a params dictionary containing GET and POST input parameters, as well as a cookies dictionary holding cookie information stored on the client by a PSP application. Notice that the previous example could just as easily have been implemented with a Python CGI script using a Python print statement, but PSP's full benefit becomes clearer in large pages that embed and execute much more complex Python code to produce a response. Under PSP, Python code is embedded in HTML, essentially the opposite of the CGI examples we met earlier, which embed HTML code in Python. PSP is also similar to the Zope DTML server-side templating language we met earlier, though Zope's embedded tags can do more than run Python code (they can also run acquired objects in the site tree, including other templating code objects). Zope encourages separation of HTML display and Python logic code by making the two distinct objects; in PSP, programmers can achieve similar effects by splitting complex logic off into imported modules. For more details about PSP, visit its web site, currently located at http://www.webwareforpython.org, but search http://www.python.org or Google for other links if this one changes over time.
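The substitution scheme behind such delimiters is easy to mimic. The toy expander below is not PSP itself, just a sketch of the idea: it handles only expression blocks evaluated against a caller-supplied namespace, whereas real PSP engines also execute statement blocks and expose Request and Response objects.

```python
import re

def render(template, namespace):
    # replace each $[ expression ]$ block with the result of evaluating
    # the enclosed Python expression against the given namespace
    def evaluate(match):
        return str(eval(match.group(1).strip(), namespace))
    return re.sub(r'\$\[(.*?)\]\$', evaluate, template)

page = 'Hello from PSP, $[ addr ]$.'
print(render(page, {'addr': '127.0.0.1'}))   # Hello from PSP, 127.0.0.1.
```

A full engine would compile each page once and cache the result, but the substitution pass above is the essential trick shared by PSP, ASP, and JSP alike.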
18.8.1. PSP in Webware and mod_python At the time of this writing, implementations of PSP are also available as components of the Webware suite of tools, as well as the mod_python Apache extension, both described later in this chapter. In both, the inclusion syntax varies slightly from the original PSP implementation: these systems delimit embedded Python code using the tokens <% and %> for statement blocks to be executed, and <%= and %> for expressions whose values must render as strings and are inserted into the reply stream, as in Example 18-21.
Example 18-21. PP3E\Internet\Other\PSP\webware.psp
<% import time %>
The current time is <%= time.asctime( ) %>
To use such PSP code, create a standard HTML page that embeds the special PSP tags that your page requires. Save this file with an extension of .psp and place it in a directory that is served by Webware or mod_python. When a request is received for this page, the server will dynamically compile the code to serve requests for that page. The embedded Python code is run on the server when the page is
accessed by a client, to generate parts of the reply. For more information, see PSP, Webware, and mod_python documentation. Although they are largely just variations on a theme, the various PSP implementations diverge in additional ways and are richer than we have space to cover here.
18.9. Rolling Your Own Servers in Python Most of the Internet modules we looked at in the last few chapters deal with client-side interfaces such as FTP and Post Office Protocol (POP), or special server-side protocols such as CGI that hide the underlying server itself. If you want to build servers in Python, you can do so either manually or by using higher-level tools.
18.9.1. Standard Library Socket Servers We explored the sort of code needed to build servers manually in Chapter 13. Python programs typically implement servers either by using raw socket calls with threads, forks, or selects to handle clients in parallel, or by using the standard library SocketServer module. As we learned earlier, this module supports TCP and UDP sockets, in threading and forking flavors; you provide a class method invoked to communicate with clients. Whether clients are handled manually or with Python classes, to serve requests made in terms of higher-level protocols such as FTP, the Network News Transfer Protocol (NNTP), and HTTP, you must listen on the protocol's port and add appropriate code to handle the protocol's message conventions. If you go this route, the client-side protocol modules in Python's standard library can help you understand the message conventions used. You may also be able to uncover protocol server examples in the Demos and Tools directories of the Python source distribution and on the Net at large (search http://www.python.org or do a general web search). See prior chapters for more details on writing socket-based servers. Also see the asyncore module described ahead for an asynchronous server class in the standard library based on the select system call instead of on threads or forks.
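As a minimal sketch of the SocketServer route, the following threading echo server hands each client connection to a handler class; the handler, message, and port choice here are illustrative only (the module was renamed socketserver in later Python releases, so the import is hedged both ways):

```python
import socket, threading
try:
    import SocketServer as socketserver    # module name used in this book (Python 2)
except ImportError:
    import socketserver                    # the module was renamed in Python 3

class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):                      # run once per client connection
        line = self.rfile.readline()       # read one line from the client
        self.wfile.write(b'echo: ' + line) # and echo it back

def demo():
    # port 0 asks the OS for any free port; real servers bind a fixed one
    server = socketserver.ThreadingTCPServer(('localhost', 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    client = socket.create_connection(server.server_address)
    client.sendall(b'hello\n')
    reply = client.recv(1024)
    client.close()
    server.shutdown()
    return reply

print(demo())
```

Swapping ThreadingTCPServer for ForkingTCPServer (on platforms with fork) changes the concurrency model without touching the handler class.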
18.9.2. Standard Library Web Servers As an even higher-level interface, Python also comes with the standard precoded HTTP web protocol server implementations we met in Chapter 16 and employed in Chapter 17. This support takes the form of three standard modules. BaseHTTPServer implements the server itself; this class is derived from the standard SocketServer.TCPServer class. SimpleHTTPServer and CGIHTTPServer implement standard handlers for incoming HTTP requests; the former handles simple web page file requests, while the latter also runs referenced CGI scripts on the server machine by forking processes. Refer to Example 16-1 for a simple script that uses these modules to implement a web server in Python. Run that script on your server machine to start handling web page requests. This assumes that you have appropriate permissions to run such a script, of course; see the Python library manual for more details on precoded HTTP server and request handler modules. Once you have your server running, you can access it in any web browser or by using either the Python httplib module, which implements the client side of the HTTP protocol, or the Python urllib module, which provides a file-like interface to data fetched from a named URL address (see the urllib examples in Chapters 14, 16, and 17; use a URL of the form "http://..." to access HTTP documents, and "http://localhost/..." if the server is running on the same machine as the client).
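The following sketch shows both halves of that idea in one script: a custom request handler served by the standard HTTP server class, fetched back with urllib. The page content and handler are illustrative, not the book's Example 16-1, and the imports are hedged for both the Python 2 names used in this text and their later renames.

```python
import threading
try:
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler   # Python 2
except ImportError:
    from http.server import HTTPServer, BaseHTTPRequestHandler      # Python 3
try:
    from urllib2 import urlopen              # Python 2
except ImportError:
    from urllib.request import urlopen       # Python 3

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):                        # called for each GET request
        body = b'<html><body>Hello web server world</body></html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, format, *args):    # silence per-request log lines
        pass

server = HTTPServer(('localhost', 0), HelloHandler)   # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

page = urlopen('http://localhost:%d/' % port).read()
print(page)
server.shutdown()
```

SimpleHTTPServer and CGIHTTPServer work the same way; they simply supply prewritten handler classes in place of HelloHandler.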
18.9.3. Third-Party Solutions Beyond Python's standard library, the public domain also offers many ways to build servers in Python, including the Twisted system described in Chapter 13 and mentioned in the next section. Open source systems such as Apache provide additional options.
18.10. And Other Cool Stuff The Web and the Internet it runs on are large, dynamic domains, and we haven't done justice to all the available tools they offer to Python programmers. To wrap up, the following is a list of some of the more popular, full-featured, and Python-friendly web tools that are freely available on the Net. This list is incomplete and is prone to change over time too, but by way of introduction, here are some of the things Python people use today:
Medusa, asyncore The Medusa system is an architecture for building long-running, high-performance network servers in Python, and it is used in several mission-critical systems. Beginning in Python 1.5.2, the core of Medusa became standard in Python, in the form of the asyncore and asynchat library modules. These standard modules may be used by themselves to build high-performance network servers, based on an asynchronous, multiplexing, and single-process model. They use an event loop built using the select system call presented in Chapter 13 of this book to provide concurrency without spawning threads or processes, and are well suited to handling short-lived transactions. See the Python library manual for details. The complete Medusa system (not shipped with Python) also provides precoded HTTP and FTP servers; it is free for noncommercial use, and it requires a license otherwise.
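The select-based model is easy to demonstrate without Medusa or asyncore themselves. The following single-process sketch multiplexes a listening socket and all of its client connections through one select call; the port, messages, and helper name are arbitrary.

```python
import select, socket

# one select() loop watches the listening socket and every client
# connection in a single process: no threads or forks are spawned
listener = socket.socket()
listener.bind(('localhost', 0))            # port 0: let the OS pick a port
listener.listen(5)
sockets = [listener]

def poll_once(timeout=1.0):
    readable, _, _ = select.select(sockets, [], [], timeout)
    for s in readable:
        if s is listener:
            conn, addr = s.accept()        # new client: watch it too
            sockets.append(conn)
        else:
            data = s.recv(1024)
            if data:
                s.sendall(b'echo: ' + data)
            else:                          # empty read: client closed
                sockets.remove(s)
                s.close()

client = socket.create_connection(listener.getsockname())
poll_once()                                # server accepts the connection
client.sendall(b'hi')
poll_once()                                # server reads and echoes
reply = client.recv(1024)
print(reply)                               # b'echo: hi'
client.close()
```

A real asynchronous server would run poll_once in a loop forever; asyncore and Medusa wrap exactly this pattern in reusable dispatcher classes.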
Twisted The Twisted system was introduced in Chapter 13. In short, it is an asynchronous, event-driven networking framework written in Python, with support for a large number of network protocols and with precoded implementations of common network servers. See http://twistedmatrix.com/trac or search on Google for details.
Zope We met Zope earlier in this chapter. If you are doing server-side work, be sure to consider the Zope open source web application server. Zope provides a full-featured web framework that implements an object model that is well beyond standard server-side CGI scripting. The Zope world has also developed full-blown servers (e.g., ZServer). See the earlier Zope section in this chapter, and http://www.zope.org.
Other web site frameworks For an alternative to Zope, also see the popular CherryPy, Webware, Quixote, and other systems.
CherryPy Bills itself as a Pythonic object-oriented web development framework, which allows developers to build web applications like any other object-oriented Python program, with little or no
knowledge of the underlying protocols. As such, it yields smaller source code developed in less time.
Webware A suite of Python components for developing object-oriented, web-based applications. The suite uses well-known design patterns and includes features such as a fast application server, servlets, the PSP templating system described earlier in this chapter, an object-relational mapping, and a CGI wrapper.
Quixote Describes itself as a package that supports web application development by Python programmers. In Quixote, the templating language is a small extension of Python itself--the aim is to make web page assembly take maximal advantage of the Python programmer's existing skills.
Django A relatively new arrival on the Python web framework scene and billed as a high-level Python web framework that encourages rapid development and clean, pragmatic design. It includes a dynamic database access API, its own server-side templating language, and more.
TurboGears Also a new arrival in the Python web framework space, this is an integrated collection of web development tools: MochiKit (a JavaScript library), Kid (a template system), CherryPy (for web input/output), and SQLObject (for accessing databases as you would normal Python classes).
Plone A Zope-based web site builder, which provides a workflow model (called a content management system) that allows content producers to add their content to a site. By allowing users to add web content, it removes the typical site administrator bottleneck and supports more collaborative sites. Plone is a prepackaged instance of a Zope-based web site, which may be customized both in and with Zope tools. You can find additional web frameworks available for Python in the public domain, and more may appear over time. In fact, this may be something of an embarrassment of riches; at this writing, there is no de facto standard web framework in the Python world, though a small set is likely to emerge as frontrunners over time.
Mailman If you are looking for email list support, be sure to explore the GNU mailing list manager, otherwise known as Mailman. Written in Python, Mailman provides a robust, quick, and feature-rich email discussion list tool. Mailman allows users to subscribe over the Web, supports web-based administration, and provides mail-to-news gateways and integrated spam prevention (spam of the junk mail variety, that is). At this time, http://www.list.org is the place to find more Mailman details.
Apache For server-side scripting, you may be interested in the highly configurable Apache open source web server. Apache is one of the dominant servers used on the Web today, despite its free nature. Among many other things, it supports running Python server-side scripts in a variety of modes; see the site http://www.apache.org for details on Apache itself.
mod_python We introduced mod_python in Chapter 16, in conjunction with server-side state retention options. This package embeds Python within the Apache open source web server, with a substantial boost in performance and added flexibility. Python code may be executed directly in Apache, eliminating the need for spawning processes. In addition, mod_python supports cross-page session data, access to Apache APIs, its own implementation of the PSP server-side reply templating language described in this chapter, and more. See Chapter 16, as well as the mod_python web site, for more details (search on Google.com for an up-to-date link).
CORBA CORBA is an architecture for distributed programming, in which components communicate across a network, by routing calls through an Object Request Broker (ORB). It is similar in spirit to the distributed flavor of the COM system shown earlier in this chapter, but it is both language and platform neutral. Python support for CORBA is available in the third-party ILU, Fnorb, and omniORB packages.
XML-RPC, SOAP XML-RPC is a technology that provides remote procedure calls to components over networks, by routing requests over the HTTP protocol and shipping data back and forth, packaged as XML text. Python's xmlrpclib handles the client side of this protocol, and its SimpleXMLRPCServer provides tools for the server side. SOAP is a similar but larger system, targeted at the implementation of web services: reusable software components that run on the Web. The third-party SOAPy and PySOAP packages provide Python interfaces for this protocol.
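A sketch of the two halves together follows; the double function is just an illustration, and the imports are hedged for both the Python 2 module names used in this text and their later renames.

```python
import threading
try:
    from SimpleXMLRPCServer import SimpleXMLRPCServer   # Python 2 name
except ImportError:
    from xmlrpc.server import SimpleXMLRPCServer        # renamed in Python 3
try:
    import xmlrpclib                                    # Python 2 name
except ImportError:
    import xmlrpc.client as xmlrpclib                   # renamed in Python 3

def double(x):
    return x * 2

# server side: register a function and serve it over HTTP
server = SimpleXMLRPCServer(('localhost', 0), logRequests=False)
server.register_function(double)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# client side: remote calls look like normal Python method calls;
# arguments and results travel as XML text over HTTP
proxy = xmlrpclib.ServerProxy('http://localhost:%d' % port)
result = proxy.double(21)
print(result)     # 42
server.shutdown()
```

Because the wire format is plain XML over HTTP, the client and server need not both be written in Python, which is the main draw of XML-RPC over Python-specific schemes such as pickling.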
MoinMoin A powerful and popular Wiki system written in Python, which supports flexible web page content that can be changed by its user community. Beyond this list there are dozens of additional Internet-related systems and technologies, but we'll omit further examples and descriptions here in the interest of space. Be sure to watch http://www.python.org for new developments on the server front, as well as late-breaking advances in Python web-scripting techniques in general.
Part V: Tools and Techniques This part of the book presents a collection of additional Python application topics. Most of the tools presented along the way can be used in a wide variety of application domains. You'll find the following chapters here:
Chapter 19, Databases and Persistence This chapter covers commonly used and advanced Python techniques for storing information between program executions: DBM files, object pickling (serialization), object shelves, the ZODB object database, and Python's SQL database interfaces. MySQL is used for the SQL examples, but the API is portable to other systems.
Chapter 20, Data Structures This chapter explores techniques for implementing more advanced data structures in Python: stacks, sets, binary search trees, graphs, and the like. In Python, these take the form of object implementations.
Chapter 21, Text and Language This chapter addresses Python tools and techniques for parsing text-based information: string splits and joins, regular expression matching, recursive descent parsing, and more advanced language-based topics. This is the last pure Python part of the book, and it makes heavy use of tools presented earlier in the text, especially the Tkinter GUI library. For instance, a tree browser (PyTree) is used to illustrate various object structures, a form browser (PyForm) helps make database concepts more concrete, and a calculator GUI (PyCalc) serves to demonstrate language processing and code reuse concepts.
Chapter 19. Databases and Persistence Section 19.1. "Give Me an Order of Persistence, but Hold the Pickles" Section 19.2. Persistence Options in Python Section 19.3. DBM Files Section 19.4. Pickled Objects Section 19.5. Shelve Files Section 19.6. The ZODB Object-Oriented Database Section 19.7. SQL Database Interfaces Section 19.8. PyForm: A Persistent Object Viewer
19.1. "Give Me an Order of Persistence, but Hold the Pickles" So far in this book, we've used Python in the system programming, GUI development, and Internet scripting domains, three of Python's most common applications, and representative of its use as an application programming language at large. In the next three chapters, we're going to take a quick look at other major Python programming topics: persistent data, data structure techniques, and text- and language-processing tools. None of these is covered exhaustively (each could easily fill a book alone), but we'll sample Python in action in these domains and highlight their core concepts. If any of these chapters spark your interest, additional resources are readily available in the Python world.
19.2. Persistence Options in Python In this chapter, our focus is on persistent data, the kind that outlives a program that creates it. That's not true by default for objects a script constructs; things like lists, dictionaries, and even class instance objects live in your computer's memory and are lost as soon as the script ends. To make data live longer, we need to do something special. In Python programming, there are today at least six traditional ways to save information in between program executions:
Flat files Storing text and bytes
DBM keyed files Keyed access to strings
Pickled objects Serializing Python objects to files and streams
Shelve files Storing pickled Python objects in DBM keyed files
ZODB object databases Storing Python objects in persistent dictionaries
SQL relational databases Table-based systems that support queries

In some sense, Python's interfaces to network-based object transmission protocols such as SOAP, XML-RPC, and CORBA also offer persistence options, but they are beyond the scope of this chapter. Here, our interest is in techniques that allow a program to store its data directly and, usually, on the local machine. Although some database servers may operate on a physically remote machine on a network, this is largely transparent to most of the techniques we'll study here.

We studied Python's simple (or "flat") file interfaces in earnest in Chapter 4, and we have been using them ever since. Python provides standard access to both the stdio filesystem (through the built-in open function), as well as lower-level descriptor-based files (with the built-in os module). For simple data storage tasks, these are all that many scripts need. To save data for use in a future program run, simply write it out to a newly opened file on your computer and read it back from that file later. As we've seen, for more advanced tasks, Python also supports other file-like interfaces such as pipes, fifos, and sockets.
Since we've already explored flat files, I won't say more about them here. The rest of this chapter introduces the remaining topics on the preceding list. At the end, we'll also meet a GUI program for browsing the contents of things such as shelves and DBM files. Before that, though, we need to learn what manner of beast these are.
19.3. DBM Files Flat files are handy for simple persistence tasks, but they are generally geared toward a sequential processing mode. Although it is possible to jump around to arbitrary locations with seek calls, flat files don't provide much structure to data beyond the notion of bytes and text lines. DBM files, a standard tool in the Python library for database management, improve on that by providing key-based access to stored text strings. They implement a random-access, single-key view on stored data. For instance, information related to objects can be stored in a DBM file using a unique key per object and later can be fetched back directly with the same key. DBM files are implemented by a variety of underlying modules (including one coded in Python), but if you have Python, you have a DBM.
19.3.1. Using DBM Files Although DBM filesystems have to do a bit of work to map chunks of stored data to keys for fast retrieval (technically, they generally use a technique called hashing to store data in files), your scripts don't need to care about the action going on behind the scenes. In fact, DBM is one of the easiest ways to save information in Python; DBM files behave so much like in-memory dictionaries that you may forget you're actually dealing with a file. For instance, given a DBM file object:

Indexing by key fetches data from the file.

Assigning to an index stores data in the file.

DBM file objects also support common dictionary methods such as keys-list fetches and tests and key deletions. The DBM library itself is hidden behind this simple model. Since it is so simple, let's jump right into an interactive example that creates a DBM file and shows how the interface works:
% python
>>> import anydbm                           # get interface: dbm, gdbm, ndbm,..
>>> file = anydbm.open('movie', 'c')        # make a DBM file called 'movie'
>>> file['Batman'] = 'Pow!'                 # store a string under key 'Batman'
>>> file.keys( )                            # get the file's key directory
['Batman']
>>> file['Batman']                          # fetch value for key 'Batman'
'Pow!'

>>> who  = ['Robin', 'Cat-woman', 'Joker']
>>> what = ['Bang!', 'Splat!', 'Wham!']
>>> for i in range(len(who)):
...     file[who[i]] = what[i]              # add 3 more "records"
...
>>> file.keys( )
['Joker', 'Robin', 'Cat-woman', 'Batman']
>>> len(file), file.has_key('Robin'), file['Joker']
(4, 1, 'Wham!')
>>> file.close( )                           # close sometimes required
Internally, importing anydbm automatically loads whatever DBM interface is available in your Python interpreter, and opening the new DBM file creates one or more external files with names that start with the string 'movie' (more on the details in a moment). But after the import and open, a DBM file is virtually indistinguishable from a dictionary. In effect, the object called file here can be thought of as a dictionary mapped to an external file called movie. Unlike normal dictionaries, though, the contents of file are retained between Python program runs. If we come back later and restart Python, our dictionary is still available. DBM files are like dictionaries that must be opened:
>>> file.keys( )                            # keys gives an index list
['Joker', 'Robin', 'Cat-woman', 'Batman']
>>> for key in file.keys( ): print key, file[key]
...
Joker Wham!
Robin Bang!
Cat-woman Splat!
Batman Pow!
>>> file['Batman'] = 'Ka-Boom!'             # change Batman slot
>>> del file['Robin']                       # delete the Robin entry
>>> file.close( )                           # close it after changes
Apart from having to import the interface and open and close the DBM file, Python programs don't have to know anything about DBM itself. DBM modules achieve this integration by overloading the indexing operations and routing them to more primitive library tools. But you'd never know that from looking at this Python codeDBM files look like normal Python dictionaries, stored on external files. Changes made to them are retained indefinitely:
% python
>>> import anydbm                           # open DBM file again
>>> file = anydbm.open('movie', 'c')
>>> for key in file.keys( ): print key, file[key]
...
Joker Wham!
Cat-woman Splat!
Batman Ka-Boom!
As you can see, this is about as simple as it can be. Table 19-1 lists the most commonly used DBM
file operations. Once such a file is opened, it is processed just as though it were an in-memory Python dictionary. Items are fetched by indexing the file object by key and are stored by assigning to a key.
Table 19-1. DBM file operations

    Python code                              Action    Description
    import anydbm                            Import    Get dbm, gdbm, and so on; whatever is installed
    file = anydbm.open('filename', 'c')      Open      Create or open an existing DBM file
    file['key'] = 'value'                    Store     Create or change the entry for key
    value = file['key']                      Fetch     Load the value for the entry key
    count = len(file)                        Size      Return the number of entries stored
    index = file.keys( )                     Index     Fetch the stored keys list
    found = file.has_key('key')              Query     See if there's an entry for key
    del file['key']                          Delete    Remove the entry for key
    file.close( )                            Close     Manual close, not always needed
Despite the dictionary-like interface, DBM files really do map to one or more external files. For instance, the underlying gdbm interface writes two files, movie.dir and movie.pag, when a GDBM file called movie is made. If your Python was built with a different underlying keyed-file interface, different external files might show up on your computer.

Technically, the module anydbm is really an interface to whatever DBM-like filesystem you have available in your Python. When creating a new file, anydbm today tries to load the dbhash, gdbm, and dbm keyed-file interface modules; Pythons without any of these automatically fall back on an all-Python implementation called dumbdbm. When opening an already existing DBM file, anydbm tries to determine the system that created it with the whichdb module instead. You normally don't need to care about any of this, though (unless you delete the files your DBM creates).

Note that DBM files may or may not need to be explicitly closed, per the last entry in Table 19-1. Some DBM files don't require a close call, but some depend on it to flush changes out to disk. On such systems, your file may be corrupted if you omit the close call. Unfortunately, the default DBM as of the 1.5.2 Windows Python port, dbhash (a.k.a. bsddb), is one of the DBM systems that requires a close call to avoid data loss. As a rule of thumb, always close your DBM files explicitly after making changes and before your program exits, to avoid potential problems. This rule extends by proxy to shelves, a topic we'll meet later in this chapter.

In Python versions 1.5.2 and later, be sure to also pass a string 'c' as a second argument when calling anydbm.open, to force Python to create the file if it does not yet exist, and to simply open it otherwise. This used to be the default behavior but is no longer. You do not need the 'c' argument when opening shelves discussed ahead; they still use an "open or create" mode by default if passed no open mode argument.
Other open mode strings can be passed to anydbm (e.g., 'n' to always create the file, and 'r' for read-only, the new default); see the library reference manuals for more details.
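The open modes can be sketched as follows; the file path here is arbitrary, and the import is hedged because anydbm became the dbm package in later Python releases, where fetched values come back as byte strings rather than text:

```python
import os, tempfile
try:
    import anydbm as dbm      # Python 2 name used in this book
except ImportError:
    import dbm                # anydbm became dbm in Python 3

path = os.path.join(tempfile.mkdtemp(), 'movie')

db = dbm.open(path, 'c')      # 'c': create the file if needed, else open it
db['Batman'] = 'Pow!'
db.close()                    # close to flush changes to disk

db = dbm.open(path, 'r')      # 'r': open an existing file read-only
value = db['Batman']          # 'Pow!' in Python 2, b'Pow!' in Python 3
db.close()
print(value)
```

Passing 'n' instead would truncate any existing file and start fresh, which is occasionally useful for rebuilding a database from scratch.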
19.4. Pickled Objects Probably the biggest limitation of DBM keyed files is in what they can store: data stored under a key must be a simple text string. If you want to store Python objects in a DBM file, you can sometimes manually convert them to and from strings on writes and reads (e.g., with str and eval calls), but this takes you only so far. For arbitrarily complex Python objects such as class instances and nested data structures, you need something more. Class instance objects, for example, cannot be later re-created from their standard string representations. Custom to-string conversions are error prone and not general.

The Python pickle module, a standard part of the Python system, provides the conversion step needed. It converts nearly arbitrary Python in-memory objects to and from a single linear string format, suitable for storing in flat files, shipping across network sockets between trusted sources, and so on. This conversion from object to string is often called serialization: arbitrary data structures in memory are mapped to a serial string form. The string representation used for objects is also sometimes referred to as a byte stream, due to its linear format. It retains all the content and reference structure of the original in-memory object. When the object is later re-created from its byte string, it will be a new in-memory object identical in structure and value to the original, though located at a different memory address. The re-created object is effectively a copy of the original.

Pickling works on almost any Python datatype (numbers, lists, dictionaries, class instances, nested structures, and more) and so is a general way to store data. Because pickles contain native Python objects, there is almost no database API to be found; the objects stored are processed with normal Python syntax when they are later retrieved.
19.4.1. Using Object Pickling Pickling may sound complicated the first time you encounter it, but the good news is that Python hides all the complexity of object-to-string conversion. In fact, the pickle module's interfaces are incredibly simple to use. For example, to pickle an object into a serialized string, we can either make a pickler and call its methods or use convenience functions in the module to achieve the same effect:
P = pickle.Pickler(file)
Make a new pickler for pickling to an open output file object file.

P.dump(object)
Write an object onto the pickler's file/stream.

pickle.dump(object, file)
Same as the last two calls combined: pickle an object onto an open file.

string = pickle.dumps(object)
Return the pickled representation of object as a character string.

Unpickling from a serialized string back to the original object is similar; both object and convenience function interfaces are available:

U = pickle.Unpickler(file)
Make an unpickler for unpickling from an open input file object file.

object = U.load( )
Read an object from the unpickler's file/stream.

object = pickle.load(file)
Same as the last two calls combined: unpickle an object from an open file.

object = pickle.loads(string)
Read an object from a character string rather than a file.

Pickler and Unpickler are exported classes. In all of the preceding cases, file is either an open file object or any object that implements the same attributes as file objects:

Pickler calls the file's write method with a string argument.

Unpickler calls the file's read method with a byte count, and readline without arguments.

Any object that provides these attributes can be passed in to the file parameters. In particular, file can be an instance of a Python class that provides the read/write methods (i.e., the expected file-like interface). This lets you map pickled streams to in-memory objects with classes, for arbitrary use. For instance, the StringIO standard library module discussed in Chapter 3 provides classes that map file calls to and from in-memory strings. This hook also lets you ship Python objects across a network, by providing sockets wrapped to look like files in pickle calls at the sender, and unpickle calls at the receiver (see the sidebar "Making Sockets Look Like Files," in Chapter 13, for more details). In fact, for some, pickling Python objects across a trusted network serves as a simpler alternative to network transport protocols such as SOAP and XML-RPC, provided that Python is on both ends of the communication (pickled objects are represented with a Python-specific format, not with XML text).
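For instance, pickling to and from an in-memory file-like object takes only the read/write interface; io.BytesIO stands in here for the Python 2 StringIO module mentioned in the text, and the sample dictionary is arbitrary:

```python
import pickle
try:
    from cStringIO import StringIO as BytesIO   # Python 2
except ImportError:
    from io import BytesIO                      # Python 3

buffer = BytesIO()                   # any object with a write method works
pickle.dump({'spam': [1, 2, 3]}, buffer)
buffer.seek(0)                       # rewind before reading the stream back
obj = pickle.load(buffer)            # any object with read/readline works
print(obj)                           # {'spam': [1, 2, 3]}
```

The same trick applied to a wrapped socket instead of a BytesIO object is all it takes to ship live Python objects between trusted peers on a network.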
19.4.2. Picking in Action In more typical use, to pickle an object to a flat file, we just open the file in write mode and call the dump function:
Notice the nesting in the object pickled here; the pickler handles arbitrary structures. To unpickle later in another session or program run, simply reopen the file and call load:
The object you get back from unpickling has the same value and reference structure as the original, but it is located at a different address in memory. This is true whether the object is unpickled in the same or a future process. In Python-speak, the unpickled object is == but is not is:
% python
>>> import pickle
>>> f = open('temp', 'w')
>>> x = ['Hello', ('pickle', 'world')]      # list with nested tuple
>>> pickle.dump(x, f)
>>> f.close()                               # close to flush changes
>>>
>>> f = open('temp', 'r')
>>> y = pickle.load(f)
>>> y
['Hello', ('pickle', 'world')]
>>>
>>> x == y, x is y
(True, False)
To make this process simpler still, the module in Example 19-1 wraps pickling and unpickling calls in functions that also open the files where the serialized form of the object is stored.
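Example 19-1's listing does not appear in this excerpt; a module in that spirit might look like the following sketch (the saveobject/loadobject names and the binary-mode files are my assumptions, not necessarily the original listing):

```python
# filepickle.py: wrap pickling and unpickling calls in functions
# that also open the files where the serialized objects are stored
import pickle

def saveobject(obj, filename):
    fileobj = open(filename, 'wb')      # open output file for the caller
    pickle.dump(obj, fileobj)           # pickle object to the file
    fileobj.close()

def loadobject(filename):
    fileobj = open(filename, 'rb')      # unpickle from file
    return pickle.load(fileobj)         # re-creates object in memory
```

Callers then store and fetch with a single function call each, without dealing with file objects directly.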
To store and fetch now, simply call these module functions. Here they are in action, managing a fairly complex structure with multiple references to the same nested object (the nested list called L at first is stored only once in the file):
Besides built-in types like the lists, tuples, and dictionaries of the examples so far, class instances may also be pickled to file-like objects. This provides a natural way to associate behavior with stored data (class methods process instance attributes) and provides a simple migration path (class changes made in module files are automatically picked up by stored instances). Here's a brief interactive demonstration:
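A sketch of such a demonstration follows; the Rec class and filename here are my own illustration, not the book's original session:

```python
import pickle

class Rec:                                   # a simple class: state + behavior
    def __init__(self, name, count):
        self.name = name
        self.count = count
    def bump(self):                          # a method stored objects can use
        self.count += 1

rec = Rec('spam', 1)
with open('recfile.pkl', 'wb') as f:         # pickle instance to a flat file
    pickle.dump(rec, f)

with open('recfile.pkl', 'rb') as f:         # later: re-create the instance
    loaded = pickle.load(f)

loaded.bump()                                # inherited behavior still works
print(loaded.name, loaded.count)             # -> spam 2
```

Only the instance's attributes travel to the file; its class is located and reimported at load time, which is the migration-path property described above.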
We'll explore how this works in more detail in conjunction with shelves later in this chapter; as we'll see, although the pickle module can be used directly, it is also the underlying translation engine in both shelves and ZODB databases.

In fact, Python can pickle just about anything, except for:

Compiled code objects; functions and classes record just their names in pickles, to allow for later reimport and automatic acquisition of changes made in module files.

Instances of classes that do not follow class importability rules (more on this at the end of the section "Shelve Files," later in this chapter).

Instances of some built-in and user-defined types that are coded in C or depend upon transient operating system states (e.g., open file objects cannot be pickled).

A PicklingError is raised if an object cannot be pickled.
19.4.3. Pickler Protocols and cPickle

In recent Python releases, the pickler introduced the notion of protocols: storage formats for pickled data. Specify the desired protocol by passing an extra parameter to the pickling calls (but not to unpickling calls: the protocol is automatically determined from the pickled data):
pickle.dump(object, file, protocol)
Pickled data may be created in either text or binary protocols. By default, the storage protocol is text (also known as protocol 0). In text mode, the files used to store pickled objects may be opened in text mode as in the earlier examples, and the pickled data is printable ASCII text, which can be read (it's essentially instructions for a stack machine). The alternative protocols (protocols 1 and 2) store the pickled data in binary format and require that files be opened in binary mode (e.g., rb, wb). Protocol 1 is the original binary format; protocol 2, added in Python 2.3, has improved support for pickling of new-style classes. Binary format is slightly more efficient, but it cannot be inspected. An older option to pickling calls, the bin argument, has been subsumed by using a pickling protocol higher than 0. The pickle module also provides a HIGHEST_PROTOCOL variable that can be passed in to automatically select the maximum value.

One note: if you use the default text protocol, make sure you open pickle files in text mode later. On some platforms, opening text data in binary mode may cause unpickling errors due to line-end formats; here's what happens on Windows:
>>> f = open('temp', 'w')                 # text mode file on Windows
>>> pickle.dump(('ex', 'parrot'), f)      # use default text protocol
>>> f.close()
>>>
>>> pickle.load(open('temp', 'r'))        # OK in text mode
('ex', 'parrot')
>>> pickle.load(open('temp', 'rb'))       # fails in binary
Traceback (most recent call last):
  File "<stdin>", line 1, in -toplevel-
    pickle.load(open('temp', 'rb'))
  ...lines deleted...
ValueError: insecure string pickle
One way to sidestep this potential issue is to always use binary mode for your files, even for the text pickle protocol. Since you must open files in binary mode for the binary pickler protocols anyhow (higher than the default 0), this isn't a bad habit to get into:
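A sketch of that habit, combining binary-mode files with an explicit protocol, translated to modern Python where binary mode is in fact required (the sample data is my own):

```python
import pickle

data = {'name': 'Brian', 'motto': ['The', 'bright', 'side']}

# always open pickle files in binary mode, and pass a protocol explicitly;
# pickle.HIGHEST_PROTOCOL selects the maximum (binary) format available
with open('temp.pkl', 'wb') as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

with open('temp.pkl', 'rb') as f:         # binary mode on reads too
    restored = pickle.load(f)

print(restored == data)                   # -> True
```

Opening in binary mode works for every protocol, so code written this way never trips over the text/binary mismatch shown above.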
Refer to Python's library manual for more information on the pickler. Also check out marshal, a module that serializes objects too, but can handle only simple object types. pickle is more general than marshal and is normally preferred. And while you are flipping (or clicking) through that manual, be sure to also see the entries for the cPickle module, a reimplementation of pickle coded in C for faster performance. You can explicitly import cPickle for a substantial speed boost; its chief limitation is that you cannot subclass its versions of Pickler and Unpickler, because they are functions, not classes (this is not required by most programs). The pickle and cPickle modules use compatible data formats, so they may be used interchangeably. If it is available in your Python, the shelve module automatically chooses the cPickle module for faster serialization, instead of pickle. I haven't explained shelve yet, but I will now.
19.5. Shelve Files

Pickling allows you to store arbitrary objects on files and file-like objects, but it's still a fairly unstructured medium; it doesn't directly support easy access to members of collections of pickled objects. Higher-level structures can be added, but they are not inherent:

You can sometimes craft your own higher-level pickle file organizations with the underlying filesystem (e.g., you can store each pickled object in a file whose name uniquely identifies the object), but such an organization is not part of pickling itself and must be manually managed.

You can also store arbitrarily large dictionaries in a pickled file and index them by key after they are loaded back into memory, but this will load the entire dictionary all at once when unpickled, not just the entry you are interested in.

Shelves provide structure to collections of pickled objects that removes some of these constraints. They are a type of file that stores arbitrary Python objects by key for later retrieval, and they are a standard part of the Python system. Really, they are not much of a new topic; shelves are simply a combination of DBM files and object pickling:

To store an in-memory object by key, the shelve module first serializes the object to a string with the pickle module, and then it stores that string in a DBM file by key with the anydbm module.

To fetch an object back by key, the shelve module first loads the object's serialized string by key from a DBM file with the anydbm module, and then converts it back to the original in-memory object with the pickle module.

Because shelve uses pickle internally, it can store any object that pickle can: strings, numbers, lists, dictionaries, cyclic objects, class instances, and more.
19.5.1. Using Shelves

In other words, shelve is just a go-between; it serializes and deserializes objects so that they can be placed in DBM files. The net effect is that shelves let you store nearly arbitrary Python objects on a file by key and fetch them back later with the same key.

Your scripts never see all of this interfacing, though. Like DBM files, shelves provide an interface that looks like a dictionary that must be opened. In fact, a shelve is simply a persistent dictionary of persistent Python objects; the shelve dictionary's content is automatically mapped to a file on your computer so that it is retained between program runs. This is quite a trick, but it's simpler to your code than it may sound. To gain access to a shelve, import the module and open your file:
import shelve
dbase = shelve.open("mydbase")
Internally, Python opens a DBM file with the name mydbase, or creates it if it does not yet exist. Assigning to a shelve key stores an object:
dbase['key'] = object
Internally, this assignment converts the object to a serialized byte stream and stores it by key on a DBM file. Indexing a shelve fetches a stored object:
value = dbase['key']
Internally, this index operation loads a string by key from a DBM file and unpickles it into an in-memory object that is the same as the object originally stored. Most dictionary operations are supported here too:
len(dbase)          # number of items stored
dbase.keys()        # stored item key index
And except for a few fine points, that's really all there is to using a shelve. Shelves are processed with normal Python dictionary syntax, so there is no new database API to learn. Moreover, objects stored and fetched from shelves are normal Python objects; they do not need to be instances of special classes or types to be stored away. That is, Python's persistence system is external to the persistent objects themselves. Table 19-2 summarizes these and other commonly used shelve operations.
Table 19-2. Shelve file operations

Python code                        Action    Description
import shelve                      Import    Get dbm, gdbm, and so on... whatever is installed
file = shelve.open('filename')     Open      Create or open an existing DBM file
file['key'] = anyvalue             Store     Create or change the entry for key
value = file['key']                Fetch     Load the value for the entry key
count = len(file)                  Size      Return the number of entries stored
index = file.keys()                Index     Fetch the stored keys list
found = file.has_key('key')        Query     See if there's an entry for key
del file['key']                    Delete    Remove the entry for key
file.close()                       Close     Manual close, not always needed
Because shelves export a dictionary-like interface too, this table is almost identical to the DBM operation table. Here, though, the module name anydbm is replaced by shelve, open calls do not require a second 'c' argument, and stored values can be nearly arbitrary kinds of objects, not just strings. You still should close shelves explicitly after making changes to be safe, though; shelves use anydbm internally, and some underlying DBMs require closes to avoid data loss or damage.
19.5.2. Storing Built-In Object Types in Shelves

Let's run an interactive session to experiment with shelve interfaces. As mentioned, shelves are essentially just persistent dictionaries of objects, which you open and close:
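The session itself is not reproduced in this excerpt; here is an equivalent script sketch, with the two structures and key names reconstructed to match the fetch output displayed later (a reconstruction, not the book's verbatim listing):

```python
import shelve

dbase = shelve.open('mydbase')                    # create or open shelve file

object1 = ['The', 'bright', ('side', 'of'), ['life']]
object2 = {'name': 'Brian', 'age': 33, 'motto': object1}

dbase['brian'] = object2                          # store dictionary by key
dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}
dbase.close()                                     # close to flush changes
```

Each key assignment pickles the object tree and files it away in the underlying DBM file under that key.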
Here, we open a shelve and store two fairly complex dictionary and list data structures away permanently by simply assigning them to shelve keys. Because shelve uses pickle internally, almost anything goes here; the trees of nested objects are automatically serialized into strings for storage. To fetch them back, just reopen the shelve and index:
>>> for row in dbase.keys():
...     print row, '=>'
...     for field in dbase[row].keys():
...         print ' ', field, '=', dbase[row][field]
...
knight =>
  motto = Ni!
  name = Knight
brian =>
  motto = ['The', 'bright', ('side', 'of'), ['life']]
  age = 33
  name = Brian
The nested loops at the end of this session step through nested dictionaries; the outer scans the shelve, and the inner scans the objects stored in the shelve. The crucial point to notice is that we're using normal Python syntax, both to store and to fetch these persistent objects, as well as to process them after loading.
19.5.3. Storing Class Instances in Shelves

One of the more useful kinds of objects to store in a shelve is a class instance. Because its attributes record state and its inherited methods define behavior, persistent class objects effectively serve the roles of both database records and database-processing programs. We can also use the underlying pickle module to serialize instances to flat files and other file-like objects (e.g., trusted network sockets), but the higher-level shelve module also gives us a convenient keyed-access storage medium. For instance, consider the simple class shown in Example 19-2, which is used to model people.
Example 19-2. PP3E\Dbase\person.py (version 1)
# a person object: fields + behavior

class Person:
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job = job
        self.pay = pay                               # real instance data
    def tax(self):
        return self.pay * 0.25                       # computed on call
    def info(self):
        return self.name, self.job, self.pay, self.tax()
Nothing about this class suggests it will be used for database records; it can be imported and used independent of external storage. It's easy to use it for a database, though: we can make some persistent objects from this class by simply creating instances as usual, and then storing them by key on an opened shelve:
C:\...\PP3E\Dbase>python
>>> from person import Person
>>> bob = Person('bob', 'psychologist', 70000)
>>> emily = Person('emily', 'teacher', 40000)
>>>
>>> import shelve
>>> dbase = shelve.open('cast')        # make new shelve
>>> for obj in (bob, emily):           # store objects
...     dbase[obj.name] = obj          # use name for key
...
>>> dbase.close()                      # need for bsddb
Here we used the instance objects' name attribute as their key in the shelve database. When we come back and fetch these objects in a later Python session or script, they are re-created in memory as they were when they were stored:
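The fetch session is not reproduced in this excerpt; a self-contained sketch of the round trip follows, with the Person class repeated inline so the example runs on its own (in the book's setup the class would come from the import of the person module instead):

```python
import shelve

class Person:                                     # same as Example 19-2
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def tax(self):
        return self.pay * 0.25
    def info(self):
        return self.name, self.job, self.pay, self.tax()

# store an instance, then fetch it back as a later session would
dbase = shelve.open('cast')
dbase['bob'] = Person('bob', 'psychologist', 70000)
dbase.close()

dbase = shelve.open('cast')                       # reopen the shelve
bob = dbase['bob']                                # fetch, re-create instance
print(bob.name, bob.tax())                        # methods work: bob 17500.0
dbase.close()
```

The fetched object is a full Person instance again, with both its stored attributes and its class's methods intact.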
Notice that calling Bob's tax method works even though we didn't import the Person class here. Python is smart enough to link this object back to its original class when unpickled, such that all the original methods are available through fetched objects.
19.5.4. Changing Classes of Objects Stored in Shelves

Technically, Python reimports a class to re-create its stored instances as they are fetched and unpickled. Here's how this works:
Store
When Python pickles a class instance to store it in a shelve, it saves the instance's attributes plus a reference to the instance's class. In effect, pickled class instances in the prior example record the self attributes assigned in the class. Really, Python serializes and stores the instance's __dict__ attribute dictionary along with enough source file information to be able to locate the class's module later.
Fetch
When Python unpickles a class instance fetched from a shelve, it re-creates the instance object in memory by reimporting the class, assigning the saved attribute dictionary to a new empty instance, and linking the instance back to the class.

The key point in this is that the class and stored instance data are separate. The class itself is not stored with its instances, but is instead located in the Python source file and reimported later when instances are fetched. The upshot is that by modifying external classes in module files, we can change the way stored objects' data is interpreted and used without actually having to change those stored objects. It's as if the class is a program that processes stored records. To illustrate, suppose the Person class from the previous section was changed to the source code in
Example 19-3.
Example 19-3. PP3E\Dbase\person.py (version 2)
# a person object: fields + behavior
# change: the tax method is now a computed attribute

class Person:
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job = job
        self.pay = pay                               # real instance data
    def __getattr__(self, attr):                     # on person.attr
        if attr == 'tax':
            return self.pay * 0.30                   # computed on access
        else:
            raise AttributeError                     # other unknown names
    def info(self):
        return self.name, self.job, self.pay, self.tax
This revision has a new tax rate (30 percent), introduces a __getattr__ qualification overload method, and deletes the original tax method. Tax attribute references are intercepted and computed when accessed:
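A sketch of the effect, with version 2 of the class repeated inline for self-containment (an illustration, not the book's verbatim session):

```python
import shelve

class Person:                                     # version 2 of the class
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def __getattr__(self, attr):                  # on person.attr
        if attr == 'tax':
            return self.pay * 0.30                # computed on access
        else:
            raise AttributeError(attr)
    def info(self):
        return self.name, self.job, self.pay, self.tax

dbase = shelve.open('cast2')
dbase['bob'] = Person('bob', 'psychologist', 70000)
dbase.close()

dbase = shelve.open('cast2')
bob = dbase['bob']                                # unpickled with new class
print(bob.tax)                                    # an attribute now, not a call
dbase.close()
```

In the book's scenario, the instance would have been stored under the old class and fetched under the new one; the stored attribute data is unchanged either way, and only the class code interpreting it differs.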
Because the class has changed, tax is now simply qualified, not called. In addition, because the tax rate was changed in the class, Bob pays more this time around. Of course, this example is artificial, but when used well, this separation of classes and persistent instances can eliminate many traditional database update programs. In most cases, you can simply change the class, not each stored instance, for new behavior.
19.5.5. Shelve Constraints

Although shelves are generally straightforward to use, there are a few rough edges worth knowing about.
19.5.5.1. Keys must be strings

First, although they can store arbitrary objects, keys must still be strings. The following fails, unless you convert the integer 42 to the string '42' manually first:
dbase[42] = value          # fails, but dbase[str(42)] = value works
This is different from in-memory dictionaries, which allow any immutable object to be used as a key, and derives from the shelve's use of DBM files internally.
19.5.5.2. Objects are unique only within a key

Although the shelve module is smart enough to detect multiple occurrences of a nested object and re-create only one copy when fetched, this holds true only within a given slot:
dbase[key] = [object, object]       # OK: only one copy stored and fetched

dbase[key1] = object                # bad?: two copies of object in the shelve
dbase[key2] = object
When key1 and key2 are fetched, they reference independent copies of the original shared object; if that object is mutable, changes from one won't be reflected in the other. This really stems from the fact that each key assignment runs an independent pickle operation; the pickler detects repeated objects, but only within each pickle call. This may or may not be a concern in your practice, and it can be avoided with extra support logic, but an object can be duplicated if it spans keys.
19.5.5.3. Updates must treat shelves as fetch-modify-store mappings

Because objects fetched from a shelve don't know that they came from a shelve, operations that change components of a fetched object change only the in-memory copy, not the data on a shelve:
dbase[key].attr = value     # shelve unchanged
To really change an object stored on a shelve, fetch it into memory, change its parts, and then write it back to the shelve as a whole by key assignment:
object = dbase[key]         # fetch it
object.attr = value         # modify it
dbase[key] = object         # store back: shelve changed
19.5.5.4. Concurrent updates are not directly supported

The shelve module does not currently support simultaneous updates. Simultaneous readers are OK, but writers must be given exclusive access to the shelve. You can trash a shelve if multiple processes write to it at the same time, which is a common potential in things such as Common Gateway Interface (CGI) server-side scripts. If your shelves may be hit by multiple processes, be sure to wrap updates in calls to standard library tools such as fcntl.flock or os.open to lock files and provide exclusive access.
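As a sketch of such locking (assuming a POSIX system where the fcntl module is available; the sidecar .lock filename and helper name are my own convention, not a shelve API):

```python
import fcntl
import shelve

def locked_store(filename, key, value):
    # serialize writers with an exclusive advisory lock on a sidecar file;
    # this only guards against other processes using this same protocol
    with open(filename + '.lock', 'w') as lockfile:
        fcntl.flock(lockfile.fileno(), fcntl.LOCK_EX)   # block until exclusive
        try:
            dbase = shelve.open(filename)
            dbase[key] = value                          # update under the lock
            dbase.close()                               # flush before unlocking
        finally:
            fcntl.flock(lockfile.fileno(), fcntl.LOCK_UN)
```

Readers that must never see a half-written shelve can acquire the same lock in shared mode (LOCK_SH) before opening.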
19.5.5.5. Underlying DBM format portability

With shelves, the files created by an underlying DBM system used to store your persistent objects are not necessarily compatible with all possible DBM implementations or Pythons. For instance, a file generated by gdbm on Linux, or by the BSD library on Windows, may not be readable by a Python with other DBM modules installed.

Technically, when a DBM file (or by proxy, a shelve) is created, the anydbm module tries to import all possible DBM system modules in a predefined order and uses the first that it finds. When anydbm later opens an existing file, it attempts to determine which DBM system created it by inspecting the file(s) with the whichdb module. Because the BSD system is tried first at file creation time and is available on both Windows and many Unix-like systems, your DBM file is portable as long as your Pythons support BSD on both platforms. If the system used to create a DBM file is not available on the underlying platform, though, the DBM file cannot be used.

If DBM file portability is a concern, make sure that all the Pythons that will read your data use compatible DBM modules. If that is not an option, use the pickle module directly and flat files for storage, or use the ZODB system we'll meet later in this chapter.
19.5.6. Pickled Class Constraints

In addition to these shelve constraints, storing class instances in a shelve adds a set of additional rules you need to be aware of. Really, these are imposed by the pickle module, not by shelve, so be sure to follow these if you store class objects with pickle directly too:
Classes must be importable
The Python pickler stores instance attributes only when pickling an instance object, and it reimports the class later to re-create the instance. Because of that, the classes of stored objects must be importable when objects are unpickled; they must be coded unnested at the top level of a module file that is accessible on the module import search path at load time (e.g., named in PYTHONPATH or in a .pth file). Further, they must be associated with a real module when instances are pickled, not with a top-level script (with the module name __main__), unless they will only ever be used in the top-level script. You need to be careful about moving class modules after instances are stored. When an instance is unpickled, Python must find its class's module on the module search path using the original module name (including any package path prefixes) and fetch the class from that module using the original class name. If the module or class has been moved or renamed, it
might not be found. In applications where pickled objects are shipped over network sockets, it's possible to deal with this constraint by shipping the text of the class along with stored instances; recipients may simply store the class in a local module file on the import search path prior to unpickling received instances. Where this is inconvenient, simpler objects such as lists and dictionaries with nesting may be transferred instead.
Class changes must be backward compatible
Although Python lets you change a class while instances of it are stored on a shelve, those changes must be backward compatible with the objects already stored. For instance, you cannot change the class to expect an attribute not associated with already stored persistent instances unless you first manually update those stored instances or provide extra conversion protocols on the class.
Other pickle module constraints
Shelves also inherit the pickling system's nonclass limitations. As discussed earlier, some types of objects (e.g., open files and sockets) cannot be pickled, and thus cannot be stored in a shelve. In a prior Python release, persistent object classes also had to either use constructors with no arguments or provide defaults for all constructor arguments (much like the notion of a C++ copy constructor). This constraint was dropped as of Python 1.5.2; classes with nondefaulted constructor arguments now work fine in the pickling system.[*]
[*] Subtle thing: internally, Python now avoids calling the class to re-create a pickled instance; instead, it simply makes a class object generically, inserts instance attributes, and sets the instance's __class__ pointer to the original class directly. This avoids the need for defaults, but it also means that class __init__ constructors are no longer called as objects are unpickled, unless you provide extra methods to force the call. See the library manual for more details, and see the pickle module's source code (pickle.py in the source library) if you're curious about how this works. Better yet, see the formtable module listed ahead in this chapter; it does something very similar with __class__ links to build an instance object from a class and a dictionary of attributes, without calling the class's __init__ constructor. This makes constructor argument defaults unnecessary in classes used for records browsed by PyForm, but it's the same idea.
19.5.7. Other Shelve Limitations

Finally, although shelves store objects persistently, they are not really object-oriented database systems. Such systems also implement features such as automatic write-through on changes, transaction commits and rollbacks, safe concurrent updates, and object decomposition and delayed ("lazy") component fetches based on generated object IDs, where parts of larger objects are loaded into memory only as they are accessed. It's possible to extend shelves to support such features manually, but you don't need to; the ZODB system provides an implementation of a more complete object-oriented database system. It is constructed on top of Python's built-in pickling persistence support, but it offers additional features for advanced data stores. For more on ZODB, let's move on to the next section.
19.6. The ZODB Object-Oriented Database

ZODB, the Zope Object Database, is a full-featured and Python-specific object-oriented database system. ZODB can be thought of as a more powerful alternative to Python's shelves. It allows you to store nearly arbitrary Python objects persistently by key, like shelves, but it adds a set of additional features in exchange for a small amount of extra interface code.

ZODB is an open source, third-party add-on for Python. It was originally developed as the database mechanism for web sites developed with the Zope system described in Chapter 18, but it is now available as a standalone package. It's useful outside the context of Zope as a general database management system in any domain. Although ZODB does not support SQL queries, objects stored in ZODB can leverage the full power of the Python language. Moreover, in some applications, stored data is more naturally represented as a structured Python object. Table-based relational systems often must represent such data as parts scattered across multiple tables, associated with keys and joins.

Using a ZODB database is very similar to using Python's standard library shelves, described in the prior section. Just like shelves, ZODB uses the Python pickling system to implement a persistent dictionary of persistent Python objects. In fact, there is almost no database interface to be found; objects are made persistent simply by assigning them to keys of the root ZODB dictionary object, or embedding them in objects stored in the database root. And as in a shelve, records take the form of native Python objects, processed with normal Python syntax and tools. Unlike shelves, ZODB adds features critical to some types of programs:
Concurrent updates
You don't need to manually lock files to avoid data corruption if there are potentially many concurrent writers, the way you would for shelves.

Transaction commit and rollback
If your program crashes, your changes are not retained unless you explicitly commit them to the database.

Automatic updates for some types of in-memory object changes
Objects in ZODB derived from a persistence superclass are smart enough to know the database must be updated when an attribute is assigned.

Automatic caching of objects
Objects are cached in memory for efficiency and are automatically removed from the cache when they haven't been used.
Platform-independent storage
Because ZODB stores your database in a single flat file with large-file support, it is immune to the potential size constraints and DBM filesystem format differences of shelves. As we saw earlier in this chapter, a shelve created on Windows using BSD-DB may not be accessible to a script running with gdbm on Linux.

Because of such advantages, ZODB is probably worth your attention if you need to store Python objects in a database persistently, in a production environment. The only significant price you'll pay for using ZODB is a small amount of extra code:

Accessing the database requires a small amount of extra boilerplate code to interface with ZODB; it's not a simple open call.

Classes are derived from a persistence superclass if you want them to take advantage of automatic updates on changes; persistent classes are generally not completely independent of the database as in shelves, though they can be.

Considering the extra functionality ZODB provides beyond shelves, these trade-offs are usually more than justified for many applications.
19.6.1. A ZODB Tutorial

To sample the flavor of ZODB, let's work through a quick interactive tutorial. We'll illustrate common use here, but we won't cover the API exhaustively; as usual, search the Web for more details on ZODB.
19.6.1.1. Installing ZODB

The first thing we need to do is install ZODB on top of Python. ZODB is an open source package, but it is not a standard part of Python today; it must be fetched and installed separately. To find ZODB, either run a web search on its name or visit http://www.zope.org. Apart from Python itself, the ZODB package is the only component you must install to use ZODB databases.

ZODB is available in both source and self-installer forms. On Windows, ZODB is available as a self-installing executable, which installs itself in the standard site-packages subdirectory of the Python standard library (specifically, in C:\Python24\Lib\site-packages under Python 2.4). Because that directory is automatically added to your module search path, no path configuration is needed to import ZODB's modules once they are installed.

Moreover, much like Python's standard pickle and shelve tools, basic ZODB does not require that a perpetually running server be started in order to access your database. Technically speaking, ZODB itself supports safe concurrent updates among multiple threads, as long as each thread maintains its own connection to the database. ZEO, an additional component that ships with ZODB, supports concurrent updates among multiple processes in a client/server context.
19.6.1.2. The ZEO distributed object server
More generally, ZEO, for Zope Enterprise Objects, adds a distributed object architecture to applications requiring high performance and scalability. To understand how, you have to understand the architecture of ZODB itself. ZODB works by routing object requests to a storage interface object, which in turn handles physical storage tasks. Commonly used storage interface objects allow for file, BerkeleyDB, and even relational database storage media. By delegating physical medium tasks to storage interface objects, ZODB is independent of the underlying storage medium.

Essentially, ZEO replaces the standard file-storage interface object used by clients with one that routes requests across a network to a ZEO storage server. The ZEO storage server acts as a frontend to physical storage, synchronizing data access among multiple clients and allowing for more flexible configurations. For instance, this indirection layer allows for distributing load across multiple machines, storage redundancy, and more. Although not every application requires ZEO, it provides advanced enterprise-level support when needed.

ZEO itself consists of a TCP/IP socket server and the new storage interface object used by clients. The ZEO server may run on the same or a remote machine. Upon receipt, the server passes requests on to a regular storage interface object of its own, such as simple local file storage. On changes, the ZEO server sends invalidation messages to all connected clients, to update their object caches. Furthermore, ZODB avoids file locking by issuing conflict errors to force retries. As one consequence, ZODB/ZEO-based databases may be more efficient for reads than updates (the common case for web-based applications). Internally, the ZEO server is built with the Python standard library's asyncore module, which implements a socket event loop based on the select system call, much as we did in Chapter 13.
In the interest of space, we'll finesse further ZODB and ZEO details here; see other resources for more details on ZEO and ZODB's concurrent updates model. To most programs, ZODB is surprisingly easy to use; let's turn to some real code next.
19.6.1.3. Creating a ZODB database

Once you've installed ZODB, its interface takes the form of packages and modules to your code. Let's create a first database to see how this works:
...\PP3E\Database\ZODBscripts\>python
>>> from ZODB import FileStorage, DB
>>> storage = FileStorage.FileStorage(r'C:\Mark\temp\mydb.fs')
>>> db = DB(storage)
>>> connection = db.open( )
>>> root = connection.root( )
This is mostly standard code for connecting to a ZODB database: we import its tools, create a FileStorage and a DB from it, and then open the database and create the root object. The root object is the persistent dictionary in which objects are stored. FileStorage is an object that maps the database to a flat file. Other storage interface options, such as relational database-based storage, are also possible. When using the ZEO server configuration discussed earlier, programs import a ClientStorage interface object from the ZEO package instead, but the rest of the code is the same. Now that we have a database, let's add a few objects to it. Almost any Python object will do, including tuples, lists, dictionaries, class instances, and nested combinations thereof. Simply assign your objects to a key in the database root object to make them persistent:
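Under the hood, objects assigned to root keys are serialized with Python's pickle module, which is why nearly any object type qualifies; a quick standard-library check of the idea (this uses pickle directly, not ZODB, and the sample records are invented for the sketch):

```python
import pickle

records = {
    'mystr':   'spam',
    'mylist':  [1, 2, 3],
    'mytuple': ('a', ('nested', 'objects')),
    'mydict':  {'job': 'devel', 'pay': 30},
}

blob = pickle.dumps(records)       # serialized bytes, the form ZODB stores
restored = pickle.loads(blob)      # rebuild the full object structure
print(restored == records)         # True: round trip preserves structure
```

Anything pickle can handle, including nested combinations and class instances, can be assigned to a ZODB root key in the same way.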
Because ZODB supports transaction rollbacks, you must commit your changes to the database to make them permanent. Ultimately, this transfers the pickled representation of your objects to the underlying file storage medium: here, three files whose names include the name of the file we gave when opening:
Without the final commit in this session, none of the changes we made would be saved. This is what we want in general: if a program aborts in the middle of an update task, none of the partially complete work it has done is retained.
19.6.1.4. Fetching and changing

OK; we've made a few objects persistent in our ZODB. Pulling them back in another session or program is just as straightforward: reopen the database as before and index the root to fetch objects back into memory. The database root supports the dictionary interface: it may be indexed, has dictionary methods and a length, and so on:
...\PP3E\Database\ZODBscripts\>python
>>> from ZODB import FileStorage, DB
>>> storage = FileStorage.FileStorage(r'C:\Mark\temp\mydb.fs')
>>> db = DB(storage)
>>> connection = db.open( )
>>> root = connection.root( )            # connect
>>>
Because the database root looks just like a dictionary, we can process it with normal dictionary code, stepping through the keys list to scan record by record, for instance:
>>> for key in root.keys( ):
...     print key.ljust(10), '=>', root[key]
...
mylist     => ...
mystr      => ...
mytuple    => ...
mydict     => ...
Now, let's change a few of our stored persistent objects. When changing ZODB persistent class instances, in-memory attribute changes are automatically written back to the database. Other types of changes, such as in-place appends and key assignments, still require reassignment to the original key to force the change to be written to disk (built-in list and dictionary objects do not know that they are persistent):
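The same rule applies to the standard library's shelve module that ZODB is compared with throughout this chapter; a small runnable demonstration of the in-place-change pitfall and the reassignment fix (the key name and values are invented for the sketch):

```python
import os, shelve, tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo-shelve')

db = shelve.open(path)
db['mylist'] = [1, 2, 3]        # stored under a key: persistent
db['mylist'].append(4)          # changes a temporary fetched copy only!
db.close()

db = shelve.open(path)
before = list(db['mylist'])     # [1, 2, 3]: the in-place append was lost
temp = db['mylist']
temp.append(4)
db['mylist'] = temp             # reassign to the key to write it through
after = list(db['mylist'])      # [1, 2, 3, 4]
db.close()
```

Persistent class instances in ZODB avoid this for attribute assignments, but in-place changes to built-in mutables still require the reassignment shown here (or ZODB's _p_changed flag, discussed later).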
As usual, we commit our work before exiting Python or all our changes would be lost. One more interactive session serves to verify that we've updated the database objects; there is no need to commit this time because we aren't making any changes:
...\PP3E\Database\ZODBscripts\>python
>>> from ZODB
We are essentially using Python as an interactive object database query language here; to make use of classes and scripts, let's move on to the next section.
19.6.2. Using Classes with ZODB

So far, we've been storing built-in object types such as lists and dictionaries in our ZODB databases. Such objects can handle rich information structures, especially when they are nested: a dictionary with nested lists and dictionaries, for example, can represent complex information. As for shelves, though, class instances open up more possibilities: they can also participate in inheritance hierarchies, and they naturally support record processing behavior in the form of class method functions. Classes used with ZODB can either be standalone, as in shelves, or derived from the ZODB Persistent class. The latter scheme provides objects with a set of precoded utilities, including the ability to automatically write instance attribute changes out to the database storage; no manual reassignment to root keys is required. To see how this works, let's get started by defining the class in Example 19-4: an object that records information about a bookmark in a hypothetical web site application.
Example 19-4. PP3E\Database\ZODBscripts\zodb-class-make.py
#############################################################################
# define persistent class, store instances in a ZODB database;
# import, call addobjects elsewhere: pickled class cannot be in __main__
#############################################################################

import time
mydbfile = 'data/class.fs'                     # where database is stored

from persistent import Persistent

class BookMark(Persistent):                    # inherit ZODB features
    def __init__(self, title, url):
        self.hits = 0
        self.updateBookmark(title, url)
    def updateBookmark(self, title, url):      # change attrs updates db
        self.title = title                     # no need to reassign to key
        self.url = url
        self.modtime = time.asctime( )

def connectdb(dbfile):                         # automate connect protocol
    from ZODB import FileStorage, DB           # caller must still commit
    storage = FileStorage.FileStorage(dbfile)
    db = DB(storage)
    connection = db.open( )
    root = connection.root( )
    return root, storage
Notice how this class is no longer standalone: it inherits from a ZODB superclass. In fact, unlike shelve classes, it cannot be tested or used outside the context of a ZODB database. In exchange, updates to instance attributes are automatically written back to the database file. Also note how we've put connection logic in a function for reuse; this avoids repeating connection code redundantly, but the caller is still responsible for keeping track of the root and storage objects and for committing changes on exit (we'll see how to hide these details better in the next section). To test, let's make a few database objects interactively:
...\PP3E\Database\ZODBscripts>python
>>> from zodb_class_make import addobjects
>>> addobjects( )

...\PP3E\Database\ZODBscripts>dir /B data
class.fs
class.fs.index
class.fs.tmp
We don't generally want to run the creation code at the top level of our process, because then the class would always have to be in the module __main__ (the name of the top-level file or the interactive prompt) each time the objects are fetched. Recall that this is a constraint of Python's pickling system discussed earlier, which underlies ZODB: classes must be reimportable, and hence located in a file in a directory on the module search path. This might work if we load the class name into all our top-level scripts with from statements, but it can be inconvenient in general. To avoid the issue, define your classes in an imported module file, and not in the main top-level script. To test database updates, Example 19-5 reads back our two stored objects and changes them; any change that updates an instance attribute in memory is automatically written through to the database file.
Example 19-5. PP3E\Database\ZODBscripts\zodb-class-read.py
########################################################################
# read, update class instances in db; changing mutables such as
# lists and dictionaries in place does not update the db automatically
########################################################################

mydbfile = 'data/class.fs'
from zodb_class_make import connectdb
root, storage = connectdb(mydbfile)

# this updates db: attrs changed in method
print 'pp3e url:', root['pp3e'].url
print 'pp3e mod:', root['pp3e'].modtime
root['pp3e'].updateBookmark('PP3E', 'www.rmi.net/~lutz/about-pp3e.html')

# this updates too: attr changed here
ora = root['ora']
print 'ora hits:', ora.hits
ora.hits += 1

# commit changes made
import transaction
transaction.commit( )
storage.close( )
Run this script a few times to watch the objects in your database change: the URL and modification time of one is updated, and the hit counter is modified on the other:
...\PP3E\Database\ZODBscripts\>python zodb-class-read.py
pp3e url: http://www.rmi.net/~lutz/about-pp.html
pp3e mod: Mon Dec 05 09:11:44 2005
ora hits: 0

...\PP3E\Database\ZODBscripts>python zodb-class-read.py
pp3e url: www.rmi.net/~lutz/about-pp3e.html
pp3e mod: Mon Dec 05 09:12:12 2005
ora hits: 1

...\PP3E\Database\ZODBscripts>python zodb-class-read.py
pp3e url: www.rmi.net/~lutz/about-pp3e.html
pp3e mod: Mon Dec 05 09:12:24 2005
ora hits: 2
And because these are Python objects, we can always inspect, modify, and add records interactively (be sure to also import the class to make and add a new instance):
19.6.3. A ZODB People Database

As a final ZODB example, let's do something a bit more realistic. If you read the sneak preview in Chapter 2, you'll recall that we used shelves there to record information about people. In this section, we bring that idea back to life, recoded to use ZODB instead. By now, we've written the usual ZODB file storage database connection logic enough times to warrant packaging it as a reusable tool. We used a function to wrap it up in Example 19-4, but we can go a step further with object-oriented programming (OOP). As a first step, let's wrap this up for reuse as a component: the class in Example 19-6 handles the connection task, automatically logging in on construction and automatically committing changes on close. For convenience, it also embeds the database root object and delegates attribute fetches and index accesses back to the root. The net effect is that this object behaves like an automatically opened and committed database root: it provides the same interface, but adds convenience code for common use cases. You can reuse this class for any file-based ZODB database you wish to process (just pass in your filename), and you have to change only this single copy of the connection logic if it ever has to be updated.
Example 19-6. PP3E\Database\ZODBscripts\zodbtools.py
class FileDB:
    "automate zodb connect and close protocols"
    def __init__(self, filename):
        from ZODB import FileStorage, DB
        self.storage = FileStorage.FileStorage(filename)
        db = DB(self.storage)
        connection = db.open( )
        self.root = connection.root( )
    def commit(self):
        import transaction
        transaction.commit()                   # get_transaction( ) deprecated
    def close(self):
        self.commit( )
        self.storage.close( )
    def __getitem__(self, key):
        return self.root[key]                  # map indexing to db root
    def __setitem__(self, key, val):
        self.root[key] = val                   # map key assignment to root
    def __getattr__(self, attr):
        return getattr(self.root, attr)        # keys, items, values
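The indexing and attribute delegation used by FileDB is ordinary operator overloading, and can be tried without ZODB at all; a runnable sketch with a plain dictionary standing in for the database root (the class name and sample data here are invented):

```python
class RootWrapper:
    "delegate indexing and attribute fetches to an embedded root object"
    def __init__(self):
        self.root = {}                       # a dict stands in for a ZODB root
    def __getitem__(self, key):
        return self.root[key]                # map indexing to the root
    def __setitem__(self, key, val):
        self.root[key] = val                 # map key assignment to the root
    def __getattr__(self, attr):
        return getattr(self.root, attr)      # keys, values, items, and so on

db = RootWrapper()
db['bob'] = ('devel', 30)
print(list(db.keys()))                       # ['bob']: reached via __getattr__
print(db['bob'])                             # ('devel', 30)
```

Because __getattr__ is called only for names not found by normal lookup, dictionary methods such as keys pass through to the embedded root transparently, just as they do in FileDB.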
Next, the class in Example 19-7 defines the objects we'll store in our database. They are pickled as usual, but they are written out to a ZODB database, not to a shelve file. Note how this class is no longer standalone, as in our earlier shelve examples: it inherits from the ZODB Persistent class, and thus will automatically notify ZODB of changes when its instance attributes are changed. Also notice the __str__ operator overloading method here, to give a custom display format for our objects.
Example 19-7. PP3E\Database\ZODBscripts\person.py
#######################################################################
# define persistent object classes; this must be in an imported
# file on your path, not in __main__ per Python pickling rules
# (unless it will only ever be used in module __main__ in the future);
# attribute assignments, in class or otherwise, update the database;
# for mutable object changes, set the object's _p_changed to true to
# auto update, or manually reassign to the database key after changes
#######################################################################

from persistent import Persistent
Finally, Example 19-8 tests our Person class by creating the database and updating objects. As usual for Python's pickling system, we store the class in an imported module, not in this main, top-level script file; otherwise, Python could not reimport the class when instance objects are later reloaded, unless the fetching program happens to still be the same module __main__.
Example 19-8. PP3E\Database\ZODBscripts\person-test.py
##############################################################################
# test persistence classes in person.py; this runs as __main__, so the
# classes cannot be defined in this file: class's module must be importable
# when obj fetched; can also test from interactive prompt: also is __main__
##############################################################################

from zodbtools import FileDB                   # extended db root
from person import Person, Engineer            # application objects

filename = 'people.fs'                         # external storage

import sys
if len(sys.argv) == 1:                         # no args: create test records
    db = FileDB(filename)                      # db is root object
    db['bob'] = Person('bob', 'devel', 30)     # stores in db
    db['sue'] = Person('sue', 'music', 40)
    tom = Engineer('tom', 'devel', 60000)
    db['tom'] = tom
    db.close( )                                # close commits changes
else:                                          # arg: change tom, sue each run
    db = FileDB(filename)
    print db['bob'].name, db.keys( )
    print db['sue']
    db['sue'].changeRate(db['sue'].rate + 10)  # updates db
    tom = db['tom']
    print tom
    tom.changeRate(tom.rate + 5000)            # updates db
    tom.name += '.spam'                        # updates db
    db.close( )
When run with no command-line arguments, the test script initializes the database with three class instances: two Persons and one Engineer. When run with any argument, it updates the existing database records, adding 10 to Sue's pay rate and modifying Tom's rate and name:
Notice how the changeRate method updates Sue: there is no need to reassign the updated record back to the original key as we have to do for shelves, because ZODB Persistent class instances are smart enough to write attribute changes to the database automatically on commits. Internally, ZODB's persistent superclasses use normal Python operator overloading to intercept attribute changes and mark the object as changed. However, direct in-place changes to mutable objects (e.g., appending to a built-in list) are not noticed by ZODB, and require setting the object's _p_changed flag, or manual reassignment to the original key, to write changes through. ZODB also provides custom versions of some built-in mutable object types (e.g., PersistentMapping), which write changes through automatically.
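The interception idea itself is plain operator overloading, and can be mimicked in a few lines; this is a simplified illustration of the technique, not ZODB's actual Persistent implementation, and it also shows why in-place mutable changes go unnoticed:

```python
class Tracked(object):
    "a simplified sketch of attribute-change interception"
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != '_p_changed':
            object.__setattr__(self, '_p_changed', True)   # mark dirty

mark = Tracked()
mark.hits = 1
print(mark._p_changed)          # True: the assignment was intercepted

mark._p_changed = False         # pretend a commit just happened
mark.tags = ['python']          # rebinding the attribute flags it again
mark._p_changed = False
mark.tags.append('zodb')        # in-place change: NOT intercepted
print(mark._p_changed)          # still False: the mutable-object caveat
```

A persistence layer that sees the dirty flag knows the object must be rewritten at commit time; the append slips past because no attribute was rebound, which is exactly why ZODB requires _p_changed or reassignment in that case.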
19.6.4. ZODB Resources

There are additional ZODB concepts and components that we have not covered and do not have space to discuss in detail in this book. For instance, because ZODB stores objects with Python's pickle module, all of that module's constraints discussed earlier in this chapter apply. Moreover, we have not touched on administrative requirements. Because the FileStorage interface works by appending changes to the file, for example, it requires periodically running a utility to pack the database by removing old object revisions. For more about ZODB, search for ZODB and Zope resources on the Web. Here, let's move on to see how Python programs can make use of a very different sort of database interface: relational databases and SQL.
19.7. SQL Database Interfaces

The shelve module and ZODB package of the prior sections are powerful tools. Both allow scripts to throw nearly arbitrary Python objects on a keyed-access file and load them back later: in a single step for shelves, and with a small amount of administrative code for ZODB. Especially for applications that record highly structured data, object databases can be convenient and efficient: there is no need to split and later join together the parts of large objects, and stored data is processed with normal Python syntax because it is normal Python objects.

Shelves and ZODB aren't relational database systems, though; objects (records) are accessed with a single key, and there is no notion of SQL queries. Shelves, for instance, are essentially databases with a single index and no other query-processing support. Although it's possible to build a multiple-index interface to stored data with multiple shelves, it's not a trivial task and requires manually coded extensions. ZODB supports some types of searching beyond shelve (e.g., its cataloging feature), and persistent objects may be traversed with all the power of the Python language. However, neither shelves nor ZODB object-oriented databases provide the full generality of SQL queries. Moreover, especially for data that has a naturally tabular structure, relational databases may sometimes be a better fit.

For programs that can benefit from the power of SQL, Python also supports relational database systems. Relational databases are not necessarily mutually exclusive with the object persistence topics we studied earlier in this chapter: it is possible, for example, to store the serialized string representation of a Python object produced by pickling in a relational database. ZODB also supports the notion of mapping an object database to a relational storage medium.
The databases we'll meet in this section, though, are structured and processed in very different ways:

They store data in related tables of columns (rather than in persistent dictionaries of arbitrarily structured persistent Python objects).

They support the SQL query language for accessing data and exploiting relationships among it (instead of Python object traversals).

For some applications, the end result can be a potent combination. Moreover, some SQL-based database systems provide industrial-strength persistence support. Today, there are freely available interfaces that let Python scripts utilize all common relational database systems, both free and commercial: MySQL, Oracle, Sybase, Informix, InterBase, PostgreSQL (Postgres), SQLite,[*] ODBC, and more. In addition, the Python community has defined a database API specification that works portably with a variety of underlying database packages. Scripts written for this API can be migrated to different database vendor packages, with minimal or no source code changes.

[*]
Late-breaking news: Python 2.5 will likely include support for the SQLite relational database system as part of its standard library. For more on the cutting edge, see also the popular SQLObject third-party Object Relational Mapper, which grafts an object interface onto your database, with tables as classes, rows as instances, and columns as attributes.
19.7.1. SQL Interface Overview

Like ZODB, and unlike the pickle and shelve persistence modules presented earlier, SQL databases are optional extensions that are not part of Python itself. Moreover, you need to know SQL to fully understand their interfaces. Because we don't have space to teach SQL in this text, this section gives a brief overview of the API; please consult other SQL references and the database API resources mentioned in the next section for more details. The good news is that you can access SQL databases from Python, through a straightforward and portable model. The Python database API specification defines an interface for communicating with underlying database systems from Python scripts. Vendor-specific database interfaces for Python may or may not conform to this API completely, but all database extensions for Python seem to be minor variations on a theme. SQL databases in Python are grounded on a few concepts:
Connection objects
Represent a connection to a database, are the interface to rollback and commit operations, and generate cursor objects.

Cursor objects
Represent an SQL statement submitted as a string and can be used to step through SQL statement results.

Query results of SQL select statements
Are returned to scripts as Python sequences of sequences (e.g., a list of tuples), representing database tables of rows. Within these row sequences, column field values are normal Python objects such as strings, integers, and floats (e.g., [('bob',38), ('emily',37)]). Column values may also be special types that encapsulate things such as date and time, and database NULL values are returned as the Python None object.

Beyond this, the API defines a standard set of database exception types, special database type object constructors, and informational calls. For instance, to establish a database connection under the Python API-compliant Oracle interface available from Digital Creations, install the extension and Oracle, and then run a line of this form:
connobj = Connect("user/password@system")
The string argument's contents may vary per database and vendor, but they generally contain what you provide to log in to your database system. Once you have a connection object, there are a variety of things you can do with it, including:
connobj.close( )       close connection now, instead of on garbage collection
connobj.commit( )      commit any pending transactions to the database
connobj.rollback( )    roll database back to the start of pending transactions
But one of the most useful things to do with a connection object is to generate a cursor object:
cursobj = connobj.cursor( )    return a new cursor object for running SQL
Cursor objects have a set of methods too (e.g., close to close the cursor before its destructor runs), but the most important may be this one:
cursobj.execute(sqlstring [, parameters])    run SQL query or command string
Parameters are passed in as a sequence or mapping of values, and are substituted into the SQL statement string according to the interface module's replacement target conventions. The execute method can be used to run a variety of SQL statement strings:

DDL definition statements (e.g., CREATE TABLE)

DML modification statements (e.g., UPDATE or INSERT)

DQL query statements (e.g., SELECT)

After running an SQL statement, the cursor's rowcount attribute gives the number of rows changed (for DML) or fetched (for DQL); execute also returns the number of rows affected or fetched in most vendor interfaces. For DQL query statements, you must call one of the fetch methods to complete the operation:
row  = cursobj.fetchone( )          fetch next row of a query result
rows = cursobj.fetchmany([size])    fetch next set of rows of query result
rows = cursobj.fetchall( )          fetch all remaining rows of the result
And once you've received fetch method results, table information is processed using normal Python sequence operations (e.g., you can step through the tuples in a fetchall result list with a simple for loop). Most Python database interfaces also allow you to provide values to be passed to SQL statement strings, by providing targets and a tuple of parameters. For instance:
query = 'SELECT name, shoesize FROM spam WHERE job = ? AND age = ?'
cursobj.execute(query, (value1, value2))
results = cursobj.fetchall( )
for row in results:
    ...
In this event, the database interface utilizes prepared statements (an optimization and convenience) and correctly passes the parameters to the database regardless of their Python types. The notation used to code targets in the query string may vary in some database interfaces (e.g., :p1 and :p2 rather than the two ?s used by the Oracle interface); in any event, this is not the same as Python's % string formatting operator. Finally, if your database supports stored procedures, you can generally call them with the callproc method or by passing an SQL CALL or EXEC statement string to the execute method. callproc may generate a result table retrieved with a fetch variant, and returns a modified copy of the input sequence: input parameters are left untouched, and output and input/output parameters are replaced with possibly new values. Additional API features, including support for database blobs, are described in the API's documentation. For now, let's move on to do some real SQL processing in Python.
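To make the overview concrete, the whole connect/cursor/execute/fetch sequence can be run with the standard library's sqlite3 module mentioned in the footnote; this is a sketch with an invented table, not the MySQL session used in the tutorial that follows, and sqlite3 happens to use the qmark parameter style:

```python
import sqlite3

conn = sqlite3.connect(':memory:')       # in-memory database, demo only
curs = conn.cursor()                     # cursor for submitting SQL strings

curs.execute('CREATE TABLE spam (name TEXT, shoesize INTEGER, job TEXT)')
curs.executemany('INSERT INTO spam VALUES (?, ?, ?)',
                 [('bob', 38, 'devel'), ('emily', 37, 'music')])
conn.commit()                            # make the inserts permanent

query = 'SELECT name, shoesize FROM spam WHERE job = ?'
curs.execute(query, ('devel',))          # parameters passed as a sequence
results = curs.fetchall()                # a list of row tuples
conn.close()
```

Here results comes back as [('bob', 38)]: normal Python objects processed with normal Python code, which is the heart of the portable API.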
19.7.2. An SQL Database API Tutorial

We don't have space to provide an exhaustive reference for the database API in this book. To sample the flavor of the interface, though, let's step through a few simple examples. We'll use the MySQL database system for this tutorial. Thanks to Python's portable database API, other popular database packages such as PostgreSQL, SQLite, and Oracle are used almost identically, but the initial call to log in to the database will generally require different argument values.
19.7.2.1. The MySQL system

With a reported 8 million installations and support for more than 20 platforms, MySQL is by most accounts the most popular open source relational database system today. It is a powerful, fast, and full-featured SQL database system that serves as the storage mechanism for many of the sites you may visit on the Web.

MySQL consists of a database server, as well as a collection of clients. Technically, its SQL engine is a multithreaded server, which uses some of the same threaded socket server techniques we met in Chapter 13. It listens for requests on a socket and port, can be run either on a remote machine or on your local computer, and handles clients in parallel threads for efficiency and responsiveness. On Windows, the MySQL server may be run automatically as a Windows service, so it is always available; on Unix-like machines, it runs as a perpetual daemon process. In either case, the server can accept requests over a network or simply run on your machine to provide access to locally stored databases. Ultimately, your databases take the form of a set of files, stored in the server's "data" directory and represented as B-tree disk tables. MySQL handles concurrent updates by automatically locking tables when they are written by client conversation threads.

The MySQL server is available both as a separate program for use in a client/server networked environment, and as a library that can be linked into standalone applications. Clients can submit queries to the server over a TCP/IP socket on any platform; as usual with sockets, use the machine name "localhost" if the server is running locally. Besides sockets, the database server also supports connections using named pipes on Windows NT-based platforms (NT, 2000, XP, and so on); Unix domain socket files on Unix; shared memory on Windows; as well as ODBC, JDBC, and ADO.NET.
For our purposes, the main thing to note is that the standard MySQL interface for Python is compliant with the current version of the database API (2.0). Because of that, most of the code we'll see here will also work unchanged on other database systems, as long as their interfaces also support the portable database API. If you use the PostgreSQL database, for instance, the PyGreSQL open source
Python extension provides DB-API 2.0-compliant interfaces that largely work the same way.
19.7.2.2. Installation

Before we can start coding, we need to install both MySQL itself and the Python MySQL interface module. The MySQL system implements a language-neutral database server; the Python interface module maps calls in our scripts to the database server's interfaces. This is a straightforward process, but here are a few quick notes:
MySQL
At this writing, MySQL can be found on the Web at http://dev.mysql.com. It's available in two flavors: a community version, which is open source (under the GNU license), as well as a commercial version, which is not free, but is relatively inexpensive and removes the restrictions of the GNU license (you won't have to make all your source code available, for instance). See the MySQL web site for more on which version may be right for you; the open source package was used for this book. MySQL installs itself today in C:\Program Files\MySQL on Windows. It includes the database server program, command-line tools, and more.
Python MySQL interface
The primary DB API-compliant MySQL interface for Python was mysql-python when I wrote this (but run a web search on "Python MySQL" for current information). You may also find links to this package at the http://www.python.org page for the database Special Interest Group (SIG), as well as at the Vaults of Parnassus site. Both mysql-python and MySQL itself are simple self-installing executables on Windows. On Windows, like most third-party packages, the Python MySQL interface shows up in the Python install tree, in the site-packages subdirectory of the standard library: C:\Python24\Lib\site-packages\MySQLdb. As usual, because this directory is automatically added to the module import search path, no path configuration is required.
19.7.2.3. Getting started

Time to write some code; this isn't a book on either MySQL or the SQL language, so we'll defer to other resources for details on the commands we'll be running here (O'Reilly has a suite of books on both topics). In fact, the databases we'll use are tiny, and the commands we'll use are deliberately simple as SQL goes; you'll want to extrapolate from what you see here to the more realistic tasks you face. This section is just a brief look at how to use the Python language in conjunction with an SQL database. The basic SQL interface in Python is very simple, though. In fact, it's hardly object-oriented at all: queries and other database commands are sent as strings of SQL. If you know SQL, you already have most of what you need in Python. The first thing we want to do is open a connection to the database and create a table for storing records:
We start out by importing the Python MySQL interface here; it's a package directory called MySQLdb that looks like a simple module to our scripts. Next we create a connection object, passing in the items our database requires for a login: here, the name of the machine the server is running on, along with a user and password. This first API tends to vary per database system, because each has unique login requirements. After you've connected, though, the rest of the API is largely the same for any database. As usual, the server name "localhost" means the local computer, but this could also be any TCP/IP server name if we're using a database on a remote machine. You normally would log in with a username created by your database administrator, but we'll use "root" here to keep this simple (the root user is created automatically when MySQL is installed, and I gave it a password of "python" during installation).
19.7.2.4. Making databases and tables

Next, make a cursor for submitting SQL statements to the database server:
The first command here makes a database called peopledb. We can make more than one (for instance, one for test and one for production, or one per developer), but most SQL statements are relative to a particular database. We can tell the server which database to use with the use SQL statement, or by passing in a db keyword argument when calling connect to make the initial connection. In MySQL, we can also qualify table names with their databases in SQL statements (database.table), but it's usually more convenient to select the database and make it implied. Finally, the last command creates the table called "people" within the peopledb database; the name, job, and pay information specifies the columns in this table, as well as their datatypes, using a "type(size)" syntax: two strings and an integer. Datatypes can be more sophisticated than ours, but we'll ignore such details here (see SQL references).
19.7.2.5. Adding records

So far, we've logged in and created a database and table. Now let's start a new Python session and create some records. There are three basic approaches we can use here: inserting one row at a time with execute, inserting multiple rows with a single executemany call, or inserting rows one at a time within a Python loop. Here is the simple case:
We create a cursor object to submit SQL statements to the database server as before. The SQL insert command adds a single row to the table. After an execute call, the cursor's rowcount attribute gives the number of rows produced or affected by the last statement run. This is also available as the return value of an execute call in the MySQL module, but this is not defined in the database API specification (in other words, don't depend on it if you want your database scripts to work on other database systems). Parameters to substitute into the SQL statement string are passed in as a sequence (e.g., a list or tuple). Notice the module's paramstyle: this tells us what style it uses for substitution targets in the statement string. Here, format means this module accepts string formatting-style targets; %s means a string-based substitution, just as in Python % string expressions. Other database modules might use styles such as qmark (a ? target), or numeric indexes or mapping keys (see the DB API for details). To insert multiple rows with a single statement, use the executemany method and a sequence of row sequences (e.g., a list of lists). This call is like calling execute once for each row sequence in the argument, and in fact may be implemented as such; database interfaces may use database-specific techniques to make this run quicker, though:
We inserted two rows at once in the last statement. It's hardly any more work to achieve the same result by inserting one row at a time with a Python loop:
>>> for row in rows:
...     curs.execute('insert people values (%s, %s, %s)', row)
...
1L
1L
1L
>>> conn.commit( )
Blending Python and SQL like this starts to open up all sorts of interesting possibilities. Notice the last command; we always need to call the connection's commit method to write our changes out to the database. Otherwise, when the connection is closed, our changes will be lost. In fact, if you quit Python without calling the commit method, none of your inserts will be retained. Technically, the connection object automatically calls its rollback method to back out changes that have not yet been committed, when it is closed (which happens manually when its close method is called or automatically when the connection object is about to be garbage-collected). For database systems that don't support transaction commit and rollback operations, these calls may do nothing.
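The commit and rollback behavior just described can be seen in a small, self-contained sketch; this one uses the standard library's sqlite3 module instead of MySQLdb, purely for illustration (the connection-level API is the same):

```python
import sqlite3, os, tempfile

# use a real file so a second connection sees only committed state
dbfile = os.path.join(tempfile.mkdtemp(), 'test.db')

conn = sqlite3.connect(dbfile)
curs = conn.cursor()
curs.execute('create table people (name char(30))')
conn.commit()                          # the table creation is kept

curs.execute("insert into people values ('Bob')")
conn.rollback()                        # back out the uncommitted insert

curs.execute("insert into people values ('Sue')")
conn.commit()                          # this insert is kept

conn2 = sqlite3.connect(dbfile)        # fresh connection: committed data only
print(conn2.execute('select name from people').fetchall())
```

The second connection sees only the committed row; the rolled-back 'Bob' insert never reaches the database file, just as an uncommitted change is lost when a connection is closed.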
19.7.2.6. Running queries

OK, we've now added six records to our database table. Let's run an SQL query to see how we did:
Run an SQL select statement with a cursor object to grab all rows and call the cursor's fetchall to retrieve them. They come back to our script as a sequence of sequences. In this module, it's a tuple of tuples: the outer tuple represents the result table, the nested tuples are that table's rows, and the nested tuple's contents are the column data. Because it's all Python data, once we get the query result, we process it with normal Python code. For example, to make the display a bit more coherent, loop through the query result:
Tuple unpacking comes in handy in loops here too, to pick out column values as we go; here's a simple formatted display of two of the columns' values:
>>> curs.execute('select * from people')
6L
>>> for (name, job, pay) in curs.fetchall( ):
...     print name, ':', pay
...
Bob : 5000
Sue : 70000
Ann : 60000
Tom : 100000
Kim : 30000
pat : 90000
Because the query result is a sequence, we can use Python's powerful sequence tools to process it. For instance, to select just the name column values, we can run a more specific SQL query and get a tuple of tuples:
>>> curs.execute('select name from people')
6L
>>> names = curs.fetchall( )
>>> names
(('Bob',), ('Sue',), ('Ann',), ('Tom',), ('Kim',), ('pat',))
Or we can use a Python list comprehension to pick out the fields we want; by using Python code, we have more control over the data's content and format:
>>> curs.execute('select * from people')
6L
>>> names = [rec[0] for rec in curs.fetchall( )]
>>> names
['Bob', 'Sue', 'Ann', 'Tom', 'Kim', 'pat']
The fetchall call we've used so far fetches the entire query result table all at once, as a single sequence (an empty sequence comes back, if the result is empty). That's convenient, but it may be slow enough to block the caller temporarily for large result tables or generate substantial network traffic if the server is running remotely. To avoid such a bottleneck, we can also grab just one row, or a bunch of rows, at a time with fetchone and fetchmany. The fetchone call returns the next result row or a None false value at the end of the table:
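The fetchone loop described here can be sketched in a self-contained way; this version substitutes the standard library's sqlite3 module for MySQLdb (so note its qmark ? parameter style), but the fetch protocol is identical:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('create table people (name char(30), pay int)')
curs.executemany('insert into people values (?, ?)',
                 [('Bob', 5000), ('Sue', 70000)])

curs.execute('select * from people order by name')
while True:
    row = curs.fetchone()        # next row, or None at the end of the table
    if row is None:
        break
    print(row)                   # prints ('Bob', 5000) then ('Sue', 70000)
```

Fetching one row at a time trades a little extra looping code for bounded memory use on large result tables.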
The fetchmany call returns a sequence of rows from the result, but not the entire table; you can specify how many rows to grab each time with a parameter or fall back on the setting of the cursor's arraysize attribute. Each call gets at most that many more rows from the result or an empty sequence at the end of the table:
>>> curs.execute('select * from people')
6L
>>> while True:
...     rows = curs.fetchmany( )        # size=N optional argument
...     if not rows: break
...     for row in rows:
...         print row
...
('Bob', 'dev', 5000L)
('Sue', 'mus', 70000L)
('Ann', 'mus', 60000L)
('Tom', 'mgr', 100000L)
('Kim', 'adm', 30000L)
('pat', 'dev', 90000L)
For this module at least, the result table is exhausted after a fetchone or fetchmany returns a false value, though fetchall continues to return the whole table. The DB API says that fetchall returns "all (remaining) rows," so you may want to call execute anyhow to regenerate results before fetching, for portability:
Naturally, we can do more than fetch an entire table; the full power of the SQL language is at your disposal in Python:
>>> curs.execute('select name, job from people where pay > 60000')
3L
>>> curs.fetchall( )
(('Sue', 'mus'), ('Tom', 'mgr'), ('pat', 'dev'))
The last query fetches names and pay fields for people who earn more than $60,000. The next is similar, but passes in the selection value as a parameter and orders the result table:
>>> query = 'select name, job from people where pay >= %s order by name'
19.7.2.7. Running updates

Cursor objects also are used to submit SQL update statements to the database server: updates, deletes, and inserts. We've already seen the insert statement at work. Let's start a new session to perform some other kinds of updates:
The SQL update statement changes records; the following changes three records' pay column values to 65000 (Bob, Ann, and Kim), because their pay was no more than $60,000. As usual, the cursor's rowcount gives the number of records changed:
>>> curs.execute('update people set pay=%s where pay <= %s', [65000, 60000])
3L
>>> curs.execute('delete from people where name = %s', ['Bob'])
1L
>>> curs.execute('delete from people where pay >= %s', (90000,))
2L
>>> curs.execute('select * from people')
3L
>>> curs.fetchall( )
(('Sue', 'mus', 70000L), ('Ann', 'mus', 65000L), ('Kim', 'adm', 65000L))
>>> conn.commit( )
Finally, remember to commit your changes to the database before exiting Python, assuming you wish to keep them. Without a commit, a connection rollback or close call, as well as the connection's __del__ deletion method, will back out uncommitted changes. Connection objects are automatically closed if they are still open when they are garbage-collected, which in turn triggers a __del__ and a rollback; garbage collection happens automatically on program exit, if not sooner.
19.7.3. Building Record Dictionaries

Now that we've seen the basics in action, let's move on and apply them to a few large tasks. The SQL API defines query results to be sequences of sequences. One of the more common features that people seem to miss from the API is the ability to get records back as something more structured: a dictionary or class instance, for example, with keys or attributes giving column names. Because this is Python, it's easy to code this kind of transformation, and the API already gives us the tools we need.
19.7.3.1. Using table descriptions

For example, after a query execute call, the DB API specifies that the cursor's description attribute gives the names and types of the columns in the result table (we're continuing with the database in the state in which we left it in the prior section):
Formally, the description is a sequence of column-description sequences, each of the following form (see the DB API for more on the meaning of the type code slot; it maps to objects at the top level of the MySQLdb module):
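A self-contained sketch of the description attribute follows, again substituting the standard library's sqlite3 module for MySQLdb. sqlite3 fills in only the name slot of each seven-item column description and leaves the rest None, whereas MySQLdb supplies type codes too; slot 0 is the column name in either case:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('create table people (name char(30), job char(10), pay int)')
curs.execute('select * from people')

# each column description is a 7-item sequence; slot 0 is the column name
print([desc[0] for desc in curs.description])    # ['name', 'job', 'pay']
```

Because only slot 0 is guaranteed meaningful across modules, portable code generally indexes just that slot, as the examples in this section do.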
Now, we can use this metadata anytime we want to label the columns; for instance, in a formatted records display:
>>> colnames = [desc[0] for desc in curs.description]
>>> colnames
['name', 'job', 'pay']
>>> for row in curs.fetchall( ):
...     for name, value in zip(colnames, row):
...         print name, '\t=>', value
...     print
...
name    => Sue
job     => mus
pay     => 70000

name    => Ann
job     => mus
pay     => 65000

name    => Kim
job     => adm
pay     => 65000
Notice how a tab character is used to try to make this output align; a better approach might be to determine the maximum field name length (we'll see how in a later example).
19.7.3.2. Record dictionaries

It's a minor extension of our formatted display code to create a dictionary for each record, with field names for keys; we just need to fill in the dictionary as we go:
>>> curs.execute('select * from people')
3L
>>> colnames = [desc[0] for desc in curs.description]
>>> rowdicts = []
>>> for row in curs.fetchall( ):
...     newdict = {}
...     for name, val in zip(colnames, row):
...         newdict[name] = val
...     rowdicts.append(newdict)
...
>>> for row in rowdicts: print row
...
{'pay': 70000L, 'job': 'mus', 'name': 'Sue'}
{'pay': 65000L, 'job': 'mus', 'name': 'Ann'}
{'pay': 65000L, 'job': 'adm', 'name': 'Kim'}
Because this is Python, though, there are more powerful ways to build up these record dictionaries.
For instance, the dictionary constructor call accepts the zipped name/value pairs to fill out the dictionaries for us:
>>> curs.execute('select * from people')
3L
>>> colnames = [desc[0] for desc in curs.description]
>>> rowdicts = []
>>> for row in curs.fetchall( ):
...     rowdicts.append( dict(zip(colnames, row)) )
...
>>> rowdicts[0]
{'pay': 70000L, 'job': 'mus', 'name': 'Sue'}
And finally, a list comprehension will do the job of collecting the dictionaries into a list; not only is this less to type, but it probably runs quicker than the original version:
>>> curs.execute('select * from people')
3L
>>> colnames = [desc[0] for desc in curs.description]
>>> rowdicts = [dict(zip(colnames, row)) for row in curs.fetchall( )]
>>> rowdicts[0]
{'pay': 70000L, 'job': 'mus', 'name': 'Sue'}
One of the things we lose when moving to dictionaries is record field order: if you look back at the raw result of fetchall, you'll notice that record fields are in the name, job, and pay order in which they were stored. Our dictionary's fields come back in the pseudorandom order of Python mappings. As long as we fetch fields by key, this is irrelevant to our script. Tables still maintain their order, and dictionary construction works fine because the description result tuple is in the same order as the fields in row tuples returned by queries. We'll leave the task of translating record tuples into class instances as a suggested exercise, except for the small hint that we can access fields as attributes rather than as keys, by simply creating an empty class instance and assigning to attributes with the Python setattr function. Classes would also provide a natural place to code inheritable tools such as standard display methods.
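The suggested exercise can be sketched along these lines; the Record class and makeobjects function here are our own illustrative names, not part of the DB API, and the column names and rows are hardcoded stand-ins for a real query result:

```python
class Record:
    """minimal record class: query fields become instance attributes"""
    pass

def makeobjects(colnames, rows):
    objects = []
    for row in rows:
        rec = Record()
        for name, value in zip(colnames, row):
            setattr(rec, name, value)    # access as rec.name, not rec['name']
        objects.append(rec)
    return objects

colnames = ['name', 'job', 'pay']
rows = [('Sue', 'mus', 70000), ('Ann', 'mus', 65000)]
recs = makeobjects(colnames, rows)
print(recs[0].name, recs[1].pay)         # attribute access, not key indexing
```

In real use, colnames and rows would come from cursor.description and cursor.fetchall, exactly as in the dictionary version; only the container type changes.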
19.7.3.3. Automating with scripts and modules

Up to this point, we've essentially used Python as a command-line SQL client: our queries have been typed and run interactively. All the kinds of code we've run, though, can be used as the basis of database access in script files. Working interactively requires retyping things such as login calls, which can become tedious. With scripts, we can automate our work. To demonstrate, let's make the prior section's example into a utility module; Example 19-9 is a reusable module that knows how to translate the result of a query from row tuples to row dictionaries.
Example 19-9. PP3E\Database\SQLscripts\makedicts.py
###############################################################################
# convert list of row tuples to list of row dicts with field name keys
# this is not a command-line utility: hardcoded self-test if run
###############################################################################

def makedicts(cursor, query, params=( )):
    cursor.execute(query, params)
    colnames = [desc[0] for desc in cursor.description]
    rowdicts = [dict(zip(colnames, row)) for row in cursor.fetchall( )]
    return rowdicts

if __name__ == '__main__':                    # self test
    import MySQLdb
    conn = MySQLdb.connect(host='localhost', user='root', passwd='python')
    cursor = conn.cursor( )
    cursor.execute('use peopledb')
    query  = 'select name, pay from people where pay < %s'
    lowpay = makedicts(cursor, query, [70000])
    for rec in lowpay:
        print rec
As usual, we can run this file from the system command line as a script to invoke its self-test code:
Or we can import it as a module and call its function from another context, like the interactive prompt. Because it is a module, it has become a reusable database tool:
...\PP3E\Database\SQLscripts>python
>>> from makedicts import makedicts
>>> from MySQLdb import connect
>>> conn = connect(host='localhost', user='root', passwd='python', db='peopledb')
>>> curs = conn.cursor( )
>>> curs.execute('select * from people')
3L
>>> curs.fetchall( )
(('Sue', 'mus', 70000L), ('Ann', 'mus', 65000L), ('Kim', 'adm', 65000L))
>>> rows = makedicts(curs, "select name from people where job = 'mus'")
>>> rows
[{'name': 'Sue'}, {'name': 'Ann'}]
Our utility handles arbitrarily complex queries; they are simply passed through the DB API to the database server. The order by clause here sorts the result on the name field:
>>> query = 'select name, pay from people where job = %s order by name'
>>> musicians = makedicts(curs, query, ['mus'])
>>> for row in musicians: print row
...
{'pay': 65000L, 'name': 'Ann'}
{'pay': 70000L, 'name': 'Sue'}
19.7.4. Tying the Pieces Together

So far, we've learned how to make databases and tables, insert records into tables, query table contents, and extract column names. For reference, and to show how these techniques are combined, Example 19-10 collects them into a single script.
Example 19-10. PP3E\Database\SQLscripts\testdb.py
from MySQLdb import Connect

conn = Connect(host='localhost', user='root', passwd='python')
curs = conn.cursor( )
try:
    curs.execute('drop database testpeopledb')
except:
    pass                                         # did not exist

curs.execute('create database testpeopledb')
curs.execute('use testpeopledb')
curs.execute('create table people (name char(30), job char(10), pay int(4))')

curs.execute('insert people values (%s, %s, %s)', ('Bob', 'dev', 50000))
curs.execute('insert people values (%s, %s, %s)', ('Sue', 'dev', 60000))
curs.execute('select * from people')
for row in curs.fetchall( ):
    print row

curs.execute('select * from people')
colnames = [desc[0] for desc in curs.description]
while True:
    print '-' * 30
    row = curs.fetchone( )
    if not row: break
    for (name, value) in zip(colnames, row):
        print '%s => %s' % (name, value)

conn.commit( )                                   # save inserted records
Refer to prior sections in this tutorial if any of the code in this script is unclear. When run, it creates a
two-record database and lists its content to the standard output stream:
('Bob', 'dev', 50000L)
('Sue', 'dev', 60000L)
------------------------------
name => Bob
job => dev
pay => 50000
------------------------------
name => Sue
job => dev
pay => 60000
------------------------------
As is, this example is really just meant to demonstrate the database API. It hardcodes database names, and it re-creates the database from scratch each time. We could turn this code into generally useful tools by refactoring it into reusable parts, as we'll see later in this section. First, though, let's explore techniques for getting data into our databases.
19.7.5. Loading Database Tables from Files

One of the nice things about using Python in the database domain is that you can combine the power of the SQL query language with the power of the Python general-purpose programming language. They naturally complement each other.
19.7.5.1. Loading with SQL and Python

Suppose, for example, that you want to load a database table from a flat file, where each line in the file represents a database row, with individual field values separated by commas. Examples 19-11 and 19-12 list two such datafiles we're going to be using here.
Now, MySQL has a handy SQL statement for loading such a table quickly. Its load data statement parses and loads data from a text file, located on either the client or the server machine. In the following, the first command deletes all records in the table, and we're using the fact that Python automatically concatenates adjacent string literals to split the SQL statement over multiple lines:
>>> curs.execute('delete from people')           # deletes all records
3L
>>> curs.execute(
...     "load data local infile 'data.txt' "
...     "into table people fields terminated by ','")
5L
>>> curs.execute('select * from people')
5L
>>> for row in curs.fetchall( ): print row
...
('bob', 'devel', 50000L)
('sue', 'music', 60000L)
('ann', 'devel', 40000L)
('tim', 'admin', 30000L)
('kim', 'devel', 60000L)
>>> conn.commit( )
This works as expected. What if you someday wish to use your script on a system without this SQL statement, though? Perhaps you just need to do something more custom than this MySQL statement allows. Not to worry: a small amount of simple Python code can easily accomplish the same result (some irrelevant output lines are omitted here):
>>> curs.execute('delete from people')
>>> file = open('data.txt')
>>> rows = [line.split(',') for line in file]
>>> for rec in rows:
...     curs.execute('insert people values (%s, %s, %s)', rec)
...
>>> curs.execute('select * from people')
>>> for rec in curs.fetchall( ): print rec
...
('bob', 'devel', 50000L)
('sue', 'music', 60000L)
('ann', 'devel', 40000L)
('tim', 'admin', 30000L)
('kim', 'devel', 60000L)
This code makes use of a list comprehension to collect string split results for all lines in the file, and file iterators to step through the file line by line. Its Python loop does the same work as the MySQL load statement, and it will work on more database types. We can get the same result from an executemany DB API call shown earlier as well, but the Python for loop here has the potential to be more general.
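One caveat worth noting: the split(',') approach is naive in two respects. Each line's last field keeps its trailing newline (which a forgiving server may tolerate when converting to an integer), and a field containing a quoted comma would be split incorrectly. A hedged sketch of a more robust alternative uses the standard library's csv module, which the book's example does not, but which handles both cases:

```python
import csv, io

# sample data: note the quoted field containing a comma
data = 'bob,devel,50000\nsue,"music, jazz",60000\n'

# csv.reader strips line endings and honors quoted fields with commas
rows = list(csv.reader(io.StringIO(data)))
print(rows)
```

In a loader script, the same csv.reader would simply be wrapped around the open file object in place of the split-based list comprehension.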
19.7.5.2. Python versus SQL

In fact, you have the entire Python language at your disposal for processing database results, and a little Python can often duplicate or go beyond SQL syntax. For instance, SQL has special aggregate function syntax for computing things such as sums and averages:
>>> curs.execute("select sum(pay), avg(pay) from people where job = 'devel'")
1L
>>> curs.fetchall( )
((150000.0, 50000.0),)
By shifting the processing to Python, we can sometimes simplify and do more than SQL's syntax allows (albeit sacrificing any query performance optimizations the database may perform). Computing pay sums and averages with Python can be accomplished with a simple loop:
>>> curs.execute("select name, pay from people where job = 'devel'")
3L
>>> result = curs.fetchall( )
>>> result
(('bob', 50000L), ('ann', 40000L), ('kim', 60000L))
>>> tot = 0
>>> for (name, pay) in result: tot += pay
...
>>> print 'total:', tot, 'average:', tot / len(result)
total: 150000 average: 50000
Or we can use more advanced tools such as comprehensions and generator expressions to calculate sums, averages, maximums, and the like:
>>> print sum(rec[1] for rec in result)              # 2.4 generator expr
150000
>>> print sum(rec[1] for rec in result) / len(result)
50000
>>> print max(rec[1] for rec in result)
60000
The Python approach is more general, but it doesn't buy us much until things become more complex. For example, here are a few more advanced comprehensions that collect the names of people whose pay is above and below the average in the query result set:
>>> avg = sum(rec[1] for rec in result) / len(result)
>>> print [rec[0] for rec in result if rec[1] > avg]
['kim']
>>> print [rec[0] for rec in result if rec[1] < avg]
['ann']
We may be able to do some of these kinds of tasks with more advanced SQL techniques such as nested queries, but we eventually reach a complexity threshold where Python's general-purpose nature makes it attractive. For comparison, here is the equivalent SQL:
>>> curs.execute("select name from people where job = 'devel' and "
...              "pay > (select avg(pay) from people where job = 'devel')")
1L
>>> curs.fetchall( )
(('kim',),)
>>> curs.execute("select name from people where job = 'devel' and "
...              "pay < (select avg(pay) from people where job = 'devel')")
1L
>>> curs.fetchall( )
(('ann',),)
This isn't the most complex SQL you're likely to meet, but beyond this point, SQL can become more involved. Moreover, unlike Python, SQL is limited to database-specific tasks by design. Imagine a query that compares a column's values to data fetched off the Web, or from a user in a GUI: simple with Python's Internet and GUI support, but well beyond the scope of a special-purpose language such as SQL. By combining Python and SQL, you get the best of both and can choose which is best suited to your goals. With Python, you also have access to utilities you've already coded: your database tool set is arbitrarily extensible with functions, modules, and classes. To illustrate, here are some of the same operations coded in a more mnemonic fashion with the dictionary-record module we wrote earlier:
>>> from makedicts import makedicts
>>> recs = makedicts(curs, "select * from people where job = 'devel'")
>>> print len(recs), recs[0]
3 {'pay': 50000L, 'job': 'devel', 'name': 'bob'}
>>> print [rec['name'] for rec in recs]
['bob', 'ann', 'kim']
>>> print sum(rec['pay'] for rec in recs)
150000
>>> avg = sum(rec['pay'] for rec in recs) / len(recs)
>>> print [rec['name'] for rec in recs if rec['pay'] > avg]
['kim']
>>> print [rec['name'] for rec in recs if rec['pay'] >= avg]
['bob', 'kim']
For more advanced database extensions, see the SQL-related tools available for Python in the third-party domain. The Vaults of Parnassus web site, for example, hosts packages that add an OOP flavor to the DB API.
19.7.6. SQL Utility Scripts

At this point in our SQL DB API tour, we've started to stretch the interactive prompt to its breaking point: we wind up retyping the same boilerplate code every time we start a session and every time we run a test. Moreover, the code we're writing is substantial enough to be reused in other programs. Let's wrap up by transforming our code into reusable scripts that automate tasks and support reuse. To illustrate more of the power of the Python/SQL mix, this section presents a handful of utility scripts that perform common tasks: the sorts of things you'd otherwise have to recode often during development. As an added bonus, most of these files are both command-line utilities and modules of functions that can be imported and called from other programs. Most of the scripts in this section also allow a database name to be passed in on the command line; this allows us to use different databases for different purposes during development, so changes in one won't impact others.
19.7.6.1. Table load scripts

Let's take a quick look at code first, before seeing it in action; feel free to skip ahead to correlate the code here with its behavior. As a first step, Example 19-13 shows a simple way to script-ify the table-loading logic of the prior section.
Example 19-13. PP3E\Database\SQLscripts\loaddb1.py
################################################################################
# load table from comma-delimited text file; equivalent to executing this SQL:
# "load data local infile 'data.txt' into table people fields terminated by ','"
################################################################################

import MySQLdb
conn = MySQLdb.connect(host='localhost', user='root', passwd='python')
curs = conn.cursor( )
curs.execute('use peopledb')

file = open('data.txt')
rows = [line.split(',') for line in file]
for rec in rows:
    curs.execute('insert people values (%s, %s, %s)', rec)

conn.commit( )   # commit changes now, if db supports transactions
conn.close( )    # close; __del__ calls rollback if changes not committed yet
As is, Example 19-13 is a top-level script geared toward one particular case. It's hardly any extra work to generalize this into a function that can be imported and used in a variety of scenarios, as in Example 19-14. Notice the way this code uses two list comprehensions to build a string of record values for the insert statement (see its comments for the transforms applied). We could use an executemany call as we did earlier, but we want to be general and avoid hardcoding the fields template. This file also defines a login function to automate the initial connection calls; after retyping this four-command sequence 10 times, it seemed a prime candidate for a function. In addition, this reduces code redundancy; in the future, things like username and host need to be changed in only a single location, as long as the login function is used everywhere. (For an alternative approach to such automation that might encapsulate the connection object, see the class we coded for ZODB connections in the prior section.)
Example 19-14. PP3E\Database\SQLscripts\loaddb.py
###############################################################################
# like loaddb1, but insert more than one row at once, and reusable function
# command-line usage: loaddb.py dbname? datafile? (tablename is implied)
###############################################################################

tablename = 'people'                               # generalize me

def login(host='localhost', user='root', passwd='python', db=None):
    import MySQLdb
    conn = MySQLdb.connect(host=host, user=user, passwd=passwd)
    curs = conn.cursor( )
    if db: curs.execute('use ' + db)
    return conn, curs

def loaddb(cursor, table, datafile='data.txt', conn=None):
    file = open(datafile)                          # x,x,x\nx,x,x\n
    rows = [line.split(',') for line in file]      # [ [x,x,x], [x,x,x] ]
    rows = [str(tuple(rec)) for rec in rows]       # [ "(x,x,x)", "(x,x,x)" ]
    rows = ', '.join(rows)                         # "(x,x,x), (x,x,x)"
    cursor.execute('insert ' + table + ' values ' + rows)
    print cursor.rowcount, 'rows loaded'
    if conn: conn.commit( )

if __name__ == '__main__':
    import sys
    database, datafile = 'peopledb', 'data.txt'
    if len(sys.argv) > 1: database = sys.argv[1]
    if len(sys.argv) > 2: datafile = sys.argv[2]
    conn, curs = login(db=database)
    loaddb(curs, tablename, datafile, conn)
19.7.6.2. Table display script

Once we load data, we probably will want to display it. Example 19-15 allows us to display results as we go: it prints an entire table with either a simple display (which could be parsed by other tools) or a formatted display (generated with the dictionary-record utility we wrote earlier). Notice how it computes the maximum field-name size for alignment with a generator expression; the size is passed in to a string formatting expression by specifying an asterisk (*) for the field size in the format string.
Example 19-15. PP3E\Database\SQLscripts\dumpdb.py
###############################################################################
# display table contents as raw tuples, or formatted with field names
# command-line usage: dumpdb.py dbname? [-] (dash for formatted display)
###############################################################################

def showformat(recs, sept=('-' * 40)):
    print len(recs), 'records'
    print sept
    for rec in recs:
        maxkey = max(len(key) for key in rec)               # max key len
        for key in rec:                                     # or: \t align
            print '%-*s => %s' % (maxkey, key, rec[key])    # -ljust, *len
        print sept

def dumpdb(cursor, table, format=True):
    if not format:
        cursor.execute('select * from ' + table)
        while True:
            rec = cursor.fetchone( )
            if not rec: break
            print rec
    else:
        from makedicts import makedicts
        recs = makedicts(cursor, 'select * from ' + table)
        showformat(recs)

if __name__ == '__main__':
    import sys
    dbname, format = 'peopledb', False
    cmdargs = sys.argv[1:]
    if '-' in cmdargs:                    # format if '-' in cmdline args
        format = True
        cmdargs.remove('-')
    if cmdargs:                           # dbname if other cmdline arg
        dbname = cmdargs[0]
    from loaddb import login
    conn, curs = login(db=dbname)
    dumpdb(curs, 'people', format)
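The asterisk field-size technique used by showformat can be tried in isolation; here is a minimal sketch with a hardcoded sample record in place of a real query result:

```python
rec = {'name': 'Sue', 'job': 'mus', 'pay': 70000}

maxkey = max(len(key) for key in rec)      # longest field name, here 4
for key in sorted(rec):                    # sorted for a stable display order
    # '-' left-justifies; '*' takes the field width from the argument list
    print('%-*s => %s' % (maxkey, key, rec[key]))
```

Computing the width once per record and passing it through * keeps the format string itself generic; no field width needs to be hardcoded.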
While we're at it, let's code some utility scripts to initialize and erase the database, so we do not have to type these by hand at the interactive prompt again every time we want to start from scratch. Example 19-16 completely deletes and re-creates the database, to reset it to an initial state (we did this manually at the start of the tutorial).
Example 19-16. PP3E\Database\SQLscripts\makedb.py
###############################################################################
# physically delete and re-create db files in mysql's data\ directory
# usage: makedb.py dbname? (tablename is implied)
###############################################################################

import sys
dbname = (len(sys.argv) > 1 and sys.argv[1]) or 'peopledb'
if raw_input('Are you sure?') not in ('y', 'Y', 'yes'):
    sys.exit( )

from loaddb import login
conn, curs = login(db=None)
try:
    curs.execute('drop database ' + dbname)
except:
    print 'database did not exist'

curs.execute('create database ' + dbname)        # also: 'drop table tablename'
curs.execute('use ' + dbname)
curs.execute('create table people (name char(30), job char(10), pay int(4))')
conn.commit( )                                   # this seems optional
print 'made', dbname
The clear script in Example 19-17 deletes all rows in the table, instead of dropping and re-creating them entirely. For testing purposes, either approach is usually sufficient.
Example 19-17. PP3E\Database\SQLscripts\cleardb.py
###############################################################################
# delete all rows in table, but don't drop the table or database it is in
# usage: cleardb.py dbname? (tablename is implied)
###############################################################################

import sys
if raw_input('Are you sure?') not in ('y', 'Y', 'yes'):
    sys.exit( )

dbname = 'peopledb'                            # cleardb.py
if len(sys.argv) > 1: dbname = sys.argv[1]     # cleardb.py testdb

from loaddb import login
conn, curs = login(db=dbname)
curs.execute('delete from people')
conn.commit( )                                 # else rows not really deleted
print curs.rowcount, 'records deleted'         # conn closed by its __del__
Finally, Example 19-18 provides a command-line tool that runs a query and prints its result table in formatted style. There's not much to this script; because we've automated most of its tasks already, this is largely just a combination of existing tools. Such is the power of code reuse in Python.
Example 19-18. PP3E\Database\SQLscripts\querydb.py
###############################################################################
# run a query string, display formatted result table
# example: querydb.py testdb "select name, job from people where pay > 50000"
###############################################################################

import sys
database, query = 'peopledb', 'select * from people'
if len(sys.argv) > 1: database = sys.argv[1]
if len(sys.argv) > 2: query = sys.argv[2]

from makedicts import makedicts
from dumpdb import showformat
from loaddb import login

conn, curs = login(db=database)
rows = makedicts(curs, query)
showformat(rows)
19.7.6.3. Using the scripts

Last but not least, here is a log of a session that makes use of these scripts in command-line mode, to illustrate their operation. Most of the files also have functions that can be imported and called from a different program; the scripts simply map command-line arguments to the functions' arguments
when run standalone. The first thing we do is initialize a testing database and load its table from a text file:
...\PP3E\Database\SQLscripts>makedb.py testdb
Are you sure?y
database did not exist
made testdb

...\PP3E\Database\SQLscripts>loaddb.py testdb data2.txt
3 rows loaded
Next, let's check our work with the dump utility (use a - argument to force a formatted display):
...\PP3E\Database\SQLscripts>dumpdb.py testdb
('bob', 'developer', 80000L)
('sue', 'music', 90000L)
('ann', 'manager', 80000L)

...\PP3E\Database\SQLscripts>dumpdb.py testdb -
3 records
----------------------------------------
pay  => 80000
job  => developer
name => bob
----------------------------------------
pay  => 90000
job  => music
name => sue
----------------------------------------
pay  => 80000
job  => manager
name => ann
----------------------------------------
The dump script is an exhaustive display; to be more specific about which records to view, use the query script and pass in a query string on the command line (the command line is wrapped here to fit in this book):
...\PP3E\Database\SQLscripts>querydb.py testdb "select name, job from people
where pay = 80000"
2 records
----------------------------------------
job  => developer
name => bob
----------------------------------------
job  => manager
name => ann
----------------------------------------

...\PP3E\Database\SQLscripts>querydb.py testdb "select * from people where name
= 'sue'"
1 records
----------------------------------------
pay  => 90000
job  => music
name => sue
----------------------------------------
Now, let's erase and start again with a new data set file. The clear script erases all records but doesn't reinitialize the database completely:
...\PP3E\Database\SQLscripts>cleardb.py testdb
Are you sure?y
3 records deleted

...\PP3E\Database\SQLscripts>dumpdb.py testdb -
0 records
----------------------------------------

...\PP3E\Database\SQLscripts>loaddb.py testdb data.txt
5 rows loaded

...\PP3E\Database\SQLscripts>dumpdb.py testdb
('bob', 'devel', 50000L)
('sue', 'music', 60000L)
('ann', 'devel', 40000L)
('tim', 'admin', 30000L)
('kim', 'devel', 60000L)
In closing, here are three queries in action on this new data set: they fetch the names of all developers, the jobs that pay at or above a given amount, and full records with pay at or above that amount, sorted by job. We could run these at the Python interactive prompt, of course, but we're getting a lot of setup and boilerplate code for free here.
...\PP3E\Database\SQLscripts>querydb.py testdb
                   "select name from people where job = 'devel'"
3 records
----------------------------------------
name => bob
----------------------------------------
name => ann
----------------------------------------
name => kim
----------------------------------------

...\PP3E\Database\SQLscripts>querydb.py testdb
                   "select job from people where pay >= 60000"
2 records
----------------------------------------
job => music
----------------------------------------
job => devel
----------------------------------------

...\PP3E\Database\SQLscripts>querydb.py testdb
                   "select * from people where pay >= 60000 order by job"
2 records
----------------------------------------
pay => 60000
job => devel
name => kim
----------------------------------------
pay => 60000
job => music
name => sue
----------------------------------------
Before we move on, a few caveats are worth noting. The scripts in this section illustrate the benefits of code reuse, accomplish their purpose (which was partly to demonstrate the SQL API), and serve as a model for canned database utilities. But they are not as general or powerful as they could be. As is, these scripts allow you to pass in the database name, but not much more. For example, we could allow the table name to be passed in on the command line too, support sorting options in the dump script, and so on. Although we could generalize to support more options, at some point we may need to revert to typing SQL commands in a client; part of the reason SQL is a language is that it must support so much generality. Further extensions to these scripts are left as exercises. Change this code as you like; it's Python, after all.
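The generic field => value display these scripts share can be sketched with the standard library's sqlite3 module standing in for the book's database interface (the table name and sample rows below are invented for illustration); any DB-API cursor exposes column names through cursor.description, which is all a querydb-style formatter needs:

```python
# Hedged sketch of a querydb-style display helper; sqlite3 is a stand-in
# DB-API database, and the 'people' table here is made-up test data.
import sqlite3

def showquery(cursor, query):
    cursor.execute(query)
    fields = [desc[0] for desc in cursor.description]  # DB-API column names
    rows = cursor.fetchall()
    print('%d records' % len(rows))
    for row in rows:
        print('-' * 40)
        for field, value in zip(fields, row):
            print(field, '=>', value)

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('create table people (name text, job text, pay int)')
curs.executemany('insert into people values (?, ?, ?)',
                 [('bob', 'devel', 50000), ('sue', 'music', 60000)])
showquery(curs, "select name from people where job = 'devel'")
```

Because the formatter asks the cursor for its column names, the same helper works for any select statement, which is the generality argument made above.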
19.7.7. SQL Resources

Although the examples we've seen in this section are simple, their techniques scale up to much more realistic databases and contexts. The web sites we studied in the prior part of the book, for instance, can make use of systems such as MySQL to store page state information as well as long-lived client information. Because MySQL supports both large databases and concurrent updates, it's a natural for web site implementation.

There is more to database interfaces than we've seen, but additional API documentation is readily available on the Web. To find the full database API specification, search the Web for "Python Database API" at Google.com (or at a similar site). You'll find the formal API definition; it is really just a text file describing PEP 249 (the Python Enhancement Proposal under which the API was hashed out).

Perhaps the best resource for additional information about database extensions today is the home page of the Python database SIG. Go to http://www.python.org, click on the SIGs link near the top, and navigate to the database group's page (or go straight to http://www.python.org/sigs/db-sig, the page's current address at the time of this writing). There, you'll find API documentation (this is where it is officially maintained), links to database-vendor-specific extension modules, and more.

While you're at Python.org, be sure to also explore the Gadfly database package, a Python-specific SQL-based database extension, which sports wide portability, socket connections for client/server modes, and more. Gadfly loads data into memory, so it is currently somewhat limited in scope. On the other hand, it is ideal for prototyping database applications; you can postpone cutting a check to a vendor until it's time to scale up for deployment. Moreover, Gadfly is suitable by itself for a variety of applications; not every system needs large data stores, but many can benefit from the power of SQL. And as always, see the Vaults of Parnassus and PyPI web sites for related third-party tools and extensions.
19.8. PyForm: A Persistent Object Viewer

Instead of going into additional database interface details that are freely available at Python.org, I'm going to close out this chapter by showing you one way to combine the GUI technology we met earlier in the text with the persistence techniques introduced in this chapter. This section presents PyForm, a Tkinter GUI designed to let you browse and edit tables of records:

Tables browsed are shelves, DBM files, in-memory dictionaries, or any other object that looks and feels like a dictionary.

Records within tables browsed can be class instances, simple dictionaries, strings, or any other object that can be translated to and from a dictionary.

Although this example is about GUIs and persistence, it also illustrates Python design techniques. To keep its implementation both simple and type-independent, the PyForm GUI is coded to expect tables to look like dictionaries of dictionaries. To support a variety of table and record types, PyForm relies on separate wrapper classes to translate tables and records to the expected protocol:

At the top table level, the translation is easy: shelves, DBM files, and in-memory dictionaries all have the same key-based interface.

At the nested record level, the GUI is coded to assume that stored items have a dictionary-like interface too, but classes intercept dictionary operations to make records compatible with the PyForm protocol. Records stored as strings are converted to and from dictionary objects on fetches and stores; records stored as class instances are translated to and from attribute dictionaries. More specialized translations can be added in new table wrapper classes.

The net effect is that PyForm can be used to browse and edit a wide variety of table types, despite its dictionary interface expectations. When PyForm browses shelves and DBM files, table changes made within the GUI are persistent; they are saved in the underlying files. When used to browse a shelve of class instances, PyForm essentially becomes a GUI frontend to a simple object database that is built using standard Python persistence tools.
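The "looks and feels like a dictionary" requirement can be made concrete with a short sketch: any object that supports key listing, fetching, storing, and deleting of dictionary-valued records qualifies as a PyForm table. The UpperKeys class below is an invented toy wrapper, not part of the book's code; it just demonstrates the protocol shape:

```python
# Hedged sketch of the dictionary-of-dictionaries protocol PyForm assumes.
# UpperKeys is a made-up example wrapper: it normalizes keys to uppercase
# but otherwise behaves like a mapping of dict-valued records.
class UpperKeys:
    def __init__(self):
        self.data = {}
    def keys(self):
        return list(self.data.keys())          # key listing, for the index
    def __getitem__(self, key):
        return self.data[key.upper()]          # record fetch
    def __setitem__(self, key, record):
        self.data[key.upper()] = dict(record)  # record store: a dictionary
    def __delitem__(self, key):
        del self.data[key.upper()]             # record delete

table = UpperKeys()
table['bob'] = {'name': 'bob', 'job': 'devel'}
print(table['BOB']['job'])
```

Real dictionaries, shelves, and DBM wrappers all expose this same interface, which is why the GUI code never needs to know which one it was handed.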
19.8.1. Processing Shelves with Code

Before we get to the GUI, though, let's see why you'd want one in the first place. To experiment with shelves in general, I first coded a canned test data file. The script in Example 19-19 hardcodes a dictionary used to populate databases (cast), as well as a class used to populate shelves of class instances (Actor).
Example 19-19. PP3E\Dbase\testdata.py
# definitions for testing shelves, dbm, and formgui cast = { 'rob': 'buddy': 'sally': 'laura': 'milly': 'mel': 'alan': }
class Actor:                                          # unnested file-level class
    def __init__(self, name=(), job=''):              # no need for arg defaults,
        self.name = name                              # for new pickler or formgui
        self.job  = job
    def __setattr__(self, attr, value):               # on setattr(): validate
        if attr == 'kids' and value > 10:             # but set it regardless
            print 'validation error: kids =', value
        if attr == 'name' and type(value) != type(()):
            print 'validation error: name type =', type(value)
        self.__dict__[attr] = value                   # don't trigger __setattr__
The cast object here is intended to represent a table of records (it's really a dictionary of dictionaries when written out in Python syntax like this). Now, given this test data, it's easy to populate a shelve with cast dictionaries. Simply open a shelve and copy over cast, key for key, as shown in Example 19-20.
Example 19-20. PP3E\Dbase\castinit.py
import shelve
from testdata import cast

db = shelve.open('data/castfile')        # create a new shelve
for key in cast.keys():
    db[key] = cast[key]                  # store dictionaries in shelve
Once you've done that, it's almost as easy to verify your work with a script that prints the contents of the shelve, as shown in Example 19-21.
Example 19-21. PP3E\Dbase\castdump.py
import shelve

db = shelve.open('data/castfile')        # reopen shelve
for key in db.keys():
    print key, db[key]                   # show each key,value
Here are these two scripts in action, populating and displaying a shelve of dictionaries:
So far, so good; but here is where you reach the limitations of manual shelve processing: to modify a shelve, you need much more general tools. You could write little Python scripts that each perform very specific updates. Or you might even get by for a while typing such update commands by hand in the interactive interpreter:
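That hand-typed style of update might look like the following sketch (the filename here is a temporary stand-in for the book's data/castfile). The fetch/modify/restore sequence matters: a plain shelve writes a record back only when you reassign its key, not when you mutate a fetched copy:

```python
# Hedged sketch of a manual shelve edit of the kind described in the text.
# A temporary file stands in for data/castfile used by the earlier scripts.
import shelve, tempfile, os

path = os.path.join(tempfile.mkdtemp(), 'castfile')
db = shelve.open(path)
db['sally'] = {'name': 'sally', 'job': 'dancer'}

rec = db['sally']          # fetch: mutating rec alone is NOT written back
rec['job'] = 'writer'      # modify the in-memory copy
db['sally'] = rec          # store: reassigning the key persists the change
db.close()

db = shelve.open(path)     # reopen to prove the change survived
print(db['sally']['job'])
db.close()
```

Typing three such lines per update is exactly the tedium the next paragraph complains about.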
For all but the most trivial databases, though, this will get tedious in a hurry, especially for a system's end users. What you'd really like is a GUI that lets you view and edit shelves arbitrarily, and that can be started up easily from other programs and scripts, as shown in Example 19-22.
Example 19-22. PP3E\Dbase\castview.py
import shelve
from TableBrowser.formgui import FormGui

db = shelve.open('data/castfile')
FormGui(db).mainloop()
To make this particular script work, we need to move on to the next section.
19.8.2. Adding a Graphical Interface

The path traced in the last section really is what led me to write PyForm, a GUI tool for editing arbitrary tables of records. When those tables are shelves and DBM files, the data PyForm displays is persistent; it lives beyond the GUI's lifetime. Because of that, PyForm can be seen as a simple database browser. We've already met all the GUI interfaces PyForm uses earlier in this book, so I won't go into all of its implementation details here (see the chapters in Part III for background details). Before we see the code at all, though, let's see what it does. Figure 19-1 shows PyForm in action on Windows, browsing a shelve of persistent instance objects created from the testdata module's Actor class. It looks slightly different but works the same on Linux and Macs.
Figure 19-1. PyForm displaying a shelve of Actor objects
PyForm uses a three-window interface to the table being browsed; all windows are packed for proper window expansion and clipping, as set by the rules we studied earlier in this book. The window in the upper left of Figure 19-1 is the main window, created when PyForm starts; it has buttons for navigating through a table, finding items by key, and updating, creating, and deleting records (more useful when browsing tables that persist between runs). The table (dictionary) key of the record currently displayed shows up in the input field in the middle of this window. The "index" button pops up the listbox window in the upper right, and selecting a record in either window at the top creates the form window at the bottom. The form window is used both to display a record and to edit itif you change field values and press "store," the record is updated. Pressing "new" clears the form for input of new values (fill in the Key=> field and press "store" to save the new record). Field values are typed with Python syntax, so strings are quoted (more on this later). When browsing a table with records that contain different sets of field names, PyForm erases and redraws the form window for new field sets as new records are selected. To avoid seeing the window re-created, use
the same format for all records within a given table.
19.8.3. PyForm GUI Implementation

On to the code; the first thing I did when writing PyForm was to code utility functions to hide some of the details of widget creation. By making a few simplifying assumptions (e.g., packing protocol), the module in Example 19-23 helps keep some GUI coding details out of the rest of the PyForm implementation.
Example 19-23. PP3E\Dbase\TableBrowser\guitools.py
# added extras for entry width, calcgui font/color
from Tkinter import *

def frame(root, side, **extras):
    widget = Frame(root)
    widget.pack(side=side, expand=YES, fill=BOTH)
    if extras: widget.config(**extras)
    return widget
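This excerpt shows only the frame builder, but the formgui module shown later also imports label, button, and entry helpers from this file. A hedged reconstruction of what those presumably look like follows the same build-pack-configure pattern; the exact widget options (relief styles, for instance) are assumptions, and may differ from the book's actual file:

```python
# Hedged sketch of the remaining guitools builders; option details assumed.
try:
    from tkinter import *      # Python 3 module name
except ImportError:
    from Tkinter import *      # Python 2 name used in the book

def label(root, side, text, **extras):
    widget = Label(root, text=text)              # build, pack, then configure
    widget.pack(side=side, expand=YES, fill=BOTH)
    if extras: widget.config(**extras)
    return widget

def button(root, side, text, command, **extras):
    widget = Button(root, text=text, command=command)
    widget.pack(side=side, expand=YES, fill=BOTH)
    if extras: widget.config(**extras)
    return widget

def entry(root, side, linkvar, **extras):
    widget = Entry(root, textvariable=linkvar)   # linked to a StringVar
    widget.pack(side=side, expand=YES, fill=BOTH)
    if extras: widget.config(**extras)
    return widget
```

The shared shape is the point: every builder creates a widget, packs it expandably on the requested side, and applies any extra configuration keywords, so caller code stays one line per widget.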
Armed with this utility module, the file in Example 19-24 implements the rest of the PyForm GUI. It uses the GuiMixin module we wrote in Chapter 11, for simple access to standard pop-up dialogs. It's also coded as a class that can be specialized in subclasses or attached to a larger GUI. I run PyForm as a standalone program. Attaching its FormGui class really attaches its main window only, but it can be used to provide a precoded table browser widget for other GUIs. This file's FormGui class creates the GUI shown in Figure 19-1 and responds to user interaction in all three of the interface's windows. Because we've already covered all the GUI tools that PyForm uses, you should study this module's source code listing for additional implementation details. Notice, though, that this file knows almost nothing about the table being browsed, other than that it looks
and feels like a dictionary of dictionaries. To understand how PyForm supports browsing things such as shelves of class instances, you will need to look elsewhere (or at least wait for the next module).
Example 19-24. PP3E\Dbase\TableBrowser\formgui.py
#!/usr/local/bin/python
#############################################################################
# PyForm: a persistent table viewer GUI. Uses guimixin for std dialogs.
# Assumes the browsed table has a dictionary-of-dictionary interface, and
# relies on table wrapper classes to convert other structures as needed.
# Store an initial record with dbinit script to start a dbase from scratch.
# Caveat: doesn't do object method calls, shows complex field values poorly.
#############################################################################

from Tkinter import *                              # Tk widgets
from guitools import frame, label, button, entry   # widget builders
from PP3E.Gui.Tools.guimixin import GuiMixin       # common methods
class FormGui(GuiMixin, Frame):
    def __init__(self, mapping):                       # an extended frame
        Frame.__init__(self)                           # on default top-level
        self.pack(expand=YES, fill=BOTH)               # all parts expandable
        self.master.title('PyForm 2.0 - Table browser')
        self.master.iconname("PyForm")
        self.makeMainBox()
        self.table = mapping                           # a dict, dbm, shelve, Table,..
        self.index = mapping.keys()                    # list of table keys
        self.cursor = -1                               # current index position
        self.currslots = []                            # current form's (key,text)s
        self.currform = None                           # current form window
        self.listbox = None                            # index listbox window

    def makeMainBox(self):
        frm = frame(self, TOP)
        frm.config(bd=2)
        button(frm, LEFT, 'next', self.onNext)         # next in list
        button(frm, LEFT, 'prev', self.onPrev)         # backup in list
        button(frm, LEFT, 'find', self.onFind)         # find from key
        frm = frame(self, TOP)
        self.keytext = StringVar()                     # current record's key
        label(frm, LEFT, 'KEY=>')                      # change before 'find'
        entry(frm, LEFT, self.keytext)
        frm = frame(self, TOP)
        frm.config(bd=2)
        button(frm, LEFT, 'store',  self.onStore)      # updated entry data
        button(frm, LEFT, 'new',    self.onNew)        # clear fields
        button(frm, LEFT, 'index',  self.onMakeList)   # show key list
        button(frm, LEFT, 'delete', self.onDelete)     # show key list
        button(self, BOTTOM, 'quit', self.quit)        # from guimixin

    def onPrev(self):
        if self.cursor <= 0:
            self.infobox('Backup', "Front of table")
        else:
            self.cursor -= 1
            self.display()

    def onNext(self):
        if self.cursor >= len(self.index)-1:
            self.infobox('Advance', "End of table")
        else:
            self.cursor += 1
            self.display()

    def sameKeys(self, record):                        # can we reuse the same form?
        keys1 = record.keys()
        keys2 = [x[0] for x in self.currslots]         # or map(lambda x:x[0], list)
        keys1.sort(); keys2.sort()                     # keys list order differs
        return keys1 == keys2                          # if insertion-order differs

    def display(self):
        key = self.index[self.cursor]                  # show record at index cursor
        self.keytext.set(key)                          # change key in main box
        record = self.table[key]                       # in dict, dbm, shelf, class
        if self.sameKeys(record):                      # same fields? reuse form
            self.currform.title('PyForm - Key=' + repr(key))
            for (field, text) in self.currslots:
                text.set(repr(record[field]))          # repr(x) works like expr 'x'
        else:
            if self.currform:                          # different fields?
                self.currform.destroy()                # replace current box
            new = Toplevel()                           # new resizable window
            new.title('PyForm - Key=' + repr(key))
            new.iconname("pform")
            left  = frame(new, LEFT)
            right = frame(new, RIGHT)
            self.currslots = []                        # list of (field, entry)
            for field in record.keys():                # we could sort keys here
                label(left, TOP, repr(field))          # key,value to strings
                text = StringVar()
                text.set(repr(record[field]))
                entry(right, TOP, text, width=40)
                self.currslots.append((field, text))
            self.currform = new
            new.protocol('WM_DELETE_WINDOW', lambda:0) # ignore destroy's
        self.selectlist()                              # update listbox

    def onStore(self):
        if not self.currform: return
        key = self.keytext.get()
        if key in self.index:                          # change existing record
            record = self.table[key]                   # not: self.table[key][field]=
        else:
            record = {}                                # create a new record
            self.index.append(key)                     # add to index and listbox
            if self.listbox:
                self.listbox.insert(END, key)          # or at len(self.index)-1
        for (field, text) in self.currslots:           # fill out dictionary rec
            try:
                record[field] = eval(text.get())       # convert back from string
            except:
                self.errorbox('Bad data: "%s" = "%s"' % (field, text.get()))
                record[field] = None
        self.table[key] = record                       # add to dict, dbm, shelf,...
        self.onFind(key)                               # readback: set cursor,listbox

    def onNew(self):                                   # clear input form and key
        if not self.currform: return
        self.keytext.set('?%d' % len(self.index))      # default key unless typed
        for (field, text) in self.currslots:
            text.set('')                               # clear key/fields for entry
        self.currform.title('Key: ?')

    def onFind(self, key=None):
        target = key or self.keytext.get()             # passed in, or entered
        try:
            self.cursor = self.index.index(target)     # find label in keys list
            self.display()
        except:
            self.infobox('Not found', "Key doesn't exist", 'info')

    def onDelete(self):
        if not self.currform or not self.index: return
        currkey = self.index[self.cursor]
        del self.table[currkey]                        # table, index, listbox
        del self.index[self.cursor:self.cursor+1]      # like "list[i:i+1] = []"
        if self.listbox:
            self.listbox.delete(self.cursor)           # delete from listbox
        if self.cursor < len(self.index):
            self.display()                             # show next record if any
        elif self.cursor > 0:
            self.cursor = self.cursor-1                # show prior if delete end
            self.display()
        else:                                          # leave box if delete last
            self.onNew()

    def onList(self, evnt):                            # on listbox double-click
        if not self.index: return
        index = self.listbox.curselection()            # fetch selected key text
        label = self.listbox.get(index)                # or use listbox.get(ACTIVE)
        self.onFind(label)                             # and call method here

    def onMakeList(self):
        if self.listbox: return                        # already up?
        new = Toplevel()                               # new resizable window
        new.title("PyForm - Key Index")                # select keys from a listbox
        new.iconname("pindex")
        frm    = frame(new, TOP)
        scroll = Scrollbar(frm)
        list   = Listbox(frm, bg='white')
        scroll.config(command=list.yview, relief=SUNKEN)
        list.config(yscrollcommand=scroll.set, relief=SUNKEN)
        scroll.pack(side=RIGHT, fill=BOTH)
        list.pack(side=LEFT, expand=YES, fill=BOTH)    # pack last, clip first
        for key in self.index:                         # add to list-box
            list.insert(END, key)                      # or: sort list first
        list.config(selectmode=SINGLE, setgrid=1)      # select,resize modes
        list.bind('<Double-1>', self.onList)           # on double-clicks
        self.listbox = list
        if self.index and self.cursor >= 0:
            self.selectlist()                          # highlight position
        new.protocol('WM_DELETE_WINDOW', lambda:0)     # ignore destroy's

    def selectlist(self):                              # listbox tracks cursor
        if self.listbox:
            self.listbox.select_clear(0, self.listbox.size())
            self.listbox.select_set(self.cursor)

if __name__ == '__main__':                             # self-test code
    from PP3E.Dbase.testdata import cast               # view in-memory dict-of-dicts
    for k in cast.keys(): print k, cast[k]
    FormGui(cast).mainloop()
    for k in cast.keys(): print k, cast[k]             # show modified table on exit
The file's self-test code starts up the PyForm GUI to browse the in-memory dictionary of dictionaries called "cast" in the testdata module listed earlier. To start PyForm, you simply make and run the FormGui class object this file defines, passing in the table to be browsed. Here are the messages that show up in stdout after running this file and editing a few entries displayed in the GUI; the dictionary is displayed on GUI startup and exit:
The last line represents a change made in the GUI. Since this is an in-memory table, changes made in the GUI are not retained (dictionaries are not persistent by themselves). To see how to use the PyForm GUI on persistent stores such as DBM files and shelves, we need to move on to the next topic.
19.8.4. PyForm Table Wrappers
The following file defines generic classes that "wrap" (interface with) various kinds of tables for use in PyForm. It's what makes PyForm useful for a variety of table types. The prior module was coded to handle GUI chores, and it assumes that tables expose a dictionary-of-dictionaries interface. Conversely, this next module knows nothing about the GUI but provides the translations necessary to browse nondictionary objects in PyForm. In fact, this module doesn't even import Tkinter at all; it deals strictly in object protocol conversions and nothing else. Because PyForm's implementation is divided into functionally distinct modules like this, it's easier to focus on each module's task in isolation.

Here is the hook between the two modules: for special kinds of tables, PyForm's FormGui is passed an instance of the Table class coded here. The Table class intercepts table index fetch and assignment operations and uses an embedded record wrapper class to convert records to and from dictionary format as needed. For example, because DBM files can store only strings, Table converts real dictionaries to and from their printable string representation on table stores and fetches. For class instances, Table extracts the object's __dict__ attribute dictionary on fetches and copies a dictionary's fields to attributes of a newly generated class instance on stores.[*] The end result is that the GUI thinks the table is all dictionaries, even if it is really something very different here.
[*] Subtle thing revisited: like the new pickle module, PyForm tries to generate a new class instance on store operations by simply setting a generic instance object's __class__ pointer to the original class; only if this fails does PyForm fall back on calling the class with no arguments (in which case the class must have defaults for any constructor arguments other than self). Assignment to __class__ can fail in restricted execution mode. See the class InstanceRecord in the source listing for further details.
While you study this module's listing, shown in Example 19-25, notice that there is nothing here about the record formats of any particular database. In fact, there was none in the GUI-related formgui module either. Because neither module cares about the structure of fields used for database records, both can be used to browse arbitrary records.
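The instance-to-dictionary translation described above can be demonstrated in miniature. This sketch uses an invented Person class, and substitutes the modern __new__ idiom for the book's __class__-assignment trick (same effect: build an instance without running its constructor), so treat it as an illustration of the idea rather than the book's exact code:

```python
# Hedged sketch of the instance <-> dictionary round trip formtable performs.
# Person is a stand-in record class; __new__ replaces the book's technique of
# assigning to a dummy instance's __class__.
class Person:
    def __init__(self, name=None, job=None):
        self.name, self.job = name, job

def todict(obj):
    return obj.__dict__                  # instance -> attribute dictionary

def fromdict(cls, record):
    obj = cls.__new__(cls)               # make instance, skip __init__
    for attr, value in record.items():
        setattr(obj, attr, value)        # may run cls.__setattr__ if defined
    return obj

d = todict(Person('bob', 'devel'))       # the GUI edits this dictionary
p = fromdict(Person, d)                  # and this rebuilds the instance
print(p.name, p.job)
```

Because setattr routes through any __setattr__ the class defines, validation logic like Actor's still fires when the GUI stores a record back.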
Example 19-25. PP3E\Dbase\formtable.py
#############################################################################
# PyForm table wrapper classes and tests.
# Because PyForm assumes a dictionary-of-dictionary interface, this module
# converts strings and class instance records to and from dicts. PyForm
# contains the table mapping--Table is not a PyForm subclass. Note that
# some of the wrapper classes may be useful outside PyForm--DbmOfString can
# wrap a dbm containing arbitrary datatypes. Run the dbinit scripts to
# start a new database from scratch, and run the dbview script to browse
# a database other than the one tested here. No longer requires classes to
# have defaults in constructor args, and auto picks up record class from the
# first one fetched if not passed in to class-record wrapper. Caveat: still
# assumes that all instances in a table are instances of the same class.
#############################################################################

#############################################################################
# records within tables
#############################################################################

class DictionaryRecord:
    def todict(self, value):
        return value                       # to dictionary: no need to convert
    def fromdict(self, value):
        return value                       # from dictionary: no need to convert

class StringRecord:
    def todict(self, value):
        return eval(value)                 # convert string to dictionary (or any)
    def fromdict(self, value):
        return str(value)                  # convert dictionary (or any) to string

class InstanceRecord:
    def __init__(self, Class=None):        # need class object to make instances
        self.Class = Class
    def todict(self, value):               # convert instance to attr dictionary
        if not self.Class:                 # get class from obj if not yet known
            self.Class = value.__class__
        return value.__dict__
    def fromdict(self, value):             # convert attr dictionary to instance
        try:
            class Dummy: pass              # try what new pickle does
            instance = Dummy()             # fails in restricted mode
            instance.__class__ = self.Class
        except:                            # else call class, no args
            instance = self.Class()        # init args need defaults
        for attr in value.keys():
            setattr(instance, attr, value[attr])   # set instance attributes
        return instance                    # may run Class.__setattr__

#############################################################################
# table containing records
#############################################################################

class Table:
    def __init__(self, mapping, converter):    # table object, record converter
        self.table  = mapping                  # wrap arbitrary table mapping
        self.record = converter                # wrap arbitrary record types

    def storeItems(self, items):               # initialize from dictionary
        for key in items.keys():
            self[key] = items[key]             # do __setitem__ to xlate, store

    def printItems(self):                      # print wrapped mapping
        for key in self.keys():                # do self.keys to get table keys
            print key, self[key]               # do __getitem__ to fetch, xlate

# other mapping objects: classes that look like dicts
#############################################################################
# test common applications
#############################################################################

if __name__ == '__main__':
    from sys import argv
    from formgui import FormGui
    from PP3E.Dbase.testdata import Actor, cast
    TestType = TestInit = TestFile = None      # defaults: shelve, no reinit (see text)
    if len(argv) > 1: TestType = argv[1]       # shelve or dbm
    if len(argv) > 2: TestInit = int(argv[2])  # 1 = reinitialize from testdata
    if len(argv) > 3: TestFile = argv[3]       # external filename
Besides the Table and record-wrapper classes, the module defines generator functions (e.g., ShelveOfInstance) that create a Table for all reasonable table and record combinations. Not all combinations are valid; DBM files, for example, can contain only dictionaries coded as strings, because class instances don't easily map to the string value format expected by DBM. However, these classes are flexible enough to allow additional Table configurations to be introduced. The only thing that is GUI related about this file at all is its self-test code at the end. When run as a script, this module starts a PyForm GUI to browse and edit either a shelve of persistent Actor class instances or a DBM file of dictionaries, by passing in the right kind of Table object. The GUI looks like the one we saw in Figure 19-1 earlier; when run without arguments, the self-test code lets you browse a shelve of class instances:
...\PP3E\Dbase\TableBrowser>python formtable.py
shelve-of-instance test
...display of contents on exit...
Because PyForm displays a shelve this time, any changes you make are retained after the GUI exits. To reinitialize the shelve from the cast dictionary in testdata , pass a second argument of 1 (0 means don't reinitialize the shelve). To override the script's default shelve filename, pass a different name as a third argument:
To instead test PyForm on a DBM file of dictionaries mapped to strings, pass a dbm in the first command-line argument; the next two arguments work the same:
...\PP3E\Dbase\TableBrowser>python formtable.py dbm 1 ..\data\dbm1
dbm-of-dictstring test
...display of contents on exit...
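The generator functions mentioned above presumably just pair a concrete mapping with the right record converter. A hedged sketch of that composition pattern, using a plain dictionary in place of a real shelve or DBM file and an invented factory name, shows why new combinations are cheap to add:

```python
# Hedged sketch of formtable's factory pattern: a Table pairs any mapping
# with any record converter. DictOfDictString is an invented factory name;
# the Table/StringRecord classes here are simplified stand-ins.
class StringRecord:
    def todict(self, value):
        return eval(value)                     # string -> dictionary
    def fromdict(self, value):
        return str(value)                      # dictionary -> string

class Table:
    def __init__(self, mapping, converter):
        self.table, self.record = mapping, converter
    def keys(self):
        return self.table.keys()
    def __getitem__(self, key):
        return self.record.todict(self.table[key])      # fetch + translate
    def __setitem__(self, key, value):
        self.table[key] = self.record.fromdict(value)   # translate + store

def DictOfDictString(mapping=None):            # hypothetical convenience factory
    return Table({} if mapping is None else mapping, StringRecord())

table = DictOfDictString()
table['bob'] = {'job': 'devel', 'pay': 50000}
print(type(table.table['bob']) is str)         # raw mapping holds strings
print(table['bob']['pay'])                     # but fetches yield dictionaries
```

Swapping in a shelve for the dictionary, or an instance converter for StringRecord, yields the other combinations the module supports, with no change to the GUI.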
Finally, because these self-tests ultimately process concrete shelve and DBM files, you can manually open and inspect their contents using normal library calls. Here is what they look like when opened in an interactive session:
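A session of that sort might look like the following sketch; the filenames and record contents are placeholders, and the modern dbm module stands in for the book's anydbm-era interface. The key observation is the same either way: the shelve yields real objects, the DBM file yields strings:

```python
# Hedged sketch: inspecting shelve and dbm files outside the GUI.
# Temporary files and made-up records stand in for the book's data files.
import shelve, dbm, os, tempfile

base  = tempfile.mkdtemp()
spath = os.path.join(base, 'shelve1')
dpath = os.path.join(base, 'dbm1')

db = shelve.open(spath)
db['bob'] = {'name': 'bob', 'job': 'devel'}        # stored as a real object
db.close()

f = dbm.open(dpath, 'c')
f['bob'] = str({'name': 'bob', 'job': 'devel'})    # dbm values: strings only
f.close()

db = shelve.open(spath)                            # shelve gives objects back
print(type(db['bob']) is dict)
db.close()

f = dbm.open(dpath)                                # dbm gives (byte) strings
print(eval(f['bob'].decode())['job'])
f.close()
```

Note that modern dbm returns values as bytes, hence the decode before eval; the book's Python 2 code deals in plain strings throughout.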
The shelve file contains real Actor class instance objects, and the DBM file holds dictionaries converted to strings. Both formats are retained in these files between GUI runs and are converted back to dictionaries for later redisplay.[*]
[*] Note that DBM files of dictionaries use str and eval to convert to and from strings, but they could also simply store the pickled representations of record dictionaries in DBM files instead, using pickle. But since this is exactly what a shelve of dictionaries does, the str/eval scheme was chosen for illustration purposes here. Suggested exercise: add a new PickleRecord record class based upon the pickle module's loads and dumps functions described earlier in this chapter, and compare its performance to StringRecord. See also the pickle file database structure in Chapter 14; its directory scheme, with one flat file per record, could be used to implement a "table" here too, with appropriate Table subclassing.
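The PickleRecord exercise the footnote proposes might be sketched as follows; unlike str/eval text, pickled data handles records whose values have no clean source-code representation:

```python
# Hedged sketch of the footnote's suggested PickleRecord exercise: a record
# converter using pickle.dumps/loads in place of str/eval. For a DBM file,
# the pickled bytes would be the stored value.
import pickle

class PickleRecord:
    def todict(self, value):
        return pickle.loads(value)      # stored bytes -> dictionary
    def fromdict(self, value):
        return pickle.dumps(value)      # dictionary -> stored bytes

rec = PickleRecord()
stored = rec.fromdict({'name': 'sue', 'pay': 60000})
print(rec.todict(stored)['pay'])
```

Because it satisfies the same todict/fromdict protocol as StringRecord, it could be dropped into a Table wrapper with no other changes.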
19.8.5. PyForm Creation and View Utility Scripts

The formtable module's self-test code proves that it works, but it is limited to canned test-case files and classes. What about using PyForm for other kinds of databases that store more useful kinds of data? Luckily, both the formgui and the formtable modules are written to be generic; they are independent of a particular database's record format. Because of that, it's easy to point PyForm to databases of your own: simply import and run the FormGui object with the (possibly wrapped) table you wish to browse. The required startup calls are not too complex, and you could type them at the interactive prompt every time you want to browse a database; but it's usually easier to store them in scripts so that they can be reused. The script in Example 19-26, for example, can be run to open PyForm on any shelve containing records stored in class instance or dictionary format.
Example 19-26. PP3E\Dbase\dbview.py
##################################################################
# view any existing shelve directly; this is more general than a
# "formtable.py shelve 1 filename" cmdline--only works for Actor;
# pass in a filename (and mode) to use this to browse any shelve:
# formtable auto picks up class from the first instance fetched;
# run dbinit1 to (re)initialize dbase shelve with a template.
##################################################################

from sys import argv
from formtable import *
from formgui import FormGui

mode = 'class'
file = '../data/mydbase-' + mode
if len(argv) > 1: file = argv[1]              # dbview.py file? mode??
if len(argv) > 2: mode = argv[2]

if mode == 'dict':
    table = ShelveOfDictionary(file)          # view dictionaries
else:
    table = ShelveOfInstance(file)            # view class objects

FormGui(table).mainloop()
table.close()                                 # close needed for some dbm
The only catch here is that PyForm doesn't handle completely empty tables very well; there is no way to add new records within the GUI unless a record is already present. That is, PyForm has no record layout design tool; its "new" button simply clears an existing input form. Because of that, to start a new database from scratch, you need to add an initial record that gives PyForm the field layout. Again, this requires only a few lines of code that could be typed interactively, but why not instead put it in generalized scripts for reuse? The file in Example 19-27 shows one way to go about initializing a PyForm database with a first empty record.
Example 19-27. PP3E\Dbase\dbinit1.py
######################################################################
# store a first record in a new shelve to give initial fields list;
# PyForm GUI requires an existing record before you can add records;
# delete the '?' key template record after real records are added;
# change mode, file, template to use this for other kinds of data;
# if you populate shelves from other datafiles you don't need this;
# see dbinit2 for object-based version, and dbview to browse shelves.
######################################################################

import os
from sys import argv

mode = 'class'
file = '../data/mydbase-' + mode
if len(argv) > 1: file = argv[1]
if len(argv) > 2: mode = argv[2]

try:
    os.remove(file)
except: pass

if mode == 'dict':
    template = {'name': None, 'age': None, 'job': None}
else:
    from PP3E.Dbase.person import Person
    template = Person(None, None)

import shelve
dbase = shelve.open(file)
dbase['?empty?'] = template
dbase.close()
Now, simply change some of this script's settings or pass in command-line arguments to generate a new shelve-based database for use in PyForm. You can substitute any fields list or class name in this script to maintain a simple object database with PyForm that keeps track of real-world information (we'll see two such databases in action in a moment). The empty record created by this script shows up with the key ?empty? when you first browse the database in PyForm with dbview; replace it with a first real record using the PyForm store key, and you are in business. As long as you don't change the database's shelve outside of the GUI, all of its records will have the same fields format, as defined in the initialization script. But notice that the dbinit1 script goes straight to the shelve file to store the first record; that's fine today, but it might break if PyForm is ever changed to do something more custom with its stored data representation. Perhaps a better way to populate tables outside the GUI is to use the Table wrapper classes it employs. The following alternative script, for instance, initializes a PyForm database with generated Table objects, not direct shelve operations (see Example 19-28).
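The initialization step these scripts perform can be sketched outside PyForm as well. The following self-contained version (current Python syntax, a temporary file, and an illustrative field layout, none of which are the book's exact code) seeds a shelve with a single template record:

```python
import os
import shelve
import tempfile

def init_formdb(path, template):
    # seed a new shelve with a single template record under the
    # '?empty?' key, so a form GUI can learn the field layout from it
    db = shelve.open(path)
    db['?empty?'] = template
    db.close()

# hypothetical field layout; any dictionary of field names works
dbfile = os.path.join(tempfile.mkdtemp(), 'mydbase-dict')
init_formdb(dbfile, {'name': None, 'age': None, 'job': None})

db = shelve.open(dbfile)
print(sorted(db.keys()))       # only the template record so far
db.close()
```

Any real records stored later carry the same field layout, and the template entry can then be deleted.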
Example 19-28. PP3E\Dbase\dbinit2.py
#################################################################
# this works too--based on Table objects not manual shelve ops;
# store a first record in shelve, as required by PyForm GUI.
#################################################################

from formtable import *
import sys, os

mode = 'dict'
file = '../data/mydbase-' + mode
if len(sys.argv) > 1: file = sys.argv[1]
if len(sys.argv) > 2: mode = sys.argv[2]

try:
    os.remove(file)
except:
    pass

if mode == 'dict':
    table = ShelveOfDictionary(file)
    template = {'name': None, 'shoesize': None, 'language': 'Python'}
else:
    from PP3E.Dbase.person import Person
    table = ShelveOfInstance(file, Person)
    template = Person(None, None).__dict__

table.storeItems({'?empty?': template})
table.close()
19.8.5.1. Creating and browsing custom databases

Let's put the prior section's scripts to work to initialize and edit a couple of custom databases. Figure 19-2 shows one being browsed after initializing the database with a script and adding a handful of real records within the GUI.
Figure 19-2. A shelve of Person objects (dbinit1, dbview)
The listbox here shows the record I added to the shelve within the GUI. I ran the following commands to initialize the database with a starter record and to open it in PyForm to add records (that is, Person class instances):
You can tweak the class name or fields dictionary in the dbinit scripts to initialize records for any sort of database you care to maintain with PyForm; use dictionaries if you don't want to represent persistent objects with classes (but classes let you add other sorts of behavior as methods not visible under PyForm). Be sure to use a distinct filename for each database; the initial ?empty? record can be deleted as soon as you add a real entry (later, simply select an entry from the listbox and press "new" to clear the form for input of a new record's values). The data displayed in the GUI represents a true shelve of persistent Person class instance objects; changes and additions made in the GUI will be retained for the next time you view this shelve with PyForm. If you like to type, though, you can still open the shelve directly to check PyForm's work:
Notice that bob is an instance of the Person class we met earlier in this chapter (see the section "Shelve Files"). Assuming that the person module is still the version that introduced a __getattr__ method, asking for a shelved object's tax attribute computes a value on the fly, because this really invokes a class method. Also note that this works even though Person was never imported here: Python loads the class internally when re-creating its shelved instances. You can just as easily base a PyForm-compatible database on an internal dictionary structure, instead of on classes. Figure 19-3 shows one being browsed after being initialized with a script and populated with the GUI.
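The computed-attribute trick relies on the __getattr__ hook, which is run only when normal attribute lookup fails. A minimal sketch of the idiom (not the book's exact person module; the field names and flat tax rate are illustrative):

```python
class Person:
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job = job
        self.pay = pay

    def __getattr__(self, attr):
        # invoked only for attributes not found the normal way, so
        # 'tax' behaves like a stored field but is computed on demand
        if attr == 'tax':
            return self.pay * 0.30      # illustrative rate, not the book's
        raise AttributeError(attr)

bob = Person('bob', 'devel', 30000)
print(bob.tax)        # computed when fetched, never stored
```

Because the value is computed at fetch time, it stays current as pay changes, and it never takes up space in the shelve.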
Figure 19-3. A shelve of dictionaries (dbinit2, dbview)
Besides its different internal format, this database has a different record structure (its record's field names differ from the last example), and it is stored in a shelve file of its own. Here are the commands I used to initialize and edit this database:
After adding a few records (that is, dictionaries) to the shelve, you can either view them again in PyForm or open the shelve manually to verify PyForm's work:
This time, shelve entries are really dictionaries, not instances of a class or converted strings. PyForm doesn't care, though: because all tables are wrapped to conform to PyForm's interface, both formats look the same when browsed in the GUI.
19.8.6. Data as Code

Notice that the shoesize and language fields in the screenshot in Figure 19-3 really are a dictionary and a list. You can type any Python expression syntax into this GUI's form fields to give values (that's why strings are quoted there). PyForm uses the Python built-in repr function to convert value objects for display (repr(x) is like the older backquotes expression and is similar to str(x), but yields an as-code display that adds quotes around strings). To convert from a string back to value objects, PyForm uses the Python eval function to parse and evaluate the code typed into fields. The key entry/display field in the main window does not add or accept quotes around the key string, because keys must still be strings in things such as shelves (even though fields can be arbitrary types).

As we've seen at various points in this book, eval (and its statement cousin, exec) is powerful but dangerous: you never know when a user might type something that removes files, hangs the system, emails your boss, and so on. If you can't be sure that field values won't contain harmful code (whether malicious or otherwise), use the rexec restricted execution mode tools we met in Chapter 18 to evaluate strings. Alternatively, you can simply limit the kinds of expressions allowed and evaluate them with simpler tools (e.g., int, str) or store all data as strings.
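The display/entry round trip can be sketched directly. In this sketch (the field data is made up), ast.literal_eval stands in as a safer modern alternative to raw eval when only literal values need to be accepted:

```python
import ast

record = {'shoesize': {'left': 9.5, 'right': 10}, 'language': ['Python', 'C']}

# display side: repr yields as-code text, with quotes around strings
shown = {field: repr(value) for field, value in record.items()}
print(shown['language'])        # the list rendered as code text

# entry side: parse typed text back into objects; eval would run
# arbitrary code, so restrict to literals where that is all you need
typed = "{'left': 9.5, 'right': 10}"
value = ast.literal_eval(typed)
print(value['right'])           # 10
```

Unlike eval, literal_eval rejects anything but constants, so a field containing a function call raises an error instead of executing it.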
19.8.7. Browsing Other Kinds of Objects with PyForm

Although PyForm expects to find a dictionary-of-dictionaries interface (protocol) in the tables it browses, a surprising number of objects fit this mold because dictionaries are so pervasive in Python object internals. In fact, PyForm can be used to browse things that have nothing to do with the notion of database tables of records at all, as long as they can be made to conform to the protocol. For instance, the Python sys.modules table we met in Chapter 3 is a built-in dictionary of loaded module objects. With an appropriate wrapper class to make modules look like dictionaries, there's no reason we can't browse the in-memory sys.modules with PyForm too, as shown in Example 19-29.
Example 19-29. PP3E\Dbase\TableBrowser\viewsysmod.py
# view the sys.modules table in FormGui

class modrec:
    def todict(self, value):
        return value.__dict__            # not dir(value): need dict
    def fromdict(self, value):
        assert 0, 'Module updates not supported'

import sys
from formgui import FormGui
from formtable import Table
FormGui(Table(sys.modules, modrec())).mainloop()
This script defines a class to pull out a module's __dict__ attribute dictionary (formtable's InstanceRecord won't do, because it also looks for a __class__). The rest of it simply passes sys.modules to PyForm (FormGui) wrapped in a Table object; the result appears in Figure 19-4.
With similar record and table wrappers, all sorts of objects could be viewed in PyForm. As usual in Python, all that matters is that they provide a compatible interface.
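For instance, a record wrapper in the same spirit as the modrec class above can make any attribute-bearing object browsable. A sketch (the ObjectRecord name is invented for illustration, not part of the book's formtable module):

```python
import math

class ObjectRecord:
    # convert an object's attribute namespace to a dict for display;
    # updates are refused, as in the modrec class for modules
    def todict(self, value):
        return value.__dict__           # not dir(value): need a real dict
    def fromdict(self, value):
        raise AssertionError('updates not supported')

rec = ObjectRecord()
fields = rec.todict(math)               # modules work: __dict__ holds names
print('pi' in fields)                   # math.pi shows up as a field
```

Any object whose interesting state lives in its __dict__ can be adapted this way; only objects using __slots__ or C-level storage fall outside the pattern.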
19.8.8. Browsing Other Kinds of Databases with PyForm

In fact, with just a little creativity, we could also write table wrappers that allow the PyForm GUI to view objects in ZODB databases and records in SQL databases, third-party systems we studied earlier in this chapter:

ZODB should be simple: it is an access-by-key storage medium with a dictionary-like interface similar to shelves. We would need to provide a close method that commits changes, though, since the table wrapper protocol expects one.

SQL databases would be more challenging, since they are composed of tables of rows, not objects stored under unique keys. We could, however, define a column to be the unique key values for records in a table and run SQL queries to fetch by key on indexing.

In deference to space, we'll leave the second of these extensions as a suggested exercise. The first is straightforward: Example 19-30 launches the PyForm GUI to browse the ZODB people database we used as an example earlier in this chapter. This script works; it allows you to use the GUI to browse and update persistent class instances stored in a ZODB object database. But it suffers from some innate limitations in the GUI's design. As coded, PyForm doesn't support instances of more than one class in the database, and it has no way to call class methods. More subtly, PyForm assumes that instances either are created from a class with no nondefault constructor arguments or support __class__ attribute assignments (its code tries both schemes to re-create the instance from its dictionary-based representation). The former of these constraints was not coded in the original class, and the latter did not work for classes derived from ZODB persistence classes when this script was tested.

Because of these constraints, the test script in Example 19-30 uses an empty class to initialize the database: since methods and derived subclasses aren't yet supported, classes in PyForm are little more than flat attribute namespaces. As currently coded, PyForm does not leverage the full power of Python classes: any methods they contain may still be called by code outside the context of the PyForm GUI, but they have no purpose within it. We'll explore some of these design issues in more detail in the next section. Perhaps just as remarkable as its flaws, though, is the fact that PyForm can be used on a ZODB database at all: by encapsulating the database behind a common object interface, it supports any conforming object.
Example 19-30. PP3E\Database\ZODBscripts\viewzodb.py
##########################################################
# view the person ZODB database in PyForm's FormGui;
# FileDB maps indexing to db root, close does commit;
# caveat 1: FormGui doesn't yet allow mixed class types;
# caveat 2: FormGui has no way to call class methods;
# caveat 3: Persistent subclasses don't allow __class__
# to be set: must have defaults for all __init__ args;
# Person here works only if always defined in __main__;
##########################################################

import sys
filename = 'data/people-simple.fs'

from zodbtools import FileDB
from PP3E.Dbase.TableBrowser.formgui import FormGui
from PP3E.Dbase.TableBrowser.formtable import Table, InstanceRecord

class Person: pass

initrecs = {'bob': dict(name='bob', job='devel', pay=30),
            'sue': dict(name='sue', job='music', pay=40)}

dbtable = Table(FileDB(filename), InstanceRecord(Person))
if len(sys.argv) > 1:                    # "viewzodb.py -" inits db
    for key in dbtable.keys():           # "viewzodb.py"   browses db
        del dbtable[key]
    dbtable.storeItems(initrecs)

FormGui(dbtable).mainloop()
dbtable.printItems()
dbtable.close()
Run this code on your machine to see its windows; they are exactly like those we've seen before, but the records browsed are objects that reside in a ZODB database instead of a shelve.
19.8.9. PyForm Limitations

Although the sys.modules and ZODB viewer scripts of the last two sections work, they highlight a few limitations of PyForm's current design:
Two levels only
PyForm is set up to handle a two-dimensional table/record-mapping structure only. You can't descend further into fields shown in the form; large data structures in fields print as long strings, and complex objects such as nested modules, classes, and functions that contain attributes of their own simply show their default print representation. We could add object viewers to inspect nested objects interactively, but they might be complex to code.
No big forms
PyForm is not equipped to handle a large number of record fields; if you select the os module's entry in the index listbox in Figure 19-4, you'll get a huge form that is likely too big to even fit on your screen (the os module has lots and lots of attributes; it goes off my screen after about 40). We could fix this with a scroll bar, but it's unlikely that records in the databases that PyForm was designed to view will have many dozens of fields.
Data attributes only
PyForm displays record attribute values, but it does not support calling method functions of objects being browsed and cannot display dynamically computed attributes (e.g., the tax attribute in Person objects). Since some class methods require arguments to be passed, an additional interface would be necessary; required arguments could be extracted from the method function itself (hint: see built-in function and code attributes such as function.func_code.co_argcount).
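The hint about extracting required arguments can be sketched directly. Note that this book's examples run on Python 2, where the code object is spelled function.func_code; current Pythons spell it __code__, as used below (the method itself is hypothetical):

```python
class Person:
    def giveRaise(self, percent, bonus=0):     # hypothetical method
        self.pay = self.pay * (1 + percent) + bonus

func = Person.giveRaise
code = func.__code__                 # func.func_code in Python 2
print(code.co_argcount)              # positional args, self included
print(code.co_varnames[:code.co_argcount])
```

A method-calling GUI could use these attributes to build an argument-entry form before invoking the method on the browsed object.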
One class per table
PyForm currently assumes all instances in a table are of the same class, even though that's not a requirement for shelves in general.

New-style classes with __slots__ don't work
As coded, PyForm may not currently support some instances of new-style classes. In particular, new-style classes with a __slots__ attribute may not have a __dict__ namespace dictionary and so will not work in PyForm (slots save the space normally taken by the instance __dict__, and may be fetched more quickly). This same restriction currently exists in the Python pickle module, though: a class that defines __slots__ without defining __getstate__ (called to return a state to pickle) cannot be pickled, so this is not an additional constraint imposed by the GUI. Supporting __slots__ in addition to __dict__ may be possible, but we leave this as an exercise (this may require a class tree climb to collect all __slots__ lists in all superclasses, or inspecting the result of a dir call).
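The suggested class-tree climb might look like the following sketch (current Python syntax; the all_slots helper is invented for illustration):

```python
class Base:
    __slots__ = ['a', 'b']          # no per-instance __dict__

class Sub(Base):
    __slots__ = ['c']

def all_slots(cls):
    # climb the class tree collecting every __slots__ list
    names = []
    for klass in cls.__mro__:
        names.extend(getattr(klass, '__slots__', ()))
    return names

x = Sub()
x.a, x.c = 1, 3
print(hasattr(x, '__dict__'))       # no namespace dict for PyForm to fetch
print(sorted(all_slots(Sub)))       # every slot name, superclasses included
```

With the collected names, a browser could fall back on getattr calls instead of reading the instance __dict__ directly.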
Wrapping protocol alternatives
In some cases, it may be possible to avoid the to/from dictionary conversion for class instances browsed. The trick would be to wrap records rather than tables. This would almost allow us to get rid of the Table wrapper class completely for this use case: the GUI could browse either a shelve of instances or a shelve of dictionaries directly, with no conversions. It would not, however, handle other use cases (e.g., DBM files of evaluated strings), and it might turn out to be more complex than the current general dictionary-based scheme, due to extra special cases.

The last item in the preceding list is a subtle design point, and it merits some additional explanation. PyForm currently overloads table index fetch and assignment, and the GUI internally uses dictionaries to represent records. Fetches assume a dictionary-like object comes back, and stores make a new dictionary object (or use the current one), fill it out, and pass it off to the Table wrapper for conversion to the table's underlying record implementation. When browsing tables of instances, the fetch conversion is trivial (we use the instance's __dict__ directly), but stores must create and fill out a new instance.

It would be almost as easy to overload record field index fetch and assignment instead, to avoid converting dictionaries to instances, and possibly avoid the Table wrapper layer. In this scheme, records held in PyForm might be whatever object the table stores (not necessarily dictionaries), and each record field fetch or assignment in PyForm would be routed back to record wrapper classes. For example, by wrapping instance records in a class that maps dictionary field indexing to class attributes with __getitem__ and __setitem__ overload methods, the GUI might browse actual class instance objects.
These two overload methods would simply call the getattr and setattr built-in functions to access the attribute corresponding to the key by string name, and the keys call in the GUI used to extract field names could be mapped by the record wrapper to the instance __dict__. The trickiest part of this scheme is that the GUI would have to know how to make a new empty record before filling its fields; this would likely require that the GUI have knowledge of the concrete type of the record (dictionary or instance, as well as the class if it is an instance) or use of a Table wrapper with a customizable method for creating a new empty record. By building and filling dictionaries, the GUI currently finesses this issue completely and delegates it to the customized table and record wrappers.

There are also a few substantial downsides to this approach. For one, PyForm could not browse any instance object unless it inherits from the record wrapper class or is wrapped up in one automatically by a Table interface class on fetches and stores. For another, Table also has some additional interfaces not provided by shelves, which we would have to code elsewhere. This scheme might also preclude use of indexing overload methods in the record class itself, though the GUI itself does not support such operations anyhow. Most significantly, this model would not transparently handle other use cases, such as string-based records. Cases requiring conversion with eval and str, for instance, would not fit the new model at all: DBM files that map whole records to strings might require complex special-case logic to handle field-at-a-time requests or fall back to converting from and to dictionaries on fetches and stores, as is currently done. Because of such exceptions, we would probably wind up with a Table wrapper anyhow, unless we limit the GUI's use cases. Generating a new empty record just by itself varies so much per record kind that we need a class hierarchy to customize the operation.
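A tiny sketch of this record-wrapping idea follows; the AttrRecord class and its method names are invented for illustration, not part of the book's code:

```python
class AttrRecord:
    # map dictionary-style field indexing onto instance attributes
    def __init__(self, obj):
        self.obj = obj
    def __getitem__(self, field):
        return getattr(self.obj, field)      # record['name'] -> obj.name
    def __setitem__(self, field, value):
        setattr(self.obj, field, value)      # record['name'] = value
    def keys(self):
        return list(self.obj.__dict__)       # field names for the GUI

class Person:
    def __init__(self, name, job):
        self.name, self.job = name, job

rec = AttrRecord(Person('bob', 'devel'))
print(rec['name'])              # fetched via getattr
rec['job'] = 'manager'          # assignment routed back to the instance
print(rec.obj.job)
```

A GUI coded against dictionary-style indexing could then operate on real instances through the wrapper, with no to/from dictionary copies.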
In the end, it may be easier to use dictionaries in all cases and convert from that where needed, as PyForm currently does. In other words, there is room for improvement if you care to experiment. On the other hand, extensions in this domain are somewhat open-ended, so we'll leave them as suggested exercises. PyForm was designed to view mappings of mappings and was never meant to be a general Python
object viewer. But as a simple GUI for tables of persistent objects, it meets its design goals as planned. Python's shelves and classes make such systems both easy to code and powerful to use. Complex data can be stored and fetched in a single step, as well as augmented with methods that provide dynamic record behavior. As an added bonus, by programming such programs in Python and Tkinter, they are automatically portable among all major GUI platforms. When you mix Python persistence and GUIs, you get a lot of features "for free."
Chapter 20. Data Structures

Section 20.1. "Roses Are Red, Violets Are Blue; Lists Are Mutable, and So Is Set Foo"
Section 20.2. Implementing Stacks
Section 20.3. Implementing Sets
Section 20.4. Subclassing Built-In Types
Section 20.5. Binary Search Trees
Section 20.6. Graph Searching
Section 20.7. Reversing Sequences
Section 20.8. Permuting Sequences
Section 20.9. Sorting Sequences
Section 20.10. Data Structures Versus Python Built-Ins
Section 20.11. PyTree: A Generic Tree Object Viewer
20.1. "Roses Are Red, Violets Are Blue; Lists Are Mutable, and So Is Set Foo"

Data structures are a central theme in most programs, whether you know it or not. It may not always be obvious because Python provides a set of built-in types that make it easy to deal with structured data: lists, strings, tuples, dictionaries, and the like. For simple systems, these types are usually enough. Technically, dictionaries make many of the classical searching algorithms unnecessary in Python, and lists replace much of the work you'd do to support collections in lower-level languages. Both are so easy to use, though, that you generally never give them a second thought.

But for advanced applications, we may need to add more sophisticated types of our own to handle extra requirements. In this chapter, we'll explore a handful of advanced data structure implementations in Python: sets, stacks, graphs, and so on. As we'll see, data structures take the form of new object types in Python, integrated into the language's type model. That is, objects we code in Python become full-fledged datatypes: to the scripts that use them, they can look and feel just like built-in lists, numbers, and dictionaries.

Although the examples in this chapter illustrate advanced programming techniques, they also underscore Python's support for writing reusable software. By coding object implementations with classes, modules, and other Python tools, they naturally become generally useful components, which may be used in any program that imports them. In effect, we will be building libraries of data structure classes, whether we plan for it or not.

In addition, although the examples in this chapter are pure Python code, we will also be building a path toward the next part of the book here. From the most general perspective, new Python objects can be implemented in either Python or an integrated language such as C.
In particular, pay attention to the stack objects implemented in the first section of this chapter; they will later be reimplemented in C to gauge both the benefits and the complexity of C migration.
20.2. Implementing Stacks

Stacks are a common and straightforward data structure, used in a variety of applications: language processing, graph searches, and so on. In short, stacks are a last-in-first-out collection of objects: the last item added to the collection is always the next one to be removed. Clients use stacks by:

Pushing items onto the top
Popping items off the top

Depending on client requirements, there may also be tools for such tasks as testing whether the stack is empty, fetching the top item without popping it, iterating over a stack's items, testing for item membership, and so on.

In Python, a simple list is often adequate for implementing a stack: because we can change lists in place, we can add and delete items from either the beginning (left) or the end (right). Table 20-1 summarizes various built-in operations available for implementing stack-like behavior with Python lists, depending on whether the stack "top" is the first or the last node in the list. In this table, the string 'c' is the top item on the stack.
Table 20-1. Stacks as lists

Operation   Top is end-of-list       Top is front-of-list     Top is front-of-list
New         stack=['a','b','c']      stack=['c','b','a']      stack=['c','b','a']
Push        stack.append('d')        stack.insert(0,'d')      stack[0:0] = ['d']
Pop         x = stack[-1];           x = stack[0];            x = stack[0];
            del stack[-1]            del stack[:1]            stack[:1] = []
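The three codings behave identically; a quick check of each column's push and pop:

```python
# top is end-of-list
stack = ['a', 'b', 'c']
stack.append('d')                  # push
x = stack[-1]; del stack[-1]       # pop
assert (x, stack) == ('d', ['a', 'b', 'c'])

# top is front-of-list
stack = ['c', 'b', 'a']
stack.insert(0, 'd')               # push
y = stack[0]; del stack[:1]        # pop
assert (y, stack) == ('d', ['c', 'b', 'a'])

# top is front-of-list, slice coding
stack = ['c', 'b', 'a']
stack[0:0] = ['d']                 # push
z = stack[0]; stack[:1] = []       # pop
assert (z, stack) == ('d', ['c', 'b', 'a'])

print(x, y, z)
```

In each column, the pop returns the pushed 'd' and leaves the stack as it was before the push.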
Other coding schemes are possible as well. For instance, Python 1.5 introduced a list pop method designed to be used in conjunction with append to implement stacks: to push, run list.append(value), and to pop, run x = list.pop(). By default, the pop method is equivalent to fetching and then deleting the last item, at offset -1 (and is equivalent to the two statements in the last row in column 1 of Table 20-1). With an argument, pop deletes and returns the item at that offset: list.pop(0) is the same as the table's last rows in columns 2 and 3. And del stack[0] is yet another way to delete the first item in a list-based stack.

This list arrangement works and will be relatively fast. But it also binds stack-based programs to the stack representation chosen: all stack operations will be hardcoded. If we later want to change how a stack is represented or extend its basic operations, we're stuck. Every stack-based program will have to be updated. For instance, to add logic that monitors the number of stack operations a program performs, we'd have to add code around each hardcoded stack operation. In a large system, this operation may be nontrivial. As we'll see in the next part of the book, we may also decide to move stacks to a C-based
implementation, if they prove to be a performance bottleneck. As a general rule, hardcoded operations on built-in data structures don't support future migrations as well as we'd sometimes like. Built-in types such as lists are actually class-like objects in Python that we can subclass to customize. Unless we anticipate future changes and make instances of a subclass, though, we still have a maintenance issue if we use built-in list operations and ever want to extend what they do in the future (more on subclassing built-in types later in this chapter).
20.2.1. A Stack Module

Perhaps a better approach is to encapsulate (that is, wrap up) stack implementations behind interfaces, using Python's code reuse tools. As long as clients stick to using the interfaces, we're free to change the interfaces' implementations arbitrarily without having to change every place they are called. Let's begin by implementing a stack as a module containing a Python list, plus functions to operate on it (see Example 20-1).
Example 20-1. PP3E\Dstruct\Basic\stack1.py
stack = []                                    # on first import
class error(Exception): pass                  # local excs, stack1.error

def push(obj):
    global stack                              # use 'global' to change
    stack = [obj] + stack                     # add item to the front

def pop():
    global stack
    if not stack:
        raise error, 'stack underflow'        # raise local error
    top, stack = stack[0], stack[1:]          # remove item at front
    return top

def top():
    if not stack:
        raise error, 'stack underflow'        # raise local error
    return stack[0]                           # or let IndexError occur

def empty():      return not stack            # is the stack []?
def member(obj):  return obj in stack         # item in stack?
def item(offset): return stack[offset]        # index the stack
def length():     return len(stack)           # number entries
def dump():       print '<Stack:%s>' % stack
This module creates a list object ( stack) and exports functions to manage access to it. The stack is declared global in functions that change it, but not in those that just reference it. The module also defines an error object (error) that can be used to catch exceptions raised locally in this module. Some stack errors are built-in exceptions: the method item triggers IndexError for out-of-bounds indexes.
Most of the stack's functions just delegate the operation to the embedded list used to represent the stack. In fact, the module is really just a wrapper around a Python list. But this extra layer of interface logic makes clients independent of the actual implementation of the stack. So, we're able to change the stack later without impacting its clients. As usual, one of the best ways to understand such code is to see it in action. Here's an interactive session that illustrates the module's interfaces:
C:\...\PP3E\Dstruct\Basic>python
>>> import stack1
>>> stack1.push('spam')
>>> stack1.push(123)
>>> stack1.top()
123
>>> stack1.stack
[123, 'spam']
>>> stack1.pop()
123
>>> stack1.dump()
<Stack:['spam']>
>>> stack1.pop()
'spam'
>>> stack1.empty()
1
>>> for c in 'spam': stack1.push(c)
...
>>> while not stack1.empty():
...     print stack1.pop(),
...
m a p s
>>> stack1.pop()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "stack1.py", line 11, in pop
    raise error, 'stack underflow'
stack1.error: stack underflow
Other operations are analogous, but the main thing to notice here is that all stack operations are module functions. For instance, it's possible to iterate over the stack, but we need to use counter loops and indexing function calls (item). Nothing is preventing clients from accessing (and changing) stack1.stack directly, but doing so defeats the purpose of interfaces like this one.
20.2.2. A Stack Class

Perhaps the biggest drawback of the module-based stack is that it supports only a single stack object. All clients of the stack module effectively share the same stack. Sometimes we want this feature: a stack can serve as a shared-memory object for multiple modules. But to implement a true stack datatype, we need to use classes.

To illustrate, let's define a full-featured stack class. The Stack class shown in Example 20-2 defines a new datatype with a variety of behaviors. Like the module, the class uses a Python list to hold stacked objects. But this time, each instance gets its own list. The class defines both "real" methods and specially named methods that implement common type operations. Comments in the code describe special methods.
Example 20-2. PP3E\Dstruct\Basic\stack2.py
class error(Exception): pass                 # when imported: local exception

class Stack:
    def __init__(self, start=[]):            # self is the instance object
        self.stack = []                      # start is any sequence: stack..
        for x in start: self.push(x)
        self.reverse()                       # undo push's order reversal

    def push(self, obj):                     # methods: like module + self
        self.stack = [obj] + self.stack      # top is front of list

    def pop(self):
        if not self.stack: raise error, 'underflow'
        top, self.stack = self.stack[0], self.stack[1:]
        return top

    def top(self):
        if not self.stack: raise error, 'underflow'
        return self.stack[0]

    def empty(self):
        return not self.stack

    def reverse(self):
        self.stack.reverse()

    def __repr__(self):
        return '[Stack:%s]' % self.stack     # print, backquotes,..

    def __cmp__(self, other):
        return cmp(self.stack, other.stack)  # '==', '>', '<=',..

20.2.3. Customization: Performance Monitors

Classes also support customization by subclassing. For instance, a StackLog subclass of Stack can count every push and pop and record each stack's maximum size. Here is such a subclass in action:

>>> from stacklog import StackLog
>>> x = StackLog()
>>> y = StackLog()                        # make two stack objects
>>> for i in range(3): x.push(i)          # and push objects on them
...
>>> for c in 'spam': y.push(c)
...
>>> x, y                                  # run inherited __repr__
([Stack:[2, 1, 0]], [Stack:['m', 'a', 'p', 's']])
>>> x.stats(), y.stats()
((3, 7, 0), (4, 7, 0))
>>>
>>> y.pop(), x.pop()
('m', 2)
>>> x.stats(), y.stats()                  # my maxlen, all pushes, all pops
((3, 7, 2), (4, 7, 2))
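The StackLog subclass driven in this session is not reproduced in this extract; the sketch below (current Python syntax, with a minimal stand-in for the base Stack class) is consistent with the session's numbers but is not the book's exact stacklog.py listing:

```python
class Stack:
    # minimal stand-in base class: top is front of list
    def __init__(self, start=()):
        self.stack = []
        for x in start: self.push(x)
        self.stack.reverse()
    def push(self, obj):
        self.stack = [obj] + self.stack
    def pop(self):
        top, self.stack = self.stack[0], self.stack[1:]
        return top
    def __len__(self):
        return len(self.stack)
    def __repr__(self):
        return '[Stack:%s]' % self.stack

class StackLog(Stack):
    pushes = pops = 0                      # class attributes: shared counts
    def __init__(self, start=()):
        self.maxlen = 0                    # instance attribute: per stack
        Stack.__init__(self, start)
    def push(self, obj):
        Stack.push(self, obj)
        StackLog.pushes += 1               # update the shared counter
        self.maxlen = max(self.maxlen, len(self))
    def pop(self):
        StackLog.pops += 1
        return Stack.pop(self)
    def stats(self):
        return self.maxlen, self.pushes, self.pops

x = StackLog()
y = StackLog()
for i in range(3): x.push(i)
for c in 'spam': y.push(c)
print(x.stats(), y.stats())        # maxlen differs, push count is shared
```

Because pushes and pops live on the class, both instances report the same totals, while maxlen is recorded per instance.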
Notice the use of class attributes to record overall pushes and pops, and instance attributes for per-instance maximum length. By hanging attributes on different objects, we can expand or narrow their scopes.
20.2.4. Optimization: Tuple Tree Stacks

One of the nice things about wrapping objects up in classes is that you are free to change the underlying implementation without breaking the rest of your program. Optimizations can be added in the future, for instance, with minimal impact; the interface is unchanged, even if the internals are. There are a variety of ways to implement stacks, some more efficient than others. So far, our stacks have used slicing and concatenation to implement pushing and popping. This method is relatively inefficient: both operations make copies of the wrapped list object. For large stacks, this practice can add a significant time penalty.

One way to speed up such code is to change the underlying data structure completely. For example, we can store the stacked objects in a binary tree of tuples: each item may be recorded as a pair, (object, tree), where object is the stacked item and tree is either another tuple pair giving the rest of the stack or None to designate an empty stack. A stack of items [1,2,3,4] would be internally stored as a tuple tree (1,(2,(3,(4,None)))).

This tuple-based representation is similar to the notion of "cons cells" in Lisp-family languages: the object on the left is the car, and the rest of the tree on the right is the cdr. Because we add or remove only a top tuple to push and pop items, this structure avoids copying the entire stack. For large stacks, the benefit might be significant. The next class, shown in Example 20-4, implements these ideas.
Example 20-4. PP3E\Dstruct\Basic\stack3.py
class Stack:
    def __init__(self, start=[]):          # init from any sequence
        self.stack = None                  # even other (fast)stacks
        for i in range(-len(start), 0):
            self.push(start[-i - 1])       # push in reverse order

    def push(self, node):
        self.stack = (node, self.stack)    # new root tuple: (node, tree)

    def pop(self):
        node, self.stack = self.stack      # remove root tuple
        return node

    def empty(self):
        return not self.stack

    def __len__(self):                     # on: len, not
        len, tree = 0, self.stack
        while tree:
            len, tree = len + 1, tree[1]   # visit right subtrees
        return len

    def __getitem__(self, index):          # on: x[i], in, for
        len, tree = 0, self.stack
        while len < index and tree:        # visit/count nodes
            len, tree = len + 1, tree[1]
        if tree:
            return tree[0]
        else:
            raise IndexError               # IndexError if out-of-bounds
                                           # so 'in' and 'for' stop

    def __repr__(self):
        return '[FastStack:' + repr(self.stack) + ']'
This class's __getitem__ method handles indexing, in tests, and for loop iteration as before, but this version has to traverse a tree to find a node by index. Notice that this isn't a subclass of the original Stack class. Since nearly every operation is implemented differently here, inheritance won't really help. But clients that restrict themselves to the operations that are common to both classes can still use them interchangeably; they just need to import a stack class from a different module to switch implementations. Here's a session with this stack version; as long as we stick to pushing, popping, indexing, and iterating, this version is essentially indistinguishable from the original:
>>> from stack3 import Stack
>>> x = Stack()
>>> y = Stack()
>>> for c in 'spam': x.push(c)
...
>>> for i in range(3): y.push(i)
...
>>> x
[FastStack:('m', ('a', ('p', ('s', None))))]
>>> y
[FastStack:(2, (1, (0, None)))]
20.2.5. Optimization: In-Place List Modifications

Perhaps a better way to speed up the stack object, though, is to fall back on the mutability of Python's list object. Because lists can be changed in place, they can be modified more quickly than any of the prior examples. In-place change operations such as append are prone to complications when a list is referenced from more than one place. But because the list inside the stack object isn't meant to be used directly, we're probably safe. The module in Example 20-5 shows one way to implement a stack with in-place changes; some operator overloading methods have been dropped to keep this simple. The new Python pop method it uses is equivalent to indexing and deleting the item at offset -1 (the top is the end of the list here).
Example 20-5. PP3E\Dstruct\Basic\stack4.py
class error(Exception): pass                 # when imported: local exception

class Stack:
    def __init__(self, start=[]):            # self is the instance object
        self.stack = []                      # start is any sequence: stack...
        for x in start:
            self.push(x)

    def push(self, obj):                     # methods: like module + self
        self.stack.append(obj)               # top is end of list

    def pop(self):
        if not self.stack:
            raise error('underflow')
        return self.stack.pop()              # like fetch and delete stack[-1]

    def top(self):
        if not self.stack:
            raise error('underflow')
        return self.stack[-1]

    def empty(self):
        return not self.stack

    def __len__(self):          return len(self.stack)
    def __getitem__(self, i):   return self.stack[i]
    def __repr__(self):         return '[Stack:' + repr(self.stack) + ']'
This version works like the original in module stack2 too; just replace stack2 with stack4 in the previous interaction to get a feel for its operation. The only obvious difference is that stack items are in reverse when printed (i.e., the top is the end):
>>> from stack4 import Stack
>>> x = Stack()
>>> x.push('spam')
>>> x.push(123)
>>> x
[Stack:['spam', 123]]
>>>
>>> y = Stack()
>>> y.push(3.1415)
>>> y.push(x.pop())
>>> x, y
([Stack:['spam']], [Stack:[3.1415, 123]])
>>> y.top()
123
20.2.6. Timing the Improvements

The in-place-change stack object probably runs faster than both the original and the tuple-tree versions, but the only way to really be sure how much faster is to time the alternative implementations. Since this could be something we'll want to do more than once, let's first define a general module for timing functions in Python. In Example 20-6, the built-in time module provides a clock function that we can use to get the current CPU time in floating-point seconds, and the function timer.test simply calls a function reps times and returns the number of elapsed seconds by subtracting stop from start CPU times.
Example 20-6. PP3E\Dstruct\Basic\timer.py
def test(reps, func, *args):
    import time
    start = time.clock()                     # current CPU time in float seconds
    for i in xrange(reps):                   # call function reps times
        func(*args)                          # discard any return value
    return time.clock() - start              # stop time - start time
Next, we define a test driver script (see Example 20-7). It expects three command-line arguments: the number of pushes, pops, and indexing operations to perform (we'll vary these arguments to test different scenarios). When run at the top level, the script creates 200 instances of the original and optimized stack classes and performs the specified number of operations on each stack.[*] Pushes and pops change the stack; indexing just accesses it.

[*]
If you have a copy of the first edition of this book lying around, you might notice that I had to scale all test factors way up to get even close to the run times I noticed before. Both Python and chips have gotten a lot faster in five years.
Example 20-7. PP3E\Dstruct\Basic\stacktime.py
import stack2                                # list-based stacks: [x]+y
import stack3                                # tuple-tree stacks: (x,y)
import stack4                                # in-place stacks: y.append(x)
import timer                                 # general function timer

rept = 200
from sys import argv
pushes, pops, items = eval(argv[1]), eval(argv[2]), eval(argv[3])

def stackops(stackClass):
    #print stackClass.__module__
    x = stackClass('spam')                   # make a stack object
    for i in range(pushes): x.push(i)        # exercise its methods
    for i in range(items):  t = x[i]
    for i in range(pops):   x.pop()

print 'stack2:', timer.test(rept, stackops, stack2.Stack)
print 'stack3:', timer.test(rept, stackops, stack3.Stack)
print 'stack4:', timer.test(rept, stackops, stack4.Stack)
20.2.6.1. Results under Python 1.5.2

Here are some of the timings reported by the test driver script. The three outputs represent the measured run times in seconds for the original, tuple, and in-place stacks. For each stack type, the first test creates 200 stack objects and performs roughly 120,000 stack operations (200 repetitions x (200 pushes + 200 indexes + 200 pops)) in the test duration times listed. These results were obtained on a 650 MHz Pentium III Windows machine and a Python 1.5.2 install:
If you look closely, you'll notice that the results show that the tuple-based stack (stack3) performs better when we do more pushing and popping, but worse if we do much indexing. Indexing is extremely fast for built-in lists (stack2 and stack4), but very slow for tuple trees: the Python class must traverse the tree manually. The in-place-change stacks (stack4) are almost always fastest, unless no indexing is done at all; tuples won by a hair in the last test case. When there is no indexing (the last test), the tuple-based and in-place-change stacks are roughly four and three times quicker than the simple list-based stack, respectively. Since pushes and pops are most of what clients would do to a stack, tuples are a contender, despite their poor indexing performance. Of course, we're talking about fractions of a second after many tens of thousands of operations; in many applications, your users probably won't care either way. If you access a stack millions of times in your program, though, this difference may accumulate to a significant amount of time.
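The gist of this comparison can be reproduced today with the standard timeit module; the statements below are illustrative stand-ins, not the book's test driver. Pushing by concatenation copies the ever-growing list, while consing a tuple does constant work per push.

```python
# Compare a concatenation-style push with a tuple-tree push.
import timeit

# push 2,000 items by list concatenation (copies the list each time)
concat = timeit.timeit('s = [0] + s', setup='s = []', number=2000)

# push 2,000 items by building tuple pairs (no copying)
cons = timeit.timeit('s = (0, s)', setup='s = None', number=2000)

print(concat, cons)   # absolute numbers vary per machine and Python release
```

On most machines the concatenation total grows much faster than the consing total as the push count rises, which is the effect the table above measures.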
20.2.6.2. Results under Python 2.4

Performance results like those of the prior section are prone to change from release to release in Python, because ongoing optimization work always finds its way into the interpreter over time. For the third edition of this book, I reran the tests of the prior section on a machine that was roughly twice as fast (1.2 GHz), and under Python 2.4. The absolute numbers are very different, though the relative results are similar:
This time, if you study the results long enough, you'll notice that the relative performance of stack2 (simple lists) and stack4 (in-place list changes) is roughly the same: the in-place list stack is usually about three times quicker again, regardless of the amount of indexing going on (which makes sense, given that the two list-based stacks index at the same speed). And as before, in the last test, when there is no indexing, as is common for stacks, the tuple-based stack3 still performs best of all three: roughly four times better than simple lists, and slightly better than in-place lists. The results, though, seem to reflect the fact that all of the stack code has been optimized in Python itself since this book's prior edition. All three stacks are roughly four times faster today, likely reflecting a 2X boost in hardware, plus a 2X boost in Python itself. In this case, the relative performance results are similar; but in other cases, such optimizations may invalidate conclusions derived from tests run under previous Python releases.[*]

[*]
Trust me on this. I once made a sweeping statement in another book about map and list comprehensions being twice as fast as for loops, only to be made wrong by a later Python release that optimized the others much more than map, except in select use cases. Performance measurement in Python is an ongoing task.
The short story here is that you must collect timing data for your code, on your machine, and under your version of Python. All three factors can skew results arbitrarily. In the next section, we'll see a more dramatic version impact on the relative performance of set alternatives, and we'll learn how to use the Python profiler to collect performance data in a more generic fashion.
20.3. Implementing Sets

Another commonly used data structure is the set, a collection of objects that support operations such as:
Intersection Make a new set with all items in common.
Union Make a new set with all items in either operand.
Membership
Test whether an item exists in a set.

And there are others, depending on the intended use. Sets come in handy for dealing with more abstract group combinations. For instance, given a set of engineers and a set of writers, you can pick out individuals who do both activities by intersecting the two sets. A union of such sets would contain either type of individual, but would include any given individual only once. Python lists, tuples, and strings come close to the notion of a set: the in operator tests membership, for iterates, and so on. Here, we add operations not directly supported by Python sequences. The idea is that we're extending built-in types for unique requirements.
20.3.1. Set Functions

As before, let's first start out with a function-based set manager. But this time, instead of managing a shared set object in a module, let's define functions to implement set operations on passed-in Python sequences (see Example 20-8).
Example 20-8. PP3E\Dstruct\Basic\inter.py
def intersect(seq1, seq2):
    res = []                                 # start with an empty list
    for x in seq1:                           # scan the first sequence
        if x in seq2:
            res.append(x)                    # add common items to the end
    return res

def union(seq1, seq2):
    res = list(seq1)                         # make a copy of seq1
    for x in seq2:                           # add new items in seq2
        if not x in res:
            res.append(x)
    return res
These functions work on any type of sequence: lists, strings, tuples, and other objects that conform to the sequence protocols expected by these functions (for loops, in membership tests). In fact, we can even use them on mixed object types: the last two commands in the following code compute the intersection and union of a list and a tuple. As usual in Python, the object interface is what matters, not the specific types:
Notice that the result is always a list here, regardless of the type of sequences passed in. We could work around this by converting types or by using a class to sidestep this issue (and we will in a moment). But type conversions aren't clear-cut if the operands are mixed-type sequences. Which type do we convert to?
20.3.1.1. Supporting multiple operands

If we're going to use the intersect and union functions as general tools, one useful extension is support for multiple arguments (i.e., more than two). The functions in Example 20-9 use Python's variable-length argument lists feature to compute the intersection and union of arbitrarily many operands.
Example 20-9. PP3E\Dstruct\Basic\inter2.py
def intersect(*args):
    res = []
    for x in args[0]:                        # scan the first list
        for other in args[1:]:               # for all other arguments
            if x not in other: break         # this item in each one?
        else:
            res.append(x)                    # add common items to the end
    return res

def union(*args):
    res = []
    for seq in args:                         # for all sequence-arguments
        for x in seq:                        # for all nodes in argument
            if not x in res:
                res.append(x)                # add new items to result
    return res
The multioperand functions work on sequences in the same way as the original functions, but they support three or more operands. Notice that the last two examples in the following session work on lists with embedded compound objects: the in tests used by the intersect and union functions apply equality testing to sequence nodes recursively, as deep as necessary to determine collection comparison results:
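For instance, the multioperand versions behave as follows (restated here in Python 3 syntax so the snippet is self-contained):

```python
# Multiple-operand set functions, plus the recursive 'in' equality tests.

def intersect(*args):
    res = []
    for x in args[0]:                  # scan the first sequence
        for other in args[1:]:         # is x in every other operand?
            if x not in other:
                break
        else:
            res.append(x)
    return res

def union(*args):
    res = []
    for seq in args:                   # add new items from each operand
        for x in seq:
            if x not in res:
                res.append(x)
    return res

print(intersect('abcd', 'bcde', 'cdef'))     # ['c', 'd']
print(union([1, 2], [2, 3], [3, 4]))         # [1, 2, 3, 4]
print(intersect([[1], [2]], [[2], [3]]))     # [[2]]: nested equality works
```

The last call shows the point made above: the in test compares the embedded lists element by element, so compound objects participate in set operations too.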
20.3.2. Set Classes

The set functions can operate on a variety of sequences, but they aren't as friendly as true objects.
Among other things, your scripts need to keep track of the sequences passed into these functions manually. Classes can do better: the class in Example 20-10 implements a set object that can hold any type of object. Like the stack classes, it's essentially a wrapper around a Python list with extra set operations.
Example 20-10. PP3E\Dstruct\Basic\set.py
class Set:
    def __init__(self, value = []):          # value: a list, string, Set...
        self.data = []                       # manages a local list
        self.concat(value)

    def intersect(self, other):              # other is any sequence type
        res = []                             # self is the instance subject
        for x in self.data:
            if x in other:
                res.append(x)
        return Set(res)                      # return a new Set

    def union(self, other):
        res = self.data[:]                   # make a copy of my list
        for x in other:
            if not x in res:
                res.append(x)
        return Set(res)

    def concat(self, value):                 # filters out duplicates
        for x in value:
            if not x in self.data:
                self.data.append(x)

    def __len__(self):          return len(self.data)
    def __getitem__(self, key): return self.data[key]
    def __and__(self, other):   return self.intersect(other)
    def __or__(self, other):    return self.union(other)
    def __repr__(self):         return '<Set:' + repr(self.data) + '>'
The Set class is used like the Stack class we saw earlier in this chapter: we make instances and apply sequence operators plus unique set operations to them. Intersection and union can be called as methods, or by using the & and | operators normally used for built-in integer objects. Because we can string together operators in expressions now (e.g., x & y & z), there is no obvious need to support multiple operands in intersect/union methods here. As with all objects, we can either use the Set class within a program or test it interactively as follows:
>>> from set import Set
>>> users1 = Set(['Bob', 'Emily', 'Howard', 'Peeper'])
>>> users2 = Set(['Jerry', 'Howard', 'Carol'])
>>> users3 = Set(['Emily', 'Carol'])
>>> users1 & users2
<Set:['Howard']>
20.3.3. Optimization: Moving Sets to Dictionaries

Once you start using the Set class, the first problem you might encounter is its performance: its nested for loops and in scans become quadratically slow. That slowness may or may not be significant in your applications, but library classes should generally be coded as efficiently as possible.

One way to optimize set performance is to change the implementation to use dictionaries rather than lists for storing sets internally: items may be stored as the keys of a dictionary whose values are all None. Because lookup time is constant and short for dictionaries, the in list scans of the original set may be replaced with direct dictionary fetches in this scheme. In traditional terms, moving sets to dictionaries replaces slow linear searches with fast hash-table fetches. A computer scientist would explain this by saying that the repeated nested scanning of the list-based intersection is a quadratic algorithm in terms of its complexity, but dictionary-based intersection can be linear.

The module in Example 20-11 implements this idea. Its class is a subclass of the original Set, and it redefines the methods that deal with the internal representation but inherits others. The inherited & and | methods trigger the new intersect and union methods here, and the inherited len method works on dictionaries as is. As long as Set clients are not dependent on the order of items in a set, they can switch to this version directly by just changing the name of the module from which Set is imported; the class name is the same.
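The payoff is in membership testing: an in test against a dictionary hashes the key, while the same test against a list scans it linearly. A rough self-contained check with the standard timeit module (illustrative only; absolute timings vary per machine):

```python
# List membership scans linearly; dictionary membership hashes the key.
import timeit

data = list(range(1000))
asdict = dict.fromkeys(data)       # keys only; all values are None

slow = timeit.timeit('999 in data',
                     globals={'data': data}, number=10000)
fast = timeit.timeit('999 in asdict',
                     globals={'asdict': asdict}, number=10000)
print(slow, fast)                  # the dictionary test is typically far quicker
```

Looking up the worst-case item (the last one in the list) makes the contrast starkest; an intersection loop performs exactly this kind of test once per element.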
Example 20-11. PP3E\Dstruct\Basic\fastset.py
import set                                   # fastset.Set extends set.Set

class Set(set.Set):
    def __init__(self, value = []):
        self.data = {}                       # manages a local dictionary
        self.concat(value)                   # hashing: linear search times

    def intersect(self, other):
        res = {}
        for x in other:                      # other: a sequence or Set
            if self.data.has_key(x):         # use hash-table lookup (or "in")
                res[x] = None
        return Set(res.keys())               # a new dictionary-based Set

    def union(self, other):
        res = {}                             # other: a sequence or Set
        for x in other:                      # scan each set just once
            res[x] = None
        for x in self.data.keys():           # '&' and '|' come back here
            res[x] = None                    # so they make new fastset's
        return Set(res.keys())

    def concat(self, value):
        for x in value:
            self.data[x] = None

    # inherit and, or, len
    def __getitem__(self, key): return self.data.keys()[key]
    def __repr__(self):         return '<Set:' + repr(self.data.keys()) + '>'
The main functional difference in this version is the order of items in the set: because dictionaries are randomly ordered, this set's order will differ from the original. For instance, you can store compound objects in sets, but the order of items varies in this version:
>>> import set, fastset
>>> a = set.Set([(1,2), (3,4), (5,6)])
>>> b = set.Set([(3,4), (7,8)])
>>> a & b
<Set:[(3, 4)]>
>>> a | b
<Set:[(1, 2), (3, 4), (5, 6), (7, 8)]>

>>> a = fastset.Set([(1,2), (3,4), (5,6)])
>>> b = fastset.Set([(3,4), (7,8)])
>>> a & b
<Set:[(3, 4)]>
>>> a | b
<Set:[(3, 4), (1, 2), (7, 8), (5, 6)]>
>>> b | a
<Set:[(3, 4), (5, 6), (1, 2), (7, 8)]>
Sets aren't supposed to be ordered anyhow, so this isn't a showstopper. A deviation that might matter, though, is that this version cannot be used to store unhashable (that is, mutable) objects. This stems from the fact that dictionary keys must be immutable. Because values are stored in keys, dictionary sets can contain only things such as tuples, strings, numbers, and class objects with immutable signatures. Mutable objects such as lists and dictionaries won't work directly. For example, the following call:
fastset.Set([[1,2],[3,4]])
raises an exception with this dictionary-based set, but it works with the original set class. Tuples do work here as set items because they are immutable; Python computes hash values and tests key equality as expected.
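The constraint is easy to demonstrate directly in any modern Python: tuples hash, and so may serve as dictionary keys (and hence as items in the dictionary-based set), but lists raise a TypeError.

```python
# Tuples are hashable dictionary keys; lists are not.
d = {}
d[(1, 2)] = None              # fine: tuples are immutable and hashable
caught = None
try:
    d[[1, 2]] = None          # lists are mutable, hence unhashable
except TypeError as exc:
    caught = exc
print((1, 2) in d, caught is not None)   # True True
```

The same TypeError is what the fastset.Set call above triggers internally when it tries to use a list as a dictionary key.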
20.3.3.1. Timing the results under Python 2.4

So how did we do on the optimization front? Example 20-12 contains a script to compare set class performance. It reuses the timer module of Example 20-6 used earlier to test stacks.
Example 20-12. PP3E\Dstruct\Basic\settime.py
import timer, sys
import set, fastset

def setops(Class):                           # 3 intersects, 3 unions per rep
    a = Class(range(50))                     # a 50-integer set
    b = Class(range(20))                     # a 20-integer set
    c = Class(range(10))
    d = Class(range(5))
    for i in range(5):
        t = a & b & c & d                    # 3 intersections
        t = a | b | c | d                    # 3 unions

if __name__ == '__main__':
    reps = int(sys.argv[1])                  # whole-test repetition count
    print 'set     =>', timer.test(reps, setops, set.Set)
    print 'fastset =>', timer.test(reps, setops, fastset.Set)
The setops function makes four sets and combines them with intersection and union operators five times. A command-line argument controls the number of times this whole process is repeated. More accurately, each call to setops makes 34 Set instances (4 + [5 x (3 + 3)]) and runs the intersect and union methods 15 times each (5 x 3) in the for loop's body. The performance improvement is equally dramatic this time around, on a 1.2 GHz machine:
C:\...\PP3E\Dstruct\Basic>python settime.py 50
set     => 0.605568584834
fastset => 0.10293794323

C:\...\PP3E\Dstruct\Basic>python settime.py 100
set     => 1.21189676342
fastset => 0.207752661302

C:\...\PP3E\Dstruct\Basic>python settime.py 200
set     => 2.47468966028
fastset => 0.415944763929
These results will vary per machine, and they may vary per Python release. But at least for this test case, the dictionary-based set implementation (fastset) is roughly six times faster than the simple list-based set (set). In fact, this sixfold speedup is probably sufficient. Python dictionaries are already optimized hash tables that you might be hard-pressed to improve on. Unless there is evidence that dictionary-based sets are still too slow, our work here is probably done.
20.3.3.2. Timing results under Python 1.5.2: version skew

For detail-minded readers, the prior section's sixfold speedup results listed in this edition of the book were timed with Python 2.4 on a 1.2 GHz machine. Surprisingly, in the second edition, under an older Python (1.5.2) and a slower machine (650 MHz), the list-based set results were roughly twice as slow as today, but the dictionary-based set was roughly four times slower (e.g., 1.54 and 0.44 seconds for lists and dictionaries, respectively, at 50 iterations). In relative terms, the net effect is that dictionary sets went from being approximately three times faster than list sets to being six times faster today. That is, machine speed roughly doubled, but in addition, the dictionary-based code's advantage over list-based sets doubled as well. This larger jump reflects optimizations in Python itself. As you can see, version skew is an important consideration when analyzing performance; in this case, dictionaries are twice the performance boost they were a few years earlier.
20.3.3.3. Using the Python profiler

Timing code sections helps, but the ultimate way to isolate bottlenecks is profiling. The Python profiler provides another way to gather performance data besides timing sections of code as done in this chapter so far. Because the profiler tracks all function calls, it provides much more information in a single blow. As such, it's a more powerful way to isolate bottlenecks in slow programs; after profiling, you should have a good idea of where to focus your optimization efforts.

The profiler ships with Python as the standard library module called profile, and it provides a variety of interfaces for measuring code performance. It is structured and used much like the pdb command-line debugger: import the profile module and call its functions with a code string to measure performance. The simplest profiling interface is its profile.run(statementstring) function. When invoked, the profiler runs the code string, collects statistics during the run, and issues a report on the screen when the statement completes. See it in action profiling a 100-iteration call to the set test function of the previous section's Example 20-12, for the list-based sets of Example 20-10 (hint: profile an import statement to profile an entire file):
>>> import settime, timer, set
>>> import profile
>>> profile.run('timer.test(100, settime.setops, set.Set)')
         675906 function calls in 6.328 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   ...(per-function statistics lines not reproduced here)...
The report's format is straightforward and well documented in the Python library manual. By default, it lists the number of calls and times spent in each function invoked during the run. When the profiler is enabled, each interpreter event is routed to a Python handler. This gives an accurate picture of performance, but it tends to make the program being profiled run much slower than normal. In fact, the call profiled runs five times slower in this case under Python 2.4 and the 1.2 GHz test machine (6 seconds versus 1.2 seconds when not profiled). On the other hand, the profiler's report helps you isolate which functions to recode and possibly migrate to the C language.

In the preceding listing, for instance, we can see that the intersect function (and the corresponding __and__ operator method) is the main drag on performance; it takes roughly five-sixths of the total execution time. Indexing (__getitem__) is a close second, most likely because it occurs so often with the repeated scans used by intersection. Union, on the other hand, is fairly quick from a relative perspective. Here is a profile of the dictionary-based set implementation of Example 20-11 for comparison; the code runs five times slower under the profiler again (1 second versus 0.2 seconds), though the relative speed of the list- and dictionary-based set code is the same when both are profiled (6 seconds versus 1 second, the same sixfold difference we noticed before):
>>> import settime, timer, fastset
>>> import profile
>>> profile.run('timer.test(100, settime.setops, fastset.Set)')
         111406 function calls in 1.049 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   ...(per-function statistics lines not reproduced here)...
This time, there is no obvious culprit behind the total execution time: intersection and union take roughly the same amount of time, and indexing is not much of a factor. Ultimately, the real difference is the quadratic algorithm of list-based set intersection versus the linear nature of the dictionary-based algorithms, and this factor outweighs the choice of programming language used to implement the objects. Recoding in Python first is the best bet, since a quadratic algorithm would be just as slow if implemented in C.
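In current Pythons, the profile module is usually replaced by its faster C reimplementation, cProfile, which shares the same interface. A minimal modern sketch (not the book's session) collects the same style of report programmatically:

```python
# Profile a function with cProfile and render the report via pstats.
import cProfile, pstats, io

def work():
    return sum(i * i for i in range(1000))   # something worth measuring

prof = cProfile.Profile()
prof.enable()
work()
prof.disable()

buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats('cumulative').print_stats(3)
report = buf.getvalue()
print('function calls' in report)            # the familiar report header
```

Using a Profile object and pstats, rather than profile.run's on-screen dump, makes it easy to sort, filter, and save the statistics for later comparison.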
20.3.4. Optimizing fastset by Coding Techniques (or Not)

As coded, there seems to be a bottleneck in the fastset class: each time we call a dictionary's keys method, Python makes a new list to hold the result. This can happen repeatedly during intersections and unions. If you are interested in trying to optimize this further, see the following files in the book's examples distribution:

PP3E\Dstruct\Basic\fastset2.py
PP3E\Dstruct\Basic\fastset3.py

I wrote these to try to speed up sets further, but failed miserably. It turns out that adding extra code to try to shave operations usually negates the speedup you obtain. There may be faster codings, but the biggest win here was likely in changing the underlying data structure to dictionaries, not in minor code tweaks.

As a rule of thumb, your intuition about performance is almost always wrong in a dynamic language such as Python: the algorithm is usually the real culprit behind performance problems, not the coding style or even the implementation language. By removing the combinatorial list-scanning algorithm of the original set class, the Python implementation became dramatically faster. In fact, moving the original set class to C without fixing the algorithm would not have addressed the real performance problem. Coding tricks don't usually help as much either, and they make your programs difficult to understand. In Python, it's almost always best to code for readability first and optimize later if needed. Despite its simplicity, fastset is fast indeed.
20.3.5. Adding Relational Algebra to Sets (External)

If you are interested in studying additional set-like operations coded in Python, see the following files in this book's examples distribution:
PP3E\Dstruct\Basic\rset.py RSet implementation
PP3E\Dstruct\Basic\reltest.py Test script for RSet The RSet subclass defined in rset.py adds basic relational algebra operations for sets of dictionaries. It assumes the items in sets are mappings (rows), with one entry per column (field). RSet inherits all the original Set operations (iteration, intersection, union, & and | operators, uniqueness filtering, and so on), and adds new operations as methods:
Select
Return a set of nodes that have a field equal to a given value.
Bagof Collect set nodes that satisfy an expression string.
Find Select tuples according to a comparison, field, and value.
Match Find nodes in two sets with the same values for common fields.
Product Compute a Cartesian product: concatenate tuples from two sets.
Join Combine tuples from two sets that have the same value for a field.
Project Extract named fields from the tuples in a table.
Difference Remove one set's tuples from another. Alternative implementations of set difference operations can also be found in the diff.py file in the same examples distribution directory.
20.4. Subclassing Built-In Types

There is one more twist in the stack and set story, before we move on to some more classical data structures. In recent Python releases, it is also possible to subclass built-in datatypes such as lists and dictionaries, in order to extend them. That is, because datatypes now look and feel just like customizable classes, we can code unique datatypes that are extensions of built-ins, with subclasses that inherit built-in tool sets. For instance, here are our stack and set objects coded in the prior sections, revised as customized lists (the set union method has been simplified slightly here to remove a redundant loop):
class Stack(list):
    "a list with extra methods"
    def top(self):
        return self[-1]
    def push(self, item):
        list.append(self, item)
    def pop(self):
        if not self:
            return None                      # avoid exception
        else:
            return list.pop(self)

class Set(list):
    "a list with extra methods and operators"
    def __init__(self, value=[]):            # on object creation
        list.__init__(self)
        self.concat(value)
    def intersect(self, other):              # other is any sequence type
        res = []                             # self is the instance subject
        for x in self:
            if x in other:
                res.append(x)
        return Set(res)                      # return a new Set
    def union(self, other):
        res = Set(self)                      # new set with a copy of my list
        res.concat(other)                    # insert uniques from other
        return res
    def concat(self, value):                 # value: a list, string, Set...
        for x in value:                      # filters out duplicates
            if not x in self:
                self.append(x)

    # len, getitem, iter inherited, use list repr
    def __and__(self, other): return self.intersect(other)
    def __or__(self, other):  return self.union(other)

class FastSet(dict):
    pass                                     # this doesn't simplify much
The stack and set implemented in this code are essentially like those we saw earlier, but instead of embedding and managing a list, these objects really are customized lists. They add a few additional methods, but they inherit all of the list object's functionality. This can reduce the amount of wrapper code required, but it can also expose functionality that might not be appropriate in some cases. In the following self-test code, for example, we're able to sort and insert into stacks and reverse a set, because we've inherited these methods from the list object. In most cases, such operations don't make sense for the data structures in question, and the wrapper class approach of the prior sections may still be preferred:
def selfTest():
    # normal use cases
    stk = Stack()
    print stk
    for c in 'spam': stk.push(c)
    print stk, stk.top()
    while stk: print stk, stk.pop()
    print stk, stk.pop()
    print

    set = Set('spam')
    print set, 'p' in set
    print set & Set('slim')
    print set | 'slim'
    print Set('slim') | Set('spam')

    # downside? these work too
    print
    stk = Stack('spam')
    print stk
    stk.insert(1, 'X')                       # should only access top
    print stk
    stk.sort()                               # stack not usually sorted
    print stk
    set = Set('spam')
    set.reverse()                            # order should not matter
    print set, set[1]

if __name__ == '__main__':
    selfTest()
When run, this module produces the following results on standard output; we're able to treat the stack and set objects like lists, whether we should or not:
Subclassing built-in types has other applications, which may be more useful than those demonstrated by the preceding code. Consider a queue, or ordered dictionary, for example. The queue could take the form of a list subclass with get and put methods to insert on one end and delete from the other; the dictionary could be coded as a dictionary subclass with an extra list of keys that is sorted on insertion or request. Similarly, the built-in Python bool Boolean datatype is implemented as a subclass that customizes the integer int with a specialized display format (True is like integer 1, but prints itself as the word True). You can also use type subclassing to alter the way built-in types behave; a list subclass could map indexes 1..N to built-in indexes 0..N-1, for instance:
# map 1..N to 0..N-1, by calling back to built-in version

class MyList(list):
    def __getitem__(self, offset):
        print '(indexing %s at %s)' % (self, offset)
        return list.__getitem__(self, offset - 1)

if __name__ == '__main__':
    print list('abc')
    x = MyList('abc')                # __init__ inherited from list
    print x                          # __repr__ inherited from list
    print x[1]                       # MyList.__getitem__ customizes superclass
    print x[3]
    x.append('spam'); print x        # attributes from list superclass
    x.reverse();      print x

% python typesubclass.py
['a', 'b', 'c']
['a', 'b', 'c']
(indexing ['a', 'b', 'c'] at 1)
a
(indexing ['a', 'b', 'c'] at 3)
c
['a', 'b', 'c', 'spam']
['spam', 'c', 'b', 'a']
This works, but it is probably not a good idea in general. It would likely confuse its users; they will expect Python's standard 0..N-1 indexing, unless they are familiar with the custom class. As a rule of thumb, type subclasses should probably adhere to the interface of the built-in types they customize.
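The queue idea mentioned earlier can be sketched the same way (a hypothetical Queue helper, not one of the book's examples): put appends at the tail, and get deletes from the head.

```python
# A minimal FIFO queue as a list subclass (modern Python 3 syntax).
class Queue(list):
    "a list with FIFO put/get methods"
    def put(self, item):
        self.append(item)        # insert at one end
    def get(self):
        return self.pop(0)       # delete from the other (an O(N) shift)

q = Queue()
for c in 'spam':
    q.put(c)
first, second = q.get(), q.get()
print(first, second, q)          # s p ['a', 'm']
```

For real work, collections.deque makes both ends O(1); this subclass mainly illustrates the inheritance pattern, and it exposes the same "downside" list methods as the Stack and Set subclasses above.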
The New Built-In Set Datatype

Python has a way of conspiring to make my book examples obsolete over time. Beginning in version 2.4, the language sprouted a new built-in set datatype, which implements much of the set functionality that we coded in the set examples of this chapter (and more). It is implemented with some of the same algorithms we used, but it is available on all Pythons today. Built-in set usage is straightforward: set objects are created by calling the new built-in set function, passing in an iterable/sequence for the initial components of the set (sets are also available in 2.3, but the set creation call must be imported from a module):
>>> x = set('abcde')
>>> y = set('bdxyz')
>>> x
set(['a', 'c', 'b', 'e', 'd'])
>>> 'e' in x                         # membership
True
>>> x - y                            # difference
set(['a', 'c', 'e'])
>>> x | y                            # union
set(['a', 'c', 'b', 'e', 'd', 'y', 'x', 'z'])
>>> x & y                            # intersection
set(['b', 'd'])
Interestingly, just like the dictionary-based optimized set we coded, built-in sets are unordered and require that all set components be hashable (immutable). In fact, their current implementation is based on wrapped dictionaries. Making a set with a dictionary of items works, but only because set uses the dictionary iterator, which returns the next key on each iteration (it ignores key values):
>>> x = set(['spam', 'ham', 'eggs'])    # list of immutables
>>> x
set(['eggs', 'ham', 'spam'])

>>> x = set([['spam', 'ham'], ['eggs']])
Traceback (most recent call last):
  File "<stdin>", line 1, in -toplevel-
    x = set([['spam', 'ham'], ['eggs']])
TypeError: list objects are unhashable

>>> x = set({'spam':[1, 1], 'ham': [2, 2], 'eggs':[3, 3]})
>>> x
set(['eggs', 'ham', 'spam'])
Built-in sets also support operations such as superset testing, and they come in two flavors: mutable and frozen (and thus hashable, for sets of sets). For more details, see the set type in the built-in types section of the Python library manual. The set examples in this chapter are still useful as demonstrations of general data structure coding techniques, but they are not strictly required for set functionality in Python today. In fact, this is how Python tends to evolve over time: operations that are coded manually often enough wind up becoming built-in tools. I can't predict Python evolution, of course, but with any luck at all, the 10th edition of this book might be just a pamphlet.
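The "frozen" flavor mentioned above can be demonstrated in a few lines. The sketch below uses modern Python syntax; frozenset is immutable and therefore hashable, which is what makes sets of sets possible (a plain mutable set cannot be a member of another set).

```python
# frozenset: the immutable, hashable set flavor, usable as a set member
x = frozenset('ab')
y = frozenset('bc')

nested = {x, y}                  # a set of (frozen) sets
print(x & y)                     # set operations work the same way
print(frozenset('ab') in nested) # -> True: membership by value
```

Trying the same nesting with plain set objects raises a TypeError, for the same hashability reason that lists cannot be set members.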
20.5. Binary Search Trees

Binary trees are a data structure that imposes an order on inserted nodes: items less than a node are stored in the left subtree, and items greater than a node are inserted in the right. At the bottom, the subtrees are empty. Because of this structure, binary trees naturally support quick, recursive traversals; at least ideally, every time you follow a link to a subtree, you divide the search space in half.[*]

[*] If you're looking for a more graphical image of binary trees, skip ahead to the PyTree examples at the end of this chapter, or simply run PyTree on your own machine.
Binary trees are named for the implied branch-like structure of their subtree links. Typically, their nodes are implemented as a triple of values: (LeftSubtree, NodeValue, RightSubtree) . Beyond that, there is fairly wide latitude in the tree implementation. Here we'll use a class-based approach: BinaryTree is a header object, which initializes and manages the actual tree. EmptyNode is the empty object, shared at all empty subtrees (at the bottom). BinaryNode objects are nonempty tree nodes with a value and two subtrees.
Instead of coding distinct search functions, binary trees are constructed with "smart" objects (class instances) that know how to handle insert/lookup and printing requests and pass them to subtree objects. In fact, this is another example of the object-oriented programming (OOP) composition relationship in action: tree nodes embed other tree nodes and pass search requests off to the embedded subtrees. A single empty class instance is shared by all empty subtrees in a binary tree, and inserts replace an EmptyNode with a BinaryNode at the bottom (see Example 20-13).
class BinaryNode:
    def __init__(self, left, value, right):
        self.data, self.left, self.right = value, left, right

    def lookup(self, value):
        if self.data == value:
            return 1
        elif self.data > value:
            return self.left.lookup(value)            # look in left
        else:
            return self.right.lookup(value)           # look in right

    def insert(self, value):
        if self.data > value:
            self.left = self.left.insert(value)       # grow in left
        elif self.data < value:
            self.right = self.right.insert(value)     # grow in right
        return self

    def __repr__(self):
        return ('( %s, %s, %s )' %
                (repr(self.left), repr(self.data), repr(self.right)))
As usual, BinaryTree can contain objects of any type that support the expected interface protocol; here, > and < comparisons. This includes class instances with the __cmp__ method. Let's experiment with this module's interfaces. The following code stuffs five integers into a new tree, and then searches for values 0..9:
C:\...\PP3E\Dstruct\Classics>python
>>> from btree import BinaryTree
>>> x = BinaryTree( )
>>> for i in [3,1,9,2,7]: x.insert(i)
...
>>> for i in range(10): print (i, x.lookup(i)),
...
(0, 0) (1, 1) (2, 1) (3, 1) (4, 0) (5, 0) (6, 0) (7, 1) (8, 0) (9, 1)
To watch this tree grow, add a print statement after each insert. Tree nodes print themselves as triples, and empty nodes print as *. The result reflects tree nesting:
At the end of this chapter, we'll see another way to visualize trees in a GUI (which means you're invited to flip ahead now). Node values in this tree object can be any comparable Python object; for instance, here is a tree of strings:
>>> z = BinaryTree( )
>>> for c in 'badce': z.insert(c)
...
>>> z
( ( *, 'a', * ), 'b', ( ( *, 'c', * ), 'd', ( *, 'e', * ) ) )

>>> z = BinaryTree( )
>>> for c in 'abcde': z.insert(c)
...
>>> z
( *, 'a', ( *, 'b', ( *, 'c', ( *, 'd', ( *, 'e', * ) ) ) ) )
Notice the last result here: if items inserted into a binary tree are already ordered, you wind up with a linear structure and lose the search-space partitioning magic of binary trees (the tree grows in right branches only). This is a worst-case scenario, and binary trees generally do a good job of dividing values in practice. But if you are interested in pursuing this topic further, see a data structures text for tree-balancing techniques that automatically keep the tree as dense as possible. Also note that to keep the code simple, these trees store a value only, and lookups return a 1 or 0 (true or false). In practice, you sometimes may want to store both a key and an associated value (or even more) at each tree node. Example 20-14 shows what such a tree object looks like, for any prospective lumberjacks in the audience.
Example 20-14. PP3E\Dstruct\Classics\btree-keys.py
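As an illustrative sketch of that key/value idea (this is not the book's btree-keys.py listing, and the class name KeyedNode is hypothetical), a node might carry a key used for ordering plus an attached value, with lookups returning the value instead of just 1 or 0; modern Python syntax is used here:

```python
# A minimal keyed binary tree node: ordering is by key, and each key
# carries an associated value; lookup returns the value, or None.
class KeyedNode:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

    def insert(self, key, value):
        if key == self.key:
            self.value = value                   # replace existing value
        elif key < self.key:
            if self.left:
                self.left.insert(key, value)     # grow in left subtree
            else:
                self.left = KeyedNode(key, value)
        else:
            if self.right:
                self.right.insert(key, value)    # grow in right subtree
            else:
                self.right = KeyedNode(key, value)

    def lookup(self, key):
        if key == self.key:
            return self.value
        subtree = self.left if key < self.key else self.right
        return subtree.lookup(key) if subtree else None

root = KeyedNode('m', 1)
root.insert('c', 2)
root.insert('t', 3)
print(root.lookup('c'))    # -> 2
print(root.lookup('x'))    # -> None
```

The search logic is the same as before; only the payload at each node has grown.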
20.6. Graph Searching

Many problems can be represented as a graph, which is a set of states with transitions ("arcs") that lead from one state to another. For example, planning a route for a trip is really a graph search problem in disguise; the states are places you'd like to visit, and the arcs are the transportation links between them. This section presents simple Python programs that search through a directed, cyclic graph to find the paths between a start state and a goal. Graphs can be more general than trees because links may point at arbitrary nodes, even ones already searched (hence the word cyclic). The graph used to test searchers in this section is sketched in Figure 20-1. Arrows at the end of arcs indicate valid paths (e.g., A leads to B, E, and G). The search algorithms will traverse this graph in a depth-first fashion, and they will trap cycles in order to avoid looping. If you pretend that this is a map, where nodes represent cities and arcs represent roads, this example will probably seem a bit more meaningful.
Figure 20-1. A directed graph
The first thing we need to do is choose a way to represent this graph in a Python script. One approach is to use built-in datatypes and searcher functions. The file in Example 20-15 builds the test graph as a simple dictionary: each state is a dictionary key, with a list of keys of nodes it leads to (i.e., its arcs). This file also defines a function that we'll use to run a few searches in the graph.
# a directed, cyclic graph, stored as a dictionary:
# 'key' leads-to [nodes]

Graph = {'A': ['B', 'E', 'G'],
         'B': ['C'],
         'C': ['D', 'E'],
         'D': ['F'],
         'E': ['C', 'F', 'G'],
         'F': [],
         'G': ['A']}

def tests(searcher):                      # test searcher function
    print searcher('E', 'D', Graph)       # find all paths from 'E' to 'D'
    for x in ['AG', 'GF', 'BA', 'DA']:
        print x, searcher(x[0], x[1], Graph)
Now, let's code two modules that implement the actual search algorithms. They are both independent of the graph to be searched (it is passed in as an argument). The first searcher, in Example 20-16, uses recursion to walk through the graph.
Example 20-16. PP3E\Dstruct\Classics\gsearch1.py
# find all paths from start to goal in graph

def search(start, goal, graph):
    solns = []
    generate([start], goal, solns, graph)             # collect paths
    solns.sort( lambda x, y: cmp(len(x), len(y)) )    # sort by path length
    return solns

def generate(path, goal, solns, graph):
    state = path[-1]
    if state == goal:                                 # found goal here
        solns.append(path)                            # change solns in-place
    else:                                             # check all arcs here
        for arc in graph[state]:
            if arc not in path:                       # skip cycles on path
                generate(path + [arc], goal, solns, graph)

if __name__ == '__main__':
    import gtestfunc
    gtestfunc.tests(search)
The second searcher, in Example 20-17, uses an explicit stack of paths to be expanded using the tuple-tree stack representation we explored earlier in this chapter.
Example 20-17. PP3E\Dstruct\Classics\gsearch2.py
# use paths stack instead of recursion

def search(start, goal, graph):                       # returns solns list
    solns = generate(([start], []), goal, graph)
    solns.sort( lambda x, y: cmp(len(x), len(y)) )
    return solns

def generate(paths, goal, graph):                     # use a tuple-stack
    solns = []
    while paths:
        front, paths = paths                          # pop the top path
        state = front[-1]
        if state == goal:
            solns.append(front)
        else:
            for arc in graph[state]:
                if arc not in front:
                    paths = (front + [arc]), paths
    return solns
To avoid cycles, both searchers keep track of nodes visited along a path. If an extension is already on the current path, it is a loop. The resulting solutions list is sorted by increasing lengths using the list sort method and the built-in cmp comparison function. To test the searcher modules, simply run them; their self-test code calls the canned search test in the gtestfunc module:
This output shows lists of possible paths through the test graph; I added two line breaks to make it more readable. Notice that both searchers find the same paths in all tests, but the order in which they find those solutions may differ. The gsearch2 order depends on how and when extensions are added to its path's stack.
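For comparison, the recursive searcher can be restated compactly in modern Python syntax. This is a sketch of the same technique, not the book's listing: it uses a nested function instead of a module-level helper and the newer key argument to sorted instead of cmp, with the dictionary test graph from Figure 20-1.

```python
# Depth-first search for all paths between two nodes, trapping cycles
# by refusing to extend a path with a node it already contains.
Graph = {'A': ['B', 'E', 'G'], 'B': ['C'], 'C': ['D', 'E'],
         'D': ['F'], 'E': ['C', 'F', 'G'], 'F': [], 'G': ['A']}

def search(start, goal, graph):
    solns = []
    def generate(path):
        state = path[-1]
        if state == goal:
            solns.append(path)            # found one complete path
        else:
            for arc in graph[state]:
                if arc not in path:       # skip cycles on this path
                    generate(path + [arc])
    generate([start])
    return sorted(solns, key=len)         # shortest solutions first

print(search('E', 'D', Graph))
# -> [['E', 'C', 'D'], ['E', 'G', 'A', 'B', 'C', 'D']]
```

Note that path + [arc] builds a new list on each extension, so sibling branches never see each other's nodes; that is what makes the cycle check per-path rather than global.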
20.6.1. Moving Graphs to Classes

Using dictionaries to represent graphs is efficient: connected nodes are located by a fast hashing operation. But depending on the application, other representations might make more sense. For instance, classes can be used to model nodes in a network too, much like the binary tree example earlier. As classes, nodes may contain content useful for more sophisticated searches. To illustrate, Example 20-18 shows an alternative coding for our graph searcher; its algorithm is closest to gsearch1.
Example 20-18. PP3E\Dstruct\Classics\graph.py
# build graph with objects that know how to search

class Graph:
    def __init__(self, label, extra=None):
        self.name = label                 # nodes = inst objects
        self.data = extra                 # graph = linked objs
        self.arcs = []

    def __repr__(self):
        return self.name

    def search(self, goal):
        Graph.solns = []
        self.generate([self], goal)
        Graph.solns.sort(lambda x,y: cmp(len(x), len(y)))
        return Graph.solns

    def generate(self, path, goal):
        if self == goal:
            Graph.solns.append(path)
        else:
            for arc in self.arcs:
                if arc not in path:
                    arc.generate(path + [arc], goal)
In this version, graphs are represented as a network of embedded class instance objects. Each node in the graph contains a list of the node objects it leads to (arcs), which it knows how to search. The generate method walks through the objects in the graph. But this time, links are directly available on each node's arcs list; there is no need to index (or pass) a dictionary to find linked objects. To test, the module in Example 20-19 builds the test graph again, this time using linked instances of the Graph class. Notice the use of exec in the self-test code: it executes dynamically constructed strings to do the work of seven assignment statements (A=Graph('A'), B=Graph('B'), and so on).
Example 20-19. PP3E\Dstruct\Classics\gtestobj1.py
def tests(Graph):
    for name in "ABCDEFG":                        # make objects first
        exec "%s = Graph('%s')" % (name, name)    # label=variable-name

    A.arcs = [B, E, G]                  # now configure their links:
    B.arcs = [C]                        # embedded class-instance list
    C.arcs = [D, E]
    D.arcs = [F]
    E.arcs = [C, F, G]
    G.arcs = [A]

    A.search(G)
    for (start, stop) in [(E,D), (A,G), (G,F), (B,A), (D,A)]:
        print start.search(stop)
Run this test by running the graph module to pass in a graph class, like this:
C:\...\PP3E\Dstruct\Classics>python graph.py
[[E, C, D], [E, G, A, B, C, D]]
[[A, G], [A, E, G], [A, B, C, E, G]]
[[G, A, E, F], [G, A, B, C, D, F], [G, A, B, C, E, F], [G, A, E, C, D, F]]
[[B, C, E, G, A]]
[]
The results are the same as for the functions, but node name labels are not quoted: nodes on path lists here are Graph instances, and this class's __repr__ scheme suppresses quotes. Example 20-20 is one last graph test before we move on; sketch the nodes and arcs on paper if you have more trouble following the paths than Python.
(Example 20-20 builds a small object graph in which S leads to P and M, with arcs stored as embedded objects, and finds all paths from S to M.)
This test finds three paths in its graph between nodes S and M. If you'd like to see more Python graph code, check out the files in the directory MoreGraphs in this book's examples distribution. These are roughly the same as the ones listed here, but they add user interaction as each solution is found. In addition, we've really only scratched the surface of this domain here; see other books for additional topics (e.g., breadth- and best-first search).
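As a taste of those variations, a breadth-first search expands paths level by level, so the first solution it finds is a shortest one. The sketch below is illustrative rather than from the book (the function name bfs_paths is an assumption), and it reuses the dictionary graph representation of Example 20-15:

```python
# Breadth-first path search: a FIFO queue of partial paths means
# shorter paths are always expanded before longer ones.
from collections import deque

Graph = {'A': ['B', 'E', 'G'], 'B': ['C'], 'C': ['D', 'E'],
         'D': ['F'], 'E': ['C', 'F', 'G'], 'F': [], 'G': ['A']}

def bfs_paths(start, goal, graph):
    solns = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()            # expand oldest path first
        if path[-1] == goal:
            solns.append(path)
        else:
            for arc in graph[path[-1]]:
                if arc not in path:       # still trap cycles per path
                    queue.append(path + [arc])
    return solns

print(bfs_paths('E', 'D', Graph)[0])      # -> ['E', 'C', 'D']
```

Swapping the deque's popleft for a plain stack pop turns this back into a depth-first search; the queue discipline is the only difference.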
20.7. Reversing Sequences

Reversal of collections is another typical operation. We can code it either recursively or iteratively in Python, and as functions or class methods. Example 20-21 is a first attempt at two simple reversal functions.
Example 20-21. PP3E\Dstruct\Classics\rev1.py
def reverse(list):                  # recursive
    if list == []:
        return []
    else:
        return reverse(list[1:]) + list[:1]

def ireverse(list):                 # iterative
    res = []
    for x in list:
        res = [x] + res
    return res
Both reversal functions work correctly on lists. But if we try reversing nonlist sequences (strings, tuples, and so on) we're in trouble: the ireverse function always returns a list for the result regardless of the type of sequence passed:
>>> ireverse("spam") ['m', 'a', 'p', 's']
Much worse, the recursive reverse version won't work at all for nonlists; it gets stuck in an infinite loop. The reason is subtle: when reverse reaches the empty string (""), it's not equal to the empty list ([]), so the else clause is selected. But slicing an empty sequence returns another empty sequence (indexes are scaled): the else clause recurs again with an empty sequence, without raising an exception. The net effect is that this function gets stuck in a loop, calling itself over and over again until Python's recursion limit is exceeded.

The versions in Example 20-22 fix both problems by using generic sequence handling techniques:

reverse uses the not operator to detect the end of the sequence and returns the empty sequence itself, rather than an empty list constant. Since the empty sequence is the type of the original argument, the + operation always builds the correct type of sequence as the recursion unfolds.

ireverse makes use of the fact that slicing a sequence returns a sequence of the same type. It first initializes the result to the slice [:0], a new, empty slice of the argument's type. Later, it uses slicing to extract one-node sequences to add to the result's front, instead of a list constant.
Example 20-22. PP3E\Dstruct\Classics\rev2.py
def reverse(list):
    if not list:                            # empty? (not always [])
        return list                         # the same sequence type
    else:
        return reverse(list[1:]) + list[:1] # add front item on the end

def ireverse(list):
    res = list[:0]                          # empty, of same type
    for i in range(len(list)):
        res = list[i:i+1] + res             # add each item to front
    return res
These functions work on any sequence, and they return a new sequence of the same type as the sequence passed in. If we pass in a string, we get a new string as the result. In fact, they reverse any sequence object that responds to slicing, concatenation, and len, even instances of Python classes and C types. In other words, they can reverse any object that has sequence interface protocols. Here they are working on lists, strings, and tuples:
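To make that session concrete, here are the two reversers restated in modern Python syntax so the calls below run as shown (print is a function, and the parameter is renamed seq to avoid shadowing the built-in list); the logic is the same as Example 20-22:

```python
# Type-generic reversal: the result has the same type as the argument.
def reverse(seq):
    if not seq:
        return seq                        # empty: same sequence type
    return reverse(seq[1:]) + seq[:1]     # front item moves to the end

def ireverse(seq):
    res = seq[:0]                         # empty slice of seq's type
    for i in range(len(seq)):
        res = seq[i:i+1] + res            # add each item to the front
    return res

print(reverse('spam'))          # -> 'maps'
print(ireverse((1, 2, 3)))      # -> (3, 2, 1)
print(reverse([1, 2, 3]))       # -> [3, 2, 1]
```

The key tricks are seq[:0] (an empty slice of whatever type was passed) and seq[i:i+1] (a one-item sequence of that type), which keep concatenation type-consistent throughout.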
20.8. Permuting Sequences

The functions defined in Example 20-23 shuffle sequences in a number of ways:

permute constructs a list with all valid permutations of any sequence.

subset constructs a list with all valid permutations of a specific length.

combo works like subset, but order doesn't matter: permutations of the same items are filtered out.

These results are useful in a variety of algorithms: searches, statistical analysis, and more. For instance, one way to find an optimal ordering for items is to put them in a list, generate all possible permutations, and simply test each one in turn. All three of the functions make use of the generic sequence slicing tricks of the reversal functions in the prior section so that the result list contains sequences of the same type as the one passed in (e.g., when we permute a string, we get back a list of strings).
Example 20-23. PP3E\Dstruct\Classics\permcomb.py
def permute(list):                              # shuffle any sequence
    if not list:                                # empty sequence
        return [list]
    else:
        res = []
        for i in range(len(list)):
            rest = list[:i] + list[i+1:]        # delete current node
            for x in permute(rest):             # permute the others
                res.append(list[i:i+1] + x)     # add node at front
        return res

def subset(list, size):
    if size == 0 or not list:                   # order matters here
        return [list[:0]]                       # an empty sequence
    else:
        result = []
        for i in range(len(list)):
            pick = list[i:i+1]                  # sequence slice
            rest = list[:i] + list[i+1:]        # keep [:i] part
            for x in subset(rest, size-1):
                result.append(pick + x)
        return result

def combo(list, size):
    if size == 0 or not list:                   # order doesn't matter
        return [list[:0]]                       # xyz == yzx
    else:
        result = []
        for i in range(0, (len(list) - size) + 1):   # iff enough left
            pick = list[i:i+1]
            rest = list[i+1:]                   # drop [:i] part
            for x in combo(rest, size - 1):
                result.append(pick + x)
        return result
As in the reversal functions, all three of these work on any sequence object that supports len , slicing, and concatenation operations. For instance, we can use permute on instances of some of the stack classes defined at the start of this chapter; we'll get back a list of stack instance objects with shuffled nodes. Here are our sequence shufflers in action. Permuting a list enables us to find all the ways the items can be arranged. For instance, for a four-item list, there are 24 possible permutations (4 x 3 x 2 x 1). After picking one of the four for the first position, there are only three left to choose from for the second, and so on. Order matters: [1,2,3] is not the same as [1,3,2], so both appear in the result:
combo results are related to permutations, but a fixed-length constraint is put on the result, and order doesn't matter: abc is the same as acb , so only one is added to the result set:
Finally, subset is just fixed-length permutations; order matters, so the result is larger than for combo. In fact, calling subset with the length of the sequence is identical to permute:
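For what it's worth, current Pythons ship these shufflers in the standard library: itertools.permutations corresponds to permute and subset, and itertools.combinations to combo, though they yield tuples rather than preserving the input sequence's type. A brief sketch in modern syntax:

```python
# Standard-library equivalents of the hand-coded shufflers.
from itertools import permutations, combinations

print(list(permutations('abc', 2)))    # like subset: order matters
# -> [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

print(list(combinations('abc', 2)))    # like combo: order doesn't
# -> [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Calling permutations without a length argument generates full-length arrangements, just as permute does; for a four-item input, that again yields 24 results.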
20.9. Sorting Sequences

Another staple of many systems is sorting: ordering items in a collection according to some constraint. The script in Example 20-24 defines a simple sort routine in Python, which orders a list of objects on a field. Because Python indexing is generic, the field can be an index or a key; this function can sort lists of either sequences or mappings.
Example 20-24. PP3E\Dstruct\Classics\sort1.py
def sort(list, field):
    res = []                                 # always returns a list
    for x in list:
        i = 0
        for y in res:
            if x[field] <= y[field]: break   # list node goes here?
            i = i + 1
        res[i:i] = [x]                       # insert in result slot
    return res

A second version, sort2.py, passes in a comparison function instead of a field; its self-test runs calls such as sort(table, (lambda x, y: x['name'] > y['name'])) and sort(tuple(table), (lambda x, y: x['name'] <= y['name'])):

C:\...\PP3E\Dstruct\Classics>python sort2.py
[{'name': 'john'}, {'name': 'doe'}]
({'name': 'doe'}, {'name': 'john'})
abcxyz
This version also dispenses with the notion of a field altogether and lets the passed-in function handle indexing if needed. That makes this version much more general; for instance, it's also useful for sorting strings.
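In current Pythons the same field sort is available directly: the sorted built-in accepts a key function, and operator.itemgetter fetches a field whether it is a sequence index or a mapping key. A brief sketch in modern syntax:

```python
# Field sorts with the built-in sorted and a key function.
from operator import itemgetter

table = [{'name': 'john'}, {'name': 'doe'}]
print(sorted(table, key=itemgetter('name')))
# -> [{'name': 'doe'}, {'name': 'john'}]

print(sorted('xyzabc'))    # works on any iterable; returns a list
# -> ['a', 'b', 'c', 'x', 'y', 'z']
```

Like the hand-coded version with a function argument, this dispenses with the notion of a fixed field: any callable that maps an item to a sort key will do.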
20.10. Data Structures Versus Python Built-Ins

Now that I've shown you all of these complicated algorithms, I need to also tell you that, at least in some cases, they may not be an optimal approach. Built-in types such as lists and dictionaries are often a simpler and more efficient way to represent data. For instance:
Binary trees

These may be useful in many applications, but Python dictionaries already provide a highly optimized, C-coded search table tool. Indexing a dictionary by key is likely to be faster than searching a Python-coded tree structure:

>>> x = {}
>>> for i in [3,1,9,2,7]: x[i] = None              # insert
...
>>> for i in range(10): print (i, x.has_key(i)),   # lookup
...
(0, 0) (1, 1) (2, 1) (3, 1) (4, 0) (5, 0) (6, 0) (7, 1) (8, 0) (9, 1)
Because dictionaries are built into the language, they are always available and will usually be faster than Python-based data structure implementations.
Graph algorithms

These serve many purposes, but a purely Python-coded implementation of a very large graph might be less efficient than you want in some applications. Graph programs tend to require peak performance; using dictionaries rather than class instances to represent graphs may boost performance some, and linking in compiled extensions may help as well.
Sorting algorithms

These are an important part of many programs too, but Python's built-in list sort method is so fast that you would be hard-pressed to beat it in Python in most scenarios. In fact, it's generally better to convert sequences to lists first just so that you can use the built-in:[*]

[*] Recent news: in Python 2.4, the sort list method also accepts a Boolean reverse flag to reverse the result (there is no need to manually reverse after the sort), and there is a new sorted built-in function, which returns its result list and works on any iterable, not just on lists (there is no need to convert to a list to sort). Python makes life easier over time. The underlying sort routine in Python is very good, by the way. In fact, its documentation claims that it has "supernatural performance"; not bad for a sorter.
temp = list(sequence)
temp.sort( )
...use items in temp...
For custom sorts, simply pass in a comparison function of your own:
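In Python 2 that means calling the list sort method with a two-argument comparison function of your own; current Pythons drop the cmp argument, but functools.cmp_to_key bridges old-style comparisons to the key-based interface. A sketch in modern syntax:

```python
# Custom sorts: an old-style comparison function via cmp_to_key,
# and the equivalent (and preferred) direct key function.
from functools import cmp_to_key

temp = ['aa', 'b', 'cccc', 'ddd']
temp.sort(key=cmp_to_key(lambda x, y: len(x) - len(y)))   # old cmp style
print(temp)                    # -> ['b', 'aa', 'ddd', 'cccc']

temp.sort(key=len, reverse=True)                          # modern spelling
print(temp)                    # -> ['cccc', 'ddd', 'aa', 'b']
```

A negative, zero, or positive return from the comparison function means less-than, equal, or greater-than, respectively, just as with the classic cmp protocol.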
Reversal algorithms

These are generally superfluous by the same token; because Python lists provide a fast reverse method, you may be better off converting a nonlist to a list first, just so that you can run the built-in list method.

Don't misunderstand: sometimes you really do need objects that add functionality to built-in types or do something more custom. The set classes we met, for instance, add tools not directly supported by Python today, and the tuple-tree stack implementation was actually faster than one based on built-in lists for common usage patterns. Permutations are something you need to add on your own too. Moreover, class encapsulations make it possible to change and extend object internals without impacting the rest of your system. They also support reuse much better than built-in types; types are not classes today, and they cannot be specialized directly without wrapper class logic. Yet because Python comes with a set of built-in, flexible, and optimized datatypes, data structure implementations are often not as important in Python as they are in lesser-equipped languages such as C and C++. Before you code that new datatype, be sure to ask yourself whether a built-in type or call might be more in line with the Python way of thinking.
20.11. PyTree: A Generic Tree Object Viewer

Up to now, this chapter has been command-line-oriented. To wrap up, I want to show you a program that merges the GUI technology we studied earlier in the book with some of the data structure ideas we've met in this chapter.

This program is called PyTree, a generic tree data structure viewer written in Python with the Tkinter GUI library. PyTree sketches out the nodes of a tree on-screen as boxes connected by arrows. It also knows how to route mouse clicks on drawn tree nodes back to the tree, to trigger tree-specific actions. Because PyTree lets you visualize the structure of the tree generated by a set of parameters, it's a fun way to explore tree-based algorithms.

PyTree supports arbitrary tree types by "wrapping" real trees in interface objects. The interface objects implement a standard protocol by communicating with the underlying tree object. For the purposes of this chapter, PyTree is instrumented to display binary search trees; for the next chapter, it's also set up to render expression parse trees. New trees can be viewed by coding wrapper classes to interface to new tree types.

The GUI interfaces PyTree utilizes were covered in depth earlier in this book, so I won't go over this code in much detail here. See Part III for background details and be sure to run this program on your own computer to get a better feel for its operation. Because it is written with Python and Tkinter, it should be portable to Windows, Unix, and Macs.
20.11.1. Running PyTree

Before we get to the code, let's see what PyTree looks like. You can launch PyTree from the PyDemos launcher bar (see the top level of the examples distribution source tree) or by directly running the treeview.py file listed in Example 20-27. Figure 20-2 shows PyTree in action displaying the binary tree created by the "test1" button. Trees are sketched as labels embedded in a canvas and are connected by lines with arrows. The lines reflect parent-to-child relationships in the actual tree; PyTree attempts to lay out the tree to produce a more or less uniform display like this one.
Figure 20-2. PyTree viewing a binary search tree (test1)
PyTree's window consists of a canvas with vertical and horizontal scrolls and a set of controls at the bottom: radio buttons for picking the type of tree you wish to display, a set of buttons that trigger canned tree drawing tests, and an input field for typing text to specify and generate a new tree. The set of test buttons changes if you pick the Parser radio button (you get one less test button); PyTree uses the widget pack_forget and pack methods to hide and show tree-specific buttons on the fly.

When you pick one of the canned test buttons, it displays in the input field the string you would type to generate the tree drawn. For binary trees, type a list of values separated by spaces and press the "input" button or the Enter key to generate a new tree; the new tree is the result of inserting the typed values from left to right. For parse trees, input an expression string in the input field instead (more on this later). Figure 20-3 shows the result of typing a set of values into the input field and submitting; the resulting binary tree shows up in the canvas.
Figure 20-3. A binary tree typed manually with on-click pop up
Notice the pop up in this screenshot; left-clicking on a displayed tree node with your mouse runs whatever action a tree wrapper class defines and displays its result in the pop up. Binary trees have no action to run, so we get a default message in the pop up, but parse trees use the mouse click to evaluate the subtree rooted at the clicked node (again, more on parse trees later).

Just for fun, maximize this window and press the "test4" button; it inserts 100 numbers from zero through 99 into a new binary tree at random and displays the result. Figure 20-4 captures one portion of this tree; it's much too large to fit on one screen (or on one book page), but you can move around the tree with the canvas scroll bars.
Figure 20-4. PyTree viewing a large binary search tree (test4)
PyTree uses an algorithm to connect all parents to their children in this tree without crossing their connecting lines. It does some upfront analysis to try to arrange descendants at each level to be as close to their parents as possible. This analysis step also yields the overall size of a new tree; PyTree uses it to reset the scrollable area size of the canvas for each tree drawn.
20.11.2. PyTree Source Code

Let's move on to the code; similar to PyForm in the prior chapter, PyTree is coded as two modules. Here, one module handles the task of sketching trees in the GUI, and another implements wrappers to interface to various tree types and extends the GUI with extra widgets.
20.11.2.1. Tree-independent GUI implementation

The module in Example 20-26 does the work of drawing trees in a canvas. It's coded to be independent of any particular tree structure; its TreeViewer class delegates to its TreeWrapper class when it needs tree-specific information for the drawing (e.g., node label text and node child links). TreeWrapper in turn expects to be subclassed for a specific kind of tree; in fact, it raises assertion errors if you try to use it without subclassing. In design terms, TreeViewer embeds a TreeWrapper; it's almost as easy to code TreeViewer subclasses per tree type, but that limits a viewer GUI to one particular kind of tree (see treeview_subclasses.py in the book's examples distribution for a subclassing-based alternative).
Trees are drawn in two steps: a planning traversal that builds a layout data structure that links parents and children, and a drawing step that uses the generated plan to draw and link node labels on the canvas. The two-step approach simplifies some of the logic required to lay out trees uniformly. Study Example 20-26 for more details.
Example 20-26. PP3E\Dstruct\TreeView\treeview_wrappers.py
#########################################################################
# PyTree: sketch arbitrary tree data structures in a scrolled canvas;
# this version uses tree wrapper classes embedded in the viewer GUI
# to support arbitrary trees (i.e., composition, not viewer subclassing);
# also adds tree node label click callbacks--run tree specific actions;
# see treeview_subclasses.py for subclass-based alternative structure;
# subclassing limits one tree viewer to one tree type, wrappers do not;
# see treeview_left.py for an alternative way to draw the tree object;
# see and run treeview.py for binary and parse tree wrapper test cases;
#########################################################################

from Tkinter import *
from tkMessageBox import showinfo

Width, Height = 350, 350    # start canvas size (reset per tree)
Rowsz = 100                 # pixels per tree row
Colsz = 100                 # pixels per tree col

###################################
# interface to tree object's nodes
###################################

class TreeWrapper:                             # subclass for a tree type
    def children(self, treenode):
        assert 0, 'children method must be specialized for tree type'
    def label(self, treenode):
        assert 0, 'label method must be specialized for tree type'
    def value(self, treenode):
        return ''
    def onClick(self, treenode):               # node label click callback
        return ''
    def onInputLine(self, line, viewer):       # input line sent callback
        pass

##################################
# tree view GUI, tree independent
##################################

class TreeViewer(Frame):
    def __init__(self, wrapper, parent=None, tree=None,
                 bg='brown', fg='beige'):
        Frame.__init__(self, parent)
        self.pack(expand=YES, fill=BOTH)
        self.makeWidgets(bg)                   # build GUI: scrolled canvas
        self.master.title('PyTree 1.0')        # assume I'm run standalone
        self.wrapper = wrapper                 # embed a TreeWrapper object
        self.fg = fg                           # setTreeType changes wrapper
20.11.2.2. Tree wrappers and test widgets

The other half of PyTree consists of a module that defines TreeWrapper subclasses that interface to binary and parser trees, implements canned test case buttons, and adds the control widgets to the bottom of the PyTree window.[*] These control widgets were split off into this separate module (in Example 20-27) on purpose, because the PyTree canvas might be useful as a viewer component in other GUI applications.
[*] If you're looking for a coding exercise, try adding another wrapper class and radio button to view the KeyedBinaryTree we wrote earlier in this chapter. You'll probably want to display the key in the GUI and pop up the associated value on-clicks.
Example 20-27. PP3E\Dstruct\TreeView\treeview.py
# PyTree launcher script
# wrappers for viewing tree types in the book, plus test cases/GUI
###################################################################
# parse tree wrapper
###################################################################

class ParseTreeWrapper(TreeWrapper):           # embed parse tree in viewer
    def __init__(self):                        # adds viewer protocols
        self.dict = {}

    def children(self, node):
        try:
            return [node.left, node.right]
        except:
            try:
                return [node.var, node.val]
            except:
                return None

    def label(self, node):
        for attr in ['label', 'num', 'name']:
            if hasattr(node, attr):
                return str(getattr(node, attr))
        return 'set'

    def onClick(self, node):                   # on tree label click
        try:                                   # tree-specific action
            result = node.apply(self.dict)     # evaluate subtree
            return 'Value = ' + str(result)    # show result in pop up
        except:
            return 'Value = <error>'

    def onInputLine(self, line, viewer):       # on input line
        p = parser2.Parser( )                  # parse expr text
        p.lex.newtext(line)
        t = p.analyse( )
        if t: viewer.drawTree(t)               # draw resulting tree
###################################################################
# canned test cases (or type new nodelists/exprs in input field)
###################################################################

def shownodes(sequence):
    sequence = map(str, sequence)               # convert nodes to strings
    entry.delete(0, END)
    entry.insert(0, ' '.join(sequence))         # show nodes in text field

def test1_binary():                             # tree type is binary wrapper
    nodes = [3, 1, 9, 2, 7]
    tree = BinaryTree(viewer)                   # make a binary tree
    for i in nodes: tree.insert(i)              # embed viewer in tree
    shownodes(nodes)                            # show nodes in input field
    tree.view()                                 # sketch tree via embedded viewer

def test2_binary():
    nodes = 'badce'
    tree = btree.BinaryTree()                   # embed wrapper in viewer
    for c in nodes: tree.insert(c)              # make a binary tree
    shownodes(nodes)
    viewer.drawTree(tree.tree)                  # ask viewer to draw it
def test3_binary():
    nodes = 'abcde'
    tree = BinaryTree(viewer)
    for c in nodes: tree.insert(c)
    shownodes(nodes)
    tree.view()

def test4_binary():
    tree = BinaryTree(viewer)                   # make a big binary tree
    import random
    nodes = range(100)                          # insert 100 nodes at random
    order = []                                  # and sketch in viewer
    while nodes:
        item = random.choice(nodes)
        nodes.remove(item)
        tree.insert(item)
        order.append(item)
    shownodes(order)
    tree.view()

def test_parser(expr):                          # tree type is parser wrapper
    parser = parser2.Parser()                   # subtrees evaluate when clicked
    parser.lex.newtext(expr)                    # input line parses new expr
    tree = parser.analyse()                     # vars set in wrapper dictionary
    entry.delete(0, END)                        # see lang/text chapter for parser
    entry.insert(0, expr)
    if tree: viewer.drawTree(tree)

# build a single viewer GUI
# add extras: input line, test btns
# make wrapper objects
# start out in binary mode
def onRadio():
    if var.get() == 'btree':
        viewer.setTreeType(bwrapper)
        for btn in p_btns: btn.pack_forget()
        for btn in b_btns: btn.pack(side=LEFT)
    elif var.get() == 'ptree':
        viewer.setTreeType(pwrapper)
        for btn in b_btns: btn.pack_forget()
        for btn in p_btns: btn.pack(side=LEFT)

def onInputLine():                              # type a node list or expression
    line = entry.get()
    viewer.wrapper.onInputLine(line, viewer)    # use per current tree wrapper type

Button(root, text='input', command=onInputLine).pack(side=RIGHT)
entry = Entry(root)
entry.pack(side=RIGHT, expand=YES, fill=X)
entry.bind('<Return>', lambda event: onInputLine())   # button or enter key
root.mainloop()                                       # start up the GUI
20.11.3. PyTree Does Parse Trees Too Finally, I want to show you what happens when you click the Parser radio button in the PyTree window. The GUI changes over to an expression parse tree viewer by simply using a different tree wrapper class: the label at the top changes, the test buttons change, and input is now entered as an arithmetic expression to be parsed and sketched. Figure 20-5 shows a tree generated for the expression string displayed in the input field.
Figure 20-5. PyTree viewing an expression parse tree
PyTree is designed to be generic: it displays both binary and parse trees, but it is easy to extend for new tree types with new wrapper classes. On the GUI, you can switch between binary and parser tree types at any time by clicking the radio buttons. Input typed into the input field is always evaluated according to the current tree type. When the viewer is in parse tree mode, clicking on a node in the tree evaluates the part of the expression represented by the parse tree rooted at the node you clicked. Figure 20-6 shows the pop up you get when you click the root node of the tree displayed.
Figure 20-6. PyTree pop up after clicking a parse tree node
When viewing parse trees, PyTree becomes a sort of visual calculator: you can generate arbitrary expression trees and evaluate any part of them by clicking on nodes displayed. But at this point, there is not much more I can tell you about these kinds of trees until you move on to Chapter 21.
Chapter 21. Text and Language Section 21.1. "See Jack Hack. Hack, Jack, Hack" Section 21.2. Strategies for Parsing Text in Python Section 21.3. String Method Utilities Section 21.4. Regular Expression Pattern Matching Section 21.5. Advanced Language Tools Section 21.6. Handcoded Parsers Section 21.7. PyCalc: A Calculator Program/Object
21.1. "See Jack Hack. Hack, Jack, Hack"

In one form or another, processing text-based information is one of the more common tasks that applications need to perform. This can include anything from scanning a text file by columns to analyzing statements in a language defined by a formal grammar. Such processing usually is called parsing: analyzing the structure of a text string. In this chapter, we'll explore ways to handle language and text-based information and summarize some Python development concepts in sidebars along the way. Some of this material is advanced, but the examples are small. For instance, recursive descent parsing is illustrated with a simple example to show how it can be implemented in Python. We'll also see that it's often unnecessary to write custom parsers for each language processing task in Python. They can usually be replaced by exporting APIs for use in Python programs, and sometimes by a single built-in function call. Finally, this chapter closes by presenting PyCalc, a calculator GUI written in Python, and the last major Python coding example in this text. As we'll see, writing calculators isn't much more difficult than juggling stacks while scanning text.
21.2. Strategies for Parsing Text in Python

In the grand scheme of things, there are a variety of ways to handle text processing in Python:

    Built-in string object expressions
    String object method calls
    Regular expression matching
    Parser-generator integrations
    Handcoded and generated parsers
    Running Python code with eval and exec built-ins

For simpler tasks, Python's built-in string object is often all we really need. Python strings can be indexed, concatenated, sliced, and processed with both string method calls and built-in functions. Our emphasis in this chapter, though, is on higher-level tools and techniques for analyzing textual information. Let's briefly explore each of the other approaches with representative examples.
21.3. String Method Utilities Python's string methods include a variety of text-processing utilities that go above and beyond string expression operators. For instance, given an instance str of the built-in string object type:
str.split(delim) Chops up a string around delimiters
str.join(seq) Puts substrings together with delimiters between
str.strip( ) Removes leading and trailing whitespace
str.rstrip( ) Removes trailing whitespace only, if any
str.rjust(width) Right-justifies a string in a fixed-width field
str.upper( ) Converts to uppercase
str.isupper( ) Tests whether the string is uppercase
str.isdigit( ) Tests whether the string is all digit characters
str.endswith(substr) Tests for a substring at the end
str.startswith(substr) Tests for a substring at the front This list is representative but partial, and some of these methods take additional optional arguments. For the full list of string methods, run a dir(str) call at the Python interactive prompt and run help(str.method) on any method for some quick documentation. The Python library manual also includes an exhaustive list. Moreover, in Python today, Unicode (wide) strings fully support all normal string methods, and most of the older string module's functions are also now available as string object methods. For instance, in Python 2.0 and later, the following two expressions are equivalent:
string.find(aString, substr) aString.find(substr)
# original module # methods new in 2.0
However, the second form does not require callers to import the string module first. As of this third edition of the book, the method call form is used everywhere, since it has been the recommended best-practice pattern for some time. If you see older code based on the module call pattern, it is a simple mapping to the newer method-based call form. The original string module still contains predefined constants (e.g., string.uppercase), as well as the new Template substitution interface in 2.4, and so remains useful in some contexts apart from method calls.
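To make the list above concrete, here is a quick interactive-style sketch of several of these methods in action (all are standard str methods, so this runs as shown):

```python
# a few of the string methods described above, at work
line = '  Spam, ham, eggs  \n'

print(line.strip())                   # 'Spam, ham, eggs': drop surrounding whitespace
print(line.rstrip())                  # trailing whitespace only
print('a+b+c'.split('+'))             # ['a', 'b', 'c']
print('-'.join(['a', 'b', 'c']))      # 'a-b-c'
print('spam'.upper())                 # 'SPAM'
print('SPAM'.isupper())               # True
print('42'.isdigit())                 # True
print('ham'.rjust(8))                 # '     ham'
print('spam.py'.endswith('.py'))      # True
print('spam.py'.startswith('spam'))   # True
```

Each call returns a new object; strings are immutable, so none of these changes line in place.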
21.3.1. Templating with Replacements and Formats Speaking of templates, as we saw when coding the web page migration scripts in Part II of this book, the string replace method is often adequate as a string templating toolwe can compute values and insert them at fixed positions in a string with a single replace call:
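For instance, a minimal sketch of the idea (the template string and marker names here are made up for illustration):

```python
# replace-based templating: compute values, insert them at fixed markers
template = 'Hello, $NAME$ -- your balance is $AMOUNT$.'

reply = template.replace('$NAME$', 'Mel')        # each replace returns a new string
reply = reply.replace('$AMOUNT$', '42.50')
print(reply)                                     # Hello, Mel -- your balance is 42.50.
```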
As we also saw when generating HTML code in our Common Gateway Interface (CGI) scripts in Part
III of this book, the string % formatting operator is also a powerful templating toolsimply fill out a dictionary with values and apply substitutions to the HTML string all at once:
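A sketch of the dictionary-based formatting scheme (the HTML and key names are illustrative, not the book's exact listing):

```python
# %-formatting templating: fill out a dictionary, substitute all at once
template = """
<html>
<title>%(title)s</title>
<body>%(body)s</body>
</html>
"""

values = {'title': 'Spam Page', 'body': 'Nice plate of spam.'}
print(template % values)
```

Each %(key)s target in the string is replaced by the like-named dictionary entry in a single operation.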
The 2.4 string module's Template feature is essentially a simplified variation of the dictionary-based format scheme, but it allows some additional call patterns:
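For example (the template text here is made up; the call patterns are those of the standard string.Template class):

```python
# string.Template: $-based substitutions, by keywords or a dictionary
import string

t = string.Template('Hello, $name; you owe $amount')
print(t.substitute(name='Mel', amount='42'))            # keyword arguments
print(t.substitute({'name': 'Ann', 'amount': '7'}))     # or a dictionary
print(t.safe_substitute(name='Bob'))                    # missing keys left as-is
```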
See the library manual for more on this extension. Although the string datatype does not itself support the pattern-directed text processing that we'll meet later in this chapter, its tools are powerful enough for many tasks.
21.3.2. Parsing with Splits and Joins In terms of this chapter's main focus, Python's built-in tools for splitting and joining strings around tokens turn out to be especially useful when it comes to parsing text:
str.split(delimiter?, maxsplits?) Splits a string into substrings, using either whitespace substrings (tabs, spaces, newlines) or an explicitly passed string as a delimiter. maxsplits limits the number of splits performed, if passed.
delimiter.join(sequence)

Concatenates a sequence of substrings (e.g., list or tuple), adding the subject separator string between each. These two are among the most powerful of string methods. As we saw earlier in Chapter 3, split chops a string into a list of substrings and join puts them back together:[*]
[*] Very early Python releases had similar tools called splitfields and joinfields; the more modern (and less verbose) split and join are the preferred way to spell these today.
>>> 'A B C D'.split( ) ['A', 'B', 'C', 'D'] >>> 'A+B+C+D'.split('+') ['A', 'B', 'C', 'D'] >>> '--'.join(['a', 'b', 'c']) 'a--b--c'
Despite their simplicity, they can handle surprisingly complex text-parsing tasks. Moreover, string method calls are very fast because they are implemented in C language code. For instance, to quickly replace all tabs in a file with four periods, pipe the file into a script that looks like this:
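The core of such a script is a one-line split-and-join; a sketch (in the book's script this is applied to each line read from standard input):

```python
# core of a tab-to-periods filter: split around tabs, rejoin with periods
def untabify(line):
    return ('.' * 4).join(line.split('\t'))

print(untabify('col1\tcol2\tcol3'))     # col1....col2....col3
```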
The split call here divides input around tabs, and the join puts it back together with periods where tabs had been. The combination of the two calls is equivalent to using the global replacement string method call as follows:
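That is, the split/join pair does the same work as a single replace:

```python
# the same conversion via the global string replace method
line = 'col1\tcol2\tcol3'
print(line.replace('\t', '.' * 4))      # col1....col2....col3
```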
As we'll see in the next sections, splitting strings is sufficient for many text-parsing goals.
21.3.3. Summing Columns in a File Let's look at a couple of practical applications of string splits and joins. In many domains, scanning files by columns is a fairly common task. For instance, suppose you have a file containing columns of numbers output by another system, and you need to sum each column's numbers. In Python, string splitting does the job, as demonstrated by Example 21-1. As an added bonus, it's easy to make the solution a reusable tool in Python.
Example 21-1. PP3E\Lang\summer.py
#!/usr/local/bin/python

def summer(numCols, fileName):
    sums = [0] * numCols                        # make list of zeros
    for line in open(fileName):                 # scan file's lines
        cols = line.split()                     # split up columns
        for i in range(numCols):                # around blanks/tabs
            sums[i] += eval(cols[i])            # add numbers to sums
    return sums

if __name__ == '__main__':
    import sys
    print summer(eval(sys.argv[1]), sys.argv[2])    # '% summer.py cols file'
Notice that we use file iterators here to read line by line, instead of calling the file readlines method explicitly (recall from Chapter 4 that iterators avoid loading the entire file into memory all at once). As usual, you can both import this module and call its function, and run it as a shell tool from the command line. The summer.py script calls split to make a list of strings representing the line's columns, and eval to convert column strings to numbers. Here's an input file that uses both blanks and tabs to separate columns:
Also notice that because the summer script uses eval to convert file text to numbers, you could really store arbitrary Python expressions in the file. Here, for example, it's run on a file of Python code snippets:
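To see the effect without the original data files, here is the same column-summing logic applied to in-memory lines (a variant of summer that takes a sequence of lines instead of a filename; the sample data is made up):

```python
# the summer logic applied to sample lines of numbers and expressions
def summer(numCols, lines):
    sums = [0] * numCols
    for line in lines:
        cols = line.split()                   # split around blanks/tabs
        for i in range(numCols):
            sums[i] += eval(cols[i])          # eval handles ints, floats, exprs
    return sums

print(summer(2, ['1 5', '2\t10', '3 15']))    # [6, 30]
print(summer(2, ['1+1 2*2', '2 [1,2][0]']))   # [4, 5] -- arbitrary expressions work too
```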
Notice that we could use newer list comprehensions to gain some conciseness here. The internal function, for instance, could be recoded to simply:
return [clause.split( ) for clause in conjunct.split(',')]
to produce the desired nested lists by combining two steps into one. This form might run faster; we'll leave it to the reader to decide whether it is more difficult to understand. As usual, we can test components of this module interactively:
>>> import rules >>> rules.internal('a ?x, b') [['a', '?x'], ['b']] >>> rules.internal_rule('rule x if a ?x, b then c, d ?x') {'if': [['a', '?x'], ['b']], 'rule': 'x', 'then': [['c'], ['d', '?x']]} >>> r = rules.internal_rule('rule x if a ?x, b then c, d ?x') >>> rules.external_rule(r) 'rule x if a ?x, b then c, d ?x.'
Parsing by splitting strings around tokens like this takes you only so far. There is no direct support for recursive nesting of components, and syntax errors are not handled very gracefully. But for simple language tasks like this, string splitting might be enough, at least for prototyping systems. You can always add a more robust rule parser later or reimplement rules as embedded Python code or classes.
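The rules module itself isn't listed here, but a minimal sketch consistent with the interactive session above might look like the following (the function bodies are assumptions, not the book's exact code):

```python
# split-based parsing of holmes rule strings (a sketch, not holmes itself)

def internal(conjunct):
    # 'a ?x, b' -> [['a', '?x'], ['b']]
    return [clause.split() for clause in conjunct.split(',')]

def internal_rule(ruleStr):
    # 'rule x if a ?x, b then c, d ?x' -> internal dictionary form
    ruleStr = ruleStr.rstrip('.')
    head, rest = ruleStr.split(' if ', 1)
    ifpart, thenpart = rest.split(' then ', 1)
    return {'rule': head.split()[1],
            'if':   internal(ifpart),
            'then': internal(thenpart)}

def external_rule(rule):
    # inverse mapping: internal dictionary back to a rule string
    external = lambda conjuncts: ', '.join(' '.join(c) for c in conjuncts)
    return 'rule %s if %s then %s.' % (
        rule['rule'], external(rule['if']), external(rule['then']))

r = internal_rule('rule x if a ?x, b then c, d ?x')
print(r['if'])              # [['a', '?x'], ['b']]
print(external_rule(r))     # rule x if a ?x, b then c, d ?x.
```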
Lesson 1: Prototype and Migrate

As a rule of thumb, use the string object's methods rather than things such as regular expressions whenever you can. Although this can vary from release to release, some string methods may be faster because they have less work to do.

In fact, the original implementation of these operations in the string module became substantially faster when they were moved to the C language. When you imported string, it internally replaced most of its content with functions imported from the strop C extension module; strop methods were reportedly 100 to 1,000 times faster than their Python-coded equivalents at the time (though Python has been heavily optimized since then). The string module was originally written in Python, but demands for string efficiency prompted recoding it in C. The result was dramatically faster performance for string client programs without impacting the interface. That is, string module clients became instantly faster without having to be modified for the new C-based module. A similar migration was applied to the pickle module we met in Chapter 19: the later cPickle recoding is compatible but much faster.

This is a great lesson about Python development: modules can be coded quickly in Python at first and translated to C later for efficiency if required. Because the interface to Python and C extension modules is identical (both are imported), C translations of modules are backward compatible with their Python prototypes. The only impact of the translation of such modules on clients is an improvement in performance.

There is usually no need to move every module to C for delivery of an application: you can pick and choose performance-critical modules (such as string and pickle) for translation and leave others coded in Python. Use the timing and profiling techniques discussed in Chapter 20 to isolate which modules will give the most improvement when translated to C.
C-based extension modules are introduced in Part VI of this book. Actually, in Python 2.0, the string module changed its implementation again: it is now a frontend to new string methods, which are able to also handle Unicode strings. As mentioned, most string functions are also available as object methods in 2.0. For instance, string.split(X) is now simply a synonym for X.split( ); both forms are still supported, but the latter is more prevalent and preferred today (and may be the only option in the future). Either way, clients of the original string module are not affected by this change: yet another lesson!
21.3.5. More on the holmes Expert System Shell So how are these rules actually used? As mentioned, the rule parser we just met is part of the Python-coded holmes expert system shell. This book does not cover holmes in detail due to lack of space; see the PP3E\AI\ExpertSystem directory in this book's examples distribution for its code and documentation. But by way of introduction, holmes is an inference engine that performs forward and backward chaining deduction on rules that you supply. For example, the rule:
rule pylike if ?X likes coding, ?X likes spam then ?X likes Python
can be used both to prove whether someone likes Python (backward, from "then" to "if"), and to deduce that someone likes Python from a set of known facts (forward, from "if" to "then"). Deductions may span multiple rules, and rules that name the same conclusion represent alternatives. holmes also performs simple pattern-matching along the way to assign the variables that appear in rules (e.g., ?X), and it is able to explain its work. To make all of this more concrete, let's step through a simple holmes session. The += interactive command adds a new rule to the rule base by running the rule parser, and @@ prints the current rule base:
C:..\PP3E\Ai\ExpertSystem\holmes\holmes>python holmes.py
-Holmes inference engine-
holmes> += rule pylike if ?X likes coding, ?X likes spam then ?X likes Python
holmes> @@
rule pylike if ?X likes coding, ?X likes spam then ?X likes Python.
Now, to kick off a backward-chaining proof of a goal, use the ?- command. A proof explanation is shown here; holmes can also tell you why it is asking a question. Holmes pattern variables can show up in both rules and queries; in rules, variables provide generalization; in a query, they provide an answer:
holmes> ?- mel likes Python
is this true: "mel likes coding" ? y
is this true: "mel likes spam" ? y
yes: (no variables)
show proof ? yes
  "mel likes Python" by rule pylike
    "mel likes coding" by your answer
    "mel likes spam" by your answer
more solutions? n
holmes> ?- ann likes ?X
is this true: "ann likes coding" ? y
is this true: "ann likes spam" ? y
yes: ann likes Python
Forward chaining from a set of facts to conclusions is started with a +- command. Here, the same rule is being applied but in a different way:
holmes> +- chris likes spam, chris likes coding
I deduced these facts...
   chris likes Python
I started with these facts...
   chris likes spam
   chris likes coding
time: 0.0
More interestingly, deductions chain through multiple rules when part of a rule's "if" is mentioned in another rule's "then":
holmes> += rule 1 if thinks ?x then human ?x
holmes> += rule 2 if human ?x then mortal ?x
holmes> ?- mortal bob
is this true: "thinks bob" ? y
yes: (no variables)
holmes> +- thinks bob
I deduced these facts...
   human bob
   mortal bob
I started with these facts...
   thinks bob
time: 0.0
Finally, the @= command is used to load files of rules that implement more sophisticated knowledge bases; the rule parser is run on each rule in the file. Here is a file that encodes animal classification rules (other example files are available in the book's examples distribution, if you'd like to experiment):
holmes> @= ..\kbases\zoo.kb
holmes> ?- it is a penguin
is this true: "has feathers" ? why
to prove "it is a penguin" by rule 17
this was part of your original query.
is this true: "has feathers" ? y
is this true: "able to fly" ? n
is this true: "black color" ? y
yes: (no variables)
Type stop to end a session and help for a full commands list; see the text files in the holmes directories for more details. Holmes is an old system written before Python 1.0 (and around 1993), but it still works unchanged on all platforms.
21.4. Regular Expression Pattern Matching Splitting and joining strings is a simple way to process text, as long as it follows the format you expect. For more general text analysis tasks, Python provides regular expression matching utilities. Regular expressions are simply strings that define patterns to be matched against other strings. Supply a pattern and a string and ask whether the string matches your pattern. After a match, parts of the string matched by parts of the pattern are made available to your script. That is, matches not only give a yes/no answer, but also can pick out substrings as well. Regular expression pattern strings can be complicated (let's be honestthey can be downright gross to look at). But once you get the hang of them, they can replace larger handcoded string search routinesa single pattern string generally does the work of dozens of lines of manual string scanning code and may run much faster. They are a concise way to encode the expected structure of text and extract portions of it. In Python, regular expressions are not part of the syntax of the Python language itself, but they are supported by extension modules that you must import to use. The modules define functions for compiling pattern strings into pattern objects, matching these objects against strings and fetching matched substrings after a match. They also provide tools for pattern-based splitting, replacing, and so on. Beyond those generalities, Python's regular expression story is complicated a little by history:
The regex module (old)

In earlier Python releases, a module called regex was the standard (and only) regular expression module. It was fast and supported patterns coded in awk, grep, and emacs styles, but it is now somewhat deprecated. (It generates a deprecation warning when imported today, though it will likely still be available for some time to come.)

The re module (new)

Today, you should use re, a new regular expression module for Python that was introduced sometime around Python release 1.5. This module provides a much richer regular expression pattern syntax that tries to be close to that used to code patterns in the Perl language (yes, regular expressions are a feature of Perl worth emulating). For instance, re supports the notions of named groups, character classes, and non-greedy matches: regular expression pattern operators that match as few characters as possible (other regular expression pattern operators always match the longest possible substring). When it was first made available, re was generally slower than regex, so you had to choose between speed and Perl-like regular expression syntax. Today, though, re has been optimized to the extent that regex no longer offers any clear advantages. Moreover, re supports a richer pattern syntax and matching of Unicode strings (strings with 16-bit-wide or wider characters for representing large character sets). Because of this migration, I've recoded regular expression examples in this text to use the new re module rather than regex. The old regex-based versions are still available in the book's examples
distribution in the directory PP3E\lang\old-regex. If you find yourself having to migrate old regex code, you can also find a document describing the translation steps needed at http://www.python.org. Both modules' interfaces are similar, but re introduces a match object and changes pattern syntax in minor ways. Having said that, I also want to warn you that regular expressions are a complex topic that cannot be covered in depth here. If this area sparks your interest, the text Mastering Regular Expressions, written by Jeffrey E. F. Friedl (O'Reilly), is a good next step to take. We won't be able to go into pattern construction in much depth here. Once you learn how to code patterns, though, the top-level interface for performing matches is straightforward. In fact, they are so easy to use that we'll jump right into an example before getting into more details.
21.4.1. First Examples

There are two basic ways to kick off matches: through top-level function calls and via methods of precompiled pattern objects. The latter precompiled form is quicker if you will be applying the same pattern more than once: to all lines in a text file, for instance. To demonstrate, let's do some matching on the following strings:
The match performed in the following code does not precompile: it executes an immediate match to look for all the characters between the words Hello and World in our text strings:
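A sketch of such an immediate match follows (the sample strings here are illustrative, chosen so that the first ends in World and the second does not):

```python
import re

text1 = 'Hello spam...World'
text2 = 'Hello spam...other'

matchobj = re.match('Hello(.*)World', text2)    # immediate match, no precompile
if not matchobj:
    print('match failed')                       # text2 doesn't end in World
```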
When a match fails as it does here (the text2 string doesn't end in World), we get back the None object, which is Boolean false if tested in an if statement. In the pattern string we're using here (the first argument to re.match), the words Hello and World match themselves, and (.*) means any character (.) repeated zero or more times (*). The fact that it is enclosed in parentheses tells Python to save away the part of the string matched by that part of the pattern as a group: a matched substring. To see how, we need to make a match work:
When a match succeeds, we get back a match object, which has interfaces for extracting matched substrings: the group(1) call returns the portion of the string matched by the first, leftmost, parenthesized portion of the pattern (our (.*)). In other words, matching is not just a yes/no answer (as already mentioned); by enclosing parts of the pattern in parentheses, it is also a way to extract matched substrings. The interface for precompiling is similar, but the pattern is implied in the pattern object we get back from the compile call:
Again, you should precompile for speed if you will run the pattern multiple times. Here's something a bit more complex that hints at the generality of patterns. This one allows for zero or more blanks or tabs at the front ([ \t]*), skips one or more after the word Hello ([ \t]+), and allows the final word to begin with an upper- or lowercase letter ([Ww]); as you can see, patterns can handle wide variations in data:
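Putting those pieces together (the strings are illustrative; the last pattern follows the description just given):

```python
import re

text1 = 'Hello spam...World'

# a successful match returns a match object; group(1) is the (.*) part
matchobj = re.match('Hello(.*)World', text1)
print(matchobj.group(1))                        # ' spam...'

# precompiled form: the pattern is implied in the pattern object
pattobj = re.compile('Hello(.*)World')
print(pattobj.match(text1).group(1))            # ' spam...'

# flexible pattern: optional leading blanks/tabs, one or more after Hello,
# and an upper- or lowercase w on the final word
patt = '[ \t]*Hello[ \t]+(.*)[Ww]orld'
print(re.match(patt, '  Hello   spamworld').group(1))   # 'spam'
```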
In addition to the tools these examples demonstrate, there are methods for scanning ahead to find a match (search), splitting and replacing on patterns, and so on. All have analogous module and precompiled call forms. Let's dig into a few details of the module before we get to more code.
21.4.2. Using the re Module The Python re module comes with functions that can search for patterns right away or make compiled pattern objects for running matches later. Pattern objects (and module search calls) in turn generate match objects, which contain information about successful matches and matched substrings. The next few sections describe the module's interfaces and some of the operators you can use to code patterns.
21.4.2.1. Module functions The top level of the module provides functions for matching, substitution, precompiling, and so on:
compile(pattern [, flags]) Compile a regular expression pattern string into a regular expression pattern object, for later matching. See the reference manual for the flags argument's meaning.
match(pattern, string [, flags]) If zero or more characters at the start of string match the pattern string, return a corresponding match object, or None if no match is found. Roughly like a search for a pattern that begins with the ^ operator.
search(pattern, string [, flags]) Scan through string for a location matching pattern, and return a corresponding match object, or None if no match is found.
split(pattern, string [, maxsplit])

Split string by occurrences of pattern. If capturing parentheses (( )) are used in the pattern, the text matched by the groups is also included in the result list.
sub(pattern, repl, string [, count]) Return the string obtained by replacing the (first count) leftmost nonoverlapping occurrences of pattern (a string or a pattern object) in string by repl (which may be a string or a function that is passed a single match object).
subn(pattern, repl, string [, count]) Same as sub , but returns a tuple: (new-string, number-of-substitutions-made).
findall(pattern, string [, flags]) Return a list of strings giving all nonoverlapping matches of pattern in string; if there are any groups in patterns, returns a list of groups.
finditer(pattern, string [, flags]) Return iterator over all nonoverlapping matches of pattern in string.
escape(string)

Return string with all nonalphanumeric characters backslashed, so that the result can be used as a pattern that matches the original string literally.
21.4.2.2. Compiled pattern objects

At the next level, pattern objects provide similar attributes, but the pattern string is implied. The re.compile function in the previous section is useful to optimize patterns that may be matched more than once (compiled patterns match faster). Pattern objects returned by re.compile have these sorts of methods:

match(string [, pos [, endpos]])
search(string [, pos [, endpos]])
split(string [, maxsplit])
sub(repl, string [, count])
subn(repl, string [, count])
findall(string [, pos [, endpos]])
finditer(string [, pos [, endpos]])

Same as the re functions, but the pattern is implied, and pos and endpos give start/end string indexes for the match.
21.4.2.3. Match objects Finally, when a match or search function or method is successful, you get back a match object ( None comes back on failed matches). Match objects export a set of attributes of their own, including:
group(g), group([g1, g2, ...])

Return the substring that matched a parenthesized group (or groups) in the pattern. Accepts group numbers or names. Group numbers start at 1; group 0 is the entire string matched by the pattern.
groups( ) Returns a tuple of all groups' substrings of the match.
groupdict( ) Returns a dictionary containing all named groups of the match.
start([group]), end([group])

Indices of the start and end of the substring matched by group (or the entire matched string, if no group).
span([group]) Returns the two-item tuple: (start(group),end(group)).
expand(template)

Performs backslash group substitutions; see the Python library manual.
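For example, given a match with both numbered and named groups (the sample text is illustrative):

```python
import re

m = re.match(r'(?P<user>\w+)@(?P<host>\w+)', 'lutz@rmi.net')
print(m.group(0))         # 'lutz@rmi': the entire match
print(m.group(1, 2))      # ('lutz', 'rmi'): by group number
print(m.group('user'))    # 'lutz': by group name
print(m.groups())         # ('lutz', 'rmi')
print(m.groupdict())      # {'user': 'lutz', 'host': 'rmi'}
print(m.span(2))          # (5, 8): start/end indexes of group 2
```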
21.4.2.4. Regular expression patterns Regular expression strings are built up by concatenating single-character regular expression forms, shown in Table 21-1. The longest-matching string is usually matched by each form, except for the nongreedy operators. In the table, R means any regular expression form, C is a character, and N denotes a digit.
Table 21-1. re pattern syntax Operator
Interpretation
.
Matches any character (including newline if DOTALL flag is specified)
^
Matches start of the string (of every line in MULTILINE mode)
$
Matches end of the string (of every line in MULTILINE mode)
C
Any nonspecial character matches itself
R*
Zero or more of preceding regular expression R (as many as possible)
R+
One or more of preceding regular expression R (as many as possible)
R?
Zero or one occurrence of preceding regular expression R
R{m}
Matches exactly m copies preceding R: a{5} matches 'aaaaa'
R{m,n}
Matches from m to n repetitions of preceding regular expression R
R*?, R+?, R??, R{m,n}?
Same as *, +, and ? but matches as few characters/times as possible; these are known as nongreedy match operators (unlike others, they match and consume as few characters as possible)
[...]
Defines character set: e.g., [a-zA-Z] to match all letters
[^...]
Defines complemented character set: matches if char is not in set
\
Escapes special chars (e.g., *?+|( )) and introduces special sequences
\\
Matches a literal \ (write as \\\\ in pattern, or r'\\')
\number
Matches the contents of the group of the same number: (.+) \1 matches "42 42"
R|R
Alternative: matches left or right R
RR
Concatenation: match both Rs
(R)
Matches any regular expression inside ( ) , and delimits a group (retains matched substring)
(?: R)
Same but doesn't delimit a group
(?= R)
Look-ahead assertion: matches if R matches next, but doesn't consume any of the string (e.g., X (?=Y) matches X only if followed by Y)
(?! R)
Matches if R doesn't match next; negative of (?=R)
(?P<name>R)
Matches any regular expression inside ( ), and delimits a group named name
(?P=name)
Matches whatever text was matched by the earlier group named name
(?#...)
A comment; ignored
(?letter)
Set mode flag; letter is one of i, L, m, s, u, x (see the library manual)
(?<=R)
Look-behind assertion: matches if the current position in the string is preceded by a match for R
(?<!R)
Matches if the current position in the string is not preceded by a match for R; negative of (?<=R)

C:\...\PP3E\Lang>python re-groups.py
0 1 2
('000', '111', '222')
('A', 'Y', 'C')
('spam', '1 + 2 + 3')
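A few of these operators in action may help; this quick sketch (not one of the book's examples) exercises greedy versus nongreedy repetition, named groups, and lookahead:

```python
import re

# Greedy vs. nongreedy repetition: .*? stops as soon as the rest can match.
greedy    = re.match(r'<(.*)>',  '<a><b>').group(1)    # matches 'a><b'
nongreedy = re.match(r'<(.*?)>', '<a><b>').group(1)    # matches 'a'
assert greedy == 'a><b' and nongreedy == 'a'

# Named groups: (?P<name>R) delimits a group fetched by name later,
# and (?P=name) matches the same text that the named group matched.
m = re.match(r'(?P<word>\w+) (?P=word)', 'spam spam')
assert m.group('word') == 'spam'

# Lookahead: R1(?=R2) matches R1 only if R2 comes next, without consuming R2.
assert re.findall(r'\d+(?=px)', '10px 20em 30px') == ['10', '30']
```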
Finally, besides matches and substring extraction, re also includes tools for string replacement or substitution (see Example 21-5).
Example 21-5. PP3E\lang\re-subst.py
# substitutions (replace occurrences of patt with repl in string)
import re
print re.sub('[ABC]', '*', 'XAXAXBXBXCXC')
print re.sub('[ABC]_', '*', 'XA-XA_XB-XB_XC-XC_')
In the first test, all characters in the set are replaced; in the second, they must be followed by an underscore:
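For reference, the results of the two calls work out as follows, stated here as assertions rather than printed output:

```python
import re

# all A/B/C characters are replaced in the first call; in the second,
# only those followed by an underscore match the pattern
first  = re.sub('[ABC]',  '*', 'XAXAXBXBXCXC')
second = re.sub('[ABC]_', '*', 'XA-XA_XB-XB_XC-XC_')
assert first  == 'X*X*X*X*X*X*'
assert second == 'XA-X*XB-X*XC-X*'
```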
21.4.4. Scanning C Header Files for Patterns

On to some realistic examples: the script in Example 21-6 puts these pattern operators to more practical use. It uses regular expressions to find #define and #include lines in C header files and extract their components. The generality of the patterns lets them detect a variety of line formats; pattern groups (the parts in parentheses) are used to extract matched substrings from a line after a match.
Example 21-6. PP3E\Lang\cheader.py

#!/usr/local/bin/python
import sys, re

pattDefine = re.compile(                              # compile to pattobj
    '^#[\t ]*define[\t ]+([a-zA-Z0-9_]+)[\t ]*(.*)')  # "# define xxx yyy..."
pattInclude = re.compile(
    '^#[\t ]*include[\t ]+[<"]([a-zA-Z0-9_/.]+)')     # "# include <xxx>..."

def scan(file):
    count = 0
    while 1:                                          # scan line-by-line
        line = file.readline()
        if not line: break
        count += 1
        matchobj = pattDefine.match(line)             # None if match fails
        if matchobj:
            name = matchobj.group(1)                  # substrings for (...) parts
            body = matchobj.group(2)
            print count, 'defined', name, '=', body.strip()
            continue
        matchobj = pattInclude.match(line)
        if matchobj:
            start, stop = matchobj.span(1)            # start/stop indexes of (...)
            filename = line[start:stop]               # slice out of line
            print count, 'include', filename          # same as matchobj.group(1)

if len(sys.argv) == 1:
    scan(sys.stdin)                                   # no args: read stdin
else:
    scan(open(sys.argv[1], 'r'))                      # arg: input filename
To test, let's run this script on the text file in Example 21-7.
Notice the spaces after # in some of these lines; regular expressions are flexible enough to account for such departures from the norm. Here is the script at work, picking out #include and #define lines and their parts. For each matched line, it prints the line number, the line type, and any matched substrings:
C:\...\PP3E\Lang>python cheader.py test.h
2 defined TEST_H =
4 include stdio.h
5 include lib/spam.h
6 include Python.h
8 defined DEBUG =
9 defined HELLO = 'hello regex world'
10 defined SPAM = 1234
12 defined EGGS = sunny + side + up
13 defined ADDER = (arg) 123 + arg
21.4.5. A File Pattern Search Utility

The next script searches for patterns in a set of files, much like the grep command-line program. We wrote file and directory searchers earlier in Chapter 7. Here, the file searches look for patterns rather than simple strings (see Example 21-8). The patterns are typed interactively, separated by a space, and the files to be searched are specified by an input pattern for Python's glob.glob filename expansion tool that we studied earlier.
Example 21-8. PP3E\Lang\pygrep1.py
#!/usr/local/bin/python
import sys, re, glob

help_string = """
Usage options.
interactive: % pygrep1.py
"""

def getargs():
    if len(sys.argv) == 1:
        return raw_input("patterns? >").split(), raw_input("files? >")
    else:
        try:
            return sys.argv[1], sys.argv[2]
        except:
            print help_string
            sys.exit(1)

def compile_patterns(patterns):
    res = []
    for pattstr in patterns:
        try:
            res.append(re.compile(pattstr))       # make re patt object
        except:
            print 'pattern ignored:', pattstr     # or use re.match
    return res

def searcher(pattfile, srchfiles):
    patts = compile_patterns(pattfile)            # compile for speed
    for file in glob.glob(srchfiles):             # all matching files
        lineno = 1                                # glob uses re too
        print '\n[%s]' % file
        for line in open(file, 'r').readlines():  # all lines in file
            for patt in patts:                    # try all patterns
                if patt.search(line):             # match if not None
                    print '%04d)' % lineno, line,
                    break
            lineno = lineno + 1

if __name__ == '__main__':
    searcher(*getargs())                          # was apply(func, args)
Here's what a typical run of this script looks like, scanning old versions of some of the source files in this chapter; it searches all Python files in the current directory for two different patterns, compiled for speed. Notice that files are named by a pattern too; Python's glob module also uses re internally:
21.5. Advanced Language Tools

If you have a background in parsing theory, you may know that neither regular expressions nor string splitting is powerful enough to handle more complex language grammars. Roughly, regular expressions don't have the stack "memory" required by true grammars, so they cannot support arbitrary nesting of language constructs (nested if statements in a programming language, for instance). From a theoretical perspective, regular expressions are intended to handle just the first stage of parsing: separating text into components, otherwise known as lexical analysis. Language parsing requires more.

In most applications, the Python language itself can replace custom languages and parsers: user-entered code can be passed to Python for evaluation with tools such as eval and exec. By augmenting the system with custom modules, user code in this scenario has access to both the full Python language and any application-specific extensions required. In a sense, such systems embed Python in Python. Since this is a common application of Python, we'll revisit this approach later in this chapter.

For some sophisticated language analysis tasks, though, a full-blown parser may still be required. Since Python is built for integrating C tools, we can write integrations to traditional parser generator systems such as yacc and bison, tools that create parsers from language grammar definitions. Better yet, we could use an integration that already exists: interfaces to such common parser generators are freely available in the open source domain (run a web search on Google for up-to-date details and links).

Python-specific parsing systems are also accessible from Python's web site. Among them, the kwParsing system is a parser generator written in Python, and the SPARK toolkit is a lightweight system that employs the Earley algorithm to work around technical problems with LALR parser generation (if you don't know what that means, you probably don't need to care).
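To make the eval-based approach concrete, here is a minimal sketch; the names app_vars and expr are illustrative only, not from any example in this book:

```python
# user-entered expression text is evaluated against a namespace of
# application-specific names, with built-ins disabled for a bit of safety
app_vars = {'spam': 2, 'eggs': 3}              # application extensions
expr = 'spam * eggs + 1'                       # would come from the user
result = eval(expr, {'__builtins__': {}}, app_vars)
assert result == 7
```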
Since all of these are complex tools, though, we'll skip their details in this text. Consult http://www.python.org for information on parser generator tools available for use in Python programs. Even more demanding language analysis tasks require techniques developed in artificial intelligence research, such as semantic analysis and machine learning. For instance, the Natural Language Toolkit, or NLTK, is an open source suite of Python libraries and programs for symbolic and statistical natural language processing. It applies linguistic techniques to textual data, and it can be used in the development of natural language recognition software and systems.
Lesson 2: Don't Reinvent the Wheel Speaking of parser generators, to use some of these tools in Python programs, you'll need an extension module that integrates them. The first step in such scenarios should always be to see whether the extension already exists in the public domain. Especially for common tools like these, chances are that someone else has already written an integration that you can use off-the-shelf instead of writing one from scratch. Of course, not everyone can donate all their extension modules to the public domain, but there's a growing library of available components that you can pick up for free and a community of experts to query. Visit http://www.python.org for links to Python software resources. With roughly one million Python users out there as I write this book, much can be found in the prior-art department.
Of special interest to this chapter, also see Yet Another Python Parser System (YAPPS). YAPPS is a parser generator written in Python. It uses supplied grammar rules to generate human-readable Python code that implements a recursive descent parser. The parsers generated by YAPPS look much like (and are inspired by) the handcoded expression parsers shown in the next section. YAPPS creates LL(1) parsers, which are not as powerful as LALR parsers but are sufficient for many language tasks. For more on YAPPS, see http://theory.stanford.edu/~amitp/Yapps or search the Web at large.
21.6. Handcoded Parsers

Since Python is a general-purpose programming language, it's also reasonable to consider writing a handcoded parser. For instance, recursive descent parsing is a fairly well-known technique for analyzing language-based information. Since Python is a very high-level language, writing the parser itself is usually easier than it would be in a traditional language such as C or C++.

To illustrate, this section develops a custom parser for a simple grammar: it parses and evaluates arithmetic expression strings. This example also demonstrates the utility of Python as a general-purpose programming language. Although Python is often used as a frontend or rapid development language, it's also useful for the kinds of things we'd normally write in a systems development language such as C or C++.
21.6.1. The Expression Grammar

The grammar that our parser will recognize can be described as follows:
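In outline, the grammar takes roughly this shape, where the bracketed token lists are the selection sets discussed below (a sketch reconstructed from the description that follows, not the book's exact listing):

```
goal        -> <expr> END                   [number, variable, ( ]
goal        -> <assign> END                 [set]
assign      -> 'set' <variable> <expr>      [set]
expr        -> <factor> <expr-tail>         [number, variable, ( ]
expr-tail   -> ^                            [END, ) ]
expr-tail   -> '+' <factor> <expr-tail>     [+]
expr-tail   -> '-' <factor> <expr-tail>     [-]
factor      -> <term> <factor-tail>         [number, variable, ( ]
factor-tail -> ^                            [+, -, END, ) ]
factor-tail -> '*' <term> <factor-tail>     [*]
factor-tail -> '/' <term> <factor-tail>     [/]
term        -> <number>                     [number]
term        -> <variable>                   [variable]
term        -> '(' <expr> ')'               [(]
```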
This is a fairly typical grammar for a simple expression language, and it allows for arbitrary expression nesting (some example expressions appear at the end of the testparser module listing in Example 21-11). Strings to be parsed are either an expression or an assignment to a variable name (set). Expressions involve numbers, variables, and the operators +, -, *, and /. Because factor is nested in expr in the grammar, * and / have higher precedence (i.e., they bind tighter) than + and -.
Expressions can be enclosed in parentheses to override precedence, and all operators are left associative; that is, they group on the left (e.g., 1-2-3 is treated the same as (1-2)-3).

Tokens are just the most primitive components of the expression language. Each grammar rule is followed in square brackets by a list of tokens used to select it. In recursive descent parsing, we determine the set of tokens that can possibly start a rule's substring, and we use that information to predict which rule will work ahead of time. For rules that iterate (the -tail rules), we use the set of possibly following tokens to know when to stop. Typically, tokens are recognized by a string processor (a scanner), and a higher-level processor (a parser) uses the token stream to predict and step through grammar rules and substrings.
21.6.2. The Parser's Code

The system is structured as two modules, holding two classes:

The scanner handles low-level character-by-character analysis.

The parser embeds a scanner and handles higher-level grammar analysis. The parser is also responsible for computing the expression's value and testing the system.

In this version, the parser evaluates the expression while it is being parsed. To use the system, we create a parser with an input string and call its parse method. We can also call parse again later with a new expression string.

There's a deliberate division of labor here. The scanner extracts tokens from the string, but it knows nothing about the grammar. The parser handles the grammar, but it is naive about the string itself. This modular structure keeps the code relatively simple. And it's another example of the object-oriented programming (OOP) composition relationship at work: parsers embed and delegate to scanners.

The module in Example 21-9 implements the lexical analysis task: detecting the expression's basic tokens by scanning the text string left to right on demand. Notice that this is all straightforward logic; such analysis can sometimes be performed with regular expressions instead (described earlier), but the pattern needed to detect and extract tokens in this example would be too complex and fragile for my tastes. If your tastes vary, try recoding this module with re.
Example 21-9. PP3E\Lang\Parser\scanner.py
####################################################
# the scanner (lexical analyser)
####################################################

import string

class SyntaxError(Exception): pass
class LexicalError(Exception): pass

class Scanner:
    def __init__(self, text):
        self.next = 0
        self.text = text + '\0'

    # the next three methods are the interface used by the parsers;
    # they were dropped from this copy of the listing and restored here
    def newtext(self, text):                  # reuse scanner on new string
        Scanner.__init__(self, text)

    def showerror(self):                      # print position of error
        print '=> ', self.text
        print '=> ', (' ' * self.start) + '^'

    def match(self, token):                   # verify and consume token
        if self.token != token:
            raise SyntaxError
        else:
            value = self.value
            if self.token != '\0':
                self.scan()                   # advance to next token
            return value                      # return prior token's value

    def scan(self):
        self.value = None
        ix = self.next
        while self.text[ix] in string.whitespace:
            ix = ix+1
        self.start = ix
        if self.text[ix] in ['(', ')', '-', '+', '/', '*', '\0']:
            self.token = self.text[ix]
            ix = ix+1
        elif self.text[ix] in string.digits:
            str = ''
            while self.text[ix] in string.digits:
                str = str + self.text[ix]
                ix = ix+1
            if self.text[ix] == '.':
                str = str + '.'
                ix = ix+1
                while self.text[ix] in string.digits:
                    str = str + self.text[ix]
                    ix = ix+1
                self.token = 'num'
                self.value = float(str)
            else:
                self.token = 'num'
                self.value = long(str)
        elif self.text[ix] in string.letters:
            str = ''
            while self.text[ix] in (string.digits + string.letters):
                str = str + self.text[ix]
                ix = ix+1
            if str.lower() == 'set':
                self.token = 'set'
            else:
                self.token = 'var'
            self.value = str
        else:
            raise LexicalError
        self.next = ix
The parser module's class creates and embeds a scanner for its lexical chores and handles interpretation of the expression grammar's rules and evaluation of the expression's result, as shown in Example 21-10.
Example 21-10. PP3E\Lang\Parser\parser1.py
########################################################
# the parser (syntax analyser, evaluates during parse)
########################################################

class UndefinedError(Exception): pass
from scanner import Scanner, LexicalError, SyntaxError

class Parser:
    def __init__(self, text=''):
        self.lex  = Scanner(text)                 # embed a scanner
        self.vars = {'pi': 3.14159}               # add a variable

    def parse(self, *text):
        if text:                                  # main entry-point
            self.lex.newtext(text[0])             # reuse this parser?
        try:
            self.lex.scan()                       # get first token
            self.Goal()                           # parse a sentence
        except SyntaxError:
            print 'Syntax Error at column:', self.lex.start
            self.lex.showerror()
        except LexicalError:
            print 'Lexical Error at column:', self.lex.start
            self.lex.showerror()
        except UndefinedError, name:
            print "'%s' is undefined at column:" % name, self.lex.start
            self.lex.showerror()

    def Goal(self):
        if self.lex.token in ['num', 'var', '(']:     # expression?
            val = self.Expr()
            self.lex.match('\0')
            print val
        elif self.lex.token == 'set':                 # set command?
            self.Assign()
            self.lex.match('\0')
        else:
            raise SyntaxError

    def Assign(self):
        self.lex.match('set')
        var = self.lex.match('var')
        val = self.Expr()
        self.vars[var] = val                      # assign name in dict

    def Expr(self):
        left = self.Factor()
        while 1:
            if self.lex.token in ['\0', ')']:
                return left
            elif self.lex.token == '+':
                self.lex.scan()
                left = left + self.Factor()
            elif self.lex.token == '-':
                self.lex.scan()
                left = left - self.Factor()
            else:
                raise SyntaxError

    def Factor(self):
        left = self.Term()
        while 1:
            if self.lex.token in ['+', '-', '\0', ')']:
                return left
            elif self.lex.token == '*':
                self.lex.scan()
                left = left * self.Term()
            elif self.lex.token == '/':
                self.lex.scan()
                left = left / self.Term()
            else:
                raise SyntaxError

    def Term(self):
        if self.lex.token == 'num':               # numbers
            val = self.lex.match('num')
            return val
        elif self.lex.token == 'var':
            if self.vars.has_key(self.lex.value):
                val = self.vars[self.lex.value]   # look up name's value
                self.lex.scan()
                return val
            else:
                raise UndefinedError, self.lex.value
        elif self.lex.token == '(':
            self.lex.scan()
            val = self.Expr()                     # sub-expression
            self.lex.match(')')
            return val
        else:
            raise SyntaxError

if __name__ == '__main__':
    import testparser                             # self-test code
    testparser.test(Parser, 'parser1')            # test local Parser
If you study this code closely, you'll notice that the parser keeps a dictionary (self.vars) to manage variable names: they're stored in the dictionary on a set command and are fetched from it when they appear in an expression. Tokens are represented as strings, with an optional associated value (a numeric value for numbers and a string for variable names).

The parser uses iteration (while loops) rather than recursion for the expr-tail and factor-tail rules. Other than this optimization, the rules of the grammar map directly onto parser methods: tokens become calls to the scanner, and nested rule references become calls to other methods.

When the file parser1.py is run as a top-level program, its self-test code is executed, which in turn simply runs a canned test in the module shown in Example 21-11. Note that all integer math uses Python long integers (unlimited-precision integers) because the scanner converts number strings with long. Also notice that mixed integer/floating-point operations cast up to floating point since Python operators are used to do the actual calculations.
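The unlimited-precision integer behavior and the mixed-type cast-up described here are easy to verify directly:

```python
# integer math never overflows: values grow to arbitrary precision,
# and mixing an integer with a float casts the result up to float
big   = 2 ** 100
mixed = 1 + 2.5
assert big == 1267650600228229401496703205376
assert isinstance(mixed, float) and mixed == 3.5
```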
The integer results here are really long integers; in the past they printed with a trailing L (e.g., 5L), but they no longer do in recent Python releases because normal integers are automatically converted to long integers as needed (in fact, there may be no distinction between the two types at all in a future Python release). Change the Goal method to print repr(val) if you still want to see the trailing L (repr prints objects as code). As usual, we can also test and use the system interactively:
In addition, there is an interactive loop interface in the testparser module if you want to use the parser as a simple command-line calculator (or if you get tired of typing parser method calls). Pass the Parser class, so testparser can make one of its own:
>>> import testparser
>>> testparser.interact(parser1.Parser)
Enter=> 4 * 3 + 5
17
Enter=> 5 + 4 * 3
17
Enter=> (5 + 4) * 3
27
Enter=> set a 99
Enter=> set b 66
Enter=> a + b
165
Enter=> # + 1
Lexical Error at column: 0
=>  # + 1
=>  ^
Enter=> a * b + c
'c' is undefined at column: 8
=>  a * b + c
=>          ^
Enter=> a * b * + c
Syntax Error at column: 8
=>  a * b * + c
=>          ^
Enter=> a
99
Enter=> a * a * a
970299
Enter=> stop
>>>
Lesson 3: Divide and Conquer As the parser system demonstrates, modular program design is almost always a major win. By using Python's program structuring tools (functions, modules, classes, and so on), big tasks can be broken down into small, manageable parts that can be coded and tested independently. For instance, the scanner can be tested without the parser by making an instance with an input string and calling its scan or match methods repeatedly. We can even test it like this interactively, from Python's command line. By separating programs into logical components, they become easier to understand and modify. Imagine what the parser would look like if the scanner's logic was embedded rather than called.
21.6.3. Adding a Parse Tree Interpreter

One weakness in the parser1 program is that it embeds expression evaluation logic in the parsing logic: the result is computed while the string is being parsed. This makes evaluation quick, but it can also make it difficult to modify the code, especially in larger systems. To simplify, we could restructure the program to keep expression parsing and evaluation separate. Instead of evaluating the string, the parser can build up an intermediate representation of it that can be evaluated later. As an added incentive, building the representation separately makes it available to other analysis tools (e.g., optimizers, viewers, and so on); they can be run as separate passes over the tree.

Example 21-12 shows a variant of parser1 that implements this idea. The parser analyzes the string and builds up a parse tree; that is, a tree of class instances that represents the expression and that may be evaluated in a separate step. The parse tree is built from classes that "know" how to evaluate themselves: to compute the expression, we just ask the tree to evaluate itself. Root nodes in the tree ask their children to evaluate themselves, and then combine the results by applying a single operator. In effect, evaluation in this version is simply a recursive traversal of a tree of embedded class instances constructed by the parser.
Example 21-12. PP3E\Lang\Parser\parser2.py
TraceDefault = False

class UndefinedError(Exception): pass
from scanner import Scanner, SyntaxError, LexicalError

####################################################
# the interpreter (a smart objects tree)
####################################################

class TreeNode:
    def validate(self, dict):                 # default error check
        pass
    def apply(self, dict):                    # default evaluator
        pass
    def trace(self, level):                   # default unparser
        print '.'*level + '<empty>'

# ROOTS

class NumNode(TreeNode):
    def __init__(self, num):
        self.num = num                        # already numeric
    def apply(self, dict):                    # use default validate
        return self.num
    def trace(self, level):
        print '.'*level + repr(self.num)      # as code, was 'self.num'

class VarNode(TreeNode):
    def __init__(self, text, start):
        self.name   = text                    # variable name
        self.column = start                   # column for errors
    def validate(self, dict):
        if not dict.has_key(self.name):
            raise UndefinedError, (self.name, self.column)
    def apply(self, dict):
        return dict[self.name]                # validate before apply
    def assign(self, value, dict):
        dict[self.name] = value               # local extension
    def trace(self, level):
        print '.'*level + self.name

# COMPOSITES

class AssignNode(TreeNode):
    def __init__(self, var, val):
        self.var, self.val = var, val
    def validate(self, dict):
        self.val.validate(dict)               # don't validate var
    def apply(self, dict):
        self.var.assign( self.val.apply(dict), dict )
    def trace(self, level):
        print '.'*level + 'set '
        self.var.trace(level + 3)
        self.val.trace(level + 3)

# the binary operator node classes were dropped from this copy of the
# listing; they are restored here, as the parser below requires them
class BinaryNode(TreeNode):
    def __init__(self, left, right):
        self.left, self.right = left, right   # two subtrees
    def validate(self, dict):
        self.left.validate(dict)
        self.right.validate(dict)
    def trace(self, level):
        print '.'*level + '[' + self.label + ']'
        self.left.trace(level+3)
        self.right.trace(level+3)

class TimesNode(BinaryNode):
    label = '*'
    def apply(self, dict):
        return self.left.apply(dict) * self.right.apply(dict)

class DivideNode(BinaryNode):
    label = '/'
    def apply(self, dict):
        return self.left.apply(dict) / self.right.apply(dict)

class PlusNode(BinaryNode):
    label = '+'
    def apply(self, dict):
        return self.left.apply(dict) + self.right.apply(dict)

class MinusNode(BinaryNode):
    label = '-'
    def apply(self, dict):
        return self.left.apply(dict) - self.right.apply(dict)

####################################################
# the parser (syntax analyser, tree builder)
####################################################

class Parser:
    def __init__(self, text=''):
        self.lex     = Scanner(text)              # make a scanner
        self.vars    = {'pi': 3.14159}            # add constants
        self.traceme = TraceDefault

    def parse(self, *text):                       # external interface
        if text:
            self.lex.newtext(text[0])             # reuse with new text
        tree = self.analyse()                     # parse string
        if tree:
            if self.traceme:                      # dump parse-tree?
                print; tree.trace(0)
            if self.errorCheck(tree):             # check names
                self.interpret(tree)              # evaluate tree

    def analyse(self):
        try:
            self.lex.scan()                       # get first token
            return self.Goal()                    # build a parse-tree
        except SyntaxError:
            print 'Syntax Error at column:', self.lex.start
            self.lex.showerror()
        except LexicalError:
            print 'Lexical Error at column:', self.lex.start
            self.lex.showerror()

    def errorCheck(self, tree):
        try:
            tree.validate(self.vars)              # error checker
            return 'ok'
        except UndefinedError, instance:          # args is a tuple
            varinfo = instance.args               # instance is a sequence
            print "'%s' is undefined at column: %d" % varinfo
            self.lex.start = varinfo[1]
            self.lex.showerror()                  # returns None

    def interpret(self, tree):
        result = tree.apply(self.vars)            # tree evals itself
        if result != None:                        # ignore 'set' result
            print result

    def Goal(self):
        if self.lex.token in ['num', 'var', '(']:
            tree = self.Expr()
            self.lex.match('\0')
            return tree
        elif self.lex.token == 'set':
            tree = self.Assign()
            self.lex.match('\0')
            return tree
        else:
            raise SyntaxError

    def Assign(self):
        self.lex.match('set')
        vartree = VarNode(self.lex.value, self.lex.start)
        self.lex.match('var')
        valtree = self.Expr()
        return AssignNode(vartree, valtree)           # two subtrees

    def Expr(self):
        left = self.Factor()                          # left subtree
        while 1:
            if self.lex.token in ['\0', ')']:
                return left
            elif self.lex.token == '+':
                self.lex.scan()
                left = PlusNode(left, self.Factor())  # add root-node
            elif self.lex.token == '-':
                self.lex.scan()
                left = MinusNode(left, self.Factor()) # grows up/right
            else:
                raise SyntaxError

    def Factor(self):
        left = self.Term()
        while 1:
            if self.lex.token in ['+', '-', '\0', ')']:
                return left
            elif self.lex.token == '*':
                self.lex.scan()
                left = TimesNode(left, self.Term())
            elif self.lex.token == '/':
                self.lex.scan()
                left = DivideNode(left, self.Term())
            else:
                raise SyntaxError

    def Term(self):
        if self.lex.token == 'num':
            leaf = NumNode(self.lex.match('num'))
            return leaf
        elif self.lex.token == 'var':
            leaf = VarNode(self.lex.value, self.lex.start)
            self.lex.scan()
            return leaf
        elif self.lex.token == '(':
            self.lex.scan()
            tree = self.Expr()
            self.lex.match(')')
            return tree
        else:
            raise SyntaxError

####################################################
# self-test code: use my parser, parser1's tester
####################################################

if __name__ == '__main__':
    import testparser
    testparser.test(Parser, 'parser2')            # run with Parser class here
When parser2 is run as a top-level program, we get the same test code output as for parser1. In fact, it reuses the same test code: both parsers pass in their parser class object to testparser.test. And since classes are objects, we can also pass this version of the parser to testparser's interactive loop: testparser.interact(parser2.Parser). The new parser's external behavior is identical to that of the original.

Notice the way we handle undefined name exceptions in errorCheck. This exception is a class instance now, not a string as in the prior edition (string exceptions are now deprecated). When exceptions derived from the built-in Exception class are used as sequences, they return the arguments passed to the exception constructor call. This doesn't quite work for string formatting, though, because it expects a real tuple; we have to call tuple() manually on the instance, extract the arguments from its args attribute, or write our own custom constructor to handle the state information.

Also notice that the new parser reuses the same scanner module as well. To catch errors raised by the scanner, it also imports the specific classes that identify the scanner's exceptions. Both the scanner and the parser can raise exceptions on errors (lexical errors, syntax errors, and undefined name errors). They're caught at the top level of the parser, and they end the current parse. There's no need to set and check status flags to terminate the recursion. Since math is done using long integers, floating-point numbers, and Python's operators, there's usually no need to trap numeric overflow or underflow errors. But as is, the parser doesn't handle errors such as division by zero; such errors make the parser system exit with a Python stack dump. Uncovering the cause and fix for this is left as an exercise.
21.6.4. Parse Tree Structure

In fact, the only real difference with this latest parser is that it builds and uses trees to evaluate an expression internally instead of evaluating as it parses. The intermediate representation of an expression is a tree of class instances, whose shape reflects the order of operator evaluation. This parser also has logic to print an indented listing of the constructed parse tree if the traceme attribute is set to True (or 1). Integers print with a trailing L here because the trace logic displays them with repr, indentation gives the nesting of subtrees, and binary operators list left subtrees first. For example:
When this tree is evaluated, the apply method recursively evaluates subtrees and applies root operators to their results. Here, * is evaluated before +, since it's lower in the tree. The Factor method consumes the * substring before returning a right subtree to Expr:
In this example, * is evaluated before -. The Factor method loops through a substring of * and / expressions before returning the resulting left subtree to Expr:
Trees are made of nested class instances. From an OOP perspective, it's another way to use composition. Since tree nodes are just class instances, this tree could be created and evaluated manually too:
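For instance, miniature stand-ins for the tree node classes (hypothetical names, simplified from the parser2 originals) show the idea in self-contained form:

```python
# each node "knows" how to evaluate itself; evaluation is a recursive
# traversal that asks children for values and applies one operator
class Num:
    def __init__(self, val):
        self.val = val
    def apply(self, env):
        return self.val

class Times:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def apply(self, env):
        return self.left.apply(env) * self.right.apply(env)

class Plus:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def apply(self, env):
        return self.left.apply(env) + self.right.apply(env)

# build the tree for 1 + 2 * 3 by hand, then ask it to evaluate itself
tree = Plus(Num(1), Times(Num(2), Num(3)))
assert tree.apply({}) == 7
```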
But we might as well let the parser build it for us (Python is not that much like Lisp, despite what you may have heard).
21.6.5. Exploring Parse Trees with PyTree

But wait: there is a better way to explore parse tree structures. Figure 21-1 shows the parse tree generated for the string 1 + 3 * (2 * 3 + 4), displayed in PyTree, the tree visualization GUI presented at the end of Chapter 20. This works only because the parser2 module builds the parse tree explicitly (parser1 evaluates during a parse instead), and because PyTree's code is generic and reusable.
Figure 21-1. Parse tree built for 1 + 3 * (2 * 3 + 4)
If you read the last chapter, you'll recall that PyTree can draw most any tree data structure, but it is preconfigured to handle binary search trees and the parse trees we're studying in this chapter. You might also remember that clicking on nodes in a displayed parse tree evaluates the subtree rooted there. Figure 21-2 shows the pop up generated after clicking the tree's root node (you get different results if you click other parts of the tree because smaller subtrees are evaluated).
Figure 21-2. Clicking the root node to evaluate a tree
PyTree makes it easy to learn about and experiment with the parser. To determine the tree shape produced for a given expression, start PyTree, click on its Parser radio button, type the expression in the input field at the bottom, and press "input" (or your Enter key). The parser class is run to generate a tree from your input, and the GUI displays the result. For instance, Figure 21-3 sketches the parse tree generated if we remove the parentheses from the first expression in the input field. The root node evaluates to 23 this time, due to the different shape's evaluation order.
Figure 21-3. Parse tree for 1 + 3 * 2 * 3 + 4, result=23
To generate a shape that is even more different, try introducing more parentheses to the expression and hitting the Enter key again. Figure 21-4 shows a much flatter tree structure produced by adding a few parentheses to override operator precedence. Because these parentheses change the tree shape, they also change the expression's overall result again. Figure 21-5 shows the resulting pop up after clicking the root node in this display.
Figure 21-4. Parse tree built for (1 + 3) * (2 * (3 + 4))
Figure 21-5. Clicking and evaluating the root node
Depending on the operators used within an expression, some very differently shaped trees yield the same result when evaluated. For instance, Figure 21-6 shows a more left-heavy tree generated from a different expression string that evaluates to 56 nevertheless.
Figure 21-6. Parse tree for (1 + 3) * 2 * (3 + 4), result=56
Finally, Figure 21-7 shows a parsed assignment statement; clicking the set root assigns the variable spam, and clicking the node spam then evaluates to -4. If you find the parser puzzling, try running PyTree like this on your computer to get a better feel for the parsing process. (I'd like to show more example trees, but I ran out of page real estate at this point in the book.)
21.6.6. Parsers Versus Python

The handcoded parser programs shown earlier illustrate some interesting concepts and underscore the power of Python for general-purpose programming. Depending on your job description, they may also be typical of the sort of thing you'd write regularly in a traditional language such as C. Parsers are an important component in a wide variety of applications, but in some cases, they're not as necessary as you might think. Let me explain why.

So far, we started with an expression parser and added a parse tree interpreter to make the code easier to modify. As is, the parser works, but it may be slow compared to a C implementation. If the parser is used frequently, we could speed it up by moving parts to C extension modules. For instance, the scanner might be moved to C initially, since it's often called from the parser. Ultimately, we might add components to the grammar that allow expressions to access application-specific variables and functions. All of these steps constitute good engineering.

But depending on your application, this approach may not be the best one in Python. Often the easiest way to evaluate input expressions in Python is to let Python do it, by calling the eval built-in function. In fact, we can usually replace the entire expression evaluation program with one function call. The next example will demonstrate how this is done. More important, the next section underscores a core idea behind the language: if you already have an extensible, embeddable, high-level language system, why invent another? Python itself can often satisfy language-based component needs.
21.7. PyCalc: A Calculator Program/Object

To wrap up this chapter, I'm going to show you a practical application for some of the parsing technology introduced in the preceding section. This section presents PyCalc, a Python calculator program with a graphical interface similar to the calculator programs available on most window systems. But like most of the GUI examples in this book, PyCalc offers a few advantages over existing calculators. Because PyCalc is written in Python, it is both easily customized and widely portable across window platforms. And because it is implemented with classes, it is both a standalone program and a reusable object library.
21.7.1. A Simple Calculator GUI

Before I show you how to write a full-blown calculator, though, the module shown in Example 21-13 starts this discussion in simpler terms. It implements a limited calculator GUI, whose buttons just add text to the input field at the top in order to compose a Python expression string. Fetching and running the string all at once produces results. Figure 21-8 shows the window this module makes when run as a top-level script.
Figure 21-8. The calc0 script in action on Windows (result=160.283)
Example 21-13. PP3E\Lang\Calculator\calc0.py
#!/usr/local/bin/python
# a simple calculator GUI: expressions run all at once with eval/exec

from Tkinter import *
from PP3E.Dbase.TableBrowser.guitools import frame, button, entry

class CalcGui(Frame):
    def __init__(self, parent=None):                   # an extended frame
        Frame.__init__(self, parent)                   # on default top-level
        self.pack(expand=YES, fill=BOTH)               # all parts expandable
        self.master.title('Python Calculator 0.1')     # 6 frames plus entry
        self.master.iconname("pcalc1")
        self.names = {}                                # namespace for variables
        text = StringVar()
        entry(self, TOP, text)
        rows = ["abcd", "0123", "4567", "89( )"]
        for row in rows:
            frm = frame(self, TOP)
            for char in row:
                button(frm, LEFT, char,
                       lambda char=char: text.set(text.get() + char))
        frm = frame(self, TOP)
        for char in "+-*/=":
            button(frm, LEFT, char,
                   lambda char=char: text.set(text.get() + ' ' + char + ' '))
        frm = frame(self, BOTTOM)
        button(frm, LEFT, 'eval',  lambda: self.eval(text))
        button(frm, LEFT, 'clear', lambda: text.set(''))

    def eval(self, text):
        try:
            text.set(str(eval(text.get(), self.names, self.names)))  # was 'x'
        except SyntaxError:
            try:
                exec(text.get(), self.names, self.names)
            except:
                text.set("ERROR")        # bad as statement too?
            else:
                text.set('')             # worked as a statement
        except:
            text.set("ERROR")            # other eval expression errors

if __name__ == '__main__':
    CalcGui().mainloop()
21.7.1.1. Building the GUI

Now, this is about as simple as a calculator can be, but it demonstrates the basics. This window comes up with buttons for entry of numbers, variable names, and operators. It is built by attaching buttons to frames: each row of buttons is a nested Frame, and the GUI itself is a Frame subclass with an attached Entry and six embedded row frames (grids would work here too). The calculator's frame, entry field, and buttons are made expandable in the imported guitools utility module.

This calculator builds up a string to pass to the Python interpreter all at once on "eval" button presses. Because you can type any Python expression or statement in the entry field, the buttons are really just a convenience. In fact, the entry field isn't much more than a command line. Try typing import sys, and then dir(sys) to display sys module attributes in the input field at the top; it's not what you normally do with a calculator, but it is demonstrative nevertheless.[*]
Once again, I need to warn you about running strings like this if you can't be sure they won't cause damage. If these strings can be entered by users you cannot trust, they will have access to anything on the computer that the Python process has access to. See the Chapter 18 discussion of the (now defunct) rexec module for more on this topic.
In CalcGui's constructor, buttons are coded as lists of strings; each string represents a row and each character in the string represents a button. Lambdas are used to save extra callback data for each button. The callback functions retain the button's character and the linked text entry variable so that the character can be added to the end of the entry widget's current string on a press. Notice how we must pass in the loop variable as a default argument to some lambdas in this code. Recall from Chapter 8 how references within a lambda (or nested def ) to names in an enclosing scope are evaluated when the nested function is called, not when it is created. When the generated function is called, enclosing scope references inside the lambda reflect their latest setting in the enclosing scope, which is not necessarily the values they held when the lambda expression ran. By contrast, defaults are evaluated at function creation time instead and so can remember the current values of loop variables. Without the defaults, each button would reflect the last iteration of the loop.
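The scoping behavior described here is easy to demonstrate outside any GUI. In this small sketch (the names bad and good are mine, not from the book's code), each lambda in the first list sees the loop variable's final value, while the default argument in the second freezes the value current when each lambda was created:

```python
# Enclosing-scope names are looked up when the lambda is CALLED...
bad = [lambda: char for char in "abc"]             # all share one 'char'

# ...but default values are evaluated when the lambda is CREATED.
good = [lambda char=char: char for char in "abc"]  # each remembers its own

print([f() for f in bad])    # every callback returns the final 'c'
print([f() for f in good])   # callbacks return 'a', 'b', 'c'
```

This is exactly why each button's lambda in CalcGui passes char in as a default: without it, every button would insert the last character of the last row.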
Lesson 4: Embedding Beats Parsers

The calculator uses eval and exec to call Python's parser/interpreter at runtime instead of analyzing and evaluating expressions manually. In effect, the calculator runs embedded Python code from a Python program. This works because Python's development environment (the parser and bytecode compiler) is always a part of systems that use Python. Because there is no difference between the development and the delivery environments, Python's parser can be used by Python programs. The net effect here is that the entire expression evaluator has been replaced with a single call to eval.

In broader terms, this is a powerful technique to remember: the Python language itself can replace many small, custom languages. Besides saving development time, clients have to learn just one language, one that's potentially simple enough for end-user coding. Furthermore, Python can take on the flavor of any application. If a language interface requires application-specific extensions, just add Python classes, or export an API for use in embedded Python code as a C extension. By evaluating Python code that uses application-specific extensions, custom parsers become almost completely unnecessary.

There's also a critical added benefit to this approach: embedded Python code has access to all the tools and features of a powerful, full-blown programming language. It can use lists, functions, classes, external modules, and even larger Python tools like Tkinter GUIs, shelve storage, multiple threads, network sockets, and web page fetches. You'd probably spend years trying to provide similar functionality in a custom language parser. Just ask Guido.
21.7.1.2. Running code strings

This module implements a GUI calculator in 45 lines of code (counting comments and blank lines). But to be honest, it cheats: expression evaluation is delegated to Python. In fact, the built-in eval and exec tools do most of the work here:
eval
Parses, evaluates, and returns the result of a Python expression represented as a string.

exec
Runs an arbitrary Python statement represented as a string; there's no return value, because the code is a statement, not an expression.

Both accept optional dictionaries to be used as global and local namespaces for assigning and evaluating names used in the code strings. In the calculator, self.names becomes a symbol table for running calculator expressions. A related Python function, compile, can be used to precompile code strings to code objects before passing them to eval and exec (use it if you need to run the same string many times).

By default, a code string's namespace defaults to the caller's namespaces. If we didn't pass in dictionaries here, the strings would run in the eval method's namespace. Since the method's local namespace goes away after the method call returns, there would be no way to retain names assigned in the string.

Notice the use of nested exception handlers in the eval method:
1. It first assumes the string is an expression and tries the built-in eval function.

2. If that fails due to a syntax error, it tries evaluating the string as a statement using exec.

3. Finally, if both attempts fail, it reports an error in the string (a syntax error, undefined name, and so on).

Statements and invalid expressions might be parsed twice, but the overhead doesn't matter here, and you can't tell whether a string is an expression or a statement without parsing it manually.

Note that the "eval" button evaluates expressions, but = sets Python variables by running an assignment statement. Variable names are combinations of the letter keys "abcd" (or any name typed directly). They are assigned and evaluated in a dictionary used to represent the calculator's namespace.
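The nested-handler pattern can be sketched as a standalone function; this is a simplified analog of the calculator's eval method (the name run_string is mine, and the returned strings mirror what the GUI displays):

```python
def run_string(text, names):
    """Try text first as an expression, then as a statement."""
    try:
        return str(eval(text, names, names))   # expression: return its result
    except SyntaxError:
        try:
            exec(text, names, names)           # statement: no result to show
            return ''
        except Exception:
            return 'ERROR'                     # bad as statement too
    except Exception:
        return 'ERROR'                         # other expression errors

ns = {}
run_string('x = 2', ns)    # assignment fails eval, runs via exec
run_string('x * 3', ns)    # expression: returns '6'
```

Because ns persists across calls, names assigned by one string remain available to the next, just as self.names does in the calculator.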
21.7.1.3. Extending and attaching

Clients that reuse this calculator are as simple as the calculator itself. Like most class-based Tkinter GUIs, this one can be extended in subclasses; Example 21-14 customizes the simple calculator's constructor to add extra widgets.
Example 21-14. PP3E\Lang\Calculator\calc0ext.py
from Tkinter import *
from calc0 import CalcGui

class Inner(CalcGui):                                      # extend GUI
    def __init__(self):
        CalcGui.__init__(self)
        Label(self,  text='Calc Subclass').pack()          # add after
        Button(self, text='Quit', command=self.quit).pack()  # top implied

Inner().mainloop()
It can also be embedded in a container class; Example 21-15 attaches the simple calculator's widget package, along with extras, to a common parent.
Figure 21-9 shows the result of running both of these scripts from different command lines. Both have a distinct input field at the top. This works; but to see a more practical application of such reuse techniques, we need to make the underlying calculator more practical too.
Figure 21-9. The calc0 script's object attached and extended
21.7.2. PyCalc: A Real Calculator GUI

Of course, real calculators don't usually work by building up expression strings and evaluating them all at once; that approach is really little more than a glorified Python command line. Traditionally, expressions are evaluated in piecemeal fashion as they are entered, and temporary results are displayed as soon as they are computed. Implementing this behavior requires a bit more work: expressions must be evaluated manually and in parts, instead of calling the eval function only once. But the end result is much more useful and intuitive.
Lesson 5: Reusability Is Power

Though simple, attaching and subclassing the calculator graphically, as shown in Figure 21-9, illustrates the power of Python as a tool for writing reusable software. By coding programs with modules and classes, components written in isolation almost automatically become general-purpose tools. Python's program organization features promote reusable code.

In fact, code reuse is one of Python's major strengths and has been one of the main themes of this book thus far. Good object-oriented design takes some practice and forethought, and the benefits of code reuse aren't apparent immediately. And sometimes we're more interested in a quick fix rather than a future use for the code. But coding with some reusability in mind can save development time in the long run.

For instance, the handcoded parsers shared a scanner, the calculator GUI uses the guitools module we discussed earlier, and the next section will reuse the GuiMixin class. Sometimes we're able to finish part of a job before we start.
This section presents the implementation of PyCalc, a Python/Tkinter program that implements such a traditional calculator GUI. It touches on the subject of text and languages in two ways: it parses and evaluates expressions, and it implements a kind of stack-based language to perform the evaluation.
Although its evaluation logic is more complex than the simpler calculator shown earlier, it demonstrates advanced programming techniques and serves as an interesting finale for this chapter.
21.7.2.1. Running PyCalc

As usual, let's look at the GUI before the code. You can run PyCalc from the PyGadgets and PyDemos launcher bars at the top of the examples tree, or by directly running the file calculator.py listed shortly (e.g., click it in a file explorer). Figure 21-10 shows PyCalc's main window. By default, it shows operand buttons in black-on-blue (and opposite for operator buttons), but font and color options can be passed into the GUI class's constructor method. Of course, that means gray-on-gray in this book, so you'll have to run PyCalc yourself to see what I mean.
Figure 21-10. PyCalc calculator at work on Windows
If you do run this, you'll notice that PyCalc implements a normal calculator model: expressions are evaluated as entered, not all at once at the end. That is, parts of an expression are computed and displayed as soon as operator precedence and manually typed parentheses allow. I'll explain how this evaluation works in a moment.

PyCalc's CalcGui class builds the GUI interface as frames of buttons much like the simple calculator of the previous section, but PyCalc adds a host of new features. Among them are another row of action buttons, inherited methods from GuiMixin (presented in Chapter 11), a new "cmd" button that pops up nonmodal dialogs for entry of arbitrary Python code, and a recent calculations history pop up. Figure 21-11 captures some of PyCalc's pop-up windows.
Figure 21-11. PyCalc calculator with some of its pop ups
You may enter expressions in PyCalc by clicking buttons in the GUI, typing full expressions in command-line pop ups, or typing keys on your keyboard. PyCalc intercepts key press events and interprets them the same as corresponding button presses; typing + is like pressing the + button, the Space bar key is "clear," Enter is "eval," backspace erases a character, and ? is like pressing "help." The command-line pop-up windows are nonmodal (you can pop up as many as you like). They accept any Python codepress the Run button or your Enter key to evaluate text in the input field. The result of evaluating this code in the calculator's namespace dictionary is thrown up in the main window for use in larger expressions. You can use this as an escape mechanism to employ external tools in your calculations. For instance, you can import and use functions coded in Python or C within these pop ups. The current value in the main calculator window is stored in newly opened command-line pop ups too, for use in typed expressions. PyCalc supports long integers (unlimited precision), negatives, and floating-point numbers just because Python does. Individual operands and expressions are still evaluated with the eval built-in, which calls the Python parser/interpreter at runtime. Variable names can be assigned and referenced in the main window with the letter, =, and "eval" keys; they are assigned in the calculator's namespace dictionary (more complex variable names may be typed in command-line pop ups). Note the use of pi in the history window: PyCalc preimports names in the math and random modules into the namespace where expressions are evaluated.
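The preimport trick mentioned here amounts to running wildcard imports in the evaluation namespace before any expressions are entered. A minimal sketch (ns stands in for the calculator's names dictionary; PyCalc itself does this in its Evaluator constructor):

```python
# Seed the expression namespace with math names before evaluating anything.
ns = {}
exec('from math import *', ns, ns)     # binds pi, sin, sqrt, ... into ns
value = eval('sin(pi / 2)', ns, ns)    # expressions may now use math names
```

Any expression evaluated against ns afterward can use pi, sin, and friends as though they were calculator built-ins.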
21.7.2.2. Evaluating expressions with stacks

Now that you have the general idea of what PyCalc does, I need to say a little bit about how it does what it does. Most of the changes in this version involve managing the expression display and evaluating expressions. PyCalc is structured as two classes:
The CalcGui class
Manages the GUI itself. It controls input events and is in charge of the main window's display field at the top. It doesn't evaluate expressions, though; for that, it sends operators and operands entered in the GUI to an embedded instance of the Evaluator class.

The Evaluator class
Manages two stacks. One stack records pending operators (e.g., +), and one records pending operands (e.g., 3.141). Temporary results are computed as new operators are sent from CalcGui and pushed onto the operands stack.

As you can see from this, the magic of expression evaluation boils down to juggling the operator and operand stacks. In a sense, the calculator implements a little stack-based language to evaluate the expressions being entered. While scanning expression strings from left to right as they are entered, operands are pushed along the way, but operators delimit operands and may trigger temporary results before they are pushed. Because it records states and performs transitions, some might use the term state machine to describe this calculator language implementation.

Here's the general scenario:
1. When a new operator is seen (i.e., when an operator button or key is pressed), the prior operand in the entry field is pushed onto the operands stack.

2. The operator is then added to the operators stack, but only after all pending operators of higher precedence have been popped and applied to pending operands (e.g., pressing + makes any pending * operators on the stack fire).

3. When "eval" is pressed, all remaining operators are popped and applied to all remaining operands, and the result is the last remaining value on the operands stack.

In the end, the last value on the operands stack is displayed in the calculator's entry field, ready for use in another operation. This evaluation algorithm is probably best described by working through examples. Let's step through the entry of a few expressions and watch the evaluation stacks grow.

PyCalc stack tracing is enabled with the debugme flag in the module; if true, the operator and operand stacks are displayed on stdout each time the Evaluator class is about to apply an operator and reduce (pop) the stacks. Run PyCalc with a console window to see the traces. A tuple holding the stack lists (operators, operands) is printed on each stack reduction; tops of stack are at the ends of the lists. For instance, here is the console output after typing and evaluating a simple string:
Note that the pending (stacked) * subexpression is evaluated when the + is pressed: * operators bind tighter than +, so the subexpression is evaluated immediately, before the + operator is pushed. When the + button is pressed, the entry field contains 3; we push 3 onto the operands stack, reduce the * subexpression (5 * 3), push its result onto operands, push + onto operators, and continue scanning user inputs. When "eval" is pressed at the end, 4 is pushed onto operands, and the final + on operators is applied to the stacked operands.

The text input field and expression stacks are integrated by the calculator class. In general, the text entry field always holds the prior operand when an operator button is pressed; the text in the entry field is pushed onto the operands stack before the operator is resolved. Because of this, we have to pop results before displaying them after "eval" or ) is pressed (otherwise the results are pushed onto the stack twice: they would be both on the stack and in the display field, from which they would be immediately pushed again when the next operator is input). When an operator is seen (or "eval" or ) is applied), we also have to take care to erase the entry field when the next operand's entry is started.

Expression stacks also defer operations of lower precedence as the input is scanned. In the next trace, the pending + isn't evaluated when the * button is pressed: since * binds tighter, we need to postpone the + until the * can be evaluated. The * operator isn't popped until its right operand 4 has been seen. There are two operators to pop and apply to operand stack entries on the "eval" press: the * at the top of operators is applied to the 3 and 4 at the top of operands, and then + is run on 5 and the 12 pushed for *:
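The precedence-driven push/pop behavior described in these traces can be sketched in a few lines. This is an illustrative toy, not PyCalc's actual code: PyCalc stacks operands as strings and applies operators with eval, and its method names differ.

```python
import operator

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}
OPS  = {'+': operator.add, '-': operator.sub,
        '*': operator.mul, '/': operator.truediv}

def calc(tokens):
    opnds, optrs = [], []               # operand and operator stacks
    def reduce_top():                   # apply top operator to top 2 operands
        op = optrs.pop()
        right, left = opnds.pop(), opnds.pop()
        opnds.append(OPS[op](left, right))
    for tok in tokens:
        if tok in PREC:
            # pending operators of higher or equal precedence fire first
            while optrs and PREC[optrs[-1]] >= PREC[tok]:
                reduce_top()
            optrs.append(tok)
        else:
            opnds.append(float(tok))    # operands are pushed as scanned
    while optrs:                        # "eval" press: pop all that remain
        reduce_top()
    return opnds[-1]                    # last value left is the result

calc(['5', '*', '3', '+', '4'])   # pending * fires when + arrives: 19.0
calc(['5', '+', '3', '*', '4'])   # + postponed until * resolves: 17.0
```

The >= comparison in the inner loop is also what makes strings of same-precedence operators reduce immediately left to right, giving left-associative evaluation.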
For strings of same-precedence operators such as the following, we pop and evaluate immediately as we scan left to right, instead of postponing evaluation. This results in a left-associative evaluation, in the absence of parentheses: 5+3+4 is evaluated as ((5+3)+4). For + and * operations this is irrelevant because order doesn't matter:
The following trace is more complex. In this case, all the operators and operands are stacked (postponed) until we press the ) button at the end. To make parentheses work, ( is given a higher precedence than any operator and is pushed onto the operators stack to seal off lower stack reductions until the ) is seen. When the ) button is pressed, the parenthesized subexpression is popped and evaluated ((3 * 4), then (1 + 12)), and 13 is displayed in the entry field. On pressing "eval," the rest is evaluated ((3 * 13), then (1 + 39)), and the final result (40) is shown. This result in the entry field itself becomes the left operand of a future operator.
In fact, any temporary result can be used again: if we keep pressing an operator button without typing new operands, it's reapplied to the result of the prior press; the value in the entry field is pushed twice and applied to itself each time. Press * many times after entering 2 to see how this works (e.g., 2***). On the first *, it pushes 2 and the *. On the next *, it pushes 2 again from the entry field, pops and evaluates the stacked (2 * 2), pushes back and displays the result, and pushes the new *. And on each following *, it pushes the currently displayed result and evaluates again, computing successive squares.

Figure 21-12 shows how the two stacks look at their highest level while scanning the expression in the prior example trace. On each reduction, the top operator is applied to the top two operands and the result is pushed back for the operator below. Because of the way the two stacks are used, the effect is similar to converting the expression to a string of the form "+1*3(+1*34" and evaluating it right to left. In other cases, though, parts of the expression are evaluated and displayed as temporary results along the way, so it's not simply a string conversion process.
Finally, the next example's string triggers an error. PyCalc is casual about error handling. Many errors are made impossible by the algorithm itself, but things such as unmatched parentheses still trip up the evaluator. Instead of trying to detect all possible error cases explicitly, a general try statement in the reduce method is used to catch them all: expression errors, numeric errors, undefined name errors, syntax errors, and so on. Operands and temporary results are always stacked as strings, and each operator is applied by calling eval. When an error occurs inside an expression, a result operand of *ERROR* is pushed, which makes all remaining operators fail in eval too. *ERROR* essentially percolates to the top of the expression. At the end, it's the last operand and is displayed in the text entry field to alert you of the mistake:
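The percolation effect falls out of stacking operands as strings and applying each operator with eval. A small sketch (apply_op is my name for what one stack reduction does, not PyCalc's):

```python
def apply_op(left, op, right):
    """One stack reduction: operands are strings, eval applies the operator."""
    try:
        return str(eval(left + op + right))
    except Exception:
        return '*ERROR*'              # any failure poisons the result

apply_op('2', '+', '3')               # normal reduction: '5'
apply_op('*ERROR*', '+', '1')         # bad operand fails inside eval
```

Once one reduction yields '*ERROR*', every later reduction that consumes it fails the same way, so the marker rises to the top of the expression and ends up in the display.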
Try tracing through these and other examples in the calculator's code to get a feel for the stack-based evaluation that occurs. Once you understand the general shift/reduce (push/pop) mechanism, expression evaluation is straightforward.
21.7.2.3. PyCalc source code

Example 21-16 contains the PyCalc source module that puts these ideas to work in the context of a GUI. It's a single-file implementation (not counting utilities imported and reused). Study the source for more details; as usual, there's no substitute for interacting with the program on your own to get a better feel for its functionality. Also see the opening comment's "to do" list for suggested areas for improvement. Like all software systems, this calculator is prone to evolve over time (and in fact it has, with each new edition of this book). Since it is written in Python, such future mutations will be easy to apply.
Example 21-16. PP3E\Lang\Calculator\calculator.py
#!/usr/local/bin/python
##############################################################################
# PyCalc 3.0: a Python/Tkinter calculator program and GUI component.
# evaluates expressions as they are entered, catches keyboard keys for
# expression entry; 2.0 added integrated command-line popups, a recent
# calculations history display popup, fonts and colors configuration,
# help and about popups, preimported math/random constants, and more;
#
# 3.0 changes (PP3E):
# -use 'readonly' entry state, not 'disabled', else field is greyed
#  out (fix for 2.3 Tkinter change);
# -avoid extended display precision for floats by using str(), instead
#  of 'x'/repr() (fix for Python change);
# -apply font to input field to make it larger;
# -use justify=right for input field so it displays on right, not left;
# -add 'E+' and 'E-' buttons (and 'E' keypress) for float exponents;
# -remove 'L' button (but still allow 'L' keypress): superfluous now,
#  because Python auto converts up if too big ('L' forced this in past);
# -use smaller font size overall;
# -use windows.py module to get a window icon;
# -auto scroll to the end in the history window
#
# to do: add a commas-insertion mode, allow '**' as an operator key, allow
# '+' and 'J' inputs for complex numbers, use new decimal type for fixed
# precision floats; as is, can use 'cmd' popup windows to input and eval
# an initial complex, complex exprs, and 'E' '-' key sequences, but can't
# be input via main window; caveat: this calculator's precision, accuracy,
# and some of its behaviour, is currently bound by result of str() call;
#
# note that the new nested scopes simplify some lambdas here, but we have to
# use defaults to pass in some scope values in lambdas here anyhow, because
# enclosing scope names are looked up when the nested function is called, not
# when it is created (but defaults are); when the generated function is
# called, enclosing scope refs are whatever they were set to last in the
# enclosing function's block, not what they were when the lambda ran;
##############################################################################

from Tkinter import *
from PP3E.Gui.Tools.guimixin import GuiMixin
from PP3E.Dbase.TableBrowser.guitools import *

Fg, Bg, Font = 'black', 'skyblue', ('courier', 14, 'bold')
debugme = 1

def trace(*args):
    if debugme:
        print args
##############################################################################
# the main class - handles user interface;
# an extended Frame, on a new Toplevel, or embedded in another container widget
##############################################################################

class CalcGui(GuiMixin, Frame):
    Operators = "+-*/="                               # button lists
    Operands  = ["abcd", "0123", "4567", "89( )"]     # customizable
    def __init__(self, parent=None, fg=Fg, bg=Bg, font=Font):
        Frame.__init__(self, parent)
        self.pack(expand=YES, fill=BOTH)              # all parts expandable
        self.eval = Evaluator()                       # embed a stack handler
        self.text = StringVar()                       # make a linked variable
        self.text.set("0")
        self.erase = 1                                # clear "0" text next
        self.makeWidgets(fg, bg, font)                # build the GUI itself
        if not parent or not isinstance(parent, Frame):
            self.master.title('PyCalc 3.0')           # title iff owns window
            self.master.iconname("PyCalc")            # ditto for key bindings
            self.master.bind('<KeyPress>', self.onKeyboard)
            self.entry.config(state='readonly')       # 3.0: not 'disabled'=grey
        else:
            self.entry.config(state='normal')
            self.entry.focus()

    def makeWidgets(self, fg, bg, font):              # 7 frames plus text-entry
        self.entry = entry(self, TOP, self.text)      # font, color configurable
        self.entry.config(font=font)                  # 3.0: make display larger
        self.entry.config(justify=RIGHT)              # 3.0: on right, not left
        for row in self.Operands:
            frm = frame(self, TOP)
            for char in row:
                button(frm, LEFT, char,
                       lambda op=char: self.onOperand(op),
                       fg=fg, bg=bg, font=font)
        frm = frame(self, TOP)

    def onEval(self):
        self.eval.shiftOpnd(self.text.get())          # last or only opnd
        self.eval.closeall()                          # apply all optrs left
        self.text.set(self.eval.popOpnd())            # need to pop: optr next?
        self.erase = 1
    def onOperand(self, char):
        if char == '(':
            self.eval.open()
            self.text.set('(')                        # clear text next
            self.erase = 1
        elif char == ')':
            self.eval.shiftOpnd(self.text.get())      # last or only nested opnd
            self.eval.close()                         # pop here too: optr next?
            self.text.set(self.eval.popOpnd())
            self.erase = 1
        else:
            if self.erase:
                self.text.set(char)                   # clears last value
            else:
                self.text.set(self.text.get() + char) # else append to opnd
            self.erase = 0

    def onOperator(self, char):
        self.eval.shiftOpnd(self.text.get())          # push opnd on left
        self.eval.shiftOptr(char)                     # eval exprs to left?
        self.text.set(self.eval.topOpnd())            # push optr, show opnd|result
        self.erase = 1                                # erased on next opnd|'('

    def onMakeCmdline(self):
        new = Toplevel()                              # new top-level window
        new.title('PyCalc command line')              # arbitrary Python code
        frm = frame(new, TOP)                         # only the Entry expands
        label(frm, LEFT, '>>>').pack(expand=NO)
        var = StringVar()
        ent = entry(frm, LEFT, var, width=40)
        onButton = (lambda: self.onCmdline(var, ent))
        onReturn = (lambda event: self.onCmdline(var, ent))
        button(frm, RIGHT, 'Run', onButton).pack(expand=NO)
        ent.bind('<Return>', onReturn)
        var.set(self.text.get())

    def onCmdline(self, var, ent):                    # eval cmdline pop-up input
        try:
            value = self.eval.runstring(var.get())
            var.set('OKAY')
            if value != None:                         # run in eval namespace dict
                self.text.set(value)                  # expression or statement
                self.erase = 1
                var.set('OKAY => ' + value)           # result in calc field
        except:
            var.set('ERROR')                          # status in pop-up field
        ent.icursor(END)                              # insert point after text
        ent.select_range(0, END)                      # select msg so next key deletes

    def onKeyboard(self, event):
        pressed = event.char                          # on keyboard press event
        if pressed != '':                             # pretend button was pressed
            if pressed in self.Operators:
                self.onOperator(pressed)
            else:
                for row in self.Operands:
                    if pressed in row:
                        self.onOperand(pressed)
                        break
                else:
                    if pressed == '.':
                        self.onOperand(pressed)       # can start opnd
                    if pressed in 'LlEe':
                        self.text.set(self.text.get() + pressed)  # can't: no erase
                    elif pressed == '\r':
                        self.onEval()                 # enter key=eval
                    elif pressed == ' ':
                        self.onClear()                # spacebar=clear
                    elif pressed == '\b':
                        self.text.set(self.text.get()[:-1])       # backspace
                    elif pressed == '?':
                        self.help()

    def onHist(self):
        # show recent calcs log popup
        from ScrolledText import ScrolledText
        new = Toplevel()                              # make new window
        ok = Button(new, text="OK", command=new.destroy)
        ok.pack(pady=1, side=BOTTOM)                  # pack first=clip last
        text = ScrolledText(new, bg='beige')          # add Text + scrollbar
        text.insert('0.0', self.eval.getHist())       # get Evaluator text
        text.see(END)                                 # 3.0: scroll to end
        text.pack(expand=YES, fill=BOTH)

        # new window goes away on ok press or enter key
        new.title("PyCalc History")
        new.bind('<Return>', (lambda event: new.destroy()))
        ok.focus_set()                                # make new window modal:
        new.grab_set()                                # get keyboard focus, grab app
        new.wait_window()                             # don't return till new.destroy

    def help(self):
        self.infobox('PyCalc', 'PyCalc 3.0\n'
                     'A Python/Tk calculator\n'
                     'Programming Python 3E\n'
                     'June, 2005\n'
                     '(2.0 1999, 1.0 1996)\n\n'
                     'Use mouse or keyboard to\n'
                     'input numbers and operators,\n'
                     'or type code in cmd popup')
##############################################################################
# the expression evaluator class
# embedded in and used by a CalcGui instance, to perform calculations
##############################################################################

class Evaluator:
    def __init__(self):
        self.names = {}                          # a names-space for my vars
        self.opnd, self.optr = [], []            # two empty stacks
        self.hist = []                           # my prev calcs history log
        self.runstring("from math import *")     # preimport math modules
        self.runstring("from random import *")   # into calc's namespace

    def clear(self):
        self.opnd, self.optr = [], []            # leave names intact
        if len(self.hist) > 64:                  # don't let hist get too big
            self.hist = ['clear']
        else:
            self.hist.append('--clear--')
    def popOpnd(self):
        value = self.opnd[-1]                    # pop/return top|last opnd
        self.opnd[-1:] = []                      # to display and shift next
        return value                             # or x.pop(), or del x[-1]

    # (the topOpnd, open, and close methods appear here in the full listing:
    #  on ')' pop down to highest '('; ok if empty: stays empty;
    #  pop, or added again by optr)
    def closeall(self):                          # force rest on 'eval'
        while self.optr:
            self.reduce()
        try:                                     # last may be a var name
            self.opnd[0] = self.runstring(self.opnd[0])
        except:
            self.opnd[0] = '*ERROR*'

    # pop else added again next:
    afterMe = {'*': ...,                         # class member
               '/': ...,                         # optrs to not pop for key
               '+': ...,                         # if prior optr is this: push
               '-': ...,                         # else: pop/eval prior optr
               ')': ...,                         # all left-associative as is
               '=': ...}

    def shiftOpnd(self, newopnd):                # push opnd at optr, ')', eval
        self.opnd.append(newopnd)
    def shiftOptr(self, newoptr):                # apply ops with

/* module functions */
static PyObject *                                 /* returns object */
message(PyObject *self, PyObject *args)           /* self unused in modules */
{                                                 /* args from Python call */
    char *fromPython, result[64];
    if (! PyArg_Parse(args, "(s)", &fromPython))  /* convert Python -> C */
        return NULL;                              /* null=raise exception */
    else {
        strcpy(result, "Hello, ");                /* build up C string */
        strcat(result, fromPython);               /* add passed Python string */
        return Py_BuildValue("s", result);        /* convert C -> Python */
    }
}

/* registration table */
static struct PyMethodDef hello_methods[] = {
    {"message", message, 1},          /* method name, C func ptr, always-tuple */
    {NULL, NULL}                      /* end of table marker */
};

/* module initializer */
void inithello()                      /* called on first import */
{                                     /* name matters if loaded dynamically */
    (void) Py_InitModule("hello", hello_methods);   /* mod name, table ptr */
}
Ultimately, Python code will call this C file's message function, passing in a string object and getting back a new string object. First, though, it has to be somehow linked into the Python interpreter. To use this C file in a Python script, compile it into a dynamically loadable object file (e.g., hello.so on Linux, hello.dll under Cygwin on Windows) with a makefile like the one listed in Example 22-2, and drop the resulting object file into a directory listed on your module import search path exactly as though it were a .py or .pyc file.
Example 22-2. PP3E\Integrate\Extend\Hello\makefile.hello
#############################################################
# Compile hello.c into a shareable object file on Cygwin,
# to be loaded dynamically when first imported by Python.
#############################################################

PYLIB = /usr/bin
PYINC = /usr/include/python2.4

hello.dll: hello.c
	gcc hello.c -g -I$(PYINC) -shared -L$(PYLIB) -lpython2.4 -o hello.dll

clean:
	rm -f hello.dll core
This is a Cygwin makefile that uses gcc to compile our C code; other platforms are analogous but will vary. As mentioned in Chapter 5 in the sidebar "Forking on Windows with Cygwin," Cygwin provides a Unix-like environment and libraries on Windows. To work along with the examples here, either see http://www.cygwin.com for download details or change the makefiles listed per your compiler and platform requirements. Be sure to include the path to Python's install directory with -I flags to access Python include (a.k.a. header) files, as well as the path to the Python binary library file with -L flags, if needed. Now, to use the makefile in Example 22-2 to build the extension module in Example 22-1, simply type a standard make command at your shell (the Cygwin shell is used here):
This generates a shareable object file: a .dll under Cygwin on Windows. When compiled this way, Python automatically loads and links the C module when it is first imported by a Python script. At import time, the .dll binary library file will be located in a directory on the Python import search path, just like a .py file. Because Python always searches the current working directory on imports, this chapter's examples will run from the directory you compile them in (.) without any file copies or moves. In larger systems, you will generally place compiled extensions in a directory listed in PYTHONPATH or in .pth files instead. Finally, to call the C function from a Python program, simply import the module hello and call its hello.message function with a string; you'll get back a normal Python string:
And that's it: you've just called an integrated C module's function from Python. The most important thing to notice here is that the C function looks exactly as if it were coded in Python. Python callers send and receive normal string objects from the call; the Python interpreter handles routing calls to the C function, and the C function itself handles Python/C data conversion chores. In fact, there is little to distinguish hello as a C extension module at all, apart from its filename. Python code imports the module and fetches its attributes as if it had been written in Python. C extension modules even respond to dir calls as usual and have the standard module and filename attributes (though the filename doesn't end in .py or .pyc this time around):
Like any module in Python, you can also access the C extension from a script file. The Python file in Example 22-3, for instance, imports and uses the C extension module.
Example 22-3. PP3E\Integrate\Extend\Hello\hellouse.py
import hello
print hello.message('C')
print hello.message('module ' + hello.__file__)
for i in range(3):
    print hello.message(str(i))
Run this script as you would any other; when the script first imports the module hello, Python automatically finds the C module's .dll object file in a directory on the module search path and links it into the process dynamically. All of this script's output represents strings returned from the C function in the file hello.c:
22.5. Extension Module Details
Now that I've shown you the somewhat longer story, let's fill in the rest. The next few sections go into more detail on compilation and linking, code structure, data conversions, error handling, and reference counts. These are core ideas in Python C extensions, some of which, as we'll see later, you can often largely forget.
22.5.1. Compilation and Linking
You must always compile C extension files such as the hello.c example and somehow link them with the Python interpreter to make them accessible to Python scripts, but there is wide variability in how you might go about doing so. For example, a rule of the following form could be used to compile this C file on Linux, too:
On other platforms, the details will differ still more. Because compiler options vary widely, you'll want to consult your C or C++ compiler's documentation or Python's extension manuals for platform- and compiler-specific details. The point is to determine how to compile a C source file into your platform's notion of a shareable or dynamically loaded object file. Once you have, the rest is easy; Python supports dynamic loading of C extensions on all major platforms today. Because build details vary so widely from machine to machine (and even compiler to compiler), the build scripts in this book will take some liberties with platform details. In general, most are shown under the Cygwin Unix-like environment on Windows, partly because it is a simpler alternative to a full Linux install and partly because this writer's background is primarily in Unix development. Be sure to translate for your own context. If you use standard Windows build tools, see also the directories PC and PCbuild in Python's current source distribution for pointers.
22.5.1.1. Dynamic binding
Technically, what I've been showing you so far is called dynamic binding, and it represents one of two ways to link compiled C extensions with the Python interpreter. Since the alternative, static binding, is more complex, dynamic binding is almost always the way to go. To bind dynamically, simply follow these steps:
1. Compile hello.c into a shareable object file for your system (e.g., .dll, .so). 2. Put the object file in a directory on Python's module search path. That is, once you've compiled the source code file into a shareable object file, simply copy or move the object file to a directory listed in sys.path (which includes PYTHONPATH and .pth path file settings). It will be automatically loaded and linked by the Python interpreter at runtime when the module is first imported anywhere in the Python process, including imports from the interactive prompt, a standalone or embedded Python program, or a C API call. Notice that the only non-static name in the hello.c example C file is the initialization function. Python calls this function by name after loading the object file, so its name must be a C global and should generally be of the form initX, where X is both the name of the module in Python import statements and the name passed to Py_InitModule. All other names in C extension files are arbitrary because they are accessed by C pointer, not by name (more on this later). The name of the C source file is arbitrary, too; at import time, Python cares only about the compiled object file.
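The lookup in step 2 can be sketched in pure Python. This is only an illustration of the idea; the real import machinery is implemented in C and also handles packages, zip archives, and bytecode caching. The function name and suffix list here are hypothetical:

```python
import os
import sys

def find_module_file(modname, suffixes=('.py', '.pyc', '.so', '.dll')):
    # Scan sys.path directories, trying each suffix in turn; a compiled
    # extension (.so/.dll) is located the same way as a .py source file.
    for dirname in sys.path:
        for suffix in suffixes:
            candidate = os.path.join(dirname or '.', modname + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None
```

A script run in the compile directory would find hello.dll through the current-directory entry on sys.path, just as described above.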
22.5.1.2. Static binding
Although dynamic binding is preferred in most applications, static binding allows extensions to be added to the Python interpreter in a more permanent fashion. This is more complex, though, because you must rebuild Python itself, and hence you need access to the Python source distribution (an interpreter executable won't do). Moreover, static linking of extensions is prone to change over time, so you should consult the README file at the top of Python's source distribution tree for current details.[*]

[*] In fact, starting with Python 2.1, the setup.py script at the top of the source distribution attempts to detect which modules can be built, and it automatically compiles them using the distutils system described in the next section. The setup.py script is run by Python's make system after building a minimal interpreter. This process doesn't always work, though, and you can still customize the configuration by editing the Modules/Setup file. As a more recent alternative, see also the example lines in Python's setup.py for xxmodule.c.
In short, though, one way to statically link the extension of Example 22-1 is to add a line such as the following:
hello ~/PP3E/Integrate/Extend/Hello/hello.c
to the Modules/Setup configuration file in the Python source code tree (change the ~ if this isn't in your home directory). Alternatively, you can copy your C file to the Modules directory (or add a link to it there with an ln command) and add a line to Setup, such as hello hello.c . Then, rebuild Python itself by running a make command at the top level of the Python source tree. Python reconstructs its own makefiles to include the module you added to Setup, such that your code becomes part of the interpreter and its libraries. In fact, there's really no distinction between C extensions written by Python users and services that are a standard part of the language; Python is built with this same interface. The full format of module declaration lines looks like this:
Under this scheme, the name of the module's initialization function must match the name used in the Setup file, or you'll get linking errors when you rebuild Python. The name of the source or object file doesn't have to match the module name; the leftmost name is the resulting Python module's name. This process and syntax are prone to change over time, so again, be sure to consult the README file at the top of Python's source tree.
22.5.1.3. Static versus dynamic binding
Static binding works on any platform and requires no extra makefile to compile extensions. It can be useful if you don't want to ship extensions as separate files, or if you're on a platform without dynamic linking support. Its downsides are that you need to update Python configuration files and rebuild the Python interpreter itself, so you must therefore have the full source distribution of Python to use static linking at all. Moreover, all statically linked extensions are always added to your interpreter, regardless of whether they are used by a particular program. This can needlessly increase the memory needed to run all Python programs. With dynamic binding, you still need Python include files, but you can add C extensions even if all you have is a binary Python interpreter executable. Because extensions are separate object files, there is no need to rebuild Python itself or to access the full source distribution. And because object files are only loaded on demand in this mode, it generally makes for smaller executables, too: Python loads into memory only the extensions actually imported by each program run. In other words, if you can use dynamic linking on your platform, you probably should.
22.5.2. Compiling with the Distutils System
As an alternative to makefiles, it's possible to specify compilation of C extensions by writing Python scripts that use tools in the Distutils package, a standard part of Python that is used to build, install, and distribute Python extensions coded in Python or C. Its larger goal is automated building of distributed packages on target machines. We won't go into Distutils exhaustively in this text; see Python's standard distribution and installation manuals for more details. Among other things, Distutils is the de facto way to distribute larger Python packages these days. Its tools know how to install a system in the right place on target machines (usually, in Python's standard site-packages) and handle many platform-specific details that are tedious and error-prone to accommodate manually. For our purposes here, though, because Distutils also has built-in support for running common compilers on a variety of platforms (including Cygwin), it provides an alternative to makefiles for situations where the complexity of makefiles is either prohibitive or unwarranted. For example, to compile the C code in Example 22-1, we can code the makefile of Example 22-2, or we can code and run the Python script in Example 22-4.
Example 22-4. PP3E\Integrate\Extend\Hello\disthello.py
# to build: python disthello.py build
# resulting dll shows up in build subdir

from distutils.core import setup, Extension
setup(ext_modules=[Extension('hello', ['hello.c'])])
Example 22-4 is a Python script run by Python; it is not a makefile. Moreover, there is nothing in it about a particular compiler or compiler options. Instead, the Distutils tools it employs automatically detect and run an appropriate compiler for the platform, using compiler options that are appropriate for building dynamically linked Python extensions on that platform. For the Cygwin test machine, gcc is used to generate a .dll dynamic library ready to be imported into a Python script, exactly like the result of the makefile in Example 22-2, but considerably simpler:
The resulting binary library file shows up in the generated build subdirectory, but it's used in Python code just as before:
.../PP3E/Integrate/Extend/Hello$ cd build/lib.cygwin-1.5.19-i686-2.4/
.../PP3E/Integrate/Extend/Hello/build/lib.cygwin-1.5.19-i686-2.4$ ls
hello.dll
.../PP3E/Integrate/Extend/Hello/build/lib.cygwin-1.5.19-i686-2.4$ python
>>> import hello
>>> hello.__file__
'hello.dll'
>>> hello.message('distutils')
'Hello, distutils'
Distutils scripts can become much more complex in order to specify build options; for example, here is a slightly more verbose version of ours:
from distutils.core import setup, Extension
setup(name='hello',
      version='1.0',
      ext_modules=[Extension('hello', ['hello.c'])])
Unfortunately, further details about both Distutils and makefiles are beyond the scope of this chapter and book. Especially if you're not used to makefiles, see the Python manuals for more details on Distutils. Makefiles are a traditional way to build code on some platforms, and we will employ them in this book; but Distutils can sometimes be simpler in cases where it applies.
22.5.3. Anatomy of a C Extension Module Though simple, the hello.c code of Example 22-1 illustrates the structure common to all C modules. Most of it is glue code, whose only purpose is to wrap the C string processing logic for use in Python scripts. In fact, although this structure can vary somewhat, this file consists of fairly typical boilerplate code:
Python header files The C file first includes the standard Python.h header file (from the installed Python Include directory). This file defines almost every name exported by the Python API to C, and it serves as a starting point for exploring the API itself.
Method functions The file then defines a function to be called from the Python interpreter in response to calls in Python programs. C functions receive two Python objects as input and send either a Python object back to the interpreter as the result, or a NULL to trigger an exception in the script (more on this later). In C, a PyObject* represents a generic Python object pointer; you can use more specific type names, but you don't always have to. C module functions can be declared static (local to the file) because Python calls them by pointer, not by name.
Registration table Near the end, the file provides an initialized table (array) that maps function names to function pointers (addresses). Names in this table become module attribute names that Python code uses to call the C functions. Pointers in this table are used by the interpreter to dispatch C function calls. In effect, the table "registers" attributes of the module. A NULL entry terminates the table.
Initialization function Finally, the C file provides an initialization function, which Python calls the first time this module is imported into a Python program. This function calls the API function Py_InitModule to build up the new module's attribute dictionary from the entries in the registration table and create an entry for the C module on the sys.modules table (described in Chapter 3). Once so initialized, calls from Python are routed directly to the C function through the registration table's function
pointers.
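The registration table's role can be mimicked in Python itself. The sketch below is an analogy only (the names hello_methods, ModuleStandIn, and make_module are made up here): it shows how a name-to-function table is all that's needed to populate a module-like object's attributes, which is essentially what Py_InitModule does with the C table.

```python
def message(text):                       # stands in for the C function
    return 'Hello, ' + text

hello_methods = {'message': message}     # name -> function "pointer" table

class ModuleStandIn:
    pass

def make_module(name, methods):
    # What Py_InitModule does in spirit: build an attribute namespace
    # from the registration table's name/pointer pairs.
    mod = ModuleStandIn()
    for attr, func in methods.items():
        setattr(mod, attr, func)
    return mod

hello = make_module('hello', hello_methods)
```

Once built, attribute fetches such as hello.message go straight through the table, which is why the C function's own name never matters to Python code.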
22.5.4. Data Conversions C module functions are responsible for converting Python objects to and from C datatypes. In Example 22-1, message gets two Python input objects passed from the Python interpreter: args is a Python tuple holding the arguments passed from the Python caller (the values listed in parentheses in a Python program), and self is ignored. It is useful only for extension types (discussed later in this chapter). After finishing its business, the C function can return any of the following to the Python interpreter: a Python object (known in C as PyObject*), for an actual result; a Python None (known in C as Py_None), if the function returns no real result; or a C NULL pointer, to flag an error and raise a Python exception. There are distinct API tools for handling input conversions (Python to C) and output conversions (C to Python). It's up to C functions to implement their call signatures (argument lists and types) by using these tools properly.
22.5.4.1. Python to C: using Python argument lists When the C function is run, the arguments passed from a Python script are available in the args Python tuple object. The API function PyArg_Parse, and its cousin PyArg_ParseTuple (which assumes it is converting a tuple object), is probably the easiest way to extract and convert passed arguments to C form. PyArg_Parse takes a Python object, a format string, and a variable-length list of C target addresses. It
converts the items in the tuple to C datatype values according to the format string, and it stores the results in the C variables whose addresses are passed in. The effect is much like C's scanf string function. For example, the hello module converts a passed-in Python string argument to a C char* using the s convert code:
PyArg_Parse(args, "(s)", &fromPython)       /* or: PyArg_ParseTuple(args, "s", ... */
To handle multiple arguments, simply string format codes together and include corresponding C targets for each code in the string. For instance, to convert an argument list holding a string, an integer, and another string to C, say this:
PyArg_Parse(args, "(sis)", &s1, &i, &s2)    /* or: PyArg_ParseTuple(args, "sis", ... */
To verify that no arguments were passed, use an empty format string like this:
PyArg_Parse(args, "()")
This API call checks that the number and types of the arguments passed from Python match the format string in the call. If there is a mismatch, it sets an exception and returns zero to C (more on errors shortly).
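In Python terms, the check-and-convert behavior just described might be sketched like this; the function and its small code table are hypothetical stand-ins for what PyArg_Parse does in C (returning a false value on mismatch rather than filling C variables by address):

```python
def parse_args(args, fmt):
    # Map a few format codes to the Python types they require; the real
    # call supports many more codes and stores results through C pointers.
    required = {'s': str, 'i': int, 'd': float}
    codes = fmt.strip('()')
    if len(args) != len(codes):
        return None                      # wrong argument count -> error
    values = []
    for value, code in zip(args, codes):
        if not isinstance(value, required[code]):
            return None                  # type mismatch -> error
        values.append(value)
    return values
```

A "(sis)" format accepts a string, an integer, and a string, and anything else fails, which is exactly the checking the C call performs before your function's logic runs.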
22.5.4.2. Python to C: using Python return values As we'll see in Chapter 23, API functions may also return Python objects to C as results when Python is being run as an embedded language. Converting Python return values in this mode is almost the same as converting Python arguments passed to C extension functions, except that Python return values are not always tuples. To convert returned Python objects to C form, simply use PyArg_Parse. Unlike PyArg_ParseTuple, this call takes the same kinds of arguments but doesn't expect the Python object to be a tuple.
22.5.4.3. C to Python: returning values to Python There are two ways to convert C data to Python objects: by using type-specific API functions or via the general object-builder function, Py_BuildValue . The latter is more general and is essentially the inverse of PyArg_Parse, in that Py_BuildValue converts C data to Python objects according to a format string. For instance, to make a Python string object from a C char*, the hello module uses an s convert code:
return Py_BuildValue("s", result)           /* "result" is a C char [] */
More specific object constructors can be used instead:
return PyString_FromString(result)          /* same effect */
Both calls make a Python string object from a C character array pointer. See the now-standard Python extension and runtime API manuals for an exhaustive list of such calls available. Besides being easier to remember, though, Py_BuildValue has syntax that allows you to build lists in a single step, described next.
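The format-driven building idea can be illustrated with a small Python sketch; the helper below is hypothetical, but it mirrors how Py_BuildValue yields a single object for one code and a tuple for several:

```python
def build_value(fmt, *values):
    # One code/value pair yields a single object; several yield a tuple,
    # mirroring Py_BuildValue("s", x) versus Py_BuildValue("si", x, y).
    converters = {'s': str, 'i': int, 'd': float}
    results = [converters[code](value) for code, value in zip(fmt, values)]
    return results[0] if len(results) == 1 else tuple(results)
```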
22.5.4.4. Common conversion codes With a few exceptions, PyArg_Parse(Tuple) and Py_BuildValue use the same conversion codes in format strings. A list of all supported conversion codes appears in Python's extension manuals. The most commonly used are shown in Table 22-1; the tuple, list, and dictionary formats can be nested.
Table 22-1. Common Python/C data conversion codes
Format-string code    C datatype                       Python object type
------------------    ----------                       ------------------
s                     char*                            String
s#                    char*, int                       String, length
i                     int                              Integer
l                     long int                         Integer
c                     char                             String
f                     float                            Floating-point
d                     double                           Floating-point
O                     PyObject*                        Raw (unconverted) object
O&                    &converter, void*                Converted object (calls converter)
(items)               Targets or values                Nested tuple
[items]               Series of arguments/values       List
{items}               Series of key,value arguments    Dictionary
These codes are mostly what you'd expect (e.g., i maps between a C int and a Python integer object), but here are a few usage notes on this table's entries:

- Pass in the address of a char* for s codes when converting to C, not the address of a char array: Python copies out the address of an existing C string (and you must copy it to save it indefinitely on the C side: use strdup or similar).

- The O code is useful for passing raw Python objects between languages; once you have a raw object pointer, you can use lower-level API tools to access object attributes by name, index and slice sequences, and so on.

- The O& code lets you pass in C converter functions for custom conversions. This comes in handy for special processing to map an object to a C datatype not directly supported by the conversion codes (for instance, when mapping to or from an entire C struct or C++ class instance). See the extensions manual for more details.

- The last two entries, [...] and {...}, are currently supported only by Py_BuildValue: you can construct lists and dictionaries with format strings, but you can't unpack them. Instead, the API includes type-specific routines for accessing sequence and mapping components given a raw object pointer.

PyArg_Parse supports some extra codes, which must not be nested in tuple formats ((...)):
| The remaining arguments are optional (varargs, much like the Python language's * arguments). The C targets are unchanged if arguments are missing in the Python tuple. For instance, si|sd requires two arguments but allows up to four.
: The function name follows, for use in error messages set by the call (argument mismatches). Normally, Python sets the error message to a generic string.
; A full error message follows, running to the end of the format string.

This format code list isn't exhaustive, and the set of conversion codes may expand over time; refer to Python's extension manual for further details.
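The "|" code's effect can be made concrete with a short sketch (a hypothetical helper, not the real API): arguments after the bar may be omitted, and the corresponding C targets simply keep their prior values, modeled here as defaults.

```python
def parse_with_optionals(args, fmt, defaults=()):
    # Split the format at '|': codes before it are required, codes after
    # it optional. Missing optionals fall back to the supplied defaults,
    # like C target variables left unchanged by PyArg_Parse.
    required, _, optional = fmt.partition('|')
    if not (len(required) <= len(args) <= len(required) + len(optional)):
        return None                      # too few or too many arguments
    values = list(args)
    values.extend(defaults[len(args) - len(required):])
    return values
```

With format "si|sd", two arguments satisfy the required part and up to two more may follow, just as the text describes.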
22.5.5. Error Handling When you write C extensions, you need to be aware that errors can occur on either side of the language fence. The following sections address both possibilities.
22.5.5.1. Raising Python exceptions in C C extension module functions return a C NULL value for the result object to flag an error. When control returns to Python, the NULL result triggers a normal Python exception in the Python code that called the C function. To name an exception, C code can also set the type and extra data of the exceptions it triggers. For instance, the PyErr_SetString API function sets the exception object to a Python object and sets the exception's extra data to a character string:
PyErr_SetString(ErrorObject, message)
We will use this in the next example to be more specific about exceptions raised when C detects an error. C modules may also set a built-in Python exception; for instance, returning NULL after saying this:

PyErr_SetString(PyExc_IndexError, message)

raises a standard Python IndexError exception with the message string data. When an error is raised inside a Python API function, both the exception object and its associated "extra data" are automatically set by Python; there is no need to set them again in the calling C function. For instance, when an argument-passing error is detected in the PyArg_Parse function, the hello module just returns NULL to propagate the exception to the enclosing Python layer, instead of setting its own message.
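From the Python side of the fence, a NULL returned by C simply surfaces as an ordinary exception. The sketch below mimics the effect in pure Python; the fetch function is a made-up stand-in for a C extension call that sets PyExc_IndexError and returns NULL:

```python
def fetch(sequence, index):
    # A C version would call PyErr_SetString(PyExc_IndexError, "...")
    # and return NULL; in Python, the same effect is a raise statement.
    if index >= len(sequence):
        raise IndexError('index out of range')
    return sequence[index]

try:
    fetch(('a', 'b'), 5)
    caught = False
except IndexError:
    caught = True      # callers handle it with an ordinary try/except
```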
22.5.5.2. Detecting errors that occur in Python Python API functions may be called from C extension functions or from an enclosing C layer when Python is embedded. In either case, C callers simply check the return value to detect errors raised in Python API functions. For pointer result functions, Python returns NULL pointers on errors. For integer result functions, Python generally returns a status code of -1 to flag an error and a 0 or positive value
on success. (PyArg_Parse is an exception to this rule: it returns 0 when it detects an error.) To make your programs robust, you should check return codes for error indicators after most Python API calls; some calls can fail for reasons you may not have expected (e.g., memory overflow).
22.5.6. Reference Counts The Python interpreter uses a reference-count scheme to implement garbage collection. Each Python object carries a count of the number of places it is referenced; when that count reaches zero, Python reclaims the object's memory space automatically. Normally, Python manages the reference counts for objects behind the scenes; Python programs simply make and use objects without concern for managing storage space. When extending or embedding Python, though, integrated C code is responsible for managing the reference counts of the Python objects it uses. How important this becomes depends on how many raw Python objects a C module processes and which Python API functions it calls. In simple programs, reference counts are of minor, if any, concern; the hello module, for instance, makes no reference-count management calls at all. When the API is used extensively, however, this task can become significant. In later examples, we'll see calls of these forms show up:
Py_INCREF(obj) Increments an object's reference count.
Py_DECREF(obj) Decrements an object's reference count (reclaims if zero).
Py_XINCREF(obj) Behaves similarly to Py_INCREF(obj), but ignores a NULL object pointer.
Py_XDECREF(obj) Behaves similarly to Py_DECREF(obj), but ignores a NULL object pointer. C module functions are expected to return either an object with an incremented reference count or NULL to signal an error. As a general rule, API functions that create new objects increment their reference counts before returning them to C; unless a new object is to be passed back to Python, the C program that creates it should eventually decrement the object's count. In the extending scenario, things are relatively simple: argument object reference counts need not be decremented, and new result objects are passed back to Python with their reference counts intact. The upside of reference counts is that Python will never reclaim a Python object held by C as long as C increments the object's reference count (or doesn't decrement the count on an object it owns). Although it requires counter management calls, Python's garbage collection scheme is fairly well suited to C integration.
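The protocol can be simulated to make the rules concrete. This toy class is not the real CPython mechanism, just an illustration of the incref/decref bookkeeping that C code performs:

```python
class RefCounted:
    def __init__(self):
        self.refcount = 1                # the creator owns one reference
        self.reclaimed = False

    def incref(self):                    # like Py_INCREF(obj)
        self.refcount += 1

    def decref(self):                    # like Py_DECREF(obj)
        self.refcount -= 1
        if self.refcount == 0:
            self.reclaimed = True        # space reclaimed at zero

obj = RefCounted()
obj.incref()                             # C stashes a second reference
obj.decref()                             # one holder releases: still alive
obj.decref()                             # last holder releases: reclaimed
```

As long as C keeps the count above zero, the object survives; forgetting a matching decref is the classic extension-module memory leak.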
22.5.7. Other Extension Tasks: Threads Some C extensions may be required to perform additional tasks beyond data conversion, error handling, and reference counting. For instance, long-running C extension functions in threaded applications must release and later reacquire the global interpreter lock, so as to allow Python language threads to run in parallel. See the introduction to this topic in Chapter 5 for background details. Calls to long-running tasks implemented in C extensions, for example, are normally wrapped up in two C macros:
Py_BEGIN_ALLOW_THREADS ...Perform a potentially blocking operation... Py_END_ALLOW_THREADS
The first of these saves the thread state data structure in a local variable and releases the global lock; the second reacquires the lock and restores the thread state from the local variable. The net effect is to allow Python threads to run during the execution of the code in the enclosed block, instead of making them wait. The C code in the calling thread can run freely, in parallel with other Python threads, as long as it doesn't reenter the Python C API until it reacquires the lock. The API has additional thread calls, and depending on the application, there may be other C coding requirements in general. In deference to space, though, and because we're about to meet a tool that automates much of our integration work, we'll defer to Python's integration manuals for additional details.
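The payoff of the macro pair can be seen with a pure-Python analogy: time.sleep releases the global interpreter lock around its blocking wait, just as a C extension wrapped in the macros would, so two waits overlap instead of running back to back. This illustrates the scheduling effect only, not the macros themselves:

```python
import threading
import time

def blocking_task(results, index):
    time.sleep(0.25)          # the lock is released while blocked,
    results[index] = True     # as Py_BEGIN/END_ALLOW_THREADS would do

results = {}
threads = [threading.Thread(target=blocking_task, args=(results, i))
           for i in range(2)]
started = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - started   # ~0.25s overlapped, not ~0.5s serialized
```

A long-running C function that failed to release the lock would instead force the two waits to run one after the other.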
22.6. The SWIG Integration Code Generator But don't do that. As you can probably tell, manual coding of C extensions can become fairly involved (this is almost inevitable in C language work). I've introduced the basics in this chapter thus far so that you understand the underlying structure. But today, C extensions are usually better and more easily implemented with a tool that generates all the required integration glue code automatically. There are a variety of such tools for use in the Python world, including SIP, SWIG, and Boost.Python; we'll explore alternatives at the end of this chapter. Of these, the SWIG system is likely still the most widely used. The Simplified Wrapper and Interface Generator (SWIG) is an open source system created by Dave Beazley and now developed by its community, much like Python. It uses C and C++ type declarations to generate complete C extension modules that integrate existing libraries for use in Python scripts. The generated C (and C++) extension modules are complete: they automatically handle data conversion, error protocols, reference-count management, and more. That is, SWIG is a program that automatically generates all the glue code needed to plug C and C++ components into Python programs; simply run SWIG, compile its output, and your extension work is done. You still have to manage compilation and linking details, but the rest of the C extension task is largely performed by SWIG.
22.6.1. A Simple SWIG Example To use SWIG, instead of writing all that C code in the prior sections, write the C function you want to use from Python without any Python integration logic at all, as though it were to be used from C alone. For instance, Example 22-5 is a recoding of Example 22-1 as a straight C function.
Example 22-5. PP3E\Integrate\Extend\HelloLib\hellolib.c
/*********************************************************************
 * A simple C library file, with a single function, "message",
 * which is to be made available for use in Python programs.
 * There is nothing about Python here--this C function can be
 * called from a C program, as well as Python (with glue code).
 *********************************************************************/

#include <string.h>
#include <hellolib.h>

static char result[64];              /* this isn't exported */

char *
message(char *label)                 /* this is exported */
{
    strcpy(result, "Hello, ");       /* build up C string */
    strcat(result, label);           /* add passed-in label */
    return result;                   /* return a temporary */
}
While you're at it, define the usual C header file to declare the function externally, as shown in Example 22-6. This is probably overkill for such a small example, but it will prove a point.
Example 22-6. PP3E\Integrate\Extend\HelloLib\hellolib.h
/******************************************************************** * Define hellolib.c exports to the C namespace, not to Python * programs--the latter is defined by a method registration * table in a Python extension module's code, not by this .h; ********************************************************************/ extern char *message(char *label);
Now, instead of all the Python extension glue code shown in the prior sections, simply write a SWIG type declarations input file, as in Example 22-7.
Example 22-7. PP3E\Integrate\Extend\Swig\hellolib.i
/******************************************************
 * Swig module description file, for a C lib file.
 * Generate by saying "swig -python hellolib.i".
 ******************************************************/

%module hellowrap

%{
#include <../HelloLib/hellolib.h>
%}

extern char *message(char*);    /* or: %include "../HelloLib/hellolib.h" */
                                /* or: %include hellolib.h, and use -I arg */
This file spells out the C function's type signature. In general, SWIG scans files containing ANSI C and C++ declarations. Its input file can take the form of an interface description file (usually with a .i suffix) or a C/C++ header or source file. Interface files like this one are the most common input form; they can contain comments in C or C++ format, type declarations just like standard header files, and SWIG directives that all start with %. For example:
%module
    Sets the module's name as known to Python importers.

%{...%}
    Encloses code added to the generated wrapper file verbatim.

extern statements
    Declare exports in normal ANSI C/C++ syntax.

%include
    Makes SWIG scan another file (-I flags give search paths).

In this example, SWIG could also be made to read the hellolib.h header file of Example 22-6 directly. But one of the advantages of writing a special SWIG input file like hellolib.i is that you can pick and choose which functions are wrapped and exported to Python, and you can use directives to gain more control over the generation process. SWIG is a utility program that you run from your build scripts; it is not a programming language, so there is not much more to show here. Simply add a step to your makefile that runs SWIG, and compile its output to be linked with Python. Example 22-8 shows one way to do it on Cygwin.
Example 22-8. PP3E\Integrate\Extend\Swig\makefile.hellolib-swig
##################################################################
# Use SWIG to integrate hellolib.c for use in Python programs on
# Cygwin. The DLL must have a leading "_" in its name in current
# SWIG (>1.3.13), because SWIG also makes a .py module without
# the "_" in its name.
##################################################################

PYLIB = /usr/bin
PYINC = /usr/include/python2.4
CLIB  = ../HelloLib

# the library plus its wrapper
_hellowrap.dll: hellolib_wrap.o $(CLIB)/hellolib.o
	gcc -shared hellolib_wrap.o $(CLIB)/hellolib.o \
	    -L$(PYLIB) -lpython2.4 -o $@

# generated wrapper module code
hellolib_wrap.o: hellolib_wrap.c $(CLIB)/hellolib.h
	gcc hellolib_wrap.c -g -I$(CLIB) -I$(PYINC) -c -o $@

hellolib_wrap.c: hellolib.i
	swig -python -I$(CLIB) hellolib.i

# C library code (in another directory)
$(CLIB)/hellolib.o: $(CLIB)/hellolib.c $(CLIB)/hellolib.h
	gcc $(CLIB)/hellolib.c -g -I$(CLIB) -c -o $(CLIB)/hellolib.o

clean:
	rm -f *.dll *.o *.pyc core

force:
	rm -f *.dll *.o *.pyc core hellolib_wrap.c hellowrap.py
When run on the hellolib.i input file by this makefile, SWIG generates two files:
hellolib_wrap.c
    The generated C extension module glue code file.[*]

hellowrap.py
    A Python module that imports the generated C extension module.

[*] You can wade through this generated file in the book's examples distribution if you are so inclined, though such files are highly prone to change over time (in fact, the .py module generated by SWIG for this example is also new since the second edition of this book). Also see the file PP3E\Integrate\Extend\HelloLib\hellolib_wrapper.c in the book's examples distribution for a handcoded equivalent; it's shorter because SWIG also generates extra support code.

The former is named for the input file, and the latter per the %module directive. Really, SWIG generates two modules today: it uses a combination of Python and C code to achieve the integration. Scripts ultimately import the generated Python module file, which internally imports the generated and compiled C module.
To build the C module, the makefile runs a compile after running SWIG, and then combines the result with the original C library code:
.../PP3E/Integrate/Extend/Swig$ make -f makefile.hellolib-swig force
rm -f *.dll *.o *.pyc core hellolib_wrap.c hellowrap.py

.../PP3E/Integrate/Extend/Swig$ ls
Environ    Shadow    hellolib.i    makefile.hellolib-swig

.../PP3E/Integrate/Extend/Swig$ make -f makefile.hellolib-swig
swig -python -I../HelloLib hellolib.i
gcc hellolib_wrap.c -g -I../HelloLib -I/usr/include/python2.4 -c -o hellolib_wrap.o
gcc -shared hellolib_wrap.o ../HelloLib/hellolib.o \
    -L/usr/bin -lpython2.4 -o _hellowrap.dll

.../PP3E/Integrate/Extend/Swig$ ls
Environ    _hellowrap.dll    hellolib_wrap.c    hellowrap.py
Shadow     hellolib.i        hellolib_wrap.o    makefile.hellolib-swig
More specifically, the makefile runs SWIG over the input file, compiles the generated C glue code file into a .o object file, and then links it with hellolib.c's compiled object file to produce _hellowrap.dll. The result is a dynamically loaded C extension module file, ready to be imported by Python code. Like all modules, _hellowrap.dll, along with hellowrap.py, must be placed in a directory on your Python module search path (a period [.] will suffice if you're working in the directory where you compile). Notice that the .dll file must be built with a leading underscore in its name; as of SWIG 1.3.14, this is required because SWIG also creates the .py file of the same name without the underscore. As usual in C development, you may have to barter with the makefile to get it to work on your system. Once you've run the makefile, though, you are finished. The generated C module is used exactly like the manually coded version shown before, except that SWIG has taken care of the complicated parts automatically.
In other words, once you learn how to use SWIG, you can largely forget all the integration coding details introduced in this chapter. In fact, SWIG is so adept at generating Python glue code that it's usually much easier and less error prone to code C extensions for Python as purely C- or C++-based libraries first, and later add them to Python by running their header files through SWIG, as demonstrated here.
22.6.2. SWIG Details

Of course, you must have SWIG before you can run SWIG; it's not part of Python itself. Unless it is already on your system, fetch SWIG off the Web and run its installer, or build it from its source code. To do the latter, you'll need a C++ compiler; see SWIG's README file and web site for more details. SWIG is a command-line program and generally can be run just by saying the following:
swig -python hellolib.i
Along the way in this chapter, we'll meet a few more SWIG-based alternatives to the remaining examples. By way of introduction, here is a quick look at a few more SWIG highlights:
C++ "shadow" classes
    We'll learn how to use SWIG to integrate C++ classes for use in your Python scripts. When given C++ class declarations, SWIG generates glue code that makes C++ classes look just like Python classes in Python scripts. In fact, C++ classes are Python classes under SWIG: you get what SWIG calls a C++ shadow (or proxy) class that interfaces with a C++-coded extension module, which in turn talks to C++ classes using a function-based interface. Because the integration's outer layer consists of Python classes, those classes may be subclassed in Python and their instances processed with normal Python object syntax.
Variables
    Besides functions and C++ classes, SWIG can also wrap C global variables and constants for use in Python: they become attributes of an object named cvar inserted into generated modules (e.g., module.cvar.name fetches the value of C's variable name from a SWIG-generated wrapper module).
structs
    C structs are converted into a set of get and set accessor functions that are called with a struct object pointer to fetch and assign fields (e.g., module.Vector_fieldx_get(v) fetches C's Vector.fieldx from a Vector pointer v, like C's v->fieldx). Similar accessor functions are generated for the data members and methods of C++ classes (a C++ class is roughly a struct with extra syntax), but the SWIG shadow class feature allows you to treat wrapped classes just like Python classes, instead of calling the lower-level accessor functions.
Other
    For C++, besides wrapping up classes and functions for use from Python, SWIG also generates code to support overloaded operators, routing of virtual method calls from C++ back to Python, templates, and much more. Consult the SWIG Python user manual for the full scoop on its features. SWIG's feature set and implementation are both prone to change over time (e.g., its pointers are no longer strings, and Python new-style classes are employed in dual-mode proxy classes), so we'll defer to its
documentation for more internals information. Later in this chapter, we'll see SWIG in action two more times, wrapping up C environment calls and a C++ class. Although the SWIG examples in this book are simple, you should also know that SWIG handles industrial-strength libraries just as easily. For instance, Python developers have successfully used SWIG to integrate libraries as complex as Windows extensions and commonly used graphics APIs such as OpenGL. SWIG can also generate integration code for other scripting languages such as Tcl and Perl. In fact, one of its underlying goals is to make components independent of scripting language choices: C/C++ libraries can be plugged into whatever scripting language you prefer to use (I prefer to use Python, but I might be biased). SWIG's support for things such as classes seems strongest for Python, though, probably because Python is considered to be strong in the classes department. As a language-neutral integration tool, SWIG addresses some of the same goals as systems such as COM and CORBA (described in Chapter 23), but it provides a code generation-based alternative rather than an object model. You can find SWIG with a web search or by visiting its current home page on the Web at http://www.swig.org. Along with full source code, SWIG comes with outstanding documentation (including documentation specifically for Python). The documentation also describes how to build SWIG extensions on other platforms and compilers, including standard Windows without Cygwin.
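A shadow class of the sort described in the first item above can be imitated in plain Python: a proxy class layered over a flat, function-based interface like the ones C extension modules export. The sketch below only illustrates the pattern; the flat API here is hypothetical for demonstration purposes, not SWIG's actual generated code:

```python
# a hypothetical flat, function-based API, standing in for the kind
# of interface a C-coded extension module exports
_instances = {}                        # handle -> state mapping

def new_stack():                       # returns an opaque handle
    handle = len(_instances)
    _instances[handle] = []
    return handle

def stack_push(handle, item):
    _instances[handle].append(item)

def stack_pop(handle):
    return _instances[handle].pop()

class Stack:
    """a hand-rolled 'shadow' (proxy) class over the flat API"""
    def __init__(self):
        self._handle = new_stack()     # allocate the underlying object
    def push(self, item):
        stack_push(self._handle, item)
    def pop(self):
        return stack_pop(self._handle)

s = Stack()                            # may also be subclassed in Python
s.push('spam')
assert s.pop() == 'spam'
```

SWIG-generated proxies work the same way in spirit, but they are produced automatically from C++ class declarations rather than written by hand.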
22.7. Wrapping C Environment Calls

Let's move on to a more useful application of C extension modules. The handcoded C file in Example 22-9 integrates the standard C library's getenv and putenv shell environment variable calls for use in Python scripts.
Example 22-9. PP3E\Integrate\Extend\CEnviron\cenviron.c
/******************************************************************
 * A C extension module for Python, called "cenviron". Wraps the
 * C library's getenv/putenv routines for use in Python programs.
 ******************************************************************/

#include <Python.h>
#include <stdlib.h>
#include <string.h>

/***********************/
/* 1) module functions */
/***********************/

static PyObject *                                  /* returns object */
wrap_getenv(PyObject *self, PyObject *args)        /* self not used */
{                                                  /* args from python */
    char *varName, *varValue;
    PyObject *returnObj = NULL;                    /* null=exception */
    ...
This example is less useful now than it was in the first edition of this book: as we learned in Part II, not only can you fetch shell environment variables by indexing the os.environ table, but assigning to a key in this table automatically calls C's putenv to export the new setting to the C code layer in the process. That is, os.environ['key'] fetches the value of the shell variable 'key', and os.environ['key']=value assigns a variable both in Python and in C. The second action, pushing assignments out to C, was added to Python releases after the first edition of this book was published. Besides demonstrating additional extension coding techniques, though, this example still serves a practical purpose: even today, changes made to shell variables by the C code linked into a Python process are not picked up when you index os.environ in Python code. That is, once your program starts, os.environ reflects only subsequent changes made by Python code. Moreover, although Python now has both a putenv and a getenv call in its os module, their integration seems incomplete. Changes to os.environ call os.putenv, but direct calls to os.putenv do not update os.environ, so the two can fall out of sync. And os.getenv today simply translates to an os.environ fetch, and hence will not pick up environment changes made in the process outside of Python code after startup time. This may rarely, if ever, be an issue for you, but this C extension module is not completely without purpose; to truly interface environment variables with linked-in C code, we need to call the C library routines directly.[*]

[*] This code is also open to customization (e.g., it could limit the set of shell variables read and written by checking names), but you could do the same by wrapping os.environ. In fact, because os.environ is simply a Python UserDict subclass that preloads shell variables on startup, you could almost add the required getenv call to load C-layer changes by simply wrapping os.environ accesses in a Python class whose __getitem__ calls getenv before passing the access off to os.environ. But you still need C's getenv call in the first place, and it's not directly available in os today.
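The os-module asymmetry just described (assignments to os.environ call C's putenv, but direct os.putenv calls bypass os.environ) is easy to demonstrate in current CPython with the standard library alone; this is a quick sketch, and 'DEMO_VAR' is an arbitrary variable name chosen for illustration:

```python
import os

# assigning through os.environ updates Python's table and calls C's putenv
os.environ['DEMO_VAR'] = 'python-side'
assert os.getenv('DEMO_VAR') == 'python-side'   # os.getenv reads os.environ

# a direct os.putenv call changes the C-level environment only:
# os.environ (and hence os.getenv) does not see the new value
os.putenv('DEMO_VAR', 'c-side')
assert os.environ['DEMO_VAR'] == 'python-side'  # still the Python-side value
```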
The cenviron.c C file in Example 22-9 creates a Python module called cenviron that does a bit more than the prior examples: it exports two functions, sets some exception descriptions explicitly, and makes a reference-count call for the Python None object (it's not created anew, so we need to add a reference before passing it to Python). As before, to add this code to Python, compile it and link it into an object file; the Cygwin makefile in Example 22-10 builds the C source code for dynamic binding.
Example 22-10. PP3E\Integrate\Extend\Cenviron\makefile.cenviron
##################################################################
# Compile cenviron.c into cenviron.dll--a shareable object file
# on Cygwin, which is loaded dynamically when first imported.
##################################################################

PYLIB = /usr/bin
PYINC = /usr/include/python2.4

cenviron.dll: cenviron.c
	gcc cenviron.c -g -I$(PYINC) -shared \
	    -L$(PYLIB) -lpython2.4 -o $@

clean:
	rm -f *.pyc cenviron.dll
To build, type make -f makefile.cenviron at your shell. To run, make sure the .dll file is in a directory on Python's module path (a period [.] works too):
.../PP3E/Integrate/Extend/Cenviron$ python
>>> import cenviron
>>> cenviron.getenv('USER')              # like os.environ[key] but refetched
'mark'
>>> cenviron.putenv('USER', 'gilligan')  # like os.environ[key]=value
>>> cenviron.getenv('USER')              # C sees the changes too
'gilligan'
As before, cenviron is a bona fide Python module object after it is imported, with all the usual attached information.
Here is an example of the problem this module addresses (but you have to pretend that some of these calls are made by linked-in C code, not by Python):
>>> import os
>>> os.environ['USER']                   # initialized from the shell
'mark'
>>> from cenviron import getenv, putenv  # direct C library call access
>>> putenv('USER', 'gilligan')           # changes for C but not Python
>>> getenv('USER')
'gilligan'
>>> os.environ['USER']                   # oops--does not fetch values again
'mark'
>>> os.getenv('USER')                    # ditto
'mark'
22.7.1. Adding Wrapper Classes to Flat Libraries

As is, the C extension module exports a function-based interface, but you can wrap its functions in Python code that makes the interface look any way you like. For instance, Example 22-11 makes the functions accessible by dictionary indexing and integrates with the os.environ object: it guarantees that the object will stay in sync with fetches and changes made by calling our C extension functions.
Example 22-11. PP3E\Integrate\Extend\Cenviron\envmap.py
import os
from cenviron import getenv, putenv        # get C module's functions

class EnvMapping:                          # wrap in a Python class
    def __setitem__(self, key, value):
        os.environ[key] = value            # on writes: Env[key]=value
        putenv(key, value)                 # put in os.environ too
    def __getitem__(self, key):
        value = getenv(key)                # on reads: Env[key]
        os.environ[key] = value            # integrity check
        return value

Env = EnvMapping( )                        # make one instance
To use this module, clients may import its Env object and use Env['var'] dictionary syntax to refer to environment variables. Example 22-12 instead exports the functions as qualified attribute names rather than as calls: variables are referenced with Env.var attribute syntax. The main point to notice here is that you can graft many different sorts of interface models onto extension functions by providing Python wrappers on top of the extension's C wrappers.
Example 22-12. PP3E\Integrate\Extend\Cenviron\envattr.py
import os
from cenviron import getenv, putenv        # get C module's functions

class EnvWrapper:                          # wrap in a Python class
    def __setattr__(self, name, value):
        os.environ[name] = value           # on writes: Env.name=value
        putenv(name, value)                # put in os.environ too
    def __getattr__(self, name):
        value = getenv(name)               # on reads: Env.name
        os.environ[name] = value           # integrity check
        return value

Env = EnvWrapper( )                        # make one instance
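Because cenviron itself must be compiled before Example 22-12 can run, here is a runnable approximation of the same attribute-based wrapping pattern, with simple os.environ-based stand-ins substituted for the extension's getenv and putenv (the stand-ins are an assumption for illustration only):

```python
import os

def getenv(name):                      # stand-in for cenviron.getenv
    return os.environ.get(name, '')

def putenv(name, value):               # stand-in for cenviron.putenv
    os.environ[name] = value

class EnvWrapper:                      # wrap functions in a Python class
    def __setattr__(self, name, value):
        putenv(name, value)            # on writes: Env.name = value
    def __getattr__(self, name):
        return getenv(name)            # on reads: Env.name

Env = EnvWrapper()
Env.DEMO_NAME = 'spam'                 # routed to __setattr__
assert os.environ['DEMO_NAME'] == 'spam'
assert Env.DEMO_NAME == 'spam'         # routed to __getattr__
```

Note that because __setattr__ intercepts every attribute assignment, the instance's own __dict__ is never populated, so every read falls back to __getattr__ as intended.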
22.7.2. But Don't Do That Either: SWIG

You can manually code extension modules as we just did, but you don't necessarily have to. Because this example really just wraps functions that already exist in standard C libraries, the entire cenviron.c C code file in Example 22-9 can be replaced with a simple SWIG input file that looks like Example 22-13.
Example 22-13. PP3E\Integrate\Extend\Swig\Environ\environ.i
/***************************************************************
 * Swig module description file, to generate all Python wrapper
 * code for C lib getenv/putenv calls: "swig -python environ.i".
 ***************************************************************/

%module environ

extern char * getenv(const char *varname);
extern int    putenv(char *assignment);
And you're done. Well, almost; you still need to run this file through SWIG and compile its output. As before, simply add a SWIG step to your makefile and compile its output file into a shareable object, and you're in business. Example 22-14 is a Cygwin makefile that does the job.
Example 22-14. PP3E\Integrate\Extend\Swig\Environ\makefile.environswig
When run on environ.i, SWIG generates two files and two modules: environ.py (the Python interface module we import) and environ_wrap.c (the lower-level glue code module file we compile). Because the functions being wrapped here live in standard linked-in C libraries, there is nothing to combine with the generated code; this makefile simply runs SWIG and compiles the wrapper file into a C extension module, ready to be imported.
And now you're really done. The resulting C extension module is linked when imported, and it's used as before (except that SWIG handled all the gory bits):
.../PP3E/Integrate/Extend/Swig/Environ$ ls
_environ.dll    environ.py        makefile.environ-swig
environ.i       environ_wrap.c

.../PP3E/Integrate/Extend/Swig/Environ$ python
>>> import environ
>>> environ.getenv('USER')
'Mark Lutz'
>>> temp = 'USER=gilligan'        # use C lib call pattern now
>>> environ.putenv(temp)          # temp required in Cygwin
0
>>> environ.getenv('USER')
'gilligan'
>>> environ.__name__, environ.__file__, environ
('environ', 'environ.py', <module 'environ' from 'environ.py'>)
22.8. A C Extension Module String Stack

Let's kick it up another notch: the following C extension module implements a stack of strings for use in Python scripts. Example 22-15 demonstrates additional API calls, but it also serves as a basis of comparison. It is roughly equivalent to the Python stack module we coded earlier in Chapter 20, but it stacks only strings (not arbitrary objects), has limited string storage and stack lengths, and is written in C. Alas, the last point makes for a complicated program listing: C code is never quite as nice to look at as equivalent Python code. C must declare variables, manage memory, implement data structures, and include lots of extra syntax. Unless you're a big fan of C, you should focus on the Python interface code in this file, not on the internals of its functions.
Example 22-15. PP3E\Integrate\Extend\Stacks\stackmod.c
/*****************************************************
 * stackmod.c: a shared stack of character-strings;
 * a C extension module for use in Python programs;
 * linked into Python libraries or loaded on import;
 *****************************************************/

#include "Python.h"              /* Python header files */
#include <stdio.h>               /* C header files */
#include <string.h>

static PyObject *ErrorObject;
/* ...the module's push, pop, and top functions are omitted in this
   extract (top is almost the same as item(-1), but raises different
   errors: it fetches the top string with a pop, then undoes the pop,
   returning NULL or a string object); no-argument functions verify
   their empty '( )' argument lists with PyArg_ParseTuple(args, "")
   (or the older PyArg_NoArgs)... */

static PyObject *
stack_member(PyObject *self, PyObject *args)      /* Boolean: a Python int */
{
    int i;
    char *pstr;
    if (!PyArg_ParseTuple(args, "s", &pstr))
        return NULL;
    for (i = 0; i < top; i++)                     /* find arg in stack */
        if (strcmp(pstr, stack[i]) == 0)
            return PyInt_FromLong(1);             /* send back a Python int */
    return PyInt_FromLong(0);                     /* same as Py_BuildValue("i") */
}

static PyObject *
stack_item(PyObject *self, PyObject *args)        /* return Python string or NULL */
{                                                 /* inputs = (index): Python int */
    int index;
    if (!PyArg_ParseTuple(args, "i", &index))     /* convert args to C */
        return NULL;                              /* bad type or arg count? */
    if (index < 0)
        index = top + index;                      /* negative: offset from end */
    if (index < 0 || index >= top)
        onError("index out-of-bounds")            /* return NULL = 'raise' */
    else
        return Py_BuildValue("s", stack[index]);  /* convert result to Python */
}                                                 /* no need to INCREF new obj */

static PyObject *
stack_len(PyObject *self, PyObject *args)
{
    if (!PyArg_ParseTuple(args, ""))
        return NULL;
    return PyInt_FromLong(top);
}

static PyObject *
stack_dump(PyObject *self, PyObject *args)
{
    int i;
    if (!PyArg_ParseTuple(args, ""))
        return NULL;
    printf("[Stack:\n");
    for (i = top-1; i >= 0; i--)
        printf("%d: '%s'\n", i, stack[i]);
    printf("]\n");
    Py_INCREF(Py_None);
    return Py_None;
}
};

/******************************************************************************
 * INITIALIZATION FUNCTION (IMPORT-TIME)
 ******************************************************************************/

void
initstackmod( )                              /* registration hook */
{
    PyObject *m, *d;

    /* create the module and add the functions */
    m = Py_InitModule("stackmod", stack_methods);

    /* add symbolic constants to the module */
    d = PyModule_GetDict(m);
    ErrorObject = Py_BuildValue("s", "stackmod.error");   /* export exception */
    PyDict_SetItemString(d, "error", ErrorObject);        /* add more if needed */

    /* check for errors */
    if (PyErr_Occurred( ))
        Py_FatalError("can't initialize module stackmod");
}
This C extension file is compiled and statically or dynamically linked with the interpreter, just like in previous examples. The file makefile.stack in this book's examples distribution handles the build with a rule like this:
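The makefile rule itself is not reproduced in this extract, but judging from the makefile.cenviron build shown earlier, its core presumably resembles the following; this reconstruction is an assumption modeled on that file, not the distribution's exact text:

```makefile
# hypothetical build rule, patterned after makefile.cenviron
PYLIB = /usr/bin
PYINC = /usr/include/python2.4

stackmod.dll: stackmod.c
	gcc stackmod.c -g -I$(PYINC) -shared \
	    -L$(PYLIB) -lpython2.4 -o $@
```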
The whole point of implementing such a stack in a C extension module (apart from demonstrating API calls in a Python book) is optimization: in theory, this code should present a similar interface to the Python stack module we wrote earlier, but it should run considerably faster due to its C coding. The interface is roughly the same, though we've sacrificed some Python flexibility by moving to C: there are limits on size and stackable object types:
.../PP3E/Integrate/Extend/Stacks$ python
>>> import stackmod                 # load C module
>>> stackmod.push('new')            # call C functions
>>> stackmod.dump( )                # dump format differs
[Stack:
0: 'new'
]
>>> for c in "SPAM": stackmod.push(c)
...
>>> stackmod.dump( )
[Stack:
4: 'M'
3: 'A'
2: 'P'
1: 'S'
0: 'new'
]
>>> stackmod.len(), stackmod.top( )
(5, 'M')
>>> x = stackmod.pop( )
>>> x
'M'
>>> stackmod.dump( )
[Stack:
3: 'A'
2: 'P'
1: 'S'
0: 'new'
]
>>> stackmod.push(99)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: argument 1 must be string, not int
Some of the C stack's type and size limitations could be removed by alternate C coding (which might eventually create something that looks and performs almost exactly like a Python built-in list). Before we check on this stack's speed, though, we'll see what can be done about also optimizing our stack classes with a C type.
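For reference, the interface the C module presents can be sketched in pure Python in just a few lines. This is an illustrative stand-in, not the Chapter 20 stack module itself, with the C version's string-only and fixed-size limits made explicit (the 2048 limit mirrors the MAXSTACK constant used in the C type later in this chapter, an assumption for the module version):

```python
class StringStack:
    """mimics the C stackmod interface, limits included"""
    def __init__(self, maxstack=2048):
        self._stack = []
        self._maxstack = maxstack
    def push(self, s):
        if not isinstance(s, str):                 # strings only, like the C module
            raise TypeError('argument 1 must be string')
        if len(self._stack) >= self._maxstack:     # fixed capacity, like the C module
            raise OverflowError('stack overflow')
        self._stack.append(s)
    def pop(self):
        if not self._stack:
            raise IndexError('stack underflow')
        return self._stack.pop()
    def top(self):
        return self._stack[-1]                     # like pop, but no removal
    def len(self):
        return len(self._stack)

s = StringStack()
for c in 'SPAM':
    s.push(c)
assert (s.len(), s.top()) == (4, 'M')
assert s.pop() == 'M'
```

Timing a version like this against the C module is one way to measure what the move to C actually buys.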
22.8.1. But Don't Do That Either: SWIG

You can manually code extension modules like this, but you don't necessarily have to. As we saw earlier, if you instead code the stack module's functions without any notion of Python integration, they can be integrated into Python automatically by running their type signatures through SWIG. I haven't coded these functions that way here, because I also need to teach the underlying Python C extension API. But if I were asked to write a C string stack for Python in any other context, I'd probably do it with SWIG instead.
22.9. A C Extension Type String Stack

So far in this chapter, we've been dealing with C extension modules, which are flat function libraries. To implement multiple-instance objects in C, you need to code a C extension type, not a module. Like Python classes, C types generate multiple-instance objects and can overload (i.e., intercept and implement) Python expression operators and type operations. In recent Python releases, C types can also support subclassing, just like Python classes. One of the biggest drawbacks of types, though, is their size: to implement a realistically equipped C type, you need to code lots of not-very-pretty C code and fill out type descriptor tables with pointers to link up operation handlers. In fact, C extension types are so complex that I'm going to cut some details here. To give you a feel for the overall structure, Example 22-16 presents a C string stack type implementation, but with the bodies of all its functions stripped out. For the complete implementation, see this file in the book's examples distribution.

This C type roughly implements the same interface as the stack classes we met in Chapter 20, but it imposes a few limits on the stack itself. The stripped parts use the same algorithms as the C module in Example 22-15, but they operate on the passed-in self object, which now refers to the particular type instance object being processed, just as the first argument does in class methods. In types, self is a pointer to an allocated C struct that represents a type instance object.

Please note that the C API is prone to frequent changes, especially for C extension types. Although the code of this book's stack type example has been updated and retested for each edition, it may in fact not completely reflect current practice by the time you read these words. Even as is, although it works as shown, this example does not support newer, advanced C type concepts such as subclassing.
Because this is such a volatile topic, the example was almost cut from this edition completely, but was retained in abbreviated form just to give you a sampling of the general flavor of C types. To code types of your own, you will want to explore additional resources. For more up-to-date details on C types, consult Python's now thorough Extending and Embedding manual. And for more complete examples, see the Objects directory in the Python source distribution treeall of Python's own datatypes are merely precoded C extension types that utilize the same interfaces and demonstrate best practice usage better than the static nature of books allows. Of special interest, see Python 2.4's Objects/xxmodule.c for example C type code. Type descriptor layouts, described shortly, are perhaps the most prone to change over time; consult the file Include/object.h in the Python distribution for an up-to-date list of fields. Some new Python releases may also require that C types written to work with earlier releases be recompiled to pick up descriptor changes. Finally, if it seems like C types are complex, transitory, and error prone, it's because they are. Because many developers will find higher-level tools such as SWIG to be more reasonable alternatives to handcoded C types anyhow, this section is not designed to be complete.
Having said all that, the C extension type in Example 22-16 does work, and it demonstrates the basics of the model. Let's take a quick look.
Example 22-16. PP3E\Integrate\Extend\Stacks\stacktyp.c
/****************************************************
 * stacktyp.c: a character-string stack datatype;
 * a C extension type, for use in Python programs;
 * stacktype module clients can make multiple stacks;
 * similar to stackmod, but 'self' is the instance,
 * and we can overload sequence operators here;
 ****************************************************/

#include "Python.h"

static PyObject *ErrorObject;      /* local exception */
#define onError(message) \
       { PyErr_SetString(ErrorObject, message); return NULL; }

/*****************************************************************************
 * STACK-TYPE INFORMATION
 *****************************************************************************/

#define MAXCHARS 2048
#define MAXSTACK MAXCHARS

typedef struct {                   /* stack instance object format */
    PyObject_HEAD                  /* Python header: ref-count + &typeobject */
    int top, len;                  /* per-instance state info */
    char *stack[MAXSTACK];         /* same as stackmod, but multiple copies */
    char strings[MAXCHARS];
} stackobject;
/* instance-management functions, bodies stripped in this listing */

static stackobject *
newstackobject( )                  /* on "x = stacktype.Stack( )" */
{                                  /* instance constructor function */
    ...                            /* these don't get an 'args' input */
}

static void
stack_dealloc(self)                /* instance destructor function */
    stackobject *self;             /* when reference-count reaches zero */
{
    ...                            /* do cleanup activity */
}

static int
stack_print(self, fp, flags)       /* print self to file */
    stackobject *self;
    FILE *fp;
    int flags;
{
    ...
}

static PyObject *
stack_getattr(self, name)          /* on "instance.attr" reference */
    stackobject *self;
    char *name;
{
    ...                            /* make a bound-method or member */
}

static int
stack_compare(v, w)                /* on all comparisons */
    stackobject *v, *w;
{
    ...
}
/*****************************************************************************
 * SEQUENCE TYPE-OPERATIONS
 *****************************************************************************/

static int
stack_length(self)
    stackobject *self;
{
    ...
}
/* The ob_type field must be initialized in the module init function
   to be portable to Windows without using C++. */

static PyTypeObject Stacktype = {       /* main Python type-descriptor */
    /* type header */                   /* shared by all instances */
    PyObject_HEAD_INIT(NULL)            /* was PyObject_HEAD_INIT(&PyType_Type) */
    0,                                  /* ob_size */
    "stack",                            /* tp_name */
    sizeof(stackobject),                /* tp_basicsize */
    0,                                  /* tp_itemsize */

    /* standard methods */
    (destructor)  stack_dealloc,
    (printfunc)   stack_print,
    ...

    /* more methods */
    (hashfunc)    0,                    /* tp_hash     "dict[x]" */
    (ternaryfunc) 0,                    /* tp_call     "x( )"    */
    (reprfunc)    0,                    /* tp_str      "str(x)"  */
};

/* plus others: see Python's Include/object.h, Modules/xxmodule.c */
/*****************************************************************************
 * MODULE LOGIC
 *****************************************************************************/

static PyObject *
stacktype_new(self, args)                  /* on "x = stacktype.Stack( )" */
    PyObject *self;                        /* self not used */
    PyObject *args;                        /* constructor args */
{
    if (!PyArg_ParseTuple(args, ""))       /* module-method function */
        return NULL;
    return (PyObject *)newstackobject( );  /* make a new type-instance object */
}

/* the hook from module to type... */
static struct PyMethodDef stacktype_methods[] = {
    {"Stack", stacktype_new, 1},           /* one function: make a stack */
    {NULL, NULL}                           /* end marker, for initmodule */
};

void
initstacktype( )                           /* on first "import stacktype" */
{
    PyObject *m, *d;

    /* finalize type object, setting type of new type object here for
       portability to Windows without requiring C++ */
    if (PyType_Ready(&Stacktype) < 0)
        return;

    m = Py_InitModule("stacktype", stacktype_methods);    /* make the module, */
    d = PyModule_GetDict(m);                              /* with 'Stack' func */
    ErrorObject = Py_BuildValue("s", "stacktype.error");
    PyDict_SetItemString(d, "error", ErrorObject);        /* export exception */
    if (PyErr_Occurred( ))
        Py_FatalError("can't initialize module stacktype");
}
22.9.1. Anatomy of a C Extension Type

Although most of the file stacktyp.c is missing, there is enough here to illustrate the global structure common to C type implementations:
Instance struct
    The file starts off by defining a C struct called stackobject that will be used to hold per-instance state information: each generated instance object gets a newly malloc'd copy of the struct. It serves the same function as class instance attribute dictionaries, and it contains data that was saved in global variables by the C stack module of the preceding section (Example 22-15).
Instance methods
    As in the module, a set of instance methods follows next; they implement method calls such as push and pop. But here, method functions process the implied instance object, passed in to the self argument. This is similar in spirit to class methods. Type instance methods are looked up in the registration table of the code listing (Example 22-16) when accessed.
Basic type operations
Next, the file defines functions to handle basic operations common to all types: creation, printing, qualification, and so on. These functions have more specific type signatures than instance method handlers. The object creation handler allocates a new stack struct and initializes its header fields; the reference count is set to 1, and its type object pointer is set to the Stacktype type descriptor that appears later in the file.
Sequence operations
Functions for handling sequence-type operations come next. Stacks respond to most sequence operators: len, +, *, and [i]. Much like the __getitem__ class method, the stack_item indexing handler performs indexing, but it is also used for membership tests and iteration loops; these latter two work by indexing an object until an IndexError exception is caught by Python.
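The indexing-drives-iteration protocol that the stack_item handler relies on can be modeled at the Python level. This is a sketch in modern Python 3 form; the Indexed class is illustrative, not from the book's examples:

```python
# When an object defines only __getitem__, Python indexes it from 0
# upward until IndexError is raised -- the same protocol the C
# stack_item handler relies on for 'in' tests and for loops.
class Indexed:
    def __init__(self, items):
        self.items = items
    def __getitem__(self, i):          # like the C stack_item handler
        return self.items[i]           # raises IndexError past the end

x = Indexed(['S', 'P', 'A', 'M'])
print([c for c in x])                  # iteration via repeated indexing
print('A' in x)                        # membership via repeated indexing
```

Both the list comprehension and the membership test here call __getitem__ repeatedly behind the scenes, just as Python calls the C handler for the stack type.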
Type descriptors
The type descriptor tables (really, structs) that appear near the end of the file are the crux of the matter for types: Python uses these tables to dispatch an operation performed on an instance object to the corresponding C handler function in this file. In fact, everything is routed through these tables; even method attribute lookups start by running a C stack_getattr function listed in the table (which in turn looks up the attribute name in a name/function-pointer table). The main Stacktype table includes a link to the supplemental stack_as_sequence table in which sequence operation handlers are registered; types can provide such tables to register handlers for mapping, number, and sequence operation sets. See Python's integer and dictionary objects' source code for number and mapping examples; they are analogous to the sequence type here, but their operation tables vary. Descriptor layouts, like most C API tools, are prone to change over time, and you should always consult Include/object.h in the Python distribution for an up-to-date list of fields.
Constructor module
Besides defining a C type, this file also creates a simple C module at the end that exports a stacktype.Stack constructor function, which Python scripts call to generate new stack instance objects. The initialization function for this module is the only C name in this file that is not static (local to the file); everything else is reached by following pointers: from instance, to type descriptor, to C handler function. Again, see this book's examples distribution for the full C stack type implementation. But to give you the general flavor of C type methods, here is what the C type's pop function looks like; compare this with the C module's pop function to see how the self argument is used to access per-instance information in types:
22.9.2. Compiling and Running This C extension file is compiled and dynamically or statically linked like previous examples; the file makefile.stack in the book's examples distribution handles the build like this:
Once compiled, you can import the C module and make and use instances of the C type that it defines much as if it were a Python class. You would normally do this from a Python script, but the interactive prompt is a convenient place to test the basics:
.../PP3E/Integrate/Extend/Stacks$ python
>>> import stacktype                 # import C constructor module
>>> x = stacktype.Stack( )           # make C type instance object
>>> x.push('new')                    # call C type methods
>>> x                                # call C type print handler
[Stack:
0: 'new'
]
>>> x[0]                             # call C type index handler
'new'
>>> y = stacktype.Stack( )
>>> for c in 'SPAM': y.push(c)
...
>>> y
[Stack:
3: 'M'
2: 'A'
1: 'P'
0: 'S'
]
>>> z = x + y
>>> z
[Stack:
4: 'M'
3: 'A'
2: 'P'
1: 'S'
0: 'new'
]
22.9.3. Timing the C Implementations So how did we do on the optimization front this time? Let's resurrect that timer module we wrote back in Example 20-6 to compare the C stack module and type of this chapter to the Python stack module and classes we coded in Chapter 20. Example 22-17 calculates the system time in seconds that it takes to run tests on all of this book's stack implementations.
Example 22-17. PP3E\Integrate\Extend\Stacks\exttime.py
#!/usr/local/bin/python
# time the C stack module and type extensions
# versus the object chapter's Python stack implementations

from PP3E.Dstruct.Basic.timer import test    # second count function
from PP3E.Dstruct.Basic import stack1        # Python stack module
from PP3E.Dstruct.Basic import stack2        # Python stack class: +/slice
from PP3E.Dstruct.Basic import stack3        # Python stack class: tuples
from PP3E.Dstruct.Basic import stack4        # Python stack class: append/pop
import stackmod, stacktype                   # C extension module, type

from sys import argv
rept, pushes, pops, items = 200, 200, 200, 200    # default: 200 * (600 ops)
try:
    [rept, pushes, pops, items] = map(int, argv[1:])
except:
    pass
print 'reps=%d * [push=%d+pop=%d+fetch=%d]' % (rept, pushes, pops, items)

def moduleops(mod):
    for i in range(pushes): mod.push('hello')     # strings only for C
    for i in range(items):  t = mod.item(i)
    for i in range(pops):   mod.pop( )

def objectops(Maker):
    x = Maker( )                                  # type has no init args
    for i in range(pushes): x.push('hello')       # type or class instance
    for i in range(items):  t = x[i]              # strings only for C
    for i in range(pops):   x.pop( )
Running this script under Cygwin on Windows produces the following results (as usual, these are prone to change over time; these tests were run under Python 2.4 on a 1.2 GHz machine). As we saw before, the Python tuple stack is slightly better than the Python in-place append stack in typical use (when the stack is only pushed and popped), but it is slower when indexed. The first test here runs 200 repetitions of 200 stack pushes and pops, or 80,000 stack operations (200 x 400); times listed are test duration seconds:
Python simple Stack: 0.381
Python tuple  Stack: 0.11
Python append Stack: 0.13
C ext type    Stack: 0.07
.../PP3E/Integrate/Extend/Stacks$ python exttime.py 100 300 300 0
reps=100 * [push=300+pop=300+fetch=0]
Python module:       0.33
C ext module:        0.06
Python simple Stack: 0.321
Python tuple  Stack: 0.08
Python append Stack: 0.09
C ext type    Stack: 0.06
At least when there are no indexing operations on the stack, as in these two tests (just pushes and pops), the C type is only slightly faster than the best Python stack (tuples). In fact, the difference seems trivial; it's not exactly the kind of performance issue that would generate a bug report. The C module comes in at roughly five times faster than the Python module, but these results are flawed: the stack1 Python module tested here uses the same slow stack implementation as the Python "simple" stack (stack2). If it were recoded to use the tuple stack representation of Chapter 20, its speed would be similar to the "tuple" figures listed here, and almost identical to the speed of the C module in the first two tests:
.../PP3E/Integrate/Extend/Stacks$ python exttime.py 200 200 200 50
reps=200 * [push=200+pop=200+fetch=50]
Python module:       0.36
C ext module:        0.08
Python simple Stack: 0.401
Python tuple  Stack: 0.24
Python append Stack: 0.15
C ext type    Stack: 0.08
.../PP3E/Integrate/Extend/Stacks$ python exttime.py
reps=200 * [push=200+pop=200+fetch=200]
Python module:       0.44
C ext module:        0.12
Python simple Stack: 0.431
Python tuple  Stack: 1.983
Python append Stack: 0.19
C ext type    Stack: 0.1
But under the different usage patterns simulated in these two tests, the C type wins the race. It is about twice as fast as the best Python stack (append) when indexing is added to the test mix, as illustrated by the two preceding test runs, which were run with a nonzero fetch count. Similarly, the C module would be twice as fast as the best Python module coding in this case as well. In other words, the fastest Python stacks are essentially as good as the C stacks if you stick to pushes and pops, but the C stacks are roughly twice as fast if any indexing is performed. Moreover, since you have to pick one representation, if indexing is possible at all you would likely pick the Python append stack; assuming it represents the best case, the C stacks would always be twice as fast. Of course, the measured time differences are so small that in many applications you won't care. Even at one million iterations, the best Python stack is still less than half a second slower than the C stack type:
.../PP3E/Integrate/Extend/Stacks$ python exttime.py 2000 250 250 0
reps=2000 * [push=250+pop=250+fetch=0]
Python module:       4.686
C ext module:        0.952
Python simple Stack: 4.987
Python tuple  Stack: 1.352
Python append Stack: 1.572
C ext type    Stack: 0.941
Further, in many ways, this is not quite an apples-to-apples comparison. The C stacks are much more difficult to program, and they achieve their speed by imposing substantial functional limits (as coded, the C module and type overflow at 342 pushes: 342 * 6 > 2048). But as a rule of thumb, C extensions can not only integrate existing components for use in Python scripts, they can also optimize time-critical components of pure Python programs. In other scenarios, migration to C might yield an even larger speedup. On the other hand, C extensions should generally be used only as a last resort. As we learned earlier, algorithms and data structures are often bigger influences on program performance than implementation language. The fact that Python-coded tuple stacks are very nearly as fast as the C stacks under common usage patterns speaks volumes about the importance of data structure representation. Installing the Psyco just-in-time compiler for Python code might erase the remaining difference completely, but we'll leave this as a suggested exercise.
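To make the representation lesson concrete, here is a minimal pure-Python timing sketch (in modern Python 3 form; the class and function names are mine, not the book's) contrasting the tuple-pair and list append/pop stack representations discussed above:

```python
import time

class TupleStack:                      # push/pop build ('item', rest) pairs
    def __init__(self):
        self.stack = None
    def push(self, item):
        self.stack = (item, self.stack)
    def pop(self):
        item, self.stack = self.stack  # unpack the top pair
        return item

class AppendStack:                     # in-place list append/pop
    def __init__(self):
        self.stack = []
    def push(self, item):
        self.stack.append(item)
    def pop(self):
        return self.stack.pop()

def timetest(Maker, reps=200, ops=200):
    start = time.time()                # wall-clock; the book's timer differed
    for r in range(reps):
        s = Maker()
        for i in range(ops):
            s.push('hello')
        for i in range(ops):
            s.pop()
    return time.time() - start

print('tuple : %.3f' % timetest(TupleStack))
print('append: %.3f' % timetest(AppendStack))
```

Absolute figures depend entirely on the machine and Python version, but runs like this reproduce the book's moral: for push/pop-only workloads, the choice of data representation dominates the timings.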
22.9.4. Older Timing Results Interestingly, Python grew much faster between this book's first and second editions, relative to C. In the first edition, the C type was still almost three times faster than the best Python stack (tuples), even when no indexing was performed. Today, as in the second edition, it's almost a draw. One might infer from this that C migrations have become one-third as important as they once were. For comparison, here were the results of this script in the second edition of this book, run on a 650 MHz machine under Python 1.5.2 and Linux. The results were relatively similar, though typically six or more times slower, owing likely to both Python and machine speedups:
.../PP3E/Integrate/Extend/Stacks$ python exttime.py
reps=200 * [push=200+pop=200+fetch=200]
Python module:       2.42
C ext module:        1.1
Python simple Stack: 2.54
Python tuple  Stack: 19.09
Python append Stack: 1.54
C ext type    Stack: 0.63
22.9.5. But Don't Do That Either: SWIG You can code C types manually like this, and in some applications, this approach may make sense. But you don't necessarily have to, because SWIG knows how to generate glue code for C++ classes: you can instead automatically generate all the C extension and wrapper class code required to integrate such a stack object, simply by running SWIG over an appropriate class declaration. The wrapped C++ class provides a multiple-instance datatype much like the C extension type presented in this section, but SWIG handles the language integration details. The next section shows how.
22.10. Wrapping C++ Classes with SWIG One of the more clever tricks SWIG can perform is class wrapper generation. Given a C++ class declaration and special command-line settings, SWIG generates the following:

A C++-coded Python extension module with accessor functions that interface with the C++ class's methods and members

A Python-coded module with a wrapper class (called a "shadow" or "proxy" class in SWIG-speak) that interfaces with the C++ class accessor function module

As before, to use SWIG in this domain, write and debug your class as though it would be used only from C++. Then, simply run SWIG in your makefile to scan the C++ class declaration, and compile and link its output. The end result is that by importing the shadow class in your Python scripts, you can utilize C++ classes as though they were really coded in Python. Not only can Python programs make and use instances of the C++ class, they can also customize it by subclassing the generated shadow class.
22.10.1. A Simple C++ Extension Class To see how this works, we need a C++ class. To illustrate, let's code a simple one to be used in Python scripts.[*] The following C++ files define a Number class with three methods (add, sub, display), a data member (data), and a constructor and destructor. Example 22-18 shows the header file.

[*] For a more direct comparison, you could translate the stack type in Example 22-15 to a C++ class too, but that yields much more C++ code than I care to show in this Python book.
Example 22-18. PP3E\Integrate\Extend\Swig\Shadow\number.h
class Number {
public:
    Number(int start);          // constructor
    ~Number( );                 // destructor
    void add(int value);        // update data member
    void sub(int value);
    int  square( );             // return a value
    void display( );            // print data member
    int  data;
};
Example 22-19 is the C++ class's implementation file; each method prints a message when called to trace class operations.
Example 22-19. PP3E\Integrate\Extend\Swig\Shadow\number.cxx
///////////////////////////////////////////////////////////////
// implement a C++ class, to be used from Python code or not;
// caveat: cout and print usually both work, but I ran into
// an issue on Cygwin that prompted printf due to lack of time
///////////////////////////////////////////////////////////////

#include "number.h"
#include "stdio.h"              // versus #include "iostream.h"

Number::Number(int start) {     // python print goes to stdout, like cout
    data = start;
    printf("Number: %d\n", data);
}
22.10.3.2. Subclassing the C++ class in Python Using the extension module directly works, but there is no obvious advantage to moving from the shadow class to functions here. By using the shadow class, you get both an object-based interface to C++ and a customizable Python object. For instance, the Python module shown in Example 22-25 extends the C++ class, adding an extra print statement to the C++ add method and defining a brand-new mul method. Because the shadow class is pure Python, this works naturally.
Example 22-25. PP3E\Integrate\Extend\Swig\Shadow\main_subclass.py
# same test as main.cxx, but using a Python subclass of the shadow class;
# add( ) is specialized in Python

num.data = 99
print num.data
num.display( )
num.mul(2)                      # mul( ) is implemented in Python
num.display( )
print num                       # repr from shadow superclass
del num
Now we get extra messages out of add calls, and mul changes the C++ class's data member automatically when it assigns self.data; Python code extends C++ code:
.../PP3E/Integrate/Extend/Swig/Shadow$ python main_subclass.py Number: 1 in Python add... add 4 Number = 5 sub 2 Number = 3 Square = 9 99 Number = 99 in Python mul... Number = 198 ~Number: 198
In other words, SWIG makes it easy to use C++ class libraries as base classes in your Python scripts. Among other things, this allows us to leverage existing C++ class libraries in Python scripts and optimize by coding parts of class hierarchies in C++ when needed.
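The shadow-class mechanism itself can be mimicked in pure Python. The following is an illustrative sketch only, in modern Python 3 form: the flat accessor functions stand in for SWIG's generated C++ glue module, and all names here are hypothetical, not SWIG output. A proxy class forwards calls to the accessors, and a user subclass extends it, much as Example 22-25 extends the generated Number shadow class:

```python
# Stand-ins for the flat accessor functions a generated glue module exports.
def new_Number(start):
    return {'data': start}          # per-instance state, like a C struct

def Number_add(obj, value):
    obj['data'] += value            # "C-level" method logic

class NumberShadow:                 # the generated proxy ("shadow") layer
    def __init__(self, start):
        self.this = new_Number(start)
    def add(self, value):
        Number_add(self.this, value)
    def __getattr__(self, name):    # delegate data-member fetches
        try:
            return self.this[name]
        except KeyError:
            raise AttributeError(name)

class MyNumber(NumberShadow):       # user subclass, as in Example 22-25
    def mul(self, value):           # brand-new method coded in Python
        self.this['data'] *= value

num = MyNumber(1)
num.add(4)                          # routed through the shadow layer
num.mul(2)                          # handled entirely in Python
print(num.data)
```

Because the proxy is ordinary Python, subclassing, attribute interception, and new methods all work normally; SWIG's real shadow classes exploit exactly this property.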
22.10.3.3. Exploring the wrappers interactively As usual, you can import the C++ class interactively to experiment with it some more:
.../PP3E/Integrate/Extend/Swig/Shadow$ python
>>> import _number                 # the C++ class plus generated glue module
>>> _number.__file__
'_number.dll'
>>> import number                  # the generated Python shadow class module
>>> number.__file__
'number.pyc'
>>> x = number.Number(2)           # make a C++ class instance in Python
Number: 2
>>> y = number.Number(4)           # make another C++ object
Number: 4
>>> x, y
(, )
>>> x.display( )                   # call C++ method (like C++ x->display( ))
Number = 2
>>> x.add(y.data)                  # fetch C++ data member, call C++ method
add 4
>>> x.display( )
Number = 6
Naturally, this example uses a small C++ class to underscore the basics, but even at this level, the seamlessness of the Python-to-C++ integration we get from SWIG is astonishing. Python code uses C++ members and methods as though they are Python code. Moreover, this integration transparency still applies once we step up to more realistic C++ class libraries. So what's the catch? Nothing much, really, but if you start using SWIG in earnest, the biggest downside may be that SWIG cannot handle every feature of C++ today. If your classes use some esoteric C++ tools (and there are many), you may need to handcode simplified class type declarations for SWIG instead of running SWIG over the original class header files. SWIG development is ongoing, so you should consult the SWIG manuals and web site for more details on these and other topics. In return for any such trade-offs, though, SWIG can completely obviate the need to code glue layers to access C and C++ libraries from Python scripts. If you have ever coded such layers by hand in the past, you already know that this is a very big win.
If you do go the handcoded route, though, consult Python's standard extension manuals for more details on both API calls used in this and the next chapter, as well as additional extension tools we don't have space to cover in this text. C extensions can run the gamut from short SWIG input files to code that is staunchly wedded to the internals of the Python interpreter; as a rule of thumb, the former survives the ravages of time much better than the latter.
22.11. Other Extending Tools In closing the extending topic, I should mention that there are alternatives to SWIG, some of which have a loyal user base of their own. This section briefly introduces some of the more popular tools in this domain; as usual, search the Web for more details on these and more. All of the following are currently third-party tools that must be installed separately like SWIG, though Python 2.5 is scheduled to incorporate the ctypes extension as a standard library module by the time you read this.
SIP
Just as a sip is a smaller swig in the drinking world, so too is the SIP system a lighter alternative to SWIG in the Python world (in fact, it was named on purpose for the joke). According to its web page, SIP makes it easy to create Python bindings for C and C++ libraries. Originally developed to create the PyQt Python bindings for the Qt toolkit, it can be used to create bindings for any C or C++ library. SIP includes a code generator and a Python support module. Much like SWIG, the code generator processes a set of specification files and generates C or C++ code, which is compiled to create the bindings extension module. The SIP Python module provides support functions to the automatically generated code. Unlike SWIG, SIP is specifically designed for bringing together Python and C/C++; SWIG also generates wrappers for many other scripting languages.
ctypes
The ctypes system is a foreign function interface (FFI) module for Python. It allows Python scripts to access and call compiled functions in a binary library file directly and dynamically, by writing dispatch code in Python instead of generating or writing the integration C wrapper code we've studied in this chapter. According to its web site, ctypes allows Python to call functions exposed from DLLs and shared libraries and has facilities to create, access, and manipulate complex C datatypes in Python. The net effect is to wrap libraries in pure Python. It is also possible to implement C callback functions in pure Python; ctypes now includes an experimental code generator feature that allows automatic creation of library wrappers from C header files. ctypes works on Windows, Mac OS X, Linux, Solaris, FreeBSD, and OpenBSD. It may run on additional systems, provided that the libffi package it employs is supported. For Windows, ctypes contains a ctypes.com package, which allows Python code to call and implement custom COM interfaces.
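For a taste of the ctypes model, here is a minimal sketch that calls a C library function with no wrapper code generated or compiled. It assumes a Unix-like system, where loading the "None" library exposes symbols already linked into the process, including the C library:

```python
import ctypes

# Load symbols already linked into this process (Unix-like systems);
# on Windows you would name a DLL explicitly instead.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes converts arguments and results.
libc.strlen.restype = ctypes.c_size_t
libc.strlen.argtypes = [ctypes.c_char_p]

print(libc.strlen(b'hello'))       # call C strlen directly from Python
```

No glue layer exists here at all: the dispatch logic that a SWIG-generated wrapper would express in C is instead written as a few lines of Python declarations.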
Boost.Python
The Boost.Python system is a C++ library that enables seamless interoperability between C++ and the Python programming language through an IDL-like model. Using it, developers generally write a small amount of C++ wrapper code to create a shared library for use in Python scripts. Boost.Python handles references, callbacks, type mappings, and cleanup tasks. Because it is designed to wrap C++ interfaces nonintrusively, C++ code need not be changed to be wrapped. Like other tools, this makes the system useful for wrapping existing libraries, as well as developing new extensions from scratch. Writing interface code for large libraries can be more tedious than the generation approaches of SWIG and SIP, but it's easier than manually wrapping libraries and may afford greater control than a fully automated wrapping tool. In addition, the Pyste system provides a Boost.Python code generator, in which users specify classes and functions to be exported using a simple interface file, which is Python code. Pyste uses GCCXML to parse all the headers and extract the necessary information to generate C++ code.
Pyrex
Pyrex is a language specifically for writing Python extension modules. It lets you write code that mixes Python and C datatypes any way you want, and compiles it into a C extension for Python. In principle, developers need not deal with the Python/C API at all, because Pyrex takes care of details such as error checking and reference counts automatically. Technically, Pyrex is a distinct language that is Python-like, with extensions for mixing in C datatype declarations. However, almost any Python code is also valid Pyrex code. The Pyrex compiler converts Python code into C code, which makes calls to the Python/C API. In this respect, Pyrex is similar to the older Python2C conversion project. By combining Python and C code, Pyrex offers a very different approach than the integration code generation or coding schemes of other systems.
CXX
The CXX system is roughly a C++ version of Python's usual C API, which handles reference counters, exception translation, and much of the type checking and cleanup inherent in handcoded C++ extensions. As such, CXX lets you focus on the application-specific parts of your code. CXX also exposes parts of the C++ Standard Template Library containers to be compatible with Python lists and tuples.
Modulator
Finally, the Modulator system is a simple Python-coded GUI that generates skeleton boilerplate code for C extension modules and types. Users select the components to be supported in the GUI, and Modulator generates the initial C code; you then edit this code to insert the type-specific parts of your extension functions. Modulator is available in the Tools directory of the Python source distribution. At the end of the next chapter, we will return to extending in the context of integration at large, and we'll compare Python C integration techniques to very different approaches such as COM, CORBA, and Jython. First, though, we need to shift our perspective 180 degrees to explore the other mode of Python/C integration discussed in the next chapter: embedding.
Mixing Python and C++ Python's standard implementation is currently coded in C, so all the normal rules about mixing C programs with C++ programs apply to the Python interpreter. In fact, there is nothing special about Python in this context, but here are a few pointers. When embedding Python in a C++ program, there are no special rules to follow. Simply link in the Python library and call its functions from C++. Python's header files automatically wrap themselves in extern "C" {...} declarations to suppress C++ name mangling. Hence, the Python library looks like any other C component to C++; there is no need to recompile Python itself with a C++ compiler. When extending Python with C++ components, Python header files are still C++ friendly, so Python API calls in C++ extensions work like any other C++-to-C call. But be sure to wrap the parts of your extension code made visible to Python with extern "C" declarations so that they may be called by Python's C code. For example, to wrap a C++ class, SWIG generates a C++ extension module that declares its initialization function this way, though the rest of the module is pure C++. The only other potential complication involves C++ static or global object constructor methods when extending. If Python (a C program) is at the top level of a system, such C++ constructors may not be run when the system starts up. This behavior may vary per compiler, but if your C++ objects are not initialized on startup, make sure that your main program is linked by your C++ compiler, not by C. If you are interested in Python/C++ integration in general, be sure to consult the C++ Special Interest Group (SIG) pages at http://www.python.org for information about work in this domain. The CXX system, for instance, makes it easier to extend Python with C++.
Chapter 23. Embedding Python Section 23.1. "Add Python. Mix Well. Repeat." Section 23.2. C Embedding API Overview Section 23.3. Basic Embedding Techniques Section 23.4. Registering Callback Handler Objects Section 23.5. Using Python Classes in C Section 23.6. A High-Level Embedding API: ppembed Section 23.7. Other Integration Topics
23.1. "Add Python. Mix Well. Repeat." In the prior chapter, we explored half of the Python/C integration picture: calling C services from Python. This mode lets programmers speed up operations by moving them to C, and to utilize external libraries by wrapping them in C extension modules and types. But the inverse can be just as useful: calling Python from C. By delegating selected components of an application to embedded Python code, we can open them up to onsite changes without having to ship a system's code. This chapter tells this other half of the Python/C integration tale. It introduces the Python C interfaces that make it possible for programs written in C-compatible languages to run Python program code. In this mode, Python acts as an embedded control language (what some call a "macro" language). Although embedding is mostly presented in isolation here, keep in mind that Python's integration support is best viewed as a whole. A system's structure usually determines an appropriate integration approach: C extensions, embedded code calls, or both. To wrap up, this chapter concludes by discussing a handful of larger integration platforms, such as Component Object Model (COM) and Jython, which present broader component integration possibilities.
23.2. C Embedding API Overview The first thing you should know about Python's embedded-call API is that it is less structured than the extension interfaces. Embedding Python in C may require a bit more creativity on your part than extending: you must pick tools from a general collection of calls to implement the Python integration instead of coding to a boilerplate structure. The upside of this loose structure is that programs can combine embedding calls and strategies to build up arbitrary integration architectures. The lack of a more rigid model for embedding is largely the result of a less clear-cut goal. When extending Python, there is a distinct separation for Python and C responsibilities and a clear structure for the integration. C modules and types are required to fit the Python module/type model by conforming to standard extension structures. This makes the integration seamless for Python clients: C extensions look like Python objects and handle most of the work. But when Python is embedded, the structure isn't as obvious; because C is the enclosing level, there is no clear way to know what model the embedded Python code should fit. C may want to run objects fetched from modules, strings fetched from files or parsed out of documents, and so on. Instead of deciding what C can and cannot do, Python provides a collection of general embedding interface tools, which you use and structure according to your embedding goals. Most of these tools correspond to tools available to Python programs. Table 23-1 lists some of the more common API calls used for embedding, as well as their Python equivalents. In general, if you can figure out how to accomplish your embedding goals in pure Python code, you can probably find C API tools that achieve the same results.
Table 23-1. Common API functions

C API call                  Python equivalent
----------                  -----------------
PyImport_ImportModule       import module, __import__
PyImport_ReloadModule       reload(module)
PyImport_GetModuleDict      sys.modules
PyModule_GetDict            module.__dict__
PyDict_GetItemString        dict[key]
PyDict_SetItemString        dict[key] = val
PyDict_New                  dict = {}
PyObject_GetAttrString      getattr(obj, attr)
PyObject_SetAttrString      setattr(obj, attr, val)
PyEval_CallObject           funcobj(*argstuple), apply
PyRun_String                eval(exprstr), exec stmtstr
PyRun_File                  execfile(filename)
Because embedding relies on API call selection, becoming familiar with the Python C API is fundamental to the embedding task. This chapter presents a handful of representative embedding examples and discusses common API calls, but it does not provide a comprehensive list of all tools in the API. Once you've mastered the examples here, you'll probably need to consult Python's integration manuals for more details on available calls in this domain. As mentioned in the preceding chapter, Python offers two standard manuals for C/C++ integration programmers: Extending and Embedding, an integration tutorial; and Python/C API, the Python runtime library reference. You can find the most recent releases of these manuals at http://www.python.org. Beyond this chapter, these manuals are likely to be your best resource for up-to-date and complete Python API tool information.
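The right-hand column of Table 23-1 can be exercised directly in Python, which is a handy way to prototype an embedding design before writing the C calls. The sketch below is in modern Python 3 form, where reload lives in importlib and exec is a built-in function rather than a statement:

```python
import importlib, sys

mod = importlib.import_module('math')     # PyImport_ImportModule
mod = importlib.reload(mod)               # PyImport_ReloadModule
assert 'math' in sys.modules              # PyImport_GetModuleDict
d = mod.__dict__                          # PyModule_GetDict

func = getattr(mod, 'sqrt')               # PyObject_GetAttrString
print(func(16.0))                         # PyEval_CallObject

ns = {}                                   # PyDict_New
exec('x = 99', ns)                        # PyRun_String (statement form)
print(eval('x + 1', ns))                  # PyRun_String (expression form)
```

Each line here has a direct C API counterpart; once the Python version of an embedding strategy works, translating it to the corresponding C calls is largely mechanical.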
23.2.1. What Is Embedded Code? Before we jump into details, let's get a handle on some of the core ideas in the embedding domain. When this book speaks of "embedded" Python code, it simply means any Python program structure that can be executed from C with a direct in-process function call interface. Generally speaking, embedded Python code can take a variety of forms:
Code strings
C programs can represent Python programs as character strings and run them as either expressions or statements (as with eval and exec).
Callable objects
C programs can load or reference Python callable objects such as functions, methods, and classes, and call them with argument-list objects (as with apply and the newer func(*pargs, **kargs)).
Code files
C programs can execute entire Python program files by importing modules and running script files through the API or general system calls (e.g., popen). The Python binary library is usually what is physically embedded in the C program; the actual Python code run from C can come from a wide variety of sources: Code strings might be loaded from files, obtained from an interactive user, fetched from persistent databases and shelves, parsed out of HTML or XML files, read over sockets, built or hardcoded in a C program, passed to C extension functions from Python registration code, and so on. Callable objects might be fetched from Python modules, returned from other Python API calls, passed to C extension functions from Python registration code, and so on. Code files simply exist as files, modules, and executable scripts. Registration is a technique commonly used in callback scenarios that we will explore in more detail later in this chapter. But especially for strings of code, there are as many possible sources as there
are for C character strings. For example, C programs can construct arbitrary Python code dynamically by building and running strings. Finally, once you have some Python code to run, you need a way to communicate with it: the Python code may need to use inputs passed in from the C layer and may want to generate outputs to communicate results back to C. In fact, embedding generally becomes interesting only when the embedded code has access to the enclosing C layer. Usually, the form of the embedded code suggests its communication media: Code strings that are Python expressions return an expression result as their output. Both inputs and outputs can take the form of global variables in the namespace in which a code string is run; C may set variables to serve as input, run Python code, and fetch variables as the code's result. Inputs and outputs can also be passed with exported C extension function callsPython code may use C module or type interfaces that we met in the preceding chapter to get or set variables in the enclosing C layer. Communications schemes are often combined; for instance, C may preassign global names to objects that export state and interface calls to the embedded Python code.[*] [*]
If you want a concrete example, flip back to the discussion of Active Scripting in Chapter 18. This system fetches Python code embedded in an HTML web page file, assigns global variables in a namespace to objects that give access to the web browser's environment, and runs the Python code in the namespace where the objects were assigned. I worked on a project where we did something similar, but Python code was embedded in XML documents, and objects that were preassigned to globals in the code's namespace represented widgets in a GUI.
Callable objects may accept inputs as function arguments and produce results as function return values. Passed-in mutable arguments (e.g., lists, dictionaries, class instances) can be used as both input and output for the embedded code; changes made in Python are retained in objects held by C. Objects can also make use of the global variable and C extension function interface techniques described for strings to communicate with C.

Code files can communicate with most of the same techniques as code strings; when run as separate programs, files can also employ Inter-Process Communication (IPC) techniques.

Naturally, all embedded code forms can also communicate with C using general system-level tools: files, sockets, pipes, and so on. These techniques are generally less direct and slower, though. Here, we are still interested in in-process function call integration.
23.3. Basic Embedding Techniques As you can probably tell from the preceding overview, there is much flexibility in the embedding domain. To illustrate common embedding techniques in action, this section presents a handful of short C programs that run Python code in one form or another. Most of these examples make use of the simple Python module file shown in Example 23-1.
Example 23-1. PP3E\Integrate\Embed\Basics\usermod.py
#########################################################
# C runs Python code in this module in embedded mode.
# Such a file can be changed without changing the C layer.
# There is just standard Python code (C does conversions).
# You can also run code in standard modules like string.
#########################################################

message = 'The meaning of life...'

def transform(input):
    input = input.replace('life', 'Python')
    return input.upper( )
If you know any Python at all, you probably know that this file defines a string and a function; the function returns whatever it is passed with string substitution and uppercase conversions applied. It's easy to use from Python:
.../PP3E/Integrate/Embed/Basics$ python
>>> import usermod                              # import a module
>>> usermod.message                             # fetch a string
'The meaning of life...'
>>> usermod.transform(usermod.message)          # call a function
'THE MEANING OF PYTHON...'
With a little Python API wizardry, it's not much more difficult to use this module the same way in C.
23.3.1. Running Simple Code Strings

Perhaps the simplest way to run Python code from C is by calling the PyRun_SimpleString API function. With it, C programs can execute Python programs represented as C character string arrays. This call is also very limited: all code runs in the same namespace (the module __main__), the code strings must be Python statements (not expressions), and there is no direct way to communicate inputs or outputs with the Python code it runs. Still, it's a simple place to start. Moreover, when augmented with an imported C extension module that the embedded Python code can use to communicate with the enclosing C layer, this technique can satisfy many embedding goals. To demonstrate the basics, the C program in Example 23-2 runs Python code to accomplish the same results as the interactive session listed in the prior section.
Example 23-2. PP3E\Integrate\Embed\Basics\embed-simple.c
/*******************************************************
 * simple code strings: C acts like the interactive
 * prompt, code runs in __main__, no output sent to C;
 *******************************************************/

#include <Python.h>                                      /* standard API def */

main( )
{
    printf("embed-simple\n");
    Py_Initialize( );
    PyRun_SimpleString("import usermod");                /* load .py file    */
    PyRun_SimpleString("print usermod.message");         /* on Python path   */
    PyRun_SimpleString("x = usermod.message");
    PyRun_SimpleString("print usermod.transform(x)");    /* compile and run  */
}
The first thing you should notice here is that when Python is embedded, C programs always call Py_Initialize to initialize the linked-in Python libraries before using any other API functions. The rest of this code is straightforward: C submits hardcoded strings to Python that are roughly what we typed interactively. Internally, PyRun_SimpleString invokes the Python compiler and interpreter to run the strings sent from C; as usual, the Python compiler is always available in systems that contain Python.
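In Python terms, each PyRun_SimpleString call behaves roughly like compiling and running one statement string in a single shared namespace. The sketch below models that semantics in Python itself; the run_simple_string helper and the main_ns dictionary (standing in for the __main__ module's namespace) are illustrative names, not part of the C API, and modern call syntax is used rather than the book's 2.x print statements:

```python
main_ns = {}   # stands in for the __main__ module's namespace

def run_simple_string(stmt):
    # rough Python model of PyRun_SimpleString: compile a statement
    # string, then run it in the one shared namespace; no result is
    # returned to the caller, mirroring the C call's limitation
    exec(compile(stmt, '<string>', 'exec'), main_ns)

run_simple_string("x = 'The meaning of life...'")
run_simple_string("y = x.upper()")          # second string sees x
print(main_ns['y'])
```

Note that the only way to get data out is to reach into the shared namespace afterward, which is exactly why the chapter moves on to API calls that return expression results directly.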
23.3.1.1. Compiling and running To build a standalone executable from this C source file, you need to link its compiled form with the Python library file. In this chapter, "library" usually means the binary library file that is generated when Python is compiled, not the Python source code library. Today, everything in Python that you need in C is compiled into a single Python library file when the interpreter is built (e.g., libpython2.4.dll on Cygwin). The program's main function comes from your C code, and depending on your platform and the extensions installed in your Python, you may also need to link any external libraries referenced by the Python library. Assuming no extra extension libraries are needed, Example 23-3 is a minimal makefile for building the C program in Example 23-2 under Cygwin on Windows. Again, makefile details vary per platform, but see Python manuals for hints. This makefile uses the Python include-files path to find Python.h in the compile step and adds the Python library file to the final link step to make API calls available to the C program.
Example 23-3. PP3E\Integrate\Embed\Basics\makefile.1
# a Cygwin makefile that builds a C executable that embeds
# Python, assuming no external module libs must be linked in;
# uses Python header files, links in the Python lib file;
# both may be in other dirs (e.g., /usr) in your install;

PYLIB = /usr/bin
PYINC = /usr/include/python2.4

embed-simple: embed-simple.o
	gcc embed-simple.o -L$(PYLIB) -lpython2.4 -g -o embed-simple

embed-simple.o: embed-simple.c
	gcc embed-simple.c -c -g -I$(PYINC)
Things may not be quite this simple in practice, though, at least not without some coaxing. The makefile in Example 23-4 is the one I actually used to build all of this section's C programs on Cygwin.
Example 23-4. PP3E\Integrate\Embed\Basics\makefile.basics
# cygwin makefile to build all 5
# basic embedding examples at once

PYLIB = /usr/bin
PYINC = /usr/include/python2.4

BASICS = embed-simple.exe   embed-string.exe   embed-object.exe \
         embed-dict.exe     embed-bytecode.exe

all: $(BASICS)
On some platforms, you may need to also link in other libraries because the Python library file used may have been built with external dependencies enabled and required. In fact, you may have to link in arbitrarily many more externals for your Python library, and frankly, chasing down all the linker dependencies can be tedious. Required libraries may vary per platform and Python install, so there isn't a lot of advice I can offer to make this process simple (this is C, after all). The standard C development techniques will apply.

One thing to note is that on some platforms, if you're going to do much embedding work and you run into external dependency issues, you might want to build Python on your machine from its source with all unnecessary extensions disabled in the Modules/Setup file (or the top-level setup.py Distutils script in more recent releases). This produces a Python library with minimal external requirements, which links much more easily. For example, if your embedded code won't be building GUIs, Tkinter can simply be removed from the library; see the README file at the top of Python's source distribution for details. You can also find a list of external libraries referenced from your Python in the generated makefiles located in the Python source tree.

In any event, the good news is that you need to resolve linker dependencies only once. Once you've gotten the makefile to work, run it to build the C program with Python libraries linked in:
After building, run the resulting C program as usual, however this works on your platform:[*] [*]
Under Python 2.4 and Cygwin on Windows, I had to first set my PYTHONPATH to include the current directory in order to run the embedding examples, with the shell command export PYTHONPATH=. I also had to use the shell command ./embed-simple to execute the program due to my system path settings. Your mileage may vary; if you have trouble, try running the embedded Python commands import sys and print sys.path from C to see what Python's path looks like, and take a look at the Python/C API manual for more on path configuration for embedded applications.
.../PP3E/Integrate/Embed/Basics$ embed-simple embed-simple The meaning of life... THE MEANING OF PYTHON...
Most of this output is produced by Python print statements sent from C to the linked-in Python library. It's as if C has become an interactive Python programmer. Naturally, strings of Python code run by C probably would not be hardcoded in a C program file like this. They might instead be loaded from a text file or GUI, extracted from HTML or XML files, fetched from a persistent database or socket, and so on. With such external sources, the Python code strings that are run from C could be changed arbitrarily without having to recompile the C program that runs them. They may even be changed onsite, and by end users of a system. To make the most of code strings, though, we need to move on to more flexible API tools.
23.3.2. Running Code Strings with Results and Namespaces Example 23-5 uses the following API calls to run code strings that return expression results back to C:
Py_Initialize Initializes linked-in Python libraries as before
PyImport_ImportModule Imports a Python module and returns a pointer to it
PyModule_GetDict Fetches a module's attribute dictionary object
PyRun_String Runs a string of code in explicit namespaces
PyObject_SetAttrString Assigns an object attribute by name string
PyArg_Parse Converts a Python return value object to C form

The import calls are used to fetch the namespace of the usermod module listed in Example 23-1 earlier so that code strings can be run there directly (and will have access to names defined in that module without qualification). PyImport_ImportModule is like a Python import statement, but the imported module object is returned to C; it is not assigned to a Python variable name. As a result, it's probably more similar to the Python __import__ built-in function. The PyRun_String call is the one that actually runs code here, though. It takes a code string, a parser mode flag, and dictionary object pointers to serve as the global and local namespaces for running the code string. The mode flag can be Py_eval_input to run an expression, or Py_file_input to run a
statement; when running an expression, the result of evaluating the expression is returned from this call (it comes back as a PyObject* object pointer). The two namespace dictionary pointer arguments allow you to distinguish global and local scopes, but they are typically passed the same dictionary such that code runs in a single namespace. [*] [*]
A related function lets you run files of code but is not demonstrated in this chapter: PyObject* PyRun_File(FILE *fp, char *filename, int mode, PyObject *globals, PyObject *locals). Because you can always load a file's text and run it as a single code string with PyRun_String, the PyRun_File call is not always necessary. In such multiline code strings, the \n character terminates lines and indentation groups blocks as usual.
Example 23-5. PP3E\Integrate\Embed\Basics\embed-string.c
When compiled and run, this file produces the same result as its predecessor:
.../PP3E/Integrate/Embed/Basics$ embed-string embed-string The meaning of life... THE MEANING OF PYTHON...
But very different work goes into producing this output. This time, C fetches, converts, and prints the value of the Python module's message attribute directly by running a string expression, and it assigns a global variable (X) within the module's namespace to serve as input for a Python print statement string. Because the string execution call in this version lets you specify namespaces, you can better partition the embedded code your system runs: each grouping can have a distinct namespace to avoid overwriting other groups' variables. And because this call returns a result, you can better communicate with the embedded code; expression results are outputs, and assignments to globals in the namespace in which code runs can serve as inputs.

Before we move on, I need to explain two coding issues here. First, this program also decrements the reference count on objects passed to it from Python, using the Py_DECREF call introduced in Chapter 22. These calls are not strictly needed here (the objects' space is reclaimed when the program exits anyhow), but they demonstrate how embedding interfaces must manage reference counts when Python passes their ownership to C. If this were a function called from a larger system, for instance, you would generally want to decrement the count to allow Python to reclaim the objects. Second, in a realistic program, you should generally test the return values of all the API calls in this program immediately to detect errors (e.g., import failure). Error tests are omitted in this section's examples to keep the code simple, but they will appear in later code listings and should be included in your programs to make them more robust.
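Although the calls in embed-string.c are C API functions, their net effect mirrors the following Python-level sketch. This is a rough model only: the module from Example 23-1 is recreated inline so the sketch is self-contained, and each line's comment names the C API call it stands in for.

```python
import types

# recreate usermod from Example 23-1 so the sketch is self-contained
usermod = types.ModuleType('usermod')
exec("message = 'The meaning of life...'\n"
     "def transform(input):\n"
     "    input = input.replace('life', 'Python')\n"
     "    return input.upper()\n", usermod.__dict__)

pdict  = usermod.__dict__                     # PyModule_GetDict
result = eval('message', pdict, pdict)        # PyRun_String, Py_eval_input
pdict['X'] = result                           # PyObject_SetAttrString
exec('Y = transform(X)', pdict, pdict)        # PyRun_String, Py_file_input
print(result)
print(pdict['Y'])
```

The key ideas carry over directly: expression results flow out of the namespace, and globals preset by the host flow in.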
23.3.3. Calling Python Objects The last two sections dealt with running strings of code, but it's easy for C programs to deal in terms of Python objects too. Example 23-6 accomplishes the same task as Examples 23-2 and 23-5, but it uses other API tools to interact with objects in the Python module directly:
PyImport_ImportModule Imports the module from C as before
PyObject_GetAttrString Fetches an object's attribute value by name
PyEval_CallObject Calls a Python function (or class, or method)
PyArg_Parse Converts Python objects to C values
Py_BuildValue Converts C values to Python objects
We met both of the data conversion functions in Chapter 22. The PyEval_CallObject call in this version of the example is the key call here: it runs the imported function with a tuple of arguments, much like the Python apply built-in function and newer func(*args) call syntax. The Python function's return value comes back to C as a PyObject*, a generic Python object pointer.
Example 23-6. PP3E\Integrate\Embed\Basics\embed-object.c
When compiled and run, the result is the same again:
.../PP3E/Integrate/Embed/Basics$ embed-object embed-object The meaning of life... THE MEANING OF PYTHON...
But this output is generated by C this timefirst, by fetching the Python module's message attribute
value, and then by fetching and calling the module's transform function object directly and printing the return value it sends back to C. Input to the transform function is a function argument here, not a preset global variable. Notice that message is fetched as a module attribute this time, instead of by running its name as a code string; there is often more than one way to accomplish the same goals with different API calls. Running functions in modules like this is a simple way to structure embedding; code in the module file can be changed arbitrarily without having to recompile the C program that runs it. It also provides a direct communication model: inputs and outputs to Python code can take the form of function arguments and return values.
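The object-based approach of embed-object.c corresponds to this Python-level sketch (again a rough model with the module recreated inline; the comments map each step to the C API call it imitates):

```python
import types

# recreate usermod from Example 23-1 so the sketch is self-contained
usermod = types.ModuleType('usermod')
exec("message = 'The meaning of life...'\n"
     "def transform(input):\n"
     "    return input.replace('life', 'Python').upper()\n",
     usermod.__dict__)

message = getattr(usermod, 'message')      # PyObject_GetAttrString
func    = getattr(usermod, 'transform')    # PyObject_GetAttrString
args    = (message,)                       # Py_BuildValue("(s)", ...)
result  = func(*args)                      # PyEval_CallObject
print(message)
print(result)                              # PyArg_Parse(..., "s", ...)
```

No code strings are compiled at all here; the host simply fetches already compiled objects by name and calls them.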
23.3.4. Running Strings in Dictionaries When we used PyRun_String earlier to run expressions with results, code was executed in the namespace of an existing Python module. However, sometimes it's more convenient to create a brand-new namespace for running code strings that is independent of any existing module files. The C file in Example 23-7 shows how; the new namespace is created as a new Python dictionary object, and a handful of new API calls are employed in the process:
PyDict_New Makes a new empty dictionary object
PyDict_SetItemString Assigns to a dictionary's key
PyDict_GetItemString Fetches (indexes) a dictionary value by key
PyRun_String Runs a code string in namespaces, as before
PyEval_GetBuiltins Gets the built-in scope's module The main trick here is the new dictionary. Inputs and outputs for the embedded code strings are mapped to this dictionary by passing it as the code's namespace dictionaries in the PyRun_String call. The net effect is that the C program in Example 23-7 works exactly like this Python code:
>>> d = {}
>>> d['Y'] = 2
>>> exec 'X = 99' in d, d
>>> exec 'X = X + Y' in d, d
>>> print d['X']
101
But here, each Python operation is replaced by a C API call.
Example 23-7. PP3E\Integrate\Embed\Basics\embed-dict.c
/***************************************************
 * make a new dictionary for code string namespace;
 ***************************************************/

#include <Python.h>

main( )
{
    int cval;
    PyObject *pdict, *pval;
    printf("embed-dict\n");
    Py_Initialize( );

    /* make a new namespace */
    pdict = PyDict_New( );
    PyDict_SetItemString(pdict, "__builtins__", PyEval_GetBuiltins( ));

    PyDict_SetItemString(pdict, "Y", PyInt_FromLong(2));    /* dict['Y'] = 2   */
    PyRun_String("X = 99",  Py_file_input, pdict, pdict);   /* run statements  */
    PyRun_String("X = X+Y", Py_file_input, pdict, pdict);   /* same X and Y    */
    pval = PyDict_GetItemString(pdict, "X");                /* fetch dict['X'] */

    PyArg_Parse(pval, "i", &cval);                          /* convert to C    */
    printf("%d\n", cval);
    Py_DECREF(pdict);
}
The output is different this time: it reflects the value of the Python variable X assigned by the embedded Python code strings and fetched by C. In general, C can fetch module attributes either by calling PyObject_GetAttrString with the module or by using PyDict_GetItemString to index the module's attribute dictionary (expression strings work too, but they are less direct). Here, there is no module at all, so dictionary indexing is used to access the code's namespace in C. Besides allowing you to partition code string namespaces independent of any Python module files on the underlying system, this scheme provides a natural communication mechanism. Values that are
stored in the new dictionary before code is run serve as inputs, and names assigned by the embedded code can later be fetched out of the dictionary to serve as code outputs. For instance, the variable Y in the second string run refers to a name set to 2 by C; X is assigned by the Python code and fetched later by C code as the printed result. There is one subtlety: dictionaries that serve as namespaces for running code are generally required to have a __builtins__ link to the built-in scope searched last for name lookups, set with code of this form:

PyDict_SetItemString(pdict, "__builtins__", PyEval_GetBuiltins( ));
This is esoteric, and it is normally handled by Python internally for modules. For raw dictionaries, though, we are responsible for setting the link manually.
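You can see the Python side of this convention interactively. When exec is given a raw dictionary as a namespace, CPython inserts the __builtins__ link for you (documented behavior of exec in current CPython; at the C API level, as shown above, you set it yourself):

```python
d = {}                      # raw dict namespace, no __builtins__ yet
exec("X = 99", d, d)        # run a statement string in it
print('__builtins__' in d)  # CPython added the link automatically
```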
23.3.5. Precompiling Strings to Bytecode When you call Python function objects from C, you are actually running the already compiled bytecode associated with the object (e.g., a function body). When running strings, Python must compile the string before running it. Because compilation is a slow process, this can be a substantial overhead if you run a code string more than once. Instead, precompile the string to a bytecode object to be run later, using the API calls illustrated in Example 23-8:[*] [*]
In case you've forgotten: bytecode is simply an intermediate representation for already compiled program code in the current standard Python implementation. It's a low-level binary format that can be quickly interpreted by the Python runtime system. Bytecode is usually generated automatically when you import a module, but there may be no notion of an import when running raw strings from C.
Py_CompileString Compiles a string of code and returns a bytecode object
PyEval_EvalCode Runs a compiled bytecode object The first of these takes the mode flag that is normally passed to PyRun_String, as well as a second string argument that is used only in error messages. The second takes two namespace dictionaries. These two API calls are used in Example 23-8 to compile and execute three strings of Python code in turn.
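This compile-once, run-many pattern corresponds to Python's own compile and exec built-ins, as this small sketch shows (a Python-level model of the two C calls, not the Example 23-8 listing itself):

```python
d = {'Y': 2, 'X': 99}                            # namespace dict with inputs
code = compile("X = X + Y", '<embed>', 'exec')   # like Py_CompileString:
                                                 # compile once up front
for _ in range(3):
    exec(code, d, d)                             # like PyEval_EvalCode:
                                                 # run bytecode, no recompile
print(d['X'])                                    # prints 105
```

Re-running the precompiled code object skips the parse and compile steps each time, which is the whole point when a string must be executed repeatedly.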
Example 23-8. PP3E\Integrate\Embed\Basics\embed-bytecode.c
23.6. A High-Level Embedding API: ppembed As you can probably tell from Example 23-14, embedded-mode integration code can very quickly become as complicated as extending code for nontrivial use. Today, no automation solution solves the embedding problem as well as SWIG addresses extending. Because embedding does not impose the kind of structure that extension modules and types provide, it's much more of an open-ended problem; what automates one embedding strategy might be completely useless in another. With a little upfront work, though, you can still automate common embedding tasks by wrapping up calls in higher-level APIs that make assumptions about common use cases. These APIs could handle things such as error detection, reference counts, data conversions, and so on. One such API, ppembed, is available in this book's examples distribution. It merely combines existing tools in Python's standard C API to provide a set of easier-to-use calls for running Python programs from C.
23.6.1. Running Objects with ppembed For instance, Example 23-15 demonstrates how to recode the objects-err-low.c program of Example 23-14 with ppembed; to build it, you link ppembed's library files with your program.
Example 23-15. PP3E\Integrate\Embed\ApiClients\object-api.c
#include <stdio.h>
#include "ppembed.h"

main ( )
{
    int failflag;
    PyObject *pinst;
    char *arg1="sir", *arg2="robin", *cstr;
This file uses two ppembed calls (the names that start with PP) to make the class instance and call its method. Because ppembed handles error checks, reference counts, data conversions, and so on, there isn't much else to do here. When this program is run and linked with ppembed library code, it works like the original, but it is much easier to read, write, and debug:
.../PP3E/Integrate/Embed/ApiClients$ objects-api
brave sir robin
See the book's examples distribution for the makefile used to build this program; because it's similar to what we've seen and may vary widely on your system, we'll omit further build details in this chapter.
23.6.2. Running Code Strings with ppembed The ppembed API provides higher-level calls for most of the embedding techniques we've seen in this chapter. For example, the C program in Example 23-16 runs code strings to make the (now rarely used) string module capitalize a simple text.
Example 23-16. PP3E\Integrate\Embed\ApiClients\codestring-low.c
#include <Python.h>                   /* standard API defs */

void error(char *msg) { printf("%s\n", msg); exit(1); }

main( )
{                                     /* run strings with low-level calls */
    char *cstr;
    PyObject *pstr, *pmod, *pdict;
    Py_Initialize( );

    /* result = string.upper('spam') + '!' */
    pmod = PyImport_ImportModule("string");
    if (pmod == NULL)
        error("Can't import module");

    pdict = PyModule_GetDict(pmod);
    Py_DECREF(pmod);
    if (pdict == NULL)
        error("Can't get module dict");

    pstr = PyRun_String("upper('spam') + '!'", Py_eval_input, pdict, pdict);
    if (pstr == NULL)
        error("Error while running string");

    /* convert result to C */
    if (!PyArg_Parse(pstr, "s", &cstr))
        error("Bad result type");
    printf("%s\n", cstr);
    Py_DECREF(pstr);                  /* free exported objects, not pdict */
}
This C program file includes politically correct error tests after each API call. When run, it prints the result returned by running an uppercase conversion call in the namespace of the Python string module:

.../PP3E/Integrate/Embed/ApiClients$ codestring-low
SPAM!
You can implement such integrations by calling Python API functions directly, but you don't necessarily have to. With a higher-level embedding API such as ppembed, the task can be noticeably simpler, as shown in Example 23-17.
Example 23-17. PP3E\Integrate\Embed\ApiClients\codestring-api.c
#include "ppembed.h"
#include <stdio.h>

main( )
{                                         /* with ppembed high-level API */
    char *cstr;
    int err = PP_Run_Codestr(
                  PP_EXPRESSION,          /* expr or stmt?  */
                  "upper('spam') + '!'",  /* code, module   */
                  "string",
                  "s", &cstr);            /* expr result    */
    printf("%s\n", (!err) ? cstr : "Can't run string");   /* and free(cstr) */
}
When linked with the ppembed library code, this version produces the same result as the former. Like most higher-level APIs, ppembed makes some usage mode assumptions that are not universally applicable; when they match the embedding task at hand, though, such wrapper calls can cut much clutter from programs that need to run embedded Python code.
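The design idea behind such a wrapper can be modeled in Python itself. The run_codestr function below is a hypothetical sketch loosely patterned on PP_Run_Codestr (its name, signature, and error convention are inventions for illustration): it folds the import, namespace fetch, code run, and error check into one call, returning a status flag plus a result. A Python-3-safe expression is used since the string module no longer has an upper function there.

```python
import importlib

def run_codestr(is_expr, code, modname):
    # hypothetical high-level wrapper: run a code string in the
    # namespace of a named module, with error handling folded in;
    # returns (status, result), status -1 on any failure
    try:
        ns = vars(importlib.import_module(modname))
        if is_expr:
            return 0, eval(code, ns, ns)   # expression: return its result
        exec(code, ns, ns)                 # statement: no result
        return 0, None
    except Exception:
        return -1, None

err, result = run_codestr(True, "ascii_uppercase[:4]", "string")
print(err, result)
```

As with ppembed, the caller's code shrinks to one call plus a status test, at the cost of committing to the wrapper's usage assumptions.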
23.6.3. Running Customizable Validations Our examples so far have been intentionally simple, but embedded Python code can do useful work as well. For instance, the C program in Example 23-18 calls ppembed functions to run a string of Python code fetched from a file that performs validation tests on inventory data. To save space, I'm not going to list all the components used by this example (you can find all of its source files and makefiles in the book's examples distribution package). Still, this file shows the embedding portions relevant to this chapter: it sets variables in the Python code's namespace to serve as input, runs the Python code, and then fetches names out of the code's namespace as results.[*] [*]
This is more or less the kind of structure used when Python is embedded in HTML files in contexts such as the Active Scripting extension described in Chapter 18, except that the globals set here (e.g., PRODUCT) become names preset to web browser objects, and the code is extracted from a web page. It is not fetched from a text file with a known name.
Example 23-18. PP3E\Integrate\Embed\Inventory\order-string.c
/* run embedded code-string validations */

#include <stdio.h>
#include <string.h>
#include "ppembed.h"
#include "ordersfile.h"

run_user_validation( )              /* Python is initialized automatically */
{                                   /* caveat: should check status everywhere */
    int i, status, nbytes;          /* caveat: should malloc a big-enough block */
    char script[4096];
    char *errors, *warnings;
    FILE *file;

    file = fopen("validate1.py", "r");            /* customizable validations */
    nbytes = fread(script, 1, 4096, file);        /* load Python file text */
    script[nbytes] = '\0';
    fclose(file);

    status = PP_Make_Dummy_Module("orders");      /* application's own namespace */
    for (i=0; i < numorders; i++) {               /* like making a new dictionary */
        printf("\n%d (%d, %d, '%s')\n",
               i, orders[i].product, orders[i].quantity, orders[i].buyer);
        PP_Set_Global("orders", "PRODUCT",  "i", orders[i].product);    /* int */
        PP_Set_Global("orders", "QUANTITY", "i", orders[i].quantity);   /* int */
        PP_Set_Global("orders", "BUYER",    "s", orders[i].buyer);      /* str */

        status = PP_Run_Codestr(PP_STATEMENT, script, "orders", "", NULL);
        if (status == -1) {
            printf("Python error during validation.\n");
            PyErr_Print( );                       /* show traceback */
            continue;
        }
        PP_Get_Global("orders", "ERRORS",   "s", &errors);
        PP_Get_Global("orders", "WARNINGS", "s", &warnings);
        printf("errors:   %s\n", strlen(errors)?   errors   : "none");
        printf("warnings: %s\n", strlen(warnings)? warnings : "none");
    }
}

main(int argc, char **argv)         /* C is on top, Python is embedded */
{                                   /* but Python can use C extensions too */
    run_user_validation( );         /* don't need sys.argv in embedded code */
}
There are a couple of things worth noticing here. First, in practice, this program might fetch the Python code file's name or path from configurable shell variables; here, it is loaded from the current directory. Second, you could also code this program by using straight API calls rather than ppembed,
but each of the PP calls here would then grow into a chunk of more complex code. As coded, you can compile and link this file with Python and ppembed library files to build a program. The Python code run by the resulting C program lives in Example 23-19; it uses preset globals and is assumed to set globals to send result strings back to C.
Example 23-19. PP3E\Integrate\Embed\Inventory\validate1.py
# embedded validation code, run from C
# input vars:  PRODUCT, QUANTITY, BUYER
# output vars: ERRORS, WARNINGS

import string                    # all Python tools are available to embedded code
import inventory                 # plus C extensions, Python modules, classes,..

msgs, errs = [], []              # warning, error message lists

def validate_order( ):
    if PRODUCT not in inventory.skus( ):            # this function could be imported
        errs.append('bad-product')                  # from a user-defined module too
    elif QUANTITY > inventory.stock(PRODUCT):
        errs.append('check-quantity')
    else:
        inventory.reduce(PRODUCT, QUANTITY)
        if inventory.stock(PRODUCT) / QUANTITY < 2:
            msgs.append('reorder-soon:' + repr(PRODUCT))
    first, last = BUYER[0], BUYER[1:]
    if first not in string.uppercase:
        errs.append('buyer-name:' + first)
    if BUYER not in inventory.buyers( ):
        msgs.append('new-buyer-added')
        inventory.add_buyer(BUYER)

validate_order( )                # code is changeable onsite: this file is run as
                                 # one long code string, with input and output
                                 # vars used by the C app
ERRORS   = ' '.join(errs)        # add a space between messages
WARNINGS = ' '.join(msgs)        # pass out as strings: "" == none
Don't sweat the details in this code; some components it uses are not listed here either (see the book's examples distribution for the full implementation). The thing you should notice, though, is that this code file can contain any kind of Python code: it can define functions and classes, use sockets and threads, and so on. When you embed Python, you get a full-featured extension language for free. Perhaps even more importantly, because this file is Python code, it can be changed arbitrarily without having to recompile the C program. Such flexibility is especially useful after a system has been shipped and installed.
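The preset-globals protocol that order-string.c and validate1.py share can be demonstrated with plain Python. Below, a dictionary plays the role of the "orders" namespace and a short inline string stands in for the validation file (the script body here is a simplified hypothetical, not the real validate1.py logic):

```python
# a simplified stand-in for validate1.py, run as one long code string
script = """
errs = []
if QUANTITY <= 0:
    errs.append('check-quantity')
ERRORS = ' '.join(errs)
"""

ns = {'PRODUCT': 222, 'QUANTITY': 0, 'BUYER': 'LTorvalds'}  # like PP_Set_Global
exec(script, ns)                                            # like PP_Run_Codestr
print(ns['ERRORS'])                                         # like PP_Get_Global
```

The host sets inputs before the run and reads outputs after it; the validation logic in between can change freely without touching the host.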
23.6.3.1. Running function-based validations As discussed earlier, there are a variety of ways to structure embedded Python code. For instance, you can implement similar flexibility by delegating actions to Python functions fetched from module files, as illustrated in Example 23-20.
Example 23-20. PP3E\Integrate\Embed\Inventory\order-func.c
/* run embedded module-function validations */

#include <stdio.h>
#include <string.h>
#include "ppembed.h"
#include "ordersfile.h"

run_user_validation( )               /* should check status everywhere */
{                                    /* no file/string or namespace here */
    int i, status;
    char *errors, *warnings;
    PyObject *results;

    for (i=0; i < numorders; i++) {
        printf("\n%d (%d, %d, '%s')\n",
               i, orders[i].product, orders[i].quantity, orders[i].buyer);
        status = PP_Run_Function(                 /* validate2.validate(p,q,b) */
                     "validate2", "validate",
                     "O", &results,
                     "(iis)", orders[i].product,
                              orders[i].quantity, orders[i].buyer);
        if (status == -1) {
            printf("Python error during validation.\n");
            PyErr_Print( );                       /* show traceback */
            continue;
        }
        PyArg_Parse(results, "(ss)", &warnings, &errors);
        printf("errors:   %s\n", strlen(errors)?   errors   : "none");
        printf("warnings: %s\n", strlen(warnings)? warnings : "none");
        Py_DECREF(results);                       /* ok to free strings */
        PP_Run_Function("inventory", "print_files", "", NULL, "( )");
    }
}

main(int argc, char **argv)
{
    run_user_validation( );
}
The difference here is that the Python code file (shown in Example 23-21) is imported and so must live on the Python module search path. It also is assumed to contain functions, not a simple list of statements. Strings can live anywhere (files, databases, web pages, and so on) and may be simpler for end users to code. But assuming that the extra requirements of module functions are not prohibitive, functions provide a natural communication model in the form of arguments and return values.
Example 23-21. PP3E\Integrate\Embed\Inventory\validate2.py
# embedded validation code, run from C
# input = args, output = return value tuple

import string
import inventory

def validate(product, quantity, buyer):             # function called by name
    msgs, errs = [], []                             # via mod/func name strings
    first, last = buyer[0], buyer[1:]
    if first not in string.uppercase:               # or: not first.isupper( )
        errs.append('buyer-name:' + first)
    if buyer not in inventory.buyers( ):
        msgs.append('new-buyer-added')
        inventory.add_buyer(buyer)
    validate_order(product, quantity, errs, msgs)   # mutable list args
    return ' '.join(msgs), ' '.join(errs)           # use "(ss)" format

def validate_order(product, quantity, errs, msgs):
    if product not in inventory.skus( ):
        errs.append('bad-product')
    elif quantity > inventory.stock(product):
        errs.append('check-quantity')
    else:
        inventory.reduce(product, quantity)
        if inventory.stock(product) / quantity < 2:
            msgs.append('reorder-soon:' + repr(product))
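Calling a module function by name from C, as PP_Run_Function does, boils down to an import, an attribute fetch, and a call. The helper below is a hypothetical Python-level model of that sequence (its name and signature are illustrative; the standard math module is used so the sketch is self-contained):

```python
import importlib

def run_function(modname, funcname, *args):
    # rough model of a PP_Run_Function-style call: import the module
    # by name, fetch the function attribute, call it with the given
    # (already converted) arguments, and hand back its return value
    mod = importlib.import_module(modname)
    return getattr(mod, funcname)(*args)

print(run_function('math', 'gcd', 12, 8))   # prints 4
```

Arguments and return values give the host and the embedded code a direct, explicit communication channel, with no shared namespace to manage.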
23.6.3.2. Other validation components

For another API use case, the file order-bytecode.c in the book's source distribution shows how to utilize ppembed's convenience functions to precompile strings to bytecode for speed. It's similar to Example 23-18, but it calls PP_Compile_Codestr to compile and PP_Run_Bytecode to run the result. For reference, the database used by the validation code was initially prototyped for testing with the Python module inventory.py (see Example 23-22).
Example 23-22. PP3E\Integrate\Embed\Inventory\inventory.py
"sku (product#) : quantity" would usually be a file or shelve: the operations below could work on an open shelve (or DBM file) too... # cache keys if they won't change
And the list of orders to process was simulated with the C header file ordersfile.h (see Example 23-23).
Example 23-23. PP3E\Integrate\Embed\Inventory\ordersfile.h
/* simulated file/dbase of orders to be filled */

struct {
    int   product;            /* or use a string if key is structured: */
    int   quantity;           /* Python code can split it up as needed */
    char *buyer;              /* by convention, first-initial+last */
} orders[] = {
    {111, 2, "GRossum"},      /* this would usually be an orders file */
    {222, 5, "LWall"},        /* which the Python code could read too */
    {333, 3, "JOusterhout"},
    {222, 1, "4Spam"},
    {222, 0, "LTorvalds"},    /* the script might live in a database too */
    {444, 9, "ERaymond"}
};
int numorders = 6;
Both of these serve for prototyping, but are intended to be replaced with real database and file interfaces in later mutations of the system. See the WithDbase subdirectory in the book's source distribution for more on this thread. See also the Python-coded equivalents of the C files listed in this section; they were initially prototyped in Python too.
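For comparison, a Python-coded counterpart to ordersfile.h might look roughly like this (a sketch only; the actual Python equivalents in the examples distribution may differ):

```python
# simulated orders database, mirroring the C struct array in ordersfile.h
orders = [
    (111, 2, 'GRossum'),       # (product, quantity, buyer)
    (222, 5, 'LWall'),
    (333, 3, 'JOusterhout'),
    (222, 1, '4Spam'),
    (222, 0, 'LTorvalds'),
    (444, 9, 'ERaymond'),
]

for product, quantity, buyer in orders:     # same data the C loop scans
    print(product, quantity, buyer)
```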
And finally, here is the output produced by the C string-based program in Example 23-18 when using the prototyping components listed in this section. The output is printed by C, but it reflects the results of the Python-coded validations it runs:[*]
Note that to get this example to work under Cygwin on Windows, I had to run the Python file through dos2unix to convert line-end characters; as always, your platform may vary.
The function-based output is similar, but more details are printed for the exception (function calls are active; they are not a single string). Trace through the Python and C code files to see how orders are validated and applied to inventory. This output is a bit cryptic because the system is still a work in progress at this stage. One of the nice features of Python, though, is that it enables such incremental development. In fact, with its integration interfaces, we can simulate future components in either Python or C.
23.6.4. ppembed Implementation
The ppembed API originally appeared as an example in the first edition of this book. Since then, it has been utilized in real systems and has become too large to present here in its entirety. For instance, ppembed also supports debugging embedded code (by routing it to the pdb debugger module), dynamically reloading modules containing embedded code, and other features too complex to illustrate usefully here.

But if you are interested in studying another example of Python embedding calls in action, ppembed's full source code and makefile live in this book's examples distribution:

PP3E\Integrate\Embed\HighLevelApi

This API serves as a supplemental example of advanced embedding techniques. As a sample of the kinds of tools you can build to simplify embedding, the ppembed API's header file is shown in Example 23-24. You are invited to study, use, copy, and improve its code as you like. Or you can simply write an API of your own; the main point to take from this section is that embedding programs need to be complicated only if you stick with the Python runtime API as shipped. By adding convenience functions such as those in ppembed, embedding can be as simple as you make it. It also makes your C programs immune to changes in the Python C core; ideally, only the API must change if Python ever does. In fact, the third edition of this book proved this point: one of the utilities in the API had to be patched for a change in the Python/C API, but only one update was required.

Be sure to also see the file abstract.h in the Python include directory if you are in the market for higher-level interfaces. That file provides generic type operation calls that make it easy to do things like creating, filling, indexing, slicing, and concatenating Python objects referenced by pointer from C.
Also see the corresponding implementation file, abstract.c, as well as the Python built-in module and type implementations in the Python source distribution for more examples of lower-level object access. Once you have a Python object pointer in C, you can do all sorts of type-specific things to Python inputs and outputs.
Example 23-24. PP3E\Integrate\Embed\HighLevelApi\ppembed.h
/*************************************************************************
 * PPEMBED, VERSION 2.1
 * AN ENHANCED PYTHON EMBEDDED-CALL INTERFACE
 *
 * Wraps Python's runtime embedding API functions for easy use.
 * Most utilities assume the call is qualified by an enclosing module
 * (namespace). The module can be a filename reference or a dummy module
 * created to provide a namespace for fileless strings. These routines
 * automate debugging, module (re)loading, input/output conversions, etc.
 *
 * Python is automatically initialized when the first API call occurs.
 * Input/output conversions use the standard Python conversion format
 * codes (described in the C API manual). Errors are flagged as either
 * a -1 int, or a NULL pointer result. Exported names use a PP_ prefix
 * to minimize clashes; names in the built-in Python API use Py prefixes
 * instead (alas, there is no "import" equivalent in C, just "from*").
 * Also note that the varargs code here may not be portable to certain
 * C compilers; to do it portably, see the text or file 'vararg.txt'
 * here, or search for string STDARG in Python's source code files.
 *
 * New in version 2.1 (3rd Edition): minor fix for a change in the
 * Python C API: ppembed-callables.c call to _PyTuple_Resize -- added
 * code to manually move args to the right because the original
 * isSticky argument is now gone;
 *
 * New in version 2.0 (2nd Edition): names now have a PP_ prefix,
 * files renamed, compiles to a single file, fixed pdb retval bug
 * for strings, char* results returned by the "s" convert code now
 * point to new char arrays which the caller should free() when no
 * longer needed (this was a potential bug in prior version). Also
 * added new API interfaces for fetching exception info after errors,
 * precompiling code strings to byte code, and calling simple objects.
 *
 * Also fully supports Python 1.5 module package imports: module names
 * in this API can take the form "package.package.[...].module", where
 * Python maps the package names to a nested directories path in your
 * filesystem hierarchy; package dirs all contain __init__.py files,
 * and the leftmost one is in a directory found on PYTHONPATH. This
 * API's dynamic reload feature also works for modules in packages;
 * Python stores the full pathname in the sys.modules dictionary.
 *
 * Caveats: there is no support for advanced things like threading or
 * restricted execution mode here, but such things may be added with
 * extra Python API calls external to this API (see the Python/C API
 * manual for C-level threading calls; see modules rexec and bastion
 * in the library manual for restricted mode details). For threading,
 * you may also be able to get by with C threads and distinct Python
 * namespaces per Python code segments, or Python language threads
 * started by Python code run from C (see the Python thread module).
 *
 * Note that Python can only reload Python modules, not C extensions,
 * but it's okay to leave the dynamic reload flag on even if you might
 * access dynamically loaded C extension modules--in 1.5.2, Python
 * simply resets C extension modules to their initial attribute state
 * when reloaded, but doesn't actually reload the C extension file.
 *************************************************************************/

#ifndef PPEMBED_H
#define PPEMBED_H

#ifdef __cplusplus
extern "C" {                    /* a C library, but callable from C++ */
#endif

#include <stdio.h>
#include <Python.h>

extern int PP_RELOAD;    /* 1=reload py modules when attributes referenced */
extern int PP_DEBUG;     /* 1=start debugger when string/function/member run */

/**********************************************************/
/* ppembed-errors.c: get exception data after API error   */
/**********************************************************/
23.6.5. Other Integration Examples (External)

While writing this chapter, I ran out of space before I ran out of examples. Besides the ppembed API example described in the last section, you can find a handful of additional Python/C integration self-study examples in this book's examples distribution:
PP3E\Integrate\Embed\Inventory
The full implementation of the validation examples listed earlier. This case study uses the ppembed API to run embedded Python order validations, both as embedded code strings and as functions fetched from modules. The inventory is implemented both with and without shelves and pickle files for data persistence.

PP3E\Integrate\Mixed\Exports
A tool for exporting C variables for use in embedded Python programs.

PP3E\Integrate\Embed\TestApi
A ppembed test program, shown with and without package import paths to identify modules.

Some of these are large C examples that are probably better studied than listed.
23.7. Other Integration Topics

In this book, the term integration has largely meant mixing Python with components written in C or C++ (or other C-compatible languages) in extending and embedding modes. But from a broader perspective, integration also includes any other technology that lets us mix Python components into larger systems. This last section briefly looks at a handful of integration technologies beyond the C API tools we've seen in this part of the book.
23.7.1. Jython: Java Integration

We met Jython in Chapter 18, but it is worth another mention in the context of integration at large. As we saw earlier, Jython supports two kinds of integration:

Jython uses Java's reflection API to allow Python programs to call out to Java class libraries automatically (extending). The Java reflection API provides Java type information at runtime and serves the same purpose as the glue code we've generated to plug C libraries into Python in this part of the book. In Jython, however, this runtime type information allows largely automated resolution of Java calls in Python scripts; no glue code has to be written or generated.

Jython also provides a Java PythonInterpreter class API that allows Java programs to run Python code in a namespace (embedding), much like the C API tools we've used to run Python code strings from C programs. In addition, because Jython implements all Python objects as instances of a Java PyObject class, it is straightforward for the Java layer that encloses embedded Python code to process Python objects.

In other words, Jython allows Python to be both extended and embedded in Java, much like the C integration strategies we've seen in this part of the book. By adding a simpler scripting language to Java applications, Jython serves many of the same roles as the C integration tools we've studied. With the addition of the Jython system, Python may be integrated with any C-compatible program by using C API tools, as well as any Java-compatible program by using Jython.

Although Jython provides a remarkably seamless integration model, Python code runs slower in the Jython implementation, and its reliance on Java class libraries and execution environments introduces Java dependencies that may be a concern in some Python-oriented development scenarios. See Chapter 18 for more Jython details; for the full story, read the documentation available online at http://www.jython.org.
23.7.2. IronPython: C#/.NET Integration

Much like Jython, the emerging IronPython implementation of the Python language promises to provide seamless integration between Python code and software components written for the .NET framework. Although .NET is a Microsoft Windows initiative, the Mono open source implementation of .NET for Linux provides .NET functionality in a cross-platform fashion. Like Jython, IronPython compiles Python source code to the .NET system's bytecode format and runs programs on the system's runtime engine. As a result, integration with external components is similarly seamless.
Also like Jython, the net effect is to provide Python as an easy-to-use scripting language for C#/.NET-based applications, and a rapid development tool that complements C#. For more details on IronPython, as well as its alternatives such as Python.NET, do a web search or visit Python's home page at http://www.python.org.
23.7.3. COM Integration on Windows

We briefly discussed Python's support for the COM object model on Windows when we explored Active Scripting in Chapter 18, but it's really a general integration tool that is useful apart from the Internet too. Recall that COM defines a standard and language-neutral object model with which components written in a variety of programming languages may integrate and communicate. Python's PyWin32 Windows extension package allows Python programs to implement both the server and client sides of the COM interface model. As such, it provides a powerful way to integrate Python programs with programs written in other COM-aware languages such as Visual Basic, Delphi, Visual C++, PowerBuilder, and even other Python programs. Python scripts can also use COM calls to script popular Microsoft applications such as Word and Excel, since these systems register COM object interfaces of their own.

On the downside, COM implies a level of dispatch indirection and is a Windows-only solution at this writing. As a result, it is not as fast or as portable as some of the lower-level integration schemes we've studied in this part of the book (linked-in, in-process, and direct calls between Python and C-compatible language components). For nontrivial use, COM is also considered to be a large system, and further details about it are well beyond the scope of this book.

For more information on COM support and other Windows extensions, refer to Chapter 18 in this book, and to O'Reilly's Python Programming on Win32, by Mark Hammond and Andy Robinson. That book also describes how to use Windows compilers to do Python/C integration in much more detail than is possible here; for instance, it shows how to use Visual C++ tools to compile and link Python C/C++ integration layer code. The basic C code behind low-level extending and embedding on Windows is the same as shown in this book, but compiling and linking details vary.
23.7.4. CORBA Integration

There is also much open source support for using Python in the context of a CORBA-based application. CORBA stands for Common Object Request Broker Architecture; it's a language-neutral way to distribute systems among communicating components, which speak through an object model architecture. As such, it represents another way to integrate Python components into a larger system. Python's CORBA support includes the public domain systems OmniORB, ILU, and Fnorb (see http://www.python.org or do a web search for pointers). The OMG (Object Management Group, responsible for directing CORBA growth) has also played host to an effort to elect Python as the standard scripting language for CORBA-based systems. Python is an ideal language for programming distributed objects, and it is being used in such a role by many companies around the world.

Like COM, CORBA is a large system, too large for us to even scratch the surface in this text. For more details, search Python's web site for CORBA-related materials.
23.7.5. Other Languages

In the public domain, you'll also find direct support for mixing Python with other compiled languages. For example, the f2py and PyFort systems provide integration with FORTRAN code, and other tools provide access to languages such as Delphi and Objective-C. The PyObjC project, for instance, aims to provide a bridge between Python and Objective-C, the most important use of which is writing Cocoa GUI applications on Mac OS X in Python. Search the Web for details on other language-integration tools.
23.7.6. Network-Based Integration Protocols

Finally, there is also support in the Python world for Internet-based data transport protocols, including SOAP, XML-RPC, and even basic HTTP. Some of these support the notion of Python as an implementation language for web services. These are distributed models, generally designed for integration across a network, rather than in-process calls. XML-RPC is supported by a standard library module in Python, but search the Web for more details on these protocols.
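As a quick taste of that standard library support, the following sketch registers a function with an XML-RPC server and calls it through a client proxy in the same process. Module names here are those of Python 3 (in the 2.x line covered by this book, they were SimpleXMLRPCServer and xmlrpclib), and the one-request server thread is just a demo convenience:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer   # SimpleXMLRPCServer in 2.x
from xmlrpc.client import ServerProxy          # xmlrpclib in 2.x

server = SimpleXMLRPCServer(('localhost', 0), logRequests=False)
port = server.server_address[1]                # port 0 = let the OS choose
server.register_function(lambda x, y: x + y, 'add')

thread = threading.Thread(target=server.handle_request)  # serve one request
thread.start()

proxy = ServerProxy('http://localhost:%d' % port)
result = proxy.add(2, 3)                       # marshalled over HTTP/XML
thread.join()
server.server_close()
print(result)                                  # prints 5
```

In a real deployment, the server and client would of course run in separate processes, often on separate machines; the protocol itself is language neutral.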
23.7.7. Integration Versus Optimization

Given so many integration options, choosing among them can be puzzling. For instance, when should you choose something like COM over writing C extension modules? As usual, it depends on why you're interested in mixing external components into your Python programs in the first place.

Basically, frameworks such as Jython, IronPython, COM, and CORBA allow Python scripts to leverage existing libraries of software components, and they do a great job of addressing goals such as code reuse and integration. However, they say almost nothing about optimization: integrated components are not necessarily faster than the Python equivalents. On the other hand, Python extension modules and types coded in a compiled language such as C serve two roles: they too can be used to integrate existing components, but they also tend to be a better approach when it comes to boosting system performance. In closing, here are a few words of context.
23.7.7.1. Framework roles

Frameworks such as COM and CORBA can perhaps be understood as alternatives to the Python/C integration techniques we met in this part of the book. For example, packaging Python logic as a COM server makes it available for something akin to embedding: many languages (including C) can access it using the COM client-side interfaces we met in Chapter 18. And as we saw earlier, Jython allows Java to embed and run Python code and objects through a Java class interface.

Furthermore, frameworks allow Python scripts to use existing component libraries: standard Java class libraries in Jython, COM server libraries on Windows, and so on. In such a role, the external libraries exposed by such frameworks are more or less analogous to Python extension modules. For instance, Python scripts that use COM client interfaces to access an external object are acting much like importers of C extension modules (albeit through the COM indirection layer).
23.7.7.2. Extension module roles
Python's C API is designed to serve in many of the same roles. As we've seen, C extension modules can serve as code reuse and integration tools too; it's straightforward to plug existing C and C++ libraries into Python with SWIG. In most cases, we simply generate and import the glue code created with SWIG to make almost any existing compiled library available for use in Python scripts. In fact, as we saw in the preceding chapter, it's so easy to plug in libraries with SWIG that extensions are usually best coded first as simple C/C++ libraries and later wrapped for use in Python with SWIG. Adding a COM layer to an existing C library may or may not be as straightforward, but it will clearly be less portable; COM is currently a Windows-only technology. Moreover, Python's embedding API allows other languages to run Python code, much like client-side interfaces in COM.

One of the primary reasons for writing C extension modules in the first place, though, is optimization: key parts of Python applications may be implemented or recoded as C or C++ extension modules to speed up the system at large (as in the last chapter's stack examples). Moving such components to compiled extension modules not only improves system performance, but also is completely seamless: module interfaces in Python look the same no matter what programming language implements the module.
23.7.7.3. Picking an integration technology

By contrast, Jython, COM, and CORBA do not deal directly with optimization goals at all; they serve only to integrate. For instance, Jython allows Python scripts to automatically access Java libraries, but it generally mandates that non-Python extensions be coded in the Java language, which is itself usually interpreted and is no speed demon. COM and CORBA focus on the interfaces between components and leave the component implementation language ambiguous by design. Exporting a Python class as a COM server, for instance, can make its tools widely reusable on Windows but has little to do with performance improvement.

Because of their different focus, frameworks are not quite replacements for the more direct Python/C extension modules and types we've studied in these last two chapters, and they are less direct (and hence likely slower) than Python's C embedding API. It's possible to mix and match approaches, but the combinations are rarely any better than their parts. For example, although C libraries can be added to Java with its native call interface, it's neither a secure nor a straightforward undertaking. And while C libraries can also be wrapped as COM servers to make them visible to Python scripts on Windows, the end result will probably be slower and no less complex than a more directly linked-in Python extension module.

As you can see, there are a lot of options in the integration domain. Perhaps the best parting advice I can give you is simply that different tools are meant for different tasks. C extension modules and types are ideal at optimizing systems and integrating libraries, but frameworks offer other ways to integrate components: Jython for mixing in Java tools, COM for reusing and publishing objects on Windows, and so on. As always, your mileage may vary.
Part VIII: The End

This last part of the book wraps up with a single short chapter:
Chapter 24, Conclusion: Python and the Development Cycle

This chapter discusses Python's roles and scope. It returns to some of the broader ideas that were introduced in Chapter 1, with the added perspective afforded by the rest of the book. Much of this chapter is philosophical in nature, but it underscores some of the main reasons for using a tool like Python.

Note that there are no reference appendixes here. For additional reference resources, consult the Python standard manuals available online, or commercially published reference books such as O'Reilly's Python Pocket Reference, by Mark Lutz, and Python in a Nutshell, by Alex Martelli, as well as Python Essential Reference, by David M. Beazley (Sams). For additional Python core language material, see O'Reilly's Learning Python, by Mark Lutz. And for help on other Python-related topics, see the resources available at Python's official web site, http://www.python.org, or search the Web using your favorite web search engine.
Chapter 24. Conclusion: Python and the Development Cycle

Section 24.1. "That's the End of the Book, Now Here's the Meaning of Life"
Section 24.2. "Something's Wrong with the Way We Program Computers"
Section 24.3. The "Gilligan Factor"
Section 24.4. Doing the Right Thing
Section 24.5. Enter Python
Section 24.6. But What About That Bottleneck?
Section 24.7. On Sinking the Titanic
Section 24.8. So What's "Python: The Sequel"?
Section 24.9. In the Final Analysis . . .
Section 24.10. Postscript to the Second Edition (2000)
Section 24.11. Postscript to the Third Edition (2006)
24.1. "That's the End of the Book, Now Here's the Meaning of Life" Well, the meaning of Python, anyway. In the introduction to this book, I promised that we'd return to the issue of Python's roles after seeing how it is used in practice. So in closing, here are some subjective comments on the broader implications of the language. Most of this conclusion remains unchanged since the first edition of this book was penned in 1995, but so are the factors that pushed Python into the development spotlight. As I mentioned in the first chapter, Python's focus is on concepts such as quality, productivity, portability, and integration. I hope that this book has demonstrated some of the benefits of that focus in action. Along the way, we've seen Python applied to systems programming, GUI development, Internet scripting, database and text processing, and more. And we've witnessed firsthand the application of the language to realistically scaled tasks. I hope you've also had some fun; that, too, is part of the Python story. In this conclusion, I wish now to return to the forest after our long walk among the treesto revisit Python's roles in more concrete terms. In particular, Python's role as a prototyping tool can profoundly affect the development cycle.
24.2. "Something's Wrong with the Way We Program Computers" This has to be one of the most overused lines in the business. Still, given time to ponder the big picture, most of us would probably agree that we're not quite "there" yet. Over the last few decades, the computer software industry has made significant progress on streamlining the development task (anyone remember dropping punch cards?). But at the same time, the cost of developing potentially useful computer applications is often still high enough to make them impractical. Moreover, systems built using modern tools and paradigms are often delivered far behind schedule. Software engineering remains largely defiant of the sort of quantitative measurements employed in other engineering fields. In the software world, it's not uncommon to take one's best time estimate for a new project and multiply by a factor of two or three to account for unforeseen overheads in the development task. This situation is clearly unsatisfactory for software managers, developers, and end users.
24.3. The "Gilligan Factor" It has been suggested, tongue in cheek, that if there were a patron saint of software engineers, the honor would fall on none other than Gilligan, the character in the pervasively popular American television show of the 1960s, Gilligan's Island. Gilligan is the enigmatic, sneaker-clad first mate, widely held to be responsible for the shipwreck that stranded the now-residents of the island. To be sure, Gilligan's situation seems oddly familiar. Stranded on a desert island with only the most meager of modern technological comforts, Gilligan and his cohorts must resort to scratching out a living using the resources naturally available. In episode after episode, we observe the Professor developing exquisitely intricate tools for doing the business of life on their remote island, only to be foiled in the implementation phase by the ever-bungling Gilligan. But clearly it was never poor Gilligan's fault. How could one possibly be expected to implement designs for such sophisticated applications as home appliances and telecommunications devices, given the rudimentary technologies available in such an environment? He simply lacked the proper tools. For all we know, Gilligan may have had the capacity for engineering on the grandest level. But you can't get there with bananas and coconuts. And pathologically, time after time, Gilligan wound up inadvertently sabotaging the best of the Professor's plans: misusing, abusing, and eventually destroying his inventions. If he could just pedal his makeshift stationary bicycle faster and faster (he was led to believe), all would be well. But in the end, inevitably, the coconuts were sent hurling into the air, the palm branches came crashing down around his head, and poor Gilligan was blamed once again for the failure of the technology. Dramatic though this image may be, some observers would consider it a striking metaphor for the software industry. 
Like Gilligan, we software engineers are often asked to perform tasks with arguably inappropriate tools. Like Gilligan, our intentions are sound, but technology can hold us back. And like poor Gilligan, we inevitably must bear the brunt of management's wrath when our systems are delivered behind schedule. You can't get there with bananas and coconuts . . . .
24.4. Doing the Right Thing

Of course, the Gilligan factor is an exaggeration, added for comic effect. But few would argue that the bottleneck between ideas and working systems has disappeared completely. Even today, the cost of developing software far exceeds the cost of computer hardware. And when software is finally delivered, it often comes with failure rates that would be laughable in other engineering domains. Why must programming be so complex?

Let's consider the situation carefully. By and large, the root of the complexity in developing software isn't related to the role it's supposed to perform; usually this is a well-defined, real-world process. Rather, it stems from the mapping of real-world tasks onto computer-executable models. And this mapping is performed in the context of programming languages and tools.

The path toward easing the software bottleneck must therefore lie, at least partially, in optimizing the act of programming itself by deploying the right tools. Given this realistic scope, there's much that can be done now; there are a number of purely artificial overheads inherent in our current tools.
24.4.1. The Static Language Build Cycle

Using traditional static languages, there is an unavoidable overhead in moving from coded programs to working systems: compile and link steps add a built-in delay to the development process. In some environments, it's common to spend many hours each week just waiting for a static language application's build cycle to finish. Given that modern development practice involves an iterative process of building, testing, and rebuilding, such delays can be expensive and demoralizing (if not physically painful).

Of course, this varies from shop to shop, and in some domains the demand for performance justifies build-cycle delays. But I've worked in C++ environments where programmers joked about having to go to lunch whenever they recompiled their systems. Except they weren't really joking.
24.4.2. Artificial Complexities

With many traditional programming tools, you can easily lose focus: the very act of programming becomes so complex that the real-world goal of the program is obscured. Traditional languages divert valuable attention to syntactic issues and development of bookkeeping code. Obviously, complexity isn't an end in itself; it must be clearly warranted. Yet some of our current tools are so complex that the language itself makes the task harder and lengthens the development process.
24.4.3. One Language Does Not Fit All

Many traditional languages implicitly encourage homogeneous, single-language systems. By making integration complex, they impede the use of multiple-language tools. As a result, instead of being able to select the right tool for the task at hand, developers are often compelled to use the same language for every component of an application. Since no language is good at everything, this constraint inevitably sacrifices both product functionality and programmer productivity.
Until our machines are as clever at taking directions as we are (arguably, not the most rational of goals), the task of programming won't go away. But for the time being, we can make substantial progress by making the mechanics of that task easier. This topic is what I want to talk about now.
24.5. Enter Python

If this book has achieved its goals, you should by now have a good understanding of why Python has been called a "next-generation scripting language." Compared with similar tools, it has some critical distinctions that we're finally in a position to summarize:
Tcl Like Tcl, Python can be used as an embedded extension language. Unlike Tcl, Python is also a full-featured programming language. For many, Python's data structure tools and support for programming-in-the-large make it useful in more domains. Tcl demonstrated the utility of integrating interpreted languages with C modules. Python provides similar functionality plus a powerful, object-oriented language; it's not just a command string processor.
Perl Like Perl, Python can be used for writing shell tools, making it easy to use system services. Unlike Perl, Python has a simple, readable syntax and a remarkably coherent design. For some, this makes Python easier to use and a better choice for programs that must be reused or maintained by others. Without question, Perl is a powerful system administration tool. But once we move beyond processing text and files, Python's features become attractive.
Scheme/Lisp Like Scheme (and Lisp), Python supports dynamic typing, incremental development, and metaprogramming; it exposes the interpreter's state and supports runtime program construction. Unlike Lisp, Python has a procedural syntax that is familiar to users of mainstream languages such as C and Pascal. If extensions are to be coded by end users, this can be a major advantage.
Smalltalk Like Smalltalk, Python supports object-oriented programming (OOP) in the context of a highly dynamic language. Unlike Smalltalk, Python doesn't extend the object system to include fundamental program control flow constructs. Users need not come to grips with the concept of if statements as message-receiving objects in order to use Python; Python is more conventional.
Icon Like Icon, Python supports a variety of high-level datatypes and operations such as lists, dictionaries, and slicing. Unlike Icon, Python is fundamentally simple. Programmers (and end users) don't need to master esoteric concepts such as backtracking just to get started.
BASIC
Like modern structured BASIC dialects, Python has an interpretive/interactive nature. Unlike most BASICs, Python includes standard support for advanced programming features such as classes, modules, exceptions, high-level datatypes, and general C integration. And unlike Visual Basic, Python provides a cross-platform solution, which is not controlled by a commercially vested company.
Java Like Java, Python is a general-purpose language, supports OOP, exceptions, and modular design, and compiles to a portable bytecode format. Unlike Java, Python's simple syntax and built-in datatypes make development much more rapid. Python programs are typically one-third to one-fifth the size of equivalent Java programs.
C/C++ Like C and C++, Python is a general-purpose language and can be used for long-term strategic system development tasks. Unlike compiled languages in general, Python also works well in tactical mode, as a rapid development language. Python programs are smaller, simpler, and more flexible than those written in compiled languages. For instance, because Python code does not constrain datatypes or sizes, it is both more concise and applicable in a broader range of contexts. (For more on this comparison, see the sidebar "Why Not Just Use C or C++?" in Chapter 1.) All of these languages (and others) have merit and unique strengths of their own; in fact, Python borrowed most of its features from languages such as these. It's not Python's goal to replace every other language; different tasks require different tools, and mixed-language development is one of Python's main ideas. But Python's blend of advanced programming constructs and integration tools makes it a natural choice for the problem domains we've talked about in this book, and many more.
24.6. But What About That Bottleneck? Back to our original question: how can the act of writing software be made easier? At some level, Python is really "just another computer language." It's certainly true that Python the language doesn't represent much that's radically new from a theoretical point of view. So why should we be excited about Python when so many languages have been tried already? What makes Python of interest, and what may be its larger contribution to the development world, is not its syntax or semantics, but its world view: Python's combination of tools makes rapid development a realistic goal. In a nutshell, Python fosters rapid development by providing features like these:

Fast build-cycle turnaround

A very high-level, object-oriented language

Integration facilities to enable mixed-language development

Specifically, Python attacks the software development bottleneck on four fronts, described in the following sections.
24.6.1. Python Provides Immediate Turnaround Python's development cycle is dramatically shorter than that of traditional tools. In Python, there are no compile or link steps; Python programs simply import modules at runtime and use the objects they contain. Because of this, Python programs run immediately after changes are made. And in cases where dynamic module reloading can be used, it's even possible to change and reload parts of a running program without stopping it at all. Figure 24-1 shows Python's impact on the development cycle.
Figure 24-1. Development cycles
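The dynamic reloading just described can be sketched with the standard importlib module. In this demo, the module name "handler", its contents, and the temporary directory are all invented for illustration; any module on sys.path reloads the same way:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True          # skip .pyc caching for this demo

# A hypothetical module written to disk so we can edit and reload it.
workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
mod_file = pathlib.Path(workdir) / "handler.py"

mod_file.write_text("def reply():\n    return 'version 1'\n")
importlib.invalidate_caches()           # make the new file importable
import handler
print(handler.reply())                  # version 1

# Edit the source, then reload it without stopping the process.
mod_file.write_text("def reply():\n    return 'version 2'\n")
importlib.reload(handler)
print(handler.reply())                  # version 2
```

A long-running server or GUI can use the same mechanism to pick up code changes on the fly, though reload has semantics worth studying before relying on it in production (existing object references keep the old code, for example).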
Because Python is interpreted, there's a rapid turnaround after program changes. And because Python's parser is embedded in Python-based systems, it's easy to modify programs at runtime. For example, we saw how GUI programs developed with Python allow developers to change the code that handles a button press while the GUI remains active; the effect of the code change may be observed immediately when the button is pressed again. There's no need to stop and rebuild. More generally, the entire development process in Python is an exercise in rapid prototyping. Python lends itself to experimental and interactive program development, and it encourages developing systems incrementally by testing components in isolation and putting them together later. In fact, we've seen that we can switch from testing components (unit tests) to testing whole systems (integration tests) arbitrarily, as illustrated in Figure 24-2.
Figure 24-2. Incremental development
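One common way to support that free movement between unit and integration testing is the self-test idiom: a module runs its tests when executed as a script, but stays quiet when imported as a component of a larger system. A minimal sketch, with a made-up parse_record function standing in for a real component:

```python
# A component with a built-in self-test: run this file directly to unit
# test it in isolation, or import it from a larger system unchanged.

def parse_record(line):
    """Split 'name:age' text into a (name, int) pair."""
    name, age = line.split(':')
    return name.strip(), int(age)

def selftest():
    assert parse_record('Brian: 33') == ('Brian', 33)
    print('parse_record ok')

if __name__ == '__main__':      # true only when run as a script
    selftest()
```

The same file serves both roles: "python thisfile.py" exercises the component alone, while "import" reuses it silently during integration testing.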
24.6.2. Python Is "Executable Pseudocode" Python's very high-level nature means there's less for us to program and manage. The lack of compile and link steps isn't really enough to address the development-cycle bottleneck by itself. For instance, a C or C++ interpreter might provide fast turnaround but would still be almost useless for rapid development: the language is too complex and low level. But because Python is also a simple language, coding is dramatically faster too. For example, its dynamic typing, built-in objects, and garbage collection eliminate much of the manual bookkeeping code required in lower-level languages such as C and C++. Since things such as type declarations, memory management, and common data structure implementations are conspicuously absent, Python programs are typically a fraction of the size of their C and C++ equivalents. There's less to write and read, and there are fewer interactions among language components, and thus there is less opportunity for coding errors. Because most bookkeeping code is missing, Python programs are easier to understand and more closely reflect the actual problem they're intended to address. And Python's high-level nature not only allows algorithms to be realized more quickly, but also makes it easier to learn the language.
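As a small, hypothetical illustration of that point: a word-frequency counter that would require type declarations, a hash table implementation, and manual memory management in C shrinks to a few lines of Python, because the built-in dictionary and string objects do the bookkeeping:

```python
# Count word frequencies: the dictionary and string objects handle the
# storage management that C code would have to implement by hand.
text = "the quick brown fox jumps over the lazy dog the end"

counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

print(counts['the'])    # 3
```

The program reads almost like a description of the task itself, which is what "executable pseudocode" means in practice.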
24.6.3. Python Is OOP Done Right For OOP to be useful, it must be easy to apply. Python makes OOP a flexible tool by delivering it in a dynamic language. More importantly, its class mechanism is a simplified subset of C++'s; this simplification is what makes OOP useful in the context of a rapid-development tool. For instance, when we looked at data structure classes in this book, we saw that Python's dynamic typing let us apply a single class to a variety of object types; we didn't need to write variants for each supported type. In exchange for not constraining types, Python code becomes flexible and agile. In fact, Python's OOP is so easy to use that there's really no reason not to apply it in most parts of an application. Python's class model has features powerful enough for complex programs, yet because they're provided in simple ways, they do not interfere with the problem we're trying to solve.
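A minimal sketch of that idea, using a made-up Stack class: dynamic typing lets one class definition serve objects of any type, with no per-type variants:

```python
class Stack:
    """One class works for any object type; no per-type variants needed."""
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()

ints = Stack()
ints.push(1)
ints.push(2)

strings = Stack()
strings.push('spam')
strings.push('eggs')

print(ints.pop())       # 2
print(strings.pop())    # eggs
```

The equivalent in a statically typed language would need either a separate class per element type or a generics mechanism; here the single definition simply works for both.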
24.6.4. Python Fosters Hybrid Applications As we've seen earlier in this book, Python's extending and embedding support makes it useful in mixed-language systems. Without good integration facilities, even the best rapid-development language is a "closed box" and is not generally useful in modern development environments. But Python's integration tools make it usable in hybrid, multicomponent applications. As one consequence, systems can simultaneously utilize the strengths of Python for rapid development and of traditional languages such as C for rapid execution. While it's possible and common to use Python as a standalone tool, it doesn't impose this mode. Instead, Python encourages an integrated approach to application development. By supporting arbitrary mixtures of Python and traditional languages, Python fosters a spectrum of development paradigms, ranging from pure prototyping to pure efficiency. Figure 24-3 shows the abstract case.
Figure 24-3. The development mode "slider"
As we move to the left extreme of the spectrum, we optimize speed of development. Moving to the right side optimizes speed of execution. And somewhere in between is an optimum mix for any given project. With Python, not only can we pick the proper mix for our project, but we can also later move the RAD slider in the picture arbitrarily as our needs change:
Going to the right Projects can be started on the left end of the scale in Python and gradually moved toward the right, module by module, as needed to optimize performance for delivery.
Going to the left Similarly, we can move strategic parts of existing C or C++ applications on the right end of the scale to Python, to support end-user programming and customization on the left end of the scale. This flexibility of development modes is crucial in realistic environments. Python is optimized for speed of development, but that alone isn't always enough. By themselves, neither C nor Python is adequate to address the development bottleneck; together, they can do much more. As shown in Figure 24-4, for instance, apart from standalone use, one of Python's most common roles splits systems into frontend components that can benefit from Python's ease of use and backend modules that require the efficiency of static languages such as C, C++, or FORTRAN.
Figure 24-4. Hybrid designs
Whether we add Python frontend interfaces to existing systems or design them in early on, such a division of labor can open up a system to its users without exposing its internals. When developing new systems, we also have the option of writing entirely in Python at first and then optimizing as needed for delivery by moving performance-critical components to compiled languages. And because Python and C modules look the same to clients, migration to compiled extensions is transparent. Prototyping doesn't make sense in every scenario. Sometimes splitting a system into a Python frontend and a C/C++ backend up front works best. And prototyping doesn't help much when enhancing existing systems. But where it can be applied, early prototyping can be a major asset. By prototyping in Python first, we can show results more quickly. Perhaps more critically, end users can be closely involved in the early stages of the process, as sketched in Figure 24-5. The result is systems that more closely reflect their original requirements.
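The interface transparency noted above can be shown with a toy example: client code calls a pure-Python stand-in and CPython's C-coded math.sqrt through exactly the same interface, so moving a component to compiled code requires no client changes. The py_sqrt function here is an invented Newton's-method sketch, not code from the book:

```python
import math     # math.sqrt is implemented in C in standard CPython

def py_sqrt(x, eps=1e-12):
    """Pure-Python prototype of sqrt, via Newton's method."""
    guess = x or 1.0
    while abs(guess * guess - x) > eps:
        guess = (guess + x / guess) / 2.0
    return guess

# The client loop is identical for the prototype and the C version:
for sqrt in (py_sqrt, math.sqrt):
    print(round(sqrt(2.0), 6))      # 1.414214 both times
```

Swapping py_sqrt for math.sqrt (or for a custom C extension with the same name) is invisible to callers, which is what makes the "prototype in Python, optimize in C" migration path practical.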
Figure 24-5. Prototyping with Python
24.7. On Sinking the Titanic In short, Python is really more than a language; it implies a development philosophy. The concepts of prototyping, rapid development, and hybrid applications certainly aren't new. But while the benefits of such development modes are widely recognized, there has been a lack of tools that make them practical without sacrificing programming power. This is one of the main gaps that Python's design fills: Python provides a simple but powerful rapid development language, along with the integration tools needed to apply it in realistic development environments. This combination arguably makes Python unique among similar tools. For instance, Tcl is a good integration tool but not a full-blown language; Perl is a powerful system administration language but a weak integration tool. But Python's marriage of a powerful dynamic language and integration opens the door to fundamentally faster development modes. With Python, it's no longer necessary to choose between fast development and fast execution. By now, it should be clear that a single programming language can't satisfy all our development goals. In fact, our needs are sometimes contradictory: the goals of efficiency and flexibility will probably always clash. Given the high cost of making software, the choice between development and execution speed is crucial. Although machine cycles are cheaper than programmers, we can't yet ignore efficiency completely. But with a tool like Python, we don't need to decide between the two goals at all. Just as a carpenter wouldn't drive a nail with a chainsaw, software engineers are now empowered to use the right tool for the task at hand: Python when speed of development matters, compiled languages when efficiency dominates, and combinations of the two when our goals are not absolute. Moreover, we don't have to sacrifice code reuse or rewrite exhaustively for delivery when applying rapid development with Python. 
We can have our rapid development cake and eat it too:
Reusability Because Python is a high-level, object-oriented language, it encourages writing reusable software and well-designed systems.
Deliverability Because Python is designed for use in mixed-language systems, we don't have to move to more efficient languages all at once. In many scenarios, a system's frontend and infrastructure may be written in Python for ease of development and modification, but the kernel is still written in C or C++ for efficiency. Python has been called the tip of the iceberg in such systems: the part visible to end users of a package, as captured in Figure 24-6.
Figure 24-6. "Sinking the Titanic" with mixed-language systems
Such an architecture uses the best of both worlds: it can be extended by adding more Python code or by writing C extension modules, depending on performance requirements. But this is just one of many mixed-language development scenarios:
System interfaces Packaging libraries as Python extension modules makes them more accessible.
End-user customization Delegating logic to embedded Python code provides for onsite changes.
Pure prototyping Python prototypes can be moved to C all at once or piecemeal.
Legacy code migration Moving existing code from C to Python makes it simpler and more flexible.
Standalone use Of course, using Python all by itself leverages its existing library of tools. Python's design lets us apply it in whatever way makes sense for each project.
24.8. So What's "Python: The Sequel"? As we've seen in this book, Python is a multifaceted tool, useful in a wide variety of domains. What can we say about Python to sum up here? In terms of some of its best attributes, the Python language is:

General purpose
Object-oriented
Interpreted
Very high level
Openly designed
Widely portable
Freely available
Refreshingly coherent

Python is useful for both standalone development and extension work, and it is optimized to boost developer productivity on many fronts. But the real meaning of Python is really up to you, the reader. Since Python is a general-purpose tool, what it "is" depends on how you choose to use it.
24.9. In the Final Analysis . . . I hope this book has taught you something about Python, both the language and its roles. Beyond this text, there is really no substitute for doing some original Python programming. Be sure to grab a reference source or two to help you along the way. The task of programming computers will probably always be challenging. Perhaps happily, there will continue to be a need for intelligent software engineers, skilled at translating real-world tasks into computer-executable form, at least for the foreseeable future. After all, if it were too easy, none of us would get paid. But current development practice and tools make our tasks unnecessarily difficult: many of the obstacles faced by software developers are purely artificial. We have come far in our quest to improve the speed of computers; the time has come to focus our attention on improving the speed of development. In an era of constantly shrinking schedules, productivity must be paramount. Python, as a mixed-paradigm tool, has the potential to foster development modes that simultaneously leverage the benefits of rapid development and of traditional languages. While Python won't solve all the problems of the software industry, it offers hope for making programming simpler, faster, and at least a little more enjoyable. It may not get us off that island altogether, but it sure beats bananas and coconuts.
24.10. Postscript to the Second Edition (2000) One of the luxuries of updating a book like this is that you get an opportunity to debate yourself, or at least your opinions, from years past. With the benefit of five years' retrospect, I'd like to add a few comments to the original conclusion.
24.10.1. Integration Isn't Everything The conclusion for this book's first edition stressed the importance of Python's role as an integration tool. Although the themes underscored there are still valid, I should point out that not all Python applications rely explicitly on the ability to be mixed with components written in other languages. Many developers now use Python in standalone mode, either not having or not noticing integration layers. For instance, developers who code Common Gateway Interface (CGI) Internet scripts with Python often code in pure Python. Somewhere down the call chain, C libraries are called (to access sockets, databases, and so on), but Python coders often don't need to care. In fact, this has proven to be true in my own recent experience as well. While working on the new GUI, system, and Internet examples for this edition, I worked purely in Python for long periods of time. A few months later I also worked on a Python/C++ integration framework, but this integration project was entirely separate from the pure Python book examples programming effort. Many projects are implemented in Python alone. That is not to say that Python's integration potential is not one of its most profound attributes; indeed, most Python systems are composed of combinations of Python and C. However, in many cases, the integration layer is implemented once by a handful of advanced developers, while others perform the bulk of the programming in Python alone. If you're fortunate enough to count yourself among the latter group, Python's overall ease of use may seem more crucial than its integration role.
24.10.2. The End of the Java Wars In 1995, the Python community perceived a conflict between Java and Python in terms of competition for developer mindshare; hence the sidebar "Python Versus Java: Round 1?" in the first edition. Since then, this has become virtually a nonissue; I've even deleted this sidebar completely. This cooling of hostilities has come about partly because Java's role is now better understood: Java is recognized as a systems development language, not as a scripting language. That is essentially what the sidebar proposed. Java's complexity is on the order of C++'s (from which it is derived), making it impractical for scripting work, where short development cycles are at a premium. This is by design: Java is meant for tasks where the extra complexity may make sense. Given the great disparity in their roles, the Python/Java conflict has fizzled. The truce has also been called on account of the new Jython implementation of Python. Jython was described in Chapter 18; in short, it integrates Python and Java programs such that applications can be developed as hybrids: parts can be coded in Python when scripting is warranted and in Java for performance-intensive parts. This is exactly the argument made for C/C++ integration in the conclusion of the first edition; thanks to Jython, the same reasoning behind hybrid systems now applies to Java-based applications.
The claims made by the old Java sidebar are still true: Python is simpler, more open, and easier to learn and apply. But that is as it should be: as a scripting language, Python naturally complements systems languages such as Java and C++ instead of competing with them. There are still some who would argue that Python is better suited for many applications now coded in Java. But just as for Python and C and C++, Python and Java seem to work best as a team. It's also worth noting that as I write these words, Microsoft has just announced a new, proprietary language called C# that seems to be intended as a substitute for Java in Microsoft's systems language offerings. Moreover, a new Python port to the C#/.NET environment has been announced as well. See Chapter 18 for details; this port is roughly to C# what Jython is to Java. Time will tell whether C# and Java will do battle for mindshare. But given that Python integrates with both, the outcome of these clashes between megacompanies is largely irrelevant; Pythonistas can watch calmly from the sidelines this time around.
24.10.3. We're Not Off That Island Yet As I mentioned in the Preface to this edition, Python has come far in the last five years. Companies around the world have adopted it, and Python now boasts a user base estimated at half a million strong. Yet for all the progress, there is still work to be done, both in improving and popularizing Python and in simplifying software development in general. As I travel around the world teaching Python classes at companies and organizations, I still meet many people who are utterly frustrated with the development tools they are required to use in their jobs. Some even change jobs (or careers) because of such frustrations. Even well after the onset of the Internet revolution, development is still harder than it needs to be. On the other hand, I also meet people who find Python so much fun to use, they can't imagine going back to their old ways. They use Python both on and off the job for the pure pleasure of programming. Five years from now, I hope to report that I meet many more people in the latter category than in the former. After all, Guido may have appeared on the covers of Linux Journal and Dr. Dobb's since the first edition of this book, but we still have a bit more work to do before he makes the cover of Rolling Stone.
24.11. Postscript to the Third Edition (2006) And now here I am in the future again, so I get to add a few more words.
24.11.1. Proof of Concept Some 5 years after writing the second edition of this book, and 10 years after the first, perhaps the most obvious thing worth adding to this original conclusion today is proof of concept: Python's success over the years seems validation of the simplicity and mixed-language themes that Python, and this conclusion, originally advocated. By all accounts, Python has been a greater success than most of its pioneers ever imagined. See Chapter 1 for some statistics on this front: Python is now a mainstream language, widely used in very successful projects and organizations, and often in the context of hybrid architectures. In fact, the question today is not who is using Python, but who is not; it shows up in some fashion in virtually every substantial development organization. Moreover, all signs point to continued growth in years to come; as I write these words, Python's popularity is roughly doubling each year. Today I meet many more people than ever before who are able to use Python. Programming may indeed always be a challenge, but Python has shown that the language used, and the mixture of languages used, can greatly reduce the difficulty of that challenge. People enjoy using Python, so much so that many of them would find it difficult to go back to using something as tedious and complex as C++.
24.11.2. Integration Today Another trend that has become clear in recent years is that many people are indeed using Python in a hybrid role. GIS and graphical modeling systems, for example, often generate Python code to render models created in a user interface. And many people still plug C and C++ libraries into their Pythons; for instance, hardware testing is often structured as low-level device libraries made accessible to Python scripts, the classic integration model. Newer systems, such as the IronPython port to the .NET/Mono framework, open up new integration possibilities for the future. I should add again, though, that integration with other components is not required to leverage the flexibility of this language. Many successful Python-based systems are all, or mostly, Python code. Ultimately, every realistic Python program does run linked-in code (even opening a file or network socket invokes a C library function in standard Python), but many systems never integrate user-coded, compiled language libraries. Python code is fast and capable enough to be used for most applications standalone.
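As a small sketch of that classic integration model, the standard ctypes module (added to Python's library in the 2.5 era) can call a compiled C library directly, with no wrapper code to compile. This example assumes a standard Unix libm is available; the declarations shown are ordinary ctypes usage, not code from the book:

```python
import ctypes
import ctypes.util

# Locate and load the C math library; the path varies by platform
# (assumption: a standard Unix libm is installed).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature of sqrt() so ctypes converts values correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(16.0))      # calls compiled C code from Python: 4.0
```

Real device or numeric libraries are wrapped the same way, or with generated bindings (SWIG) or hand-coded extension modules when more control is needed.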
24.11.3. Quality Counts So how, then, does one define Python's contribution to the software field? Perhaps its emphasis on simplicity, on limiting interactions in your code, is at least as important as its integration focus. This theme makes Python much less difficult and error prone to use than other tools, but it also goes to the core of the software engineering task.
Although it is a richly creative endeavor, developing software is fundamentally an act of engineering, not art. Computer programs and paintings, for example, have very different roles, and they should be created in very different ways. In art, freedom of expression is paramount. One creates a painting for purely aesthetic and often very personal purposes, without expecting the next artist to modify or build upon their work later. But as anyone who has worked in this field for even a few years knows only too well, in engineering, unconstrained freedom of expression can be a liability. In engineering, we need predictability, simplicity, and a limited set of possible interactions. In engineering, maintaining and modifying a creation is usually just as important as building it in the first place. And in engineering, making your system complex for the sake of complexity is not a reasonable goal. It doesn't matter how clever a system's code is if it cannot be understood and changed by others. The Python language has always taken the complexity issue head-on, in terms of both syntax and overall design. Its syntax model encourages and enforces readable code, and most of the language follows from a small and simple set of core principles. Python does offer alternatives, but there is usually one obvious way to accomplish a task, and a relatively small set of ways that language features interact. Simplicity, explicitness, and lack of exceptional cases permeate the language's design. That is, magic is frowned on in both the Python language and its community, because magic is simply not good engineering. This is in sharp contrast to many other languages, especially in the scripting realm. And this philosophy is much of what makes Python code easier to use and understand, and it is a large part of what makes Python programmers so productive. If you can understand someone else's code well enough to reuse it, part of your job is already done.
But at the end of the day, something even more profound seems to be at work. After watching Python inspire people for some 13 years, I have come to believe that Python's real legacy, if it is to have one, is just what I stated in the sidebar at the end of this edition's first chapter: it has almost forced software developers to think about quality issues that they may not otherwise have considered. In hindsight, it seems almost deliberate: by attempting to be a "better" tool, Python has stirred developers to consider the very term. And by addressing quality among an ever-larger audience, Python has, in its way, helped to improve the state of the software field at large. Not bad, for a little language from Amsterdam.
A Morality Tale of Perl Versus Python (The following was posted recently to the rec.humor.funny Usenet newsgroup by Larry Hastings, and it is reprinted here with the original author's permission. I don't necessarily condone language wars; OK?) This has been percolating in the back of my mind for a while. It's a scene from The Empire Strikes Back, reinterpreted to serve a valuable moral lesson for aspiring programmers. EXTERIOR: DAGOBAH - DAY With Yoda strapped to his back, Luke climbs up one of the many thick vines that grow in the swamp until he reaches the Dagobah statistics lab. Panting heavily, he continues his exercises: grepping, installing new packages, logging in as root, and writing replacements for two-year-old shell scripts in Python.
YODA: Code! Yes. A programmer's strength flows from code maintainability. But beware of Perl. Terse syntax . . . more than one way to do it . . . default variables. The dark side of code maintainability are they. Easily they flow, quick to join you when code you write. If once you start down the dark path, forever will it dominate your destiny, consume you it will. LUKE: Is Perl better than Python? YODA: No . . . no . . . no. Quicker, easier, more seductive. LUKE: But how will I know why Python is better than Perl? YODA: You will know. When your code you try to read six months from now.
About the Author Mark Lutz is the world leader in Python training, the author of Python's earliest and best-selling texts, and a pioneering figure in the Python community. Mark is also the author of the O'Reilly book Python Pocket Reference, and coauthor of Learning Python, all currently in second or third editions. Involved with Python since 1992, he started writing Python books in 1995 and began teaching Python classes in 1997. As of mid-2006, he has instructed more than 170 Python training sessions. In addition, he holds B.S. and M.S. degrees in computer science from the University of Wisconsin, and over the last two decades has worked on compilers, programming tools, scripting applications, and assorted client/server systems. Whenever Mark gets a break from spreading the Python word, he leads an ordinary, average life in Colorado. Mark can be reached by email at [email protected], or on the Web at http://www.rmi.net/~lutz.
Colophon The animal on the cover of Programming Python is an African rock python, one of approximately 18 species of python. Pythons are nonvenomous constrictor snakes that live in tropical regions of Africa, Asia, Australia, and some Pacific Islands. Pythons live mainly on the ground, but they are also excellent swimmers and climbers. Both male and female pythons retain vestiges of their ancestral hind legs. The male python uses these vestiges, or spurs, when courting a female. The python kills its prey by suffocation. While the snake's sharp teeth grip and hold the prey in place, the python's long body coils around its victim's chest, constricting tighter each time it breathes out. They feed primarily on mammals and birds. Python attacks on humans are extremely rare. The cover image is a 19th-century engraving from the Dover Pictorial Archive. The cover font is Adobe ITC Garamond. The text font is Linotype Birka; the heading font is Adobe Myriad Condensed; and the code font is LucasFont's TheSans Mono Condensed.
Tkinter bound methods callable class objects lambda user-defined callbacks GUIs queues callbacks, select() and Canvas events canvas scrolling canvas widgets scrolling CGI (Common Gateway Interface) HTML and module PYTHONPATH, configuring scripts as callback handlers coding for maintainability converting strings in debugging email browser HTML and 2nd HTMLgen and installing missing/invalid inputs, checking for Python and state information, saving web pages web sites Zope user interaction, adding to web interface cgi module 2nd web pages, parsing user input CGI, scripts cgi.escape() cgi.FieldStorage(), form inputs, mocking up cgi.FieldStoreage() cgi.print_form(), debugging CGI scripts cgi.text(), debugging CGI scripts CGIHTTPServer module 2nd chapter checkbuttons callback handlers command option configuring dialogs Entry widgets Message widgets variable option variables and windows, top-level
checkbuttons, adding to HTML forms child processes exiting from forking servers and classes App 2nd application-specific tool set C/C++ embedding Python code and ppembed API using in Python CollectVisitor components attaching extending container classes standalone DBM and EditVisitor FileVisitor form layout, for FTP client GUI graphs to GUIs reusable components and hierarchy mixin multiple clients, handling with of stored objects, changing OOP alternative pickled objects and ReplaceVisitor SearchVisitor 2nd set shelves and stack StreamApp subclasses protocols superclasses application hierarchies widgets customizing wrappers windows ZODB client function client-side scripting PyMailGUI client/server architecture on the Web clients 2nd COM using servers from 2nd 3rd connecting to closing
establishing email command line interacting with viewing mail multiple, handling with classes with forking servers with multiplexing servers with threading servers path formats scripts CGI, scripts email files, transferring over Internet newsgroups web sites, accessing sending files to socket calls socket programs, running locally spawning in parallel clipboard text and clipping widgets clock example close event intercepting code HTML, escaping URLs embedded in legacy, migration maintainable sharing objects between web pages readable running URLs, escaping code reuse C extension modules and data structures and form layout class GUI calculator PyMailGUI 2nd web forms code strings, embedding Python code and 2nd calling Python objects compiling to bytecodes running in dictionaries with results and namespaces code structure PyMailGUI CollectVisitor class color dialogs user selected
labels columns, summing COM (Component Object Model) clients using servers from 2nd 3rd extensions distributed integration with Python servers GUIDs combo function command line arguments CWD and email client sending from examples GUIs Jython web browsers, launching command-line mode splitting files commands shell commands Common Object Request Broker (CORBA) commonhtml module email, viewing state information in URL parameters, passing comparedirs function comparing directory trees reporting differences compilers compiling C extension files modules code strings in embedded Python code component integration Component Object Model conferences configuring checkbuttons email client PYTHONPATH, CGI scripts and radiobuttons connections client closing establishing reserved ports and database server closing 2nd establishing opening
POP connectivity creating Internet, Python and console shelve interface constants raw strings container classes standalone convertEndlines function converting objects to strings, pickled objects and Python objects to/from C datatypes return values strings in CGI scripts cookies 2nd CORBA (Common Object Request Broker) counting source lines cpfile function cPickle module create_filehandler tool creating connectivity servers Apache Mailman tools for with Python code csh (C shell) ctypes cursor labels custom dialogs customization OOP OOP constructors customizing by users, Python and CWD (current working directory) command line and files and import path and CXX system Cygwin forking processes
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] Dabo data conversion codes data storage, persistent database cursor object Database Management database object databases access connecting to DBM lists Python and server-side SQL tables loading datatypes binary trees C, converting to/from Python objects objects and sets classes functions moving to dictionaries relational algebra, adding to stacks as lists optimizing date/time formatting dbhash module DBM (Database Management) files compatibility of file operations shelves and dbm module deadlocks FTP and deadlocks, pipes debugging CGI scripts deleting email 2nd files, when downloading web sites demos
setup development DHTML (Dynamic Hypertext Markup Language) Dialog module dialogs color user selected custom demo modal custom nonmodal printing results Quit button standard tkFileDialog module _dict_ attribute dictionaries code strings, running in iteration lists nested structures of dictionaries sets as uses diff command Digital Creations dirdiff 2nd directories CGI scripts and glob module lists joining splitting os.listdir os.popen trees searches walking walking directory trees comparing reporting differences copying deleting file permissions fixing files editing matched global replacements Dispatch() distributed programming doc attribute documentation modules
strings Tkinter domain names domains DOS filenames rewriting line ends converting in one file lines converting in one directory converting in tree start command downloads reusing dump(), pickled objects and Dynamic Hypertext Markup Language (DHTML)
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] EditVisitor class education email client command line configuration module interacting with viewing mail composing messages deleting 2nd interactive prompt loading 2nd 3rd mailboxes accessing encapsulating fetches modules unlocking Mailman Message object modules headers parsing parsing messages passwords POP page POP message numbers reading saving 2nd sending embedding Python code 2nd 3rd code strings compiling running in dictionaries running with objects running with results and namespaces precompiling strings to bytecode registering callback handler objects using Python classes in C ppembed API encapsulation inheritance and OOP encrypting passwords end of file Windows
end-of-line characters CGI scripts and Entry widget (Tkinter) input forms modal dialogs programming variables environment settings launching without eval() input expressions security and event-driven programming events binding Tkinter Canvas examples command lines demos distribution packages filenames launchers listings titles running tree uses web-based examples in book security server-side scripts running viewing exceptions C and CGI scripts and sockets and sys module exec statement input expressions Jython security and execfile(), Jython executable pseudocode exits os module processes status codes threads Extensible Markup Language extensions C types 2nd compiling string stacking timing implementations
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] fastset fc command fcntl module features listing fetching shell variables fields labels lists names file permissions directory trees fixing file scanners file tools os module File Transfer Protocol filenames CGI scripts and DOS rewriting examples joining files and files as libraries as scripts binary distinguishing from text files downloading module binary data files client, uploading to web server CWD and DBM compatibility of shelves and deleting, when downloading web sites descriptors directory trees editing in matched downloading 2nd frontend for, adding end of file Windows formatted
persistence and from web server, displaying on client GDBM header, Python HTML, permissions input joining portably usage media portable player tool objects built-in open modes operations output files packing reading file iterators and reading from remote deleting retrieving restricted, accessing on browsers shelve splitting portably usage storing streams redirecting to text distinguishing from binary uploading transferring over Internet downloading frontend for to clients and servers uploading using various means with urllib unpacking FileVisitor class printed list find module find.find tool findFiles function 2nd flat files flushes, pipes fonts labels forked processes child programs forking processes Cygwin (Windows) forking servers
child processes, exiting from zombies killing preventing forks Form class (HTMLgen) format display OOP scripts persistence formatted files persistence and forms web adding input devices to changing hidden fields in inputs, checking for missing/invalid inputs, mocking up laying out with tables reusable selection list on tags Zope and forward-link web pages generating Frame widgets attaching widgets frame-based menus frames GUIs inheritance FTP (File Transfer Protocol) deadlock and files, transferring over Internet downloading mirroring web sites uploading with urllib ftp object quit() retrbinary() 2nd storlines() ftplib module 2nd function calls web browsers, launching functions as published objects C SWIG and comparedirs comparison convertEndlines cpfile findFiles 2nd
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] gadgets setup garbage collection reference count management gdbm module generate() geometry managers pack alternatives to columns, spanning rows, spanning widgets, resizing 2nd 3rd 4th 5th get method scales getfile(), FTP _getitem_ method getpass.getpass(), FTP GIF (Graphics Interchange Format) images, displaying on web pages with HTMLgen GIF files glob module GIL (global interpreter lock) threads threads and Gilligan factor glob module 2nd glob modules GIF files globally unique identifier (GUID) globs gopherlib module Grail browser 2nd graphics on web pages adding displaying with HTMLgen PyDraw Graphics Interchange Format graphs moving to classes searching grep popen *.py grep utility grep.grep grepping greps grid geometry manager
FTP client frontend grids gui1 script GuiInput class GuiMaker BigGui self-test GuiMakerFrameMenu GuiMakerWindowMenu GuiMixin GuiOutput class 2nd GUIs (graphical user interfaces) animation techniques buttons calculator 2nd callback handlers, reloading callbacks canvas widgets checkbuttons adding to HTML forms configuring dialogs Entry widgets Message widgets variables and windows, top-level classes reusable components and command lines frames and FTP client frontend Grail and grids Hello World images inheritance Jython, interface automation listboxes menus 2nd non-GUI code object viewers persistent OOP and pipes programs running queues and radiobuttons adding to HTML forms configuring dialogs Entry widgets Message widgets variables and windows, top-level running
scrollbars ShellGui shelve interface sliders sockets streams to widgets, redirecting text editing threads and 2nd 3rd Tkinter toolbars windows, independent windows, pop GuiStreams example
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] handleClient() threading header files Hello World GUIs and hierarchies applications superclasses high-level languages history of Python holmes system HTML (Hypertext Markup Language) browsing email, complexity of CGI scripts and 2nd 3rd embedding in document templates, Zope and email passwords escaping escaping conventions email passwords embedded URLs text file permissions forms and Grail and hidden input fields, passing state information in HTMLgen and hyperlinks JavaScript embedded in library of tools module passwords tags 2nd tables forms Grail and HTMLgen and tables web pages and HTMLgen GIF images, displaying hyperlinks and PYTHONPATH setting HTMLgen module
htmllib module 2nd HTTP (Hypertext Transfer Protocol) CGI scripts and cookies 2nd module requests, Zope and servers CGI scripts and Python implementations httplib module 2nd hyperlinks CGI and scripts state information encrypted passwords in escaping URLs and HTML and HTMLgen and smart links URLs embedded in 2nd syntax Zope and Hypertext Markup Language Hypertext Transfer Protocol
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] IANA (Internet Assigned Numbers Authority) icons top-level windows identifiers for machines on Internet IDLE Tkinter and IETF (Internet Engineering Task Force) image viewer, PyPhoto images buttons displaying PIL file types supported slideshow program thumbnails PIL Tkinter types PIL IMAP (Internet Message Access Protocol) module imaplib module import path CWD and IndexError exception inheritance encapsulation and frames GUIs OOP initialization function input users input files input/output, redirection installing CGI scripts integration 2nd 3rd COM and Python component CORBA and Python examples extensions, C/C++ embedding Python code Jython and limitations of vs. optimization interactive mode
splitting files interactive prompt, email interfaces load server PyMailGUI os module shelve console and GUI web-based web CGI query strings reply text format urllib web server windows top-level Internet addresses client/server clients examples files, transferring over downloading frontend for to clients and servers uploading various means of with urllib identifiers for machines connected to machine names message formats modules port numbers protocols 2nd message formats modules structures scripting 2nd scripting clients, scripts servers, scripts servers sockets TCP/IP Internet Assigned Numbers Authority (IANA) Internet Engineering Task Force (IETF) Internet Explorer HTML JavaScript embedded in registering Python with Internet Message Access Protocol (IMAP) module Internet Protocol (IP) Internet Service Provider IP (Internet Protocol) IP addresses IPC
bidirectional, pipes and IPC (Inter-Process Communication) 2nd ISPs (Internet Service Providers) Python-friendly iteration dictionaries iterators reading files
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] Java applets classes accessing with Jython testing with Jython end-user customization libraries, Jython and Python and Java virtual machine JavaJython job growth joining files portably usage JVM (Java virtual machine) Jython and Python scripts executed by Jython 2nd 3rd 4th API applets, writing browsers and callback handlers command lines compatibility with Python 2nd components integration and interface automations 2nd Java classes accessing testing Java libraries and object model performance issues Python-to-Java compiler scripts, compared to Java trade-offs in using vs. Python C API Jythonc
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] keyboard input stream redirection and keyboard shortcuts, menus killing top-level windows kwParsing system
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] labels borders color cursor fields font layout padding scales size state lambda callback handlers lambda statement language-based modules Launch_*.py launchBookExamples function LaunchBrowser.py 2nd 3rd Launcher.py 2nd 3rd launching automated program launchers programs without environment settings web browsers command lines and function calls portably launching programs portable framework Windows launchmodes script libraries Java Jython manuals standard library linking scales linking, C extension modules static vs. dynamic binding Linux C extension modules compiling wrapping environment calls end-of-lines, CGI scripts and servers killing processes on web, finding Python on
showinfo() Lisp, Python compared to listboxes adding to HTML forms programming listings examples lists databases dictionaries fields joining records splitting stacks as uses load server interface PyMailGUI load(), pickled objects and loaded modules loadmail module email, viewing POP mail interface loadmail.loadmail()
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] machine names mailbox module Mailman program mailtools initialization file MailFetcher class MailParser class MailSender class MailTool class self-test script _main_ module mainloop method mainloop() recursive calls makefile(), sockets and mapping Tk/Tkinter marshal module matched files, directory trees media files portable player tool Medusa memory threads and menubutton-based menus menus 2nd automation frame-based GuiMaker class keyboard shortcuts menubutton-based separator lines tear-offs top-level windows 2nd message function message widget (Tkinter) messages error, CGI scripts and POP message numbers status suppressing method functions methods bind bound callback handlers get
mainloop OOP augmenting pack readlines registering set Set class stack module/Stack class strings writelines MFC (Microsoft Foundation Classes) mhlib module mimetypes module mirroring web sites mixin classes mod_python package modal dialogs custom Modulator system modules binary data, encoding C extension code reuse and compiling linking string stacking structure of wrapping environment calls CGI scripts creating servers with Medusa with Zope data, encoding Dialog documentation email client configuration fcntl find FTP glob 2nd Gopher HTML HTMLgen IMAP Internet loaded MIME mimetypes network communications os 2nd administrative tools exits interfaces portability
packages pickle persistence and 2nd Python extensions in C/C++, Jython and Queue queue regular expressions servers SGML signal socket stacks as string, as object methods StringIO stderr sys platforms search path versions Telnet thread 2nd threading time translating to C URLs utility for email browser text-processing web pages webbrowser 2nd XML Monty Python theme song multifile module multimedia viewing in browsers multiplexing servers, with select()
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] named pipes 2nd names fields namespaces, running code strings with nesting dictionaries and Network News Transfer Protocol (NNTP) newsgroups newsgroups, accessing NNTP (Network News Transfer Protocol) nntplib module 2nd Nodiffsfound message nonmodal dialogs notes slideshow program
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] Object Linking and Embedding (OLE) object model API Object Request Broker object viewers persistent object-oriented databases (OODBs) object-oriented programming (OOP), Python and objects callable class objects callback handlers callable, embedded Python code and callback handler, registering code strings, running with converting to strings, pickled objects and to/from C datatypes database connection database cursor datatypes and DBM file file objects built-in files, grep utility ftp quit() retrbinary() 2nd storlines() graphs HTMLgen Jython mapping URLs into calls on persistent, shelves and pickled shelves publishing COM and redirecting streams to sequences permutations of reversing 2nd sorting 2nd sharing between web pages socket accept() bind() close() 2nd
connect() listen() recv() 2nd send() 2nd setblocking() socket() 2nd stacks stored, changing classes of Toplevel Zope as database for storing ORB and OLE (Object Linking and Embedding) OODBs (object-oriented databases) OOP (object-oriented programming) behavior classes alternative constructors, customization customization encapsulation format, display GUIs inheritance methods, augmenting persistence Python and structure open modes, files open source nature open source software, compared to commercial operators, stack versus module optimization C extension files performance sets 2nd stacks vs. integration Option menu widget ORB (Object Request Broker) Zope 2nd URLs, mapping into calls on Python objects os module 2nd 3rd administrative tools exits file tools interfaces os path, tools os.environ os.execlp os.fork os.mkdir os.mkfifo os.open os.path.walk os.pipe
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] pack geometry manager alternatives to columns, spanning rows, spanning widgets, resizing 2nd anchor option clipping expansion pack widget method PackDialog class packer geometry manager (Tkinter) packer, FTP client frontend and packing files scrollbars padding, labels painting, PyDraw parallel processing parent processes, forking servers and parse tree interpreter adding to parsers exploring structure parsing binary data, struct module parsing rule strings passwords encrypting escaping in HTML in hyperlinks, encrypted on POP page 2nd PATH, CGI scripts and paths, Unix executable, changing PBF (Python Business Forum) PEP (Python Enhancement Proposal) performance C extension files email browser forking and HTMLgen and CGI and Jython Python profiler scripts integrated with C/C++ reading from files sets
stacks threads and Perl, Python compared to permissions CGI scripts and directory trees fixing HTML files permute() persistence databases formatted files object viewer OOP pickle module per-record files shelves pickled objects Python and scripts format test utilities Persistent class Pexpect, stream buffers PhotoImage object pickle module persistence and per-record files shelves shelve module and pickle.dump() pickle.dumps() pickle.load() pickle.loads() pickle.Pickler() PickleDictionary class Pickler class constraints PIL (Python Imaging Library) image types images displaying thumbnails installation overview Tkinter and ping command pipes anonymous bidirectional IPC deadlocks flushes GUIs named 2nd redirecting streams and
unbuffered streams pipes-nongui.py Playfile.py Pmw, Tkinter and POP (Post Office Protocol) mail interface, utility modules message numbers module passwords 2nd retrieving email 2nd from browser servers, connecting to poplib module 2nd email client poplib.POP3() port numbers clients reserved 2nd client connections and talking to servers portability email browser clients and forking and Jython os module select() and signal handlers and threading and 2nd portable media file player tool position, widgets Post Office Protocol ppembed API 2nd running code strings with running customizable validations running objects with press coverage printing dialog results processes child exiting from forking servers and communicating between exits parent, forking servers and signals zombies killing preventing processing handcoded parsers expression grammar tree interpreter, adding parser generators
rule strings text summing columns productivity, speed of development program launchers, automated programming event-driven sockets client calls server calls programs launching without environment settings socket running locally running remotely starting protocols Internet 2nd message formats modules structures of standards prototyping rapid PSA (Python Software Activity) pseudocode, executable PSP (Python Server Pages) pty, stream buffers Py_BuildValue() 2nd Py_CompileString() Py_DECREF() Py_INCREF() Py_Initialize() Py_XDECREF()/Py_XINCREF() PyArg_Parse() 2nd 3rd PyArg_ParseTuple() PyCalc example components adding buttons to using PyCalc as source code PyClock changes in source code PyDemos 2nd PyDemos.pyw PyDemos2.pyw PyDict_GetItemString() PyDict_New() PyDict_SetItemString() PyDraw source code PyEdit as standalone changes in
dialogs embedded mode menus pop-up model running source code toolbars PyErr_SetString() PyEval_CallObject() PyEval_EvalCode() PyEval_GetBuiltins() PyForm data as code GUI code limitations SQL databases table wrappers utility scripts ZODB databases PyFtpGui PyGadgets 2nd PyGadgets.py 2nd PyGadgets_bar.pyw 2nd PyGTK PyImport_ImportModule() 2nd PyMail console client updates PyMailCGI changes in configuration error pages fetched mail, processing messages composition deleting forwarding reading replying to selecting sending view page passwords, escaping root page security send mail script outside browser utility modules web site PyMailGUI attachments sending viewing changes in code reuse code structure
general purpose GUI pop-ups globals implementation interacting with load server interface mailconfig Main module messages cache manager deleting displaying forwarding loading replying sending viewing offline processing passwords POP message numbering presentation reasons to use running source code starting status messages threading user help text user settings windows list windows message windows multiple wraplines PyModule_GetDict() PyObject type PyObject_GetAttrString() PyObject_SetAttrString() PyPhoto changes in running source code PyPi PyQt Pyrex PyRun_SimpleString() PyRun_String() 2nd Python as executable pseudocode C# compiler C/C++ and integration CGI scripts and changes in compared to other languages 2nd compatibility with Jython 2nd databases and
development cycle embedded-call API embedding features of 2nd growth history of integrating with CORBA Internet connections and Internet uses for Java and OOP and overview persistent data and profiler, performance and registering with Internet Explorer uses web servers, finding on XML support Python 2.4, book updates Python Cheese Shop Python Server Pages (PSP) Python.h header file PythonCard pythoncom.CreateGuid() PythonInterpreter API PythonInterpreter class (Jython) PYTHONPATH CGI scripts and HTMLgen and Pickler class and PythonWin IDE PyToe PyTree example parse trees source code 2nd PyView running source code
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] radiobuttons 2nd callback handlers command option 2nd configuring dialogs Entry widgets HTML forms, adding to Message widgets value attribute variable option variables variables and windows, top-level random module GIF files raw_input() re module 2nd functions re.compile() re.match() re.search() re.split() re.sub() re.subn() read(), Unpickler class readable code reading files, iterators and readlines method readlines() reapChildren() record dictionaries records, lists recursion os.path.walk os.walk redirectedGuiFunc function redirectedGuiFunc() redirectedGuiShellCmd function redirection coding alternatives pipes and print statements reading keyboard input StreamApp class streams to widgets to files
to objects user interaction and using for packing scripts refactoring uploads/downloads reference count management regex module migrating code using Register_Handler() registering methods regression test script regular expressions compiled pattern objects match objects patterns re module vs. string module relational algebra, adding to sets reload() callback handlers and ReplaceVisitor class replies, web interface, text format return values reusable tools reusing downloads/uploads rfc822 module email client rotor module rule strings running code
Index [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z] scales command option labels linking variables scheduled callbacks Scheme, Python compared to scripting Internet 2nd overview scripting languages scripts CGI client-side email files, transferring over Internet newsgroups PyMailGUI web sites, accessing executable Unix execution, context format, persistence Jython, compared to Java page generator, forward link files Python in Java applications regression test script server-side URLs shell commands shelve-based starting programs test, persistence utility, persistence scrollbars packing programming ScrolledText class scrolling thumbnails search_all searches binary search trees directory trees graphs grep utility sys module
SearchVisitor class 2nd
secret module
security
    CGI scripts and
    HTTP servers
    running examples in book
    passwords
        encrypting
        escaping in HTML
        in encrypted hyperlinks
        on POP page 2nd
        PyMailCGI web server
    files, displaying on browsers
select module
select(), servers, multiplexing
sendmail program
sequences
    objects
    permutations of
    reversing 2nd
    sorting 2nd
        comparison functions
serial ports
serialized byte streams
SeriesDocument class (HTMLgen)
server function
server-side databases
server.getfile(), web sites, accessing
server.sendmail()
servers
    asynchronous
    COM GUIDs
    using from clients 2nd 3rd
    connecting to
    creating
        Apache
        Mailman
        tools for
        with Python code
    email
    file
    frontend, adding 2nd
    forking
        multiple clients, handling with
        zombies, killing
        zombies, preventing
    FTP
        closing connection
        opening connection to
    HTTP
        CGI scripts and
    multiple clients, handling with classes
    multiplexing
    POP, connecting to
    scripts
    sending files to
    socket calls
    socket programs, running locally
    threading
    web
        email client and
        finding Python on
        uploading client files to
    Zope services
Set class, methods
set method
setops()
sets
    classes
    functions
    moving to dictionaries
    relational algebra, adding to
SGML (Standard Generalized Markup Language) module
sgmllib module 2nd
shell
    listing commands
    tools
    variables
        changing
        fetching
shell commands
shell variables, faking inputs on forms with
ShellGui
ShellGui class
shellgui module
shelve interface
    console and
    GUI
    web-based
shelve module
    concurrent updates and
    pickle module and
shelves
    constraints
    file operations
    OODBs and
    persistence and
    storage classes
        object types
        objects, changing classes of
signal handlers, zombies, preventing with
signal module
signal.signal(), zombies and
signals
Simple Mail Transfer Protocol
SimpleDocument class (HTMLgen)
SimpleHTTPServer module 2nd
Simplified Wrapper and Interface Generator (SWIG)
SIP
slideshow program
smart links
SMTP (Simple Mail Transfer Protocol)
    date formatting, standard module
    sending mail from browser
smtplib module 2nd
    email, sending from browser
smtplib.SMTP()
SOCK_STREAM variable, socket module
socket module 2nd 3rd
    variables
socket object
    accept()
    bind()
    close() 2nd
    connect()
    listen()
    recv() 2nd
    send() 2nd
    setblocking()
    socket() 2nd
sockets 2nd 3rd 4th
    blocking/unblocking calls
    client
    server
    CGI scripts and
    identifiers for machines
        IP addresses
        machine names
    message formats
    multiplexing servers and
    port numbers
    programming
    programs
        running locally
        running remotely
    select() and
SocketServer module
SocketServer.TCPServer class
sort()
source code
    line counting
    PyClock
    PyEdit
    PyMailGUI
    PyPhoto
    PyView
spam 2nd
speed
speed of development
split()
splitpath()
splitting files
    binary files and
    command-line mode
    interactive mode
    manually closing
    portably
    usage
SQL (Structured Query Language), utility scripts
Stack class
    optimizing performance
stack module, methods
stacks
    as lists
    optimizing
standalone container classes
standard dialogs
Standard Generalized Markup Language (SGML) module
standard library
starting programs
state
    labels
    top-level windows
static binding
static language build cycle
status messages, suppressing
storage
    object types
    persistent 2nd
        DBM
        pickled objects
__str__
StreamApp class, redirection and
streams
    buffers
    Pexpect, pty and
    CGI and
    pickled
    redirecting
        coding alternatives
        pipes and
        print statements
        reading keyboard input
        to files
        to objects
        to widgets
        user interaction and
    unbuffered
string module
    vs. regular expressions
string.atoi()
string.find()
string.join(), text processing
string.replace(), email client
string.split(), text processing
string.strip()
string.upper()
StringIO module, stderr
strings
    converting
        in CGI scripts
        objects to, pickled objects and
    documentation
    methods
    raw
    regular expressions
        compiled pattern objects
        match objects
        patterns
        re module
    rule
struct module, binary data, parsing
Structured Query Language (SQL)
subclasses, protocols
subset()
summing columns
superclasses, application hierarchies
SWIG (Simplified Wrapper and Interface Generator)
    C extension module
    string stack
    C structs
    C variables and constants
    C++ class integration
    wrapping C environment calls
    wrapping C++ classes
synchronization, threads
sys module 2nd
    exceptions
    platforms
    streams
    versions
sys modules, search path
sys path
sys.exit(), vs. os._exit()
sys.modules attribute
sys.stderr, error messages, trapping
sys.stdout, error messages, trapping
system tools
system utilities
systems application domain
Table class (HTMLgen)
tables
    on web pages, adding
    laying out forms
tags 2nd
Tcl language
Tcl/Tk 2nd
TCP objects
TCP/IP, socket module
tear-offs, menus
Telnet
telnetlib module
templates, web pages, forward link files
testing
    GuiMaker self-test
    mixin methods
    persistence
    regression test script
    sequence permutations
    set classes
    set functions
text
    advanced operations
    clipboard and
    composition, inheritance and
    editing
    files
        distinguishing from binary
        uploading
    labels
    processing
        parser generators
        regular expressions
        rule strings
        summing columns
        utilities
    ScrolledText class
    widgets
text editors, PyEdit example
text widget, programming
thread module 2nd
threading module
threads 2nd
    C API
    exits 2nd
    function calls
    GIL and 2nd
    GUIs and 2nd 3rd
    memory and
    performance and
    portability and
    queues
    synchronization
    threading servers
thumbnails
    PIL
    scrollable
    scrolling
tic-tac-toe game
time module
time-slicing
time.sleep()
    client requests
    servers, multiplexing
time.strftime(), time/date formatting
timer module 2nd
titles, top-level windows
Tix, Tkinter and
Tk library
tkFileDialog module
Tkinter 2nd 3rd
    animation techniques
    Button class
    callback handlers, user-defined
    callbacks, protocols
    canvas widgets
    checkbuttons
        adding to HTML forms
        configuring
        dialogs
        Entry widget
        Message widget
        variables and
        windows, top-level
    classes, container
    coding
    coding alternatives
    documentation
    events, binding
    extensions
    FTP client frontend
    Grail and
    grids
    IDLE and
    images
    listboxes
    menus 2nd
    object viewers, persistent
    pack widget method
    packer geometry manager
    PIL and
    Pmw and
    radiobuttons 2nd
        adding to HTML forms
        configuring
        dialogs
        Entry widget
        Message widget
        variables and
        windows, top-level
    scales
    scrollbars
    sliders
    structure
    Tcl/Tk and
    text editing
    Tix and
    toolbars
    variable classes, __del__ destructor
    variables
    widgets
        appearance
        clipping
        configuration
        creating
        customizing with classes
        Entry
        expansion
        Message
        multiple
        Optionmenu
        packing
        positioning
        resizing
toolbars, automation
top-level windows
    geometry
    icons
    killing
    menus 2nd
    state
    titles
Toplevel object
translating
    conversion codes
    Tcl/Tk to Python/Tkinter
trees, walking generically
Trigger_Event()
try/finally statements, mailboxes, unlocking
type declarations, lack of
UDP (User Datagram Protocol), socket module
unbuffered streams
    pipes
User Datagram Protocol (UDP), socket module
unique function
Unix
    end-of-lines, CGI scripts and
    executable path lines, changing
    scripts, executable
    web servers, finding Python on
unpacking files
unparsing
Unpickler class
updates
uploads, reusing
urllib module 2nd
    files, FTPing
    state information in URL parameters
    URLs, escaping
    web sites, accessing
urllib web interface
urllib.quote(), URLs, escaping
urllib.quote_plus(), URLs, escaping
urllib.urlencode()
urllib.urlretrieve()
urlparse module
URLs (Uniform Resource Locators) 2nd
    components of
    minimal
    embedded in hyperlinks
    escaping conventions
    form tags, embedded in
    hardcoded, passing parameters in
    hyperlinks, embedded in
    module parameters, passing
    passing state information
    parsing
    passwords
        encrypted in
        text in, escaping
    web sites, accessing
    Zope
        invoking functions through
        mapping into calls on objects by
user interaction, adding to CGI scripts
users
    groups
    growth of
    input, stream redirection and
uses of Python
utilities
    email browser
    external components
    POP interface
    scripts
    persistence
    text processing
uu module
van Rossum, Guido
variables
    checkbuttons
    radio buttons 2nd
    scales
    shell
        changing
        faking inputs on forms with
        fetching
Vaults of Parnassus
visitors
    copied directory trees
    fixers and
walking
    directories
    trees
Web
    applications, trade-offs
    CGI scripts and
    client/server architecture
    servers
web browsers
    launching
        command lines and
        function calls
        portably
    multimedia, viewing in
web interface
    CGI query strings
    reply text format
    urllib
    web server
web pages
    cgi module, parsing user input
    CGI scripts and
    email
        forwarding
        passwords 2nd
        replying to
        selecting
        sending
        viewing
    forms on
        changing
        hidden fields in
        laying out with tables
        mocking up inputs
        reusable
        selection list
        tags
    forward-link, generating
    graphics on, adding
    HTMLgen modules
    opening, remote servers
    sharing objects between
    tables on, adding
    tags
    templates, forward-link files
    Zope and
web servers
    email client and
    finding Python on
    uploading client files to
    web interface
web sites
    accessing
        httplib module
        urllib module
    downloading
        deleting files when
        from CGI scripts
        email
        root page
    mirroring
    uploading, with subdirectories
    user interaction, adding
    Zope
webbrowser module 2nd
whichdb module
widgets
    canvas, scrolling
    creating, Tkinter
    Frame, attaching widgets
    gridded
    GUIs
    input
        adding to HTML forms
        missing/invalid, checking for
    packing order
    redirecting streams to
    resizing 2nd 3rd
        anchor option
    text editing
    text
    Tkinter
        appearance
        clipping
        configuration
        customizing with classes
        Entry
        expansion
        Message
        multiple
        Optionmenu
        packing
        positioning
        resizing
win32all package 2nd
win32com extensions
Windows
    client requests and
    COM and
    DBM and
    forking servers and
    Internet Explorer
    launching programs
    serial ports on
    server processes, killing
    web scripting extensions
        Active Scripting
        ASP
        COM
windows
    GUI, popup
    independent
    interfaces
    menus, toolbars and
    toolbars, menus and
    wrapper classes
WPY
wrapper classes, windows
write(), Pickler class
writelines method
wxPython
wxWidgets