COMPUTER AIDED MOLECULAR DESIGN: THEORY AND PRACTICE
COMPUTER-AIDED CHEMICAL ENGINEERING
Advisory Editor: R. Gani

Volume 1: Distillation Design in Practice (L.M. Rose)
Volume 2: The Art of Chemical Process Design (G.L. Wells and L.M. Rose)
Volume 3: Computer Programming Examples for Chemical Engineers (G. Ross)
Volume 4: Analysis and Synthesis of Chemical Process Systems (K. Hartmann and K. Kaplick)
Volume 5: Studies in Computer-Aided Modelling, Design and Operation
    Part A: Unit Operations (I. Pallai and Z. Fonyó, Editors)
    Part B: Systems (I. Pallai and G.E. Veress, Editors)
Volume 6: Neural Networks for Chemical Engineers (A.B. Bulsari, Editor)
Volume 7: Material and Energy Balancing in the Process Industries - From Microscopic Balances to Large Plants (V.V. Veverka and F. Madron)
Volume 8: European Symposium on Computer Aided Process Engineering-10 (S. Pierucci, Editor)
Volume 9: European Symposium on Computer Aided Process Engineering-11 (R. Gani and S.B. Jorgensen, Editors)
Volume 10: European Symposium on Computer Aided Process Engineering-12 (J. Grievink and J. van Schijndel, Editors)
Volume 11: Software Architectures and Tools for Computer Aided Process Engineering (B. Braunschweig and R. Gani, Editors)
Volume 12: Computer Aided Molecular Design: Theory and Practice (L.E.K. Achenie, R. Gani and V. Venkatasubramanian, Editors)
COMPUTER-AIDED CHEMICAL ENGINEERING, 12
COMPUTER AIDED MOLECULAR DESIGN: THEORY AND PRACTICE
Edited by
Luke E.K. Achenie
Computer Aided Process and Product Design Lab, Department of Chemical Engineering, University of Connecticut, 191 Auditorium Road, Storrs, CT 06269, USA
Rafiqul Gani
CAPEC, Technical University of Denmark Department of Chemical Engineering Building 229, DK-2800 Lyngby, Denmark
Venkat Venkatasubramanian
Laboratory of Intelligent Process Systems, School of Chemical Engineering, Purdue University, West Lafayette, IN 47907-1283, USA
2003
ELSEVIER
Amsterdam - Boston - London - New York - Oxford - Paris - San Diego - San Francisco - Singapore - Sydney - Tokyo
ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands

© 2003 Elsevier Science B.V. All rights reserved.

This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use:

Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use. Permissions may be sought directly from Elsevier Science via their homepage (http://www.elsevier.com) by selecting 'Customer support' and then 'Permissions'. Alternatively you can send an e-mail to: permissions@elsevier.com, or fax to: (+44) 1865 853333. In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 207 631 5555; fax: (+44) 207 631 5500. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.
Electronic Storage or Usage Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Global Rights Department, at the fax and e-mail addresses noted above. Notice No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
First edition 2003 Library of Congress Cataloging in Publication Data A catalog record from the Library of Congress has been applied for.
British Library Cataloguing in Publication Data A catalogue record from the British Library has been applied for.
ISBN: 0-444-51283-7
ISSN: 1570-7946 (Series)
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
Preface

CAMD, or Computer Aided Molecular Design, refers to the design of molecules with desirable properties. That is, through CAMD, one determines molecules that match a specified set of (target) properties. CAMD has very large potential because, in principle, all kinds of chemical, bio-chemical and material products can be designed through this technique. It has become a mature technique that is attracting more and more researchers and finding increasing industrial application. The limitation, at this moment, is the ability to estimate the target properties of the desired molecule. The book mainly deals with macroscopic properties and therefore does not cover molecular design of large, complex chemicals such as drugs. The methodology presented, however, would be applicable to such problems provided the higher-level molecular structural representation is integrated with appropriate molecular structure-property relationships. While books have been written on computer aided molecular design related to drugs and large complex chemicals, a book on the systematic formulation of CAMD problems and solutions, with an emphasis on theory and practice, that would help one to learn, understand and apply the technique is currently unavailable. With this book, we have tried to put together the theoretical aspects related to CAMD, the different techniques that have been developed and the different applications that have been reported. We have highlighted the applications through case studies. We have grouped the chapters of this book into three parts - Part I: Theory, Methods & Tools; Part II: Applications & Practice of CAMD; and Part III: New Frontiers. Problem formulation and solution techniques are covered in Part I by chapters 1-7. Applications and practice of CAMD in different types of problems are highlighted in chapters 8-15 of Part II, together with descriptions of case study problems and their solution.
Each case study highlights the application of specific CAMD techniques. Part III contains one single chapter (16) where we highlight the new frontiers (in our view) and the future of CAMD. We have targeted a mixed audience in this book. Specifically, we have designed the book for scientists and engineers from industry who would like to apply CAMD to solve their specific problems of interest. It is also designed for educators from academia who would like to use it for teaching as part of process/product design courses (including such courses as separation processes). The book would be of interest to scientists and engineers who would like to learn more about CAMD in addition to
CAMD problem solutions. Finally, this book is intended for those who would like to use it as the starting point to further develop and extend the state of the art in CAMD. We would like to thank all the contributing authors for their manuscripts and for agreeing to make the necessary changes to accommodate the content, format and style of this book. The contributing authors to the various chapters of this book come from academia as well as industry. They are among the leading researchers, developers and users of CAMD. We hope the book will serve to promote further development of CAMD and further interest from the industry to apply CAMD. We thank the reviewers for their valuable comments and suggestions. We thank Elsevier for their interest in this subject and for publishing this book. We acknowledge the support, help and contribution of Prasanjeet Ghosh, Santhoji Katare, Mette Dinsen and all our previous students and coworkers who have contributed to the development of CAMD in general and preparation of this book in particular. We also thank all the companies who have shown interest in CAMD and supported our research in this area. We hope the readers of this book will find it an invaluable resource in their research, development and educational activities. We also hope that the book will generate enough interest and valuable feedback for future editions.
Luke E. K. Achenie, Rafiqul Gani & Venkat Venkatasubramanian
List of contributors

L. E. K. Achenie
University of Connecticut, Department of Chemical Engineering, 191 Auditorium Road, Storrs, CT 06269, USA.

C. S. Adjiman
Department of Chemical Engineering and Chemical Technology, Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, Prince Consort Road, London SW7 2BY, UK.

A. Apostolakou
Department of Chemical Engineering and Chemical Technology, Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, Prince Consort Road, London SW7 2BY, UK.

E. A. Brignole
Planta Piloto de Ingenieria Quimica-PLAPIQUI (UNS-CONICET), Camino La Carrindanga Km 7, 8000, Bahia Blanca, Argentina.

A. Buxton
Department of Chemical Engineering and Chemical Technology, Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, Prince Consort Road, London SW7 2BY, UK.

J. M. Caruthers
Laboratory for Intelligent Process Systems, School of Chemical Engineering, Purdue University, West Lafayette, IN 47907, USA.

M. Cismondi
Planta Piloto de Ingenieria Quimica-PLAPIQUI (UNS-CONICET), Camino La Carrindanga Km 7, 8000, Bahia Blanca, Argentina.

J. L. Cordiner
Syngenta, Global Specialist Technology, Grangemouth Manufacturing Centre, Earls Road, Grangemouth, Stirlingshire, FK3 8XG, United Kingdom.

R. Gani
CAPEC, Technical University of Denmark, Department of Chemical Engineering, Building 229, DK-2800 Lyngby, Denmark.

P. M. Harper
Integrated Process Solutions ApS, Solvgade 14B, 1307 Copenhagen K, Denmark.

M. Hostrup
Integrated Process Solutions ApS, Solvgade 14B, 1307 Copenhagen K, Denmark.

A. Hugo
Department of Chemical Engineering and Chemical Technology, Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, Prince Consort Road, London SW7 2BY, UK.

A. G. Livingston
Department of Chemical Engineering and Chemical Technology, Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, Prince Consort Road, London SW7 2BY, UK.

G. M. Ostrovski
University of Connecticut, Department of Chemical Engineering, 191 Auditorium Road, Storrs, CT 06269, USA.

P. Patkar
Laboratory for Intelligent Process Systems, School of Chemical Engineering, Purdue University, West Lafayette, IN 47907, USA.

E. N. Pistikopoulos
Department of Chemical Engineering and Chemical Technology, Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, Prince Consort Road, London SW7 2BY, UK.

M. Sinha
Global Alternative Propulsion Center, General Motors, Honeoye Falls, NY 14472, USA.

A. Sundaram
ExxonMobil Process Research, Paulsboro, NJ 08066, USA.

V. Venkatasubramanian
Laboratory for Intelligent Process Systems, School of Chemical Engineering, Purdue University, West Lafayette, IN 47907, USA.

J. M. Vinson
Pharmacia Corporation, 5200 Old Orchard Rd., Skokie, IL 60077, USA.
Contents

Preface
List of contributors

PART I: Theory, Methods & Tools
1. Introduction to CAMD
   R. Gani, L. E. K. Achenie and V. Venkatasubramanian
2. Molecular Design - Generation & Test Methods
   E. A. Brignole and M. Cismondi
3. Optimization Methods in CAMD - I
   M. Sinha, L. E. K. Achenie and G. M. Ostrovski
4. Optimization Methods in CAMD - II
   A. Apostolakou and C. S. Adjiman
5. Genetic Algorithms Based CAMD
   P. R. Patkar and V. Venkatasubramanian
6. A Hybrid CAMD Method
   P. M. Harper, M. Hostrup and R. Gani
7. Identification of Multistep Reaction Stoichiometries: CAMD Problem Formulation
   A. Buxton, A. Hugo, A. G. Livingston and E. N. Pistikopoulos

PART II: Applications of CAMD
8. CAMD for Solvent Selection in Industry - I
   J. M. Vinson
9. CAMD for Solvent Selection in Industry - II
   J. L. Cordiner
10. Case Study in Optimal Solvent Design
   M. Sinha, L. E. K. Achenie and G. M. Ostrovski
11. CAMD in Solvent Mixture Design
   M. Sinha and L. E. K. Achenie
12. Refrigerant Design Case Study
   A. Apostolakou and C. S. Adjiman
13. Polymer Design Case Study
   P. R. Patkar and V. Venkatasubramanian
14. Case Study in Identification of Multistep Reaction Stoichiometries
   A. Buxton, A. Hugo, A. G. Livingston and E. N. Pistikopoulos
15. Molecular Design of Fuel Additives
   A. Sundaram, V. Venkatasubramanian and J. M. Caruthers

PART III: Computer Aided Product Design
16. Challenges and Opportunities for CAMD
   R. Gani, L. E. K. Achenie and V. Venkatasubramanian

Glossary of Terms
Subject Index
Author Index
Part I: Theory, Methods & Tools

This part of the book covers problem formulation and solution techniques. The first chapter introduces the computer aided molecular design (CAMD) problem and discusses its important issues. Chapters 2 to 7 then deal with some of the common techniques used to tackle various types of CAMD problems. Specifically, the second chapter discusses methods based on a generate-and-test approach, followed by two chapters on optimization methods involving mathematical programming. Evolutionary techniques based on genetic algorithms are presented next in chapter 5, while chapter 6 describes a hybrid CAMD method. Finally, the first part of the book concludes with chapter 7, where CAMD in the identification of multistep reaction stoichiometries is presented.
Computer Aided Molecular Design: Theory and Practice. L.E.K. Achenie, R. Gani and V. Venkatasubramanian (Editors). © 2003 Elsevier Science B.V. All rights reserved.
Chapter 1: Introduction to CAMD
R. Gani, L.E.K. Achenie & V. Venkatasubramanian
In (chemical) product design, we try to find a (chemical) product that exhibits certain desirable or specified behaviour. In another type of (chemical) product design, we try to find an additive that, when added to another chemical or non-chemical product, enhances its (desirable) functional properties. This type of product is commonly known as a formulation. That is, in (chemical) product design, we do not know the identity of the final product but we have some idea of how we want it to behave, and the problem is to find the most appropriate chemical(s) that will exhibit and/or cause the desired behaviour. Once we have identified the product and tested it, we need to determine whether it can also be manufactured. That is, we need to design a (chemical) process through which we can manufacture the desired product with profit, increased operational efficiency and positive environmental, health and safety impact. Before we can do this, however, we also need to determine the likely raw materials (which could also be other chemical products) that can be processed in order to manufacture the desired product. That is, we extend the problem boundary of process design at the start, by determining the product that we would like to manufacture, and at the end, in order to analyse the effect of the product and its manufacture on the environment.
1.1 WHAT IS CAMD?

The design process for a chemical product involves a number of steps through which scientific principles may be applied for the solution of the specified design problem. Cussler and Moggridge (2001) suggest four principal steps in their design process:

1. Define needs;
2. Generate ideas to meet needs;
3. Select among ideas;
4. Manufacture product.
As illustrated in Fig. 1, the 2nd and 3rd steps, considered together, represent two types of design problems, namely Molecular Design and Mixture/Blend Design. The 1st step may be considered as a pre-design or problem formulation step, while the last step may be considered as part of a process design problem. The molecular and mixture/blend design problems can be solved independently of the process design problem or as an integrated product-process design problem.
[Diagram; box labels: Process-Product Design; Product Design; Pre-Design ("define needs & goals"); CAMD ("generate & select alternatives"); CAMbD ("generate & select alternatives"); Process Design ("manufacture & test product").]
Figure 1: Steps of the design process related to product design.

For the solution of the molecular and mixture/blend design problems, various approaches, ranging from empirical trial and error to mathematical programming to hybrid methods, can be applied as the solution technique. The applicability of a particular solution technique depends, to a large extent, on the approach used to determine the target behaviour (properties) of the desired products. If appropriate property models do not exist, an empirical trial and error approach based on experimentation, although not the most efficient, is usually the only option. If property models are available, computer aided methodologies become viable alternatives. That is, through the use of property models as part of a computer aided methodology, the molecular design problem is transformed into a computer aided molecular design (CAMD) problem, while the mixture/blend design problem is transformed into a computer aided mixture design (CAMbD) problem. CAMD and CAMbD together may be called computer aided product design (CAPD). Unless specifically mentioned, in this book the term CAMD will be used for molecular design as well as mixture/blend design. Likewise, the term product will be used to include single molecules as well as mixtures.

1.1.1 Problem Definition

Computer aided molecular design problems are defined as
Given a set of building blocks and a specified set of target properties, determine the molecule or molecular structure that matches these properties.
In this respect, it is the reverse problem of property prediction where, given the identity of the molecule and/or the molecular structure, a set of target properties is calculated. CAMD may be performed at various levels of size and complexity of molecular structure representation. For example, the design of solvents, refrigerants, etc., is usually based on properties estimated from macroscopic structural information. In the design of structured products such as polymers, drugs, pesticides, food additives, etc., the structural differences are observed by employing meso- and/or microscopic representations of the molecular structure. Therefore, the property models and the molecular structural representation differ according to the type of molecules being designed.

Computer aided mixture/blend design problems can be defined as

Given a set of chemicals and a specified set of property constraints, determine the optimal mixture and/or blend.
Here, we do not know which chemicals to use in the product and in what amount they should be present, but we know the molecular structures of the candidate chemicals. The design of formulated products and blends are typical examples of mixture design. Here, a formulation (representing a mixture or blend) is added to a product in order to enhance one or more specified properties of the original product. For example, a specified property (for example, the viscosity of a product) needs to increase by an order of magnitude when the formulation (also known as ingredient or additive) is added. In other cases, a mixture or blend having a specified set of target properties is the desired product, as in polymer blends, petroleum blends, solvent blends, edible oil blends and many more.

The fundamental objective of CAMD, therefore, is to identify a compound or a collection of compounds having specific (desired) properties. The structures of the compounds (molecules) are represented using appropriate descriptors together with an algorithm that identifies these descriptors. This means that the property evaluation methods should be based on these descriptors as well. The most common approach in CAMD is to generate chemically feasible molecular structures from a set of descriptors (represented by fragments or building blocks) and to test them by estimating their desired (specified) properties. The properties are estimated by using some kind of fragment-based methodology, where the contributions of each fragment present in the molecule to a specific property are added to determine the property value of the compound. The set of feasible compounds is identified as those that match the property specifications, given as a series of property constraints. The optimal compound is identified from the set of feasible compounds through a problem-specific selection criterion or objective function. The principal differences between the various CAMD methodologies are how the various steps are performed, the type of descriptors used and how the necessary property values are obtained.
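The generate-and-test idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular chapter: the group set, the property contributions and the function names are hypothetical, and the feasibility check is a commonly used valence-balance rule for acyclic structures (a necessary condition only), which the text itself does not spell out.

```python
from itertools import combinations_with_replacement

# Hypothetical building blocks: name -> (free valences, contribution to a
# made-up property P). The numbers are illustrative only, not fitted values.
GROUPS = {
    "CH3": (1, 20.0),
    "CH2": (2, 25.0),
    "CH": (3, 22.0),
    "OH": (1, 90.0),
}


def is_acyclic_feasible(counts):
    """Assumed valence-balance rule for acyclic molecules:
    sum over groups of n_j * (2 - v_j) must equal 2."""
    return sum(n * (2 - GROUPS[g][0]) for g, n in counts.items()) == 2


def estimate_property(counts):
    """Additive estimate: occurrences times group contribution, summed."""
    return sum(n * GROUPS[g][1] for g, n in counts.items())


def generate_and_test(max_groups, lower, upper):
    """Enumerate group multisets, keep the chemically feasible ones whose
    estimated property lies within [lower, upper]."""
    feasible = []
    for size in range(2, max_groups + 1):
        for combo in combinations_with_replacement(sorted(GROUPS), size):
            counts = {g: combo.count(g) for g in sorted(set(combo))}
            if not is_acyclic_feasible(counts):
                continue  # structure cannot be assembled without open bonds
            value = estimate_property(counts)
            if lower <= value <= upper:
                feasible.append((counts, value))
    return feasible


if __name__ == "__main__":
    for counts, value in generate_and_test(3, 100.0, 140.0):
        print(counts, value)
```

Enumerating multisets of groups ignores isomers, which mirrors the limitation of a purely first-order fragment description; a richer descriptor set would be needed to distinguish them.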
1.1.2 Formulation of Property Constraints

The formulation of the property constraints is a prerequisite for solving any CAMD problem. A set of properties is selected as constraints with some combination of specified goal values, lower bounds and upper bounds. These represent explicit property constraints because their values can be determined directly through a model or measured experimentally. There are, however, desired properties involving products such as food, fragrances, health & safety, etc., that may need to be formulated implicitly. That is, they cannot be measured or predicted by a model directly but may be inferred through databases, past knowledge, other measured or predicted properties and so on. For example, the taste of a food product, the aroma of fragrances, the health hazards of chemicals, etc., fall under implicit property constraints. Environmental considerations can be formulated implicitly or explicitly. Explicit considerations relate physical properties to environmental considerations (e.g. ozone depletion potential), while implicit considerations are realized in the selection of the types of compounds considered in the search/design phase (e.g. the exclusion of aromatic compounds). The following questions help to define the constraints; note that these are not the only questions needed to define the problem completely.
What function is the desired product supposed to perform?
These functions could be related only to the use of the product on a standalone basis, or they could be part of some greater functionality that the product may be asked to provide in conjunction with other materials. Examples of the former are a solvent, a refrigerant and a polymer, while examples of the latter are a solvent blend added to a paint, an ingredient added to a food product to make it fat-free, and an ingredient added to a drug to inhibit a specific biological function.

Is the product a replacement for another product?
If yes, the designed product should do some combination of the following: (a) match a set of properties, (b) match or surpass a set of properties of the original product and (c) avoid a third set of properties. This can be the replacement of one synthesized chemical product with another as well as the replacement of a natural product with a synthesized one (for example, synthetic rubber).
Are there any operational limits (temperature, pressure and phase) for the desired product? If yes, what are these?
The operational limits help define the upper and lower limits of the constraints on the phase and the phase-transition-related properties.

What criteria should be used to evaluate the performance of the desired product?
The performance criteria are related to the function of the desired product in the process operation for which it is designed, which helps to define the objective function for optimization-based CAMD. For example, for a solvent in solvent-based separations, these criteria often degenerate into bound constraints: usually lower bounds on selectivity, lower bounds on distribution coefficient, upper bounds on solvent loss and many more. In the case of formulations, the ingredient needs to be tested for the enhanced performance of the original product, such as controlled release, improved inhibition, etc., of drugs. Models for evaluating performance, however, may not be easy to develop and are most likely to be very complex.

Are there any downstream processing considerations?
The role of the designed product in downstream processing, such as solvent recovery, wastewater treatment and disposal, needs to be considered. These considerations may be included as direct property constraints, if feasible. However, since they depend on the process, the product and process design problems may alternatively be integrated to handle these constraints together with other process design issues. The following provides a generic mathematical programming representation of most CAMD problems.

\begin{align}
F_{OBJ} &= \max \{ C^T y + f(x) \} && \tag{1} \\
\text{s.t.} \quad h_1(x) &= 0 && \text{(process design specs)} \tag{2} \\
h_2(x) &= 0 && \text{(process model equations)} \tag{3} \\
h_3(x) &= 0 && \text{(CAMD specifications)} \tag{4} \\
l_1 \le g_1(x) &\le u_1 && \text{(process design constraints)} \tag{5} \\
l_2 \le g_2(x) &\le u_2 && \text{(CAMD constraints)} \tag{6} \\
l_3 \le B y + C x &\le u_3 && \text{(logical constraints)} \tag{7}
\end{align}
In the above equations, x represents the vector of continuous variables (such as flowrates, mixture compositions, conditions of operation, design variables, etc.), y represents the vector of binary integer variables (such as unit operation identity, descriptor identity, compound identity, etc.), h1(x) represents the set of equality constraints related to process design specifications (such as reflux ratio, operating pressure, heat addition, etc.), h2(x) represents the set of equality constraints related to the process model equations (i.e., mass and energy balance equations), h3(x) represents the set of equality constraints related to CAMD (such as chemical feasibility rules, mixing rules for properties, etc.), g1(x) represents the set of inequality constraints related to process design specifications, and g2(x) represents the set of inequality constraints with respect to environmental constraints and property constraints related to CAMD. The binary variables typically appear linearly, as they are included in the objective function term and in the constraints (Eq. 7) to enforce logical conditions. The term f(x) represents a vector of objective functions that may be linear or non-linear depending on the definition of the optimization problem. For process optimisation, f(x) is usually a non-linear function, while for integrated approaches f(x) usually consists of more than one non-linear function. Many variations of the above mathematical formulation may be derived to represent different CAMD problems and methodologies. Some examples are given below.
i) Satisfy only constraint 6. This represents a CAMD problem for which a database search is adequate as a solution methodology.
ii) Ignore the objective function and the constraints represented by Eqs. 2, 3, 5 and 7 and only satisfy constraints 4 and 6. This is a CAMD problem that generates a feasible set of candidates.
iii) Solve a mathematical programming problem that includes Eqs. 1, 4 and 6. This is optimal design of the molecule and/or mixture.
iv) Only satisfy the constraints 2-7. This generates a feasible set of candidates (products and their corresponding process).
v) Solve all the equations. This represents an integrated process-product design problem.
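As a compact summary of the five variants listed above, the sketch below records which of Eqs. 1-7 each variant uses and derives two simple properties from that. The dictionary and function names are hypothetical, introduced only for illustration; the equation subsets are taken directly from the list.

```python
# Roles of Eqs. (1)-(7) in the generic formulation, as labelled in the text.
EQUATION_ROLES = {
    1: "objective function",
    2: "process design specs",
    3: "process model equations",
    4: "CAMD specifications",
    5: "process design constraints",
    6: "CAMD constraints",
    7: "logical constraints",
}

# Equations active in each problem variant i)-v).
VARIANTS = {
    "i": {6},
    "ii": {4, 6},
    "iii": {1, 4, 6},
    "iv": {2, 3, 4, 5, 6, 7},
    "v": {1, 2, 3, 4, 5, 6, 7},
}

PROCESS_EQUATIONS = {2, 3, 5}


def is_optimization(variant):
    """True if the variant includes the objective function (Eq. 1)."""
    return 1 in VARIANTS[variant]


def involves_process(variant):
    """True if the variant includes any process-related equation."""
    return bool(VARIANTS[variant] & PROCESS_EQUATIONS)


if __name__ == "__main__":
    for name, eqs in VARIANTS.items():
        roles = ", ".join(EQUATION_ROLES[e] for e in sorted(eqs))
        print(f"{name}: optimization={is_optimization(name)}, "
              f"process={involves_process(name)} -> {roles}")
```

Seen this way, variants i), ii) and iv) are feasibility problems, while iii) and v) are optimization problems; only iv) and v) bring the process model into the design.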
Note that for all problem formulations, properties need to be supplied (measured or retrieved from a database) and/or predicted through models. Problems that include Eq. 3 also have property models included as a set of constitutive models that relate the properties to the intensive variables (pressure, temperature and composition). All problem formulations may use property models and, therefore, the application range of a CAMD methodology depends on the application range of the property models used. Note that in problem formulations i-ii, an optimal design may be obtained by ordering all the feasible candidates according to the objective function (Eq. 1) value. Global optimality, however, can be guaranteed only if all possible compounds were considered in the generation of the feasible set of candidates. On the other hand, problem formulations iii-v may become too complex to solve if the property model is highly non-linear and discontinuous. Also, the solution approach may not be able to accommodate multiple property models for the same property. Thus, while these problem formulations can determine the optimal design, their application range is usually quite small. Having formulated the property constraints and a version of the generic problem formulation, the next step is to select the property models and/or the means to provide the necessary property values.
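The remark that formulations i-ii can still yield an optimal design by ordering the feasible candidates can be illustrated with a short sketch. The candidate names and objective values below are invented placeholders, and `rank_candidates` is a hypothetical helper, not a function from any CAMD tool.

```python
def rank_candidates(feasible, objective):
    """Order feasible candidates by descending objective value
    (Eq. 1 is written as a maximization)."""
    return sorted(feasible, key=objective, reverse=True)


# Hypothetical feasible set: candidate name -> objective value (made-up numbers).
feasible_set = {"solvent_A": 3.2, "solvent_B": 4.7, "solvent_C": 1.9}

ranked = rank_candidates(feasible_set, objective=feasible_set.get)
best = ranked[0]

if __name__ == "__main__":
    print(ranked, best)
```

The top-ranked candidate is optimal only with respect to the candidates that were actually generated, which is the caveat on global optimality stated above.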
1.1.3 P r e d i c t i o n of P r o p e r t i e s Successes of CAMD methodologies depend to a large extent, on the ability to predict and/or obtain the necessary pure component and mixture properties, or more generally, performance characteristics, included in the property constraints and in the process model. Even if the CAMD problem involves the design of a single molecule, mixture properties may need to be calculated. For example, in solvent design, the property constraints may include pure component properties such as boiling point, heat of vaporization and mixture properties such as solubility of solute and solvent loss. In CAMbD problems, the property constraints are all mixture property based, however, the models for these mixture properties may require pure component properties. Consequently, the pure component properties may be used to screen out some of the candidate molecules to be considered in the mixture design problem. A wide range of property models can be found (Poling et al. 2000). The main question is which model has the largest reliable application range for the descriptors used to represent the molecular structures? For instance, if the descriptors employed for molecular structural representation are able to identify differences in isomer structures, then the property model must also be able to predict the property differences (if any) of these isomers. Otherwise, all isomers would be selected as feasible. Gani and Constantinou (1995) proposed a classification of properties as primary (pure component properties that can be determined only from the molecular structural variables - examples are critical properties, normal boiling point, normal melting point, heat of vaporization at 298 K, heat of fusion at 298 K, etc.), secondary (pure component properties that are dependent on other p r o p e r t i e s - examples are surface tension, viscosity, solubility parameter, vapor pressure at a given temperature, density at a given temperature, etc.) 
and functional (pure component properties dependent on temperature and/or pressure - examples are density, vapor pressure, enthalpy, heat of vaporization, etc., as a function of temperature; and mixture properties that are dependent on composition and/or temperature and pressure - examples are liquid phase activities, vapor phase fugacities, phase density, mixture viscosity, mixture saturation temperature, etc.). For several material design applications of interest, the desired properties are even more complex, high-level performance characteristics that are to be satisfied by the material during its active service life. These performance measures are usually very difficult to predict using standard property-prediction models. Sophisticated models, usually hybrids of different approaches, need to be constructed. Examples of such systems or properties include reaction systems (i.e. where the final desired performance may come into play only at the end of chemical or biological reactions), long-term mechanical properties, biological functionalities, etc. Further, several of these performance measures are dynamic, i.e. time-evolving. In such cases, not only is the value of a particular high-level property at the start of active service life of the material important, but also, and usually more critical, its evolution profile throughout the period of service. Gani and Constantinou (1995) also propose a classification of property models that may be employed for each class of properties. Figure 2 highlights this classification.
[Figure 2: Classification of property estimation methods. Mechanical models: quantum mechanics, molecular mechanics, molecular simulation. Semi-empirical models: corresponding states theory, topology/geometry, group/atom/bond additivity. Empirical models: chemometrics, pattern matching, factor analysis, QSAR.]
Estimation of primary pure component properties
While there are numerous property estimation methods for primary pure component properties, not all of them are applicable in CAMD. Most property estimation methods used in CAMD methodologies are based on the Group Contribution Approach, GCA (Franklin, 1949), where the properties of a compound are expressed as functions of the number of occurrences of predefined fragments (groups) in the molecule. The GCA-based methods belong to a class known as additive methods.
F(p) = w_1 \sum_i N_i C_i + w_2 \sum_j M_j D_j + w_3 \sum_k O_k E_k + \dots    (8)
In the above equation, C_i is the contribution of atom, bond or first-order group i; N_i is the number of occurrences of atom, bond or first-order group i; D_j is the contribution of atom, bond or second-order group j; M_j is the number of occurrences of atom, bond or second-order group j; E_k is the contribution of atom, bond or third-order group k; O_k is the number of occurrences of atom, bond or third-order group k. w_1, w_2 and w_3 are weights that may be imposed on each of the additive terms. With this method, if the fragments (atoms, bonds, groups, etc.) representing each molecule are identified and their contributions to a needed property are available, then the corresponding property of the molecule can be estimated by simply summing all the contributions. Since the same fragments can be used to represent different molecules, these property estimation methods, although semi-empirical in nature, are also truly predictive. Note that methods of this type consider only the number of occurrences of the atoms and bonds, not their placement. The limitations of these methods are their accuracy and their ability to handle complex molecular structures. In principle, these methods can be made highly accurate, with a large application range, by simply adding more additive terms of higher order. From a practical point of view, however, this is not feasible, and the highest order used in methods of this type is three (Marrero and Gani, 2001). Second- and third-order additive methods are able to distinguish some isomeric molecular structures. Methods based on topological or geometric information provide a higher level of molecular representation. The methods based on topological information related to the molecular structure commonly employ the well-known connectivity index (Kier and Hall, 1986; Bicerano, 1993), while methods based on geometric information employ conjugates (Constantinou et al. 1994).
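As a minimal sketch of the additive form of Eq. 8, the following Python fragment sums weighted first-, second- and third-order contributions. The function name, the second-order label "G2_a" and every contribution value are hypothetical, chosen only so that the arithmetic is easy to follow; they are not taken from any published parameter table. The result is F(p), so a method that defines F as a transform of the property would still have to invert it.

```python
def additive_estimate(counts_by_order, contribs_by_order, weights=(1.0, 1.0, 1.0)):
    """Evaluate F(p) = w1*sum(N_i*C_i) + w2*sum(M_j*D_j) + w3*sum(O_k*E_k).

    counts_by_order   -- three dicts {group: number of occurrences}
    contribs_by_order -- three dicts {group: contribution to F(p)}
    weights           -- (w1, w2, w3) applied to the three additive terms
    """
    total = 0.0
    for weight, counts, contribs in zip(weights, counts_by_order, contribs_by_order):
        # A missing contribution raises KeyError: the method is then not applicable.
        total += weight * sum(n * contribs[group] for group, n in counts.items())
    return total


# Hypothetical contributions (illustrative numbers only).
first_order = {"CH3": 1.0, "CH2": 0.5, "CH2COO": 3.0}
second_order = {"G2_a": 0.25}
third_order = {}

# Group counts for a molecule described by 2 CH3, 1 CH2 and 1 CH2COO.
counts = (
    {"CH3": 2, "CH2": 1, "CH2COO": 1},  # N_i
    {"G2_a": 1},                        # M_j
    {},                                 # O_k (no third-order groups)
)

f_p = additive_estimate(counts, (first_order, second_order, third_order))
print(f_p)  # 5.75
```

Because only the counts enter the first-order term, any isomer with the same group counts gives the same first-order value; differences can arise only through the higher-order terms.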
Connectivity indices specify the spatial arrangement of the atoms in the molecule, while conjugation (with respect to molecular structures) refers to an idealized arrangement of atoms connected by bonds (Constantinou et al. 1994). Any property p is estimated through Eq. 9 (connectivity index) or Eq. 10 (conjugation).
F(p) = a \chi^0 + b \chi^1 + c \chi^2 + d \chi^3 + \dots    (9)
F(p) = \sum_i N_i B_i + \sum_j M_j E_j    (10)
In Eq. 9, \chi^n is the connectivity index of order n, and a, b, c and d are the adjustable parameters. In Eq. 10, B_i is the contribution of bond i; N_i is the number of occurrences of bond i; E_j is the contribution of bond j; M_j is the number of occurrences of bond j. The main computational effort is spent on generating the connectivity indices or conjugates representing a molecular structure. Once these are known, the property estimation phase is simple and computationally inexpensive. Like the additive methods, these methods are also predictive. Another advantage of these methods is that the indices and/or conjugates may be used to generate the fragments for
the additive methods. In this way, they use more structural information than the additive methods and are therefore able to distinguish more isomeric structures. The main difficulty is to know how many indices should be used and how to estimate their property contributions. The topological information based methods are also classified under QSPR (Quantitative Structure Property Relationship) or QSAR (Quantitative Structure Activity Relationship) methods. Many QSPR and QSAR methods base the prediction of properties on the structure of the molecule using complex descriptors obtained from molecular modeling. CAMD methodologies dealing with meso- and microscopic representation of the molecular structures employ such descriptors to identify the differences in the molecular structures as well as to estimate the needed properties. While these property models are able to employ complex descriptors and to distinguish between isomeric structures, their application range outside the training set of molecules may be questionable. Therefore, they are more suitable for use in CAMD problem formulations of types i and ii, but are able to handle large, complex molecules. More details on QSPR and QSAR methods can be found in Kier and Hall (1986) and Livingstone (2001).
Estimation of secondary pure component properties
The best source of methods for this type of property is the book by Poling et al. (2000), which gives a comprehensive overview of the properties and the corresponding property models that may be used. Therefore, these methods are not covered in this book. It should be noted, however, that many of the secondary properties that are calculated from primary properties might also be converted to primary properties. For example, the Hansen solubility parameters are estimated from known values of molar volumes and heats of vaporization at 298 K. The solubility parameter data can therefore also be correlated through a set of groups or topological indices to generate a primary property model. In a similar way, properties such as octanol-water partition coefficients and water solubilities may also be converted to primary pure component properties. Since the primary pure component properties are functions only of the molecular structural variables, they are very useful in CAMD problem solution.
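To illustrate how a secondary property follows from primary ones, the sketch below evaluates a total (Hildebrand-type) solubility parameter from the heat of vaporization and the molar volume. The relation delta = sqrt((dHvap - R*T)/Vm) is assumed here; it is not stated in the text above. The function name and the input values are hypothetical and purely illustrative, not data for any real compound.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def solubility_parameter(dh_vap, v_m, temperature=298.0):
    """Total solubility parameter in MPa**0.5 (assumed Hildebrand-type relation).

    dh_vap      -- heat of vaporization at `temperature`, J/mol
    v_m         -- liquid molar volume at `temperature`, m3/mol
    temperature -- K
    """
    cohesive_energy = dh_vap - R * temperature  # J/mol
    if cohesive_energy <= 0.0 or v_m <= 0.0:
        raise ValueError("inputs give a non-physical cohesive energy density")
    # J/m3 is Pa, so the square root is in Pa**0.5; 1 MPa**0.5 = 1000 Pa**0.5.
    return math.sqrt(cohesive_energy / v_m) / 1000.0


# Illustrative inputs: 40 kJ/mol and 100 cm3/mol (hypothetical values).
delta = solubility_parameter(dh_vap=40_000.0, v_m=100.0e-6)
print(round(delta, 2))  # 19.37
```

If both inputs come from group-contribution estimates, the result depends on the molecular structure alone, which is the sense in which such a property can be treated as primary.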
Estimation of mixture properties
The simplest and easiest, but usually the least accurate, way is to assume mixture ideality and employ a simple linear mixing rule.
F(\theta) = \sum_i x_i p_i    (11)
In the above equation, F(\theta) is a property function for mixture property \theta; x_i is the composition of component i and p_i is the corresponding pure component property of \theta for component i. If the assumption of mixture ideality is valid, this method is fast, easy and very convenient for use in CAMD problem formulations of types iii-v. Most mixtures of practical interest, however, do not behave ideally and therefore more rigorous models are needed. Since CAMD methodologies generate molecular structures and therefore work with molecular structural parameters, models that do not employ such parameters are not suitable. Examples of these models are NRTL (Renon and Prausnitz, 1968) and Wilson (Wilson, 1964), which need compound-specific, predetermined molecular interaction parameters for the estimation of liquid phase activity coefficients. The most widely used mixture properties in CAMD applications are the liquid phase activity coefficients, because they may be used for estimating solubility (solid, liquid or gas), phase equilibrium (considering the other phase in equilibrium with the liquid to be ideal), liquid surface tension, liquid viscosity, bulk properties such as saturation temperatures and pressures, and many more. GCA-based methods are the only practical choices in this case, since the topological information based methods have not been developed for general purpose use and molecular modeling based methods are too complex for use in CAMD problem formulations of types ii-v. The GCA-based method for prediction of liquid phase activity coefficients that is most widely used in CAMD methodologies is the UNIFAC method (Fredenslund et al., 1977), in its original form or in its various modifications. A major limitation of the UNIFAC method with its original set of first-order groups is that it cannot handle complex mixture nonideality (such as proximity effects) and it cannot distinguish between isomers.
Some of these limitations have been addressed recently through the introduction of second-order groups (Kang et al. 2002). Another important limitation of UNIFAC and all other GCA-based mixture property models is that the necessary group interaction parameters may not be available for the generated feasible candidate molecules. Molecular modeling in this respect can help to predict the necessary group interactions (Jonsdottir et al. 1994). For CAMD involving large, complex molecules and mixture properties, problem formulations of type i-ii are feasible options as they allow the use of sequential generation of feasible candidate molecules and testing of candidates. In this case, any number of property models may be used. While this is not a computationally efficient procedure, it is able to provide a means to identify promising candidates, at least, as a first step of the search.
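The linear mixing rule of Eq. 11 is simple enough to state directly in code. The sketch below assumes that the compositions are mole fractions that sum to one; the function name and the numerical values are hypothetical and serve only to show the calculation.

```python
def linear_mixing(x, p, tol=1e-9):
    """Ideal-mixture estimate F(theta) = sum_i x_i * p_i (Eq. 11).

    x -- compositions of the components (assumed to sum to 1)
    p -- pure component values of the same property, in the same order
    """
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    if abs(sum(x) - 1.0) > tol:
        raise ValueError("compositions must sum to 1")
    return sum(xi * pi for xi, pi in zip(x, p))


# Hypothetical binary mixture: 25 % of a component with p = 100, 75 % with p = 200.
f_theta = linear_mixing([0.25, 0.75], [100.0, 200.0])
print(f_theta)  # 175.0
```

No interaction parameters appear, which is why this rule is convenient inside a CAMD formulation and also why it fails for non-ideal mixtures.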
Estimation of environmental, implicit and high-level properties
Environmental and other implicit properties need special attention since they do not usually belong to the standard databases for properties of chemical compounds. For the estimation of environmental properties, such as toxicity, biodegradability, ozone depletion potential, biological oxygen demand, global warming potential and soil adsorption potential, very few general methods covering a wide range of compounds have been developed, although new methods are continuously being developed (Martin and Young 2001). However, a number of methods valid for specific molecular types such as alcohols, acids and benzene derivatives are available (Lyman et al., 1990). These methods are capable of predicting many of the environmental properties listed above. Often, methods for environmental properties rely on the octanol/water partition coefficient (log P) as a known property value. Databases such as CHRIS (Silver Platter Information Inc., 1998a), HSDB (Silver Platter Information Inc., 1998b) and RTECS (Silver Platter Information Inc., 1998c) store environmental data and properties for a large number of substances. The more difficult properties are high-level performance characteristics desired of the material. Examples of these include properties related to the taste of food products, the aroma of fragrances, long-term mechanical properties of polymers and polymer blends, and many more. What often makes the modeling process even more challenging is that several of these properties of interest are dynamic, and the design objectives are specified in terms of the time-evolution profile of the property in question throughout the service time of the material. Some of these may be estimated through a combination of higher-level modeling and theory, such as molecular modeling combined with kinetic phenomena (in the case of polymer blends with desired properties), while others may be implied through QSAR types of investigations.
Typically, highly sophisticated hybrid approaches that make use of a variety of modeling techniques need to be employed to model the high-level properties to desired levels of prediction accuracy (Ghosh et al., 2000). Having the necessary property models available brings us to the next topic: the actual CAMD algorithm.
1.1.4 CAMD algorithms
The CAMD algorithm basically solves the CAMD problem formulations of types i-v and other variations of the generic problem defined by Eqs. 1-7. The main solution step involves finding the molecules of the desired type having the desired properties. Here, a distinction is made between those problems that involve only selection (type i and some variations of type ii) and those that involve selection plus design (types ii-v). If the problem is of the selection type (i.e. finding candidates from a database of known compounds), the solution step involves one or more database lookup operations in order to identify the subset (if any) satisfying the property and molecule type constraints. For selection based on pure component properties, the search engine is commonly known as pattern matching (Nielsen et al., 1991), that is, find the specified pattern in a database. If mixture properties are also considered, the search is more difficult. Cabezas (2000) has developed tools for efficiently solving these problems. If the CAMD problem formulation is of types ii-v, an algorithm is needed to identify (design) the molecules of the specified types having the desired properties as specified through the property constraints. Even though different algorithms have been proposed for the design of molecules, nearly all of them rely, to some degree, on the creation of chemically feasible molecules from fragments. The most widely used feasibility criterion is the valency rule proposed by Macchietto et al. (1990), where the goal is to guarantee the fulfillment of the octet rule. Different approaches have been proposed for solving CAMD problems and these approaches can be grouped into three categories:
1. Mathematical programming (a mathematical representation of the problem is solved with a numerical optimization method) - problem types iii-v. Chapters 3, 4 and 11 describe solution approaches of this type.
2. Stochastic optimization (a mathematical representation of the problem is solved by numerical stochastic methods) - problem types ii-iii. Chapter 5 describes a genetic algorithm based solution approach of this type.
3. Enumeration techniques (a combined mathematical and qualitative representation of the problem is solved by hybrid solution approaches) - problem types ii-v, but using a decomposed problem formulation (also called hybrid methods). Chapters 2, 6 and 7 describe solution approaches of this type.
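The valency rule mentioned above is not written out in this chapter. A commonly used counting form is assumed in the sketch below: for a structure with n_j groups of type j, each having v_j free attachments, sum_j n_j (2 - v_j) = 2m, where m = 1 for acyclic, 0 for monocyclic and -1 for bicyclic structures. This is a necessary condition only, and the function and argument names are hypothetical.

```python
def satisfies_valency_rule(group_counts, free_attachments, rings=0):
    """Check the counting condition sum_j n_j*(2 - v_j) == 2*(1 - rings).

    group_counts     -- {group: number of occurrences n_j}
    free_attachments -- {group: number of free attachments v_j}
    rings            -- 0 for acyclic, 1 for monocyclic, 2 for bicyclic structures

    The condition is necessary, not sufficient: it does not by itself
    guarantee that the groups can be joined into one connected molecule.
    """
    if sum(group_counts.values()) == 0:
        return False
    balance = sum(n * (2 - free_attachments[g]) for g, n in group_counts.items())
    return balance == 2 * (1 - rings)


valence = {"CH3": 1, "CH2": 2, "CH": 3, "CH2COO": 2}

# 2 CH3 + 1 CH2 + 1 CH2COO: the two end groups close the chain.
print(satisfies_valency_rule({"CH3": 2, "CH2": 1, "CH2COO": 1}, valence))  # True
# 3 CH3 + 1 CH2: one free attachment too many for an acyclic structure.
print(satisfies_valency_rule({"CH3": 3, "CH2": 1}, valence))  # False
# 6 CH2 closed into a single ring.
print(satisfies_valency_rule({"CH2": 6}, valence, rings=1))  # True
```

A check of this kind is cheap, so it is typically applied before any property estimation to discard group combinations that cannot form a complete molecule.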
Common to all the solution approaches is that the objective is to find a compound or compounds fulfilling the requirements set forth in the constraints and goals.
1.1.5 Molecular Structure Representation
All CAMD methodologies need to employ some form of representation of the molecular structure information for use in property estimation. In general, the estimation methods used for predicting properties of the designed molecule(s) decide the level of detail needed for the molecular structural information and the representation method to use. Other considerations are compatibility with external programs and databases. The simplest form of representation of a compound is an atomic representation based on the chemical formula. Here, a compound is simply represented by the types of atoms it contains and the number of occurrences of each atom type (Fig. 3a). A single representation can describe a large number of compounds of very different types. No direct information regarding the bonds in the compound can be extracted from the representation. However, if assumptions about the valency of the different atom types are made, it is possible to calculate bond configurations. A related form is the representation of a compound as a collection (or vector) of groups. A group is a molecular fragment or substructure defined by the number and types of atoms in the fragment, how the atoms are connected, how many free connections the group has and where (on which atom) they are located. Figure 3b shows an example of a fragment and Fig. 3c an example of a group vector. A group vector contains some information about the connectivity of the structure of the molecule but does not define it completely. As a result, a group vector can represent more than one possible molecule (isomers) - Figure 3d illustrates the different compounds that can be constructed using the group vector in Fig. 3c. The compounds depicted in Fig. 3d have the connectivity defined. One of the most versatile and manageable methods is the adjacency matrix. An adjacency matrix is a square symmetrical matrix with rows and columns representing the atoms (or fragments) in the molecule, and containing non-zero and zero entries indicating the presence or absence of bonds. An adjacency matrix can be on fragment level or on atomic level. Conversion from a fragment-based matrix to an atom-based matrix is achieved by substituting the entry for each fragment with that of the atomic adjacency matrix representing the fragment. Figures 3e and 3f are the fragment-based and atom-based adjacency matrices, respectively, for the first compound in Fig. 3d.
While the adjacency matrix defines the 2-dimensional relations between atoms in a compound, it does not contain the steric information needed in order to distinguish R/S, L/D and cis/trans isomers. In order to distinguish between such isomers it is necessary to have 3-dimensional information about the placement of the atoms. For 3-dimensional representation two methods are widely used. The first is the combination of an adjacency matrix with a list of x, y, z Cartesian coordinates for the atoms. The second is the so-called internal coordinate system, where an atom's position is defined by a bond length, a bond angle and a torsion angle (Maranas and Floudas, 1994). The choice of representation depends on the computations that are to be performed with the 3-dimensional representation. Chapter 2 describes methods for generating molecular structures using group information only. Chapters 3 and 4 give examples of how the generation of molecular structures can be incorporated into mathematical programming formulations through the feasibility rules. Chapter 4 also gives a detailed description of the generation of molecular structures from higher-level groups (Marrero and Gani, 2001). Chapter 5 describes how molecular structures can be generated through genetic algorithms employing groups and topological indices. Finally, Chapter 7 describes group-based combination rules to generate molecules that also satisfy reaction stoichiometry.
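A fragment-level adjacency matrix of the kind described above can be built and checked with a few lines of Python. The chain CH3-CH2-CH2COO-CH3 used here is an assumed arrangement of the group vector 2 CH3, 1 CH2, 1 CH2COO; it is not claimed to be the compound drawn in Fig. 3d, and the helper names are hypothetical.

```python
def adjacency_matrix(n_fragments, bonds):
    """Square symmetric 0/1 matrix; bonds is a list of (i, j) index pairs."""
    matrix = [[0] * n_fragments for _ in range(n_fragments)]
    for i, j in bonds:
        matrix[i][j] = 1
        matrix[j][i] = 1  # a bond i-j is also a bond j-i
    return matrix


def is_connected(matrix):
    """Depth-first search: True if every fragment is reachable from fragment 0."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, bonded in enumerate(matrix[i]):
            if bonded and j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(matrix)


fragments = ["CH3", "CH2", "CH2COO", "CH3"]          # assumed chain order
free_attachments = {"CH3": 1, "CH2": 2, "CH2COO": 2}
bonds = [(0, 1), (1, 2), (2, 3)]

a = adjacency_matrix(len(fragments), bonds)
for name, row in zip(fragments, a):
    print(f"{name:7s}", row)

# Each row sum must equal the number of free attachments of that fragment,
# otherwise the structure would still have open (or over-used) connections.
saturated = all(sum(row) == free_attachments[name] for name, row in zip(fragments, a))
print(saturated, is_connected(a))  # True True
```

The matrix fixes which fragments are bonded but, as noted in the text, carries no 3-dimensional information.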
[Figure 3: Different levels of molecular structure representation (Harper, 2000). Panels: (a) atomic representation by chemical formula, C5H10O2; (b) example of a fragment (group); (c) group vector: 2 CH3, 1 CH2, 1 CH2COO; (d) compounds that can be constructed from the group vector in (c); (e) fragment-based adjacency matrix; (f) atom-based adjacency matrix.]
1.2 KEY ISSUES & THEIR RELATIONSHIPS
Some of the key issues, and their relationships, associated with the generation of molecular structures and the prediction of the properties of the generated compounds are highlighted here (from Harper 2000).
• Computational Load - This is related to the amount of calculation required to solve any CAMD problem.
• Generation Level - This is related to the steps employed to generate molecular structures (compounds). With increasing levels of molecular structural information, the degree of detail and information also increases.
• Property Range - The Property Range is the total number of properties to be calculated for a generated molecule in order to evaluate if it matches the specified requirements. Each of the properties in the Property Range may have an associated constraint value indicating a lower and/or upper bound that must be fulfilled if the generated molecule is to be retained for further screening.
• Property Level - This is related to the level of "complexity" involved in the estimation of a needed property. This is a theoretical measure of the amount of information needed in order to calculate the property, based on:
  o The type of molecular information needed in order to use the selected property estimation method.
  o Whether or not the property requires other properties in order to be calculated (that is, if it is a secondary property).
  o The complexity of the calculation, that is, is the calculation iterative, does it involve the solution of a system of equations or is it otherwise calculation intensive?
  o If a property p depends on other properties, the level (with respect to calculation order) of property p must be higher than the levels of the other properties. Therefore, if the level of property p is determined on the basis of the levels of other properties, it is not a fixed value for all calculations involving property p, but a variable.
  o Whether the property p is a dynamic, i.e. time-evolving, property. Certain high-level, complex performance measures may involve not only the value p(0) of the property at the start of the material's active service life, but also the profile p(t) of its evolution with time over the service period.
• Property Trust - The level of "confidence" one can assign to a property. This depends on:
  o Estimation accuracy.
  o The dependence on other calculated properties, for example, error propagation.
  o Applicability of the method(s) to the compound(s) in question.
For any CAMD problem, it is necessary to identify the Generation Levels needed. It is necessary to cover the entire property range (of the target properties) within the generation levels. The number of levels needed is determined by the available property estimation methods. As a consequence, the property range and the available property estimation methods control the minimum generation level.
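The ordering requirement on Property Level (a property must sit at a higher level than every property it depends on) can be sketched as a small dependency calculation. Numbering the levels from 1 and the particular dependency table below are assumptions made for illustration; the function name is hypothetical.

```python
def property_levels(depends_on):
    """Assign each property a level one higher than its deepest dependency.

    depends_on -- {property: iterable of properties it is calculated from}
    Properties with no dependencies get level 1 (an assumed convention).
    """
    levels = {}
    in_progress = set()

    def level(prop):
        if prop in levels:
            return levels[prop]
        if prop in in_progress:
            raise ValueError(f"cyclic dependency involving {prop!r}")
        in_progress.add(prop)
        deps = depends_on.get(prop, ())
        levels[prop] = 1 + max((level(d) for d in deps), default=0)
        in_progress.discard(prop)
        return levels[prop]

    for prop in depends_on:
        level(prop)
    return levels


# Illustrative dependency table (an assumption, not taken from the text).
dependencies = {
    "molar_volume": [],
    "heat_of_vaporization_298K": [],
    "solubility_parameter": ["molar_volume", "heat_of_vaporization_298K"],
    "solubility": ["solubility_parameter"],
}

levels = property_levels(dependencies)
print(levels["solubility_parameter"], levels["solubility"])  # 2 3
```

If a different estimation method is selected for a property, its dependency list changes and so does its level, which is the sense in which the level is a variable rather than a fixed value.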
1.3 TARGETS FOR A CAMD FRAMEWORK
From the above discussion, it is clear that any CAMD methodology requires a number of methods and tools that need to work in an integrated manner. An architecture that glues the various methods and tools together into a CAMD framework could therefore be very useful for the further development of CAMD methodologies in a systematic manner, as well as for increasing the solution range of any CAMD methodology. The targets for the development of a CAMD framework could be (Harper 2000):
• The correct formulation of the Property Range is critical to the success of a CAMD method. Failure to identify the important properties will lead to the generation of the wrong products. It is therefore necessary to include a methodology for the formulation of the target property constraints within a CAMD framework.
• The ability to predict a wide range of properties using different methods would broaden the application range of CAMD. Therefore, a CAMD framework must be able to use other prediction methods in addition to the traditionally used GCA methods. This requires the generation and integration of detailed molecular models.
• While the design of highly detailed molecular structures improves the ability to predict properties accurately, there can be a significant associated computational cost. If highly detailed molecules (in terms of structural information) are to be generated, the computational efficiency of the CAMD algorithm must be taken into account in the development of the CAMD framework.
• The minimization of uncertainty is important when performing complex calculations. Consequently, the use of correlations should be minimized and the use of experimental data and accurate prediction methods (using all available information) should be maximized.
With the background presented in this chapter, we now move on to some of the tools and methods used to tackle the CAMD problem.
Acknowledgement
The PhD-thesis of Peter M. Harper (2000) has provided material in the form of text and figures for parts of this chapter.
1.4 REFERENCES
1. J. Bicerano, "Prediction of Polymer Properties", Marcel Dekker Inc. (1993).
2. H. Cabezas, "Designing green solvents", Chemical Engineering, 107 (3), March (2000) 107-109.
3. Chem-Bank, Chemical Hazards Response Information System (CHRIS) Database, Silver Platter Information Inc, MA, USA, November (1998a).
4. Chem-Bank, The Hazardous Substances Data Bank (HSDB), Silver Platter Information Inc, MA, USA, November (1998b).
5. Chem-Bank, The Registry of Toxic Effects of Chemical Substances (RTECS), Silver Platter Information Inc, MA, USA, November (1998c).
6. L. Constantinou, S.E. Prickett and M.L. Mavrovouniotis, "Estimation of thermodynamic and physical properties of acyclic hydrocarbons using the ABC approach and conjugation operators", Ind. Eng. Chem. Res., 32 (1993) 1734.
7. L. Constantinou and R. Gani, "New group contribution method for estimating properties of pure compounds", AIChE J., 40 (1994) 1697.
8. E. L. Cussler and G. D. Moggridge, "Chemical Product Design", Cambridge University Press, USA (2001).
9. Aa. Fredenslund, J. Gmehling and P. Rasmussen, "Vapor liquid equilibria using UNIFAC", Elsevier Scientific, Amsterdam, The Netherlands (1977).
10. J. L. Franklin, "Prediction of Heat and Free Energies of Organic Compounds", Industrial & Engineering Chemistry, 41 (1949) 1070.
11. R. Gani, B. Nielsen and A. Fredenslund, "A group contribution approach to computer-aided molecular design", AIChE J., 37 (1991) 1318.
12. R. Gani and L. Constantinou, "Molecular Structure Based Estimation of Properties for Process Design", Fluid Phase Equilibria, 116 (1996) 75-86.
13. P. Ghosh, A. Sundaram, V. Venkatasubramanian and J. Caruthers, "Integrated Product Engineering: A Hybrid Evolutionary Framework", Computers and Chemical Engineering, 24 (2000) 685-691.
14. P. M. Harper, "A Multi-Phase, Multi-Level Framework for Computer Aided Molecular Design", PhD-thesis, Technical University of Denmark, Lyngby, Denmark (2000).
15. S. O. Jonsdottir, Kj. Rasmussen and Aa. Fredenslund, Fluid Phase Equilibria, 100 (1994) 121-138.
16. J. W. Kang, J. Abildskov, R. Gani and J. Cobas, "Estimation of Mixture Properties from First- and Second-Order Group Contributions with the UNIFAC Model", I&EC Research, 41 (2002) 3260-3273.
17. L. Kier and L. H. Hall, "Molecular Connectivity in Structure-Activity Analysis", Wiley, New York, USA (1986).
18. D. Livingstone, "Data analysis for chemists: Application to QSAR and chemical product design", Oxford University Press, Oxford, UK (1995).
19. L. J. Lyman, W. F. Reehl and D. H. Rosenblatt, "Handbook of Chemical Property Estimation Methods, Environmental Behavior of Organic Compounds", American Chemical Society, Washington DC, USA (1990).
20. C. D. Maranas and C. A. Floudas, "A Deterministic Global Optimization Approach for Molecular Structure Determination", J. Chem. Phys., 100 (1994) 1247-1261.
21. J. Marrero and R. Gani, "Group-contribution based estimation of pure component properties", Fluid Phase Equilibria, 183-184 (2001) 183.
22. S. Macchietto, O. Odele and O. Omatsone, "Design of optimal solvents for liquid-liquid extraction and gas absorption processes", Chem. Eng. Res. Des., 68 (1990) 429.
23. J. M. Nielsen, R. Gani and J. P. O'Connell, "TMS: A Knowledge Based Expert System for Thermodynamic Model Selection and Application", in "Computer-Oriented Process Engineering", ed. L. Puigjaner and A. Espuna, Elsevier, 10 (1991) 29-34.
24. B.E. Poling, J.M. Prausnitz and J.P. O'Connell, "The Properties of Gases and Liquids", 5th edition, McGraw-Hill, New York, USA (2000).
25. H. Renon and J. M. Prausnitz, AIChE J., 14 (1968) 135.
26. G. M. Wilson, J. Am. Chem. Soc., 86 (1964) 127.
27. T. D. Martin and D. M. Young, "Prediction of the Acute Toxicity (96-h LC50) of Organic Compounds to the Fathead Minnow Using a Group Contribution Method", Chem. Res. Toxicol., 14 (2001) 1378-1385.
Computer Aided Molecular Design: Theory and Practice. L.E.K. Achenie, R. Gani and V. Venkatasubramanian (Editors). © 2003 Elsevier Science B.V. All rights reserved.
Chapter 2: Molecular Design - Generation & Test Methods
E.A. Brignole & M. Cismondi
2.1 INTRODUCTION
Traditionally, the search for solvents or products for specific applications has been carried out by examining several compounds and families of compounds and selecting those with the desired properties. A more systematic approach to the solution of these problems is based on CAMD of solvents or products. In both cases an experimental validation of the component properties is recommended. The CAMD approach was introduced in the early eighties for the selection of solvents for separation processes [1,2]. At that time the problem was formulated as follows: "Given a mixture and certain separation goals, synthesize, from the set of UNIFAC groups, molecular structures with the desired solvent properties. The groups are the building blocks for the synthesis process and the UNIFAC thermodynamic model is used for the evaluation of the primary solvent properties". UNIFAC is a group contribution based model [3] used for predicting the liquid phase activity coefficients of the compounds present in the mixture, and the UNIFAC groups are the functional groups needed to represent the molecular structures of the compounds. These two stages, synthesis and evaluation, are still the main components of the various types of CAMD techniques that have been developed. The extensive development of group contribution methods for the prediction of pure component and mixture properties has been fertile ground for the generalized use of product molecular design techniques. The original CAMD approach can be defined as the backward product design problem: "given a set of property constraints and certain performance indexes, generate chemical structures with the desired physico-chemical and/or environmental properties". Applications have been reported for the design of polymers [4], refrigerants [5,6], product substitution [7], solvents [8,9,10] and many more.
The first solvent design studies were based on solution properties derived from the UNIFAC group contribution method for computing activity coefficients [3]. Several revisions of the original UNIFAC predictive package, and extensions to electrolytes, polymers and equations of state, have been presented [11]; a group contribution equation of state (GC-EOS), based on similar but more detailed group definitions, has been extended to new groups and gases [12-14]. For the prediction of pure component properties, such as heat capacities, solubility parameters, formation energies, critical properties, etc., different group definitions have been proposed [15]. However, correlation of pure component properties has also been proposed in terms of the original UNIFAC groups [16,17], which are also called first-order groups [17,18]. In this chapter the original UNIFAC group definitions will be used throughout. This chapter presents the class of CAMD methods that is characterized as generate & test methods. At the macroscopic properties level, these types of methods were first developed for solvent selection and design. For the design of large complex molecules involving a higher level of molecular structural representation than functional groups, most of the procedures also employ generate and test type CAMD methods. In this chapter, however, only the method based on groups as building blocks is discussed in detail.
2.2 THE EVOLUTION OF CAMD
The elements of a CAMD technique can be divided into algorithmic stages dealing with generation of molecules and testing of generated molecules, that is, i) the "generate" or molecular synthesis stage and ii) the "test" or molecular evaluation stage. The main features of the molecular synthesis stage are: group selection, group characterization and molecular feasibility rules. The result of the molecular synthesis stage is a number of feasible molecular structures. The main features of the molecular evaluation stage are: group contribution methods for property estimation, calculated properties, property constraints and evaluation (performance indexes). The final result is a ranked set of product candidates.
2.2.1 Molecular Synthesis
Molecules are synthesized by joining groups with free-attachments until no free-attachments remain in the generated structure. This means that the search (or design) for suitable molecules is not limited to a given set of molecules. Although this is an attractive feature of CAMD, it also has a drawback: the number of structures that may be generated can be very large. Another important difference between property prediction (forward problem) and CAMD (reverse problem) is that in the forward problem the groups representing a molecule are given, whereas in the reverse problem the free-attachment properties of the groups are also important [1,2] and need to be analysed. The free-attachments of a group are the number of chemical bonds available to neighbouring groups for attachment (or combination). The characterisation of the group's combination properties is needed mainly to satisfy two criteria:
i) To obtain chemically feasible structures.
ii) To avoid proximity effects that could lead to unreliable UNIFAC predictions.
Therefore, the generation of feasible molecular structures from the groups is subject to several restrictions and is based on the free-attachments of the groups. Some of the restrictions are the result of the way the groups in the UNIFAC table are defined, while other restrictions are made to prevent the formation of unstable compounds or the generation of new functional groups such as acetals (for which the property predictions will be uncertain). In an earlier publication on molecular design using UNIFAC groups [1], a set of combination rules was formulated:
a) Groups with two attachments cannot be combined to obtain a double bond.
b) Aromatic groups with two attachments (such as "ACCH2", see Table 3) must always have one attachment to the aromatic ring.
c) All non-hydrocarbon groups can only combine with a carbon attachment.
d) Only one bond of the carbon atom can be used for attachments with bonds other than those of carbon or hydrogen atoms.
In later works [2,8] a more detailed group characterisation was introduced, allowing a more general formulation of feasibility rules for aliphatic and aromatic compounds. The main chemical property used for the generation of combination rules was the electronegativity of the group bonds [2,8,9]. Other authors have proposed feasibility rules that satisfy the molecule neutrality conditions. However, the chemical stability of the components is, in many cases, not guaranteed [5,6] with such feasibility rules. This is partly due to the way groups are defined in different group contribution methods and/or the lack of proper combination rules for the groups.
Classification of Groups
The UNIFAC groups with free-attachments (or bonds) have one or more attachments for combination among themselves. Groups with only one free attachment are defined as "terminal" groups. All other groups with more than one free attachment are defined as "intermediate" groups. There are three types of intermediate groups (i.e., groups with multiple attachments): radial, linear and mixed. In the groups of the UNIFAC parameter tables, there are no more than two atoms with "free" attachments. The "free" attachments of a group may be characterised by two properties: i) attachment status, which takes into account the combination properties and ii) valence, the number of attachments. Four types of attachments for paraffinic groups have been defined on the basis of their electronegativity:
• K: severely restricted attachment, e.g., [...]
• L: partially restricted attachment, e.g., [...]
• M: unrestricted carbon attachment in valence groups
• J: unrestricted carbon attachments in "-CH2-", "-CH [...]

[...]

(CH3CO)(CH2)-(CH2)(COCH3) --->
(CH3CO)(CH2)-(CH)-(CH2)(COCH3)
              |
             (OH)

The last is a branched structure with a tertiary carbon linked to an (OH) group. This example shows how the addition of each K group requires the introduction of J-J bonds in the final structures. This synthesis concept can be formulated as follows:

$K \le \mathrm{NJJ}$ (cyclic) (2)
$K - 1 \le \mathrm{NJJ}$ (noncyclic) (3)

where NJJ is the number of J-J bonds. These conditions are valid for both intermediate and final structures. Therefore the new feasibility criteria consist of determining NJJ by counting the number of type J attachments available. A "J attachments balance" can be obtained as follows:

$\sum_i i\,J_i = 2\,\mathrm{NJJ} + \mathrm{NJF}$ when $K \le \mathrm{NJF}$ (4)
or
$\sum_i i\,J_i = 2\,\mathrm{NJJ} + \mathrm{NJF} + 2\,(K - \mathrm{NJF})$ when $K > \mathrm{NJF}$ (5)

where the number of J free attachments is given by:

$\mathrm{NJF} = J_3 + 2\,J_4 + 2$ (non cyclic and $J \ge 1$) (6)
or
$\mathrm{NJF} = J_3 + 2\,J_4$ (cyclic) (7)
In the final structure (non cyclic) of the previous example:

(CH3CO)(CH2)-(CH)-(CH2)(COCH3)
              |
             (OH)

J2 = 2; J3 = 1; NJF = 3; NJJ = 2; K = 3; $\sum_i i\,J_i = 7$; J = 3

Therefore the structure verifies the feasibility criterion given by equation (3). However, if this criterion is applied to FMSa, (HCOO)(CH)(CH3)(OH), discussed in the previous section:

J3 = 1; NJF = 3; NJJ = 0; K = 2; $\sum_i i\,J_i = 3$; J = 1

The structure is unfeasible because it does not satisfy equation (3). When K > NJF, a number (K-NJF) of K groups should be inserted in the intermediate structure, requiring twice as many additional J bonds (equation 5) to obtain a feasible structure. For example the following final structure is unfeasible:

(CH3CO)(CH2CO)(CH2)-(CH)-(CH2)(COCH3)
                     |
                    (OH)

J3 = 1; NJF = 3; NJJ = 2; K = 4; $\sum_i i\,J_i = 9$; J = 4

On the basis of the previous definitions (equation 1) and equations 2 to 7, the general feasibility criteria derived for linear or branched structures are shown in Table 3, where J is the number of subgroups J given by equation 1. From Table 3 it can be seen that for the case where K > NJF an additional (CH2) is required in the previous example in order to obtain a feasible molecule. When NJF = 0 then J = 0; in this case, for K = 1, the final molecule is obtained only by combining the K group with an M group (CH3). This is the case, for example, of methanol (CH3)(OH), where M = 1; J = 0; K = 1. In the application of the feasibility criteria of Table 3, K and J are the total numbers of groups or subgroups of each kind that participate in the molecule, irrespective of their valence. The criteria for the aromatic parts of the structures are those indicated in Table 1 and should be combined with the ones of Table 3 in the synthesis of mixed (aromatic - paraffinic) structures. Considering that the new group characterisation gives more detailed properties of the functional group, the feasibility criteria of Table 3 can be extended to different group definitions.
Table 3: Feasibility criteria for linear and cyclic branched structures
Columns: K; NJF; Non cyclic structures; Cyclic structures.
(The table body did not survive extraction; the legible entries are J = 0 and K ≤ J.)

Table 4 (fragment; row and column assignment not recoverable). Legible attachment classes: (K,1), (K,2), (K,4), (I,1), (H,1). Legible groups: (CH=CH), (CCl2), (SiH2), (CH2=C), (CHNO2), (SiH2O), (CH3N), (C=C), (C4H2S), (C5H3N), (COO), (C-C), (Si), (SiO), (ACH), (ACF), (ACOH), (ACCH3), (ACNO2), (ACCl), (AC).
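The feasibility test of equations (2) to (7) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the function names and the dictionary layout (valence of a J subgroup mapped to its count) are assumptions. NJJ is obtained by solving the J attachments balance, and the case J = 0 (e.g. methanol) is left to the criteria of Table 3. For the third example of the text, J2 = 3 is inferred from the quoted values J = 4, J3 = 1 and a balance sum of 9.

```python
def free_j_attachments(j3, j4, cyclic=False):
    """NJF from equation (6) (non cyclic, J >= 1) or equation (7) (cyclic)."""
    return j3 + 2 * j4 + (0 if cyclic else 2)


def jj_bonds(j_counts, k, cyclic=False):
    """NJJ solved from the J attachments balance, equations (4) and (5).

    j_counts maps the valence i of a J subgroup to the number J_i present.
    """
    if sum(j_counts.values()) < 1:
        # Equation (6) requires J >= 1; J = 0 is covered by Table 3 instead.
        raise ValueError("at least one J subgroup is required")
    total = sum(i * n for i, n in j_counts.items())  # sum over i of i * J_i
    njf = free_j_attachments(j_counts.get(3, 0), j_counts.get(4, 0), cyclic)
    extra = 2 * (k - njf) if k > njf else 0  # additional term of equation (5)
    # Integer division: the balance is assumed to leave an even remainder.
    return (total - njf - extra) // 2


def is_feasible(j_counts, k, cyclic=False):
    """Equation (2) for cyclic structures, equation (3) otherwise."""
    njj = jj_bonds(j_counts, k, cyclic)
    return k <= njj if cyclic else k - 1 <= njj


# The three structures discussed in the text.
print(is_feasible({2: 2, 3: 1}, k=3))  # first example, feasible
print(is_feasible({3: 1}, k=2))        # FMSa, unfeasible
print(is_feasible({2: 3, 3: 1}, k=4))  # third example, unfeasible
```

Adding one more (CH2) to the third structure, i.e. `is_feasible({2: 4, 3: 1}, k=4)`, makes the test pass, in line with the remark that an additional (CH2) is required when K > NJF.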
Reducing the Combinatorial Size of the Problem
The group characterisation given in Table 4 indicates that there are only 19 different combination properties of the UNIFAC groups. Therefore, using the feasibility criteria of Table 3, an efficient combinatorial synthesis of branched molecules is implemented on the basis of metha-groups, i.e. groups with the same combination properties as indicated in the first column of Table 4. In the synthesis of linear molecules the intermediate structures have two free attachments. However, the number of free attachments (NFA) in branched intermediate structures is always larger than two:

$\mathrm{NFA} = 2 + \mathrm{NV3} + 2\,\mathrm{NV4}$ (non cyclic) (10)
or
$\mathrm{NFA} = \mathrm{NV3} + 2\,\mathrm{NV4}$ (cyclic) (11)

where NV3 and NV4 are the numbers of groups of valence three and four. Computer programs based on the above combination rules and group classification can easily be developed [18] and consist of the following steps:
1. Definition by the user of the desired product or solvent property constraints and performance index.
2. Selection of the intermediate and terminal groups in an interactive way.
3. Generation of metha-Intermediate Molecular Structures (metha-IMSs) with NFAs from 2 to 8, using the available metha-groups (intermediate) and satisfying the feasibility criteria. A maximum number of 12 groups in the Final Molecular Structures (FMS) is allowed. Then, each metha-IMS is replaced by all different possible combinations of the selected groups to form "real" IMSs.
4. In a similar way, pre-FMSs are obtained by adding (NFA-2) terminal groups to each IMS.
5. Screening of the pre-FMSs according to the physical property constraints.
6. Termination of Solvent Molecular Structures (SMSs) by adding to each accepted IMS different combinations of two terminal groups that conserve the molecule feasibility.
7. Screening of the synthesized SMSs according to the physical property constraints.
8. Ranking the selected products in accordance with molecular complexity and specific performance index, indicating their predicted physico-chemical or environmental properties.
The size of the combinatorial synthesis problem increases when considering branched structures because of the large number of free attachments of the intermediate structures (equations 10 and 11) and the larger number of groups available (see Table 4). Usually, the UNIFAC or other group contribution methods for computation of activity coefficients or component fugacities are used in the case of solvent design. The application of these methods requires the availability of binary parameters between the groups participating in the molecule synthesis stage.
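As a minimal sketch of two ingredients of this procedure, the snippet below computes the number of free attachments from equations (10) and (11) and discards structures for which a binary interaction parameter is missing. The function names, the set `KNOWN_PAIRS` and the candidate structures are hypothetical and serve only as illustration; real availability must be read from the UNIFAC parameter tables.

```python
from itertools import combinations


def free_attachments(nv3, nv4, cyclic=False):
    """NFA of an intermediate structure, equations (10) and (11)."""
    return nv3 + 2 * nv4 + (0 if cyclic else 2)


def has_all_binary_parameters(groups, known_pairs):
    """True if every pair of distinct groups has a known interaction parameter."""
    distinct = sorted(set(groups))
    return all(frozenset(pair) in known_pairs
               for pair in combinations(distinct, 2))


# Hypothetical parameter availability, for illustration only.
KNOWN_PAIRS = {
    frozenset(("CH3", "CH2")),
    frozenset(("CH3", "OH")),
    frozenset(("CH2", "OH")),
    frozenset(("CH3", "HCOO")),
}

candidates = [["CH3", "CH2", "OH"], ["CH3", "CH2", "HCOO"]]
kept = [s for s in candidates if has_all_binary_parameters(s, KNOWN_PAIRS)]
print(kept)  # the second candidate is dropped: no (CH2, HCOO) entry

# A non cyclic IMS with one group of valence three has NFA = 3, so that
# NFA - 2 = 1 terminal group is added to form a pre-FMS (step 4).
print(free_attachments(nv3=1, nv4=0))
```

Applying the parameter check as early as possible keeps structures that can never be evaluated out of the later, more expensive steps.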
Therefore, between steps 3-4, 4-5 and 6-7 the molecular synthesis method eliminates all intermediate and final structures that contain one or more pairs of groups with unknown binary interaction parameters, limiting in this way the size of the combinatorial problem and reducing the computing time. The results of the synthesis procedure are illustrated with an example of solvent design for the separation of benzene from hexane by liquid extraction. For this example the following groups were chosen: (C), (CH-O), (CHNH), (CH), (CH3N), (CHNO2), (COO), (CH2CO), (CH2NH), (DMF-2), (CH2), (OH), (CH3COO), (HCOO), (CH2NH2), (CH3). The only physical property constraint for an intermediate structure is to have a maximum solvent loss of 10%. For the final solvents the main physical constraints are: selectivity greater than five and molecular weight less than 240. In this example, 16 groups (10 intermediate and 6 terminal) are selected for the synthesis of solvents with a minimum of two groups and a maximum of 12 in the final structure. In this case 10 metha-groups can be identified within the selected set of groups. An example of the number of structures that are generated in the different steps of the molecular synthesis process is given in Table 5. The direct combination of these groups to form structures from 2 to 12 groups results in the generation of 646635 structures. The results of Table 5 show that the use of feasibility rules, physical constraints and the lack of binary interaction coefficients between groups leads to a significant reduction in the size of the synthesis problem. However, when pure component properties are dominant in the product design, the size of the combinatorial problem is not limited by the availability of binary parameters. A sound strategy to handle this problem is to make a preliminary search of product candidates using only single and dual valence groups. Thereafter, it is convenient to select the main group families that lead to the most promising branched structures. Note that in this case a database search method may also be employed, provided that a large database is available.
2.2.2 Test or Molecule Evaluation Stage
The test stage of generate & test methods is closely related to the type of product design problem being solved. In this chapter, only solvent-based separation problems are considered. A separation operation requires specific values or ranges of solvent properties for each particular application. These properties determine the space of physical property constraints that limit the search space of solvent structures. The solvent property constraints may have lower or upper bounds or both. Even though it is difficult to define the conditions for an optimum solvent, the solvents synthesised by molecular design can be ranked according to a performance index and molecular complexity. The development of molecular design applications to different separation problems therefore requires the identification of these physical constraints and the formulation of predictions based on group contribution methods.
Table 5: Solvent design for separation of benzene from hexane by liquid extraction

Number of groups selected: 16
Number of metha-intermediate structures generated: 2344
Number of metha-pre-final solvents: 10552
Number of pre-final solvents: 101934
- Pre-final solvents rejected by MW restriction: 81303
- Pre-final solvents rejected by lack of binary parameters: 14475
- Pre-final solvents rejected by solvent loss constraints: 4120
Number of final solvents generated: 8823
Number of final solvents that satisfy all physical constraints: 277

Potential Solvents | Selectivity | Distribution Coefficient
(CH3)(CH2)2(CH2COO)2(HCOO) | 8.8 | 0.85
(CH3)3(CH2)2(C)(HCOO) | 7.5 | 0.76
Liquid Extraction
When selecting a solvent for liquid extraction, it is important to consider all the separating operations involved in the liquid extraction process:
i) solvent extractor,
ii) raffinate removal from extract,
iii) solute purification,
iv) solvent recovery column.
The scheme shown in Fig. 1 is typical for the extraction of a dilute component. If the solute is recovered by extraction from a dilute solution, the solute/solvent relative volatility should be much greater than one, and the solvent solubilities in the raffinate should be very low. Otherwise, economic considerations screen out liquid extraction as infeasible for the separation under consideration. Cockrem et al. [20] indicated that the solute distribution coefficient and the solvent solubility in the raffinate (solvent loss) are usually the dominant properties for solvent selection in liquid extraction. Low solvent loss in the raffinate also determines raffinate-extract immiscibility. High solvent selectivity is also required to reduce the cost of solute recovery and purification from the extract. The absence of a solute-solvent azeotrope and a high relative volatility for the solute-solvent pair can be assured if a minimum boiling point difference is required. In general, the evaluation of potential solvents for liquid extraction is based on primary solvent properties and pure component properties (boiling points, heats of vaporisation, densities and molecular weights). The primary solvent properties (selectivity, distribution coefficient, solvent loss and solvent power) can be obtained from UNIFAC group contribution predictions of infinite dilution activity coefficients. The pure component properties of the solvent structures generated with UNIFAC groups can be estimated by group contribution methods (Pretel et al. [11], Gani and Constantinou [17]). The primary solvent properties can be estimated through the expressions given in Table 6. Pretel et al. [8] evaluated the performance of the UNIFAC method with respect to its liquid-liquid and vapour-liquid group interaction parameter tables. Their conclusion is that the vapour-liquid parameter table renders more reliable predictions at infinite dilution conditions than the liquid-liquid parameter table. In addition, there are a greater number of groups and parameters available in the vapour-liquid parameter table and its revisions (Gmehling et al., 1982; Macedo et al., 1983; Tiegs et al., 1987; and Hansen et al., 1991) than in the liquid-liquid parameter table (Magnussen et al., 1981).
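Table 6. UNIFAC Evaluation of Primary Solvent Properties for Liquid Extraction

Property (mass basis) | Estimate
Solvent Selectivity | $\beta = \dfrac{\gamma^{\infty}_{B,S}}{\gamma^{\infty}_{A,S}}\,\dfrac{MW_A}{MW_B}$
Solvent Power | $S_p = \dfrac{1}{\gamma^{\infty}_{A,S}}\,\dfrac{MW_A}{MW_S}$
Solute Distribution Coefficient | $m$ (expression not legible in the source)
Solvent Loss | $S_l$ (expression not legible in the source)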
[Figure 1 labels: Feed (A+B); Extractor; Extract (A+B+S); Raffinate removal column; B; Solvent and solute separation column; Solute A; Solvent]
Figure 1. Typical cycle for the extraction of a dilute solute

Extractive Distillation
The standard extractive distillation process works in two steps: the extractive distillation column and the solvent recovery column. The primary solvent properties are the degree to which the solvent increases the relative volatility between the mixture components, the normal boiling point difference between the solute and the solvent, and the amount of solvent required to break the azeotrope in the case of an azeotropic feed mixture. Another important constraint is that the solvent should be miscible in the mixture in the desired concentration range. This constraint is assessed with the phase stability criterion proposed by Michelsen [21]. Furthermore, the feed concentration should be considered in selecting the component that should be removed from the top of the extractive distillation column. This choice determines the nature of the solvents to be generated with the purpose of increasing or decreasing the relative volatility of the feed mixture. The CAMD procedure estimates the solvent properties from activity coefficients and pure component properties, both obtained with group contribution methods based on UNIFAC groups. The computation of the desired properties on the basis of these estimates is given in Table 7.
Table 7. MOLDES Property Estimates for Solvent Evaluation for Extractive Distillation

Property | Estimate
Relative Volatility | (expression not legible in the source)
Solvent Power (mass basis) | $S_p = \dfrac{1}{\gamma^{\infty}_{A,S}}\,\dfrac{MW_A}{MW_S}$
Minimum amount of solvent to break the azeotrope (molar fraction) | $x_{ms}$ (expression not legible in the source)
Phase Stability Criterion | (expression not legible in the source)
Performance Index | (expression not legible in the source)
2.3 APPLICATION EXAMPLES
Application of a CAMD method based on the generate & test approach is highlighted through two examples involving solvent-based separations.
2.3.1 Solvent for ethanol recovery
The recovery of ethanol from aqueous solutions is a problem of great industrial interest. Ethanol recovery and dehydration by distillation and azeotropic distillation is very energy intensive. The potential of liquid extraction for this application can be readily explored by CAMD. The search for a potential solvent for this application illustrates the effect of physical property constraints on solvent selection. The solvent properties desired for this application are:
β > 7.0 (wt./wt.)
TbS - TbA > 50 K
m > 1.0 (wt./wt.)
Sl < 0.1 wt.%
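The screening implied by these four constraints can be sketched as follows. The bounds are those listed above; the dictionary keys, the function name and the two candidate property sets are hypothetical and do not correspond to any compound evaluated in the chapter.

```python
# (lower bound, upper bound); None means the bound is not imposed.
ETHANOL_CONSTRAINTS = {
    "beta": (7.0, None),          # solvent selectivity, wt./wt.
    "delta_tb": (50.0, None),     # TbS - TbA, K
    "m": (1.0, None),             # solute distribution coefficient, wt./wt.
    "solvent_loss": (None, 0.1),  # Sl, wt.%
}


def violated(props, constraints):
    """Names of the constraints that a candidate solvent does not satisfy."""
    failed = []
    for name, (lower, upper) in constraints.items():
        value = props[name]
        if lower is not None and not value > lower:
            failed.append(name)
        elif upper is not None and not value < upper:
            failed.append(name)
    return failed


# Made-up property sets that mimic a light and a heavy member of one family.
light = {"beta": 9.0, "delta_tb": 80.0, "m": 1.2, "solvent_loss": 0.5}
heavy = {"beta": 9.0, "delta_tb": 120.0, "m": 0.6, "solvent_loss": 0.05}
print(violated(light, ETHANOL_CONSTRAINTS))  # ['solvent_loss']
print(violated(heavy, ETHANOL_CONSTRAINTS))  # ['m']
```

Returning the names of the failed constraints, rather than a single yes/no answer, makes it easy to see which property excludes a whole family of solvents.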
Molecular design results for several homologous families of organic solvents are shown in Table 8. The low selectivities of alkyl amines and diols exclude all the components of these families as potential solvents. Even though all families satisfy the boiling point difference, the requirement of distribution coefficients greater than one rejects all solvents with MW greater than 100. However, the solvent loss restriction requires higher molecular weights (>140, i.e. more CH2 groups) for the alcohol and carboxylic acid families; therefore no solvents that meet all the specifications can be found. We can say that molecular design excludes liquid extraction as a feasible operation for this particular problem.
Table 8. Effect of solvent property constraints on ethanol extraction from aqueous solution: solvent molecular design

Solvent Family | β > 7.0 (wt./wt.) | TbS - TbA > 50 K | m > 1.0 (wt./wt.) | Sl < 0.1 wt.%
Phenyl acids | (+) | (+) | (-) |
Alcohols | (+) | (+) | (+) if MW < 100 |
Carboxylic acids | (+) | (+) | (+) if MW < 100 |
Diols | (-) | | |
Alkyl amines | (-) | | |
(Of the Sl column only the truncated entry "(-) if MW" is legible.)

[...]

[...] > 3.0
Sp ≥ 30.0 wt%
TbS - TbA > 50 K (7)
The best solvents found by MOLDES are shown in Table 9, together with experimental relative volatility values obtained by Cepeda and Resa (1984).

Table 9. CAMD solvent selection for the extractive distillation of n-Propyl Acetate from n-Propyl Alcohol at atmospheric pressure

Solvent | αB,A | Sp | Xms | PI%
Ethylbenzene | 5.4 | 80.6 | 35.9 | 14.2
Nonene | 4.64 | 46.45 | 35.7 | 10.31
n-Decane | 5.26 | 35.7 | 37.6 | 9.84
Chlorobenzene | 3.71 | 86.0 | 33.7 | 9.78
Decalin | 4.64 | 42.6 | 34.1 | 9.71
Chlorooctane | 4.95 | 47.11 | 34.8 | 9.57
Xylene | 3.95 | 67.1 | 40.9 | 9.1
Dichlorobenzene | 3.1 | 60.93 | 30.6 | 6.89
Mesitylene | 2.32 | 35.7 | 47.8 | 4.05

αB,A, exp (seven values; their row assignment is not recoverable): 4.23, 4.63, 4.7, 4.63, 4.37, 4.79, 4.24
For this separation problem ethylbenzene, nonene, n-decane and xylenes are the most attractive solvents. From their experimental study Cepeda and Resa recommended the use of xylenes and saturated hydrocarbons with more than 9 atoms. If the reverse problem is studied, that is, if propyl alcohol is the solute of the extractive distillation column and is removed from the bottom together with the solvent, the selection changes drastically and the best solvents are now ethylene glycol or propylene glycol (Pretel et al. [8]).
2.4 REFERENCES

1. R. Gani and E.A. Brignole, Fluid Phase Equilibria 13 (1983) 331.
2. E.A. Brignole, S. Bottini, R. Gani, Fluid Phase Equilibria 29 (1986) 125.
3. Aa. Fredenslund, J. Gmehling and P. Rasmussen, "Vapor liquid equilibria using UNIFAC", Elsevier Scientific, Amsterdam, 1977.
4. V. Venkatasubramanian, K. Chan, J.M. Caruthers, Computers Chem. Eng. 18 (1994) 833.
5. K.G. Joback and G. Stephanopoulos, "Designing molecules possessing desired physical property values", Proceedings FOCAPD'89, Snowmass, CO, 1989.
6. N. Churi, L.E.K. Achenie, Ind. Eng. Chem. Res. 35 (1996) 3788.
7. P.M. Harper, R. Gani, P. Kolar, T. Ishikawa, Fluid Phase Equilibria 158-160 (1999) 337.
8. E.J. Pretel, P. Araya López, S.B. Bottini, E.A. Brignole, AIChE Journal 40 (1994) 1349.
9. R. Gani, B. Nielsen, Aa. Fredenslund, AIChE J. 37 (1991) 1318.
10. O. Odele, S. Macchietto, Fluid Phase Equilibria 82 (1993) 47.
11. Aa. Fredenslund, J. Sorensen, Ch. 4, "Group Contribution Methods" in "Models for Thermodynamic and Phase Equilibria Calculations", editor S.I. Sandler, Marcel Dekker, Inc., New York, 1994.
12. S. Skjold-Jorgensen, Ind. Eng. Chem. Res. 27 (1988) 110.
13. H.P. Gros, S. Bottini, E.A. Brignole, Fluid Phase Equilibria 116 (1996) 537.
14. S. Espinosa, G. Foco, A. Bermúdez, T. Fornari, Fluid Phase Equilibria 172 (2000) 129.
15. R.C. Reid, J.M. Prausnitz, B.E. Poling, "The properties of gases and liquids", 4th Ed., McGraw-Hill Inc., New York, 1987.
16. E. Pretel, P. Lopez, A. Mengarelli, E. Brignole, Latin American Applied Res. 22 (1992) 187.
17. L. Constantinou, R. Gani, AIChE J. 40 (1994) 1697.
18. M. Cismondi, E.A. Brignole, Proceedings of the 11th European Symposium on Computer Aided Process Engineering, Denmark, May 2001, edited by R. Gani and S. Bay Jorgensen, Elsevier, ISBN: 0-444-50709-4, pp. 375-380.
19. M. Cockrem, J. Flatt and E. Lightfoot, Sep. Sci. and Technol. 24 (1989) 769.
20. E. Cepeda, J.M. Resa, An. Quim. 80 (1984) 755.
Computer Aided Molecular Design: Theory and Practice. L.E.K. Achenie, R. Gani and V. Venkatasubramanian (Editors). © 2003 Elsevier Science B.V. All rights reserved.
Chapter 3: Optimization Methods in CAMD - I
M. Sinha, L. E. K. Achenie & G. M. Ostrovsky
3.1 INTRODUCTION
Chemical product design addresses the design of single component chemical compounds and/or mixtures (blends) of compounds with prespecified thermo-physical properties. In recent years, the traditional wet chemistry based chemical product design has been supplemented with computer-aided approaches. The latter is formally designated as computer-aided product design. To be consistent with this book, we will employ the more conventional name, namely computer-aided molecular design (CAMD), in this chapter. The CAMD problem can often be posed as a mathematical program in which a number of binary and continuous variables define the search space (Duvedi and Achenie, 1996; Churi and Achenie, 1996; Maranas, 1997; Odele and Macchietto, 1993; Pistikopoulos and Stefanis, 1998). A binary variable is an integer variable that can have one of two possible values, for example 0 and 1. This chapter discusses a branch and bound approach to solving the resulting mathematical program.
3.2 PROBLEM DEFINITION
A typical molecular design problem may be modeled as a single objective minimization or maximization subject to structural and performance constraints. Thus a CAMD problem for single component molecular design in which thermo-physical property matching is sought may be modeled as

$\min_{x,v,\theta} f(x,v,\theta)$ (1)

$\varphi_j(x,v,\theta) \le 0, \quad j = 1,\ldots,m_1$ (2)
$h_i(x,v,\theta) = 0, \quad i = 1,\ldots,m_2$ (3)

where v is a vector of binary variables that define the molecular structure, x is a vector of continuous variables such as process variables (pressure, temperature, etc.) and θ is a vector of group contribution parameters. Note that additional binary variables may be included in v to indicate additional constraints on the kind of molecular structures that can be
generated. f(x,v,θ) is the performance objective function (for example some undesirable property such as a compound's ozone depletion potential). The group contribution model is a structure-property correlation that has found wide use in the chemical process industry. The constraints involve (a) structural feasibility, (b) physical property targets, and (c) process constraints. The constraints associated with structural feasibility are usually linear. Physical property targets often have the form [...]

$\sum_j \sum_{cl} \sum_{ct} n_{j,cl,ct} \ge n_{min}$ (12)
$\sum_j \sum_{cl} \sum_{ct} n_{j,cl,ct} \le n_{max}$ (13)
where nmin and nmax are the minimum and maximum allowable numbers of groups. These constraints indirectly restrict chain length in homologous series. More direct constraints can be written by bounding the sums of the numbers of group types in any series. Since the formation and cleavage of carbon-carbon bonds often requires extreme operating conditions which are likely to disrupt the chemistry of interest, it may be desirable to avoid co-materials which must undergo changes in carbon skeletal structure in order to arrive at the product. In general this is difficult to achieve, since co-material design focuses on types and numbers of groups, rather than on the connections between them. However, many undesirable materials can be avoided by imposing restrictions on the allowable types and numbers of groups. The numbers of branches, substituents, substituted sites and functional groups may also be limited in this way to avoid co-materials which are significantly more or less structurally complicated than the product. For example, if only monosubstituted benzenes are required, the following equations are introduced:

$\sum_{cl} \sum_{ct} n_{ACH,cl,ct} = 5$
$\sum_{cl} \sum_{ct} n_{AC,cl,ct} = 1$

and m is set to zero in the octet rule (equation 11) to allow only monocyclic structures. Additional restrictions can be incorporated in the stoichiometry identification exercise to avoid, or at least further reduce, the generation of chemistries which alter carbon skeletal structures, if required.
Objective Function
The objective is set as the minimisation of the total number of groups in a molecule:

Minimise $\sum_j \sum_{cl} \sum_{ct} n_{j,cl,ct}$ (14)
In this way, co-materials are enumerated subject to the above rules, starting with the simplest first.
Solution Procedure
The above formulation consists entirely of binary and integer variables in linear equations and is therefore a mixed integer linear programming (MILP) problem. In order to generate a set of co-materials, the problem is solved repeatedly with an integer cut written after each iteration to exclude the current optimal group combination from future iterations. However, it is the precise combination of numbers of groups which must be eliminated, not just the combination of group types (excluding group type combinations would eliminate homologous series). In order to do this the binary variable CUTj,t is introduced, which is related to nj,cl,ct as follows:

$\sum_t t\,CUT_{j,t} = \sum_{cl} \sum_{ct} n_{j,cl,ct}$ (15)
$\sum_t CUT_{j,t} = 1$ (16)

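The effect of equations (15) and (16) and of the integer cuts can be illustrated with a small Python sketch. It is not the MILP of the text: a brute-force search stands in for the solver, the indices cl and ct are collapsed into a single count per group, the cut is written in the common form "the CUT variables of an earlier solution may not all be 1 again" (an assumption, since the source does not spell out the cut), and the validity rule in the usage example (exactly two CH3 groups) is a hypothetical stand-in for the structural constraints.

```python
from itertools import product


def one_hot(counts, t_max):
    """CUT[j][t] = 1 only for t equal to the count of group j (equations 15 and 16)."""
    return {j: [int(t == n) for t in range(t_max + 1)] for j, n in counts.items()}


def violates_cut(cut, previous):
    """A candidate is excluded if it reproduces every (j, t') of an earlier solution."""
    return sum(cut[j][t] for j, t in previous.items()) == len(previous)


def enumerate_co_materials(groups, t_max, n_min, n_max, is_valid, how_many):
    """Repeatedly pick the smallest valid group combination not yet cut off."""
    found = []
    for _ in range(how_many):
        best = None
        for combo in product(range(t_max + 1), repeat=len(groups)):
            counts = dict(zip(groups, combo))
            total = sum(combo)
            if not n_min <= total <= n_max or not is_valid(counts):
                continue
            cut = one_hot(counts, t_max)
            if any(violates_cut(cut, prev) for prev in found):
                continue
            if best is None or total < sum(best.values()):
                best = counts
        if best is None:
            break  # every remaining combination has been cut off
        found.append(best)
    return found


series = enumerate_co_materials(
    ["CH3", "CH2"], t_max=3, n_min=2, n_max=4,
    is_valid=lambda c: c["CH3"] == 2,  # hypothetical validity rule
    how_many=5,
)
print(series)
```

Because each cut removes one exact combination of group numbers, the members of the homologous series with zero, one and two CH2 groups are all returned in order of increasing size; a cut on group types alone would have stopped after the first member containing CH2.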
According to these equations, CUTj,t is non-zero only for t = t', where t' is the number of times group j occurs in a molecule. CUTj,t is zero for all other values of t ≠ t'. The integer cuts are written in terms of CUTj,t. Note that linear group contribution property prediction equations and bounds may be included in the above formulation without affecting the solution procedure. For example, to exclude co-materials with high toxicity, the following equation could be introduced based upon the lethal concentration (mol/l) causing 50% mortality in fathead minnow (LC50):

[...]

where dl/j is the toxicity contribution of group j from Gao et al. (1992), and LC50min is the lowest permitted LC50. Since LC50min is fixed, this equation is linear.
ADDITIONAL MOLECULES
To complete any stoichiometry, it may be necessary to include some simple additional molecules, which cannot be systematically designed using the above procedure. A set of simple complete molecules appears as class zero in Constantinou et al. (1996). However, further molecules may be required on a case by case basis according to any existing industrial stoichiometries and the type of chemistries to be considered. Examples of such molecules include oxygen, hydrogen, hydrogen chloride or other hydrogen halides, chlorine or other halogen molecules, carbon monoxide and carbon dioxide. A subset of these, or a larger set, may be selected as required as the final step of co-material design.
7.4 STOICHIOMETRY IDENTIFICATION FORMULATION
The multistep reaction stoichiometry identification problem can be defined as follows. Given (i) a desired product and desired production rate, (ii) a set of stoichiometric co-materials, (iii) cost information for each material and group contribution parameters for the corresponding group set, (iv) a set of role specification and chemistry constraints and (v) a range of reactor operating conditions, the objective is to determine a set of candidate multistep reaction stoichiometries which are promising in terms of both economics and environmental impact. The model for the identification and economic and environmental evaluation of a single step reaction stoichiometry is presented below, followed by a description of the solution algorithm in which this model is used to develop multistep stoichiometries. The model consists of seven sets of equations: an atom balance, whole number stoichiometry constraints, role specification constraints, chemistry constraints, carbon structure constraints, pure component property prediction equations and a reactor process model. The sets employed in the model are shown in Table 2.

Table 2: Stoichiometry Identification Model Sets
E            elements
S            species
CS (⊂ S)     carbon containing species
J            chemical groups
The formulation is based on the assumption that chemical species undergo reactions either singly (e.g. thermal decomposition or isomerisation, ignoring any reagent, catalyst or solvent effects) or at most in pairs, so that the number of reactants is limited to at most two. An upper limit is applied on the total number of materials in each stoichiometry (since the number of reactants is limited this effectively limits the number of co-products) and no competing reactions are considered (stoichiometry determination can only develop stoichiometric co-products, not side products). The following additional assumptions are made in the analysis: isobaric reactor operation at known pressure $P_{tot}$, gas phase reaction and perfect gas behaviour. Only the products and the reactants are costed, no process equipment or operating costs are considered and the inherent inaccuracies in the property prediction techniques and thermodynamic models employed are accepted. Clearly, incorporating side reactions will add to the impacts so that the present results are lower bounds in this respect. The limits and cuts employed here are practical constraints which can be tightened or relaxed as desired. In principle, the thermodynamic model permits consideration of operation at any pressure. More detailed costing depends on more sophisticated process models.
7.4.1 Atom Balance
The starting point for this work is an atom balance equation which describes the chemistry of a particular set of S species composed of E elements (Rotstein et al., 1982). The atom balance is written as follows:
$$\boldsymbol{\alpha}\,\boldsymbol{\nu} = \mathbf{0} \tag{18}$$
where $\boldsymbol{\alpha}$ is the $E \times S$ atomic matrix and $\boldsymbol{\nu}$ is the $S \times 1$ column vector of stoichiometric coefficients $\nu_s$. It is assumed that the rank of the matrix $\boldsymbol{\alpha}$ is $E$. In general, $S = E + m$, so that $m$ represents the degrees of freedom (DOFs) in the system. These DOFs represent stoichiometric coefficients which must be specified in order for the atom balance to be solved. The remaining $S - m$ coefficients are then determined as functions of these. Clearly when $m = 0$, a unique solution exists, and when $m \geq 1$, there is an infinity of solutions, corresponding to an infinity of possible stoichiometries.
7.4.2 Whole Number Stoichiometries Constraints
At the atomic level, chemical species react in whole number ratios, so that in general meaningful chemical reactions are written in terms of stoichiometric coefficients which are rational numbers (i.e. whole numbers or numbers which can be expressed as ratios of whole numbers). Through multiplication by appropriate factors, stoichiometries involving only whole number coefficients can then be obtained. In such stoichiometries the product coefficient is a whole number which may be greater than or equal to unity. In their atom balances, Rotstein et al. (1982), and later Crabtree and El-Halwagi (1994), assigned the value unity to the product stoichiometric coefficient with no restrictions on the co-material coefficients. While this does not lead to any loss of generality, it potentially allows the development of an infinity of meaningless solutions in which the co-material coefficients are not rational numbers. In order to ensure that only solutions involving whole number stoichiometric coefficients are obtained, the following linear equations are introduced, where $\nu_p$ is the stoichiometric coefficient of the desired product.
$$\nu_p \geq 1 \tag{19}$$
$$x_s = \sum_{n=1}^{N} 2^{(n-1)}\, b_{ns}, \quad \forall s \in S \tag{20}$$
Assigning $\nu_p \geq 1$ allows the necessary flexibility in the value of the product stoichiometric coefficient so that there is no loss of generality. $x_s$ is a dummy coefficient which is defined as a positive, continuous variable. For each species $s$, this variable is expressed as a linear combination of binary (i.e. 0-1) variables $b_{ns}$. In this way, the continuous coefficients $x_s$ are constrained to take whole number values in the range from zero to an upper limit determined by the value of $N$ (namely $2^N - 1$). The real stoichiometric coefficients $\nu_s$ are related to the dummy coefficients $x_s$ as follows:
$$\nu_s = x_s - 2\,x_s\,ii_s, \quad \forall s \in S$$
The binary variable $ii_s$ is necessary since the coefficients $\nu_s$ may take positive or negative values. The variables $ii_s$ take the value zero if species $s$ is a product ($\nu_s$ positive) and unity if species $s$ is a reactant ($\nu_s$ negative), so that $ii_s$ is the reactant flag. This equation may be linearised using the Glover (1975) transformation, yielding:
$$\nu_s = x_s - 2\,y_s, \quad \forall s \in S \tag{21}$$
$$y_s - \nu_{max}\, ii_s \leq 0, \quad \forall s \in S \tag{22}$$
$$x_s + \nu_{max}\,(ii_s - 1) - y_s \leq 0, \quad \forall s \in S \tag{23}$$
$$y_s - x_s \leq 0, \quad \forall s \in S \tag{24}$$
where $y_s$ is a dummy variable for the product $x_s\,ii_s$ and $\nu_{max}$ is the maximum permitted magnitude for any stoichiometric coefficient. The variables $y_s$ are defined as positive continuous variables. To ensure that the reactant flag is set only when species $s$ takes part with a non-zero coefficient, the following additional constraint is applied:
$$y_s \geq ii_s, \quad \forall s \in S \tag{25}$$
Note that for any particular stoichiometry, $x_s$ and $\nu_s$ are non-zero only for the species involved and zero for all other species, while $y_s$ is non-zero only for the reactants involved and zero for all other species (including products and co-products).
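The way equations 20 to 25 pin down a signed whole number coefficient can be checked with a short Python sketch. The helper names and the sample values (`v_max = 7`, three binary digits) are assumptions made for illustration. In the model $y_s$ is continuous; here its feasible interval is computed directly, and it collapses to a single point whenever the flags are consistent.

```python
def expand_binary(bits):
    """Equation 20: x = sum over n of 2**(n-1) * b_n, with bits = [b_1, ..., b_N]."""
    return sum(2 ** n * b for n, b in enumerate(bits))


def y_interval(x, ii, v_max):
    """Bounds on y implied by y >= 0 and equations 22-25."""
    lower = max(0, x + v_max * (ii - 1), ii)  # y >= 0, equation 23, equation 25
    upper = min(v_max * ii, x)                # equation 22, equation 24
    return lower, upper


def signed_coefficient(bits, ii, v_max):
    """Equation 21: v = x - 2*y, or None if no y satisfies the constraints."""
    x = expand_binary(bits)
    lower, upper = y_interval(x, ii, v_max)
    if lower > upper:
        return None
    assert lower == upper  # the interval is a single point, y = x * ii
    return x - 2 * lower


print(signed_coefficient([1, 1, 0], ii=1, v_max=7))  # x = 3 flagged as a reactant
print(signed_coefficient([1, 1, 0], ii=0, v_max=7))  # x = 3 flagged as a product
print(signed_coefficient([0, 0, 0], ii=1, v_max=7))  # reactant flag with x = 0
```

The last call returns `None`: with $x_s = 0$ equation 24 forces $y_s = 0$ while equation 25 demands $y_s \geq 1$, so a species cannot be flagged as a reactant without appearing in the stoichiometry.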
7.4.3 Role Specification Constraints Role specification constraints (Fornari et al., 1994a, 1989) are used to restrict the participation of molecules in the stoichiometries; for example, to avoid certain stoichiometric co-products or to define a species as a raw material only. In order to apply such constraints the raw materials and products in any stoichiometry must be identified. Raw material identification is taken care of by the binary reactant flag iis, from the whole number stoichiometry constraints. Products are identified using the following equations: xs -- ys -- Vmax " Is 1 x 10-4(1-iii~),
VsES
(55)
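Since $n_{fs}$ is zero for some $s$, $iii_s$ must also be introduced into the $G_{sys}$ expression:
$$G_{sys} = 1000 \sum_s n_{fs}\,\Delta G_{fs} + R\,T_{oper}\left[\sum_s n_{fs}\ln\big((n_{fs} + iii_s)\,P_{oper}\big) - \sum_s n_{fs}\ln\Big(\sum_s n_{fs}\,P^{\circ}\Big)\right] \tag{56}$$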
Crabtree and El-Halwagi (1994) use a similar approach to deal with species not involved in a particular stoichiometry, although they reported using the reaction equilibrium condition to determine the reaction equilibrium position. Note that the chemical equilibrium condition provides a unique relationship between $n_{fs}$ and $T_{oper}$, whereas in the $G_{sys}$ expression they are independent variables. Thus an additional temperature bound is required to ensure that temperature is consistent with the extent bound from equation 48. This bound is calculated by solving the following equation for $T'$:
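$$\Delta G_R(T') = -R\,T' \ln K_{\varepsilon_0} \tag{57}$$
where $K_{\varepsilon_0}$ is the equilibrium constant evaluated at $\varepsilon = \varepsilon_0$:
$$\ln K_{\varepsilon_0} = \sum_s \nu_s \ln\big((n_{si} + \nu_s \varepsilon_0 + iii_s)\,P_{oper}\big) - \sum_s \nu_s \ln\Big(\sum_s (n_{si} + \nu_s \varepsilon_0)\,P^{\circ}\Big) \tag{58}$$
The reactor operating temperature must then satisfy:
$$T_{oper} \geq T' \tag{59}$$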
THERMODYNAMIC AND ECONOMIC CONSTRAINTS
The Gibbs Free Energy of reaction per mole of product is employed to eliminate thermodynamically infeasible solutions, using a 10 kcal/mol (or 41.868 kJ/mol) upper limit as follows:
$$\Delta G_R \leq 41.868\,\nu_p \tag{60}$$
The profit associated with each reaction is calculated as follows, assuming that any stoichiometric co-products are sold at their market value:
$$\text{Profit} = \frac{\sum_s \nu_s\,C_s}{\nu_p} \tag{61}$$
where $C_s$ is the market value of species $s$, taken from Chemical Prices (1998). Note that individual reaction steps cannot be rejected on the basis of profit since the profit of any one step is not representative of the profit of the entire chemistry.
ENVIRONMENTAL CONSIDERATIONS
The environmental impact directly associated with carrying out each stoichiometry is assumed to arise only from the energy consumption necessary to maintain reactor temperature. By-products are not considered and it is assumed that there are no material emissions of the co-products of any stoichiometry. In addition, the impacts associated with separating the products from the recycle, and with all other downstream processing, are ignored for the present. In these respects, the impact figures calculated here are very much lower bounds for the eventual process impacts. This simplistic treatment of environmental impact assessment reflects the level of information available at stoichiometry selection. In principle, the full range of life cycle assessment (LCA) based metrics available within the MEIM could be used to develop a full impact vector for each stoichiometry. However, since air emissions are the dominant form of energy associated waste, the critical air mass (CTAM) metric is chosen. According to Stefanis (1996), the critical air mass associated with energy production is $1.629 \times 10^{8}$ kg air/MWh. Assuming that the environmental impact per unit energy of maintaining reactor temperature is the same as that of burning fossil fuels to produce electricity, the environmental impact arising from the reactor energy demand per mole of desired product is:
$$CTAM_E\ (\text{kg air/hr}) = 1.629 \times 10^{8}\,\frac{\sqrt{Q_{reactor}^{2}}}{3600} \tag{62}$$
Note that it is assumed here for simplicity that the impact of cooling the reactor (which is necessary for negative $Q_{reactor}$) is the same as that of heating it. This is a simplistic assumption; however, in this way reactions which require withdrawal of energy are penalised in environmental terms equally with those which require energy supply.
This assumption is made on the basis that reactions which require energy withdrawal are likely to be exothermic reactions occurring at moderate temperatures, so that the heat of reaction term dominates the energy balance (equation 51). This is only likely to occur towards the reaction temperature lower bound (i.e. 300K), at which the reactor temperature is too low to use cooling water at ambient temperature. Thus, some kind of refrigeration would be required, which carries with it a high energy demand and therefore a high impact, associated with compression requirements. In order to complete the impact assessment, the input wastes associated with the materials consumed in any stoichiometry must be included. However, the quantification of the input waste of any material can be a lengthy exercise, since all processing steps necessary to produce the material from naturally occurring substances must be considered in accordance with the principles of LCA (Heijungs et al., 1992; ISO 14040, 1997; SETAC, 1993). Thus, rather than performing this exercise for all co-materials, it is more efficient to assess the input wastes only of those materials which are identified as raw materials by the multistep stoichiometry identification formulation (i.e. those materials with no precursors). Since these materials are not known at the outset, input waste assessment can only be performed after stoichiometry identification.
Provided input wastes are included in this way, consistent impact figures can be obtained for multistep stoichiometries involving branches of different lengths, and different stoichiometries can be compared on a consistent basis.
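As a rough numerical illustration of the energy-related impact, the following Python sketch evaluates the critical air mass for a given reactor duty, taking the critical air mass factor as 1.629 × 10^8 kg air/MWh. The unit of the duty (MJ/hr) is an assumption made here so that division by 3600 MJ/MWh gives kg air/hr, and the function name and the sample duty are hypothetical.

```python
import math

CTAM_PER_MWH = 1.629e8   # kg air per MWh of energy, as quoted from Stefanis (1996)
MJ_PER_MWH = 3600.0      # unit conversion


def ctam_energy(q_reactor_mj_per_hr):
    """Critical air mass in kg air/hr for a reactor duty in MJ/hr (assumed unit).

    The square root of the square is simply the magnitude of the duty, so
    heating and cooling of equal size are penalised equally.
    """
    magnitude = math.sqrt(q_reactor_mj_per_hr ** 2)
    return CTAM_PER_MWH * magnitude / MJ_PER_MWH


heating = ctam_energy(36.0)    # hypothetical duty: 36 MJ/hr supplied
cooling = ctam_energy(-36.0)   # the same duty withdrawn
print(heating, cooling)
```

For the hypothetical 36 MJ/hr duty (0.01 MWh per hour) both calls give roughly 1.6 × 10^6 kg air/hr, which shows the symmetry between heating and cooling described above.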
7.5 SOLVING THE MULTISTEP STOICHIOMETRY IDENTIFICATION PROBLEM
7.5.1 Overview
It is desirable to use the above model to enumerate and evaluate multistep stoichiometries implicitly and simultaneously within a framework of constrained optimisation. However, the optimisation objective (minimise $G_{sys}$) is not suitable for such an approach since it must be applied to each individual stoichiometric step and furthermore, even for a single step stoichiometry, the model is a large mixed integer nonlinear programming (MINLP) problem. It involves a large number of optimisation variables, including for each species $s$: the binary variables $i_s$, $ii_s$, $iii_s$, $b_{ns}$ ($\forall n$) and also the binaries $q_{t,cs}$ ($\forall t$), $z_{cs}$ and $p$ if the carbon structure constraints are employed, and the continuous variables $\nu_s$, $x_s$, $y_s$, $n_{fs}$, $n_{si}$, $F_s$, $P_s$ and $\Delta G_{fs}$. Furthermore, the relationships between these variables are not trivial, including many instances of products between binary and continuous variables. Thus, in order to solve the multistep stoichiometry identification problem, a decomposition based approach is adopted in which the single step problem is solved by explicit enumeration and subsequent evaluation of stoichiometries in two sequential steps. This procedure is then applied successively in an algorithm designed to build up multistep reaction stoichiometries. The enumeration and evaluation of single step stoichiometries are discussed below, followed by a description of the multistep stoichiometry identification solution algorithm.
SINGLE STEP STOICHIOMETRY ENUMERATION
The basic single step stoichiometry enumeration formulation consists of equations 18 - 33. Carbon structure constraints (equations 34 - 39) are optional and are included on a case by case basis. With the exception of the more complicated carbon structure constraints (equations 37 and 38) all equations are linear. However, as discussed above, provided equations 36, 37 and 39 are solved in advance, equation 38 becomes linear. Thus, with or without carbon structure constraints, the single step stoichiometry enumeration problem can be formulated as a mixed integer linear programming (MILP) problem so that optimal solutions can be guaranteed.
Recognising that simple stoichiometries with few reactants and co-products are more attractive than complex ones (which in general require more complex reaction and separation technologies), a new objective function is introduced in order to extract stoichiometries systematically from the matrix $\boldsymbol{\alpha}$, starting with the simplest first. The number of materials involved in any stoichiometry, $N_{spe}$, is obtained by summing the reactant and co-product flag values, so that this objective is written as follows:
$$\text{minimise} \quad N_{spe} = \sum_s (i_s + ii_s) \tag{63}$$
In order to identify a set of candidate stoichiometries this problem must be solved repeatedly, with integer cuts introduced at each iteration to exclude previous solutions. Accordingly, the simplest stoichiometries are enumerated first and as cuts are added the solutions become progressively more complicated.
SINGLE STEP STOICHIOMETRY EVALUATION
The single step stoichiometry evaluation problem consists of the property prediction and reactor process model, equations 40 - 62. For each stoichiometry, this model is solved immediately after the stoichiometry enumeration model. Thus, $\nu_s$, $x_s$, $y_s$, $i_s$, $ii_s$ and $iii_s$ are known and are treated as parameters in the stoichiometry evaluation problem, which is then reduced to a nonlinear programming (NLP) problem. The optimisation objective is to minimise $G_{sys}$ and the main optimisation variables are $T_{oper}$, $\varepsilon$ and $n_{fs}$.
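The "simplest first" enumeration described above can be mimicked on a small scale without an MILP solver. The Python sketch below is a brute-force stand-in, not the authors' formulation: it scans all whole number coefficient vectors within an assumed bound `v_max`, keeps those that satisfy the atom balance with the target as a product, at most two reactants and at most four species, and sorts them by the number of species involved. The six species are taken from the carbaryl example; their elemental formulae and all function and parameter names are supplied here for illustration.

```python
from functools import reduce
from itertools import product
from math import gcd

# Atom counts (C, H, N, O, Cl) for six species of the carbaryl example.
FORMULAE = {
    "1-naphthol": (10, 8, 0, 1, 0),
    "1-naphthalenyl chloroformate": (11, 7, 0, 2, 1),
    "carbaryl": (12, 11, 1, 2, 0),
    "methyl amine": (1, 5, 1, 0, 0),
    "methyl isocyanate": (2, 3, 1, 1, 0),
    "hydrogen chloride": (0, 1, 0, 0, 1),
}


def enumerate_stoichiometries(target, v_max=2, max_reactants=2, max_species=4):
    """Return balanced stoichiometries producing `target`, simplest first."""
    names = list(FORMULAE)
    found = []
    for nu in product(range(-v_max, v_max + 1), repeat=len(names)):
        coeff = dict(zip(names, nu))
        if coeff[target] < 1:
            continue  # the target must appear as a product
        balanced = all(
            sum(coeff[s] * FORMULAE[s][e] for s in names) == 0
            for e in range(5)
        )
        if not balanced:
            continue
        if reduce(gcd, (abs(v) for v in nu)) != 1:
            continue  # skip whole number multiples of a simpler solution
        involved = {s: v for s, v in coeff.items() if v != 0}
        reactants = [s for s, v in involved.items() if v < 0]
        if len(reactants) > max_reactants or len(involved) > max_species:
            continue
        found.append(involved)
    return sorted(found, key=len)  # fewest species first


for stoichiometry in enumerate_stoichiometries("carbaryl"):
    print(len(stoichiometry), stoichiometry)
```

With these limits the scan returns two stoichiometries: the three-species reaction of 1-naphthol with methyl isocyanate, followed by the four-species reaction of 1-naphthalenyl chloroformate with methyl amine giving carbaryl and hydrogen chloride. Sorting by species count plays the role that the objective of equation 63 and the successive integer cuts play in the MILP.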
7.5.2 Multistep Stoichiometry Identification Algorithm OVERVIEW OF ALGORITHM In order to generate multistep stoichiometries the single step stoichiometry enumeration and evaluation problems are solved successively using a depth first enumeration strategy, in which the desired maximum number of reaction steps is specified in advance. The operation of the algorithm is schematically depicted in Figure 1 for the case where at most three reaction steps are allowed.
[Figure 1 flow diagram: systems 0, 1A, 1B, 2A, 2B, 2C and 2D, each comprising an enumeration step followed by an evaluation step]
Figure 1: Multistep Stoichiometry Identification Algorithm
At the first level, system zero, the final desired product is the target molecule, and a single stoichiometry involving up to two first generation precursor reactants is extracted from the matrix $\boldsymbol{\alpha}$. One of these first generation precursors is then arbitrarily selected as the target molecule for system 1A and a single stoichiometry involving up to two second generation precursor reactants is identified which leads to this compound. One of these second generation precursors is then selected as the target molecule for system 2A. Since system 2A completes this branch of the enumeration exercise, it is solved iteratively until all stoichiometries leading to this second generation precursor target molecule have been enumerated. Once this has been achieved, system 2B is solved iteratively for all stoichiometries leading to the other second generation precursor. With this pair of second generation precursors completely fathomed, system 1A is run again to generate two more, which are then treated as the target molecules for system 2A and system 2B. This process is repeated until all stoichiometries leading to the first generation precursor target molecule have been enumerated. Once this has been achieved, systems 1B, 2C and 2D are employed to enumerate all stoichiometries leading to the other first generation precursor. System zero is then solved again to generate two more first generation precursors, and the whole procedure is repeated until all stoichiometries leading to the final desired product have been enumerated. In this way, multistep reaction stoichiometries are developed in which a family tree of precursors is linked by individual reaction stoichiometries which lead eventually to the desired product. In principle, this approach may be applied to generate any number of successive reaction steps. Each system comprises both stoichiometry enumeration and evaluation, so that for each stoichiometry, the evaluation problem is solved immediately after the stoichiometry is generated.
Infeasibility in the linear stoichiometry enumeration problem implies that no stoichiometries exist which lead to the particular target molecule, while infeasibility in the nonlinear stoichiometry evaluation model implies violation of the thermodynamic constraints (ignoring numerical problems) for a particular stoichiometry. Thus, if the stoichiometry enumeration model is initially infeasible or becomes infeasible after all possible stoichiometries in a particular system have been enumerated, the algorithm immediately moves on to the next system. If, however, the stoichiometry evaluation model is infeasible for a particular stoichiometry, the algorithm continues to enumerate stoichiometries within the same system. The results of the system are stored only if both problems are feasible. The same occurrence matrix $\boldsymbol{\alpha}$ and variables are used throughout all systems, so that each time a system is solved all variables are overwritten. Thus, parameters are employed to store results and to communicate variable values between stoichiometry enumeration and evaluation problems within each system. Different role specification, chemistry or carbon structure constraints may be included in different systems, if required, simply by including different equations in the model definitions. Otherwise, the same equations and models are employed throughout all systems.
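The depth-first build-up of multistep stoichiometries can be sketched in a few lines of Python. This is a simplified illustration rather than the algorithm itself: the single step solutions are supplied as a fixed lookup table instead of being generated by the enumeration and evaluation models, no integer cuts are written, and each reactant is either taken as a raw material or expanded by one of its own stoichiometries. The table uses the step labels and reactants that the case study attributes to the two industrial routes (A and I, D and N); the function name and data layout are assumptions.

```python
from itertools import product

# Single step stoichiometries by target species: (label, reactant species).
# Species numbers follow the case study: 3 = 1-naphthol, 6 = 1-naphthalenyl
# chloroformate, 7 = carbaryl, 12 = methyl amine, 13 = phosgene,
# 14 = methyl isocyanate.
STEPS = {
    7: [("A", (3, 14)), ("D", (6, 12))],
    14: [("I", (12, 13))],
    6: [("N", (3, 13))],
}


def routes(target, steps, max_steps):
    """Labels of all routes to `target` that are at most `max_steps` deep."""
    if max_steps == 0:
        return []
    found = []
    for label, reactants in steps.get(target, []):
        # Each reactant is either bought ("") or made by a deeper route.
        options = [[""] + routes(r, steps, max_steps - 1) for r in reactants]
        for combination in product(*options):
            found.append(label + "".join(combination))
    return found


print(routes(7, STEPS, max_steps=2))
```

Each branch is followed to its maximum depth before the next alternative at the same level is tried, which mirrors the order in which systems 0, 1A, 2A, 2B and so on are solved. In the full algorithm the integer cuts additionally prevent a stoichiometry from reappearing in later systems or in the reverse direction.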
INTEGER CUTS
Within the algorithm, integer cuts are automatically written each time a system is solved. The cuts are written in such a way that they prevent the same stoichiometry from occurring again both within the current system and within all subsequent systems. Furthermore, they are written to prevent the reappearance of the stoichiometry in both forward and reverse directions. In this way, each material which appears as a reactant in the entire multistep stoichiometry network is fathomed only once, and circuits, in which a reaction appears in both forward and reverse directions within the same multistep stoichiometry, are avoided. The integer cuts are written in terms of the reactant and co-product flags in much the same way as the chemistry constraints, and since the same reactant and co-product flag variables are used for each system, it is a simple matter to write these constraints in such a way that once written, they are included in all subsequent systems.
TARGET MOLECULE IDENTIFICATION
At the outset, only the desired product is known, so that a means of communicating target molecule identities between the subsequent systems is required. This is achieved by using the $ii_s$ values from each system as lower bounds for the stoichiometric coefficients in the next. If species $k$ and $l$ are the precursor reactants generated by a certain system, $ii_k$ and $ii_l$ will take the value unity for this system. These species must then be identified as the products of the pair of subsequent systems. However, species $k$ and $l$ must be considered independently, i.e. one in one system and one in the other. In order to do this, the vector $II_s$ must be split so that in one system $i_k = 1$ while $i_l$ (and all other product flags) are unconstrained, whereas in the other system $i_l = 1$ while $i_k$ (and all other product flags) are unconstrained.
This is achieved by incorporating the following equations in all stoichiometry enumeration formulations:
$$ii_s = a_s + b_s, \quad \forall s \in S \tag{64}$$
$$\sum_s a_s = 1 \tag{65}$$
$$\sum_s b_s \leq 1 \tag{66}$$
Equation 64 splits the vector $II_s$ from each system into two vectors $A_s$ and $B_s$, of which the elements $a_s$ and $b_s$ are binary variables. According to equations 65 and 66, $A_s$ and $B_s$ may have only one non-zero entry each, so that the non-zero entries in $II_s$ are divided, one into $A_s$ and one into $B_s$. The values of $A_s$ and $B_s$ are then communicated to the subsequent pair of systems through the parameter vectors $AP_s$ and $BP_s$ respectively, by replacing equation 19 with one of the following equations in all systems subsequent to system zero:
$$i_s \geq ap_s, \quad \forall s \in S \tag{67}$$
$$i_s \geq bp_s, \quad \forall s \in S \tag{68}$$
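The splitting of the reactant flag vector can be illustrated with a small Python sketch. In the formulation the choice of which reactant goes into which vector is left to the optimiser; the sketch below simply assigns the lower-numbered reactant to the first vector, and the function name and the sample flag vector are hypothetical.

```python
def split_reactant_flags(ii):
    """Split a 0/1 reactant flag vector into two vectors a and b such that
    ii[s] == a[s] + b[s], sum(a) == 1 and sum(b) <= 1."""
    reactants = [s for s, flag in enumerate(ii) if flag == 1]
    if not 1 <= len(reactants) <= 2:
        raise ValueError("expected one or two reactants")
    a = [0] * len(ii)
    b = [0] * len(ii)
    a[reactants[0]] = 1
    if len(reactants) == 2:
        b[reactants[1]] = 1
    return a, b


ii = [0, 1, 0, 1, 0]            # species 1 and 3 are the precursor reactants
a, b = split_reactant_flags(ii)
print(a, b)                      # each vector marks one target molecule

a_single, b_single = split_reactant_flags([0, 0, 1, 0, 0])
print(a_single, b_single)        # single reactant: the second vector is empty
```

Passing `a` as a lower bound on the product flags of one subsequent system and `b` to the other makes each precursor the target molecule of exactly one system, while all other product flags remain free.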
Note that equation 66 is written as an inequality to permit stoichiometries in which there is only one reactant (e.g. isomerisation or thermal decomposition). In such cases, all elements of the vector $B_s$ take the value zero and the algorithm omits the entire branch of corresponding subsequent systems. Note also that $\nu_p$, the stoichiometric coefficient of the target molecule for each system subsequent to system zero, which is needed in the stoichiometry evaluation equations, is identified using one of the following equations:
$$\nu_p = \sum_s ap_s\,\nu_s \tag{69}$$
$$\nu_p = \sum_s bp_s\,\nu_s \tag{70}$$
where $ap_s$ and $bp_s$ are the parameter values generated by the previous system, and $\nu_s$ are the stoichiometric coefficients of the current system. For all systems, $\nu_p$ is included as a parameter in stoichiometry evaluation.
CALCULATION OF FINAL RESULTS
The profit and impact from each system are stored as parameters immediately after the system is solved. These figures are calculated per mole of the target molecule produced in the current system. The total profit and impact associated with the multistep stoichiometries are calculated per mole of the final product, by starting at the final systems and working forward towards the final product, adding the profits and impacts sequentially. However, the target molecule in a certain system may exhibit a stoichiometric coefficient with any value (subject to $\nu_{max}$) in the previous system where it appears as a reactant, and the product of this previous system may also exhibit any such stoichiometric coefficient value. Thus, the profit and impact of each system must be multiplied by the magnitude of the stoichiometric coefficient of its target molecule as it appears in the previous system, and divided by the magnitude of the stoichiometric coefficient of the product of this previous system, before being added to the profit or impact of the previous system. Since each system may have up to two immediate subsequent systems, the profits and impacts of both subsequent systems must be treated in this way. In addition, each combination of subsequent system stoichiometries must be considered and a separate profit and impact figure calculated for each. Depending on whether $ap_s$ or $bp_s$ is used in a particular system, $x_{p,k}^{k-1}$, the magnitude of the stoichiometric coefficient which the target molecule of system $k$ exhibits as a reactant in system $k-1$, is given by one of:
$$x_{p,k}^{k-1} = \sum_s ap_s^{k-1}\,x_s^{k-1} \tag{71}$$
$$x_{p,k}^{k-1} = \sum_s bp_s^{k-1}\,x_s^{k-1} \tag{72}$$
where $ap_s^{k-1}$ and $bp_s^{k-1}$ are parameter values from system $k-1$ and $x_s^{k-1}$ are the stoichiometric coefficients of system $k-1$. Thus, the profit and impact of system $k$ are multiplied by $x_{p,k}^{k-1}$ and divided by $x_p^{k-1}$ before being added to the profit and impact of system $k-1$. Using these parameters, profits and impacts are cascaded through the multistep stoichiometry network and a total profit and impact figure is arrived at for each set of stoichiometries which eventually leads to the final product. Note that each reactor is assumed to be fed at ambient conditions. It is assumed that any cooling which may be required between successive reaction steps to achieve this will be accommodated by energy integration at a later stage of the process design, with no environmental penalty.
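The cascading of profits (or impacts) towards the final product can be written as a short recursion. The Python sketch below is illustrative only: the dictionary layout, the function name and all numerical values are hypothetical, and each system simply stores its own figure per mole of its target, the magnitude of its product coefficient, and the subsequent systems together with the magnitude of the coefficient their target exhibits as a reactant.

```python
def cascade(system):
    """Total profit (or impact) per mole of this system's target molecule."""
    total = system["value"]
    for subsystem, x_reactant in system.get("subsystems", []):
        # Scale the subsequent system's total by the amount of its target
        # consumed here per mole of this system's own product.
        total += cascade(subsystem) * x_reactant / system["x_product"]
    return total


# Hypothetical two-step example: the final step consumes 2 mol of a
# precursor per mole of product, and the precursor step earns 0.5 per mole.
precursor_step = {"value": 0.5, "x_product": 1}
final_step = {
    "value": 1.0,
    "x_product": 1,
    "subsystems": [(precursor_step, 2)],
}
print(cascade(final_step))
```

With these numbers the total is 1.0 + 0.5 × 2 / 1 = 2.0 per mole of final product; doubling the product coefficient of the final step would halve the contribution of the precursor step.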
7.6 APPLICATION
7.6.1 Case Study: Production of 1-Naphthalenyl Methyl Carbamate
1-naphthalenyl methyl carbamate, also known as carbaryl, was employed as a pesticide (Kalelkar, 1988; Shrivastava, 1987; Worthy, 1985). It was manufactured under the trade name SEVIN by Union Carbide India, Limited (UCIL) in Bhopal until December 1984, when production was terminated following the Bhopal disaster. UCIL's process involved the raw materials 1-naphthol and methyl isocyanate, a toxic substance with a permissible exposure limit (PEL) of 0.02 ppm (AGCIH, 1977; Dagani, 1985). Under disputed circumstances, 45 tons of methyl isocyanate underwent a chemical reaction and were released, killing approximately 2,500 people in the vicinity of the plant and resulting in some 300,000 additional casualties. Crabtree and El-Halwagi (1994) considered this example with the objective of identifying stoichiometries with more innocuous raw materials, to reduce the potential impact of fugitive emissions. The approach employed here is somewhat different in that the objective is to identify stoichiometries which exhibit low environmental impact under normal operating conditions. While materials with high toxicity can be excluded at the co-material design stage by including equation 17 in the co-material design formulation, this potentially excludes stoichiometries which could be environmentally promising provided proper containment were employed. Thus, no such limit was included in this example. In cases where fugitive emissions are of concern, the methodology for environmental risk assessment of non-routine industrial releases presented by Stefanis and Pistikopoulos (1997) could, in principle, be incorporated as part of stoichiometry evaluation.
GROUP PRE-SELECTION
According to Worthy (1985), there are two accepted industrial routes to carbaryl, which can be produced with or without methyl isocyanate. The alternative chemistries are shown in Figure 2.

Methyl Isocyanate Route
CH3NH2 (Methyl Amine) + COCl2 (Phosgene) → CH3-N=C=O (Methyl Isocyanate) + 2 HCl
CH3-N=C=O (Methyl Isocyanate) + 1-Naphthol → Carbaryl (1-Naphthalenyl Methyl Carbamate)

Non-Methyl Isocyanate Route
1-Naphthol + COCl2 → 1-Naphthalenyl Chloroformate + HCl
1-Naphthalenyl Chloroformate + CH3NH2 → Carbaryl + HCl

Figure 2: Carbaryl Production Routes
For simplicity, to limit the size of the co-material design problem in this illustrative example, the group set is restricted to the simplest set of groups which are required to form the product and industrial co-materials shown in Figure 2. The selected set of UNIFAC groups (eleven in all) then consists of the aromatic groups AC, ACH, ACCl and ACOH, and the groups -CH3, CH3NH-, CH3NH2, -COO-, -CHO, -OH and -Cl. Note that methyl amine (CH3NH2) appears as a class zero group in Constantinou et al. (1996), that is as a complete molecule, so that the NH2 group is not required. Note that the -Cl group is a category two group.
CO-MATERIAL DESIGN
Using this group set, the co-materials were then constructed by solving the co-material enumeration formulation once for acyclic molecules and once for aromatic molecules. Additional structural restrictions were included, according to the structures of the industrial co-materials: (i) for non-aromatic molecules an upper limit of two groups was imposed, (ii) for aromatic molecules an upper limit of twelve groups was imposed since it is unlikely that carbaryl (which contains twelve groups) would be synthesised from a more complex molecule, (iii) only unsubstituted or monosubstituted aromatics which contain the double ring (naphthyl group) aromatic structure were allowed (since the product contains the naphthyl group and is monosubstituted) by specifying a minimum of seven ACH groups, and a total of ten ACH and AC groups altogether, and (iv) only one substituent group with a carbon free attachment was allowed in the aromatics. In addition, all non-aromatic molecules containing carbon-carbon bonds were screened out after enumeration. Thus, chemistries in which the naphthyl group structure is constructed or decomposed, or in which any other carbon-carbon bonds are formed or broken, are avoided. For the acyclic molecules, constraints were included to prevent chlorine bonding with itself or with any groups of higher category. However, for the aromatic molecules, these constraints were removed, to allow the formation of 1-naphthalenyl chloroformate. The results of co-material enumeration are shown in Figure 3.
1) Naphthalene
2) 1-Chloronaphthalene
3) 1-Naphthol
4) N-Methyl-1-Naphthylamine
5) 1-Naphthalenyl Hydroxyformate
6) 1-Naphthalenyl Chloroformate
7) Carbaryl
8) Chlorine (Cl2)
9) Chloromethane (CH3Cl)
10) Methanol (CH3OH)
11) Chloromethanal (Cl-CHO)
12) Methyl Amine (CH3NH2)
13) Phosgene (COCl2)
14) Methyl Isocyanate (CH3-N=C=O)
15) Methyl Formamide (H-CO-NH-CH3)

Figure 3: Co-Material Design Results - Carbaryl Example
Note that species 8, 11, 13, 14 and 15 are included as additional molecules since none of these can be constructed according to the structural restrictions employed. Four further additional molecules were also included, as shown in Figure 4.

16) Hydrogen (H2)
17) Oxygen (O2)
18) Water (H2O)
19) Hydrogen Chloride (HCl)

Figure 4: Additional Molecules

MULTI-STEP STOICHIOMETRY IDENTIFICATION RESULTS
The solutions of the stoichiometry identification program are presented in the form of a table of stoichiometric coefficients in Table 3, where blank spaces indicate zero coefficients and the species are numbered as above. In line with the industrial routes, stoichiometries of up to two steps in length were allowed, with a maximum of four species permitted in any step. The role specification and chemistry constraints employed in this example are given in Appendix A. No carbon structure constraints were employed in this example. A production rate ($\varepsilon \nu_p$) lower bound of 2.5 kmol/hr and an allowable reactor temperature range of 300-800 K were imposed.

Table 3: Multistep Stoichiometries - Carbaryl Example
[Stoichiometric coefficient columns for species 1-19 are not legible in this copy and are omitted.]

Index  Nsp  T_oper (K)  Production rate (kmol/hr)  Profit ($/mol)  CTAM (tn air/mol)
System 0 - Producing Species 7
A      3    300         9.99                       0.4508          19.57
B      4    300         5.51                       -2.9885         22.22
C      4    300         10.00                      0.5026          16.63
D      4    300         3.17                       -2.9485         16.08
System 1 - Producing Species 3
E      3    300         20.00                      0.5509          6.90
F      4    300         9.34                       0.5013          18.23
G      4    300         10.00                      0.5015          17.85
H      4    300         10.00                      0.5249          13.44
System 1 - Producing Species 14
I           800         10.00                      0.1887          5.68
System 1 - Producing Species 15
J      4    300         2.75                       0.0400          4.20
K      4    736         2.50                       0.0535          2050.78
System 1 - Producing Species 6
L      4    300         10.00                      3.9556          17.86
M      4    300         10.00                      3.4911          14.11
N      4    300         10.00                      3.5880          10.27
System zero produced four candidate stoichiometries that satisfy all constraints, in which materials 3, 6, 12, 14 and 15 appear as first generation precursor reactants. Systems 1A and 1B produced a total of ten further stoichiometries leading to all of these materials except species 12, which is allowed only as a reactant in systems 1A and 1B since it could only be produced by decomposing more complex naphthyl molecules. All stoichiometries except I and K achieve acceptable conversion at 300 K. For stoichiometry I, G_sys is minimised at 800 K and high conversion since ΔG_R for this reaction has a large negative temperature gradient, so that equation 56 is dominated by its first RHS term. For stoichiometry K however, reactor temperature has to be elevated to meet the production rate bound, so that this stoichiometry is the only one for which T' > 300K (from equations 57-59). The two industrial chemistries shown in Figure 2 were reproduced; stoichiometries A and I representing the methyl isocyanate route, and stoichiometries D and N representing the non-methyl isocyanate route. Stoichiometries C and D represent the first and third of the three alternative single step routes put forward by Crabtree and El-Halwagi (1994); their second alternative does not appear here since it involves three apparently simultaneous reactants.

Table 4 shows the total profits and impacts for the individual solutions combined into multi-step stoichiometries. For example, the index AEI denotes the combination of steps A, E and I. Note that the profits reflect only the values of the products minus the values of the reactants, assuming that stoichiometric co-products are sold at their market value, and that in this example, raw materials are assumed input waste free. Note also that stoichiometries with poor conversion are not penalised since the costs and impacts of separation are not included here, and it is assumed that unconsumed reactants are recycled with no loss of heat and no compression or pumping requirements.
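The requirements that each step uses whole number coefficients, involves at most four species and conserves every element can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the two reactions are the familiar textbook forms of the methyl isocyanate route (1-naphthol plus methyl isocyanate, and methyl amine plus phosgene), which are assumed here to correspond to steps A and I rather than read from Table 3.

```python
from collections import Counter

# Elemental compositions from the standard molecular formulas.
FORMULAS = {
    "1-naphthol": {"C": 10, "H": 8, "O": 1},
    "methyl isocyanate": {"C": 2, "H": 3, "N": 1, "O": 1},
    "carbaryl": {"C": 12, "H": 11, "N": 1, "O": 2},
    "methyl amine": {"C": 1, "H": 5, "N": 1},
    "phosgene": {"C": 1, "O": 1, "Cl": 2},
    "hydrogen chloride": {"H": 1, "Cl": 1},
}


def element_imbalance(stoichiometry):
    """Return the net atoms created per element (empty dict if balanced).

    `stoichiometry` maps species name to its coefficient:
    negative for reactants, positive for products.
    """
    net = Counter()
    for species, coeff in stoichiometry.items():
        for element, n_atoms in FORMULAS[species].items():
            net[element] += coeff * n_atoms
    return {el: value for el, value in net.items() if value != 0}


def is_admissible(stoichiometry, max_species=4):
    """Whole-number coefficients, limited species count, elements balanced."""
    whole = all(isinstance(c, int) and c != 0 for c in stoichiometry.values())
    return (whole and len(stoichiometry) <= max_species
            and not element_imbalance(stoichiometry))


# Assumed form of the carbaryl-forming step of the methyl isocyanate route.
step_a = {"1-naphthol": -1, "methyl isocyanate": -1, "carbaryl": 1}
# Assumed form of the methyl isocyanate-forming step.
step_i = {"methyl amine": -1, "phosgene": -1,
          "methyl isocyanate": 1, "hydrogen chloride": 2}

print(is_admissible(step_a), is_admissible(step_i))
```

A check of this kind only confirms feasibility of a given stoichiometry; the thermodynamic, economic and impact evaluation described in the text is a separate step.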
Table 4: Total Profits and Impacts

Index  Total Profit ($/mol)  Total CTAM (tn air/mol)
AEI    1.1403                32.15
AFI    1.1408                43.48
AGI    1.1409                43.10
AHI    1.1644                38.69
BJL    1.0070                44.28
BJM    0.5426                40.53
BJN    0.6394                36.69
BKL    1.0205                2090.86
BKM    0.5561                2087.12
BKN    0.6529                2083.27
CEJ    1.0453                27.72
CEK    1.0570                2074.31
CFJ    1.0439                39.05
CFK    1.0574                2085.64
CGJ    1.0441                38.68
CGK    1.0576                2085.26
CHJ    1.0675                34.27
CHK    1.0810                2080.85
DL     1.0070                33.93
DM     0.5426                30.19
DN     0.6394                26.34

Only stoichiometries involving step K can justifiably be eliminated from further consideration on impact grounds, this step being penalised in impact terms by high reactor temperature, and only stoichiometries involving steps M or N can justifiably be eliminated on economic grounds. Despite the fact that these steps both exhibit high profits, species 6 is such a high value material that only step L generates sufficient profit to cover the cost of consuming species 6 in system zero. It is for this reason that stoichiometry DL remains competitive despite the poor economic performance of step D, which was rejected by Crabtree and El-Halwagi (1994) on economic grounds. This clearly illustrates the advantages of considering multi-step production routes. Of the remaining ten stoichiometries, the original industrial chemistry (steps A and I) with the addition of step E, F, G or H to produce species 3, exhibits the most promising economics of all, which is probably why it was selected. Furthermore, the environmental impacts of the routes based on this chemistry are also among the most promising. Of these routes, stoichiometry AEI represents the best compromise solution. Only stoichiometry CEJ exhibits a significantly lower impact than AEI, with only a marginally reduced profit, and so appears to represent the best compromise solution of all. However, conversion is poor for step J so that higher separation and recycle costs are anticipated. Issues such as this must be explored in order to eliminate further alternatives.
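The totals in Table 4 are sums of the per-step profits and impacts in Table 3. The sketch below reproduces two of them and adds a simple dominance test of the kind implied by the phrase "best compromise". The helper names are hypothetical, only a few steps are included, and the per-step values are taken as read from Table 3, so small rounding differences from Table 4 are to be expected.

```python
# (profit in $/mol, CTAM in tn air/mol) per step, as read from Table 3.
STEPS = {
    "C": (0.5026, 16.63),
    "D": (-2.9485, 16.08),
    "F": (0.5013, 18.23),
    "J": (0.0400, 4.20),
    "L": (3.9556, 17.86),
}


def combine(route):
    """Sum profit and impact over the steps named in `route`, e.g. "CFJ"."""
    profit = sum(STEPS[step][0] for step in route)
    impact = sum(STEPS[step][1] for step in route)
    return round(profit, 4), round(impact, 2)


def dominates(a, b):
    """True if route a is at least as good as b on both criteria
    (higher profit, lower impact) and strictly better on one."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


cfj = combine("CFJ")
dl = combine("DL")
print("CFJ", cfj)
print("DL", dl)
print("CFJ dominates DL:", dominates(cfj, dl))
print("DL dominates CFJ:", dominates(dl, cfj))
```

Neither of these two routes dominates the other, which is why a compromise between profit and impact has to be judged rather than computed from the totals alone.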
7.7 CONCLUSIONS
In the work presented here, a procedure for the rapid identification of alternative multi-step stoichiometries has been described in which each stoichiometric step involves whole number stoichiometric coefficients and a limited number of species. The key to the procedure is the introduction of material design principles to formalise the development of a set of co-materials from which stoichiometries are then extracted using an optimisation procedure.

The co-material enumeration procedure is based on a set of structural and chemical feasibility rules from Constantinou et al. (1996). However, rather than employing their molecular generation algorithm, the rules are instead used to develop a set of linear integer constraints governing the numbers and combinations of particular structural groups in a molecule. Combining these rules with the octet rule, and additional structural restrictions to limit the total number of groups, and the numbers of branches, substituents, substituted sites and functional groups, results in the co-material enumeration MILP formulation. This problem is solved repeatedly, introducing integer cuts after each iteration to exclude previous solutions, to produce a set of co-materials.

Stoichiometries are then extracted from this set of materials using an optimisation procedure, in which stoichiometries are explicitly enumerated and subsequently evaluated in two sequential steps. Stoichiometry enumeration includes whole number stoichiometric coefficient constraints, constraints to restrict changes to the carbon skeletons of the reacting species, and case specific constraints based on chemical knowledge. Thermodynamic, economic and environmental impact criteria are employed in the evaluation of the stoichiometries, with aspects of the MEIM (Pistikopoulos et al., 1994) providing the framework for the environmental evaluation of alternatives. The illustrative example has shown that the co-material design technique provides an interesting set of co-material molecules and that, with the inclusion of a few simple rules based on chemical knowledge, it is possible to limit the quantity of co-materials to a manageable number.
Furthermore, by incorporating simple chemical rules along with thermodynamic, economic and environmental criteria in stoichiometry identification it is possible to identify a small number of alternative stoichiometries which are promising both in terms of economics and environmental impact. Moreover, it has been shown that developing multistep stoichiometries directly can lead to the acceptance of alternatives which would be rejected as single step syntheses. In the illustrative example, existing industrial chemistries were identified as the most promising compromise solutions, with several new and competitive alternatives. This suggests that the approach could lead to promising results in the search for production routes for new molecules. Further reinforcement of this conclusion appears in a second application presented in Chapter 14.
7.8 REFERENCES
AGCIH. Airborne Hazards at Work. American Conference of Governmental Industrial Hygienists. Great Britain Factory Inspectorate. London (1977)
Agnihotri, R.B. and R.L. Motard. Reaction Path Synthesis in Industrial Chemistry. Computer Applications to Chemical Engineering, ACS Symposium Series 124, 193-206 (1980)
Androulakis, I.P. Kinetic mechanism reduction based on an integer programming approach. AIChE Journal 46(2), 361-371 (2000)
Buxton, A. Solvent Blend and Reaction Route Design for Environmental Impact Minimisation. PhD Thesis. Imperial College, London (2002)
Buxton, A., A.G. Livingston and E.N. Pistikopoulos. Reaction Path Synthesis for Environmental Impact Minimization. Computers chem. Engng. 21, S959-S964 (1997)
Chemical Prices. Chemical Marketing Reporter 253(8), 25-35 (1998)
Compounds. Dictionary of Organic Compounds. Chapman & Hall, London 6th Edition (1996)
Constantinou, L., C. Jaksland, K. Bagherpour and R. Gani. Application of the Group Contribution Approach to Tackle Environmentally Related Problems. AIChE Symposium Series, Volume on Pollution Prevention through Process and Product Modifications 303, 105-116 (1994)
Constantinou, L., K. Bagherpour, R. Gani, J.A. Klein and D.T. Wu. Computer Aided Product Design: Problem Formulations, Methodology and Applications. Computers chem. Engng. 20(6), 685-703 (1996)
Corey, E.J. and W.T. Wipke. Computer Assisted Design of Complex Organic Syntheses. Science 166 (1969)
Corey, E.J., W.T. Wipke, R.D. Cramer III and W.J. Howe. Techniques for Perception by a Computer of Synthetically Significant Structural Features in Complex Molecules. J. Am. Chem. Soc. 94, 431 (1972)
Corey, E.J., H.W. Orf and D.A. Pensak. Computer Assisted Synthetic Analysis. The Identification and Protection of Interfering Functionality in Machine-Generated Synthetic Intermediates. J. Am. Chem. Soc. 98, 210 (1976)
Crabtree, E.W. and M.M. El-Halwagi. Synthesis of Environmentally Acceptable Reactions. AIChE Symposium Series, Volume on Pollution Prevention via Process and Product Modifications 90, 117-127 (1994)
Dagani, R. Data on MIC's Toxicity are scant, leave much to be learned. Chemical & Engineering News 63(6), 37-40 (1985)
Derringer, G.C. and R.L. Markham. A Computer-Based Methodology for Matching Polymer Structures with Required Properties. Journal of Applied Polymer Science 30, 4609 (1985)
Edwards, K., T.F. Edgar and V.I. Manousiouthakis. Reaction mechanism simplification using mixed-integer nonlinear programming. Computers chem. Engng. 24, 67-79 (2000)
Fornari, T. and G. Stephanopoulos. Synthesis of Chemical Reaction Paths: The Scope of Group Contribution Methods. Chemical Engineering Communications 129, 135-157 (1994a)
Fornari, T. and G. Stephanopoulos. Synthesis of Chemical Reaction Paths: Economic and Specification Constraints. Chemical Engineering Communications 129, 159-182 (1994b)
Fornari, T., E. Rotstein and G. Stephanopoulos. Studies on the Synthesis of Chemical Reaction Paths - II. Reaction Schemes with Two Degrees of Freedom. Chemical Engineering Science 44(7), 1569-1579 (1989)
Gani, R., B. Nielsen and A. Fredenslund. A Group Contribution Approach to Computer-Aided Molecular Design. AIChE J. 37, 1318-1332 (1991)
Gao, C., R. Govind and H. Tabak. Application of the Group Contribution Method for Predicting the Toxicity of Organic Chemicals. Environmental Toxicology and Chemistry 11, 631-636 (1992)
Gelernter, H., N.S. Sridharan, A.J. Hart, F.W. Fowler and H.J. Shue. An Application of Artificial Intelligence to the Problem of Organic Synthesis Discovery. Topics Curr. Chem. 41, 113 (1973)
Glover, F. Improved Linear Integer Programming Formulations of Nonlinear Integer Problems. Management Science 22(4), 455-460 (1975)
Govind, R. and G.J. Powers. Studies in Reaction Path Synthesis. AIChE J. 27(3), 429-442 (1981)
Heijungs, R., J.B. Guinee, G. Huppes, R.M. Lankreijer, H.A. Udo de Haes, A. Wegener Sleeswijk, A.M.M. Ansems, A.M.M. Eggels, R. van Duin, H.P. de Goede. Environmental Life Cycle Assessment of Products: Background and Guide. Multicopy. Leiden (1992)
Hendrickson, J.B. A General Protocol for Systematic Synthesis Design. Topics in Curr. Chem. 62, 49 (1971)
Holiastos, K. and V. Manousiouthakis. Automatic Synthesis of Thermodynamically Feasible Reaction Clusters. AIChE J. 44(1), 164-173 (1998)
ISO 14040. Environmental Management - Life Cycle Assessment - Part 1: Principles and Framework. (1997)
Joback, K.G. Unified Approach to Physical Property Estimation using Multivariate Statistical Techniques. Master's thesis. MIT, Cambridge, Massachusetts (1984)
Joback, K.G. Designing Molecules Possessing Desired Physical Property Values. PhD thesis. MIT, Cambridge, Massachusetts (1989)
Joback, K.G. and G. Stephanopoulos. Designing Molecules Possessing Desired Physical Property Values. Proceedings FOCAPD, CACHE Corporation, Austin, Texas 11, 631-636 (1989)
Joback, K.G. and G. Stephanopoulos. Designing Molecules Possessing Desired Physical Property Values. Advances in Chemical Engineering 21 - Intelligent Systems in Process Engineering, Academic Press (1995)
Kalelkar, A.S. Investigation of Large Magnitude Accidents: Bhopal as a Case Study. Arthur D. Little Inc., Cambridge, Massachusetts (1988)
Kaufmann, G. Computer Design of Synthesis in Organo-Phosphorous Chemistry. Computer-Assisted Design of Organic Synthesis, Table Ronde Roussel UCLAF, Paris (1977)
Knight, J.P. Computer-Aided Tools to Link Chemistry and Design in Process Development. PhD Thesis, Massachusetts Institute of Technology (1995)
Mavrovouniotis, M.L. and D. Bonvin. Design of Reaction Paths. FOCAPD, AIChE Symposium Series 91, 41-51 (1995)
May, D. and D.F. Rudd. Development of Solvay Clusters of Chemical Reactions. Chem. Eng. Sci. 31, 59 (1976)
Perry, R.H. and D. Green. Perry's Chemical Engineers' Handbook. 6th ed. McGraw Hill (1984)
Pistikopoulos, E.N., S.K. Stefanis and A.G. Livingston. A Methodology for Minimum Environmental Impact Analysis. AIChE Symposium Series, Volume on Pollution Prevention through Process and Product Modifications 90(303), 139-150 (1994)
Porter, K.E., S. Sitthiosoth and J.D. Jenkins. Designing a Solvent for Gas Absorption. Trans IChemE 69(A), 229-236 (1991)
Rotstein, E., D. Resasco and G. Stephanopoulos. Studies on the Synthesis of Chemical Reaction Paths - I. Chemical Engineering Science 37(9), 1337-1352 (1982)
SETAC. A Conceptual Framework for Life-Cycle Impact Assessment. (1993)
Shrivastava, P. Bhopal, Anatomy of a Crisis. Ballinger Publishing Company, Cambridge, Massachusetts (1987)
Sirdeshpande, A.R., M.G. Ierapetritou and I.P. Androulakis. Design of flexible reduced kinetic mechanisms. AIChE Journal 47(11), 2461-2473 (2001)
Stefanis, S.K. A Process Systems Methodology for Environmental Impact Minimization. PhD Thesis. Imperial College, London (1996)
Stefanis, S.K. and E.N. Pistikopoulos. A Methodology for Environmental Risk Assessment for Industrial Non-Routine Releases. Ind. Eng. Chem. Res. 36, 3694-3707 (1997)
Ugi, I. and P. Gillespie. Chemistry and Logical Structure. 3. Representation of Chemical Systems and Interconversions by BE matrices and their Transformation Properties. Angew. Chem. Int. Ed. Engl. 10, 914 (1971)
Van Krevelen, D.W. and H.A.G. Chermin. Estimation of the Free Enthalpy (Gibbs Free Energy) of Formation of Organic Compounds from Group Contributions. Chem. Eng. Sci. 1, 66-80 (1951)
Weissermel, K. and H.-J. Arpe. Industrial Organic Chemistry. Second, Revised and Extended Edition. VCH, Weinheim FRG (1993)
Wipke, W.T., H. Braun, G. Smith, F. Choplin and W. Seiber. SECS Simulation and Evaluation of Chemical Synthesis: Strategy and Planning. In: Computer-Assisted Organic Synthesis (W.T. Wipke and W.J. Howe, Eds.) ACS Symposium Series 61 (1976)
Worthy, W. Methyl Isocyanate: The Chemistry of a Hazard. Chemical and Engineering News 63(6), 27-33 (1985)
Appendix A: Role Specification and Chemistry Constraints for Case Study 1 - Manufacture of Carbaryl

A.1 Role Specification Constraints
Table 5 shows the knowledge based role specification constraints employed in the carbaryl example, where R denotes reactant only, P denotes the final product, C denotes product or co-product, N denotes the exclusion of a species from a system and a blank space denotes no restriction.

Table 5: Role Specification Constraints - Carbaryl Example
Species:   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
System 0:  R R R R R R P R R R C N C
Systems 1A & 1B:  C C C C R C C C R
These constraints were developed specifically for two step stoichiometries according to the following arguments, based on chemical knowledge and the existing industrial chemistries:
• The product (carbaryl, species 7) should appear only as the product in system zero, or as a product or co-product in systems 1A and 1B, never as a reactant.
• Other naphthyl group containing molecules should be reactants only in system zero (i.e. no naphthyl containing co-products should appear in system zero), and naphthyl containing compounds with complex substituents (i.e. species 4, 5, and 6) should be products or co-products only in systems 1A and 1B.
• Methyl isocyanate and methyl formate (species 14 and 15) may be produced only in systems 1A and 1B and consumed only in system zero (i.e. they may not be decomposed to produce simpler molecules).
• H2 (species 16) appears as a co-product only in all systems since hydrogenation reactions are not required.
• HCl (species 19) appears as a co-product only in system zero, and as a reactant (Cl provider) or co-product (as in the industrial chemistries) in systems 1A and 1B.
• H2O (species 18) appears as a possible OH group donor or recipient in all systems.
• O2 (species 17) appears as an oxygen provider in systems 1A and 1B only.
A.2 Chemistry Constraints
The following knowledge based chemistry constraints were employed:
• naphthyl containing species with complex substitutions not allowed as reactants

ii4 + ii5 + ii6 = 0    (73)

• other naphthyl group containing species may not react with each other

ii1 + ii2 + ii3 + ii7 ≤ 1    (74)
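Constraints (73) and (74) can be read as simple tests on the set of reactants in a step. The following sketch assumes that the variables in these equations are binary indicators that take the value one when the numbered species acts as a reactant; that interpretation, and the function name, are assumptions made for illustration. Since species 6 is consumed in system zero in the results above, constraint (73) presumably applies to the precursor systems only.

```python
COMPLEX_NAPHTHYL = {4, 5, 6}     # naphthyl species with complex substituents
OTHER_NAPHTHYL = {1, 2, 3, 7}    # remaining naphthyl-containing species


def satisfies_chemistry_constraints(reactants):
    """Check constraints (73) and (74) for a set of reactant species numbers."""
    reactants = set(reactants)
    # (73): no complex-substituted naphthyl species may act as a reactant.
    eq73 = len(reactants & COMPLEX_NAPHTHYL) == 0
    # (74): at most one of the other naphthyl species may act as a reactant.
    eq74 = len(reactants & OTHER_NAPHTHYL) <= 1
    return eq73 and eq74


print(satisfies_chemistry_constraints({3, 14}))   # one naphthyl reactant
print(satisfies_chemistry_constraints({6, 12}))   # violates (73)
print(satisfies_chemistry_constraints({1, 3}))    # violates (74)
```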
PART II: Applications of CAMD
The first part of the book dealt with some of the solution techniques commonly employed to tackle the CAMD problem. This part demonstrates the application and practice of some of those techniques to different types of problems in CAMD. Chapters 8 & 9 describe the industrial application of CAMD methods for solvent design and selection. In particular, the use of the hybrid CAMD method (chapter 6) is highlighted. Chapter 10 deals with optimal solvent design for blanket-wash using the optimization-based CAMD method of chapter 3. Chapter 11 extends the application of CAMD from single solvent design to solvent mixture design together with an application example. Chapter 12 provides an example of the application of the global optimization-based CAMD method of chapter 4 to optimal refrigerant design while chapter 13 highlights the application of genetic algorithm-based CAMD (chapter 5) to polymer design. Chapter 14 provides a detailed case study of the application of CAMD to identify multistep reaction stoichiometries (using the method described in chapter 7). Finally, chapter 15 presents the application of CAMD to design of fuel additives employing the genetic algorithm-based CAMD method of chapter 5.
Computer Aided Molecular Design: Theory and Practice
L.E.K. Achenie, R. Gani and V. Venkatasubramanian (Editors)
© 2003 Elsevier Science B.V. All rights reserved.
Chapter 8: CAMD for Solvent Selection in Industry - I
J. M. Vinson
8.1 INTRODUCTION
While the process design for a new commercial drug is not the critical step of getting a new drug to market, it is important to do a good job of scaling from the laboratory to full production. A good pharmaceutical manufacturing process is insensitive to small variations in operating conditions and runs in a reasonable time. Along with the commercial manufacturing needs, batches of active pharmaceutical ingredient are required for clinical trials, so processes in development must be capable of delivering ever-increasing amounts of active pharmaceutical ingredients (API).

One aspect of the development of production processes for APIs is the selection of appropriate solvents for dissolution of raw materials, reactions and product crystallization. The most common mechanism for determining solvents is to select from a common set of solvents, as pharmaceutical companies are not interested in developing new solvents for their processes. While this method is sufficient for most cases, situations arise where none of the usual solvents are terribly effective. When the timelines are short, the developers have little choice but to go with the best of the ineffective solvents, or to use a more complex procedure. For example, the procedure might call for several extractions and back-extractions to achieve an acceptable yield with a moderate solvent, whereas a better solvent might be available that can effect the extraction in one or two steps.

CAMD is one tool that can help quickly point to a number of candidate solvents. The goal is to help the development team find candidate solvents that they may not have considered in the normal course of development. As we have seen throughout this book, computer aided molecular design is a methodology in which molecules are designed to meet specific needs.
While the approaches vary widely, depending on the application area, they all require the ability to predict the behavior of the full compound. This is accomplished through molecular dynamics, expert systems, genetic algorithms, group contribution methods, and combinations of these techniques.
The hitch for complex industrial compounds, such as those found in pharmaceutical applications, is that it is not always possible to accurately predict the properties of the compounds. This chapter will describe the application of computer-aided molecular design to situations where the standard CAMD techniques do not obviously apply.
8.2 CAMD METHODOLOGY USED
8.2.1 Generalized CAMD Framework
As described in chapter 6, the basic approach to computer-aided molecular design (Harper, 2000) is a three-step process:
1. Pre-Design: Define the problem in terms of desired properties of the compound to be designed. At this stage it is also critical that one select the best formulation of the problem, as the problem with the most clarity and the most available data will be easier to solve. Since design is an iterative process, it is not unlikely that one will come back to the pre-design stage to evolve the problem definition, based on results obtained in stages two and three.
2. CAMD: Run the actual CAMD algorithm to generate compounds and test them against stated criteria from the pre-design stage.
3. Post-Design: Test the results based on properties that are not easily screened during stage two, such as environmental, health & safety criteria. The compounds must also be tested either in simulation or in the laboratory.
8.2.2 Special Features for Complex Solutes
The very nature of specialty chemical and pharmaceutical development is of working with new compounds. Many of these contain unusual active groups or combinations of active groups that make property predictions dubious by standard means. These factors add an extra degree of difficulty to solvent selection problems. However, one can make use of extensive experimental apparatus to enable computer-aided molecular design. In particular, where traditional methods of solvent selection by experiment do not result in an acceptable solvent, the results of those very experiments can help point researchers in the right direction. The following seven-step procedure (Vinson et al. 2000) details how to combine experimental work with CAMD to help find appropriate solvents for complex solutes.
Step 1. Select N solvents with solubility parameter values between those of hexane (minimum) and water (maximum).
Step 2. Compute solubility of the solute in each of the N solvents using the regular solution theory (requires only solubility parameter value).
Step 3. Plot the calculated solubility values against the known solubility parameter values of each solvent and identify the location of the maximum solubility value together with the corresponding solubility parameter value.
Step 4. Based on the solubility parameter value from step 3, identify compounds with similar solubility parameter values from the database to produce a list of compounds with known properties such as melting points, boiling points, and so on.
Step 5. Use the generated data to define the solvent design/selection problem and go to stage 2 of the generalized CAMD framework (see chapter 6).
Step 6. Validate the selected compounds (solvents) from step 5 by plotting their ability to dissolve the solute on the solubility versus solubility parameter plot. They should all lie near the maximum.
Step 7. Consider other properties as given in the post design phase for further screening and final selection.
Note that the first two steps of this procedure are traditionally conducted in the lab in the search for appropriate solvents. The remaining steps walk researchers through a mechanism to develop a CAMD formulation to find alternate solvents.
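Steps 2 and 3 can be sketched numerically. The example below uses the common regular solution (Scatchard-Hildebrand) expression for a solid solute, with the solvent volume fraction taken as one (dilute solution); this particular form is an assumption, since the chapter does not state which expression was used. All solute property values are hypothetical placeholders chosen only to make the sketch run, and the function names are not from the chapter.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def regular_solution_solubility(delta_solvent, delta_solute, v_solute,
                                dh_fus, t_melt, temperature=298.15):
    """Mole-fraction solubility of a solid solute (dilute-solution limit).

    delta_* in MPa^0.5, v_solute in cm^3/mol, dh_fus in J/mol, T in K.
    MPa times cm^3 equals J, so v_solute * (delta difference)^2 is in J/mol.
    """
    ln_x_ideal = -(dh_fus / R) * (1.0 / temperature - 1.0 / t_melt)
    ln_gamma = v_solute * (delta_solvent - delta_solute) ** 2 / (R * temperature)
    return math.exp(ln_x_ideal - ln_gamma)


def solubility_peak(solvent_deltas, **solute):
    """Step 3: return the (delta, solubility) pair with the highest solubility."""
    curve = [(d, regular_solution_solubility(d, **solute)) for d in solvent_deltas]
    return max(curve, key=lambda point: point[1]), curve


# Hypothetical solute properties (placeholders, not data from the chapter).
SOLUTE = {"delta_solute": 22.0, "v_solute": 150.0,
          "dh_fus": 25000.0, "t_melt": 450.0}
# Step 1: solvents spanning roughly hexane (about 14.9) to water (47.8).
DELTAS = [14.9, 18.0, 20.0, 22.0, 24.0, 26.0, 47.8]

peak, curve = solubility_peak(DELTAS, **SOLUTE)
for delta, x in curve:
    print(f"delta = {delta:5.1f} MPa^0.5   x = {x:.3e}")
print("maximum near delta =", peak[0])
```

In practice the solute parameters are not known for complex compounds, which is why the procedure relies on measured solubilities to locate the maximum; the calculation above only shows the shape of curve that the plot in step 3 is expected to reveal.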
8.3 APPLICATION EXAMPLES
With the case studies, we explore the use of CAMD for solvent selection in a number of examples inspired by the pharmaceutical industry. Only the first example is worked in full detail. The second example provides the basic setup of the problem and suggestions as to how one might approach finding appropriate solvents via CAMD. The final example is a challenge problem, which is presented to show the full complexity of solving computer aided molecular design in a real-world situation. The ProCamd (ICAS Documentation, 2002) software developed at CAPEC (www.capec.kt.dtu.dk) has been used for the solution of the problems.
8.3.1 Example 1: Extraction Solvent Replacement
This case study is an example of CAMD used for solvent selection. Not only does this example show the difficulty of handling complex compounds, but it also demonstrates the need for a well-thought-out problem formulation. In this case, there are several problems to be handled by a new solvent. The first task is to determine which problem has the highest likelihood of successful resolution.

This example is of a reaction system, followed by extraction. The basic chemistry is described in Figure 1. The first reaction is a peptide coupling between compounds A and B with diisopropyl carbodiimide (DIC) as a coupling agent and N-hydroxybenzotriazole as a catalyst. This liquid-phase reaction runs in a solvent mixture of 1:1 dimethylformamide (DMF) and methylene chloride (MeCl). Reactant A has limited solubility in these solvents, thus the reaction runs over several hours. The second reaction is a saponification that hydrolyzes the ethyl group in compound C with 2.5 N sodium hydroxide. The current process calls for no isolation between the first and second reaction, which is common in the pharmaceutical industry due to purity concerns. The second reaction is followed by an extraction in methylene chloride, leaving the product in the aqueous layer. The final workup (not shown in Figure 1) involves an isoelectric precipitation that isolates the product as a zwitterion. The diisopropyl urea (DIU) byproduct of reaction 1 is somewhat soluble in water, which leaves DIU with the product throughout the precipitation step and necessitates additional back-extractions after the second reaction to purify the aqueous phase.
[Reaction scheme omitted.]
Reaction 1: A (solid) + B (solid, HCl salt) + diisopropyl carbodiimide -> C (dissolved) + diisopropyl urea (dissolved) + HCl. Conditions: 1) 1:1 DMF/CH2Cl2; 2) 0.2 equiv. N-hydroxybenzotriazole; 3) ambient temp., 6 h.
Reaction 2: C (dissolved) + NaOH -> D (dissolved) + CH3CH2OH.
Figure 1: Reaction chemistry details for example 1

While scaling this process from the laboratory to the manufacturing scale, several inefficiencies in this process became evident, and the development team began looking into alternatives. As far as CAMD goes, there are a number of problems to solve, several of which interact with one another. One could attempt to find a replacement solvent for methylene chloride, as it is environmentally undesirable for large-scale operations. A solvent that preferentially removes the DIU impurity from the first reaction mixture would reduce the need for additional workup after the second reaction. Finally, the reaction time could be improved by finding a new solvent (or solvent system) that does a better job of dissolving compounds A and B.
Pre-Design Phase
After studying the available data and the compounds in question, it became clear that the best problem to solve is that of removing the DIU impurity from the post-second-reaction mixture. While it would be more profitable to explore a better reaction system solvent, there is not enough solubility data on the compounds A, B, C and D to attempt solvent design for them. In addition, these compounds have structures that reduce the utility of the group contribution methods. As we shall see in the results, this particular approach can also help replace or reduce the methylene chloride in the process. The list of options and some comments on their viability are listed in Table 1.
Table 1: Summary of potential problems to solve in Example 1

Option: Replace methylene chloride
Comment: Wide range of possible resolutions, may take too long. Not enough experimental data for this system.

Option: Remove DIU after the first reaction
Comment: Reduce the number of extractions, but this may be a large process change. DIU solubility data are available.

Option: Remove DIU after the second reaction
Comment: The smallest change, as this is the current process. A better solvent system must be found. DIU solubility data are available.

Option: Find a better solvent for the reactants, A and B
Comment: Requires significantly more solubility information than is available for the reactants and products. Compounds are not compatible with current group contribution estimation methods.
In this example, we apply the solvent selection approach for complex compounds, described in Section 8.2.2. Steps one and two had been completed in earlier experimental work. DIU solubility was determined in a number of solvents, which happen to span a wide range of total solubility parameters. In selecting solvents, one wants to ensure a good representation of the range of total solubility parameter in order to focus in on the most likely range of total solubility parameter for the designed solvent. Step three of the method produces a plot of solubility data for DIU as a function of total solubility parameter for the solvents. This is shown in Figure 2. Since the peak on Figure 2 is around a total solubility parameter of 22, it is most likely that the best solvents for DIU will have a total solubility parameter between 21 and 23. It is also clear that the solute is not very soluble either in paraffinic or cyclic hydrocarbons (solubility parameter around 15 MPa^0.5) and only slightly so in water (solubility parameter of 47.8 MPa^0.5) and other polar solvents. The likely solvents having similar solubility parameters are acyclic alcohols, ketones, aldehydes, acids, esters and ethers. Aromatic solvents, though they may fit into this total solubility range, will not be considered due to their poor health effects profiles.
3.5-
2.5-
0.5 A
0
14
A 9
16
9
9
18
20
22
24
26
28
Solubility Parameter (MPa)^.5
Figure 2: Plot of solubility versus total solubility parameters for DIU as the solute
In Step four the other properties of the solvent are identified, either by comparison with database materials or by specification of the process. In this case, the process creates the specifications. Since the solvent must be a liquid at the operating conditions, its melting point must be less than 300 K. Since DIU is to be removed from the reaction mixture, the solvent should split the reaction mixture into two phases with the solvent-rich phase containing the majority of the DIU, which is not totally miscible in water. As a first-pass environmental assessment, the octanol/water partition coefficient should be kept as low as possible. Also, for recovery concerns, the solvent must be easy to separate from DIU and therefore its boiling point should not be greater than 450 K for separation by evaporation or distillation. These target properties are listed in Table 2.
Table 2: Target properties for example 1
Property | Target / Range
Total solubility parameter | 21 - 23 MPa^0.5
Boiling point | Less than 450 K
Melting point | Less than 300 K
Octanol/Water partition coefficient (logP) | Less than 2
Water solubility | Immiscible
Water capacity of solvent | Less than 1.0, preferably zero
Groups to search | CH3, CH2, CH, C, OH, CH3CO, CH2CO, CHO, CH3COO, CH2COO, HCOO, CH3O, CH2O, CH-O, COOH, COO
Low environmental impact |
Limited health & safety concerns |
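The numeric targets in Table 2 can be read as a simple pass/fail screen. The sketch below is illustrative only and is not the ProCAMD implementation: the `Solvent` class and the `violations` function are hypothetical names, the miscibility and water-capacity targets are left out because they need a mixture model, and the boiling point, melting point and logP values are approximate handbook figures assumed for illustration (the solubility parameters are the database values quoted in this chapter).

```python
from dataclasses import dataclass


@dataclass
class Solvent:
    name: str
    sol_par: float  # total solubility parameter, MPa^0.5
    t_boil: float   # normal boiling point, K
    t_melt: float   # melting point, K
    log_p: float    # octanol/water partition coefficient (logP)


def violations(s: Solvent) -> list:
    """Return the names of the numeric Table 2 targets that the solvent fails."""
    failed = []
    if not 21.0 <= s.sol_par <= 23.0:
        failed.append("solubility parameter")
    if not s.t_boil < 450.0:
        failed.append("boiling point")
    if not s.t_melt < 300.0:
        failed.append("melting point")
    if not s.log_p < 2.0:
        failed.append("logP")
    return failed


# Property values other than the solubility parameters are assumed,
# approximate figures used only to exercise the screen.
candidates = [
    Solvent("1-butanol", 22.47, 391.0, 183.0, 0.88),
    Solvent("n-heptane", 15.2, 371.6, 182.6, 4.66),
]

for c in candidates:
    print(c.name, violations(c) or "passes the numeric targets")
```

With these assumed values, 1-butanol passes all four numeric targets while n-heptane fails on the solubility parameter and on logP.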
CAMD Phase
Moving into step five of the process described in Section 8.2.2, new compounds were generated by ProCAMD based on the specifications listed in Table 2. This problem formulation generated 3498 compounds, based on combining groups to form only chemically feasible molecules. The octanol/water constraint removed 260 compounds. The total solubility parameter constraint removed 2634 compounds. The melting and boiling point constraints removed 534 and 39 compounds, respectively. The solvent capacity constraint removed another 17 compounds, leaving 14 final compounds. Of these, about half appear in the DIPPR database of compounds. Table 3 lists those designed solvents that appear in the database, together with their water solubility and EH&S properties. Note that 2-butanol and 2-methyl-1-propanol are isomers in terms of the groups that make up the compounds (2 CH3, CH2, CH, OH). The final four compounds (five-carbon alcohols) are also isomers with respect to their groups (2 CH3, 2 CH2, CH, OH). Note that the predicted and actual total solubility parameters do not necessarily match.
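The screening counts quoted above can be checked with a running tally. This is a bookkeeping sketch only; the filter order follows the text and the function name `remaining_after_each` is hypothetical.

```python
# Number of structures generated and the number removed by each constraint,
# as quoted in the text for Example 1.
generated = 3498
removed = [
    ("octanol/water partition coefficient", 260),
    ("total solubility parameter", 2634),
    ("melting point", 534),
    ("boiling point", 39),
    ("solvent capacity", 17),
]


def remaining_after_each(total, steps):
    """Return (constraint, compounds left) after applying each filter in turn."""
    out = []
    for name, n_removed in steps:
        total -= n_removed
        out.append((name, total))
    return out


tally = remaining_after_each(generated, removed)
for name, left in tally:
    print(f"after {name}: {left} compounds left")
```

The final entry of the tally is 14, which agrees with the number of final compounds reported.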
Table 3: Potential solvents for Example 1
Compound (CAS #) | Solubility Param (note 1) | Predicted Solubility Param | Water Solubility (note 2) (mg/L) | RTECS Code (note 3)
1-Butanol (000071-36-3) | 22.47 | 23.3536 | 6.32E+004 | Mutagen (M); Reproductive-Effector (T); Human-Data (P); Primary-Irritant (S)
t-Butanol (000075-65-0) | 21.47 | 21.6034 | 1E+006 | Mutagen (M); Primary-Irritant (S)
2-Methyl-1-Propanol (000078-83-1) | 21.83 | 22.9094 | 8.5E+004 | Tumorigen (C); Mutagen (M)
2-Butanol (000078-92-2) | 21.83 | 22.5414 | 1.81E+005 | Reproductive-Effector (T)
1-Pentanol (000071-41-0) | 21.72 | 22.576 | 2.2E+004 |
Ethylene-glycol-monopropylether (002807-30-9) | 21.65 | 20.0055 | 3.169E+005 |
Ethyl Lactate (000097-64-3) | 22.62 | 22.3818 | 1E+006 |
2-Methyl-1-butanol (000137-32-6) | 21.16 | 22.1107 | 2.97E+004 | Primary-Irritant (S)
3-Methyl-1-butanol; Isoamyl alcohol (000123-51-3) | 21.16 | 22.1574 | 2.67E+004 | Tumorigen (C); Human-Data (P); Primary-Irritant (S)
2-Pentanol (006032-29-7) | 21.16 | 21.704 | 4.46E+004 |
3-Pentanol (000584-02-1) | 21.16 | 21.1227 | 5.15E+004 |
To explore the sensitivity of the CAMD method, we can adjust each of these constraints individually through ProCAMD. For example, tightening the logP constraint to less than 1.0 will filter over 1000 compounds and change the results of the subsequent filters as well, leaving ten compounds in the end. This information is listed in Table 4, showing the change and the number of compounds screened out for each condition along with the number of compounds remaining after screening. In a quick assessment, the most sensitive parameter for the overall problem is the boiling point range, as raising the upper bound increases the number of candidates to 96 from four.
1 Solubility parameter data are from the CAPEC Database (www.capec.kt.dtu.dk).
2 Water solubility data from: "SRC PhysProp Database" (online version), Syracuse Research Corporation, Syracuse, NY, USA.
3 RTECS Code is from: WebSpirs version of "RTECS (through 2000/04)."
Table 4: Number of compounds screened out for a variety of conditions
Change | logP | Sol. Par. | Tm | Tb | Water Capacity | Final Compounds
Original constraints | 260 | 2634 | 181 | 392 | 17 | 14
logP to less than 1 | 1031 | 1991 | 426 | 23 | 17 | 10
Water capacity max 0.5 | 260 | 2634 | 181 | 392 | 28 | 3
Melting point max 250 K | 260 | 2634 | 534 | 39 | 17 | 14
Melting point max 200 K | 260 | 2634 | 598 | 0 | 2 | 4
Boiling point max 500 K | 260 | 2634 | 181 | 282 | 45 | 96
Boiling point max 350 K | 260 | 2634 | 181 | 463 | 0 | 0
Post-Design Phase
Steps 6 and 7 of the design procedure for complex solutes call for additional analysis of the candidate solvents found from the CAMD phase. Ideally, we would test the apparently best compounds as solvents for DIU. In this case, of the solvents listed in Table 3, 1-butanol and 2-methyl-1-propanol were easily available from our stockroom. A quick test of DIU solubility showed excellent results for both the butanol (6.25 wt% at 25 °C) and methyl propanol (6.48% at 25 °C). These values were 50% higher than the best solvent tested in our prior work, as shown in Figure 2. We then took a new look at the reactions above and decided to work with butanol as an extraction solvent after the second reaction. We examined two possibilities. The first option was to keep the DMF/MeCl2 mixture for the first reaction and conduct one methylene chloride extraction after reaction two, followed by a butanol extraction. As this option was not substantially different from the standard chemistry, there were no significant improvements in the yield or purity of the mixture after the second reaction. The second option was to remove methylene chloride from the chemistry entirely, as it does not participate in the reaction, and use butanol as the only extraction solvent after reaction two. For comparison purposes the reaction mixture after the second reaction was divided into two portions. One portion was extracted with two methylene chloride treatments per the standard. The other portion was extracted with an equivalent volume of butanol in two extractions. The results for these experiments are listed in Table 5.
Table 5: Example 1 Results
Content of aqueous layer (percent of original) | MeCl2 Extraction | Butanol Extraction
DIU total | 0% | 0%
D (extraction 1) | 98% | 74%
D (extraction 2) | 94% | 99%
D total | 92% | 73%
DMF | 20% | 11%
HOBT | 76% | 8%
D-urea impurity | ~95% | ~20%
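The "D total" entries in Table 5 are consistent with multiplying the fractions retained in the two extractions. The sketch below assumes that each per-extraction percentage is expressed relative to the amount of D entering that extraction; this reading is an assumption, and the function name `total_retained` is hypothetical.

```python
def total_retained(fractions):
    """Overall fraction left in the aqueous layer after successive extractions."""
    total = 1.0
    for f in fractions:
        total *= f
    return total


# Fraction of product D retained in the aqueous layer in extractions 1 and 2.
mecl2 = total_retained([0.98, 0.94])
butanol = total_retained([0.74, 0.99])

print(f"methylene chloride: {mecl2:.1%} of D retained overall")
print(f"butanol:            {butanol:.1%} of D retained overall")
```

Rounded to whole percentages these give 92% and 73%, matching the "D total" row, and they show that almost all of the butanol loss occurs in the first extraction.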
Overall, both solvents remove all the DIU after two extractions, which is the primary goal of the extractions. The butanol system is at a slight disadvantage for product recovery due to the first extraction pulling about 26% of the product, D, into the butanol layer. Clearly, this extraction could be optimized to achieve greater than 90% total recovery of product from the extraction step. The other advantage of the butanol extractions is that the content of both the DMF and HOBT in the aqueous layer is much lower. Finally, the butanol extractions were successful in significantly reducing the level of the D-urea reaction byproduct. While this was not stated as a primary goal, reducing the amount of extraneous organics in the aqueous phase is advantageous to the subsequent precipitation reaction and purification steps. The all-butanol extraction option has the additional advantage of removing methylene chloride entirely from the process.
8.3.2 Example 2: Mass separating agent
In this example, the development team was exploring the design of a manufacturing process to recover pure product and potentially recover solvents for reuse. The method used to design the process was based on the process synthesis procedures of Jaksland (1995). The synthesis procedure explores the properties of the mixture to select appropriate separation techniques. The original design of the process simply used distillation and then n-heptane as an anti-solvent to crystallize the product, which was filtered and dried, as shown in Figure 3. However, there were some disadvantages to heptane, particularly regarding solubility of the impurities that end up as solids with the X-2P product. The challenge for CAMD was to find a suitable replacement mass-separating agent (MSA) for heptane that will cause the X-2P to precipitate out of solution while retaining the toluene and reactants in the liquid phase.
Figure 3: Example 2 process flow
The primary chemistry in this example is
X-2R1 + X-2R2 ---> X-2P + H2O
This takes place at atmospheric conditions in the presence of toluene as the primary solvent, with MTBE carried over from previous processing with X-2R2. Table 6 gives the approximate composition of the post-reaction mixture for which the process synthesis has been conducted.
Table 6: Example 2 post-reaction mixture composition
Component | Concentration (mol%) | State of pure component at 298 K, 1 atm | Tb (K) | Tm (K)
X-2R1 | 0.1 | Solid | 524.33 | 443
MTBE | | Liquid | 328.35 | 164.55
X-2R2 | 1 | Solid | 445.91 | 300.93
X-2P | 10 | Solid | 611.1 | 353.1
Toluene | 73.9 | Liquid | 383.7 | 178.18
Water | 10 | Liquid | 373.2 | 273.2
As in the original process, the best mechanism for removing the MTBE and water from the post-reaction mixture is concentration of the mixture in at least one exchange of toluene. At that point the mixture is essentially free of water. However, care must be taken not to remove too much toluene, as the product tends to form a highly viscous tar with the toluene at higher concentrations. As a result, the mixture passed to the product isolation step must be at least 30 wt% toluene.
Pre-Design Phase
The primary goal of CAMD in this example is to replace heptane with another MSA to effect the precipitation of the X-2P product while retaining the other materials in the liquid phase. Table 7 lists the solubilities of the compounds in heptane.
Table 7: Solubility of compounds in heptane
Compound | Solubility in heptane (g/cm^3)
X-2R1 | 0.0125
X-2R2 | 0.0186
X-2P | 2.83E-04
Toluene | 0.397
In the CAMD pre-design phase, the requirements are that toluene, X-2R2 and X-2R1 must be miscible in the new solvent, and X-2P must be immiscible. Ideally, relative to heptane, the solubility will also be greater for the first three and lower for X-2P. Unfortunately, very few of these mixture properties can be predicted due to the complex nature of the solutes. As a first pass, we can use CAMD to find solvents with similar properties to heptane. The target values for this initial CAMD problem are listed in Table 8.
Table 8: Target properties for example 2
Property | Heptane value | Target / Range
Boiling point | 371.6 K | Less than 400 K
Melting point | 182.6 K | Greater than 150 K
Total solubility | 15.2 MPa^0.5 | 14 - 16 MPa^0.5
Groups to search | N/A | CH3, CH2, CH, C, OH, CH3CO, CH2CO, CHO, CH3COO, CH2COO, HCOO, CH3O, CH2O, CH-O, COOH, COO
Low environmental impact | |
Limited health & safety concerns | |
CAMD Phase
Based on the specifications above, ProCAMD generates 3498 compounds, filtering 3397 based on solubility parameter, 14 based on melting point and 44 based on boiling point, leaving 43 candidate compounds. Of these compounds, the following were found in the DIPPR databank: MTBE, Ethylal, Ethyl propyl ether, tert-Butyl ethyl ether, Methyl tert-pentyl ether, Diisopropylether, Acetal, n-Butyl ethyl ether, Di-n-propyl ether, and Ethyl tert-pentyl ether.
Post-Design Phase
Based on discussions with the chemist and available compounds from the stockroom, we decided to explore the use of diisopropylether via experimentation. The chemist was also curious about 2-pentanone, which did not appear on the list because its solubility parameter is closer to 18 MPa^0.5. The solubility information for 2-pentanone and diisopropylether is listed in Table 9 and Table 10, respectively.
Table 9: Solubility of compounds in 2-pentanone
Compound | Solubility in 2-pentanone (g/cm^3) | Relative to heptane solubility
X-2R1 | 0.3043 | 24.3
X-2R2 | 0.1337 | 7.19
X-2P | 1.685E-04 | 0.60
Toluene | 0.126 | 0.32
Table 10: Solubility of compounds in diisopropylether
Compound | Solubility in diisopropylether (g/cm^3) | Relative to heptane solubility
X-2R1 | 0.072 | 5.76
X-2R2 | 0.024 | 1.29
X-2P | 4.82E-05 | 0.17
Toluene | 0.326 | 0.82
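The "relative to heptane" columns of Tables 9 and 10 are the ratios of each solubility to the corresponding heptane value in Table 7. A minimal sketch that reproduces them is given below; the dictionary and function names are hypothetical.

```python
# Solubilities in g/cm^3, taken from Tables 7, 9 and 10.
heptane = {"X-2R1": 0.0125, "X-2R2": 0.0186, "X-2P": 2.83e-4, "Toluene": 0.397}
pentanone = {"X-2R1": 0.3043, "X-2R2": 0.1337, "X-2P": 1.685e-4, "Toluene": 0.126}
diisopropylether = {"X-2R1": 0.072, "X-2R2": 0.024, "X-2P": 4.82e-5, "Toluene": 0.326}


def relative_to_heptane(solvent):
    """Ratio of each compound's solubility in `solvent` to its value in heptane."""
    return {name: solvent[name] / heptane[name] for name in heptane}


rel_pentanone = relative_to_heptane(pentanone)
rel_dipe = relative_to_heptane(diisopropylether)

for name in heptane:
    print(f"{name:8s} 2-pentanone: {rel_pentanone[name]:6.2f}   "
          f"diisopropylether: {rel_dipe[name]:6.2f}")
```

A ratio below one for X-2P means the product is less soluble than in heptane, which is the property given priority in the selection that follows.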
With two potential solvents selected at this point, decisions need to be made as to which properties are the most important. In this case, it is most important to reduce the solubility of the X-2P in the solvent. The fact that diisopropylether has less than 20% of the solubility of heptane for X-2P takes precedence over the better solubility for X-2R1 and X-2R2 in 2-pentanone.
8.3.3 Example 3: Challenge Problem
From a computer-aided design perspective, this problem has proven difficult. The hope in presenting this problem is to give researchers in this area an idea of the complexities that arise in the real world. The mixture in this case contains water, acetonitrile, ammonia, and three difficult-to-model internal compounds, as listed in Table 11. The product, X-3P, is an Ammonia-Bromine salt, which impacts any computations on the mixture. The structures of this compound and the other unique compounds are shown in Figure 4. The mixture is highly non-ideal due to the electrolytes present in X-3P and ammonia.
Table 11: Example 3 mixture composition
Compound | Wt. %
Acetonitrile | 51.7
Water | 29.3
Ammonia | 10.3
X-3P (product) | 7.7
X-3R (reactant) | 0.7
X-3B (byproduct) | 0.4
Figure 4: Molecular structures of X-3P, X-3R and X-3B
The goal for the operation is to remove the water (to less than 2 wt%) and to drive the composition of the mixture to approximately 15% X-3P in acetonitrile. The new solvent should also be a liquid at normal conditions. The suggested properties for such a solvent are listed in Table 12.
Table 12: Target properties for challenge problem
Property | Target / Range
Boiling Point | Less than 400 K
Melting Point | Greater than 150 K
Miscible with Acetonitrile |
Immiscible with Water |
Good solvent for X-3P |
Low environmental impact |
Limited health concerns |
First try to find solvents that satisfy all the constraints except those related to the solubility of X-3P. Then use experimental data, if available, to find out which of the candidates have good solubility for X-3P. This will reduce the number of candidates. In the final selection, perform simulation as well as more detailed analysis of the property constraints, especially since the property models used in the design-phase may be subject to errors.
8.4 CONCLUSIONS
This chapter demonstrates the utility of computer-aided molecular design even for complex solutes, where the solvent interaction is difficult to determine analytically. The chapter presents a procedure that combines experimental work with CAMD for complex solutes and then goes on to show how this applies in real situations encountered in the pharmaceutical industry. The final example presents a challenge problem for future computer-aided molecular design researchers.
8.5 REFERENCES
P. M. Harper, "A multi-phase, multi-level framework for computer aided molecular design", PhD thesis, Technical University of Denmark, Lyngby, Denmark, 2000.
ICAS Documentations, Internal Report PEC02-14, CAPEC, Department of Chemical Engineering, DTU, Lyngby, Denmark, 2002.
C. Jaksland, "Separation process synthesis and design based on thermodynamic insights", PhD thesis, Technical University of Denmark, Lyngby, Denmark, 1996.
J. Vinson, P. M. Harper, R. Gani, "Solvent selection for chemical and pharmaceutical processes", AIChE Annual Meeting, Paper no. 240c, Los Angeles, USA, November 2000.
Computer Aided Molecular Design: Theory and Practice
L.E.K. Achenie, R. Gani and V. Venkatasubramanian (Editors)
© 2003 Elsevier Science B.V. All rights reserved.
Chapter 9: CAMD for Solvent Selection in Industry - II
J. L. Cordiner
9.1 INTRODUCTION
Fine chemicals manufacturing is increasingly looking at reducing the time to market; this means that decisions about the process are pushed further and further back the decision train. These decisions are then required when less and less of the apparently required information is available. Conventional wisdom needs to be tested to consider what information is really needed and what level and quality of decision is required at each stage. In some cases, for example pharmaceuticals, the process route needs to be decided very early for registration reasons. The choice of the route can have large implications on the costs of production and capital requirement. It is then advantageous to have methods to challenge the normal route selection and development processes. This chapter describes two methods and tools that may be used in early evaluation of processing routes related to solvent selection. These two methods and tools are SMSWIN (developed at Syngenta) and ICAS-tools (developed at CAPEC, http://www.capec.kt.dtu.dk). The methodology applied is briefly described and illustrated through two case studies.
9.2 GENERATING AND REVIEWING ALTERNATIVE PROCESS ROUTES
Clearly the synthetic routes from research (see Fig. 1) are usually not practical for a manufacturing setting. The chemist and engineer need to work together to consider how all the routes for consideration will be operated at the manufacturing scale desired by the business. At this stage it is vital that the early evaluation tools are able to aid this process in generating processes that can be radically different from conventional wisdom. Each chemical route can be operated at a manufacturing scale in a number of different ways and these need to be considered in any route evaluation. In addition, the early evaluation tools are required to enable comparison of routes and processes so that the most practical options can be chosen. Clearly the level of information on each route will be sparse at this stage and therefore the tools must allow quality decisions to be taken on the limited data. It is therefore important to remember that comparison requires the data to be consistent but not necessarily accurate at this stage. As it is important to consider the whole supply chain in route selection, one should use the tools alongside experience from different professionals rather than expecting the tools to do the whole job.
[Figure 1 diagram. Recoverable labels: research route; generate route options; selection criteria (SHE impact, market requirements, activity, VPC/margin, capital); outline f/s and costs; business targets; process development; FF&P development (FF&P = formulation, fill and pack); product specification and market forecast; design, f/s and cost estimate; the key decision point; decision to invest; ongoing development; manufacture.]
Figure 1: Schematic of the development process for an agrochemical product (Carpenter [1,2]).
9.2.1 Challenges for the Early Evaluation Tools
The early evaluation tools need to be user friendly, robust and easy to use. In particular the tools need to be as intuitive as possible for the infrequent user, minimising the number of forms to be filled in or clicks required. This can be seen in placing the most commonly used information within very easy and fast reach, as shown in Fig. 2. Wherever possible an expert system to select items or calculation methods needs to be employed in such a way that it is easy for the non-specialist to use the tool whilst providing sufficient information (knowledge) about the problem and guidance to arrive at an acceptable solution. For example, the physical property method for early (solvent) evaluation and the setting up of this method need to be made very easy. This can be demonstrated by pre-setting the groups for UNIFAC (described in chapter 2 of Part I) for as many molecules and frequently used building blocks for molecules as possible, as is done typically for UNIQUAC for molecules. The databases in SMSWIN, ICAS and in most commercial simulators already have this feature.
Figure 2: Property selection from SMSWIN
Many of the process developers will need help in selecting a property method when considering different solvent options. Here an expert system, as highlighted in the form of a decision tree (see Fig. 3), would be beneficial. This allows rapid selection and points to when further advice from the expert system is required.
The tools should be as visual as possible. Many visualisation tools are provided in SMSWIN and ICAS to help process developers to rapidly assess processing route options. Examples include residue maps (for evaluation of feasible separation regions), eutectic/azeotropic diagrams (for evaluation of separation constraints), solubility/saturation plots (for evaluation of phase boundaries) and many more. Having diagrammatic ways of presenting the same data can aid understanding of the solvent-based separation process. For example, a triangular phase diagram highlights the existence of one or two liquid phases in equilibrium with a vapour phase for a ternary mixture consisting of two solutes and a solvent. For the same system, a solvent-free two-dimensional phase diagram can be used to determine (visually) the amount of solvent (or entrainer) needed to break or sufficiently move an azeotrope.
[Figure 3 decision tree: Syngenta Property Method Selection. Recoverable labels: properties of component known?; trying to distinguish between isomers?; system at low pressure?; Use EOS; Seek Advice; UNIFAC; (Aspen or SMSWIN).]
0 & < 0.0013 (gm 3) at 298 K.
Table 2: EH&S property constraints
PROPERTIES: Very Toxic; Respiratory Sensitisers; Potent Carcinogen; Toxic; Corrosive; Animal Carcinogen; Harmful; Skin/Eye irritants; Non Hazardous; Non Irritant; Non Genotoxic
RELEASE RANGE: < 0.1 mg/m3; < 0.1 ppm; > 0.1 ppm; < 10 ppm; < 1 mg/m3; < 500 ppm; < 10 mg/m3; > 500 ppm; > 10 mg/m3
CLASS: H1; H2; M
Figures 7c and 7d show the specifications for the mixture properties and the azeotrope/miscibility calculations within ProCAMD. The UNIFAC model is selected and anthracene is selected as the solute that needs to be extracted with the solvent (Fig. 7c). From Fig. 7d, it can be noted that an azeotrope with water is specified and a liquid phase split is also specified. Figure 7e shows a typical screen shot when ProCAMD has finished the calculations in the CAMD-phase. ProCAMD did not find any cyclic compounds (because of the limitations of group parameters within the property models) but it did find acyclic compounds and aromatic compounds, listed in Table 3. One of the compounds, 1-Methyl-3-n-propylbenzene has already been found through SMSWIN (see Table 1). Therefore, the post-design phase was not continued further since the analysis had already been done through SMSWIN.
Table 3: List of feasible compounds from ProCAMD
ACYCLIC SOLVENTS | AROMATIC SOLVENTS | CYCLIC SOLVENTS
n-Decylacetate | 1,2,3,4-Tetramethylbenzene | No molecule met the specifications
1-Undecanal | 1-methyl-3-n-propylbenzene |
n-Nonylacetate | |
1-Decanal | |
Methyl decanoate | |
Figure 7a: General problem specification in ProCAMD
Figure 7b: General problem specification in ProCAMD
Figure 7c: Mixture property specification
Figure 7d: Azeotrope /miscibility calculation specifications
Figure 7e: Screen shot of results from ProCAMD
9.3.2 Case Study 2: Solvent for Dehydration
In this example the problem is to find a solvent to replace toluene as an entrainer in batch dehydration, which is the bottleneck in this stage of a processing route. The existing process operation is carried out by the addition of toluene to a batch distillation column with a decanter to recover entrainer from the distillate. The feed to the system contains a number of products including the intermediate to an agrochemical. The two key components are, however, dimethyl acetamide (DMAC) and water. The other components can be ignored due to their high molecular weight and small impact on the VLE of the water-DMAC system. The current system employs an entrainer because DMAC hydrolyses with water, particularly at elevated temperatures; hence toluene was selected as an entrainer to allow the separation at lower temperatures. The new solvent would need to fit into the existing equipment with minimal changes required. In addition the purity of the agrochemical intermediate product stream passing to the next stage of the process should remain the same as with toluene as the entrainer. The following targets need to be matched by any solvent to be selected.
• Final water content of the intermediate product stream should be less than 9 kmol.
• DMAC losses to be controlled by temperature (< 117 °C).
• A maximum of 20 kmol of entrainer can remain with the intermediate product stream.
• Batch dehydration time should decrease in order to reduce cycle time and DMAC losses.
• DMAC loss in distillate should be a maximum of 0.3 kmol%.
Based on the above targets, the selected entrainer needs to have the following properties. Environmental and toxicity constraints are not considered at this stage but will be analysed in a post-design stage (not highlighted in this case study).
• Form a heterogeneous azeotrope with water with a boiling point below 117 °C.
• The liquid-liquid split should be at least as good as toluene.
• Separation of DMAC and the entrainer should be good, i.e. no azeotrope should form between the entrainer and DMAC and the solvent power should be high.
Applying the ProCAMD program, the following candidates have been found. Figure 8a shows the screen shot from ProCAMD highlighting the solution details. Figure 8b confirms that the substitute entrainer satisfies the desired (target) properties. The next step would be to perform batch distillation simulations to verify the functional (operational) target properties and to analyse the environmental and toxicity constraints.
Figure 8a: Problem specification details and solution statistics from ProCAMD
Figure 8b: Problem specification details and feasible solvent from ProCAMD
9.4 CONCLUSIONS & FUTURE CHALLENGES
Many of the typical processes contain very complex molecules about which there is little information. These complex molecules have many functional groups and are in the presence of similar molecules which are produced as by-products or as pre- or post-stage products. Indeed many final molecules are required as a particular enantiomer. Some typical molecules are shown in Fig. 9 (from Carpenter [2]). The selection of the separation task therefore becomes complicated. It is important therefore to have good predictive tools for the important physical properties and the ability to improve these predictions with as much known information as possible. This sort of tool has been developed by the CAPEC Group at the Department of Chemical Engineering of the Technical University of Denmark. There are however ways forward by using as much information as available from the molecule and similar molecules to give some guidance. This is where using the tools alongside experience and experiment can work very well.
[Figure 9 structures. Panel captions: a substituted diphenyl ether used as a herbicide; a green azo dyestuff for dyeing polyester; a synthetic pyrethroid insecticide.]
Figure 9: Typical examples of complex molecules (solutes).
It is common in many processes to have by-products and intermediates that are very similar in structure to the product; indeed it is also common to have enantiomers where one is the active compound and all other enantiomers are inactive. This makes the separation selection and also the prediction of the properties more difficult. Measurement of the required physical properties can also be problematic due to the difficulty of producing a pure sample of any by-product. There is therefore a substantial gap in the currently available property prediction methods to be filled. The currently available CAMD methods and tools (see Part I of this book) need to be further developed to take account of wider solvent issues and could also be widened to route selection including formulation of active products, for example, surfactant selection. In addition, visualisation tools along with optimisation that allow selection of separation schemes taking into account efficiency of separation (Bek-Pedersen et al. [8]) will prove very useful. Solvent selection tools will also be greatly improved when reaction effects are better predicted. Finally, early evaluation tools are proving very useful in improving solvent-based process route selection practice, bringing chemical engineers and chemists together and facilitating co-current development that is focussed much earlier, reducing the necessary experimentation and development time-scales.
ACKNOWLEDGEMENTS
Permission to publish from Syngenta is gratefully acknowledged. Thanks to a great many friends and colleagues for advice and information, especially: Dr Keith Carpenter, Dr Alan Hall and Dr Will Wood of Syngenta Technology and Projects, and James Morrison, Consultant.
9.5 REFERENCES
1. K.J. Carpenter, "Chemical Engineering in Product Development - The Application of Engineering Science", Entropie, 223 (2000).
2. K.J. Carpenter, 16th International Symposium on Reaction Engineering (ISCRE 16), 2001.
3. B.G. Cox, "Modern liquid phase kinetics", Oxford Chemistry Primer Series 21, Oxford University Press, UK (1994).
4. B.G. Cox and A.J. Parker, J. Am. Chem. Soc., 95 (1973) 408.
5. Chastrette, JACS, 107 (1985) 1-11.
6. ICAS Documentations, Internal Report PEC02-14, CAPEC, Department of Chemical Engineering, DTU, Lyngby, Denmark, 2002.
7. P. Bavishi, MEng Thesis-2000, Department of Chemical Engineering, Imperial College, London, UK (2000).
8. Bek-Pedersen, E., Gani, R., Levaux, O., Computers and Chemical Engineering, 24 (2000) 253-259.
Chapter 10: Case Study in Optimal Solvent Design
M. Sinha, L. E. K. Achenie & G. M. Ostrovsky
10.1 INTRODUCTION
Solvents are extensively used as a major component of ink in the printing industry. The function of solvents in ink is to act as a vehicle for polymeric resins, pigments and dyes. The ink solvent also assists in wetting and dispersion of dyes and pigments. In letterpress and offset lithographic printing processes, the ink is carried to the plate by means of a train of rubber rollers commonly called "blankets", as shown in Fig. 1. Thus a thin film of ink is distributed over a large surface area on the blankets. These ink solvents are volatile and evaporate to leave behind the pigments and resins on the blanket surface. Cleaning is required whenever the residue build-up affects the print quality and between print jobs. Paper fibres, ink residue, paper coating and dried ink are types of material that must be removed from the rubber blankets.
Figure 1: Schematic of Lithographic Printing

One of the most used solvents in lithographic printing is the "blanket wash", which is specially formulated to clean ink and other residue from rubber blankets. Blanket cleaning is accomplished automatically or manually. In an automatic blanket wash process, as shown in Fig. 1, the blanket wash is jet sprayed onto the blanket. Therefore a large amount of the wash is lost by evaporation even before it makes contact with the blanket. Blanket wash solvents are mostly solvent mixtures as opposed to single component solvents. As such, next to solvent performance, one of the most pressing concerns of the printing industry with regard to the environment is the volatile organic compound (VOC) level of solvents. At present the VOC levels of solvents used in the printing industry are unusually high, well over 80% and far beyond the industry target of 30%. For example, a commonly used blanket wash, "VM&P naphtha", has a 100% VOC content (United States Environmental Protection Agency, 1997a). Blanket washes and solvents for "rag and bucket" operations are chosen based on their performance and their impact on the environment, health and safety. There is a wide variation in the performance attributes of cleaning solvents from different vendors. To enhance the cleaning operation, companies sometimes mix solvents from different vendors. However, this trial and error approach is costly and may not necessarily yield the solvent mixture with the desired performance attributes. In addition, the solvent for a cleaning operation may not meet safety, health and environmental restrictions.
Another important issue is minimizing the effect of a solvent on the surface characteristics of the rubber blanket, in particular the swelling it induces. Swelling severely affects the print quality in lithographic processes. Thus, there is a need to account for this in blanket wash design. The goal of this case study is to design globally optimal solvents to be used for cleaning in lithographic printing. These solvents should (i) have a minimal drying time, (ii) dissolve residue ink, (iii) not swell the blanket, and (iv) be environmentally benign. Drying time is correlated with the heat of vaporization of the solvent. The ink residue is assumed to consist of phenolic resins.
10.2 PROBLEM DEFINITION
The problem as posed can be modelled as a multicriteria optimization problem. However, in the printing industry, there are rather loose and minimal requirements on these attributes. Therefore these attributes are regarded as constraints with given targets (similar to goal programming, Tamiz, 1996). A straightforward approach to modelling the problem as a special kind of multicriteria problem is to consider a lumped objective in which the different criteria appear as terms with appropriate weights. However, this approach forces the solvent formulation engineer to choose appropriate weights (usually of no physical meaning), a rather non-trivial task. A more meaningful and rigorous approach is to consider the problem as a multi-level optimization problem. The latter is rather difficult to solve and has usually been restricted to bi-level optimization problems in which the decision variables are continuous.

We reiterate that the goal of this case study is to design optimal solvents to be used as cleaning agents in the printing industry. These solvents should (i) have a minimal drying time, (ii) dissolve residue ink, (iii) not swell the blanket, and (iv) be environmentally benign. Drying time is correlated with the heat of vaporization of the solvent. The ink residue is assumed to consist of phenolic resins. Solvents that can effectively dissolve the ink residue obey the solute-solvent interaction
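$$R_{ij}^2 = 4(\delta_D^i - \delta_D^j)^2 + (\delta_P^i - \delta_P^j)^2 + (\delta_H^i - \delta_H^j)^2$$

[The remainder of the formulation, including Eqs. (1) to (3), is missing from this copy. Of the constraints numbered (4) to (9), the following parts are legible.]

A constraint with right-hand side 323, labelled (4).

A melting point constraint built from $\sum_i \sum_j u_{ij} (T_m)_j$ and the constant 102.425, with upper bound 223.

A partition coefficient constraint in which the sum of two group-contribution double sums over $u_{ij}$ is bounded above by 4.0.

$$4(\delta_D - 23.3)^2 + (\delta_P - 6.6)^2 + (\delta_H - 8.3)^2 \le (19.8)^2$$

$$\delta_P - 6.3 \ge 0$$

$$\psi_i^L \le \psi_i \le \psi_i^U, \qquad i = 1, 2, 3, 4$$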
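The two Hansen-parameter constraints that remain legible above (the solubility sphere around the phenolic resin, with centre 23.3, 6.6, 8.3 and radius 19.8, and the bound of 6.3 on the polar parameter) can be checked for a candidate solvent as in the following minimal sketch. The class and function names are hypothetical, and the solvent values in the usage lines are illustrative inputs, not data from this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hansen:
    """Hansen solubility parameters (dispersive, polar, hydrogen bonding)."""
    d: float
    p: float
    h: float


# Values taken from the legible constraints in the text.
RESIN = Hansen(23.3, 6.6, 8.3)   # solute (phenolic resin) parameters
RADIUS = 19.8                    # interaction radius of the resin
MIN_POLAR = 6.3                  # lower bound on the polar parameter


def distance_sq(solvent: Hansen, solute: Hansen = RESIN) -> float:
    """Squared solute-solvent distance R^2 with the factor 4 on the dispersive term."""
    return (4.0 * (solvent.d - solute.d) ** 2
            + (solvent.p - solute.p) ** 2
            + (solvent.h - solute.h) ** 2)


def dissolves_resin(solvent: Hansen) -> bool:
    """True when the solvent lies inside the resin's solubility sphere."""
    return distance_sq(solvent) <= RADIUS ** 2


def limits_swelling(solvent: Hansen) -> bool:
    """True when the polar parameter meets the blanket-swelling bound."""
    return solvent.p >= MIN_POLAR


# Illustrative (hypothetical) candidates.
polar_candidate = Hansen(16.0, 9.0, 5.1)
nonpolar_candidate = Hansen(15.0, 0.0, 0.0)
print(dissolves_resin(polar_candidate), limits_swelling(polar_candidate))
print(dissolves_resin(nonpolar_candidate), limits_swelling(nonpolar_candidate))
```

Keeping the two checks separate mirrors the formulation, where each property target is its own constraint.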
To solve CAMD_1 we proceed as follows.

Step 1: (a) Decide on the set of groups to be used to form compounds. We choose as basis set twelve groups, namely CH3-, CH2-, Ar-, -Ar-, -OH, CH3CO-, -CH2CO-, -COOH, CH3COO-, -CH2COO-, -CH3O, and -CH2O-. (b) Identify the design variables. These are given by the structural variables $u_{ij}$, which determine whether a particular structural group is present in the molecule.

Step 2: Identify the performance objective. The performance objective is given by the double summation in Eq. (1), which gives the heat of vaporization of the compound.

Step 3: Identify the constraints. Constraints are employed in order to ensure that the last seven groups in the basis set are not allowed to occur more than twice in a compound, as follows:

$$\sum_i u_{ij} \le 2, \qquad j = 5, \ldots, 12$$

The constraint $\delta_P \ge 6.3$ will ensure minimal blanket swelling. The environmental impact of solvents is accounted for by requiring that the maximum value of the partition coefficient ($\log K_{ow}$) be 4.0. To ensure that the solvent is a liquid at ambient temperature, limits on the boiling point ($T_b$) and melting point ($T_m$) are imposed. The constraints are Eqs. (4) through (9). Eqs. (4) to (7) are the property target constraints on blanket swelling, and Eq. (8) are constraints imposed by the branching functions. Eq. (9) are simple bounds on the branching functions.

Step 4: Decide whether to use the Odele-Macchietto or the Churi-Achenie Octet Rule Model. Here we employ the much simpler (although restrictive) Odele-Macchietto model for acyclic compounds, where $v_j$ is the valence of the jth structural group. The model is given in Eq. (3). We also include the molecular structural constraints (Eqs. (2) and (3)).

Step 5: Using information from the previous steps, assemble the mathematical program, i.e. the performance objective, constraints, design variables and the Octet Rule Model. Eqs. (1) through (9) make up the mathematical program.

Step 6: Construct linear estimators of the performance objective and the constraints. The simple example in Chapter 3 gives an illustration of how to do this; also see the appendix in this chapter.

Step 7: Enter an iterative loop using the branch and bound (BB) procedure in Section 3.3.1 of Chapter 3. There are two nonconvex constraints. The splitting functions employed are $\psi_D$, $\psi_P$, $\psi_H$ and $\psi_y$. The MILP solver used is a public domain code, lp_solve by Hartmut Schwab, available at (ftp.es.ele.tue.nl/pub/lp_solve). This solver uses the simplex algorithm together with a rather simple depth-first strategy. Identify the optimal molecule using information from the solution.
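The structural checks of Steps 3 and 4 can be sketched as follows. Eq. (3) is not reproduced in this copy, so the sketch assumes the commonly quoted acyclic form of the Odele-Macchietto octet rule, sum over groups of (2 - v_j) n_j = 2, and assumes group valences read off the free bonds in the group names. The dictionary keys, the valence table and the function names are all hypothetical. The occurrence limit is applied to groups j = 5 to 12 as printed in the constraint.

```python
# Assumed valences (number of free bonds) for the twelve basis groups, in basis order.
VALENCE = {
    "CH3-": 1, "-CH2-": 2, "Ar-": 1, "-Ar-": 2, "-OH": 1, "CH3CO-": 1,
    "-CH2CO-": 2, "-COOH": 1, "CH3COO-": 1, "-CH2COO-": 2, "CH3O-": 1, "-CH2O-": 2,
}
BASIS = list(VALENCE)            # position in this list is j - 1
LIMITED = set(BASIS[4:])         # groups j = 5..12, at most two occurrences each


def octet_rule_ok(counts: dict) -> bool:
    """Acyclic octet rule (assumed form): sum_j (2 - v_j) * n_j == 2."""
    return sum((2 - VALENCE[g]) * n for g, n in counts.items()) == 2


def occurrence_limit_ok(counts: dict) -> bool:
    """Each limited group may occur at most twice in the compound."""
    return all(n <= 2 for g, n in counts.items() if g in LIMITED)


def structurally_feasible(counts: dict) -> bool:
    return octet_rule_ok(counts) and occurrence_limit_ok(counts)


# The three feasible compounds reported for CAMD_1c, written as group counts.
mek = {"CH3-": 1, "-CH2-": 1, "CH3CO-": 1}
propanol = {"CH3-": 1, "-CH2-": 2, "-OH": 1}
diethyl_ketone = {"CH3-": 2, "-CH2-": 1, "-CH2CO-": 1}
for name, counts in [("MEK", mek), ("propanol", propanol), ("diethyl ketone", diethyl_ketone)]:
    print(name, structurally_feasible(counts))
```

Both checks are linear in the group counts, which is why every relaxed MILP solution in the BB loop is already a structurally feasible compound.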
Six different runs were investigated for case study 1. The six runs correspond to $n_{max}$ of 3, 4, 5, 6, 7, and 10 (CAMD_1a, CAMD_1b, CAMD_1c, CAMD_1d, CAMD_1e, and CAMD_1f, respectively). The corresponding problem dimensions are 36, 48, 60, 72, 84 and 120. In all cases the number of constraints is 15. The termination criterion used is an absolute tolerance of $10^{-3}$. The results are shown in Table 2. Problem CAMD_1a has a very limited search space. A feasible solution was found in the first iteration of the branch-and-bound algorithm. In CAMD_1c, the algorithm took 31 iterations and 351.4 seconds on a 333-MHz DELL Pentium II personal computer. The maximum number of sub-regions constructed is 16. The globally optimal solution corresponded to methyl-ethyl ketone (MEK or CH3-CH2-CO-CH3) with an objective function value of 35.471 kJ/mole. This compound was found at the 10th iteration with a valid upper bound of 35.471 and a lower bound of 33.99. Since the difference between the upper and lower bound was more than the tolerance, the algorithm continued executing. The algorithm finally converged to MEK as the global solution after 21 more iterations. The two other feasible compounds found were propanol (CH3-CH2-CH2-OH) and diethyl-ketone (CH3-CH2-CO-CH2-CH3). The objective function values for propanol and diethyl-ketone were 44.77 kJ/mole and 40.12 kJ/mole, respectively.
Table 2: Application of Reduced Space BB algorithm to CAMD_1

Case      nmax  Variables  Constraints  Iterations  CPU time (min)  Max number of subregions
CAMD_1a   3     36         15           1           0.045           1
CAMD_1b   4     48         15           18          0.86            12
CAMD_1c   5     60         15           31          5.85            16
CAMD_1d   6     72         15           42          17.21           20
CAMD_1e   7     84         15           46          48.45           21
CAMD_1f   10    120        15           67          713.5           21
We note that at any iteration, the solution of the relaxed MILP problem is a structurally feasible compound since all the structural constraints are linear. During the execution of the algorithm, fifteen different compounds were found. Of these, two other compounds satisfied the specified performance constraints. For case CAMD_1e, the number of iterations is 46 and 3 compounds are designed. The maximum number of subregions created is 21. In CAMD_1f, the number of iterations is 67. The maximum number of subregions created is 21. Even though the number of iterations does not grow very much, the CPU time increases. This is because the CPU time associated with each LP solution increases significantly when the number of variables increases. Another desirable property of this algorithm is that a very small number of subregions are created. For cases CAMD_1c, CAMD_1e and CAMD_1f, the number of subregions created is 16, 21 and 21, respectively. Thus the algorithm is very efficient in terms of storage requirements. It should be noted that as the dimension of the problem increases from 60 to 120, the number of iterations only increases from 31 to 67. This is perhaps a consequence of the fact that the number of branching variables, namely 4, is the same in all the cases. Recall that in all the example problems above, although the number of variables $u_{ij}$ increased from 60 to 120, the number of branching functions is unchanged at 4. In contrast, if we employ the standard full space BB algorithm, we will need to perform branching with respect to all the variables $u_{ij}$. Here, the number of branching variables ranges from 60 to 120.
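The remark that CPU time grows mainly through the cost of each LP, rather than through the iteration count, can be checked directly from the Table 2 figures. The short sketch below divides CPU time by the number of iterations for each case and also confirms that the number of variables equals twelve basis groups times nmax. The variable names are illustrative.

```python
# Rows of Table 2: (case, n_max, variables, constraints, iterations, cpu_minutes)
TABLE_2 = [
    ("CAMD_1a", 3, 36, 15, 1, 0.045),
    ("CAMD_1b", 4, 48, 15, 18, 0.86),
    ("CAMD_1c", 5, 60, 15, 31, 5.85),
    ("CAMD_1d", 6, 72, 15, 42, 17.21),
    ("CAMD_1e", 7, 84, 15, 46, 48.45),
    ("CAMD_1f", 10, 120, 15, 67, 713.5),
]
BASIS_GROUPS = 12  # size of the basis set chosen in Step 1


def minutes_per_iteration(row):
    """Average CPU minutes spent per branch-and-bound iteration."""
    return row[5] / row[4]


per_iteration = {row[0]: minutes_per_iteration(row) for row in TABLE_2}
for case, minutes in per_iteration.items():
    print(f"{case}: {minutes:.3f} min per iteration")
```

Between CAMD_1c and CAMD_1f the iteration count roughly doubles, while the average time per iteration grows by more than a factor of fifty.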
10.2.2 Case study CAMD_2
In this case, the same formulation is solved with the Churi-Achenie model (see Step 4 above). The connectivity variables z and w are employed in the structural representation as described in section 3 of chapter 3. The second constraint in CAMD_1 is replaced by the following set of structural constraints. This leads to a large increase in the number of linear structural constraints.
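[Equations (10) to (16), the linear structural constraints of the Churi-Achenie model in the variables $u_{ik}$, $z_{ijp}$ and $w_i$, are largely garbled in this copy. The legible parts are:]

$$\sum_{i=1}^{n_{max}} \sum_{k=1}^{m} u_{ik} + \sum_{i=1}^{n_{max}} w_i = n_{max} \qquad (12)$$

$$w_1 = 0$$

$$w_i \le w_{i+1}, \qquad i = 1, \ldots, (n_{max} - 1)$$

The remaining constraints involve sums of $z_{ijp}$ over $j = 1, \ldots, s_{max}$ and over $p$, the valence term $\sum_k u_{ik} v_k$, a big-M term $M u_{ik} \le M$, and the bound $\sum z_{ijp} \le 1$, with index ranges $i = 1, \ldots, n_{max}$; $i = 2, \ldots, n_{max}$; $k = 1, \ldots, m$; $j = 1, \ldots, s_{max}$; and $p = (i+1), \ldots, n_{max}$.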
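$$p_{3i} = \begin{cases} 1 & \text{if } \mu_{Cl,i} \ge 3, \\ 0 & \text{otherwise,} \end{cases} \qquad p_{4i} = \begin{cases} 1 & \text{if } \mu_{Cl,i} = 4, \\ 0 & \text{otherwise.} \end{cases}$$

The values of $p_{3i}$ and $p_{4i}$ are set through the following constraints:

$$\mu_{Cl,i} - 2.5 \le 2.5\, p_{3i} \le \mu_{Cl,i}, \qquad \forall i \in V \qquad (29)$$

$$\mu_{Cl,i} - 3.5 \le 3.5\, p_{4i} \le \mu_{Cl,i}, \qquad \forall i \in V \qquad (30)$$

Then, the contribution for rule 2 is given by

$$c^0_{p1,2} = -2.6 \sum_i (p_{3i} + p_{4i}) \qquad (31)$$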
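The way the double inequalities (29) and (30) fix the binary variables can be verified by enumeration. The sketch below assumes that the chlorine count at a vertex is an integer between 0 and 4 (an assumption, since the definition of the count is not in this excerpt) and lists, for each count, which binary values satisfy mu - t <= t*p <= mu for t = 2.5 and t = 3.5. The function name is hypothetical.

```python
def feasible_binaries(mu: int, t: float) -> list:
    """Binary values p that satisfy mu - t <= t * p <= mu."""
    return [p for p in (0, 1) if mu - t <= t * p <= mu]


# Assumed range of the chlorine count at a vertex: 0..4.
for mu in range(5):
    p3 = feasible_binaries(mu, 2.5)   # constraint (29)
    p4 = feasible_binaries(mu, 3.5)   # constraint (30)
    print(f"mu_Cl = {mu}: p3 in {p3}, p4 in {p4}")
```

For every count exactly one binary value is feasible, so the constraints determine the indicator uniquely: the first switches on at three or more chlorine atoms and the second only at four.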
Rule 3 - If a given CH2, CH2Cl or CH2F group is bonded to at least one OH group, add -0.46 cal mol^-1 K^-1. We introduce the binary variable $\xi_{OH,i,k}$ such that

$$\xi_{OH,i,k} = \begin{cases} 1 & \text{if there is an OH group linked to group } k \text{ at vertex } i, \\ 0 & \text{otherwise,} \end{cases}$$

for all $i \in V$ and for all $k \in$ {CH, CHCl, CH2, C, CCl, CF, CF2}. Then, $\xi_{OH,i,k}$ ...
[Figure: structural drawings of the mainchain groups and the sidechain groups.]

Fig. 1. Extended palette of base groups for the design case study
For the present case study, taken from Venkatasubramanian et al. [2], the design problem was made much larger and the search space more complex by increasing the base group choices to 17 mainchain and 15 sidechain groups. The extended palette of base groups is shown in Fig. 1. In the smaller problem, when the base groups consisted of four mainchain and four sidechain groups, the total number of design candidates was about 1.4x10^5. Under the increased number of mainchain and sidechain groups, the search space was magnified to 1.1x10^13 candidates considering design lengths of 2 to 7. Thus, the search space was about 100 million times larger than that in the earlier study. Also, the number of target polymers evaluated was increased from three in the previous study to nine, as shown in Table 1. The search space was further complicated by the increased number of nonlinear group interactions. For example, for polymer design target 4, the nonlinear van Krevelen group interactions required that every mainchain group, other than the -O- endgroup, and every sidechain group be in its proper position in order to give the optimal fitness of 1. That is, the macroscopic properties depended not only on the group types but also on their exact ordering in the target molecule.
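The "about 100 million times larger" figure follows from the two candidate counts quoted above. A one-line check, using only those two numbers (the variable names are illustrative):

```python
small_space = 1.4e5    # candidates with four mainchain and four sidechain groups
large_space = 1.1e13   # candidates with 17 mainchain and 15 sidechain groups

growth = large_space / small_space
print(f"search space grew by a factor of about {growth:.1e}")
```

The ratio is a little under 8x10^7, which the text rounds to the order of 10^8.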
Table 1. Target polymers and their properties
(The repeat-unit structure drawings of TP1 to TP9 are not recoverable from this copy.)

Target polymer   ρ, g/cm3   Tg, K    α, K^-1 (x 10^-4)   Cp, J/kg.K   K, N/m2 (x 10^9)
TP1              1.34       350.8    2.96                1152.67      5.18
TP2              1.18       225.2    2.81                1377.82      2.51
TP3              1.21       420.8    2.90                1135.10      5.40
TP4              1.19       406.8    2.90                1073.96      5.39
TP5              1.28       472.0    2.89                995.95       5.31
TP6              1.25       421.1    2.90                1016.55      6.12
TP7              1.06       322.3    2.98                1455.90      3.85
TP8              1.27       322.1    2.81                1152.67      3.42
TP9              1.09       428.7    2.77                1163.10      4.12

ρ = density, Tg = glass transition temperature, α = thermal expansion coefficient, Cp = specific heat capacity, K = bulk modulus

The number of property constraints was the same as before at five and included the following properties: density, glass transition temperature, thermal expansion coefficient, specific heat capacity and bulk modulus. Predicted values of these physical properties for a given molecular structure were calculated by the van Krevelen [3] group contribution methods. The second aspect of the case study involved the incorporation of higher-level chemical knowledge, which is discussed next.
13.1.1 Incorporation of high-level knowledge: Molecular Stability
Higher-level chemical knowledge was incorporated to steer the search towards more chemically realistic and stable polymers. For example, it is commonly known that certain group combinations such as -O-O-O- and -OC=OC=O- lead to chemically unstable structures and are therefore undesirable in candidate solutions presented by the design system. When no such higher-level knowledge was included in the GA, such group combinations were often found in high-fitness polymers in the smaller case study [1]. Another example of a practical constraint on a design system is environmental acceptability. Certain molecular groups or group combinations are known to be environmentally toxic or unacceptable. This is a common problem in the design of agrochemicals such as fertilizers and pesticides as well as refrigerants. Yet another important consideration would be the relative ease or difficulty involved in the synthesis or manufacture of the proposed design candidates. It is important to be able to incorporate all such constraints in the design process. In the current study, only stability and molecular complexity constraints were addressed. In the knowledge-augmented GA framework, chromosomes with unstable mainchain group combinations were assigned zero fitness. As a result of natural selection, such solutions were automatically weeded out of the design process and thereby removed from any further consideration. The knowledge incorporated into the algorithm about the stability of nearest neighbor mainchain groups was drawn from Barton and Ollis [4].
13.1.2 Molecular Complexity
Molecular complexity is encoded as a count of the total number of mainchain and sidechain groups and is given by the following equations [5, 6, 7]:

$$F'(x) = F(x) - \beta \times \mathrm{Sig} \times \mathrm{Complexity} \qquad (1)$$

$$\mathrm{Sig} = \frac{2}{1 + \exp[-\gamma \{F - F_{crit}\}]} \qquad (2)$$

$$\mathrm{Complexity} = \frac{MC + SC}{MC_{max} + SC_{max}} \qquad (3)$$

where F is the fitness value, F' is the penalized fitness, β is a penalty scaling factor, Sig is a sigmoidal fitness function, given by equation (2), that provides a fitness threshold, Fcrit, for the genetic algorithm to start penalizing complex designs, and γ is a decay scaling parameter. The complexity measure, given by equation (3), ranges from 0 to 1 and is given by the ratio of the number of mainchain (MC) and sidechain (SC) units in the current design to the maximum allowable mainchain and sidechain units (32 in this case). Thus, the complexity of a polymer repeat structure is viewed in terms of its 'size' as given by the number of units in the repeat structure. The smaller the molecule, the lower is its complexity. In order to favor simple molecules over more complex ones of comparable fitness, a penalty was applied to the fitness. All molecules having fitness values greater than the threshold Fcrit were penalized as given by equation (1) in direct proportion to their complexity.
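As a minimal sketch of how the complexity penalty of equations (1) to (3) behaves, the code below evaluates it with the parameter values reported for this study (Fcrit = 0.99, a decay parameter of 100, a penalty scaling factor of 0.10 and a maximum of 32 units). The sigmoid is read here as 2 / (1 + exp[-gamma (F - Fcrit)]), which is an assumption about the printed form of equation (2), and the function names are hypothetical.

```python
import math


def sigmoid_gate(fitness: float, f_crit: float = 0.99, gamma: float = 100.0) -> float:
    """Assumed form of equation (2): close to 0 well below f_crit, equal to 1 at f_crit."""
    return 2.0 / (1.0 + math.exp(-gamma * (fitness - f_crit)))


def complexity(mainchain_units: int, sidechain_units: int, max_units: int = 32) -> float:
    """Equation (3): fraction of the maximum allowable number of units."""
    return (mainchain_units + sidechain_units) / max_units


def penalized_fitness(fitness: float, mainchain_units: int, sidechain_units: int,
                      beta: float = 0.10) -> float:
    """Equation (1): fitness reduced in proportion to the gated complexity."""
    gate = sigmoid_gate(fitness)
    return fitness - beta * gate * complexity(mainchain_units, sidechain_units)


# A low-fitness design is essentially untouched; a near-optimal one is penalized.
print(penalized_fitness(0.50, 8, 8))
print(penalized_fitness(0.99, 8, 8))
print(penalized_fitness(0.99, 4, 4))
```

With these settings the penalty only matters once a design is already near the critical fitness, at which point the smaller of two equally fit designs scores higher.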
13.2 GA BASED SEARCH
The evolutionary search approach based on GAs has already been discussed in detail in chapter 5. The same framework was adopted for the larger polymer design problem. Slight modifications had to be made to handle the constraints arising out of molecular stability and complexity or maximum molecular length. These constraints were handled via suitable modification of the fitness function. A penalty was assigned to the overall fitness for design candidates that violated the defined constraints. The penalized fitness function used for this purpose can be expressed as [8]:

$$F'(x) = F(x) + \varepsilon\, \eta \sum_{i=1}^{P} \varphi_i \qquad (4)$$

where P is the total number of constraints, η is a penalty coefficient, ε is -1 for maximization and +1 for minimization problems, and φ_i is a penalty related to the ith constraint. As mentioned before, the penalty was very severe for violation of stability constraints. Chromosomes infeasible with respect to stability were directly assigned zero fitness.
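The penalty scheme of equation (4), together with the zero-fitness rule for unstable mainchains, can be sketched as follows. The list of unstable runs contains only the -O-O-O- combination named earlier in the chapter. The representation of a mainchain as a list of group strings, the function names and the numerical inputs are illustrative assumptions, and the check does not wrap around the repeat-unit boundary.

```python
# Unstable nearest-neighbour runs; only the -O-O-O- example from the text is listed.
UNSTABLE_RUNS = [("-O-", "-O-", "-O-")]


def is_stable(mainchain: list) -> bool:
    """False if any listed unstable run occurs as consecutive mainchain groups."""
    for run in UNSTABLE_RUNS:
        width = len(run)
        for start in range(len(mainchain) - width + 1):
            if tuple(mainchain[start:start + width]) == run:
                return False
    return True


def penalized_fitness(fitness: float, penalties: list, eta: float,
                      maximize: bool = True, stable: bool = True) -> float:
    """Equation (4): F + eps * eta * sum(phi_i); unstable designs get zero fitness."""
    if not stable:
        return 0.0
    eps = -1.0 if maximize else 1.0
    return fitness + eps * eta * sum(penalties)


chain = ["-CH2-", "-O-", "-O-", "-O-", "-CH2-"]
print(penalized_fitness(0.9, [0.1, 0.2], eta=0.5, stable=is_stable(chain)))
print(penalized_fitness(0.9, [0.1, 0.2], eta=0.5, stable=is_stable(["-CH2-", "-O-", "-CH2-"])))
```

Treating stability as a hard filter and the other constraints as soft penalties matches the description above, where unstable chromosomes are removed by selection rather than repaired.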
The parameter values used for the search are given in Table 2. The design lengths varied from two base group units to a maximum of two units more than the polymer design target. The fitness function gain, α, was equal to 0.001. The parameters for equations (1), (2) and (3) were as follows: Fcrit = 0.99, which resulted in applying the complexity measure only after near optimal solutions were attained; γ = 100, which provided a gradual activation of the complexity measure as the fitness approached the critical value; and β = 0.10, so that a large penalty reduced the overall design fitness to a point where the genetic algorithm considered the design to be unworthy of further consideration. For statistical significance, results were compiled after 25 runs of 1000 generations each. The genetic design investigations carried out were subdivided into the following scenarios: (i) standard genetic design, (ii) knowledge-augmented genetic design, which penalized unstable mainchain group combinations, and (iii) knowledge-augmented genetic design, which penalized unstable mainchain group combinations and molecular complexity.
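The genetic operator probabilities listed in Table 2 can be used to draw an operator at each reproduction step. The sketch below shows one way to do this with the standard library; it illustrates weighted selection only and is not the authors' implementation. The dictionary and function names are hypothetical.

```python
import random

# Operator probabilities as listed in Table 2 (GA parameters).
OPERATOR_PROBABILITIES = {
    "Crossover": 0.2,
    "Backbone mutation": 0.2,
    "Sidechain mutation": 0.2,
    "Hop": 0.2,
    "Deletion": 0.1,
    "Blending": 0.1,
    "Insertion": 0.0,
}


def pick_operator(rng: random.Random) -> str:
    """Draw one operator name with probability proportional to its weight."""
    names = list(OPERATOR_PROBABILITIES)
    weights = list(OPERATOR_PROBABILITIES.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [pick_operator(rng) for _ in range(1000)]
counts = {name: draws.count(name) for name in OPERATOR_PROBABILITIES}
print(counts)
```

With the insertion probability set to zero, that operator is never selected, so design length can only change through the other operators in these runs.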
Table 2: GA Parameters

Parameter                                              Value
Steady state population                                100
Number of generations                                  1000
Gaussian fitness decay rate (α)                        0.001
Complexity sigmoid gain (β)                            0.1
Complexity penalty (γ)                                 100
Maximum polymer length                                 Target length + 2
Elitist retention with respect to population size      10%

Genetic Operator Probabilities:
Crossover                                              0.2
Backbone mutation                                      0.2
Sidechain mutation                                     0.2
Hop                                                    0.2
Deletion                                               0.1
Blending                                               0.1
Insertion                                              0.0

13.3 RESULTS AND DISCUSSION
The results for the different genetic design cases are presented in Table 3. The results are arranged in the following manner. The rows labeled part (a) give the percent success rate (in bold text) in achieving the design objective and the number of successful runs (in parentheses) for each target. Part (b) presents the average generation when the target was first located (in normal text). The rows labeled part (c) show the average number (in italic text) of distinct high-fitness solutions found for each target. As was expected, the genetic design was not as successful as it was in the smaller case study, when it located the target molecule in every run (i.e. a success rate of 100%). However, the most important observation here was that the genetic design still succeeded in finding the target molecule for eight out of the nine target polymers, even though the search space had grown by a factor of almost 100 million. As seen from part (a) of the table, with the exception of target polymer 4, all target polymers were located at least once by one of the design scenarios (i.e., columns 3-7). From part (b) of Table 3, it is seen that some molecules took longer than others to be discovered. For example, target polymer 7 was always found in less than 100 generations. On the other hand, target polymer 6 was located with varying success (4%-68%) and took more than 400 generations for discovery. Typically, longer molecules that required exact mainchain group ordering and sidechain positioning needed more generations to be discovered. This explains why target polymer 7, which was the only target molecule with no group ordering constraint, was quickly located, while target polymer 6, which required exact ordering, took much longer to discover. The exact ordering requirement and the long backbone structure were also the reasons why target polymer 4 was never discovered in any of the runs of 1000 generations each. Columns five to seven of Table 3 present results for the knowledge-augmented genetic search where higher-level chemical knowledge about the feasibility and stability of group combinations and molecular complexity was incorporated.
One can observe several general trends from these results. It can be seen that the success rates were higher, in general, with the knowledge-augmented genetic design in comparison with the standard genetic design (part (a) of column 3 vs. columns 5 and 7), when the initial population consisted of random mainchain and sidechain groups. Thus, the addition of higher-level chemical knowledge improved the design efficiency. For column 7, since the complexity measure was applied only after the fitness threshold was exceeded, more generations were required to achieve the target. This also explains why the genetic design was unable to locate target polymers 3, 4, and 9 in that scenario. In summary, it appears that the incorporation of higher-level chemical knowledge not only produced candidates that were chemically feasible, stable, and less complex but also increased the efficiency of the search by eliminating spurious candidates in the genetic design.
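The success rates in part (a) of Table 3 are simply the number of successful runs out of the 25 GA runs performed per target and scenario. A small sketch of that conversion (the function name is illustrative):

```python
TOTAL_RUNS = 25  # GA runs per target and scenario, as stated in the text


def success_rate(successful_runs: int, total_runs: int = TOTAL_RUNS) -> float:
    """Percentage of runs in which the target polymer was located."""
    return 100.0 * successful_runs / total_runs


# Run counts quoted in Table 3, e.g. 3, 7, 15 and 23 successful runs.
for runs in (3, 7, 15, 23):
    print(f"{runs} of {TOTAL_RUNS} runs -> {success_rate(runs):.0f}%")
```

With only 25 runs, each successful run moves the rate by 4 percentage points, which is worth keeping in mind when comparing columns that differ by one or two runs.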
Table 3: Results for the genetic search
(The repeat-unit structure drawings of TP1 to TP9 are not recoverable from this copy. In part (a) the column positions for TP2 and TP5 are inferred from the layout of the source.)

Columns: (3) standard GA, random MC and SC; (4) standard GA, random MC, hydrogen SC; (5) feasible MC, random MC and SC; (6) feasible MC, random MC, hydrogen SC; (7) feasible MC with complexity penalty, random MC and SC.

Target  Part  Col. 3     Col. 4     Col. 5     Col. 6     Col. 7
TP1     (a)   12% (3)    60% (15)   28% (7)    60% (15)   64% (16)
        (b)   184        300        233        240        428
        (c)   282        192        281        213        166
TP2     (a)   36% (9)    48% (12)   40% (10)   48% (12)   48% (12)
        (b)   411        400        209        522        412
        (c)   6          7          7          6          10
TP3     (a)   0% (0)     4% (1)     8% (2)     12% (3)    0% (0)
        (b)   -          293        640        193        -
        (c)   163        91         161        74         109
TP4     (a)   0% (0)     0% (0)     0% (0)     0% (0)     0% (0)
        (b)   -          -          -          -          -
        (c)   861        564        910        589        570
TP5     (a)   32% (8)    56% (14)   48% (12)   48% (12)   92% (23)
        (b)   400        205        317        232        420
        (c)   175        136        197        142        99
TP6     (a)   8% (2)     68% (17)   4% (1)     32% (8)    16% (4)
        (b)   548        405        529        632        528
        (c)   199        146        314        168        158
TP7     (a)   100% (25)  100% (25)  100% (25)  100% (25)  100% (25)
        (b)   61         61         58         64         85
        (c)   217        188        214        198        163
TP8     (a)   68% (17)   68% (17)   76% (19)   88% (22)   96% (24)
        (b)   210        88         147        109        81
        (c)   162        132        158        161        125
TP9     (a)   8% (2)     4% (1)     4% (1)     4% (1)     0% (0)
        (b)   382        132        513        868        -
        (c)   144        69         174        70         46

(a) target polymer success rate "bold", times target found out of 25 GA runs "(parentheses)"; (b) average generation number for locating target polymer "plain text"; (c) number of distinct polymers with fitness >= 0.99 (0.985 for TP2) "italic text"; MC = mainchain, SC = sidechain.
The results also suggest that the initial polymer population complexity played a role in the success rate of the genetic design. For example, the standard genetic design, in general, gave better results when the initial population sidechains were seeded with hydrogen groups (column 3, part (a) vs. column 4, part (a)). Large improvements were seen for target polymer 1 (12% to 60%) and for target polymer 6 (8% to 68%). Similar results were obtained for the knowledge-augmented genetic design that penalized unstable mainchain structures (column 5, part (a) vs. column 6, part (a)). The best improvements were those for target polymer 1 (28% to 60%) and for target polymer 6 (4% to 32%). Part (c) of Table 3 lists the number of near optimal or high-fitness solutions that were found for each target. This ability of the genetic design system to find many diverse alternative solutions with properties very close to the desired target properties is one of the most appealing features of the system. The high-fitness threshold was 0.99 for all design targets except for polymer 2, in which case it was 0.985. The genetic design was unable to find alternate solutions with a fitness value greater than 0.99 for this polymer. It should be noted that while the genetic design did not find the exact target for polymer 4, it did locate more than 500, and in some scenarios about 900, alternative near-optimal solutions.

13.4.1 Near-optimal solutions
Table 4 presents two of the numerous nearly optimal alternatives for target polymer 4 for each of the scenarios 1-3. As one can see, the alternative solutions were very close to the target properties and had fitness values exceeding 0.99. The average absolute error ranged from 0.25% to slightly over 1.0% of the desired property values. The solutions varied according to the search type. For example, case 1 (basic genetic design) obtained two infeasible polymers. The first used a combination of -O- and >C=O groups instead of the single -O-C=O- group and the second contained a -O-O-O- group combination, which is unstable. Using the correct -O-C=O- group reduced the fitness to 0.976 and increased the average absolute error to 2.04%. Case 2 produced feasible mainchain structures, but these were generally more complex than those in case 3, which also considered molecular complexity. The number of near-optimal solutions was approximately the same for all genetic design types. Table 5 presents corresponding results for target polymer 3. For this target, as in the case of target polymer 4, all alternative solutions had very high fitness values. Furthermore, these alternative solutions were structurally fairly similar to the actual target. This ability of the genetic design system to deliver a number of nearly optimal solutions structurally similar to the target is of immense practical importance. In several cases, one of the near-optimal candidates could easily turn out to be an attractive and feasible option for further consideration.
13.5 PARAMETRIC SENSITIVITY AND ROBUSTNESS ANALYSES FOR GAs
The performance of GA-based strategies is intimately tied to the different parameters employed in the algorithm. These parameters control the various aspects of the algorithm and hence directly govern the outcome of the search. The discovery of an optimal setting for the parameters, or even the existence of one, can be determined only by experimentation. The results of the GA design system on the case studies, though encouraging, were widely varied in terms of success rate as well as the quality of the final solutions obtained. This indicated that to obtain an improvement in performance, a detailed parametric sensitivity analysis needed to be performed. This would help to establish whether an optimal setting could be obtained, independent of the nature of the target structure or design problem. In their previous work, Sundaram and Venkatasubramanian carried out such a parametric sensitivity study in an effort to systematically determine optimal parameter settings [9]. Their investigation also involved a characterization of the search space in order to identify strategies that would allow the GA to exploit the underlying structure of the space. The key results from their work are mentioned below.
Table 4: Near optimal solutions for target polymer 4
(The structure drawings of the polymer designs are not recoverable from this copy; rows are listed in the order in which the values appear in the source.)

Polymer design                                          % error (a)                                 Fitness
Target polymer: TP4                                     {0; 0; 0; 0; 0}  0%                         1.0
Case 1: Standard GD (b)
  design 1                                              {-2.2; -0.5; 0.4; 0.4; -2}  0.74%           0.995
  design 2                                              {1.6; 2.2; -0.8; -0.2; 0.9}  1.18%          0.991
Case 2: knowledge-augmented GD, stability
  design 1                                              {0.04; 0.09; -0.4; 0.09; 0.7}  1.10%        0.999
  design 2                                              {0.4; 1.9; 0.85; 0.14; -2.2}  1.10%         0.991
Case 3: knowledge-augmented GD, stability & complexity
  design 1                                              {-0.1; 0.6; 0.1; 0.08; 0.04}  0.21%         0.999
  design 2                                              {0.4; -1.0; 0.02; 1.8; -0.9}  0.83%         0.999

(a) % Error is for {ρ; Tg; α; Cp; K}, followed by the average absolute error %. (b) GD = genetic design.

The study clearly highlighted the absence of a single optimal setting for the parameters examined. In fact a parameter setting found to work very well for a particular target was found to be non-optimal for a different target. The results implied that an optimal tuning of parameters could be done only on a run-to-run basis. The target-specific nature of the optimal parameter settings exposed an important aspect of the algorithm: the nature of the search space critically influenced the mechanics of the GA. The search-space characterization study illustrated that the structure of the fitness landscape was drastically altered by the target property settings. While in some cases the landscape was amenable to search using convexity based algorithms, in other cases it remained rather flat but reasonably correlated for small changes. The most important insight provided by the study was that the breadth as well as the depth of the sampling of chromosomes is crucial to the performance of the GA. Stated differently, the diversity of chromosomes sampled during the search is important not only in terms of the variety of the samples, measured by their distances in the search space, but also in terms of the necessary number of samples at a given distance of separation. This becomes even more pronounced under non-binary genetic encoding.
Table 5: Near-optimal solutions for target polymer 3

Polymer design                     % error (a)                               Fitness
Target polymer TP3                 {0; 0; 0; 0; 0}                  0%       1.0
Near-optimal solution 1            {0.58; 0.22; 0.89; -1.3; 0.09}   0.62%    0.997
Near-optimal solution 2            {-0.95; 0.3; 0.68; -0.4; 1.5}    0.76%    0.996
Near-optimal solution 3            {-0.61; 0.56; 1.2; -0.09; 2.1}   0.92%    0.993
Near-optimal solution 4            {-1.9; 0.34; -0.5; -2; -0.5}     1.05%    0.992

[Polymer structure drawings not recoverable from the source.]
(a) % error is for {ρ; Tg; α; Cp; K}; average absolute error %.
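The "average absolute error" column of Table 5 can be reproduced from the five per-property errors. The short sketch below assumes the plain reading of the footnote (mean of the absolute values of the five percentage errors); the function name and data layout are illustrative and not taken from the chapter.

```python
def average_absolute_error(errors):
    """Mean of the absolute per-property percentage errors."""
    return sum(abs(e) for e in errors) / len(errors)


# (per-property % errors for {rho; Tg; alpha; Cp; K}, reported average %)
# as listed for the near-optimal solutions in Table 5.
TABLE_5 = [
    ((0.58, 0.22, 0.89, -1.3, 0.09), 0.62),
    ((-0.95, 0.3, 0.68, -0.4, 1.5), 0.76),
    ((-0.61, 0.56, 1.2, -0.09, 2.1), 0.92),
    ((-1.9, 0.34, -0.5, -2.0, -0.5), 1.05),
]

for errors, reported in TABLE_5:
    computed = average_absolute_error(errors)
    print(f"computed {computed:.3f}%  reported {reported:.2f}%")
```

Under this reading, the computed values agree with the reported ones to within about 0.01 percentage points, which is consistent with rounding in the printed table.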
In addition to parametric sensitivity, another important concern is the robustness of the genetic search method, and indeed of any design system, to uncertainty in the forward prediction model used for fitness evaluation. Every forward model has some level of error associated with it. Depending upon the type and complexity of the property or performance measure at hand, the predictions of a model may be as much as 10-15% off the true values. While such a high degree of error may not be present in predictive models for simpler properties such as density, there will always be some error. The presence of error may be viewed as uncertainty in the forward predictions, and the practical utility of a design system then depends on its performance under such uncertainty. In a recent work, Patkar and Venkatasubramanian [10] studied the robustness of genetic algorithms to model uncertainty in molecular design. The study was carried out using the large polymer design case study. The results were highly encouraging and indicated an overall robust performance of the GA-based design system. For the target polymers considered, the system succeeded even with errors as high as 10% in the forward model.
13.6 CONCLUSIONS
The performance of a GA-based approach for large-scale molecular design was investigated with the help of a large polymer design case study. The total number of solution candidates in the present problem was about 100 million times larger than in the example discussed in Chapter 5. It was found that, despite the tremendous increase in the size of the search space and the complex nonlinear group interactions, the genetic design was generally able to find the target molecules. Furthermore, it was also able to provide a diverse collection of design alternatives that nearly satisfy the property constraints. However, the algorithm had a much lower success rate and converged much more slowly than on the smaller problem.

The versatility of the genetic search methodology was illustrated by its easy extension to include higher-level chemical knowledge. The objective of incorporating such knowledge was to ensure that more realistic, stable, and less complex solutions were obtained from the search. The results indicated that the inclusion of knowledge not only eliminated the creation of chemically infeasible structures, as expected, but also improved the overall efficiency of the genetic design. In other words, not surprisingly, the search turned out to be more intelligent than in the absence of additional knowledge.

It was evident from the case studies that the genetic design system was extremely proficient at rapidly locating favorable regions in the design space. It was, however, less effective at performing very localized searches. This was seen in many design scenarios where the optimal design could be reached by three or four genetic operations but the algorithm took several hundred generations to reach the target. This strongly indicated that tuning the parameters could significantly improve performance. However, parametric sensitivity studies indicated the absence of a single optimal parameter setting.
The best settings changed from one target to another and could be determined only by experimentation. The issue of the performance of GAs under forward model uncertainty was briefly addressed. Results from a recent study are encouraging and indicate significant robustness on the part of the genetic design system.
In conclusion, the problem-independent, efficient nature of the versatile genetic approach, and the ease with which chemical, biological, design or process knowledge and constraints can be incorporated, make the genetic design framework very appealing for CAMD and worthy of further investigation for large-scale molecular design problems.
13.7 LIST OF SYMBOLS AND ABBREVIATIONS

F        fitness value
Fcrit    fitness threshold
α        decay rate for Gaussian fitness function
CAMD     Computer-Aided Molecular Design
GA(s)    Genetic Algorithm(s)
PET      Polyethylene terephthalate
PVP      Poly(vinylidene propylene) copolymer
PC       Polycarbonate of bisphenol-A
MC       mainchain
SC       sidechain

The symbols for the following four definitions are not recoverable from the source (only one, printed as Y, survives, and it cannot be matched to its definition):
         penalty scaling factor for complexity
         complexity gain
         penalty coefficient for modified fitness function
         penalty related to the i-th constraint

13.8 REFERENCES
1. V. Venkatasubramanian, K. Chan and J.M. Caruthers, Comput. Chem. Eng., 18 (1994) 833-844.
2. V. Venkatasubramanian, K. Chan and J.M. Caruthers, J. Chem. Inf. Comput. Sci., 35 (1995) 188-195.
3. D.W. van Krevelen, Properties of Polymers; their Correlation with Chemical Structure; their Numerical Estimation and Prediction from Additive Group Contribution, 3rd Ed., Elsevier, Amsterdam, The Netherlands, 1990.
4. D. Barton and W.D. Ollis (Eds.), Comprehensive Organic Chemistry: The Synthesis and Reactions of Organic Compounds, First Edition, Pergamon Press, New York, 1979.
5. E.A. Brignole, S. Bottini and R. Gani, Fluid Phase Equil., 29 (1986) 125-132.
6. K.G. Joback and G. Stephanopoulos, FOCAPD '89, Snowmass, CO, 1989.
7. S. Macchietto, O. Odele and O. Omatsone, Chem. Eng. Res. Des., 68, 5 (1990) 429-433.
8. R. Gani and E.A. Brignole, Fluid Phase Equil., 13 (1983) 331-340.
9. A. Sundaram and V. Venkatasubramanian, J. Chem. Inf. Comput. Sci., 38 (1998) 1177-1191.
10. P.R. Patkar and V. Venkatasubramanian, AIChE J. (submitted for publication, 2002).
Computer Aided Molecular Design: Theory and Practice. L.E.K. Achenie, R. Gani and V. Venkatasubramanian (Editors). © 2003 Elsevier Science B.V. All rights reserved.
Chapter 14: Case Study in Identification of Multistep Reaction Stoichiometries
A. Buxton, A. Hugo, A.G. Livingston & E.N. Pistikopoulos
14.1 INTRODUCTION

In this chapter, the systematic procedure for the rapid identification of environmentally benign alternative multi-step stoichiometries, as described in Chapter 7, is applied to a case study: the production of acetic acid. Acetic acid is one of the most important aliphatic intermediates; several of its esters are important in artificial silk manufacture and are used as solvents for resins and paints. Its inorganic salts are used in the dye and clothing industries and in medicine. The scale of production of this molecule makes it an interesting example from the environmental point of view. The background and chemical routes for this example were adapted from Weissermel and Arpe (1993).
14.2 PROBLEM FORMULATION

The problem addressed here may be stated as follows:

Given: a desired organic product.

Identify: a set of candidate multi-step organic reaction stoichiometries for the production of the desired product which are both economically and environmentally promising.

This requires a three-step procedure: (i) selection of co-material groups, (ii) determination of a set of candidate co-materials, and (iii) identification of a set of promising candidate multi-step stoichiometries. The use of such a structured, stepwise procedure reduces the multi-step stoichiometry identification problem to a manageable size. The key to the procedure is the introduction of co-material design (steps (i) and (ii)). With the product and stoichiometric co-materials known, the identification of feasible reaction stoichiometries is no longer an open-ended problem. The steps of the procedure are described in the following sections.
14.3 METHODOLOGY
As described in Chapter 7, the first step in the methodology is the application of a new group-based co-material enumeration algorithm. By introducing material design principles, through structural and chemical feasibility constraints, a manageable set of raw materials and co-products can be generated. Next, stoichiometries are extracted from the co-material set using a two-step optimisation procedure, including whole number stoichiometric coefficient constraints, carbon structure constraints and case-specific constraints based on chemical knowledge. Thermodynamic, economic and environmental impact criteria are employed in the evaluation of feasible stoichiometries, with aspects of the Methodology for Environmental Impact Minimisation (MEIM) (Pistikopoulos et al., 1994) providing the framework for the environmental evaluation of alternatives. The particular specifications used in the case study for each of these steps follow.

GROUP PRE-SELECTION

There are five established routes to acetic acid; these are shown in Figure 1. As before, for simplicity, group pre-selection was restricted to identifying the simplest set of UNIFAC groups necessary to represent the product and the co-materials involved in these stoichiometries. As a further simplification, the chemistry-specific intermediates peracetic acid and 2-acetoxybutane were not considered as part of group pre-selection, since it is unlikely that they would be produced and consumed in different stoichiometries which lead directly to the desired product. Accordingly, the following thirteen groups were selected: CH3-, -CH2-, -CHO, -CO2H, CH3COO-, -CH=CH-, CH3CO-, HCOO-, CH2=CH-, -OH, H2O, CH3OH, HCOOH. The latter three groups are complete molecules selected from class zero in Constantinou et al. (1996); no category two groups feature in this example.
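Any stoichiometry extracted in the later steps must at least conserve atoms. As a minimal sketch of that basic check, the overall equations of the established routes in Figure 1 can be tested for atom balance. The molecular formulas are standard; the dictionary layout and function name are illustrative assumptions and are not part of the procedure of Chapter 7. Exact fractions are used because the n-butane route is written with a coefficient of 2.5.

```python
from fractions import Fraction

# Atom counts per molecule (standard molecular formulas).
FORMULAS = {
    "CH3CHO": {"C": 2, "H": 4, "O": 1},      # acetaldehyde
    "CH3CO3H": {"C": 2, "H": 4, "O": 3},     # peracetic acid
    "CH3CO2H": {"C": 2, "H": 4, "O": 2},     # acetic acid
    "C4H10": {"C": 4, "H": 10},              # n-butane
    "C4H8": {"C": 4, "H": 8},                # 1-butene or 2-butene
    "C6H12O2": {"C": 6, "H": 12, "O": 2},    # 2-acetoxybutane
    "CH3OH": {"C": 1, "H": 4, "O": 1},       # methanol
    "HCOOCH3": {"C": 2, "H": 4, "O": 2},     # methyl formate
    "O2": {"O": 2},
    "CO": {"C": 1, "O": 1},
    "H2O": {"H": 2, "O": 1},
}


def net_atoms(reactants, products):
    """Return atoms(products) - atoms(reactants); an empty dict means balanced."""
    net = {}
    for side, sign in ((reactants, -1), (products, +1)):
        for species, coeff in side.items():
            for atom, count in FORMULAS[species].items():
                net[atom] = net.get(atom, 0) + sign * Fraction(coeff) * count
    return {atom: value for atom, value in net.items() if value != 0}


# (reactants, products) for each equation of Figure 1.
ROUTES = [
    ({"CH3CHO": 1, "O2": 1}, {"CH3CO3H": 1}),
    ({"CH3CO3H": 1, "CH3CHO": 1}, {"CH3CO2H": 2}),
    ({"C4H10": 1, "O2": Fraction(5, 2)}, {"CH3CO2H": 2, "H2O": 1}),
    ({"C4H8": 1, "CH3CO2H": 1}, {"C6H12O2": 1}),
    ({"C6H12O2": 1, "O2": 2}, {"CH3CO2H": 3}),
    ({"CH3OH": 1, "CO": 1}, {"CH3CO2H": 1}),
    ({"HCOOCH3": 1}, {"CH3CO2H": 1}),
]

for reactants, products in ROUTES:
    print(reactants, "->", products, "balanced:", not net_atoms(reactants, products))
```

An unbalanced candidate, for example methanol going to acetic acid without carbon monoxide, returns the surplus atoms (one C and one O) instead of an empty dictionary.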
CO-MATERIAL DESIGN

Since the established chemistries involve only unbranched acyclic molecules (disregarding 2-acetoxybutane), the co-material enumeration problem was solved for such molecules only, with the following additional structural restrictions based on the established co-materials: (i) an upper limit of four groups per molecule is imposed, and (ii) only one oxygen-containing group is allowed per molecule, since more complex molecules than this are unlikely raw materials and the common industrial by-products are simpler than the product (mostly CO2 and HCO2H).
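The two structural restrictions can be illustrated with a brute-force enumeration over the pre-selected groups. This is only a sketch under stated assumptions: the split of groups into chain-terminating (one free bond) and chain-extending (two free bonds) groups is inferred from the group formulas, mirror-image chains are treated as the same molecule, and none of the further chemical feasibility constraints of the Chapter 7 algorithm are applied. All names are hypothetical.

```python
from itertools import product

# Chain-terminating groups (one free bond): name -> contains oxygen?
TERMINAL_GROUPS = {
    "CH3-": False, "CH2=CH-": False,
    "-CHO": True, "-CO2H": True, "CH3COO-": True,
    "CH3CO-": True, "HCOO-": True, "-OH": True,
}
# Chain-extending groups (two free bonds); neither contains oxygen.
MIDDLE_GROUPS = ("-CH2-", "-CH=CH-")
# Groups that are already complete molecules.
WHOLE_MOLECULES = ("H2O", "CH3OH", "HCOOH")


def canonical(chain):
    """Treat a chain and its mirror image as the same molecule."""
    chain = tuple(chain)
    return min(chain, chain[::-1])


def enumerate_chains(max_groups=4, max_oxygen_groups=1):
    """Unbranched acyclic chains: two terminal groups plus 0..max_groups-2 middle groups."""
    found = set()
    for n_middle in range(max_groups - 1):
        for left, right in product(TERMINAL_GROUPS, repeat=2):
            # Restriction (ii): at most one oxygen-containing group per molecule.
            if TERMINAL_GROUPS[left] + TERMINAL_GROUPS[right] > max_oxygen_groups:
                continue
            for middle in product(MIDDLE_GROUPS, repeat=n_middle):
                found.add(canonical((left, *middle, right)))
    return sorted(found)


candidates = enumerate_chains()
print(len(candidates), "chain candidates plus", len(WHOLE_MOLECULES), "whole-molecule groups")
```

In this sketch acetic acid appears as the pair CH3- and -CO2H, and n-butane as CH3-, -CH2-, -CH2-, CH3-. The candidate count it prints should not be expected to match the species list used in the chapter, which results from additional feasibility screening.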
Oxidation of Acetaldehyde
    CH3CHO (acetaldehyde) + O2 ---> CH3CO-O-OH (peracetic acid)
    CH3CO-O-OH + CH3CHO ---> 2 CH3CO2H (acetic acid)
    Operated by: UCC (USA), Daicel (Japan) and British Celanese (UK)

Oxidation of Alkanes (n-Butane)
    CH3(CH2)2CH3 (n-butane) + 2.5 O2 ---> 2 CH3CO2H (acetic acid) + H2O
    Operated by: Hoechst Celanese, Hüls and UCC (USA)

Oxidation of Alkenes (Butenes)
    CH3CH2CH=CH2 or CH3CH=CHCH3 (1-butene or 2-butene) + CH3CO2H ---> CH3CH2CH(O2CCH3)CH3 (2-acetoxybutane)
    CH3CH2CH(O2CCH3)CH3 + 2 O2 ---> 3 CH3CO2H (acetic acid)
    Operated by: Bayer and Hüls

Carbonylation of Methanol
    CH3OH + CO ---> CH3CO2H
    Operated by: BASF and Monsanto

Isomerisation of Methyl Formate
    CH3OCHO ---> CH3CO2H
    Not yet commercialised

Figure 1: Acetic Acid Production Routes

ROLE SPECIFICATION CONSTRAINTS
According to the industrial routes, stoichiometries of up to two steps in length were allowed, with a maximum of four species permitted in any step. Table 1 shows the knowledge-based role specification constraints employed in the acetic acid example where, as before, R denotes reactant only, P denotes the final product, C denotes product or co-product, N denotes the exclusion of a species from a system and a blank space denotes no restriction. These constraints were again developed specifically for two-step stoichiometries according to the following arguments, based on chemical knowledge and the existing industrial chemistries.
Table 1: Role Specification Constraints - Acetic Acid Example

Species     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
System 0    R  C  R  R  R  C  C  R  P  N  R  R  N  R  N  N  R  N  R  N  N  R  R  N  R  N  N  N
1A & 1B        C  R  R  R  C  ?  C  ?  C  R  N  R  C  N  C  R  R  C  N  C  R  N  C  C  C  C  C

(? = entry not recoverable from the source)
• Alcohols (species 1, 13 and 18) oxidise to aldehydes and then to carboxylic acids in two steps and so are included as reactants only in systems 1A and 1B, and excluded altogether from system zero (except methanol, species 1, which is allowed as a reactant in system zero for carbonylation directly to acetic acid, and is unrestricted in systems 1A and 1B).
• Accordingly, aldehydes (species 8, 14 and 19) are included as products or co-products only in systems 1A and 1B and as reactants only in system zero.
• Unsaturated molecules (species 11, 17 and 22) may be reactants only in all systems; their formation is not considered.
• Alkanes (species 12 and 23) may be oxidised directly to acids; therefore they are included as raw materials only in system zero, and excluded from systems 1A and 1B.
• Higher carboxylic acids (species 15 and 20) are unlikely raw materials and undesirable co-products for a promising stoichiometry; they are therefore excluded altogether.
• Formates (species 24, 25 and 26) and acetates (species 10, 16 and 21) are esters of formic and acetic acids respectively. They are therefore unlikely raw materials, and due to the conditions necessary for esterification (concentrated sulphuric acid) they are also unlikely co-products. They are therefore excluded from system zero (except methyl formate, species 10, for isomerisation) and included only as products or co-products in systems 1A and 1B.
• Formic acid (species 7) is included as a co-product in system zero, since it is a recognised industrial by-product, and is included as a reactant only in systems 1A and 1B to allow the generation of formates.
• Ketones (species 29 and 30) are produced by oxidising secondary alcohols. No such alcohols are included here, so these species are excluded from system zero, and included only as products or co-products in systems 1A and 1B.
• H2O and CO2 (species 2 and 6) are included as co-products only in all systems according to the industrial chemistries.
• CO, O2 and H2 are included as reactants only in all systems.
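The role codes lend themselves to a simple lookup check on a candidate reaction step. The sketch below is illustrative only: it encodes the system-zero row of Table 1 as best it can be read from the source (species 1 to 28), assumes that species 1, 3 and 9 are methanol, CO and acetic acid as the text implies, and uses hypothetical function and variable names.

```python
# System-zero role codes for species 1..28, as read from Table 1.
SYSTEM_0_ROW = "R C R R R C C R P N R R N R N N R N R N N R R N R N N N"
ROLES_SYSTEM_0 = dict(enumerate(SYSTEM_0_ROW.split(), start=1))

MAX_SPECIES_PER_STEP = 4  # "a maximum of four species permitted in any step"


def role_violations(reactants, products, roles, max_species=MAX_SPECIES_PER_STEP):
    """Return a list of messages; an empty list means the step respects the roles."""
    problems = []
    if len(set(reactants) | set(products)) > max_species:
        problems.append(f"more than {max_species} species in the step")
    for s in reactants:
        code = roles.get(s, "")  # missing or blank code: no restriction
        if code in ("C", "P", "N"):
            problems.append(f"species {s} (role {code}) may not be a reactant")
    for s in products:
        code = roles.get(s, "")
        if code in ("R", "N"):
            problems.append(f"species {s} (role {code}) may not be a product")
    return problems


# Methanol (1) + CO (3) -> acetic acid (9): allowed in system zero.
print(role_violations([1, 3], [9], ROLES_SYSTEM_0))
# Water (2) is a co-product only, so using it as a reactant is flagged.
print(role_violations([2, 3], [9], ROLES_SYSTEM_0))
```

A table-driven check of this kind keeps the chemical reasoning in the data rather than in the code, so the same function could be reused for systems 1A and 1B once their role codes are available.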
CHEMISTRY CONSTRAINTS

Knowledge-based chemistry constraints were employed using the binary product and reactant flags, $i_s$ and $ii_s$ respectively, found in the whole number stoichiometry constraints as defined in Chapter 7. It is worth recalling that the binary variable $ii_s$ takes the value zero if species $s$ is a product and unity if species $s$ is a reactant, while zero or unity is assigned to $i_s$ when $s$ is a reactant or a product, respectively.

• alcohols, alkenes, alkanes and aldehydes may not react with each other

$$ ii_{1} + ii_{9} + ii_{13} + ii_{18} + ii_{11} + ii_{17} + ii_{22} + ii_{12} + ii_{23} + ii_{8} + ii_{14} + ii_{19} \le 1 \qquad (1) $$

• carbonylation (reaction with carbon monoxide) is restricted to alcohols and formates

$$ ii_{3} - (ii_{1} + ii_{13} + ii_{18} + ii_{24} + ii_{27} + ii_{28} + ii_{5}) \le 0 \qquad (2) $$

• formates must either react with oxygen or carbon monoxide or undergo isomerisation

$$ ii_{24} + ii_{27} + ii_{28} - ii_{3} - ii_{4} \le 2 - \sum_{s} ii_{s} \qquad (3) $$

• formates may be produced only by esterification of formic acid with the appropriate alcohol

$$ 2 i_{24} - ii_{7} - ii_{18} \le 0 \qquad (4) $$

$$ 2 i_{27} - ii_{7} - ii_{1} \le 0 \qquad (5) $$

$$ 2 i_{28} - ii_{7} - ii_{13} \le 0 \qquad (6) $$

• aldehydes may only be produced by oxidation of the appropriate alcohols or oxidation or hydration of the appropriate unsaturated compounds

$$ 2 i_{8} - ii_{13} - ii_{11} - ii_{17} - ii_{22} - ii_{2} - ii_{4} \le 0 \qquad (7) $$

$$ 2 i_{14} - ii_{18} - ii_{11} - ii_{22} - ii_{2} - ii_{4} \le 0 \qquad (8) $$
2i19 - iil7 - ii4 IVl~i~,
(5)
F=I;IVDre.f