Technologies for Migration and Commuting Analysis: Spatial Interaction Data Applications

John Stillwell, University of Leeds, UK
Oliver Duke-Williams, University of Leeds, UK
Adam Dennett, University of Leeds, UK

Business Science Reference
Hershey • New York
Director of Editorial Content: Kristin Klinger
Director of Book Publications: Julia Mosemann
Acquisitions Editor: Lindsay Johnston
Development Editor: Christine Bufton
Publishing Assistant: Myla Harty
Typesetter: Myla Harty
Production Editor: Jamie Snavely
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Published in the United States of America by
Business Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com/reference

Copyright © 2010 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Technologies for migration and population analysis : spatial interaction data applications / John Stillwell, editor.
p. cm.
Includes bibliographical references and index.
Summary: "This book addresses the technical and data-related side of studying population flows"--Provided by publisher.
ISBN 978-1-61520-755-8 (hardcover) -- ISBN 978-1-61520-756-5 (ebook)
1. Population research. 2. Population--Mathematical models. 3. Migration, Internal--Mathematical models. I. Stillwell, John C. H. (John Charles Harold), 1952-
HB850.T445 2010
304.601'5195--dc22
2009032616

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher.
List of Reviewers

Martin Bell, School of Geography, Planning and Environmental Management, The University of Queensland, Australia
Adam Dennett, School of Geography, University of Leeds, UK
Oliver Duke-Williams, School of Geography, University of Leeds, UK
Arlinda Garcia-Coll, Departament de Geografia Humana, Universitat de Barcelona, Spain
Phil Rees, School of Geography, University of Leeds, UK
John Stillwell, School of Geography, University of Leeds, UK
Table of Contents
Foreword
Preface
Acknowledgment

Section 1
Spatial Interaction Data Sources and Analysis Issues

Chapter 1
Interaction Data: Definitions, Concepts and Sources
John Stillwell, University of Leeds, UK
Adam Dennett, University of Leeds, UK
Oliver Duke-Williams, University of Leeds, UK

Chapter 2
Access to Census Interaction Data
Adam Dennett, University of Leeds, UK
John Stillwell, University of Leeds, UK
Oliver Duke-Williams, University of Leeds, UK

Chapter 3
Interaction Data: Confidentiality and Disclosure
Oliver Duke-Williams, University of Leeds, UK

Chapter 4
Analysing Interaction Data
John Stillwell, University of Leeds, UK
Kirk Harland, University of Leeds, UK

Chapter 5
Temporal and Spatial Consistency
Oliver Duke-Williams, University of Leeds, UK
John Stillwell, University of Leeds, UK

Chapter 6
A New Migrant Databank: Concept and Development
Peter Boden, University of Leeds, UK
Phil Rees, University of Leeds, UK

Chapter 7
Using Migration Microdata from the Samples of Anonymised Records and the Longitudinal Studies
Paul Norman, University of Leeds, UK
Paul Boyle, University of St. Andrews, UK

Section 2
Spatial Interaction Analysis and Modelling Applications

Chapter 8
Internal Migration Patterns by Age and Sex at the Start of the 21st Century
Adam Dennett, University of Leeds, UK
John Stillwell, University of Leeds, UK

Chapter 9
Internal Migration Propensities and Patterns of London’s Ethnic Groups
John Stillwell, University of Leeds, UK

Chapter 10
Migration and Socio-Economic Polarisation within British City Regions
Tony Champion, University of Newcastle, UK
Mike Coombes, University of Newcastle, UK

Chapter 11
Issues Associated with the Analysis of Rural Commuting
Martin Frost, Birkbeck College London, UK
Adam Dennett, University of Leeds, UK

Chapter 12
Defining Labour Market Areas by Analysing Commuting Data: Innovative Methods in the 2007 Review of Travel-to-Work Areas
Mike Coombes, University of Newcastle, UK

Chapter 13
Estimating Spatially Consistent Interaction Flows Across Three Censuses
Zhiqiang Feng, University of St. Andrews, UK
Paul Boyle, University of St. Andrews, UK

Chapter 14
Modelling Migration with Poisson Regression
Robin Flowerdew, University of St. Andrews, UK

Chapter 15
Analysing Structures of Interregional Migration in England
James Raymer, University of Southampton, UK
Corrado Giulietti, University of Southampton, UK

Chapter 16
Commuting to School: A New Spatial Interaction Modelling Framework
Kirk Harland, University of Leeds, UK
John Stillwell, University of Leeds, UK

Appendix
Interaction Data: Classroom Activities
Adam Dennett, University of Leeds, UK

Compilation of References
About the Contributors
Index
Detailed Table of Contents
Foreword
Preface
Acknowledgment

Section 1
Spatial Interaction Data Sources and Analysis Issues

Chapter 1
Interaction Data: Definitions, Concepts and Sources
John Stillwell, University of Leeds, UK
Adam Dennett, University of Leeds, UK
Oliver Duke-Williams, University of Leeds, UK

Chapter 1 aims to clarify definitional and conceptual issues relating to the key interaction phenomena, migration and commuting, on which we concentrate in this book and for which we strive to obtain information to enhance our understanding of the processes that are taking place in the real world. The chapter contains the summary of an audit of interaction data sources, outlining the characteristics of the different types of data that are available from censuses, registers and surveys.

Chapter 2
Access to Census Interaction Data
Adam Dennett, University of Leeds, UK
John Stillwell, University of Leeds, UK
Oliver Duke-Williams, University of Leeds, UK

Chapter 2 provides a technical guide to the Web-based Interface to Census Interaction Data (WICID), an online software system that has been developed by the Centre for Interaction Data Estimation and Research (CIDER) to allow users to build queries and extract data from the last three censuses quickly and easily.

Chapter 3
Interaction Data: Confidentiality and Disclosure
Oliver Duke-Williams, University of Leeds, UK

Chapter 3 presents the different methods used to reduce the risk of disclosure and maintain confidentiality. In particular, the chapter explains the small cell adjustment method (SCAM) that was introduced by the Office for National Statistics when pre-processing the 2001 data and evaluates the impacts of this adjustment on the data.

Chapter 4
Analysing Interaction Data
John Stillwell, University of Leeds, UK
Kirk Harland, University of Leeds, UK

Chapter 4 introduces the reader to different types of analysis that can be undertaken once the data have been prepared. The chapter discusses the ‘interaction matrix’ and the notation frequently used in the literature for origin-destination variables before explaining a number of different measures of interaction, reviewing popular statistical and modelling methods and offering a new way of visualising interaction flows through vector analysis.

Chapter 5
Temporal and Spatial Consistency
Oliver Duke-Williams, University of Leeds, UK
John Stillwell, University of Leeds, UK

Chapter 5 addresses the major issues confronting those interested in how migration or commuting changes over time. Inconsistencies occur in a variety of ways and for different reasons: in the definition and measurement of variables from one census to the next; in the way that counts are categorised within particular themes; due to the new standard classifications that come into operation; due to variations in counts that are released by the census authorities; and due to continuous changes between censuses in the geographical boundaries of the spatial units that comprise census areas.

Chapter 6
A New Migrant Databank: Concept and Development
Peter Boden, University of Leeds, UK
Phil Rees, University of Leeds, UK

Chapter 6 demonstrates how data from the various sources of international migration might be integrated in what has been called a ‘New Migrant Databank’ so that local and regional authorities can monitor the different measures and attempt to explain the variations that appear in trends evident in data from different sources.

Chapter 7
Using Migration Microdata from the Samples of Anonymised Records and the Longitudinal Studies
Paul Norman, University of Leeds, UK
Paul Boyle, University of St. Andrews, UK

Chapter 7 considers two different sets of microdata that are produced from the census: the Samples of Anonymised Records (SARs), which give researchers a valuable way of looking at the migration and commuting characteristics of individuals that are unavailable from aggregate census products; and the Longitudinal Study (LS), which is a sample of individuals that have been tracked from the 1971 Census through to the 2001 Census and therefore allows investigation of inter-censal change.

Section 2
Spatial Interaction Analysis and Modelling Applications

Chapter 8
Internal Migration Patterns by Age and Sex at the Start of the 21st Century
Adam Dennett, University of Leeds, UK
John Stillwell, University of Leeds, UK

Chapter 8 is the first case study and considers one of the most important selective influences of migration, age, by investigating the internal migration patterns of those in different age groups and making use of a national area classification as a framework for summarising different measures of migration flows at the district scale.

Chapter 9
Internal Migration Propensities and Patterns of London’s Ethnic Groups
John Stillwell, University of Leeds, UK

Chapter 9 makes use of data from specially commissioned tables that cross-classify age and ethnicity at ward level and allow the analysis of variation in migration propensities for Britain’s ethnic groups by broad age group. The analysis focuses on wards in London, and reveals distinctive spatial patterns of net migration when separating flows between wards within London from flows between London wards and the rest of England and Wales.

Chapter 10
Migration and Socio-Economic Polarisation within British City Regions
Tony Champion, University of Newcastle, UK
Mike Coombes, University of Newcastle, UK

Chapter 10 takes the theme of migration and socio-economic polarisation and uses the Special Migration Statistics data set to analyse the propensities and patterns of migrants with certain socio-economic characteristics. The analysis is conducted for a set of 27 city regions, with London, Birmingham and Bristol being selected as three case studies for particular attention.

Chapter 11
Issues Associated with the Analysis of Rural Commuting
Martin Frost, Birkbeck College London, UK
Adam Dennett, University of Leeds, UK

Chapter 11 is the first of the case study chapters that considers commuting. It aims to provide an assessment of the suitability of 2001 Census data for analyses of commuting in rural England. Several of the problems with the census data are highlighted, not least that of the adjustment of small flows in order to reduce the risk of disclosure.

Chapter 12
Defining Labour Market Areas by Analysing Commuting Data: Innovative Methods in the 2007 Review of Travel-to-Work Areas
Mike Coombes, University of Newcastle, UK

Chapter 12 reports on the definition of a new set of Travel-to-Work Areas (TTWAs) for ONS based on 2001 flows unadjusted by SCAM between small areas known as lower level super output areas. The chapter reports the method of definition and shows how these functional labour markets can be visualised at different stages of the definition process.

Chapter 13
Estimating Spatially Consistent Interaction Flows Across Three Censuses
Zhiqiang Feng, University of St. Andrews, UK
Paul Boyle, University of St. Andrews, UK

Chapter 13 explains the methodology developed to generate sets of migration and commuting flows that can be compared between censuses. This addresses the issue of spatio-temporal inconsistency discussed in Chapter 5. A modelling approach has been developed that makes use of Poisson regression to estimate 1981 and 1991 inter-ward flows that are consistent with flows between 2001 Census wards.

Chapter 14
Modelling Migration with Poisson Regression
Robin Flowerdew, University of St. Andrews, UK

Chapter 14 also uses Poisson regression to estimate the independent variables that might be used to predict inter-district migration in Britain. The chapter explains the statistical modelling methodology in detail and reports the results of calibrating a series of different types of Poisson model.

Chapter 15
Analysing Structures of Interregional Migration in England
James Raymer, University of Southampton, UK
Corrado Giulietti, University of Southampton, UK

Chapter 15 explains how a multiplicative component framework can be used to explore the age and ethnic structures of inter-regional migration in England using data from the 1991 and 2001 Censuses. Inter-regional matrices are disaggregated into a series of components that enable insights into the stability in the overall level of migration, the out-migration component, the in-migration component and the origin-destination component.

Chapter 16
Commuting to School: A New Spatial Interaction Modelling Framework
Kirk Harland, University of Leeds, UK
John Stillwell, University of Leeds, UK

Chapter 16 proposes a new spatial interaction modelling framework which separates out the constraint equations in the classic spatial interaction model proposed by Wilson from the actual model equation and which uses a genetic algorithm to define the optimum form of model to be used. The modelling framework is demonstrated using school pupil commuting data.

Appendix
Interaction Data: Classroom Activities
Adam Dennett, University of Leeds, UK

Compilation of References
About the Contributors
Index
Foreword
Mobility is at once the most intriguing and the most intractable of population dynamics. For households and individuals it is the process that underpins activity in every life course domain; for localities and regions it is the primary agent of population flux; for nations it is the thread that stitches together the spatial fabric and connects it to global society. What complicates analysis is the elasticity of time and space; as well as being multifaceted and repetitive, mobility is almost infinitely variable in duration and spatial extent. As in all fields of science, scholars have sought understanding of this complex whole by segmenting mobility into discrete chunks, categorised and defined by convenient space-time boundaries. While inevitably artificial, this process of classification has been fundamental to progress in conceptualising, measuring, analysing and interpreting the dynamics of population mobility. Theories, data sources and models of mobility have all advanced. With contributions extending over more than 30 years, the University of Leeds School of Geography sits at the epicentre of British research on the quantitative analysis of migration, and this volume stands as another distinctive pillar in the development of mobility scholarship, drawing together an impressive cast of leading contributors. Building on earlier ESRC-funded initiatives, the volume’s title foreshadows a focus on technologies and applications. These are important goals: mobility shapes lives and transforms communities and we need to demonstrate how tools and technologies can advance our understanding of processes and outcomes. The chapters in this book serve that goal well, with exciting and thoughtful perspectives on important contemporary issues such as the role of migration in socioeconomic polarisation, patterns of rural commuting, and ethnic migration differentials: all driven by innovative techniques. But this volume is worthy of note for at least three other reasons. 
First is its insistence on a clear understanding of the data upon which the researchers draw; a critical ingredient to rigorous analysis. Second is the endeavour to extend beyond the Census and encompass alternative sources of mobility data, such as the LS, the School Census, higher education statistics and the Labour Force Survey, and to build new integrated databanks. Availability of data has long been viewed as the fundamental constraint to mobility research, so this wider perspective is both notable and welcome. Third, and perhaps most significant, is the broadening of focus beyond conventional internal migration, to encompass not only commuting, as heralded in the volume title, but also international migration, a dimension in which statistical data are arguably still more deficient. Together, these features represent a clear endeavour to
bridge the spatial and temporal fragmentation that characterises mobility studies, and it is this breadth of scope and compass, as well as the innovative technologies and applications, that recommend this excellent book.

Martin Bell
The University of Queensland, Australia
July 2009

Martin Bell is a population geographer with core interests in the fields of population mobility and demographic forecasting. He graduated from Flinders and has a PhD from Queensland where he is now Professor. He has just completed six years as Head of the School of Geography, Planning and Environmental Management. He is the Director of the Queensland Centre for Population Research which undertakes pure and applied research and provides education and training in demography and population geography. He has written extensively about migration, most recently on Mobility in the New Millennium: Australians on the Move (2009), and the focus of his current research is on cross-national comparisons of internal migration.
Preface
HUMAN SPATIAL INTERACTION

Moving house is an event that most people experience at some stage during their lifetimes whereas going to work or to study are part of daily activities for the majority of the pre-retirement population living in the United Kingdom (UK). For many individuals, both these activities can be amongst the most stressful experiences in their lives, much more so than other mobility behaviour such as going shopping, visiting friends or going on holiday, although the latter can be extremely challenging if travel arrangements break down! Much more time is spent on daily travel to work than on the relatively infrequent process of moving from one home to another, but the time invested in migrating can be considerable when the hours spent finding a suitable property are taken into account along with those used up deciding whether to move in the first place.

The statistics from the last census of population in 2001 tell us that migration and commuting are remarkably important phenomena in contemporary times, as they have been throughout modern history. Over 6.2 million people in the UK changed their place of usual residence in the 12 month period before 29th April 2001, representing around 10% of the total population of the UK. As shown in the table below, 5.15 million of these migrants moved within England, with relatively small numbers moving across national boundaries within the UK. In addition, a further 467,000 immigrants arrived from outside the UK in the same period and 406,800 individuals moved but we do not know where from because their origins were unstated on their census forms.

Table 1. Migrants and commuters in the UK

Migrants in 2000-01
                      Destinations
Origins               England       Wales        Scotland     Northern Ireland
England               5,153,436     48,248       43,675       7,899
Wales                 42,614        243,851      1,546        325
Scotland              42,831        1,396        473,789      2,633
Northern Ireland      8,812         360          2,602        127,999

Commuters in 2001
                      Destinations
Origins               England       Wales        Scotland     Northern Ireland
England               22,310,327    35,101       19,424       954
Wales                 63,764        1,117,481    921          79
Scotland              14,566        200          4,066,398    175
Northern Ireland      3,030         51           450          674,511
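The migrant counts in Table 1 can be read as a small origin-destination matrix, and the headline figures quoted in the text follow from simple sums over it. The sketch below is illustrative only: it assumes that rows are origins and columns are destinations, and the names `NATIONS`, `MIGRANTS`, `total_flows`, `within_flows` and `net_migration` are hypothetical labels introduced here, not part of any census product.

```python
# Migrant counts for 2000-01 copied from Table 1.
# Assumption: rows are origins and columns are destinations,
# both in the order given by NATIONS.
NATIONS = ["England", "Wales", "Scotland", "Northern Ireland"]

MIGRANTS = [
    [5_153_436, 48_248, 43_675, 7_899],   # origin: England
    [42_614, 243_851, 1_546, 325],        # origin: Wales
    [42_831, 1_396, 473_789, 2_633],      # origin: Scotland
    [8_812, 360, 2_602, 127_999],         # origin: Northern Ireland
]


def total_flows(matrix):
    """Sum every cell: all recorded moves with a known UK origin."""
    return sum(sum(row) for row in matrix)


def within_flows(matrix):
    """Sum the diagonal: moves that start and end in the same nation."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def net_migration(matrix):
    """Inflow minus outflow for each nation, ignoring the diagonal."""
    n = len(matrix)
    nets = []
    for i in range(n):
        inflow = sum(matrix[j][i] for j in range(n) if j != i)
        outflow = sum(matrix[i][j] for j in range(n) if j != i)
        nets.append(inflow - outflow)
    return nets


if __name__ == "__main__":
    total = total_flows(MIGRANTS)
    within = within_flows(MIGRANTS)
    print(f"All migrants within the UK: {total:,}")
    print(f"Moves within a single nation: {within:,}")
    print(f"Moves across national boundaries: {total - within:,}")
    for nation, net in zip(NATIONS, net_migration(MIGRANTS)):
        print(f"Net migration balance, {nation}: {net:+,}")
```

Because every move out of one nation is a move into another, the net balances sum to zero across the four nations, which is a quick consistency check on any closed origin-destination matrix of this kind.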
In comparison, over 28.3 million people were recorded by the 2001 Census as commuting to work, around two fifths of the total population, although 2.1 million flows involved people whose home and workplace locations were the same. The commuting flows originating in Scotland shown in the table also include the movements of students aged 16 and over to their places of study whereas the flows for the other nations are all associated with journeys to work. Recent studies have shown how commuting patterns vary for people in different occupations and people who use different means to travel to work (Littlefield and Nash, 2008) and how distances travelled vary spatially for those in part-time and full-time work (Dent and Bond, 2008).

The aggregate figures for migrants and commuters in the UK indicate that we are dealing with two forms of human behaviour that are very substantial in numerical terms as well as being hugely significant activities from an individual point of view. Studies of both aggregate flows and individual experiences have been carried out by researchers from a wide range of academic disciplines. At one end of the spectrum are the transport planners and regional scientists who seek to model interaction behaviour, usually in aggregate form, often using sophisticated quantitative techniques once empirical analyses of the data have been undertaken. This type of research may result in very practical applications: a new road is built to reduce commuting congestion or new housing is created in response to the pressure of demand from potential in-migrants. At the other end, sociologists and social psychologists carry out interviews and apply their more qualitative skills to understand the decision-making processes involved, usually at an individual or micro level, and to develop their causal models.
Studies like these may provide much better understanding of phenomena like ‘road rage’ or lead to new methods of reducing stress or making better decisions about optimum locations for living and working. In between these perspectives there is, of course, a whole range of disciplinary approaches, including important contributions from theoretical and applied human geographers, both quantitative and qualitative. This book reflects the interests of quantitative human geographers with most of the authors based in departments of Geography in the UK. Hence, the book primarily focuses on the technologies that geographers use to capture, extract, manipulate, analyse, model and display migration and commuting data.

It is perhaps worth reflecting for a moment on the historical importance of migration and commuting since both phenomena have served to transform societies in major ways and to influence how settlements and landscapes have developed over time. We might consider, for example, the mass migration of individuals away from the countryside and towards the centres of industrial and demographic growth in nineteenth-century Britain, drawn by the opportunities of jobs in the new mills and factories built as a consequence of the Industrial Revolution. In twentieth-century Britain, particularly in the post-war years when car ownership became a reality for many families, suburbanisation became a characteristic feature of most large towns and cities in common with the urban sprawl evident in North America. This shorter-distance, intra-urban residential mobility was directly responsible for the growth in commuting that occurred in the 1950s and 1960s and illustrates the interdependence between the two phenomena, migration and travel to work. In the 1970s and thereafter, attention has turned from suburbanisation to new processes of population redistribution.
One of these processes has been counterurbanisation, the movement of individuals and families away from major metropolitan centres and down the urban hierarchy to increasingly rural areas as documented by Fielding (1982) and Champion (1989), amongst others. Whilst the process of suburbanisation involved breadwinners whose place of work before and after the change of house remained the same, counterurbanisation also included those migrating away from the cities, severing their commuting-to-work ties and adopting a less urban style of living. In some cases, these individuals were those deciding to work locally or from home; in other cases, they involved those seeking unconventional lifestyles; the pioneers of the counterurbanisation movements were actually those reaching retirement age who, when becoming economically inactive, no longer needed to maintain their ties to a place of work. The major losses from the big conurbations have continued in the twenty-first
century, causing policy makers to fear implications of urban neighbourhoods being abandoned whilst rural areas come under increasing house-building pressures (Bate et al., 2000). Recent decades have seen the emergence of new streams of migration and commuting. Whilst the decline in the cost of travel and the preference for a higher quality of living environment has resulted in the rise of the long-distance commuter (Green and Owen, 1999; Nielson and Hovgesen, 2008), processes of gentrification have been taking place in inner city neighbourhoods that have attracted new residents as well as retaining those who might previously have considered moving out. Flats now form a greater proportion of new housing developments than ever before and many cities have experienced new trends in reurbanisation or ‘city living’ (Unsworth, 2007), encouraged by the policies of central and local government to promote urban renaissance and by the increasing numbers of students attending higher education institutions and requiring accommodation relatively close to their place of study. These trends in internal population redistribution have been occurring at a time when London and many large provincial cities in the UK, whilst losing migrants in considerable quantities to the rest of the UK, have experienced large influxes of immigrants from overseas. Asylum seekers and refugees have become a major stream of newcomers in the last decade, locating initially in London and the South East but moving to provincial locations under the dispersal arrangements put in place by the Home Office and living in National Asylum Support Service (NASS) accommodation. Most recently, attention has been drawn to the increasing number of economic migrants, new migrant workers who are living and working, often temporarily, in different parts of Britain, some of whom have entered the country illegally but who provide a source of cheap labour. 
International net immigration over the past decade has been at an unprecedented scale, reaching a peak of 244,000 in 2004, and this has led to proposals from an all-party group of politicians for a ‘balanced migration’ target policy (Migration Watch, 2008).

All these migration and commuting trends, and the patterns of flows that occur as a consequence, raise many questions about the explanatory factors that lie behind the behaviour we observe and about how things will change in the future, leading to a wide range of theoretical, empirical and model-based research looking at historical change but also offering prediction and speculation about what might happen in the future. Some scenarios, particularly associated with immigration, make regular headline news and define new socio-political agendas. Key questions resurface time and time again: Are people currently more or less likely to migrate than in the past? How many new immigrants can we expect from overseas? Do people nowadays commute over longer distances than in the past? What variations in migration propensity exist between different sub-groups of the population? Do more people travel to work by car or by public transport? What are the motivations behind the changes that have occurred in the patterns of migration and commuting? What will be the impact of particular changes in migration or commuting propensities or patterns?
SECTION 1: THE DATA IMPERATIVE
In order to investigate the patterns, processes and drivers of mobility, there is a requirement for comprehensive and reliable data. Where have these data come from historically and why is the Census of Population revered as such a key source of information in the UK? What are the alternative sources of data on migration and commuting? The importance of understanding and using the data that are available to analyse these forms of human interaction provides the rationale for the first half of this book, which contains a series of seven chapters that cover data availability from different sources, online access to census data, confidentiality issues and disclosure constraints, generic analysis and mapping tools, and spatial and temporal consistency, together with more detailed accounts of reconciling data on international migration from different sources and of using census micro data for investigation of migration and health and deprivation. The contents and issues presented in the chapters that constitute Part 1 are particularly pertinent at the time of writing since much public and governmental concern has been expressed about the limitations of current sources of data on both UK internal and, in particular, international migration. In the Foreword of a report by an Interdepartmental Task Force on Migration in 2006, Karen Dunnell, the National Statistician, states that “there is broad recognition that available estimates of migrant numbers are inadequate to meet all the purposes for which they are now required. They are the weakest component in population estimates and projections in the United Kingdom, both nationally and at local level” (National Statistics, 2006, p. 3). The report contained 15 recommendations which have been the focus of an initiative, led by the Office for National Statistics (ONS), for ‘Improving Migration and Population Statistics’ (IMPS), resulting in various changes to the collection of international migration data and the creation of sub-national immigration and emigration estimates that are critically important in the production of annual mid-year population estimates and biennial population projections. In 2008, a Parliamentary committee reviewed issues raised by local authorities and others about the inadequacy of official population statistics and its report (House of Commons Treasury Committee, 2008) resulted in a cross-government programme – the ‘Migration Statistics Improvement Programme’ (MSIP) – that is responsible for delivering the Task Force recommendations between 2008 and 2012. 
A Report by the UK Statistics Authority (UKSA, 2009) reviews progress on MSIP and the adequacy of co-operation across government to deliver the planned improvements, whilst commissioned research by Rees et al. (2009) published in the same UKSA report, provides a concise summary of migration statistics, a critique of MIPS and a review of migration estimation methods. Concepts of space and time are critical in the understanding of migration and commuting since both involve the movement between two places and are measured over specified periods of time. Chapter 1 introduces the reader to some of the basic definitional and conceptual issues that underpin interaction data before providing a summary of data from the most important census, administrative and survey sources. The fundamental importance of population censuses in the UK is emphasized and 2001 Census variables are contrasted with those produced from the 1991 Census. In fact, census questions about migration have evolved from the earliest focus on place (county/parish) of birth (from 1851 Census) and country of birth (from 1841 Census) to place of residence five years previously (1971 Census) and one year previously (1961 to 2001 Censuses). Commuting flows are constructed by comparison of a place of work and a residential address and census questions focus on the mode of transport used. Census questions about place of work (1921, 1951-2001 Censuses) and journey to work (1966, 1971-2001 Censuses) are now commonplace but a question about ‘occupation one year ago’ was also asked in 1971. Easy access to interaction data is critical because of the complexities of large origin-destination data matrices and Chapter 2 provides a technical guide to the Web-based Interface to Census Interaction Data (WICID), an online software system that has been developed by the Centre for Interaction Data Estimation and Research (CIDER) to allow users to build queries and extract data from the last three censuses quickly and easily. 
Equally important is knowledge of the ways in which data from these censuses have been adjusted and Chapter 3 presents the different methods used to reduce the risk of disclosure and maintain confidentiality. In particular, a new small cell adjustment method (SCAM) was introduced by the Office for National Statistics (ONS) when pre-processing the 2001 data. Unlike the suppression applied to the 1991 Census tables, which could be partly reversed by methods that ‘unsuppress’ the data, SCAM does not permit any of the adjusted data to be ‘recovered’. The impacts of adjustment on the flow counts at different spatial scales are evaluated in this chapter.
Once data sources have been identified, data have been extracted and downloaded and the characteristics of the data have been understood, the next challenge is to consider how the data are going to be analysed. Chapter 4 has been written with this in mind, introducing the reader to the ‘interaction matrix’ and the notation frequently used in the literature for origin-destination variables before explaining a number of different measures of interaction, reviewing popular statistical and modelling methods and offering a new way of visualising interaction flows through vector analysis. Chapter 5 confronts another major issue for those researchers interested in how migration or commuting changes over time: spatial and temporal consistency. The fact is that inconsistencies occur in a variety of ways and for different reasons. There are inconsistencies in the definition and measurement of variables from one census to the next; there are differences in the way that counts are categorised within particular themes and new standard classifications come into operation; there are variations in the counts that are released by the census authorities; and there are continuous changes between censuses in the geographical boundaries of the spatial units that comprise census areas. The issues of consistency are considered and a time series analysis of migration in Britain is presented based on a consistent annual time series of data obtained from National Health Service (NHS) patient registers from 1998 to 2006. One of the most newsworthy phenomena in the twenty-first century has been the volume of immigrants of one form or another arriving in Britain in recent years, resulting in considerable press coverage and central government attention in terms of policy response as well as improving data integrity. 
As mentioned previously, there has been recognition that existing census, administrative and survey mechanisms for measuring both immigration and emigration flows have been inadequate and that better international migration statistics are required. Chapter 6 demonstrates one approach to this problem, explaining how data from the various sources of international migration might be integrated in a ‘New Migrant Databank’ so that authorities at national, regional and local level can monitor the different measures and attempt to explain the variations that appear in trends evident in data from different sources. Finally, in the last of the chapters in this first part of the book, Chapter 7 looks in more detail at two different sets of microdata that are produced from the census. The Samples of Anonymised Records (SARs) give researchers a valuable way of looking at the migration and commuting characteristics of individuals that are unavailable from other census products, whereas the Longitudinal Study (LS) is a sample of individuals that have been tracked from the 1971 Census through to the 2001 Census and therefore allow investigation of inter-censal change.
SECTION 2: CASE STUDIES
Several of the chapters in Section 1 contain analyses of migration and commuting but it is the nine chapters in the second half of the book which provide a series of case studies of migration and commuting in different contexts using different methods. The applications commence with what are essentially descriptive analyses of propensities and patterns before progressing to studies that use increasingly sophisticated methods and modelling techniques to understand mobility in the UK. The first three chapters in Section 2 are dedicated to analyses of migration data from the 2001 Census. Chapter 8 considers one of the most important selective influences of migration, age, by investigating the internal migration patterns of those in different age groups and making use of a national area classification as a framework for summarising flows at the district scale. Migration age schedules are compared across types of district for measures of migration turnover and churn. Whilst data from the 2001 Census Special Migration Statistics (SMS) are used for this investigation, Chapter 9 is based primarily on tables specially commissioned from the ONS that cross-classify age and ethnicity at ward level and
allow the analyses of variation in migration propensities for Britain’s ethnic groups by broad age group. The analysis reported in this chapter focuses on wards in London, revealing distinctive spatial patterns of net migration when separating flows between wards within London from flows between London wards and the rest of England and Wales and confirming that migrants of all ethnic groups tend to move away from areas of higher deprivation. The third chapter in this mini-series, Chapter 10, also uses the SMS, but in this instance, the theme is migration and socio-economic polarisation and the data set used involves migrants with certain socio-economic characteristics. In fact, the counts for this variable are only available for ‘moving group reference persons’, a new category of migrants introduced in the 2001 Census for the first time. Moreover, the analysis is conducted for a set of 27 city regions, with London, Birmingham and Bristol being selected as three case studies for particular attention. In contrast to these chapters on migration, the next two chapters are both concerned with commuting. Chapter 11 aims to provide an assessment of the suitability of 2001 Census data for analyses of commuting in rural England. Several of the problems with the census data are highlighted, not least that of the adjustment of small flows in order to reduce the risk of disclosure. This problem is avoided in the work reported in Chapter 12 which involved the definition of a new set of 2001 Travel-to-Work areas (TTWAs) for ONS based on flows unadjusted by SCAM between small areas known as lower level super output areas. TTWAs have been defined using commuting data from each of the last four censuses and the 2001 TTWAs have been generated from a much larger set of flows than previously. The chapter reports the method of definition and shows how these functional labour markets can be visualised at different stages of the definition process. 
The four remaining chapters of the book each involve the application of a modelling approach to the data sets concerned, with a different purpose in mind in each case. In Chapter 13, the aim is to generate sets of migration and commuting flows that can be compared between censuses. This addresses one of the major issues discussed in Section 1, that of spatio-temporal inconsistency, and the modelling approach that has been developed makes use of Poisson regression to estimate 1981 and 1991 inter-ward flows that are consistent with flows between 2001 Census wards. Poisson regression is also the statistical modelling style used in Chapter 14 to estimate the independent variables that might be used to predict inter-district migration in Britain. The chapter explains the methodology in detail and reports the results of calibrating a series of models: a Poisson model that includes traditional gravity variables; the same model with an additional contiguity dummy; a negative binomial form of the model to allow for over-dispersion; and a zero-inflated Poisson model. Chapter 15, on the other hand, explains how a multiplicative component framework can be used to explore the age and ethnic structures of inter-regional migration in England using data from the 1991 and 2001 Censuses. Inter-regional matrices are disaggregated into a series of components that enable insights into the stability in the overall level of migration, the out-migration push component, the in-migration pull component, and the origin-destination component representing the distance between places. A multiplicative component model expressed as a log-linear model enables the effects of different components to be identified and described. 
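The multiplicative component idea can be illustrated with a minimal sketch. It assumes one common formulation, which may differ in detail from that used in Chapter 15: each flow is written as the product of an overall level, an origin share, a destination share and an origin-destination interaction term, where the interaction term is the ratio of the observed flow to the flow expected if origins and destinations were independent. Taking logarithms of the product gives the additive (log-linear) form. The function name and the three-region flow matrix are hypothetical and serve only as illustration.

```python
def multiplicative_components(flows):
    """Split an origin-destination matrix into four multiplicative components.

    Returns (overall, origin, destination, interaction) such that
    flows[i][j] == overall * origin[i] * destination[j] * interaction[i][j].
    Assumes every row total and every column total is greater than zero.
    """
    overall = sum(sum(row) for row in flows)                    # overall level of migration
    origin = [sum(row) / overall for row in flows]              # share of flows leaving each origin
    destination = [sum(col) / overall for col in zip(*flows)]   # share arriving at each destination
    interaction = [
        [flows[i][j] / (overall * origin[i] * destination[j])   # observed / expected under independence
         for j in range(len(destination))]
        for i in range(len(origin))
    ]
    return overall, origin, destination, interaction


# Hypothetical flows between three regions (rows are origins, columns are destinations).
flows = [
    [0, 120, 30],
    [80, 0, 50],
    [40, 60, 0],
]
overall, origin, destination, interaction = multiplicative_components(flows)

# Multiplying the components back together recovers the original matrix.
rebuilt = [
    [overall * origin[i] * destination[j] * interaction[i][j] for j in range(3)]
    for i in range(3)
]
```

An interaction value above one indicates a flow larger than the origin and destination shares alone would suggest, and a value below one indicates a smaller flow.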
Finally, in the last chapter of the book, Chapter 16, a new spatial interaction modelling framework is proposed which separates out the constraint equations in the spatial interaction model proposed by Wilson (1971) from the actual model equation and which uses a genetic algorithm to define the optimum form of model to be used. The modelling framework is demonstrated using flow data on the journey to school by pupils in Leeds. There is no doubt that the developed world is witnessing fundamental shifts in social behaviour which include changes in the two forms of mobility in which we are interested: moving house and travelling to work. Furthermore, we must acknowledge that research is also changing as we attempt to analyse and understand the huge quantities of data that are now being collected and made available on increasingly powerful computational facilities. This book is therefore aimed at students, researchers and practitioners interested in contemporary migration and commuting patterns and the data sets that underpin our knowledge of this human behaviour. In particular, the chapters review, describe and analyse information known collectively as the interaction or origin-destination data sets and we have included three classroom exercises in the appendix to guide students through the extraction, processing and mapping of flow data. We recognise that there is huge popular interest in issues related to both migration – housing market pressures, gentrification, second and holiday home ownership, counterurbanization, and immigration, for example – and to commuting – urban traffic congestion, new road construction, environmental sustainability and public transport, politics of the ‘school run’ and definition of travel-to-work areas, for example. Despite the importance of understanding these phenomena, the data sets that exist to inform us about them remain under-used and under-exploited by researchers and policy makers, partially because of their huge size and complexity. This book attempts to demystify the data sets and to demonstrate how they can be used with modern computational processing and analysis tools to provide insights into the changing behaviour of individuals, households and moving groups in the twenty-first century.
John Stillwell, University of Leeds, UK
Oliver Duke-Williams, University of Leeds, UK
Adam Dennett, University of Leeds, UK
REFERENCES
Bate, R., Best, R. and Holmans, A. (2000) On the Move: The Housing Consequences of Migration, Report for the Joseph Rowntree Foundation, York.
Champion, A. (1989) Counterurbanization: The Changing Pace and Nature of Population Deconcentration, Arnold, London.
Dent, A. and Bond, S. (2008) An investigation into the location and commuting patterns of part-time and full-time workers in the United Kingdom, using information from the 2001 Census, Office for National Statistics Paper. Available at http://neighbourhood.statistics.gov.uk/HTMLDocs/images/Commuting%20patterns%20pt%20article%20final_tcm97-70154.pdf. Accessed 20 October 2008.
Fielding, A. (1982) Counterurbanisation in Western Europe, Progress in Planning, 17(1): 1-52.
Green, A.E., Hogarth, T. and Shackleton, R.E. (1999) Longer distance commuting as a substitute for migration in Britain: a review of trends, issues and implications, International Journal of Population Geography, 5: 49-67.
Littlefield, M. and Nash, A. (2008) Commuting patterns as at the 2001 Census, and their relationship with modes of transport and types of occupation, Office for National Statistics Paper. Available at http://www.neighbourhood.statistics.gov.uk/HTMLDocs/images/Commuting%20by%20Occupation%20and%20Transport%20-%20Final%20for%20pdf_tcm97-70153.pdf. Accessed 20 October 2008.
Migration Watch (2008) Balanced Migration: A New Approach to Controlling Immigration, Report for a cross-party group of parliamentarians, Migration Watch, Deddington. Available at http://www.migrationwatchuk.com/balancedmigration.pdf.
National Statistics (2006) Report of the Inter-departmental Task Force on Migration Statistics, National Statistics, London. Available at http://www.statistics.gov.uk/statbase/Product.asp?vlnk=14731.
Nielson, T.A. and Hovgesen, H.H. (2008) Exploratory mapping of commuting flows in England and Wales, Journal of Transport Geography, 16(2): 90-99.
Rees, P., Stillwell, J., Boden, P. and Dennett, A. (2009) A Review of Migration Statistics Literature, incorporated in UK Statistics Authority, Migration Statistics: The Way Ahead, Report 4 (Final), UKSA, London. Available at http://www.statisticsauthority.gov.uk/reports---correspondence/reports/index.html.
UK Statistics Authority (2009) Migration Statistics: The Way Ahead, Report 4 (Final), UKSA, London. Available at http://www.statisticsauthority.gov.uk/reports---correspondence/reports/index.html.
Unsworth, R. (2007) ‘City living’ and sustainable development: the experience of a UK regional city, Town Planning Review, 78(6): 725-747.
Wilson, A.G. (1971) A family of spatial interaction models and associated developments, Environment and Planning A, 3: 1-32.
Acknowledgment
The Centre for Interaction Data Estimation and Research (CIDER) (formerly the Census Interaction Data Service) is based in the School of Geography at the University of Leeds and is currently funded until 2011 by the Economic and Social Research Council (ESRC) through the Census Programme (RES-348-25-0005). CIDER provides access for members of the academic community to interaction data sets through a web-based interface known as the Web-based Interface to Census Interaction Data (WICID) available at http://cider.census.ac.uk/ which is hosted on a server maintained by the MIMAS national data centre based at the University of Manchester. Estimations of consistent flows for consecutive censuses have been produced by Zhiqiang Feng at the School of Geography and Geosciences at the University of St Andrews. All census output is Crown copyright and is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland. Several chapters of the book contain data requested from UK census agencies or other organisations. The authors concerned are grateful for the supply of these data sets and, in some cases, for the help of particular individuals in preparing, analysing and/or visualising the data. Where appropriate, each chapter has a section at the end that acknowledges the assistance of individuals and the financial support provided by funding organisations. This volume has its origins in a session convened by CIDER at the ESRC Research Methods Festival in July 2008 held at St Catherine’s College, Oxford. The editors are very grateful to those who participated in that event and who have contributed to this volume. Special thanks, however, are reserved for Alison Manson and James Heggie from the Graphics Unit in the School of Geography at the University of Leeds whose major task it has been to redraw all the figures contained in the book in a consistent style and format. They are to be congratulated for having done a really excellent job.
Section 1
Spatial Interaction Data Sources and Analysis Issues
Chapter 1
Interaction Data:
Definitions, Concepts and Sources
John Stillwell, University of Leeds, UK
Adam Dennett, University of Leeds, UK
Oliver Duke-Williams, University of Leeds, UK
ABSTRACT
This initial chapter has two aims. Firstly, it seeks to clarify definitional and conceptual issues relating to the key interaction phenomena, migration and commuting, on which the authors concentrate in this book and for which they strive to obtain information to enhance their understanding of the processes that are taking place in the real world. The chapter explains the conceptual distinction between migrants and migrations, the importance of which becomes clear when the difference between transition and movement data is outlined, and it considers the alternative units of migrant measurement that are used such as individuals, wholly moving households and moving groups. Whilst migration tends to be measured over a period of time, typically a year, commuting is an activity that occurs on a much more frequent basis and consequently is usually measured as the numbers making a journey on one day. The chapter indicates how commuting to work and commuting to study are defined and measured. Secondly, the chapter contains the summary of an audit of interaction data sources, outlining the characteristics of the different types of data that are available from censuses, registers and surveys. Particular emphasis is placed on the former, the Census of Population, for which there are a number of data products providing migration and commuting counts at different spatial scales and disaggregated by various attributes; micro data are distinguished from macro data. However, the chapter also introduces a range of other interaction data sources such as the registers of National Health Service patients, the Pupil Level Annual School Census, the databases of the Higher Education Statistics Agency, various national level surveys such as the Labour Force Survey and the International Passenger Survey. In some cases, the data are exemplified using tables or maps. 
The chapter concludes with a reflection on the importance of the census as a key data source for small area analysis and a plea that, in a post-census world, sufficient steps be taken by central government to ensure the creation and provision of information systems for monitoring migration and commuting in an effective way, providing accurate and reliable intelligence on trends and creating opportunities for new research projects that develop explanations.
DOI: 10.4018/978-1-61520-755-8.ch001
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
Interaction data refer to counts of flows between geographical origins and destinations. They may be measured by different units and they can be extracted from a range of sources. Whilst flows of commodities, finance, vehicles, telephone calls and email are all typical examples of what constitutes a wide spectrum of flow data, our focus here is on the people whose movement is part of two common geographical phenomena, migration or commuting, whose volumes and patterns are of considerable importance to researchers of human behaviour as well as to planners and policy makers tasked to ensure adequate housing provision and reduce traffic congestion. Different types of interaction data are available from population censuses, administrative registers and social surveys, and we shall explore examples of each of these sources in detail in later sections of the chapter. Some data are cross-sectional whilst other sources provide continuous time series flow statistics useful for sub-national population estimation or projection. Some interaction data are currently available online from internet web sites whilst others are much less accessible, are limited because of disclosure control, or need skilled and experienced staff and extensive effort to ensure accurate extraction or estimation. In Britain, in the absence of an official population registration system, censuses are accepted as providing the most comprehensive and most reliable migration and commuting data, particularly for flows within and between small areas. Several of the potential non-census interaction data sets originate from administrative sources and involve the collection of records arising from some transaction, or registration, or as a record of service delivery. They are collected for administrative rather than research purposes and many are based in government departments. A selected audit of administrative data sets (Jones and Elias, 2006) shows that most are used to provide stock information, but some include variables that provide information about flows of migrants or commuters such as NHS patients, school pupils, university students, workers and those attending hospital. In some cases, registration data have a much simpler structure than census data and are only available at a relatively aggregate spatial scale but are particularly valuable because they are produced on a regular temporal basis. In other cases, the information on migration or commuting has to be generated from the primary unit data using time-consuming data matching and manipulation algorithms. Surveys are the other main source of interaction data and, in many cases, surveys such as the Labour Force Survey (LFS) or the International Passenger Survey (IPS) provide reasonably detailed data in response to migration questions but are of limited value because their sample sizes allow only restricted spatial coverage. The IPS data on immigrants and emigrants are only published at regional scale and even then, users are advised to smooth out irregularities in the data by calculating three-year averages. In most cases, survey data are particularly valuable because of the cross-classification possibilities that are available with primary unit data, even though the geographical dimension may be limited. This chapter provides a review of the sources of interaction data that exist in the UK together with information about their characteristics, estimation methods, attributes, limitations and availability. Separate sections deal respectively with different census, administrative and survey data sources and we conclude with some reflections on the
value of the census as a key data source for small area analysis and a plea to central government to ensure the creation and provision of information systems for monitoring these critical processes of migration and commuting in an effective way in the post-2011 era. However, we begin in the next section with an introduction to certain definitions and concepts.
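The advice mentioned above, that irregularities in regional IPS estimates be smoothed by calculating three-year averages, can be sketched as follows. This is an illustration only: it assumes a centred average (the year itself plus the years either side), which is one possible reading of the advice, and the function name and the figures are hypothetical rather than published IPS estimates.

```python
def three_year_averages(estimates):
    """Return centred three-year means for a {year: estimate} mapping.

    A year is included only when estimates exist for the year before,
    the year itself and the year after (an assumption of this sketch).
    """
    smoothed = {}
    for year in sorted(estimates):
        if year - 1 in estimates and year + 1 in estimates:
            window = (estimates[year - 1], estimates[year], estimates[year + 1])
            smoothed[year] = sum(window) / 3
    return smoothed


# Hypothetical annual immigration estimates (thousands) for one region.
regional_estimates = {2001: 10, 2002: 16, 2003: 13, 2004: 19}
smoothed = three_year_averages(regional_estimates)
```

The first and last years drop out of the smoothed series because they lack a neighbour on one side, which is the price paid for damping sampling variability.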
DEFINITIONS AND CONCEPTS
Those who undertake secondary research on spatial interaction phenomena depend on a relatively limited number of data sources from which information can be gleaned on flows that are defined and measured in particular ways. Whilst migration has been defined as a permanent change of usual residence (Rees, 1977), this definition is restrictive because not all migrations are permanent; they may be temporary or cyclical. Higher education students move between parental home and college accommodation on a temporary basis whereas casual labourers frequently return to the same workplaces on a seasonal basis. The difficulty is in establishing the time period spent by an individual at one location for that usual residence to
become permanent. Moreover, those with second or more homes may have temporary residence in different places of usual residence. Similarly, there are problems in defining what constitutes the commute to work, particularly when individuals may have multiple work locations or when travel to work involves a long-distance commute once a week to temporary accommodation and short-distance daily commutes. Familiar definitions of migration and commuting are found in the census, where, according to the 2001 Census Glossary (online), “a migrant is a person with a different address one year before the Census to that on Census day” and “the migrant status of children under age one is determined by that of their next of kin”. Commuting in England, Wales and Northern Ireland involves “the means of travel used for the longest part, by distance, of the usual journey to work”, whereas in Scotland only, commuting involves “travel to main place of work or study (including school)”. Counts of migrants and commuters are derived from the two census questions shown in Figure 1. Compared with the treatment of the migrant as an individual, the migration event is somewhat harder to measure accurately than the vital events of birth and death that occur once only to each
Figure 1. Census questions on place of former usual residence and place of work
person, and in Britain are recorded in official registers, classified by the time and the place at which they occurred. Migration, in contrast, is a process that happens once, more than once, or never for each individual over the course of a lifetime. When it does occur, it is frequently difficult to pinpoint precisely in time; transfer between two residential locations may in fact occur in stages over several days or weeks. Definitions of migration help us to distinguish initially between migrants and non-migrants, and thereafter, to distinguish between different categories of migrant. Recurrent themes in both cases concern the distance over which moves are made, the frequency with which cyclical moves are repeated, and the intended degree of permanence of the move. Some definitions of migration are more inclusive than others over the question of which individuals should be counted as migrants. Lee (1966) for example, includes all changes in usual residence as migration, regardless of the distance of the move, whereas Bogue (1959) seeks to separate longer-distance movers from local movers, the latter not being considered as migrants. The distinction between ‘migrants’ and ‘migrations’, as identified by Courgeau (1973; 1976), is particularly important because it is at the heart of the difference between the two major types of data: transition data and event or movement data. Transition data are recorded by the census, involving a comparison of individuals’ locations at the start and end of a (transitional) time period. They may be termed ‘exist-survive’ data, as it is necessary for an individual to exist at the start of a time period (12 months in the case of the 2001 Census) and to survive until the end for inclusion within the measured results. Whilst the census measures migrants, event data are those that attempt to record all migration events that occur, usually as they happen. 
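The difference between counting migrants (transition data) and counting migrations (movement data) can be made concrete with a short sketch. It is an illustration under stated assumptions, not a description of how any census office or register processes records: residential histories are represented as hypothetical lists of (start date, address) pairs in date order, the function names are invented for this example, and survival to the end of the period is taken for granted. The period used is the 12 months ending on 2001 Census day, 29 April 2001.

```python
from datetime import date


def address_on(history, day):
    """Address occupied on `day`, or None if the person has no residence recorded by then."""
    current = None
    for start, address in history:      # history is assumed to be sorted by start date
        if start <= day:
            current = address
        else:
            break
    return current


def count_migrations(history, period_start, period_end):
    """Movement view: every change of address that occurs within the period."""
    return sum(1 for start, _ in history[1:] if period_start < start <= period_end)


def is_migrant(history, period_start, period_end):
    """Transition view: existed at the start and lives at a different address at the end."""
    origin = address_on(history, period_start)
    destination = address_on(history, period_end)
    return origin is not None and destination is not None and origin != destination


PERIOD = (date(2000, 4, 29), date(2001, 4, 29))

# Hypothetical residential histories.
return_mover = [(date(1995, 1, 1), "Leeds"), (date(2000, 9, 1), "York"), (date(2001, 2, 1), "Leeds")]
repeat_mover = [(date(1990, 1, 1), "Leeds"), (date(2000, 6, 1), "Bradford"), (date(2001, 1, 15), "Hull")]
infant = [(date(2000, 10, 1), "Leeds"), (date(2001, 3, 1), "York")]   # born during the period
```

In this sketch the return mover generates two migrations but is not a migrant, the repeat mover generates two migrations but is one migrant, and the infant generates one migration but fails the 'exist' condition at the start of the period. The census handles the last case differently, assigning children under one the migrant status of their next of kin, which this simplified function does not attempt to reproduce.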
A typical example of event data is the set of re-registrations that are recorded centrally for National Health Service patients whenever they change doctor. These data are held by NHS central registers and provide a useful indicator of time series trends, as demonstrated by Stillwell et al. (1992). In order to clarify the difference between migrants and migrations, we can use a graphical device known as a Lexis diagram (Vandeschrick, 1993; 2001) which illustrates the relationship between three related demographic dimensions: age (group), time (period), and birth cohort, on a two dimensional plane. The horizontal axis of the Lexis diagram in Figure 2 shows time, marked in units of five years from 1986 to 2011 whilst the vertical axis shows age in completed years from 0 to 25. The spaces between tick marks on the horizontal axis can be used to define both time periods and birth cohorts, the latter being those persons born during a given time period. The tick marks on the vertical axis can indicate age groups. A fuller diagram might extend upwards to a final open ended age group (e.g. ‘90+’), and may cover a variable number of time periods on the x axis, depending on the amount of data available and era to which they refer. Figure 2 is composed of a lattice of triangular components, and these elements are the smallest areas that can be defined on the diagram given known values of birth cohort, time period and age group. Thus, in Figure 2, the left-most birth cohort on the x axis includes all persons born between 1 January 1986 and 31 December 1990 inclusive and is labelled 1986-91, whilst the second birth cohort starts on 1 January 1991 and is labelled 1991-96. Similarly, the lowest age group runs from birth to the day before the fifth birthday and is labelled 0-4, and the second age group runs from the day of the fifth birthday to the day before the tenth birthday, and is labelled 5-9. The diagonal lines on the diagram track birth cohorts over time whereas the horizontal lines indicate the numbers in each age group, period by period. 
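The three Lexis dimensions described above can be computed for any individual event. The following sketch assumes five-year bands that start on 1 January, with the first period and cohort beginning in 1986, and uses the same labelling convention as the text (for example 1986-91 for births from 1 January 1986 to 31 December 1990). The function names and the two example individuals are hypothetical.

```python
from datetime import date


def completed_years(birth, on):
    """Age in completed years on a given day."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1                       # birthday not yet reached in that year
    return years


def year_band(year, first_year=1986, width=5):
    """Label of the five-year band containing `year`, e.g. 1999 -> '1996-01'."""
    lower = first_year + ((year - first_year) // width) * width
    return f"{lower}-{(lower + width) % 100:02d}"


def lexis_cell(birth, event, first_year=1986, width=5):
    """Age group, time period and birth cohort of an event on the Lexis diagram."""
    age_lower = (completed_years(birth, event) // width) * width
    return {
        "age_group": f"{age_lower}-{age_lower + width - 1}",
        "period": year_band(event.year, first_year, width),
        "cohort": year_band(birth.year, first_year, width),
    }


# Two hypothetical migration events, both at age 10-14 in the period 1996-01.
younger = lexis_cell(date(1988, 6, 15), date(1999, 3, 1))
older = lexis_cell(date(1983, 2, 10), date(1997, 5, 20))
```

Both events share an age group and a period but belong to different birth cohorts, so they fall in different triangles of the same square on the diagram. Knowing only two of the three dimensions locates an event in a square or parallelogram; all three are needed to identify a single triangle.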
Figure 2 shows the path of the 1986-91 birth cohort (stippled) and the counts of those aged 10-14 in each period (shaded). Consider the triangles marked by the points A to F. The area defined by the points FBDE includes persons from the 1986-91 birth cohort who were all aged 5-9 in 1996 and 10-14 in 2001; these are the exist-survive migrants over the five-year period as recorded by the census. In contrast, the area defined by the points ABDF includes those aged 10-14 during the period 1996-01, including half those from the 1986-91 birth cohort and half those from the previous cohort, although it would not be possible to tell which was which from the data; these correspond to the administrative NHS event data. Finally, the area marked FBCD includes those from the 1986-91 cohort when they were aged 10-14 in either 1996-01 or 2001-06. Data of this type may be generated from a cohort survey that asked members at a particular age whether they had migrated when aged 10-14 but did not ask precisely how old they were when the event took place.
Figure 2. Lexis diagram indicating different migration concepts
So far, we have assumed that the migration process involves an individual migrant. However, the individual is not necessarily the most rational unit of observation for migration, especially when the reasons for moving or staying are being considered. It may be more useful in certain
contexts to consider the family or the household as a more suitable unit since residential location decisions are often based on a variety of factors such as employment, commuting possibilities and educational opportunities that affect different members of a single household. This is the reason why, in the 1991 Census, a distinction was drawn between ‘all migrants’ and those that were resident in ‘wholly moving households’. Out of a total of 4.7 million migrants during 1990-91, 65.8% were resident in wholly moving households, defined in the Census as households in which all members indicate that they had a common different address one year previously. The use of households as the fundamental observational units is not problem-free. Recent British censuses in 1981, 1991 and 2001 have defined a household as being either a person living alone or a group of people (not necessarily related) who live at the same address and share common housekeeping, share at least one meal a day or share some accommodation such as a living room (Denham & Rhind, 1983; OPCS/GROS, 1992; ONS, 2001a). Whilst this definition may cover many cases, it is unlikely that it covers every case of persons who consider themselves to be ‘a household’, although it is perhaps impossible to find a definition that is completely inclusive. Those who migrate as individuals have a non-uniform age distribution, heavily skewed towards young adults. It is ironic that one of the largest groups of persons making migration decisions as individuals, students going to a higher education institution, is one of the most poorly recorded groups. Some authors, including Flowerdew (1997), have suggested the use of ‘moving units’ for the measurement of migration at both individual and non-individual levels. Most households would contain zero, one or two moving units, although any number would be possible.
These moving units group together all persons who have moved from the same origin address, as determined by responses on the census form. Thus, if a particular
household had five resident occupants, of whom two had migrated from a common address A, another two had migrated from common address B and the fifth person had migrated from a third address C, then it would be identified as containing three moving units. These ideas were incorporated into the 2001 Census with the measurement of the moving group, a person or group of people within a household or communal establishment who moved together from the same usual address one year before the census. Stillwell & Duke-Williams (2007) show that in 2000-01 there were approximately 5.8 million persons migrating within the UK in 3.5 million groups, of which 48% were wholly moving households and the remainder were ‘other moving groups’. A very high percentage of those in the latter category (85%) were in fact individual movers and, amongst those moving in households, over half involved 3 or more persons moving together, 28.3% in two-person households and 20% as single people. In addition to the new concept of moving groups, the 2001 Census also required the specification in certain tables (e.g. economic position) of a moving group reference person (MGRP). This is straightforward when there is only one person but may be less so when groups comprise more than one person. Consequently, rules are required to distinguish the MGRP. A detailed study of migration and socioeconomic change in Britain’s larger cities, based on data from 2001 SMS Table MG109 on moving group migration by National Statistics Socioeconomic Classification (NS-SEC), has been produced by Champion et al. (2007). Consideration of the meaning of usual residence, and the phenomenon of multiple usual residences, links migration with commuting behaviour. This is because many persons with more than one usual residence will have a cyclic residential behaviour due to their interaction with the labour market.
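Returning to the moving-unit idea described above, the grouping rule can be sketched in a few lines of Python. This is an illustration only: the record layout (a person identifier paired with the address one year ago, with `None` for non-movers) and the function names are assumptions, not the Census Offices' processing rules.

```python
from collections import defaultdict


def moving_units(residents):
    """Group a household's migrants by their address one year ago.

    `residents` is an iterable of (person_id, origin_address) pairs;
    origin_address is None for anyone who has not moved (assumed encoding).
    """
    groups = defaultdict(list)
    for person, origin in residents:
        if origin is not None:
            groups[origin].append(person)
    return dict(groups)


def is_wholly_moving(residents):
    """True when every member shares one common previous address."""
    origins = {origin for _, origin in residents}
    return len(origins) == 1 and None not in origins


# The five-person household from the text: two from A, two from B, one from C.
household = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "B"), ("p5", "C")]
units = moving_units(household)
```

Here `units` holds three moving units, and `is_wholly_moving(household)` is false because the members do not share a single previous address.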
Relatively little research has been done on multiple home ownership because of the lack of data, but some evidence to suggest
the presence of multiple usual residences may be found in the Special Workplace Statistics from the 1991 Census. These data were generated for a 10% sample of economically active persons, cross-classified by variables including place of enumeration, place of work and mode of transport to work. When analysed at the regional scale, the mode of transport is sometimes inconsistent with the distance between the residence and the workplace; for example, mode of transport may be ‘by foot’, despite the fact that the residence and workplace may be separated by a large distance. There are a number of possible reasons for such inconsistencies: either the responses given on the Census form may have been incorrect or they may have been incorrectly coded during the data processing stage. Alternatively, the data may be correct and the explanation may be because the Census was conducted on a Sunday, with responses being collated for places of residence at the weekend. If the place of residence on weekdays was close to the place of work, then the respondent may correctly indicate ‘by foot’ as their method of transport to work. The figures for travel to work by foot and by bicycle – which may be expected to be short trips – average around 4.5% of the overall total, with many of the larger (proportional) flows being between non-contiguous standard regions (Duke-Williams, 2004). Whilst the commuting data provide some evidence that individuals have more than one usual residence, several research studies have explored the relationships between migration and commuting behaviour. Green (1997a), for example, has focused on the trade-offs between migrating and not migrating, and between migrating and commuting longer distances, and Green et al. (1999) have identified the emergence of ‘dual location households’ as being a response to conflicts between a rise in the number of dual earner or dual career households and a concomitant rise in insecurities related to housing and labour markets. 
McHugh et al. (1995) offer a framework for studying cyclical migration and multiple residences, and
provide a useful review of the literature relating to the limitations of conventional definitions of migration with respect to these phenomena. We now consider the sources from which these data are extracted in more detail.
CENSUS INTERACTION DATA SOURCES
The results of UK population censuses represent a massive collection of demographic and socioeconomic data accessible from published volumes or in digital form (Rees et al., 2002). The various census products from which migration or commuting data can be extracted include: main census tables; Special Migration, Workplace and Travel Statistics; commissioned tables; Samples of Anonymised Records; and the Longitudinal Studies. These sources are discussed in turn with particular reference to data available from the 2001 Census. Further details of the 1991 Census are provided by Dale & Marsh (1993) and Openshaw (1995).
Census Tables
The Census Offices produce a range of data tables that are available online to members of the academic community through the Casweb interface (Harris et al., 2002). The Key Statistics (KS) provide an overview and summary of the main topics of the 2001 Census in a limited number of simple univariate tables for output areas (OAs), the smallest geographical units of the 2001 Census outputs. The Standard Tables (ST) and ST Theme Tables provide the most detailed attribute breakdowns available in a large number of cross-tabulations but only down to ward level in England, Wales and Northern Ireland, and to postcode sector level in Scotland. The Census Area Statistics (CAS), CAS Theme Tables and CAS Univariate Tables are similar to those covered in the ST data sets but are available at OA scale like the KS and are less detailed in order to protect the
confidentiality of personal information. Armed Forces tables provide information on members of the Armed Forces and data are available down to local authority district (LAD) level for England and Wales only. Certain tables from amongst these contain migration data although the spatial definition of the origins of inflows or the destinations of outflows is very broad. KS24: Migration (All people), for example, provides counts of migrants in various categories including those moving into an ‘area’ from elsewhere in the UK (in-migrants) and from outside the UK (immigrants), those moving within the area and those moving out of the area (out-migrants) during the previous 12 months. There is also a category for those recorded with no usual address one year ago, some of whom may be in-migrants or immigrants. Figure 3 illustrates detail from Table KS24 at district level, showing this classification and its extension to include those ‘People in ethnic groups other than White’ who move. Table ST008: Resident type by age and sex and migration, in comparison, contains a spatial breakdown that is similar to that used in KS24 except that ‘areas’ are distinguished from ‘associated areas’ within the UK. Counts of migrant flows from individual areas to the aggregate spatial units are generated from the following categories:
• Lived at same address
• Lived elsewhere one year ago within same area
• No usual address one year ago
• Inflow
  ◦ Lived outside the area but within ‘associated area’ one year ago
  ◦ Lived outside the ‘associated area’ but within the UK one year ago
  ◦ Lived outside UK one year ago
• Outflow
  ◦ Moved out of the area but within the ‘associated area’
  ◦ Moved outside the ‘associated area’ but within the UK.
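The category scheme listed above can be read as a decision rule applied to each resident of an area. The Python below is a hedged sketch of that rule for the resident-based categories only; the record keys (`same_address`, `no_usual_address`, `in_uk`, `area`, `assoc_area`) and the function name are hypothetical, and the outflow categories, which mirror the inflow ones with origin and destination reversed, are omitted.

```python
def st008_category(area, assoc_area, origin):
    """Classify a resident of `area` by where they lived one year ago.

    `origin` is a dict with assumed keys: 'same_address' (bool),
    'no_usual_address' (bool), 'in_uk' (bool), 'area' and 'assoc_area'.
    """
    if origin.get("same_address"):
        return "Lived at same address"
    if origin.get("no_usual_address"):
        return "No usual address one year ago"
    if not origin.get("in_uk", True):
        return "Inflow: lived outside UK"
    if origin["area"] == area:
        return "Lived elsewhere within same area"
    if origin["assoc_area"] == assoc_area:
        return "Inflow: within associated area"
    return "Inflow: elsewhere in UK"


# Hypothetical resident of 'Ward X' (associated area 'District Y') who
# lived in another ward of the same district one year ago.
example = st008_category(
    "Ward X", "District Y",
    {"area": "Ward Z", "assoc_area": "District Y", "in_uk": True},
)
```

Because the origin is only resolved to 'same area', 'associated area' or 'elsewhere', the result is a count by broad origin class rather than a flow between named places.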
Figure 3. Detail from KS table 24 at district level. Source: ONS website at http://www.statistics.gov.uk/statbase
Similar categories to these are used for ST009: Age of household reference person (HRP) and number of dependent children by migration of households, ST010: Household composition by migration of households, and TT033: Migration (People): All people in the area and those who have moved from the area in the past year, within the UK. The term ‘area’ refers to the particular area level being shown in the table; in the case of an ST or CAS table for a ward, ‘area’ translates to the name of the ward. In England and Wales, the ‘associated area’ refers to LAD for tables at ward (electoral division in Wales), parish (community in Wales) or OA level. For all other geographical areas, the ‘associated area’ is England and Wales. These data can be extracted online via Casweb (http://census.ac.uk/casweb/), although KS Table 24 is not contained within Casweb and must be accessed directly from the ONS web site (http://www.statistics.gov.uk). Whilst data derived from the 2001 Key Statistics and Standard Tables have been used in analyses of patterns of internal migration by Champion (2005b) and of international migration by Horsfield (2005), there are no tables that provide interaction data on commuting equivalent to those above relating to migration other than
TT011 which provides flows from each area (OA) to aggregate areas based on the distance traveled to work (< 2km, 2-5km, 5-10km, 10-20km, 20-30km, 30-40km, 40-60km, and 60+km). It is clear that the interaction data in the main census tables are very limited; although flows from origins or to destinations are available at different spatial scales (local authorities, wards or output areas), flows between origins and destinations are not, and for these data it is necessary to use the Origin-Destination Statistics.
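To illustrate the limitation just described, the sketch below contrasts a full origin-destination table with the origin and destination totals that the main tables effectively provide. The three areas and all flow values are invented for illustration.

```python
# Hypothetical origin-destination flows between three areas (persons).
flows = {
    ("A", "B"): 120, ("A", "C"): 30,
    ("B", "A"): 80, ("B", "C"): 45,
    ("C", "A"): 10, ("C", "B"): 60,
}


def margins(od):
    """Return (outflows by origin, inflows by destination) for an OD table."""
    out_totals, in_totals = {}, {}
    for (origin, dest), count in od.items():
        out_totals[origin] = out_totals.get(origin, 0) + count
        in_totals[dest] = in_totals.get(dest, 0) + count
    return out_totals, in_totals


outflows, inflows = margins(flows)
```

Many different flow tables share the same `outflows` and `inflows`, so the individual origin-destination cells cannot be recovered from the totals alone.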
Origin-Destination Statistics
As in 1991, two major migration and commuting interaction data sets are available from the 2001 Census: the Special Migration Statistics (SMS) and the Special Workplace Statistics (SWS). However, in Scotland, the SWS in 2001 were replaced with a new set of Special Travel Statistics (STS) that include journeys to place of study as well as place of work. These data sets are also known collectively as the 2001 Census Origin-Destination Statistics, detailed specification of which is available in ONS/GROS/NISRA (2001). They are currently accessible to members of the academic community and data suppliers registered with the
Census Registration Service via the Web-based Interface to Census Interaction Data (WICID) (see Chapter 4). The 2001 data sets have been reviewed by Rees et al. (2002) and Cole et al. (2002). More recently, Stillwell & Duke-Williams (2007) have explained the structure of the 2001 interaction data sets, documenting the differences in the data sets between 2001 and 1991 and the problems associated with making comparisons between 1991 and 2001, and examining the impact of the small cell adjustment methods (SCAM) used to adjust flows in 2001 to ensure confidentiality and reduce the risk of disclosure. A summary of the tables and counts from the 2001 and 1991 Censuses (Table 1) shows a similar number of tables but considerably more counts in
2001 than in 1991. Data are available in 2001 for three sets of interaction zones: level 1 involves 426 ‘districts’ that include metropolitan districts, unitary authorities and other local authority areas in England and Wales, council areas in Scotland and parliamentary constituencies in Northern Ireland; level 2 includes 10,608 ‘interaction wards’; and level 3 contains 223,060 OAs throughout the UK. The STS for Scotland in 2001 contain counts for children aged under 16 and require additional categories in certain tables. The 1991 SWS data identified in Table 1 are the 10% sample of journey from home to work flows produced only at ward level and referred to as SWS Set C (Cole et al., 2002). The data in each of these tables are available from WICID, together with data sets of
Table 1. Tables, counts and variables in the 2001 and 1991 special interaction data sets

| Data sets | Level 1 (District) | Level 2 (Ward*) | Level 3 (OA) |
| --- | --- | --- | --- |
| 2001 SMS | 10 tables (996 counts). Migrants: age, sex, family status, ethnicity, illness, economic activity. Moving groups: tenure, economic activity, NS-SEC, knowledge of Gaelic/Welsh/Irish | 5 tables (96 counts). Migrants: age, sex, ethnicity. Moving groups: NS-SEC, tenure | 1 table (12 counts). Migrants: age, sex |
| 1991 SMS | SMS Set 2: 11 tables (94 counts). Migrants: age, marital status, ethnicity, illness, economic position. Wholly moving households: residents, tenure, economic position of head, Gaelic/Welsh speakers | SMS Set 1: 2 tables (12 counts). Migrants: age, sex. Wholly moving households: residents | Not available |
| 2001 SWS | 7 tables (936 counts). Employees and self-employed: age, sex, living arrangements, employment status, mode of travel, NS-SEC, industry, ethnicity | 6 tables (354 counts). Employees and self-employed: age, sex, family status, mode of travel, NS-SEC, occupation, employment status | 1 table (36 counts). Employees and self-employed: mode of travel |
| 1991 SWS | Not available | 9 tables (274 counts). Employees and self-employed: economic position, hours worked, family position, distance, mode of travel, cars available, occupation, social class, industry | Not available |
| 2001 STS | 7 tables (1,176 counts). Persons**: age, sex, family status, mode of travel, NS-SEC, industry, ethnicity, employment status | 6 tables (478 counts). Persons**: age, sex, family status, mode of travel, NS-SEC, ethnicity, employment status | 1 table (50 counts). Persons**: mode of travel |

* postal sector in Scotland; ** persons including those who do not work or study
flows adjusted for suppression in 1991, inflated for under-enumeration in 1991, or estimated from 1981 and 1991 data to be consistent with 2001 boundaries. The modelling methodology that underpins the latter estimation is explained in Boyle and Feng (2002) and in Chapter 13 of this book. These data allow detailed analysis of migration and commuting behaviour at each of the three spatial levels or aggregations thereof. Figure 4 exemplifies aggregate flows of in-migrants and in-commuters to Leeds from the 2001 Census, revealing how the catchment areas vary for the two phenomena. Recent examples of detailed spatial empirical analyses of migration using the 2001 SMS include those studies of age-specific flows (Dennett & Stillwell, 2008a; 2008b), of migration and socio-economic change (Champion et al., 2007; Champion and Coombes, 2007) and of ethnic migration (Stillwell & Hussain, 2008). Recent studies of commuting that utilise the 2001 SWS/STS include those by Harland et al. (2006) and Nielson & Hovgesen (2007).
Commissioned Tables
Customised output from the 2001 Census may be commissioned from ONS Customer Services when particular cross-tabulations are not available from the standard tables, but commissioned tables incur charges to recover staff and material costs. Once a table has been delivered and paid for by a customer, it is listed on the ONS website and is available to all users free of charge on request from the Census Customer Services. All commissioned tables of 2001 data are subject to checks to ensure confidentiality. There is a function on the commissioned tables spreadsheet that allows the identification of tables of interest by entering the topics of interest. Each commissioned table is subject to SCAM procedures and consequently inconsistencies will appear when checking totals with data from other census sources. Hussain and Stillwell (2008) have used commissioned data from ONS to analyse district level ethnic migration trends in England and Wales cross-classified by age, whilst the Greater London Authority Data Management and Analysis Group has produced
Figure 4. In-migrants (a) and in-commuters (b) to Leeds district. Source: Census 2001 SMS/SWS
a detailed briefing on ethnic migration in London (Mackintosh, 2005) based on commissioned data.
Samples of Anonymised Records
Samples of Anonymised Records (SARs) were introduced as an innovation in the UK as one of the outputs of the 1991 Census, and offer a considerable degree of flexibility for multivariate analysis of individual records (Dale, 1998). These ‘microdata’ comprise a set of records relating to individuals and (where appropriate) households, with personal data such as names and addresses removed. However, there are spatial variables available, including residential location at the time of the census, location of address one year ago for migrants, and country of birth. In order to generate interaction flows, any two spatial references can be cross-tabulated, with possible disaggregation by any other chosen variable(s). A total of four SAR files were generated from the 1991 Census; two relating to GB, and two relating to Northern Ireland. In both cases, the Individual SAR was a 2% sample of individuals in the Census and the Household SAR a 1% sample of households, with a record for each sampled household followed by a set of records for each individual within the sampled household. Separate geographies were used for the primary reporting areas (i.e. the location at the time of the Census). Details of the type of geography and the numbers of areas for spatial variables are summarized in Dennett et al. (2007). Migrant origin is limited to the standard region, and workplace location is only available as a broadly coded categorical variable, or as a broadly coded ‘distance to workplace’ observation. This limits the potential for use of the SAR as interaction data. However, because of their large sample size and the ability to cross-tabulate variables not available from the main census tables, the 1991 SARs have been used to identify the characteristics of migrants. One example is the study of the migration of the elderly to join existing households by Al-Hamad
et al. (1997). The 1991 SARs have also been used to assess the impacts of tenure on long-distance migration compared with short-distance migration by Boyle (1995), indicating that long-distance migrants are less likely to move into council housing than other tenures. The range of SAR files was expanded to five with the 2001 Census: the Individual SAR (Licensed), the Individual Controlled Access Microdata Sample (Individual CAMS), the Special License Household SAR, the Household Controlled Access Microdata Sample (Household CAMS) and the Small Area Microdata (SAM). The two CAMS files offer more detailed versions of the respective licensed files, and they are made available under more restrictive conditions. There are several potential locational variables which could be used to generate interaction data. For the Individual CAMS, a much more detailed residential geography is available than was the case with the 1991 SARs, based on the LAD but with much lower thresholds used for amalgamation. For other variables, the spatial geography is based on an expanded Government Office Region (GOR) geography that includes Wales, Scotland and Northern Ireland as additional regions, and splits London into Inner and Outer. Whilst the sample size for the Individual SAR has increased from 2% in 1991 to 3% in 2001, the value for use as interaction data is diminished due to the reduction in resolution of the primary geography from 278 regions (in GB) to 13 regions in 2001. The Licensed Household SAR has the same sample size as in 1991, but has little or no potential for use as interaction data, due to the removal of the primary geography in order to reduce the risk of disclosure. The two CAMS files have more potential for use in interaction data analysis. The Individual CAMS has a LAD based geography for both migrant origins and destinations, thus offering similar spatial detail to 2001 SMS at Level 1. 
The data can be disaggregated by any chosen variable, although the sample size, coupled with the generally low incidence of one-year migrants
in all Census data (around 12% of individuals were identified as migrants) will tend to restrict the ability to carry out multivariate analysis. The Household CAMS file has a detailed primary geography, but only a categorical version of migrant origin. Both CAMS files feature very detailed versions of the country of birth variable, allowing spatially detailed analysis of life-time mobility. The 2001 outputs also saw a new flavour of microdata: the Small Area Microdata (SAM), an individual sample (5%) which sacrifices attribute detail in order to permit greater spatial detail. For migration analysis, the SAM has the advantage of a detailed destination geography, although the origins remain as the expanded GOR geography. Thus, for the study of in-migrants, considerable detail can be discerned. However, as with all other 2001 SARs and the 1991 SARs, workplace address is provided solely as a move-type classification, meaning that the data are not suitable for use as origin-destination commuting data. The 1991 SARs data, and the licensed versions of the 2001 SARs data, are of limited use for spatially detailed analysis of interaction data. However, they retain the general advantage of microdata as an opportunity for flexible multivariate analysis, and thus have potential use for the aggregate study of characteristics of those involved in spatial interactions (i.e. migrants and commuters). A recent example of this is the study of the characteristics of ethnic migration by Finney and Simpson (2008). In general, the SARs are more useful for interaction data use with respect to migrants than to commuters, as there is no spatial coding of workplace location.
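As a minimal sketch of how interaction flows can be derived from microdata by cross-tabulating two spatial references, the Python below counts records by origin and destination region, optionally split by a further variable. The record layout and field names are hypothetical and do not reproduce the actual SAR file format.

```python
from collections import Counter

# Hypothetical microdata records: region now, region one year ago, tenure.
records = [
    {"region": "North West", "origin": "London", "tenure": "owner"},
    {"region": "North West", "origin": "London", "tenure": "renter"},
    {"region": "London", "origin": "North West", "tenure": "renter"},
    {"region": "London", "origin": "London", "tenure": "owner"},
]


def crosstab_flows(rows, by=None):
    """Count records by (origin, destination), optionally split by `by`."""
    counts = Counter()
    for row in rows:
        key = (row["origin"], row["region"])
        if by is not None:
            key += (row[by],)
        counts[key] += 1
    return counts


flows = crosstab_flows(records)
flows_by_tenure = crosstab_flows(records, by="tenure")
```

Each extra disaggregating variable spreads the same sample over more cells, which is why the low incidence of one-year migrants restricts multivariate analysis in practice.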
Longitudinal Studies
Longitudinal studies are data sources that contain multiple observations of a population of interest over a period of time. They include both surveys which are repeated at intervals for a known set of respondents, and more general instruments from which a sample is extracted, and externally linked
to records for the same persons from earlier collection rounds. Examples of the former type of longitudinal study include the UK cohort studies, in which a selected sample is surveyed in multiple sweeps over the course of their lives. Examples of the latter type include the census-based Longitudinal Study, which is derived from samples extracted from each decennial census. Data of this kind provide a valuable research resource, including for the analysis of interaction flows. There are three major longitudinal studies in the UK that are based on census data, with linked administrative records from other sources including vital events and registration data. These are: the ONS Longitudinal Study of England and Wales (LS), the Scottish Longitudinal Study (SLS) and the Northern Ireland Longitudinal Study (NILS). These differ in a variety of ways including the length of the time period covered, the sampling fraction used, and the types of other data linked into the study. The LS is the longest established of the three studies, containing linked data from the 1971, 1981, 1991 and 2001 Censuses. The sample is selected on the basis of four (undisclosed) birth dates, giving a sample fraction of around 1%. Persons born on one of these days are extracted from each Census and attempts are made to link them to established records from earlier Censuses or to administrative records. In addition to the core sample members, records are also extracted and added to the LS for other persons in the sample member’s household, although these additional persons are not (unless they also happen to be a sample member) tracked in later censuses, unless they are still living in a sample member’s household. The linked LS has enabled researchers to examine changing patterns of settlement and local geography as well as factors affecting long-term migration.
The link between inter-regional migration and social mobility has been explored by Fielding (1992) to identify the South East region as an ‘escalator’. The relationships between counterurbanisation and social mobility have been investigated by Fielding (1998) and various studies have tracked the spatial distribution of the population in different parts of the country (Williams, 2000; Davies et al., 2006), migration relating to health and deprivation (Norman et al., 2005) and the geographical and social dynamics of ethnic groups (Platt et al., 2005). The SLS is a continuous study, incorporating data from the 1991 and 2001 Censuses. It is a 5.5% sample, based on 20 birth dates providing linked information for approximately 274,000 individuals in Scotland. For each individual, the SLS has all the variables that can be extracted from the complete 1991 and 2001 census forms, including place of usual residence 12 months before the census and details relating to the journey to work or study. The NILS is the most recently started study, and contains data from the 2001 Census only. NILS members are selected on a total of 104 birth dates, giving a sample fraction of around 28%, much larger than in the other two studies. The linked administrative data include birth and death registrations, health service related migration data, and information about members’ households from the Valuation and Lands Agency. A new UK Household Longitudinal Study (UKHLS) is due to start its first wave of data collection in 2008 consisting of a wholly new sample of households, an ethnic minority boost sample, and a sample (up to 100%) drawn from the existing British Household Panel Survey (BHPS). It will yield a sample of at least 40,000 households, making it the largest study of its kind in the world, and will provide interaction flow data. Whereas the longitudinal studies are based around linked census data, which contain, generally speaking, the same questions each time, birth cohort surveys use different questionnaires in each sweep.
There are four significant birth cohort studies in the UK: the MRC National Survey of Health and Development (NSHD) (the British 1946 birth cohort study); the National Child Development Study (NCDS) (the 1958 birth cohort study, originally known as the Perinatal Mortality Survey); the 1970 British Cohort Study
(BCS1970); and the Millennium Cohort Study (cohort born in 2000/2001). These tend to contain core questions that are asked at each sweep, plus additional questions that reflect changing interests and research priorities. Clearly, the questions that are asked of (the parents of) young children in the earliest waves of any birth cohort study will be very different to those asked as the survey members grow to adulthood and subsequently into retirement. Whilst adult members of the earlier cohort studies have been asked numerous questions in each wave about employment and occupation related issues, it would appear from examination of the available data that specific questions about the location of members’ workplaces have not been regularly asked. Thus, the potential for use as journey-to-work interaction data is very limited. In contrast, the very nature of the studies, which track individuals over time, means that a near complete record of residential history is maintained, giving rise to very rich migration based interaction data.
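The sampling fractions quoted earlier for the three census-based longitudinal studies follow from the number of birth dates used for selection. The short Python calculation below is a rough check that assumes births are spread evenly over a 365.25-day year; the function name is illustrative.

```python
def birthdate_fraction(n_dates: int, days_in_year: float = 365.25) -> float:
    """Approximate sampling fraction from selecting on `n_dates` birthdays,
    assuming births are spread evenly over the year."""
    return n_dates / days_in_year


# Number of birth dates used by each study, as stated in the text.
studies = {"LS": 4, "SLS": 20, "NILS": 104}
fractions = {name: birthdate_fraction(n) for name, n in studies.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

Under this assumption the fractions come out at roughly 1.1%, 5.5% and 28.5%, in line with the figures of around 1%, 5.5% and 28% given for the LS, SLS and NILS.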
ADMINISTRATIVE INTERACTION DATA SOURCES
Whilst the Census provides the most comprehensive and reliable data on migration and commuting, particularly for smaller areas, its periodic nature means that researchers must look elsewhere for interaction data relating to inter-censal periods. A number of administrative registers provide useful information on a regular basis on both residential movement and commuting. In this section we introduce a selection of these sources and refer the reader to Dennett et al. (2007) for further detail.
NHSCR Data for England and Wales
One source of migration estimates is the registration system that records National Health Service (NHS) patients who migrate and change their
doctor. The NHS Central Register (NHSCR) at Southport records movements of patients between Health Authority (HA) areas in England and Wales and the Census Office has developed systems for capturing the reporting of re-registrations of patients between areas used to administer the general practitioner services of the NHS (Bulusu, 1991). These areas, initially known as Family Practitioner Committee Areas (FPCAs), became Family Health Service Areas (FHSAs), which were groups of London boroughs, metropolitan districts and shire counties, in 1990 and remained so until late 1996 when HAs were introduced. Since the early 1980s, individual anonymised records from the NHSCR known as primary unit data (PUD) have been created by ONS (formerly OPCS) in quarterly data files. Entries in the NHSCR include the date of birth mentioned above together with the sex, the codes of the FHSA that the patient has been registered with in the past as well as the new FHSA code. The registration data available from the NHSCR are defined as ‘movement’ data and their measurement is conceptually different from that of ‘transition’ data available from the census, as explained earlier in the chapter. When compared for 1981, the NHSCR flows tend to be larger in volume than census counts because they capture multiple and return moves as well as student movements (Boden et al., 1992). Whilst researchers have highlighted many of the conceptual and definitional characteristics and shortcomings of the data (Stillwell et al., 1992; Champion et al., 1998), the NHSCR data have been used in a number of studies of time series trends (e.g. Bulusu, 1989; 1990; Devis, 1984; Rosenbaum & Bailey, 1991; Stillwell et al., 1992; Stillwell, 1994; Stillwell et al., 1996).
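The difference between ‘movement’ and ‘transition’ counts can be made concrete with a small Python sketch. The registration histories and place names below are invented for illustration, and the two functions are assumptions about how one might count each concept, not the NHSCR or census processing rules.

```python
def count_moves(history):
    """Number of changes of area in a chronological list of (date, area)."""
    areas = [area for _, area in history]
    return sum(1 for prev, cur in zip(areas, areas[1:]) if prev != cur)


def is_transition(history):
    """True if the area at the end of the period differs from the start."""
    return history[0][1] != history[-1][1]


# Hypothetical registrations for two patients over one year.
returner = [("2000-04", "Leeds"), ("2000-10", "York"), ("2001-04", "Leeds")]
onward = [("2000-04", "Leeds"), ("2000-09", "York"), ("2001-04", "Hull")]

moves = count_moves(returner) + count_moves(onward)        # event count
migrants = sum(is_transition(h) for h in (returner, onward))  # transition count
```

The two histories generate four recorded moves but only one transition, since the returner ends the year where they began; this is the sense in which register-based flows exceed census counts.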
Details of a time series of NHSCR data from 1975 to 1998 for a consistent set of areas (Duke-Williams & Rees, 1993) – used for a major migration modelling study commissioned by the Office of the Deputy Prime Minister (ODPM, 2002; Champion et al., 2003; Fotheringham et al., 2004) and for examining trends in internal
migration by Kalogirou (2005) – are documented in Dennett et al. (2007). Since 1998, NHSCR data sets have continued to be produced for a national geography of health-related administrative areas. The data are processed and tabulated for quarterly periods and matrices of origin-destination flows between GORs are available from ONS publications. Whilst NHSCR data aggregated into tables of gross inflows and outflows by broad age group (15 and under, 16-59, 16-64, 60+, 65+) are available for HAs from the ONS web site, origin-destination flows between HAs are available only on request.
Patient Register Data for England and Wales
The NHSCR system in England and Wales only records movements between HAs and, in the past, ONS has used information from electoral registers and the most recent census to apportion NHSCR inflows and outflows between constituent local authorities (LADs). The inadequacy of the electoral registers in the estimation of sub-HA flows led ONS to investigate the patient registers held by every HA in England and Wales (Scott & Kilbey, 1999; Chappell et al., 2000). These registers contain the NHS number, gender, date of birth, date of acceptance at the HA and, importantly, the postcode of address, for each patient. With postcode unit information being available, it is possible theoretically to create aggregate migration matrices for any level of geography, a significant advantage over the FHSA/HA boundaries that previous NHSCR migration estimates were restricted to. NHSCR and patient register data differ in their composition in that the NHSCR data are counts of moves from one area to another in a particular period of time whereas patient register data are counts of persons migrating and are, conceptually, equivalent to census transition data. With patient registration data only recording migrations if the address at the beginning of the period is different
from the address at the end of the period (and if two addresses are present on the register), some categories of migrant are unaccounted for who would be identified by the NHSCR such as newborn infants born during the period, international immigrants, persons who have been discharged from the armed forces and other migrants who leave the register before the end through death, emigration or enlistment into the armed forces (Scott & Kilbey, 1999). Acting on advice from extensive consultations about patient register data, ONS have used patient register and NHSCR data in combination to produce migration estimates for England and Wales since 1999. By obtaining a download from each patient register on an annual basis and by combining all the HA patient register extracts together, a total register for the whole of England and Wales has been created, the Patient Register Data System (PRDS). Comparing records in one year with those of the previous year by linking on NHS number enables identification of each person who changes their postcode. The download is taken at 31 July each year to enable migration
estimates to be made for the year ending 30 June. This is consistent with the assumption that people delay registering with a new GP for a month after they move. Tables are created from the PRDS, combined with NHSCR data and used as an indicator of population movement (Migration Statistics Unit, 2007). A range of tables is available from the year ending mid-1998 onwards, covering inflows and outflows rounded to the nearest 100, disaggregated (in some cases) by age and gender for HAs and LADs. Recently, ONS have used the patient register data to produce origin-destination matrices from the year ending mid-1999. These tables include origin-destination matrices of flows between LADs in England and Wales by broad age group. Figure 5 illustrates net migration patterns in 2005-06 and how they differ from those in 2000-01. As yet, despite the potential for estimating migration between areas from the postcode unit level upwards, no attempt has been made by ONS or anyone else to estimate migration for any geography below LAD level.
Figure 5. Net migration balances by local authority in England and Wales, 2005-06 (a) and changes since 2000-01 (b), based on patient registration estimates. Source: NHS patient register data supplied by ONS
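The record-linkage step described for the PRDS, in which one annual download is compared with the previous one on NHS number, can be sketched as follows. This is a minimal illustration under stated assumptions: the identifiers, postcodes, postcode-to-area lookup and function names are all invented here, and the sketch does not reproduce the ONS processing or the subsequent combination with NHSCR data.

```python
from collections import Counter

# Hypothetical annual register extracts: NHS number -> postcode at the download date.
# All identifiers, postcodes and the area lookup are invented for illustration.
register_prev = {
    "001": "LS1 1AA", "002": "LS6 2BB", "003": "M1 3CC",
    "004": "S1 4DD", "006": "S1 4DD",
}
register_curr = {
    "001": "LS1 1AA", "002": "M1 3CC", "003": "LS6 2BB",
    "005": "S1 4DD", "006": "LS1 1AA",
}
postcode_to_area = {
    "LS1 1AA": "Leeds", "LS6 2BB": "Leeds",
    "M1 3CC": "Manchester", "S1 4DD": "Sheffield",
}


def od_matrix(prev, curr, lookup):
    """Count persons whose postcode differs between two extracts, by origin and destination area."""
    flows = Counter()
    # Only persons present in both extracts can be linked; joiners and leavers
    # (e.g. births, deaths, international moves) are not captured by this step.
    for nhs_number in prev.keys() & curr.keys():
        old_pc, new_pc = prev[nhs_number], curr[nhs_number]
        if old_pc != new_pc:
            flows[(lookup[old_pc], lookup[new_pc])] += 1
    return flows


def net_migration(flows):
    """Net balance per area: inflows minus outflows."""
    net = {}
    for (origin, destination), n in flows.items():
        if origin != destination:  # moves within an area leave its balance unchanged
            net[destination] = net.get(destination, 0) + n
            net[origin] = net.get(origin, 0) - n
    return net


flows = od_matrix(register_prev, register_curr, postcode_to_area)
net = net_migration(flows)
```

Because the lookup maps postcodes to areas, the same linked records could in principle be aggregated to any geography for which a lookup exists, which is the advantage of postcode-level data noted above.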
Scotland and Northern Ireland
A similar combination of locally held patient register and NHSCR data is now in use in Scotland to estimate internal migration. In Scotland, the patient register is known as the Community Health Index (CHI) and includes postcode, date of birth, gender, details of the registered GP and the date of joining the GP's list. It was not until 2002 that exactly the same method of constraining patient register estimates to NHSCR estimates was used in Scotland (GRO Scotland, 2006; 2007) and matrices of flows between council areas were created. The patient register in Northern Ireland is known as the Central Health Index (NI-CHI). In the late 1990s, CHI data in Northern Ireland underwent enhancement through more complete postcoding of the data, and through aggregate statistics being made available at local government district (LGD) and parliamentary constituency (PC) levels. A comparison between the migration estimates obtained from the 2001 Census and from NI-CHI data (NISRA, 2005) concluded that estimated migration flows between LGDs were sufficiently similar to justify the use of NI-CHI in estimating inter-censal migration flows within the country. Selected migration statistics are available from the NISRA website by LGD from 2001 to 2005. However, whilst it is possible to identify the net impacts of internal and external migration on LGD populations down to the single person, it is impossible to ascertain where migrants are coming from or going to.
Pupil Level Annual School Census (PLASC)
Whilst the 2001 STS in Scotland provided details of the daily travel to study for students and children, similar data are not produced for England and Wales or Northern Ireland. However, the annual
Pupil Level Annual School Census (PLASC) does collect data from each local education authority (LEA) in England and Wales on the location of pupils and the schools that they attend, providing a potentially useful source of data on the journey to school, if confidentiality issues can be addressed. Various data sets are collected and held by the Department for Education and Skills (DfES) within a centralised ‘data warehouse’. These include the National Pupil Database (NPD), local authority data, school level data, school workforce data and geographical data (Ewens, 2005a; Jones & Elias, 2006). The NPD was established in 2002, is updated annually and contains linked individual pupil records for all children in the state school system. Each pupil is given a unique pupil number (UPN) and has an associated set of attributes: age, gender, ethnicity, special educational needs, free school meal entitlement, key stage assessments, public exam results, home postcode and school attended. The NPD combines information from the PLASC with information on pupil attainment and reference data on schools and LEAs. PLASC is the foundation of the NPD, including variables such as ethnicity, a low-income marker, and information on special educational needs (SEN). The linking of pupils from one year to the next using the UPN means that a longitudinal profile is available for each pupil, the extent of which depends on how long the pupil has been in the education system. Potentially, this means that pupils can be tracked over time and their transitions through the education system can be identified, including their movements between schools and between different home addresses (Harland & Stillwell, 2007a). PLASC data are therefore a potential source of data on commuting to school, on pupil mobility between schools and on child migration from one usual residence to another. At the moment, there is no indication that any system is in place to process the data to generate this type of information.
However, recent research based on PLASC includes studies of the mobility of English school children (Machin et al., 2006) and of moving home and changing school in Greater London (Ewens, 2005b). Work based on PLASC data for Leeds (Harland & Stillwell, 2007b), supplied by the LEA (Education Leeds), allows residential migration and movement between schools to be quantified, as well as the movements from home to school that indicate how school territories vary in size and shape. Figure 6 shows two examples of the latter.
Figure 6. School territories based on data derived from the PLASC for Leeds. Source: PLASC data supplied by Education Leeds
Hospital Episode Statistics Data
Another potential source of data on commuting is the Hospital Episode Statistics (HES) held by the NHS and providing data on the journey to hospital. These data include details of all patient admissions to NHS hospitals in England from 1989-90 (Liffen et al., 1988). Data for NHS hospitals in Northern Ireland, Scotland and Wales are collected separately by respective national offices. Each record holds around 100 personal, medical and administrative details of each patient admitted to hospital in England, including geographical information about the location of treatment and where the patient lived. Around
12 million new records are added to the dataset each year, with most of the variables collected at point of contact from the Patient Administration System (PAS). Requests for this data in the form of database extracts or custom tabulations are currently made to the NHS Information Centre through their external data custodians, Northgate Information Solutions. If permission is granted by the relevant advisory groups, it should be possible to obtain data for patients relating to their residential location (which could be as detailed as postcode unit or OA) and their location of treatment (which in theory could be as detailed as hospital postcode). These interaction data can be further disaggregated by variables including gender, age, ethnicity, admission/discharge date, length of treatment spell and illness/diagnoses/ operation type. Little, if any, research appears to have been done on the ‘commute to hospital’. This data set provides the potential to investigate hospital catchment areas for different types of operation and to compute average distances to hospital for different types of treatment across the country, for example.
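As an indication of the kind of analysis suggested above, the following sketch computes an admission-weighted mean distance to hospital by treatment type. Everything in it is hypothetical: the area and hospital names, the planar centroid coordinates (taken to be in kilometres), the treatment categories, the admission counts and the use of straight-line distance are assumptions made for illustration and are not drawn from HES.

```python
import math
from collections import defaultdict

# Hypothetical centroids as (easting, northing) in km; not real locations.
centroids = {
    "area_1": (0.0, 0.0),
    "area_2": (3.0, 4.0),
    "hosp_X": (0.0, 0.0),
    "hosp_Y": (6.0, 8.0),
}

# Hypothetical flows: (area of residence, hospital, treatment type, admissions).
admissions = [
    ("area_1", "hosp_X", "elective", 10),
    ("area_2", "hosp_X", "elective", 10),
    ("area_2", "hosp_Y", "emergency", 5),
    ("area_1", "hosp_Y", "emergency", 5),
]


def mean_distance_by_type(admissions, centroids):
    """Admission-weighted mean straight-line distance (km) for each treatment type."""
    total_km = defaultdict(float)
    total_n = defaultdict(int)
    for home, hospital, treatment, n in admissions:
        (x1, y1), (x2, y2) = centroids[home], centroids[hospital]
        total_km[treatment] += n * math.hypot(x2 - x1, y2 - y1)
        total_n[treatment] += n
    return {t: total_km[t] / total_n[t] for t in total_km}


mean_km = mean_distance_by_type(admissions, centroids)
```

Straight-line distance between centroids is the simplest possible choice; network distance or travel time would be natural refinements where such data are available.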
Other Sources
Whereas the Census and NHS patient registers provide interaction flow data for the population in aggregate terms, PLASC and HES statistics are examples of commuting data sets involving particular sub-groups of the population: schoolchildren and hospital patients. There are many other sources of interaction data on particular sub-groups, some of which will be mentioned briefly in this section. One sub-group of particular importance is students, whose migration flows were included in the census in 2001 but not formally in 1991. The Higher Education Statistics Agency (HESA) is the central source for higher education (HE) statistics, collecting and providing data on students and staff in HE institutions (HEIs) as well as destinations for HE graduates. There is a ‘student dataset’ that includes variables such as A/AS level/Highers points scores, age, disability, gender, ethnicity (white/non-white), source of tuition fees and subject area of study, together with domicile and location of HE institution. There is also a ‘first destination dataset’ that includes information on activity, qualification required for job, employer size, SIC, SOC and location of employment or institution of further study. HESA does not allow general access to microdata but flows at super output area level can be purchased on request.
There are a number of administrative sources of data on international migration flows. The Worker Registration Scheme (WRS), administered by the Home Office, provides a cumulative total of the number of nationals of the eight Central and Eastern European countries that joined the European Union (EU) who have registered to work in the UK. A National Insurance Number (NINo) is allocated to each overseas national entering the UK who wishes to work or claim benefits in the UK and is recorded on the National Insurance Recording System (NIRS). There are figures by ‘year of arrival’ that show arrivals subsequently allocated a NINo according to their reported arrival date into the UK.
The figures by ‘year of registration’ are based on the date of registration onto NIRS i.e.
after the NINo application and allocation process has been completed. NINo data are extracted each year in June and provide flows by local authority area. Boden & Stillwell (2006) have identified the variations in NINo registrations by Government Office region and explored the patterns by destination within Yorkshire and the Humber for Poles and Pakistanis, the two largest groups of labour inflows. The Immigration and Nationality Directorate of the Home Office has responsibility for immigration control, applications for settlement, citizenship and asylum. Consequently, it produces statistics on immigration control, enforcement, citizenship and asylum. Asylum applications are identified as either ‘port’ or ‘in-country’. Port asylum seekers are those who apply at port when entering the UK. They are relatively few in number and are usually not captured in the IPS since they are detained for separate interview on arrival. In-country asylum seekers are those entering the UK who do not apply for asylum on arrival but do so once in the UK. These individuals are also unlikely to be captured as migrants in the IPS (see later section). The Home Office also collects data on the dependents of asylum seekers (although this has not been done rigorously in the past) and these, together with the counts of principal asylum seekers and international migration data from the IPS, are used to produce estimates of Total International Migration (TIM). As with principal applicants, an allowance is made for those dependants who are not migrants because they are returned within a year. ‘Visitor switchers’ are visitors who enter or leave the UK intending to stay in the destination country for less than a year but who actually stay for a year or longer. For the years before 2001, estimates of visitor switcher inflows from the non-European Economic Area (non-EEA) were made from the Home Office database of after-entry applications to remain in the UK. 
IPS data on visitors for these years are only used to estimate visitor switcher data for individuals not covered by the available Home Office data. Since 2001, visitor switcher flows are
estimated from IPS data relating to two categories of visitors: those who initially intend to stay for 6-11 months, and those who indicate that they may stay for longer than a year although intended length of stay is uncertain. The Home Office generates estimates of asylum seekers and visitor switchers by broad origin region and local authority area of destination (Stillwell et al., 2002).
At a more general level, the United Nations Statistics Division collects and disseminates, at the international level, official national data on international migration whereas Eurostat publishes tables on international migration and asylum by individual European country, the EU as it was constituted on 1 May 2004, the former EU15, the Economic and Monetary Union, the European Economic Area and the European Free Trade Association. Inter-regional migration flows are also available from Eurostat but only for registered users with a userid and password.
Finally, data are available from specific administrative sources that relate to different types of mobility. For example, residential property transactions normally, though by no means exclusively, involve migration from one house to another and the previous addresses of new owner occupiers are held by estate agents and other institutions operating in the housing market, as well as by the Land Registry. Likewise, changes of address are recorded at the Driver and Vehicle Licensing Agency (DVLA) when vehicle owners move house. Council tax records are another important administrative source of migration data which are likely to be more reliable and more comprehensive than either of the previous two sources. Data on commuting flows to entertainment events such as football matches might be derived from details of season ticket holders held by the clubs, or flows to public libraries might be extracted from the computerised library systems. Moreover, commercial organisations undertaking regular research surveys sometimes collect interaction data.
An example of this is Acxiom’s annual research opinion poll, which asks respondents where they lived previously and when they moved to their current residence.
Clearly there are many sources that contain information about trip origins and destinations that can be geo-referenced either to a specific point or to a geographic area of some type. In various cases, the data that exist are inaccessible because of confidentiality constraints.
SURVEY INTERACTION DATA SOURCES
Surveys are the third major type of data source. The ESRC/JISC funded Economic and Social Data Service (ESDS) provides access to a range of archived UK survey data sources. Some of these data sources include information that can be used to measure population movements over different temporal and spatial scales, both within the UK and between the UK and other countries.
Labour Force Survey
The Labour Force Survey (LFS) is a continuous quarterly household survey of around 53,000 households, representing around 126,000 individuals (Madouros, 2006), whose main purpose is to provide information that can “be used to develop, manage, evaluate and report on labour market policies” (ONS, 2006a). The survey has been running since 1973, although the format has changed over the years. The more recent rounds of the LFS contain details of the residential and workplace movements of respondents, at the scale of GOR and Standard Region. For each respondent, their region of residence is recorded at the time of interview, as well as their region of residence three months and one year before. The region of place of work is also recorded for the main job and second job of each respondent, although the region of place of work three months or one year before the interview is not recorded. LFS data are available from the ESDS in individual respondent (primary unit data) form, which allows users to create their own flow matrices, either for residential
or commuting flows, through a process of cross-tabulation, although it is apparent that regional definitions are not constant between the origin (region of residence one year ago) and destination (region of current residence). For respondents who were not born in the UK, information regarding their country of birth and origin is included, as well as the year of arrival in the UK. Country of residence three months ago and one year ago are also included, providing (in some cases) a timeframe in which to contextualise movements. A major change in the timing of the survey occurred in 2006, when the seasonal quarterly basis of collection (starting March-May), which had been the norm since 1992, was changed to a calendar quarterly basis (starting January-March). From the March-May 2005 quarter, a ‘special licence’ data set is also available, where the geographical scale of reference available for each individual is local authority district (LAD) for both residence and place of work. The Northern Ireland (NI) LFS is closely related to the GB LFS, with very similar variables included in the survey. It has been running for the same amount of time, with quarterly coverage from 1994 onwards. The sample size for the NI LFS is around 8,500-9,000 individuals. Primary unit data for the NI LFS are only available from 1995 to 2000 via the ESDS. Summary results for later dates are available through the Northern Ireland Statistics and Research Agency (NISRA). Despite some difficulties with using the LFS to understand migration and commuting flows, a number of studies have used LFS data for this purpose. Forsythe (1992), Gordon (1995) and Bover (2002) have analysed inter-regional flows whilst Shields (1998), Dustmann & Faber (2005) and Dustmann et al. (2005) have looked at immigration.
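The cross-tabulation of individual records into a flow matrix, as described above for the LFS primary unit data, might look like the following minimal sketch. The field names and the five respondent records are hypothetical and are not LFS variable names or values; a real analysis would also need to address survey weighting and the inconsistency in regional definitions between origin and destination noted above.

```python
from collections import Counter

# Hypothetical respondent records; field names are illustrative, not LFS variable names.
respondents = [
    {"region_now": "London", "region_1yr_ago": "London"},
    {"region_now": "London", "region_1yr_ago": "North West"},
    {"region_now": "London", "region_1yr_ago": "North West"},
    {"region_now": "South East", "region_1yr_ago": "London"},
    {"region_now": "South East", "region_1yr_ago": "South East"},
]


def flow_matrix(records, origin_key, destination_key):
    """Cross-tabulate records into counts keyed by (origin, destination)."""
    return Counter((r[origin_key], r[destination_key]) for r in records)


def movers_only(flows):
    """Drop the diagonal (stayers) to leave inter-regional flows."""
    return Counter({od: n for od, n in flows.items() if od[0] != od[1]})


matrix = flow_matrix(respondents, "region_1yr_ago", "region_now")
migration = movers_only(matrix)
```

Passing workplace and residence fields to the same `flow_matrix` function would give a commuting matrix instead of a migration matrix.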
International Passenger Survey
The International Passenger Survey (IPS) is a large, multipurpose sample survey of passengers arriving at, and departing from, the main UK airports and seaports as well as those passing through the Channel Tunnel. Details of the sampling methods are available in ONS (2006d). As a measure of migration, the IPS has three main limitations. Firstly, it does not cover all types of migration; flows along land routes between the UK and the Irish Republic are excluded as are most asylum seekers and some of their dependants. Secondly, it is subject to a degree of uncertainty, although ‘standard errors’ are estimated. Thirdly, migration estimates are based on respondents’ intentions, which may or may not become their final actions. Thus, some adjustments are required to account for ‘switchers’ who change their intentions. The IPS has been described as the ‘richest’ source of information on international migration (ONS, 2005b), but the problems listed above have limited its utility. The annual ONS Total International Migration (TIM) publication (ONS, 2005b) outlines standard error calculations that need to be applied to the total flow estimates calculated from the sampled data. Furthermore, aggregate statistics provided by the ONS have been subjected to seasonal adjustment (ONS, 2006a and Annex D, ONS, 2006b) to ‘smooth’ the effect of seasonal travel in the UK, and produce quarterly information that is directly comparable. Data from the ESDS are available by quarter in annual packages. In addition, national estimates of international travel produced from the IPS have been created using complex variable weighting procedures for which little detailed information is provided, although a brief overview is given in Travel Trends (ONS, 2006a). Whilst IPS data are the primary source used by the Government to produce estimates of international migration, the TIM estimates also include data on asylum seekers from the Home Office, as well as data from the Irish Central Statistics Office (ONS, 2006c).
Annual publications produced by the ONS on international migration (ONS, 2003b; 2004b; 2005b; 2006c) have all made use of the TIM estimate
and therefore the IPS. Other studies that have used the IPS to measure international migrations include Salt (2005) and Large & Ghosh (2006a; 2006b). Attention has been drawn to problems associated with the international migration figures in the wake of the revelation that the numbers of people coming into the UK from Eastern Europe in recent years have been significantly underestimated. The report for the inter-departmental task force on migration statistics (ONS, 2006d) reviews some of the current issues and shortcomings related to current international migration estimates.
General Household Survey and Northern Ireland Continuous Household Survey
The General Household Survey (GHS) has been in existence since 1971, and has been conducted on an annual basis since then, with the exception of two breaks – one in 1997/98 and another in 1999/2000. The sample size changes slightly year-on-year, but it is usually between 8,000 and 10,000 households, which comprise around 15,000-20,000 respondents. Results are published through the ONS in summary form and the ESDS in primary unit form on an annual basis. The main purpose of the study is to collect data on a range of core topics, covering household, family and individual information. The GHS has always asked a question relating, in some way, to the amount of time each respondent has lived at a current address. From this, it is possible to derive some indication of in-migration from somewhere else within Britain or immigration from outside Britain. A question relating to how many moves the respondent has made in the past five years is also included. Unfortunately, no question is included which could give a precise indication of the place of origin for internal migrants so it is only possible to determine whether an individual is currently living in a specific GOR, and that they either did
or did not live there n years ago. The finest spatial unit of reference for any movement is the GOR/ Standard Region scale. Information relating to the date of arrival in the UK for respondents born elsewhere is included, as is their country of birth. From this, it is possible to infer something about international immigration. However, country of residence before moving to GB is not included as a variable, thus casting some doubt on the real origin of the migrant and limiting any conclusions that can be drawn. The Northern Ireland Continuous Household Survey (CHS) is related to (and indeed modelled on) the GHS in GB. However, the topics covered and continuity of the data are slightly different. Beginning in 1983, the CHS samples around 1% of the households in Northern Ireland. Covering similar general topics to the GHS, there are also variables which can be used to measure population migration. The spatial units used are of a ‘finer grain’ than those used in GB, with data aggregated by electoral ward as well as by district council area. However, variables allowing the monitoring of migration patterns are not included on the same regular basis as they are in the GHS and so migration analysis using the CHS is not possible.
National Travel Survey
First commissioned in 1965/66, the National Travel Survey (NTS) has, since then, provided periodic snapshots of British travel behaviour. Currently the NTS samples 16,000 addresses in Great Britain annually. Primary unit data are available to download for selected periods since 1972 from the ESDS and summary statistics and reports are also available through the ONS and the Department for Transport. The smallest geographical scale for which aggregate data are made available is the GOR, despite data being collected at postcode sector level. Origins and destinations that are published for each journey are referenced only by GOR. Additional data included for all
cases include variables such as distance and frequency of journeys made on a given travel day, mode of transport, and average annual and weekly mileage. Standard socio-demographic identifier variables are also featured, including age, gender, marital status, socio-economic group and industry of employment. Interpretation of flow matrices from the NTS needs to be carried out carefully, as the flows represent all journeys carried out by the sample population in their given ‘travel week’ between origin and destination regions.
Table 2 is a summary of the interaction data that are available from social survey sources. The principal advantage of some of these surveys is that results are published with high frequency, often annually but in some cases quarterly, allowing the researcher to build a time series of migration data and to identify migration trends up to the most recent quarter or annual period, thus providing valuable information with which to complement data from the decennial census of population. However, the major drawback, shared by most of the social surveys covered here, is that the spatial resolution for published statistics tends to be the GOR. Such large spatial units mean that only very general patterns of movement can be observed, despite the rich variety of other attributes that can be ascribed to the individual respondents. Moreover, as a consequence of the detail inherent in many surveys, the sample size of the survey is often relatively small, with implications for accuracy.
CONCLUSION
This chapter demonstrates that origin-destination flow data are available from a wide-ranging set of census, administrative and survey sources, some of which were not specifically designed to provide statistical information to support research on migration or commuting directly, yet provide valuable insights into these patterns of behaviour for which there is a considerable
paucity of reliable information. The most important data source for migration and commuting flow data is the Census and it is clear that the last three censuses have generated a number of products from which flow transition data can be extracted. In most cases, there are online interface and extraction systems or mechanisms of assistance already in place to allow users to access flow data; the WICID system giving access to the SMS, SWS and STS is explained in detail in Chapter 2. An ONS consultation document looking forward to the 2011 Census (ONS, 2005a) indicates that migration and commuting questions similar to those asked in 2001 will be asked again in 2011 and it is likely that separate Origin-Destination Statistics will be produced once again. In order to maximise the success of the 2011 Census, the ONS carried out a test of the procedures to be used in England and Wales on 13 May 2007 on 100,000 households in five local authorities. It is interesting to observe that there were a number of questions on the ONS test questionnaire from which it would be possible to extract new interaction data. The first of these relates to visitors and simply asks for usual address, thus providing some indication of where visitors come from by age and sex. Secondly, there is the question about country of birth that allows a measure of lifetime migration to be derived but, in addition, there is a question for those born abroad about when they most recently arrived to live in England and Wales. In theory, this should enable the creation of matrices of those born overseas by origin and destination and year of entry. The familiar question relating to place of usual residence one year ago is asked, but there are also questions asking about other addresses at which an individual stays for part of the week or year. The second address is asked for together with information about the reason for staying at the second address. 
Reasons are categorised as ‘to stay with another parent/guardian’; ‘to stay when I work away from home’; ‘to stay when not at university/boarding school’; ‘my holiday/second home’; ‘to stay when I’m on duty (armed forces)’; and ‘other’. There is also a question about how long the individual stays at the second address: ‘less than half the time’, ‘about half the time’ or ‘more than half the time’. These questions have the potential to generate a considerable amount of new interaction data relating to temporary mobility although they may not appear on the final census form and decisions on data released have yet to be made.

Table 2. Summary of major survey data sources containing interaction data

| Survey | Start date | Current sample size | Current timing | Main variable types covered (variables by which flows can be disaggregated) | Interaction data |
|---|---|---|---|---|---|
| LFS | 1973 | 53,000 households, 126,000 individuals annually. | Calendar quarterly sampling and release. | Age, gender, ethnicity, level of education, marital status, religion, number of dependent children, employment type, sick days, socio-economic classification. | GOR to GOR and international country to GOR interaction matrices possible. Disaggregation by any variable of choice. LAD to LAD with special permission. |
| LFS NI | 1973 | 8,500-9,000 individuals annually. | Calendar quarterly sampling and release. | Age, gender, ethnicity, level of education, marital status, religion, number of dependent children, employment type, sick days, socio-economic classification. | GOR to GOR and international country to GOR interaction matrices possible. Disaggregation by any variable of choice. LAD to LAD with special permission. |
| IPS | 1993 | 250,000 passengers annually. | Continual sampling, quarterly compilation, annual release. | Age, gender, UK port or route, type of vehicle, type of fare, purpose of visit, intended length of stay, money spent on beer, wine, spirits and cigarettes, overseas origin or destination. | International country of origin to UK county matrices possible. Disaggregation by any variable of choice. |
| GHS | 1971 | 8,000-10,000 households, 15,000-20,000 individuals annually. | Annual release. | Household members, household and family information, household accommodation, housing tenure, consumer durables including vehicle ownership, employment, pensions, education, health and use of health services, income. | Very little. GOR of destination is all that can be accurately measured. Origin is either current GOR or ‘elsewhere.’ There is no way of telling which. |
| CHS | 1983 | 4,500 households (around 1% of Northern Ireland total). | Annual release. | Household members, household and family information, household accommodation, housing tenure, consumer durables including vehicle ownership, employment, pensions, education, health and use of health services, income. | Only for 1983. NI electoral ward or council area can be origin or destination. Immigration from GB or Eire also available for this year alone. |
| NTS | 1965/66 | 16,000 households annually. | Data collected on sample ‘travel week’ for study sample over course of a year. Annual release. | Accessibility of public transport, access to amenities, household vehicle access, household composition and household socio-economic information, age, gender and marital status, employment, occupation and industry details, income, place of work and travel to work details. | GOR to GOR commuting data is available readily for most recent years. This data should be available in theory for other years too, although in practice availability is variable. |
Amongst the administrative sources considered in this chapter, the one that practitioners and researchers have exploited most to date is the NHS patient re-registration system, as evidenced by the adoption of NHSCR data into the official population estimation methodology and the use of patient re-registration data for identifying changes in the magnitude and spatial patterns of movement between censuses,
even though, as the audit carried out by Dennett et al. (2007) has demonstrated in detail, the data have their shortcomings and further work is required to develop a consistent set of patient register data for the UK as a whole. The chapter indicates that there are other sources of administrative data that have the potential to provide valuable information on migration and commuting. Time series of NHS hospital episode statistics and PLASC data on journey to school also have considerable potential for those seeking to better understand patterns of commuting, although, in the latter case, significant investment is required to ensure that the attributes of individual pupils are correct and consistent from year to year, and algorithms would be required to produce the flows of children between particular geographical units. The analysis of HESA time series data on student flows to and after university would be particularly useful given the inconsistency in how students were measured in the last two censuses. Administrative data sets containing flows of individuals entering the country from overseas, such as the NINo statistics and the Home Office data on asylum statistics and visitor switchers, are extremely useful for helping to understand trends and patterns of immigration. There is a strong argument for the development of what Rees & Boden (2006) refer to as a New Migrant Databank (NMD), a common framework within which to assemble counts and indicators of immigration derived from the various different sources.

Surveys are the other major type of interaction data source and, whilst there are several that provide information about migration, there are very few that contain any details of commuting flows.
The main advantage of survey data, especially those which are available in primary unit data form such as the IPS, GHS or NTS, is that the user can cross-classify different attributes and produce cross-tabulations that complement data from the decennial census of population, as well as providing more up-to-date information.
However, one of the key constraints of survey data, due largely to the fact that the records are only a small sample of some population, is the implication for analysis at spatial scales below that of the GOR. In addition, there are problems with intermittent temporal coverage as well as limitations with some of the methods used for collecting the samples associated with particular surveys, e.g. IPS. Finally, we should acknowledge that 2011 is likely to see the last census of population in the UK. Proposals have been formulated for an integrated population statistics system (IPSS) (ONS, 2003a) that combines census data at individual level into a single comprehensive statistics database with survey and administrative data and will underpin the country’s population and social statistics. At the heart of these proposals is a high quality address register containing information on properties and characteristics of individuals associated with these properties together with a population register, which will provide the basis for linkage with data from other sources. The results of the 2011 Census will form the basis of the information contained in the proposed system that will subsequently be updated with data from further censuses, the proposed Integrated Household Survey and other administrative and registration systems. This system is likely to generate interaction data on a more regular basis and it will be very important to ensure that data release is maximised without the effects of disclosure control becoming too detrimental.
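The cross-classification that survey microdata allow, and the sample-size constraint noted above, can be illustrated with a minimal sketch. It assumes hypothetical respondent records holding a region of residence one year ago, a current region and an age group; the field names, region labels and counts are invented for illustration and do not reflect the layout of the LFS, IPS or any other survey file.

```python
from collections import Counter

# Hypothetical survey microdata: one record per respondent.
# All field names and values are invented for illustration only.
records = [
    {"origin": "North East", "destination": "London", "age": "16-24"},
    {"origin": "North East", "destination": "London", "age": "25-44"},
    {"origin": "London", "destination": "South East", "age": "25-44"},
    {"origin": "London", "destination": "South East", "age": "25-44"},
    {"origin": "South East", "destination": "London", "age": "16-24"},
]


def cross_tabulate(records, by=None):
    """Count respondents per (origin, destination) pair.

    If `by` names another attribute, each flow is further split by it.
    """
    counts = Counter()
    for record in records:
        key = (record["origin"], record["destination"])
        if by is not None:
            key += (record[by],)
        counts[key] += 1
    return counts


# Region-to-region flows, then the same flows disaggregated by age group.
flows = cross_tabulate(records)
flows_by_age = cross_tabulate(records, by="age")

# Cells supported by a single respondent show how quickly a sample thins out
# once flows are disaggregated or the geography becomes finer.
single_respondent_cells = [key for key, n in flows_by_age.items() if n == 1]
```

Even in this toy case, splitting five records by age leaves several cells resting on one respondent, which is the same difficulty that limits survey-based analysis below the GOR scale.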
REFERENCES
Al-Hamad, A., Hayes, L., & Flowerdew, R. (1997). Migration of the elderly to join existing households, evidence from the Household SAR. Environment & Planning A, 29(7), 1243–1255. doi:10.1068/a291243
Boden, P., & Stillwell, J. (2006). New migrant labour in Yorkshire and the Humber. The Regional Review for Yorkshire and the Humber, 16(3), 18–20.
Boden, P., Stillwell, J., & Rees, P. (1992). How good are the NHSCR data? In Stillwell, J., Rees, P. H., & Boden, P. (Eds.), Migration Processes and Patterns: Volume 2. Population Redistribution in the United Kingdom (pp. 13–27). London: Belhaven Press.
Bogue, D. (1959). Internal migration. In Hauser, P., & Duncan, O. D. (Eds.), The Study of Population (pp. 486–509). Chicago: University of Chicago Press.
Bover, O. A. M. (2002). Learning about migration decisions from the migrants, Using complementary datasets to model intra-regional migrations in Spain. Journal of Population Economics, 15(2), 357–380. doi:10.1007/s001480100066
Champion, A., Bramley, G., Fotheringham, A., Macgill, J., & Rees, P. (2003). A migration modelling system to support government decision making. In Geertman, S., & Stillwell, J. (Eds.), Planning Support Systems in Practice (pp. 269–290). Heidelberg: Springer. Champion, A., & Coombes, M. (2007). Using the 2001 census to study human capital movements affecting larger cities: insights and issues. Journal of the Royal Statistical Society A, 170(2), 447–467. doi:10.1111/j.1467-985X.2006.00459.x Champion, A., Coombes, M., Raybould, S., & Wymer, C. (2007). Migration and Socioeconomic Change A 2001 Census Analysis of Britain’s Larger Cities. York, UK: Joseph Rowntree Foundation. Champion, A., Fotheringham, S., Boyle, P., Rees, P., & Stillwell, J. (1998). The Determinants of Migration Flows in England: A Review of Existing Data and Evidence. London: DETR.
Boyle, P. (1995). Public housing as a barrier to long-distance migration. International Journal of Population Geography, 1, 147–164.
Chappell, R., Vickers, L., & Evans, H. (2000). The use of patient registers to estimate migration. Population Trends, 101, 19–24.
Boyle, P., & Feng, G. (2002). A method for integrating the 1981 and 1991 GB census interaction data. Computers, Environment and Urban Systems, 26, 241–256. doi:10.1016/S0198-9715(01)00043-6
Cole, K., Frost, M., & Thomas, R. (2002). Workplace data from the census. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 269–280). Chichester, UK: Wiley.
Bulusu, L. (1989). Migration in 1988. Population Trends, 58, 33–39.
Courgeau, D. (1973). Migrants and migrations. Population, 1, 96–129.
Bulusu, L. (1990). Internal migration in the United Kingdom, 1989. Population Trends, 62, 33–36.
Courgeau, D. (1976). Quantitative, demographic, and geographic approaches to internal migration. Environment & Planning A, 8(3), 261–269. doi:10.1068/a080261
Bulusu, L. (1991). Review of migration data sources. OPCS Occasional Paper 39. London, OPCS. Champion, A. (2005b). Population movement within the UK. In R. Chappell (Ed.), Focus on People and Migration 2005 Ed. (pp. 91-113). Basingstoke, UK: Palgrave Macmillan.
Dale, A. (1998). The value of the SARs in spatial and area-level research. Environment & Planning A, 30, 767–774. doi:10.1068/a300767 Dale, A., & Marsh, C. (Eds.). (1993). The 1991 Census User’s Guide. London: HMSO Publications.
Davies, E., Williamson, P., & Houldsworth, C. (2006). The Leaving of Liverpool: An Examination into the Migratory Characteristics of Liverpool. Mimeo, University of Liverpool.
Dustmann, C., & Fabbri, F. (2005). Immigrants in the British labour market. Fiscal Studies, 26(4), 423–470. doi:10.1111/j.1475-5890.2005.00019.x
Denham, C., & Rhind, D. (1983). The 1981 census and its results. In Rhind, D. (Ed.), A Census User’s Handbook (pp. 17–88). London: Methuen.
Ewens, D. (2005a). The National and London Pupil datasets: An introductory briefing for researchers and research users. Data management and Analysis Group. Greater London Authority.
Dennett, A., & Stillwell, J. (2008a). Internal migration in Great Britain – a district level analysis using 2001 data. Working Paper 08/1, School of Geography, University of Leeds, Leeds.
Ewens, D. (2005b). Moving home and changing school: widening the analysis of pupil mobility. Data management and Analysis Group. London: Greater London Authority.
Dennett, A., & Stillwell, J. (2008b). Population turnover and churn - enhancing understanding of internal migration in Britain through measures of stability. Population Trends, 134, 24–41.
Fielding, A. (1992). Migration and social mobility – South East England as an escalator region. Regional Studies, 26(1), 1–15. doi:10.1080/00343409212331346741
Dennett, A., Stillwell, J., & Duke-Williams, O. (2007). Interaction data sets in the UK: an audit. Working Paper 07/03, School of Geography, University of Leeds, Leeds.
Fielding, A. (1998). Counterurbanisation and social class. In Boyle, P., & Halfacree, K. (Eds.), Migration in Rural Areas: Theories and Issues. Chichester, UK: John Wiley.
Devis, T. (1984). Population movements measured by the NHSCR. Population Trends, 36, 15–20.
Finney, N., & Simpson, L. (2008). Internal migration and ethnic groups: evidence for Britain from the 2001 Census. Population Space and Place, 14, 63–83. doi:10.1002/psp.481
Duke-Williams, O. (2004). The development and use of information systems for monitoring and analysing migration in Britain. Unpublished PhD Thesis, University of Leeds, Leeds.
Duke-Williams, O., & Rees, P. (1993). TIMMIG: a program for extracting migration time series tables. Working Paper 93/13, School of Geography, University of Leeds, Leeds.
Duke-Williams, O., & Stillwell, J. (2007). The effect of small cell adjustment on interaction data produced from the 2001 Census. Environment & Planning A, 39(5), 1079–1100. doi:10.1068/a38143
Dustmann, C., Fabbri, F., & Preston, I. (2005). The impact of immigration on the British labour market. The Economic Journal, 115(507), 324–341. doi:10.1111/j.1468-0297.2005.01038.x
Flowerdew, R. (1997). The potential use of moving units in British migration analysis. In Rees, P. (Ed.), Third workshop: the 2001 Census - special datasets: what do we want? Working Paper 97/9. Leeds: School of Geography, University of Leeds. Forsythe, F. (1992). The nature of migration between Northern Ireland and Great Britain - a preliminary analysis based on the Labour Force Surveys, 1986-88. The Economic and Social Review, 23(2), 105–127. Fotheringham, A., Rees, P., Champion, T., Kalogirou, S., & Tremayne, A. (2004). The development of a migration model for England and Wales: overview and modelling out-migration. Environment & Planning A, 36, 1633–1672. doi:10.1068/a36136
Gordon, I. (1995). Migration in a segmented labour market. Transactions of the Institute of British Geographers, 20(2), 139–155. doi:10.2307/622428
Horsfield, G. (2005). International migration, Chapter 7 in Chappell, R. (ed). Focus on People and Migration 2005 Ed., Palgrave Macmillan, Basingstoke, pp. 114-129.
Green, A. (1997). A question of compromise? Case study evidence on the location and mobility strategies of dual career households. Regional Studies, 31(7), 641–657. doi:10.1080/00343409750130731
Hussain, S., & Stillwell, J. (2008). Internal migration of ethnic groups in England and Wales by age and district type. Working Paper 08/xx, School of Geography, University of Leeds, Leeds.
Green, A. E., Hogarth, T., & Shackleton, R. E. (1999). Long Distance Living: Dual Location Households. Bristol, UK: Policy Press. GRO Scotland. (2006). Retrieved from http://www. gro-scotland.gov.uk/statistics/publications-anddata/annual-report-publications/annrep/01sect5/ estimating-migration.html GRO Scotland. (2007). Retrieved from http:// www.gro-scotland.gov.uk/statistics/migration/ information-on-migration.html Harland, K., Duke-Williams, O., & Stillwell, J. (2006). Commuting to school: an investigation of 2001 Census STS and PLASC data. Presentation at GISRUK’06, University of Nottingham, 10 April. Harland, K., & Stillwell, J. (2007a). Commuting to school in Leeds: how useful is the PLASC? Working Paper 07/02, School of Geography, University of Leeds, Leeds. Harland, K., & Stillwell, J. (2007b). Using PLASC data to identify patterns of commuting to school, residential migration and movement between schools in Leeds. Working Paper 07/03, School of Geography, University of Leeds, Leeds. Harris, J., Hayes, J., & Cole, K. (2002). Disseminating census area statistics over the Web, Chapter 8 in Rees, P., Martin, D., & Williamson, P. (Eds.). The Census Data System, Wiley, Chichester, pp. 113-121.
Jones, P., & Elias, P. (2006). Administrative data as research resources: a selected audit. Draft Paper, Warwick Institute for Employment Research. Kalogirou, S. (2005). Examining and presenting trends of internal migration flows within England and Wales. Population Space and Place, 11(4), 283–297. doi:10.1002/psp.376 Large, P., & Ghosh, K. (2006a). A methodology for estimating the population by ethnic group for areas within England. Population Trends, 123, 21–31. Large, P., & Ghosh, K. (2006b). Estimates of the population by ethnic group for areas within England. Population Trends, 124, 8–17. Lee (1966). A theory of migration, Demography, 3(1), 47-57. Liffen, K., Maslen, S., & Price, S. (1988). HES Book. London: Department of Health. Machin, S., Telhaj, S., & Wilson, J. (2006). The mobility of English school children, Discussion paper. Centre for Economic Performance. Mackintosh, M. (2005). 2001 Census: The migration patterns of London’s ethnic groups, DMAG Briefing Paper 2005/30. London: Greater London Council Data Management and Analysis Group, Greater London Council.
Madouros, V. (2006). Impact of the LFS switch from seasonal to calendar quarters: an overview of the switch of the LFS to calendar quarters and the potential effects of this change on users. London: ONS.
McHugh, K. E., Hogan, T. D., & Happel, S. K. (1995). Multiple residence and cyclical migration - a life-course perspective. The Professional Geographer, 47(3), 251–267. doi:10.1111/j.0033-0124.1995.00251.x
Migration Statistics Unit. (2007). Using patient registers to estimate internal migration. Technical Guidance Notes, Migration Statistics Unit. ONS, Titchfield.
Nielson, T. A. S., & Hovgesen, H. H. (2007). Exploratory mapping of commuter flows in England and Wales. Journal of Transport Geography, 16, 90–91. doi:10.1016/j.jtrangeo.2007.04.005
NISRA. (2005). Development of methods/sources to estimate population migration in Northern Ireland. NISRA Paper. Retrieved June 13, 2007, from http://www.nisra.gov.uk/archive/demography/publications/dev_est_mig.pdf
Norman, P., Boyle, P., & Rees, P. (2005). Selective migration, health and deprivation: a longitudinal analysis. Social Science & Medicine, 60(12), 2755–2771. doi:10.1016/j.socscimed.2004.11.008
Office for National Statistics, General Register Office for Scotland, Northern Ireland Statistics and Research Agency. (2006). UK Statistical Disclosure Control Policy for 2011 Census Output, November. Retrieved March 1, 2007, from http://www.statistics.gov.uk/census/pdfs/SDCpolicy.pdf
Office of the Deputy Prime Minister. (2002). Development of a Migration Model. London: ODPM.
ONS. (1999). 1996-based subnational population projections England, ONS PP3 No 10. London: The Stationery Office. ONS. (2001). Census 2001 Origin-destination Statistics. Final Specifications. ONS. (2003a). Proposals for an Integrated Population Statistics System. Discussion Paper, ONS, October. ONS. (2003b). International Migration: Migrants Entering or Leaving the United Kingdom and England and Wales, 2001. Series MN no.28. Office for National Statistics, London. Retrieved January 15, 2007, from http://www.statistics.gov. uk/downloads/theme_population/MN28.pdf ONS. (2004a). Proposals for a Continuous Population Survey. Consultation Paper, ONS, July. ONS. (2004b). International Migration: Migrants Entering or Leaving the United Kingdom and England and Wales, 2002. Series MN no.29. Office for National Statistics, London. Retrieved January 15, 2007, from http://www.statistics.gov.uk/downloads/theme_population/MN_no_29_v3.pdf ONS. (2005a). The 2011 Census: Initial View on Content for England and Wales. Consultation Document. London: ONS. ONS. (2005b). International Migration: Migrants Entering or Leaving the United Kingdom and England and Wales, 2003. Series MN no.30. Office for National Statistics, London. Retrieved January 15, 2007, from http://www. statistics.gov.uk/downloads/theme_population/ MN_No30_2003v3.pdf ONS. (2006a). Travel Trends. A report on the 2005 International Passenger Survey. Retrieved January 15, 2007, from http://www.statistics.gov. uk/downloads/theme_transport/traveltrends2005. pdf
ONS. (2006b). Methodology for the experimental monthly index of services. Retrieved January 15, 2007, from http://www.statistics.gov.uk/iosmethodology/downloads/Whole_Report.pdf
Rees, P., & Boden, P. (2006). Estimating London’s New Migrant Population: Stage 1 – Review of Methodology. London: Greater London Authority.
ONS. (2006c). International Migration: Migrants Entering or Leaving the United Kingdom and England and Wales, 2004. Series MN no.31. Office for National Statistics, London. Retrieved January 15, 2007, from http://www.statistics.gov.uk/downloads/theme_population/MN31.pdf
Rees, P., Martin, D., & Williamson, P. (Eds.). (2002). The Census Data System. Chichester, UK: Wiley.
ONS. (2006d). Report for the Inter-departmental Task Force on Migration Statistics. Retrieved January 15, 2007, from http://www.nationalstatistics.org.uk/about/data/methodology/specific/population/future/imps/updates/downloads/ TaskForceReport151206.pdf ONS/GROS/NISRA. (2001). Census 2001 Origin-Destination Statistics (Final Specifications), London. Retrieved May 15, 2007, from http://www.statistics.gov.uk/census2001/pdfs/ OriginDest4web.pdf OPCS/GROS. (1992). 1991 Census, Definitions Great Britain. London: HMSO. Openshaw, S. (Ed.). (1995). Census User’s Handbook. Cambridge: GeoInformation International. Platt, L., Simpson, L., & Akinwale, B. (2005). Stability and change in ethnic groups in England and Wales. Population Trends, 121, 35–46. Poulain, M. (1996). Confrontation des statistiques de migration intra-Européennes: vers plus d’harmonisation. European Journal of Population, 9(4), 353–381. doi:10.1007/BF01265643 Rees, P. (1977). The measurement of migration from census and other sources. Environment & Planning A, 9, 257–280. doi:10.1068/a090247
Rosenbaum, M., & Bailey, J. (1991). Movement within England and Wales during the 1980s, as measured by the NHS Central Register. Population Trends, 65, 24–34.
Salt, J. (2005). International Migration and the United Kingdom - Report of the United Kingdom SOPEMI correspondent to the OECD, 2005. Retrieved January 15, 2007, from http://www.geog.ucl.ac.uk/mru/docs/Sop05fin_20060627.pdf
Scott, A., & Kilbey, T. (1999). Can patient registers give an improved measure of internal migration in England and Wales? Population Trends, 96, 44–55.
Shields, Ma, P. (1998). The earnings of male immigrants in England: evidence from the quarterly LFS. Applied Economics, 30(9), 1157–1168. doi:10.1080/000368498325057
Stillwell, J. (1994). Monitoring intercensal migration in the United Kingdom. Environment & Planning A, 26, 1711–1730. doi:10.1068/a261711
Stillwell, J., & Duke-Williams, O. (2003). A new web-based interface to British census of population origin-destination statistics. Environment & Planning A, 35, 113–132. doi:10.1068/a35155
Stillwell, J., & Duke-Williams, O. (2007). Understanding the 2001 UK census migration and commuting data: the effect of small cell adjustment and problems of comparison with 1991. Journal of the Royal Statistical Society. Series A (General), 170(Part 2), 1–21.
Stillwell, J., Duke-Williams, O., Feng, Z., & Boyle, P. (2005). Delivering census interaction data to the user: data provision and software development, Working Paper 05/01, School of Geography, University of Leeds, Leeds. Stillwell, J., Duke-Williams, O., & Rees, P. (1995). Time series migration in Britain: the context for 1991 Census analysis. Papers in Regional Science: Journal of the Regional Science Association, 74(4), 341–359. Stillwell, J., & Hussain, S. (2008). Ethnic group migration within Britain during 2000-01: a district level analysis. Working Paper, 08/2, School of Geography, University of Leeds, Leeds. Stillwell, J., Rees, P., & Boden, P. (Eds.). (1992). Migration Processes and Patterns: Vol. 2. Population Redistribution in the United Kingdom. London: Belhaven Press.
Stillwell, J., Rees, P., & Duke-Williams, O. (1996). Migration between NUTS level 2 regions in the United Kingdom. In Rees, P., Stillwell, J., Convey, A., & Kupiszewski, M. (Eds.), Population Migration in the European Union. Chichester, UK: Wiley. Stillwell, J., Rees, P., Eyre, H., & Macgill, J. (2002). Improving the treatment of international migration. In ODPM (2002), Development of a Migration Model (pp. 223-238). London: ODPM. Vandeschrick, C. (1992). Le diagramme de Lexis revisité. Population, 92(5), 1241–1262. doi:10.2307/1533940 Vandeschrick, C. (2001). The Lexis diagram, a misnomer. Demographic Research, 4(3), 98–124. Williams, M. (2000). Migration and social change in Cornwall 1971-91. In R. Creeser & S. Gleave (Eds.), Migration in England and Wales Using the Longitudinal Study (pp. 30-39). London: The Stationery Office, ONS Series LS.
Chapter 2
Access to Census Interaction Data
Adam Dennett, University of Leeds, UK
John Stillwell, University of Leeds, UK
Oliver Duke-Williams, University of Leeds, UK
ABSTRACT
This chapter is concerned with how users gain access to census interaction data. The authors outline a brief history of electronic access to interaction data sources and identify a number of issues and problems which led to the development of the Web-based Interface to Census Interaction Data (WICID). After presenting a number of practical and technical prerequisites for WICID, the authors explain in detail the architecture underpinning the system and the importance of the metadata framework for both initial successful implementation and ongoing maintenance and flexibility. Much of this chapter is devoted to explaining the basic query building and data extraction processes from a user perspective, and further guidance is provided on some of WICID’s less basic but no less useful features.
DOI: 10.4018/978-1-61520-755-8.ch002

INTRODUCTION
We live in times when gaining access to electronic secondary data is increasingly quick and easy. In the UK, there is a range of routes open to those seeking access, each leading to a vast quantity of data. A researcher wishing to gain access to the latest Census Key Statistics, for example, need only log on to the ONS neighbourhood statistics website (http://neighbourhood.statistics.gov.uk/), select the desired area(s) and whichever variables are required, and then click a button which will deliver the selected data over the internet to his/her desktop. Access to these data is unrestricted and free to anyone with an internet-enabled computer. It is not just standard census data that are available with no more effort than a few clicks of a button. The UK Data Archive (http://www.data-archive.ac.uk/) provides access to a colossal range of data for academic users (geographically referenced or otherwise) from an increasingly wide range of social
surveys and other sources. Advances in computing technology, coupled with the proliferation of digital data storage and developments in network capacity and speed, mean that data which, not more than twenty years ago, were difficult to access, let alone process with ease, are now so freely available that many users take their availability for granted. The development of the technologies that have brought us to the current position of rapid access to numerous sources of secondary data has been gradual. Particular technological developments in data access have been driven and shaped by the demands of users, the nature of the datasets themselves and the concerns of the data custodians relating both to who accesses the data and to exactly how much detail users are able to access. For example, as outlined in the Neighbourhood Statistics Programme evaluation report (ONS, 2006), the development of the current online system came about as the result of the poor availability of small area key statistics that could be used in local government policy and planning. Data that were available came from disparate sources and were often only accessible after payment to custodians. With the desire to increase the availability of data as the main driver, the technological developments which have taken place to help provide access to these data have been guided by a better understanding of the particular requirements of end-users as the system developed. Interaction data can be viewed as a particular sub-set of statistical information with unique characteristics, which have resulted in different approaches to access.
Accessibility and ease of access drove the development of the Neighbourhood Statistics programme; it could be argued that due to the unique nature of interaction data and its more specialist utility and resultant smaller user-base, the route to a more universal access has been very different to that of neighbourhood and other statistics required more readily by governmental decision makers. One of the problems in the
past with using origin-destination data has been that the available software has been difficult to use, and this can certainly be attributed to the smaller user-base for the data, especially in comparison with access software for the area statistics. To provide context for the latest access developments for interaction data, the next section briefly reviews the software packages that were used to access origin-destination data prior to the Web-based Interface to Census Interaction Data (WICID), the system developed for use by the academic community in conjunction with the outputs from the 2001, 1991 and 1981 Censuses.
THE ACCESS PROBLEM
There now exist some excellent web-based interfaces to data sets from the Census of Population and other sources. The neighbourhood statistics website already mentioned provides access to the most recent census data counts. Another example is CASWEB (http://casweb.mimas.ac.uk/), which provides academic users with access to UK Census Area Statistics and related information from the 2001 Census, as well as the 1991, 1981 and 1971 Censuses, in the form of counts of persons and households for various geographical units. CASWEB has been developed by the Census Dissemination Unit (CDU), which forms part of the Manchester University-based MIMAS service, and its development, like that of WICID, has been funded as part of the Census Programme by the Economic and Social Research Council (ESRC) and the Joint Information Systems Committee (JISC). A further example is NomisWeb (http://www.nomisweb.co.uk/), run by the University of Durham, which gives online access to the most up-to-date and detailed labour market data produced from official sources. CASWEB and NomisWeb are similar in that they are primarily based on stock variables; that is, counts or values relating to specific geographical areas which
can be aggregated easily to larger spatial units, although NomisWeb now also provides access to interaction data. Interaction flow data are more complicated than stock data, not only because two sets of geographical areas (origins and destinations) are involved and because the matrices can be very large, but also because of the problem encountered when flow counts are aggregated. For example, whilst the population of London can be derived as the sum of the populations of its constituent boroughs, the total number of migrants arriving in the capital city is not equivalent to the sum of each borough’s in-migration flow, because the latter also includes all the flows that occur between boroughs within London. These features of interaction data sets suggest that they are conceptually more difficult to manipulate than stock information and thus require handling with more technically sophisticated software.

An early software package designed to provide access to interaction data was MATPAC (the MAtrix analysis package). This was produced by MVA Systematic Ltd., initially to provide access to the 1981 Special Workplace Statistics Set C (Manchester Computing, 1989), but was developed subsequently to provide access to the 1981 Special Migration Statistics. A very similar package, MATPAC91, was also produced in order to provide access to interaction data from the 1991 Census. As an early attempt to provide access to complex interaction data, MATPAC was concerned primarily with facilitating some form of customised access to the data. MATPAC was certainly not furnished with a ‘point-and-click’ user interface; anyone wishing to use the system needed to be familiar with the VM/CMS operating system and to be able to run programmes, edit command files or transfer pre-prepared command files across the network.
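The aggregation problem described earlier in this section, using the example of London and its boroughs, can be made concrete with a small worked sketch. The three boroughs (A, B, C), the single outside area (X) and all flow values below are invented purely for illustration.

```python
# Hypothetical migration flows: flows[origin][destination] = number of migrants.
# A, B and C stand for boroughs of one city; X is everywhere else.
flows = {
    "A": {"B": 10, "C": 5, "X": 20},
    "B": {"A": 8, "C": 4, "X": 15},
    "C": {"A": 6, "B": 3, "X": 10},
    "X": {"A": 30, "B": 25, "C": 12},
}
city = {"A", "B", "C"}


def inflow(area):
    """Total in-migration to one area from every other area."""
    return sum(dests.get(area, 0) for origin, dests in flows.items() if origin != area)


# Naive aggregation: add up each borough's in-migration.
sum_of_borough_inflows = sum(inflow(borough) for borough in city)

# Correct aggregation: count only moves that cross the city boundary inwards.
city_inflow = sum(
    n
    for origin, dests in flows.items()
    if origin not in city
    for destination, n in dests.items()
    if destination in city
)

# The discrepancy is exactly the set of moves between boroughs within the city.
intra_city_moves = sum(
    n
    for origin, dests in flows.items()
    if origin in city
    for destination, n in dests.items()
    if destination in city
)
```

With these invented numbers the borough inflows sum to 103, whereas only 67 migrants actually arrive in the city from outside; the difference of 36 is the movement between boroughs, which must be removed whenever flow data are aggregated to a larger zone.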
For data to be accessed through MATPAC, users were required to create a command file containing a series of instructions which could then be run on the software. Each command file was broken down into a series of modules, with any given extraction task possibly requiring a number of modules in separate command files. A typical set of modules comprising a query may have appeared thus:

• Subarea: this module was used to read in national level data files and produce local data files as an output. Data produced by this module would be held in MATOPCS and ZONELIST files holding the raw data and area identifiers respectively. A single set of areas was defined in the SUBAREA module and output files would contain all flows between these areas, as well as all flows into and out of these areas.
• Scan: produced summary information from a system file or a ZONELIST file. Of particular use would be an output list of distinct origins and destinations which could be used in subsequent re-zoning procedures.
• Create: was used to create a system file from a local pair of MATOPCS and ZONELIST files; this file could then be used in conjunction with other modules to generate printed output, or manipulate the matrix in some way.
• Report: could be used to create a printed report of chosen variables in a variety of styles. Filters could be specified to include or exclude particular zones.
• Modify: could be used to derive new variables from those already existing. Variables could be added, subtracted, divided and multiplied, and simple functions such as square root, minimum and maximum were also available.
• Transpose: allowed a matrix to be transposed so that rows could become columns and vice versa. Using this function in conjunction with MODIFY allowed for net flows to be calculated by subtracting the flow from a given origin or destination from its equivalent flow in the transposed matrix.
• Rezone: provided the facility to re-aggregate a system file.
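The net flow calculation described for the Transpose and Modify modules can be sketched in a few lines of Python. This is not MATPAC syntax, which is not reproduced in this chapter; it is only an illustration of the arithmetic, using a hypothetical 3 x 3 matrix in which rows are origins and columns are destinations.

```python
def transpose(matrix):
    """Swap rows and columns of a square flow matrix."""
    return [list(row) for row in zip(*matrix)]


def net_flows(matrix):
    """Subtract each flow from its equivalent in the transposed matrix.

    Cell [i][j] of the result is the flow from j to i minus the flow
    from i to j, i.e. the net gain of area i from area j.
    """
    flipped = transpose(matrix)
    size = len(matrix)
    return [
        [flipped[i][j] - matrix[i][j] for j in range(size)]
        for i in range(size)
    ]


# Hypothetical gross flows: flows[i][j] is the count moving from area i to area j.
flows = [
    [0, 5, 2],
    [3, 0, 4],
    [1, 6, 0],
]
net = net_flows(flows)
net_gain_per_area = [sum(row) for row in net]
print(net)
print(net_gain_per_area)
```

Because every move out of one area is a move into another, the net matrix is antisymmetric and the net gains summed over all areas come to zero, which is a useful check on any such calculation.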
So, a selection of modules similar to the ones above would need to be compiled and run by any user wishing to access interaction data via MATPAC. As is evident from the list of modules and the functions contained within them, MATPAC was a powerful data extraction and manipulation tool; it allowed users to specify customised sets of origins and destinations and associated inflows and outflows linked to specific variables, and these flows could then be output in a format requested by the user. As is also evident, however, MATPAC required a relatively large amount of specialist knowledge to use. Creating modules and compiling them into a command file to run on the system was not only a fairly complex process, but was time consuming and error prone. Furthermore, MATPAC did not use a generic database management system to manipulate the data and this imposed certain limitations. For example, where ranges of area codes were used, the system had to assume that input data were sorted alphabetically. In addition, some users found that the system was not able to handle large volumes of data: anecdotal evidence suggests that the use of MATPAC within the Greater London Council was constrained to small datasets as the software could not handle large matrices effectively.

Where MATPAC struggled with usability, another early software development fared better, although perhaps at the cost of MATPAC’s greater functionality specific to interaction data. The National Online Manpower Information System (Nomis), developed initially by the Manpower Services Commission and the Universities of Newcastle-Upon-Tyne and Durham in the late 1970s (Townsend et al., 1987), provided access to a range of geographically referenced datasets. Nomis differed from MATPAC in a number of ways: firstly, it was designed as a
system to hold a variety of structured datasets at a variety of spatial scales, with the potential for new data to be added to the system at any time. Furthermore, data could be output in a range of formats. Nomis was also designed to be used more widely than MATPAC, with a user-base broader than the relatively narrow selection of academic users present there. Originally Nomis provided access to NHSCR migration data amongst the range of standard datasets. Access was facilitated through an interactive command interpreter: users would build up a query through typing textual commands and the system would respond with relevant messages. In order to download NHSCR data, the user would need to type character strings into the command interpreter to specify origins, destinations, age/sex characteristics and time periods of interest. The system would then respond with the desired output. The original Nomis system required end users to pay for the time it took to both extract data and to be connected to the system. This undoubtedly resulted in the limited use of the system by academics. Nomis was superseded in the 1990s by NomisWeb. The command interpreter was replaced by a more user-friendly interactive web interface, and NomisWeb is still in operation, now providing free online access to a range of census, survey and other population data. The NHSCR migration data which were originally available through Nomis are no longer accessible; however, interaction data in the form of the Special Workplace Statistics for 1981, 1991 and 2001 are available. Whilst relatively easy to use, the generic system which caters for both standard and interaction data has its limitations. It is possible to extract data for different numbers of origins and destinations of different geographies, but the range of geographies is limited, with local authority district being the finest grained geography available.
Other potentially useful features such as the choice to download interaction data either as a matrix or a pair-wise list, or to carry out some analysis functions before
download are not offered. Furthermore, with the disappearance of the NHSCR data from the system, Nomis no longer provides access to migration data – a significant loss. This brief review of historical attempts to provide electronic access to interaction data reveals that whilst providing access that would otherwise not have existed, both MATPAC and Nomis had a set of individual limitations which will have curtailed the use of interaction data by a wider audience. As a result of these general limitations, in the 1990s, work began on a new, flexible system which would provide access to census interaction data, avoiding the limitations of these earlier systems. It was recognised that providing easier access to these data would facilitate wider use and more research output. This new development was in part made possible by advances in computer hardware technology and in open source software, and would be known as the Web-based Interface to Census Interaction Data or WICID.
INTERFACE REQUIREMENTS

The construction of WICID had to take on board a number of key requirements. Firstly, given the overall aim of encouraging further use of the interaction data sets, it was fundamental to provide a user-friendly interface, facilitating query building, quick data extraction and simple downloading of files containing interaction flows in common formats for analysis in other software packages. The simplicity of the interface and the ease with which subsets of data can be extracted were considered to be of paramount importance, especially since one key group of potential users identified was students taking undergraduate and taught postgraduate modules in higher and further education institutions across the country. A second and related requirement was the provision of a library containing popular data subsets and their associated pre-defined queries, together
with a facility for allowing users to store their own customized queries and to retrieve those created previously and saved in their own workspace on the server. Each of these features was regarded as being conducive to simplifying the user’s role where possible, yet providing clear guidance. The third requirement was the need to provide a system that was flexible in the sense that new data sets, such as the 2001 Census interaction data, could be added as they became available and without having to undertake any major system redesign. The creation of a generic framework of metadata files embracing all the information components within the system has been particularly important in this context. Detailed explanation of the metadata framework is found in Duke-Williams and Stillwell (2002). Flexibility was also an important criterion behind the fourth requirement of the system which was to develop a simple method for the selection of relevant geographical areas. As mentioned previously, an extraction system for interaction data has to consider dual geographies of both flow origins and destinations. In its simplest form, this might involve the facility to select matching sets of origin and destination areas at one spatial scale. The complexity is introduced when we attempt to provide users with the option of selecting non-matching sets of origins and destinations at any one scale, but also to build queries involving origins and destinations selected from different geographical levels. Furthermore, a system was required which would handle immigration flows from overseas and flows from unstated origins. It was also necessary to allow intra-area flows to be identified, and included or excluded optionally, in a way which users could understand, particularly when these flows involved persons living and working at the same location (home workers). 
As well as providing flexibility in the selection of geographical origins and destinations, another critical requirement of WICID was to provide alternative methods to help users select the variables required.
One approach to presenting the variables to the user was through the use of the familiar OPCS/ONS published table structure; an alternative approach was to present users with a list of all variables contained in all tables, thus allowing users to specify the variable in which they were interested. In addition, it was considered important to allow users to create their own specific ‘derived variables’ from those that had been initially selected. Thus, for example, users of the five-year age group data in the 1991 SMS Set 2 would be able to select all the age groups and subsequently, within the query construction process, derive their own aggregations of age groups to fit their particular requirements. In addition, whilst ensuring system security and preventing unauthorised users from extracting data from sets for which they are not registered, it was deemed useful in the interests of promoting accessibility to create an open system that allowed unregistered users to explore the system and its data contents before registration. With this in mind, a sample set of migration data from the 1991 Census was included, involving persons moving within and between districts where the flows have been ‘blurred’ to preserve confidentiality. Finally, it was thought beneficial that WICID, once constructed, could be re-packaged and redistributed to be used with other, non-UK interaction data. The WICID system has been built from a number of linked software components including a web server (Apache), a database management system (PostgreSQL) and an application language (PHP: Hypertext Preprocessor).
All the software components are freely available (in contrast to the main alternative – a Microsoft IIS server, hosting an MS-SQL database, queried by an ASP scripting language – components of which would need to be purchased), permitting re-distribution of the entire WICID system (except the actual data sets) without licensing difficulties to other public or private organisations in the world who might want to provide access to their own
interaction data sets across the internet in a similar manner. We turn our attention in the next section to explaining the architecture of the system in the light of these interface requirements.
WICID: SYSTEM ARCHITECTURE AND METADATA

WICID has been designed to offer a user interface via a standard web browser and, as already indicated, is constructed from a number of linked software components including a web server, a database management system (DBMS) and an application language. It should be noted that the terms ‘web server’ and ‘DBMS’ refer to software programs and not to physical hardware. The architecture is illustrated schematically in Figure 1.

• Web server: Users interact with WICID by means of a web browser connected to a web server. Whenever a user clicks on a link within their browser, a request for an associated document is sent to the web server. Some requests are for pre-existing HTML documents that are returned directly by the server; other requests will be for dynamically created web pages. When the server receives a request for such a page, it runs a program that produces the page as its output. WICID uses Apache (http://www.apache.org) as its web server.
• Database management system (DBMS): This is used to store data of different types: primary migration and commuting flow data to which WICID provides access; metadata describing those primary data; and ‘state’ data that record details of the sessions of each logged in user. WICID uses PostgreSQL (http://www.postgresql.org) as its DBMS, which also provides support for the storage and manipulation of geometric features (i.e. points, lines, polygons, et cetera). In particular, a third party add-on to PostgreSQL called PostGIS (http://postgis.refractions.net/) offers facilities to handle spatial data that follow the OpenGIS ‘Simple Features Specification for SQL’ standard.
• Application language: In order for dynamic web pages to be created, a programming language is required. Such languages can either be incorporated into the web server or run externally. For any given user request, a program must produce output that can be sent directly to a web browser; typical output is thus syntactically correct HTML. The application language is also used to glue system components together, so the chosen language must also include suitable functions to send queries to the preferred DBMS and to be able to receive and interpret responses. The programming language for WICID is PHP (PHP: Hypertext Preprocessor) (http://www.php.net).

Figure 1. WICID architecture: Schematic framework
The WICID system thus consists of a number of static web pages, a set of PHP scripts that dynamically generate further web pages, and a collection of data that are held in the DBMS. It may be noted that the software components listed above are all free (that is, the source code for the
software is made available, and can be modified and redistributed). This permits redistribution of the entire WICID system (except the actual data sets) without licensing difficulties, and also permits modification of the software if necessary. The relationship between the software components used can be illustrated by considering the chain of events that occurs when a user requests a page from the WICID system. A request for a given page is sent from the user’s browser to the web server. The server then maps the requested URL to a local filename and processes the request. In the case of standard HTML format pages, the server simply returns the local file. In the case of a PHP script, the server executes the script and sends the results back to the user. The output of the PHP script must therefore be HTML formatted text, or any other format that will be readily understood by the browser. During execution of the PHP script, queries may be sent to a DBMS, which will process the query and return results to the PHP script. Use of a web browser and server to provide a system such as WICID offers the advantage of platform portability and user familiarity. Anyone with a suitable web browser can use the system, regardless of the operating system or specifications of their computer, and the user is required only
to be able to use that web browser, rather than to learn a new and different software package. However, the web model is limited in a number of respects, and it is necessary to address these limitations. There are two significant problems in the use of the web to provide a more sophisticated interface to a large and complex data set. Firstly, the interface is limited to the basic elements provided by HTML unless an additional language such as Java is used to provide applets that extend the functionality of the browser. The use of Java may be viewed as undesirable because it raises the barrier to entry from ‘any web browser’ to ‘any web browser capable of running Java applets’. On the other hand, restricting WICID to a purely HTML based system might seem unnecessarily cautious and could in fact lead to a significant loss of potentially useful functionality to the end user, especially when the vast majority of users, being based in the UK, will have relatively up-to-date software installed on their computers and find it easy to install the relevant Java plug-ins if needed. With this in mind, most of WICID was created with standard HTML. However, where the inclusion of an additional language such as Java was seen to provide significant benefits to users, it was included. An example of this is the inclusion of the interaction map selection tool in the system. Java is required to display the map, but once displayed it allows users without an encyclopaedic knowledge of UK boundary locations and contiguities to select areas from visual information appearing on the screen. The second limitation is that the web is a stateless medium. That is, all requests to the server are processed independently and irrespective of any previous requests made by the user. This is problematic if the interface is sufficiently complex that it will require more than one page.
The problem of statelessness is overcome by establishing a session that is identified with a unique session-id. The session-id is created when a user requests a page that is part of the WICID system for the
first time. Whenever a user requests a subsequent page, their browser automatically supplies the current session-id. At the end of each script, the session state (that is, the current value of important variables) is saved, and at the start of each script the values are retrieved, using the session-id as a key. The session-id variable is stored by WICID as a browser-side cookie. In order for the system to flexibly handle a variety of ‘primary’ data (that is, the actual raw migration and commuting data sets described earlier that were provided by the census offices), it is necessary to store metadata that describe these data. The information system can then use the information encoded within the metadata to handle each data set correctly. The metadata consist of a number of related tables. In accordance with standard relational database practice, the metadata tables are normalised as far as is practical; thus there are a relatively large number of small tables that hold lists of possible values that describe aspects of each data set. WICID has been designed to have few (or no) hard-coded assumptions about the data that it handles. For example, although it is intended for use with UK census data, the metadata model makes no assumptions about the different geographies to be included, and thus the system could be extended to handle data from other countries with little difficulty. Instead of pre-determined assumptions, all aspects of the primary data sets are described in the metadata, and are looked-up whenever a dynamic page is produced. The main exceptions to this data-independence come in the form of the help system, background information and narrative material included in the various pages; all of which are written specifically for UK census interaction data. 
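The idea of describing data sets in normalised metadata tables, rather than hard-coding assumptions about them, can be sketched as follows. This is a minimal illustration in Python with SQLite, not the WICID implementation (which uses PHP and PostgreSQL); the table names, column names and data set identifiers are all hypothetical and far simpler than the real schema.

```python
import sqlite3

# Hypothetical, simplified metadata tables; not the actual WICID schema.
SCHEMA = """
CREATE TABLE geographies (
    geog_id TEXT PRIMARY KEY,
    label   TEXT NOT NULL
);
CREATE TABLE datasets (
    dataset_id  TEXT PRIMARY KEY,
    label       TEXT NOT NULL,
    origin_geog TEXT NOT NULL REFERENCES geographies(geog_id),
    dest_geog   TEXT NOT NULL REFERENCES geographies(geog_id)
);
"""


def open_metadata():
    """Create an in-memory database holding the illustrative metadata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO geographies VALUES (?, ?)",
        [("dist", "Districts"), ("ward", "Wards")],
    )
    conn.execute(
        "INSERT INTO datasets VALUES (?, ?, ?, ?)",
        ("demo_migration", "Demonstration migration set", "dist", "dist"),
    )
    return conn


def describe_dataset(conn, dataset_id):
    """Look up what a dynamically generated page needs to know about a data set."""
    row = conn.execute(
        """
        SELECT d.label, o.label, t.label
        FROM datasets AS d
        JOIN geographies AS o ON o.geog_id = d.origin_geog
        JOIN geographies AS t ON t.geog_id = d.dest_geog
        WHERE d.dataset_id = ?
        """,
        (dataset_id,),
    ).fetchone()
    if row is None:
        raise KeyError(dataset_id)
    return {"dataset": row[0], "origins": row[1], "destinations": row[2]}


conn = open_metadata()
# A data set that arrives later needs only a new metadata row, not new code.
conn.execute(
    "INSERT INTO datasets VALUES (?, ?, ?, ?)",
    ("demo_commuting", "Demonstration commuting set", "ward", "dist"),
)
description = describe_dataset(conn, "demo_commuting")
print(description)
```

The lookup function never names a particular data set or geography, so the same code serves any data set whose description has been loaded into the metadata tables.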
This flexibility is very important, as one of the key design criteria for WICID is for the system to be able to handle data sets that do not yet exist but will arrive at some future date, and the precise specification of which is not yet known. However, there is a requirement that the data sets to be used in WICID
must adhere to a basic format: that the data can be represented as a series of vectors, where each vector describes a flow between a single origin and a single destination. The vector should have a structure that includes an origin identifier, a destination identifier, and a set of fields giving information about the flow between the origin and the destination. This information will generally be a set of counts disaggregating the flow in some manner, and it is assumed that the data fields can be divided into one or more portions, described as ‘tables’, that disaggregate the flow (or subsets of it) in different ways. The 1981, 1991 and 2001 SMS and SWS data sets follow this format. At present, WICID assumes a rule that the vectors must be aggregate observations and must be coded with a unique pairing of origin and destination. However, this rule does not lead to many assumptions within the application code, and could be over-ridden by modifying the SQL statements used to extract data from the DBMS. The overall metadata structure and tables involved are illustrated in Figure 2. There are a small number of additional metadata tables that are also
used in the system but not shown; these are typically ‘views’ derived from the existing tables shown. The metadata tables are divided up into several groups, although they are not all mutually exclusive. The central table is meta_datasets, which lists all the interaction data sets held in the system. A number of groups of tables are also shown under ‘Attribute information’, ‘Geography information’, ‘Data set information’ and ‘Administrative information’. Lines linking tables indicate a relationship between the metadata tables. The line joining meta_datasets and meta_geog, for example, represents the fact that two geographic codes are held for each data set listed in meta_datasets and the values of these codes must be drawn from the set extant in meta_geog. Given the general metadata structure outlined above, WICID enforces a set of requirements concerning the types of fields used to hold the data, and the names of all fields used in the database. This is done so that the name of any field can be derived internally wherever needed using a fixed set of rules. Data to be loaded into the system must therefore be coded appropriately and all fields given their ‘correct’ names.
Figure 2. The structure of the metadata tables
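The basic data format described above, a vector per origin-destination pair carrying one or more ‘tables’ of counts, can be pictured with a small Python sketch. The class name, field names, area codes and counts are all hypothetical; the sketch only mirrors the structure and the unique-pairing rule stated in the text, and says nothing about how WICID actually stores its records.

```python
from dataclasses import dataclass


@dataclass
class FlowVector:
    """One aggregate flow between a single origin and a single destination."""
    origin: str
    destination: str
    tables: dict  # table name -> list of counts disaggregating the flow


def check_unique_pairs(vectors):
    """Enforce the rule that each origin-destination pair appears only once."""
    seen = set()
    for vec in vectors:
        pair = (vec.origin, vec.destination)
        if pair in seen:
            raise ValueError(f"duplicate origin-destination pair: {pair}")
        seen.add(pair)
    return len(seen)


def table_total(vec, table_name):
    """Sum the counts held in one table of a flow vector."""
    return sum(vec.tables[table_name])


# Invented example: the same 12 movers disaggregated in two different ways.
vectors = [
    FlowVector("A01", "B02", {"by_sex": [7, 5], "by_age": [3, 6, 3]}),
    FlowVector("B02", "A01", {"by_sex": [4, 4], "by_age": [2, 5, 1]}),
]
pair_count = check_unique_pairs(vectors)
print(pair_count, table_total(vectors[0], "by_sex"), table_total(vectors[0], "by_age"))
```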
BASIC INTERFACE FUNCTIONS

Although WICID has a guest function that allows anyone to log on and explore the system, those wishing to extract and download interaction data are required to become registered census users and to be part of a recognised higher or further education institution or have an individual Athens username and password. The Census Portal facilitates access to the census data resources for UK higher and further education via Federated Access Management (Shibboleth) or Athens Single Sign On at http://census.ac.uk/. Currently the system is only available to members of the academic community in the United Kingdom. Once registered, access to the CIDER homepage is at http://cider.census.ac.uk/ as shown in Figure 3. Clicking on the WICID button on the homepage will start the system.
Figure 3. The CIDER homepage
In technical terms, one of the major challenges in developing WICID has been to allow users the flexibility of selecting zones from different geographical scales and allowing the origin zone set to be different from the destination zone set. In practice, this means that the system has to respond to the following type of query: extract all the commuting flows into the centre of a particular city (designated by a central ward) from other wards in the city, from surrounding districts, from other counties or regions and from the rest of the
Figure 4. The WICID query interface
world. From a user perspective, this procedure is facilitated by using a traffic light metaphor (Figure 4) to ensure that the variable selections and the geographical settings have been completed satisfactorily before the query is allowed to run. Users (students) can refine the query prior to extraction e.g. to set the diagonal (intra-zone) flows to zero if required. Originally geographical settings and variables could be selected in whichever order the user required. During an update of the WICID system
Figure 5. Tables available in 2001 SMS level 1
in 2007, it was decided that this flexibility was unnecessary and provided little benefit to the user. Inexperienced users especially would encounter confusion when selecting geographies that related to one time period before attempting to select data from another. As is shown in Figure 4, WICID now guides the user through the query process through a logical order of tabbed pages. Users are now required to select their data first, before being able to select from a range of geographies only associated with their data time period. In the first ‘Data’ step of the query building process, the user is presented with a set of data selection methods: ‘Quick selection’ allows the selection of pre-defined totals that are available with certain data sets; ‘Select by variable’ enables particular variables to be found quickly; ‘Select by data set and table’ allows the selection of cells in tables as described in published documentation and is the most commonly used method. This final mode of selection also differs from the first two in that additional steps asking the user to select migration or commuting data as well as a data year are presented before individual data tables can be examined. Once a data set has been selected, the user chooses the table required (Figure 5) and finally the variable(s) from the selected table. Figure 6
indicates that 24 interior cell values are available to be selected from 2001 SMS Level 1, table 3: these relate to the seven ethnic groups with male, female and total values available for each ethnic group, as well as totals for each sex and a grand total. Once cells have been selected and added to the query, a green traffic light on the ‘Data’ tab indicates that the user is able to proceed to select origin and destination geographies. Of course, at this point the user is not restricted to selecting only data from this table; it is entirely possible to select additional data from other tables held within the system. After clicking ‘Geography’ on the query interface, the user is asked to select either origins or destinations first, before being transferred to a screen indicating the available selection tools. There are various methods of selecting geographical areas (Figure 7): ‘Quick selection’ enables all areas at a certain scale to be selected; ‘List selection’ allows areas to be chosen from a list of all areas at each scale; and ‘Type-in-box’ selection provides for one area to be selected at a time, which is useful if the name of the area is known and the user does not want to scroll through a large list of options. Furthermore, there is a ‘Map selection’ tool which allows users to select areas by clicking them on a map. The ‘Copy Selection’ facility is
Figure 6. Cells of table 3 in 2001 SMS level 1
very useful if a user wishes to select data for the same origins and destinations as it automatically sets the remaining geographies to whatever origins or destinations have already been selected. The most common method used for selecting areas tends to be the list selection method (Figure 7). If, for example, the user chooses 2001 interaction data districts, the screen shown in Figure 8 will appear and the user can make the desired selection by checking the appropriate boxes and clicking on ‘Add chosen areas’. Within the list selection options a useful mechanism is available
Figure 7. Area selection tools available
which allows users to select all areas of a lower level geography that fall within a higher level; i.e. to select all wards in one district by selecting that single district. This facility is particularly useful at the level of output area for 2001 data where only area codes rather than names exist. Different sets of origins and destinations can be selected and each set may contain areas at different spatial scales so long as data exists in the database for this level. Consequently, results can be produced for asymmetric matrices in which the origin and destination areas are completely different, e.g. flows between boroughs of London and districts of the South West region of England.

Figure 8. List selection of districts

One of the key problems that became evident when using the system in the classroom (Stillwell, 2006b) was that users were frequently unfamiliar with the geographical units that they wanted to select for sets of origins and destinations. This was particularly so when identifying origin zones that, for example, were contiguous with one destination zone, e.g. what are the counties that are contiguous with Greater London? What
are the districts that are contiguous with the Leeds metropolitan District boundary? In this context, it became necessary to develop a tool that enabled users to view a map of the area in which they were interested on the screen and to toggle between the different spatial scales, making boundaries visible when appropriate and providing the facilities to select areas by pointing and clicking on areas on the map. As a result, the Map Selection tool was developed in order to assist with this problem and to help users less familiar with UK geographical
Figure 9. The map selection tool
boundaries (Figure 9). The map selection tool has been facilitated in part by an add-on to the PostgreSQL relational database known as PostGIS. We have found that this facility is particularly important when users are students doing project work, especially for small areas whose precise locations and names they will not be familiar with. All the area selection methods require that users initially choose the spatial scale or level of geography for which they wish to extract data in the first instance. For example, in 2001 users have the option of four levels of data – output areas, wards, districts and Government Office Regions. Users can then zoom in and out on the map, before clicking on a particular area if they wish to select it; the selection being signified by the area turning grey (Figure 10). The map can also be furnished with additional information to assist the user. Area names or codes can be added, thus speeding up selection, especially where the user is not immediately able to identify an area from its shape. As with the list selection, users are also able to select all areas of one geography that fall within another. For example, by clicking on Bexley, the user could also select all wards or output areas within.
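Selecting every lower-level area that falls within a higher-level area amounts to a lookup against a table relating the two geographies. A minimal Python sketch follows; the ward codes are invented placeholders and the lookup dictionary is an assumption made for illustration, not the structure WICID uses.

```python
# Hypothetical ward-to-district lookup; ward codes are invented placeholders.
WARD_TO_DISTRICT = {
    "W001": "Bexley",
    "W002": "Bexley",
    "W003": "Bromley",
    "W004": "Bexley",
}


def areas_within(parent, lookup):
    """Return all lower-level areas whose parent is the given higher-level area."""
    return sorted(child for child, owner in lookup.items() if owner == parent)


def expand_selection(parents, lookup):
    """Expand a selection of higher-level areas into their constituent areas."""
    selected = []
    for parent in parents:
        selected.extend(areas_within(parent, lookup))
    return selected


bexley_wards = areas_within("Bexley", WARD_TO_DISTRICT)
print(bexley_wards)
print(expand_selection(["Bexley", "Bromley"], WARD_TO_DISTRICT))
```

A lookup of this kind is what makes the facility practical at output area level, where a user could not reasonably pick areas one by one from unnamed codes.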
Once the selection of origins and destinations has been made, the data can be extracted and downloaded. Before download however, users can review and adapt and save their query. The tabbed query builder will open the ‘Finalise’ screen where a number of options are presented (Figure 11): Applying an intra-area flow filter allows the user to either select only intra-area flows, or alternatively to omit these flows from the data output; the query summary will show on screen all selected origins, destinations and data cells; the save option allows users to save queries – particularly useful where complex sets of origin and destination geographies have been created. Once any final alterations have been made to the query, clicking the ‘Run’ tab will send a request to the database to extract the data. The time taken to perform the query will depend upon the amount of data to be extracted. When the extraction has been completed the user will be informed of the time taken to complete the procedure (Figure 12) and invited to continue, at which point the data can be shown on screen (if not too extensive), downloaded or subjected to a suite of inbuilt analysis functions. Figure 13 contains an example of a simple query for extracting migration flows from the 2001 SMS
Figure 10. Example of areas being selected using the map selection tool
Access to Census Interaction Data
Figure 11. The finalise screen
Table MG 101 between the four countries of the UK and the output, as presented on screen within WICID prior to downloading, is shown in Figure 13b. Other examples of queries can be found in Stillwell and Duke-Williams (2003) and Stillwell (2006a). The first five rows and columns are always printed to the screen to allow initial verification
that the required data have been extracted. If the data are to be downloaded, the user can choose from various formatting, labelling and layout options. Flows can be downloaded in matrix format or as a list for each origin-destination pair. Prior to download, WICID allows users to undertake some analysis of the data extracted. This feature will be described in
Figure 12. Screen indicating extraction has been completed
Figure 13. Example of simple query and data extracted
more detail along with some other additional interface functions below.
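The intra-area flow filter and the two download layouts described above can be illustrated with a minimal sketch. It is not WICID code: the function names, the record layout and the flow counts are all hypothetical and are used only to show the idea.

```python
# Hypothetical flow records: (origin, destination, count). Values are invented.
flows = [
    ("England", "England", 120),
    ("England", "Wales", 7),
    ("Wales", "England", 9),
    ("Scotland", "Scotland", 40),
]

def filter_intra_area(records, mode):
    """Mimic an intra-area flow filter.

    mode="only" keeps flows whose origin equals their destination;
    mode="omit" drops them; any other value returns the records unchanged.
    """
    if mode == "only":
        return [r for r in records if r[0] == r[1]]
    if mode == "omit":
        return [r for r in records if r[0] != r[1]]
    return list(records)

def to_matrix(records, origins, destinations):
    """Lay out a list of origin-destination records as a dense matrix.

    Pairs with no record are filled with 0, i.e. no observed flow.
    """
    lookup = {(o, d): n for o, d, n in records}
    return [[lookup.get((o, d), 0) for d in destinations] for o in origins]

areas = ["England", "Wales", "Scotland"]
inter_only = filter_intra_area(flows, "omit")
matrix = to_matrix(inter_only, areas, areas)
for origin, row in zip(areas, matrix):
    print(origin, row)
```

The list layout stores only pairs with a record, whereas the matrix layout makes empty cells explicit, which is why the list form tends to be more compact for sparse interaction data.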
SOME ADDITIONAL INTERFACE FUNCTIONS In addition to its standard interface functions, WICID offers a selection of further facilities which enhance and add value to the data. One such example is the facility to download population data in tandem with selected migration data. The varying sizes of areas and the populations within them mean that the comparison of raw migration flows can be misleading. It is standard practice in migration research to compute rates of movement which facilitate direct comparison of areas with different sized populations. When calculating rates, a population at risk (PAR) denominator is used. As outlined by Bell et al. (2002), it would be preferable to use a PAR measured at the beginning of a transition period rather than the end, due to the potentially distorting effect of births, deaths, immigration and emigration. However, this is difficult in practice (and in many cases impossible if population data are to be drawn from the same source as the interaction data), so end-of-period PAR are used instead. All PAR used in WICID are associated with the census dates. Whilst researchers using interaction data have always been able to calculate rates of movement by accessing PAR data from other sources, the data may not necessarily be aggregated to the appropriate geographical level, or indeed available from a single easily accessible place. Within WICID, where corresponding population data are available for migration or commuting data, a facility is available to download PAR along with the flow data. As is shown in Figure 14, the system searches for appropriate population data and, when they are found, gives the user the option of attaching the data to the output. Importantly, WICID will aggregate these population data to the appropriate origin and destination geographies, saving the user significant time when calculating rates post-download. The option to derive variables is another key additional function which allows users to customise their output. For example, with the ethnic data shown above in Figure 14, it may be that the user is only interested in examining the flows of white migrants in comparison with all other ethnic groups. Rather than download the data for all other groups separately (e.g. Indian, Chinese, Mixed, et cetera) and then perform a lengthy aggregation procedure post-download, by selecting the ‘Derive’ new variables option when selecting data, it is possible to aggregate any number of variables together before download. In this example, all other ethnic groups could be grouped into a variable ‘other ethnic groups’.
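The two ideas above, deriving an aggregated variable and dividing flows by a population at risk, can be sketched together as follows. This is an illustration under stated assumptions rather than a description of how WICID is implemented: the area names, counts, populations and helper functions are invented, and the rate is expressed per 1,000 persons purely as a convention chosen for the example.

```python
# Hypothetical out-migration counts by origin area and ethnic group (invented values).
out_migrants = {
    "Area A": {"White": 900, "Indian": 40, "Chinese": 10, "Mixed": 50},
    "Area B": {"White": 300, "Indian": 120, "Chinese": 30, "Mixed": 50},
}
# Hypothetical end-of-period populations at risk (PAR) for the same areas and groups.
par = {
    "Area A": {"White": 45000, "Indian": 2000, "Chinese": 500, "Mixed": 2500},
    "Area B": {"White": 10000, "Indian": 6000, "Chinese": 1500, "Mixed": 2500},
}

def derive(counts, new_name, members):
    """Collapse the categories in `members` into a single derived category."""
    derived = {k: v for k, v in counts.items() if k not in members}
    derived[new_name] = sum(counts[k] for k in members)
    return derived

def rate_per_thousand(flow, population):
    """Migration rate per 1,000 population at risk; None if the PAR is zero."""
    return None if population == 0 else 1000 * flow / population

others = ["Indian", "Chinese", "Mixed"]
rates = {}
for area in out_migrants:
    # The same grouping is applied to the flows and to the PAR so that
    # numerator and denominator refer to the same population.
    flows_d = derive(out_migrants[area], "Other ethnic groups", others)
    par_d = derive(par[area], "Other ethnic groups", others)
    rates[area] = {g: rate_per_thousand(flows_d[g], par_d[g]) for g in flows_d}

for area, groups in rates.items():
    print(area, groups)
```

In the invented data, Area A sends three times as many white migrants as Area B in absolute terms, yet its rate is lower, which is the kind of distortion that a PAR denominator is intended to remove.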
This derive function has proved particularly useful where age group variables are being downloaded – the default option providing a large number of
Figure 14. WICID automatically finding PAR data from selected migrant flows
age groups of varying sizes. ‘Derive’ enables the user to standardise the range of each age group before download. As mentioned previously, WICID has a number of built-in analysis functions which allow selected data to be analysed before being downloaded. The analytical facilities comprise a set of basic descriptive statistics and a suite of indicators that WICID will compute for a choice of any five of the counts that have been selected in the query. These include:
• Standard descriptive statistics: produced either for the whole matrix or for origins or destinations, the statistics include the minimum, maximum, mean and median flow counts, as well as standard deviation and coefficient of variation statistics for whichever variables are selected.
• Correlation statistics: produced either for the whole matrix or for selected origins and destinations, Pearson’s correlation coefficients are computed for selected pairs of variables.
• Distance travelled: for the variables selected, the mean or median distance travelled is calculated either for the whole matrix of flows, or for flows out of or into the origins and destinations selected. These distances are ‘as-the-crow-flies’ distances in kilometres, metres or miles, calculated using a distance matrix constructed from the Euclidean distances between the centroids of each origin and destination zone.
• Indices of connectivity and migration inequality: statistics calculated for the whole matrix and for origins and destinations. As described in Stillwell et al. (2005), connectivity measures the number of pairs of zones with a flow between them, and divides the figure by the total number of zones in the system. This produces a score between 0 and 1 which can be interpreted as a percentage. A score close to 1 (100%) for any variable signifies that flows for that variable between all possible areas are high. A low score signifies flows are more concentrated between a few areas in the system. The index of inequality first calculates the observed mean flow for all origins and destinations in the system. A score of 0 indicates that the flows for that variable are identical to the mean of the system. A score of 1 suggests a total difference with the mean flows in the system.
• Index of migration effectiveness: otherwise known as the index of migration efficiency, this measure gives a rate of net movement as a proportion of the total migration moves experienced by an area. It is a useful way of standardising net movements when suitable PAR are not available. WICID will calculate this index either for the whole matrix or for specific selected areas.
Figure 15. WICID help system opening inside a new browser window
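Three of the indicators listed above can be sketched in a few lines. The formulas below are common textbook forms and should be read as assumptions, not as WICID's exact definitions: connectivity is taken here as the share of possible inter-zone pairs with a non-zero flow (so that the score lies between 0 and 1), effectiveness as net migration expressed as a percentage of gross migration, and mean distance as a flow-weighted average of straight-line centroid distances. The zones, coordinates and flows are invented.

```python
import math

# Hypothetical zones with centroid coordinates in kilometres (invented values).
centroids = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}
# Hypothetical inter-zone flows keyed by (origin, destination).
flows = {("A", "B"): 30, ("B", "A"): 10, ("A", "C"): 10}

def connectivity(flows, zones):
    """Share of possible inter-zone pairs that have a non-zero flow."""
    possible = len(zones) * (len(zones) - 1)
    linked = sum(1 for (o, d), n in flows.items() if o != d and n > 0)
    return linked / possible

def effectiveness(flows, zone):
    """Net migration as a percentage of gross migration for one zone."""
    inflow = sum(n for (o, d), n in flows.items() if d == zone and o != zone)
    outflow = sum(n for (o, d), n in flows.items() if o == zone and d != zone)
    gross = inflow + outflow
    return None if gross == 0 else 100 * (inflow - outflow) / gross

def mean_distance(flows, centroids):
    """Flow-weighted mean straight-line distance between zone centroids."""
    total = sum(flows.values())
    if total == 0:
        return None
    travelled = sum(n * math.dist(centroids[o], centroids[d])
                    for (o, d), n in flows.items())
    return travelled / total

print(connectivity(flows, centroids))
print({z: effectiveness(flows, z) for z in centroids})
print(mean_distance(flows, centroids))
```

A negative effectiveness score indicates net loss through migration and a positive score net gain; values near zero indicate that inflows and outflows largely cancel out.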
The final additional function of the WICID interface is the WICID help system (Figure 15) that aims to provide support online for any of the problems that users may encounter when building a query. At the bottom of every page in the WICID interface is a ‘WICID Help’ button. Clicking on this button will open up the customised WICID help system at the appropriate page. On every page in the system is detailed guidance about the particular part of the query building process the user is at. There is a navigation tree to the left
which allows users to choose which part of the system they wish to view if the current page is not the appropriate one.
ADMINISTRATIVE FUNCTIONS From an administrator’s perspective, it is very important to build a system that allows the monitoring of usage rates, data downloads and personal usage profiles. Furthermore, a system which allows for easy maintenance of datasets, geographies and their associated metadata is crucial if the intended flexibility is to be realised. WICID incorporates a range of system administration tools designed to aid the ongoing monitoring and maintenance of the system (Figure 16). A range of usage statistics is collected by WICID. These are useful for helping develop and enhance the service offered to users, as well as for helping quantify the value of the continued provision of the service. Through examining which institutions make most and least use of WICID and comparing these data with similar data from other census data support units, we have been able to
Figure 16. System administration within WICID
target both support and interface orientation sessions where necessary. Usage can be monitored on a monthly or quarterly basis, and by individual account reference, institution, event (e.g. start session, extract data, view statistics, download query, view results on screen et cetera), data set and account type. As was outlined in the list of interface requirements, one of the important features in the development of WICID is the ease with which new data sets can be included in the system as and when they become available. These data include the flow data that form the core of the system, the geography data that contextualise the flows and the metadata which link the flows and the geographies together. Within WICID system administration, there are a number of facilities which, once the raw geographies or flow data tables have been added to the database, enable the administrators to update the relevant metadata tables which, in turn, add the new data to the suite of pre-existing data available to the end-user. This facility for enabling ease of update is especially important where versions of WICID using non-UK data and running elsewhere in the world need to be updated by local administrators.
CONCLUSION This chapter has focused on the issue of access to interaction data, and has demonstrated how the development of the Web-based Interface to Census Interaction Data (WICID) has facilitated an ease, simplicity and breadth of access where previously options were limited because existing software was either difficult and cumbersome to use or did not provide the detail and flexibility that would allow widespread use of these important data sets. We have outlined the need for a user-friendly and highly flexible system which would facilitate quick and easy access to a wide range of interaction data at a wide range of spatial scales, and which would be able to cope with new data and geographies as they became available. Through detailed explanation of the system architecture and metadata, we have demonstrated the feasibility of such a system before providing examples of the finished user interface from a user perspective. We have shown that the structured query building process in WICID enables users to build, with ease, simple queries based on single variables for a symmetric set of origin and destination regions or more complex queries with data from multiple tables,
derived variables, origins and destinations of different numbers and sizes and with accompanying population at risk data, before either analysing or downloading data in a variety of formats. The flexibility of WICID means that it will be able to incorporate whatever new interaction data become available in the future. It is likely that interaction data from the 2011 Census will arrive in a similar format to previous censuses and WICID would be a suitable host, particularly enabling comparisons between data from previous censuses. Whilst the future of the census beyond 2011 remains unclear, the flexibility of the data entry procedures for WICID means that the system is capable of hosting non-census interaction data sets, perhaps supplied by a population register system. The simple addition of a time identifier/ variable to the standard origin/destination/variable format currently in place would enable annual or quarterly flow data to be incorporated into the system, providing users with opportunities for monitoring migration on a more continuous basis.
REFERENCES
Bell, M., Blake, M., Boyle, P., Duke-Williams, O., Rees, P., Stillwell, J., & Hugo, G. (2002). Cross-national comparison of internal migration: issues and measures. Journal of the Royal Statistical Society. Series A (Statistics in Society), 165, 435–464. doi:10.1111/1467-985X.00247
Duke-Williams, O., & Stillwell, J. (2002). Web-based access to complex UK census datasets. In IASSIST 2002 Conference, University of Connecticut, CT.
Manchester Computing. (1989). MATPAC User’s Manual, NAT 664.
ONS. (2006). Neighbourhood Statistics Programme - Evaluation Report. London: Office for National Statistics.
Stillwell, J. (2006a). Providing access to census-based interaction data: that’s WICID. The Journal of Systemics Cybernetics and Informatics, 4(1), 63–68.
Stillwell, J. (2006b). Using WICID (Web-based interface census information data) in the classroom. The Journal of Systemics Cybernetics and Informatics, 4(6), 106–111.
Stillwell, J., & Duke-Williams, O. (2003). A new web-based interface to British census of population origin-destination statistics. Environment & Planning A, 35(1), 113–132. doi:10.1068/a35155
Stillwell, J., Duke-Williams, O., Feng, Z., & Boyle, P. (2005). Delivering census interaction data to the user: data provision and software development. Working Paper 05/01, School of Geography, University of Leeds, Leeds.
Townsend, A. R., Blakemore, M. J., & Nelson, R. (1987). The Nomis database - availability for users and geographers. Area, 19(1), 43–50.
Chapter 3
Interaction Data: Confidentiality and Disclosure
Oliver Duke-Williams
University of Leeds, UK
ABSTRACT As we saw in Chapter 1, interaction data sets have been derived from a number of sources, including censuses, other surveys and from a range of administrative sources. These typically have the characteristic that the data form large, sparsely populated matrices. Where the matrices do have non-zero values, those numbers are often small. This is highly significant when confidentiality is concerned – small numbers in aggregate data are generally seen as representing an increased risk of disclosure of data. This chapter looks at confidentiality issues with particular regard to interaction data. Different types of disclosure are considered, together with the reasons why interaction data are thought to pose particular disclosure problems. Methods of disclosure control are outlined, and then two particular methods are studied: those used in the 1991 and the 2001 UK Censuses. The methods used and the extent of their effects are described, and suggestions for how best to use the affected data sets are given.
DOI: 10.4018/978-1-61520-755-8.ch003
INTRODUCTION The first chapter of this book introduced a variety of interaction data sets that exist in the UK, generated from censuses and a variety of administrative data sources. These vary in nature in many ways – some are based on population samples, whilst others are based on a theoretically complete census; in the case of migration data, some data sources capture
migrants over a transitional period, whilst others capture information about specific migration events. They are also made available in different ways and with varying degrees of freedom of access. A common theme, however, is that, because the data are about individuals, safeguards are taken to ensure confidentiality in the data released for research. This chapter develops ideas about confidentiality of interaction data and measures that have been taken to prevent disclosure of information about individuals. The data sets that are discussed in this
chapter share the general characteristic that they are built up from records that refer to individuals. This introduction starts by describing some of the key terms that shape the chapter. What do we mean by ‘confidentiality’ and ‘disclosure’? More specifically, what do we mean by these terms in the context of interaction data comprised of individual records from censuses, surveys and administrative registers? Given ideas of confidentiality and disclosure, how do statistical agencies go about ensuring that confidentiality is maintained? In defining these terms, it will be seen that in large part this is done through modifications to the data before they are released. This chapter goes on to consider some specific ways in which interaction data have been modified in order to make them ‘safe’ to release. How significantly have the data been affected? What can users do to accommodate these modifications in their research? The chapter focuses on procedures used in the 1991 and 2001 Censuses. What is confidentiality? Confidentiality is a term that refers to preventing disclosure of information to unauthorised parties. It applies to many different sorts of data. In this chapter, confidentiality will be discussed with respect to personal data, although it is also often considered in relation to business data. For personal data, confidentiality is a concept closely related to privacy. Public and private agencies generally have legal and ethical obligations to ensure that they maintain confidentiality of the data that they collect. It is usually argued that for statistical agencies, being seen to ensure confidentiality is an important element of building public trust, and that increased levels of trust lead to improved response rates. However, Singer et al. (1993), studying the 1990 US Census, argued that trust in confidentiality had only a limited effect on response rates and that this relationship varied for black and white respondents.
The effect of trust may vary depending on the nature of the survey taken: in the case of a sample study, the individual has the ability to opt out, whereas in the case of
a census the individual faces legal coercion to complete a census form. Confidentiality for public data has two main aspects: first, confidentiality must be maintained over raw data. Thus, statistical agencies must ensure that data are stored and processed (either in-house or via sub-contractors) in a secure manner, without inadvertent or deliberate disclosure. Confidentiality of raw data is usually ensured by appropriate data security arrangements, and by the threat of legal penalties against employees or sub-contractors should they disclose information. Most recent media stories about problems of data protection such as the child benefit data loss (Poynter, 2008) focus on actual or potential confidentiality breaches through failures of internal data security. The second aspect of confidentiality arises when the data are released, and it is on this area that this chapter focuses. A combination of tactics are used to ensure confidentiality in released data. Some data sets require individual or corporate users to sign license agreements; these typically contain legal undertakings not to disclose information relating to individuals. However, legal protections alone are not usually considered sufficient to ensure that confidentiality will be maintained, and thus further measures are also taken. These further measures take the form of statistical disclosure control methods which modify the data that are to be publicly released, in order to reduce the risk of disclosure. There are a number of different approaches to disclosure control, and there are multiple variants of general approaches. Some of these are described below in the section on imputation and disclosure control. The question of ‘what is confidentiality’ is thus generally answered by stating that confidentiality involves avoiding disclosure; this naturally raises the question ‘what is disclosure?’. Broadly speaking, disclosure relates to the unintended release of information about individuals. 
Table 1. Table illustrating attribute disclosure
Marital status                 Male    Female
Single                         2       0
Married                        4       0
Separated/widowed/divorced     2       3

Lambert (1993) identified a number of ambiguities related to the concept of disclosure, and noted that there are differing definitions of what constitutes disclosure, and that different parties – legislative bodies and survey respondents, for example – may have different opinions about whether disclosure has occurred. A question may also be raised as to whether disclosure has occurred if the supposedly disclosed information is not in fact correct. Two types of disclosure are usually identified in the literature: identity disclosure and attribute disclosure. Identity disclosure or identification occurs when an individual is linked to a record or entry in a released data set, through reference to a known set of key variables. Identity disclosure in micro-data is known as re-identification and is a serious problem, because all attributes in the micro-data record are automatically disclosed. Identity disclosure can occur in aggregate data if a person can be ‘identified’ in a published cross-tabulation. This can occur through cell values of ‘1’ in an unmodified output table, indicating that there is only one person with a particular combination of characteristics. Attribute disclosure occurs when additional information about an individual is revealed in disseminated data. As mentioned above, attribute disclosure occurs as a consequence of identity disclosure in micro-data releases. In aggregate data, attribute disclosure can occur if the distribution of cell values in a table (or set of tables with common variables) allows an intruder to determine that an individual with known characteristics a and b must also have the previously unknown characteristic c. An example of this is shown in Table 1 which shows counts by sex and marital status for all persons in an area i. If a data intruder
knows that an individual is female and lives in area i, then they can deduce her marital status; this is an example of attribute disclosure. In the UK – as with other countries – confidentiality is supported by various elements of legislation. In general, data in all forms held on individuals are subject to the Data Protection Act (DPA) 1998. The DPA gives certain rights to individuals, for example the right to view data held regarding themselves; it also establishes a number of principles that must be followed by data controllers and data processors. Amongst these principles are various provisions ensuring appropriate security of data. The Act also defines a number of categories of ‘sensitive’ personal data, and establishes stronger rules for the keeping and processing of data of this kind. Sensitive data include a person’s racial or ethnic origin, political opinions, religious beliefs, trade union membership, physical and mental health, sexual persuasion and actual or alleged criminal offences and related proceedings or sentences. Personal data therefore include information that people might consider to be ‘public’ and information that is ‘private’. ‘Public’ personal data include information such as one’s name and address: information that is readily given out on a daily basis (indeed, it would make everyday life extraordinarily awkward were one to refuse to give this information to other people). ‘Private’ personal data include information such as one’s political opinions or affiliations: it is not necessary to give out this information on a regular basis, and many would prefer not to do so. Clearly, however, the division between the public and the private is a fuzzy and subjective boundary that will be placed differently by different people. Yet
from a data protection point of view, all personal data – both public and private – have to be handled very carefully. In the case of census data, additional protections are established by specific pieces of legislation. The Census Act 1920 concerned the taking of a census, and set out penalties for failing to comply, but did not directly address the subject of data confidentiality. It was amended by the Census (Confidentiality) Act 1991, which specifically penalised the disclosure of ‘personal census information’ (and also of disclosing information known to have been disclosed in contravention of the Act). ‘Personal census information’ was defined as meaning any census information which relates to an identifiable person or household. Some census questions are usually considered to be more sensitive than others. In the UK, recent censuses have been preceded by a debate on the merits of including various new questions, and this debate has included the possibility of questions about personal or household income. The ‘income question’ is typically seen as intrusive and has not been included in any UK census to date, due to fears that such a question would harm overall response rates. Another area of sensitivity has been religion – the 1920 Census Act explicitly ruled out the inclusion of questions about religion in the census (in Great Britain), and the 2000 Census Act was required to remove this exclusion and pave the way for the inclusion of such a question in the 2001 Census. The final key piece of legislation relevant to the UK census was the Public Records Act 1958; which was used to establish the ‘100-year rule’ that census returns would be made available for public inspection 100 years after the census was taken. In summary then, disclosure is the unintended release of information about individuals, and confidentiality is the prevention of disclosure. 
For census and other data, the view is generally taken that complete avoidance of disclosure is impractical; rather, steps are taken to minimise the risk of disclosure. Interaction data present
statistical agencies with particular problems because of their sparse nature. The use of two geographical identifiers (origin and destination) forces an additional level of disaggregation on all interaction data that rapidly leads to small numbers occurring in cells of output tables. Assume that the population of the UK is around 60 million people, and that they are evenly distributed. If a spatial zoning system contains 10,000 areas, then we would expect to find 6,000 people in each zone. This is a large number, and would allow us to produce cross-tabulations which would hopefully avoid attribute disclosure. However, the number of possible interaction flows within and between these areas would be 10,000² (100 million). If people were distributed entirely evenly across this flow matrix, the expected total number of people in each flow – prior to any disaggregation – would be 0.6. Of course, flows are not evenly distributed, and nor do people move around in fractional units. Commuters travel from a large number of residences to a limited number of workplace locations, and generally work somewhere relatively close to their home. Similarly, migrants generally move short distances and prefer some destinations over others. The effect of this uneven flow distribution is that interaction data are characterised by having a very skewed distribution: there are a small number of flows containing many people, and a long tail of low magnitude flows. In many interaction data sets, there are also a large number of cases where there was no observed flow between a given origin and destination. The frequency distribution of flow sizes in the 1991 Census interaction data sets is illustrated in Figure 1 which shows an abridged set of results – only flows of up to 100 persons are shown.
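The even-distribution arithmetic used earlier in this paragraph, and the kind of flow-size frequency count that underlies a chart such as Figure 1, can be reproduced with a short sketch. The population and zone counts are the round figures quoted in the text; the list of observed flows is invented solely to show how such a tabulation is built.

```python
from collections import Counter

population = 60_000_000   # round figure used in the text
zones = 10_000            # number of areas in the illustrative zoning system

persons_per_zone = population / zones            # expected residents per zone
possible_flows = zones ** 2                      # origin-destination pairs, including intra-zone
persons_per_flow = population / possible_flows   # expected persons per flow cell

print(persons_per_zone, possible_flows, persons_per_flow)

# Hypothetical observed flow totals for a handful of origin-destination pairs.
observed = [0, 0, 0, 0, 0, 0, 1, 1, 2, 15]
size_frequency = Counter(observed)  # flow size -> number of pairs with that size
for size in sorted(size_frequency):
    print(size, size_frequency[size])
```

Even in this toy example the most frequent flow size is zero, which mirrors the skewed shape that real interaction matrices tend to show.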
The graph shows the observed frequencies of flows involving different numbers of total persons for three data sets, the 1991 SMS Set 1 (migration at ward level), SMS Set 2 (migration at district level) and SWS set C (commuting at ward level). In all three cases, the most common flow value (between each pair of wards or districts) is 0, i.e.
Figure 1. Flow size frequencies, 1991 census interaction data sets. Source: 1991 Census SMS/SWS
there was no observed flow. In the two ward level sets, this accounts for over 99% of all potential flows, whilst for the district level migration data, it accounts for about 33% of all potential flows. In order to accommodate these data, it should be noted that the graph has a logarithmic y axis. There follows for all three sets a steady decline in observed flow frequency: the next most commonly observed total is 1, followed by 2, and so on. By the time flows of 100 or more are seen, we are down to frequencies of 100 or so for SMS Set 1, and 10 to 20 for SMS Set 2 and SWS Set C. The dominance of small values is of course countered by the fact that there are some larger flows, and that a large proportion of persons may
be included in these larger flows. Table 2 shows the flow sizes that account for quartiles of all persons in the three 1991 Census interaction data sets. For the SMS Set 1, 25% of all migrants in the data set are accounted for in flows of three or fewer persons, whereas for SMS Set 2, 25% of all persons in the data set are found in flows of 204 or fewer persons. The 100% value in each case indicates the largest observed flow – all persons are accounted for in flows of this value or lower. It can be seen from Table 2 that in both ward level data sets, the majority of persons are included in low magnitude flows. These, of course, are flow totals. In order to carry out research into the characteristics of migrants and commuters, we are
Table 2. Flow sizes accounting for different proportions of all persons, in 1991 census interaction data sets
Cumulative proportion of     Flow size by data set
persons accounted for        1991 SMS1 (ward)    1991 SMS2 (district)    1991 SWSC (ward)
25%                          3                   204                     2
50%                          14                  4,170                   9
75%                          61                  8,505                   29
100%                         1,056               52,027                  407
Source: 1991 Census SMS/SWS
interested in disaggregations (by age, sex, social class, et cetera) of these totals. Yet the small flow totals for ward to ward flows afford very little scope for non-disclosive disaggregation. Without some form of value modification, the tables that would be produced would be full of the 1s and 0s that we have seen above lead to problems of identity and attribute disclosure. This introduction has outlined the nature of confidentiality and disclosure. If it is true that the risk of disclosure is heightened by the presence of small values in tables (especially 0s and 1s), then it can be seen that interaction data, due to their sparse nature, pose a particular problem for statistical agencies. However, by modifying data to make them ‘safe’ to release, it is possible that agencies reduce their analytic worth. This chapter focuses on the disclosure control methods applied to interaction data in the 1991 and 2001 Censuses. The methods used are studied, and the effects that these methods have had on the data are described. Advice is also given about how best users can accommodate the effects of disclosure control.
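The way in which zeros and ones in a small table can lead to disclosure may be made concrete with a minimal sketch based on the counts in Table 1. The two checks below are simple illustrations written for this example; they are assumptions about how such a screening might be coded and are not the procedures used by any statistical agency.

```python
# Counts from Table 1: marital status by sex for all persons in area i.
table = {
    "Single": {"Male": 2, "Female": 0},
    "Married": {"Male": 4, "Female": 0},
    "Separated/widowed/divorced": {"Male": 2, "Female": 3},
}

def attribute_disclosures(table):
    """Return {known characteristic: revealed characteristic}.

    A column is flagged when exactly one row has a non-zero count, because
    knowing the column value then reveals the row value for every member.
    """
    revealed = {}
    columns = next(iter(table.values())).keys()
    for col in columns:
        nonzero = [row for row, cells in table.items() if cells[col] > 0]
        if len(nonzero) == 1:
            revealed[col] = nonzero[0]
    return revealed

def cells_equal_to_one(table):
    """List (row, column) cells with a count of 1, a risk of identity disclosure."""
    return [(r, c) for r, cells in table.items() for c, n in cells.items() if n == 1]

print(attribute_disclosures(table))
print(cells_equal_to_one(table))
```

For Table 1 the first check flags the ‘Female’ column, since every woman in the area falls in a single marital status category, while the second check finds no cells equal to one. Attribute disclosure can therefore arise even when no individual is uniquely identified.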
IMPUTATION AND DISCLOSURE CONTROL All censuses suffer from generic problems, including unit non-response and item non-response. Unit non-response is the failure of a person or a complete household to complete a census form whatsoever. Item non-response is the failure of a person who has otherwise completed a form to have answered a particular question. This might have occurred for a number of reasons: simple error of omitting the question, deliberate choice, or poor form design or wording leading to a person thinking that the question does not apply to them. Censuses also suffer from error and inaccuracy in questions that have been answered. Problems of non-response and error are addressed as far as possible through the edit and imputation stages of processing of the data prior to their release. For
the 2001 UK Census, a sophisticated methodology known as the ‘One Number Census’ was followed in order to address systematic under-enumeration biases. Census questions that are used to derive interaction data suffer particular problems when it comes to non-response. Reviewing a variety of census data (not all from the UK), Simpson and Middleton (1997) found that migrants were 2-10 times more likely to be missed from a census than residents who have not moved. In the case of the 2001 UK Census, published rates for non-response and imputation demonstrate both the extent and the spatial variability of these problems. The migration question, ‘What was your address one year ago?’, was not answered by 4.5% of respondents in England and Wales (ONS, 2003), with rates for individual districts varying from 4.2% to 9.4%. The workplace postcode address on the commuting question has a national non-response rate of 7.8%, with individual districts varying from 7.3% to 18.5%. For both questions, the district with the lowest non-response rate was Hart in Hampshire, whilst the district with the highest non-response rate was the London Borough of Newham. In part, high rates of non-response are to be expected for these variables. Address one year ago is an item likely to suffer from recall error, especially for persons who may have migrated more than once in the intervening period. In particular, the postcode of the earlier address may not be correctly remembered. Similarly, the workplace postcode may not be accurately known by many employees. Statistical disclosure control (SDC) refers to the use of various approaches that regulate the release of data, in order to minimise the risk of disclosure. Different approaches are used for micro-data (the release of anonymised individual records) and for aggregate data. Clearly, there are trade-offs between the strength of the disclosure control applied and the utility of the released data.
These are typified by the extreme examples: on the one hand, an agency could refuse to release any data. There would be no risk of disclosure, but similarly,
there would be zero utility. On the other hand, an agency could release a full individual-level data set. This would offer the greatest flexibility to researchers, but would also be highly disclosive. Most data are released with disclosure control pitched somewhere between these two extremes: a data set is produced and released, but subject to modifications that reduce the risk of disclosure while at the same time restricting the potential usefulness of the data. Different methods of disclosure control naturally offer different balances: a ‘good’ disclosure control method is one that allows the statistical agency to release a useful data set while still minimising the risk of disclosure.

For many data sets, the data are subject to a single release phase: a fixed set of statistical abstracts is produced (univariate frequency counts, multivariate cross-tabulations, et cetera) and then released, with no subsequent release of additional information. In this scenario, disclosure control methods can be neatly sub-divided into two types: pre-aggregation and post-aggregation. Pre-aggregation disclosure control methods are applied to individual records before they are aggregated to create cross-tabulations and other outputs. Pre-aggregation measures are similar to disclosure control methods used for micro-data releases in that they are performed on individual level records, although clearly different risks are involved in micro-data release and in the preparation of records for aggregation. For micro-data releases, an important aspect is sampling (possibly skewed to avoid individuals with unique variable combinations), whereas for aggregation, the whole population will be used. Modifications at the record level prior to aggregation might include over-imputation and record swapping. Record swapping is a technique in which some records are swapped between proximate geographic areas. 
In a census context, this would typically be done on a household level basis: two households with generally similar characteristics have their location identifiers swapped. Over-imputation is a
method in which some known variables or records are deliberately deleted, and then new values are imputed with the same procedures as used for general imputation of missing data. A drawback with methods such as record swapping is that if the records to be swapped are randomly selected, then individuals with rare or unique combinations of variables are unlikely to be among those selected for swapping. Thus, the approach is not good at protecting population uniques.

A variety of post-aggregation methods of disclosure control exist. These are performed on output tables (univariate frequency counts, multivariate cross-tabulations, et cetera) produced after the data have been aggregated. For the area statistics in both the 1981 and 1991 UK Censuses, the preferred method was a cell blurring process known as Barnardisation. This was a two-pass process in which individual cells in tables were modified by adding the values -1, 0 or +1; the probabilities of non-zero modification were not published. For the 2001 Census, an alternative method of Small Cell Adjustment (SCAM) – discussed and described below – was used. SCAM was a modified version of the more general cell rounding approach. Under generic cell rounding, all values in an output table are rounded, typically to base 3 or base 5. Problems with cell blurring and cell rounding arise if the table includes marginal totals, as these totals will become inconsistent with the table contents if no remedial action is taken. One approach is that of controlled rounding (Cox, 1987), in which table contents are rounded in such a manner that they remain consistent with marginal totals. Aside from rounding, another approach is that of suppression, in which disclosive values are removed from the table (and set to 0) prior to publication.

The act of aggregation in itself has some important implications for disclosure control. 
In most cases (and all cases considered in this chapter) aggregation is done on a spatial basis: all individual records within a fixed area – a ward, a district, et cetera – are selected, and aggregate results are
calculated. The risks of disclosure in any data are related to the number of individual records on which the aggregate is based, the heterogeneity of those data, and the number of categories used to tabulate each component variable. A simple approach to reducing disclosure risk is thus to impose minimum population thresholds for any region for which outputs are to be created. Two developments in the processing of the 2001 UK Census were significant in this regard (Martin, 2002). Firstly, accurate geo-referencing of households allowed a separation of input and output geographies. In the past, the enumeration districts used to plan and carry out the census were used for the smallest level of outputs. By geo-referencing households, any aggregation of individual households could be used to generate new base units. This permitted the second development: a carefully planned output geography in which the base units (‘Output Areas’) were designed after enumeration took place, and could thus be constrained by actual (rather than predicted) population size and also by social heterogeneity. Strengthening the area design and spatial aggregation process in this way was an important step in minimising general disclosure risk in 2001, yet it still did not address the problems posed by sparsity and low values in the interaction data.
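The inconsistency between rounded cells and rounded marginal totals, noted earlier in this section for generic cell rounding, can be illustrated with a minimal sketch. The 2 x 2 table of counts is hypothetical, the function name `round_to_base` is ours, and deterministic rounding to the nearest multiple of 3 is assumed purely for illustration; it is not presented as the procedure used by any census agency.

```python
def round_to_base(value, base=3):
    """Round a non-negative integer count to the nearest multiple of `base`."""
    return base * ((value + base // 2) // base)

# Hypothetical 2 x 2 table of true counts (illustrative only).
true_cells = [[4, 1],
              [2, 7]]

# Round every internal cell independently.
rounded_cells = [[round_to_base(v) for v in row] for row in true_cells]

# Round the true row totals independently, as if they were published as marginals.
rounded_row_totals = [round_to_base(sum(row)) for row in true_cells]

# Row totals implied by adding up the rounded internal cells.
implied_row_totals = [sum(row) for row in rounded_cells]

for published, implied in zip(rounded_row_totals, implied_row_totals):
    status = "consistent" if published == implied else "inconsistent"
    print(published, implied, status)
```

In this example the first row's independently rounded total (6) no longer matches the sum of its rounded cells (3), which is the kind of discrepancy that controlled rounding, or the re-calculation of totals from adjusted cells, is intended to avoid.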
DEALING WITH SUPPRESSION OF THE 1991 CENSUS

Three sets of interaction data were produced as part of the outputs of the 1991 Census: two sets of migration statistics – SMS Sets 1 and 2 – and one set of (journey to) workplace statistics – SWS Set C (SWS Sets A and B were respectively residence-specific and workplace-specific aggregates). The SWS data were created from a 10% sample of forms, and therefore no additional disclosure control was considered necessary: even if an observed flow showed a person with a unique combination of characteristics, it would be impossible for a
data intruder to claim with certainty that the data related to a specific individual.

The two SMS sets illustrate the subtle relationship between disclosure and data sensitivity. SMS Set 1 contained flows within and between 10,933 wards in Great Britain. As illustrated in Figure 1 and Table 2, it is sparsely populated, and of those flows that were observed, the majority have very low counts. SMS Set 2 contained flows within and between 459 districts in Britain. It represents a direct aggregation of SMS Set 1, and is consequently less sparse, and has a much larger median flow size. Given this basic description, one might expect the risk of disclosure for SMS Set 1 to be higher than for SMS Set 2. However, the sets also differed in the cross-tabulations that were included in each case. Set 1 comprised two tables: broad age groups by sex, and counts of wholly moving households and the migrants within them; Set 2 contained these tables plus several others: a five-year age by sex table, and tables breaking down migrants by marital status, ethnic group, household residency and illness status, economic position and whether residents were Gaelic or Welsh speakers. Further tables broke down the number of wholly moving households in each flow by tenure and economic position of the head of household and of household residents.

Additional disclosure control methods were not applied to the ‘basic’ tables (age by sex, and counts of households and residents) – presumably because the contents were not considered sufficiently ‘sensitive’. Thus, no post-aggregation disclosure control was applied to SMS Set 1. In the case of SMS Set 2, post-aggregation disclosure control was applied to the additional tables (not including the more detailed age by sex table) only. The disclosure control method applied was that of suppression. When an aggregate table is suppressed, values in the table that are considered to be disclosive are reset to zero. 
If a suppressed table also contains marginal totals, two types of suppression are required to fully protect the data. Primary suppression is the process of suppressing the cells that are specifically considered to be disclosive. Secondary suppression is a process in which additional cells are also suppressed, so that the primary suppressed values cannot be deduced by subtracting known values from marginal totals. An alternative approach would be to re-calculate marginal totals, taking the primary suppression into account.

To what extent was suppression significant in these data? Figure 2 is a pixel chart which illustrates the extent to which suppression was used in the 1991 SMS Set 2. The image shows a district level origin-destination matrix, in which each flow is represented as a single pixel. There are 460 ‘columns’, representing 459 destination districts plus an outflow total, and 561 ‘rows’, representing 459 origin districts, 98 foreign origins and 4 inflow totals. Each pixel is coloured depending on the size of the flow between the origin and the destination: white where the observed flow was 0, black where the observed flow was of 10 or more persons, and grey where the observed flow was of 1 to 9 persons. The latter category includes all flows for which the tables relating to migrants (as opposed to households) were suppressed. A visual inspection of the image indicates a large proportion of grey flows. There are clear clustering effects in the image; these are related to the ordering of districts. Origin and destination districts have the same order, with foreign origins in the bottom part of the image. Districts are ordered alphabetically within counties; thus there is a ‘string of beads’ effect diagonally across the image: each square bead is the set of flows within a county. The counties are ordered with Greater London first (thus, the large top left bead), followed by the metropolitan counties, followed by other English counties (as defined in 1991), followed by counties within Wales and then by regions within Scotland. 
The image thus summarises some general aspects of migration within Britain: most counties have significant interactions with Greater London, and London is also by far the most significant destination for migrants from foreign origins.
Figure 2. The extent of suppression in 1991 SMS set 2 – origin-destination pixel chart. Source: 1991 Census SMS

As described above, the 1991 SMS Set 2 were subject to primary suppression. The suppressed tables did not directly include marginal totals, and secondary suppression was not carried out. However, the main district-to-district flows were supplemented with district-to-county and county-to-district totals, which could be used as marginals, together with the total count of district-to-district migrants (taken from unsuppressed tables such as Table M01). The total flows were subject to suppression independently from the main district-to-district flows; thus, if the district-to-county flow was of 10 or more migrants, then all tables relating to migrants were included in the output without any suppression. At a higher level of geography, there were also district inflow and outflow totals, again independently subject to suppression. Rees and Duke-Williams (1997) used the availability of these county and national higher level totals to deduce suppressed values for many observations within the SMS Set 2. Where values could not be fully deduced, estimates were made of the correct value.

Table 3 shows the migration by ethnic group from the district of Mid-Bedfordshire to districts in the county of Avon, as originally published in the 1991 SMS Set 2. The data are taken from table M05, a table that was subject to suppression. The column marked ‘Avon total’ was taken from the district-to-county set of data, whilst the flows by ethnic group to each district were taken from the district-to-district data set; in this example the total flow from Mid-Bedfordshire to Avon was of more than 10 persons, so this district-to-county observation was not suppressed. The row marked ‘All persons total’ was calculated from table M01 (Age by sex), a table that was not subject to suppression, and so all values were available. It can be seen from Table 3 that the flow from Mid-Bedfordshire to Bath was of fewer than 10 persons, and so the observations by ethnic group have been suppressed. However, it will be recalled that the district-to-county flows were above the suppression threshold. If the known values (i.e. all those except the flow to Bath) are subtracted from the Avon total, then the missing flows are revealed. In this example, subtracting the known values (11 + 0 + 16 + 0 + 69) from the total for White migrants (99) leaves a remainder of 3; this is the correct original value that had been suppressed. Whilst the example shown is a simple one, the process becomes harder if more than one district-to-district level flow within any county set is suppressed. Rees and Duke-Williams used a range of approaches similar to the one illustrated to recover suppressed data, and to determine minimum and maximum possible values prior
to estimation. The recovered data were used to create a modified version of the 1991 SMS Set 2, known as the SMSGAPS data set. This is available to academic users in the UK via CIDER (http://cider.census.ac.uk).

Table 3. Migration from Mid-Bedfordshire to Avon, 1990-91, from SMS set 2

| Ethnic group | Bath | Bristol | Kingswood | North Avon | Wansdyke | Woodspring | Avon Total |
|---|---|---|---|---|---|---|---|
| White | 0 | 11 | 0 | 16 | 0 | 69 | 99 |
| Black groups | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| Indian, Pakistani and Bangladeshi | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Chinese and Other | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| All Persons Total | 3 | 12 | 0 | 16 | 0 | 70 | 101 |

Source: 1991 Census SMS

Figure 2 indicated that suppression was relatively widespread in the 1991 SMS Set 2. Of the 165,873 reported flows, some 145,193 (88%) were suppressed in the migrant tables. However, at first consideration, the effect of suppression may seem to be of limited importance: it is only minor flows that were suppressed. Although a large number of the reported flows were suppressed, these flows contained a much smaller proportion of migrants: of the district to district flow data alone, some 419,577 migrants out of 4,688,180 (8.9%) were in suppressed flows. However, given the large numbers of origins and destinations in the data set, it is often desirable to aggregate the flows relating to a number of locations. Whilst all the component flows may be ‘minor’, once they are aggregated a more major flow may emerge, yet the details of this flow will be distorted if the components have been suppressed.

The recovered SMSGAPS data set permits comparisons to be made between the assumed ‘true’ figures and those that were originally released. Table 4 is a migration matrix aggregated to Standard Region level for White migrants. It shows the proportion of actual migrants who were reported in the SMS Set 2 as released; thus for migration within the ‘North’ region, 99% of actual migrants were reported.

Table 4. The effects of suppression in 1991 SMS set 2: percentage of the ‘correct’ count of White migrants between Standard Regions included in the original data (rows are origins, columns are destinations)

| Origin | North | Yorks & Humber | East Midlands | East Anglia | South East | South West | West Midlands | North West | Wales | Scotland |
|---|---|---|---|---|---|---|---|---|---|---|
| North | 99 | 73 | 22 | 20 | 25 | 17 | 18 | 50 | 11 | 41 |
| Yorks & Humber | 69 | 100 | 77 | 51 | 50 | 46 | 51 | 73 | 27 | 37 |
| East Midlands | 20 | 72 | 99 | 65 | 40 | 28 | 59 | 44 | 21 | 22 |
| East Anglia | 15 | 46 | 64 | 100 | 55 | 33 | 29 | 26 | 23 | 39 |
| South East | 23 | 48 | 42 | 73 | 97 | 67 | 41 | 36 | 25 | 34 |
| South West | 16 | 43 | 25 | 34 | 56 | 99 | 52 | 31 | 45 | 29 |
| West Midlands | 24 | 55 | 65 | 34 | 42 | 56 | 99 | 59 | 50 | 21 |
| North West | 54 | 75 | 43 | 29 | 38 | 38 | 55 | 100 | 65 | 27 |
| Wales | 6 | 23 | 18 | 24 | 28 | 47 | 48 | 53 | 99 | 17 |
| Scotland | 41 | 33 | 24 | 33 | 35 | 30 | 21 | 31 | 14 | 99 |

Source: 1991 Census SMS

It can be seen that for flows within a Standard Region, the level of reporting is relatively high – although this deliberate removal of data should be seen in the context of the ‘missing million’ under-enumerated persons from the 1991 Census, about whom there was much concern (Simpson and Dorling, 1994; Mitchell et al., 2002). More immediate concern is raised by the off-diagonal flows: those between different Standard Regions. These data relate to large physical areas, and the naïve user might assume that such large areas will have large associated flows which cannot possibly be affected by disclosure control issues. The key point to note is that flows for these areas can only be assembled by aggregating from flows between smaller units, and it is those component flows that are affected. In this example, the maximum proportion of White migrants in an aggregated flow between different Standard Regions that were included in the original data was 77%, whilst the minimum proportion was just 6%. A similar picture emerges if the other ethnic groups are analysed, albeit with some degree of variation in the proportions included. Significantly, the Standard Regions with severe problems are not constant for each ethnic group: in other
words the degree of under-reporting varied by Standard Region for each group. As a result of this, apparent patterns in the data may be misleading. There are 10 Standard Regions, giving a total of 100 pairs of Standard Regions between which migration occurs. Of these, in 16 cases the largest non-White group of migrants differs if one compares the original SMS data with the assumed correct SMSGAPS data.

The flow magnitude in spatially aggregated data is not necessarily a perfect indicator of the degree to which the data might have been affected by suppression. Figure 3 is a scatterplot showing the proportions of migrants that were reported in the original SMS data for four ethnic groups, when the data were aggregated to Standard Region level. At the top of the plot is a set of flows with a proportion close to 1.0 of migrants reported. These were the intra-regional flows. The other points are interregional, and for these points the relationship between the proportion of migrants reported and reported flow size is weak. For any given flow size, there is a wide variation in the proportion of migrants reported, and there are a number of cases of large flows (5-10,000 reported persons) for which the number of persons reported is less than 50% of the true value.

Figure 3. Proportions of migrants reported in flows between standard regions by ethnic group, by reported flow size, using 1991 SMS set 2. Source: 1991 Census SMS

In summary, the process of suppression had a significant and worrying effect on the 1991 SMS Set 2. A large number of flows were suppressed, and although these flows were minor on a per-flow basis, the suppression had a large potential cumulative effect. Using the original data, a user might aggregate results for large areas and gain results that were misleading. The flow magnitude in such aggregated data is not a good indicator of whether or not the data have been seriously affected by suppression.

Two additional datasets were produced that accounted for different problems with the 1991 SMS data. The MIGPOP data set accounted for various types of under-enumeration in the Census as a whole. For work that is based on the total number of migrants between areas, or on migrants disaggregated by age and sex, users are recommended to use the MIGPOP data set in preference to any other 1991 migration data sets. However, the MIGPOP data contain only age by sex observations. The SMSGAPS data set accounted for the effects of suppression in the published data, and provided alternative counts
for all tables except tables M11S and M11W. For research work based on any socio-demographic disaggregation other than age, users are recommended to use the SMSGAPS data set. Both of these data sets can be accessed via CIDER.
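The subtraction used in the Mid-Bedfordshire to Avon example from Table 3 can be written out as a short sketch. The values are those published in Table 3; the use of `None` to mark the Bath flow as suppressed (it was published as 0), the variable names and the helper `recover_single_gap` are our own illustrative assumptions, and the sketch handles only the simple case of a single suppressed flow per county set. It is not the full procedure used to build the SMSGAPS data set.

```python
# Districts of Avon, in the column order of Table 3.
DISTRICTS = ["Bath", "Bristol", "Kingswood", "North Avon", "Wansdyke", "Woodspring"]

# Published table M05 values; None marks the suppressed flow to Bath.
published = {
    "White": [None, 11, 0, 16, 0, 69],
    "Black groups": [None, 1, 0, 0, 0, 0],
    "Indian, Pakistani and Bangladeshi": [None, 0, 0, 0, 0, 0],
    "Chinese and Other": [None, 0, 0, 0, 0, 1],
}

# Unsuppressed district-to-county totals (the 'Avon Total' column).
avon_totals = {
    "White": 99,
    "Black groups": 1,
    "Indian, Pakistani and Bangladeshi": 0,
    "Chinese and Other": 1,
}

def recover_single_gap(values, county_total):
    """Return (index, value) of the one suppressed flow, by subtraction."""
    gaps = [i for i, v in enumerate(values) if v is None]
    if len(gaps) != 1:
        raise ValueError("exact recovery needs exactly one suppressed flow")
    known = sum(v for v in values if v is not None)
    return gaps[0], county_total - known

recovered = {}
for group, values in published.items():
    index, value = recover_single_gap(values, avon_totals[group])
    recovered[group] = value
    print(f"{group}: flow to {DISTRICTS[index]} = {value}")

# Cross-check against the unsuppressed 'All Persons Total' for Bath (3).
bath_total = sum(recovered.values())
```

The recovered values sum to 3, which agrees with the ‘All Persons Total’ for Bath taken from the unsuppressed table M01. Where two or more flows in a county set are suppressed, the same subtraction yields only a combined remainder, from which minimum and maximum possible values would have to be derived.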
THE EFFECTS OF SCAM ON THE 2001 CENSUS

As was the case in 1991, several sets of interaction data were produced as part of the outputs of the 2001 Census. These included the 2001 Special Migration Statistics, Special Workplace Statistics and Special Travel Statistics (STS). The STS were a superset of the SWS in their coverage: as well as journeys to workplaces, they also included information about journeys to places of study for school children and students. The STS were only produced for residences in Scotland, and were produced in place of the SWS. The data were released at three spatial scales: Level 1 (districts et cetera), Level 2 (wards et cetera) and Level 3 (output areas). As in 1991, statistical disclosure control methods were used to preserve confidentiality in the interaction data. One major
difference between the 1991 and 2001 outputs is that, with the 2001 outputs, the workplace data (SWS and STS) were produced from 100% data, rather than from a 10% sample as had been the case previously. As a result of this, disclosure control methods were applied to these data sets as well as to the SMS.

The disclosure control methods applied to outputs from the 2001 Census were very different to those applied to the 1991 Census outputs. Following a consultation exercise and discussion on various alternatives, the ONS decided to adopt a series of disclosure control methods including both pre-aggregation and post-aggregation adjustments. The pre-aggregation modifications introduced during edit and imputation involved record and item level imputation. One important imputation for the migration and commuting counts was that of migrants for whom the origin was missing or only partially stated and of commuters for whom the workplace was missing or only partially stated. The ONS Edit and Imputation Evaluation Report (ONS, 2003) gave national imputation rates for each variable, with per-district rates also made available on-line. The imputation rate for address one year ago was 4.5% over the whole of England and Wales, with district level rates varying from 2 to 9.4%. Similarly, the imputation rate for workplace postcode across England and Wales was 7.8%, and ranged from 3.9 to 18.5% at the district level.

A second level of implicit disclosure control was in the setting of minimum population thresholds as output areas were constructed, although as we have seen above, such thresholds have little meaning for interaction data: firstly, commuters and migrants are a sub-set of all persons in any area – a small subset in the case of migrants – and secondly, they are highly spatially disaggregate: thus individual flows to many destination areas are small, even for origins with large populations. 
Having constructed an output geography, an additional layer of disclosure control was applied: that of record swapping. Under record swapping, two households from proximate areas that contain sets of individuals with similar characteristics are selected; these households are then ‘swapped’ between locations. This was carried out on an undisclosed proportion of records.

The final technique that was used to prevent disclosure – and the focus of attention in this section – was the so-called ‘small cell adjustment method’ (SCAM). This method involved the adjustment of small counts appearing in all aggregate 2001 Census outputs. Only ‘base’ values were modified; many – if not all – tables included in the outputs featured marginal totals and subtotals. These totals were re-calculated from the base values after SCAM had been performed. ONS never gave an exact definition of what were considered ‘small counts’, although the affected values were widely understood to be initial values of 1 or 2. Value frequency analysis of any outputs that were subject to SCAM will indicate that the values of 1 and 2 are not found. These small values were modified such that they became either 0 or 3, with values of 1 most likely to become a 0, and values of 2 most likely to become a 3. It is not possible to determine in any outputs the difference between a ‘real’ 3 and a value of 1 or 2 that was modified to become a 3, and likewise for values of 0.

There are a number of general effects arising from the use of SCAM. Whilst individual tables are consistent (because totals were re-calculated), inconsistencies arise between tables. For example, many tables feature an ‘All persons’ total. This value for ‘All persons’ differs between tables when any of the component base cells contained ‘small values’. A second type of inconsistency exists between spatial levels, again due to the fact that all tables were independently subject to SCAM. Thus, if any equivalent value (e.g.
‘All persons’) is found in a district level table, and then compared to the result generated by summing the equivalent value for all member wards of that district, inconsistencies are likely to be found (Stillwell and Duke-Williams, 2007).
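The between-table inconsistency described above can be illustrated with a minimal simulation. The adjustment probabilities used by ONS were not published, so the sketch assumes an unbiased scheme in which a 1 becomes 3 with probability 1/3 and a 2 becomes 3 with probability 2/3; the function names, the seed and the two small tables are hypothetical.

```python
import random

def scam_cell(value, rng):
    """Adjust one base cell: a 1 or 2 becomes 0 or 3; other values are unchanged.

    Assumption: unbiased probabilities, P(1 -> 3) = 1/3 and P(2 -> 3) = 2/3.
    """
    if value in (1, 2):
        return 3 if rng.random() < value / 3 else 0
    return value

def scam_table(base_cells, rng):
    """Adjust every base cell, then re-calculate the total from the adjusted cells."""
    adjusted = [scam_cell(v, rng) for v in base_cells]
    return adjusted, sum(adjusted)

rng = random.Random(2001)  # fixed seed so that the sketch is repeatable

# The same hypothetical flow of 7 persons, tabulated in two different ways.
by_age = [2, 1, 4]
by_sex = [5, 2]

adjusted_age, total_age = scam_table(by_age, rng)
adjusted_sex, total_sex = scam_table(by_sex, rng)

print("age table:", adjusted_age, "All persons =", total_age)
print("sex table:", adjusted_sex, "All persons =", total_sex)
```

Whatever the random draws, the second table can only report an ‘All persons’ total of 5 or 8, never the true 7, whereas the first can report 4, 7 or 10. Each table is internally consistent, but the two totals for the same flow can disagree.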
From the point of view of the interaction data, a further aspect of SCAM that has the potential to confuse users is the differential manner in which SCAM was applied. For the Census area statistics (e.g., CAS, Standard Tables, et cetera), SCAM was applied to results in England and Wales and in Northern Ireland, but not for results in Scotland. The same was generally true for the interaction data, although these data were compiled and released as a UK level data set, and thus the variable nature of SCAM was less obvious and thus more confusing. As with the area statistics, the interaction data were subject to SCAM on the basis of where the data were collected. For migration data, the data were collected at the destination end (i.e., the respondent’s usual residence at the time of the census): thus, the data were SCAMmed if they had a destination in England, Wales or Northern Ireland, but not if they had a destination in Scotland. In contrast, journey to work (and journey to school) data were collected at the origin (i.e., the person’s residence, rather than their workplace), and so the data were SCAMmed for residences in England, Wales or Northern Ireland, but not for residences in Scotland. These rules still apply to cross-border flows: thus information about migrants from Scotland to England would be subject to SCAM, but information about commuters who lived in Scotland but worked in England would not
be subject to SCAM. A final layer of protection for the 2001 interaction data was considered necessary for Level 3 (output area) journey data: because of the prevalence of small values at this spatial scale, the workplace data were subject to SCAM for residences in Scotland, and furthermore, no data were produced at this level for residences in Northern Ireland.

As emphasised earlier, the interaction data are particularly characterised by the dominance of small values, and thus inconsistencies between tables are very common. Again, this is pertinent to interaction data, as users often wish to aggregate areas to larger units. What, then, were the actual effects on the data that were caused by SCAM? The most notable effect is that counts of total migrants and total commuters are strongly dominated by multiples of 3. Figure 4 shows the frequencies of flows of various total sizes (number of persons), for values up to 50 persons, in the output area level data for SMS and SWS. Figure 4 uses totals from table MG301 – flows within and between distinct output areas. There were around 200,000 instances of a flow total of 1, and around 70,000 instances of a total of 2 persons. The graph then indicates over 1.3 million instances of flow totals of 3. The totals of 1 and 2 that are present in the data are from migration flows with destinations in Scotland.
Figure 4. Flow frequency totals, 2001 interaction data. Source: 2001 Census SMS/SWS
It will be recalled that SCAM was applied to internal or base cells in a table, and that totals – such as those shown in Figure 4 – were recalculated from modified values. Table MG301, from which Figure 4 is derived, contains six internal cells. Thus, a total of 3 might be reached if one of these cells contained either an initial value of 3 or a small value that was rounded up to 3, and all other internal cells were either zero, or initially contained a small value that was rounded down to 0. It is also possible that a flow total of 3 could be found in a flow to a destination in Scotland, with sufficient internal cells containing the values 1 or 2. The large spike of 3s in Figure 4 thus contains a large number of flows with destinations outside Scotland which prior to SCAM would have had a flow total of 1 or 2. Similar spikes are evident at totals of 6 and 9. These could be reached in a variety of ways, but are most commonly found for tables with 2 or 3 internal cells that have the value 3, and with all other cells having the value 0. These may be ‘real’ 3s but it is clear that many will have been rounded up through SCAM.

A similar but even more extreme pattern can be observed in Figure 4, showing flow total frequencies for Table W301, which forms the output area level journey to work data. This data set only covers England and Wales, and thus all values in it were subject to SCAM. There are no 1s or 2s, and over 5 million flows with a total of 3. Totals of 4 and 5 do exist, but at very low frequencies (around 100,000 totals of 4 and 40,000 totals of 5).

For flow totals of 3, it is worth considering what the range of actual totals prior to SCAM might have been. Table MG301 has six internal cells, and for a flow to be tabulated, at least one cell must have contained an initial value. The minimum real total would therefore be 1. 
Values of 2 could be rounded down to zero, and thus the maximum real total that could be represented by a SCAMmed total of 3 is 13: a ‘real’ 3 in one cell, and five original counts of 2, all of which were rounded down. This is feasible, if improbable: assuming a probability of a 2 being rounded
down to zero as 1/3, the joint probability of this happening five times would be (1/3)^5, or 1/243. Similarly, with 22 internal cells, the maximum real total that a SCAMmed total of 3 in Table W301 could represent would be 45, although this is extremely unlikely, with a probability of (1/3)^21, or odds of some 1 in 10 billion!

Unlike the case with suppression in the 1991 Census, there is no obvious way of reversing the effects of SCAM. Instead, the user must plan their research in a suitable way to accommodate these effects. For Levels 1 and 2, there are multiple output tables, and thus multiple ways of generating values for totals such as ‘All migrants’ or ‘All commuters’. Consequently, it is possible to generate a mean value across all tables. It would be expected that this would have the result of ‘smoothing out’ the frequency distribution, although in practice the results are not perfect. Figure 5 shows the observed ‘All persons’ flow size frequency distribution in the 2001 SMS Level 1, using a single table (Table M101) and using a mean calculated across four tables (M101 to M104). The first series – shown with a dotted line – shows considerable spikiness associated with multiples of 3. The second series – shown with a solid line – has considerably dampened fluctuation, especially above values of about 6. However, some fluctuation clearly still exists for small values.

For many purposes, cross-table means such as these can be useful. A problem with using cross-table means may arise if they are recommended as a ‘preferred count’. It is easy to envisage a user generating rates using observations within a table as a numerator, and a mean total as a denominator: such rates may be very unreliable. 
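Returning to the bounds worked through above for a SCAMmed flow total of 3, the arithmetic can be checked with a short sketch. The function names are ours, and the probability of 1/3 that a 2 is rounded down is the assumption stated in the text rather than a published ONS parameter.

```python
from fractions import Fraction

def max_true_total(internal_cells, real_value=3, largest_small_value=2):
    """Largest true total behind an adjusted total of 3: one cell holds a
    'real' 3 and every other internal cell held a 2 that was rounded down."""
    return real_value + (internal_cells - 1) * largest_small_value

def prob_all_rounded_down(n_cells, p_down=Fraction(1, 3)):
    """Joint probability that n_cells independent 2s are all rounded down."""
    return p_down ** n_cells

# Table MG301: six internal cells, so five cells could each hide a 2.
mg301_max = max_true_total(6)
mg301_prob = prob_all_rounded_down(5)

# Table W301: 22 internal cells, so 21 cells could each hide a 2.
w301_max = max_true_total(22)
w301_prob = prob_all_rounded_down(21)

print(mg301_max, mg301_prob)
print(w301_max, w301_prob)
```

Under these assumptions the sketch gives a maximum of 13 with probability 1/243 for Table MG301, and a maximum of 45 with probability 1/10,460,353,203 (roughly 1 in 10 billion) for Table W301, in line with the figures quoted above.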
For those wishing to calculate interaction rates based on table values (for example, the proportion of migrants who were of a given age, the proportion of commuters who were car drivers, et cetera) it is still necessary to use totals derived from the same table as the numerator, in order to ensure consistency. How then can users best accommodate the effects of SCAM in their work? The main advice that
Figure 5. Flow size frequencies in 2001 SMS level 1 using single table and multiple-table mean. Source: 2001 Census SMS
has been given, and is repeated here, is that users should try to build values that they want to use out of the minimum number of internal cells possible (Duke-Williams and Stillwell, 2007). One of the significant effects of SCAM is that change to the data is very apparent: as soon as users start to look at raw data, the chances are that they will find that the data are largely composed of 0s and 3s. If users are not aware of SCAM, this will be confusing, and as they learn more about the process, it will harm confidence in the reliability of the data. As stated above, users of interaction data often want to aggregate large numbers of origins or destinations. Typically, this may be done to generate concentric rings around a focus area. This may be done purely on a distance basis, or on an administrative function basis (e.g., flows to a target set of wards from the other wards in the rest of the district, and then from other districts in the county, and then from neighbouring counties and finally from other Government Office Regions). This approach is especially likely when dealing with output areas. Few users are interested in specific flows to or from a particular OA, especially as at such a small level the data captured in the 2001 Census are not necessarily representative of a longer-term pattern. However, users are interested
in aggregating together groups of OAs into coherent and functionally sensible regions, which may not match existing areal groupings such as wards. The value fluctuations introduced by SCAM at best harm the user’s confidence in doing so and, at worst, mean that any such aggregations they make are not statistically reliable.
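The advice to build values from as few internal cells as possible can be motivated with a small calculation. The sketch below assumes an unbiased adjustment in which a true 1 becomes 3 with probability 1/3 and a true 2 becomes 3 with probability 2/3 (otherwise 0), and that cells are adjusted independently. Both are assumptions made here for illustration rather than a description of the procedure actually applied, and the function names are hypothetical.

```python
from fractions import Fraction
import math

# Assumed adjustment scheme: true value -> {published value: probability}.
SCAM_OUTCOMES = {
    1: {0: Fraction(2, 3), 3: Fraction(1, 3)},
    2: {0: Fraction(1, 3), 3: Fraction(2, 3)},
}


def cell_mean_and_variance(true_value: int) -> tuple[Fraction, Fraction]:
    """Mean and variance of the published value for one small cell."""
    outcomes = SCAM_OUTCOMES[true_value]
    mean = sum(value * p for value, p in outcomes.items())
    variance = sum((value - mean) ** 2 * p for value, p in outcomes.items())
    return mean, variance


def aggregate_sd(n_small_cells: int) -> float:
    """Standard deviation of a total built from n independently adjusted small cells."""
    # Under the assumed scheme the variance is the same for a true 1 and a true 2.
    _, variance = cell_mean_and_variance(1)
    return math.sqrt(n_small_cells * variance)


print(cell_mean_and_variance(1), cell_mean_and_variance(2), aggregate_sd(50))
```

Under these assumptions the expected value of each cell is unchanged, but every adjusted cell adds a variance of 2, so a total assembled from 50 small cells carries a standard deviation of 10.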
BESPOKE DISCLOSURE CONTROL TECHNIQUES FOR NON-CENSUS DATA SETS

Techniques for any data will depend on the format (whether microdata or aggregate data) and the perceived sensitivity of the contents. The actual data which might be disclosed are not necessarily particularly personally revealing. For example, in the past, data have been released in a number of forms from the National Health Service Central Register (NHSCR) (Boden et al., 1992). These have included aggregate and microdata versions. The microdata versions are the most potentially disclosive, as researchers had access to individual records. However, these records were anonymised, and stripped of
any identifying fields (such as an NHS patient number); the record consisted of area codes for origin and destination, together with the sex and date of birth of the person migrating. This is personal data, but poses limited risk since there are no attributes attached to the data. Many data sets – not just interaction ones – are distributed with licensing agreements that form the primary line of defence, from the data provider’s point of view: users sign undertakings that open them up to legal proceedings should they breach confidentiality. However, as with census data, licence restrictions alone may not be considered sufficient to fully protect the data. The most common disclosure control techniques used (or proposed, for future data sets) involve some degree of rounding of results. As with census data, a variety of rounding options are available with a choice of bases. As stated above, the risks of disclosure are affected by a number of factors, including the coarseness

There are many sources of sample- and survey-based data, and also of administrative data sets, from which interaction data could be derived. Dennett et al. (2007) list many sources and potential sources of interaction data, including a number which may potentially be developed into new data sets in the next few years. If this is to be done, then disclosure control arrangements will need to be established with the data providers. It will be important to balance these arrangements so that the resulting data are useful. An assessment of potential risk of disclosure must take into account the degree of sensitivity of the data set (that is, the amount of private personal data that are included), and the impact of different types of disclosure control on the data. The most commonly suggested forms of bespoke disclosure control are two-fold: firstly, geographic aggregation, and secondly, rounding.
Both of these are fairly obvious things to choose: rounding is an easy choice because it is easy to apply, and easy to explain. Geographic aggregation is also easy to apply, although as we have seen, even at very large spatial scales, interaction data are often comprised of small numbers.
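As an illustration of why rounding is easy to apply and to explain, the following sketch enumerates the outcomes of unbiased random rounding to a fixed base. The base of 5 and the function names are assumptions chosen for illustration; the text does not prescribe a particular base or scheme.

```python
from fractions import Fraction


def unbiased_rounding_outcomes(count: int, base: int = 5) -> dict[int, Fraction]:
    """Possible published values, with probabilities, under unbiased random rounding."""
    lower = (count // base) * base
    remainder = count - lower
    if remainder == 0:
        # Exact multiples of the base are published unchanged.
        return {lower: Fraction(1)}
    # Round up with probability remainder/base so the expected value equals the count.
    p_up = Fraction(remainder, base)
    return {lower: 1 - p_up, lower + base: p_up}


def expected_value(outcomes: dict[int, Fraction]) -> Fraction:
    """Expected published value for a set of outcomes."""
    return sum(value * p for value, p in outcomes.items())


outcomes_for_7 = unbiased_rounding_outcomes(7)
print(outcomes_for_7, expected_value(outcomes_for_7))
```

A true count of 7 is published as 5 or 10, with probabilities chosen so that the expected published value is still 7; any individual published cell, however, may be up to 4 away from the truth.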
SUMMARY

Interaction data occupy difficult ground when it comes to disclosure control. The data sets are typically sparsely filled matrices, dominated by small values. Consequently, the apparent risk of identity disclosure is high. This can be balanced against the potential attribute disclosure. In some cases, such as the NHSCR data, there are no attributes (other than age and sex) to disclose, and thus the data should be considered relatively safe. In other cases, such as with census data, there is a greater range of attributes at risk. In these cases, data providers should seek to make available data sets with few attributes, in addition to the richer variants. For both migrants and commuters, an accurate count of total persons would be welcomed by data users, as it would permit other tables (which had been subject to disclosure control) to be understood in better context. It was an unfortunate reality of the 2001 Census that disclosure control techniques were changed after the table designs for aggregate data were finalised. Had it been known that SCAM was to be used, it is likely that many users would have preferred simpler tables that would be less affected by the process. However, this is an important lesson for the future: it is important to consider the impact of disclosure control on the output data. If the effects of SDC are apparent, then confidence in the data may be harmed, even if the selected method is statistically unbiased.
REFERENCES

Boden, P., Stillwell, J., & Rees, P. (1992). How good are the NHSCR data? In J. Stillwell, P. Rees, & P. Boden (Eds.), Migration Processes and Patterns Volume 2 Population Redistribution in the United Kingdom (pp. 13-27). London: Belhaven.
Cox, L. (1987). A constructive procedure for unbiased controlled rounding. Journal of the American Statistical Association, 82(398), 520–524. doi:10.2307/2289455
Dennett, A., Duke-Williams, O., & Stillwell, J. (2007). Interaction data sets in the UK: an audit. Working Paper 07/05, School of Geography, University of Leeds, Leeds.
Duke-Williams, O., & Stillwell, J. (2007). Investigating the potential effects of small cell adjustment on interaction data from the 2001 Census. Environment & Planning A, 39(5), 1079–1100. doi:10.1068/a38143
Lambert, D. (1993). Measures of disclosure risk and harm. Journal of Official Statistics, 9(2), 313–331.
Martin, D. (2002). Geography for the 2001 Census in England and Wales. Population Trends, 108, 7–15.
Mitchell, R., Dorling, D., Martin, D., & Simpson, L. (2002). Bringing the missing million home: correcting the 1991 small area statistics for undercount. Environment & Planning A, 34, 1021–1035. doi:10.1068/a34161
Office for National Statistics. (2003). Census 2001 Review and Evaluation: Edit and Imputation Evaluation Report. Retrieved from http://www.statistics.gov.uk/census2001/proj_eai.asp
Office for National Statistics. (2005). Census 2001 Review and Evaluation: Data Quality Report. Retrieved from http://www.statistics.gov.uk/census2001/proj_dq.asp
Poynter, K. (2008). Review of Information Security at HM Revenue and Customs - Final report. London: HMSO. Retrieved from http://www.hm-treasury.gov.uk/independent_reviews/poynter_review/poynter_review_index.cfm
Rees, P. H., & Duke-Williams, O. (1997). Methods for estimating missing data on migrants in the 1991 British Census. International Journal of Population Geography, 3, 323–368. doi:10.1002/(SICI)1099-1220(199712)3:43.0.CO;2-Z
Simpson, S., & Dorling, D. (1994). Those missing millions: implications for social statistics of non-response to the 1991 Census. Journal of Social Policy, 23, 543–567. doi:10.1017/S0047279400023345
Simpson, S., & Middleton, E. (1997). Who is missed by a national Census? A review of empirical results from Australia, Britain, Canada and the USA. CCSR Working Paper No 2, Centre for Census and Survey Research, University of Manchester, Manchester.
Singer, E., Mathiowetz, N., & Couper, M. (1993). The impact of privacy and confidentiality concerns on survey participation: the case of the 1990 Census. Public Opinion Quarterly, 57, 465–482. doi:10.1086/269391
Stillwell, J., & Duke-Williams, O. (2007). Understanding the 2001 Census interaction data: the impact of small cell adjustment and problems of comparison with 1991. Journal of the Royal Statistical Society. Series A (General), 170(2), 1–21.
Chapter 4
Analysing Interaction Data
John Stillwell, University of Leeds, UK
Kirk Harland, University of Leeds, UK
ABSTRACT

Large and complex interaction data sets present researchers with analytical challenges and this chapter attempts to identify and illustrate a number of ways to analyse origin-destination flows. Given the impossible task of providing a comprehensive review in such a limited space, certain analytical measures, modelling methods and visualisation techniques have been selected for inclusion, following an introduction to the notation commonly employed to represent interaction variables. Various Census and NHS patient register data sets are used to exemplify interaction measures, beginning with simple net balances and inflow/outflow ratios and moving on to indices of connectivity, inequality and distance moved. The multiplicative component framework is introduced as a particularly useful analytical approach. More sophisticated methods of modelling interaction data using statistical or mathematical calibration techniques are reviewed, examples of log-linear regression and spatial interaction model structure are highlighted in the context of historical calibration, and a brief discussion of the use of models for future projection is included. Maps that show patterns of geographical movement function as effective illustrative and research tools. Computerized mapping of geographical movement has evolved since the 1970s and 1980s and, in this chapter, we introduce a new method of mapping flows using vectors and illustrate this approach with micro data on pupils travelling to school. The chapter aims to provide a broad introduction to analysis methods for interaction data, many of which are subsequently applied in later chapters of the book.
DOI: 10.4018/978-1-61520-755-8.ch004
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION

Interaction data are frequently held in matrix form since this is an efficient way to store and manipulate flows between origins and destinations. This is not always the case since some software packages prefer to have data input as pair-wise origin-destination flows, particularly when matrices are sparsely populated and many cells contain zero entries. Whichever mode of storage or transfer is used, the important question is: how do we analyse a flow matrix? What are the extensions and components of the matrix that we use in order to understand the patterns? What particular measures of interaction and tools can be used to describe and analyse the flow data to best effect, recognising that, in some instances, the matrix may be very large? What more sophisticated techniques involving modelling can be applied to identify structure or pattern in the data or determine the significance of explanatory variables? What methods exist for visualising interaction data? These are some of the questions addressed in this chapter.

We begin with a consideration of the elements of the interaction matrix and an algebraic notation that can be used to represent various concepts and measures. Thereafter, the chapter is divided into sections that cover different types of analysis and visualisation: measures for analysing gross and net flows for each area and measures of inter-area connectivity and inequality are illustrated; statistical and mathematical models are reviewed briefly by way of setting the scene for the modelling chapters that follow later in the book; and methods of projecting migration forward into the future are also discussed. A novel approach to the visualisation of interaction micro data using vector analysis is introduced in the penultimate section.
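The two storage modes mentioned above, a full matrix and a list of pair-wise origin-destination flows, are interchangeable. The following is a minimal sketch in plain Python; the function names and the small matrix are hypothetical, and real data sets would normally be handled with dedicated array or sparse-matrix tools.

```python
def matrix_to_pairs(flows):
    """List the non-zero cells as (origin index, destination index, flow) records."""
    return [
        (i, j, flow)
        for i, row in enumerate(flows)
        for j, flow in enumerate(row)
        if flow != 0
    ]


def pairs_to_matrix(pairs, n_origins, n_destinations):
    """Rebuild the full matrix; cells absent from the pair list are zero."""
    flows = [[0] * n_destinations for _ in range(n_origins)]
    for i, j, flow in pairs:
        flows[i][j] = flow
    return flows


# Hypothetical sparse 3 x 3 flow matrix (rows are origins, columns are destinations).
flows = [
    [0, 3, 0],
    [6, 0, 0],
    [0, 0, 9],
]
pairs = matrix_to_pairs(flows)
print(pairs)
```

The pair list holds three records instead of nine cells, which is why the pair-wise form is attractive when most cells are zero.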
To illustrate different measures and methods, we have selected certain interaction data for a number of different contexts: migration between boroughs in London from the 2001 Census, moves between districts in England and Wales based on
data from the NHS patient registers, journey to work data from the 2001 Census SWS for Leeds and London and journeys from home to school by pupils from the Pupil Level Annual School Census (PLASC).
INTERACTION MATRIX AND NOTATION

An interaction matrix (Figure 1) is a framework of rows and columns, representing origins and destinations respectively, such that the cell at the intersection of any one row i and any one column j contains a measure of the interaction flow, Iij, between that origin and destination pair. The flow may be a raw count of migrants or commuters although it may be a derived measure such as a rate, a probability or some other measure of interaction. The matrix may be symmetric with n origins and m destinations, in which case n=m, but this is often not the case and sometimes origins and destinations are entirely separate spatial entities, such as the residential addresses of school children and the schools they attend or the neighbourhoods where consumers live and the retail outlets they frequent. In a symmetric matrix, the diagonal (unshaded) cells of the matrix are the cells with intra-area flows (Iii and Ijj) whereas the off-diagonal cells contain all the inter-area counts. The sum of the cell values for each origin i is the outflow total, frequently represented as Oi:

\[
O_i = \sum_{j=1,\, j \neq i}^{m} I_{ij} = I_{i*} \tag{1}
\]

whilst the sum of the inflows to one destination j is usually represented as Dj:

\[
D_j = \sum_{i=1,\, i \neq j}^{n} I_{ij} = I_{*j} \tag{2}
\]

These are known as the marginal totals, and an asterisk subscript, an alternative notation to the sigma, is used to denote summation across all origins or destinations. Consequently O* and D* are the sums of the outflow and inflow totals respectively and I** is the total of interaction flows within the system, such that O* = D* = I**. As well as notation for cells of the matrix and marginal totals, a notation for populations is required since these are frequently required as denominators for rate calculations. Pi is the population of an area i at any spatial scale. In the next section, we examine a number of measures of interaction. The first group contains aggregate measures relating to an individual area based on the marginal totals or the intra-area flow but can be defined in some cases as aggregate measures relating to the system as a whole. These ‘area’ measures, which include net balances, effectiveness, turnover and churn, are followed by measures focusing on ‘inter-area’ flows and include median distance moved, connectivity and inequality. We use migration and commuting data for illustration in the next section.

Figure 1. Symmetric flow matrix with marginal totals

MEASURES OF INTERACTION

If we consider migration to be the interaction variable, M, then NMi is the net migration for area i, the balance of all flows to and from other areas within the system of interest which is calculated as:

\[
\mathit{NM}_i = D_i - O_i \tag{3}
\]

A positive value indicates in-migration exceeds out-migration for this area, whilst a negative value indicates that out-migration exceeds in-migration. A value of zero indicates that migration flows into and out of the area are equal. The net migration intensity or rate, nmri, is then defined as:

\[
\mathit{nmr}_i = \frac{\mathit{NM}_i}{P_i} \times 100
\]

whereas the gross out-migration rate, ori, is calculated as:

\[
\mathit{or}_i = \frac{O_i}{P_i} \times 100 \tag{4}
\]

and the gross in-migration rate, dri, is calculated as:

\[
\mathit{dr}_i = \frac{D_i}{P_i} \times 100 \tag{5}
\]
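The marginal totals and the area-level rates defined above can be computed directly from a flow matrix. The sketch below uses a hypothetical 3 x 3 matrix and hypothetical populations; the function and variable names are illustrative and are not part of the chapter's notation.

```python
def outflow_totals(flows):
    """O_i: row sums excluding the intra-area (diagonal) cell."""
    return [sum(f for j, f in enumerate(row) if j != i) for i, row in enumerate(flows)]


def inflow_totals(flows):
    """D_j: column sums excluding the intra-area (diagonal) cell."""
    n_destinations = len(flows[0])
    return [
        sum(row[j] for i, row in enumerate(flows) if i != j)
        for j in range(n_destinations)
    ]


def rate(numerator, population, per=100):
    """Crude rate per `per` persons of the population at risk."""
    return numerator * per / population


# Hypothetical flows (rows are origins, columns are destinations) and populations.
flows = [
    [50, 10, 20],
    [30, 40, 10],
    [5, 15, 60],
]
populations = [1000, 2000, 500]

O = outflow_totals(flows)
D = inflow_totals(flows)
NM = [d - o for d, o in zip(D, O)]                        # net migration balances
nmr = [rate(nm, p) for nm, p in zip(NM, populations)]     # net migration rates
out_rates = [rate(o, p) for o, p in zip(O, populations)]  # gross out-migration rates
in_rates = [rate(d, p) for d, p in zip(D, populations)]   # gross in-migration rates

print(O, D, NM, nmr)
```

Because the diagonal is excluded from both margins, the outflow and inflow totals sum to the same system total and the net balances sum to zero.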
Normally crude rates of this nature are expressed per 100 of the population at risk (PAR) as shown in the equations, but occasionally, if the numbers are small, they may be expressed per 1,000 or even per 10,000 persons. Care should always be taken to ensure that, when computing out-migration or in-migration intensities or rates, the intra-area flows are removed from the numerators, although this does not matter when calculating net migration rates since the intra-area flows will cancel out. Furthermore, there has been much debate about which populations should be used to compute rates of net or gross migration taking place over a time period. Neither beginning-of-period nor end-of-period populations give a good representation of the PAR so the use of mid-period populations is often suggested as more appropriate given the assumption that in-migrants and out-migrants will both spend, on average, half the period in the area. In practice, particularly with census migration transition data, beginning-of-period populations are not available and therefore end-of-period populations are used by default. Another issue is that whilst area populations are appropriate PAR for out-migration rates, they are not necessarily suitable for in-migration rates since the true PAR for those moving into an area is the population of the rest of the system outside the specific area. Again, for practical reasons, the area population tends to be used despite this conceptual mis-specification. Immigration rates, for example, are based on destination area populations because of the impracticality of using a rest of the world population. In the case of certain census interaction variables, the PAR may be unavailable. One example is the flows of commuters from home to work that are reported in Table 104 of the Special Workplace Statistics. The counts available here are for representative persons by NS-SEC, for which no corresponding
PAR are available from the main census tables, as indicated by Champion in Chapter 10.

Net migration is only one measure of the net exchange between inflows and outflows. Other measures include the inflow/outflow ratio and migration effectiveness or efficiency. The former is defined simply as inflow divided by the outflow for an area and generates values that are greater or less than 1 but never negative. Figure 2 shows net migration balances, net migration rates and inflow/outflow ratios for the boroughs of Greater London in 2000-01 using three different thematic mapping methods respectively: proportional symbol, ranged and interpolated surface mapping. The maps are based on flows within London and show the pattern of outward movement from inner to outer boroughs.

Figure 2. Measures of net migration exchange, London boroughs, 2000-01. Source: 2001 Census

Migration effectiveness (or efficiency) is defined by expressing net migration as a function of inter-area turnover rather than area population, where turnover is the sum of gross inflows and outflows as follows:

\[
\mathit{ME}_i = \frac{D_i - O_i}{D_i + O_i} \times 100 \tag{6}
\]
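The inflow/outflow ratio and the effectiveness measure in equation (6) can be sketched as two small functions. The inflow and outflow totals used below are hypothetical and the function names are illustrative.

```python
def inflow_outflow_ratio(inflow, outflow):
    """Inflow divided by outflow: above 1 for net gain, below 1 for net loss."""
    return inflow / outflow


def migration_effectiveness(inflow, outflow):
    """Net migration as a percentage of turnover (inflow plus outflow)."""
    return (inflow - outflow) * 100 / (inflow + outflow)


# Hypothetical areas: one gaining (D = 35, O = 30) and one losing (D = 25, O = 40).
gaining = migration_effectiveness(35, 30)
losing = migration_effectiveness(25, 40)

print(round(gaining, 2), round(losing, 2), round(inflow_outflow_ratio(25, 40), 3))
```

Unlike the ratio, which is bounded below by zero, the effectiveness score is symmetric about zero and lies between -100 and +100.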
Migration effectiveness can take negative and positive values, like net migration, but offers a measure of the extent to which net migration redistributes population around the spatial system. Whilst net migration rates would be the same for two areas of equal population if the inflow and outflow totals were 100 and 110 and 1,000 and 1,010 respectively, migration effectiveness values would be -4.76 and -0.50, indicating that net out-migration represents 4.76% of turnover in the first case and 0.50% in the second.

Figure 3. Age-specific migration rates and effectiveness scores, districts in England and Wales, 2006. Sources: Mid-year population estimates and patient register/NHSCR data supplied by ONS

In addition to computing area-specific measures, it is also possible to produce system-wide measures of net migration:

\[
\mathit{NM}_* = \frac{\sum_i |D_i - O_i|}{\sum_i P_i} \times 100 \tag{7}
\]

and of migration effectiveness:

\[
\mathit{ME}_* = \frac{\sum_i |D_i - O_i|}{\sum_i (O_i + D_i)} \times 100 \tag{8}
\]
where the sigma sign refers to summation across all areas. Area-specific migration effectiveness scores are not mapped in Figure 2 because they are highly correlated with the inflow/outflow ratios. Instead, we plot the variation in system-wide migration effectiveness scores for a set of broad age groups for patient register moves between districts in England and Wales alongside the age group-specific rates of migration for this spatial system (Figure 3). Whilst the migration rates increase in the child and late teen ages, peak in the 20s and decline thereafter, the migration effectiveness scores are very high in the late teenage years, relatively low in the 20s and then increase with age. It is highly likely that the peak in effectiveness at age 16-19 reflects the net migration of students to districts with higher education institutions and of young workers to first or second jobs away from home. Effectiveness drops as individuals settle into their jobs but as age increases and migration propensities fall, the effectiveness of net migration increases, peaking at age 60-74 when retirement may be associated with a change of location. Plane (1984) provides one of the earliest analyses of interstate
migration in the USA using this measure whilst Stillwell et al. (2000; 2001) have explored how effectiveness varies by age between city regions in Britain and Australia. This example serves to highlight the value of alternative measures of migration when comparing migration characteristics between different subgroups of the migrant population. In this instance we have chosen to illustrate age-specific variations but there are also interesting variations when other variables are explored such as ethnicity, where it appears that net migration is more effective in redistributing non-white migrants in Britain than white migrants (Figure 4) (Stillwell et al., 2008). Net migration represents a redistribution of 18% of black migration turnover compared with only just above 5% for white inter-district arrivals and departures.

Figure 4. Migration effectiveness by ethnic group in Britain, 2000-01. Source: 2001 Census SMS

The age-specific migration rates depicted in Figure 3 lead us to consider two further measures of aggregate migration: the standardised migration rate and the gross migraproduction rate. The standardisation of crude migration rates is one way of adjusting for the effects of different age (and sex) structures in the population at risk. Assume, for example, that we are interested in comparing the migration intensities between ethnic groups. It is well-known that the population structure of Asian groups tends to be relatively youthful compared with the national average so we should take this into account when calculating migration rates. The standardised migration rate is calculated by weighting each age-specific rate according to the proportion of a standard population in that age group, where the standard population might be that for the UK or for Europe as a whole, for example. Standardised migration rates for ethnic groups in the year before the 2001 Census have been produced by Finney and Simpson (2008).

The gross migraproduction rate (gmr) is the sum of the age-specific probabilities, usually for each single year of age cohort across a population over a given time period. In effect, it measures the area under the schedule of age-specific migration rates and is analogous to the gross reproduction rate, the term used by demographers to describe the level of fertility rates or probabilities. Age-specific gmrs for out-migration and in-migration have been analysed by Rogers (1995) whilst Bell et al. (2002) have shown that the national gmr in Britain is around 7, half that in Australia according to data from the respective censuses in 1991 and 1996.

The use of net migration rates is limiting because it only tells part of the internal migration story and, although migration effectiveness standardises by the inter-area turnover, this does not make it the most suitable option for assessing the relative stability of underlying area populations because of the significant volume of intra-area migration that may be occurring. Two further measures can therefore be derived: turnover and churn. The former, toi, we have encountered above in the effectiveness calculations and is defined as a rate as follows:

\[
\mathit{to}_i = \frac{D_i + O_i}{P_i} \times 100 \tag{10}
\]

whereas the rate of ‘churn’, chi, is defined to include an intra-area migration term as follows:

\[
\mathit{ch}_i = \frac{D_i + O_i + I_{ii}}{P_i} \times 100 \tag{11}
\]
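The difference between equations (10) and (11) is only the intra-area term, which a short sketch makes concrete. The two areas below are hypothetical, constructed to have identical turnover but different volumes of within-area movement; the function names are illustrative.

```python
def turnover_rate(inflow, outflow, population):
    """Inter-area turnover per 100 population (equation 10)."""
    return (inflow + outflow) * 100 / population


def churn_rate(inflow, outflow, intra_area, population):
    """Turnover plus intra-area moves per 100 population (equation 11)."""
    return (inflow + outflow + intra_area) * 100 / population


# Hypothetical areas, each with a population of 1,000 and the same turnover.
area_a = {"inflow": 35, "outflow": 30, "intra_area": 50, "population": 1000}
area_b = {"inflow": 30, "outflow": 35, "intra_area": 150, "population": 1000}

turnover_a = turnover_rate(area_a["inflow"], area_a["outflow"], area_a["population"])
turnover_b = turnover_rate(area_b["inflow"], area_b["outflow"], area_b["population"])
churn_a = churn_rate(**area_a)
churn_b = churn_rate(**area_b)

print(turnover_a, turnover_b, churn_a, churn_b)
```

Both areas have a turnover of 6.5 per 100, yet the churn rates are 11.5 and 21.5, so only churn distinguishes the area with more internal movement.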
Whilst the term ‘turnover’ is one that can be found throughout the demographic literature, its precise meaning and the way in which it is calculated is not universal. For example, in an article published by the ONS (ONS, 2007) turnover (for small areas) is calculated by averaging internal migration flows over a three-year period. This is to avoid the possible distorting effects that might be caused in smaller areas by localised phenomena, such as the building of a new housing estate or the demolition of old housing. In other work by Large and Ghosh (2006) turnover is calculated over one year but using both internal and international migration data. Bailey and Livingston (2007), on the other hand, calculate ‘turnover’ just from internal migration data, but rather than using only inflows and outflows, also incorporate within area moves – the measure we describe here as ‘churn’. Bailey and Livingston suggest that churn is associated closely with deprivation, especially when small areas are involved. Specific local factors which may agitate local populations at the small scale cease to be important at larger scales. Despite this, measuring churn is important
for ascertaining a more accurate measurement of the relative stability of the population in different areas. It could be easily argued that where two areas with the same levels of population turnover are compared, it would be the area with the higher levels of internal movement relative to the population size that would have the less stable population.

Figure 5 summarises the various measures of migration for Greater London using 2001 Census SMS data. The topmost schedule is the rate of churn which reaches a peak by age group 25-29 at almost 25% of the population and is dominated by intra-area moves. Turnover, in contrast, is much less variable and obscures the rather different schedules for in-migration and out-migration. In-migration peaks at age 20-24, whereas out-migration has less variation but is highest in the early 60s, and these schedules define the net migration rate. Migration effectiveness, calculated here as a rate per 10 persons rather than per 100 for convenience, accentuates the variations shown by the age-specific net migration rates, indicating that London’s net losses occur across all age groups except 20-29.

The influence of distance on interaction between places was recognized in the early studies of Ravenstein (1885; 1889) and subsequent work by Westefeld (1940) and Zipf (1946). As indicated in Figure 5, shorter-distance moves within London are far more common than longer-distance moves between London and elsewhere. Whilst we can differentiate moves within and between areas in this way, an alternative measure of migration is the mean distance migrated, although Bell et al. (2002) prefer the median distance moved since the distribution of distances is positively skewed, reflecting the strong distance decay effect which occurs with migration and most other interaction data.
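The preference for the median over the mean can be illustrated with a flow-weighted calculation. The sketch below uses hypothetical (distance, flow) pairs that exhibit distance decay, and it takes the lower weighted median, which is one of several possible conventions; the function names are illustrative.

```python
def weighted_mean_distance(distance_flow_pairs):
    """Mean distance moved, weighting each distance by the number of migrants."""
    total_flow = sum(flow for _, flow in distance_flow_pairs)
    return sum(distance * flow for distance, flow in distance_flow_pairs) / total_flow


def weighted_median_distance(distance_flow_pairs):
    """Smallest distance by which at least half of all migrants have been counted."""
    total_flow = sum(flow for _, flow in distance_flow_pairs)
    cumulative = 0
    for distance, flow in sorted(distance_flow_pairs):
        cumulative += flow
        if 2 * cumulative >= total_flow:
            return distance


# Hypothetical (distance in km, number of migrants) pairs with strong distance decay.
pairs = [(5, 600), (20, 250), (80, 100), (300, 50)]

mean_km = weighted_mean_distance(pairs)
median_km = weighted_median_distance(pairs)
print(mean_km, median_km)
```

Here the few long-distance moves pull the mean up to 31 km, while the median stays at 5 km, the distance moved by the typical migrant.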
Figure 5. Measures of migration by age for Greater London, 2000-01. Source: 2001 Census SMS

SMS Table 104 provides data on migrants by economic activity and Figure 6 illustrates the median distance moved for the various categories of economically active and inactive migrants between districts in Britain. In contrast to the median migration distance of 94.7km for all migrants, it is the self-employed, both part-time and full-time, who tend to move over shorter distances whereas the unemployed, the economically inactive, the retired and students, both economically active and inactive, tend to move over longer distances. The majority of students are not economically active and this group, plus full-time and part-time employees and the permanently sick and disabled, move shorter distances than the overall migration distance.

Figure 6. Median distance of inter-district migration by economic activity, Britain, 2000-01. Source: 2001 Census SMS

There is considerable debate in the literature about how distance should be measured (Boyle and Flowerdew, 1997). In the previous example of migration flows between districts, Euclidean distances were calculated between district centroids but many studies improve distance calculations by using road networks or more realistic measures so as to avoid the problems of straight lines across estuaries or mountainous areas. When intra-area flows are included, a common method for measuring the intra-area distance is to use the square of the radius of the circle that approximates in size to the area. We will return to a discussion of the influence of distance on interaction flows when we consider modelling methods in a later section. Beforehand, we consider some further ways of analysing the flows between origins and destinations.

A number of other measures can be used for analysing the flows taking place between origins and destinations. In this instance we limit our selection by illustrating two measures that are available for analysing data extracted with the WICID system (see Chapter 2) – connectivity and inequality. The index of connectivity can be calculated for the whole of the matrix or for each specific origin or destination. It is simply a measure of the number of links that each area has with other areas in the system and is most useful in situations where interaction flows do not connect all places with one another; when they do, the index takes the value of unity. The problem with the index is that it does not make allowance for the size of the flow between any origin and destination. We exemplify the index here using journey to work data from the 2001 Census SWS. The first example is for in-commuters to each of the wards in Leeds from other wards of the city according to the mode of transport used. The city centre ward, City and Holbeck, ranks third in the country with over 100,000 in-commuters per
weekday, though many of these come from beyond the city boundaries. The schedules in Figure 7 display the wards ranked according to the index of connectivity for each travel mode. The schedule for car drivers is horizontal at a connectivity of 1 for all but two wards, indicating that virtually all wards receive in-commuters from all other wards and highlighting the popularity of car transport. At the other end of the connectivity spectrum, the connectivity of wards for train commuters within Leeds is much lower. Only two wards, City and Holbeck and University, have commuters coming from more than a third of all wards and several do not receive train commuters from any other wards. Wards across the city are more connected by bus passengers than they are by car passengers and perhaps surprisingly, wards are more connected by in-commuters on foot than by bicycle. In fact, if we consider flows of commuters in all directions, ward connectivity by foot is 60% and by bicycle is 35%, whereas connectivity by bus is 92%. The magnitude of connectivity can be measured and visualised in other ways and the example shown in Figure 8 indicates the linkages between boroughs in Greater London for migrants in two ethnic groups, white and black African/Caribbean. The maps show the major inter-borough connections where the thickness of the line represents
Analysing Interaction Data
Figure 7. In-commuting connectivity of wards in Leeds by mode of transport, 2001. Source: 2001 Census SWS
the volume of the flow. The Thames appears to be a barrier to connectivity between boroughs north and south of the river for the black African/Caribbean group and, in the east end, for the white group. One approach to further study of connectivity is to establish whether there is any hierarchy evident in the flow matrix. This might be identified by analysing the net migration flows, $NM_{ij}$, between areas $i$ and $j$, defined as:

$$NM_{ij} = M_{ji} - M_{ij} \tag{12}$$

where a positive value indicates that the flow from $j$ to $i$ is greater than the flow from $i$ to $j$, and a negative value indicates that the flow from $i$ to $j$ is greater than the flow from $j$ to $i$. Courgeau (1976) uses the term ‘net interchange’ to describe this value. It may be that areas position themselves in the hierarchy according to their net balances with other areas so that the area at the top of the hierarchy gains from all others and loses to none, whereas that at the bottom loses to other areas but gains from none, and those in between gain from those below them in the hierarchy and lose to those above. The index of interaction inequality is derived from the method defined in Bell et al. (2002) which computes half the sum of the absolute differences between each observed flow and the observed mean across all origins and destinations, except where the origin is the same as the destination. A system-wide index value of zero would indicate that all origin-destination flows in the system are equal to the mean, whereas a value of unity would suggest only one positive flow in the system with all other flows being zero. Figure 9 depicts the area inequality indices for out- and in-commuters within Leeds at ward
Figure 8. Migration connectivity between boroughs of London by ethnic group, 2000-01. Source: 2001 Census SMS
level. Here a value of zero would mean that all the flows either to or from each ward are the same, whereas a score of one means maximum inequality. The scores for the out-commuters are mostly between 0.4 and 0.6, suggesting a much greater tendency toward inequality than with the in-commuters, whose values fall to less than 0.2 for City and Holbeck, Richmond Hill and University wards, all of which are central and contain a high proportion of the city’s workplaces. The index of inequality illustrated in Figure 9 is one measure of how concentrated flows from all origins into each destination are. Another measure of ‘spatial focusing’ is the Gini index developed by Plane and Mulligan (1997) that involves comparing each off-diagonal origin-destination flow count, $M_{ij}$, with every other count in the matrix, $M_{kl}$, as follows:

$$G = \frac{\sum_{i}\sum_{j \neq i}\sum_{k}\sum_{l \neq k} \left| M_{ij} - M_{kl} \right|}{2n(n-1)\sum_{i}\sum_{j \neq i} M_{ij}} \tag{13}$$

where the first set of sigmas represents summation over areas $i$, $j$, $k$ and $l$, $n$ is the number of areas, and the index is interpreted as half the arithmetic mean of the absolute differences between all pairs of off-diagonal flows. A Gini index of 0 is where all flows in the matrix
are of equal size and an index value of 1 is the extreme case where all migrants are found in one inter-area cell. Bell et al. (2002) compute indices of around 0.65 from census and NHSCR data for a set of 67 areas of Britain in 1990-91 whereas the index for 69 areas in Australia in 1995-96 is 0.77, suggesting migration in Australia is more spatially focused than in Britain. Other indices of concentration include the Theil index (Allison, 1978) and the coefficient of variation (Rogers and Raymer, 1998). This section has indicated that there are a number of tools available for researchers to analyse the flows between areas in a matrix so as to measure variables such as connectivity, linkage, inequality and concentration. Most of these measures can be applied to inflows, outflows and total flows as well as inter-area flows, but one integrated approach to analysing structure is through the multiplicative component framework proposed by Rogers et al. (2002) that disaggregates the origin-destination flow matrix into four components (overall, origin, destination and origin-destination components) as follows:

$$M_{ij} = T \, o_i \, d_j \, od_{ij} \tag{14}$$

where $T$ is the total number of migrants, $o_i$ is the proportion of all moves leaving area $i$, $d_j$ is the proportion of all moves entering area $j$ and $od_{ij}$ is the origin-destination interaction component, that is, the ratio of the observed flow from area $i$ to area $j$ to the flow expected from the first three components, $M_{ij} / (T \, o_i \, d_j)$. This approach is particularly valuable when analysing a time series of interaction matrices and exploring whether change in a migration flow is caused by an overall increase in migration, a change in outflows or inflows, or an increase in area connectivity. The approach was used over 20 years ago by Willekens and Baydar (1986) and is adopted by Raymer and Giulietti in Chapter 15 of this book where it is expressed as a log-linear statistical model. We now turn to modelling in the next section.

Figure 9. Indices of inequality for commuters in Leeds by ward, 2001. Source: 2001 Census SWS
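The measures described in this section are straightforward to compute from a square origin-destination matrix. The following Python sketch is illustrative only: the function names and the three-area matrix are hypothetical, intra-area (diagonal) flows are assumed to be excluded throughout, and the origin-destination component of equation 14 is taken to be the ratio of the observed flow to the product of the other three components, which is the reading under which the identity holds exactly.

```python
from itertools import product


def off_diagonal(flows):
    """List the off-diagonal cells M_ij (i != j) of a square flow matrix."""
    n = len(flows)
    return [flows[i][j] for i, j in product(range(n), repeat=2) if i != j]


def connectivity_index(flows):
    """Proportion of possible inter-area links that carry a non-zero flow."""
    cells = off_diagonal(flows)
    return sum(1 for m in cells if m > 0) / len(cells)


def net_interchange(flows, i, j):
    """Equation 12: NM_ij = M_ji - M_ij (positive when area i gains from j)."""
    return flows[j][i] - flows[i][j]


def gini_index(flows):
    """Equation 13: absolute differences between every pair of off-diagonal
    cells, divided by 2 n (n - 1) times the total off-diagonal flow."""
    cells = off_diagonal(flows)  # len(cells) equals n (n - 1)
    abs_diff = sum(abs(a - b) for a in cells for b in cells)
    return abs_diff / (2 * len(cells) * sum(cells))


def multiplicative_components(flows):
    """Equation 14: return T, o_i, d_j and od_ij for the off-diagonal flows.
    Assumption: od_ij = observed flow / (T * o_i * d_j)."""
    n = len(flows)
    total = sum(off_diagonal(flows))
    o = [sum(flows[i][j] for j in range(n) if j != i) / total for i in range(n)]
    d = [sum(flows[i][j] for i in range(n) if i != j) / total for j in range(n)]
    od = [[flows[i][j] / (total * o[i] * d[j])
           if i != j and o[i] > 0 and d[j] > 0 else 0.0
           for j in range(n)] for i in range(n)]
    return total, o, d, od


# Hypothetical 3-area matrix: rows are origins, columns are destinations.
example = [[0, 10, 0],
           [5, 0, 5],
           [0, 20, 0]]
print(connectivity_index(example))
print(net_interchange(example, 0, 1))
print(gini_index(example))
```

With the denominator used in equation 13, a matrix containing a single non-zero cell gives a value slightly below 1 (five-sixths for three areas), and the value approaches 1 as the number of areas grows.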
STATISTICAL AND MODELLING ANALYSIS

Modelling interaction phenomena has been at the heart of much quantitative research in human geography for several decades with recent applications in retailing (e.g., Birkin et al., 2002) and transport studies (e.g., Simmonds and Skinner, 2003) as well as in migration (e.g., Rees et al., 2002) and commuting (e.g., Coombes in Chapter 12 of this volume). In the context of migration, though relevant for other interaction data, important distinctions can be drawn between micro and macro modelling approaches (Stillwell and Congdon, 1991) and between deterministic and projection models (Stillwell, 2009) but the distinction that we want to focus on in this section is that between statistical and mathematical models. There has been longstanding academic interest in explanatory modelling of interaction behaviour since the early studies of Zipf (1946) and others who incorporated terms measuring the masses of each origin and destination and of the distance between them and who calibrated their models statistically using log-linear regression techniques. Modifications were made to these early Newtonian gravity models by introducing
parameters to weight the influence of the origin and destination factors and by experimenting with alternative distance functions. The tradition of statistical modelling continued with new explanatory variables being introduced (e.g. Lowry, 1966) and Congdon (1991) provides a relatively recent example of the application of a log-linear model calibrated for inter-region migration flows:

$$\log M_{ij} = b_0 + b_1 \log P_i + b_2 \log P_j + b_3 \log d_{ij} + e_{ij} \tag{15}$$

where $P_i$, $P_j$ and $d_{ij}$ are the gravity variables representing populations of areas $i$ and $j$ and distance between them, $b_0$ to $b_3$ are the regression parameters and $e_{ij}$ is a random error term associated with each interaction. Two key developments in statistical modelling have taken place in the last 20 years. Firstly, the recognition that there are likely to be local variations in calibrated parameters has led to the application of geographically weighted regression (GWR) methods (Fotheringham et al., 2002) and the re-specification of models as typified in equation 15. Secondly, the restrictive assumptions associated with the log-normal model have also led to the emergence of new statistical models based on the Poisson distribution (Congdon, 1991; Flowerdew, 1991). The key issue here is that the migration dependent variable is likely to be measured in discrete units (integer counts of persons) and follows a discrete probability distribution rather than a log-normal continuous distribution; this is particularly important when the origin-destination matrix is likely to contain a large number of small flows and a much smaller number of large flows. The model equation under a Poisson regression approach becomes:

$$M_{ij} = \exp(b_0 + b_1 \log P_i + b_2 \log P_j + b_3 \log d_{ij}) + e_{ij} \tag{16}$$
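To make the calibration of a model such as equation 16 concrete, the following Python sketch estimates the four parameters by iteratively reweighted least squares, the fitting scheme conventionally used for generalised linear models. It is a minimal illustration under stated assumptions rather than the procedure used in any of the studies cited: the function names, the coordinates, the populations and the 'true' parameter values are all hypothetical, and the flows are generated from those parameters so that the fit can be checked against known values.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_poisson_gravity(flows, pops, dist, iterations=50):
    """Estimate b0..b3 in M_ij = exp(b0 + b1 log P_i + b2 log P_j + b3 log d_ij)
    by iteratively reweighted least squares, using off-diagonal cells only."""
    n = len(pops)
    rows, y = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                rows.append([1.0, math.log(pops[i]), math.log(pops[j]),
                             math.log(dist[i][j])])
                y.append(flows[i][j])
    eta = [math.log(m + 0.5) for m in y]  # starting linear predictor
    beta = [0.0] * 4
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        z = [e + (m - u) / u for e, m, u in zip(eta, y, mu)]  # working response
        xtwx = [[sum(u * r[p] * r[q] for r, u in zip(rows, mu))
                 for q in range(4)] for p in range(4)]
        xtwz = [sum(u * r[p] * w for r, u, w in zip(rows, mu, z))
                for p in range(4)]
        beta = solve(xtwx, xtwz)
        eta = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return beta


# Hypothetical system of five areas (coordinates in km).
coords = [(0, 0), (10, 0), (0, 20), (15, 15), (30, 5)]
pops = [1000, 2500, 4000, 8000, 15000]
dist = [[math.hypot(xa - xb, ya - yb) for xb, yb in coords] for xa, ya in coords]
true_b = [-8.0, 0.9, 0.7, -1.2]  # assumed values used to generate the flows
flows = [[0.0 if i == j else math.exp(true_b[0] + true_b[1] * math.log(pops[i])
                                      + true_b[2] * math.log(pops[j])
                                      + true_b[3] * math.log(dist[i][j]))
          for j in range(5)] for i in range(5)]
fitted_b = fit_poisson_gravity(flows, pops, dist)
print(fitted_b)
```

In practice the observed flows would be integer counts and a statistical package would normally be used for the estimation; the hand-written solver is included here only to keep the sketch self-contained.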
Flowerdew (1991) demonstrated the utility of fitting Poisson regression models using the GLIM software. More recently, a model calibrated using Poisson regression and incorporating a large number of explanatory variables was constructed for the Office of the Deputy Prime Minister (ODPM, 2002) and the application of origin-specific Poisson models calibrated using GWR has been undertaken by Nakaya (2001) with similar models being used to compare inter-regional migration in Japan and Britain by Yano et al. (2003). Flowerdew explains the Poisson approach in further detail in Chapter 14 of this book and in Chapter 13, Feng and Boyle use this model type for estimating flows between areas in 1981 and 1991 that are consistent with areas in 2001. One of the shortcomings of the earlier statistical approaches was the inability of the ordinary least squares regression formulation to predict interactions that were consistent with observed flows from each origin and to each destination (Senior, 1979). This was remedied by the use of balancing factors to ensure that internal constraints within the model were satisfied (Wilson, 1967) and the same model was derived from first principles using entropy-maximising techniques (Wilson, 1972). A family of spatial interaction models with different constraints was developed, each of which was applicable in situations where varying amounts of information were known. The doubly constrained variant of a spatial interaction model for migration between regions i and j, which incorporates balancing factors for both origins and destinations, Ai and Bj, known mass terms, Oi and Dj, and a power function for distance decay, dij, is as follows:
$$M_{ij} = A_i B_j O_i D_j d_{ij}^{-\beta} \tag{17}$$

where β is the distance deterrence parameter measuring the influence of distance on migration. This approach was extended with the calibration of zone-specific distance decay parameters by Stillwell (1978) and the incorporation of a competing destinations variable to remove the effect of spatial structure by Fotheringham (1983). The spatial interaction modelling approach is revisited in Chapter 16 by Harland and Stillwell where a new framework is developed separating out the constraint procedure from the main modelling and using a genetic algorithm to establish which combination of model components gives the best fit. This short synopsis demonstrates some of the ways in which analysts have attempted to simulate macro interaction flows between origins and destinations, quantifying the significance of explanatory factors along the way. Historical models are those calibrated using historical data; projection models, on the other hand, are those used to project forward in time, either under assumptions of no change or based on what-if scenarios. Explanatory models of migration, particularly those involving large numbers of causal variables as defined by Champion et al. (1998), are useful for testing what-if scenarios driven by policy variables such as employment change, house prices or taxation. The sophisticated MIGMOD system commissioned by the ODPM was a deterministic model constructed with policy evaluation in mind but it should be recognised that the independent variables are themselves often difficult to predict in the future. Consequently, many of the migration projection models have been demographic in nature, assuming that rates of migration, mij, will remain the same (the Markovian assumption) or will change according to some observed historical trend. These rates are then multiplied by a start-of-projection-period population to give the projected migration flows as follows:

$$M_{ij}^{\text{proj}} = m_{ij} \, P_i^{\text{proj}} \tag{18}$$

It was the development of multi-regional demography in the mid-1960s that heralded the proper specification of inter-zonal flows such as these rather than net-migration balances in projection models. Rogers (1966; 1967; 1968)
pioneered the development of multi-region systems and provided the theoretical rationale for the use of migration flows rather than net balances in Rogers (1990). An alternative approach to Rogers’ multi-regional survival model, known as accounts-based modelling, was developed during the 1970s by Rees and Wilson (1977). Thus, demographic models have developed from models requiring little information about migration to models requiring maximum information about migration, and population projection modelling has become more sophisticated as the migration component has been specified with more precision. There are two key questions that relate to the internal migration component of demographic modelling. The first is how to incorporate some form of change into the parameters governing the intensity and pattern of migration in future and the second is how to deal with the problem of huge data arrays when the origin-destination-time-age-sex dimensions are cross-classified. Important work on the temporal stability of migration was undertaken in the 1980s in the Netherlands by Baydar (1983). Baydar decomposed migration flows into an overall component or the total number of migrants, a generation component or the probability of out-migration from region i, and a distribution component or the probability of in-migrating to region j given origin i. She then used a log-linear model to calibrate the parameters which quantify the time dependence of the different variables and thus identified the most stable and volatile components. This is the approach mentioned earlier and adopted by Raymer and Giulietti in Chapter 15. The second issue revolves around the necessity to shrink large dimensional multi-regional models since the modern form of a demographic sub-national migration model is the multi-state model that uses migration flow information by age, sex, region of out-migration and region of in-migration.
In its pure form, the multi-state migration model is highly descriptive: it has a separate parameter for every piece of information of the migration
pattern. This means that the data requirements for the full multi-dimensional model are very large indeed. Research by van Imhoff et al. (1997) has shown how far it is possible to simplify (shrink) the structure of the multi-regional model before the resulting loss of information and accuracy becomes unacceptable. From a methodological point of view, the multi-state model can be regarded as an accounting structure for a spatial interaction model. Both developments have converged using the framework of the Poisson regression model as described by Stillwell (2009).
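The doubly constrained model of equation 17 and the fixed-rate projection of equation 18 can both be illustrated with a few lines of Python. In this sketch, which is a simplified illustration rather than any of the calibrated models cited above, the balancing factors are found by repeatedly applying the two standard balancing-factor equations until the predicted flows sum to the known origin and destination totals. The function names and all numerical values are hypothetical, every cell of the matrix (including the diagonal) is assumed to have a positive distance, and the origin and destination totals are assumed to sum to the same figure.

```python
def doubly_constrained(origins, destinations, dist, beta, iterations=200):
    """Equation 17: M_ij = A_i B_j O_i D_j d_ij^(-beta), with the balancing
    factors A_i and B_j obtained by iterating
        A_i = 1 / sum_j B_j D_j d_ij^(-beta)
        B_j = 1 / sum_i A_i O_i d_ij^(-beta)."""
    n_o, n_d = len(origins), len(destinations)
    f = [[dist[i][j] ** -beta for j in range(n_d)] for i in range(n_o)]
    a = [1.0] * n_o
    b = [1.0] * n_d
    for _ in range(iterations):
        a = [1.0 / sum(b[j] * destinations[j] * f[i][j] for j in range(n_d))
             for i in range(n_o)]
        b = [1.0 / sum(a[i] * origins[i] * f[i][j] for i in range(n_o))
             for j in range(n_d)]
    return [[a[i] * b[j] * origins[i] * destinations[j] * f[i][j]
             for j in range(n_d)] for i in range(n_o)]


def project_flows(rates, start_populations):
    """Equation 18: projected flow = fixed rate m_ij times population P_i."""
    return [[m * p for m in row] for row, p in zip(rates, start_populations)]


# Hypothetical totals and distances for three zones (both totals sum to 600).
O = [100, 200, 300]
D = [150, 250, 200]
distances = [[2, 5, 9],
             [5, 3, 6],
             [9, 6, 2]]
predicted = doubly_constrained(O, D, distances, beta=1.5)
print([round(sum(row), 3) for row in predicted])

# Hypothetical migration rates held constant (the Markovian assumption).
rates = [[0.0, 0.01], [0.02, 0.0]]
print(project_flows(rates, [1000, 500]))
```

The distance deterrence parameter is supplied here as a fixed value; in a calibrated model it would be estimated so that the predicted flows reproduce an observed statistic such as the mean migration distance.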
VISUALISING MICRO INTERACTION DATA

Visualisation through mapping is an important consideration but flow data are notorious for being difficult to map effectively in popular GIS desktop packages such as ArcGIS and MapInfo. One traditional approach is the mapping of ‘desire lines’, where origins and destinations are connected by straight lines on a map (e.g. as shown in Figure 8). Waldo Tobler pioneered the computer mapping of flow data (Tobler, 1987) and there are software packages now available that enable users to generate directional interaction flow maps, such as Flowmap, developed at the University of Utrecht, which has network analysis functionality and has been used as a planning support system in various different contexts (Geertman et al., 2003). The Office for National Statistics has recently released CommuterView, a system for mapping origin-destination commuting flows from the 2001 Census, which is available on DVD. The aggregation of desire lines between all origins and destinations can be undertaken to create a surface of commuter flows and Nielsen and Hovgesen (2007) have compared maps of commuter flows in England and Wales in 1991 and 2001 using the same 5km by 5km grid. Cartograms, as distinct from maps of conventional geographical polygons, have been used
by Dorling (1995) for mapping the census and Dorling’s circular cartogram algorithm has been used to map net migration in Britain and Australia (Stillwell et al., 2000). Another approach is the calculation and mapping of functional regions from underlying geographical units on the basis of their self-containment with respect to commuting. Coombes provides an excellent example of this approach in defining travel-to-work areas in Chapter 12 of this book using detailed 2001 Census data. In this section, we are also concerned with mapping commuting regions but have chosen to explore a new method which attempts to provide some indication of the major direction of flows within catchment areas around schools. The micro data come from the PLASC and are for all the children attending state primary and secondary schools in Leeds for 2005/06. The volume of pupil commuting flows on a daily basis makes the analysis of journey-to-school data quite complicated. The relatively short nature of many of these flows, and the number of origin and destination combinations, mean that conventional desire line maps are difficult to read. An innovative method of analysing the pupil commuting flows has been established here using vectors. A vector is the position of one point in space relative to another, which has a direction and a magnitude. In this application of vector analysis, the direction is the two-dimensional angle of travel in relation to north (0º), and the magnitude is the distance travelled in kilometres. A vector of 90º for 2km would mean a commute of 2km east of the origin. Since the coordinates used are British National Grid coordinates, calculating the vector from one point to another can be achieved using Pythagoras’ theorem (the square of the length of the hypotenuse is equal to the sum of the squares of the lengths of the other two sides) to calculate the distance and trigonometry to calculate the angle.
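The calculation of a single commuting vector can be sketched in Python as follows. The function name is hypothetical and the coordinates are assumed to be eastings and northings in metres. Note that the sketch obtains the angle with the two-argument arctangent, which returns the correct quadrant directly; this is a substitute for, not a reproduction of, the arccos calculation with sector-specific corrections described in this section.

```python
import math


def commute_vector(x1, y1, x2, y2):
    """Return (bearing, distance_km) for a move from (x1, y1) to (x2, y2).
    The bearing is in degrees clockwise from grid north (0 = north,
    90 = east); coordinates are assumed to be in metres."""
    dx = x2 - x1  # change in easting
    dy = y2 - y1  # change in northing
    distance_km = math.hypot(dx, dy) / 1000.0          # Pythagoras' theorem
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # quadrant-aware angle
    return bearing, distance_km


# Hypothetical home and school coordinates: a commute of 2km due east.
print(commute_vector(430000, 433000, 432000, 433000))
```

Passing the change in easting as the first argument of `atan2` and the change in northing as the second is what makes the angle run clockwise from north rather than anticlockwise from east.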
In Figure 10, the origin and destination points are known, therefore finding the point to form a right angled triangle is simply a matter of, in this case, using the origin X1 coordinate and the destination Y2
coordinate. Simple subtraction provides the length of the opposite and adjacent sides of the triangle, and by applying Pythagoras’ theorem, the length of the hypotenuse can be calculated. The angle of the vector (θ) is calculated using the cosine ratio, which states that the cosine of the angle is equal to the length of the adjacent side divided by the length of the hypotenuse. To convert from the cosine of the angle to the angle itself, simply apply the inverse of the cosine function, arccos. The origin and destination are not always conveniently located so that the right angle of the triangle falls to the north of the origin. Figure 11 shows the different sectors that a destination point may be classified into in relation to the origin. The diagram also shows the logic used to identify which sector the destination point is in relative to the origin, using only the origin and destination coordinates. It also displays the coordinates of the right angle and the number to be added to the final resulting angle to get the true vector angle. Calculating a vector for each pupil’s commute to school provides a method to geographically analyse the general direction of flows, but this does not provide a great deal of insight into commuting patterns. A more effective calculation is the vector for each pupil’s commute to school in relation to the city centre of Leeds. This is accomplished by calculating the pupil to school vector angle, and the pupil to city centre vector

Figure 10. Components of a triangle
Figure 11. Sectors and associated rules for calculating a vector’s angle
angle. If the pupil to school angle is greater than the pupil to city centre angle, the pupil to city centre angle is subtracted from the pupil to school angle. Otherwise, the pupil to school angle is subtracted from the pupil to city centre angle and the result subtracted from 359. The final vector angles will all be relative to the city centre: 0º will no longer refer to north but to vectors heading towards Leeds city centre, and 180º will no longer refer to south but to vectors heading away from the city centre. One limiting case of this analysis is when a pupil commutes in the direction of the city centre, but travels through the centre and onward to the opposite side of the city, maybe to a school in the suburbs. This is registered as a flow into the city and must be considered when interpreting the results. Figure 12a shows the count of pupil school commutes by school phase for 2005/06. There are far more primary phase pupil commutes than there are for the secondary phase and the graphic shows that primary phase pupils tend to commute towards the city centre, whereas secondary phase pupils tend to commute away from the city centre to attend school. Figure 12b displays pupil commuting patterns weighted by the distance travelled to school (the sum of all vector distances). This diagram clearly shows the bi-polar nature of pupil commuting in Leeds on a daily basis. Although there are fewer secondary phase pupils than primary phase pupils, pupils in the secondary phase travel much greater distances to school, especially when travelling away from the city centre. The emphasis of the distance travelled towards the city centre by primary pupils indicates that primary pupils travelling towards the city centre tend to commute greater distances than those travelling in the opposite direction. Vector analysis can also be used as an alternative method for constructing school territories, similar to that used by Harris and Johnston (2008). Vectors can be calculated between each school and each attending pupil’s home location, in the same way as was explained above but where the school is the origin point and the pupil’s home is the destination point. Instead of orientating these vectors to the city centre, they are left orientated with 0º indicating north, 90º east, 180º south and 270º west. The maximum travelling distance is calculated for each vector angle; this is not the sum of all distances as previously used, but the largest distance associated with each vector angle for each school. These points are used to draw a polygon using a desktop mapping system such as MapInfo, automated using MapBasic. Most pupils travel relatively short distances to attend school but there are outliers who travel much greater distances. These anomalous pupils are far removed, in travelling terms, from the majority of pupils attending a school. Including these outlying pupils in territory construction would lead to biases in certain directions. It is possible to exclude such pupils using the mean and standard deviation of pupils’ commuting distances to each school.
Only including pupils commuting less than the mean distance to each school results in a very close core territory, comparable to those produced by Harris and Johnston (2008). Territory calculations can be easily, and retrospectively, altered to
Figure 12. Commuting patterns of pupils in relation to Leeds city centre, 2005/06. Source: PLASC data supplied by Education Leeds
include pupils commuting less than the mean plus one or two standard deviations away from each school, creating a more all-encompassing territory definition while still excluding the extreme outliers. Additionally, the territory definitions can be generalised by aggregating flows that occur in a general direction together. For example, instead of plotting all 360º in the territory, it is possible to reduce the number of points to, say, 36. This involves taking all of the flows that occur between 5º and 15º, selecting the longest flow that falls below the territory size threshold value stated above (the mean, mean plus one standard deviation or mean plus two standard deviations) and plotting it at the mid-point, 10º. The resulting territory definition is generalised in both size and the number of points used to plot the shape. Additionally, vectors can be aggregated together, calculating the number of flows and the average distance travelled from within a specified region. Here the vector angles have been aggregated together to form eight directional points corresponding to the eight points of a compass: north, north-east, east, south-east, south, south-west, west and north-west. Figures 13a and 13b show the line representations of the aggregated vector
territories for primary and secondary schools in 2005/06 respectively, indicating that there are significantly more pupil inflows into Leeds from all the surrounding districts to attend primary school than there are to attend secondary school. The core pupil territories for primary and secondary schools in Leeds are shown in Figures 13c and 13d for 2005/06. These territories have been defined using the mean travelling distance for pupils at each school, and generalised to 36 points. Using these core territories and some simple point in polygon queries, school choice in Leeds can be assessed. For example, 76% of all primary and secondary phase pupils lived in at least one school core territory providing the correct phase of education. 37% of primary school pupils and 33% of secondary school pupils lived within the overlap of two or more school core territories.
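The re-orientation of vectors to the city centre and the construction of a generalised school territory can be sketched in Python as follows. This is an illustrative outline under several assumptions, not the MapBasic procedure used to produce the figures: the function names and the sample vectors are hypothetical, the relative angle is computed with modulo 360 arithmetic instead of the subtraction from 359 described in the text, and the population standard deviation is used because the text does not state which form was applied.

```python
import statistics


def relative_to_centre(school_bearing, centre_bearing):
    """Angle of the pupil-to-school vector measured from the pupil-to-centre
    vector: 0 = towards the city centre, 180 = directly away from it."""
    return (school_bearing - centre_bearing) % 360.0


def territory_points(vectors, sectors=36, k=0.0):
    """vectors: (bearing_degrees, distance_km) pairs from a school to each
    pupil's home. Pupils travelling less than mean + k standard deviations
    are kept; for each sector the longest remaining distance is returned,
    keyed by the sector's mid-point bearing (0, 10, 20, ... for 36 sectors)."""
    distances = [d for _, d in vectors]
    threshold = statistics.mean(distances) + k * statistics.pstdev(distances)
    width = 360.0 / sectors
    longest = {}
    for bearing, d in vectors:
        if d < threshold:
            # Shift by half a sector so that 5-15 degrees maps to mid-point 10.
            sector = int(((bearing + width / 2) % 360.0) // width)
            longest[sector] = max(longest.get(sector, 0.0), d)
    return {sector * width: d for sector, d in sorted(longest.items())}


# Hypothetical school-to-home vectors; the 6km commute is an outlier.
sample = [(7.0, 1.0), (12.0, 1.5), (95.0, 0.8), (100.0, 6.0), (356.0, 0.5)]
print(territory_points(sample))        # core territory (below the mean)
print(territory_points(sample, k=2))   # mean plus two standard deviations
print(relative_to_centre(10.0, 350.0))
```

Each returned pair of bearing and distance can then be converted back to a coordinate relative to the school and the points joined in order of bearing to form the territory polygon.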
CONCLUSION

This chapter has reviewed a wide range of analytical methods ranging from simple measures of effectiveness and connectivity to much more sophisticated modelling techniques using Poisson
regression or spatial interaction modelling methods. We have introduced the notation associated with handling interaction data in matrix form and we have illustrated several of the measures of interaction with selected examples of macro data. Further examples of these measures and methods will follow in the chapters in part 2 of the book. Finally, in the last section, a new method of visualising interaction micro data using vector analysis has been presented. Several different methods for analysing interaction data are available to researchers, each of which produces results at a slightly different data ‘resolution’. For example, using basic net flows provides useful information about the interactions occurring between two zones, whereas calculating the index of interaction inequality considers all interactions within a system. It is necessary to select the analysis method appropriate to the data resolution to be investigated (individual, zone by zone or system wide). Of equal importance to the selection of data resolution is the selection of the analysis technique to be used. Each of the different methods described in this chapter provides insights into different aspects of a problem, such as the levels of connectivity between zones or dissimilarity of interactions taking place within a system, and it is important to approach a research problem using the right technique or combination of techniques. Interaction data, by its very nature, is very difficult to visualise. As demonstrated in the discussion of the component parts of an interaction matrix, each interaction has an origin, a destination and a magnitude. The problem is how to simplify this abundance of information about interactions to effectively convey to a reader the patterns contained within the data. The vector analysis presented in the latter part of this chapter is a novel method for achieving this goal. The initial presentation of pupil to school interactions in Leeds is a high level visualisation of the pupil
Figure 13. School territory definitions using vectors, 2005/06. Source: PLASC data supplied by Education Leeds
commuting patterns for the whole system using radar diagrams. It highlights the propensity for school pupils in different phases of education to travel in different directions to attend school. By applying vector analysis at a lower data resolution and using a GIS to plot the results, territory definitions for each individual school are produced. Although not suitable in all circumstances, these innovative visualisation techniques provide an additional tool for the researcher’s toolbox when considering interaction data.
ACKNOWLEDGMENT

The vector analysis and interaction visualisation research in this chapter has been undertaken as part of an ESRC CASE PhD studentship in partnership with Education Leeds.
REFERENCES

Allison, P. D. (1978). Measures of inequality. American Sociological Review, 43, 865–881. doi:10.2307/2094626

Bailey, N., & Livingston, M. (2007). Population turnover and area deprivation. York, UK: Joseph Rowntree Foundation.

Baydar, N. (1983). Analysis of the temporal stability of migration in the context of multi-regional forecasting. Working Paper No. 38, Netherlands Interuniversity Demographic Institute, Voorburg.

Bell, M., Blake, M., Boyle, P., Duke-Williams, O., Rees, P., Stillwell, J., & Hugo, G. (2002). Cross-national comparison of internal migration: issues and measures. Journal of the Royal Statistical Society A, 165(2), 1–30.

Birkin, M., Clarke, G., & Clarke, M. (2004). Retail Geography and Intelligent Network Planning. Chichester, UK: Wiley.
Boyle, P., & Flowerdew, R. (1997). Improving distance estimates between areal units in migration models. Geographical Analysis, 29, 93–107.

Champion, A. G., Fotheringham, A. S., Rees, P., Boyle, P. H., & Stillwell, J. C. H. (1998). The Determinants of Migration Flows in England: a Review of Existing Data and Evidence. Department of Geography, University of Newcastle, for the Department of Environment, Transport and the Regions.

Congdon, P. (1991). An application of general linear modelling to migration in London and the South East. In Stillwell, J. C. H., & Congdon, P. (Eds.), Migration Models: Macro and Micro Approaches (pp. 113–136). London: Belhaven Press.

Courgeau, D. (1976). Quantitative, demographic and geographic approaches to internal migration. Environment & Planning A, 8, 261–269. doi:10.1068/a080261

Dorling, D. (1995). A New Social Atlas of Britain. London: John Wiley.

Finney, N., & Simpson, L. (2008). Internal migration and ethnic groups: evidence for Britain from the 2001 Census. Population Space and Place, 14, 63–83. doi:10.1002/psp.481

Flowerdew, R. (1991). Poisson regression modelling of migration. In Stillwell, J. C. H., & Congdon, P. (Eds.), Migration Models: Macro and Micro Approaches (pp. 92–112). London: Belhaven Press.

Fotheringham, A. S. (1983). A new set of spatial interaction models: the theory of competing destinations. Environment & Planning A, 15, 15–36. doi:10.1068/a150015

Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester, UK: Wiley.
Geertman, S., de Jong, T., & Wessels, C. (2003). Flowmap: a support tool for strategic network analysis. In Geertman, S., & Stillwell, J. (Eds.), Planning Support Systems in Practice (pp. 155–175). Heidelberg, Germany: Springer.

Harris, R. J., & Johnston, R. J. (2008). Primary schools, markets and choice: studying polarization and the core catchment areas of schools. Applied Spatial Analysis and Policy, 1(1), 59–84. doi:10.1007/s12061-008-9002-8

Large, P., & Ghosh, K. (2006). Estimates of the population by ethnic group for areas within England. Population Trends, 124, 8–17.

Lowry, I. (1966). Migration and Metropolitan Growth: Two Analytical Reports. San Francisco: Chandler.

Nakaya, T. (2001). Local spatial interaction modelling based on the geographically weighted regression approach. GeoJournal, 53, 347–358. doi:10.1023/A:1020149315435

Nielsen, T. A. S., & Hovgesen, H. H. (2007). Exploratory mapping of commuter flows in England and Wales. Journal of Transport Geography. doi:10.1016/j.trangeo.2007.04.005

ODPM. (2002). Development of a Migration Model. Report prepared by the University of Newcastle upon Tyne, the University of Leeds, and the Greater London Authority/London Research Centre. London: ODPM.

ONS. (2007). Population turnover figures reveal large changes in the population of seaside towns. Retrieved September 12, 2008, from http://www.neighbourhood.statistics.gov.uk/dissemination/Info.do?page=news/newsitems/14-march-2007population-turnover-analysis.htm

Plane, D. (1984). A systematic demographic efficiency analysis of US interstate population exchange. Economic Geography, 60, 294–312. doi:10.2307/143435
Plane, D., & Mulligan, G. (1997). Measuring spatial focusing in a migration system. Demography, 34(1), 251–262. doi:10.2307/2061703 Ravenstein, E. G. (1885). The laws of migration. Journal of the Royal Statistical Society, 48(2), 167–227. Ravenstein, E. G. (1889). The laws of migration. Journal of the Royal Statistical Society, 52, 241–301. doi:10.2307/2979333 Rees, P., Fotheringham, S., & Champion, A. (2002). Modelling migration for policy analysis. In Clarke, G., & Stillwell, J. (Eds.), Applied GIS and Spatial Analysis. Chichester, UK: John Wiley and Sons. Rees, P. H., & Wilson, A. G. (1977). Spatial Population Analysis. London: Edward Arnold. Rogers, A. (1966). Matrix methods of population analysis. Journal of the American Institute of Planners, 32, 40–44. Rogers, A. (1967). Matrix analysis of interregional population growth and distribution. Papers / Regional Science Association. Regional Science Association. Meeting, 18, 17–196. Rogers, A. (1968). Matrix Analysis of Interregional Population Growth and Distribution. Berkeley, CA: University of California Press. Rogers, A. (1990). Requiem for the net migrant. Geographical Analysis, 22, 283–300. Rogers, A. (1995). Multiregional Demography: Principles, Methods, and Extensions. Chichester, UK: Wiley. Rogers, A., & Raymer, J. (1998). The spatial focus of US interstate migration flows. International Journal of Population Geography, 4(1), 63–80. doi:10.1002/(SICI)10991220(199803)4:13.0.CO;2U
87
Analysing Interaction Data
Senior, M. (1979). From gravity modelling to entropy maximising: a pedagogic guide. Progress in Human Geography, 3(2), 175–210. Simmonds, D. C., & Skinner, A. (2003). The South and West Yorkshire strategic land-use/transportation model. In Clarke, G., & Stillwell, J. (Eds.), Applied GIS and Spatial Analysis (pp. 195–214). Chichester, UK: Wiley. doi:10.1002/0470871334. ch11 Stillwell, J. (2009). Inter-regional migration modelling: a review. In Poot, J., Waldorf, B., & van Wissen, L. (Eds.), Migration and Human Capital: Regional and Global Perspectives. Cheltenham, UK: Edward Elgar.
Tobler, W. (1987). Experiments in migration mapping by computer. The American Cartographer, 14(2), 155–163. doi:10.1559/152304087783875273 Van Imhoff, E., Van der Gaag, N., Van Wissen, L., & Rees, P. (1997). The selection of internal migration models for European regions. International Journal of Population Geography, 3(2), 137–159. doi:10.1002/(SICI)10991220(199706)3:23.0.CO;2R Westefeld, A. (1940). The distance factor in migration. Social Forces, 19, 213–218. doi:10.2307/2571302
Stillwell, J., Bell, M., Blake, M., Duke-Williams, O., & Rees, P. (2000). A comparison of net migration flows and migration effectiveness in Australia and Britain: Part 1, Total migration patterns. Journal of Population Research, 17(1), 17–41. doi:10.1007/BF03029446
Willekens, F., & Baydar, N. (1986). Forecasting place-to-place migration with generalized linear models. In Woods, R., & Rees, P. (Eds.), Population Structures and Models: Developments in Spatial Demography (pp. 203–245). London: Allen & Unwin.
Stillwell, J., Bell, M., Blake, M., Duke-Williams, O., & Rees, P. (2001). A comparison of net migration flows and migration effectiveness in Australia and Britain: Part 2, Age-related migration patterns. Journal of Population Research, 18(1), 19–39. doi:10.1007/BF03031953
Wilson, A. G. (1967). A statistical theory of spatial distribution models. Transportation.
Stillwell, J., & Congdon, P. (Eds.). (1991). Migration Models: Macro and Macro Approaches. London: Belhaven Press. Stillwell, J., Hussain, S., & Norman, P. (2008). The internal migration of ethnic groups in Britain: a study using the census macro and micro data. Paper prepared for the European Association for Population Studies, Barcelona, July. Stillwell, J. C. H. (1978). Interzonal migration: some historical tests of spatial-interaction models. Environment & Planning A, 10(10), 1187–1200. doi:10.1068/a101187
88
Wilson, A. G. (1972). Papers in Urban and Regional Analysis. London: Pion. Yano, K., Nakaya, T., Fotheringham, A. S., Openshaw, S., & Ishikawa, Y. (2003). A comparison of migration behaviour in Japan and Britain using spatial interaction models. International Journal of Population Geography, 9(5), 419–431. doi:10.1002/ijpg.297 Zipf, G. K. (1946). The P1P2/D hypothesis: on intercity movement of persons. American Sociological Review, 11, 677–686. doi:10.2307/2087063
89
Chapter 5
Temporal and Spatial Consistency
Oliver Duke-Williams, University of Leeds, UK
John Stillwell, University of Leeds, UK
ABSTRACT
One of the major problems challenging time series research based on stock and flow data is the inconsistency that occurs over time due to changes in variable definition, data classification and spatial boundary configuration. The census of population is a prime example of a source whose data are fraught with these problems, with the result that even the simplest comparison between the 2001 Census and its predecessor in 1991 is difficult. The first part of this chapter introduces the subject of inconsistencies between related data sets, with general reference to census interaction data. Various types of inconsistency are described. A number of approaches to dealing with inconsistency are then outlined, with examples of how these have been used in practice. The handling of journey-to-work data for persons who work from home is then used as an illustrative example of the problems posed by inconsistencies in base populations. Home-workers have been treated in different ways in successive UK censuses, a factor which can cause difficulties not only for researchers interested in such working practices, but also for those interested in other aspects of commuting. The latter set of problems is perhaps more pernicious, as users are less likely to be aware of the biases introduced into data sets that are being compared. In the second half of this chapter, we make use of a time series of migration interaction data that does have temporal consistency to explore how migration propensities and patterns in England and Wales have changed since 1999 and in particular since the year prior to the 2001 Census. The data used are those produced by the Office for National Statistics, based on comparisons of NHS patient records from one year to the next and adjusted using data on NHS patients re-registering in different health authorities.
The analysis of these data suggests that the massive exodus of individuals from major metropolitan areas across the country that has been identified in previous studies is continuing apace, particularly from London, whose net losses doubled in absolute terms between 1999 and 2004 before reducing marginally in 2005 and 2006. Whilst this pattern of counterurbanisation is evident for all-age flows, it conceals significant variations for certain age groups, not least those aged between 16 and 24, whose migration propensities are high and whose net redistribution is closely connected with the location of universities. The time series analyses are preceded by a comparison of patient register data with corresponding data from the 2001 Census. This suggests strong correlation between the indicators selected and strengthens the argument that patient register data in more recent years provide reliable evidence for researchers and policy makers on how propensities and patterns change over time.
DOI: 10.4018/978-1-61520-755-8.ch005
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
As described in the earlier chapters of this book, there are a variety of interaction data sets, originating from different sources. These cover different aspects of human mobility, and have been collected in different ways and for different purposes. In some cases – such as the decennial census – questions are purposefully asked of respondents for the primary intention of gathering information about that particular topic. In other cases – such as with many administrative sources – data have been gathered for a primary purpose (health service administration, for example), and flow data have subsequently been derived from them as a secondary benefit. A common aspect of many of these data sets is that they form part of a time series, whether on a decennial basis in the case of the census, or on a more frequent basis in the case of administrative data sets. In the first half of this chapter, we consider the various causes of inconsistency in time series interaction data sets and review the various ways in which researchers cope with the challenges that inconsistency presents, specifically when handling census data, using home-working for illustration. Thereafter, in the second half of the chapter, we take advantage of a consistent set of administrative data to examine time series trends in migration since the last census.
CAUSES OF INCONSISTENCY
On its own, each interaction data set provides a valuable cross-sectional view of the population, and can be used to answer a rich variety of research questions. However, this richness is greatly increased by the ability to compare similar data sets over time. Rather than looking at the extent and magnitude of patterns at any particular point in time, we can look at the ways in which they have changed over the course of a decade or longer. However, in order to do this, it is necessary to consider the extent to which the data sets themselves change over time. There are a number of ways in which change can occur and these affect different types of data to varying extents.

Firstly, the geography may change. At a small area level, this is inevitable, as ward boundaries must change over time; in the UK, a number of statutory bodies have a duty to review ward boundaries within local authorities in order to ensure that councillors each represent a roughly equal number of electors. There are separate bodies that do this for England, Wales, Scotland and Northern Ireland. At an even smaller level – that of the output area or enumeration district – change is also inevitable in some areas as local construction projects (housing estates, changes of land usage, new infrastructure projects, et cetera) affect the small area population distribution. Whilst boundary change has been a continual process over the two centuries during which censuses have been taken in the UK, it is only relatively recently that small area boundary changes have been significant for the purposes of census comparison, as it is only in the computer-based era that results have been collated and published at this level (Gregory & Ell, 2005).

At a larger geographic scale, local authority boundaries are subject to occasional change following changes to local government structure. These changes have included the creation of Greater London in 1965 from parts of the surrounding counties, the creation of new shire counties and metropolitan counties in 1974 and the removal of the Greater London Council and metropolitan county councils in 1986, leaving London boroughs and metropolitan districts as single-tier authorities. A series of local government reorganisations in the 1990s saw many two-tier county and district structures being replaced with unitary authorities (ONS, 1999). A total of 25 shire counties were split or partially split into unitary authorities in England in a series of revisions from 1995 to 1998, and there was comprehensive change to single-tier unitary authorities in Wales and council areas in Scotland in 1996. These changes are a potential source of complication for spatial analysts, in that they have involved both boundary changes and changes to hierarchical aggregations of areas. From the perspective of census analysis, comparison of areas across any two censuses is difficult, and the construction of sensible time series for annual data is very awkward, especially for the latter half of the 1990s.

The extent to which geographic change may impact any particular data set depends on the spatial level at which it is published. Some data sets contain only limited spatial disaggregation – for example, standard regions and Government Office Regions (GORs) – and are thus not affected by boundary changes at lower levels, although they would, of course, have been affected by changes to the designation of GORs. Where data are published at a small area level, it may be possible to re-aggregate small areas to larger units, and create close (but not exact) matches to an alternative set of units.
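The re-aggregation of small areas to larger units can be sketched in a few lines. The following is a minimal illustration only, assuming a lookup table that assigns each small area to exactly one larger unit; the function name, area codes and counts are invented for the example and do not come from any census output.

```python
from collections import defaultdict


def aggregate_flows(flows, lookup):
    """Sum origin-destination counts after recoding areas to larger units.

    flows  : dict mapping (origin, destination) -> count, in small-area codes
    lookup : dict mapping small-area code -> larger-unit code (hypothetical)
    """
    aggregated = defaultdict(int)
    for (origin, destination), count in flows.items():
        # Recode both ends of the flow, then accumulate the count.
        aggregated[(lookup[origin], lookup[destination])] += count
    return dict(aggregated)


# Invented ward-level flows and an invented ward-to-district lookup.
ward_flows = {("W1", "W2"): 10, ("W1", "W3"): 4, ("W3", "W1"): 6, ("W2", "W2"): 3}
ward_to_district = {"W1": "D1", "W2": "D1", "W3": "D2"}

district_flows = aggregate_flows(ward_flows, ward_to_district)
```

Flows between small areas that fall within the same larger unit become intra-unit flows, and the overall total is preserved. A small area that straddles the boundary of two larger units has to be assigned to one of them on a best-fit basis, which is why the resulting match is close rather than exact.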
Users of data sets that are only published at local authority level, however, face considerable challenges in attempting to generate spatially consistent zoning systems for time series running over a long period of time.

A second form of change can occur in the nature of the data gathered in successive data sets – the questions asked of respondents to censuses or surveys, or the data gathered on individuals in administrative data sets. Generally speaking, data derived from administrative sources are simpler in their range of content than census data, and are less significantly affected by this type of change. The range of questions asked in a census is reviewed prior to each census, and whilst a core of questions remains common, some new questions are introduced and other questions are dropped. Furthermore, whilst a question on a given theme may exist in more than one census, it is possible that the range of possible answers (provided by tick boxes on the census form) will vary. For interaction data, the key questions asked in the census – address one year ago, and workplace location – have not changed in recent censuses.

Even when questions remain the same between censuses or other surveys, a third type of inconsistency can arise from the choice of variables and classifications used to disaggregate published data. For example, whilst questions about occupation have long been included in the census, the journey-to-work interaction data produced as part of the 2001 Census outputs did not include a table disaggregated by occupation, whereas equivalent data sets produced from the 1981 and 1991 Censuses had included such a table. A user trying to compare occupation-specific commuting patterns in 2001 with the patterns in 1991 would be out of luck. The occupational tables produced as part of the 1981 and 1991 Special Workplace Statistics highlight a further consistency problem; they were constructed using different occupational classifications, and are thus difficult to compare anyway. Occupations are, of course, an area that changes over time; one would not expect the 2001 Census to use the same occupational coding as that used in the 1901 Census. Yet, even with ontologies that are not subject to temporal drift, problems can still occur when categories are aggregated.
The simplest example is that of ‘age’; whilst the raw data are entirely unambiguous, age is invariably aggregated in different ways (e.g. single years, five-year age groups and so on) in different tables, and in outputs from different censuses. Often two or more age groups in one table can be added together to create a count that is comparable to a group in a second table, but this is not always the case.

A related set of inconsistencies are those that occur due to different treatment of non-standard cases. For many census variables, the majority of responses will be simple to code, but there are a number of other cases for which specific categories have to be created. These often relate to missing, incomplete or ambiguous answers. The migration and commuting data sets that are produced from the census include several such issues, for example the treatment of migrants with an incomplete former address, or workers with a missing workplace. These have been treated differently in different censuses; thus in 1991, workers with an unstated workplace location were included as a specific category, whereas in the 2001 outputs, a workplace location was imputed, and such persons were included with all others. Migration data in the census are based on the usual residence one year prior to the census. For those aged under 1, there is no such residence. In the 1991 SMS, all tables were produced for migrants aged 1 or more, whereas in the 2001 SMS, an origin was imputed for infant migrants (usually the same as the mother’s origin). In each instance, a suitable argument can be made for treating these ‘difficult’ cases in a given manner; however, changes in methods again lead to difficulties in comparisons between censuses.

A fourth, more subtle form of change between censuses arises from differences in the population bases used. There are various ways in which the population of any area could be defined: those persons who are there when the census is conducted – the present population – or those who are usually there (even if not there at the time) – the resident population.
If persons present are counted, further subtle distinctions can arise – does it depend on persons being present at a certain time of day? The population base used in the UK has varied between censuses. Traditionally (that is, from the initial census in 1801), a present population base has been used. During the course of the twentieth century, the need of users for information about the resident population became clear. In 1981 and 1991, two population bases were calculated: both present and resident populations. In 2001, the resident population was the sole population base used. For both the 1991 and 2001 Censuses, improvements were made to the methods used to transfer persons back to their ‘correct’ address. There are several difficulties in determining a usually resident population, associated with those who are temporarily resident elsewhere. A very significant group in this respect is full-time students, and it is important to note that the way in which students were recorded in the 2001 Census differed from the method used in 1991. In 1991, students were considered to be resident at their family home, whereas in 2001 they were considered to be resident at their term-time address. Changes in the population base used can make a significant change to the population recorded in some areas; this can be especially true in locations with large student populations. However, a naïve user of census data may ignore this, and observe significant changes in the populations associated with those areas when comparing two censuses. In reality, any changes may be more strongly influenced by the way in which the data have been recorded than by genuine population change.

Problems with consistency in census data do not only occur when users are trying to compare data across different censuses. There is also a range of inconsistencies that arise within a single census. These include inconsistencies between different types of output, between outputs for different member countries of the UK, and between outputs at different spatial scales.
Some of these inconsistencies arise due to the use of statistical disclosure control (SDC) methods on outputs. There are a variety of SDC-related inconsistencies that have been described and discussed in Chapter 3 of this book. Other problems arise from the plethora of geographies used in connection with the census: when comparing counts from one set of outputs with another (for example, the interaction data and the area statistics), it can be the case that different geographies are used in the tables that the user would like to compare. The 2001 Census interaction data suffer from this problem, as they have somewhat unusual geographies. At the ward level, they are a combination of Census Area Statistics (CAS) wards in England, Wales and Northern Ireland, and Standard Table (ST) wards in Scotland. At the district level, a wide variety of area types are used (reflecting the complex local government structure in the UK), with the use of parliamentary constituencies in Northern Ireland (as opposed to local authorities) marking a difference between the interaction data and the area statistics.
APPROACHES TO COPING WITH INCONSISTENCY
As the previous section has shown, there are a variety of types of and reasons for inconsistency in interaction data. Consequently, there are a variety of strategies that have been used in an attempt to either overcome the inconsistencies or at least reduce their impact. This section looks at approaches to both spatial inconsistency and temporal inconsistencies that arise from differences in coding of data attributes. It should be noted, however, that these two inconsistencies are not mutually exclusive – they often both occur, and thus must both be addressed before analysis can be carried out.

Spatial inconsistency between censuses arises when the fundamental reporting unit (for whatever data set or level of interaction data is under consideration) changes between censuses. As described above, changes between censuses in the UK are common. An initial approach to
dealing with spatial inconsistency is to aggregate small units from one data set in order to generate a set of units which are the same as, or considered to be a reasonable estimate of, the geography in the other data set with which a comparison is going to be made. This is most readily achieved with smaller, more detailed building block units. The lowest level geography used with the 2001 Census – output areas (OAs) – featured much smaller units than the lowest geographies used in either 1981 or 1991, and therefore it is easiest to use the 2001 data and aggregate to an earlier geography. There are two problems with this approach. Firstly, and less serious, is the simple selection of an older geography – many researchers would prefer to carry out analysis and present results using a contemporary geography rather than an older one that may be considered obsolete. Secondly, and more intractable, are the specific problems associated with disclosure control and the 2001 OA level data sets. As described in Chapter 3 of this book, the 2001 OA level data were significantly affected by disclosure control methods, and are therefore not an ideal starting point.

A slight modification of the aggregation approach is to aggregate both data sets being compared to a common larger geography. If relatively large spatial units such as counties or GORs are appropriate for the analysis being carried out, then the situation is straightforward; ward or district level data from both data sets being compared can be aggregated. The larger the spatial units for comparison, the easier it becomes, as concerns about changes to boundary definitions in the component units become less significant. Bell et al.
(1999) describe two approaches to aggregation, one based on simple hierarchical aggregation of units (using a time-series of UK data) into larger areas, whilst a second approach using GIS overlays was used to assemble inconsistent units into larger areas using a time-series of Australian data. A distinction may be made between aggregations that are based on a neat nesting of components
into a larger area and those that only attempt a ‘best estimate’ of a larger area, in the knowledge that some components will straddle the boundaries of larger areas. One example of ‘best estimate’ aggregation is a set of ‘common’ geographies created by CIDER to aid comparison between 1991 and 2001 outputs. These common geographies were assembled through the aggregation of wards from both sources in order to create a set of larger regions for which data could be tabulated from either set of outputs. Three levels of geography are provided, with the intention of making available a set of comparison geographies that are ‘good enough’ rather than perfect. It is also possible to aggregate components from two geographies that each form ‘best estimates’ of a theoretical common geography: Frost et al. (1996) aggregated different ward geographies to sets of similar concentric bands in order to compare SWS data from 1981 and 1991.

In order to address geographic inconsistencies in a manner which deals more systematically with boundary mismatches, it is necessary to apply more sophisticated solutions. Boyle & Feng (2002) described a method used to create sets of census interaction data that estimated flows for base geographies other than those originally used to publish the data. Whilst static area-based counts can be interpolated using an assumption that populations are uniformly distributed (see, for example, Gregory et al., 2001), interaction data do not lend themselves to this approach, as it is assumed that flows drop in intensity as distance increases. The method used by Boyle & Feng – explained more fully in Chapter 13 of this book – took this into account, and modelled flows for smaller units than the lowest level normally used to report interaction data. Thus, in the case of 1981 and 1991 data, which were originally published at ward level, flows were modelled at enumeration district (ED) level, such that the flows remained consistent with the known ward level results.
The ED level data were then aggregated to form a new base geography. This approach allowed, for
example, 1991 ward-based flows to be tabulated using the 2001 wards as their base geography. The method therefore permitted comparison of interaction data for small areas; this is in contrast to the usual aggregation method, which generates results only for relatively large areas. A sequence of estimated data sets has been produced that tabulate 1981 ward level data using both the 1991 and 2001 based ward geographies, and that tabulate 1991 ward level data using the 2001 ward geography; these data sets have been made available for academic researchers by CIDER.

An example of an inconsistency in a population base that can be addressed relatively easily is that of the inclusion (or not) of home-workers in SWS data sets. Home-workers – those who work from their home all or most of the time – are believed to be increasing in number, as changes in working practice make this possible for a wider range of occupations, and also as improvements in telecommunications technology permit more people to do so. Thus, it would be useful to explore the changing social and demographic characteristics of home-workers. In some data sets, home-workers have been tabulated as a distinct group (with a specific workplace destination coding), whereas in other data sets, they have been included as part of the general flow of workers within an area. The former arrangement was used for the 1991 SWS, whilst the latter was used for both the 1981 and 2001 SWS. The difficulties posed by these differences are described below at some length in order to demonstrate the problems that arise and possible solutions. Where home-workers are included as part of the general commuting flow within an area (OA, ward or district depending on the spatial resolution of the data set), they can be distinguished in only one output table.
Tables showing the method of transport used to get to work include an explicit category of ‘works at home’, allowing counts of home-workers to be established and distinguished from those who commute to work by some mode of transport. Figures 1 and 2 show the distribution
of home-workers as recorded in the 2001 SWS. The figures show, for all districts in England and Wales, and for all wards in London respectively, the proportions of all employees and self-employed persons in an area who are home-workers. These are independently shaded as low, medium and high (darkest shading) quantiles. At a national level (Figure 1), home-working appears to be much more common in rural and relatively isolated parts of the country. Figure 2, on the other hand, shows a pattern of home-working in London that is strongly focussed on the more affluent west and north-west areas of the capital. Home-working also appears to be relatively common in south London, a pattern which may be related to the lack of underground train lines in the area.

Figure 1. Home-workers as proportions of total employees and self-employed, by district for England and Wales, 2001. Source: Census 2001 SWS

Figure 2. Home-workers as proportions of total employees and self-employed, by ward for London, 2001. Source: Census 2001 SWS

However, where home-workers are included in the intra-area flow, it is not possible to distinguish them in any other output tables, so it is not possible to explore (in the SWS data) their social and demographic characteristics. In contrast, home-workers were separately coded in the 1991 SWS, allowing their characteristics to be explored in all output tables. It is unfortunate, therefore, that no comparison can be made with home-workers as recorded in the 2001 Census.

A second problem arises from this state of affairs for all users of the SWS: home-workers are included in all data in the 2001 and 1981 SWS, whereas they are not included in the 1991 data. Forward comparisons of 1981 with 1991 therefore risk inconsistency if the numbers of commuters living and working in the same area are compared. Similarly, problems will arise in comparisons of the 1991 and 2001 SWS data sets (beyond those posed by the fact that the 1991 data come from a 10% sample, whereas the 2001 data are from a 100% coding). In order to compare 1981 with 1991 or 1991 with 2001, it is necessary to remove home-workers from the 1981 or 2001 data, or to add the 1991 home-workers into the general commuting population. The former approach is limited by the fact that this is only possible in the mode-of-transport table, meaning that any other tables cannot be validly compared. The latter approach is thus a more general purpose one, although it is useless for the specific case of analysing home-workers. Both operations can be done ‘by hand’ in extracted data, although pre-prepared versions of the 1991 SWS that have had home-workers merged into the general intra-area flow are also available via the Centre for Interaction Data Estimation and Research (CIDER).

The inclusion of home-workers in journey-to-work data sets is of course only one example of inconsistencies that arise from differences in the coding of ‘difficult to handle’ cases in interaction data. A similar situation occurs with the treatment of workers with an ‘unstated’ workplace, and migrants with an unstated origin. As with the example of home-workers, these cases received specially coded workplace or migrant origin classifications in the 1991 data sets, whereas they were merged with other flows in the 2001 data sets. Whilst the home-workers example was a case of preference for a different manner of classification, there is a more pertinent reason for differences in handling ‘unknown’ cases in 2001: improvements in census coding and data handling by the census agencies allowed more advanced methods to be developed. Amongst these new methods was the ability to impute a workplace or migrant origin via comparison to similar ‘donor’ households. There is a justification for this approach, in that the census agencies had the full raw data at their disposal, and were thus able to impute with far more confidence than any end-user would be able to do with only aggregate data on which to work. However, a familiar problem arises: the 1991 and 2001 data cannot be validly compared, as they have, in effect, different population bases. It is impossible to remove the imputed counts in 2001, as no indication was given of the proportion of persons in any given flow who were imputed. One solution, therefore, would be to add the ‘unknown’ flows from 1991 into the general set of flows, in order to make forward comparison more straightforward. In the case of home-workers, this approach was easy to implement; a new workplace destination can be imposed, as it is known that this must be the same as the area of residence.
For flows to or from unknown locations this is clearly impossible. It is not possible to impute individual origins or workplaces, as the raw data are not available.
Instead, an aggregate distribution must be made. The simplest approach is to apply a pro-rata allocation to all unknown flows. Thus, if 20% of migrants to a given area are known to have come from a particular origin, it is assumed that 20% of any migrants with an unstated origin also came from that location. Pre-prepared versions of the 1981 and 1991 SMS datasets are available from CIDER that include pro-rata allocation of migrants with unstated origin, as are versions of the 1981 and 1991 SWS, with pro-rata allocation of workers with an unstated workplace. Direct pro-rata allocation is clearly not ideal: few would assume that migrants with an unstated origin have the same socio-demographic characteristics as other migrants, and consequently they may have different mobility patterns. However, the use of any more complex modelling risks disagreement between analysts about the formulation of an allocation model; the raw counts of migrants with unstated origins and workers with an unstated workplace can still be explored, allowing individual researchers to apply their own allocation model.
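The pro-rata rule described above can be illustrated with a minimal sketch. The function name, the area labels and the counts are hypothetical and chosen only for illustration; this is not the procedure used to build the CIDER datasets, merely one way of expressing the same proportional logic for a single destination area.

```python
def allocate_pro_rata(known_flows, unstated):
    """Spread `unstated` migrants over origins in proportion to known inflows.

    known_flows: dict mapping origin area -> migrants with a stated origin
    unstated:    number of migrants to the same destination with no stated origin
    """
    total_known = sum(known_flows.values())
    if total_known == 0:
        # No stated-origin flows to base shares on: leave the counts unchanged
        # (the unstated migrants remain unallocated in this sketch).
        return dict(known_flows)
    return {
        origin: flow + unstated * flow / total_known
        for origin, flow in known_flows.items()
    }


# Hypothetical destination with 100 stated-origin migrants and 10 unstated.
known = {"Area A": 20, "Area B": 50, "Area C": 30}
adjusted = allocate_pro_rata(known, unstated=10)
# Area A supplied 20% of known migrants, so it receives 20% of the 10 unstated.
print(adjusted)
```

Because each origin's share of the unstated migrants equals its share of the known flow, the adjusted flows sum to the destination's total inflow including the unstated cases.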
TIME SERIES MIGRATION TRENDS
The remainder of this chapter is concerned with demonstrating how consistent migration has been over the years since the 2001 Census: How has the volume of migration fluctuated over time? How stable are the propensities to migrate at different stages in the life course? Have the spatial distributions of migrant flows remained stable between 2001 and 2006?
Patient Register and NHSCR Data
In order to answer these questions, we have to utilise data collected by ONS from patient registers compiled by health authorities across England and Wales, as introduced in Chapter 1 of this book. As explained in the technical guidance notes on
using patient registers to estimate internal migration (Migration Statistics Unit, 2007), ONS request a download of the patient register for each health authority (HA) each year (31 July) and compile a total register for the whole of the country containing the following information for each individual: NHS number; date of birth; sex; postcode of current place of residence; date of acceptance, i.e. first registered in the HA; patient’s HA; GP’s HA; registration type, i.e. birth; first acceptance; transfer from another HA; immigrant; ex-service; unknown; and previous HA. By linking the NHS number from one year to the next, it becomes possible to compare the postcode field of the individual record and to identify those cases where the postcode has changed, thus defining a migrant as someone who moved at some stage during the previous 12 months; this means that the count of changed postcodes is a measure of ‘transitions’, akin to the census definition of a migrant, rather than a count of moves or events, and it is possible to aggregate these data so they represent flows between local authority areas. The data received by ONS undergo a series of validation checks to remove records where patients have been issued with a temporary NHS number so they can receive treatment in an area that is not their usual residence, to identify records where there is incomplete data, and then to impute missing variables. However, one of the shortcomings of the patient register data system (PRDS) is that, like the census, certain population sub-groups will be missed out altogether if they are not present in both registers: infants born during the 12 month period, immigrants, and those leaving the armed forces will be excluded together with those who die during the period, who emigrate and those joining the armed forces. 
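The register-linkage step described above can be sketched as follows. This is an illustrative simplification, not the ONS processing system: the register layout (NHS number mapped to postcode), the postcode-to-local-authority lookup and all values are hypothetical.

```python
from collections import Counter


def count_transitions(register_prev, register_curr, postcode_to_la):
    """Count migrant 'transitions' by comparing two annual register downloads.

    register_prev / register_curr: dict mapping NHS number -> postcode.
    Only people present in BOTH registers can be identified as migrants, so
    births, deaths, immigrants and emigrants drop out, as noted in the text.
    """
    flows = Counter()
    for nhs_number, old_postcode in register_prev.items():
        new_postcode = register_curr.get(nhs_number)
        if new_postcode is None or new_postcode == old_postcode:
            continue  # not in the later register, or no change of postcode
        origin = postcode_to_la[old_postcode]
        destination = postcode_to_la[new_postcode]
        flows[(origin, destination)] += 1  # origin == destination is a within-area move
    return flows


# Hypothetical registers for two successive 31 July downloads.
prev = {"N1": "LS1 1AA", "N2": "LS2 2BB", "N3": "M1 3CC", "N4": "M1 3CC"}
curr = {"N1": "M1 3CC", "N2": "LS2 2BB", "N3": "LS1 1AA", "N5": "LS1 1AA"}
lookup = {"LS1 1AA": "Leeds", "LS2 2BB": "Leeds", "M1 3CC": "Manchester"}

flows = count_transitions(prev, curr, lookup)
print(dict(flows))
```

Each person is counted at most once per year however many times they moved, which is why the result measures transitions rather than moves.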
It is because of these omissions that ONS make use of the so-called NHSCR data, counts of patient re-registrations between HAs held by the Central Register of the NHS for England and Wales. Entries on the NHSCR are updated annually to generate a set of estimates of ‘moves’ taking place between HAs which include those omitted from the PRDS. Thus, each week, individual re-registration records are extracted from the NHSCR and sent to ONS for moves within England and Wales, as well as moves into England and Wales from Scotland, Northern Ireland and abroad. Moves in the other direction, to Scotland and Northern Ireland, are extracted from registers in these countries and sent to ONS on a quarterly basis. As with the patient register data, ONS processes the NHSCR data: validation, imputation and derivation of new variables. The NHSCR data between HAs are used to constrain the patient register data by applying scaling factors and the resulting estimates appear in a series of tables available annually through the National Statistics web site (http://www.statistics.gov.uk). In addition, ONS have recently used the patient register data in the production of annual mid-year versions of NHSCR-based tables for flows between local authorities. It is the inter-LA matrices for all ages and for broad age groups (0-15, 16-19, 20-24, 25-29, 30-44, 45-59, 60-74 and 75 and over) for years ending mid-year 1999 to mid-year 2006 that we use in the analysis which follows.
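The text does not specify how the scaling factors are derived, so the following is only one plausible sketch of such a constraint, under the assumption that each inter-LA patient register flow is scaled by the ratio of the NHSCR count to the patient register total for the corresponding pair of HAs. All names and counts are hypothetical.

```python
from collections import defaultdict


def constrain_to_nhscr(pr_flows, la_to_ha, nhscr_flows):
    """Scale inter-LA patient register flows so they sum to NHSCR inter-HA counts.

    pr_flows:    dict (origin LA, destination LA) -> patient register count
    la_to_ha:    dict LA -> health authority containing it (assumed lookup)
    nhscr_flows: dict (origin HA, destination HA) -> NHSCR count of moves
    """
    # Patient register total for every pair of health authorities.
    ha_totals = defaultdict(float)
    for (origin, dest), count in pr_flows.items():
        ha_totals[(la_to_ha[origin], la_to_ha[dest])] += count

    scaled = {}
    for (origin, dest), count in pr_flows.items():
        ha_pair = (la_to_ha[origin], la_to_ha[dest])
        target = nhscr_flows.get(ha_pair)
        if target is None or ha_totals[ha_pair] == 0:
            scaled[(origin, dest)] = float(count)  # no constraint available
        else:
            scaled[(origin, dest)] = count * target / ha_totals[ha_pair]
    return scaled


# Hypothetical example: two LAs in HA_A send migrants to one LA in HA_B.
pr = {("LA1", "LA3"): 40, ("LA2", "LA3"): 60}
ha_of = {"LA1": "HA_A", "LA2": "HA_A", "LA3": "HA_B"}
nhscr = {("HA_A", "HA_B"): 120}

estimates = constrain_to_nhscr(pr, ha_of, nhscr)
print(estimates)
```

Under this assumption the spatial pattern within each HA pair comes from the patient registers, while the overall level comes from the NHSCR.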
Comparison of Patient Register and Census Data
Before we examine the time series migration trends since 1998-99, it is important to consider how consistent the patient register data are with census flows. Comparisons of this type have been undertaken in studies of each census since 1971 (Ogilvy, 1980; Devis & Mills, 1986; Boden, 1989; Stillwell et al., 1992; 1995). Here we report on a comparison of patient register data for the year ending mid-2001 with 2001 Census data for the 12-month period ending April 2001. Figure 3 presents the scattergraphs of in-migration and out-migration rates for all age migrants for all local authorities in England and Wales. The patient register rates, on the vertical axis, are computed using estimated final mid-year populations whereas the census rates, shown on the horizontal axis, are calculated using end-of-period populations from the Census. The coefficients of determination for these graphs are 0.79 and 0.81 respectively, indicating strong positive correlation in both cases. The graphs in Figure 3 include both City of London and Isles of Scilly, both of which have very small populations and are amongst the largest residuals.
Figure 3. Relationships between patient register and census in-migration and out-migration rates, local authorities, 2000-01. Sources: 2001 Census SMS and patient register/NHSCR data supplied by ONS
The scatterplots for net migration rates (Figure 4) show greater variation around the line of best fit, as do the migration efficiencies, which express district net migration flows as a percentage of the sum of inflows and outflows. Consequently, the coefficients of determination drop to 0.56 and 0.55 respectively. The strength of the relationship for the all age flows provides strong evidence that the patient register data are identifying spatial variations in migration that are also apparent in the census data. However, when age-specific correlation analysis is undertaken, the coefficients of determination shown in Table 1 indicate much stronger relationships between census and patient register rates for out-migration, in-migration, net migration and migration efficiency in certain age groups than others; in some cases, such as the 16-19 year olds, r² values for net migration and migration efficiency are higher than for gross migration rates. The most noticeable and consistently weak relationships across all four variables are found at age 20-24, although the weakest correlation is for net migration rates for 25-29 year olds. Lower levels of correlation in these age groups appear to be due to under-recording of migrants in the patient register data. This is likely to occur for various reasons including the lower likelihood of movement registration among students leaving higher education institutions following graduation as well as the lower propensity of individuals in these age groups to be registered with a health authority anyway. The ‘student factor’ at age 16-19 is less evident due to the fact that most students are compelled to register with the medical service when they arrive at their HE institutions.
Figure 4. Relationships between patient register and census net migration rates and migration efficiencies, local authorities, 2000-01. Sources: 2001 Census SMS and patient register/NHSCR data supplied by ONS
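The four indicators compared above, and the coefficient of determination used to compare them, can be sketched as follows. Rates are expressed per 100 population and efficiency as net migration per 100 gross flows; the function names and the figures in the example are hypothetical, and r² is computed here as the squared Pearson correlation, which is what the coefficient of determination reduces to for a simple linear fit.

```python
def migration_indicators(inflow, outflow, population):
    """Return in-, out- and net migration rates (%) and migration efficiency (%)."""
    net = inflow - outflow
    gross = inflow + outflow
    return {
        "in_rate": 100 * inflow / population,
        "out_rate": 100 * outflow / population,
        "net_rate": 100 * net / population,
        # Net migration as a percentage of the sum of inflows and outflows.
        "efficiency": 100 * net / gross if gross else 0.0,
    }


def r_squared(x, y):
    """Coefficient of determination for a simple linear relationship between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical district: 1,200 in-migrants, 800 out-migrants, population 10,000.
district = migration_indicators(1200, 800, 10000)
print(district)

# Hypothetical census and patient register rates for three districts.
census_rates = [1.0, 2.0, 3.0]
register_rates = [2.0, 4.0, 6.0]
print(r_squared(census_rates, register_rates))
```

Because efficiency divides by gross flows rather than population, small differences between sources in either inflows or outflows can shift it considerably, which is consistent with the weaker all-age correlations reported for the net measures.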
Table 1. Coefficients of determination (r²) for census versus patient register data, local authorities, 2000-01

Age group   In-migration rate   Out-migration rate   Net migration rate   Migration efficiency
0-15        0.770               0.765                0.711                0.700
16-19       0.794               0.591                0.841                0.816
20-24       0.573               0.495                0.386                0.307
25-29       0.748               0.497                0.271                0.480
30-44       0.772               0.801                0.813                0.763
45-59       0.790               0.787                0.792                0.816
60-74       0.820               0.744                0.797                0.829
75+         0.681               0.668                0.626                0.582
All Ages    0.790               0.810                0.559                0.551

Source: 2001 Census SMS and patient register/NHSCR data supplied by ONS

Time Series Migration, 1998-2006
Despite the existence of NHSCR data stretching back to the mid 1970s, no study has managed to construct a consistent set of annual migration data over the last three decades with which to monitor migration propensities at the national level, primarily due to the changing geographical areas used for NHS administration. Using census and NHSCR data, Stillwell et al. (1992) showed how the volume of migration in Britain declined during the 1970s to a low in 1982 before rising again to a peak in the late 1980s as Britain recovered from the recession period, 1979-83. The boom years of the mid to late 1980s were followed by further years of lower growth, with migration in the year before the 1991 Census involving 4.69 million persons compared with 4.72 million in the year before the 1981 Census. Migration rates picked up across the country during the 1990s (Van der Gaag et al., 2003) as GDP per capita rose throughout the decade and unemployment rates fell after 1993. The 2001 Census records a total migration of 6.05 million in Britain, although student migrants were included in this count, having been excluded in 1991.
In this section, we explore what trends are apparent in migration in England and Wales using data on flows between a consistent set of local authority areas. Figure 5 shows that total migration varies over the period, starting with 2.43 million in 1998-99 and ending with a slightly larger volume (2.44 million) in 2005-06. The time series index shows fluctuations of around 3% from the baseline (100) with 2005 being the only year with fewer migrants than 1999. The third year in the sequence is that which overlaps with the 2001 Census where 2.24 million migrants were recorded as moving between districts and a further 3.24 million moved within districts in England and Wales.
Figure 5. Magnitude of inter-district migration in England and Wales, 1998-2006. Source: Patient register/NHSCR data supplied by ONS
When we decompose the all age schedule by broad age group, it becomes apparent that these annual variations are due to fluctuations in the volume of those in the family and later working age ranges (Figure 6a) whereas flows for late teenagers and young adults plus those in older age groups appear more stable. However, when we compare migration intensities over time by age (Figure 6b), the most significant trends include the decline in migration rates at the age where migration propensity is at its highest (age 20-24) and in the previous age group (16-19) containing large numbers of students. Other age groups have relatively stable propensities throughout the period. An alternative measure of migration is the migration effectiveness score, explained in detail in Chapter 4 and indicating the importance of net migration taking place between districts in redistributing migrants around the country. Figure
7 contains two graphs that illustrate age-specific migration effectiveness in two different ways. Figure 7a presents age-specific migration effectiveness with time on the horizontal axis whereas Figure 7b has age on the horizontal axis. Both graphs illustrate the importance of net migration as an agent of population redistribution at student age and also at older age, particularly retirement. Higher migration effectiveness means greater inequality between the inflows and outflows across the system and both the graphs suggest that it is the older working age and younger elderly groups which have seen a decline in migration effectiveness over the period.
Figure 6. Inter-district migration flows and rates in England and Wales by age group, 1998-2006. Source: Patient register/NHSCR data supplied by ONS
Figure 7. Inter-district migration effectiveness in England and Wales by age group, 1998-2006. Source: Patient register/NHSCR data supplied by ONS
Hitherto, the analysis has been focused at the national level, so the next question to address is whether the time series trends also vary across space. In this instance we tackle the issue by examining geographical patterns at two spatial scales. Firstly, we provide a broad summary of change between 1998-99 and 2005-06 for all age migration by dividing the districts in England and Wales into the four categories defined and used for local governance and administration: London boroughs, metropolitan districts, unitary authorities and other local authorities. Secondly, we consider age-specific net migration rates at the district scale and examine what changes have happened between 2001 and 2006.
The time series schedules of net migration using the four district-type categories demonstrate one of the most important features of the migration system in the country, the enormous net migration losses that are taking place from London to the rest of the country (Figure 8). In the first year of the period, London was losing almost 56,000 migrants in net terms but this roughly doubled to around 113,500 in 2004 before dropping back to just less than 80,000 by the last year of the period. In total, London has exported 486,000 people in net terms over eight years but has actually lost almost 1.8 million people through gross out-migration at an average of 235,000 per year. This phenomenal exodus is offset by the arrival of 1.2 million people from the rest of England and Wales together with a large number of immigrants from overseas. Commentators (Daily Mail, 28 September 2008) have attributed this massive exodus to a range of socio-economic factors that include poor schools, overcrowding, severe transport congestion, increasing crime rates and high living costs as well as environmental deterioration. Migrants of all ethnic groups are leaving London but the flow is predominantly of white people and Chapter 9 of this volume explores the ethnic dimension of flows within London and between London and the rest of England and Wales.
Figure 8. Net migration by district category, 1999-2006. Source: Patient register/NHSCR data supplied by ONS
The schedule of net migration loss from London is reflected in the net migration gains evident for rural England, defined by the classification as other local authority areas. However, more rural areas are also receiving migrants in net terms from provincial metropolitan areas whose net migration losses are much more stable over the period. Unitary authorities also have a fairly stable time series of net migration with inflows and outflows in balance, creating a schedule of net migration that deviates only marginally around zero. One question raised by these time series schedules is whether the change for London is explained by increasing outflows or diminishing inflows. Figure 9a shows us that the rates of in-migration tend to remain relatively stable for London whilst the rates of out-migration (Figure 9b) from the capital increase appreciably. The corollary of this is that the in-migration rate to other local authorities fluctuates more than the out-migration rates although there is a marginal decline in the latter over the period. Comparison of the two graphs indicates how the in-migration rates show wider variation between district categories than the out-migration rates though neither show any signs of convergence.
Another question we might ask is where the major flows are taking place in the rest of the country. The data inform us that almost 19.8 million people moved between districts in England and Wales during the period and that almost half (9.2 million) moved between districts in different categories. However, almost 28% of migration took place between local authorities in rural areas and 12% between London boroughs. The flows between districts in the four categories in Table 2 are presented in rank order based on size of flow, with rates defined as 100 Mij/(Pi + Pj) shown in the final column and calculated using end-of-period populations of origin i and destination j district type categories. Given that almost 50% of the population live in other local authorities, it is not surprising that the largest volume of migration takes place between districts in this category although the highest rate of migration is between boroughs in London, where 14% of the population resides. The density of habitation in the capital and the relatively small size of the boroughs are partly responsible for the high rates. Moves between districts within the other two categories, metropolitan districts that contain 21% of the population and unitary authorities with 16% of the population, have relatively high rates at 6.5% and 8% respectively.
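The shares and net exchanges derived from Table 2 can be checked with a short sketch. The flow counts are those published in Table 2; the shortened category labels, the function names and the populations passed to `flow_rate` are illustrative assumptions, since the category populations themselves are not reported in the chapter.

```python
# Flows between district categories, 1998-2006, keyed as (origin, destination).
FLOWS = {
    ("Other", "Other"): 5462652, ("London", "London"): 2394779,
    ("Unitary", "Other"): 1796660, ("Other", "Unitary"): 1669008,
    ("Metropolitan", "Metropolitan"): 1405753, ("Unitary", "Unitary"): 1322981,
    ("London", "Other"): 1253975, ("Metropolitan", "Other"): 1002138,
    ("Other", "Metropolitan"): 813891, ("Other", "London"): 675986,
    ("Metropolitan", "Unitary"): 444943, ("London", "Unitary"): 431505,
    ("Unitary", "Metropolitan"): 397726, ("Unitary", "London"): 323142,
    ("Metropolitan", "London"): 201855, ("London", "Metropolitan"): 196155,
}
TOTAL = sum(FLOWS.values())
BETWEEN_CATEGORIES = sum(n for (o, d), n in FLOWS.items() if o != d)


def share(origin, destination):
    """Flow as a percentage of all inter-district migration."""
    return 100 * FLOWS[(origin, destination)] / TOTAL


def net_gain(category, partner):
    """Net migrants gained by `category` from `partner` (negative means a loss)."""
    return FLOWS[(partner, category)] - FLOWS[(category, partner)]


def flow_rate(m_ij, p_i, p_j):
    """Rate 100 * Mij / (Pi + Pj); populations must be supplied by the user."""
    return 100 * m_ij / (p_i + p_j)


print(TOTAL, BETWEEN_CATEGORIES)
print(round(share("Other", "Other"), 2), round(share("London", "London"), 2))
print(net_gain("Other", "Unitary"), net_gain("London", "Other"))
print(flow_rate(50, 400, 600))  # hypothetical flow and populations
```

The between-category total accounts for roughly 46.5% of all flows, and the net exchange between unitary and other local authorities matches the gain of 127,600 quoted for the more rural areas.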
Figure 9. Gross migration rates by district category, 1999-2006. Source: Patient register/NHSCR data supplied by ONS

Table 2. Migrant flows between district type categories, 1998-2006

Origin district type     Destination district type   Flow        Percentage   Rate
Other Local Authority    Other Local Authority       5,462,652   27.60        10.54
London Borough           London Borough              2,394,779   12.10        16.35
Unitary Authority        Other Local Authority       1,796,660    9.08         5.26
Other Local Authority    Unitary Authority           1,669,008    8.43         4.89
Metropolitan District    Metropolitan District       1,405,753    7.10         6.45
Unitary Authority        Unitary Authority           1,322,981    6.68         8.03
London Borough           Other Local Authority       1,253,975    6.34         3.77
Metropolitan District    Other Local Authority       1,002,138    5.06         2.72
Other Local Authority    Metropolitan District         813,891    4.11         2.21
Other Local Authority    London Borough                675,986    3.42         2.03
Metropolitan District    Unitary Authority             444,943    2.25         2.33
London Borough           Unitary Authority             431,505    2.18         2.77
Unitary Authority        Metropolitan District         397,726    2.01         2.08
Unitary Authority        London Borough                323,142    1.63         2.08
Metropolitan District    London Borough                201,855    1.02         1.11
London Borough           Metropolitan District         196,155    0.99         1.08

Source: Patient register/NHSCR data supplied by ONS

The largest flows between categories are those from districts in unitary authorities to other local authorities and vice versa, resulting in a gain of 127,600 over the period by the more rural areas. In net terms, London lost almost 578,000 migrants to other local authorities between 2000-01 and 2005-06 and a further 108,000 to unitary authorities, but gained 5,700 from the provincial metropolitan districts. Metropolitan districts lost 188,200 migrants in net terms to other local authorities and a further 47,200 to unitary authorities.
In our second spatial analysis, we consider the patterns of migration at the district level by mapping the patterns of net migration in 2005-06 and the changes occurring since 2000-01 by using three shading categories for districts with net migration gains in 2005-06 (increasing gain, decreasing gain, loss to gain) and three categories for districts with losses (increasing loss, decreasing loss, gain to loss). The pattern of net migration losses from major metropolitan towns and cities and gains in many of the unitary authorities and other local authorities is shown for all age migrants in Figure 10a alongside the change between 2000-01 and 2005-06 in Figure 10b. In terms of absolute numbers, Birmingham lost more than anywhere else (-46,400) in 2005-06 and several London boroughs experienced losses of over 30,000, including Newham (-46,100), Brent (-37,600), Ealing (-35,400), Lambeth (-32,800) and Haringey (-30,000), whereas the districts with the highest gains are the East Riding of Yorkshire UA (18,200), North Somerset UA (12,500), Tendring (10,900), East Lindsey (10,800) and Arun (10,700). These districts have different sized populations, of course, but when rates are computed using end-of-period populations, Newham, Brent, Haringey, Ealing and Lambeth are all districts together with Hackney
and Southwark that have net migration rates over 10% of the population, with Newham at the top of the league table showing a rate of net loss of 18.5%. At the other end of the net migration spectrum, rates are much lower with Torridge and West Lindsey having the highest rates of gain at 9.1% and 8.5% respectively. The change map indicates that in much of rural England and Wales, rates of gain are increasing (92 districts) whilst many urban districts are experiencing increasing losses (38 districts), although there are more districts with decreasing gains (115 districts) and decreasing losses (49 districts). Several districts surrounding London and stretching westwards have become areas of gain in 2005-06.
Figure 10. Net migration in 2005-06 and change, 2000-01 to 2005-06, all ages. Source: Patient register/NHSCR data supplied by ONS
The spatial patterns of migration for different age groups indicate some interesting variations concealed by the aggregate patterns. Whilst the maps of net migration for the 0-15 year olds (Figure 11) show considerable similarity with those for all age migrants, the maps for the 16-19 year olds (Figure 12) and for the 20-24 year olds (Figure 13) have entirely different characteristics, explained by the fact that districts with major universities welcome large numbers of students that appear in the 16-19 category when they register as undergraduates and lose many of these same individuals three or four years later when they graduate aged 20-24. The only exception here is the situation for most London boroughs, whose balances are negative for those aged 16-19 and positive for those aged 20-24. The maps of change for the 16-19 year olds suggest increasing polarisation with more parts of the country having increasing losses (163 districts) than decreasing losses (133 districts) and more districts with increasing gains (39 districts) compared with decreasing gains (22 districts) and relatively few districts moving between the gain and loss categories. At age 20-24, the pattern of change is somewhat different with increasing gains experienced by many districts in the commuting belts around London and several of the metropolitan areas. Increasing losses, on the other hand, are evident from the districts containing major universities.
Figure 11. Net migration in 2005-06 and change, 2000-01 to 2005-06, ages 0-15. Source: Patient register/NHSCR data supplied by ONS
Figure 12. Net migration in 2005-06 and change, 2000-01 to 2005-06, ages 16-19. Source: Patient register/NHSCR data supplied by ONS
Figure 13. Net migration in 2005-06 and change, 2000-01 to 2005-06, ages 20-24. Source: Patient register/NHSCR data supplied by ONS
At age 25-29 (Figure 14), the pattern of net migration does not change radically from that for those aged 20-24. Relatively large losses are continuing to take place from university towns and cities, and London boroughs and districts adjacent to metropolitan cores are gaining. Moreover, the distribution of districts in each of the change classes is roughly even with certain rural parts of the country showing increasing losses but other rural parts having increasing gains. The following age group, 30-44, is the age group that is likely to contain the parents of many of the 0-15 year olds; as a consequence, the patterns of net migration and the changes between 2000-01 and 2005-06 are very similar (Figure 15). The net migration of those in the older working age group, 45-59 (Figure 16), has a similar pattern to that of the retirement age group, 60-74 (Figure 17), in each case showing how people are attracted away from the conurbation areas, particularly in the South East, towards more rural areas, especially districts along the coasts of southern and eastern England, although increasing gains are experienced by many districts across central and northern England.
Figure 14. Net migration in 2005-06 and change, 2000-01 to 2005-06, ages 25-29. Source: Patient register/NHSCR data supplied by ONS
Figure 15. Net migration in 2005-06 and change, 2000-01 to 2005-06, ages 30-44. Source: Patient register/NHSCR data supplied by ONS
Figure 16. Net migration in 2005-06 and change, 2000-01 to 2005-06, ages 45-59. Source: Patient register/NHSCR data supplied by ONS
Figure 17. Net migration in 2005-06 and change, 2000-01 to 2005-06, ages 60-74. Source: Patient register/NHSCR data supplied by ONS
In contrast, the net migration balances for those aged 75 and over (Figure 18) do not have the same coastal orientation as the previous age groups. Metropolitan areas are certainly losing migrants in older age, but in much smaller numbers whilst rural areas closer to urban centres are where increasing gains are taking place.
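The six shading categories used in the change maps (Figures 10 to 18) can be expressed as a simple classification rule, sketched below. The treatment of an exact zero balance as a gain is an assumption made here for completeness, since the chapter does not say how such cases were handled, and the function name and example values are hypothetical.

```python
def change_category(net_start, net_end):
    """Classify a district by its net migration in the start and end years.

    net_start: net migration balance in 2000-01
    net_end:   net migration balance in 2005-06
    A balance of exactly zero is treated as a gain (an assumption).
    """
    if net_end >= 0:  # district gains in the end year
        if net_start < 0:
            return "loss to gain"
        return "increasing gain" if net_end > net_start else "decreasing gain"
    # district loses in the end year
    if net_start >= 0:
        return "gain to loss"
    return "increasing loss" if net_end < net_start else "decreasing loss"


# Hypothetical balances for six districts, one per category.
examples = [(200, 500), (500, 200), (-50, 30), (-200, -600), (-600, -200), (30, -10)]
labels = [change_category(start, end) for start, end in examples]
print(labels)
```

Counting districts by label gives tallies of the kind quoted in the text, such as the numbers of districts with increasing or decreasing gains.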
Figure 18. Net migration in 2005-06 and change, 2000-01 to 2005-06, ages 75 and over. Source: Patient register/NHSCR data supplied by ONS

CONCLUSION
Time series spatial analysis can be a very frustrating activity when geographical boundaries change, when definitions and classifications change and when the raw counts collected through censuses are processed and/or published in different ways from census to census. The first half of this chapter has attempted to present some of the causes of these frustrations and has identified certain ways of coping with inconsistencies that have dogged the comparison between 1991 and 2001. The 2001 Census saw an important dislocation between the ED units for which data were collected by enumerators and the OA geography for which data were processed and published. It is to be hoped that the lowest level output geography in the 2011 Census will enable a more consistent comparison to be made between 2001 and 2011.
The second half of the chapter has focused on interaction data that allow the monitoring of migration between districts in England and Wales over time, having demonstrated a strong correlation with all age rates of migration computed from the 2001 Census, if not for rates associated with those aged 20-24 and 25-29, where the patient register data appear to undercount migrants compared with the Census. The time series of patient register data suggest that whilst the all age volume of migration varies from year to year with no unidirectional trend between 1998-99 and 2005-06, the propensities of those in the late teenage and young adult age groups show a tendency to decline whilst intensities for other age groups remain fairly stable throughout the period. Changes in migration effectiveness, on the other hand, are relatively small with the most noticeable trend being a decline in the effectiveness at age 60-74 and a change in the same direction at 45-59.
Perhaps the most dramatic trend shown by the data appears when we consider the flows taking place between districts categorised into the four main governance types. At this spatial scale, we observe the phenomenal number of out-migrants from London boroughs to the rest of the country and to other local authorities in particular. The volume of net out-migrants doubled between 1998-99 and 2005-06 from 55,800 to 113,500, involving an outflow of over a quarter of a million (257,000) in 2005-06, of which 170,100 or two thirds went to other local authorities, offset by an inflow in the opposite direction of 143,500, only 20% of the 391,000 flows leaving these rural areas, the majority of which (53%) moved to neighbouring unitary authorities. Many of the boroughs of London, including Wandsworth (19.1%), Lambeth (18.7%), Islington (18.5%) and Hammersmith and Fulham (17.4%) are amongst the districts with the highest rates of turnover in the country, alongside Cambridge (18.6%) and Oxford (18.3%), whose turnover rates are influenced by the numbers of students moving to and from these university locations.
Finally, the analysis of net migration at the district scale in 2005-06 highlights the way in which migration propensities and patterns vary according to the life course such that the all age pattern of counterurbanisation obscures very different patterns of migration for those in the student age groups and those in their twenties. The changes taking place between 2000-01 and 2005-06 at each age group appear to be accentuating the differences between areas of gain and areas of loss.
In conclusion, the message for policy makers, and census administrators in particular, that emerges from the work reported in this chapter is that consistent spatial boundaries and variable definitions are essential for reliable time series analysis. We can be hopeful that the 2011 Census will allow for much more accurate comparison with the 2001 Census than has been possible between the latter and its predecessor. The emergence of consistent time series of patient register data since 1998-99 for England and Wales is to be much welcomed and, we hope, continued into the future, enabling a series of consistent inter-censal migration indicators for local authorities to be computed for the first time. Further research is required to include reliable and consistent estimates of flows between districts in Scotland and Northern Ireland as well as flows between districts in these countries and those of England and Wales so that a full set of inter-district flows within the UK is available for monitoring trends.
ACKNOWLEDGMENT
The authors are grateful to Alistair Davies, Internal Migration Supervisor at the Migration Statistics Unit at ONS, for supplying the patient registration data for years ending mid-1999 to mid-2006 and to Nicholas Stillwell for work on preparing the patient register data and computing the rates.
REFERENCES
Bell, M., Rees, P., Blake, M., & Duke-Williams, O. (1999). An age-period-cohort database of inter-regional migration in Australia and Britain, 1976-96. Working Paper 99/02, School of Geography, University of Leeds.
Boden, P. (1989). The analysis of internal migration in the United Kingdom using Census and National Health Service Central Register data. Unpublished PhD Thesis, University of Leeds, Leeds.
Boyle, P., & Feng, Z. (2002). A method for integrating the 1981 and 1991 GB Census interaction data. Computers, Environment and Urban Systems, 26, 241–256. doi:10.1016/S0198-9715(01)00043-6
Daily Mail. (2008, September 28). Middle classes leading the flight as 250,000 quit London. Daily Mail. Retrieved from http://www.dailymail.co.uk/news/article-1062314/Middle-classes-leading-flight-250-000-quit-London.html
Devis, T., & Mills, I. (1986). A comparison of migration data from the National Health Service Central Register and the 1981 Census. OPCS Occasional Paper 35, OPCS, London.
Frost, M., Linneker, B., & Spence, N. (1996). The spatial externalities of car-based worktravel emissions in Greater London, 1981 and 1991. Transport Policy, 3, 187–200. doi:10.1016/S0967-070X(96)00027-3
Gregory, I., Dorling, D., & Southall, R. (2001). A century of inequality in England and Wales using standardized geographical units. Area, 33, 297–311. doi:10.1111/1475-4762.00033
Gregory, I., & Ell, P. (2005). Breaking the boundaries: geographical approaches to integrating 200 years of the census. Journal of the Royal Statistical Society A, 168, 419–437. doi:10.1111/j.1467-985X.2005.00356.x
Migration Statistics Unit. (2007). Using Patient Registers to Estimate Internal Migration, Technical Guidance Notes. Migration Statistics Unit, ONS, Titchfield.
Office for National Statistics. (1999). Gazetteer of the New and Old Geographies of the United Kingdom. Retrieved from http://www.statistics.gov.uk/downloads/ons_geography/Gazetteer_v3.pdf
Ogilvy, A. A. (1980). Inter-regional migration since 1971: an appraisal of the data from the National Health Service Central Register and Labour Force Surveys. OPCS Occasional Paper 16, OPCS, London.
Stillwell, J. C. H., Duke-Williams, O., & Rees, P. (1995). Time series migration in Britain: the context for 1991 Census analysis. Papers in Regional Science: Journal of the Regional Science Association, 74(4), 341–359.
Stillwell, J. C. H., Rees, P. H., & Boden, P. (Eds.). (1992). Migration Processes and Patterns: Vol. 2. Population Redistribution in the United Kingdom. London: Belhaven Press.
Van der Gaag, N., van Wissen, L., Rees, P., Stillwell, J., & Kupiszewski, M. (2003). Study of Past and Future Interregional Migration Trends and Patterns within European Countries. In Search of a Generally Applicable Explanatory Model. The Hague: Report for Eurostat, Netherlands Interdisciplinary Demographic Institute.
111
Chapter 6
A New Migrant Databank: Concept and Development Peter Boden University of Leeds, UK Phil Rees University of Leeds, UK
ABSTRACT Few parts of the UK remain unaffected by the surge in migration from central and eastern Europe that has been evident since the expansion of the European Union in 2004. However, the statistical instruments available to measure the multi-dimensional impact of international migration remain inadequate. The lack of empirical evidence to support research and analysis of migrant populations is an issue that affects a broad range of organisations at international, national, regional and local levels. The problem is particularly acute in a selected set of local areas, where migrant populations have had significant demographic, economic and social impacts. This chapter reports on work examining the changing profile and dynamics of the UK’s ethnic populations. The estimation and projection of ethnic group populations for local areas requires accurate intelligence on the inflow and outflow of international migrants. In the absence of a definitive source of data that can provide these statistics, the New Migrant Databank (NMD) has been developed, which combines alternative sources of international migration data into a common statistical framework for presentation and analysis. The alternative sources of international migration data are summarised and a number of analytical examples are provided to illustrate how the NMD can provide a much improved picture of patterns and trends at a local level and the basis for improved intelligence on local estimates of both short-term and long-term migration. A number of developments are suggested, both to focus future research and to extend the content and value of the NMD as a common source of intelligence on UK immigration and emigration. DOI: 10.4018/978-1-61520-755-8.ch006
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION International migration is a key driver of population change in the UK. National projections for the UK for 2006-11 estimate an annual population increase of 435,000, with approximately half due to natural increase and half due to net immigration (ONS, 2008a). Since 2004, with the accession of ten new countries to the European Union (EU), the UK has experienced an unprecedented inflow of economic migrants from central and eastern Europe, supplementing the historical flow of migrants that has originated from the New and Old Commonwealth, EU member states and other foreign countries. This inflow of new migrants has had an impact upon existing migrant communities but has also created new, ethnically diverse communities in areas previously untouched by the effects of economic migration (Bauere et al., 2007). Whilst migrant workers have continued to fill vacancies in skilled and professional occupations, large numbers have also been employed in low-wage employment in rural businesses, particularly agriculture and food processing. At the same time, the construction industry, the hotel and catering industry and the care industry have all seen a significant influx of migrant labour. In the absence of a population register in the UK, there is no single source of statistics that provides a comprehensive measure of the new migrant population and its many dimensions. However, the UK-wide impact of migrant populations is placing far greater importance on their measurement and there is increasing political pressure for improved intelligence on the volume, profile and geographical distribution of migrant communities (Statistics Commission, 2007; House of Lords Select Committee on Economic Affairs, 2008a; 2008b; House of Commons Treasury Committee, 2008). With an uncertain statistical picture at a national level, the analysis of migrant impacts at a local level is even more problematic. 
But it is in these communities, schools, hospitals and local businesses where the impact has been most dramatic and where better information to facilitate research, analysis and policy formulation is most crucial. This chapter reports on research investigating the impact of new migrant communities upon the UK’s ethnic profile. Whilst the primary focus of the research is the development of local area population projections for ethnic groups within the UK, a key component of these projections is the measurement and estimation of immigration and emigration at a local geographical level. The chapter provides an illustration of some of the difficulties involved in accurately measuring international migration and summarises the types of data that are available from alternative UK sources. No single source provides a definitive measure, so it is necessary for data sets to be combined to provide the most informed view of the impact of migration at a local level. This chapter illustrates how the New Migrant Databank (NMD) concept can combine data sources to provide a common framework for presentation and analysis and the basis for improved measurement of international migration.
CONCEPTUAL AND DEFINITIONAL DIFFERENCES The pattern of passenger journeys is complex, with visitors and migrants coming into and leaving from the UK for a variety of reasons and for a variety of lengths of stay (Figure 1). Most visitors and migrants will enter the UK legally, although some will arrive as illegal migrants. Some come to find work, some to study, others to join existing family members and some to seek protection from abuse or persecution in their home country (asylum seekers). Some migrants will come with dependants. Some will come to the UK as visitors and then decide to stay for a longer period, sometimes for more than 12 months (visitor switchers). Others will come as migrants with the intention of staying for a long period but then change their mind and return within 12 months (migrant switchers). Some migrants will be highly skilled, others less so, seeking manual and semi-skilled employment. Some will emigrate permanently from the UK; others will leave for a short or extended length of time but then return. Some migrants will come to the UK, stay in one place for only a short period but then move on to a more permanent residence.
Figure 1. UK international passenger arrivals and departures, 2006
In early 2008, the Home Office introduced the first phase of the UK’s new immigration system designed to simplify the process by which migrants from outside the European Economic Area (EEA) come to the UK (Home Office, 2006). The system is designed to streamline and simplify the process by which migrant workers enter the UK’s labour market. The new points-based system consists of five separate tiers, each subject to different conditions, entitlements and entry-clearance checks:
• Tier 1: highly skilled individuals;
• Tier 2: skilled workers to fill specific gaps in the UK labour force;
• Tier 3: low skilled workers to fill temporary labour shortages;
• Tier 4: students; and
• Tier 5: youth mobility and temporary workers.
Points are awarded to migrants reflecting their skills, experience and age and the demand for these skills in the UK economy. The new system does not apply to migrants from the EEA and Switzerland. Migrants from Accession countries have freedom of movement throughout the EU but, until 2009, must register with the Workers Registration Scheme (WRS) if they are to be employed in the UK (this does not apply to migrants from Malta and Cyprus). Bulgarian and Romanian migrants have similar freedom of movement but employment in the UK requires an individual application for an ‘accession worker card’ and, in certain cases, an application for a work permit from an employer.
There is a variety of terminology associated with the process of international migration, describing the event of moving to or from a country, the individuals involved and the duration of stay (Table 1). The terminology has changed over time in response to events (streams of migration) and the changing social meanings attached to different terms.
Table 1. International migration terminology
What is a migrant?
General: A migrant is a person who relocates from one place to another within a specified period of time.
International: An international migrant is a migrant who relocates from one country to another.
Immigrant: An immigrant is an international migrant who migrates into a country.
Emigrant: An emigrant is an international migrant who migrates out of a country.
New: A new migrant is an international migrant who has arrived in the UK in the recent past. It is sometimes used to refer to recent immigrants from central and eastern Europe.
Old: An old migrant is an international migrant who arrived some years ago.
Long-term: A long-term migrant is an international migrant whose country of usual residence changes for a period of 12 months or more.
Short-term: A short-term migrant is an international migrant who moves to a country other than that of his or her usual residence for a period of at least 3 months but less than a year.
Transition: A transition is defined as a change in country of residence between two points in time. The Census records as a migrant a person whose country of usual residence has changed over a twelve-month period.
Move: A move is the event associated with a change in country of residence. There may be more than one move between two points in time.
Flow: Flows provide a count of the number of international migrants that come to or leave the UK in a specified period of time, usually a single year.
Stock: Stocks provide a count of the total number of resident migrants. Resident migrants can be counted in a number of different ways: as persons with a different country of birth or as persons who have immigrated to the UK within a specified time period.
The length of time a migrant stays in the UK is a particular issue when interpreting migration statistics (Figure 2). Those staying for less than three months, for example, are generally classed as visitors. A ‘short-term migrant’ is defined as “a person who moves to a country other than that of his or her usual residence for a period of at least 3 months but less than a year, except in cases where the movement to that country is for purposes of recreation, holiday, visits to friends and relatives, business, medical treatment or religious pilgrimage” (United Nations Statistics Division, 2006). At present, data on short-term migrants are not included in published statistics and it is the indeterminate length of stay of the majority of Accession country migrants that is the main source of current confusion and uncertainty with UK migration statistics. A ‘long-term migrant’ is a person whose country of usual residence changes for a period of 12 months or more. This definition is the basis for the UK’s National Statistics on total annual immigration and emigration, applied via a question on intended duration of stay in the International Passenger Survey (IPS).
Figure 2. Visitors and migrants
The distinction between stocks and flows is also an important one when using migrant statistics. Stocks provide a count of the total number of resident migrants. Resident migrants can be counted in a number of different ways: as persons with a different country of birth or as persons who have immigrated to the UK within a specified time period. Flows provide a count of the number of new migrants that come to or leave the UK in a specified period of time, usually a single year. Migrant flows will increase or decrease the size of the resident stock of migrants, depending upon the balance between immigration and emigration. Migrant workers have typically been classified as those who have arrived in the UK within the last five years, so will encompass both stocks and flows when considering available statistics. These workers will be a combination of legal and illegal workers, including those from the EEA who have a right to live and work in the UK plus those from outside the EEA who require a work permit or who come to the UK as working holiday makers or seasonal agricultural workers.
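The duration thresholds and the stock-flow relationship described above can be expressed compactly. The following Python sketch is illustrative only: the function names and example figures are hypothetical, the classification ignores the purpose-of-visit exclusions in the United Nations definition, and the stock update ignores deaths and changes of status.

```python
def classify_by_duration(months_of_stay: float) -> str:
    """Classify a stay by its length, using the thresholds quoted in the text."""
    if months_of_stay < 0:
        raise ValueError("length of stay cannot be negative")
    if months_of_stay < 3:
        return "visitor"             # under three months
    if months_of_stay < 12:
        return "short-term migrant"  # at least 3 months, less than a year
    return "long-term migrant"       # 12 months or more


def update_stock(stock: int, immigration: int, emigration: int) -> int:
    """Resident migrant stock after one period of flows (deaths ignored)."""
    return stock + immigration - emigration


if __name__ == "__main__":
    for months in (1, 6, 18):
        print(months, classify_by_duration(months))
    # Hypothetical figures, not taken from any published source.
    print(update_stock(stock=10_000, immigration=2_500, emigration=1_000))
```

Because actual length of stay is only known after the event, a classification of this kind applied to intentions (as in the IPS) can differ from one applied to outcomes, which is where the visitor and migrant switchers arise.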
ALTERNATIVE DATA SOURCES There is no single data collection instrument for the measurement of international migration. There are a number of alternative sources which provide specific intelligence about the movement of population into and out of the UK. These sources may be generally classified as census, survey or administrative datasets (see Chapter 1). Each has its own limitations depending upon the question asked, the purpose of the data collection and the population covered (Figure 3). The decennial census is the most comprehensive source of data on the UK population but its data age rapidly, particularly at a time of such significant demographic change. Surveys are rich sources of data but are typically not statistically robust for local-area analysis and do not adequately capture all migrant populations. Administrative sources can provide excellent geographical detail but typically do not have the data richness that a survey provides. Few sources provide data on emigration from the UK, with administrative systems typically only providing data on new or resident migrants.
Figure 3. Table of sources of data on UK international migration
The International Passenger Survey (IPS) is the only instrument measuring UK immigration and emigration for nearly all types of migrant. It is a multi-dimensional survey, of which the migrant questions are just one part. It surveys approximately 250,000 passengers each year: about 1 in 400 of the total number entering or leaving at the UK ports. Of this sample, about 1% are migrants whose stated intention is to stay in or leave the UK for more than 12 months. This is equivalent to approximately 3,000 respondents, 70% of whom are immigrants and 30% emigrants. From 2007, the number of interviews with departing migrants is being boosted to a level comparable to those on entry.
IPS respondents are asked their ‘intended length of stay’. Estimates of long-term migration (where duration of stay is twelve months or more) feed directly into National Statistics of Total International Migration (TIM) produced by the Office for National Statistics (ONS). TIM combines data from the IPS with additional statistics from the Home Office on asylum seekers and their dependants and from the Irish Central Statistics Office on estimates of migration between the UK and the Irish Republic based on the Labour Force Survey (LFS) in Ireland. Visitor switchers, those people whose original intention was to stay for less than twelve months but who subsequently stay for longer, are estimated from IPS visitor data. Migrant switchers, those people who intended to stay for more than 12 months but decide to leave within a year, are derived from Home Office data on non-EEA citizens. TIM statistics provide the most accurate estimates of long-term immigration and emigration at a national level (ONS, 2008b).
ONS has an ongoing programme of improvement for its international migration statistics that has included changes to the way sub-national estimates of long-term migration are produced. This has removed the tendency for over-estimation of immigration into London by incorporating statistics from the LFS to calibrate IPS migration flows for Government Office Region (GOR) and by creating a new ‘intermediate’ geography to improve the allocation to local area (ONS, 2007b; 2007c).
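The way a TIM inflow estimate is assembled from the components listed above can be sketched as simple arithmetic. The sketch below is an illustration under stated assumptions rather than the ONS method: the class and field names are hypothetical, the figures are invented, and the published procedure involves further adjustments that are not described here.

```python
from dataclasses import dataclass


@dataclass
class InflowComponents:
    """Hypothetical container for the components of a long-term inflow estimate."""
    ips_long_term: int      # grossed IPS arrivals intending to stay 12+ months
    visitor_switchers: int  # arrived as visitors but stayed 12+ months
    migrant_switchers: int  # intended 12+ months but left within a year
    asylum_seekers: int     # Home Office asylum seekers and dependants
    from_ireland: int       # estimated inflow from the Irish Republic


def tim_inflow(c: InflowComponents) -> int:
    """Combine the components: switchers in are added, switchers out removed."""
    return (c.ips_long_term
            + c.visitor_switchers
            - c.migrant_switchers
            + c.asylum_seekers
            + c.from_ireland)


if __name__ == "__main__":
    # Invented figures for illustration only.
    example = InflowComponents(500_000, 30_000, 20_000, 25_000, 15_000)
    print(tim_inflow(example))
```

The same structure would apply to outflows, with the switcher adjustments defined for departures rather than arrivals.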
The Census can provide both a view of migrant flows for the year prior to enumeration and a measure of migrant stocks present in the usually resident population. Migration ‘flow’ data are derived from a question which asks for an individual’s address twelve months prior to enumeration day. Only in-migration is measured as there is no attempt to capture information on individuals who have emigrated during the census year. The stock picture is derived from detailed country of birth statistics, although in the absence of a question on year of entry to the UK, it is not possible to measure the length of time a migrant has been resident in the UK.
The Pupil Census contains individual pupil records for all children in grant maintained schools in the UK. The dataset is managed by the Department for Children, Schools and Families (DCSF) and collected on a twice-yearly basis from individual schools within each Local Education Authority (LEA). A total of 8 million pupils are included in the data set each year. The data set does not provide an obvious source of statistics on migrant workers but it does have the potential to provide an informed picture of the composition of local areas based on the changing profile of pupil numbers, using captured information on ethnicity and first language.
The Labour Force Survey (LFS) is a quarterly sample survey of households living at private addresses in the UK and provides the most detailed statistics on the UK labour market. The LFS captures a sample of 60,000 households in Great Britain and asks the question, ‘where were you living one year ago’, so it can provide a count of the ‘flow’ of migrants coming to the UK within a single year. It also records information on year of entry to the UK, which provides a picture of the length of time migrants have been resident, thus producing the most reliable statistics on the ‘stock’ of migrant workers in the UK.
However, as an accurate measure of international migration the LFS has a number of constraints:
• it excludes students in halls of residence who do not have a UK resident parent;
• it excludes people in most types of communal establishments;
• it excludes migrants who have been in the UK for less than six months; and
• it is grossed to population estimates that only include long-term migrants.
The LFS is therefore likely to significantly undercount foreign students and foreign workers whose stay is relatively short. The LFS contains a sample of about 700 international migrants per year (i.e. persons who state they were resident overseas one year ago). This small sample size precludes more detailed analysis of migrant inflow by local geographical area.
During 2008-2009, it is planned to combine the LFS with the Annual Population Survey (APS), the General Household Survey (GHS), the Expenditure and Food Survey (EFS) and the National Statistics Omnibus Survey (NSOS) to create the Integrated Household Survey (IHS). This single survey approach will create a much larger sample, with migrant worker questions from the LFS being retained in a core module that is expected to cover 221,000 households (ONS, 2007d). The capture of communal households and migrants who have been resident for less than six months will need to be addressed if the new IHS is to significantly improve the migrant counts available from the LFS.
The Home Office regularly publishes National Statistics on immigration and asylum. British Citizens, those Commonwealth Citizens who have freedom of entry to the UK and nationals from the EEA are not subject to immigration control and are not included in Home Office statistics. No information is recorded on people emigrating from the UK. National Statistics produced by the Home Office fall into three broad categories: asylum, control of immigration, and persons granted British Citizenship (Home Office, 2008a; 2008b). Most statistics are only available at a national level, with no sub-national provision. Work Permit statistics for each local authority district and unitary authority (LADUA) have previously been made available but these data are no longer routinely produced by the Home Office, although asylum statistics are still available at a local authority level.
For a new migrant to the UK, acquiring a National Insurance Number (NINo) is a necessary first step for employment/self-employment purposes or to claim benefits or tax credits. NINo statistics, managed by the Department for Work and Pensions (DWP), record an individual’s residence, ‘country-of-origin’, age and gender. The Information Directorate (IFD) within DWP is responsible for the publication of statistics from its National Insurance Recording System (NIRS) and a summary of NINo registrations to A8 migrants is published periodically as part of the more general release of migration statistics coordinated by ONS (Home Office, 2008c). NINo statistics exclude dependants of applicants, unless they claim benefits or work themselves. They will also exclude most students and those migrants who are not of working age and not claiming benefits. They provide no indication of the length of stay of a migrant worker and there is no formal de-registration process. Migrants can leave the UK and return at a later date without needing to re-register for a new NINo.
Nationals from the Accession 8 countries of the Czech Republic, Estonia, Hungary, Latvia, Lithuania, Poland, Slovakia and Slovenia who come to work in the UK are required to register with the Workers Registration Scheme (WRS). A new registration is required when a person changes employment or an applicant is employed by more than one employer. Year of registration is recorded, as is nationality of the individual. Date of birth, gender and occupational status are also routinely captured. There is no method for tracking how long each applicant stays in the UK as, like the NINo system, there is no de-registration process. A detailed statistical picture produced from the WRS is regularly published, illustrating the profile of applicants and of registered workers and detailing type of employment, hours of work, wages and a regional disaggregation (Home Office, 2008c). The WRS provides richer data detail than NINo statistics but has a number of limitations. It records the location of the employer but not the residential location of the applicant. It only records information on A8 migrants and it also excludes those who are self-employed. In addition, the WRS will not record A8 migrants who come to the UK for reasons other than work, including students. The WRS is also only a temporary administrative system and is scheduled to terminate in April 2009.
When new migrants first register with a General Practitioner (GP), each is explicitly identified as an individual whose previous address was outside the UK and who has spent more than three months abroad. The Patient Registration Database System (PRDS) records the age and gender of new migrants but does not provide any more detailed information on nationality, country of origin or country of birth. No information is captured on patients who have emigrated from the UK. GP registrations capture all migrants, regardless of age and employment status, so in theory they provide the most comprehensive view of migration inflows. Migrants captured by the registration process will include short-term migrants, in addition to those who have been resident for at least twelve months. It is not possible to identify actual or intended length of stay from the data. For the majority of migrants, there will be a time-lag between entering the UK and registering with a GP and some migrants may never complete the registration process during their stay in the UK. Young men, in particular, tend to delay registration after migration more than older men or women. Also, a PRDS record loses its migrant status once a patient moves within the UK and registers with a new GP.
The Higher Education Statistics Agency (HESA) maintains a record of all students in the UK, including those whose country of usual residence is outside the UK. HESA administrative systems do not capture the residential address of international students, only the location of the institution of study. Students provide information on their expected length of stay and, although nationality is requested, it is not a mandatory field and coverage is typically poor. Age and gender are recorded but ethnicity is provided only for students with a UK domicile. A ‘flow’ picture can be produced, recording all students who arrive and depart in a particular year. In addition, by looking at all students who are studying during a particular year, a ‘stock’ picture can be produced. The picture is dynamic because of the constant churn of students by institution.
AN INTEGRATED VIEW TIM statistics have previously been the definitive source of information on international migration to and from the UK but, as the migration process has become ever more diverse and complex, it has been necessary to seek additional intelligence from alternative sources. The NMD is a simple concept but one which can bring a measure of consistency to the way international migration statistics are presented and interpreted (Rees and Boden, 2006). It can also provide the basis for more detailed research and analysis into the most appropriate methods for measuring international migration at all geographical levels. It is not a perfect solution to the problem of accurately measuring the impact of international migration; but in the absence of a population register, or equivalent, it makes best use of all the intelligence that is available and makes it accessible to those that most need it. In summary, the NMD can provide:
• a ‘single view’ of alternative statistics for an extended time-series, by geographical area;
• a common and consistent reporting framework for all local authority areas;
• clarity of conceptual and measurement differences;
• a framework for analysis of trends and patterns in migration; and
• the basis for research and analysis targeted at improving migration and population estimation at a local geographical level.
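One way to picture the ‘single view’ in the list above is as a set of aggregate counts keyed by area, source and year, regrouped so that the sources sit side by side for each year. The Python sketch below is a hypothetical illustration of that idea; the record layout, the function name and the figures are assumptions and do not describe how the NMD itself is stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    """One aggregate count: a source's figure for an area in a given year."""
    area: str
    source: str
    year: int
    count: int


def single_view(records: Iterable[Record], area: str) -> Dict[int, Dict[str, int]]:
    """Return {year: {source: count}} for one area, with years in order."""
    view: Dict[int, Dict[str, int]] = defaultdict(dict)
    for r in records:
        if r.area == area:
            view[r.year][r.source] = r.count
    return {year: view[year] for year in sorted(view)}


if __name__ == "__main__":
    # Invented counts for two hypothetical areas.
    data = [
        Record("Area A", "TIM", 2005, 2_000),
        Record("Area A", "GP", 2005, 2_100),
        Record("Area A", "NINo", 2006, 4_000),
        Record("Area B", "TIM", 2005, 500),
    ]
    for year, by_source in single_view(data, "Area A").items():
        print(year, by_source)
```

Holding only aggregate counts in this way is consistent with the point that data captured at an aggregate geography raise fewer disclosure issues than individual records.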
For each local authority area, the NMD draws together statistics from a number of alternative sources, providing a consistent view of each, benchmarked against each other and against regional and national totals. The integration of information from different sources will depend upon permissions of use but, as the data are captured at an aggregate geography, there are likely to be fewer issues associated with data protection and disclosure. Figure 4 gives an example of the type of output that can be generated by the NMD for a selected urban local authority area. The statistics presented are as follows.
New Migrant Counts The summary chart in Figure 4 illustrates both change over time and the differences that exist between four alternative sources: the 2001 Census, TIM statistics used within official mid-year population estimates, GP registrations to foreign nationals and NINo registrations to foreign workers. The NINo data are disaggregated to illustrate the total number of registrations and those to non-Accession migrants. In this local area, the 2001 Census benchmark indicates an inflow of approximately 2,500 migrants in the year prior to enumeration. Over time, there is consistency between the TIM estimate of long-term migration and the GP registrations. Prior to 2004, the trend in NINo registrations to non-Accession migrant workers was also consistent with both TIM estimates and GP registrations, indicating an absence of short-term migrant workers in this local authority. Total NINo registrations rise sharply after 2004, illustrating the impact of A8 migrants. Interestingly, for this local authority, there is no corresponding rise in either GP registrations or TIM estimates, suggesting that a large number of these migrants have been short-term or, due to their age-profile, have not considered registering with a GP.
MYE: Internal and International Migration The second chart in Figure 4 illustrates the net flow picture for both internal and international migration. The statistics are drawn from the migration assumptions that underpin the mid-year population estimates (MYE) produced by ONS for 2002-2006 and the sub-national population projections (SNPP) for 2007-2031. International migration is derived from TIM statistics; internal migration is based on evidence from GP registrations between local authority areas in the UK. This local area has experienced a significant net loss due to internal migration since 2002 that peaked in 2004 but reduced to 4,000 by 2006. A net inflow of long-term international migrants is evident, with an average annual net gain of approximately 2,000 over the time-period. The trend for the SNPP is for a continued net loss due to internal migration of over 4,000 per year, coupled with a net gain through international migration approaching 2,000 per year.
Figure 4. Area profile: an example illustration from the New Migrant Databank. Sources: 100% data extract from the National Insurance Recording System (NIRS); 2006 mid-year estimates (ONS, 2007a); 2006-based SNPP, current data (ONS, 2008c); GP registration statistics provided by ONS; Workers Registration Scheme (Home Office, 2008c); TIM estimate of Accession migrants (ONS, 2008b).
Alternative Sources (2006) The next two illustrations are designed to give a snapshot of the migrant counts that are produced from alternative sources and to illustrate whether the local area has a consistent share of its regional total. An indication of the local authority’s share of the regional population total is provided in the second of the two charts. In this instance, data are displayed for: the TIM estimates that are used in the ONS mid-year estimates and those that have been used as the long-term assumptions in the ONS sub-national projections; the 2001 Census; WRS; GP registrations; and NINo registrations, for both Accession and non-Accession migrants. HESA data record no students in this local authority. The TIM estimates and the GP registrations are consistent, both in absolute terms and as a percentage of the regional total. The WRS count is low, an indication that few employers are based in the local area. The split between non-Accession and Accession migrants is approximately 70:30. The higher regional share for the NINo counts suggests a significant number of short-term migrants resident in the local area.
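The regional-share comparison described here amounts to dividing each source's local count by the corresponding regional count and comparing the resulting percentages. The following Python sketch shows the calculation; the function names and all figures are hypothetical and are not taken from Figure 4.

```python
from typing import Dict


def regional_share(local_count: float, regional_count: float) -> float:
    """Local count expressed as a percentage of the regional total."""
    if regional_count <= 0:
        raise ValueError("regional total must be positive")
    return 100.0 * local_count / regional_count


def shares_by_source(local: Dict[str, float],
                     regional: Dict[str, float]) -> Dict[str, float]:
    """Regional share for every source present in both dictionaries."""
    return {source: regional_share(local[source], regional[source])
            for source in local if source in regional}


if __name__ == "__main__":
    # Invented counts: a higher NINo share than TIM share is the kind of
    # pattern the text reads as a sign of short-term migrants.
    local = {"TIM": 2_000, "GP": 2_100, "NINo": 4_000}
    regional = {"TIM": 40_000, "GP": 42_000, "NINo": 50_000}
    for source, share in shares_by_source(local, regional).items():
        print(f"{source}: {share:.1f}% of regional total")
```

A source whose share departs markedly from the others is a prompt for further investigation rather than evidence in itself, since each source covers a different population.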
NINo Profile Two charts based on NINo registration data are presented. The first gives an indication of the changing profile of migrant workers registering for a NINo since 2002-03. Registration by Old and New Commonwealth migrants and those from ‘other’ countries has remained fairly stable, whereas the A8 and other European totals have increased since 2004-05. The second chart gives an illustration of the ethnic profile of NINo registrations for 2006-07. This profile has been derived by combining NINo country-of-origin data with corresponding data from the 2001 Census on international migration by ethnic group and country of origin. Over 70% of NINo registrations in 2006-07 were classified as ‘white’ ethnic, reflecting the rise in Accession and ‘other Europe’ migrant registrations to this local authority.
WRS Profile (2006/07) The WRS profile records Accession migrants who work within the local area. They may live elsewhere. The total number of WRS registrations is relatively small in this area but the large majority are employed in the hospitality sector, with a further 20% in general administration. The majority of these workers (68%) are Polish citizens with a further 12% from Lithuania.
PRELIMINARY ANALYSIS The creation of the NMD has provided the basis for research and analysis into the patterns and trends in international migration that are evident at a sub-national level, making direct comparisons between datasets to provide a more informed picture of the variations that exist, particularly between TIM statistics and administrative sources.
What do the Different Data Sets Tell Us about the Level and Distribution of Immigration? The immigration profile for England demonstrates that the TIM immigration estimates and the GP registrations have followed a similar trend since 2002 (Figure 5a). Until 2006, when the TIM estimate dipped below the level of GP registrations, there was an approximate 5% difference between the two datasets. This suggests that although the GP registration data may include a number of short-term migrants, it does appear to provide a comparable indicator of long-term immigration. NINo registrations to non-Accession migrants also follow a similar overall trend to GP registrations and TIM estimates but at a much lower level. This reflects the fact that the dataset covers only migrant workers, excluding students, dependants and those not in employment, while the similarity in trend suggests that these workers are likely to be predominantly long-term migrants. Accession migrants are responsible for the majority of the substantial rise in total NINo registrations since 2004. The consistency in the other data sets suggests that the majority of these Accession registrations have been to migrants with an indeterminate length of stay who have not been captured by the TIM estimation process and have largely not registered with a GP. The immigration profile for London in Figure 5b is similar to that for England, although the peak in total NINo registrations is less significant, reflecting the fact that Accession migrants are dispersed across the UK rather than concentrated in the capital, as is the case with non-Accession migrants. The TIM estimate of immigration to London for 2006 again falls below that of the GP registration total, suggesting that there may be some under-estimation of flows to London.
Figure 5. Immigration profiles from alternative sources, England and London. Sources: 100% data extract from the National Insurance Recording System (NIRS); 2006 mid-year estimates (ONS, 2007a); GP registration statistics supplied by ONS.
Figure 6 provides an illustration of the regional distribution of immigration flows from alternative sources and indicates the corresponding regional share of population (Figure 6a). In 2001, long-term immigration to London comprised 34% of the total to England (Figure 6b). A further 20% was recorded in the South East and 10% in the East of England. In 2006, the TIM estimate of immigration to London remained at 34% of the total based on a resident population of approximately 15% of the England total (Figure 6c). The South East had reduced its share of immigration to 10% with the East Midlands, Yorkshire and the Humber and the North West increasing from their respective 2001 positions as a result. GP registration data for London in 2006 showed consistency with the TIM estimates but there were differences of 2% or more in the share of immigration for the South East, West Midlands and Yorkshire and the Humber, suggesting that TIM estimates could be less accurate in these regions (Figure 6d). These differences are explored further in the next section. The LFS in 2006 exhibits a flatter distribution for immigration flows, with only 30% concentrated
in London but 17% and 13% in the South East and East of England respectively (Figure 6e). Other regions, with the exception of the North East at 3%, have a share of 7-8% of the LFS total. The WRS captures A8 migrant registrations by employer location and thus produces a dispersed distribution across the English regions, with only 15% registered in London (Figure 6f). This reflects the spread of migrant employment across the country but hides the additional impact of self-employed migrant workers, who are excluded from the WRS. The NINo profile in 2006 provides a more accurate picture of the distribution of Accession migrants (A8 plus Bulgaria and Romania) by area of residence rather than employment. Only 26% of NINos were registered in London, with a much more dispersed profile between the regions confirming the impact of Accession migrants throughout England (Figure 6h). By contrast, half of non-Accession NINo registrations were to London-based migrants, reflecting established migrant streams (Figure 6g).
Figure 6. Immigration profiles from alternative sources, GOR England. Sources: 100% data extract from the National Insurance Recording System (NIRS); 2006 Mid-year estimates (ONS, 2007a); 2006-based SNPP, current data (ONS, 2008c); GP registration statistics supplied by ONS; Workers Registration Scheme (Home Office, 2008c); TIM estimate of Accession migrants (ONS, 2008b).
Can Alternative Sources Help to Improve the Accuracy of National Statistics? The TIM estimates of international migration are primarily based on sample data derived from the IPS. To improve the robustness of the sub-national estimation of immigration, the LFS is used to distribute flows to regional level, which are then further allocated to a new ‘intermediate’ geography before distribution to individual local authority areas based upon proportions evident from the 2001 Census. At present this process of local estimation does not incorporate additional intelligence from administrative sources, although initial analysis and investigation has been completed to compare WRS, NINo and GP registration statistics with those derived from the TIM estimation process (ONS, 2007d). The migrant population covered by the GP registration data was shown to be most comparable with the TIM estimates but inconsistencies in registrations in London and elsewhere were identified as a barrier to the more general application of the administrative data. The profiles in Figure 6 confirm that although TIM statistics and GP registrations are conceptually different, the general level and trend in their data for both England and London are quite consistent. This suggests that GP registration statistics could provide a useful comparison to the level of long-term immigration estimated by the TIM process. Figure 7 compares aggregate TIM statistics and GP registration data for each of the English regions for the period 2002-2006. One might expect general consistency between the two datasets at a regional level but the graph illustrates that there are significant differences, particularly for Yorkshire and the Humber, the South West and the West Midlands. In the South West and Yorkshire and the Humber, TIM estimates of immigration were, in aggregate, over 20% higher than the total number of GP registrations in the corresponding period. In the West Midlands they were 14% lower. For Yorkshire and the Humber and the West Midlands, the differences between the two sources of immigration statistics over time are illustrated in Figure 8. Since 2004, the TIM estimates for the West Midlands have diverged from the total number of GP registrations, following consistency in previous years. This could be due to a large number of short-term migrants registering with
Figure 7. GP registrations and TIM estimates compared, 2002-2006, GOR. Sources: TIM estimate of Accession migrants (ONS, 2008b); GP registration statistics supplied by ONS.
a GP and not appearing in TIM estimates. It may also indicate a degree of error in the sub-national allocation of TIM immigration estimates. In Yorkshire and the Humber the reverse is true with TIM estimates exceeding GP registrations by an average of 9,200 over the 2002-2006 time-series. This again may be due to either non-registration of new migrants with a GP or a potential inaccuracy in the allocation of TIM flows (and possibly a combination of the two). At a local authority level the variation in the pattern and trend in immigration between the data sources becomes more erratic but important differences are evident that can help to inform more accurate estimation. Figure 9 illustrates immigration profiles for three sample local authority areas where differences between sources are particularly significant: an urban district in Yorkshire and the Humber, a rural county in the West Midlands and a London Borough. For the urban district in Yorkshire and the Humber, GP registrations remain higher than total NINo registrations throughout the period, suggesting that short-term migration may be less of an issue in this district
and reflecting the presence of a large number of international students. What is more significant is the large difference between the administrative data sources and the TIM estimate, which, in 2006, is approximately 4,000 higher than the equivalent total for GP registrations. This would suggest that the TIM estimate has over-estimated the impact of immigration upon this local authority district. Again this is emphasised by the ‘share of region’ graph, with 28% of the region’s TIM immigration flow allocated to this area, compared to only 22% of GP registrations and 24% of non-Accession NINo registrations. The rural county in the West Midlands has experienced a significant inflow of migrants, with the total NINo registrations peaking at almost 3,000 in 2006. GP registrations have also risen, suggesting that an increasing number of migrants have chosen an extended stay. TIM estimates are consistently below the GP registration total, suggesting that, in this case, there may be a slight undercount in the impact of long-term immigration to the county. For the London Borough there is reasonable consistency in the trend between the GP registrations and the non-Accession NINo registrations, and the spike in the total NINo is similar to the national trend. However, the trend in the TIM estimate is inconsistent, falling below the totals for each of the other datasets from 2004-2006.
Figure 8. GP registrations and TIM estimates compared, 2002-2006, West Midlands and Yorkshire and the Humber. Sources: 100% data extract from the National Insurance Recording System (NIRS); 2006 Mid-year estimates (ONS, 2007a); GP registration statistics supplied by ONS.
Figure 9. Migration profiles for selected local authority areas. Sources: 100% data extract from the National Insurance Recording System (NIRS); 2006 Mid-year estimates (ONS, 2007a); 2006-based SNPP, current data (ONS, 2008c); GP registration statistics supplied by ONS; Workers Registration Scheme (Home Office, 2008c).
This suggests that the TIM estimate may be undercounting the true impact of long-term migration to the Borough. This is further emphasised by the ‘share of region’ graph, which shows that 6.2% of the London region’s GP registrations in 2006 were
in the Borough, compared to approximately 4% of the total TIM immigration for the same period. These analyses are not a definitive assessment of the accuracy of TIM statistics but they do provide an illustration of how the use of data from a number of different sources, regardless of their conceptual and definitional differences, can add significant intelligence to the estimation of local area migration statistics. It is also evident from this preliminary analysis that there are likely to be ‘clusters’ of local authority areas, which demonstrate similar patterns and trends in their respective migration profiles and similar differences between the migration counts derived from alternative sources: a topic for further research.
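The two comparisons used throughout this section, the ‘share of region’ and the relative difference between sources, can be sketched in a few lines of Python. This is a minimal illustration only: the counts are hypothetical placeholders (scaled so that the shares echo the London Borough example) and are not NMD data, and the function names are invented for the sketch.

```python
# Hypothetical counts for one local authority and its region, by source.
# These numbers are placeholders for illustration, not NMD data.
local = {"TIM": 4_000, "GP": 6_200}
region = {"TIM": 100_000, "GP": 100_000}


def share_of_region(local_count, regional_count):
    """Local count as a percentage of the regional total."""
    return 100.0 * local_count / regional_count


def relative_difference(estimate, benchmark):
    """Percentage by which an estimate exceeds (+) or falls below (-) a benchmark."""
    return 100.0 * (estimate - benchmark) / benchmark


shares = {src: share_of_region(local[src], region[src]) for src in local}
gap = relative_difference(local["TIM"], local["GP"])

for src, pct in shares.items():
    print(f"{src}: {pct:.1f}% of regional total")
print(f"TIM relative to GP registrations: {gap:+.1f}%")
```

A negative relative difference here would correspond to a TIM estimate falling below the GP registration count, which is the pattern described for the London Borough.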
Can Alternative Sources Help to Better Understand the Impact of Short-Term Migration? Throughout the UK there has been considerable debate concerning the impact of short-term migrants: those whose stay in the UK is typically less than twelve months’ duration. TIM statistics, a key component of population estimation for local authority areas, include only migrants whose duration of stay in the UK exceeds twelve months. They exclude short-term migrants. Since 2004, the IPS has captured additional information on short-term migrants: those individuals whose intended length of stay is between three and twelve months. Using these data, ONS has produced its first set of experimental statistics on short-term migrants (ONS, 2008d). Based on exit surveys, the IPS estimates that, in 2005, 335,000 migrants came to England and Wales for 3-12 months (Figure 10). Of this total, approximately 90,000 were for employment in the UK, 70,000 were for study and the remainder were for other reasons such as holiday or visiting friends and relatives. The average length of stay of migrant workers was approximately five months. Figure 11 provides an alternative estimate of the short-term inflow of Accession migrants
to England and Wales and to London during the three year period following EU expansion in 2004. The estimates are derived using a combination of NINo registrations data, WRS estimates on dependants, TIM estimates of Accession migrant numbers and ONS evidence on the average length of stay of short-term migrants. The consistency in the trend between non-Accession NINo registrations and GP registrations in London suggests that the duration of stay of non-Accession migrants is typically longer-term. For this analysis of short-term migration, it has been assumed that non-Accession migrants are captured by the TIM statistics as part of the long-term inflow. Accession migrants typically have a more indeterminate length of stay, although some will be captured in the long-term migrant counts. In 2006-07, the total number of NINo registrations to Accession migrants in England and Wales was approximately 274,000. NINo statistics are a count of migrant workers; they do not capture data on dependants. The WRS estimates that Accession migrants’ dependants account for a further 15% of the total migrant inflow. The NINo registrations have been factored accordingly to derive a total estimate of 315,000 Accession migrants to England and Wales during 2006-07. TIM immigration statistics for 2006 suggest that 92,000 of the total number of long-term migrants arriving in England and Wales were from Accession countries. Excluding these long-term flows from the factored NINo total gives an estimated number of short-term flows in 2006-07 of 223,000.
Figure 10. Short-term migration inflows, England and Wales, 2005. Source: ONS (2008d)
Figure 11. Table of short-term migrant estimation, accession migrants
The ONS statistics from the IPS suggested that the average length of stay of short-term migrants was approximately five months. Applying this average duration of stay to the short-term flows (223,000 × 5/12) produces an estimate of short-term migrant ‘stock’ of 93,000 for England and Wales in 2006-07. ONS has used the terminology ‘long-term migrant equivalent’ (LTME) to describe this stock estimate: on average, there were 93,000 LTME Accession migrants resident in England and Wales during 2006-07. Similar statistics have been estimated for London, although the TIM estimate of long-term Accession migrants is based on the England and Wales proportion, in the absence of any more geographically disaggregate information. In 2006-07, a total of 57,000 short-term flows from Accession countries were estimated for London. This equates to an LTME of 24,000, assuming a five-month average duration of stay. A summary of the estimates for 2006-07 is provided in Figure 12.
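The arithmetic behind these estimates can be reproduced with a short Python sketch. It uses only the rounded figures quoted in the text; the function names are hypothetical labels, and two points are assumptions rather than documented method: the 15% for dependants is applied as an uplift to NINo registrations (the reading that reproduces the 315,000 quoted), and each step is rounded to the nearest thousand.

```python
# Illustrative sketch of the short-term Accession migrant estimate.
# Inputs are the rounded figures quoted in the text; the function
# names and the rounding at each step are assumptions.

def round_to_thousand(value):
    """Round to the nearest thousand, matching the published precision."""
    return int(round(value / 1000.0)) * 1000


def short_term_flow(nino_registrations, dependant_uplift, long_term_tim):
    """Factor NINo registrations for dependants, then remove long-term flows."""
    total_inflow = round_to_thousand(nino_registrations * (1 + dependant_uplift))
    return total_inflow, total_inflow - long_term_tim


def ltme(short_term, average_stay_months):
    """Convert a short-term flow into a long-term migrant equivalent stock."""
    return round_to_thousand(short_term * average_stay_months / 12.0)


# England and Wales, 2006-07
total, flow = short_term_flow(274_000, 0.15, 92_000)
print(total, flow, ltme(flow, 5))

# London, starting from the quoted short-term flow
print(ltme(57_000, 5))
```

Under these assumptions the sketch returns the totals given in the text: 315,000 Accession migrants, 223,000 short-term flows and an LTME of 93,000 for England and Wales, and an LTME of 24,000 for London.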
Once again, the methodology used for generating these estimates of short-term migrants is not perfect and there are a number of issues to note. Time-periods present a particular challenge, since the period of data capture differs between sources and must be aligned. In addition, students from Accession countries who do not register for a NINo are excluded from the short-term estimates. Finally, the absence of more disaggregate TIM statistics on the citizenship of migrants makes the estimation of short-term flows and stock at a local authority district level more problematic. However, in the absence of definitive statistics on international migration, these illustrations demonstrate how a combination of sources can derive additional intelligence that is not available when using each data source in isolation.
Figure 12. Accession migrants: Short-term estimates, London and England and Wales, 2006/07. Source: TIM estimate of Accession migrants (ONS, 2008b).
Can Alternative Sources Help to Estimate the Changing Impact of Migration by Ethnic Group? The analyses presented above highlight some of the difficulties of accurately measuring international migration at a local level but demonstrate that a combination of sources can serve to highlight inconsistencies in existing estimates whilst not necessarily providing a more accurate alternative. The estimation of migration by ethnic group is even more problematic with only the 2001 Census capturing an ethnic dimension in its migration statistics. An additional facet of this research project has sought to use evidence from the DWP’s NINo registration statistics to derive alternative ethnic profiles for immigration to each local authority. This has been done by applying an ethnic group to each NINo registration based upon the country of origin of the registrant. A commissioned Census table (Table C0880) has provided a link between ethnic group (16 main groups) and country of origin. Combining the two sources has produced an aggregation of NINo registrations by ethnic group for each local authority (Figure 13). There are a number of issues associated with the more general application of these NINo ethnic
profiles and their comparability with equivalent census statistics:
• Ethnicity is typically a ‘self-reported’ classification; in this analysis ethnicity has been assigned based upon historical evidence from previous immigration flows by country-of-origin;
• NINo statistics provide migrant worker registrations and not a count of all immigration flows. They will exclude students who do not register for work. The vast majority of Accession migrants will be classified in the ‘white’ ethnic group;
• NINo statistics do not take account of ‘White-British’ migrants who do not require a NINo registration; and
• NINo registrations are associated with migrants whose length of stay is indeterminate.
However, the derived ethnic profiles provide a basis for further research into the comparison with census profiles and will be used to inform the estimation of updated ethnic migration profiles for local areas that will underpin the new ethnic population projection methodology.
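The derivation of an ethnic profile from country-of-origin counts can be sketched as follows. This is an illustration under stated assumptions, not the project’s actual procedure: it assumes a proportional allocation, in which each country’s registrations are split across ethnic groups using census-derived shares, and every country name, count and share below is a hypothetical placeholder rather than a value from the NINo data or Table C0880.

```python
from collections import defaultdict

# Hypothetical inputs: neither the counts nor the proportions come from
# the NINo data or from the commissioned Census table.
nino_by_country = {"Country A": 1_000, "Country B": 400}
ethnic_shares = {
    "Country A": {"White": 0.9, "Other": 0.1},
    "Country B": {"Asian": 0.75, "White": 0.25},
}


def ethnic_profile(registrations, shares_by_country):
    """Allocate each country's registrations across ethnic groups and sum."""
    profile = defaultdict(float)
    for country, count in registrations.items():
        for group, share in shares_by_country[country].items():
            profile[group] += count * share
    return dict(profile)


profile = ethnic_profile(nino_by_country, ethnic_shares)
total = sum(profile.values())
for group, count in sorted(profile.items()):
    print(f"{group}: {count:.0f} ({100 * count / total:.0f}%)")
```

Because the shares are fixed at their census values, a profile built this way inherits the first caveat listed above: it reflects the ethnic composition of earlier flows from each country, not the self-reported ethnicity of current registrants.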
Figure 13. Example ethnic profiles of NINo registrations for selected local authority areas 2006/07. Source: 100% data extract from the National Insurance Recording System (NIRS).
FUTURE DEVELOPMENT The creation of the NMD has provided a focus for the analysis of international migration at a sub-national level, highlighting the issues and difficulties associated with the derivation of accurate estimates of immigration and emigration. Further research is required to establish the most appropriate methodology for producing migration
flows by ethnic group for each local authority. Although conceptual differences exist between alternative data sources, the analysis presented here suggests that the GP registration data, in particular, could provide additional intelligence to the process of migration and population estimation. Further research into the differences that exist across all local authorities is necessary to establish how the allocation of TIM immigration estimates to local areas would be affected by more direct use of the GP registration data. The NMD will be subject to regular update, integrating new statistics as they are published. There is also scope to extend its content through the integration of data from the National Pupil Dataset, Work Permit Statistics and associated information from the new Points Based System. In addition, the DWP’s NIRS2 labour market database, in combination with linked data from HMRC, is an untapped resource on migration statistics that could potentially provide new intelligence on duration of stay in the UK using information on NI contributions and receipt of benefits. Given the large volume of research and analysis that has been completed on international migration since 2004, the NMD potentially provides a convenient source of intelligence to a wide range of organisations which continue to rely on ‘snapshot’ surveys and studies to understand a migration process which is subject to rapid change. In 2008, the economic outlook for the UK remains unclear and there is evidence that the peak inflow of migrants from Accession states has been reached. Economic development in central and eastern Europe and the relaxation of barriers to movement in Germany and France could significantly change the dynamics of European labour migration. The requirement for more accurate and timely statistics on local populations has never been greater.
The NMD provides a simple but effective mechanism for delivering a consistent picture on international migration based on a range of statistical evidence.
ACKNOWLEDGMENT The authors gratefully acknowledge the support of ESRC through the UPTAP project award RES-163-25-003 entitled ‘What happens when international migrants settle’.
REFERENCES
Bauere, V., Densham, P., Millar, J., & Salt, J. (2007). Migration from Central and Eastern Europe: local geographies. Population Trends, 129, 7–20.
Home Office. (2006). A Points Based System: Making Migration Work for Britain, Command Paper. http://www.homeoffice.gov.uk/documents/command-points-based-migration?view=Binary
Home Office. (2008a). Asylum Statistics, Quarter 1 2008. http://www.homeoffice.gov.uk/rds/pdfs08/asylumq108.pdf
Home Office. (2008b). Persons Granted British citizenship, United Kingdom, 2007. http://www.homeoffice.gov.uk/rds/pdfs08/hosb0508.pdf
Home Office. (2008c). Accession Monitoring Report May 2004-March 2008, A8 Countries, A joint online report between the Border and Immigration Agency, Department for Work and Pensions, HM Revenue and Customs and Communities and Local Government. http://www.bia.homeoffice.gov.uk/sitecontent/documents/aboutus/reports/accession_monitoring_report/report15/may04mar08.pdf?view=Binary
House of Commons Treasury Committee. (2008). Counting the Population, Eleventh Report of Session 2007-08. HC 183-1. http://www.publications.parliament.uk/pa/cm200708/cmselect/cmtreasy/183/183.pdf
House of Lords Select Committee on Economic Affairs. (2008a). The Economic Impact of Immigration Volume I: Report, 1st Report of Session 2007-08, HL Paper 82-I. www.publications.parliament.uk/pa/ld200708/ldselect/ldeconaf/82/82.pdf
House of Lords Select Committee on Economic Affairs. (2008b). The Economic Impact of Immigration Volume II: Evidence, 1st Report of Session 2007-08, HL Paper 82-II. http://www.publications.parliament.uk/pa/ld200708/ldselect/ldeconaf/82/82ii.pdf
ONS. (2007a). 2006 Mid-year Estimates. http://www.statistics.gov.uk/statbase/Product.asp?vlnk=601&More=N
ONS. (2007b). Improved Methods for Estimating International Migration – Geographical Distribution of Estimates of In-migration. http://www.statistics.gov.uk/downloads/theme_population/Geog_distn_in-migs.pdf
ONS. (2007c). Improved Methods for Estimating International Migration – Geographical Distribution of Estimates of Out-migration. http://www.statistics.gov.uk/downloads/theme_population/Geog_distn_out-migs.pdf
ONS. (2007d). Update on the Development of the Integrated Household Survey. http://www.ccsr.ac.uk/esds/events/2007-03-29/ihs/slides/bennett.ppt
ONS. (2008a). National Population Projections: 2006-based. Series PP No 26. http://www.statistics.gov.uk/downloads/theme_population/pp2no26.pdf
ONS. (2008b). International Migration. Series MN No 33, 2006 Data. http://www.statistics.gov.uk/downloads/theme_population/MN33.pdf
ONS. (2008c). 2004-based SNPP, Current Data. http://www.statistics.gov.uk/STATBASE/Product.asp?vlnk=997
ONS. (2008d). Updated Short-Term Migration Estimates, mid-2004 and mid-2005. http://www.statistics.gov.uk/about/data/methodology/specific/population/future/imps/updates/downloads/STM_Update.pdf
Rees, P., & Boden, P. (2006). Estimating London’s New Migrant Population: Stage 1 – Review of Methodology, A Report commissioned by the Greater London Authority for the Mayor of London. http://www.london.gov.uk/mayor/refugees/docs/nm-pop.pdf
Statistics Commission. (2007). Foreign Workers in the UK, Statistics Commission Briefing Note. http://www.statscom.org.uk/C_1237.aspx
United Nations Statistics Division. (2006). Statistics and Statistical Methods Publications. http://unstats.un.org/unsd/pubs/gesgrid.asp?ID=116
Chapter 7
Using Migration Microdata from the Samples of Anonymised Records and the Longitudinal Studies Paul Norman University of Leeds, UK Paul Boyle University of St. Andrews, UK
ABSTRACT In this chapter we describe the Samples of Anonymised Records (SARs) and Longitudinal Studies (LSs). The SARs are cross-sectional data like the area and interaction data, but the LSs track people over time. These datasets differ from the United Kingdom’s other census outputs in being individual-level ‘microdata’ and population samples. The microdata files are very versatile, allowing multi-way crosstabulations and statistical techniques and enabling application-relevant re-coded variables and study populations to be defined. The SARs files offer UK coverage, although a UK-wide study is challenging because data for each country may be in separate files with different access arrangements and variable detail may be country specific. The Office for National Statistics (ONS) Longitudinal Study for England and Wales has underpinned a wide range of research since the 1970s. This well-established source is now complemented by longitudinal data for Scotland and Northern Ireland. Largely driven by the need to ensure respondent confidentiality, the SARs and LSs have some drawbacks for migration-related research. In addition to stringent access arrangements, the geographical areas in which individuals are located in the SARs tend to be coarse and, although the LS databases record the small area in which the LS member was living at each census, specific ‘place’ information is unlikely to be considered non-disclosive except for large geographies. However, generic, contextual information about the ‘space’ in which people live is useful even though actual places are not identified. Whilst the SARs and LSs are samples, they are very large samples in comparison with other national surveys and represent first-rate resources to complement other sources. In the course of this chapter, along with other references to SARs and LS-based migration research, we review work which utilised these sources to investigate inter-relationships between health, deprivation and migration. The SARs data show that migration is health-selective by age and distance moved and that those persons living in the public housing tenure who are moving into or within deprived areas are most likely to be ill. The role of migration in changing health inequalities between differently deprived areas can be explored using longitudinal data on both origins and destinations. The ONS LS reveals that migrants into and between the least deprived areas have better health than non-migrants, but migrants into and between the most deprived areas have the worst health. The effect of these changes has been to increase the inequality in health between differently deprived areas. A sorting, largely driven by selective migration, occurs.
DOI: 10.4018/978-1-61520-755-8.ch007
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION As described in Chapter 1, internal migration is captured in the census from the question that asks what the respondent’s usual address was one year ago, with international migration also indicated by the question that asks for the respondent’s country of birth. These are the sources for the migration information provided in census data outputs, including the standard area tables and the Special Migration Statistics (SMS). Both of these data sources provide migration (and other) data, aggregated into different administrative or census-based geographies. The standard tables of data that are released generally include one variable, or a small number of cross-tabulated variables, allowing counts of certain types of migrants to be determined. While the Census Offices carry out extensive consultations prior to each census to determine demand for the content and detail within these tables, the tables are limited in the amount of information that they provide about the migrant’s demographic and socio-economic characteristics. The pre-defined cross-tabulations may not provide the information for a particular analysis and the types of statistical modelling which can be applied to aggregate data can be restrictive (Marsh, 1993; Norman, 2003). Although tables can be commissioned from the
National Statistics Agencies, this route to data access is potentially time-consuming and costly, and the number of variables in the output would still be limited. These restrictions are not apparent in the Samples of Anonymised Records (SARs), the use of which is described here. These datasets include individual-level records for samples of either individuals or groups of individuals within households. Because these include most of the census variables collected about each individual, they allow analysis of a wide variety of different demographic and socio-economic factors at any one time; much is therefore known about the characteristics of migrants and non-migrants. Having individual-level detail allows the user to define custom multi-dimensional tabulations, to derive information using combinations of variables, to recode variable categories to be application-relevant, to extract custom study populations and to carry out sophisticated statistical analyses. Known as ‘microdata’, these individual records cannot be supplied for 100% of the population for confidentiality reasons, so samples are extracted which maintain the anonymity of respondents. As we shall see below, while the SARs have the advantage of relatively complete individual characteristics, these files lack detailed geography so that only crude flow information can be
Using Migration Microdata from the Samples of Anonymised Records and the Longitudinal Studies
derived. Like virtually all census outputs they also tell us little about the characteristics of migrants prior to the move, because retrospective information is not collected in the census (except for the migration question itself). In contrast, the Longitudinal Studies (LSs) are census-based outputs which provide continuous, multi-cohort samples of the population, allowing people to be tracked over time. The principal aim of the LSs is to link people’s census records at successive censuses. These datasets have the additional advantage that migrants can also be identified by comparing the respondent’s address at the time of a census with their address at the time of the previous census ten years before. Below we describe the SARs and LSs and their usefulness for migration studies in more detail. Then, following a brief review of the relevant literature, we describe a variety of studies which have used these sources to investigate the relationships between health, migration and area deprivation.
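The flexibility that individual-level records offer can be illustrated with a minimal sketch. The records, field names and category labels below are invented for illustration and are not the actual SARs variable names; the point is only that recoding, deriving a new variable and building a custom multi-dimensional tabulation are straightforward once each row is a person.

```python
from collections import Counter

# Hypothetical microdata records: field names and categories are
# illustrative assumptions, not the real SARs variable names.
records = [
    {"age": 23, "distance_band": "0-4 km", "tenure": "private rented"},
    {"age": 67, "distance_band": "not applicable", "tenure": "owner occupied"},
    {"age": 34, "distance_band": "50-59 km", "tenure": "owner occupied"},
    {"age": 71, "distance_band": "5-9 km", "tenure": "social rented"},
    {"age": 45, "distance_band": "not applicable", "tenure": "social rented"},
]


def age_group(age):
    """Recode single years of age into application-specific groups."""
    if age < 30:
        return "under 30"
    if age < 65:
        return "30-64"
    return "65 and over"


def is_migrant(record):
    """Derive a migrant flag from the banded distance variable."""
    return record["distance_band"] != "not applicable"


def crosstab(rows):
    """Count records by (age group, tenure, migrant flag)."""
    return Counter(
        (age_group(r["age"]), r["tenure"], is_migrant(r)) for r in rows
    )


table = crosstab(records)
for cell, count in sorted(table.items()):
    print(cell, count)
```

A study population can be extracted in the same way, for example by filtering the list to one tenure before tabulating.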
DESCRIPTION OF DATA SOURCES
Samples of Anonymised Records
The SARs are individual-level data extracted from the 1991 and 2001 Censuses, released as either individual or household files. The SARs cover the full range of census topics including housing, education, health, transport to work, employment and ethnicity (SARs, 2004); the emphasis is on socio-demographic detail with sub-national geographical information rather coarse at best. The files have a separate record for each individual including data which has close equivalence to the information respondents entered on their Census questionnaires. The files in the Individual SARs include summary information about each person’s living arrangements. The Household SAR files allow linkage between household and family members so that relationships can be explored through variables which have been derived to provide information on household composition and life stage. The SARs are similar to data obtained from a sample survey but are a random sample of the population captured in the Census rather than being a geographically clustered sample as is often the case with surveys. Moreover, the sample size is much larger than most surveys thereby permitting analyses of small population sub-groups, including residents in communal establishments, a group usually missed in surveys. The SARs therefore provide a cross-sectional snapshot of the population at the time of the census and the range of datasets includes:
Individual SARs
• The 1991 Individual SAR is a 2% sample containing over 1.1 million records on a full range of census topics. Individuals, in both households and communal establishments, are linked to geographical zones including their area of usual residence on census night. Geographical information is available down to local authority level with small local authorities grouped together. There are two files, one for England, Wales and Scotland and one for Northern Ireland.
• The 2001 Individual Licensed SAR is a 3% sample containing over 1.8 million records on all census topics similar to the above. In this file, geographical information is available down to Government Office Region within England plus Wales, Scotland and Northern Ireland. There is one file covering all of the UK.
Household SARs
• The 1991 Household SAR is a 1% sample of the population containing data about 216,000 households and 500,000 people living within those households. There are two files, one for England, Wales and Scotland with a regional geography and one for Northern Ireland.
• The 2001 Census Special Licensed Household Sample of Anonymised Records is a 1% sample of the population with information on over 200,000 households and 500,000 household members. With an England and Wales coverage, this file contains no subnational geography and is subject to special access arrangements. Due to increased disclosure risk, users wanting more detail for England and Wales or household information for Scotland and Northern Ireland will need to use the Controlled Access Microdata Samples (see below).
Small Area Microdata
• The 2001 Small Area Microdata (SAM) is a 5% sample of the UK’s population containing nearly 3 million individual records. The lowest level of geography in the SAM is the local authority (and equivalents) in England, Wales and Scotland and the Parliamentary Constituency in Northern Ireland. Even though small LAs have been merged, because the SAM contains more geographical detail than the 2001 Individual SAR, the amount of individual demographic detail in the SAM is somewhat restricted. One file covers all the UK.
Controlled Access Microdata Samples
• If the above files for 2001 do not provide sufficient variable information, more detailed versions of both the Individual and Household SARs are available, on application, for analysis within a safe setting at the Office for National Statistics (ONS). These files include a local authority geography and have a UK coverage.
Regarding data access and advice, for UK academics there is an ESRC-funded support team for the SARs based at the Cathie Marsh Centre for Census and Survey Research (CCSR), University of Manchester (http://www.ccsr.ac.uk/sars/). It is recommended that researchers read Dale et al. (2000), an invaluable resource about census microdata; Marsh (1993), Middleton (1995), Dale (1998) or Dale and Teague (2002) about the establishment and use of the SARs; and Norman (2003) on the differences between aggregate, area-level data and individual-level microdata.
Migration-Related Variables in the SARs
Most of the variables are similar across the different SARs. Table 1 gives examples of migration-related variables available in the 1991 and 2001 Individual SARs and the 1991 Household SAR. These variables are divided here into those relating to the migration event itself, the migrant’s origin and the migrant’s destination. The variable detail in Table 1 is slightly simplified for ease of presentation; the SARs documentation provides full information on the variables in the Individual and all other SAR files. In both the 1991 and 2001 SARs, whether or not somebody has changed address in the year before the census is identified in the ‘Distance moved’ variable which is divided into kilometre bands. This allows investigation of distance decay effects (since the majority of migration events are expected to be over short distances). In 2001, there is also a derived variable which defines whether the move in the year before census has been within the respondent’s local authority district or between districts but within regions or for other combinations of administrative areas. The Household SARs also have an indicator on whether or not the individual was part of a wholly moving household, information that is invaluable to studies of household formation and dissolution (see, for example, Al-Hamad et al., 1997; Hayes and Al-Hamad, 1997; 1999).

Table 1. Migration-related variables available in the 1991 and 2001 samples of anonymised records

(a) 1991 Individual SAR
Migration event-related
  Distance moved: Not stated or not applicable; 0-4 km; 5-9 km; 10-14 km; 15-19 km; 20-29 km; 30-39 km; 40-49 km; 50-59 km; 60-79 km; 80-99 km; 100-149 km; 150-199 km; 200 km and over; From outside GB
Migrant origin
  Region of origin: Not stated; North; Yorks and Humb.; East Midlands; East Anglia; Inner London; Outer London; Rest of South East; South West; West Midlands; North West; Wales; Scotland; Outside GB
  Country of birth: England; Scotland; Wales; Northern Ireland; Irish Republic; Australia; Canada; New Zealand; Kenya and other countries
Migrant destination
  Area of residence (large and aggregated smaller local authorities): City of London & City of Westminster; Camden; Hackney; Hammersmith & Fulham; Haringey; Islington; Kensington & Chelsea; Lambeth; Lewisham; Newham; Southwark; Tower Hamlets; Wandsworth; and other locations
  Type of area (GB Profiles geodemographic classification): High LLTI, retired pensioners, council housing, no car; Outright-owners, detached housing; Semi-detached, privately owned with mortgages, high car ownership, families, professional jobs; Elderly, retired home-owners; Asian, high unemployment, overcrowded, terraced housing; Small, semi-detached council housing; Terraced housing, Council or Housing Association, couples without children; and other area types

(b) 2001 Individual SAR
Migration event-related
  Distance moved: Not applicable; 0-2 km; 3-4 km; 5-6 km; 7-9 km; 10-14 km; 15-19 km; 20-29 km; 30-49 km; 50-99 km; 100-149 km; 150-199 km; 200+ km; From outside UK
  Migration indicator: Not applicable, et cetera; No usual address one year ago; Move within a Local Authority District area; Move between Local Authority District areas but within region; Move between regions; Move between countries but within UK
Migrant origin
  Region of origin: Not applicable, et cetera; North East; North West; Yorkshire and the Humber; East Midlands; West Midlands; East of England; South East; South West; Inner London; Outer London; Scotland; Wales; Northern Ireland; From outside UK
  Country of birth: England; Scotland; Republic of Ireland; N Ireland; Wales; W. Europe; E. Europe; India; Pakistan and Bangladesh; Rest of Asia; Caribbean; N America; Africa; Other
Migrant destination
  Region of usual residence: North East; North West; Yorkshire and the Humber; East Midlands; West Midlands; East of England; South East; South West; Inner London; Outer London; Scotland; Wales; Northern Ireland

(c) 1991 Household SAR
Migration event-related
  Distance moved: Not stated or not applicable; 0-4 km; 5-9 km; 10-14 km; 15-19 km; 20-29 km; 30-39 km; 40-49 km; 50-59 km; 60-79 km; 80-99 km; 100-149 km; 150-199 km; 200 km and over; From outside GB
  Wholly moving household: Yes; No
Migrant origin
  Region of origin: Not available
  Country of birth: England; Scotland; Wales; Northern Ireland; Irish Republic; Australia; Canada; New Zealand; Kenya; and other countries
Migrant destination
  Region of usual residence: North; Yorkshire and Humberside; East Midlands; East Anglia; Inner London; Outer London; Rest of South East; South West; West Midlands; North West; Wales; Scotland
  Type of area (ONS classification): Rural Areas; Industrial & manufacturing towns; Purpose-built, inner city estates; Established owner-occupiers; Metropolitan professionals; Deprived city areas; Lower status owner occupiers; Mature populations; Deprived industrial areas; and other area types

Note: The variable information in this table is not comprehensive. Users should consult the SARs documentation for full details.
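The logic behind the 2001 migration indicator in Table 1(b) can be sketched as a comparison of origin and destination through a hierarchy of areas. This is only an illustration of the idea: the lookups below are small hypothetical extracts, the function name is our own, and the sketch does not reproduce how the Census Offices actually derive the variable (the categories for people with no usual address one year ago and for moves from outside the UK are not handled).

```python
# Hypothetical extracts of an area hierarchy, for illustration only.
DISTRICT_TO_REGION = {
    "Leeds": "Yorkshire and the Humber",
    "Bradford": "Yorkshire and the Humber",
    "Manchester": "North West",
    "Cardiff": "Wales",
}
REGION_TO_COUNTRY = {
    "Yorkshire and the Humber": "England",
    "North West": "England",
    "Wales": "Wales",
}


def migration_indicator(origin_district, destination_district):
    """Classify a one-year move; origin_district is None for non-movers."""
    if origin_district is None:
        return "Not applicable"
    if origin_district == destination_district:
        return "Move within a Local Authority District area"
    origin_region = DISTRICT_TO_REGION[origin_district]
    destination_region = DISTRICT_TO_REGION[destination_district]
    if origin_region == destination_region:
        return "Move between Local Authority District areas but within region"
    if REGION_TO_COUNTRY[origin_region] == REGION_TO_COUNTRY[destination_region]:
        return "Move between regions"
    return "Move between countries but within UK"


for origin, destination in [
    (None, "Leeds"),
    ("Leeds", "Leeds"),
    ("Bradford", "Leeds"),
    ("Manchester", "Leeds"),
    ("Cardiff", "Leeds"),
]:
    print(origin, "->", destination, ":", migration_indicator(origin, destination))
```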
Two variables provide information on migrant origins. The ‘Region of origin’ indicates, for 1991, the Registrar General’s Standard Region in which the respondent was living a year before the census. In 2001, equivalent information is available by the Government Office Regions (GOR) (the administrative geography which superseded the Standard Regions) and the UK’s constituent countries. A life-time approach to migration can be undertaken by using the ‘Country of birth’ variable. In 1991, a total of 42 countries and combinations were available but in 2001 the detail is much reduced with just eight specific and six broader categories of countries of birth given. The distance moved and origin variables allow subnational and international migrants to be distinguished.
Migrant destinations correspond to the respondent’s area of residence at the time of the Census. In the 1991 Individual SAR, the geographies available relate to large (or aggregated smaller) local authorities or the standard regions. In the 2001 Individual SAR, the geographical information is more limited with just the GOR/UK country of residence available. Users with a greater focus on migration in relation to specific places in 2001 would be advised to use the Small Area Microdata (SAM) which, similar to 1991, is based on a local authority geography.
The 1991 Individual SAR includes a geodemographic classification, ‘GB Profiles’. This provides a label which describes the distinctive socio-demographic area characteristics of the census enumeration district (ED) in which each respondent was living. This information can therefore be used to investigate the types of areas to which people moved but no information is provided on their origin area type. Similarly, for residential areas/destinations, the 1991 Household SAR includes a ward level area type (Table 1c) (which will be discussed in the research examples below). More details on these classifications in relation to the SARs are available in Dale and Teague (2002).
The SARs are cross-sectional data and provide migration-related information for the periods 1990-91 and 2000-01. Given the lack of geographical detail, particularly about migrant origins, a cross-tabulation based on origins and destinations is uninformative in comparison with what can be achieved using the dedicated interaction Special Migration Statistics, especially given that the SARs are a sample. The SARs come into their own when the interest lies in exploring the detailed characteristics of movers and non-movers.
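Why sampling weakens origin-destination counts can be seen from a rough calculation. The sketch below scales a sample count up by the sampling fraction and attaches an approximate 95% interval; treating the count as roughly Poisson is our simplifying assumption, the function name is our own, and the cell count of 12 is invented.

```python
import math


def estimate_flow(sample_count, sampling_fraction):
    """Scale a sample count to a population estimate with a rough 95% interval.

    Assumes the sample count behaves approximately like a Poisson count,
    a simplification made here purely for illustration.
    """
    estimate = sample_count / sampling_fraction
    standard_error = math.sqrt(sample_count) / sampling_fraction
    lower = max(0.0, estimate - 1.96 * standard_error)
    upper = estimate + 1.96 * standard_error
    return estimate, lower, upper


# A hypothetical origin-destination cell observed 12 times in a 2% sample.
estimate, lower, upper = estimate_flow(12, 0.02)
print(f"estimated flow {estimate:.0f}, rough 95% interval {lower:.0f} to {upper:.0f}")
```

For a cell this small the interval is wider than the estimate itself, which is the practical sense in which flow matrices built from a sample are uninformative.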
Longitudinal Studies
There are three Longitudinal Studies (LSs) in the UK which have been created by merging census records with other routinely collected administrative data. Like the SARs, these LS sources contain a microdata sample about individuals; the fundamental difference being that the LSs are continuous, multi-cohort samples of the population, tracking people over time whereas the SARs are cross-sectional datasets. The ONS LS for England and Wales was the first of these longitudinal studies to be established. It contains a complete set of Census records for individuals, linked between successive censuses. The ONS LS sample comprises people in England and Wales born on one of four undisclosed dates of birth and thus represents around 1% of the population. The first sample was extracted after the 1971 Census and the four birth dates have been used to update the sample at the 1981, 1991 and 2001 Censuses. New LS members are added to the study through births and immigration and existing sample members leave through death and emigration. At any one census, the ONS LS includes microdata on over 500,000 sample members with around 300,000 people linked between successive censuses. In addition to their census records, each sample member’s LS record includes data about a variety of other demographic events collected from other sources, such as their death, births to sample mothers and cancer registrations. The Scottish Longitudinal Study (SLS) is a similar large-scale linkage study created by using data available from the Census and other
administrative sources (Boyle et al., 2008). The SLS is based on 20 birth dates making it a 5.3% representative sample of the Scottish population, although the overall sample size is smaller than the LS at around 270,000. The SLS began with data from the 1991 Census, rather than 1971 as in the LS. Unlike the LS, though, the SLS includes linkages to hospital admissions data and marriage events. The Northern Ireland Longitudinal Study (NILS) is the third large-scale data linkage study created by linking census and administrative data. This currently includes information from the 2001 Census and subsequent linkages to vital events and health registration datasets. Currently, there is no link back to 1991. NILS members are selected on 104 annual birth dates so that around 500,000 people are included. In each of the UK’s longitudinal studies, census information is also included for all people living in the same household as the LS member, but these other household members are not followed up to the next census, nor linked to other administrative data, unless they also happen to have one of the correct dates of birth. The residential location of each person is also recorded which means that contextual data about their location (e.g. urban/ rural; level of area deprivation) can be added and included in analyses. There is potential for future research to link the three LS datasets for a UKwide study and the four birth dates used for the England and Wales study are also among those used for the SLS and NILS. The high level of personal and geographical detail contained in the LS, SLS and NILS necessitates that the data are administered so that each LS member’s information is kept confidential. As a result the data are kept in secure settings and researchers wishing to use each LS must make an application for access. 
Analyses are either carried out remotely by submitting computer command files which automate statistical procedures or in person within a ‘safe setting’ at ONS, the General Register Office for Scotland (GROS) or at the Northern Ireland Statistics and Research Agency (NISRA). Outputs of analyses are inspected before results are passed back to the researcher, to check that small counts do not risk disclosing an individual’s identity. UK academics are supported in their use of the ONS LS for England and Wales by the ESRC-funded Centre for Longitudinal Study Information and User Support (CeLSIUS) whose website (http://celsius.census.ac.uk/) has invaluable information about the LS and the application process and provides an online data dictionary with variable definitions. Further information on the ONS LS can be found in Dale et al. (1993), Hattersley and Creeser (1995), Creeser et al. (2002) and Blackwell et al. (2005). Non-academic users and academic users from outside the UK should apply direct to ONS. Researchers interested in Scotland and Northern Ireland should contact the Longitudinal Studies Centre – Scotland (LSCS) (http://www.lscs.ac.uk/) and the Northern Ireland Longitudinal Study (NILS) (http://www.nisra.gov.uk/nils/default.asp.htm). Each of these support units also provides a ‘Data Dictionary’ through which users can search for the availability of variables and detail on categories of information included.
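The check on small counts mentioned above can be anticipated by researchers before submitting outputs for clearance. The sketch below flags non-zero cells below a threshold; the threshold of 10, the decision to ignore zero cells and the example counts are all our assumptions, since the actual clearance rules applied by each LS are not set out here.

```python
def small_cells(table, threshold):
    """Return the cells whose non-zero count falls below the threshold."""
    return {cell: count for cell, count in table.items() if 0 < count < threshold}


# Hypothetical output table of counts by migrant status and illness.
output_table = {
    ("migrant", "LLTI"): 2,
    ("migrant", "no LLTI"): 41,
    ("non-migrant", "LLTI"): 97,
    ("non-migrant", "no LLTI"): 860,
}

# The threshold of 10 is an illustrative assumption, not an official rule.
flagged = small_cells(output_table, threshold=10)
print(flagged)
```

Flagged cells would typically prompt a coarser categorisation before the table is resubmitted.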
Migration-Related Variables in the LSs
A major advantage of the LSs is the information contained on migration. The cross-sectional data included in the SARs are limiting because we know little about a migrant’s characteristics prior to the move. We cannot say whether unemployed individuals are likely to move, as employment status is only recorded after the move on the Census day. Individual-level information linked over time in a longitudinal dataset enables this ‘before and after’ situation to be investigated. Here the focus is on the ONS LS for England and Wales but much will be applicable to the SLS and NILS. The migration indicators captured in the SARs are also available in the LSs. However,
in addition migration histories can be built up by comparing the person’s location at one Census with their location at a subsequent Census. Using the example of the ONS LS which has the longest period of follow-up, Figure 1 shows that if an LS member were alive before 1966 and survived until the 2001 Census then their migration history can be constructed in relation to five years before the 1971 Census (since the address from the 1966 Census was compared to the address at the 1971 Census), and then one year before each census and at the time of the census itself. In practice, many studies rely on comparing addresses at the time of each census, the time at which other demographic and socio-economic data are collected, and make less use of the one-year question because of the lack of individual characteristics in 1970, 1980, 1990 and 2000. In the NILS, migration events can be identified as in the LS and SLS but, in addition, moves can also be identified from changes in the address information in six-monthly downloads of health registration data.
Figure 1. Time-line of potential migration event indicators in the ONS longitudinal study
If the operational decision is to investigate intercensal change then the geography of origin and destination needs to be chosen. The LS has numerous geographies from which to select. The complex history of administrative, electoral and health geographies to which census geographies are often aligned means that: a geography may not exist from one census to the next; boundary change for the same geography may occur; and even if an area has the same name, the areal extent may be different (Norman et al., 2003). In longitudinal data, a person may apparently be living in a different place, but checks need to be made to establish whether they are in fact a migrant. In the SLS the ‘boundary change’ problem is alleviated to some extent because each SLS member’s record is linked to ‘Consistent Areas Through Time’ (CATTs) which are a consistent set of areas which span the 1981, 1991 and 2001 Censuses (Exeter et al. 2005).
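The value of a consistent geography can be shown with a minimal sketch. Here census-specific area codes are mapped to a time-consistent identifier before locations are compared, so that a recoded or redrawn area is not mistaken for a move. The codes, the lookup and the function name are hypothetical; they only mimic the role that areas such as the CATTs play.

```python
# Hypothetical lookup from (census year, area code) to a consistent area.
CONSISTENT_AREA = {
    (1991, "A01"): "C1",
    (2001, "B17"): "C1",  # same place, recoded after a boundary review
    (1991, "A02"): "C2",
    (2001, "B18"): "C2",
}


def is_intercensal_migrant(code_start, year_start, code_end, year_end,
                           lookup=CONSISTENT_AREA):
    """Compare locations on the consistent geography, not the raw codes."""
    return lookup[(year_start, code_start)] != lookup[(year_end, code_end)]


# Raw codes differ in both cases, but only the second is a change of area.
print(is_intercensal_migrant("A01", 1991, "B17", 2001))
print(is_intercensal_migrant("A01", 1991, "B18", 2001))
```

A comparison of this kind only detects moves between areas; a change of address within the same consistent area would go unnoticed.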
INVESTIGATING INTERRELATIONSHIPS BETWEEN HEALTH, DEPRIVATION AND MIGRATION USING THE SARS AND THE ONS LS
The propensity to migrate depends on factors such as age, ethnicity, housing tenure, socioeconomic position and educational attainment and migrants are a select, non-random group of both the origin populations they leave and the destination populations they join. Those with higher socioeconomic status and educational achievement tend to migrate further, more often, for different reasons and to different areas than those of lower social class or educational level (Boyle et al., 1998; Buck et al., 1994; Champion et al., 1998). Since migrants differ from others in their motivation, personal circumstances and sociodemographic characteristics, the migration process is likely to have different effects on the age-sex structure and aggregate socioeconomic characteristics of migrants’ origin and destination areas. Place characteristics have long been acknowledged as important determinants of migration (Walters, 2000) and factors that potentially ‘push’ or ‘pull’ migrants between different places vary with age and stage in the life course (Boyle et al., 1998; Champion et al., 1998). Overall, we might expect most potential migrants to be keen to move away from more deprived areas towards less deprived areas, but that this is more likely to be achieved by those with higher socio-economic standing. On the other hand, we might expect those in poorer circumstances to be more likely to be forced to make moves down the deprivation hierarchy.
Health can be regarded as an enabling factor in the migration process with the majority of migrants being young and healthy (Bentham, 1988; Findley, 1988). However, the relationship between health and migration varies by age. For the more elderly, poor health may make them more likely to change their residential location, with moves into formal or informal care settings. This raises the interesting question of whether health-selective mobility influences geographical variations in health. The notion that ‘health-selective migration’ can contribute to an increase or decrease in place-specific rates of illness is not new (Farr, 1864; Welton, 1872) but, until the 1990s, there was little empirical evidence on the effects of selective migration on geographical variations in health outcomes (Brimblecombe et al., 2000; Verheij, 1998). Given the strong relationship between health and deprivation (see, for example, Senior et al., 2000; Boyle et al., 2001; Boyle et al., 2004; Norman and Bambra, 2007) and the selectivity of the migration process with respect to both health and area type noted above, the need to understand the inter-relationships has been highlighted (Bentham, 1988; Gatrell, 2002; Boyle, 2004) and these prompts provided the motivation for the case studies reported below.
Health-Selective Migration by Distance Moved and Destination Type
Parallel projects by Boyle, Norman and Rees (2002) for Scotland, and Norman (2002) for England and Wales investigated the selectivity of the migration process in relation to age, distance moved and level of deprivation. Both of these studies used the 1991 SARs. Here we focus on the study carried out for Scotland in which Boyle et al. (2002) utilised individual-level data extracted from the 1991 Household SAR. The study population comprised 48,246 individuals resident in Scotland and excluded migrants from outside the UK who arrived in Scotland during
the previous year and those resident in communal establishments. The health outcome of interest in this study was self-reported limiting long-term illness (LLTI) based on answers to the 1991 Census question: ‘Do you have any long-term illness, health problem or handicap which limits your daily activities or the work you can do? Include problems which are due to old age.’ Note that the question was changed slightly in the 2001 Census to ‘Do you have any long-term illness, health problem or disability which limits your daily activities or the work you can do? Include problems which are due to old age.’ This makes comparisons of self-reported illness through time more problematic. Using the ‘Distance moved’ variable, individuals were categorised into three groups based on their (non-) migrant status. The majority, 43,911, were non-migrants, 3,029 moved a ‘short’ distance (less than 10 km) and 1,306 moved a ‘long’ distance (10 km or more) during the year before the census. The Household SAR was used because this dataset includes a 13-level geodemographic classification of wards (pseudo postcode sectors in Scotland). As the wards that fall in each of these categories are known, an average level of material deprivation could be ascribed to each of the 13 levels. The Carstairs Index deprivation score was used as this is a widely recognised geographical measure of material deprivation designed specifically for the Scottish context (Carstairs and Morris, 1989). The probability of individuals reporting LLTI was modelled using binary logistic regression. For more information on logistic regression, see Agresti (2002) or Dale et al. (2000). The explanatory variables included were: age-group, sex, social class, level of qualifications, ethnicity, economic status, marital status, car ownership, industry, family type, housing tenure and, of particular interest, migrant status. 
Results in this study are reported as probabilities of LLTI for the relevant explanatory variables, expressed as percentages.
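The two steps described above, grouping people by migrant status and turning a fitted logistic model into a percentage, can be sketched as follows. The 10 km cut-off and the three group sizes come from the study as reported here; the coefficient values and the reduced set of explanatory variables are arbitrary placeholders of ours, not estimates from Boyle et al. (2002).

```python
import math


def migrant_status(distance_km):
    """Group by distance moved; None means no move in the previous year."""
    if distance_km is None:
        return "non-migrant"
    return "short-distance" if distance_km < 10 else "long-distance"


# The three reported group sizes sum to the reported study population.
assert 43_911 + 3_029 + 1_306 == 48_246

# Placeholder coefficients on the log-odds scale (arbitrary values, for
# illustration only); non-migrants are the reference category.
COEFFICIENTS = {
    "intercept": -2.0,
    "short-distance": 0.3,
    "long-distance": -0.2,
    "retired_age": 1.5,
}


def modelled_percentage(status, retired_age):
    """Convert a linear predictor into a percentage via the inverse logit."""
    log_odds = COEFFICIENTS["intercept"] + COEFFICIENTS.get(status, 0.0)
    if retired_age:
        log_odds += COEFFICIENTS["retired_age"]
    return 100 / (1 + math.exp(-log_odds))


for distance in (None, 4, 60):
    status = migrant_status(distance)
    print(status, round(modelled_percentage(status, retired_age=False), 1))
```

Reporting on the percentage scale, rather than as odds ratios, makes differences between migrant groups easier to read against age.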
Figure 2. Modelled percentage reporting limiting long-term illness by age and distance moved. Source: Boyle et al. (2002) based on data extracted from the 1991 Household SARs
Figure 2 shows that for both migrants and non-migrants, the probability of reporting LLTI increases with age. Focusing on non-migrants and long-distance migrants, the results were as anticipated (Bentham, 1988; Findley, 1988) since for the three younger age-groups, non-migrants were more likely to be ill than long-distance migrants. Among the retired age group, however, the long-distance migrants were slightly more likely to be suffering LLTI than non-migrants. The results for short-distance migrants were less expected, though, since, when controlling for age, short-distance migrants were more likely to report LLTI than both non-migrants and long-distance migrants. One explanation for this is that residents in public housing may be quite likely to migrate over short distances within a local authority area for health-related reasons. To investigate this further, a model was fitted comparing migrants and non-migrants among those 18,412 individuals resident in public housing in 1991. This model included the ONSCLASS area types, ranked left to right in Figure 3 by average level of deprivation. The results confirm that migrants in the public housing sector have higher probabilities of reporting LLTI than non-migrants in the nine more deprived area types, but had lower probabilities in the four least deprived area types (except for the ‘Established owner occupier’ type where the percentages were virtually identical). This would suggest that within this tenure at least, ill migrants are more likely to move into or within deprived areas. Researchers interested in using the SARs to investigate migration and tenure will find Boyle (1995), Boyle and Shen (1997) and Boyle (1998) useful. While these findings tell us much about the health of migrants and non-migrants in different destinations, they do not inform us about the impact of migration on the changing geography of health (Norman, 2002). 
A drawback with the SARs as a data source for migration studies is the lack of geographical detail. The deprivation level of each area type is a useful device but it is only possible to consider this for destinations and not for origins. The impact of health-selective migration on area-based health inequalities cannot be fully determined without knowledge of the area type from which people moved. For short-distance migrants in public housing in relatively deprived wards, the strong likelihood is that their origin was a similarly deprived ward, although this cannot be confirmed from the data.
Figure 3. Modelled percentage chance of reporting limiting long-term illness for residents in public housing. Source: Boyle et al. (2002) based on data extracted from the 1991 Household SARs
Changing Relationship between Health and Deprivation: The Role of Migration
In terms of health inequalities, Boyle (2004) suggests that the apparent widening mortality and morbidity gap may be influenced, at least to some extent, by population migration. Relating to the health-deprivation relationship, if healthier individuals are more likely to migrate away from deprived areas and less healthy individuals more likely to migrate into deprived areas, then the aggregate relationship between illness and deprivation may be exaggerated by the effects of migration (Norman, 2002). Rather than living in deprived areas causing poor health, some of those with poor health may end up living in deprived areas. A drawback with the SARs is that the impact of health-selective migration on areas cannot be determined without knowledge of the area from which people had moved. The microdata in the ONS LS includes whether members present at the 1991 or 2001 Censuses reported limiting long-term illness (LLTI) as well as mortality event data. In addition, the LS contains links to various aggregate-level data about the location in which each LS member was living at the time of the Censuses. As noted above, it is not possible to identify the particular small area in which individuals resided due to issues of confidentiality. However, since Carstairs deprivation scores (categorised into quintiles) are attached to individual records, it is feasible to track through time the journeys of LS members from their origins to destinations across ward-level deprivation ‘space’. Using this information, Norman et al. (2005) sought to determine if, over the 20-year period from 1971 to 1991, there had been any systematic sorting of healthy and unhealthy people across the gradient of deprivation.
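Tracking people across deprivation ‘space’ amounts to comparing the quintile attached to each person’s record at two censuses. The sketch below labels each person’s transition and counts the labels by migrant status. The records are invented, the function name is ours, and we assume quintile 1 denotes the least deprived areas.

```python
from collections import Counter


def transition(q_origin, q_destination):
    """Label a change in deprivation quintile (1 assumed least deprived)."""
    if q_destination < q_origin:
        return "to less deprived"
    if q_destination > q_origin:
        return "to more deprived"
    return "same quintile"


# Hypothetical records: (quintile in 1971, quintile in 1991, migrant flag).
# A non-migrant can change quintile if the area itself changes over time.
members = [
    (5, 3, True),
    (2, 2, False),
    (1, 4, True),
    (3, 3, True),
    (4, 5, False),
]

counts = Counter(
    ("migrant" if moved else "non-migrant", transition(origin, destination))
    for origin, destination, moved in members
)
for key, count in sorted(counts.items()):
    print(key, count)
```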
The LS study sample involved a closed sample population present at the 1971, 1981 and 1991 Censuses. In line with previous LS studies (Sloggett and Joshi, 1998; Harding, 2003), international migrants and those who reported being permanently sick or disabled in 1971 or 1981 were excluded from the sample. Persons identified at a census as being resident in a communal establishment were also excluded since locations with care homes and long stay hospitals tend to receive exaggerated numbers of ill in-migrants (Bentham, 1988). The resulting dataset comprised 315,684 individuals. The study sample was differentiated by the deprivation quintile of their origin and destination ward and by whether they were subnational migrants or non-migrants over the 20 year period, 1971-1991. Migrants may or may not have moved between differently deprived areas. In addition, though, non-migrants may have lived in areas which became either more or less deprived in relation to other areas over time. The method used was to aggregate individuals into various origin-destination ‘transition’ categories between differently deprived areas and to calculate indirectly standardised illness ratios (SIRs) based on LLTI and indirectly standardised mortality ratios (SMRs) separately for migrants and non-migrants. Those SIRs and SMRs above 100 and below 100 represent worse and better health respectively than the national average for England and Wales. Mortality was an aggregate of events occurring after the 1991 Census and up to 1999. Figure 4 shows the SIRs for LLTI in 1991. Those persons who lived in the least deprived areas (Q1) in both 1971 and 1991 had the lowest SIRs and, although the difference is not significant, migrants were healthier than non-migrants. The pair of bars labelled ‘Q2-4 1971 Q1 1991’ represents the result for those persons whose area deprivation changed from more to the least deprived areas over the 20 year period. 
Figure 4. Standardised illness ratios (1991) for migrants and non-migrants in seven origin-destination combinations. Source: Norman et al. (2005) based on data extracted from the ONS LS
Figure 5. Standardised illness ratios by deprivation quintile of residence in 1971 and 1991. Source: Norman et al. (2005) based on data extracted from the ONS LS
The pair of bars labelled ‘Q1 1971 Q2-4 1991’ represents the result for those persons whose deprivation circumstances were the opposite; by 1991 they were no longer in the least deprived areas. The health of those persons whose circumstances improved was better than that of those whose deprivation circumstances worsened. In both of these situations migrants had better health than the non-migrants, more so for those whose deprivation circumstances improved. Those persons resident in the most deprived areas (Q5) over the 20 year period had poor health, particularly those who migrated between areas in the most deprived quintile. The pair of bars labelled ‘Q1-4 1971 Q5 1991’ shows the results for persons whose deprivation circumstances worsened, having moved from less deprived to the most deprived areas in 1991. These people had relatively poor health, particularly the migrants. The pair of bars labelled ‘Q5 1971 Q1-4 1991’ relates to those who were living in the most deprived areas in 1971 but whose deprivation circumstances had improved by 1991, either because they moved to better areas or because the area improved around them. These persons had better health than the people whose area deprivation circumstances worsened, but with no real difference between migrants and non-migrants.
Figure 5 illustrates the 1991 SIRs for each deprivation quintile based on where people were living in 1971 and then in 1991. Focusing on the least and most deprived areas, in Q1 the SIRs improved significantly over the 20 year period, but in Q5 they worsened significantly. Essentially, the transitions described above whereby people’s deprivation circumstances either worsened or improved resulted in a change in health which caused an increase in the health inequality between the least and most deprived areas. Migrants into the least deprived areas had particularly good health compared with those moving away from Q1 (whether migrants or non-migrants). Migrants into the most deprived areas had particularly poor health compared with those moving away from these areas (whether migrants or non-migrants). Along with these different health statuses, because numbers of migrants substantially exceed numbers of non-migrants over the 20 year period, Norman et al. (2005) conclude that it is migration between differently deprived areas rather than changes in the deprivation of the area in which non-migrants live that largely accounts for the change in the health-deprivation relationship. Whilst the changes are not quite so marked, the
same patterns are found for mortality as with limiting long-term illness.
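The indirectly standardised ratios used in this section follow a simple recipe: apply reference (national) age-specific rates to the study group's own population to obtain an expected count, then express the observed count as a percentage of that expectation. The sketch below illustrates the arithmetic only. The age bands, rates and counts are hypothetical and are not drawn from the ONS LS or from Norman et al. (2005), whose exact stratification is not reproduced here.

```python
from typing import Dict


def indirect_standardised_ratio(
    observed: int,
    group_population: Dict[str, int],
    reference_rates: Dict[str, float],
) -> float:
    """Return 100 * observed / expected for one study group.

    expected = sum over age bands of (group population * reference rate).
    A value above 100 indicates worse health than the reference population.
    """
    expected = sum(
        group_population[band] * reference_rates[band]
        for band in group_population
    )
    return 100.0 * observed / expected


# Hypothetical reference (national) illness rates by age band.
reference_rates = {"16-44": 0.05, "45-64": 0.15, "65+": 0.40}

# Hypothetical study group, e.g. migrants in one origin-destination category.
group_population = {"16-44": 1000, "45-64": 500, "65+": 200}
observed_cases = 246

# Expected cases are 50 + 75 + 80 = 205, so the ratio is about 120.
sir = indirect_standardised_ratio(observed_cases, group_population, reference_rates)
print(round(sir, 1))
```

The same function serves for SMRs if deaths replace illness reports as the observed events and reference mortality rates replace illness rates.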
CONCLUSION
No other UK data source provides better information on local migration than the census. The area tables are easily accessible and provide crosstabulations relating migration to a limited number of other variables for national down to local levels. The interaction data enable geographical connections between origins and destinations to be explored but, again, only for limited crosstabulations. The SARs and LSs differ from the UK’s other census outputs since these sources are microdata and are population samples. All census outputs are defined to keep the data confidential but, because the SARs and LSs are individual-level data, the definitions of the data release and access arrangements are stringent. The SARs are cross-sectional data like the area statistics and interaction data, but the LSs track people over time. These sources include data on migration based on the respondent’s address one year before the census, with the distance moved readily available. Unlike the other census sources, the LSs also allow inter-decade movement to be explored, and information is included on migrants and non-migrants at both time points. A limited number of geographies are available in the SARs, particularly about migrant origins. In the LSs, each individual’s record is linked to a wide range of geographies existing at the time of each census. In theory, this offers a choice of origin-destination geographies over a variety of time points. In practice, because the geographical definitions rarely stay fixed and comparable over time and because analysing outputs creates a risk of disclosure, data are unlikely to be released for anything but relatively large geographies or broad area types. The inter-relationships between health, deprivation and migration are complex and liable to
change over time. Seeking greater clarity about the ways these facets impinge on the relative health status of geographical areas, Boyle et al. (2002) provide initial findings from the 1991 Census SARs using the one-year migration indicator. The selective nature of the migration process is shown, especially in relation to health and area deprivation, but only with respect to characteristics at the destination. Analysing these effects on both origins and destinations can be achieved effectively using longitudinal data, such as those provided by the ONS LS, especially when individuals are linked to area deprivation. Norman et al. (2005) find that migrants into and between the least deprived areas had better health than non-migrants, but migrants into and between the most deprived areas had the worst health. The overall effect of these changes has been to increase the inequality in health between differently deprived areas. A sorting, largely driven by selective migration, had occurred. In contrast to aggregate data such as the Census Area Statistics and dedicated origin/destination interaction data such as the Special Migration Statistics, the SARs and LSs are individual-level microdata. These microdata files are very versatile, allowing multi-way cross-tabulations and enabling application-relevant derived and re-coded variables and study populations to be defined. A range of SARs files offers UK coverage, although a UK-wide study is challenging because data for each country are not always in one file and because the available variable detail may be country specific. Since their first release, the SARs have been used in almost 400 publications, over 40 of which relate to migration. The ONS LS for England and Wales has underpinned over 600 academic publications since its inception in the 1970s (Boyle et al., 2008) including at least 80 that include an interest in migration.
This well-established source is now complemented by longitudinal data for Scotland and for Northern Ireland. Largely driven by the need to ensure respondent confidentiality, the SARs and LS have some drawbacks though. To provide socio-demographic detail, some sacrifices need to be made (Norman, 2003). In addition to controlled access arrangements, the geographical areas to which individuals are linked in the SARs tend to be rather coarse (large local authorities at best) and, although the LS database records the small area geography in which the LS member was living at each Census, specific ‘place’ information is most unlikely to be considered non-disclosive except for large geographies. However, generic, contextual information about the ‘space’ in which people live is a useful research avenue because it is then possible to describe results in relation to the area characteristics (we focused on deprivation), even though the actual places are not identified. We must remember that the SARs and LS are samples rather than covering 100% of the enumerated census population. They are, however, very large samples in comparison with other national surveys (such as the Labour Force Survey or General Household Survey) and represent first-rate resources which complement other census and survey datasets.
ACKNOWLEDGMENT
Census data are provided by the Office for National Statistics (ONS), the General Register Office for Scotland (GROS) and the Northern Ireland Statistics and Research Agency (NISRA). Census outputs are Crown copyright and are reproduced here with the permission of the Controller of HMSO and the Queen’s Printer for Scotland. For the SARs examples reported above in relation to research by Boyle, Norman and Rees (2002), the work of the SARs support team at the Cathie Marsh Centre for Census and Survey Research (CCSR), University of Manchester, is gratefully acknowledged. For the LS examples reported above in relation to research by Norman, Boyle and Rees (2005), the permission of ONS to use the Longitudinal Study for LS project 30033 is gratefully acknowledged, as is the help provided by staff of the Centre for Longitudinal Study Information & User Support (CeLSIUS). The Economic and Social Research Council (ESRC) Census Programme provides services to allow users in UK higher and further education institutions to access data from the 1971, 1981, 1991 and 2001 UK Censuses. The Programme services, including the SARs and LS support units, are funded by the ESRC with additional support from the Joint Information Systems Committee. Background work to Boyle et al. (2002) and Norman et al. (2005) was carried out by Norman during ESRC PhD CASE Award S00429937028 (1999-2002) on ‘Estimating small area populations for use in medical studies: accounting for population migration’.
REFERENCES
Agresti, A. (2002). Categorical Data Analysis. Chichester, UK: John Wiley. doi:10.1002/0471249688
Al-Hamad, A., Hayes, L., & Flowerdew, R. (1997). Migration of the elderly to join existing households: evidence from the Household SAR. Environment & Planning A, 29(7), 1243–1255. doi:10.1068/a291243
Bentham, G. (1988). Migration and morbidity: implications for geographical studies of disease. Social Science & Medicine, 26, 49–54. doi:10.1016/0277-9536(88)90044-5
Blackwell, L., Akinwale, B., Antonatos, A., & Haskey, J. (2005). Opportunities for new research using the post-2001 ONS Longitudinal Study. Population Trends, 121, 8–16.
Boyle, P. (1995). Public housing as a barrier to long-distance migration. International Journal of Population Geography, 1(2), 147–164.
Boyle, P. (1998). Migration and housing tenure in South East England. Environment & Planning A, 30, 855–866. doi:10.1068/a300855
Boyle, P. (2004). Population geography: migration and inequalities in mortality and morbidity. Progress in Human Geography, 28(6), 767–776. doi:10.1191/0309132504ph518pr
Boyle, P., Feijten, P., Feng, Z., Hattersley, L., Huang, Z., Nolan, J., & Raab, G. (2008). Cohort Profile: The Scottish Longitudinal Study (SLS). International Journal of Epidemiology, 2008, 1–8.
Boyle, P., Halfacree, K., & Robinson, V. (1998). Exploring Contemporary Migration. Harlow, UK: Longman.
Boyle, P., Norman, P., & Rees, P. (2002). Does migration exaggerate the relationship between deprivation and limiting long-term illness? A Scottish analysis. Social Science & Medicine, 55, 21–31. doi:10.1016/S0277-9536(01)00217-9
Boyle, P., Norman, P., & Rees, P. (2004). Changing places: do changes in the relative deprivation of areas influence limiting long-term illness and mortality among non-migrant people living in non-deprived households? Social Science & Medicine, 58, 2459–2471. doi:10.1016/j.socscimed.2003.09.011
Boyle, P., & Shen, J. (1997). Public housing and migration: a multi-level modelling approach. International Journal of Population Geography, 3, 227–242. doi:10.1002/(SICI)10991220(199709)3:33.0.CO;2W
Boyle, P. J., Duke-Williams, O., & Gatrell, A. (2001). Do area-level population change, deprivation and variations in deprivation affect self-reported limiting long-term illness? An individual analysis. Social Science & Medicine, 53, 795–799. doi:10.1016/S0277-9536(00)00373-7
Brimblecombe, N., Dorling, D., & Shaw, M. (2000). Migration and geographical inequalities in health in Britain. Social Science & Medicine, 50, 861–878. doi:10.1016/S0277-9536(99)00371-8
Buck, N., Gershuny, J., Rose, D., & Scott, J. (Eds.). (1994). Changing Households. The British Households Panel Survey 1990-1992. Colchester: ESRC Centre for Micro-Social Change.
Carstairs, V., & Morris, R. (1989). Deprivation and mortality: an alternative to social class? Community Medicine, 11, 210–219.
Champion, T., Fotheringham, S., Rees, P., Boyle, P., & Stillwell, J. (1998). The Determinants of Migration Flows in England: a Review of Existing Data and Evidence, a report prepared for the Department of the Environment, Transport and the Regions. Department of Geography, University of Newcastle upon Tyne, Newcastle.
Creeser, R., Dodgeon, B., Joshi, H., & Smith, J. (2002). The ONS Longitudinal Study: linked census and event data to 2001. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 221–229). Chichester, UK: John Wiley.
Dale, A. (1993). The OPCS Longitudinal Study. In Dale, A., & Marsh, C. (Eds.), The 1991 Census User’s Guide (pp. 312–329). London: HMSO.
Dale, A. (1998). The value of the SARs in spatial and area-level research. Environment & Planning A, 30, 767–774. doi:10.1068/a300767
Dale, A., Creeser, R., Dodgeon, B., Gleave, S., & Filakti, H. (1993). An introduction to the OPCS Longitudinal Study. Environment & Planning A, 25, 1387–1398. doi:10.1068/a251387
Dale, A., Fieldhouse, E., & Holdsworth, C. (2000). Analysing Census Microdata. London: Arnold.
Dale, A., & Teague, A. (2002). Microdata from the Census: Samples of Anonymised Records. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 203–212). Chichester, UK: John Wiley.
Exeter, D. J., Boyle, P., Feng, Z., Flowerdew, R., & Schierloh, N. (2005). The creation of ‘Consistent Areas Through Time’ (CATTs) in Scotland, 1981-2001. Population Trends, 119, 28–36.
Farr, W. (1864). Supplement to the 25th Annual Report of the Registrar General. London: HMSO.
Findley, S. E. (1988). The directionality and age selectivity of the health-migration relation: evidence from sequences of disability and mobility in the United States. The International Migration Review, 22, 4–29. doi:10.2307/2546583
Gatrell, A. C. (2002). Geographies of Health: An Introduction. Oxford, UK: Blackwell.
Harding, S. (2003). Social mobility and self-reported limiting long-term illness among West Indian and South Asian migrants living in England and Wales. Social Science & Medicine, 56, 355–361. doi:10.1016/S0277-9536(02)00041-2
Hattersley, L., & Creeser, R. (1995). Longitudinal Study 1971-1991: History, Organisation and Quality of Data. OPCS Series DS 15. London: HMSO.
Hayes, L., & Al-Hamad, A. (1997). Residential movement into elderly person households: evidence from the Household Sample of Anonymised Records. Environment & Planning A, 29(8), 1433–1447. doi:10.1068/a291433
Hayes, L., & Al-Hamad, A. (1999). Residential change: differences in the movements and living arrangements of divorced men and women. In Boyle, P., & Halfacree, K. (Eds.), Migration and Gender in the Developed World (pp. 261–279). London: Routledge.
Marsh, C. (1993). The Sample of Anonymised Records. In Dale, A., & Marsh, C. (Eds.), The 1991 Census User’s Guide (pp. 295–311). London: HMSO.
Middleton, E. (1995). Samples of Anonymised Records. In Openshaw, S. (Ed.), Census Users’ Handbook (pp. 337–362). Cambridge, UK: GeoInformation International.
Norman, P. (2002). Estimating small area populations for use in medical studies: accounting for migration. PhD Thesis, School of Geography, University of Leeds, Leeds.
Norman, P. (2003). What are individual-level microdata and aggregate-level area census data? FAQ 11 Individual versus Aggregate. Retrieved from http://www.chcc.ac.uk/overview/faq11/frame.html
Norman, P. (2006). The joys and challenges of attaching area data to the ONS Longitudinal Study. CeLSIUS Newsletter 7. Retrieved from http://celsius.census.ac.uk/news.html
Norman, P., & Bambra, C. (2007). The utility of medically certified sickness absence data as an updatable indicator of population health. Population Space and Place, 13(5), 333–352. doi:10.1002/psp.458
Norman, P., Boyle, P., & Rees, P. (2005). Selective migration, health and deprivation: a longitudinal analysis. Social Science & Medicine, 60(12), 2755–2771. doi:10.1016/j.socscimed.2004.11.008
Norman, P., Rees, P., & Boyle, P. (2003). Achieving data compatibility over space and time: creating consistent geographical zones. International Journal of Population Geography, 9(5), 365–386. doi:10.1002/ijpg.294
SARs. (2004). What are the Samples of Anonymised Records? Retrieved from http://www.ccsr.ac.uk/sars/guide/introduction/
Senior, M. L., Williams, H., & Higgs, G. (2000). Urban-rural mortality differentials: controlling for material deprivation. Social Science & Medicine, 51, 289–305. doi:10.1016/S0277-9536(99)00454-2
Sloggett, A., & Joshi, H. (1998). Indicators of deprivation in people and places: longitudinal perspectives. Environment & Planning A, 30, 1055–1076. doi:10.1068/a301055
Verheij, R. A., Dike van de Mheen, H., de Bakker, D. H., Groenewegen, P. P., & Mackenbach, J. P. (1998). Urban-Rural variations in health in the Netherlands: does selective migration play a part? Journal of Epidemiology and Community Health, 52, 487–493. doi:10.1136/jech.52.8.487
Walters, W. H. (2000). Assessing the impact of place characteristics on human migration: the importance of migrants’ intentions and enabling attributes. Area, 32(1), 119–123. doi:10.1111/j.1475-4762.2000.tb00121.x
Welton, T. A. (1872). On the effect of migrations in disturbing local rates of mortality, as exemplified in the statistics of London and the surrounding country, for the years 1851-1860. Journal of the Institute of Actuaries, 16, 153.
Section 2
Spatial Interaction Analysis and Modelling Applications
Chapter 8
Internal Migration Patterns by Age and Sex at the Start of the 21st Century
Adam Dennett, University of Leeds, UK
John Stillwell, University of Leeds, UK
ABSTRACT
Moving home is an event that most people experience at some stage in their lives. Previous research has shown that while men and women tend to have similar rates of migration overall, significant variations occur according to age. In this chapter, the authors examine these demographic influences on internal migration in Britain using data from the 2001 Census Special Migration Statistics at district scale. The analysis of migration rates reveals some subtle differences between males and females by age, but spatial patterns of net migration for both sexes emphasise that losses for London and provincial urban centres and gains in rural Britain vary significantly by age. The chapter uses a national area classification framework to summarise the patterns of net migration taking place in the year before the 2001 Census at district scale, and the latter part of the chapter explores indices of population stability (turnover and churn) that provide alternative insights into migration patterns across the country, particularly when disaggregated by age. These measures of migration are important because it is apparent that some areas that exhibit relatively low net rates of movement actually have large numbers of migrants moving within their boundaries as well as inflows from and outflows to other areas. These movements clearly impact on the stability of their populations and have policy implications.
DOI: 10.4018/978-1-61520-755-8.ch008
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
The migration process involves three key variables: the migrant, the origin and the destination. Variations in the particular attributes of all three of these variables, together with intervening opportunities, make for complex migration behaviour. On a detailed micro level, any individual migrant will exhibit a unique combination of individual attributes relating to their age, sex, ethnicity, socio-economic status, et cetera, including their own perceptions of where they live now and where they might live in the future based on their own understanding of the places involved. Whilst data on certain individual attributes are captured by the Census of Population, confidentiality constraints limit access to samples of anonymised records, and detailed surveys are required to provide information about perceptions and values. At a macro level, on the other hand, a range of data sources (see Chapter 1) provide either total migration flows or flows disaggregated by certain variables. Moreover, there are particular features of origin and destination places that are commonly measured or quantifiable, perhaps relating to features such as job vacancies, unemployment or crime, that we can use as potential factors with which to explain migration flows. Much research has focused on building models to quantify the significance for migration of particular attributes of origins and destinations as well as the distance between them. Thus, as Cadwallader (1989) has demonstrated, models are frequently developed for flows at the macro level and the importance of each explanatory factor is inferred from the coefficients calibrated by the model, whilst the real decisions to leave the origin or to choose a particular destination are taken at the micro level. Whilst there may be a myriad of individual motivations and complex interactions between the migrant and their existing or desired locations, some migrant attributes, and indeed the attributes of places, are more likely than others to affect both the propensity to migrate and the origin or destination of migration. In this chapter we focus on two demographic attributes of migrant flows at the macro level, age and sex, and offer a national overview, based on data from the 2001 Census, of age/sex-specific migration patterns within Britain at the start of the new millennium.
THE INFLUENCE OF AGE AND SEX ON MIGRATION PROPENSITIES
Stillwell (2008) makes the distinction between the characteristics of migrants or places which influence the propensity to migrate (such as age or socio-economic status) and the factors which actually determine if a move takes place (such as the need for a job) and explain why the move occurs to a particular destination (such as more vacancies in one area rather than in others). Both influences and determinants are clearly interlinked and both have spawned research seeking to describe and account for internal and international migration around the world. Here our focus is on the propensity to migrate and the particular demographic characteristics of migrants that might make them more or less likely to change their permanent residence. Much research has been carried out on the particular attributes of migrants which might influence their migration behaviours. In this volume, for example, Stillwell (Chapter 9) explains in detail the influence of ethnicity on migrant behaviours; elsewhere, Finney and Simpson (2008) have carried out similar work using 2001 Census data and Owen (1997) provides a detailed examination from the 1991 Census. Other work, such as that by Champion and Coombes (2007), has flagged the importance of the socio-economic status of the migrant. One of the most important attributes affecting an individual’s propensity to migrate is their age. A large volume of work, from the 1980s onwards (Rogers and Castro, 1981; Bates and Bracken, 1982; Rogers et al., 2002; Raymer et al., 2006; Raymer et al., 2007), has identified the influence of age on migration behaviour. The seminal work of Rogers and Castro (1981) was important in identifying the similarities in migration rate age ‘schedules’ across a range of countries and cities. From these common observations, Rogers and Castro were able to construct a model migration schedule consisting of a series of key age-related components. Consider the example shown in Figure 1.
Figure 1. Example age-specific migration schedule, Britain, 2000-01. Source: After Rogers and Castro (1981)
Whilst Figure 1 is an empirical schedule constructed from 2001 Census data for Britain as a whole based on data for quinary age groups, the model components as identified by Rogers and Castro are included; these identify some distinct phases in the life course where the propensity to migrate fluctuates. Firstly there is a ‘pre-labour force’ component between ages (vertical lines) 1 and 2, which is characterised by a steady decline in the rate of migration. This decline can be compared directly with the decline shown between lines 1a and 2a. Horizontal arrow A links these two elements of the schedule and signifies a ‘parental shift’ where the two comparable rates of migration decline, starting at the average age at which parents have children. After the pre-labour force component, there is a ‘labour force’ component between lines 2 and 4. Between these lines, line 3 represents the high peak of migration towards the beginning of the labour force component at age 20-24. This peak represents a period of the life course when individuals are more likely to move, and is associated with employment-seeking moves driven by labour market influences. Arrow X signifies a ‘labour force shift’ in migration propensities between the pre work-age population and ‘first job’ age range. The ‘post-labour force’ component begins after line 4 at age 60-64 when retirement migration is most likely, and shows a steady decline in migration propensity from the initial peak, followed by an increase towards the end of the life course associated with moves to be closer to family, or into communal establishments, or to be closer to health and other services as the ability to maintain independent living status declines. In the original work by Rogers and Castro (1981), a clear ‘retirement peak’ was identified. In this example of migration in Britain in 2000-01, the peak is much less evident, suggesting either that retirement is not the catalyst for migration that it has been in the past, or that people are retiring at different ages, which has the effect of spreading the peak over several age groups. It is clear from the migration rate schedule in Figure 1 that the propensity of individuals to migrate at the beginning of the new millennium varies significantly in relation to their age or stage in the life course. The form of the schedule is consistent with that identified in previous research. It has generally been accepted, however, that sex has much less of an impact, especially at an aggregate level; an assertion made by Rogers and Castro (1981) and Bates and Bracken (1982) based on earlier data, but also made more recently by Champion (2005). Whilst this is certainly true if we compare the overall propensities of migration for males and females, when we examine the age-specific differences between male and female propensities, some subtle yet important differences emerge. Figure 2 decomposes the data shown in Figure 1 into its sex-specific components and shows that during the period identified by Rogers and Castro as the ‘labour force shift’, clear differences between male and female migration propensities appear. At the labour force peak at ages 20-24, the difference between the two schedules is as much as 5%, representing 85,000 more female migrants than males in absolute terms. For much of the remainder of the labour force component, the propensity for men exceeds that for women, the difference varying from 2 to 3% before disappearing after age 50 and then reappearing post-retirement, when the female rate increasingly exceeds that for males as age beyond 70 increases.
Figure 2. Differences in male/female migration propensities with age, 2001. Sources: 2001 Census SMS and Standard Tables
This initial analysis suggests that some interesting variations in migration propensities appear when we examine the age and sex differentials. As such, we devote our attention in this chapter to understanding fully the nuanced influences of age and sex on migration patterns in Britain at the beginning of the twenty-first century.
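The model migration schedule of Rogers and Castro (1981) discussed in this section is commonly written as a sum of a declining exponential for the pre-labour force ages, a double-exponential peak for the labour force ages, and a constant. The sketch below evaluates that reduced form to show how the components combine into the familiar shape. It omits the retirement and post-retirement components for brevity, and every parameter value is a hypothetical choice made for illustration, not an estimate fitted to the 2001 Census data behind Figure 1.

```python
import math


def model_migration_schedule(age, a1, alpha1, a2, mu2, alpha2, lambda2, c):
    """Reduced Rogers-Castro schedule: pre-labour + labour force + constant."""
    # Pre-labour force component: migration rate declines steadily from birth.
    pre_labour = a1 * math.exp(-alpha1 * age)
    # Labour force component: steep rise governed by lambda2, slower decline
    # governed by alpha2, positioned along the age axis by mu2.
    shifted = age - mu2
    labour = a2 * math.exp(-alpha2 * shifted - math.exp(-lambda2 * shifted))
    return pre_labour + labour + c


# Hypothetical parameter values, chosen only to give a plausible shape.
params = dict(a1=0.02, alpha1=0.10, a2=0.06, mu2=19.0,
              alpha2=0.10, lambda2=0.40, c=0.003)

rates = {age: model_migration_schedule(age, **params) for age in range(0, 90)}
peak_age = max(rates, key=rates.get)

print("Age with the highest modelled rate:", peak_age)
print("Rate at age 0:", round(rates[0], 4), "rate at age 15:", round(rates[15], 4))
```

With these illustrative values the modelled rate falls through childhood and then peaks in the early twenties, which mirrors the pre-labour force decline and labour force peak described above.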
AGE/SEX DATA FROM THE SMS
As has already been outlined in Chapter 1 of this book, reliable sources of migration data in general are not abundant. If researchers require data disaggregated by different variables, the choice becomes even more limited. With age and sex variables being a more common disaggregation (than, say, ethnicity or socio-economic status), our choice of data sources is slightly improved, although there is still not a wide selection. A number of social surveys such as the Labour Force Survey and the General Household Survey allow flows between origins and destinations to be derived, although these are limited by small sample sizes and coarse geographies; flows between Government Office Regions are the best that can be obtained. There are only two sources of data with large sample sizes and reasonably fine-grained geographies which would allow meaningful analysis to be carried out, namely the NHS patient re-registration data (NHSCR and patient register) and the Census of Population. In this chapter, we decided to use the 2001 Census data, which are easily accessible from the WICID system and, more importantly, contain information about within-area flows that allows important measures of migration churn to be calculated. Age/sex data are available from the Special Migration Statistics (SMS) at all three levels (output area, ward and district). At level 3 (output area), due to the potentially disclosive nature of more detailed disaggregation, only three very broad age groups are available: 0-15, 16 to pensionable age, and pensionable age and above. At level 2 (ward level), the coarser geography allows for a far more detailed age disaggregation with 16 age groups available for persons, males and females, varying in size from single years (ages 0 and 15) to fifteen years (45-59 and 60-74) as well as a 75+ age group. At the district level, yet more age groups are presented (24), this time ranging from single year to five year groups (with an additional 90+ group). Whilst a reduction in the level of variable detail accompanies an increase in geographical detail, the potential risk of still disclosing sensitive information has led to the perturbation of small counts (known as the Small Cell Adjustment Method – SCAM), which has affected the accuracy of the data at each level. The effect of cell perturbation is more pronounced at finer geographical scales, and so for this reason, accompanied by other practical reasons (including the difficulty of presenting national patterns at a level below that of the district), our analysis here will be at the district scale.
The inclusion of single years 0 and 15, as well as double years (1-2, 3-4, 10-11, 16-17, 18-19) and one treble year (12-14) helps to provide a clearer understanding of the differences between propensities in the lower end of the age range. However, in the analysis which follows, we standardise these age groups to five years, given the quinary structure
of age groups from 20-24 onwards up to 85-89 and include a final age 90+ group.
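The standardisation to five-year groups described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the age-group labels, the presence of a 5-9 group and all counts are hypothetical and are not taken from the SMS files. It sums the single, double and treble year groups below age 20 into five-year groups and passes the groups from 20-24 upwards (including 90+) through unchanged.

```python
# Hypothetical mapping from the finer age groups below age 20 to
# five-year groups; labels are illustrative, not official SMS codes.
TO_FIVE_YEAR = {
    "0": "0-4", "1-2": "0-4", "3-4": "0-4",
    "5-9": "5-9",
    "10-11": "10-14", "12-14": "10-14",
    "15": "15-19", "16-17": "15-19", "18-19": "15-19",
}


def standardise_age_groups(counts):
    """Sum migrant counts into five-year age groups.

    Groups that are not in TO_FIVE_YEAR (20-24 upwards and 90+) are
    already quinary or open-ended and keep their own label.
    """
    result = {}
    for group, count in counts.items():
        target = TO_FIVE_YEAR.get(group, group)
        result[target] = result.get(target, 0) + count
    return result


# Invented counts for a single origin-destination flow.
example = {"0": 12, "1-2": 20, "3-4": 15, "5-9": 30, "10-11": 8,
           "12-14": 10, "15": 3, "16-17": 9, "18-19": 40,
           "20-24": 120, "90+": 2}
five_year = standardise_age_groups(example)
```

Because the function only re-labels and sums, the total number of migrants is preserved, which gives a simple check when the same step is applied to real flow matrices.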
MIGRATION PATTERNS BY AGE AND SEX AT THE DISTRICT SCALE

The spatial patterns of internal migration in Britain will be examined in the context of both age and sex. Figure 3 demonstrates the rates of net gain or loss for each district in Britain for both males and females. It is evident that broadly, the patterns of gains and losses are very similar. Most districts of Greater London and those bordering (including those stretching out along the M4) are experiencing net out-migration rates. Other metropolitan areas, including Birmingham, Liverpool, Manchester and their surrounding districts, the North East and Glasgow are also experiencing net out-migration rates. In contrast, more rural areas covering large parts of East Anglia, the South West, Wales, the
Midlands and the North are all experiencing rates of net in-migration. Where generally the patterns of male and female net migration are similar, there are some districts where their migration propensities differ. Richmondshire, in the north of England, is perhaps the most noticeable example. Here, there is a significant in-migration of males and noticeable out-migration of females. This somewhat surprising pattern is explained when we take into consideration that Catterick Garrison (a major British Army base) is located in Richmondshire, housing around 12,000 soldiers, the majority of them likely to be male. Other areas of difference include the far northern and western parts of Scotland; in these significantly rural areas, the propensity of males to migrate out is greater than that of females; a pattern somewhat more difficult to explain, but one which is likely to be linked to gender roles and employment opportunities. By focusing on net migration and identifying key patterns of gain and loss between districts,
Figure 3. Net migration rates by sex, districts of Britain, 2000-01. Sources: 2001 Census SMS and Standard Tables
we are omitting the major component of internal migration − the migrants that are moving within districts. When the rates of intra-district migration in Britain are mapped for males and females (Figure 4), it is apparent that the spatial patterns of relatively high and low rates also differ to some degree. There is a noticeable cluster of districts with relatively high intra-area migration for both males and females centred on Leeds in the north of England. A number of areas in Scotland and west Wales also feature higher within-district moves. This is very understandable in the case of the Scottish Highlands, where both remoteness and the physical size of the Highland district are likely to be barriers to inter-district movement. Lack of accessibility to other areas in the country could also influence these patterns in western Wales. Less clear is why this would be the case with the areas around Leeds as the size of the districts and accessibility will not be issues. Lower rates of intra-district migration tend to be focused around
London and the South East and around the West Midlands, where it is likely that inter-district rates will be higher. In general terms, the spatial patterns of intra-district migration for males and females across the country at district scale are very similar. Whilst sex-specific patterns for all-age migrants are similar, the earlier discussion of the influence of age on the propensity to migrate suggests that we might usefully examine in more detail age-related migration for the whole country, paying attention to both inter- and intra-district movements. It would be interesting to examine all movements according to distance moved but, in practice, replacing distance with the proxy of inter-zone and intra-zone flows makes for a more manageable analysis. Of course the levels of inter/intra-zone movements will depend completely on the scale of analysis, with the majority of flows for small areas such as output areas being inter-zonal and the majority of flows for large areas such as regions
Figure 4. Intra-district migration rates by sex, districts of Britain, 2000-01. Sources: 2001 Census SMS and Standard Tables
being intra-zonal (for a more detailed discussion on the issues of scale, see Gober-Meyers, 1978). As we will not be comparing inter/intra area flows between different areas of a geographical hierarchy, using this division is less problematic, although it should be acknowledged that inter/intra area flows will be noticeably different at different levels of a geographical hierarchy.

Figure 5. Age-specific migration schedules by sex and inter/intra area flows, 2000-01. Sources: 2001 Census SMS and Standard Tables

Figure 5 shows the schedules of age-specific migration propensities in Britain for the year leading up to the 2001 Census for both inter and intra-area flows. Evident are the subtle differences between the schedules for males and females, as in Figure 2. The rate of migration for males and females remains very similar until the age of 10-14 both for inter and intra-area flows, as children of both sexes move with their parents. At the very beginning of the life course, intra-district flows are far more important than inter-district flows, with just over 10% of flows for males and females in the 0-4 age group being intra-district, and only around 5% being inter-district. Between age group 0-4 and age group 10-14, there is a significant drop in the propensity to migrate with intra-district flows dropping to just under 6% and inter-district flows falling to just over 2%. This corresponds with the ages at which children are still dependent on their
parents for support. Indeed, the downward curve corresponds closely to the downward curve of adults beginning at age group 30-34 – thirty being the approximate average age in 2001 at which parents have children (ONS, 2006). At the 15-19 age group there is a sharp change in this downward trend and a divergence between males and females, especially with intra-district flows, coinciding with the change in the dependency status of children and the move out of the family home, either to college or university or to a first job. At this age group, female intra-district flows become the most significant with around 9% of the population moving within districts. The next most significant flow is the female inter-district flow comprising around 7% of the population. At this age group, male inter and intra-district flows are almost identical at around 6% of the population. Combining intra and inter-district flows reveals that at this age group, around 16% of the female population are migrating, compared to only around 12% of the male population. When females do migrate at this age, there is a tendency for the moves to be shorter, intra-district moves, rather than longer inter-district moves. The gap in the propensities of males and females to migrate is maintained for the next age group. Age group 20-24 represents the peak age for migration events, for both males and females, and for both inter and intra-district flows. Again, females of this age are considerably more likely than males to be migrants, with female intra-district migrants comprising around 19% of the total female population. Add the 14% of female inter-district migrants to the figure, and it is apparent that at age group 20-24, female migrants comprise around one third of the total female population. This compares with male migrants of this age who only make up around 28% of the total male population. From this peak migration age group, there is a sharp decline in the propensity to migrate at age group 25-29. 
Most noticeably the female intra-district migration propensity drops to equal that of the males at around 13%. Female
inter-district migration also declines, yet the rate drops to below that of males at around 10%. From here there is a continued sharp decline in migration propensities for both male and female intra and inter-district migration until around age group 45-49. Between 25-29 and 45-49, males take over from females as the gender with higher migration propensities both for inter and intra-district moves, although these differences are much smaller than the gender differences at the ages of peak migration. Between the age groups of 50-54 and 70-74 there is a continued decline in migration propensities for both males and females and for inter and intra-district moves. The decline though is gradual and the differences small, although intra-district moves are still more common than inter-district moves. From age group 70-74 upwards, migrants as a percentage of the total population begin to increase again, with the proportion of migrants continuing to increase until the last (90+) age group in the schedule. Females again overtake males as the group with the highest proportion of the total population made up of migrants in this old age range. This gap continues to widen as age increases, with intra-district moves remaining the most important. Having reviewed entire district level migration schedules for Britain, it is apparent that the age range where most migrant activity occurs is around 15 to 29. With the evidence pointing towards a greater propensity for females than males to migrate in their late teens and early twenties, one may ask why this is the case. A partial explanation is offered by Faggian et al. (2007), whose work using data from the Higher Education Statistics Agency (HESA) student leavers’ questionnaire concludes that evidence of increased female migration in the 21-25 age group must be related to women moving in order to maximise their employment potential in a market that discriminates in favour of males. 
Certainly this is a plausible explanation for at least some of the increased female propensity to migrate. It
does not, however, explain the differences in male/female migration rates at age group 15-19. For this, examination of HESA statistics (2002) relating to new undergraduate students for 2000-2001 may provide some clues. These data reveal that for all first year students under 21 years of age in UK Higher Education institutions, there were 18,685 more females than males; this certainly accounts for at least part of this phenomenon. Whilst being a student does not necessarily mean that an individual is also going to be a migrant, with an increasing proportion of students leaving the family home to go and study, it will increase the likelihood of migration occurring. Other possible explanations for the higher intensities of migration among women in this age group might be associated with migration flows involving communal establishments as origins and/or destinations, including prisons, since flows between communal establishments as well as between households are included in the 2001 Census data. However, data from the Home Office (2003) for 2001 reveal that the migration of female prisoners is relatively insignificant: there were only 810 females living in prisons compared with 15,152 males aged between 18 and 24. One final possible explanation for the differences between male and female migration propensities at these ages could be to do with the average age differentials within male/female couples. It may well be that many moves are by individuals who are part of a couple, and that in many cases the female member of the couple is younger than the male, thus accounting for some of the difference at each age group. With the migration schedules giving us insight into the age groups where migration propensity is higher or lower, it will be valuable to examine the spatial patterns of migration where age is having the most influence. Age group 20-24 is the age group of peak migration propensity, with noticeable differences between males and females. 
Figure 6 reveals the net migration rates for each district for this age group. It is immediately apparent that the
patterns differ very much from the aggregate ones shown in Figure 3. For both males and females there is a general pattern of net out-migration from rural areas. In-migration, on the other hand, is concentrated almost exclusively on districts housing large urban centres. London is a key hot-spot for net in-migration with the majority of boroughs experiencing fairly high rates. Other urban areas providing destinations for many males and females in the 20-24 age group include Manchester, Leeds, Nottingham, Hull, Glasgow, Edinburgh, Brighton and Hove, Bristol and Portsmouth, as well as a number of peri-urban districts in the South East. There are some districts where noticeable differences in male and female net migration rates occur. For example, Richmondshire (as previously noted at the aggregate level) has dramatically different rates for males (over 7.5 per 1,000 population net in-migration) and females (between -1 and -8.5 per 100 population net out-migration) – shown in Figure 6. East Cambridgeshire, Eastbourne and North Tyneside, on the other hand, are examples of
areas where female net in-migration and male net out-migration are occurring at broadly similar rates, patterns which might quite feasibly be attributed to sex-specific job opportunities.
DISAGGREGATING AGE/SEX FLOWS BY AREA TYPE: THE UTILITY OF AREA CLASSIFICATIONS

Through examining national age/sex migration within Britain at a district level, we begin to understand the broad trends and some of the spatial patterns. Whilst mapping these data provides an appreciation of the influence of place on particular patterns, geographical location tells us only so much. It is a straightforward task to identify urban or rural districts for anyone with a little knowledge of the geography of Britain, but it is less straightforward to make judgements about the characteristics of these districts and the people residing within them that might make them
Figure 6. Net migration rates for age group 20-24 by sex, districts of Britain, 2000-01. Sources: 2001 Census SMS and Standard Tables
Figure 7. In/out migration ratios by broad age group for families, groups and classes, 2000-01. Sources: 2001 Census SMS and Standard Tables
more or less attractive to particular categories of migrants. In order to get some purchase on the influence of area type, we turn our attention to the use of an area classification. Area classifications have gained prominence partially through the growth of the commercial geodemographics industry. By distilling a large amount of socio-economic and demographic data down to a few key components for small areas, geodemographics companies have been able to classify those areas by the key characteristics of their inhabitants. This has proved an extremely useful market analysis tool for commercial organisations and has turned the geodemographics industry into a multi-million pound sector in a relatively short period of time. Whilst many of the developments in geodemographics have come from the commercial sector, Farr and Webber (2001) point out that much of the early methodological development in geodemographic classifications came through a
desire from local governments to prioritise resource targeting more effectively. Indeed there now exists a range of freely available classifications for small areas such as output areas (Vickers and Rees, 2006; 2007) and wards, as well as for larger areas, such as local authority districts (http://www.statistics.gov.uk/about/methodology_by_theme/area_classification/default.asp), in the UK. These classifications enable areas to be profiled, providing a valuable framework in the context of migration research as types of migrant may be moving between spatially varied but characteristically homogeneous areas. A national classification of districts has been developed by Vickers et al. (2003) using the 2001 Census Key Statistics. The classification is hierarchical and assigns each district in the UK to a different ‘family’, ‘group’ or ‘class’ based on a range of socio-economic and demographic characteristics. Every district will be within a class, nested within a larger group, nested within an even
larger family. At the most aggregate family level, there are four categories of district: Urban UK, Rural UK, Prosperous Britain and Urban London; the categories increase to 12 at the group level and 24 at the class level. Whilst a similar classification of districts has been developed by the ONS, we adopt the Vickers et al. classification in this analysis because of its comprehensive, robust and transparent methodology. Figure 7 provides a summary of inflow/outflow ratios for each of the Vickers et al. classification types. Here we have disaggregated the flows by five broad age groups: 0-15, 16-29, 30-44, 45-pensionable age (pensionable age in this case defined as 65 for males and 60 for females) and pensionable age and above. These groups were chosen as they represent groupings of around 15 years, making it possible to draw comparisons with the relative numbers of migrants present in each group, but they also represent recognisable stages in the life course: ages 0-15 are the dependent child years; ages 16-29 contain a number of key life stages: leaving home to study or take a first job; graduating and moving to a first job; moving through the early stages of a career; starting a young family; ages 30-44 are the family rearing years; ages 45-pensionable age are the years after the children have left home; and pensionable age and above are the retirement years. What is immediately apparent from Figure 7 is that in virtually all classifications of district, the direction of migration for the majority of those in age group 16-29 is the opposite to all other age groups. Where the ratio is greater than 1, in-migration is higher than out-migration. Where the ratio is less than 1, the opposite is true. Taking family A (Urban UK) first, we can see that the ratio for the 16-29 age group is positive (1.2) with all other groups exhibiting negative ratios or net migration loss. 
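The way the in/out ratios are read can be illustrated with a short Python sketch. It is not the authors' procedure: the function names and the flow counts are hypothetical, and the 16-29 figures are simply chosen so that they reproduce the ratio of 1.2 quoted for Urban UK.

```python
def in_out_ratio(inflow, outflow):
    """Ratio of in-migrants to out-migrants for one area type."""
    if outflow <= 0:
        raise ValueError("outflow must be positive to form a ratio")
    return inflow / outflow


def direction(ratio):
    """Balance of movement implied by an in/out ratio."""
    if ratio > 1:
        return "net gain"
    if ratio < 1:
        return "net loss"
    return "balanced"


# Invented inflow and outflow counts for two age groups in one area type.
flows = {"16-29": (120_000, 100_000), "30-44": (80_000, 100_000)}
ratios = {age: in_out_ratio(i, o) for age, (i, o) in flows.items()}
balance = {age: direction(r) for age, r in ratios.items()}
```

A ratio is used rather than a difference so that area types with very different volumes of migrants can be compared on the same scale; the drawback, noted in the text for the Isles of Scilly and the City of London, is that very small flows can distort it.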
Drilling down through the classification, we can start to deconstruct the pattern shown at the family level. Within Urban UK, the classes ‘Industrial Legacy’ and ‘M8 Corridor’ actually show the opposite balance of movement to that observed at
the family level. Here, the 16-29 age group are generally moving out in net terms, whereas all other age groups are moving in. In all other groups and classes, however, the balance of movement is the same as that at the more aggregate level. In particular, the classes ‘Regional Centres’, ‘Redeveloping Urban Centres’ and ‘Young and Vibrant Cities’ exhibit significantly positive ratios. When the list of districts which appear in these classes is examined, it becomes apparent why these balances of flows are occurring. The districts include large cities such as Glasgow, Manchester, Nottingham, Newcastle-Upon-Tyne, Bristol, Sheffield, Leeds, Edinburgh and Cardiff, as well as districts which are home to cities and large towns which, significantly, house major universities. These include, amongst others, Lancaster, Portsmouth, Cambridge, Oxford, Durham, Southampton and Norwich. Family B (Rural UK) shows a very different pattern with, in almost all cases, the balance of flows of the 16-29 age group being the opposite of all other groups. In this case though, the balance of flows in the 16-29 age group is generally negative. The exceptions are the ‘Isles of Scilly’, ‘Typical Towns’ and ‘Coastal Resorts’. The Isles of Scilly should probably be discounted as the total flows of migrants are extremely low (288 in total), thus having a distorting effect on the ratio. Typical Towns are exactly that, and whilst classified as ‘Rural’, in the context of the inflow of this age group to urban areas generally, it is not difficult to imagine the influence of slightly improved urban employment opportunities affecting this marginally positive balance. The pattern for Coastal Resorts is perhaps more interesting as it is the only classification category where all flows are in one direction – in this case positive. The extent of the positive flow, however, is less than for all other age groups. 
Of note in the Rural UK Family are the significantly positive in-flows of individuals aged 45 to pensionable age, and of pensionable age and above to ‘Aged Coastal Resorts’ and ‘Aged Coastal Extremities’; here there are in-flow ratios of above 2 for those
aged 45 to pensionable age and over 1.6 for the oldest age group. When we examine Family C (Prosperous Britain), what is immediately apparent is that the balance of flows for all age groups is more even (i.e. closer to 1). Overall the heaviest flows are negative and involve the 45-pensionable age group. In all groups and classes within this family, the balance of movement for this age group is negative. There is an interesting parallel with the 0-15 and 30-44 age groups, both of which move into the ‘Commuter Belt’ in net terms – we can assume that the majority of these 0-15 year olds will be moving with their 30-44 year old parents. This fits in with the counterurbanisation trends (especially from London) noted for some decades now (Champion, 1989; Fielding, 1992; Champion et al., 2007). Family D (Urban London) is the final family in the Vickers et al. classification. As with the Isles of Scilly, it is sensible to ignore the City of London as the flows are so small (only 2,010 in and out migrants in total). The 16-29 age group, as with Urban UK and Rural UK, is moving in the opposite direction to all other age groups. In all groups and classes of this family, the 16-29 age group exhibits positive migration balances whereas all other age groups experience quite significant net out-migration which increases with age. So where before, by using standard thematic mapping techniques, we were only able to discern particular age-related spatial migration patterns based on a prior knowledge of particular locations and their attributes (e.g. London or coastal areas), through the use of a district classification framework we can begin to explore the spatial patterns in a different way. All evidence points to a significant movement of individuals in their late teens and early twenties to urban areas, particularly London and other larger towns and cities. There is a general out-migration of this group from almost all types of rural area. 
Generally speaking, all other age groups are moving out of urban areas more than they are moving in, with rural areas almost exclusively providing the destinations. Low rates and balances of migration are found in smaller, often post-industrial urban areas, across all age groups. One further question in the context of this chapter relates to how sex affects the balance of migration movement for different classified areas in Britain. As we have already seen with the earlier maps, patterns of male and female internal migration are broadly similar, with one or two minor exceptions. Examining the in/out ratios for males and females by district classification reveals little difference (Figure 8). In most cases the balance of movement, where it is positive or negative for one sex, is matched by the other. The only differences are with the City of London (where a caveat has already been given in relation to the reliability of this class due to the very small number of in/out migrants) and Typical Towns (where the positive/negative migration balances are very minor anyway).
UNDERSTANDING MIGRATION THROUGH MEASURES OF POPULATION STABILITY

Thus far in this analysis we have concentrated principally on net migration and the balance of movements into and out of areas. We also briefly touched upon the national spatial patterns of intra-district flows and the importance of intra- vis-à-vis inter-district flows. By concentrating on net flows we are not getting an insight into the volume of flow – areas may have very similar net balances, but very different numbers of migrants entering and leaving. This is important for the concept of population stability. For example, a hypothetical area with 1 million residents at the end of a year that had seen 100 residents move into the area and 101 residents depart over that period would have a net migration rate per 1,000 people of -0.001. This rate would be identical if, for the same area, 10,000 residents had moved in and 10,001 residents had moved out. Here it is clear that an identical rate is obscuring a very
Figure 8. In/out migration ratios by sex for families, groups and classes, 2000-01. Sources: 2001 Census SMS and Standard Tables
different turnover of population for the area and a significant change in the composition of the resident population. We may postulate that a stable population is one where year-on-year it is comprised of similar individuals. A less stable population, on the other hand, has much more change in its composition. The limitations of net migration as a measure have been recognised before (Rogers, 1989) and one alternative to measuring net migration has been to measure migration effectiveness (also known as efficiency). This has been used in previous research both as an alternative to and in conjunction with net migration rates (Stillwell et al., 2000; 2001). Indices of migration effectiveness standardise rates of migration by using gross in-migration plus out-migration flows as the denominator rather than populations at risk (PAR) as shown in Chapter 6. The direction of flow is standardised by the magnitude of the flows in both directions
rather than the population of an area, but in doing so, the direction or symmetry of the flow is still of central importance. Consequently, the nature of the migration effectiveness measure does not make it the most suitable option for assessing the relative stability of underlying area populations. Getting to grips with population stability is important as it has implications for social policy. For example Rotolo and Tittle (2006) indicate that the stability of the population in an area may be an important influencing factor for social maladies such as crime or deviant behaviour – or indeed other more positive socially-related phenomena such as general happiness. Whilst natural change (births and deaths) will have some influence on population stability, by far the biggest influence will be migration. It is clear, therefore, that in order to deal with the concept of population stability we need to employ methods other than those more
commonly used to gain an appreciation of migration balances. To do this, we turn our attention to measures of population ‘turnover’ and ‘churn’. The first of these measures, population turnover (TO) by age group a and sex s for area i, is defined here as:

\[ TO_i^{as} = \left( \frac{D_i^{as} + O_i^{as}}{P_i^{as}} \right) \times 1000 \qquad (1) \]

where

• $D_i^{as}$ is the in-migration of those in age group a and sex s to area i;
• $O_i^{as}$ is the out-migration of those in age group a and sex s from area i; and
• $P_i^{as}$ is the population in age group a and sex s in area i.
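As a minimal sketch of why turnover is informative where the net rate is not, the Python below applies equation (1) to the hypothetical area of 1 million residents described earlier. The function and variable names are illustrative assumptions, and age and sex are ignored for simplicity.

```python
def net_migration_rate(inflow, outflow, population):
    """Net migration (inflow minus outflow) per 1,000 population."""
    return (inflow - outflow) / population * 1000


def turnover_rate(inflow, outflow, population):
    """Equation (1): inflow plus outflow per 1,000 population."""
    return (inflow + outflow) / population * 1000


population = 1_000_000
# (in-migrants, out-migrants) for the two cases given in the text.
scenarios = {"low volume": (100, 101), "high volume": (10_000, 10_001)}

summary = {
    label: {
        "net": net_migration_rate(i, o, population),
        "turnover": turnover_rate(i, o, population),
    }
    for label, (i, o) in scenarios.items()
}

for label, s in summary.items():
    print(f"{label}: net {s['net']:.3f}, turnover {s['turnover']:.3f} per 1,000")
```

Both cases give a net rate of -0.001 per 1,000, while turnover rises from roughly 0.2 to roughly 20 per 1,000, which is the difference in population replacement that the net rate conceals.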
The second measure, population churn (CH) by age group a and sex s for area i, includes within-area migration as well as inflows and outflows, and can be defined as:

\[ CH_i^{as} = \left( \frac{D_i^{as} + O_i^{as} + W_i^{as}}{P_i^{as}} \right) \times 1000 \qquad (2) \]

where $W_i^{as}$ is total migrants of age group a and sex s within area i.

Through measures of turnover and churn we can begin to appreciate the perturbing effects of population flows and assess the relative stabilities of populations in different areas. It could be argued that as churn incorporates within-area moves, it gives us more information than turnover. In the context of population stability, however, this assertion should be treated with caution. If we accept stability in this context as being a measure of how much the population in an area remains the
same between the start and end of the period over which migration is measured (a stable population will contain many of the same individuals), then a within-area move will not result in any changes to the individuals in that area. It would be unwise, of course, to suggest that just because an area contains the same individuals after a move has taken place, the move has not resulted in a change in the overall structure of the population within the area. This is where the difficulty lies; the larger the area in question, the more likely it is to feature different population groups and thus experience perturbing within-area moves. The smaller the area, the more likely it is to feature similar groups and experience less perturbing within-area moves – if we accept Tobler’s ‘first law’ of closer places being more related than distant places (Tobler, 1970). Moreover, this does not necessarily need to apply just to geographical distance, but can also be applied to the distances within variable clusters used to compile area classifications. Populations within a class in the Vickers et al. classification are likely to be more similar than populations within a family. Therefore within-class moves are likely to have a less perturbing effect on the underlying population than moves within a family. Despite these difficulties in the precise interpretation of statistics incorporating within-area moves in relation to population stability, it should be reasonably safe to conclude the following: where two areas exhibit the same rates of turnover (inflow + outflow), the area with the higher rate of churn (inflow + outflow + within-flow) is likely to feature the less stable population, regardless of the size of the area. It is for this reason that we use both turnover and churn statistics in this analysis. 
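The conclusion above, that of two areas with equal turnover the one with higher churn is likely to be the less stable, can be sketched as follows. The two districts and all of their flows are invented for illustration, and the function names are assumptions rather than anything defined in the chapter.

```python
def turnover_rate(inflow, outflow, population):
    """Equation (1): inflow plus outflow per 1,000 population."""
    return (inflow + outflow) / population * 1000


def churn_rate(inflow, outflow, within, population):
    """Equation (2): inflow, outflow and within-area moves per 1,000."""
    return (inflow + outflow + within) / population * 1000


# Two hypothetical districts with identical inflows and outflows but
# different volumes of within-area movement.
areas = {
    "District X": {"inflow": 4_000, "outflow": 4_500,
                   "within": 6_000, "population": 100_000},
    "District Y": {"inflow": 4_000, "outflow": 4_500,
                   "within": 12_000, "population": 100_000},
}

rates = {
    name: {
        "turnover": turnover_rate(a["inflow"], a["outflow"], a["population"]),
        "churn": churn_rate(a["inflow"], a["outflow"], a["within"],
                            a["population"]),
    }
    for name, a in areas.items()
}

# With turnover equal, the higher churn marks the less stable population.
less_stable = max(rates, key=lambda name: rates[name]["churn"])
```

Subtracting turnover from churn for an area gives the within-area rate on its own, which may be a useful third quantity to inspect when comparing areas of very different sizes.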
Table 1 shows that the highest net migration rates for district families in Britain are Urban London and Rural UK, with considerably more people moving in than out, and out than in respectively for these two families. Both Urban UK and Prosperous Britain appear relatively unimportant when compared to these two areas. However, if we turn our focus to population stability, as different
Table 1. Net migration, turnover and churn statistics by sex and district classification, 2000-01
(Net = net migration rate; Turnover = turnover rate; Churn = churn rate)

District classification                        Total                          Male                        Female
(Family, Group, Class)                 Net  Turnover     Churn       Net  Turnover     Churn       Net  Turnover     Churn
A: Urban UK                          -0.23     38.50    128.65     -0.35     40.25    130.40     -0.13     36.86    128.00
A1: Industrial Legacy                -0.95     38.58    110.50     -1.41     39.36    111.44     -0.52     35.91    109.62
A1a: Industrial Legacy               -0.95     38.58    110.50     -1.41     39.36    111.44     -0.52     35.91    109.62
A2: Established Urban Centres        -1.69     43.92    126.48     -1.85     46.06    128.11     -1.53     41.93    124.96
A2a: Struggling Urban Manufac.       -2.78     52.03    123.74     -2.88     54.70    125.14     -2.68     49.53    122.42
A2b: Regional Centres                 1.96     95.40    184.51      2.58    100.13    189.55      1.40     91.00    179.83
A2c: Multicultural England           -3.13     48.63    128.37     -3.60     49.09    128.99     -2.69     46.25    126.78
A2d: M8 Corridor                     -0.26     35.34    105.99     -0.58     38.51    108.86      0.03     33.36    104.26
A3: Young and Vibrant Cities          3.38     84.92    180.47      3.70     88.99    184.19      3.07     82.01    176.94
A3a: Redeveloping Urban Centres       3.89     81.13    173.63      4.03     84.27    178.44      3.76     78.15    170.01
A3b: Young Multicultural              1.61    108.79    209.38      2.56    112.24    213.17      0.71    105.54    205.81
B: Rural UK                           2.72     43.25    120.92      2.59     44.97    122.83      2.85     41.62    119.11
B1: Rural Britain                     3.55     65.34    132.74      3.60     68.94    135.46      3.51     62.87    130.14
B1a: Rural Extremes                   0.67     65.31    134.88      0.99     68.34    136.68      0.37     63.37    133.16
B1b: Agricultural Fringe              4.32     76.02    138.18      4.32     79.34    141.18      4.33     72.88    135.33
B1c: Rural Fringe                     4.09     82.99    140.53      4.07     86.69    144.16      4.11     79.42    138.04
B2: Coastal Britain                   6.59     61.82    138.73      6.90     68.32    141.57      8.04     62.20    136.55
B2a: Coastal Resorts                  8.27     84.09    160.15      8.84     88.34    163.81      6.75     81.14    156.82
B2b: Aged Coastal Extremities         5.20     59.96    134.37      5.28     66.22    138.10      6.35     61.51    133.97
B2c: Aged Coastal Resorts            10.62     86.84    144.07     11.51     90.82    148.73      9.81     83.24    140.77
B3: Averageville                     -0.25     59.77    124.28     -0.76     63.55    126.88     -0.16     59.32    123.59
B3a: Mixed Urban                     -0.45     61.85    121.33     -1.17     66.34    124.21     -0.39     62.39    121.30
B3b: Typical Towns                    0.09     68.87    135.17     -0.06     70.39    138.22      0.23     65.46    133.20
B4: Isles of Scilly                  18.79    134.83    200.84     19.74    138.16    202.07     15.86    131.53    199.63
B4a: Isles of Scilly                 18.79    134.83    200.84     19.74    138.16    202.07     15.86    131.53    199.63
C: Prosperous Britain                -0.52     66.56    138.67     -0.22     68.83    141.75     -0.81     64.38    135.70
C1: Prosperous Urbanites
0.58
101.67
169.70
1.10
105.60
174.08
0.09
98.90
165.51
C1a: Historic Cities
3.48
105.79
175.09
3.89
109.75
179.22
3.10
102.00
171.14
C1b: Thriving Outer London
-2.41
106.56
168.72
-1.77
110.99
173.62
-3.03
102.30
164.01
C2: Commuter Belt
-1.07
76.14
136.78
-0.87
78.94
139.92
-1.26
73.45
133.75
C2a: The Commuter Belt
-1.07
76.14
136.78
-0.87
78.94
139.92
-1.26
73.45
133.75
D: Urban London
-8.53
63.23
151.26
-8.17
65.15
153.93
-8.86
61.42
148.74
D1: Multicultural Outer London
-8.01
92.18
149.87
-8.51
94.50
152.13
-8.49
89.98
148.75
D1a: Multicultural Outer London
-8.01
92.18
149.87
-8.51
94.50
152.13
-8.49
89.98
148.75
D2: Mercantile Inner London
-10.21
148.24
225.15
-11.12
151.25
229.36
-9.36
143.54
221.25
D2a: Central London
-10.29
148.35
225.14
-11.31
151.40
229.42
-9.34
143.62
221.20
D2b: City of London
2.79
279.91
301.35
16.95
276.14
294.65
-13.45
284.22
309.03
D3: Cosmopolitan Inner London
-8.22
111.47
178.96
-8.31
114.94
182.16
-9.08
108.17
173.97
D3a: AfroCaribbean Ethnic Bor.
-6.07
128.47
190.17
-4.74
131.44
194.68
-8.33
123.74
185.93
D3b: Multicultural Inner London
-11.81
118.44
174.39
-11.23
119.49
173.90
-12.04
113.91
169.74
Sources: 2001 Census SMS and Standard Tables
picture is presented. The size of the underlying PAR for Rural UK in comparison with the other district families means that, despite the very large volume of in-migrants and the correspondingly high in-migration rate, the populations of areas within Rural UK are relatively stable compared to other families. Looking at the total population,
male and female turnover and churn statistics for the four families and comparing them to both the averages for all families, groups and classes and each other, it is clear to see that Rural UK has much greater stability, scoring a rate in the low 40s for turnover (the second lowest of the four families) and around 120 for churn (the lowest).
On the other hand, it appears that Urban London retains much of the importance shown in the net migration statistics. The turnover rates for males, females and overall are the second highest of all families (although the Inner London groups D2 and D3 are the highest of all groups) and churn rates are the highest across the board, for families, groups and classes (ignoring Isles of Scilly, which as mentioned before tends to be an outlier due to small gross population flows). The high level of churn suggests that movement of population within districts in Urban London is more significant than it is for other family categories. Prosperous Britain is interesting because, whilst it has a low net migration rate (especially for males), the rates of population turnover, in total and for both sexes, are higher than for any other family. This means that despite far fewer people moving in and out of districts in Prosperous Britain than in Rural UK and Urban London, the movement is more perturbing than it is for all other families. At a more aggregate level, the Urban UK family is perhaps least interesting as it has the lowest rates for turnover and the second lowest for churn, after Rural UK (its total net migration rate is also the smallest in absolute terms). Drilling down through the classification, however, reveals that for some groups and classes (e.g. Regional Centres and
Young and Vibrant Cities), turnover and churn rates are relatively higher. Comparing all families, groups and classes, it is evident that males, on the whole, have higher rates of population turnover and churn than females, but lower rates of net in- and out-migration. This may seem counter-intuitive and needs explanation. Net migration will only indicate the balance of movement in relation to the population; we can see if an area is gaining or losing population, and the relative level of this gain or loss. Rates of turnover and churn take into account total population movements and will not give an indication of the balance of movement but will give a standardised measure of the amount of movement in relation to the PAR. Increased rates of turnover and churn mean more people moving in total in relation to the underlying population, whereas higher levels of net migration just show that there are more people moving in a particular direction. Table 1 shows that when females move in or out of family, group or class categories, the balance of movement leans more heavily to either in-migration or out-migration (except in Urban UK); in other words, the direction of flow is more asymmetrical. Total turnover and churn rates for
Figure 9. Age-specific population turnover rates, Vickers et al. groups, 2000-01. Sources: 2001 Census SMS and Standard Tables
Figure 10. Age-specific population churn, Vickers et al. groups, 2000-01. Sources: 2001 Census SMS and Standard Tables
females in most cases are low when compared to males – there are fewer females moving in relation to the underlying PAR. The net rate of male movement is lower for almost every district type, i.e. these movements are more balanced in either direction than those of females. In other words, whilst there are lower total rates of movement, the female population is being redistributed around the country through net migration at a rate that is faster than for males. A possible explanation for this may well be the greater number of females than males in the student population. One important further question is how turnover and churn statistics vary with age in the context of the district classification already used in this analysis. Figures 9 and 10 help to answer this question and reveal some important features of turnover and churn in relation to different groups. We should note here that we just use total person data rather than sex disaggregation as there is little difference when the data are also disaggregated by group, as they are here. The first point to note on turnover (Figure 9) is that the age schedule is very similar in structure to the schedules calculated for standard rates, with identifiable pre-labour force, labour force and post-labour force components. As with standard rates, the peak in all cases is
at the 20-24 age group. The height of the peak, however, varies considerably between groups, with D2 (Mercantile Inner London) being the highest, reaching a turnover rate of 400 per 1,000 population, compared to A1 (Industrial Legacy) only reaching a rate of around 125 per 1,000 population. It is noticeable that A1 and A2 (Established Urban Centres) are close together at the bottom of the schedule at this peak age, suggesting that, even at this most mobile age, populations in these two Urban UK groups are relatively stable, choosing to move between areas much less frequently than similarly aged populations residing in different groups. At the other end of the schedule, the population of Mercantile Inner London is unstable when compared to all other areas. A slightly different pattern emerges for churn (Figure 10). When intra-group moves are taken into consideration, group A3 (Young and Vibrant Cities) becomes the most important at the peak 20-24 age. The top two groups in the turnover schedule at this peak age, D2 and C1, still maintain high rates, only just below that of A3. Group A1 still remains lowest at this peak age. What this suggests is that, at this peak age group, movement between areas within the Young and Vibrant Cities group is more important than movement into and out of
it. The least stable populations are in Mercantile Inner London and the Prosperous Urbanites as they feature highly on both metrics, and continue to do so for most age groups. The most stable populations for all age groups are found in Industrial Legacy. Evidence from Figures 9 and 10 enables us to conclude that age has a similar influence on population stability as it does on standard net migration. One question arising from this observation is to what extent are the differences in turnover and churn rates between classifications of district attributable to the ages of migrants moving into, out of and within these areas? In order to answer this we standardised rates of turnover and churn by the age ranges provided in the original data, and also by the broad 15-year age ranges used earlier, using a method of ‘direct standardisation’ proposed by Rowland (2006). Taking the standardised turnover rate for area i, STO_i, as the example:

$$STO_i = \frac{\sum_a \left( to_i^a \, P_*^a \right)}{\sum_a P_*^a} \qquad (3)$$

where the turnover rate for age group a in zone i is

$$to_i^a = \left( \frac{O_i^a + D_i^a}{P_i^a} \right) 1000 \qquad (4)$$

where

$$O_i^a = \sum_{j \neq i} M_{ij}^a \qquad (5)$$

is the total outflows in age group a from zone i to other zones,

$$D_i^a = \sum_{j \neq i} M_{ji}^a \qquad (6)$$

is the total inflows in age group a to zone i from other zones, and

$$P_*^a = \sum_i P_i^a \qquad (7)$$

is the population in age group a in all zones in the system.

Calculating the age standardised rate of churn follows the same form but we add the term W_i^a to represent within zone moves so that the standardised churn rate for zone i is:

$$STCH_i = \frac{\sum_a \left( ch_i^a \, P_*^a \right)}{\sum_a P_*^a} \qquad (8)$$

where the churn rate for age group a in zone i is:

$$ch_i^a = \left( \frac{O_i^a + D_i^a + W_i^a}{P_i^a} \right) 1000 \qquad (9)$$

This method of age standardisation produces the figures shown in Figure 11. It appears that standardisation has little impact on rates of turnover at the family level. Where age does have an influence is on the churn statistics. Figure 11 shows that while Urban London ranks the highest for non-standardised churn rates, when age standardisation is applied Prosperous Britain becomes more important. This suggests that much of the increased churning of the population in Urban London is down to the increased number of young migrants and their increased propensity to move within the London area. Similarly the increasing importance of Rural UK relative to Urban UK is likely to be down to the older age structure in Rural UK leading to a lower churning of the population than it would experience with a standard age structure.
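The direct standardisation in equations (3) to (9) can be sketched in a few lines of Python. This is an illustrative sketch only: the two zones, the two age groups and all counts are hypothetical, and the function and dictionary names (age_specific_rates, directly_standardised, "O", "D", "W", "P") are assumptions made here rather than anything defined in the source data.

```python
def age_specific_rates(outflow, inflow, population, within=None):
    """Equations (4) and (9): rate per 1,000 for each age group in one zone."""
    rates = {}
    for a in population:
        moves = outflow[a] + inflow[a]
        if within is not None:  # include within-zone moves for churn
            moves += within[a]
        rates[a] = 1000 * moves / population[a]
    return rates


def directly_standardised(rates, standard_pop):
    """Equations (3) and (8): weight age-specific rates by the system population."""
    weighted = sum(rates[a] * standard_pop[a] for a in rates)
    return weighted / sum(standard_pop[a] for a in rates)


# Hypothetical zones with identical age-specific behaviour but opposite age structures.
zones = {
    "X": {"P": {"young": 1000, "old": 3000}, "O": {"young": 100, "old": 30},
          "D": {"young": 100, "old": 30}, "W": {"young": 200, "old": 60}},
    "Y": {"P": {"young": 3000, "old": 1000}, "O": {"young": 300, "old": 10},
          "D": {"young": 300, "old": 10}, "W": {"young": 600, "old": 20}},
}

# Equation (7): the standard population is the sum over all zones in the system.
standard_pop = {a: sum(z["P"][a] for z in zones.values()) for a in ("young", "old")}

results = {}
for name, z in zones.items():
    to = age_specific_rates(z["O"], z["D"], z["P"])
    ch = age_specific_rates(z["O"], z["D"], z["P"], within=z["W"])
    crude = 1000 * (sum(z["O"].values()) + sum(z["D"].values())) / sum(z["P"].values())
    results[name] = {
        "crude_turnover": crude,
        "STO": directly_standardised(to, standard_pop),
        "STCH": directly_standardised(ch, standard_pop),
    }

for name, r in results.items():
    print(name, r)
```

In this constructed case the crude turnover rates differ sharply between the two zones, while the standardised rates coincide, because the whole of the crude difference comes from age structure. It is a toy illustration of the kind of effect that standardisation is designed to isolate.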
CONCLUSION This chapter has reviewed in detail the patterns of internal migration by age and sex within Britain from the 2001 Census. We began by reviewing the influence of age and sex on the propensity to migrate and were able to see that whilst age has a more dramatic influence than sex, when the two are combined females have significantly greater
Figure 11. Table of Age standardised rates of turnover and churn for Vickers et al. district families, 2000-01. Sources: 2001 Census SMS and Standard Tables
migration propensities than males at the age of peak migration. We have observed that when migrants are divided into broad age groups, this has a dramatic effect on the direction of migration. Migrants aged 16-29 have an overall balance of movement which is, on the whole, the opposite of all other age groups. These migrants are on balance moving quite heavily into larger urban areas, especially London and the university cities up and down the country, and moving away from more rural areas in contrast to all other age groups, where the net direction of movement is, in most cases, the reverse. In this analysis we have also introduced the concept of population stability and taken into consideration the importance of intra-area flows. We have shown that at the district level, intra-area flows are more important than inter-area flows and that this importance also differs by sex, with females migrating more frequently than males in their late teens and far more of these moves being shorter distance, within-area moves – a difference that is maintained at the age where individuals are most likely to be migrants, 20-24. Through examining these within-area flows along with total volumes of movement we have begun to gain a greater understanding of population stability. Indices of turnover and churn have shown that some areas (such as those found in Prosperous Britain), whilst exhibiting relatively low net rates of movement, actually have large numbers of migrants moving in, out and within – movements
which impact on the stability of the populations in these areas. Furthermore, it has been shown that turnover and churn vary by sex, such that we can conclude that whilst, overall, males might move more than females, female movements result in a greater redistribution of this population around the country. The findings from this chapter have been based on data from the 2001 Census. It was pointed out earlier that other data sets exist which would enable research in relation to the flows of people post-2001. NHS patient registration data will allow us to examine how far these patterns identified in 2001 have changed or been sustained. One way in which we may be able to monitor migration change effectively is to use an area classification similar to that used in this analysis. However, we believe that a productive way forward would be to develop an alternative migration-based area classification. Using a classification excluding migration variables (such as the Vickers et al. or ONS local authority district classification) is not necessarily optimal for monitoring trends. Migration may or may not be indicative of the underlying population in the origin or destination. As such, we contend that it may be more useful to re-classify areas according to the attributes of migrants moving in, out or within particular areas. Once a new classification based on census migration variables has been created, it would be appropriate to use the classification with data
from the NHS patient registers and other sources to monitor migration patterns year-on-year.
REFERENCES
Bates, J., & Bracken, I. (1982). Estimation of migration profiles in England and Wales. Environment & Planning A, 14(7), 889–900. doi:10.1068/a140889
Cadwallader, M. (1989). A synthesis of macro and micro approaches to explaining migration: evidence from inter-state migration in the United States. Geografiska Annaler B, 2, 85–94. doi:10.2307/490517
Champion, A. G. (Ed.). (1989). Counterurbanisation: The Changing Pace and Nature of Population Deconcentration. London: Edward Arnold.
Champion, A. G. (2005). Population movement within the UK. In Chappell, R. (Ed.), Focus on People and Migration (pp. 92–114). Basingstoke, UK: Palgrave Macmillan.
Champion, A. G., & Coombes, M. (2007). Using the 2001 census to study human capital movements affecting Britain’s larger cities: insights and issues. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 170(2), 1–20. doi:10.1111/j.1467-985X.2006.00459.x
Champion, A. G., Coombes, M., Raybould, S., & Wymer, C. (2007). Migration and Socio-economic Change: A 2001 Census Analysis of Britain’s Larger Cities. York, UK: Joseph Rowntree Foundation.
Faggian, A., McCann, P., & Sheppard, S. (2007). Some evidence that women are more mobile than men: gender differences in UK graduate migration behaviour. Journal of Regional Science, 47(3), 517–539. doi:10.1111/j.1467-9787.2007.00518.x
Farr, M., & Webber, R. (2001). MOSAIC: From an area classification system to individual classification. Journal of Targeting, Measurement and Analysis for Marketing, 10(1), 55–65. doi:10.1057/palgrave.jt.5740033
Fielding, A. J. (1992). Migration and social mobility: South East England as an escalator region. Regional Studies, 26(1), 1–15. doi:10.1080/00343409212331346741
Finney, N., & Simpson, L. (2008). Internal migration and ethnic groups, evidence for Britain from the 2001 census. Population Space and Place, 14, 63–83. doi:10.1002/psp.481
Gober-Meyers, P. (1978). Migration analysis: the role of geographical scale. The Annals of Regional Science, 12(3), 52–61. doi:10.1007/BF01286122
ONS. (2006). Social Trends 36. Basingstoke: Social Trends.
Owen, D. (1997). Migration by minority ethnic groups within Great Britain in the early 1990s. In 28th annual conference of the British and Irish section of the Regional Science Association International. Falmouth College of Arts.
Raymer, J., Abel, G., & Smith, P. W. F. (2007). Combining census and registration data to estimate detailed elderly migration flows in England and Wales. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 170(4), 891–908. doi:10.1111/j.1467-985X.2007.00490.x
Raymer, J., Bonaguidi, A., & Valentini, A. (2006). Describing and projecting the age and spatial structures of interregional migration in Italy. Population Space and Place, 12(5), 371–388. doi:10.1002/psp.414
Rogers, A. (1989). Requiem for the net migrant. Population Program Working Paper No. WP-895, Institute of Behavioural Science, University of Colorado, Boulder.
Rogers, A., & Castro, L. J. (1981). Model migration schedules. Research Report-81-30. International Institute for Applied Systems Analysis, Laxenburg.
Rogers, A., Raymer, J., & Willekens, F. (2002). Capturing the age and spatial structures of migration. Environment & Planning A, 34(2), 341–359. doi:10.1068/a33226
Rotolo, T., & Tittle, C. R. (2006). Population size, change, and crime in U.S. cities. Journal of Quantitative Criminology, 22, 341–368. doi:10.1007/s10940-006-9015-x
Rowland, D. T. (2006). Demographic Methods and Concepts. Oxford, UK: Oxford University Press.
Stillwell, J. (2008). Inter-regional migration modelling: a review and assessment. In Poot, J., Waldorf, B., & Van Wissen, L. (Eds.), Migration and Human Capital: Regional and Global Perspectives. Cheltenham, UK: Edward Elgar.
Stillwell, J., Bell, M., Blake, M., Duke-Williams, O., & Rees, P. (2000). Net migration and migration effectiveness: a comparison between Australia and the United Kingdom, 1976-96. Part 1: total migration patterns. Journal of Population Research, 17(1), 17–38. doi:10.1007/BF03029446
Stillwell, J. C. H., Bell, M., Blake, M., Duke-Williams, O., & Rees, P. (2001). Net migration and migration effectiveness: a comparison between Australia and the United Kingdom, 1976-96. Part 2: age related migration patterns. Journal of Population Research, 18(1), 19–39. doi:10.1007/BF03031953
Tobler, W. R. (1970). A computer model simulation of urban growth in the Detroit region. Economic Geography, 46(2), 234–240. doi:10.2307/143141
Vickers, D., & Rees, P. (2006). Introducing the area classification of output areas. Population Trends, 125, 15–29.
Vickers, D., & Rees, P. (2007). Creating the UK National Statistics 2001 output area classification. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 170(2), 379–403. doi:10.1111/j.1467-985X.2007.00466.x
Vickers, D., Rees, P., & Birkin, M. (2003). A new classification of UK local authorities using 2001 Census key statistics. Working Paper 03/03, School of Geography, University of Leeds, Leeds.
Chapter 9
Internal Migration Propensities and Patterns of London’s Ethnic Groups John Stillwell University of Leeds, UK
ABSTRACT The ethnic dimension of internal migration in Britain below the district scale has been understudied despite its importance for understanding local and community development. Data from the 2001 Census shows that migration propensities by ethnic group and age for London migrants differ considerably from national migration propensities, especially when migrants within London are distinguished from those arriving in or leaving the capital. Whilst disaggregating ward net migration on this basis reveals processes of deconcentration within London, dispersal from outer wards to the rest of the country and net in-migration to inner wards from outside London and from overseas, patterns of net movement vary by ethnicity and age, influenced by the geographical pattern of ethnic population residential location. Evidence from an analysis of net migration, population concentration and deprivation by quintile group suggests that migrants from most of the non-white ethnic groups are tending to move within London to areas containing lower proportions of those in the same ethnic group. White migrants, on the other hand, are moving towards areas with higher white population concentrations. Finally, there is a tendency for all ethnic groups to move away from more deprived wards towards less deprived areas within London, particularly Indians aged over 25.
DOI: 10.4018/978-1-61520-755-8.ch009
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION London has by far the largest concentration of ethnic minorities amongst its population compared with anywhere else in Britain, providing the most suitable region within which to investigate the internal migration of ethnic groups at ward level. Whilst much research attention has been paid to ethnic population concentration in Britain (e.g. Phillips, 1998; Scott et al., 2001; Johnson et al., 2001) and in London in particular (e.g. Peach, 1996; Peloe & Rees, 1999; Johnson et al., 2002), studies of
ethnic internal movement remain relatively few and far between. Champion (2005) has mapped white and non-white net migration at district scale, Finney & Simpson (2008) have used the 2001 Individual Samples of Anonymised Records (SAR) to identify characteristics of ethnic group migrants, and the spatial patterns of migration have been investigated at district level using different district classification frameworks by Stillwell & Hussain (2008) and Hussain & Stillwell (2008). The DMAG Briefing by Mackintosh (2005) is one of the few studies examining internal migration by ethnic group in London. The aim of this chapter is to provide some insights into our understanding of the propensities, patterns and processes of ethnic migration in the capital city and between London wards and the rest of England and Wales using existing data from the Special Migration Statistics (SMS) but also specially commissioned tables provided by the Office for National Statistics (ONS) that cross-classify migration by ethnic group and by age. The analyses of these data allow the following research questions or objectives to be addressed:
i. How do migration propensities vary between those ethnic groups who were resident in London wards at the time of the 2001 Census and how does London compare with Britain as a whole in this respect? We know that, nationally, Asian groups have the lowest propensities to migrate and Chinese have the highest, particularly between districts (Finney and Simpson, 2008; Hussain and Stillwell, 2008), but our aim here is to consider how propensities vary between ethnic groups for moves taking place within London and how intensities of migration into and out of London for non-white groups compare with rates of migration for the white group.
ii. What spatial patterns of ethnic migration are evident in London at the ward level when we use net migration as a summary variable and does the geographical variation tell us anything about processes of ethnic concentration or dispersal? Whilst the commissioned tables introduced in the next section do not provide data on ward to ward flows in London, they do allow the computation of ethnic net migration balances for wards and the analysis reported later shows how geographical patterns of net migration across the city for flows within London are very different to those patterns for flows between wards in London and the rest of England and Wales.
iii. Is there any evidence in London of ethnic groups moving away from or towards areas of ethnic concentration and from areas of higher deprivation to areas of lower deprivation? One of the key debates in the literature, particularly in the USA (e.g. Frey, 1996; Ellis & Wright, 1998), has been that relating to the concept of ‘white flight’, the notion that white people are leaving areas with increasing non-white populations with the inevitable consequence of greater ethnic polarisation. Trends such as these, the processes that underpin them, and the role of internal migration in the dynamics of neighbourhood change have not been explored very thoroughly in Britain or elsewhere in Europe for that matter. These ideas provide some context for an analysis of ward net migration rates and location quotients by quintile. The chapter investigates whether there is any evidence to indicate that ethnic groups are concentrating in areas where members of the same ethnic group reside in London and examines net migration rates in relation to deprivation using the Townsend index for 2001.
Before reporting on these analyses, the following two sections introduce the data being used and present a short overview of the ethnic characteristics of London’s population in 2001.
Table 1. Ethnic groups defined in the 2001 census

| Ethnic group defined in Special Migration Statistics (Level 1) | Ethnic group defined in Key Statistics |
|---|---|
| White | White British; White Irish; Other white |
| Indian | Indian |
| Pakistani and Other South Asian | Pakistani; Bangladeshi; Other Asian |
| Chinese | Chinese |
| Caribbean, African, Black British and Black Other | Caribbean; African; Other Black |
| Mixed | White and Black Caribbean; White and Black African; White and Asian; Other mixed |
| Other | Other |
MIGRATION DATA BY ETHNICITY AND AGE GROUP AT WARD SCALE Data available for migration flows at the ward level (level 2) from the 2001 Census Special Migration Statistics (SMS) are relatively limited. Five tables provide ward-to-ward flows as follows: age by sex (MG201); moving groups (MG202); ethnic group by sex (MG203); moving groups by NS-SEC of group reference person (MG204); and moving groups by tenure (MG205). Only two of these tables (MG201 and MG203) contain counts of individual migrants since the other three relate to moving groups. Table MG203 is the ethnic table but only contains counts classified as total, white and non-white by sex. There is a more detailed breakdown of non-white ethnicity at level 1 (districts), where seven ethnic groups are classified in Table MG103, aggregations of the 16 groups used in the main census tables, with the white group containing all other white migrants as well as white British and white Irish (Table 1). Thus, whilst SMS Table MG201 provides data by age group, there is no cross-classification of migrants by ethnic group and age group simultaneously at district level and no ethnic breakdown beyond white/non-white at ward level. Following a request to ONS for data on flows between wards disaggregated by age and ethnicity, ONS advised that due to confidentiality constraints and the need to impose disclosure control
procedures using small cell adjustment methods (SCAM) on all commissioned data, any matrix of ward-to-ward migration flows disaggregated by ethnic group and quinary age group would be rendered unusable because of the number of cells with small values. However, ONS Customer Services agreed to supply migration data in commissioned tables at different spatial scales by ethnic group for broad age groups. Commissioned Table C0711 contains counts of migrants between districts of England and Wales for seven ethnic groups and seven age groups which have been adjusted for SCAM and have been used for analysis reported elsewhere (Hussain & Stillwell, 2008). Commissioned Table C0723, on the other hand, was provided in two parts. Part 1 contains region to ward flows and Part 2 contains ward to region flows. More specifically, Part 1 contains migration flows during the year before the 2001 Census into all wards in England and Wales at census date from all Government Office Regions (GOR) in England and Wales, Scotland and Northern Ireland one year before the Census, by ethnic group and broad age group. Figure 1 is an example of the table structure for flows into wards in the North East Region from other GORs. The migration flow data are provided for all ages but also disaggregated into seven age groups (0-15; 16-19; 20-24; 25-29; 30-44; 45-59; 60+) for each of seven ethnic groups indicated in Table 1. Figure 1 shows the table layout for white migrants by age group
Figure 1. Example section of commissioned table C0723 (Part 1): migration from regions to wards in the north east
and the first age group for Indians (columns). The initial set of rows within the table contains cells for flows for migrants into the North East from other regions and subsequent flows are for individual wards, Brinkburn being the first of these. Part 2 of Table C0723 contains migration flows in the opposite direction from wards one year previous to the census in England and Wales into GORs in England and Wales at census date, disaggregated by the same classification of 14 ethnic/age groups. The age bands reflect stages in the life course: children aged 0-15 who tend to migrate with their parents; teenagers aged 16-19 whose age range captures the movement away from home of those into their first independent living arrangement, including those moving to higher education; young adults aged 20-24 including those likely to be moving on from university into work as well as those moving between jobs or leaving the parental home for the first time; those in their late 20s also likely to be driven by economic forces or the desire to get onto the housing ladder; those aged 30-44 more likely to be moving to residential space more suitable for families; more mature migrants of working age (45-59) who may be looking to downsize their homes after their children have moved away; and the final 60 and over age group
which contains a mixture of migrants including those moving for retirement reasons as well as those in elderly age groups seeking to be nearer service facilities or family. In addition to flows from regions to wards and wards to regions, the commissioned tables contain counts of migrants from outside the UK, numbers of people with the same address at the Census and one year previously and counts of those with no usual address one year previously. It should be noted that while these categories appear as rows in this table and appear to be cross-tabulated with age and ethnicity, data values only appear where these other origin categories cross-tabulate with the same categories on the other axis. As a result, origins outside the UK, same address and no usual address can only be cross-tabulated with ethnicity, not by age as well. One error has been identified by ONS in the coding of some migrants and this has resulted in 3,191 migrants across England and Wales with country of origin within the UK being counted as migrants from outside the UK. This error inflates the count of migrants from outside the UK by less than 1% and underestimates the number of migrants moving from an address in the UK by less than 0.1%. The data from this commissioned table are used in this paper to
Internal Migration Propensities and Patterns of London’s Ethnic Groups
Table 2. Region to ward and ward to region flows within London from C0723

| Ethnic group | Inflow to London wards | Outflow from London wards | Difference |
|---|---|---|---|
| White | 444,001 | 444,000 | 1 |
| Indian | 30,551 | 30,595 | -44 |
| POSA | 40,569 | 40,600 | -31 |
| Chinese | 8,523 | 8,443 | 80 |
| Black | 79,858 | 79,759 | 99 |
| Mixed | 24,572 | 24,526 | 46 |
| Other | 13,767 | 13,672 | 95 |
| Total | 641,841 | 641,595 | 246 |

| Age group | Inflow to London wards | Outflow from London wards | Difference |
|---|---|---|---|
| 0-15 | 110,250 | 110,306 | -56 |
| 16-19 | 26,884 | 26,895 | -11 |
| 20-24 | 104,043 | 103,959 | 84 |
| 25-29 | 143,046 | 143,005 | 41 |
| 30-44 | 186,224 | 186,195 | 29 |
| 45-59 | 41,664 | 41,587 | 77 |
| 60+ | 29,730 | 29,648 | 82 |
| Total | 641,841 | 641,595 | 246 |

Source: 2001 Census Table C0723
examine migration for 628 wards comprising the 33 boroughs of London. The age and ethnic group structure of C0723 was derived through a process of negotiation with ONS. Since the cells in each part have to be randomly adjusted using the SCAM to avoid the release of confidential data, it was important to produce tables containing flows where the impact of SCAM was minimised. The impact of SCAM means that the flows into and out of wards within any one region will not necessarily be consistent. Table 2 contains a summary of the total flows into wards in London from the London region itself and flows out of wards in London to the London region for each ethnic group and each age group. It is encouraging that SCAM results in small differences in the flows from the two parts of the table, amounting to a difference of 246 in total, a very small percentage (under 0.04%) of the overall volume of migration taking place.
The precise magnitude of migration taking place within London derived from the commissioned data compares with a total of 644,904 migrants moving within and between wards extracted from SMS Table MG203, which is composed of 444,755 whites and 200,149 non-whites. Thus the commissioned tables provide counts for total, white and non-white migrants that are slightly lower in each case than totals from the SMS, a difference likely to be due to SCAM. In both cases, flows with no usual residence one year ago and with origins outside the UK one year ago have been excluded. Before looking at these data in more detail, the next section presents a short outline of the complexion and geographical distribution of London’s ethnic populations.
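The consistency checks described above amount to simple arithmetic on the published totals. The following is a minimal sketch using only the counts quoted in Table 2 and in the text; the variable and function names are illustrative assumptions and do not come from the ONS tables.

```python
# Counts quoted in Table 2 and the accompanying text; names are illustrative.
part1_inflow = 641_841    # London region -> London wards (Part 1 of C0723)
part2_outflow = 641_595   # London wards -> London region (Part 2 of C0723)
white_commissioned = 444_001
sms_total, sms_white, sms_nonwhite = 644_904, 444_755, 200_149


def pct_of(difference: int, base: int) -> float:
    """Express an absolute difference as a percentage of a base count."""
    return abs(difference) / base * 100


# Inconsistency between the two independently adjusted parts of the table.
scam_gap = part1_inflow - part2_outflow
scam_gap_pct = pct_of(scam_gap, part1_inflow)

# Shortfall of the commissioned counts relative to the SMS counts.
nonwhite_commissioned = part1_inflow - white_commissioned
shortfall = {
    "total": sms_total - part1_inflow,
    "white": sms_white - white_commissioned,
    "non-white": sms_nonwhite - nonwhite_commissioned,
}

print(f"Part 1 minus Part 2: {scam_gap} ({scam_gap_pct:.3f}% of inflows)")
for group, gap in shortfall.items():
    print(f"SMS minus commissioned, {group}: {gap}")
```

All three shortfalls are positive, which is the sense in which the commissioned counts are slightly lower than the SMS totals in each case.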
Table 3. London’s ethnic population, 2001

| Ethnic group | London population | % in London | % in GB | London share of GB population | Index of segregation GB | Index of segregation London |
|---|---|---|---|---|---|---|
| White | 5,103,203 | 71.2 | 91.9 | 9.7 | 0.53 | 0.36 |
| Indian | 436,993 | 6.1 | 1.8 | 41.5 | 0.57 | 0.40 |
| POSA | 429,700 | 6.0 | 2.2 | 33.6 | 0.56 | 0.45 |
| Chinese | 80,201 | 1.1 | 0.4 | 33.0 | 0.32 | 0.31 |
| Black | 782,849 | 10.9 | 2.0 | 68.2 | 0.65 | 0.32 |
| Mixed | 226,111 | 3.2 | 1.2 | 33.6 | 0.34 | 0.21 |
| Other | 113,034 | 1.6 | 0.4 | 49.3 | 0.44 | 0.32 |
| Total | 7,172,091 | 100.0 | 100.0 | 12.6 | - | - |

Source: 2001 Census Standard Tables
LONDON’S ETHNIC POPULATIONS
London’s total population had reached almost 7.2 million in April 2001 according to the 2001 Census, of which just over 2 million were non-white residents, representing 29% of the capital’s population. This compares with an 8% non-white share of the population resident in Great Britain in 2001. As Table 3 shows, whereas London contains less than 10% of the country’s white population, it is the location of over two thirds of the black population, almost 50% of the non-white other population, 42% of the Indian population and around a third of the Pakistani and other South Asian (POSA), Chinese and mixed populations. The largest of the non-white populations is the
black group with over 782,000 residents although the Asian groups collectively contain over 850,000 people. It is the relative size of the black population amongst all non-white groups that is the major difference between the population composition of London and Britain as a whole (Figure 2). Table 3 also provides indices of segregation for ethnic groups in London compared with Britain. The segregation index is computed as $0.5 \sum_i \left| P_{ie}/P_{*e} - P_{ir}/P_{*r} \right|$ at district (borough in London) level, where $P_{ie}/P_{*e}$ is the proportion of the population in ethnic group e that lives in district i and $P_{ir}/P_{*r}$ is the proportion of the rest of the population that lives in district i. Given the larger share of the non-white populations living in London, it is not surprising that the indices are significantly lower
Figure 2. Ethnic group shares of the non-white population, Britain and London, 2001. Source: 2001 Census Standard Tables
Figure 3. Ethnic composition of boroughs of London, 2001. Source: 2001 Census Standard Tables
for London, except for the Chinese, the smallest of all the non-white populations. The indices indicate that whilst the black population is most segregated nationally, blacks in London are less spatially segregated than either of the Asian groups at the borough scale. The spatial distribution of ethnic groups by borough is shown in Figure 3, where the white population share is shown by ranged choropleth shading and the non-white population shares are represented by pie charts, proportional in size to the resident population in 2001. Newham and Brent have the largest non-white populations, with large concentrations of blacks in Hackney and Haringey as well as south of the river in Lambeth, Southwark, Lewisham, Wandsworth and Greenwich. Indians make up high proportions of the population in west London boroughs of Brent, Ealing, Hillingdon, Hounslow and Harrow whereas the POSA group is prominent in Tower Hamlets and Newham. The concentrations of different ethnic groups at the ward level can be represented using location quotients defined as $(P_{ie}/P_{i*})/(P_{*e}/P_{**})$, where the share of population in ward i in ethnic group e is standardised by the share of that ethnic group in the population of London as a whole. Figure
4 illustrates the location quotients for whites, whereas Figure 5 shows the distribution for each of the six non-white groups. The shading intervals used in these maps divide the wards into quintiles based on the value of the location quotients (to two decimal places) in each case so there are roughly equal numbers of wards in each of the five categories. As the values increase above unity, this reflects increasing concentration whereas values decreasing below unity indicate greater under-representation. Figure 4 shows how whites are over-represented in the outer wards whereas each of the non-white groups has a particular geography of over- and under-representation (Figure 5). The areas of over-representation are more distinct and discrete for the black and Asian populations as indicated previously, whereas the Chinese pattern is much more haphazard, as are the patterns for the mixed and other groups, although over-representation is more confined to the inner suburbs for the mixed group and to western and northern wards for the other group. The average location quotients for each ethnic group by ward quintile are presented in Figure 6, summarizing how the extent of representation varies. On the right hand side, quintile means for the white group’s location quotients indicate
Figure 4. White location quotients by ward, 2001. Source: 2001 Census Standard Tables
Figure 5. Location quotients for non-white ethnic groups by ward, 2001. Source: 2001 Census Standard Tables
relatively little variation around unity. On the left hand side of the graph, the mean location quotients by quintile for the Indian group indicate the most variation with significant concentration in the top quintile but decreasing under-representation in the other four quintiles. This pattern is repeated with the other groups although the mean location quotients in the first quintile are lower and all the other non-white groups have means that indicate over-representation in the second quintile as well as the first. As we might expect, the quintile means for the mixed group most closely approximate those of the white group. We will return to the use of these data later in the chapter, after looking at the variations in ethnic propensities to migrate and the spatial patterns of net migration at ward level across London.
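The two measures used in this section, the segregation index (half the summed absolute difference between a group's and the rest of the population's distribution across areas) and the location quotient (a group's share of an area's population divided by its share of the whole population), can be sketched as follows. This is an illustration only: the three-area counts are hypothetical and the function names are assumptions, not the code used to produce Table 3 or Figures 4 to 6.

```python
from typing import List, Sequence


def segregation_index(group: Sequence[float], rest: Sequence[float]) -> float:
    """Half the sum over areas of |group share of area - rest share of area|."""
    total_group = sum(group)
    total_rest = sum(rest)
    return 0.5 * sum(
        abs(g / total_group - r / total_rest) for g, r in zip(group, rest)
    )


def location_quotients(
    group: Sequence[float], population: Sequence[float]
) -> List[float]:
    """Group share of each area's population relative to its city-wide share."""
    city_share = sum(group) / sum(population)
    return [(g / p) / city_share for g, p in zip(group, population)]


# Hypothetical counts for three areas (not census data).
group_counts = [100, 300, 600]
total_counts = [1000, 1000, 2000]
rest_counts = [t - g for t, g in zip(total_counts, group_counts)]

index = segregation_index(group_counts, rest_counts)
quotients = location_quotients(group_counts, total_counts)
print(round(index, 3), [round(q, 2) for q in quotients])
```

A quotient above 1 marks an area where the group is over-represented relative to the city as a whole, and a quotient below 1 marks under-representation, which is how the quintile maps are read.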
ETHNIC MIGRATION PROPENSITIES
As shown earlier, the commissioned data indicate that over 640,000 individuals migrated within London in the 12 months before the 2001 Census, of which the majority (69.2%) were white. Almost 198,000 migrants were non-white, of which nearly 80,000 or 40% were black. Asian migrants numbered over 70,000, 43% being Indian and 57% being of POSA ethnicity. Amongst the remaining 46,600 migrants, over half were of mixed ethnicity, 13,600 were recorded as other non-white and the smallest group (8,400) were the Chinese. Counts of flows from regions in the rest of England and Wales into wards of London and flows from wards of London to these regions are presented in Table 4. Unlike Table 2, the net flows will not necessarily sum close to zero since London is losing migrants in net terms to the rest of the country. In fact, the figures indicate a net loss from London wards overall of almost 53,300, mainly based on losses of whites (44,800). The largest net loss (4,300) of non-white migrants from London is for blacks. The Asian and mixed groups also show net losses from London and the capital gains migrants in net terms only in the Chinese and other ethnic groups, but these balances are very small indeed.
Figure 6. Mean location quotients by quintile, ethnic groups, 2001. Source: 2001 Census Standard Tables
Table 4. Region to ward and ward to region flows between London and other regions, from C0723

| Ethnic group | Other regions to wards in London | Wards in London to other regions | Difference |
|---|---|---|---|
| White | 124,337 | 169,116 | -44,779 |
| Indian | 6,074 | 7,033 | -959 |
| POSA | 4,093 | 5,642 | -1,549 |
| Chinese | 2,227 | 2,071 | 156 |
| Black | 5,091 | 9,413 | -4,322 |
| Mixed | 3,517 | 5,357 | -1,840 |
| Other | 1,884 | 1,855 | 29 |
| Total | 147,223 | 200,487 | -53,264 |

| Age group | Other regions to wards in London | Wards in London to other regions | Difference |
|---|---|---|---|
| 0-15 | 9,131 | 33,293 | -24,162 |
| 16-19 | 11,721 | 13,860 | -2,139 |
| 20-24 | 53,770 | 27,629 | 26,141 |
| 25-29 | 34,720 | 31,398 | 3,322 |
| 30-44 | 26,514 | 57,621 | -31,107 |
| 45-59 | 6,905 | 19,432 | -12,527 |
| 60+ | 4,462 | 17,254 | -12,792 |
| Total | 147,223 | 200,487 | -53,264 |

Source: 2001 Census Table C0723
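The net balances in Table 4 are inflows minus outflows. As a minimal sketch, the snippet below recomputes them from the inflow and outflow columns of the table and picks out the groups with net gains; the dictionary layout and names are illustrative assumptions.

```python
# (inflow to London wards, outflow from London wards), copied from Table 4.
flows_by_ethnic_group = {
    "White": (124_337, 169_116),
    "Indian": (6_074, 7_033),
    "POSA": (4_093, 5_642),
    "Chinese": (2_227, 2_071),
    "Black": (5_091, 9_413),
    "Mixed": (3_517, 5_357),
    "Other": (1_884, 1_855),
}
flows_by_age_group = {
    "0-15": (9_131, 33_293),
    "16-19": (11_721, 13_860),
    "20-24": (53_770, 27_629),
    "25-29": (34_720, 31_398),
    "30-44": (26_514, 57_621),
    "45-59": (6_905, 19_432),
    "60+": (4_462, 17_254),
}


def net_balances(flows):
    """Net migration (inflow minus outflow) for each category."""
    return {name: inflow - outflow for name, (inflow, outflow) in flows.items()}


net_by_ethnic_group = net_balances(flows_by_ethnic_group)
net_by_age_group = net_balances(flows_by_age_group)

# Categories in which London gains migrants in net terms.
gaining_groups = [g for g, n in net_by_ethnic_group.items() if n > 0]
gaining_ages = [a for a, n in net_by_age_group.items() if n > 0]

print(sum(net_by_ethnic_group.values()), gaining_groups, gaining_ages)
```

Both classifications sum to the same overall net loss, and the only positive balances are those for the Chinese and other groups and for the two age groups in the twenties.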
The age profile of net migration, also shown in Table 4, shows London losing migrants in all age groups except those in their twenties, where the attraction of the capital for job opportunities is clearly important. Net migration balances derived from the commissioned table and from SMS Table MG103 for each ethnic group for London are compared in the top part of Table 5 and suggest a reasonable level of consistency. Each ethnic group displays the same sign and the difference overall between the net flow from the two sources is only 880 people, most of which is attributable to the difference in the white group. The bottom part of Table 5 contains net migration counts derived at the ward level from MG203 for London and indicates a closer fit than with counts derived at district level. Overall, the difference between the commissioned and SMS data for net migration reduces to 69. These comparisons provide further evidence of the reliability of the commissioned data, given that the
flows in each of the tables have been adjusted independently by the SCAM. The aggregate net migration loss from London is dominated by whites, with the major difference between inflows and outflows occurring in the 0-15 and 30-44 age groups (Figure 7) as families move
Figure 7. In-migration and out-migration by age group, whites, 2000-01. Source: 2001 Census Table C0723
Table 5. Net balances for London derived from commissioned table and SMS tables MG103 and MG203 compared

| Ethnic group | From commissioned table C0723 | From SMS Table MG103 | Absolute difference |
|---|---|---|---|
| White | -44,779 | -43,918 | 861 |
| Indian | -959 | -885 | 74 |
| POSA | -1,549 | -1,525 | 24 |
| Chinese | 156 | 353 | 197 |
| Black | -4,322 | -4,456 | 134 |
| Mixed | -1,840 | -2,071 | 231 |
| Other | 29 | 118 | 89 |
| Total | -53,264 | -52,384 | 880 |

| Ethnic group | From commissioned table C0723 | From SMS Table MG203 | Absolute difference |
|---|---|---|---|
| White | -44,779 | -44,804 | 25 |
| Non-white | -8,485 | -8,529 | 44 |
| Total | -53,264 | -53,333 | 69 |

Sources: 2001 Census Tables C0723 and MG203
Figure 8. Non-white in-migration and out-migration by age group, 2000-01. Source: 2001 Census Table C0723
out, but also in the two older age groups where losses are also considerable. These net losses more than offset the net inflows of migrants in their twenties. Amongst the non-white ethnic groups (Figure 8), certain features stand out such as the generally larger volumes of outflow than inflow across all age groups for most ethnic groups, the significant outflows of mixed ethnicity migrants aged 0-15 and of black migrants aged 30-44, and the relatively high inflows of Indians and Chinese aged 20-24. Whilst absolute numbers provide information about the magnitude of gross and net migration taking place within London and between London and other regions, migration rates are computed using census date populations to enable standardised comparisons. Figure 9 presents the rates of migration based on SMS data by ethnic group for migrants within London and between London and the rest of Britain. Within London, the highest migration rates are experienced by the non-white other ethnic group, with the mixed, Chinese and black groups also recording rates above 10% of their respective populations. Although white migrants within London are by far the
Figure 9. Migration rates by ethnic group, 2000-01. Source: 2001 Census SMS
dominant group in volume terms, their rate of migration is lower than all the other groups apart from the Indians. However, white rates of migration away from London are the highest of all ethnic groups resulting in whites having the highest rates of net migration loss, although those of mixed ethnicity are not far behind. Chinese have the highest rates of net gain. The net migration rates shown in Figure 9 are computed as rates per 1,000 persons to highlight the differences between the ethnic groups. Commissioned data from Table C0711 at the district scale have been used to compute rates of migration by ethnic group and by age group in London (Figure 10a) that can be compared with those in Britain as a whole (Figure 10b). Rates in London tend to be lower because, although there are more migrants, the populations at risk are larger and the variation between rates for ethnic groups is less for the 16-29 age groups. However, one distinctive difference between the sets of schedules is that only the Chinese migration peaks at age 20-24 in London; the others either peak at age 25-29 or have similar rates in both these age groups. Indians in London have lower rates of migration than other South Asians in all
age groups apart from 60 and over, whereas for Britain as a whole, it is the POSA group that has the lowest migration rates in the late teenage and both age groups in the twenties. The migration rate schedules contrast significantly with those of migrants into and out of London, shown in Figure 11. Apart from the Chinese in-migrants, who experience relatively high rates of migration at age 16-19, white migration rates are the highest of all ethnic groups and across all ages for both in-migration and out-migration. Apart from the Chinese, all the in-migration schedules have significant peaks at age 20-24 as London attracts individuals because
Figure 10. Migration rates by age and ethnic group, London and GB, 2000-01. Source: 2001 Census Table C0723
Figure 11. Migration rates by age and ethnic group, to and from London, 2000-01. Source: 2001 Census Table C0723
of its employment opportunities. Out-migration rate schedules are much less peaked although out-migration rates for all ethnic groups are still at their highest amongst the 20-24 year olds. Blacks appear to have the lowest rates of migration into London, whereas Indians have relatively high rates of in-migration and out-migration compared with their rates of migration within London. Whilst migration flows and rates indicate the magnitude and intensity of migration by different ethnic groups, an alternative method of comparison is to use the percentage shares of migration in each age group for each ethnic group. Figure 12 is a stacked histogram of migration within London which demonstrates how age structures vary between ethnic groups. The Indian age profile has some similarities with the white group
although it contains a higher proportion of 0-19 year olds and a lower proportion of 25-29 year olds. The mixed group, as expected, contains the highest proportion of children but the POSA, black and other non-white groups also contain higher proportions of children than the white group. The Chinese migrant age structure, on the other hand, shows a lower proportion of children but a much higher proportion of 20-24 year olds than any of the other groups. The black and other non-white groups both have relatively high proportions aged 30-44. Similar age structure graphs are shown in Figure 13 for inflows to and outflows from London. The main difference between the two is the higher proportion of migrants aged 20-24 who move into London compared with those moving in the opposite direction. In summary, we have seen in this section that the aggregate propensity to migrate associated with London’s population conceals a range of migration intensities that vary with age and by ethnic group and according to where the flows are taking place. Of course, further variations would become evident were we to consider individual boroughs or wards. In the next section we examine the geographical variation in net migration at ward level for the seven ethnic groups.
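The rate calculations discussed in this section divide migrant counts by census date populations. The sketch below illustrates the arithmetic under a stated assumption: whereas Figure 9 is based on SMS data, the example substitutes the commissioned counts from Tables 2 and 4 and the populations from Table 3, so the values will not match the figure exactly. The names are illustrative.

```python
# group: (London population from Table 3,
#         within-London migrants from the Table 2 inflow column,
#         net balance with the rest of England and Wales from Table 4)
data = {
    "White": (5_103_203, 444_001, -44_779),
    "Indian": (436_993, 30_551, -959),
    "POSA": (429_700, 40_569, -1_549),
    "Chinese": (80_201, 8_523, 156),
    "Black": (782_849, 79_858, -4_322),
    "Mixed": (226_111, 24_572, -1_840),
    "Other": (113_034, 13_767, 29),
}

# Within-London migration rate as a percentage of the group's population.
within_rate_pct = {g: m / pop * 100 for g, (pop, m, _) in data.items()}

# Net migration rate with the rest of the country per 1,000 persons.
net_rate_per_1000 = {g: net / pop * 1000 for g, (pop, _, net) in data.items()}

# Groups ordered from highest to lowest within-London rate.
ranked = sorted(within_rate_pct, key=within_rate_pct.get, reverse=True)

for group in ranked:
    print(
        f"{group:8s} {within_rate_pct[group]:6.2f}%  "
        f"{net_rate_per_1000[group]:7.2f} per 1,000"
    )
```

Even with this substitution the ordering agrees with the description of Figure 9: the other group has the highest within-London rate, whites and Indians the lowest, whites the largest rate of net loss and the Chinese the largest rate of net gain.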
ETHNIC NET MIGRATION PATTERNS
The analysis reported in this section is based on the separation of (a) migration flows between wards within London and (b) migration flows between wards in London and regions in the rest of England and Wales outside London. In each case, net migration flows for wards have been computed as summary measures of migration behaviour across London. Net migration patterns for whites are shown in Figure 14. Figures 14c and 14d are continuous surface representations of the proportional symbol maps (Figures 14a and 14b) and are included to facilitate visualisation. A comparison of the net migration maps provides
Figure 12. Age group shares of migration within London by ethnic group, 2000-01. Source: 2001 Census Table C0723
Figure 13. Age group shares of migration to and from London by ethnic group, 2000-01. Source: 2001 Census Table C0723
some important insights into the migration processes taking place in London. In general terms, the pattern of white net migration within London shows inner wards losing migrants to more outer wards with the largest absolute losses occurring from the most central boroughs. However, the pattern of net migration between wards and the rest of England and Wales is the reverse, with net migration gains in the central wards and net migration losses from Outer London. These maps show emphatically that, as white migrants are leaving inner London wards for destinations in the outer suburbs, those living in Outer London are moving beyond the city boundary altogether whilst inner London wards remain the destination of in-migrants from the rest of the country as well as white immigrants from overseas. Data from the commissioned table allow us to observe variations in surfaces of net migration for different ethnic groups. Figure 15 shows the distributions of net migration for non-white ethnic groups occurring within London. Each of
the maps shows a distinctive spatial pattern of net migration ‘hot spots’ (most gains in red) and ‘cold spots’ (most losses in blue). However, in general terms, and with the exception of the Chinese, there appears to be some replication of the white pattern with net losses concentrated in the centre and net gains in the outer wards. One of the more regular patterns of deconcentration is for the black group with higher net losses taking place from areas with high black population shares in Lambeth, Southwark and Tower Hamlets and high net gains in parts of Greenwich and Barking and Dagenham in the east and Enfield in the north. Indians appear to be moving from areas of concentration in the western boroughs of Ealing and Brent further westwards into Hillingdon and Harrow, for example, whereas the POSA group has concentrations of net migration gain in Redbridge and Newham. The pattern of net migration for the Chinese appears much more complex with more discontinuity between areas of loss and those of gain. The pattern for the mixed group is similar to
Figure 14. White net migration by ward, London, 2000-01. Source: 2001 Census Table C0723
Figure 15. Surface representations of net migration within London by ward, non-white ethnic groups, 2000-01. Source: 2001 Census Table C0723
that of whites whereas the other non-white group appears to have more extensive gains and losses in wards north of the river. The data permit the mapping of distributions of net migration by age and ethnicity but this creates 98 maps (seven ethnic groups x seven age groups for within and outside London), far too many to show in this paper. Consequently, we select four maps of white net migrants for illustration in Figure 16, comparable since each uses the same proportional symbol scale, and a further set of four maps of black net migrants in Figure 17. The first set of maps shows net migration balances for young adult white migrants aged 20-24 and for older white migrants aged 60 and over both within London and between wards and the rest of England and Wales. The pattern for young adults within London shows losses from the central west end wards north of the river and gains in the inner suburbs but there is a mixture of gains and losses in most of the outer areas. The pattern for older migrants shows more regularity
with gains in many of the outer wards and losses predominating in inner London. In stark contrast, the patterns of net migration between wards and the rest of the country reveal gains in the young adult age group across most of London, but particularly in the central and inner wards, whereas older white migrants are leaving wards across the whole of the city in what appears to be similar net magnitudes. In Figure 17, we show the same set of net migration balances for black migrants, the largest of the non-white ethnic groups in 2001. Here the flows are much smaller so the scale of the graduated symbols has been reduced from 200 to 50. The pattern of net migration within London for the 20-24 age group is quite chaotic with gains and losses occurring in a widespread manner throughout the city but appearing to be lower in volume on the peripheries. In the case of flows to and from ‘outside London’, the gains in the young black adult age group are much less evident in comparison with the losses in this age group
Figure 16. White net migration within and outside London by ward for ages 20-24 and 60 and over, 2000-01. Source: 2001 Census Table C0723
Figure 17. Black net migration within and outside London by ward for ages 20-24 and 60 and over, 2000-01. Source: 2001 Census Table C0723
although there are, once again, intricate patterns of negative and positive balances that do not appear to show much correlation with the areas in which the black population is concentrated. For those migrants aged 60 and over, the ‘within London’ net balances are lower but with losses from wards with higher concentrations of black inhabitants. Finally, the net exchange of older black migrants with the rest of England and Wales is very low. In summary, the patterns of net migration across London for shorter and longer distance migrants suggest a complex mosaic of gains and losses which needs careful interpretation to reveal the processes of migration behaviour that underpin the patterns. The next section reports on an analysis that attempts to ascertain whether the patterns of net migration by ethnic group show any relationship with population concentration and deprivation.
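The decomposition used in this section, with each ward's net migration split into a within-London balance and a balance with the rest of England and Wales, can be sketched as below. The two wards and their flow counts are hypothetical, chosen only to mimic the contrasting inner and outer patterns described for white migrants; the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class WardFlows:
    """Gross flows for one ward, one ethnic group and one age group."""
    in_from_london: int
    out_to_london: int
    in_from_rest: int   # from regions in the rest of England and Wales
    out_to_rest: int    # to regions in the rest of England and Wales

    @property
    def net_within(self) -> int:
        return self.in_from_london - self.out_to_london

    @property
    def net_outside(self) -> int:
        return self.in_from_rest - self.out_to_rest

    @property
    def net_total(self) -> int:
        return self.net_within + self.net_outside


# Hypothetical wards (not census data).
wards = {
    "Inner ward A": WardFlows(1200, 1500, 900, 600),
    "Outer ward B": WardFlows(1000, 800, 300, 650),
}

for name, flows in wards.items():
    print(name, flows.net_within, flows.net_outside, flows.net_total)

# Number of maps implied by the full cross-classification in the text.
ethnic_groups = ["White", "Indian", "POSA", "Chinese", "Black", "Mixed", "Other"]
age_groups = ["0-15", "16-19", "20-24", "25-29", "30-44", "45-59", "60+"]
n_maps = len(ethnic_groups) * len(age_groups) * 2  # within and outside London
print(n_maps)
```

Keeping the two balances separate matters because, as in the hypothetical inner ward, a loss within London can be masked in the total by a gain from the rest of the country.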
NET MIGRATION, POPULATION CONCENTRATION AND DEPRIVATION
To complete the analyses reported in this paper, we have attempted to use the ward-based net migration data to establish whether there is any
evidence in London of ethnic groups moving away from areas of ethnic population concentration and from areas of higher deprivation to areas of lower deprivation. The first hypothesis, that ethnic minorities are leaving areas of ethnic concentration, follows from the debate as to whether non-white ethnic groups are dispersing from their localities or continuing to cluster in enclaves where their shares of the population are increasing. In this instance we have used data for all ages to compute ward net migration rates for quintile groups and adopted location quotients as a measure of over- or under-concentration by expressing the ethnic percentage of a ward’s population as a ratio to that across the whole of London. The average net migration rates for each quintile based on location quotients are shown in the graph in Figure 18 for each of the non-white groups. Thus, for Indians, we observe a negative net migration rate for areas in the top quintile with a large over-representation of Indians in the population and a positive net migration balance for all other quintiles. The most positive gains are in the bottom quintile where the location quotients are lowest. The same observation is made for the POSA group and also for the black group where the mean rate of net migration loss
Figure 18. Net migration rate by ethnic group and location quotient quintile, all groups, 2000-01. Source: 2001 Census Table C0723 and Standard Tables
from the most concentrated areas is substantial and there is a gradient of balances through the quintile range. The patterns for the other three groups are less ordered. The Chinese appear to be gaining migrants most in areas with lowest representation of Chinese and the same is true for the non-white other group. In contrast, migrants in the mixed ethnicity group tend to have negative net migration rates in all quintiles except the fourth. When we examine the quintile averages for whites, the gradient of net migration rates by quintile group is reversed. Thus, white migrants within London are tending to migrate to wards that have a high location quotient representing an over-representation of whites and away from wards that have a low location quotient or an under-representation of whites. In order to test the second hypothesis, that ethnic groups migrate to areas of less deprivation, we examine net migration rates in relation to deprivation using Townsend scores for 2001. The Townsend index is one of the more mature measures of material deprivation, containing a combination of census variables including car ownership, home ownership, unemployment and overcrowding. High negative scores represent areas of lowest material deprivation whereas high
Figure 19. Townsend index for London wards, 2001
positive scores represent areas of high material deprivation (Figure 19). The graph on the right hand side in Figure 19 indicates the average Townsend scores for each quintile with quintile 1 containing the least deprived wards with an average index of -2.82 and quintile 5 containing the most deprived with an average index of 6.92. In this case we undertake the analysis for all-age net migration but also for migration by each age group. The graph of all-age net migration (Figure 20a) once again shows the numerical domination of total migration by whites and indicates how net migration gains are occurring in less deprived wards, with higher losses in areas with more deprivation. This is also manifest in the net migration rates (Figure 20b), but the graph shows that the pattern is not confined to whites, and there are significant differences in the balances between ethnic groups with blacks showing the highest rates of gain in less deprived areas at one extreme. It is evident that the spatial patterns of internal migration within London are associated with movement to better neighbourhoods for all ethnic groups. The same approach can be used to assess the relationship between net migration rates and deprivation for each of the ethnic groups by age
Figure 20. Net migration by ethnic group and deprivation quintile, all ages, 2000-01. Source: 2001 Census Table C0723
(Figure 21), demonstrating that the aggregate patterns conceal a wide range of variation. The children’s age group graph is dominated by significant rates of net loss and gain for the mixed ethnic group but with evidence of a gradient for all ethnic groups from gains to losses as deprivation worsens. This gradient is matched by those in the 30-44 year age range, likely to include most of the parents of those aged 0-15. Here, however, it is the Indian group that demonstrates the highest mean net gains in the least deprived quintile, a feature which is also apparent for those aged 45-59 and 60+. As might perhaps be expected, it is the late teenage and young adult age groups where there is a tendency to differ from the all-age pattern.
Figure 21. Net migration by ethnic group and deprivation quintile, by age group, 2000-01. Source: 2001 Census Table C0723
In the 16-19 age group, those of mixed ethnicity conform with the norm but the Asian groups – the Indians in particular – lose migrants from the least deprived areas and show a rate of gain in the most deprived areas. This is the same for the Chinese aged 20-24, although the overall net migration rate for persons in this age group in quintile 1 is negative. It is likely that the pattern of migration in these age groups is influenced by movement of students to higher education institutions and further research is required to investigate where those students who leave home in order to study move to within London.
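The quintile analysis used for both hypotheses follows the same steps: rank wards on a score (a location quotient or a Townsend score), split them into five groups of roughly equal size and average the net migration rate within each group. The following is a minimal sketch of that procedure; the ten wards, their scores and their rates are hypothetical, and the exact tie-breaking and grouping rules used for the figures are not stated in the text, so the equal-count split here is an assumption.

```python
from statistics import mean
from typing import Dict, List, Sequence


def quintile_labels(scores: Sequence[float]) -> List[int]:
    """Rank wards by score and label them 1 (lowest fifth) to 5 (highest fifth)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, ward in enumerate(order):
        labels[ward] = rank * 5 // len(scores) + 1
    return labels


def mean_by_quintile(
    scores: Sequence[float], values: Sequence[float]
) -> Dict[int, float]:
    """Average of `values` within each score quintile (needs at least 5 wards)."""
    labels = quintile_labels(scores)
    return {
        q: mean(v for v, label in zip(values, labels) if label == q)
        for q in range(1, 6)
    }


# Hypothetical Townsend scores and net migration rates per 1,000 for ten wards.
townsend = [0.8, -3.1, 6.1, -1.0, 4.2, -2.5, 7.7, 1.5, -0.4, 3.0]
net_rate = [1.0, 12.0, -9.0, 5.0, -6.0, 8.0, -13.0, -1.0, 3.0, -4.0]

quintile_means = mean_by_quintile(townsend, net_rate)
print(quintile_means)
```

In this made-up example the means fall steadily from quintile 1 (least deprived) to quintile 5 (most deprived), the kind of gradient from gains to losses that the all-age graphs are described as showing.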
CONCLUSION
This chapter is based primarily on the analysis of commissioned data on migration by ethnic group and age supplied by the ONS. By using region to ward and ward to region tables, the impact of SCAM appears to be much less significant than
it would have been on a ward to ward scale and our aggregate comparisons of the commissioned data indicate a large degree of agreement with published flows. In concluding, we return to the three initial research questions. Firstly, the migration propensities of those resident in London in 2001 vary considerably depending on whether they moved within London or arrived from the rest of England and Wales. The rates of those in-migrating from the rest of the country show a much more pronounced peak at age 20-24 than those leaving London in the opposite direction, where rates are higher in the age groups over 30 years old. Whereas the white population has the highest propensity to migrate both in and out of London in most age groups (Chinese aged 16-19 being the exception), white migrants within London have the highest rates only in the 25-29 age group. Unlike the schedules for Britain as a whole, where the peak in migration propensity occurs at age 20-24, the peak rate of migration within London for all ethnic groups apart from the Chinese is for those aged 25-29. Indians tend to have the lowest rates of migration within London, whereas Pakistanis and other South Asians have the lowest rates in the country as a whole. Secondly, London’s net migration loss of 53,000 migrants to the rest of the country contained a net loss of over 44,000 whites, but there were also net losses of all ethnic groups except Chinese and others. In fact the national picture of ethnic migration is dominated by white counterurbanisation whilst net migration losses and gains for non-white ethnic groups are more concentrated in metropolitan parts of the country (Stillwell and Hussain, 2008). There are significant differences in the spatial patterns of ethnic group migration across the wards within London.
By decomposing the net migration for London wards into the balances of flows within London and the flows between London wards and the rest of England and Wales, we have been able to expose the patterns of suburban decentralisation from inner to outer wards within London, dispersal or counterurbanisation from outer London to the
rest of England and Wales and centralisation or reurbanisation from the rest of the country to inner London for white migrants. Comparing net migration flows for wards within London by ethnic group, distinctive spatial hotspots of net in-migration and cold spots of net out-migration have been identified. Whilst there is evidence of outward movement from inner London wards in the case of the southern Asian and black groups, the distribution of net migration for the Chinese is much more haphazard. Finally, the chapter has reported some analysis at ward level within London that supports hypotheses about dispersal and deprivation. The results show that migrants in each non-white ethnic group have a tendency to move away from areas where members of the same ethnic group are concentrated to areas of lower concentration whereas the reverse is true for white migrants within London. The findings support the hypothesis that non-white migration in London is serving to disperse rather than concentrate non-white ethnic populations. However, the reverse is evident for white migrants – the main losses are in those areas in which they are under-represented and the main gains are in areas where there are already higher concentrations of whites. The results also support the hypothesis that migrants in all ethnic groups are leaving more deprived areas for less deprived wards within the city, although there are differences from the norm for those migrants in the 16-19 and 20-24 age groups. Moreover, for those aged over 25, it appears that it is the Indians who are the most upwardly mobile as far as deprivation is concerned. Whilst the focus of the chapter has been on internal migration propensities and patterns, relatively little attempt has been made to discuss the characteristics of different ethnic groups that might explain their varying propensities, let alone the causal variables that underpin the spatial processes of movement inferred by the data. 
Neither does the analysis consider the propensities and patterns of international migration, particularly those of
immigration that occurred during the 12-month period before the 2001 Census, whose impact on population change in London was considerable and whose relationship with internal migration remains to be investigated. Each of these issues belongs on the agenda for further research.
ACKNOWLEDGMENT The research reported in this paper has been supported by an ESRC small research grant (Ref: RES-163-25-0028) commissioned under the Understanding Population Trends and Processes (UPTAP) programme. The author is grateful to Terry Familio at ONS Customer Services for supplying the commissioned tables, to Serena Hussain for preparing the London migration data for analysis and mapping and to Paul Norman for providing the Townsend scores.
REFERENCES
Champion, A. G. (2005). Population movement within the UK. In Chappell, R. (Ed.), Focus on People and Migration (pp. 92–114). Basingstoke, UK: Palgrave Macmillan.
Ellis, M., & Wright, R. (1998). The balkanisation metaphor in the analysis of US immigration. Annals of the Association of American Geographers, 88(4), 686–698. doi:10.1111/0004-5608.00118
Finney, N., & Simpson, L. (2008). Internal migration and ethnic groups: evidence for Britain from the 2001 Census. Population Space and Place, 14, 63–83. doi:10.1002/psp.481
Frey, W. (1996). Immigration, domestic migration and demographic balkanisation in America: new evidence for the 1990s. Population and Development Review, 22(4), 741–763. doi:10.2307/2137808
Hussain, S., & Stillwell, J. (2008). Internal migration of ethnic groups in England and Wales by age and district type. Working Paper 08/03, School of Geography, University of Leeds, Leeds.
Johnson, R., Forrest, J., & Poulsen, M. (2002). Are there ethnic enclaves/ghettos in English cities? Urban Studies, 39(4), 591–618. doi:10.1080/00420980220119480
Johnson, R., Poulsen, M., & Forrest, J. (2001). The ethnic geography of EthniCities: the American model and residential concentration in London. Ethnicities, 2, 209–235. doi:10.1177/1468796802002002657
Mackintosh, M. (2005). 2001 Census: The migration patterns of London’s ethnic groups. DMAG Briefing Paper 2005/30. London: Greater London Council Data Management and Analysis Group, Greater London Council.
Peach, C. (1996b). Does Britain have ghettoes? Transactions of the Institute of British Geographers NS, 22, 216–235. doi:10.2307/622934
Peloe, A., & Rees, P. (1999). Estimating ethnic change in London, 1981-91, using a variety of census data. International Journal of Population Geography, 5, 179–194. doi:10.1002/ (SICI)1099-1220(199905/06)5:33.0.CO;2-P
Phillips, D. (1998). Black minority ethnic concentration, segregation and dispersal in Britain. Urban Studies, 35(10), 1681–170. doi:10.1080/0042098984105
Scott, A., Pearce, D., & Goldblatt, P. (2001). The sizes and characteristics of the minority ethnic populations of Great Britain – latest estimates. Population Trends, 105, 6–15.
Stillwell, J., & Hussain, S. (2008). Ethnic group migration within Britain during 2000-01: a district level analysis. Working Paper 08/2, School of Geography, University of Leeds, Leeds.
Chapter 10
Migration and Socio-Economic Polarisation within British City Regions
Tony Champion, Newcastle University, UK
Mike Coombes, Newcastle University, UK
ABSTRACT In recent years, census-based and other studies have documented a widening gap between better-off and more deprived residential areas in Britain. While much of this will have come about in situ, through increasing disparities in household wealth and incomes across the social scale, migration may also be contributing. The decennial population census is the only source that can provide robust statistical data on the social composition of residential movement between sub-regional and local areas. This chapter uses the 2001 Census Special Migration Statistics to examine whether migration is increasing the degree of socio-spatial polarisation within Britain’s larger city regions. Following an introduction to the study approach and the intricacies of the census data on migration, the results of data analysis are presented in three sections. The first looks at the social composition of the migration exchanges taking place between the 27 cities and the rest of their city regions, testing to see whether the cities’ migration balances are less favourable for people of higher occupational status. This identifies three types of city region, based on whether there is a positive, negative or no strong relationship between migration and socio-economic status. An example of each of these types of city region – London, Birmingham and Bristol respectively – is selected for a more detailed examination of the patterns of movement between their constituent residential zones. For these three cases, the second set of analyses compares the migration performance of each of the residential zones with its existing social status in order to see whether or not these within-city-region movements are reinforcing the existing socio-economic patterns. 
The third set of results seeks a better understanding of the dynamics of the migration through examining the residential movements between all pairings of the zones in each of the three city regions and identifying how consistently the balance of these migration exchanges favours the better-off of the two zones. DOI: 10.4018/978-1-61520-755-8.ch010
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION The context for the substantive research question addressed in this chapter is the debate about socio-spatial polarisation in British cities. In recent years census-based and other studies have documented a widening gap between better-off and more deprived residential areas in Britain (see, for instance, Gregory et al., 2000; Lupton, 2005; Dorling et al., 2007). While much of this will have come about in situ through increasing disparities in household wealth and incomes across the social scale (Barclay, 1995; Hills, 1995; Brewer et al., 2006), migration may also be contributing. Certainly, there has been extensive research on the socially selective nature of the suburbanisation process, dominated by middle-class white families (Champion, 2001; Champion and Fisher, 2004). More recently, the emergence of ‘low demand neighbourhoods’ has been attributed not just to the effects of social-housing allocation policies that direct problem families to so-called ‘sink estates’ but also to the more general process of residential sorting whereby people will tend to move to areas offering better schools, less crime and a generally higher quality of life if they can (Bramley et al., 2000; Palmer et al., 2006). This chapter reports the results of work which has examined the latest available evidence on the degree to which migration is reinforcing existing socio-spatial differences in cities. While the results are of theoretical and policy significance in their own right, for the purposes of this book the primary aim of what follows is to emphasise new and/or unusual elements in the approach used in this study. As outlined in the next section, the research task (and consequently the chapter structure) is broken down into three separate operational questions that are each addressed in a distinctive way. 
This provides the opportunity for demonstrating the strengths and shortcomings of the decennial population census, which is the only source of robust statistical data on the social composition of residential movement between sub-regional
and local areas. All the data are taken from the 2001 Census, where the way that the information on migration is presented differs in a number of important ways from that of previous censuses. This study also breaks new ground in adopting a broader than usual geographical scale for this form of urban analysis, covering the whole city region rather than just the main built-up area and examining its internal heterogeneity on the basis of zones that are larger than individual residential neighbourhoods.
STUDY APPROACH AND DATA Before moving to the three sections that present the results of the empirical analyses, here we provide more detail about the study approach and the census data used in the quest to improve our understanding of migration’s role in altering the socio-economic patterning of city regions. In the first place, it is important to stress that these analyses concentrate entirely on the residential movements that are internal to the cities as we have defined them (see below), excluding migration between each city and the rest of the UK as well as international migration. In terms of the three questions, the first concerns the migration exchanges between the continuously built-up core of each city and the rest of its city region.
• Is the balance of migration exchanges between the city’s core and the rest of its region less favourable for people of higher socio-economic status?
The other two are pitched towards the residential movement taking place between a more disaggregated set of zones than the simple core/rest dichotomy.
• Do the within-region moves reinforce the existing socio-economic geography?
Here the focus is on the differential performance of zones in their migration exchanges with all the other zones in their city region combined, looking at each socio-economic class separately and comparing this pattern with the between-zone variations in the importance of this class in the whole residential population.
• Does the net migration flow between each pairing of zones in their city region always move people towards the better-off of the two zones?
This represents an unpacking of the previous question, exploring how consistently this component of migration is leading to the stronger population growth of the city region’s more prosperous areas, thereby causing the less prosperous areas to slip back in relative terms. The analyses here use the cities and city regions of Great Britain as defined by Coombes (2002). We selected the city regions with the largest core populations, this being defined in terms of the 2001 Census population of their Primary Urban Areas (PUA, i.e. the continuously built-up areas of their main cities) and being implemented through a cut-off size of 195,000 residents. As described more fully in Champion et al. (2007), this procedure yielded the 27 cities shown in Figure 1. For the first question, then, we will be looking at the migration exchanges taking place within each city region between its PUA and the remainder. Three city regions act as case studies for examining the more local patterns of migration for the second and third questions. For reasons explained below, these were London, Birmingham and Bristol. As regards the ‘zonation’ of these three city regions, the default geography here is the ‘tract’ as used by Dorling et al. (2007) to track change over time in the patterning of poverty and wealth across Britain. These are areas of fairly consistent population size – between 30,000 and 50,000 in most cases – that represent a compromise in meeting a mix of criteria including socio-economic
homogeneity and alignment with local authority and other statistical reporting areas. This basis is used for the zones of the whole of Bristol’s city region and the PUA element of Birmingham’s city region. However, in order to achieve some consistency between the three differently-sized city regions in terms of the total number of zones, it was decided that, for the rest of Birmingham’s city region and the whole of London’s city region, the tracts were too small, leading us to opt for local authority areas, either singly or grouped. This produced the migration-zone geographies for the three city regions shown in Figure 2, with a total of 46 zones for London, 60 for Birmingham and 34 for Bristol. The migration data used in the study are the 2001 Census Special Migration Statistics (SMS).
Figure 1. The 27 city regions
Figure 2. Three case study city regions of London, Birmingham and Bristol and their zones
This was not a difficult choice to make, as they are the only source of data on the socio-economic composition of within-UK migration at the geographical scale of individual local authorities and their wards (Bulusu, 1992; Rees et al., 2002). In fact, the 2001 SMS are the first to include a table with the socio-economic classification of migrants. At the same time, users need to be aware of a number of limitations of this data. First, although the SMS data covers 100% of the migrants recorded by the 2001 Census, there are issues over underenumeration that ONS has addressed through imputation (ONS, 2005). The 2001 Census was the first that attempted to be a ‘One Number Census’, imputing people and households for addresses where an expected census form had not materialised. Also imputed where necessary was the address one year ago of people who ticked the box indicating that they had been living at a usual address one year ago that was different to their usual address at the time of the 2001 Census. While this is an advantage over previous censuses which had classified these as migrants with ‘origin not stated’, now making it possible to include them in a place-specific origin/destination migration
matrix, there is no way of checking the quality of the imputed data. At the same time, the 2001 Census introduced a new category which cannot be included in the migration matrix, namely migrants who considered that they did not have a usual address one year ago but were in some form of temporary and/or unofficial accommodation. In all, 4.5% of the 2001 Census’s one-year migrants had their previous address imputed, while 6.7% of migrants have had to be excluded from the SMS place-to-place migration flows because of classifying themselves as having ‘no usual address one year ago’. The Small Cell Adjustment Mechanism (SCAM) is another general feature of the 2001 Census data (though not for Scotland), but is one that impacts on the migration data more seriously than for most other variables, given that one-year migrants comprise not much more than 1 in 10 of the population. As documented more fully by Stillwell and Duke-Williams (2007) and in Chapter 3 of this book, by preventing the release of counts of 1 and 2 and reassigning them values of 0 or 3 through a more randomised method than straight rounding, SCAM will have had a particularly distorting effect on the SMS part of
the migration output, where the vast majority of cell counts are normally very small. Where possible, migration matrix users are advised to use the ‘district’ level geography of SMS Set 1 that is less affected by SCAM, as we do in answering the first of our three questions below, but for the finer geographical framework we have adopted for the second and third questions (Figure 2), it is necessary to rely on the ward-to-ward matrices of SMS Set 2. Turning to the information contained in the SMS on the socio-economic classification of migrants, it should be noted that neither the relevant Set 1 table (MG109) nor the Set 2 table (MG204) contains the full NS-SEC breakdown of 40 categories. On the other hand, they both present data for 12 classes which, given the sparsity of the migration matrices, is more than enough for present purposes. In fact, to increase the robustness of the results, the following analyses aggregate these 12 into six groups, comprising four occupationally-based classifications plus full-time students and a residual category of ‘other unclassified’ (Table 1). A further limitation of the two NS-SEC tables in the SMS is that their counts are for Moving Group Reference Persons (MGRP). A moving
group is defined as a “group of people who are migrants … and who have the same postcode … for their address one year before Census day” (ONS 2001, p.115), i.e. a lone person moving or a group of people moving together. The reference person for a moving group of 2+ members is identified in the same way as that of a multiperson household, namely by means of rules that essentially look for the main breadwinner. While around two-thirds of all moving groups comprised only one person, this still leaves a lot of people moving in larger moving groups who are excluded from these tables. This way of presenting the data on the socio-economic classification of migrants also means that it is impossible to calculate true migration rates, since unlike for all persons – or indeed for households – there is no equivalent of MGRP in the non-migrant population that could be included in the population-at-risk denominator. Also note that these two NS-SEC tables do not include any migrant who was living in some form of communal establishment (e.g. university hall of residence, nursing accommodation, hostel) at the time of the 2001 Census. A final limitation of the information in these two NS-SEC tables is that they provide no information about people’s situation one year earlier,
Table 1. Grouping of SMS NS-SEC categories for this study

NS-SEC in SMS                                              NS-SEC groups
1.1 Large employers and higher managerial occupations      Higher M&P (HMP)
1.2 Higher professional occupations                        Higher M&P (HMP)
2 Lower managerial and professional occupations            Lower M&P (LMP)
3 Intermediate occupations                                 Intermediate
4 Small employers and own account workers                  Intermediate
5 Lower supervisory and technical occupations              Low
6 Semi-routine occupations                                 Low
7 Routine occupations                                      Low
L15 Full-time students                                     Full-time students
L14.1 Never worked                                         Other unclassified
L14.2 Long term unemployed                                 Other unclassified
L17 Not classifiable for other reasons                     Other unclassified
other than their usual address then. This means that it is not possible to use this information to gauge precisely the net effect of migration on the overall socio-economic composition of the areas affected. The most obvious example of this is a person who was a university student one year ago but, during the pre-census year, graduated and moved away from the university’s city to take a job (in, say, a lower managerial and professional occupation) somewhere else. Because of the absence of information about the person’s previous student status, the 2001 Census will record this as the move of a lower managerial and professional person away from the university city. This type of situation can, however, arise more widely, given that changes in economic position and job are amongst the most common reasons for people moving. On the other hand, as these types of moves tend to take place over longer distances, this issue is likely to create less of a difficulty for the present study because it is entirely focussed on moves taking place within city regions. Clearly, analysing the socio-economic composition of migration and gauging its impact on local population profiles is by no means straightforward. All we can say in defence of using the Census data, as we did in a parallel analysis of migration flows between as opposed to within city regions (Champion and Coombes, 2007), is that it is not just the best source available but the only one that is in any way suitable for present purposes. All the same, users need to be mindful both of all the caveats described above and, in presenting the results derived from this data, of the need to consider as far as may be possible whether any of the findings might have more to do with the peculiarities of the data than with the patterns of migration behaviour that have actually occurred.
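The small cell adjustment described earlier in this section can be made concrete with a short sketch. It shows counts of 1 and 2 being replaced by 0 or 3 while all other counts pass through unchanged. The function names, the example matrix and, above all, the probabilities used are assumptions for illustration only: the chapter does not specify how the randomisation was done, so the sketch simply chooses probabilities that preserve each count in expectation.

```python
import random


def small_cell_adjust(count, rng):
    """Replace a count of 1 or 2 with 0 or 3; leave other counts unchanged.

    ASSUMPTION: a count of 1 becomes 3 with probability 1/3 and a count of 2
    becomes 3 with probability 2/3, so the expected value equals the original
    count. The real adjustment procedure is not described in the chapter.
    """
    if count in (1, 2):
        return 3 if rng.random() < count / 3 else 0
    return count


def adjust_matrix(flows, seed=0):
    """Apply the adjustment independently to every cell of a flow matrix."""
    rng = random.Random(seed)
    return [[small_cell_adjust(c, rng) for c in row] for row in flows]


# Hypothetical ward-to-ward flow matrix dominated by very small counts.
flows = [
    [0, 1, 2, 5],
    [2, 0, 1, 1],
    [1, 1, 0, 12],
    [7, 2, 1, 0],
]
adjusted = adjust_matrix(flows)

for original_row, adjusted_row in zip(flows, adjusted):
    print(original_row, "->", adjusted_row)
```

In a matrix like this one, most cells are changed, which is why a ward-to-ward matrix is distorted far more than a district-level matrix in which few cells hold counts of 1 or 2.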
MIGRATION BETWEEN THE CITIES AND THE REST OF THEIR CITY REGIONS With this sobering thought in mind, we now come to the first of the three research questions that this chapter tries to answer, namely how far there exists a process of population decentralisation within city regions that favours people of higher occupational status over lower-status people. Picking up on the methodological discussion above, for this purpose we use the 27 city regions shown in Figure 1. For each, the two zones of the main built-up area of the city (its PUA) and the rest of its city region are delineated on the basis of their best-fit local authority areas, so as to be able to use the SMS Set 1 data that is likely to suffer less from the effects of SCAM. Being precluded from calculating rates of migration because of there being no denominator equivalent to the MGRP used for the counts, we compare the patterns across the NS-SEC groups and also those between city regions on the basis of the ‘in/out ratio’, defined as the ratio of the numbers of MGRPs moving to the PUA from the rest of the region to the numbers of those moving in the opposite direction. The overall picture for the 27 cases is shown in Figure 3. Looking first at all MGRPs so as to provide context (top bar), it is found that the in/out ratio is somewhat less than the unity that would result if the number moving into the 27 PUAs from the rest of their city regions was exactly the same as the number moving outwards from the PUA. The ratio of 0.84 reflects the fact that there were 16% fewer people in the former group than the latter, with the actual flow numbers being 93,494 and 110,720 respectively. In this way of measuring internal migration for these city regions, in aggregate, the 27 are seen to have been experiencing decentralisation during the pre-census year. The overall pattern is very similar, if a little more marked, for the totality of MGRPs who are classified by occupation, with their in/out ratio of 0.82. 
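As a worked illustration of the in/out ratio just defined, the following minimal sketch reproduces the aggregate figure from the two flow totals quoted above. The function and variable names are illustrative assumptions, not part of the SMS.

```python
def in_out_ratio(inflow, outflow):
    """Ratio of MGRPs moving into the PUA to MGRPs moving out of it."""
    if outflow <= 0:
        raise ValueError("outflow must be positive to form a ratio")
    return inflow / outflow


# Flow totals for all MGRPs across the 27 city regions, as quoted in the text.
moves_into_puas = 93_494
moves_out_of_puas = 110_720

ratio = in_out_ratio(moves_into_puas, moves_out_of_puas)
shortfall = 1 - ratio  # proportion by which the inflow falls short of the outflow

print(f"in/out ratio: {ratio:.2f}")
print(f"inflow shortfall: {shortfall:.0%}")
```

A ratio below 1 indicates net movement out of the PUA, and a ratio above 1 net movement into it. Because the measure needs only the two flows, it can be computed without the population-at-risk denominator that a migration rate would require.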
Amongst the unclassified, there is a major
contrast between those who were full-time students at the time of the 2001 Census, for whom there were nearly twice as many moving into the PUAs from the rest of their city regions as moving in the opposite direction, and the other unclassified MGRPs, for whom the inflow was not much more than half the outflow. For the four occupationally classified groups, none had an in/out ratio in excess of unity, signifying that all these were decentralising as a result of these residential movements. Comparing the ratios for these groups, it is found that the imbalance in flows is smallest for the higher managerial and professional (HMP) group and largest for the low skilled, with a slight but continuous gradient between them. On the basis of this evidence for the 27 city regions combined, it would appear that the cities’ main built-up areas are marginally better at holding onto their higher-status residents. This finding is at odds with the traditional picture of urban decentralisation being stronger among the better-off, but it could reflect the recent revival of city-centre living and longer-term urban gentrification. More detail on this key aspect can be obtained by looking at the in/out ratios for the 27 cities separately. The simplest way of doing this is shown
in Figure 4, where ratios for the higher managerial and professional MGRPs are compared with the ratios for all four occupation-based groups combined. The 27 cases are arranged in descending order of their ratios for the HMP group, permitting a ready insight into the degree of variation in that group’s ratios as well as into how those ratios compare with the picture for all the classified MGRPs. In relation to the former, there can be seen to be a considerable range around the global HMP ratio of 0.86 shown in Figure 3. At the top of the ranking, Norwich’s ratio of 1.24 indicates that in that city region, almost five HMP MGRPs moved into the PUA from the rest of its region for every four moving in the opposite direction. There were a further six cases where the number of such highly skilled in-migrants was larger than the number of out-migrants, namely Reading, Plymouth, Glasgow, Portsmouth, Bristol and Northampton, though the latter three were quite close to a balance. At the other end of the scale, Hull’s ratio of 0.49 signifies that there was barely one in-migrant HMP MGRP to the PUA for every two leaving it for the rest of its city region. Birmingham, Coventry, Middlesbrough, Leicester, Stoke and Nottingham registered the next lowest
Figure 3. In/out ratio for aggregate of 27 PUAs’ exchanges with the rest of their city regions, for six broad NS-SEC groups. Source: 2001 Census SMS
in/out ratios, all with less than two moving into their PUA for every three leaving it for the rest of their region. As regards whether the in/out ratio for the HMPs was higher or lower than for the aggregate of classified MGRPs, Figure 4 has to be studied more closely. In all, it is only in 11 of the 27 cases that the experience of the individual city regions parallels the aggregate picture of the PUAs having the stronger position for the HMPs shown in Figure 3. The ratio for HMPs is found to be higher than that for all classified MGRPs (i.e. with the longer bar in Figure 4) only in the cases of Reading, Glasgow, Bristol, Northampton, Liverpool, Cardiff, London, Manchester, Leeds, Bradford and Brighton. Our inspection of the ratios for all four occupational groups allows the 27 city regions to be allocated to one of three types. One type is where there is a positive association between in/out ratio and socio-economic status for the PUA with respect to the migration exchanges between it and the rest of the city region, such that the ratio is highest for the HMPs and lowest for the low-skill group. A second type is the reverse situation, where the ratios tend to rise with falling socio-economic status, while a third type is where there is no regular relationship. Figure 5 presents archetypes of these three situations. London exemplifies the first type: indeed, it is the only city region out of the 27 with this 1-2-3-4 ranking of ratio from highest to lowest socio-economic group, though there are five further cases where the HMPs’ ratio is the highest of the four classified groups. Birmingham exemplifies the opposite situation of a 4-3-2-1 ranking and is one among five that share this ranking and one among 10 where the HMPs’ ratio is the lowest of the four groups. Finally, Bristol represents the situation where the in/out ratios are very similar for all four of the occupationally classified groups, with no clear gradient across them. 
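The three-way typology above can be expressed as a simple rule on the four ratios, as in the sketch below. This is only one possible formalisation: the allocation reported in the chapter was made by inspection, so the strict-ordering rule, the `tolerance` threshold and all the example ratios here are hypothetical and are not taken from the SMS.

```python
def classify_gradient(ratios, tolerance=0.05):
    """Classify a city region from its four in/out ratios.

    `ratios` is ordered from highest to lowest socio-economic status:
    HMP, LMP, Intermediate, Low.

    ASSUMPTION: ratios that all lie within `tolerance` of each other are
    treated as showing no gradient; this threshold is illustrative only.
    """
    if max(ratios) - min(ratios) <= tolerance:
        return "none"
    pairs = list(zip(ratios, ratios[1:]))
    if all(higher > lower for higher, lower in pairs):
        return "positive"  # 1-2-3-4 ranking: ratio falls with status
    if all(higher < lower for higher, lower in pairs):
        return "negative"  # 4-3-2-1 ranking: ratio rises as status falls
    return "none"          # no regular relationship


# Hypothetical ratios chosen only to mimic the three archetypes.
examples = {
    "London-like": [0.95, 0.90, 0.82, 0.75],
    "Birmingham-like": [0.60, 0.68, 0.75, 0.83],
    "Bristol-like": [1.02, 1.00, 1.03, 1.01],
}

for name, ratios in examples.items():
    print(name, classify_gradient(ratios))
```

A strict rule of this kind places any city region with a single out-of-order ratio in the third type, so its results depend on the chosen threshold and may differ from an allocation made by inspection.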
In sum, these analyses indicate that decentralisation dominates the pattern of migration
exchanges within these 27 city regions in the pre-census year. In terms of this study’s key concerns, for 16 of the 27 cities the in/out ratio for HMPs was lower than for all classified MGRPs, indicating positive social selection in the decentralisation process. Along the way, these analyses have demonstrated the value of the NS-SEC data available in the SMS from the 2001 Census for the first time, introduced the concept of MGRP, and applied a measure (the in/out ratio) that can be used to compare the strength of migration flows in the absence of the population-at-risk data needed for calculating rates. They have
Figure 4. In/out ratio for 27 PUAs’ exchanges with the rest of their city regions, for higher managerial and professional MGRPs (used for ranking) and all classified MGRPs. Source: 2001 Census SMS
also shown that the geographical framework of a city’s main built-up area within its wider city region would seem to provide just as satisfactory a representation of the inner and outer parts of the twenty-first century British city as previous studies’ more localised distinction between city core and contiguous suburbs.
RELATIONSHIP BETWEEN MIGRATION AND EXISTING SOCIO-ECONOMIC PATTERNS The second of our three research questions concerns whether residential mobility within the city regions tends to reinforce the existing socio-economic geography or reduce the differentials between its component parts. For this part of our study, as mentioned above, we focus on three of the 27 city regions and the zonation of these shown in Figure 2. To test the relationship between the relative migration performance of the zones of each city region and their socio-economic characteristics, we use bivariate correlation and scatterplots to compare the variation in zones’ in/out ratios
for a particular social group with the variation in the proportions which that group makes up of the zones’ total classified populations. If there is a positive association between in/out ratio and group share across the zones of a city region, then this is interpreted as migration reinforcing the existing socio-economic geography, with a negative relationship denoting that migration is working towards a diminution of the between-zone differentials. Figures 6-8 provide an illustration of this approach, using the case of the HMP group for London city region. Figure 6 shows that, across the 46 zones, the in/out ratio of HMP MGRPs for migration exchanges with the rest of the city region ranges from a high of 1.42 (more than 14 arriving for every 10 leaving) to a low of 0.69 (with less than seven arriving for every 10 leaving). Broadly, it is the outer areas along the northern and southern boundaries of the city region that are most favoured, with the lowest ratios for outer west London, the Thames estuary and east Kent. By contrast, the existing socio-economic geography shown in Figure 7 primarily displays an east/west split, along with having the highest concentration
Figure 5. Three main types of relationship between in/out ratio and all 4 broad NS-SECs, as exemplified by 3 cities. Source: 2001 Census SMS
Migration and Socio-Economic Polarisation within British City Regions
Figure 6. In/out ratio for migration exchanges of higher managerial & professional MGRPs between zones and the rest of the London city region. Source: 2001 Census SMS
Figure 8. In/out ratio (logged) of within-region exchanges of higher managerial and professional MGRPs plotted against the HMP share of classified residents, for the 46 zones of the London city region. Sources: 2001 Census SMS and Standard Tables
of HMPs in the southwestern sector of the Greater London area, where their share reaches 28.5% of the classified population – more than four times
Figure 7. Higher managerial & professional residents as a proportion of all classified residents, by London city region zone, 2001. Source: 2001 Census Standard Tables
higher than the 6.3% of the city region’s lowest-share zone. The lack of relationship between the two geographies is confirmed by the scatterplot in Figure 8 and the correlation coefficient (Pearson’s r) of +0.006, where the logged version of the in/out ratio is used to achieve a more normal distribution. When the same approach is applied to the other three classified socio-economic groups for the London city region and to the other two city regions, a more mixed picture is found. As shown by the correlation results presented in Table 2, the correlation for London’s HMPs is the weakest of all the 12 results. On the other hand, the correlations are not significant at the 5% level in a further six cases, these comprising all four social groups for the Bristol city region and the two lowest groups for Birmingham. By contrast, the correlations for London’s three other groups are highly significant, though while migration appears to be reinforcing the existing geography of the Intermediate and Low groups, it is taking the Lower Managerial and Professional (LMP) MGRPs more to the zones
with the smallest proportions of this group in their classified populations. In the case of Birmingham, the within-region movements of both HMPs and LMPs appear to be more towards the zones already having the largest shares of those groups. The overall picture presented by this form of analysis is that intra-regional migration mainly parallels the existing socio-economic geography, but the degree of fit is generally not at all strong. While 10 of the 12 analyses yield positive correlations, only four of these are significant at the 5% level. Even the highest correlation coefficient – that of +0.566 for London’s Intermediate group – means that the between-zone differentials in the existing importance of that group in the population account for less than one third of the variance across the 46 zones in the in/out ratio for their migration exchanges of this group. In general, therefore, it has to be concluded on the basis of this evidence that, if socio-economic polarisation is taking place in these three city regions, within-region migration can be considered to be playing only a minor role in this process. At the same time, looking at the relationship the other way round, it means that the patterning of intra-regional migration must be subject to other drivers besides the existing social geography. Along the way, this approach has been useful in demonstrating that city-region geography is much more heterogeneous than the simple core/rest split used in the previous section. It has also demonstrated the value of maps, scatterplots and correlations as exploratory devices that prompt
further questions about the links between migration and socio-economic complexion.
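The calculation behind Figures 6-8 and Table 2 can be sketched in a few lines of Python. This is an illustration only: the zone figures are invented, and the function names (in_out_ratio, pearson_r, interpret) are hypothetical rather than taken from the study. The sketch computes each zone's in/out ratio, logs it, correlates it with the group's share of classified residents, and squares r to give the share of variance accounted for.

```python
import math


def in_out_ratio(inflow, outflow):
    """In-migrants per out-migrant for a zone's exchanges with the rest of its city region."""
    return inflow / outflow


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def interpret(r):
    """Reading used in the text: a positive r reinforces, a negative r reduces, differentials."""
    return "reinforcing" if r > 0 else "reducing"


# Invented data for five zones (not from the census):
# (in-migrants, out-migrants, group share of classified residents in %)
zones = [(120, 100, 22.0), (90, 100, 9.0), (105, 100, 14.0), (80, 100, 11.0), (130, 100, 18.0)]

log_ratios = [math.log(in_out_ratio(i, o)) for i, o, _ in zones]  # logged, as in Table 2
shares = [share for _, _, share in zones]

r = pearson_r(shares, log_ratios)
print(f"r = {r:+.3f} ({interpret(r)}), share of variance = {r * r:.3f}")
```

Squaring a published coefficient gives the same reading as in the text: 0.566 squared is about 0.32, just under one third of the variance.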
AREA CHARACTERISTICS ASSOCIATED WITH INTERZONAL NET MIGRATION GAINS
This third and final empirical section explores the relationships between the net migration flows between each pairing of the zones in the three city regions and the differences between each pair of zones in their relevant characteristics. The aim is to see how common it is for migration to shift population towards the better-off of the two zones. Given the novel and exploratory nature of this work, we started by looking at the migration of all persons before focusing in on the highest social group. In the former case, if there is a sustained tendency for people to move away from areas with characteristics associated with deprivation, then such areas are at risk of being ‘residualised’ within their city region. We might expect this process to be particularly marked among the HMPs, as they have the means to enter the more prestigious localities, including opting for stronger housing markets in the hope of reaping greater capital gains in the future. Alternatively, a negative association with locality wealth would arise where HMP moves were contributing to a ‘gentrification’ process in areas where previously there had been little or no demand for housing from more affluent groups. As in the previous section, analysing the three city
Table 2. Correlations (r) between the in/out ratios (logged) of NS-SEC group and proportions of classified residents in the relevant social group, for zones of three city regions

NS-SEC group | London city region | Birmingham city region | Bristol city region
HMP | +0.006 | +0.314** | +0.045
LMP | -0.466* | +0.304** | +0.037
Intermediate | +0.566* | -0.012 | +0.043
Low | +0.538* | +0.081 | +0.268

Note: * Significance levels of better than 1%; ** Significance levels better than 5%. Source: 2001 Census SMS and Standard Tables
regions separately allows any differences between, for example, the two provincial city regions and the capital to emerge. We use multiple regression analysis for this part of our study. The dependent variable is the net flow between any two zones in a city region. Flows that cross a city-region boundary are ignored, as are flows within the same zone. In the modelling, the zone pairs are weighted by the sum of the flows in both directions. This is in response to the fact that zone pairs with large flows which almost cancel out provide important evidence on the relationship between net flow levels and differences between the zones’ characteristics, whereas those net flows which are close to zero simply due to there being very few flows between the zones – perhaps because they are on opposite sides of the city region – do not provide very useful evidence. A total of 15 independent, or ‘explanatory’, variables have been selected for this analysis (Table 3). These represent five different ‘domains’ of zone quality, namely demographic, cultural/socio-economic, labour market, housing and environmental. The 15 have been reduced from a much longer list of variables, primarily on the basis of their being relatively independent from each other as determined by correlation analysis, with inter-correlations (r) of 0.7 or higher leading to one of the two variables being rejected. This ensures more robust results, though interpretation needs to bear in mind that any named variable may be acting as a proxy for another variable that is highly associated with it, which may be one of the rejected ones or another for which we have no data. To explore influences on net flows between the zone pairs, variables were processed so as to express the difference between the two zones in each pairing on that measure. 
Both the net flow and this difference can be either positive or negative, but the regression model identifies not only whether the between-zone differences are a significant influence on the strength of net migration flows but also whether any such relationship is in fact positive or negative. Note that the modelling used backward step-wise regression which identifies only the variables that are statistically significant in each analysis. Three principal questions are therefore being addressed in examining the modelling results shown in Table 3:
• How far can the pattern of net flows be accounted for by the zone differences?
• Which indicators of zone differences are significant influences in the modelling?
• Which of these influences are positively, and which negatively, related to the zone-to-zone net flows in each city region?
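The screening of candidate variables described earlier in this section, in which an inter-correlation (r) of 0.7 or higher leads to one of the two variables being rejected, might be sketched as follows. The chapter does not say which member of a correlated pair was dropped, so the rule used here (keep the variable listed first) is an assumption, and the variable names and zone values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def screen_variables(candidates, threshold=0.7):
    """Keep a variable only if |r| with every already-kept variable is below the threshold.

    `candidates` maps a variable name to its list of zone values, in priority order.
    Assumption: when two variables are too highly correlated, the later one is rejected.
    """
    kept = []
    for name, values in candidates.items():
        if all(abs(pearson_r(values, candidates[other])) < threshold for other in kept):
            kept.append(name)
    return kept


# Invented values for three hypothetical indicators measured across five zones
candidates = {
    "indicator_a": [10, 20, 30, 40, 50],
    "indicator_b": [11, 19, 32, 41, 48],  # tracks indicator_a very closely
    "indicator_c": [5, 1, 4, 2, 3],
}

print(screen_variables(candidates))
```

Because the outcome depends on the order in which candidates are considered, any retained variable may be standing in for a rejected one, which is the interpretive caution the text raises.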
For ease of reference in Table 3, the variables representing change leading up to the 2000/1 year covered by the 2001 Census migration data are shown in italics. Also, the variables with a positive association with deprivation are differentiated by background shading. Part of the answer to the first question is provided by the results being shown for only four of the six possible models. It can be seen from the final row of Table 3 that in none of the four cases shown is the modelling able to account for a very large share of the variation in the size and direction of net flows between zone pairs. The highest ‘level of explanation’ (the adjusted r2 which shows how successful the model was in explaining the variation between zone pairs in the dependent variable) was for all migrant persons in the Bristol city region, with 52.3% of the variance accounted for. The proportion is less than half of this for the all-migrants model for the Birmingham city region, and lower still for London. In all three city regions, the level of explanation is lower for the model of the HMP MGRP flows than for the analysis of all migrants. In particular, the modelling was notably less successful in the London and Birmingham cases, which is why it is only for Bristol that results are reported here for a model of HMP migration.
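The modelling set-up described in this section (net flow between a zone pair regressed on the differences between the two zones, with each pair weighted by the sum of the flows in both directions) can be illustrated with a minimal weighted least squares sketch. It is not the authors' code: the flows and differences are invented, the inclusion of an intercept and the weighted form of r2 are assumptions, and no step-wise selection is attempted. The adjusted r2 uses the standard correction 1 - (1 - r2)(n - 1)/(n - p - 1) for n zone pairs and p variables.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_fit(diffs, net_flows, weights):
    """Weighted least squares of net flow on zone differences (intercept assumed)."""
    rows = [[1.0] + list(d) for d in diffs]
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, net_flows, weights)) for i in range(k)]
    coefs = solve(xtwx, xtwy)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    y_bar = sum(w * y for w, y in zip(weights, net_flows)) / sum(weights)
    ss_res = sum(w * (y - f) ** 2 for w, y, f in zip(weights, net_flows, fitted))
    ss_tot = sum(w * (y - y_bar) ** 2 for w, y in zip(weights, net_flows))
    r2 = 1 - ss_res / ss_tot
    n, p = len(rows), k - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return coefs, r2, adj_r2


# Invented zone pairs: (flow A->B, flow B->A, (difference B-A on two hypothetical indicators))
pairs = [
    (520, 480, (4.0, -1.0)),
    (300, 330, (-2.0, 0.5)),
    (3, 1, (6.0, 2.0)),      # few movers: small weight, little influence on the fit
    (150, 140, (1.0, 1.5)),
    (80, 95, (-3.0, -2.0)),
    (60, 60, (0.5, 0.0)),
]
net = [ab - ba for ab, ba, _ in pairs]     # net gain of zone B from zone A
gross = [ab + ba for ab, ba, _ in pairs]   # weight: sum of flows in both directions
diffs = [d for _, _, d in pairs]

coefs, r2, adj_r2 = weighted_fit(diffs, net, gross)
print("coefficients:", [round(c, 3) for c in coefs])
print(f"r2 = {r2:.3f}, adjusted r2 = {adj_r2:.3f}")
```

With this sign convention, a positive coefficient means that the net flow favours the zone with the higher value on that indicator. Weighting by gross flow lets a pair with large, nearly balanced flows count for more than a pair whose net flow is near zero only because few people move between the zones.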
Table 3. Selected regression models of zone-to-zone net migration flows All individual migrants
Variables measuring change Variables positively correlated with IMD
London
Under 16 Students
+
No religion Ethnic diversification
HM&P
Birmingham
Bristol
+
+
-
-
-
+
+
-
-
-
Down-skilling
-
-
Household income
+
-
Bristol
+
Employment rate Employment rate change
+
-
Local job growth
-
Commuting 10km(+)
-
Semi-detached price
+
+
-
+
Semi-detached price change
-
-
Unoccupied dwellings
-
Green space
-
Crime
+ Adjusted r2
As regards the second question about which indicators of zone difference are significant in the models, the picture conveyed by Table 3 is rather mixed. The only real consistency is that the difference between the two zones in employment rate does not feature in any of the models. Only one variable – the difference between the two zones in the proportion of full-time students in the population – appears in all four models, but twice with a positive relationship with net migration and twice with a negative relationship. Four variables appear in just one of the four models and, although the other nine variables appear in two or three of the models, the direction of the relationship varies in all but one case. In terms of our key concern about the relationship between residential mobility and socio-economic differentiation, the identity and direction of the significant indicators suggest there is no consistent pattern of moves either to or from areas with higher values on the factors correlated with higher deprivation levels. In the London case,
Table 3, final row (adjusted r2): London 0.160; Birmingham 0.245; Bristol 0.523; HM&P Bristol 0.427
the overall pattern of net flows between zone pairs would seem to be favouring the zone that is wealthier and upgrading occupationally the more rapidly, but the positive association with crime and full-time students also suggests a link to the inward movement of younger professionals, possibly to places where gentrification is still at an embryonic stage. In the Birmingham model, the net flow tends to be away from the more deprived member of the zone pair, but there must be other processes operating which produce the negative associations with variables such as longer-distance commuting and higher household income. Similarly, the all-migrants model of Bristol includes, with negative signs, some variables associated with deprivation, but the other variables might be argued to be suggestive of ‘mature’ suburbia. As regards the experience of HMP MGRPs in the Bristol city region (the final model shown in Table 3), the results appear to be inclined towards the zone that has a more city centre type of feel, with fewer children but more students, increasing ethnic diversity, less green space and fewer unoccupied dwellings.
In sum, as mentioned above, this part of our study has been essentially exploratory. It is therefore probably unsurprising that the models are unable to account for a larger proportion of the difference in net flows between all pairs of zones of all types. The distortion of the SMS ward-level data through SCAM may well be partly to blame, given the small number of flows between zones located on opposite sides of each city region, despite giving these smaller flows less weighting. More substantively, the differences between zones cannot be simply reduced to a single dimension, whether in terms of high/low deprivation, more/less ethnic diversity, or whatever. The net flows will have equally diverse patterns with the possibility that, for example, there are net flows from affluent to ‘studentified’ areas, net flows from student areas to areas with many children, and also net flows from areas with many children to the most affluent areas. Clearly, such a multi-dimensional pattern is not readily reduced to a set of high/low parameters associated with the difference between a pair of zones on the characteristics measured here.
CONCLUSION
The availability of district-level and ward-level data on the socio-economic characteristics of between-place migration flows is a valuable innovation of the 2001 Census output. In this chapter these datasets have been used to address research questions about how migration internal to city regions varies between occupationally-defined groups and, in particular, whether this residential movement seems to be leading to socially selective urban decentralization and greater local polarization within these broadly defined cities. The main thrust of the chapter, however, has been conceptual and methodological, seeking the best ways of specifying the research questions and
considering the adequacy of the SMS datasets for answering them. In terms of the substantive findings produced by the three sets of empirical analyses presented above, the overall impression is that the migration patterns do vary by social group but not in a consistent way across city regions. In three quarters of the 27 cases, the main built-up area of the city lost more HMP MGRPs to the rest of the city region than it gained from it in the pre-census year, but in more than one third of the cases the in/out ratio for this high-status group was higher than that for the aggregate of occupationally classified MGRPs. In the majority of cases, there was a broadly negative relationship between in/out ratio and social status for the urban core, but there were also a few cases (notably London) where the relationship was positive for the core, and rather more where there was no clear relationship across the four social groups. As regards the way in which the more localized patterns of residential movement in three differentiated city regions relate to their inherited social geography, the most positive migration balance for any social group is usually for the zones that already have the highest representation of that social group. On the other hand, in less than half of the instances examined was this trend towards greater social polarization significant statistically. Similarly, in examining whether the net flows between all possible pairings of zones in each of these three city regions favoured the more wealthy or the more deprived localities, the former seems to be the case in just two of the four model results presented, namely for all persons in the Birmingham and Bristol city regions. In the other two cases – for all persons in London and HMPs in Bristol – the net gains were positively associated with variables representing greater deprivation, possibly reflecting the onset of gentrification and perhaps due to the inflow of recent graduates in some cases. 
In statistical terms, however, the relationships were again generally rather weak, suggesting that much of this within-
region migration is driven by other factors besides between-zone social differences. How much confidence should we place in these results? As with any exploratory research, it can be argued that adopting a different approach – for example, using an alternative zonation for the latter two analyses – might have made a difference. That said, it should perhaps be no great surprise that the levels of explanation are not higher, given the multi-dimensional nature of migration stressed by Champion et al. (1998) and ODPM (2002), among others. Fundamentally, migrants vary greatly in their personal characteristics and circumstances and in their reasons for changing address. It is also likely that the nature of the 2001 SMS data will have militated against achieving clearer results. The datasets have a much higher sensitivity to SCAM than most other 2001 Census outputs and, no doubt, a social bias due to the omission of migrants with no usual address one year ago. Additionally, our analyses of migration by social group have had to ignore migrants who are not MGRPs but comprise either other members of moving groups or migrants living in communal establishments at the census. The availability of the socio-economic counts only for MGRPs has also meant that we have not been able to model migration rates. Finally, the single year of migration covered by the 2001 Census means that migrant flow counts are more likely to be subject to the effect of short-term events than for datasets assembled for a run of years. The experience gained from the above analyses therefore points up several improvements that should be made in the next census, both relating to the questions on the form and in terms of subsequent data processing. As regards the former, migrants should be asked to indicate their whereabouts one year ago, even if they did not feel that they possessed a ‘usual address’, so that they can be included in the place-to-place migration flows in the SMS. 
Secondly, those who were full-time students one year ago should be required to indicate this fact, so that those who
are no longer students by the time of the census can be distinguished from other migrants. Ideally, all persons would be asked about their economic position one year ago. Thirdly, an additional question on address five years ago – as asked alongside the one-year-ago question in the 1971 Census but not in the latest three censuses – would help to smooth out the effect of short-term events. In terms of data processing, it is vital that the method of disclosure control used for the 2011 SMS is a lot less damaging than SCAM. Finally, in terms of the SMS output, it would be much better if the socio-economic tables were for all persons rather than just for the MGRPs.
ACKNOWLEDGMENT
Most of the empirical results presented in this chapter were generated by a Joseph Rowntree Foundation census programme project on migration and the socio-economic complexion of communities (see Champion et al., 2007, for further details). Our thanks go to Colin Wymer and Simon Raybould for help with data preparation, analysis and mapping. The results are all based on 2001 Census data supplied on compact disk by the Office for National Statistics. These data are Crown Copyright.
REFERENCES
Barclay, P. (Chair). (1995). Inquiry into Income and Wealth, Volume 1: Report. York, UK: Joseph Rowntree Foundation.
Bramley, G., Pawson, H., & Third, H. (2000). Low Demand and Unpopular Housing. London: Department of the Environment, Transport and the Regions.
Brewer, M., Goodman, A., Shaw, J., & Sibieta, L. (2006). Poverty and Inequality in Britain 2006. London: Institute for Fiscal Studies.
Bulusu, L. (1991). A Review of Migration Data Sources. OPCS Occasional Paper 39. London: Office of Population Censuses and Surveys.
Champion, T. (2001). Urbanisation, suburbanisation, counterurbanisation and reurbanisation. In Paddison, R. (Ed.), Handbook of Urban Studies (pp. 143–161). London: Sage Publications.
Champion, T., & Coombes, M. (2007). Using the 2001 census to study the human capital movements affecting Britain’s larger cities: insights and issues. Journal of the Royal Statistical Society A, 170, 447–467. doi:10.1111/j.1467-985X.2006.00459.x
Champion, T., Coombes, M., Raybould, S., & Wymer, C. (2007). Migration and Socio-economic Change: A 2001 Census Analysis of Britain’s Larger Cities. Bristol, UK: Policy Press.
Champion, T., & Fisher, T. (2004). Migration, residential preferences and the changing environment of cities. In Boddy, M., & Parkinson, M. (Eds.), City Matters: Competitiveness, Cohesion and Urban Governance (pp. 111–128). Bristol, UK: Policy Press.
Champion, T., Fotheringham, S., Rees, P., Boyle, P., & Stillwell, J. (1998). The Determinants of Migration Flows in England: A Review of Existing Data and Evidence. Department of Geography, University of Newcastle upon Tyne, Newcastle upon Tyne.
Coombes, M. (2002). Localities and city regions codebook. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System. Chichester, UK: Wiley.
Dorling, D., Rigby, J., Wheeler, B., Ballas, D., Thomas, B., & Fahmy, E. (2007). Poverty, Wealth and Place in Britain, 1968 to 2005. Bristol, UK: Policy Press.
Gregory, I., Southall, H., & Dorling, D. (2000). A century of poverty in England and Wales, 1898-1998: a geographical analysis. In Bradshaw, J. R., & Sainsbury, R. (Eds.), Researching Poverty. Aldershot, UK: Ashgate.
Hills, J. (Ed.). (1995). Inquiry into Income and Wealth, Volume 2: A Survey of the Evidence. York, UK: Joseph Rowntree Foundation.
Lupton, R. (2005). Changing Neighbourhoods? Mapping the Geography of Poverty and Worklessness Using the 1991 and 2001 Census, CASE-Brookings Census Brief 3. London: London School of Economics.
ODPM. (2002). Development of a Migration Model. London: Office of the Deputy Prime Minister.
ONS. (2001). Census 2001 Classifications. Titchfield, Hampshire, UK: Office for National Statistics.
ONS. (2005). Census 2001: Quality Report for England and Wales. Basingstoke: Palgrave Macmillan.
Palmer, G., Kenway, P., & Wilcox, S. (2006). Housing and Neighbourhoods Monitor. London: New Policy Institute.
Rees, P., Thomas, F., & Duke-Williams, O. (2002). Migration data from the census. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 245–267). Chichester: Wiley.
Stillwell, J., & Duke-Williams, O. (2007). Understanding the 2001 census migration and commuting data: the impact of small cell adjustment and problems of comparison with 1991. Journal of the Royal Statistical Society A, 170, 425–445. doi:10.1111/j.1467-985X.2006.00458.x
Chapter 11
Issues Associated with the Analysis of Rural Commuting
Martin Frost, Birkbeck College London, UK
Adam Dennett, University of Leeds, UK
ABSTRACT
It is important to acknowledge that the reliability of the 2001 Census interaction data depends on spatial scale and geographical location. As the spatial scale becomes more refined, small cell adjustment becomes more significant because sets of flows are likely to contain more ones and twos prior to adjustment. Likewise, data for areas such as large towns and cities that have many commuters and migrants will tend to be more reliable than data for rural and sparsely populated areas where flows of commuters and migrants are likely to be relatively small. This chapter is concerned with commuting flows for rural areas and examines the sources of unreliability in the 2001 Census data by considering statistical disclosure controls, the quality of census responses and the implications of table specifications. The chapter also addresses some of the issues associated with analyzing data for rural areas and the anomalies that exist between area classifications defined by the Department for Environment, Food and Rural Affairs (DEFRA) and the Office for National Statistics (ONS).
DOI: 10.4018/978-1-61520-755-8.ch011
INTRODUCTION
This chapter considers some of the issues surrounding the effective analysis of rural commuting. Much of the discussion focuses on the use of commuting data taken from the Census of Population – and particularly the Census of 2001. This remains the principal comprehensive source of commuting
information within the UK, particularly for small spatial units. While some government surveys, such as the National Travel Survey, collect information on people’s work journeys, their sample size is relatively small and their results cannot be used much below the spatial scale of local authorities. The discussion in this chapter is set within two important policy contexts. The first is the evolution of rural economies. For several decades employment in the land-based
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
economy that once dominated many rural areas has been declining, while, at the same time, many towns and villages have been growing rapidly, often linked to in-migration of people employed in a range of service sector activities (for a review of relevant literature, see Winter and Rushbrook, 2003). This has led to increasingly complex patterns of inter-dependence between cities, towns and villages – powered in part by rising levels of personal mobility that allow people more choice in their relative locations of home and work. Simple questions like ‘how important are market towns in comparison with larger cities as a contemporary source of employment for rural residents?’ that are crucial for the development of rural economic development strategies can only be answered through an analysis of commuting patterns. In many respects, the second context is derived from the first. All rural development policies stress the importance of attempting to reconcile economic development and population growth with environmental sustainability (Office of the Deputy Prime Minister, 2004). The problem here is that rural settlements are an intrinsically less compact way of combining residence and work opportunities for large numbers of people than major towns and cities. At a simple level this is reflected in the fact that commuting journeys are consistently longer for rural residents than they are for people living in larger urban settlements. The environmental implications of this difference are intensified by the fact that the vast majority of rural work journeys are made by car. In general, a key use of commuting data is in investigating the ‘environmental load’ created by population and employment growth in rural areas; a particular use is in identifying what kinds of rural settlement structures provide the least environmentally damaging solution to meeting these challenges. 
It should be stressed, however, that although the policy contexts for analysing rural commuting are, in some respects, distinctive, many of the analytical problems and data issues that are raised are shared with equivalent studies of commuting
throughout the settlement system. What focuses these issues more sharply in rural areas is that rural settlements are smaller and less dense than cities and metropolitan areas. Consequently, the commuting flows between rural zones tend to be smaller, with a higher proportion of pairs of zones with only a single person moving between them – and thus more exposed to the efforts of the census authorities to ensure individual anonymity. In addition, for rural settlements, a small number of very long commuting journeys can have a greater proportional impact on their commuting profiles than would be the case for a larger town or city. And, finally, the boundaries of small settlements are notoriously difficult to approximate with the areal units used for tabulating census results. For large settlements, problems of approximating an outer boundary with census output areas or wards often have little impact on the aggregate employment, resident and commuting profile for the settlement as a whole. The smaller the settlement becomes, the greater are the potential distortions generated by these approximations.
Within these contexts, this chapter focuses on four principal issues. The first involves the consequences of the data disclosure controls used in the release of results from the 2001 Census to ensure that individual travellers could not be identified from the census tables. The second is the issue of the quality of responses made by individuals filling in those parts of the census form that provide the basis for identifying respondents’ work journeys and the mode of travel they used. The third issue is the interaction between the table specifications chosen by the Office for National Statistics (ONS) for the release of commuting data and the problems of approximating settlement boundaries using the standard ‘geographies’ used for the release of the data. 
Finally, the fourth issue is concerned with issues of analysing and interpreting flow data for rural areas, especially in the context of data constraints and definitions of rural that are static and, in some cases, difficult to interpret
and incompatible with other definitions offering supposedly similar typologies.
Throughout the majority of this chapter, ‘rural’ is identified as those areas classified as ‘rural’ by the current ONS Rural-Urban Definition established in 2004. A full discussion of this can currently be found at http://www.statistics.gov.uk/geography/nrudp.asp and its methodology is explained in Bibby and Shepherd (2005). In brief, it classifies all areas of England and Wales that lay outside settlements with 10,000 residents or more in 2001 into either ‘Less Sparse’ or ‘Sparse’ rural areas. Each of these primary categories can then be broken down at the ward level to ‘Rural Town and Fringe’ or ‘Village and Dispersed Population’ sub-categories. With the higher resolution offered by output areas it is possible at this scale of spatial units to distinguish between ‘Villages’ and ‘Dispersed Populations’.
The purpose of this piece is not to assess whether census-based commuting data is ‘fit for purpose’ in the analysis of rural commuting. This will depend strongly on the design and priorities of individual research projects. Instead, it seeks to illustrate with particular examples the effects of the issues outlined above. It is intended as an informative caution for anyone using commuting data in a rural setting to help them make informed judgements within their research designs and to help interpret the published results of other commuting analyses.
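The hierarchy of categories in the Rural-Urban Definition, as summarised above, can be written down as a small labelling function. This is a sketch of the category structure only: the function name, argument names and morphology codes are hypothetical, and how an area's sparsity or settlement form is actually determined is taken as given rather than reproduced here.

```python
def rural_urban_label(in_settlement_10k_plus, sparse, morphology, unit="ward"):
    """Return a category label following the names quoted in the text.

    in_settlement_10k_plus: True if the area lies within a settlement of 10,000+ residents.
    sparse: True for 'Sparse', False for 'Less Sparse'.
    morphology: "town_and_fringe", "village" or "dispersed" (hypothetical codes).
    unit: "ward" or "output_area"; only output areas separate villages from dispersed areas.
    """
    if morphology not in ("town_and_fringe", "village", "dispersed"):
        raise ValueError(f"unknown morphology: {morphology}")
    if unit not in ("ward", "output_area"):
        raise ValueError(f"unknown unit: {unit}")
    if in_settlement_10k_plus:
        return "Urban"
    context = "Sparse" if sparse else "Less Sparse"
    if morphology == "town_and_fringe":
        settlement = "Rural Town and Fringe"
    elif unit == "output_area":
        settlement = "Village" if morphology == "village" else "Dispersed Population"
    else:
        settlement = "Village and Dispersed Population"
    return f"{context} - {settlement}"


print(rural_urban_label(False, True, "village"))
print(rural_urban_label(False, False, "dispersed", unit="output_area"))
```

The same area can therefore carry a coarser label at ward level than at output area level, which matters when flows tabulated at one scale are compared with a classification defined at another.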
THE IMPLICATIONS OF STATISTICAL DISCLOSURE CONTROL
Statistical disclosure control (SDC) is the term used to capture the range of methods adopted by national statistical agencies to protect respondent confidentiality. Essentially there are three parts to the approach used by ONS to protect the results of the 2001 Census. The first stage is record swapping where a number of records relating to individuals or households are ‘swapped’ between neighbouring zones to create uncertainty about the location of specific individuals within the published data. The second stage is suppression of small numbers of respondents in individual cells within tables – referred to by ONS as small cell adjustment. The third stage relates to the design of the tables that are published. In general, this means that the more detailed breakdowns of respondents’ characteristics are reserved for tables published at the spatial scale of census wards rather than at the smaller output area scale. This section considers the specific implications of the first two procedures on commuting data. Table design is discussed thereafter. A much fuller discussion of the general issues of SDC in the 2001 Census can be found elsewhere (Williamson, 2007; Duke-Williams and Stillwell, 2007). It is relevant to note that the 2001 Census was the first to apply technical measures of SDC to commuting data. In earlier censuses, all equivalent tables were based on a 10% sample of census records which was seen as a sufficient form of SDC to maintain respondent confidentiality. A key problem with any discussion of the effects of SDC on the published results of the 2001 Census is that, in order to prevent analysts re-creating the original data, the details of the precise methods used by ONS are never released. We are dependent on inferring their character from the published tables and from the range of equivalent methods adopted by other national statistical agencies. In this context, record swapping is the most difficult to detect. 
ONS is committed to swapping records between zones that are close together and to swap records of a similar nature. This is a sensible approach that minimises the distortion created in the data while creating uncertainty over the identification of individuals. We do not know the range of characteristics used to establish ‘similarity’. 
It is reasonable to assume that they are likely to be based on sociodemographic characteristics such as gender, age, socio-economic group and, maybe, housing tenure. It is unlikely that the characteristics would extend
to an individual’s workplace, identified on the census records as a unit postcode. This raises the question of whether the effects of this hypothetical approach to record swapping might have significant effects for the analysis of rural commuting patterns and structures. The answer to this question depends strongly on the extent to which commuting journeys in a particular area share common destinations. Close to large urban employment centres it is likely that many journeys will share a common destination, at least at the settlement scale. However, some recent studies (for example, Shepherd et al., 2007) have shown that residents of rural areas that lie some distance from a ‘dominant’ urban centre often have much more varied commuting patterns. Taking the ONS approach of defining Travel to Work Areas, discussed in detail by Mike Coombes in Chapter 12, it is apparent that, over large areas of rural East of England, significantly less than half of the employed residents work within their home Travel to Work Area. In simple terms, the more diverse an area’s workplace destinations, the greater is the potential effect of record swapping. Counterbalancing this, however, is the fact that, if the residents of a rural area have many workplace destinations it will be difficult to establish any
clear patterns of commuting alignment with or without record swapping! The effects of the second form of SDC, small cell adjustment, are more obvious within commuting data. Although ONS do not release a definition of what a ‘small number’ is, no 2001 Census tables, including the commuting tables, contain any individual cells with counts of either one or two. It is clear that these values, at least, have been adjusted to either 0 or 3. The effects of this on the range of cell counts for car commuters between pairs of census wards in rural England can be seen in Figure 1. Nearly 60% of all recorded flows between pairs of wards suggest that three people made that trip. There are no counts of either one or two. A small subsidiary peak can be seen at six travellers suggesting that adjustment is not wholly restricted to values of one and two. At first sight, this may appear to be a substantial distortion of the data. However, much depends on the exact character of the adjustment technique that is used. As part of a longer discussion of the impact of small cell adjustment on interaction data in general, Duke-Williams and Stillwell (2007) demonstrate a ‘data neutral’ algorithm in which values of one have a two thirds chance of being
Figure 1. Frequency of cell counts for car commuters in rural England by ward, 2001. Source: Census of Population, 2001: Crown Copyright.
adjusted to zero (a one third chance of being adjusted to three) while values of two have the opposite probabilities (two thirds chance of moving to three, one third chance of moving to zero). Taken over a fairly large number of cells this is unlikely to have a noticeable effect on the total number of travellers, although it will inevitably increase the numbers of threes and zeros that are found in the data. A partial test of the ‘data neutrality’ of the method used by ONS can be achieved by analysing particular commuting flows using data at different spatial scales. In general, the smaller the spatial units on which data is based, the greater will be the number of cells affected by small cell adjustment method (SCAM). If the adjustment method were not neutral in its effects on flows one would expect this to be consistently reflected in the flows aggregated from different spatial scales. The example shown in Table 1 is for commuters travelling between North Hertfordshire and London. The figures are based on aggregating all flows at output area scale, at ward scale, and for the local authority as a whole, extracted directly from the census local authority tabulations. The estimates of the total number of commuters vary a little across the aggregations from different spatial units. It is interesting to note that the variation is not directly related to the size of zone used, with wards producing the largest figure and the local authority scale the smallest. The statistical advice from ONS is to use the largest spatial units possible in any analysis in order to minimise the effects of cell adjustments, which implies that the local authority estimate of 5,692 is
likely to be the closest to the correct figure. In this context the results suggest that the larger number of small flows linked to the output area scale has not biased the estimates upwards, lending support to the notion that the adjustment methods do not greatly distort aggregated flows when these are based on fairly large numbers of individual journeys. Where cell adjustment has greater effects is at more local scales where, for example, one might wish to identify the commuting catchment areas associated with small rural towns.
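The ‘data neutral’ rule described above can be sketched in a few lines of Python. This is only an illustration of the probabilities quoted from Duke-Williams and Stillwell (2007); the precise ONS method is unpublished, and the function names and sample cells below are hypothetical.

```python
import random
from fractions import Fraction

# Probability that a small count is adjusted UP to 3 (assumed from the text);
# the remaining probability adjusts the count DOWN to 0.
P_UP = {1: Fraction(1, 3), 2: Fraction(2, 3)}


def expected_adjusted(count):
    """Expected value of a cell after adjustment."""
    if count in P_UP:
        return 3 * P_UP[count]  # the outcome 0 contributes nothing
    return Fraction(count)


def adjust(count, rng):
    """Randomly adjust one cell; counts other than 1 and 2 are left alone."""
    if count in P_UP:
        return 3 if rng.random() < float(P_UP[count]) else 0
    return count


def adjust_cells(cells, seed=0):
    """Apply the adjustment to every cell using a reproducible generator."""
    rng = random.Random(seed)
    return [adjust(c, rng) for c in cells]


if __name__ == "__main__":
    cells = [1, 2] * 5000  # hypothetical table made up of small counts only
    adjusted = adjust_cells(cells)
    print("original total:", sum(cells))
    print("adjusted total:", sum(adjusted))
```

Because the expected value of an adjusted one is one and that of an adjusted two is two, totals over many cells should stay close to the original, while every individual small cell becomes either zero or three.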
THE QUALITY OF CENSUS RESPONSES
It is important to remember that, in its current form, the Census of Population is a large survey – arguably the largest survey conducted for any purpose within the UK. As with all data collected through surveys, its quality is determined by the accuracy, care and sense of understanding that respondents achieve when filling in the census form. The census is an unsupervised survey in the sense that individuals and households fill in the forms normally without external assistance and with only a set of notes to guide them. Considerable effort is invested by ONS and the other Census Offices in Scotland and Northern Ireland to ensure that the census form is clear, compact and simple. However, there is evidence that suggests that not all respondents find it easy to fit their patterns of activity into the compact and simple census form. Commuting data are particularly exposed to these problems. The main focus of the census is to
Table 1. Commuting flows between North Hertfordshire and London, 2001
| Source areas | Number of commuters | Employed residents | Commuters as % of total employed residents |
| --- | --- | --- | --- |
| Output areas | 5,735 | 59,837 | 9.6 |
| Census wards | 5,840 | 59,392 | 9.8 |
| Local authority | 5,692 | 58,853 | 9.7 |
Source: 2001 Census SWS
record the numbers and characteristics of residents. It does have, however, a number of questions that relate to respondents’ employment. In particular, question 33 on the 2001 Census form asks each person to specify ‘the address of the place where you work in your main job’ (ONS italics). This address should include a full postcode, processing of which is the key to identifying each person’s commuting journey between their home address and their stated place of work. Additional options are provided for respondents who work ‘at or from home’, on offshore installations or who believe they have ‘no fixed place’ of work. There is no question on the form that deals with how regularly an individual travels between their home address and their place of work, nor whether they have more than one place of work associated with their main job. The Census Quality Report (ONS, 2005), indicated that ‘respondent difficulties’ for question 33 included “respondents who have put down a part-time job, people who have more than one occupation, and those who were unsure as to which was their main job”. No indication is given of how frequently these problems were identified. In addition to these identified difficulties, the Quality Report states that nearly 8% of respondents failed to fill in a response to the question at all. In addition to form-filling difficulties, there are also processing difficulties associated with the question. Full unit postcodes are notoriously sensitive to error. Often a single misplaced character or wrongly entered digit can make them very difficult to use. The Quality Report indicates that checks on the automatic scanning of the responses to this question showed them to be 86.1% accurate, well short of the 94.5% accuracy target set by ONS for the outside contractors (Lockheed Martin) who undertook the processing. 
In mitigation, ONS claims that many ‘impossible’ postcodes were incorrect only in their final two characters but it is not clear how such postcodes were ‘corrected’ or allocated to counts within the census zones. All surveys, and the census is no exception, have problems of non-response and inaccurate
form filling. The concern about their effects on the census commuting data is that there are a non-trivial number of travellers that have journeys that are sufficiently long to suggest that daily travel would be impossible. While their numbers are small in relation to the total number of commuters overall, their long journeys can inflate the estimation of average journey lengths and can greatly increase calculations of the total distances travelled by commuters either over all modes or within individual modes of travel. An indication of the effects of a small number of long journeys can be seen in Table 2. This shows the mean and median journey lengths for all commuters originating in the rural areas of England (analysed at the scale of census wards). For most modes of travel, with the exception of rail, the mean journey lengths are roughly twice the value of the median. This reflects the existence, in terms of trip length frequencies, of distributions that are substantially right skewed by the long distance trips. Figure 2 shows the distribution of journey lengths for all commuters in rural England. It has the characteristic right tail of relatively long journeys – with some very long journeys. It is notable that 2% of all declared commuting journeys in the graph are between two hundred and three hundred kilometres. The effects of the level of skewness in the distribution can be seen in the divergence between the overall mean travel distance (37.7kms) and the median (20.6 kms) which are shown on the graph. Table 2 repeats the comparison of mean and median journey lengths for individual modes of travel. Two conclusions can be drawn from the results contained in the table. The first is that using the median appears to be an effective way of reducing the impact of long journeys. The second is that great care is needed when interpreting mean journey lengths. 
Unfortunately, many census tables based on commuting data present only average journey lengths and do not contain medians as an alternative.
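The sensitivity of the mean to a handful of long journeys can be seen in a minimal sketch. The journey lengths below are invented for illustration and are not census values.

```python
from statistics import mean, median

# Hypothetical journey lengths in km: nine ordinary trips and one very long one.
journeys_km = [3, 5, 6, 8, 9, 10, 12, 15, 20, 250]


def summarise(distances):
    """Return the (mean, median) pair for a list of journey lengths."""
    return mean(distances), median(distances)


if __name__ == "__main__":
    all_mean, all_median = summarise(journeys_km)
    # Drop the single long journey to see how each measure responds.
    short_mean, short_median = summarise(journeys_km[:-1])
    print(f"with long journey:    mean={all_mean:.1f} km, median={all_median:.1f} km")
    print(f"without long journey: mean={short_mean:.1f} km, median={short_median:.1f} km")
```

In this toy case removing one journey out of ten cuts the mean by more than two thirds while the median barely moves, which is the pattern the rural commuting distributions display on a larger scale.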
Figure 2. Journey lengths for all commuters in rural England by ward, 2001. Source: 2001 Census SWS
This leaves the question of whether these long journeys are ‘real’ or in some way the product of squeezing complicated lives into a simple census form – or are just errors in form filling. The problem in assessing this is that there is no completely effective benchmark with which to compare the distributions of journey lengths recorded in the census. The closest comparator is based on the National Travel Survey (NTS), part of which records commuting trips for a sample of individuals. However, there are significant difficulties in making a direct comparison between the results of the 2001 Census and the NTS.
One difference between them is that the NTS relies on participants filling in travel diaries for a short period (usually a week) recording their journeys and their purpose. It is a more direct and potentially more accurate approach than inferring commuting journeys from a stated place of residence and a stated place of work which is the basis for census tabulations. However, because participants tend to make long journeys more infrequently than short journeys, long journeys are recorded separately on a three week diary. In addition, because the number of long journeys is small, counts are published
Table 2. Mean and median journey lengths for commuters originating in rural England, 2001
| Mode of travel | Mean journey distance (kms) | Median journey distance (kms) |
| --- | --- | --- |
| Train | 53.0 | 42.2 |
| Bus | 13.3 | 7.9 |
| Car | 17.1 | 9.7 |
| Motor cycle | 15.1 | 8.7 |
| Bicycle | 8.6 | 4.0 |
| Foot | 8.3 | 3.2 |
Source: 2001 Census SWS
for a three year combined period rather than as a single count per year and are not combined with the results of the one week diaries within the published results. As a final problem, the sampling methodology of the NTS was revised substantially for the 2002 survey, producing a discontinuity with earlier counts. Thus, in the comparison shown in Figure 3 between the proportion of relatively long commuting journeys identified in the NTS and the 2001 Census, the NTS data are based on long journeys recorded by three week diaries over the three years from 2002 to 2004 (inclusive). To derive a total number of journeys that can be combined with the annual records of the one week diaries, the number of long journeys is divided by three to allow for the longer recording period. The number is further divided by three to allow for the fact that the journeys were recorded on three week rather than one week travel diaries. The resulting estimate of the number of long journeys is added to the number of shorter journeys recorded in 2003 to produce an estimate of the total number of commuting journeys that can be used as a base from which to calculate the proportion of long journeys. This is based on the assumption that, unlike many other forms of journey, commuting trips are made on a regular basis. This, of course, is similar to the assumption that underpins the use of the census results based on specified places of residence and work. Overall, this procedure produces an estimate that only about 0.6% of commuting journeys recorded in the NTS exceed 80 kms, in comparison with the significantly larger proportion of 2.2% recorded by the census. Although even the 2.2% appears to be a relatively small proportion of commuting journeys recorded by the census, it represents close to 450,000 individual trips. 
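The scaling applied to the NTS long journey counts can be written out as a short calculation. The sketch below follows the two divisions described in the text; the function name and the input counts are hypothetical and are not NTS figures.

```python
def long_journey_share(long_count_3yr_3wk, short_count_1yr_1wk):
    """Share of long journeys after rescaling to a one year, one week basis.

    long_count_3yr_3wk: long journeys from three week diaries pooled over
        three survey years (hypothetical input).
    short_count_1yr_1wk: shorter journeys from one week diaries in a single
        survey year (hypothetical input).
    """
    per_year = long_count_3yr_3wk / 3   # three pooled years -> one year
    per_week = per_year / 3             # three week diary -> one week diary
    total = per_week + short_count_1yr_1wk
    return per_week / total


if __name__ == "__main__":
    # Invented counts chosen only to make the arithmetic easy to follow.
    share = long_journey_share(long_count_3yr_3wk=54, short_count_1yr_1wk=994)
    print(f"estimated share of long journeys: {share:.1%}")
```

The calculation depends entirely on the assumption, noted in the text, that commuting trips recur regularly, so that a count taken over three weeks can simply be divided by three.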
Acknowledging the potential incompatibilities in the comparison between the NTS and the 2001 Census, there appear to be clear differences in the number of long journeys that the two surveys identify, with the 2001 Census producing larger
counts in all but the longest (and smallest) category. It is a cause for concern, particularly given that the level of supervision and support offered by fieldworkers is higher in the case of NTS travel diaries in contrast to the unsupervised completion of the census form raising the possibility that some ‘improbable’ journeys are picked up at the data collection stage. As indicated in the earlier discussion in this section, the effects of this relatively small number of unusual journeys can be diminished by using medians rather than means as an overall summary measure of distance travelled. The area where they become important is when attempts are made to measure the total person kilometres generated by commuting. This is a central component of any attempt to assess the environmental impacts of commuting and the ways in which these impacts might be, or have been, influenced by public policies relating to urban form or transport provision. The impact of long journeys on estimates of total travel length can be illustrated with some simple figures. If all rural areas of England and Wales are considered, the 2001 Census results (analysed at output area scale) show that car journeys generated about 30 million person kilometres of travel. Nearly a quarter of this total (7 million person kilometres) was contributed by journeys greater than 150kms in length. This threshold is roughly equivalent to a daily commuting journey between Bristol and London (160kms). As Figure 2 shows, some journeys appear to be considerably longer than this. It is clear that a relatively small number of long journeys may substantially weight any calculation of the environmental impacts of commuting. They may be particularly influential if the ‘commuting footprint’ of individual settlements is considered and, by chance, captures some very long commuting trips. One possible way of limiting the role played by a small number of unusual journeys is to apply a travel time threshold to trips. 
It might be argued, for example, that daily
Figure 3. A comparison of long commuting journeys between the 2001 Census and the National Travel Survey. Sources: 2001 Census SWS; National Travel Survey, 2004
journeys of more than three hours in each direction are so improbable in most people’s working lives that they are either not daily journeys or are the product of census form filling difficulties. The difficulty in this approach lies in estimating realistic travel speeds that are applicable to the study area in question. If the NTS is used on a national basis to estimate car commuting speeds, all those journeys of more than 150kms referred to above would be excluded, but these speeds include car journeys in both urban and rural areas and may not be an accurate reflection of predominantly non-urban journeys. In practical terms, it is an issue that needs to be addressed on a project by project basis given the particular settlements that are analysed and the apparent influence of long distance journeys – but is one that needs to be recognised and handled with care!
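A travel time threshold of this kind, and its effect on total person kilometres, can be sketched as follows. The flows and the assumed average speed are hypothetical; in practice the speed would need to be estimated for the study area, as the text cautions.

```python
def split_by_travel_time(flows, speed_kmh, max_hours=3.0):
    """Split (distance_km, commuters) flows at a one-way travel time limit."""
    kept, excluded = [], []
    for distance_km, commuters in flows:
        hours = distance_km / speed_kmh  # crude estimate: distance / mean speed
        target = excluded if hours > max_hours else kept
        target.append((distance_km, commuters))
    return kept, excluded


def person_km(flows):
    """Total person kilometres: distance multiplied by number of commuters."""
    return sum(distance_km * commuters for distance_km, commuters in flows)


if __name__ == "__main__":
    # Hypothetical flows: (one-way distance in km, number of commuters).
    flows = [(10, 500), (40, 100), (160, 10), (300, 5)]
    # Assumed average speed of 45 km/h; this is not an NTS figure.
    kept, excluded = split_by_travel_time(flows, speed_kmh=45.0)
    total = person_km(flows)
    print("person km kept:    ", person_km(kept))
    print("person km excluded:", person_km(excluded))
    print(f"share excluded:     {person_km(excluded) / total:.1%}")
```

In this invented example 15 of 615 commuters account for roughly a quarter of all person kilometres, so the choice of speed and threshold has a large effect on any environmental estimate.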
TABLE SPECIFICATIONS AND THEIR IMPLICATIONS
The key issue here is that, in the publication of commuting data within the 2001 Census, different classifications of travellers are available at
different spatial scales. In particular, there are more detailed classifications available at the scale of Census Area Statistics (CAS) wards than for output areas. For output areas, travellers are only classified by mode of travel. If analysis is needed for different socio-economic groups, different genders, or for people with different employment status (e.g. full-time versus part-time employees), which is often the case when relating commuting behaviour to labour market structure, it is necessary to use a ward-based file. In predominantly rural areas, it is also often the case that analysis is needed for identifiable settlements rather than for the somewhat heterogeneous local authorities or the somewhat too detailed individual wards. This is particularly the case when the environmental impacts of settlement form and position are considered or when the economic functions of rural towns are considered in relation to the areas around them. In these circumstances settlement boundaries need to be approximated by aggregations of either output areas or CAS wards, depending on whether the more detailed classifications of the CAS ward scale are needed. For cities or large towns, problems of zone approximation around their boundaries are often
only a minor influence on the numbers of employed residents that are identified. For analyses of rural commuting, however, the generally smaller towns are more strongly affected: non-town dwellers are included in their records, potentially distorting the profiles of their residents’ commuting behaviour. Figure 4 shows the urban settlements contained within a classic rural area in East Anglia – the local authority district of Mid Suffolk. The settlements and their boundaries are those contained within the ONS definition of urban areas. In essence, this definition seeks to identify areas of continuously built-up land through a set of operational rules. The combined output areas that intersect with these built-up areas are shown in grey, with CAS ward boundaries in the background. It is visually clear that the wards are large spatial units which are typical of relatively sparsely populated rural areas. An attempt at estimating the numbers of ‘non-town’ residents included in the settlements by approximating their boundaries with different census zones is shown in Table 3. This is based on counting residential delivery points within unit postcodes allocated to the settlement cores, their output area approximations and their CAS ward approximations. The base of 23,020 delivery points is the most accurate estimate, although it
is still possible that some unit postcodes may cross the ONS settlement boundaries. The output area approximation includes only about 9% more delivery points in spite of the grey-shaded areas appearing larger than the settlement cores. However, it is the CAS ward approximation that is much more potentially damaging – including about 58% more delivery points because the spatial extent of these rural wards is almost always larger than the size of the rural towns identified within the ONS classification. Once again, this is a problem for which there is no ‘cure’. It is a reflection of the fundamental nature of the data and the decisions, largely based on a desire to preserve anonymity, that lie behind the specifications of the tables in their published form. It can be viewed from both a positive and negative perspective. On the positive side it is true that the 2001 Census was the first to release commuting data at a scale as fine as that of output areas. Prior to this, all data was released at the ward scale or larger. It means that, for the first time, it is now possible to measure effectively numbers of commuters and their modes of travel for quite small rural towns. On the negative side, what is less feasible is providing a socio-economic or gender perspective on the structure of commuting for these towns.
Table 3. Estimates of households within urban settlements of Mid Suffolk approximated by different census zones
| Area | Postcodes | Residential delivery points |
| --- | --- | --- |
| Mid Suffolk local authority | 3,499 | 39,919 |
| Mid Suffolk settlements | 1,312 | 23,020 |
| Mid Suffolk output area approximations | 1,548 | 25,078 |
| Mid Suffolk ward approximations | 3,015 | 36,287 |
| OA approximation ‘surplus’ | 236 | 2,058 |
| Ward approximation ‘surplus’ | 1,703 | 13,267 |
Source: Postcode Address File
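The ‘surplus’ rows of Table 3, and the percentages of about 9% and 58% quoted in the text, follow directly from the delivery point counts. A short sketch of the arithmetic, using the published figures with illustrative variable names:

```python
# Residential delivery points taken from Table 3.
SETTLEMENT_CORES = 23020
APPROXIMATIONS = {
    "Output areas": 25078,
    "CAS wards": 36287,
}


def surplus(approximation, base=SETTLEMENT_CORES):
    """Return the absolute surplus and the surplus as a percentage of the base."""
    extra = approximation - base
    return extra, 100.0 * extra / base


if __name__ == "__main__":
    for zone_type, delivery_points in APPROXIMATIONS.items():
        extra, pct = surplus(delivery_points)
        print(f"{zone_type}: {extra:,} surplus delivery points ({pct:.1f}% of the base)")
```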
Figure 4. Settlement boundary approximations in Mid Suffolk using CAS wards and output areas. Source: 2001 Census
ISSUES WITH ANALYSING AND INTERPRETING RURAL COMMUTING DATA
The issues of urban and rural zone approximation outlined above mean that researchers wishing to carry out analysis of rural commuting are constrained to a certain extent by the limitations of the data. Anyone using census data to analyse rural commuting patterns is faced with a trade-off between using output area level data and having more accuracy in terms of spatial extent of rural and urban zones – subject to the accuracy of any classification being used to define the area, or using ward level data and potentially sacrificing spatial accuracy for increased attribute information and less small number perturbation. When output area data are available, the key question is whether there is any value in using these data at all or should coarser geographies be preferred. Furthermore, it is necessary to consider what other issues are presented when defining the extent of
‘rural’ – whilst analysing flow data in the context of a classification used to differentiate between rural and non-rural areas is sensible, the use of ‘off-the-shelf’ classifications is not without some issues. So far in this chapter the definition of a rural area has been based upon the ONS (2004) rural-urban classification. This classification, whilst providing a useful framework for spatial analysis, can also lend itself to some confusion – principally through the distinction of ‘Sparse’ and ‘Less Sparse’ areas. Take the maps produced from this classification on the DEFRA website (http://www.defra.gov.uk/rural/ruralstats/rural_atlas/atlas-bygeography.htm), for example. Cartographers at DEFRA have produced maps of this classification at various geographical levels, although all have a common shading schema. These maps group ‘Sparse’ and ‘Less Sparse’ categories together with the same colours when it could be argued that a more accurate schema would group settlement types; thus a ‘Sparse’ town and a ‘Less Sparse
town’ are likely to have more in common than a ‘Sparse’ town and a ‘Sparse’ hamlet. As this demonstrates, mistakes can be made and researchers need to be cautious in their choices when using this framework. Furthermore, the ONS rural-urban classification is not the only classification to include rural definitions. At both ward and output area levels, the general purpose geodemographic classifications produced by ONS have rural definitions which classify areas very differently from the rural-urban classification. Of course, the definitions underlying the area names are very different and few would take a similarly named area to be a similar type of area without looking into the definition in more detail. However, the clusters are also very different. Figures 5 and 6 below exemplify these differences in Mid Suffolk. For example, the ST ward classification (Figure 5) groups two of the Stowmarket wards and a neighbouring ward along with five wards to the north-east of the district in the same ‘Coastal and Countryside’ category, whereas in the rural-urban classification of CAS wards (Figure 6), these areas are classified in three different groups. Furthermore, the only area identifiable as a significant urban area
in the rural-urban classification is one of the few classified as relatively non-urban in the general purpose classification. Accepting these potential classification pitfalls, attention now turns to the issue first raised in this section of whether the use of more spatially precise output area data can be justified in preference to more attribute rich and accurate ward data where the association with rural or urban may also be key. As shown in Figures 6 and 7, the extent of the rural areas in Mid Suffolk defined by ward and output area geographies varies greatly. The most urban category defining Stowmarket is broadly similar at both scales (only two output areas are categorised differently). At the ‘Town and Fringe’ level, however, large areas in the west of the district defined as ‘Town and Fringe’ at the ward level are defined as ‘Village’ or ‘Hamlet’ at output area level. An example of where the opposite is true can be found in the far south of the district. We can, to some extent, assess the impact of these differences by looking at the flows between the classified areas within this district. Of course the differences will not be the same across all districts in Britain, and it is a little difficult to know how much influence should be apportioned
Figure 5. Wards in Mid Suffolk classified by the ONS classification of ST wards
Figure 6. Wards in Mid Suffolk classified by the ONS classification of CAS wards
Figure 7. Rural areas in Mid Suffolk as defined by the ONS (2004) rural/urban classification using output areas
to other factors already mentioned in this chapter such as small cell adjustment, but we can make some observations from the evidence that is presented. The ‘Urban > 10K’ destination is a logical starting point as the spatial extent of this area is very similar for both wards and output areas. Table 4 shows that the flows into this area from areas classified as ‘Town and Fringe’ and ‘Village, Hamlet and Isolated Dwellings’ are similar, although, as the flows are relatively small, the percentage difference between data from wards and data from output areas is over 10% in two cases. As lower flows are estimated at ward level for the ‘Town and Fringe’ origin and higher flows are estimated for ‘Village, Hamlet and Isolated Dwellings’, it might be suggested that less accurate spatial extents of the areas defined by wards are resulting in an overestimation of flows from the most rural areas into the most urban areas in Mid Suffolk. Outside of the urban destination, it becomes a little more difficult to reach conclusions in relation to the origin and destination flows between classified areas as the spatial extents of the areas become very different. What can be concluded,
however, is that there are, in some cases, relatively large differences between flows computed from ward and OA levels (almost 25% in some cases) and, furthermore, that these differences are not consistent: higher flows in one area do not always correspond with lower flows in another. These issues cast doubt on the reliability of any analyses carried out on flows aggregated to types of area as defined by the ONS rural/urban classification. So the answer to the first question posed at the beginning of this section is not straightforward. Output areas are certainly more accurate at defining rural areas than wards, and therefore offer an opportunity for more accurate spatial definition if the main focus of the research is on the types of area between which the flows occur. However, disparities between flows aggregated to types of area from OA and ward levels, which are quite large in some cases, mean that researchers should be very cautious about the accuracy of the data at both levels, perhaps carrying out the analysis at both levels in parallel and comparing the findings before conclusions are drawn. In addition to these issues, at the point of access to the data, researchers should also be aware of the potential issues when trying to standardise flows to aid comparison. Comparison of gross flows is not always desirable, so net rates are commonly calculated to standardise flows by the populations they are leaving, entering (or entering and leaving in the case of intra-area flows). Defining a population at risk (PAR) is important in this process. Earlier on in this chapter, a rate of commuting between an area in Hertfordshire and London was calculated using total employed residents as a PAR. Whilst this is useful for producing a rate when information on all commuters is not known, it is not ideal. 
For example, if comparing the intensity of car commuter traffic entering or leaving rural settlement A to travel to B or rural settlement C to travel to B, it is important that the PAR are defined by the number of commuters and not by the rest of the population. It may be, for example,
Table 4. Variations in flows between classifications of wards and output areas in Mid Suffolk
| Origin | Destination | Ward | OA | Difference between ward and OA flows (No.) | Difference between ward and OA flows (%) |
| --- | --- | --- | --- | --- | --- |
| Town and Fringe - Less Sparse | Urban > 10K - Less Sparse | 612 | 695 | -83 | 11.94 |
| Urban > 10K - Less Sparse | Urban > 10K - Less Sparse | 3,354 | 3,245 | 109 | -3.36 |
| Village, Hamlet & Isolated Dwellings - Less Sparse | Urban > 10K - Less Sparse | 1,668 | 1,511 | 157 | -10.39 |
| Town and Fringe - Less Sparse | Town and Fringe - Less Sparse | 3,842 | 3,114 | 728 | -23.38 |
| Urban > 10K - Less Sparse | Town and Fringe - Less Sparse | 401 | 399 | 2 | -0.50 |
| Village, Hamlet & Isolated Dwellings - Less Sparse | Town and Fringe - Less Sparse | 1,297 | 1,414 | -117 | 8.27 |
| Town and Fringe - Less Sparse | Village, Hamlet & Isolated Dwellings - Less Sparse | 1,322 | 1,724 | -402 | 23.32 |
| Urban > 10K - Less Sparse | Village, Hamlet & Isolated Dwellings - Less Sparse | 1,169 | 1,369 | -200 | 14.61 |
| Village, Hamlet & Isolated Dwellings - Less Sparse | Village, Hamlet & Isolated Dwellings - Less Sparse | 10,684 | 11,315 | -631 | 5.58 |
Source: 2001 Census
that settlement A has far more people but far fewer commuters than C. Calculating all commuters is difficult unless the whole dataset is available, so it is understandable that proxy PAR are used at times. Fortunately, CIDER provides a facility for downloading PAR constructed from total commuter numbers at the same time as flow data from WICID, which enables more accurate rate calculations to be performed. However, whilst CIDER offers commuting flows aggregated to rural classification areas in WICID, PAR for these aggregated areas are not offered. These can be obtained from the parent geography, derived from the total inflows, outflows and within-area flows. However, summing all inflow PAR, for example, for an aggregate area will not produce the correct inflow PAR for that area as many of the flows would now count as intra-area flows, so researchers should exercise some caution in computing the correct denominators.
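The denominator problem for aggregated areas can be illustrated with a toy flow matrix. The zone names and counts below are invented, and the functions are an illustrative sketch rather than any facility offered by WICID.

```python
# Hypothetical origin-destination commuting flows between three zones.
FLOWS = {
    ("a", "a"): 50, ("a", "b"): 20, ("a", "c"): 5,
    ("b", "a"): 30, ("b", "b"): 40, ("b", "c"): 10,
    ("c", "a"): 15, ("c", "b"): 25, ("c", "c"): 60,
}


def inflow(flows, zones):
    """Commuters entering the set of zones from anywhere outside it."""
    zones = set(zones)
    return sum(n for (o, d), n in flows.items() if d in zones and o not in zones)


def within(flows, zones):
    """Commuters whose origin and destination both lie inside the set of zones."""
    zones = set(zones)
    return sum(n for (o, d), n in flows.items() if o in zones and d in zones)


if __name__ == "__main__":
    group = ["a", "b"]  # two zones aggregated into one classification area
    naive = sum(inflow(FLOWS, [z]) for z in group)
    correct = inflow(FLOWS, group)
    print("sum of member inflows:   ", naive)
    print("inflow to aggregate area:", correct)
    print("flows now intra-area:    ", naive - correct)
```

The gap between the two totals is exactly the flow between the member zones, which moves from the inflow denominator to the within-area total once the zones are merged.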
CONCLUSION
As the introduction to this chapter explained, its purpose was not to produce a single judgement on the suitability of census data for analyses of rural commuting but to illustrate with selected examples the possible difficulties that may arise from its use. These cautions are particularly important for researchers who do not use these data frequently. The widespread, and free, release of the 2001 Census on compact media (CDs and DVDs) has meant that the interaction data sets relating to migration and commuting are much easier to use than in previous censuses and this ease of access has been further enhanced by the services offered by the Centre for Interaction Data Estimation and Research. While this is wholly positive, it is sometimes tempting to assume that census data are a completely accurate ‘benchmark’ for analyses of commuting and the underlying labour market structures which they reflect. The census remains the most comprehensive source of such data within the UK and is the only source that allows analysis for relatively small areas or
settlements. The 2001 Census, albeit in a somewhat aged form, will remain a key source of information on commuting for several years to come in both an urban and a rural setting. However, the fundamental structure of the census as a residence-based survey is not ideally suited to identifying residents’ journeys to work, particularly when it is becoming increasingly common for people to hold more than one job, to make work journeys that are not wholly on a daily basis, and, in some cases, to have more than one place of residence. In addition, the price that the census pays for releasing commuting information for small areal units is that it applies strict – some may say draconian – measures to protect individuals’ anonymity. These factors can combine to produce some of the potential difficulties demonstrated in this chapter. The relatively small settlement sizes and sparse commuting flows of many rural areas mean that they may be particularly exposed to these effects. The conclusion is not that census data cannot be used for analyses of rural commuting, but it is certain that such research should always ‘proceed with caution’.
REFERENCES
Bibby, P., & Shepherd, J. (2005). Developing a New Classification of Urban and Rural Areas for Policy Purposes: The Methodology. Final Report to DEFRA, DEFRA, London.
Duke-Williams, O., & Stillwell, J. (2007). Investigating the potential effects of small cell adjustment on interaction data from the 2001 Census. Environment & Planning A, 39, 1079–1100. doi:10.1068/a38143
Office for National Statistics. (2005). Quality Report for England and Wales (Census 2001). Basingstoke: Palgrave-Macmillan.
Office of the Deputy Prime Minister. (2004). Planning Policy Statement 7: Sustainable Development in Rural Areas. London: HMSO.
Shepherd, J., Bibby, P., & Frost, M. (2008). Mapping Socio-Economic Flows Across the Region. Final Report to East of England Development Agency, EEDA, Histon.
Williamson, P. (2007). The impact of cell adjustment on the analysis of aggregate census data. Environment & Planning A, 39, 1058–1078. doi:10.1068/a38142
Winter, M., & Rushbrook, L. (2003). Literature Review of the English Rural Economy. Report to DEFRA. London: HMSO.
Chapter 12
Defining Labour Market Areas by Analysing Commuting Data: Innovative Methods in the 2007 Review of Travel-To-Work Areas
Mike Coombes
Newcastle University, UK
ABSTRACT
This chapter draws on research undertaken in revising a set of functional regions known as Travel-To-Work Areas (TTWAs) which are the only official statistical areas in the UK defined by academics. The objective of the research is to define the maximum possible number of separate TTWAs that satisfy appropriate statistical criteria that ensure the areas meet guiding principles for labour market area boundary definition. Thus, the research is an example of a functional regionalisation which is highly constrained by the purpose to which the resulting boundaries will be put. The chapter briefly reviews previous TTWA definition methods, setting this in the context of the very limited academic research on regionalisation methods. The production of the 2001 Census commuting data provided opportunities for defining new labour market areas and the chapter explains how the TTWA research has responded with several key innovations. The empirical component of the chapter then illustrates the effect of these innovations by presenting a new visualisation of the workings of the definition method and also some analysis of the sensitivity of the results to changes in the method. Finally, there is a very brief look at some possible ways in which this field of research could be extended.
DOI: 10.4018/978-1-61520-755-8.ch012
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION: THE RESEARCH CHALLENGE
This chapter reflects on many years of research leading to the Coombes and Bond (2008) revision of TTWAs, the statistical geography of the Office for National Statistics (ONS) that represents a set of sub-regional labour market areas. The basis of TTWA boundary definitions is an analysis of recent patterns of commuting. These patterns change over time so TTWAs are reviewed once each decade by analysing the Special Workplace Statistics (SWS) from the population census because in Britain these are the only data available on commuting flows at
the local level. For several decades now, each new census has led to a review of TTWAs with the explicit objective of providing a consistently defined set of appropriate areas for the reporting of local labour market statistics in general, and unemployment statistics in particular. The core objective of the TTWA definitions is thus to identify patterns in the commuting data as a means of consistently defining a set of labour market area boundaries. The key benefit of TTWAs to statistics users is that they enable valid comparisons of labour market conditions and trends across the country. This is because they have been specifically defined to be comparable in relation to key labour market statistical characteristics relevant to labour market analysis. The underlying statistical logic is about using appropriate classifications (Rose and O’Reilly, 1998) which, in this case, means an appropriate geographical classification. An additional advantage which TTWAs have over local authorities (LAs) which in Britain – as in most countries – are the ‘default’ areas for publishing official statistics, is that they can provide more local detail in areas like the Highlands of Scotland where recently-revised LAs are so large that statistics published at that scale ‘average away’ the distinctive circumstances and trends of numerous contrasting local economies. This very brief description of the context sets the parameters for the research challenges which arise in the definition of TTWAs. The first constraint is the fact that the continued existence of this statistical geography depends upon them retaining their statistical properties which are valued by users; those properties will be detailed later in this chapter. 
The second constraint is the one which determines the nature of those properties: TTWA boundaries must represent a set of well-formed local labour market areas and, as such, meet criteria relevant for academic and policy debates around sub-regional economies (HM Treasury et al., 2007). The third constraint on the TTWA definitions is that their derivation
from analysing localised patterns of commuting must result from a consistent approach applied nationally. It is worth using this set of three constraints to assess the potential benefit of TTWAs to users over the alternative set of sub-regional statistical areas, the current set of LAs. It could hardly be expected that all LAs would have the statistical properties required of labour market area boundaries because they are defined to meet other criteria (e.g. Boundary Committee for England, 2008). The result is that no ‘tier’ of administrative areas forms a set of meaningful local labour market areas, as the case of the Scottish Highlands has already exemplified. Although, in some cases, commuting data has been referred to in adjusting LA boundaries, this has certainly not been a consistent national process. As a result, the case for the continued production of updated TTWAs remains a strong one, so long as the boundaries meet the above three constraints and are also widely accepted as providing a set of intuitively reasonable sub-regional entities in all parts of the country. Note that the claim is not made that they will be ‘ideal’ labour market area boundaries; such a claim would emphasise the fact that what might be ‘ideal’ for one set of users will not be so for others. Instead, the aim is to provide the generality of users with a set of boundaries which is at least plausible in all parts of the country and can meet a high proportion of user needs. To some extent, describing this objective as one which sets a considerable challenge to the analyst is to hark back to earlier times. When the boundaries of TTWAs were being revised prior to the 1970s, the process depended entirely on regional and local knowledge; this at least meant that the stakeholders who were involved would consider that the boundaries produced were well defined! 
Most obviously, it also meant that there was no possibility that they were consistently defined and this became a key problem in the latter 1960s when the boundaries became used more intensively to determine where public funding
for economic development was available. This was also the time when more census commuting data became available and so a new opportunity for consistent evidence-based definitions arose. It was Smart (1974) who provided the first comprehensive analyses underlying a set of TTWA definitions, at that time developing an algorithm which he was able to implement with a slide-rule in analysing commuter flows between around 2,000 building-block areas (or zones as this chapter will term all such areas used in census commuting datasets). It was also in the early 1970s that computerised analysis of socio-economic data became widely adopted in academic disciplines such as geography and this quantitative revolution slowly extended to analyses of data on flows such as commuting or migration patterns. Coombes and Openshaw (1982) computerised the analyses of Smart (1974) and the rigid consistency of this analysis highlighted the substantial degree of flexibility in the process which occurred between the results of the slide-rule analysis and the boundary definitions which were finally agreed to by civil servants. In addition, the computerisation made possible numerous repeated analyses to reveal the sensitivity of the results to specific features of the algorithm or fine tuning of the statistical criteria set for the TTWAs. This was also a period when alternative approaches were being developed to define boundaries that, in broad terms, were similar to TTWAs in seeking to internalise the larger commuting flows; this was the rather specialist form of spatial analysis termed functional regionalisation (Coombes, 2000). Given this flurry of activity around 30 years ago, how is it that the definition of TTWAs can still pose a significant challenge? The simplest part of the explanation is that academic interest in computerised spatial analysis waned after the 1980s, although it has recently become re-invigorated with developments termed geo-computation (e.g. Openshaw and Rao, 1995). 
In relation to TTWAs, something of a shadow was cast over further development by the intensive innovations leading to the computerised definitions based on analysing the 1981 Census data by Coombes et al. (1986): the approach proved successful not only then but also when applied to 1991 Census data and even to the data of other European countries (Eurostat, 1992) and has been lauded elsewhere (e.g. Frey and Speare, 1995). As will be described in a little more detail later, the 1980s approach to defining TTWAs was radical due to several key features which made it evaluate myriad alternatives in search of a more nearly optimal set of results, at a time when most regionalisation methods sacrificed optimality because the limited computation power then available encouraged approaches that minimised the number of calculations performed. In particular, the TTWA algorithm was radical in not restricting the options considered to contiguous pairs of zones: the computation task is hugely reduced if, when grouping zone z, the choice is limited to the six (on average) zones with which z shares a common boundary, rather than evaluating the possibility of grouping z with any one of the hundreds (on average) of zones that z has some flows to or from. In most cases, the best choice will in fact be one of the contiguous zones, but if all other options are denied, then inevitably a sub-optimal grouping will quite often be made. While computing power has, of course, increased very markedly since the 1980s, there has been the countervailing effect of the number of zones used for Census commuting data increasing too: circa 2,000 in 1971 then circa 9,000 by the 1980s and over 40,000 in the 2001 dataset used here. With flow data coming in matrix form, the scale of increase is the square of what these numbers suggest so that, for example, the data from the 2001 Census form a matrix with 40,000 × 40,000 cells, that is over 1.5 billion cells.
Given that the aspiration is to evaluate all the different possible solutions before selecting the one which best meets the statistical objectives set, major analytical challenges do persist due to this increase in numbers of zones to be analysed, despite the increases in computational power available. From the user point of view, these technical challenges are merely background issues which do not affect their need for acceptable local labour market area definitions which adhere consistently to relevant statistical criteria. The key statistical criteria were identified originally in Goodman (1970):
• commuting self-containment (i.e. few of the work trips to or from areas within the boundary should cross that boundary); combined with
• commuting integration (i.e. there are significant numbers of journeys to work between most of the areas within the boundary).
These are ideal attributes and, as commuting trips have become longer, the latter has become ever more difficult to satisfy. With more long-distance commuting, TTWAs can still be defined to meet fixed levels of self-containment: the problem here is that TTWAs must become larger to internalise the same proportion of commuting. For example, the 1981 commuting dataset yielded 334 TTWAs, but on the same self-containment criterion only 308 separable 1991-based TTWAs could be defined a decade later. This reduction in number of TTWAs conflicts with the wishes of most users who want as many separate areas as possible for their analyses! This is an inevitable trade-off which the analyses can best deal with through sensitivity testing, illustrating the difference it makes to apply different self-containment criteria to the definitions. What is not possible is to mitigate the inevitable reduction in commuting integration which comes with the larger TTWA boundaries that longer-distance commuting necessitates. For example, it is well known that cities and towns like Peterborough and Brighton, many kilometres from London, now have significant commuting flows to the capital, but if a TTWA
boundary internalises these places then it will not be at all integrated because few people commute directly between these outlying places. This is not to say that commuting patterns remain simply centralised on large city centres. Along with the growth of longer-distance commuting associated with such factors as the increase in car use and the decline in traditional sectors where local working was common, there are other trends that allow more polycentric labour markets to emerge. For example, some workplaces have been de-centralised to city edges, while more households have two earners who will find it difficult to live near both workplaces, perhaps leading them to join the growing minority with more complex working patterns (e.g. working at home for part of the week). In short, there are many trends changing the pattern of commuting in Britain that are making it more challenging for TTWA definitions to meet their objective of providing users with numerous separate labour market areas which have the required statistical characteristic of a reasonable level of self-containment while also being recognised as reflecting local economies across the whole country.
REGIONALISATION METHODS FOR OFFICIAL STATISTICAL AREAS
Although it may be true that TTWAs pre-date the equivalent areas in many other countries, research by Cattan (2001) found that there were few OECD countries where official local labour market areas were not defined. Of course, most modern countries have seen a growth in longer-distance commuting and this means that any fixed set of boundaries will become less useful for analytical purposes; given that the LAs in most other countries have remained unchanged for far longer than those of Britain – where LA boundary change occurs with most unusual frequency – the very long-established LA boundaries in most other
Table 1. Principles for local labour market area definitions
Principle | Practice
Objectives |
Purpose | To be statistically-defined areas appropriate for policy
Relevance | Each area to be an identifiable labour market
Constraints |
Partition | Every building block to be allocated to only one area
Contiguity | Each area to be a single contiguous territory
Criteria (in descending priority) |
Autonomy | Self-containment of flows to be maximised
Homogeneity | Areas’ size range to be minimised (e.g. within fixed limits)
Coherence | Boundaries to be reasonably recognisable
Conformity | Alignment with administrative boundaries is preferable
Source: Eurostat (1992)
countries are even less plausible as labour market areas than their British equivalents. As an initial recognition of the potential value of a consistent cross-national approach, Eurostat (1992) outlined the basis of the definition methods used in several countries and intimated that the TTWA method was ‘best practice’ when measured against its check-list of principles. Table 1 shows these international ‘standards’ for evaluating labour market area definitions. Table 1 suggests a number of criteria over and above commuting self-containment and these should be recognised and considered here. One is a minimum size; the key argument behind this is that a data series will be far more likely to be unhelpfully volatile if it relates to an area with a small population. Additional criteria (Table 1) are largely uncontroversial, such as that these official boundaries should not overlap each other. As a result, the review of TTWAs described in this chapter has followed past practice by defining TTWA boundaries in line with these nine principles. One additional guideline which has emerged from the experience of previous reviews of TTWAs is that the more separate areas that are recognised, the more the areas are acceptable to users. Thus, the basic goal in defining TTWAs can be expressed as: to define as many separate TTWAs as possible with the 2001 commuting data, subject to the statistical criteria set in applying the principles above (Table 1).
This guidance, which has emerged from the background to TTWA definitions, leaves a fair degree of flexibility over how exactly the commuting data should be analysed to create the boundary definitions. The next section of the chapter describes some of the alternatives considered in the review, including some opportunities and challenges that were a direct result of key innovations in the 2001 Census and so had not been part of any previous review.
NEW OPPORTUNITIES AVAILABLE IN ANALYSING DATA FROM 2001
There are two possible responses to the need to update TTWA definitions in response to the changes in commuting patterns across the country. The first option is to simply apply the same definition method to the 2001 Census commuting data set; this was very much the option taken after the 1991 Census data had become available, with the method developed for the 1981 dataset used largely unchanged (ONS and Coombes, 1998). The second option is to adapt the method to take advantage of new opportunities which were not available a decade earlier, not only in terms of increased computational power but also changes to census data collection and output practices. This section of the chapter discusses the latter, but in the light of the potential afforded by the former
to analyse ever larger matrices ever more quickly and hence take advantage of any opportunities for methodological innovation that changes in the available census data made possible. The most fundamental change, compared to previous British census commuting datasets, comes from the 2001 Census commuting data providing 100% coverage of employed people. Any matrix with many cells is likely to have many values that are very small numbers or zero, so the decision to code data on everyone who was enumerated rather than just a 10% sample is a huge step towards mitigating small number problems in data analysis. For the TTWA analysis, this change can only serve to make 2001-based boundary definitions more robust than their predecessors. The importance of this change was reinforced by change in the grain of areas used as the zones in the commuting data matrix. In previous census data sets, the smallest zones used were wards, the small areas that local authorities are divided into which number around 10,000 across the whole country. In an immense shift of policy, the 2001 Census data on commuting have been made available for output areas that number around 175,000; in making this change, ONS has undoubtedly increased by far the size of the largest commuting data matrix ever published by any national statistical agency. Of course, this massive increase in matrix cell numbers effectively reinstates all the small cell number problems reduced by the decision to shift to 100% data coverage. Late in the census data production process, another change was introduced that very much exacerbated small number problems. It was decided that, unlike the 1991 data, the published 2001 Census commuting dataset – apart from data on residents in Scotland – must be made subject to a disclosure control procedure called small cell adjustment method (SCAM). 
SCAM altered values of 1 or 2 so that they become values of 0 or 3: this process most acutely affects matrix datasets like that on commuting because their large number of cells makes them very prone to include many low
values (see Chapter 3 in this book). Fortunately, the TTWA research was undertaken in collaboration with ONS itself and so ONS made available for the analyses reported here a dataset not subject to SCAM. Thus, SCAM has not affected the definition of the 2001-based TTWAs. A regrettable consequence of the research using a ‘SCAM-free’ dataset is that other researchers, who must use one of the published census commuting datasets with their SCAM effects, cannot exactly replicate the results produced here. Given all these innovations, ONS had to decide on the zones to use for the TTWA boundary definition exercise, with the inevitable trade-off between increased potential boundary precision by using smaller zones and increased robustness of analysis from the reduced small number problems with larger zones. The decision was to take some of the advantage from the 100% data coverage by increasing the number of zones analysed from the roughly 10,000 wards but not to go as far as using output areas. This led to the use of lower-level super output areas (LLSOAs). More precisely, the zones used for this research vary slightly between the UK’s countries: 32,482 LLSOAs in England, 1,896 LLSOAs in Wales, 6,505 data zones (similar to but slightly smaller than LLSOAs) in Scotland and 890 SOAs in Northern Ireland. This set of building block areas will all be referred to as zones in this chapter. Analyses using these 41,773 zones can be much more precise than any based on just a quarter as many wards, so the 2001-based TTWA boundaries can more precisely match the detailed pattern of commuting. At the same time, the reduced risk of small number problems resulting from the 100% data coding has been at least partially lost due to this zone dataset distributing commuters over a much larger matrix; the 41,773 zones yield a matrix of over 1.5 billion cells, compared to a 0.1 billion cell ward matrix.
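The matrix sizes quoted in this section follow directly from squaring the number of zones. As a quick check, the short sketch below uses the zone counts given in the text; the ward and output area counts are the approximate figures quoted above, and the dictionary labels are only illustrative.

```python
# Zone counts as stated in the text for the 2001-based TTWA analysis.
zones = {
    "England LLSOAs": 32_482,
    "Wales LLSOAs": 1_896,
    "Scotland data zones": 6_505,
    "Northern Ireland SOAs": 890,
}
n_zones = sum(zones.values())

# A flow matrix has one cell per (origin, destination) pair, so n * n cells.
cells = {
    "wards (approx. 10,000)": 10_000 ** 2,
    "zones used for TTWAs": n_zones ** 2,
    "output areas (approx. 175,000)": 175_000 ** 2,
}

print(f"total zones: {n_zones:,}")
for label, count in cells.items():
    print(f"{label}: {count:,} cells")
```

The same arithmetic suggests why output areas were not used: a matrix of roughly 175,000 by 175,000 would spread the same commuters over more than 30 billion cells.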
INNOVATIONS IN THE DEFINITION OF TRAVEL-TO-WORK AREAS
The crucial differences between the commuting data available from the 2001 and the 1991 Censuses mean that defining 2001-based TTWAs could not be a simple ‘updating’ of the 1991-based TTWA boundary definitions (Figure 1). Given that it was not possible to gain the benefits of consistency through time, which official statisticians may be inclined to prize more highly than the less certain benefits of innovation, there was every reason to use the enforced break with the practices of the past to consider a number of other changes to the ways TTWAs had been defined over the last two decades. In particular, it was timely to address some questions raised about the established TTWA definition method.
• Is it possible for the algorithm applied to the commuting data to be simplified?
• Can the levels of size and self-containment required of all TTWAs be altered?
• Should the ruling that no TTWA can span across England’s borders be dropped?
These changes are now considered in turn. It may be helpful to state immediately that all these changes have in fact been implemented in the definition of 2001-based TTWAs. It must be stressed that such changes do not alter the underlying objective for the research which is to define as many separate local labour market areas as possible with the most recent commuting data, subject to the statistical criteria set. Both the 1981-based and 1991-based TTWA definitions relied on a computerised algorithm which involved several steps and numerous separate parameters (see ONS and Coombes, 1998, for a detailed description). Part of the reason for a multi-step approach in the 1980s was that this split the computational burden into multiple stages
Figure 1. The 2001 TTWAs in central and eastern England and Wales
at a time when processing a matrix with over a billion cells was very time consuming. Now this practical constraint barely exists; a moderately powerful laptop proved able to rapidly process the multi-billion cell matrix used here. This allowed experimentation in simplifying the algorithm, ultimately leading to the multiple steps of the earlier algorithm being replaced by a single process with many iterations. It was found that most of the steps in the earlier method contributed little to the final results, so the final step of the earlier method could be relied upon to complete the whole task of grouping over 40,000 zones into a set of TTWAs numbered in the hundreds. Most importantly, it was this final step that ensured that the eventual TTWAs would satisfy all the statistical criteria, so basing the whole method on this one step still ensures that the definitions satisfy these criteria. The process is as follows, remembering that, at the outset, every individual zone is considered a ‘proto’ TTWA.
A. Rank all proto TTWAs in terms of their size and self-containment values
B1. If the lowest-ranked proto TTWA meets the requirements set, STOP
B2. If not, then continue to C
C. Dissolve the lowest-ranked proto TTWA into its constituent zones
D. Group each zone with that proto TTWA it is most strongly linked with
E. Re-calculate the size and self-containment values of altered proto TTWAs
F. Return to A
Considerable experimentation has led to the choice of the formula to determine in which way a zone should be grouped to maximise the likelihood that the resulting TTWA definitions most closely meet their objectives. The key need in practice is to enable smaller places near major centres to consolidate as separable TTWAs (where commuting flows justify this) because otherwise the TTWAs that include major centres expand remorselessly
to engulf all surrounding areas, with the result that the set of defined TTWAs is less numerous than the maximum possible which meet the set criteria. This formula combines four flow measures:
a is the flow i to j as a percentage of all flows from i (including flows from i to itself);
b is the flow i to j as a percentage of all flows to j (including flows from j to itself);
c is the flow j to i as a percentage of all flows from j (including flows from j to itself);
d is the flow j to i as a percentage of all flows to i (including flows from i to itself);
where the final formula (ONS and Coombes, 1998) is computed in the following way:
(a * b) + (c * d)
The single step process used for the 2001-based definitions allowed experimentation with the statistical criteria that the TTWAs must satisfy. This experimentation was carried out within the framework of the approach established in earlier definitions, because this provides for flexibility in the specific size and self-containment values required of the TTWAs. These two parameters are combined in a linear spline function which partially trades-off the size and self-containment values, with the overall function giving a single index value by which all proto TTWAs can be ranked (Coombes et al., 1986). It should be noted that in all TTWA analyses, the single measure area self-containment refers to what could be seen as the critical value for any area or, to be specific, the lower of the two self-containment values derived from the commuting data for the area:
• the supply side self-containment, i.e. the percentage of working residents who work locally; and
• the demand side self-containment, i.e. the percentage of people working in the area who live locally.
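The two self-containment measures and the (a * b) + (c * d) linkage formula can be made concrete with a small sketch. The three-zone flow matrix and the function names below are hypothetical; only the definitions of the measures are taken from the text.

```python
import numpy as np

# Hypothetical commuting flows (rows = zone of residence, columns = zone of work).
flows = np.array([
    [80, 15,  5],
    [30, 60, 10],
    [10, 20, 70],
])

def self_containment(flows, area):
    """Supply-side, demand-side and critical self-containment (percentages)."""
    area = np.asarray(area)
    internal = flows[np.ix_(area, area)].sum()  # live and work in the area
    residents = flows[area, :].sum()            # working residents of the area
    workers = flows[:, area].sum()              # people working in the area
    supply = 100.0 * internal / residents
    demand = 100.0 * internal / workers
    # The critical value is the lower of the two.
    return supply, demand, min(supply, demand)

def linkage(flows, i, j):
    """(a * b) + (c * d) for zones i and j, with a to d as percentages."""
    a = 100.0 * flows[i, j] / flows[i, :].sum()  # i to j, share of flows from i
    b = 100.0 * flows[i, j] / flows[:, j].sum()  # i to j, share of flows to j
    c = 100.0 * flows[j, i] / flows[j, :].sum()  # j to i, share of flows from j
    d = 100.0 * flows[j, i] / flows[:, i].sum()  # j to i, share of flows to i
    return a * b + c * d

supply, demand, critical = self_containment(flows, [0])
merged = self_containment(flows, [0, 1])
link_01 = linkage(flows, 0, 1)
print(supply, demand, critical)
print(merged)
print(link_01)
```

In this toy matrix, zone 0 on its own is limited by its demand-side value, and merging it with zone 1 raises both measures because the flows between the two zones become internal.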
The limited trade-off between an area’s critical self-containment value and its size means that large areas can remain separate TTWAs with lower self-containment levels than the level that is required of areas with small workforces. Both the self-containment and size criteria have target and minimum levels set, so that every TTWA must surpass both minimum values and, if it does not also pass both target values, its value on the combination of the two criteria must be at least equal to that of an area which meets one of the target values as well as the minimum value on the other criterion. A series of sensitivity analyses then yielded the preferred set of parameter values after evaluating the results from alternative values against the objectives set for the TTWA boundary definitions (taking as the ‘base line’ for this evaluation the results of retaining the parameter values used in the 1991-based definitions). A third change to the method, as signalled earlier, was to remove the prevention of TTWAs crossing the borders between England and either Wales or Scotland. It would also not have been permitted for TTWAs to include both parts of Northern Ireland and parts of another UK country but in practice this was never a possible outcome. Here again, sensitivity analyses offered an assessment of how much difference this change made to the results. In the vicinity of Chester and Berwick most especially, national borders cut right through the heart of genuine TTWAs in the final set of 243 defined by Coombes and Bond (2008) which ignore the historic artefact of national borders to focus exclusively on patterns of commuting flows. Figure 1 shows the boundaries of 2001-based TTWAs in central and eastern England and Wales with physically built-up areas providing the back-cloth to help with orientation.
Compared to almost any set of British administrative areas, TTWAs are distinctive in their high degree of similarity of area size, as a result of few people wishing to commute very long distances in any part of the country. For example, the physical size
of the London TTWA can be seen to be similar to most of those surrounding it – and indeed to the York TTWA in its more rural surroundings – whereas with administrative areas there is usually a stark contrast between physically small areas in metropolitan regions and the much larger areas covering more rural localities (so the areas have more similarly sized populations). Figure 1 also shows the TTWAs for the comparable provincial cities of Birmingham and Manchester and these boundaries provide one example of the TTWAs drawing attention to important local geographical realities as a result of the strict consistency of their definitions. It can be seen that the Manchester TTWA encompasses a high proportion of the built-up area in that region: the commuting flows linking the central areas with previously distinct ‘satellite’ towns such as Bury and Stockport are too strong for the latter to be self-contained enough to remain separate TTWAs. In contrast, Birmingham TTWA embraces only the southeastern half of its conurbation while the area to the north-west continues to be divisible into three TTWAs that are separate from each other as well as being relatively self-contained from Birmingham just a few miles away. This contrast between two comparable cities is rooted in patterns of local geography that are far too detailed to be pursued here, but the key point for the present chapter is that the TTWA analyses have succeeded in revealing deep-seated differences due to contrasts in industrial structure and other local features which will be recognisable and relevant to many users of the TTWAs.
IMPACT OF THE INNOVATIONS: SENSITIVITY AND VISUALISATION
As has been indicated already, it is important to assess how sensitive the TTWA definitions are to changes in the analysis which underpins them. In particular, the questions to answer are:
• How much difference do changes make to the definitions?
• Which areas are more affected by the changes?
• Can one set of results be shown to be preferable?
In the 2001-based TTWA definition procedure, the change to a single step process is a key change from the method producing the 1991-based boundary definitions. Another key change is the lowering of the self-containment minimum value for TTWAs, with some associated adjustments to the other settings on the formula which operationalises the trade-off between self-containment and size. Other changes since the 1991-based definitions that warrant further investigation include the change from wards to LLSOAs as analysis zones, and the new decision to allow the national borders of England to be spanned by TTWAs. Basic sensitivity analyses answer the first and perhaps the second of the questions above by identifying where there are differences between two sets of boundaries. It is only possible to tackle the third question after identifying what is looked for in TTWA definitions: what is it which would make one set of boundaries observably superior to another? The key consideration has already been identified in this chapter; the preference is for maximising the number of separate TTWAs. One possible secondary objective is to limit the size of the London TTWA because numerous users in the past have emphasised that a very large London TTWA makes the boundaries less valuable to them. The last mentioned sensitivity issue – national border imposition – is the least significant and has no effect on either numbers of TTWAs or the size of the London TTWA. The latter result is no doubt due to London being some distance from either Wales or Scotland so it is not affected by the ‘ripple’ of effects spreading out from the border areas. Changing the zones analysed to wards has
less effect than might have been expected; the total number of TTWAs falls by just one, and the size of London increases just slightly, so both these minor impacts are contrary to the preferences which were identified. London has a notably smaller TTWA if the analysis method is changed back from the single step process to the multi-step process used prior to the latest innovation in definition method. This beneficial impact is outweighed, however, by this older approach producing fewer separable TTWAs. On this criterion – the key concern of users, who always resent reductions in TTWA numbers – the worst impact comes from restoring the self-containment and size standards to the levels used in the 1991-based definition. Thus, the summary of the sensitivity analyses is that none of the alternatives to the 2001-based definitions offers a real overall improvement in the results, and certainly none can find a larger number of TTWAs meeting the predefined self-containment and size criteria.
The difficulty of visualising what happens in the process of creating a few hundred TTWAs from many thousands of zones is approached here in two ways. First, it is useful to understand clearly how the analysis of commuting data proceeds by aggregating zones in order to ensure that the maximum possible number of separate TTWAs is identified, all of which meet the self-containment and size criteria. Figure 2 shows both supply-side and demand-side self-containment and also the workforce size of the proto TTWAs which are rejected during the process of aggregation that ensures that all the resulting TTWAs are self-contained and large enough to meet the required levels. On the left-hand side of the chart the set of proto TTWAs is in fact the set of 41,773 individual zones that the analysis starts with.
It can be seen that in the very earliest stages – when the first groupings of zones are taking place, but when most of the proto TTWAs will be individual zones – many rejected proto TTWAs have very low demand-side self-containment values. These will be zones in the City of London and similar job foci, because a low
Figure 2. Values on self-containment and size of deleted proto TTWAs
value on the demand-side self-containment measure means few of the jobs in the area are taken by local people. Figure 2 shows that after this initial phase, the bulk of the analysis process involves grouping areas with low supply-side self-containment values; these areas are typically ‘suburban’ and, of course, there are many more such zones than there are job centre zones. Figure 2 shows that it is only towards the latest stages of the grouping process that area size plays an important role in the assessment of proto TTWAs. This indicates that in most areas the decisive factor causing zones to be aggregated together to meet the required size and self-containment criteria is that they are found to be insufficiently self-contained: it is only when the proto TTWAs number in the low hundreds (on the very right-hand side of the chart) that small size is likely to be the cause of a proto TTWA being identified as needing to be grouped with other areas.
Figures 3 and 4 show maps of two stages in the grouping of areas in north-central England as the process builds up TTWAs which all meet the required levels of self-containment and size. Thus, for this one selected part of the country, Figures 3 and 4 show two snapshots of the ‘state of play’ as the single step procedure iterates many thousands of times through the process just described in statistical terms. Figure 3 shows the situation after the process had reduced the initial set of zones by four-fifths (to 8,000 proto TTWAs), and then Figure 4 shows proto TTWA boundaries close to the final TTWAs; there are just 250 of this set of proto TTWAs across the whole country. Figures 3 and 4 show final TTWA boundaries in black, superimposed upon LA boundaries in grey. The star-like features are proto TTWAs. The lines in each star join the centroids of all the zones constituting that proto TTWA in such a way that two zones will appear as a single line, three zones will appear as a triangle, and so on until a multi-sided polygon appears as a star. This device dramatises the progressive grouping of the analysis process but, at the same time, illustrates the strongly localised nature of the groupings.
A better understanding of the process can be gleaned from looking at the results in two contrasting areas. On the west coast – the left-hand side of the maps – are two notable bays: the one to the north is Morecambe Bay and the town of Morecambe itself is on the coast where Morecambe Bay reaches furthest to the east. Figure 3 shows Morecambe as a small cluster of lines that is joined on to a similar size cluster of lines with a distinct north-south orientation: the latter is the city of Lancaster which forms a continuous
Figure 3. Zone groupings in north-central England when there are 8000 proto TTWAs
Figure 4. Zone groupings in north-central England when there are 250 proto TTWAs
built-up area with Morecambe and the fact that the two clusters of lines are linked shows that the towns have become a single proto TTWA even at this relatively early stage of the process. The line which encloses the towns – as well as nine small stars – represents the final TTWA boundary which emerges at the end of the process. In fact, this TTWA is unusual in exactly matching the LA boundary in the area (but as the TTWA boundaries are superimposed upon the LA boundaries here, the Lancaster LA boundary cannot be seen). The
nine small stars within the LA are clusters of small towns or villages that have sufficient local working for their self-containment levels to be high enough for them to remain separate proto TTWAs at this stage of the process. Figure 4 reveals that, at this very late stage of the process, the cluster of zones which includes Morecambe has absorbed almost every constituent zone of the nine small proto TTWAs that Lancaster LA had included at the earlier stage (Figure 3). In fact, the cluster does not quite include all,
and only, zones in the LA; this is because, following the computerised analysis, there was a brief consultation process and in this area it was found that shifting a few zones between TTWAs allowed the boundary to exactly match that of the LA. Table 1 had shown that such matching was desirable, although not as a high priority. It must be stressed that the consultation process was rigidly constrained so that the final set of TTWAs has to meet all the statistical criteria in terms of self-containment and size, and any changes made must be within the ‘degrees of freedom’ left by these constraints. The way these constraints were applied to the consultation process is detailed in Coombes and Bond (2008). The other case worthy of discussion here is that of Liverpool, which is represented by the large star to the east of the larger bay at the southern end of the west coast on the maps. Liverpool had already consolidated as a large cluster at quite an early stage of the process (Figure 3). Further east – toward the centre of the map – is Manchester which at this early stage is still quite fragmented due to it being a much older conurbation with many more closely spaced old town centres with local job opportunities. Figure 4 shows that by this late stage of the process, both Liverpool and Manchester have absorbed very many previously separate proto TTWAs, with Liverpool having more of a north-south orientation. This is due to areas between the two cities having much stronger links with Manchester because the latter has been the more economically dynamic of the two cities and so provides more opportunities for longer-distance commuting. In fact, Figure 4 shows larger stars from this stage of the process than the final TTWAs. The reason for this is that the consultation stage produced some suggestions for ‘reinstated’ smaller TTWAs. 
These suggestions were tested against the set size and self-containment criteria for the final TTWAs and if the smaller TTWAs could meet these criteria – without causing any of the TTWAs they had previously been part of to then fail the criteria – these changes could be accepted. This enables the final set of TTWAs
to better meet their principal objective of providing the largest possible number of separable areas meeting the set statistical criteria.
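The iterative grouping described in this section can be sketched in simplified form. The code below is an illustration only and makes several assumptions that are not taken from the chapter: the weakest failing proto TTWA is merged as a whole into the group with which it exchanges the most commuters (rather than being dissolved zone by zone), size is measured by working residents, and a plain pair of thresholds replaces the size and self-containment trade-off. All names and values are hypothetical.

```python
def group_stats(flows, members):
    """Return (size, self_containment) for a group of zone indices."""
    n = len(flows)
    inside = sum(flows[i][j] for i in members for j in members)
    residents = sum(flows[i][j] for i in members for j in range(n))
    jobs = sum(flows[i][j] for i in range(n) for j in members)
    supply = inside / residents if residents else 0.0
    demand = inside / jobs if jobs else 0.0
    # Size is taken as working residents; the lower measure must pass.
    return residents, min(supply, demand)


def aggregate(flows, min_size, min_sc):
    """Greedy sketch: merge the worst failing group until all groups pass."""
    groups = [{z} for z in range(len(flows))]  # each zone starts alone
    while len(groups) > 1:
        stats = [group_stats(flows, g) for g in groups]
        failing = [k for k, (size, sc) in enumerate(stats)
                   if size < min_size or sc < min_sc]
        if not failing:
            break  # every proto TTWA now meets the criteria
        # Reject the failing group with the lowest self-containment.
        worst = min(failing, key=lambda k: stats[k][1])
        weak = groups.pop(worst)

        def link(group):
            # Commuting in both directions between the two groups.
            return sum(flows[i][j] + flows[j][i] for i in weak for j in group)

        best = max(range(len(groups)), key=lambda k: link(groups[k]))
        groups[best] |= weak
    return groups


# Hypothetical four-zone example: zone 1 is a suburb of zone 0,
# and zone 3 is a suburb of zone 2.
flows = [
    [90, 5, 3, 2],
    [60, 30, 5, 5],
    [2, 3, 85, 10],
    [5, 5, 50, 40],
]
result = aggregate(flows, min_size=150, min_sc=0.7)
print(sorted(sorted(g) for g in result))
```

With these illustrative thresholds the two suburban zones are the first to be rejected because of their low supply-side self-containment, which mirrors the pattern reported for Figure 2.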
CONCLUSION As has been emphasised several times in this chapter, the core objective of the research reported here was to analyse the 2001 Census data in ways leading to the definition of the maximum possible number of TTWAs that satisfy all the set statistical criteria. These criteria ensure the TTWAs meet the requirements for a set of labour market areas used for reporting official statistics. A summary evaluation is that the research has indeed met these objectives in that the defined TTWAs all satisfy the set statistical criteria and the sensitivity analyses suggest that it is unlikely that changing the way of analysing the 2001 Census data will lead to the definition of additional TTWAs which all meet the set statistical criteria. The value to statistics users of the 2001-based TTWAs will be proven over time through them enabling more valid comparisons of labour market conditions across the country. One of the more immediately obvious advantages they offer – over LAs which remain the default areas for reporting local official statistics – is the level of detail in areas like the Highlands of Scotland where several separate TTWAs can provide insights into distinctive local circumstances ‘averaged away’ by statistics for the single LA area. It is valuable to reflect upon how this essentially positive outcome has been achieved. A large number of innovations were required, with each one building on the opportunities that were opened up by the others, and all of them facilitated by seminal increases in computing power. The basic innovations have come from ONS with the production of commuting data for 100% of the employed population for the first time. The ten-fold increase in data volume has allowed more robust measurement of flows in
more sparsely populated areas in particular. This also made it feasible to deliver a near five-fold increase in the number of zones that the data are analysed for. The decision of ONS to give the researchers temporary access to data not subject to the disclosure control applied to published data was also a vital innovation; without it, analysing data for very small zones would have been almost pointless because a huge proportion of the values in the matrix would have been altered by the disclosure control process.
Innovations in the analysis method have sought to extract maximum benefit from these enhancements to the data available. The method of computerised analysis developed in the 1980s for the TTWA definition processes of previous decades has been radically simplified, allowing the analysis to cope elegantly with the vast matrix of very small areas without any apparent loss of coherence to the results. The statistical criteria have been adjusted to produce more appropriate results with the 2001 data: the required self-containment level is now set lower, in keeping with the trend for more longer-distance commuting, with a greater trade-off between this criterion and the size measure. In addition, there is no longer a bar on individual TTWAs including not only some areas in England but also some parts of Wales or Scotland across the border.
This chapter itself has sought to take forward the innovation by providing some new information on TTWAs. Two specific departures have been made, albeit briefly. The first has outlined some sensitivity analysis and thus indicated the sorts of experimentation which laid the groundwork for the final decisions on how exactly the data should be analysed to define the final set of TTWAs. The second new form of information is the mapping of interim results from the many thousands of iterations of the analytical process from which the final set of TTWA definitions eventually emerges.
As for necessary future innovations, it seems important to find better ways of comparing different sets of results than simply overlaying two sets of boundaries and speculating as to why the boundaries differ. The different sets of boundaries could be from different census years; it is certainly true that, as yet, there are few if any good examples of the mapping of changing commuting patterns. Other sets of boundaries which need to be contrasted are the results from applying an analysis, such as that which has been developed to define TTWAs, to datasets on selected sub-groups of the workforce: it is very well known that, for example, workers with part-time jobs tend to commute less far than full-time workers, but how do these commuting patterns differ in each part of the country? For the moment, being able to appreciate the geography of segmented local labour market areas awaits further innovation to enable better visualisation of contrasts between the patterns of spatial interaction of different workforce groups between many thousands of small areas.
ACKNOWLEDGMENT This chapter draws on the CURDS research to define TTWAs for the Office for National Statistics (ONS); the author gratefully acknowledges that research funding and the permission to use the research for academic and related purposes. The dataset used for the analyses was generated from the 2001 Population Census which in general is Crown Copyright (although the specific dataset used here has not been published). The author is also very grateful for the essential inputs of other members of the Travel-to-Work Areas research team, in particular Colin Wymer and Simon Raybould in CURDS and Steve Bond (ONS).
REFERENCES
Boundary Committee for England. (2008). Draft Proposal for Unitary Local Government in Devon. London: The Boundary Committee for England.
Cattan, N. (2001). Functional Regions: A Summary of Definitions and Usage in OECD Countries, OECD (DT/TDPC/TI(2001)6), Paris.
Coombes, M. G. (2000). Defining locality boundaries with synthetic data. Environment & Planning A, 32, 1499–1518. doi:10.1068/a29165
Coombes, M. G., & Bond, S. (2008). Travel-to-Work Areas: The 2007 Review. London: Office for National Statistics.
Coombes, M. G., Green, A. E., & Openshaw, S. (1986). An efficient algorithm to generate official statistical reporting areas: the case of the 1984 Travel-to-Work Areas revision in Britain. The Journal of the Operational Research Society, 37, 943–953.
Coombes, M. G., & Openshaw, S. (1982). The use and definition of Travel-to-Work Areas in Great Britain: some comments. Regional Studies, 16, 141–149. doi:10.1080/09595238200185161
Eurostat. (1992). Study on Employment Zones, Eurostat (E/LOC/20), Luxembourg.
Frey, W., & Speare, A. (1995). Metropolitan areas as functional communities. In Dahmann, D. C., & Fitzsimmons, J. D. (Eds.), Metropolitan and Nonmetropolitan Areas: New Approaches to Geographical Definition (pp. 139–190). Washington, DC: US Bureau of the Census.
Goodman, J. F. B. (1970). The definition and analysis of local labour markets: some empirical problems. British Journal of Industrial Relations, 8, 179–196. doi:10.1111/j.1467-8543.1970.tb00968.x
ONS, & Coombes, M. (1998). 1991-based Travel-to-Work Areas. London: Office for National Statistics.
Openshaw, S., & Rao, L. (1995). Algorithms for re-engineering 1991 Census geography. Environment & Planning A, 27, 425–446. doi:10.1068/a270425
Rose, D., & O’Reilly, K. (1998). The ESRC Review of Government Social Classifications. Office for National Statistics, London. Retrieved from www.statistics.gov.uk/downloads/theme_compendia/ESRC_Review.pdf
Smart, M. (1974). Labour market areas: uses and definitions. Progress in Planning, 2, 239–353. doi:10.1016/0305-9006(74)90008-7
HM Treasury, BERR & CLG. (2007). Review of Sub-national Economic Development and Regeneration. HM Treasury, Department for Business, Enterprise and Regulatory Reform and Communities and Local Government, London.
Chapter 13
Estimating Spatially Consistent Interaction Flows Across Three Censuses Zhiqiang Feng University of St. Andrews, UK Paul Boyle University of St. Andrews, UK
ABSTRACT A significant problem facing geographical researchers who wish to compare migration and commuting flows over time is that the boundaries of the geographical areas, between which flows are recorded, often change. This chapter describes an innovative method for re-estimating the migration and commuting data collected in the 1981 and 1991 Censuses for the geographical units used in the 2001 Census. The estimated interaction data are provided as origin-destination flow matrices for wards in England and Wales and pseudo-postcode sectors in Scotland. Altogether, there were about 10,000 zones in 1981, 1991 and 2001, providing huge but sparsely populated matrices of 10,000 by 10,000 cells. Because of the changing boundaries during inter-censal periods, virtually no work has attempted to compare local level migration and commuting flows in the two decades, 1981-91 and 1991-2001. The re-estimated spatially consistent interaction flows described here allow such comparisons to be made and we use migration change in England and commuting change in Liverpool to demonstrate the value of these new data.
DOI: 10.4018/978-1-61520-755-8.ch013
INTRODUCTION
Census interaction data or origin-destination flow statistics provide the UK academic community with a potentially rich source of information on spatial mobility. Census interaction data comprise the Special Migration Statistics (SMS) and the Special Workplace Statistics (SWS). However, in the 2001 Census, Special Travel Statistics (STS) were released for Scotland which include travel to work as well as travel to study data. Census interaction data have been collected and published from 1981, theoretically providing an opportunity for researchers to study spatial patterns and trends for small geographical areas over the two decades from 1981 to 2001. Unfortunately, it is not a straightforward task to compare census interaction data through time.
Many difficulties arise including: changes in the definition of census questions; changes in the selection of census topics; alterations to the sample coverage; and the adoption of different disclosure control methods (see Chapter 3). Boundary changes between censuses are also a major problem that hinders the comparison of census data through time (Gregory and Ell, 2005; Champion, 1995). At a relatively small area scale, wards are a commonly used set of geographical units for analysing census data, but they change frequently because they are electoral areas and boundary changes are necessary to satisfy electoral equality. Hence, between 1981 and 2001 wards in Britain have experienced constant review and adjustment by the Boundary Commissions of local governments (Rees, 1995). According to our calculations, for example, based on the ONS Ward History Database, about 46% of wards in England and Wales were subject to one or more geographical boundary changes in the 1990s. At a more aggregate spatial scale, major changes took place in the mid 1990s at the district scale with the reform of local government and at the regional scale with the emergence of Government Office Regions.
The conventional response to the problem of boundary change has been to examine socio-demographic changes using larger, more consistent areal units. Thus, Boyle (1994; 1995), Champion (1994) and Champion et al. (1998) used relatively large areas to study population migration. Similarly, Frost et al. (1996; 1997; 1998) compared commuting data for 1981 and 1991, but were forced to aggregate wards into concentric bands from metropolitan centres, or to summarise flows for entire inner cities. These approaches are limited because, on the one hand, most people tend to move or commute over short distances and, on the other hand, studies at more aggregate levels may misinterpret the true factors underlying these flow patterns.
Hence there is a need to develop a suitable methodology to estimate interaction flows from censuses for spatially consistent zones so that they can be compared both spatially and temporally. In the following sections, we briefly introduce the characteristics of the SMS and SWS for 1981, 1991 and 2001, highlight why estimating flow data for different geographies requires a novel methodology, describe the main stages of the estimation strategy and explain how we solved the various problems associated with integrating interaction data through time. We also present two separate examples involving the investigation of migration change in England and commuting patterns in Liverpool to demonstrate the use of the estimated migration and commuting flow data.
CENSUS INTERACTION DATA FOR 1981, 1991 AND 2001
Migrants are defined in British censuses as those whose address at the time of enumeration was different from one year previously. The questions asked were identical in 1981, 1991 and 2001, making the three sources compatible in this respect at least. For all three censuses, the SMS were based on 100% of the census returns. In 1981, the SMS were divided into sets 1 and 2. Six tables were provided in Set 1, the first four providing counts of migrants by economic position, age, marital status and economic position by tenure. The fifth tabulated wholly moving household heads by economic position and age, and the sixth tabulated persons in wholly moving households by economic position and age. Due to disclosure control, Set 1 was geographically complex, severely restricting its use. Inter-ward flows were not provided – only flows within wards or between larger units, such as districts, were included. Boyle (1993) made use of these data at the inter-county level – the lowest level at which thresholding was not a factor – to analyse the relationship between migration and tenure, but little further use has been made of them, mainly because of the inadequate geography.
The 1981 SMS Set 2 were less geographically complex than Set 1 and both intra-ward and inter-ward level flows were provided. They represent a 100% count of the population and were not subject to data adjustments or thresholding. However, only flows for total migrants, males or females were provided (Table 1). These matrices were re-estimated for the 1991 and 2001 wards in this research. One insurmountable problem with the 1981 Census SMS is that the interaction data for movement within Scotland were missing and, as a result, our research only includes flows within England and Wales, from England and Wales to Scotland, and from Scotland to England and Wales.
In the 1991 Census, the SMS flow data were divided into sets 1 and 2. Set 1 was equivalent to the 1981 SMS Set 2 and comprised two ward-level tables not subject to disclosure control. The first table covered migrants by age and sex and the second provided the count of wholly moving households and residents in wholly moving households (Table 1). The complexity of the 1981 SMS Set 1 was reduced in the 1991 SMS Set 2, but only inter-local authority district (LAD) flows were provided. The tables disaggregated individual migrants by age and sex, marital status and sex, ethnicity, limiting long-term illness and economic position. Other tables covered wholly moving households by tenure, sex and economic position, and economic position of the household head. In Scotland and Wales tables were provided for Gaelic and Welsh speaking migrants. Even so, suppression was used in most of the tables (other than those that only disaggregated the migrants by age, sex or wholly moving household status) where the inter-LAD flows involved fewer than 10 persons in total. Work has been undertaken to ‘fill in the holes’ in these 1991 data (Rees and Duke-Williams, 1997) as reported in Chapter 3. We did not estimate 1981 equivalents of these 1991 LAD-level data sets as the same district-level detail was not available in the 1981 datasets.
In each census, journey to work data were derived from two questions identifying the individual’s residential location and place of work. They were less reliable than the migration data, because they were only drawn from the 10% sample data in the 1981 and 1991 Censuses, and they also relied upon the individual providing a detailed address for the workplace, preferably including a full unit postcode from which the geographical location could be identified. However, a benefit was that because they were taken from a sample of returns, little data adjustment for confidentiality was necessary.
In the 1981 SWS, three sets of commuting data were presented. Set A and Set B provided statistics for employed or self-employed people by their places of residence or work respectively. These were not origin-destination interaction data matrices but simply flows out of residential
Table 1. Tables and counts available from 2001, 1991 and 1981 ward level SMS data

Variable | 2001 Level 2: Tables | 2001 Level 2: Counts | 1991 Set 1: Tables | 1991 Set 1: Counts | 1981 Set 2: Tables | 1981 Set 2: Counts
Gender | Table 1 | 51 | Table 1 | 10 | Table 1 | 2
Age | Table 1 | 51 | Table 1 | 10 | |
Moving groups | Table 2 | 4 | | | |
Ethnicity | Table 3 | 9 | | | |
Moving groups by NS-SEC | Table 4 | 24 | | | |
Moving groups by tenure | Table 5 | 8 | | | |
Wholly moving households | | | Table 2 | 2 | |
Table 2. Tables and counts available from 2001, 1991 and 1981 ward level SWS data

Variable | 2001 Level 2: Tables | 2001 Level 2: Counts | 1991 Set 1: Tables | 1991 Set 1: Counts | 1981 Set 2: Tables | 1981 Set 2: Counts
Gender | Table 201 | 51 | Table 1 | 10 | Table 1 | 2
Age | Table 201 | 51 | Table 1 | 10 | |
Moving groups | Table 202 | 4 | | | |
Ethnicity | Table 203 | 9 | | | |
Moving groups | Table 204 | 24 | | | |
Moving groups | Table 205 | 8 | | | |
Wholly moving | | | Table 2 | 2 | |
zones and into workplace zones. Set C contained the origin-destination matrices at the ward level. For each origin-destination pair, 172 counts were available in five tables, providing the mode of transport to work, social class, socio-economic group, occupational orders, industrial division and the age structure of males and females aged 16 and over in employment (Table 2). In the project described here, the flows in each of these five tables were estimated for 1991 and 2001 ward boundaries. Unfortunately, like the 1981 SMS, the SWS interaction data for movement within Scotland was missing in 1981. Hence, the 1981 SWS matrices could only be estimated for flows within England and Wales, from England and Wales to Scotland, and from Scotland to England and Wales. The 1991 SWS Set C provided the same information as in the 1981 SWS Set C. But, in addition, tables were included on economic position, hours worked, family position and car ownership (Table 2). All flows in the nine tables in the 1991 SWS Set C were estimated for 2001 ward boundaries. The 2001 Census interaction data are much larger and more complex than the 1981 and 1991 data. There are three sets, each of which refers to a particular level or set of spatial units. Besides district level (Level 1) and ward level (Level 2) data, the output from the 2001 Census included flows between output areas (Level 3) which are much smaller zones. The wards used in the Level
2 data set are a mixture of Census Area Statistics (CAS) wards in England, Wales, and Northern Ireland and standard table (ST) wards in Scotland. These spatial units have been jointly called ‘interaction wards’. Unlike the SWS data in 1981 and 1991, the 2001 SWS included 100% census returns but the counts were only for people who were aged between 16 and 74. Tables 1 and 2 provide a summary of the SMS and SWS at the ward level for the three censuses. Further details of the 1981 and 1991 SMS and SWS can be found in Flowerdew and Green (1993), Cole et al. (2002) and the 2001 SMS and SWS in Stillwell et al. (2005).
AREAL INTERPOLATION FOR INTERACTION FLOWS

Areal interpolation refers to the transfer of areal data based on one (source) geography onto another different (target) geography. The source and target geographies could be two different zoning systems for one period in time, or they may refer to the same types of zones which have been altered through time (the example we deal with here). Problems arise when the area-based data from different geographies need to be compared. Various forms of areal interpolation method have been developed (Flowerdew and Green, 1992; Goodchild et al., 1993), although the majority are associated with
Estimating Spatially Consistent Interaction Flows Across Three Censuses
Figure 1. Disaggregating a single migration flow
‘static’ variables, rather than more complicated flow variables. Atkins et al. (1993) conducted research on transferring 1991 Census data onto 1981 ward boundaries. Gregory and Ell (2005) also used areal interpolation to build historical census data sets on consistent geographies. Areal interpolation usually involves some kind of areal weighting method. The assumption behind this method is that source zone data are evenly distributed within each zone. Therefore the geographical area of intersection zones between source zones and target zones can be used as the weight in estimating the data for the intersection zones. The data values can then be re-aggregated from the source and intersection zones to the target zones. However, such a method is not suitable for the interpolation of flow data. Suppose that in 1981 there is a migration flow from origin ward A to destination ward B (Figure 1a). In 2001, the ward B splits into two equal-area wards B1 and B2. It is inadequate to simply divide the flow from the origin zone into two equal-volume flows because each new zone has the same geographical area (Figure 1b). First, it overlooks the fact that migration decays with distance and, if the two new destination zones are not equidistant from the origin zone, it is more likely that more of
the migrants would terminate in the destination that is closer to the origin (Figure 1c). Second, it makes the assumption that the population is evenly distributed across the zones. This is unrealistic. In fact, the half with the larger population would probably attract more migrants, all other factors being equal. Interpolating flow data is therefore more difficult than interpolating static data and some form of innovative strategy and modelling is required to solve these problems. There have been studies on ‘intelligent’ areal interpolation methods in which ancillary information is taken into account to give better estimates than are possible with simple areal interpolation (Goodchild et al., 1993; Gregory and Ell, 2005). This information may be available for the source and target zones. Basically, the method works through establishing a regression relationship between the variable of interest and one or more ancillary variables. Once this relationship has been established, it can be used, along with area, to estimate values for the variable of interest for the target zones (Flowerdew and Green, 1992). Our method is in keeping with the spirit of this ‘intelligent’ path and is described in detail below.
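The contrast between simple areal weighting and an informed split of a flow can be sketched as follows. This is only an illustration of the argument above, not the method used in the project: the function names, the populations, the distances and the distance-decay exponent `beta` are all hypothetical.

```python
def areal_weighting(source_value, intersection_areas):
    """Split a source-zone count across intersection zones in proportion to area."""
    total_area = sum(intersection_areas.values())
    return {zone: source_value * area / total_area
            for zone, area in intersection_areas.items()}


def weighted_flow_split(flow, populations, distances, beta=-1.5):
    """Split a flow across destination parts using population * distance**beta.

    beta is a hypothetical distance-decay exponent chosen for illustration.
    """
    weights = {zone: populations[zone] * distances[zone] ** beta
               for zone in populations}
    total = sum(weights.values())
    return {zone: flow * weight / total for zone, weight in weights.items()}


# Hypothetical case: ward B splits into two equal-area wards B1 and B2.
naive = areal_weighting(100, {"B1": 2.0, "B2": 2.0})

# B1 is both closer to the origin and more populous than B2.
informed = weighted_flow_split(
    100,
    populations={"B1": 3000, "B2": 1000},
    distances={"B1": 2.0, "B2": 4.0},
)
```

Under these assumed inputs the area-based split is equal, whereas the informed split sends most of the flow to the nearer, more populous half, which is the behaviour the two objections above call for.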
RE-ESTIMATION METHODOLOGY

Here we describe the technique for re-estimating the 1981 SMS data for the 2001 geography as an example. The same technique was used for re-estimating 1991 SMS data for the 2001 geography and also for re-estimating 1981 SMS and SWS data for 1991 boundaries. Broadly speaking, the same methods were used for re-estimating the SWS, although there were some small differences which are not discussed here. The estimation of the 1981 migration flows for the 2001 ward geography involved a number of phases:

• identification of boundary changes;
• calculation of distances between areal units;
• modelling the 1981 migration at the ward level;
• estimating the 1981 enumeration district (ED) flows using parameters derived from ward-level models;
• estimating intra-ED flows;
• estimating flows which had an incorrectly recorded origin; and
• re-aggregation of these 1981 ED flows into 2001 ward flows.
Boundary Changes

Only those flows that involved changes in the origin ward, the destination ward, or both needed to be re-estimated. The changes between the 1981 and 2001 ward boundaries therefore had to be identified first. One approach would be to overlay the two sets of boundaries and to derive intersection zones using a geographical information system (GIS). This would be time-consuming because of the need to distinguish genuine boundary changes from those resulting from inconsistent digitising of identical 1981 and 2001 wards. Numerous ‘slivers’ that had not resulted from boundary changes would need to be removed from the synthesised map. An alternative estimation procedure was adopted here which involved using the 1981 EDs as a bridge between the 1981 and 2001 wards. The objective was to estimate the flows between and within these 1981 EDs from the known 1981 intra- and inter-ward flows using the modelling procedure described below. The illustrations in Figure 2 demonstrate how the procedure works. For simplicity, this is described for a two-ward system, wards I81 and J81 in 1981 (Figure 2a), which transferred to a two-ward system, wards I01 and J01 in 2001 (Figure 2b). The 1981 and 2001 boundaries, shown respectively by solid and dashed lines, overlap (Figure 2c). Each of the two wards of 1981 has two EDs: A and B in ward I81, C and D in ward J81 (Figure 2d). Also suppose that there were two ward flows originating from I81, one of which was an internal flow (Figure 2a). The ward level flow within I81
could be disaggregated into four potential intra- and inter-ED flows, while the flow from I81 to J81 could be disaggregated into four inter-ED flows (Figure 2e). In total, the two ward-level flows originating from I81 could be broken down into eight flows at the ED level. Once the two ward flows have been disaggregated into the respective ED flows, these can then be allocated to the appropriate 2001 ward using a point-in-polygon procedure (Figure 2f). In this case, the centroids of EDs A and C fall into 2001 ward I01 and EDs B and D fall into 2001 ward J01 and the 1981 ED flows can be aggregated equivalently (Figure 2g). The flow from ward I01 to ward J01 is the aggregate of flows from ED A to B and from ED A to D, for example. Also, it can be seen from Figure 2f that part of an intra-ward flow in 1981 may become an inter-ward flow in 2001 (from ward I01 to ward J01), and vice versa (the inter-ward flow from ED A to C becomes a flow within ward I01). Eventually four flows (two intra and two inter-ward) are produced for 2001 from the original two flows in 1981. Figure 2. Transferring flows from a two-ward system (1981) to another two-ward system (2001)
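The re-aggregation step in Figure 2 can be sketched as a lookup from each 1981 ED to the 2001 ward that contains its centroid. The ED and ward labels follow the figure; the flow values are invented for illustration, and the point-in-polygon step is assumed to have been carried out already to produce the lookup table.

```python
from collections import defaultdict


def reaggregate(ed_flows, ed_to_ward):
    """Sum ED-to-ED flows into ward-to-ward flows using an ED-to-ward lookup."""
    ward_flows = defaultdict(float)
    for (origin_ed, dest_ed), flow in ed_flows.items():
        ward_flows[(ed_to_ward[origin_ed], ed_to_ward[dest_ed])] += flow
    return dict(ward_flows)


# Result of the (assumed) point-in-polygon step: ED centroid -> 2001 ward.
ed_to_ward_2001 = {"A": "I01", "C": "I01", "B": "J01", "D": "J01"}

# Hypothetical estimated 1981 ED flows originating in ward I81 (EDs A and B).
ed_flows_1981 = {
    ("A", "A"): 4, ("A", "B"): 3, ("B", "A"): 2, ("B", "B"): 5,  # within I81
    ("A", "C"): 1, ("A", "D"): 2, ("B", "C"): 3, ("B", "D"): 1,  # I81 to J81
}

flows_2001 = reaggregate(ed_flows_1981, ed_to_ward_2001)
```

With these inputs the eight ED flows collapse into four 2001 flows, and the flow from I01 to J01 is the sum of the flows from ED A to B and from ED A to D, as described in the text.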
Modelling 1981 Ward Level Migration Flows

A reliable modelling method is the key to the success of areal interpolation for interaction data. Census interaction data at the ward level are huge sparsely populated data matrices. The analysis of each data matrix is subject to the problems resulting from the inappropriateness of conventional modelling strategies when the response variable is a small non-negative integer which is often zero. The use of regression methods based on the Poisson distribution (Lovett and Flowerdew, 1989; Flowerdew, 1991) has reduced these problems considerably and this was the approach adopted here. First, the flows between the 10,444 1981 wards were fitted as a Poisson regression model, where the flow between each pair of wards was estimated as a function of the origin and destination populations and the distance between them:

$$\hat{M}_{ij} = \exp(\beta_0 + \beta_1 \ln P_i + \beta_2 \ln P_j + \beta_3 \ln d_{ij}) \tag{1}$$

where $\hat{M}_{ij}$ is the estimated migration between 1981 wards $i$ and $j$; $P_i$ is the population of the 1981 ward $i$; $P_j$ is the population of the 1981 ward $j$; $d_{ij}$ is the distance between the 1981 wards $i$ and $j$; and $\beta_0$ to $\beta_3$ are parameters to be estimated. The goodness of fit for this model is evaluated using a likelihood-based statistic known as the deviance which can be computed as:

$$D = 2 \sum_i \sum_j M_{ij} \ln\left(M_{ij} / \hat{M}_{ij}\right) \tag{2}$$

where $M_{ij}$ is the observed value for the flow between 1981 ward $i$ and ward $j$. The model deviance may be compared to the deviance from the null model to calculate what proportion is explained by the variables used in the model. The null model is the baseline model which includes a constant, but no explanatory variables. The larger the proportion of deviance explained by the variables in the model the better its fit. Table 3 shows the results of Poisson modelling for the 1981 SMS data in Britain. Male and female migrations were separately modelled. Both models fitted reasonably well, accounting for just over 60% of the null model deviance, with migration being positively associated with the size of the origin and destination populations and negatively associated with the distance between each pair of wards in each case.

Table 3. Modelling results for the 1981 migration flows in Britain

| | Men | Women |
|---|---|---|
| Model fit | | |
| Deviance | 7,565,418 | 7,402,517 |
| Degrees of freedom | 105,036,234 | 105,036,234 |
| Proportion of null deviance explained | 0.6083 | 0.6222 |
| Model parameters (standard errors) | | |
| Constant | 1.8654 (0.0132) | 2.3167 (0.0131) |
| Log of distance between wards i and j | -1.6554 (0.0005) | -1.6761 (0.0005) |
| Log of population in ward i | 0.6072 (0.0013) | 0.5755 (0.0013) |
| Log of population in ward j | 0.4960 (0.0013) | 0.4829 (0.0013) |
| Observed total flow | 1,693,558 | 1,706,693 |
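The deviance in equation 2 and the proportion of null deviance explained can be sketched in a few lines. The observed and fitted values below are invented; the fitted values are chosen to sum to the observed total, as they would for a Poisson model with a constant fitted by maximum likelihood, which is the condition under which the short form of the deviance in equation 2 applies. The function names are illustrative only.

```python
import math


def poisson_deviance(observed, fitted):
    """D = 2 * sum(M * ln(M / M_hat)), treating 0 * ln(0) as 0."""
    return 2.0 * sum(m * math.log(m / f)
                     for m, f in zip(observed, fitted) if m > 0)


def proportion_explained(observed, fitted):
    """Share of the null-model deviance accounted for by the fitted model."""
    mean_flow = sum(observed) / len(observed)
    null_fit = [mean_flow] * len(observed)  # constant-only model
    return 1.0 - poisson_deviance(observed, fitted) / poisson_deviance(observed, null_fit)


# Hypothetical flows for five ward pairs; both lists sum to 20.
observed = [0, 2, 5, 1, 12]
fitted = [0.5, 2.5, 4.0, 1.5, 11.5]

share = proportion_explained(observed, fitted)
```

The same ratio, computed over all ward pairs, is what the "proportion of null deviance explained" rows in Table 3 report.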
Measuring Distance

The distance measure used in interaction models is not simply a measure of physical distance but is usually designed to capture perceived distance, and most models use some form of distance decay function. However, there were also some practical problems with measuring distance. The Euclidean distance is most widely used but it is inadequate when estuaries or other physical barriers are crossed. For example, the straight-line distance between wards on either side of an estuary is shorter than the actual travel distance between this pair of wards. Alvanides et al. (1996) suggested a methodology for solving this at the ward level. This involved creating arcs between the centroids of contiguous wards in a GIS and then calculating the shortest path between any pair of wards through this network. This solved the estuary problem, but might overestimate the distances between nearby (but not contiguous) wards. Therefore, a mixed distance measure integrating Euclidean distance (where there is no natural barrier) and network distance (where a natural barrier lies between two wards) was a more reasonable solution. To explore the distance problem we used the flows of total inter-ward migrants for Scotland, extracted from the 1991 SMS. Table 4 provides the results for two models fitted using Poisson regression with two distance measures: (i) Euclidean distance for all pairs of wards and (ii) mixed distance, which used the Euclidean distance for some pairs and the network distance for others. The two models gave very similar results and the coefficients showed that migration decayed strongly with distance and increased with the population size of both the origin and destination wards. Unexpectedly, though, the proportion of deviance explained by the model with the Euclidean distance was slightly greater (58.11%) than that for the model with the mixed distance (58.01%).

Another issue is how to measure distances to, from, or between islands. It is a general assumption that flows to, from and between islands are hindered by relatively poor transportation connections and therefore may be over-estimated by the Euclidean distance model. We examined this assumption, again using a model of the 1991 total migration flows in Scotland. The results are shown in Table 5. There were a total of 11,840 migrants who moved between island and land zones. However, the predicted number of migrants was only 398, which is an underestimation rather than the expected overestimation. The same was also true for flows between island zones. For both these reasons we used the Euclidean distance in the models for our re-estimation exercise.
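A mixed distance measure of the kind described above can be sketched with a shortest-path search over a contiguity network. The ward labels, centroid coordinates, network arcs and the set of barrier-separated pairs are all hypothetical, and the sketch uses a standard Dijkstra search rather than any particular GIS routine.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two (x, y) centroids."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def network_distance(graph, start, goal):
    """Shortest path length (Dijkstra) over a dict-of-dicts contiguity graph."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            candidate = dist + length
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return math.inf


def mixed_distance(centroids, graph, barrier_pairs, i, j):
    """Network distance where a barrier separates i and j, Euclidean otherwise."""
    if frozenset((i, j)) in barrier_pairs:
        return network_distance(graph, i, j)
    return euclidean(centroids[i], centroids[j])


# Hypothetical layout: W1 and W2 face each other across an estuary.
centroids = {"W1": (0, 0), "W2": (0, 3), "W3": (4, 0), "W4": (4, 3)}
arcs = [("W1", "W3"), ("W3", "W4"), ("W4", "W2")]  # contiguous ward pairs
graph = {name: {} for name in centroids}
for a, b in arcs:
    graph[a][b] = graph[b][a] = euclidean(centroids[a], centroids[b])

barrier_pairs = {frozenset(("W1", "W2"))}
across_estuary = mixed_distance(centroids, graph, barrier_pairs, "W1", "W2")
no_barrier = mixed_distance(centroids, graph, barrier_pairs, "W1", "W3")
```

In this layout the straight-line distance between W1 and W2 is 3 units, while the route around the estuary through W3 and W4 is much longer, which is the discrepancy the mixed measure is meant to capture.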
Table 4. A comparison between the Euclidean distance model and the mixed distance model for the 1991 ward-level total migration flows in Scotland

| | Euclidean distance | Mixed distance |
|---|---|---|
| Model fit | | |
| Deviance | 921,422 | 923,586 |
| Degrees of freedom | 1,002,998 | 1,002,998 |
| Proportion of null deviance explained | 0.5811 | 0.5801 |
| Model parameters (standard errors) | | |
| Constant | 1.3390 (0.0396) | 1.1811 (0.0394) |
| Log of distance between wards i and j | -1.3888 (0.0014) | -1.3644 (0.0014) |
| Log of population in ward i | 0.7144 (0.0031) | 0.7124 (0.0031) |
| Log of population in ward j | 0.6714 (0.0030) | 0.6697 (0.0030) |

Observed total flow: 292,440
Table 5. Estimation of migration flows from and to islands and between islands in Scotland, 1991, using regression models with Euclidean distances

| | Observed flow = 0 | Observed flow > 0 |
|---|---|---|
| Between island and land zones: sum of observed | 0 | 11,840 |
| Between island and land zones: sum of estimated | 941 | 398 |
| Between island zones: sum of observed | 0 | 5,868 |
| Between island zones: sum of estimated | 19 | 275 |
Estimating 1981 Inter-ED Migration Flows from the 1981 Ward-Based Model

The parameters from the ward-level models described above were then used to estimate the 1981 inter-ED flows using ED populations and the distance between each pair of EDs as the explanatory variables (Boyle and Flowerdew, 1997):

$$\hat{M}_{AB} = \exp(\beta_0 + \beta_1 \ln P_A + \beta_2 \ln P_B + \beta_3 \ln d_{AB}) \tag{3}$$

where $\hat{M}_{AB}$ is the estimated migration between EDs $A$ and $B$; $P_A$ is the population of ED $A$; $P_B$ is the population of ED $B$; and $d_{AB}$ is the distance between EDs $A$ and $B$. We know that the 1981 EDs can be aggregated back into 1981 wards and therefore the estimates at the ED level from the model need to match this constraint:

$$M_{ij} = \sum_{A \in i} \sum_{B \in j} \hat{M}_{AB} \tag{4}$$

This imposed rule adjusts the flows between each pair of EDs proportionally such that when aggregated they match the 1981 inter-ward flows.
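Equations 3 and 4 can be sketched together: raw ED-level estimates are produced from the gravity model and then scaled so that they sum to the observed inter-ward flow. The coefficients below are the male model parameters from Table 3; the ED populations, the distances (in arbitrary units) and the ward flow are hypothetical, as are the function names.

```python
import math


def gravity_estimate(pop_origin, pop_dest, distance, b0, b1, b2, b3):
    """Equation 3: exp(b0 + b1 ln P_A + b2 ln P_B + b3 ln d_AB)."""
    return math.exp(b0 + b1 * math.log(pop_origin)
                    + b2 * math.log(pop_dest) + b3 * math.log(distance))


def constrained_ed_flows(ward_flow, origin_eds, dest_eds, distances, params):
    """Scale raw ED estimates proportionally so they sum to the ward flow (eq. 4)."""
    raw = {(a, b): gravity_estimate(pa, pb, distances[(a, b)], *params)
           for a, pa in origin_eds.items()
           for b, pb in dest_eds.items()}
    scale = ward_flow / sum(raw.values())
    return {pair: value * scale for pair, value in raw.items()}


# Table 3 (men): constant, log origin population, log destination population, log distance.
params = (1.8654, 0.6072, 0.4960, -1.6554)

# Hypothetical EDs in origin ward i (A, B) and destination ward j (C, D).
origin_eds = {"A": 400, "B": 600}
dest_eds = {"C": 500, "D": 500}
distances = {("A", "C"): 1.0, ("A", "D"): 2.0, ("B", "C"): 2.0, ("B", "D"): 3.0}

ed_flows = constrained_ed_flows(20, origin_eds, dest_eds, distances, params)
```

Because the scaling is proportional, the relative sizes of the ED flows still reflect the model, while their total is pinned to the known ward-level count.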
The Estimation of 1981 Intra-ED Flows

In theory, it was possible to include all flows no matter whether they were intra- or inter-ward
flows in one model and apply the parameters to estimate the 1981 intra- and inter-ED flows using the same function. One problem was that this required the calculation of intra-ED distances. In the modelling of spatial data aggregated for areas, intra-zone distances have previously been measured as one-half or one-fourth of the radius of a circle with the same area as the zone (Boyle, 1991). This was possible for 1981 wards, as we knew their areas. However, the area of 1981 EDs was not known and, as their boundaries were not digitised when we conducted the re-estimation exercise, the intra-ED distances could not be calculated.

Consequently we decided to deal with intra-ED and inter-ED flows separately. First, we estimated the proportion of intra-ED flows to the total flows. This was calculated using a linear regression model which estimated the proportion of flows that occurred within areas, rather than between areas (wards, districts, counties and regions), using the logged average populations of these zones. As expected, the R-squared value was high (0.98). From this model we extrapolated the line to predict that about 7% of flows are likely to be within EDs (see Boyle and Feng, 2002 for more detail). Intra-ward flows were broken down into intra-ED flows and inter-ED flows by applying the proportion estimated above. The intra-ED flows were estimated in proportion to the population of EDs within the same ward. The inter-ED flows were estimated using equation 3 with the parameters from the inter-ward flow models.

Estimation of Origins for Flows Whose Origins were Unknown

For various reasons the origin of the migration for some migrants was not captured in the 1981 and 1991 Censuses. About 1.8% of the 1981 migration flows had unknown origins and we needed to develop a method to estimate these, based on the observed pattern of flows. In some cases, the origin district was known, but the origin ward was unknown, while in other cases neither the origin ward nor district was known. In the latter case, it was assumed that the origin could have been any ward in Britain that had a flow to the same destination ward. We therefore allocated the flow to the origin from which the largest flow was recorded into that destination. For those migrant flows with known district origins and unknown ward origins, we used the same method as for flows with totally unknown origins, but since the district was known, the origin was constrained to that district. In addition, there were a few cases where the origin district was known and the origin ward was unknown, but there were no observed flows from that district into the destination ward. In this case, we designed an index as the basis to estimate the origin:

$$\frac{P_i P_j}{d_{ij}} \tag{5}$$

where $P_i$ and $P_j$ were respectively the populations of the origin and the destination ward, and $d_{ij}$ was the distance between them. The index values were calculated for all wards in the origin district and used as a basis so that a ward with a larger index value always had a higher probability of being identified as the origin ward. Thus, a series of complex stages were combined to produce temporally and spatially consistent estimates of migration and commuting for Britain.

Below we provide two simple case studies which demonstrate the value of these data.

CASE STUDY 1
Internal Migration in England Between 1981 and 2001

Over recent decades, population redistribution has increasingly been determined by internal migration instead of natural change (Champion, 1994; Champion, 1996; Rees et al., 1996). The predominant migration pattern since the 1970s has been one of deconcentration from the cores of city regions to hinterlands for both the largest metropolises and also their subsidiary partner cities. Research at region, county, and district level suggests the deconcentration trends identified by Champion and others during the 1970s continued in the 1980s and 1990s. However, analysis at the finest scale is needed to confirm this trend (Rees et al., 1996). A general classification of districts developed by the Office of Population Censuses and Surveys (now part of the Office for National Statistics) based on the 1991 Census has been used by Champion (1994; 1996), Rees et al. (1996), Dorling (1995) and Raymer et al. (2007) to analyse population change and internal migration. The ONS area classification provides an opportunity to look at the relationship between population movement and the demographic and spatial character of areas. The use of the area classification also allows us to simplify and summarise the very complex pattern of migration at the ward level. The area classification at the Standard Table (ST) ward level was created using 43 variables derived from the 2001 Census. All UK ST wards were organised into nine supergroups, 17 groups and 26 subgroups (see Office for National Statistics (2004) for methodology, description and maps). The classification is therefore derived from a
summary of the socio-economic characteristics of each ward. Here we used the re-estimated 1981 and 1991 SMS for the 2001 geography and the 2001 SMS to explore internal migration in England. Table 6 presents the resident population in 1981, 1991 and 2001 for the nine supergroups and rates of population change between 1981 and 1991, and 1991 and 2001. The only type of area which lost population consistently was the Traditional Manufacturing cluster. The Industrial Hinterlands cluster lost population in the 1980s but had a turnaround in the 1990s, although the increase was small. Caution should be exercised when interpreting the population changes between the three censuses because the population bases are not entirely comparable (Rees et al., 1996; Champion, 1995). Figure 3 shows the rates of migration per thousand across the nine supergroups for the three periods of 1980-1981, 1990-1991 and 2000-2001. The bar on the left represents the in-migration rate while the bar on the right indicates the out-migration rate. In addition, the net migration rate was added to whichever side had the lower rate. For example, in 2000-2001, the Built-up Areas cluster experienced an out-migration rate of 94.4 per 1,000, while the in-migration rate was 106.6 per 1,000. This led to a positive net migration rate of 12.2 per 1,000, as shown on the right-hand side of each graph. The rate of migration between supergroups, expressed per 1,000 of the total population, rose from 42.7 in 1980-81 to 44.3 in 1990-91 and to 52.9 in 2000-01. However, the scale of the increase should be interpreted with caution because the 1981 and 1991 Censuses were taken at times of economic recession when migration volumes were relatively low (Rees et al., 1996, p. 7). There was a consistent pattern that Built-up Areas, Student Communities and Prospering Metropolitan areas experienced the highest population
Table 6. Resident population 1981, 1991 and 2001

| Supergroup | 1981 | 1991 | 2001 | Population change (%) 1981-1991 | Population change (%) 1991-2001 |
|---|---|---|---|---|---|
| Industrial Hinterlands | 9,542,675 | 9,417,653 | 9,431,228 | -1.3 | 0.1 |
| Traditional Manufacturing | 5,104,857 | 4,802,473 | 4,645,229 | -5.9 | -3.3 |
| Built-up Areas | 903,695 | 920,574 | 938,173 | 1.9 | 1.9 |
| Prospering Metropolitan | 1,580,779 | 1,630,442 | 1,802,782 | 3.1 | 10.6 |
| Student Communities | 2,302,017 | 2,307,483 | 2,615,956 | 0.2 | 13.4 |
| Multicultural Metropolitan | 3,622,064 | 3,623,821 | 3,912,784 | 0.0 | 8.0 |
| Suburbs and Small Towns | 12,849,448 | 13,828,735 | 14,850,027 | 7.6 | 7.4 |
| Coastal and Countryside | 7,319,758 | 7,800,218 | 8,154,847 | 6.6 | 4.5 |
| Accessible Countryside | 2,542,699 | 2,691,299 | 2,787,805 | 5.8 | 3.6 |
| Total | 45,767,992 | 47,022,698 | 49,138,831 | 2.7 | 4.5 |

Sources: 1981, 1991 and 2001 Censuses
Figure 3. Rates of migration for supergroups, (a) 1980-81, (b) 1990-91 and (c) 2000-01. Sources: 1981, 1991 and 2001 Censuses
turnover while the Industrial Hinterlands, Suburbs and Small Towns, and Coastal and Countryside had the lowest. As we would expect, areas with high out-migration experienced high in-migration as well. The urban clusters comprising the Industrial Hinterlands, Traditional Manufacturing, Prospering Metropolitan, and Multicultural Metropolitan persistently lost migrants to other areas through net out-migration. In contrast, the rural or suburban areas, including the Suburbs and Small Towns and Coastal and Countryside, were net gainers over the 20 years. This confirms that population deconcentration was occurring at the ward level. Of the supergroups, Multicultural Metropolitan was the biggest loser of population in the internal migration process, as its net out-migration rate was constantly over 10 per 1,000. The net migration rates in the Traditional Manufacturing and Prospering Metropolitan supergroups fluctuated: the absolute rate declined in 1990-1991 compared to 1980-1981 and then rose again in 2000-2001. The Industrial Hinterlands lost population through net migration but the rate was quite small compared to other areas. A few area types switched between net gainers and net losers during the three periods. The Built-up Areas experienced a net loss of migrants in 1980-1981, but gained in 1990-1991 and 2000-2001. The Student Communities exhibited a loss of migrants in the first two time periods but a gain during 2000-2001. The Accessible Countryside experienced a gain of migrants in the first two time periods but a loss in the third time period.

Table 7 summarises the net migration by supergroups of ward for the three time periods. Each matrix is composed of two sets of statistics. In the bottom left of the matrix are the net flows between supergroups. For example, for 2000-2001, there is a net inflow of 11,850 from Industrial Hinterlands to Coastal and Countryside. In the top right of each matrix are set out the effectiveness ratios, which express the net flows as a percentage of the gross inflows and outflows. For flows between the Industrial Hinterlands and the Coastal and Countryside clusters in 2000-2001, the effectiveness ratio was 5.9%, which indicates a medium level of net redistribution into the Coastal and Countryside cluster.

The flow matrix provides a rich source of information on interaction between different types of areas. From Figure 3 we see that the Accessible Countryside gained population in the first two periods and then lost population in the last period. However, from Table 7 we can see that in the first two time periods Accessible Countryside did not gain population from all types of areas. In fact, the Accessible Countryside lost population to Built-up Areas and the Coastal and Countryside cluster but this was outweighed by the gain from other types of areas. In 2000-2001, the Accessible Countryside lost migrants to all types of areas except the Industrial Hinterlands and Suburbs and Small Towns. The Multicultural Metropolitan cluster experienced heavy net losses in the three periods, but it did not lose to all area types. From Table 7 we see that it lost to each of the area types in 1990-1991, but in 1980-1981 it gained migrants from the Prospering Metropolitan cluster, and in 2000-2001 it gained migrants from the Prospering Metropolitan, Student Communities and Accessible Countryside clusters. The net migration figures show that the Coastal and Countryside cluster was the largest gainer from the loss of migrants from the Multicultural Metropolitan cluster and the effectiveness measure was over 20%, indicating that the migration between these two also had very high turnover efficiency compared to flows between other types of areas.
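The rates plotted in Figure 3 and the effectiveness ratios in Table 7 follow from simple arithmetic, sketched below. The function names and all counts are hypothetical; the counts in the first example are chosen only so that the rates match the Built-up Areas figures quoted for 2000-2001, and the effectiveness ratio is taken to be the net flow as a percentage of the sum of the two gross flows, following the definition given in the text.

```python
def migration_rates(in_migrants, out_migrants, population):
    """Return in-, out- and net migration rates per 1,000 population."""
    in_rate = 1000.0 * in_migrants / population
    out_rate = 1000.0 * out_migrants / population
    return in_rate, out_rate, in_rate - out_rate


def effectiveness_ratio(flow_ab, flow_ba):
    """Net gain of area b from area a as a percentage of the gross flows between them."""
    return 100.0 * (flow_ab - flow_ba) / (flow_ab + flow_ba)


# Hypothetical counts giving rates of 106.6 (in) and 94.4 (out) per 1,000.
in_rate, out_rate, net_rate = migration_rates(10660, 9440, 100000)

# Hypothetical gross flows: 900 migrants from a to b, 800 from b to a.
ratio = effectiveness_ratio(900, 800)
```

A ratio near zero means the two gross flows largely cancel out, while a large absolute value means that most of the movement between the two area types is one-directional.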
CASE STUDY 2

Comparing 1981, 1991 and 2001 Commuting Flows for the 2001 Ward Geography

In this section, we use a simple example to demonstrate the potential for analysing commuting
Table 7. Net migration rates and effectiveness ratios for flows between supergroups of wards, England, 1981-2001

1980-1981

| Origin \ Destination | IH | TM | BA | PM | SC | MM | SST | CC | AC |
|---|---|---|---|---|---|---|---|---|---|
| IH | | 8.2 | 1.2 | -0.8 | 3.2 | 15.0 | -8.2 | -6.6 | -12.5 |
| TM | -15,018 | | 5.4 | -2.7 | 3.5 | 9.2 | -12.4 | -10.6 | -14.4 |
| BA | -330 | -1,538 | | 3.6 | -2.9 | 7.8 | 2.8 | 0.4 | 7.5 |
| PM | 98 | 177 | -190 | | 6.2 | -8.5 | -12.1 | -2.4 | -4.6 |
| SC | -2,263 | -1,627 | 392 | -1,134 | | 0.3 | -6.9 | -6.2 | -1.2 |
| MM | -7,162 | -4,171 | -281 | 6,794 | -86 | | -24.4 | -18.4 | -20.3 |
| SST | 18,668 | 10,079 | -766 | 8,696 | 8,150 | 26,203 | | -4.5 | -6.4 |
| CC | 10,869 | 7,863 | -133 | 319 | 2,371 | 3,995 | 9,086 | | 6.1 |
| AC | 4,198 | 1,361 | -395 | 411 | 230 | 1,497 | 8,859 | -3,242 | |

1990-1991

| Origin \ Destination | IH | TM | BA | PM | SC | MM | SST | CC | AC |
|---|---|---|---|---|---|---|---|---|---|
| IH | | 3.3 | -3.6 | 0.4 | 3.4 | 12.1 | -4.7 | -6.1 | -8.6 |
| TM | -5,676 | | 0.4 | -5.8 | 2.9 | 8.0 | -7.4 | -6.6 | -3.8 |
| BA | 1,075 | -122 | | 3.8 | 7.0 | 16.2 | 7.9 | 2.5 | 7.4 |
| PM | -48 | 369 | -216 | | 7.3 | 1.6 | -6.7 | -8.6 | -7.9 |
| SC | -2,511 | -1,359 | -1,046 | -1,738 | | 1.0 | -6.0 | -6.2 | -5.6 |
| MM | -5,351 | -3,321 | -710 | -1,542 | -319 | | -20.6 | -21.6 | -19.1 |
| SST | 11,716 | 6,369 | -2,563 | 5,136 | 8,438 | 23,217 | | -6.4 | -5.5 |
| CC | 10,289 | 5,013 | -1,099 | 1,153 | 2,681 | 4,690 | 14,064 | | 6.0 |
| AC | 2,825 | 358 | -448 | 828 | 1,268 | 1,558 | 8,210 | -3,214 | |

2000-2001

| Origin \ Destination | IH | TM | BA | PM | SC | MM | SST | CC | AC |
|---|---|---|---|---|---|---|---|---|---|
| IH | | 6.1 | -3.7 | -3.2 | -1.4 | 20.0 | -3.8 | -5.9 | -6.6 |
| TM | -11,815 | | -1.6 | -8.5 | -1.1 | 17.7 | -6.0 | -6.6 | 3.8 |
| BA | 1,265 | 483 | | 9.0 | -0.2 | 14.9 | 12.4 | 4.4 | 20.9 |
| PM | 450 | 606 | -696 | | 6.8 | -5.4 | -8.0 | -2.9 | 1.8 |
| SC | 1,434 | 683 | 54 | -2,920 | | -2.7 | 5.0 | 14.0 | 22.0 |
| MM | -9,780 | -8,204 | -866 | 6,276 | 1,241 | | -21.9 | -17.8 | 2.2 |
| SST | 10,936 | 6,320 | -5,009 | 8,035 | -10,873 | 30,805 | | -6.7 | -4.3 |
| CC | 11,850 | 5,967 | -2,332 | 510 | -9,470 | 4,728 | 18,514 | | 9.6 |
| AC | 2,280 | -436 | -1,584 | -257 | -8,542 | -225 | 7,329 | -6,150 | |

Net flows are shown in the bottom left of each matrix and effectiveness ratios (%) in the top right.

IH: Industrial Hinterlands; TM: Traditional Manufacturing; BA: Built-up Areas; PM: Prospering Metropolitan; SC: Student Communities; MM: Multicultural Metropolitan; SST: Suburbs and Small Towns; CC: Coastal and Countryside; AC: Accessible Countryside. Sources: 1981, 1991 and 2001 Censuses
Figure 4. Percentage changes in number of commuters who travel to work in Liverpool between 1981 and 1991
patterns focusing on Liverpool. We chose wards in 1981 where 40 or more people travelled into Liverpool to work, and examined the changes in the number of commuters in 1991 and 2001. The map in Figure 4 shows the percentage change in the number of commuters who travelled to work in the city of Liverpool between 1981 and 1991. Figure 5 shows the same percentage change between 1991 and 2001. Between 1981 and 1991, 38% of the 210 wards changed little in the number of commuters, with the percentage change in the range of -25 to 25. About 14% of wards experienced commuter growth of over 50% while the same share of wards experienced a decline of over 50%. Between 1991 and 2001, 60% of wards showed no considerable increase or decline in the number of commuters. The percentage of wards where there was an
Figure 5. Percentage changes in number of commuters who travel to work in Liverpool between 1991 and 2001
Figure 6. Comparison of changes in number of commuters who travel to work in Liverpool in 1981-1991 and 1991-2001
increase of over 50% in commuters went up to 17% while only 4% of wards suffered a decline of over 50%. The map in Figure 6 combines the information from the two maps, comparing changes in the number of commuters over the two decades. If a ward experienced an increase of over 25% in the number of commuters it is labelled as ‘increase’, while if a ward experienced a decrease of over 25% in the number of commuters it is labelled as ‘decrease’. Overall, 76% of wards did not show considerable changes over the two decades, but some areas experienced dramatic changes. Three per cent of wards experienced a continuous increase, and an equivalent percentage of wards displayed a continuous decline. For example, Hindley Green experienced a continuous decrease while Lymm experienced a continuous increase. In total about 18% of wards experienced different patterns in the two decades. In Cuddington and Oakmere, Kelsall, Elton, Aspull-Standish and Swinley, where the number of commuters to Liverpool decreased by more than 50% between 1981 and 1991, the trend reverted to an increase of more than 50% between 1991 and 2001. In stark contrast, some areas, such as Parbold, experienced the opposite process, with the number of commuters increasing between 1981 and 1991 but decreasing significantly between 1991 and 2001.
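The labelling rule used for Figure 6 can be sketched as below. The 25% threshold and the ‘increase’ and ‘decrease’ labels come from the text; the ‘stable’ label, the function names and the commuter counts are hypothetical, as is the handling of a zero starting count.

```python
def percent_change(start, end):
    """Percentage change from start to end; a zero start is treated as a special case."""
    if start == 0:
        return float("inf") if end > 0 else 0.0
    return 100.0 * (end - start) / start


def label(change, threshold=25.0):
    """Label a percentage change using the threshold described for Figure 6."""
    if change > threshold:
        return "increase"
    if change < -threshold:
        return "decrease"
    return "stable"  # hypothetical label for changes within the threshold


def classify_ward(commuters_1981, commuters_1991, commuters_2001):
    """Return the labels for 1981-1991 and 1991-2001 as a pair."""
    return (label(percent_change(commuters_1981, commuters_1991)),
            label(percent_change(commuters_1991, commuters_2001)))


# Hypothetical wards: one that falls and then recovers, one that barely changes.
reversal = classify_ward(100, 40, 70)
steady = classify_ward(50, 55, 60)
```

A ward labelled the same way in both decades corresponds to the continuous increase or decrease described above, while a pair of differing labels marks a ward whose pattern changed between the two decades.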
CONCLUSION

Local interaction data are an important source of information for understanding population mobility. However, changes in ward boundaries between different censuses have hampered the descriptive and statistical examination of significant changes in local interaction patterns over time. This research has developed an innovative methodology and has re-estimated large matrices of interaction data from the 1981 Census for 1991 and 2001 ward geographies and from the 1991 Census onto the 2001 ward geography so that flows from all three censuses can now be analysed for one consistent geography. Basically, for the example of 1981 flow data, the estimation strategy involved modelling the 1981 ward-level flows and using the resulting gravity model parameters to estimate flows between EDs, which are smaller areas that aggregate neatly into wards at each census. The estimated ED flows were then re-aggregated for the target 1991 and 2001 wards. Our project re-estimated
two intra- and inter-ward migration matrices and 170 intra- and inter-ward commuting matrices from the 1981 Census, and 12 intra- and inter-ward migration matrices and 254 intra- and inter-ward commuting matrices from the 1991 Census, all onto the 2001 ward geography. The data have since been loaded into the Web-based Interface To Census Interaction Data (WICID) and are available for academic use (Stillwell and Duke-Williams, 2001). The technique described earlier was slightly modified for the commuting flows, but the underlying principles were the same. The method could be applied to other cases where interaction data for one set of zones need to be re-estimated for another overlapping geography and more detail is provided in Boyle and Feng (2002). We presented two case studies applying the re-estimated flow data from 1981, 1991 and 2001 in studies of migration change in England and commuting change in the Liverpool area. The study of migration patterns in England was based on the ONS ward classification. This research confirmed that population deconcentration continued in 2000-2001 with metropolitan areas losing migrants and rural and suburban areas gaining migrants. The commuting patterns in the Liverpool area demonstrated some dramatic changes. Some local areas shifted from an increase in commuting to Liverpool to a decrease over the two decades, while some local areas experienced the opposite pattern. These two examples demonstrated that the spatially consistent flow data can be used to further our understanding of the complex interaction between local areas over time.
ACKNOWLEDGMENT
The authors acknowledge financial support for the Centre for Interaction Data (CIDs) from the Economic and Social Research Council (ESRC) and the Joint Information Systems Committee
(JISC) under the 2001-2006 Census Programme (Project H507255177) and from the ESRC under the 2006-2011 Census Programme (Project RES348-25-0005). The British census data are Crown Copyright and were bought by ESRC and JISC for use in the academic community. The ESRC Data Archive and the Manchester Computing Centre made the British census data available to us. We would like to thank Keith Cole (Manchester Computing Centre) for help in accessing the 1981 and 1991 data for England and Wales; Frank Thomas and Susan Wallace (General Register Office for Scotland) for help in accessing the 1981 and 1991 data for Scotland; and Robin Flowerdew and Phil Rees for useful discussions about the methodology.
REFERENCES
Alvanides, S., Boyle, P. J., Duke-Williams, O., Openshaw, S., & Turton, I. (1996). Modelling migration in England and Wales at the ward level and the problem of estimating inter-ward distances. Geocomputation Conference, Leeds University, Leeds, September 17-19.
Atkins, D., Charlton, M., Dorling, D., & Wymer, C. (1993). Connecting the 1981 and 1991 Censuses. NERRL Research Report 93/9, University of Newcastle: Newcastle-upon-Tyne.
Boyle, P., & Feng, Z. (2002). A method for integrating the 1981 and 1991 GB Census interaction data. Computers, Environment and Urban Systems, 26, 241–256. doi:10.1016/S0198-9715(01)00043-6
Boyle, P., & Flowerdew, R. (1997). Improving distance estimates between areal units in migration models. Geographical Analysis, 29(2), 93–107.
Boyle, P. J. (1991). A theoretical and empirical examination of local-level migration: the case of Hereford and Worcester. Unpublished PhD Thesis, Lancaster University, Lancaster.
Boyle, P. J. (1993). Modelling the relationship between migration and tenure. Transactions of the Institute of British Geographers, 18, 359–376. doi:10.2307/622465
Boyle, P. J. (1994). Metropolitan out-migration in England and Wales 1980-81. Urban Studies (Edinburgh, Scotland), 31, 1707–1722. doi:10.1080/00420989420081591
Boyle, P. J. (1995). Rural in-migration in England and Wales, 1980-81. Journal of Rural Studies, 11, 65–78. doi:10.1016/0743-0167(94)00058-H
Champion, A. G. (1994). Population change and migration in Britain since 1981: evidence for continuing deconcentration. Environment & Planning A, 10, 1501–1520. doi:10.1068/a261501
Champion, A. G. (1995). Analysis of change through time. In Openshaw, S. (Ed.), Census Users’ Handbook (pp. 7–35). Cambridge, UK: GeoInformation International.
Champion, A. G. (1996). Population review: (3) migration into, from and within the United Kingdom. Population Trends, 83, 5–19.
Champion, A. G., Fotheringham, A. S., Rees, P., Boyle, P. J., & Stillwell, J. C. H. (1998). The Determinants of Migration Flows in England: A Review of Existing Data and Evidence. University of Newcastle upon Tyne, Newcastle upon Tyne.
Cole, K., Frost, M., & Thomas, F. (2002). Workplace data from the census. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 269–280). Chichester, UK: Wiley.
Dorling, D. (1995). A New Social Atlas of Britain. Chichester, UK: John Wiley & Sons.
Flowerdew, R. (1991). Poisson regression modelling of migration. In Stillwell, J., & Congdon, P. (Eds.), Migration Models: Macro and Micro Approaches (pp. 92–112). London: Belhaven Press.
Flowerdew, R., & Green, A. (1993). Migration, transport and workplace statistics from the 1991 census. In Dale, A., & Marsh, C. (Eds.), The 1991 Census User’s Guide (pp. 269–294). London: HMSO.
Flowerdew, R., & Green, M. (1992). Developments in areal interpolation methods and GIS. The Annals of Regional Science, 26, 67–78. doi:10.1007/BF01581481
Frost, M., Linneker, B., & Spence, N. (1996). The spatial externalities of car-based worktravel emissions in Greater London, 1981 and 1991. Transport Policy, 3, 187–200. doi:10.1016/S0967-070X(96)00027-3
Frost, M., Linneker, B., & Spence, N. (1997). The energy consumption implications of changing worktravel in London, Birmingham and Manchester: 1981 and 1991. Transportation Research Part A, Policy and Practice, 31, 1–19. doi:10.1016/S0965-8564(96)00011-0
Frost, M., Linneker, B., & Spence, N. (1998). Excess or wasteful commuting in a selection of British cities. Transportation Research Part A, Policy and Practice, 32, 529–538. doi:10.1016/S0965-8564(98)00016-0
Goodchild, M. F., Anselin, L., & Deichmann, U. (1993). A framework for the areal interpolation of socioeconomic data. Environment & Planning A, 25, 383–397. doi:10.1068/a250383
Gregory, I. N., & Ell, P. S. (2005). Breaking the boundaries: geographical approaches to integrating 200 years of the census. Journal of Royal Statistical Society A, 168, 419–437. doi:10.1111/j.1467-985X.2005.00356.x
Lovett, A., & Flowerdew, R. (1989). Analysis of count data using Poisson regression. The Professional Geographer, 41, 190–198. doi:10.1111/j.0033-0124.1989.00190.x
Office for National Statistics. (2004). Area Classification for Statistical wards – methods. Retrieved from http://www.statistics.gov.uk/about/methodology_by_theme/area_classification/wards/methodology.asp
Raymer, J., Abel, G., & Smith, P. W. F. (2007). Combining census and registration data to estimate detailed elderly migration flows in England and Wales. Journal of Royal Statistical Society A, 170, 891–908. doi:10.1111/j.1467-985X.2007.00490.x
Rees, P., Durham, H., & Kupiszewski, M. (1996). Internal migration and regional population dynamics in Europe: United Kingdom case study. Working Paper 96/20, School of Geography, University of Leeds, Leeds.
Rees, P. H. (1995). Putting the census on the researcher’s desk. In Openshaw, S. (Ed.), Census Users’ Handbook (pp. 27–81). Cambridge, UK: GeoInformation International.
Rees, P. H., & Duke-Williams, O. (1997). Methods for estimating missing data on migrants in the 1991 British Census. International Journal of Population Geography, 3, 323–368. doi:10.1002/(SICI)1099-1220(199712)3:43.0.CO;2-Z
Stillwell, J., & Duke-Williams, O. (2001). Web-Based Interface to Census Interaction Data (WICID), Final report and demonstration. ESRC/JISC 2001 Census Development Programme. Leeds: Fourth Workshop.
Stillwell, J., Duke-Williams, O., Feng, Z., & Boyle, P. (2005). Delivering census interaction data to the user: data provision and software developments. Working Paper 05/01, School of Geography, University of Leeds, Leeds.
Chapter 14
Modelling Migration with Poisson Regression
Robin Flowerdew
University of St. Andrews, UK
ABSTRACT
Most statistical analysis is based on the assumption that error is normally distributed, but many data sets are based on discrete data (the number of migrants from one place to another must be a whole number). Recent developments in statistics have often involved generalising methods so that they can be properly applied to non-normal data. For example, Nelder and Wedderburn (1972) developed the theory of generalised linear modelling, where the dependent or response variable can take a variety of different probability distributions linked in one of several possible ways to a linear predictor, based on a combination of independent or explanatory variables. Several common statistical techniques are special cases of the generalised linear model, including the usual form of regression analysis, Ordinary Least Squares regression, and binomial logit modelling. Another important special case is Poisson regression, which has a Poisson-distributed dependent variable, linked logarithmically to a linear combination of independent variables. Poisson regression may be an appropriate method when the dependent variable is constrained to be a non-negative integer, usually a count of the number of events in certain categories. It assumes that each event is independent of the others, though the probability of an event may be linked to available explanatory variables. This chapter illustrates how Poisson regression can be carried out using the Stata package, and then discusses various problems and issues which may arise in the use of the method. The number of migrants from area i to area j must be a non-negative integer and is likely to vary according to zone population, distance and economic variables. The availability of high-quality migration data through the WICID facility permits detailed analysis at levels from the region to the output area. A vast range of possible explanatory variables can also be derived from the 2001 Census data.
Model results are discussed in terms of the significant explanatory variables, the overall goodness of fit and the largest residuals. Comparisons are drawn with other analytic techniques such as OLS regression. The relationship to Wilson’s entropy maximising methods is described, and variants on the method are explained. These include negative binomial regression and zero-censored and zero-truncated models. DOI: 10.4018/978-1-61520-755-8.ch014
Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
INTRODUCTION
Poisson regression analysis is a standard but relatively unpublicised statistical technique (but see Griffith and Haining, 2006) that is particularly suited to the analysis of migration flow data. The Poisson distribution was first identified in 1837 by the French mathematician Siméon-Denis Poisson (1781-1840). It applies to count data where the variable being analysed must take the form of non-negative integers (i.e. zero or a positive whole number). For large counts, there is little difference between Poisson regression and weighted Ordinary Least Squares (OLS) regression, but it does make a difference where some of the counts are small. OLS regression, based on the normal distribution, is the usual form of regression taught in introductory statistics classes. In addition to its theoretical appropriateness, Poisson regression has practical advantages, including the ease of constructing multiple regression models and the ability to judge model goodness-of-fit. The classic example of the application of the Poisson model is the distribution of soldiers in corps of the Prussian army who died from mule kicks. Counts of deaths were available for 10 of 14 corps for each of 20 years (Griffith and Haining, 2006). These deaths were fairly rare and independent of each other, so it was appropriate to investigate whether the data followed the Poisson distribution. Whilst this example is rather unusual, essentially any count data can be modelled as Poisson provided that they can be regarded as a total of events of any kind that occur within a time period. For example, the number of people contracting a rare disease, the number of cars passing a checkpoint or the number of convicted criminals coming from different areas are all possible data sets for Poisson regression. Senior (1987) has used it to study the number of family planning clinics in a set of Nigerian cities. Guy (1991) has used it for analysis of retailing data.
The most frequent use of Poisson models is in the analysis of contingency tables. The counts
recorded can be modelled as functions of the main effects of each cross-classifying variable and interaction effects involving these variables in any combination. The interest is in determining which interactions are significant and which are not. This is a special case of Poisson regression but it is the case which has received most attention in the statistics literature, from Nelder and Wedderburn (1972) onwards. Flowerdew and Aitkin (1982) introduced Poisson regression in the context of migration analysis, and Flowerdew (1991) provided an updated account of Poisson models of migration, including comparisons with other modelling strategies. Lovett and Flowerdew (1989) published a pedagogic account of Poisson models in geography. Poisson models are not discussed in much detail in texts presenting statistical techniques to geographers, although Bailey and Gatrell (1995) and Haining (2003) do deal with them briefly, and O’Brien (1992) in a bit more detail. Similarly, discussions in statistics or econometrics texts are relatively few, exceptions including Greene (1999), Kirkwood and Stern (2005, Chapter 24) and Petrie and Sabin (2005, Chapter 31). The fullest account of Poisson regression and its variants is that by Cameron and Trivedi (1998). This chapter is intended to present Poisson regression and some of its variants as a suitable method for analysing migration flows. The argument is illustrated through an analysis of inter-district migration in Great Britain from the 2001 Census. It shows how a Poisson regression model can be fitted to such data, and discusses issues which may arise in the process, including the use of regression models based on other count distributions such as the negative binomial.
THE INTER-DISTRICT DATA SET
The 2001 Census in Great Britain included a question on place of usual residence one year before. This allows the creation of a large matrix showing
how many people moved from each origin to each destination in Great Britain. Because it was desired to show that Poisson regression can be used with large data sets, and because it is quite important in the modelling process that analysts have some knowledge of the places with unusual sets of flows, the data set was created at the inter-district level. Note that the units used are London boroughs in the former Greater London, metropolitan boroughs in the former metropolitan counties, Council Areas in Scotland, Unitary Authorities in Wales and those parts of England where they exist, and local authority districts in the remainder of England. In this study, the term ‘district’ is used to include entities of all these types. Between them, they cover the whole country. There are 408 such units in Great Britain. They are quite diverse in population, ranging from Birmingham (977,085) to Isles of Scilly (2,136), with a mean of 139,960 people. The shortest inter-district distance is 2.1 km, the mean distance is 244.5 km and the maximum is 1205.5 km. The analysis does not include immigration from, or emigration to, Northern Ireland or other countries. Internal mobility within districts is excluded, as are moves where the origin is incompletely specified. This results in a data set with 408 x 407 = 166,056 possible flows, many of them zero. The largest flow is 4,151 (Bristol to South Gloucestershire). The mean flow size is 15.1, the median is 3 and the total number of inter-district migrants is 2,507,446. The largest flows are shown in Table 1, and Figure 1 shows the locations of the London boroughs, and districts around Aberdeen, Bristol and Hull. Four of the top ten can be regarded as suburbanisation, from fairly large central cities to neighbouring less developed districts; three others are counterstream moves, showing that not all migration goes in the same direction; the remaining three are moves connecting adjacent London boroughs. 
There are 3,407 counts over 100 and 198 counts over 1,000. The Office for National Statistics (ONS), in its desire to safeguard people’s anonymity, decided to
make minor changes to the data. Because it might be possible for somebody to identify a person who has moved from place i to place j, small flows to places in England and Wales have been changed. The method of doing this has not been divulged, but it is likely that flows of 1 were changed to either 0 or 3 according to some chance mechanism. Similarly, counts of 2 were changed to either 0 or 3, but flows of 0 or 3 were left untouched. Thus the number of 0 flows in the data set released to the public (60,843) consists of many genuine zero flows, plus an unknown number of 1s recoded to 0, plus an unknown (but probably smaller) number of 2s recoded to 0. Duke-Williams and Stillwell (2007) have studied the impact of this ‘small cell adjustment’ process. The adjustment may be particularly important for Poisson models, which are usually regarded as less susceptible to small number problems, and it casts some doubt on any analysis using the data, although the Poisson regression algorithm automatically downweights the smaller flows. Note that the General Register Office (Scotland) did not use this adjustment procedure, so flows of 1 and 2 remain in the data set for people enumerated in Scotland. Figure 1 shows the resulting frequency distribution of flows, which has a rather strange shape: a negative exponential relationship present only for multiples of 3.
Figure 1. Frequency distribution of migration flows. Source: 2001 Census SMS
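The effect of such an adjustment on small flows can be illustrated with a short simulation. The sketch below is an assumption and not the ONS procedure, which has not been published: it supposes unbiased random rounding to base 3, so that a flow of 1 becomes 3 with probability 1/3 and a flow of 2 becomes 3 with probability 2/3. The function name adjust_small_cell and the example flows are hypothetical.

```python
import random
from collections import Counter


def adjust_small_cell(count, rng):
    """Return a possibly adjusted count.

    Assumed rule (not the undisclosed ONS method): counts of 1 or 2 are
    rounded to 0 or 3 at random, with probability count/3 of becoming 3,
    so the expected value is unchanged. All other counts are left alone.
    """
    if count in (1, 2):
        return 3 if rng.random() < count / 3 else 0
    return count


rng = random.Random(2001)  # fixed seed so the run is repeatable
true_flows = [0, 1, 1, 2, 2, 3, 4, 7, 1, 2, 0, 5]  # hypothetical counts
released = [adjust_small_cell(c, rng) for c in true_flows]

# After adjustment no released flow equals 1 or 2.
print(Counter(released))
```

Under this assumed rule the released counts contain no 1s or 2s, which is the pattern described for flows to places in England and Wales.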
Table 1. Largest inter-district migration flows, 2000-01

Rank  Origin                 Destination            Migrants
1     Bristol                South Gloucestershire  4,151
2     Hull                   East Riding            3,736
3     Wandsworth             Lambeth                3,659
4     East Riding            Hull                   3,651
5     Birmingham             Solihull               3,609
6     Haringey               Enfield                3,261
7     Aberdeen City          Aberdeenshire          3,207
8     South Gloucestershire  Bristol                3,109
9     Wandsworth             Merton                 3,051
10    Aberdeenshire          Aberdeen City          2,997

Source: 2001 Census SMS
HOW DOES POISSON REGRESSION WORK?
The Poisson distribution is derived under the assumption that events occur independently of one another at a constant mean rate λ per time period. Under these conditions, the probability of k events occurring in one time period is given by:

$$p(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!} \tag{1}$$

As with all generalised linear models, the probability distribution of the Y variable must be specified, as must a link function connecting its mean to the linear predictor. The linear predictor is a linear function of the X variables, chosen to fit the observed data as well as it can. In Poisson regression, the probability distribution is Poisson, of course, and the link function is logarithmic. This means that the estimated value of λ is not given by the linear predictor itself, but by exp (linear predictor):

$$\hat{\lambda}_i = \exp(b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} + \dots) \tag{2}$$

Equivalently, the estimated value of lnY is the value of the linear predictor:

$$\ln \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} + \dots \tag{3}$$
The model is fitted iteratively. First, an OLS model is fitted. Then values for each case are calculated using the fitted b coefficients. The next step is to re-run the model for the newly calculated values, derive new b coefficients, and repeat until convergence, which usually occurs after six or seven iterations. All this is done by the computer in response to the poisson command in Stata (Poisson regression is not available in SPSS). The output includes the estimated b coefficients, with their standard errors. The log likelihood is provided as a measure of goodness of fit. The deviance, also used to measure goodness of fit, is not printed out, but can easily be calculated as minus twice the log likelihood. The pseudo R² statistic may also be useful as an indicator of model fit. Its denominator is the deviance of a ‘null model’, i.e. one with no independent variables, so that all cases have identical estimated values equal to the overall mean. The numerator is the deviance accounted for by the model. The pseudo R² statistic ranges from 0 to 1 and is broadly similar in function to R² in OLS regression, although the two measures are not strictly comparable because one is based on variance and the other on deviance. It is also possible to save sets of fitted values for
each origin-destination pair, and from these it is possible to calculate residuals, which may be useful in generating ideas for further variables to try adding in to the linear predictor.
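The iterative fitting described above is commonly implemented as iteratively reweighted least squares (IRLS). The pure-Python sketch below illustrates that approach under stated assumptions; it is not a description of Stata's internal algorithm. The function names and the six-case data set are hypothetical, and the starting values (the log of each count plus 0.5) are one common choice rather than the only one.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_poisson_irls(rows, counts, max_iter=50, tol=1e-10):
    """Fit a log-link Poisson regression.

    rows: list of predictor lists, each starting with 1.0 for the intercept.
    Returns the coefficient list and the number of iterations used.
    """
    p = len(rows[0])
    # Starting values: log of the counts, shifted so zeros are allowed.
    eta = [math.log(y + 0.5) for y in counts]
    beta = [0.0] * p
    for iteration in range(1, max_iter + 1):
        mu = [math.exp(e) for e in eta]
        # Working response: the log link linearised around the current fit.
        z = [e + (y - w) / w for e, y, w in zip(eta, counts, mu)]
        # Weighted normal equations (X'WX) b = X'Wz with weights W = mu.
        xtwx = [[sum(w * x[j] * x[k] for x, w in zip(rows, mu))
                 for k in range(p)] for j in range(p)]
        xtwz = [sum(w * x[j] * zi for x, w, zi in zip(rows, mu, z))
                for j in range(p)]
        new_beta = solve_linear_system(xtwx, xtwz)
        change = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        eta = [sum(bj * xj for bj, xj in zip(beta, x)) for x in rows]
        if change < tol:
            break
    return beta, iteration


# Hypothetical data: one predictor plus an intercept.
rows = [[1.0, float(x)] for x in range(6)]
counts = [1, 3, 4, 9, 14, 30]
beta, n_iter = fit_poisson_irls(rows, counts)
print(beta, n_iter)
```

Because the model includes an intercept, the fitted values at convergence sum to the observed total, which gives a simple check that the iteration has settled.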
USING MULTIPLE REGRESSION TO FIT THE GRAVITY MODEL
Data on migration from one place to another over the year before census date are count data, and have been used in many studies from the ‘social physics’ school (e.g. Stewart, 1948) onwards. The traditional gravity model (so-called by analogy with Newton’s Law of Gravity) took the form:

$$M_{ij} = k \frac{P_i P_j}{d_{ij}^{2}} \tag{4}$$

where $M_{ij}$ is the number of migrants recorded between place i and place j, $P_i$ is the population of i, $P_j$ is the population of j, and $d_{ij}$ is the distance between i and j. This equation has the useful property that, if both sides are logged, the equation takes the form of a multiple regression equation. Letting the flow $M_{ij}$ be the dependent variable $Y_{ij}$, $\ln k$ be the intercept term $b_0$, and allowing the coefficients of $\ln P_i$, $\ln P_j$ and $\ln d_{ij}$ to vary from the original 1, 1 and -2 so that new values can be estimated from the data, we get

$$\ln Y_{ij} = b_0 + b_1 \ln P_i + b_2 \ln P_j + b_3 \ln d_{ij} \tag{5}$$

where $b_1$ and $b_2$ would be expected to be around 1 and $b_3$ around -2. This model can then be fitted using multiple regression analysis. This can be done using OLS regression or, more satisfactorily, Poisson regression analysis. Both models have been fitted and their results are compared below.
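The equivalence between equation (4) and the logged form in equation (5) can be checked numerically. The sketch below uses hypothetical values for k, the two populations and the distance; the function names are illustrative only.

```python
import math


def gravity_flow(k, pop_i, pop_j, dist_ij):
    """Equation (4): flow proportional to Pi * Pj / dij^2."""
    return k * pop_i * pop_j / dist_ij ** 2


def log_linear_flow(b0, b1, b2, b3, pop_i, pop_j, dist_ij):
    """Equation (5), exponentiated to return a flow rather than its log."""
    log_flow = (b0 + b1 * math.log(pop_i) + b2 * math.log(pop_j)
                + b3 * math.log(dist_ij))
    return math.exp(log_flow)


# Hypothetical values, chosen only for illustration.
k = 1e-6
pop_i, pop_j, dist_ij = 200_000, 150_000, 50.0

direct = gravity_flow(k, pop_i, pop_j, dist_ij)
# Setting b0 = ln k and the slopes to 1, 1 and -2 recovers equation (4).
via_logs = log_linear_flow(math.log(k), 1.0, 1.0, -2.0, pop_i, pop_j, dist_ij)
print(direct, via_logs)  # the two values agree up to rounding error
```

Letting the three slopes depart from 1, 1 and -2 is what turns the fixed gravity formula into a model whose parameters can be estimated from data.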
Possible Explanatory Variables There are many possible variables which could be useful in modelling the migration flow data. We
have already discussed gravity models, and it is clearly worth including origin population, destination population and distance. All three variables are expected to have multiplicative relationships, so it is appropriate to include them in logarithmic form. Distance is defined as a straight line, measured in kilometres, between the population-weighted centroids of the two districts concerned. This distance measure may not capture the reality of the situation. Most of the migration between i and j may actually be short moves across a common border. Accordingly, migration may be underestimated unless a contiguity variable is included as well as distance. The simplest way to do this is to code each pair of districts which border on each other as 1 and the rest as 0. This dummy variable is then added to the model. There are many other variables which may have an influence on migration, including employment-related criteria, housing-related criteria, demographic criteria and economic linkages between places with related industrial structures. In most cases, it is possible to think of variables having effects at both the origin (push factors) and the destination (pull factors). Employment-related factors might include the employment rate, the unemployment rate, the youth unemployment rate (because young adults have the highest migration rates) and the vacancy rate. Housing criteria include tenure (privately rented properties may aid high mobility, while local authority housing may deter longer-distance moves because of local access rules), new housing construction and house prices. Demographic factors might include age structure (some places may be particularly attractive to young families while others are attractive to older people). There may be people who move to be closer to (or to get away from) others of similar ethnicity, religion or culture. Most of these variables are obtained from the 2001 Census, except for employment information which comes from NOMIS.
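The construction of the basic explanatory variables can be sketched as follows. The three districts, their populations, their centroid coordinates (assumed to be in kilometres on a projected grid) and the border list are all hypothetical. The column names follow the variable labels used in the results tables, on the assumption that logopop and logrpop denote the logged origin and destination populations.

```python
import math
from itertools import permutations

# Hypothetical districts: population and population-weighted centroid
# coordinates in kilometres on a projected grid (assumed, not census data).
districts = {
    "A": {"pop": 250_000, "x": 0.0, "y": 0.0},
    "B": {"pop": 90_000, "x": 3.0, "y": 4.0},
    "C": {"pop": 40_000, "x": 60.0, "y": 80.0},
}
# Unordered pairs of districts that share a boundary (assumed).
borders = {frozenset(("A", "B"))}


def build_pair_rows(districts, borders):
    """Return one record per ordered origin-destination pair (i != j)."""
    rows = []
    for origin, dest in permutations(districts, 2):
        o, d = districts[origin], districts[dest]
        # Straight-line distance between the two centroids.
        dist_km = math.hypot(o["x"] - d["x"], o["y"] - d["y"])
        rows.append({
            "origin": origin,
            "destination": dest,
            "logopop": math.log(o["pop"]),
            "logrpop": math.log(d["pop"]),
            "logdist": math.log(dist_km),
            # Contiguity dummy: 1 if the districts border each other.
            "contig": 1 if frozenset((origin, dest)) in borders else 0,
        })
    return rows


pair_rows = build_pair_rows(districts, borders)
print(len(pair_rows))  # n * (n - 1) ordered pairs
```

With n districts this yields n(n - 1) records, the same arithmetic that gives 408 x 407 = 166,056 possible flows in the inter-district data set.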
Table 2. Results from the OLS regressions

Variable  Coefficient (b)  t        p>|t|
Constant  -9.650           -101.52  .000
Logopop   0.697**          128.78   .000
Logrpop   0.667**          123.29   .000
Logdist   -0.972**         -237.05  .000

R² = 0.368
Fitting the OLS Model
Table 2 shows the results of fitting the gravity model using standard OLS techniques. It shows the variables in the model and their coefficients, each coefficient being the amount by which one would expect the dependent variable to increase if the independent variable increases by 1. The output also shows the results of a set of t tests. A large t value indicates that we can be confident that the coefficient could not have been generated just by sampling variation. Indeed, the last column (headed p>|t|) is the probability of random chance generating data with an absolute t value higher than the calculated one; in this case, the constant and all three independent variables have less than a 1 in 1000 chance of having come from random variation. If the relationship is negative (e.g. migration declines as distance increases) the t value, like the b coefficient, is negative, so the absolute value of t must be used in calculating p, which can also be regarded as the level of statistical significance. The usual criterion for evaluating an OLS model is the coefficient of determination (R²), which varies from 0 to 1 and is calculated as the regression sum of squares divided by the total sum of squares. In other words, the larger the proportion of the variance in the dependent variable (lnMij) that can be accounted for by the independent variables, the higher the value of R² and the better the model’s goodness of fit. The R² value for the OLS gravity model is 0.368, a value indicating that the size and distance variables do
have some usefulness in predicting migration, but there is a large amount of variation not accounted for. How much of this is because there are important variables not in the model, and how much because the model is mis-specified, remains to be seen. Another worrying result is the low values of both the population coefficients. In the original versions of the gravity model, both were expected to have a coefficient of 1. This seems sensible on the assumptions that all residents of place i have equal probabilities of moving, and that destination population is closely correlated with opportunities for prospective migrants. The low values recorded in the analysis (0.697 and 0.667) could perhaps be understood in terms of higher proportions of migrants from large cities finding opportunities without crossing district boundaries, and of larger places being relatively less attractive than smaller places.
Fitting the Poisson Model
Poisson regression can be carried out in Stata using the poisson command. Output includes the regression coefficients and their standard errors, and a number of measures of goodness of fit; see the first column of numbers in Table 3. Unfortunately these do not include the deviance, the standard measure used in other packages like GLIM. However the log likelihood is printed out by Stata. The deviance is defined as:

$$D = \sum_i y_i \ln(y_i / \theta_i) \tag{6}$$
Table 3. Parameters from the Poisson and related models

Variable   Gravity    Contiguity  Contiguity      Negative        Zero Inflated
           model      model       model + offset  Binomial model  Poisson model
constant   -6.701**   -9.057**    -12.1467**      -11.08**        -7.09
logopop    0.695**    0.718**     1.000           0.765**         0.619
logrpop    0.625**    0.650**     0.0619**        0.745**         0.556
logdist    -1.334**   -0.973**    -0.955**        -0.917**        -0.853
contig                1.529**     1.536**         1.994**         1.572
alpha                                             1.872*
pseudo R²  0.669      0.731       0.702           0.741           0.721

** significant at the .001 level; * significant at the .05 level
where θi is the estimate of Yi derived from the model. The deviance, calculated as minus twice the log likelihood, is equal to 3,289,405, and can be compared with the chi-squared statistic with the same degrees of freedom (number of cases minus number of parameters fitted). If the model fits, it should have a deviance less than the 95% point of the chi-squared statistic. As a rule of thumb, deviance should be a little larger than the number of degrees of freedom, but of the same order of magnitude. In this example, the number of cases (166,056) is about one twentieth of the deviance, so the model is nowhere near fitting the data. As with all regression results, a good fit does not imply any causal conclusions, merely that the model could have generated the observed data; in this case the poor fit means that the model could not have generated the data. Results of all the models discussed in the remainder of this chapter are reported in Table 3 and reference is made to particular model results from the corresponding sections of the chapter that follow. It is useful to start the analysis with a Poisson regression with no independent variables, in effect testing the hypothesis that there is no systematic variation in the data. The deviance from this model (the ‘null model’) can act as a benchmark for subsequent analysis. In this case, the null model deviance was 9,920,154; the gravity model deviance was 3,289,405, so the proportion of deviance
accounted for was .669. This figure is referred to as G² and represents a simple way of assessing the model’s success. Another measure of this type is pseudo R² or the likelihood ratio index, which is based on the log-likelihood functions of the null and fitted models. Although both measures range from 0 to 1, G² is not computed in the same way, and is not directly comparable. However, both statistics measure the success of a model, and it is tempting to regard the difference between the OLS R² value and the Poisson G² as an indicator of the superiority of the Poisson model. Goodness of fit measures for Poisson regression are discussed in more detail by Greene (2005, p. 742). It is also useful to look at the change in deviance as variables are added to, or subtracted from, the model. Again, it can be compared with chi-squared. If one new variable is introduced into the model it must produce a significant reduction in deviance to account for the loss of 1 degree of freedom. At the .05 level, this amounts to a reduction of 3.84 being necessary to justify the variable’s inclusion. If the variable is categorical with k categories, there will be a loss of k-1 degrees of freedom, with the critical value equal to chi-squared with k-1 degrees of freedom.
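The two calculations described here, the proportion of deviance accounted for and the test of whether an added variable is justified, can be sketched in a few lines. The two deviances are those quoted in the text; the function names are hypothetical, and the default critical value of 3.84 applies only to a single added variable with 1 degree of freedom.

```python
def proportion_of_deviance_explained(null_deviance, model_deviance):
    """Share of the null-model deviance accounted for by the fitted model."""
    return 1.0 - model_deviance / null_deviance


def justifies_inclusion(deviance_before, deviance_after, critical_value=3.84):
    """True if the drop in deviance exceeds the chi-squared critical value.

    The default 3.84 is the .05 point of chi-squared with 1 degree of
    freedom; a categorical variable with k categories needs the critical
    value for k - 1 degrees of freedom instead.
    """
    return (deviance_before - deviance_after) > critical_value


null_deviance = 9_920_154     # null model, as quoted in the text
gravity_deviance = 3_289_405  # gravity model, as quoted in the text

g2 = proportion_of_deviance_explained(null_deviance, gravity_deviance)
print(round(g2, 3))  # approximately two thirds of the null deviance
```

With deviances in the millions, almost any added variable will pass the 3.84 threshold, so the size of the reduction is usually more informative than the test itself.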
Table 4. Largest positive residuals from the Poisson gravity model

Rank  Origin                 Destination            Observed Flow  Estimated Flow  Deviance Residual
1     Aberdeen City          Aberdeenshire          3,207          158.5           9,645.1
2     Aberdeenshire          Aberdeen City          2,997          159.2           8,796.4
3     Hull                   East Riding            3,736          461.7           7,811.7
4     East Riding            Hull                   3,651          469.9           7,485.3
5     Bristol                South Gloucestershire  4,151          716.7           7,291.1
6     Glasgow City           South Lanarkshire      2,427          251.5           5,502.1
7     South Gloucestershire  Bristol                3,109          695.2           4,657.0
8     Coventry               Warwick                2,518          412.7           4,553.5
9     Blackpool              Wyre                   1,734          165.2           4,076.7
10    Edinburgh              East Lothian           1,530          111.7           4,003.9
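The deviance residuals in Table 4 appear to correspond to each flow's own term in equation (6), that is, the observed flow multiplied by the log of the ratio of observed to estimated flow. The sketch below checks this reading against two rows of the table; it is an inference from the published figures, not a documented definition, and the function name is hypothetical.

```python
import math


def deviance_contribution(observed, estimated):
    """One case's term y * ln(y / theta) from equation (6)."""
    if observed == 0:
        return 0.0  # limit of y * ln(y) as y tends to 0
    return observed * math.log(observed / estimated)


# Observed and estimated flows copied from Table 4.
aberdeen_to_shire = deviance_contribution(3207, 158.5)
hull_to_east_riding = deviance_contribution(3736, 461.7)
print(round(aberdeen_to_shire, 1), round(hull_to_east_riding, 1))
```

The results differ from the tabulated residuals only slightly, which is to be expected because the estimated flows in the table are rounded to one decimal place. A flow that the model overestimates gives a negative value under the same formula.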
Analysis of Residuals Another way of assessing the model is to inspect the residuals, measures of the cases where the model is particularly unsuccessful. These can be defined in several ways. The most basic is the arithmetic difference between observed value and model estimate; because the model is fitted on a logarithmic scale, one could use residuals derived as the difference between the logs of the observed and estimated values. In both cases, it is difficult to compare residuals between different data sets, so residuals are often standardised (calculated by subtracting the mean value and dividing by the standard deviation). Residuals can be positive (the model underestimates the amount of migration) or negative (the model overestimates migration), both of which may be of interest. In the case of Poisson regression, it seems logical to use the contribution to the deviance made by particular cases. These deviance residuals represent the cases which contribute most to the model’s poor fit, and it is suggested that they may give clues to variables which should be included in the model. So, residuals are potentially useful for suggesting new variables to include in the analysis, or to evaluate how successful existing variables are in accounting for effects already included in the model. This procedure can be applied to the
inter-district data set. An initial look at the largest flows (Table 1) showed the importance of short-distance moves. A look at the residuals from the gravity model (Table 4) shows that many of the largest residuals are still between adjacent districts. Many of the people making up these flows may actually have moved short distances which happened to cross a district boundary. Accordingly, a contiguity dummy variable was introduced, intended to account for this effect. The results of this new model are shown in Tables 3 and 5. The pseudo R² statistic rises substantially from 0.669 to 0.731. The effect of distance on the model is reduced somewhat and the contiguity variable is significant and positive. However, the origin and destination population coefficients remain low, and the biggest residuals still include some large flows between adjacent areas. A reasonable next step might be to assess whether a more sophisticated measurement of contiguity (such as length or proportion of common boundary) would improve model fit further. Another large residual that illustrates a more general trend is the flow from Richmondshire to Edinburgh (649). Richmondshire is a rural area in North Yorkshire but it includes the largest army base in Great Britain at Catterick. Troop movements seem likely to make up most of the flow. Richmondshire also features in more of the
Modelling Migration with Poisson Regression
Table 5. Largest positive residuals from the Poisson gravity model with contiguity

| Rank | Origin | Destination | Observed Flow | Estimated Flow | Deviance Residual |
|---|---|---|---|---|---|
| 1 | Aberdeen City | Aberdeenshire | 3,207 | 418.5 | 6,530.6 |
| 2 | Aberdeenshire | Aberdeen City | 2,997 | 420.5 | 5,886.0 |
| 3 | Hull | East Riding | 3,736 | 1,000.9 | 4,920.6 |
| 4 | East Riding | Hull | 3,651 | 1,018.7 | 4,660.4 |
| 5 | Bristol | South Gloucestershire | 4,151 | 1,445.7 | 4,378.2 |
| 6 | Hammersmith & F | Wandsworth | 2,336 | 410.2 | 4,063.6 |
| 7 | Edinburgh | Fife | 1,590 | 167.3 | 3,580.0 |
| 8 | Fife | Edinburgh | 1,436 | 164.5 | 3,111.8 |
| 9 | Aberdeen City | Edinburgh | 883 | 27.6 | 3,059.8 |
| 10 | Richmondshire | Edinburgh | 649 | 7.4 | 2,902.1 |
larger residuals, such as the flows from Richmondshire to Shepway (83) and Salisbury (89) and to Richmondshire from Fylde (422) and Rushmoor (80). There are other migration flows involving relatively small and distant districts which have military installations within their boundaries. Examples include Bridgnorth to Moray (235) and North Kesteven to Moray (117), places linked together by having Royal Air Force bases. It is also useful to look at the flows which make up the largest negative residuals, those cases where the model greatly overstates the observed migration. Table 6 shows that eight out of the top
ten negative residuals are flows between adjacent London boroughs (whose fitted values are raised by the contiguity effect). This is surprising, given that some of the largest positive residuals were for flows between London boroughs. A possible reason might be that migration tends to follow sectors radiating out from the centre while few migrants cross from one sector to another. Also high on the list of large negative residuals are moves between Birmingham and the other West Midlands boroughs.
Table 6. Largest negative residuals from the Poisson gravity model with contiguity

| Rank | Origin | Destination | Migrants | Estimated Flow | Deviance Residual |
|---|---|---|---|---|---|
| 1 | Sandwell | Birmingham | 1,725 | 3,841.4 | -1,381.0 |
| 2 | Hammersmith & F | Kensington & Chelsea | 1,216 | 3,549.4 | -1,302.6 |
| 3 | Southwark | Lambeth | 2,011 | 3,691.0 | -1,221.2 |
| 4 | Birmingham | Sandwell | 2,738 | 4,185.3 | -1,161.9 |
| 5 | Kensington & C | Hammersmith & F | 2,009 | 3,539.9 | -1,138.0 |
| 6 | Hackney | Islington | 1,044 | 2,816.5 | -1,036.1 |
| 7 | Lewisham | Southwark | 1,300 | 2,857.2 | -1,023.7 |
| 8 | Ealing | Brent | 1,040 | 2,728.8 | -1,003.2 |
| 9 | Islington | Camden | 1,186 | 2,746.8 | -996.1 |
| 10 | Lambeth | Southwark | 2,490 | 3,712.4 | -994.5 |
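The residual calculation discussed above can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce the tables: the function names are hypothetical, and the signed-square-root form is the usual textbook definition of a Poisson deviance residual. For the two rows used here, the term y ln(y/mu) comes out very close to the tabulated values (6,530.6 and -1,381.0), which suggests, as an inference only, that the tables report this log term rather than the signed square root.

```python
import math

def poisson_unit_deviance(y, mu):
    """Contribution of one case to the Poisson deviance: 2*[y*ln(y/mu) - (y - mu)]."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0  # y*ln(y/mu) tends to 0 as y tends to 0
    return 2.0 * (log_term - (y - mu))

def poisson_deviance_residual(y, mu):
    """Signed square root of the unit deviance (the usual textbook definition)."""
    sign = 1.0 if y >= mu else -1.0
    return sign * math.sqrt(max(poisson_unit_deviance(y, mu), 0.0))

# Two rows taken from Tables 5 and 6: (origin, destination, observed, estimated).
rows = [
    ("Aberdeen City", "Aberdeenshire", 3207, 418.5),
    ("Sandwell", "Birmingham", 1725, 3841.4),
]
for origin, dest, y, mu in rows:
    log_term = y * math.log(y / mu)
    print(f"{origin} -> {dest}: y*ln(y/mu) = {log_term:.1f}, "
          f"deviance residual = {poisson_deviance_residual(y, mu):.1f}")
```

Either measure ranks the cases in a broadly similar way; the signed square root is simply on a scale closer to that of a standardised residual.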
ISSUES ENCOUNTERED WHEN FITTING MODELS

There are many factors which affect migration and hence it is to be expected that the overall model will include a lot of X variables. Unfortunately the best estimates of migrant flows may vary between different sets of Xs. An initial strategy is to calculate the correlation coefficients between each of the available X variables and the Y variable. The X variables with higher correlations (positive or negative) are the most likely to be important in a multiple regression, Poisson or otherwise. A stepwise approach is often used to fit the model. First, Y is regressed on the X variable with which it is most highly correlated. Then the residuals from this regression are correlated with the remaining X variables and the one with the highest correlation is added to the first X variable in a new multiple regression. Then the residuals from this regression are correlated with the remaining X variables, the best one selected and added to the regression model, and so on. If one or more of the X variables is not statistically significant, it drops out of the model. Sometimes a variable which seems strongly related to Y becomes insignificant when other X variables are in the model. When all variables have been tried in the regression analysis, the equation based on the remaining X variables is accepted as the best model.

An alternative method is backward elimination. This starts with a model containing all the candidate X variables. The least significant variable is omitted and the regression is refitted with the remaining variables. Then the least significant of these is omitted and the regression is run again. The process is continued until only significant variables are left. The resulting regression, however, will not necessarily give the same results as a stepwise approach. Usually backward elimination performs better, but there is no guarantee that either method will find the overall best model.
Other approaches to model fitting may be guided to a greater or lesser extent by theory. A
model of place-to-place migration which includes variables theoretically related to migration may be preferable to one where there is no convincing link between the X variables and migration, even if the latter has a more impressive goodness of fit. The issues here are no different from those faced in OLS multiple regression.
Weighted OLS Regression

Defenders of the OLS approach may argue that the main reason for the apparent poor performance of OLS is the homoscedasticity (equal variance) assumption. This can be relaxed by weighting each observation appropriately. The weight should be the square root of the expected value for each case, but because these are not available before the analysis, we instead used the square root of the observed value. The model fitted was the gravity model with contiguity, and it achieved an R2 value of 0.721, considerably better than the unweighted OLS model but not as good as the Poisson model. This supports the idea that much of the superiority of the Poisson approach is due to the removal of the homoscedasticity assumption. These issues are discussed in more detail by Flowerdew (1982).
Offsets

In many circumstances, it may be reasonable to expect Y to be in proportion to an X variable, such as that representing population at risk. In such a case, the regression coefficient for that variable might be expected to be 1. It is possible in Poisson regression to force the coefficient to be 1 by declaring the variable as an offset. For example, the coefficient of the logarithm of origin population might be expected to be 1, assuming that the same proportions of people become migrants regardless of city size. In our modelling analysis (Table 3), we found that the population coefficients were considerably below 1, suggesting that this assumption is unjustified. The model was fitted with the logarithm of origin population as an offset and results showed a decline in pseudo R2 from 0.731 to 0.702, confirming the view that an offset is not appropriate on this occasion.
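The effect of declaring an offset can be written out explicitly. The notation below is assumed for illustration and may differ from the chapter's earlier equations: \mu_{ij} is the expected flow from i to j, P_i and P_j are origin and destination populations, d_{ij} is distance and C_{ij} is the contiguity dummy.

```latex
% Gravity model with contiguity, origin population coefficient estimated freely:
\ln \mu_{ij} = \beta_0 + \beta_1 \ln P_i + \beta_2 \ln P_j + \beta_3 \ln d_{ij} + \beta_4 C_{ij}

% Same model with \ln P_i declared as an offset (its coefficient fixed at 1):
\ln \mu_{ij} = \ln P_i + \beta_0 + \beta_2 \ln P_j + \beta_3 \ln d_{ij} + \beta_4 C_{ij}
\quad\Longleftrightarrow\quad
\ln\!\left(\frac{\mu_{ij}}{P_i}\right) = \beta_0 + \beta_2 \ln P_j + \beta_3 \ln d_{ij} + \beta_4 C_{ij}
```

Seen this way, the offset version models the out-migration rate per head of origin population rather than the count itself, which is why it is only sensible when the freely estimated coefficient is close to 1.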
Overdispersion

It is frequently found in Poisson regression that the deviance is so high that the Poisson distribution clearly does not fit. Possibly there is one (or more) additional explanatory variable(s) whose inclusion would reduce the deviance to an acceptable level. It is more likely, however, that the independence assumption made in Poisson regression is not realistic. Many migrants do not move independently. They move with friends, partners, children and other household members. This raises the variance above the mean, and goodness of fit can be improved by allowing for this extra-Poisson variation. The most common approach to this issue is to try negative binomial regression. The negative binomial distribution, like the Poisson, can only have non-negative integers as its values. Unlike the Poisson, however, the negative binomial has two parameters; one is the mean and the other (alpha) is an indication of how much overdispersion there is. The negative binomial model can be regarded as a generalised Poisson model (Flowerdew and Lovett, 1989). Thus, if we regard the number of households moving from i to j as a Poisson process, and if the number of people in a household is logarithmically distributed, the total number of movers has a negative binomial distribution. A negative binomial model was fitted to the data with the results as shown in Table 3. The output includes a test of a null hypothesis that alpha is zero, or equivalently that the negative binomial model is not significantly better than the Poisson model. Clearly the null hypothesis can be rejected. Note that the pseudo R2 value is not comparable with the statistics generated for the Poisson models.
We can arguably do a little better than this. Rather than having to assume that the number of people in a household is logarithmically distributed, we can take advantage of household size data from the census to fit a model where the Poisson model of household movement is generalised by the observed household size distribution. Further details are given by Flowerdew and Lovett (1989). Unfortunately, however, ONS’s disturbance of the migration data to constrain non-zero migrant flows to be multiples of 3 is a major barrier to this type of analysis.
Underdispersion

While overdispersion is characterised by deviance values much greater than the appropriate chi-squared statistic, the much rarer phenomenon of underdispersion occurs if the deviance is substantially less than chi-squared. This sometimes happens in migration studies, especially where the number of migrants is small compared with the size of the origin-destination matrix. Boyle and Flowerdew (1993) encountered this situation in their study of inter-ward migration in Hereford and Worcester. An alternative way of assessing model goodness of fit was developed based on simulation of migrant counts, assuming that the fitted values are correct. By repeating the simulation a large number of times, it is possible to see if the real data have similar deviances to the simulated data, in which case the model can be regarded as fitting the data. If the real data have a much higher deviance than the simulated data, it suggests that the model is inadequate and may even be overdispersed despite the initial underdispersion.
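The simulation idea can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation of Boyle and Flowerdew (1993): the fitted and observed values are invented, the function names are hypothetical, and the Poisson sampler (Knuth's multiplication method) is only suitable for the small means typical of a sparse matrix.

```python
import math
import random

def poisson_deviance(observed, fitted):
    """Poisson deviance of counts against fixed fitted values."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total

def draw_poisson(mu, rng):
    """Knuth's multiplication method; adequate for small means only."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulated_deviances(fitted, n_sims, seed=0):
    """Deviances of data sets simulated as if the fitted values were correct."""
    rng = random.Random(seed)
    return [poisson_deviance([draw_poisson(mu, rng) for mu in fitted], fitted)
            for _ in range(n_sims)]

# Hypothetical sparse example: 200 origin-destination pairs with small fitted flows.
fitted = [0.05 + 0.01 * (i % 20) for i in range(200)]
observed = [1 if i % 15 == 0 else 0 for i in range(200)]

real = poisson_deviance(observed, fitted)
sims = simulated_deviances(fitted, n_sims=500)
share_above = sum(d >= real for d in sims) / len(sims)
print(f"observed deviance {real:.1f}; "
      f"share of simulated deviances at least as large: {share_above:.2f}")
```

If the observed deviance sits comfortably within the simulated distribution, the model can be regarded as fitting; a very small share of simulated deviances at or above the observed value points to an inadequate model.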
Appropriate Measures of Distance

In this data set, any type of analysis shows the importance of distance as a predictor of the number of migrants. However, as we found in developing our models, Euclidean distance does not fully capture the importance of the relative location of i and j. The use of a dummy variable representing contiguity produced a major improvement to the model, but even so it was argued that a better measure of contiguity would be needed to model some of the major residuals effectively. As far as the distances themselves are concerned, they are based on the straight-line distance between the grid references for the centroids of origin and destination districts. These centroids are intended to represent the population centre rather than the geometric centre. Because migrants are more likely, other things being equal, to have moved shorter distances, the use of population-weighted centroids will overestimate the distance moved by most inter-district migrants. Boyle and Flowerdew (1997) discuss this issue in more detail, and suggest a method which, for each pair of districts, derives migration-weighted centroids to be used instead of population-weighted centroids (although a contiguity variable still improves model fit). In more unusual cases, location of the centroid may have major effects on the model. An extreme case is the ‘doughnut’ where a district entirely
surrounds another, and the centroid of the outer ring may actually fall inside the inner district. Given a suitable distribution of population in the outer district, the same thing may happen without the inner district being completely surrounded. It is possible for a district to have its centroid in another district if it is ‘boomerang’-shaped with populations concentrated at both ends. Even if this does not happen, it is likely that the location of the centroid will yield shorter distance estimates (and hence more predicted migration), especially to the area which is partially surrounded. Figure 2 shows the situation for some of the places where migration flows are biggest. Another possibility is that zonal boundaries of adjacent districts may be such that their centroids are fairly close to each other. This occurs to a degree in North London (Figure 3), where the boroughs of Hackney, Islington, Camden, Westminster, Kensington and Chelsea and Hammersmith and Fulham are all relatively long and narrow, running in a Northerly direction from the centre of London (the last two in particular). In this situation, centroids would be located at roughly the same distance out from Central London and
Figure 2. District boundaries and centroids: (a) Aberdeen; (b) Bristol; (c) Edinburgh; and (d) Hull
would be reasonably close to the centroids of the neighbouring boroughs. The inter-centroid distance will therefore be considerably shorter than the distance actually moved, contributing in part to the high negative deviance residuals shown in Table 6. A similar effect may be detectable in south London, involving Lambeth, Southwark and Lewisham. Straight-line distances may seem rather crude in this context, but attempts to substitute road distances, travel times or travel costs have found very high correlations between these measures. Clearly, travel times will be longer in Central Wales and the Scottish Highlands, and this means that they are really less accessible than the model suggests. Another problem occurs with islands, where the straight-line distance is even more inappropriate. This may mean that the distance effect for Anglesey or the Isle of Wight is underplayed. Straight-line distances are similarly problematic for flows between places on either side of a body of water, such as the Thames, Forth and Severn estuaries.
Figure 3. Boundaries and centroids of London boroughs
Intervening Opportunities

The distance variable has so far been central to the discussion, but there are alternative ways of thinking about places. Thirty miles in South East England is very different from 30 miles in the North West Highlands. Stouffer (1940) developed an approach which saw distance as unimportant in itself, but significant in modelling as a fairly poor surrogate for intervening opportunities, and argued that people were less likely to move to a place if there were many closer opportunities. Fotheringham’s theory of competing destinations (1983) has much in common with this approach. Intervening opportunities models can be fitted within a Poisson regression framework. It may be interesting to compare the relative contribution of distance and opportunities and whether one acts as a surrogate for the other.
The Modifiable Areal Unit Problem

The modifiable areal unit problem (Openshaw, 1984) is that the same data can give totally different answers depending on how the data have been aggregated. The effect is more complicated in the context of place-to-place migration because the total number of inter-zonal migrants will vary according to the size and configuration of the zonal system. This is because neighbouring areal units may have a large migration flow between them, but if they are grouped together, these become intrazonal migrants and disappear from the system. For example, in the inter-district analysis of 2001 data, simply amalgamating Aberdeen City and Aberdeenshire would reduce the total number of inter-district moves by 6,204. The best practice with regard to the modifiable areal unit problem is probably to bear it in mind and, if possible, to try the same analysis with a different zonal system, to check for the robustness of the analysis. Research along these lines is made easier by the availability of inter-zone migration data for Great Britain at different scales.
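The Aberdeen example can be reproduced with a small aggregation routine. The sketch below is illustrative only: the function name and the merged zone label are hypothetical, and the flows are the three observed values involving Aberdeen that appear in Table 5.

```python
def merge_zones(flows, a, b, merged_name):
    """Relabel zones a and b as one zone and drop flows that become intrazonal."""
    merged = {}
    for (origin, dest), count in flows.items():
        o = merged_name if origin in (a, b) else origin
        d = merged_name if dest in (a, b) else dest
        if o == d:
            continue  # now an intrazonal move, so it leaves the interaction matrix
        merged[(o, d)] = merged.get((o, d), 0) + count
    return merged

# Observed flows from Table 5 (2001 inter-district data).
flows = {
    ("Aberdeen City", "Aberdeenshire"): 3207,
    ("Aberdeenshire", "Aberdeen City"): 2997,
    ("Aberdeen City", "Edinburgh"): 883,
}
merged = merge_zones(flows, "Aberdeen City", "Aberdeenshire", "Aberdeen (merged)")
lost = sum(flows.values()) - sum(merged.values())
print(lost)  # 3,207 + 2,997 = 6,204 moves become intrazonal
```

Running the same model on the merged and unmerged systems is one simple way of carrying out the robustness check suggested above.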
CONSTRAINED MODELS

Many applications of spatial interaction models have introduced constraints as part of the modelling process. The first constraint refers to the total number of migrants in the system; the number predicted by the model should be the same as in the observed data. This is not necessarily true in OLS regression, but is true for Poisson models. It is also common to set origin and/or destination constraints. Flowerdew and Lovett (1988) use an inter-urban migration data set to illustrate how these constrained models can be fitted as special cases of Poisson regression. Within a city of k zones, the flows arriving at all destinations from zone i must sum to the total number of migrants leaving zone i. This is the origin constraint. Similarly, the flows leaving all zones for zone j must sum to the total flows entering zone j. This is the destination constraint. An unconstrained model is used to estimate the generation of migrants in zone i, the attraction of migrants to place j, and the distribution of migrant flows across the study area. An origin-constrained model treats the generation of migrants as an exogenous process, and concentrates on attraction and distribution. A destination-constrained model treats the attraction of migrants separately from generation and distribution. A doubly constrained model treats both generation and attraction as exogenous.

It is straightforward to convert an unconstrained Poisson regression into an origin-constrained model. What is needed is a set of dummy variables to correspond with each of the origin districts. Explanatory variables relating to the origin should be left out, but ones concerned with destination characteristics or with links between origin and destination (such as distance and contiguity) are left in the model. In addition to the remaining explanatory variables, the dummy variables will be printed out with their own coefficients.
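In symbols, the conversion described above replaces the origin terms with one coefficient per origin. The notation is assumed for illustration and may differ from the chapter's earlier equations: M_{ij} is the observed flow, \hat{\mu}_{ij} the fitted flow, P_i and P_j populations, d_{ij} distance, C_{ij} the contiguity dummy, and a_i the coefficient of the dummy variable for origin i.

```latex
\begin{align*}
\text{Origin constraint:} \quad
  & \sum_{j} \hat{\mu}_{ij} = \sum_{j} M_{ij} \quad \text{for every origin } i \\
\text{Unconstrained model:} \quad
  & \ln \mu_{ij} = \beta_0 + \beta_1 \ln P_i + \beta_2 \ln P_j + \beta_3 \ln d_{ij} + \beta_4 C_{ij} \\
\text{Origin-constrained model:} \quad
  & \ln \mu_{ij} = a_i + \beta_2 \ln P_j + \beta_3 \ln d_{ij} + \beta_4 C_{ij}
\end{align*}
```

The constraint holds automatically because maximum likelihood estimation of a Poisson model with a log link reproduces the observed total for each category of a dummy variable. The destination-constrained case is symmetric, with a coefficient for each destination in place of the destination terms.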
For a destination-constrained model, dummy variables must be created for each destination and explanatory variables relating to destinations should be
left out (origin variables coming back in). Each place has its own coefficient in the model, but these raise questions of interpretation. A doubly constrained model will be based on two sets of dummy variables and only those other variables (like distance and contiguity) which connect the places concerned. So far we have assumed that distance as a deterrent to migration has the same effect everywhere, but it might be expected for example that people in central England would have very different attitudes to distance compared to people from the Scottish Highlands. It is possible to investigate this further by constructing terms to represent interaction between the dummy variables and others. For example, origin-specific distance decay parameters can be calculated (see, for example, Fotheringham and O’Kelly, 1989, pp. 98-106). Likewise, a destination-constrained model can be accompanied by destination-specific parameters formed in a similar way. Constrained models are likely to have a better fit to the data, at the cost of losing k-1 degrees of freedom (where k is the number of places in the data set). The loss would be doubled if both origin and destination constraints were used, or if a set of origin-specific (or destination-specific) parameters were fitted. One reason why constrained models have better goodness of fit than unconstrained models is that they have less to account for. An origin-constrained model for example takes out-migration from each place to be given and only works out how these migrants are distributed among the set of destinations. Another disadvantage is the difficulty of interpreting the dummy variable coefficients and origin-specific or destination-specific parameters. Mapping these values may show a degree of spatial pattern but it is not always clear what they mean. The decision whether to use constrained or unconstrained models depends on one’s view of the migration process. 
An origin-constrained model fits with the idea that the decision to move occurs before and independently of the choice of
a destination. An unconstrained model would be appropriate if there was a series of opportunities that arose in different places, and the decision was simply to accept or reject each opportunity. In terms of labour markets, the distinction is related to the contrast between speculative and contracted migrants (Silvers, 1979). Most labour market theory assumes migrants are speculative (and hence might well decide to migrate before choosing a destination). But, in modern Great Britain, most labour migrants may be contracted, making a decision to migrate (or not) to a particular place, with the question of whether to migrate entirely subsumed by the evaluation of the proposed destination. For these people, an unconstrained model would be more appropriate. Following the innovative work of Alan Wilson (1970), a slightly different approach to spatial interaction modelling was developed using the principle of entropy maximisation. Given certain assumptions, such as the constancy of total travel, he identified the most likely sets of flows. He developed a set of models corresponding to the various constraints identified above. Stillwell (1991) provides a good discussion of these models in the context of migration. The entropy maximising models are clearly an improvement on OLS regression. They have been shown, for example by Baxter (1982), to be based on the Poisson distribution. The method as a whole is equivalent to a special case of Poisson regression analysis. The relationships between the two styles of model are discussed in some detail by Flowerdew (1991). Many of the differences are a matter of convention or preferences rather than substantive issues.
NEGATIVE BINOMIAL MODELS

The problem of overdispersion has been mentioned above, and fitting a negative binomial model is one way to deal with it. In the case of inter-district migration, the Poisson assumption that all events are independent does not seem very realistic. Generalised probability distributions can recognise that many people move as families rather than as isolated individuals. The negative binomial distribution has its variance bigger than the mean (in the Poisson, mean and variance are equal). This means that it is less unlikely for observed values to be distant from the expected values. Deviance figures are considerably lower than in the Poisson. For example, the null model deviance for the migration data is 1,040,162 compared to 8,920,154 for the Poisson. The negative binomial distribution has an additional parameter α which reflects the relative spread of data around the Poisson mean. The Stata command is called nbreg; the output includes a test of whether α is zero. The negative binomial is just the Poisson if α = 0. In this case, α is clearly non-zero, indicating that the negative binomial distribution is considerably more compatible with the data. For comparative purposes, nbreg was run with the origin population, destination population, distance (all logarithms) and contiguity. Results are in the final column of Table 3. It can be seen that there are differences but they are not very big. Lists of the largest residuals in Table 7 (positive) and Table 8 (negative) can be compared with the equivalent results from the Poisson (Tables 5 and 6). Again there are some differences, though the lists are broadly similar.
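The role of α can be illustrated numerically. The sketch below assumes the mean-dispersion parameterisation, in which the variance is mu(1 + α mu); the function names are hypothetical and the α values are arbitrary illustrations, not the estimates reported in Table 3.

```python
import math

def poisson_logpmf(y, mu):
    """Log probability of count y under a Poisson with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def negbin_logpmf(y, mu, alpha):
    """Negative binomial with mean mu and variance mu * (1 + alpha * mu)."""
    r = 1.0 / alpha  # shape parameter; large r (small alpha) approaches the Poisson
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

mu = 418.5   # Poisson fitted value for Aberdeen City -> Aberdeenshire (Table 5)
y = 3207     # observed flow for the same pair
for alpha in (1e-6, 0.1, 1.0):   # hypothetical dispersion values
    print(f"alpha = {alpha}: log-probability {negbin_logpmf(y, mu, alpha):.1f}")
print(f"Poisson: log-probability {poisson_logpmf(y, mu):.1f}")
```

With α close to zero the two log-probabilities nearly coincide, while larger values of α make an observation this far from its expected value much less improbable, which is the sense in which the deviance falls.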
ZERO-TRUNCATED AND ZERO-INFLATED MODELS

Poisson and related models for count data are usually defined to take values for non-negative integers 0, 1, 2, 3, and so on. However, data may only be available for positive integers, excluding zero. In the case of inter-district migration, this could be useful if the data set only included pairs of places between which somebody had moved. The Stata reference manual gives the example of
Table 7. Largest positive residuals from the negative binomial model with contiguity

| Rank | Origin | Destination | Observed Flow | Estimated Flow | Deviance Residual |
|---|---|---|---|---|---|
| 1 | Aberdeen City | Aberdeenshire | 3,207 | 606.9 | 5,338.8 |
| 2 | Aberdeenshire | Aberdeen City | 2,997 | 607.7 | 4,782.2 |
| 3 | Hammersmith & F | Wandsworth | 2,336 | 341.7 | 4,490.1 |
| 4 | Edinburgh | Fife | 1,590 | 166.3 | 3,589.5 |
| 5 | Hull | East Riding | 3,736 | 1,459.7 | 3,511.0 |
| 6 | East Riding | Hull | 3,651 | 1,457.2 | 3,328.3 |
| 7 | Fife | Edinburgh | 1,436 | 165.5 | 3,102.8 |
| 8 | Aberdeen City | Edinburgh | 883 | 29.4 | 3,003.4 |
| 9 | Richmondshire | Edinburgh | 649 | 7.5 | 2,897.5 |
| 10 | Bristol | South Gloucestershire | 4,151 | 2,078.1 | 2,872.1 |
Table 8. Largest negative residuals from the negative binomial model

| Rank | Origin | Destination | Migrants | Estimated Flow | Deviance Residual |
|---|---|---|---|---|---|
| 1 | Birmingham | Sandwell | 2,738 | 6,265.0 | -2,266.4 |
| 2 | Sandwell | Birmingham | 1,725 | 6,110.5 | -2,181.7 |
| 3 | Southwark | Lambeth | 2,011 | 4,893.2 | -1,788.2 |
| 4 | Lambeth | Southwark | 2,490 | 4,901.5 | -1,686.4 |
| 5 | Hammersmith & Fulham | Kensington & Chelsea | 1,216 | 4,257.0 | -1,532.6 |
| 6 | Birmingham | Walsall | 1,416 | 4,135.4 | -1,517.6 |
| 7 | Kensington & Chelsea | Hammersmith & Fulham | 2,009 | 4,253.6 | -1,507.0 |
| 8 | Lewisham | Southwark | 1,300 | 3,806.4 | -1,396.6 |
| 9 | Walsall | Birmingham | 964 | 4,024.5 | -1,377.6 |
| 10 | Brent | Ealing | 1,360 | 3,731.7 | -1,372.8 |
a study of the number of days patients undergoing certain treatments stayed in hospital; records held by the hospital might well exclude patients who did not stay there at all (StataCorp, 2005). More generally, the data collection system may only be activated when the first relevant event takes place. The zero-truncated Poisson model is available in Stata. Using the same data set (without the cases with zero flow) and the same set of X variables as before, the model has a deviance of 1,980,564. Pseudo R2 is 0.741, slightly higher than the equivalent non-truncated model. Neither the parameters nor the list of major residuals are very different. As usual, there is an overdispersion problem and, just as the negative binomial can be used to generalise the Poisson, so there is a zero-truncated negative binomial model, also available in Stata. The alpha statistic confirms that the Poisson model does not fit the data. The variable coefficients are all slightly closer to zero, and again there is little change to the residual lists.

Zero-inflated models incorporate two processes: a logistic model, which each case must pass in order to feature in the second process, and a Poisson model. For example (StataCorp 2005, p.529), one might wish to model the number of fish caught by visitors to a park. Here there is likely to be
an inflated number of zeros. First, many of the visitors did not go fishing and hence caught zero fish. Second, if catching fish is a Poisson process, some of the fishermen may have caught nothing purely by chance. The logistic model indicates which of the observed zeros are relevant to the Poisson model and which merely indicate nonparticipation. It is possible to apply this model to migration, with some simplifying assumptions. Consider the number of times somebody moves over a five year period. Assume that the population is divided into two types, movers and stayers (Goldstein 1964). Movers are willing to move if a suitable opportunity is available; stayers will not move. In this case also, zeroes can sometimes be recorded by a mover but will also be recorded by all the stayers. The appropriate strategy is to fit a logit model to estimate the number of zeroes who are stayers. The remaining zeroes, with all the positive values, are then used in fitting a Poisson model. Both the logit and the Poisson model can involve a linear combination of explanatory variables. These explanatory variables could (but need not) be the same in both estimating equations. Bohara and Krieg (1996) have fitted zero-inflated Poisson models of this type to US migration data, with some success. It is harder to see how zero inflation would be relevant to the inter-district flow data that have been used as an example in this chapter. Nevertheless a zero-inflated Poisson model was fitted to the data, using destination unemployment rate as a variable in the logit model. The deviance was 2,193,394 with a pseudo R2 = 0.721. It was far from obvious which independent variables should be used in the logit model. After a little experimentation, the best result was obtained using the same variables (gravity model variables and contiguity) in both the logit and the Poisson models. Table 3 shows the results of fitting the model.
Compared to other models, the values of the regression coefficients are relatively low, with the exception of contiguity. In addition to the zip command (zero-inflated Poisson), Stata has a command zinb (zero-inflated negative binomial). It is appropriate to use in the same circumstances as zip, and would be preferred in situations where data are overdispersed.
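The two distributions discussed in this section differ from the ordinary Poisson only in how zero is treated, which a short sketch makes concrete. The function names, the mean and the logit value below are hypothetical and chosen purely for illustration; in a fitted model both the Poisson mean and the stayer probability would depend on explanatory variables.

```python
import math

def poisson_pmf(y, mu):
    """Ordinary Poisson probability of count y with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def zero_truncated_poisson_pmf(y, mu):
    """Poisson probabilities rescaled to the positive integers 1, 2, 3, ..."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, mu) / (1.0 - math.exp(-mu))

def zero_inflated_poisson_pmf(y, mu, stayer_share):
    """Mixture: a 'stayer' always records 0; a 'mover' follows a Poisson with mean mu."""
    mover_part = (1.0 - stayer_share) * poisson_pmf(y, mu)
    return stayer_share + mover_part if y == 0 else mover_part

def logistic(z):
    """Inverse logit, giving the stayer probability from a linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for illustration only.
mu = 2.0
stayer_share = logistic(-0.8)
print(f"P(0 moves), ordinary Poisson:      {poisson_pmf(0, mu):.3f}")
print(f"P(0 moves), zero-inflated Poisson: {zero_inflated_poisson_pmf(0, mu, stayer_share):.3f}")
print(f"P(1 move),  zero-truncated Poisson: {zero_truncated_poisson_pmf(1, mu):.3f}")
```

The zero-inflated probability of no moves exceeds the Poisson value by exactly the share of stayers plus the reduced mover contribution, whereas the truncated version assigns zero no probability at all.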
CONCLUSION

There can be little doubt that it is more appropriate to analyse count data using a discrete probability distribution like the Poisson rather than OLS regression. Count data clearly cannot have normally distributed error terms. Indeed, Griffith and Haining (2006, p.136) say that the “use of log-normal approximations ... should be a practice of the past”. This is particularly important where some of the counts are low. The equality of mean and variance automatically weights the influence of each case at an appropriate level. This chapter has been written to cover most of the issues that may arise when modelling migration. These have included issues concerned with measures of distance and the effects of centroid location, the impact of the modifiable areal unit problem, the role of constraints, the problems of overdispersion and the pros and cons of negative binomial models compared to Poisson. Although Poisson regression is still not a very trendy approach, a lot more has been written about it in the last few years by statisticians and economists. Its availability through Stata makes it more accessible to social scientists than it has been in the past.

The discussion was centred around the use of Poisson regression to analyse a large British migration data set. The standard gravity model variables of size and distance had the strongest influence on inter-district migration, together with the contiguity dummy variable, and the consistent importance of short-distance moves in the lists of big residuals suggests that a more sensitive approach to characterising contiguity is needed. There are particular problems in accounting for movement within London, some of the London
boroughs appearing in lists of both large positive and large negative residuals. It has been suggested that centroid locations for some boroughs lead to fitted values being poor predictors of migration. It appears that much migration within London follows a sectoral pattern, with many moves going out from the centre in the same direction and very few moves between sectors. The other variables made minor contributions to understanding migration patterns. Almost all variables tried were statistically significant, but none made as much difference as size, distance and contiguity. Places with high unemployment tended to have less in-migration, but also, more surprisingly, had low out-migration. Housing tenure had some effect, but not a great deal. The final factor which stood out from the residual lists was the occurrence of large flows in, between and out of relatively small areas – such as Richmondshire, Rushmoor, Moray, Bridgnorth, North Kesteven and several others. The explanation is probably that all these districts include military bases, and the large flows between them may reflect troop movements.
ACKNOWLEDGMENT
This chapter is based on a presentation to the Research Methods Festival at St Catherine’s College, Oxford, 2 July 2008. Census output is Crown copyright and is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland. The data were bought for the academic community by ESRC and JISC. I acknowledge the CIDER facility for aid in putting the data set together, and Zhiqiang Feng for technical help.
REFERENCES
Bailey, T. C., & Gatrell, A. C. (1995). Interactive Spatial Data Analysis. Harlow, UK: Longman.
Baxter, M. (1982). Similarities in methods of estimating spatial interaction models. Geographical Analysis, 14, 267–272.
Bohara, A. K., & Krieg, R. G. (1996). A zero-inflated Poisson model of migration frequency. International Regional Science Review, 19, 211–222.
Boyle, P., & Flowerdew, R. (1993). Modelling sparse interaction matrices: interward migration in Hereford and Worcester and the underdispersion problem. Environment & Planning A, 25, 1201–1209. doi:10.1068/a251201
Boyle, P., & Flowerdew, R. (1997). Improving distance estimates between areal units in migration models. Geographical Analysis, 29, 93–107.
Cameron, A. C., & Trivedi, P. K. (1998). Regression Analysis of Count Data. Cambridge, UK: Cambridge University Press.
Duke-Williams, O., & Stillwell, J. (2007). Investigating the potential effects of small cell adjustment on interaction data from the 2001 Census. Environment & Planning A, 39, 1079–1100. doi:10.1068/a38143
Flowerdew, R. (1982). Fitting the lognormal gravity model to heteroscedastic data. Geographical Analysis, 14, 263–267.
Flowerdew, R. (1991). Poisson regression modelling of migration. In Stillwell, J., & Congdon, P. (Eds.), Migration Models: Macro and Micro Approaches (pp. 92–112). London: Belhaven.
Flowerdew, R., & Aitkin, M. (1982). A method of fitting the gravity model based on the Poisson distribution. Journal of Regional Science, 22, 191–222. doi:10.1111/j.1467-9787.1982.tb00744.x
Flowerdew, R., & Lovett, A. (1988). Fitting constrained Poisson regression models to interurban migration flows. Geographical Analysis, 20, 297–307.
Flowerdew, R., & Lovett, A. (1989). Compound and generalised Poisson models for inter-urban migration. In Congdon, P., & Batey, P. (Eds.), Advances in Regional Demography (pp. 246–256). London: Belhaven.
Fotheringham, A. S. (1983). A new set of spatial interaction models: the theory of competing destinations. Environment & Planning A, 15, 15–36. doi:10.1068/a150015
Fotheringham, A. S., & O’Kelly, M. E. (1989). Spatial Interaction Models: Formulations and Applications. Dordrecht, The Netherlands: Kluwer.
Goldstein, S. (1964). The extent of repeated migration: an analysis based on the Danish Population Register. Journal of the American Statistical Association, 59, 1121–1132. doi:10.2307/2282627
Greene, W. H. (2003). Econometric Analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Griffith, D., & Haining, R. (2006). Beyond mule kicks: the Poisson distribution in geographical analysis. Geographical Analysis, 38, 123–139. doi:10.1111/j.0016-7363.2006.00679.x
Guy, C. M. (1991). Spatial interaction modelling in retail planning practice: the need for robust statistical methods. Environment and Planning B, 18, 191–203. doi:10.1068/b180191
Haining, R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge, UK: Cambridge University Press.
Kirkwood, B. R., & Sterne, J. A. C. (2003). Essential Medical Statistics (2nd ed.). Oxford, UK: Blackwell.
Lovett, A., & Flowerdew, R. (1989). Analysis of count data using Poisson regression. The Professional Geographer, 41, 190–198. doi:10.1111/j.0033-0124.1989.00190.x
Nelder, J., & Wedderburn, R. W. M. (1972). Generalised linear models. Journal of the Royal Statistical Society A, 135, 370–384. doi:10.2307/2344614
O’Brien, L. (1992). Introducing Quantitative Geography: Measurement, Methods and Generalised Linear Models. London: Routledge.
Openshaw, S. (1984). The Modifiable Areal Unit Problem (Concepts and Techniques in Modern Geography 38). Norwich, UK: GeoBooks.
Petrie, A., & Sabin, C. (2005). Medical Statistics at a Glance (2nd ed.). Oxford, UK: Blackwell.
Senior, M. L. (1987). The establishment of family planning clinics in south-west Nigeria by 1970: analyses using logit and Poisson regression. Area, 19, 237–245.
Silvers, A. L. (1979). Probabilistic income maximising behaviour in regional migration. International Regional Science Review, 2, 29–40. doi:10.1177/016001767700200103
StataCorp. (2005). Stata Statistical Software: Release 9. Reference R-Z. College Station, TX: StataCorp LP.
Stewart, J. Q. (1948). Demographic gravitation: evidence and applications. Sociometry, 11, 31–58. doi:10.2307/2785468
Stillwell, J. (1991). Spatial interaction models and the propensity to migrate over distance. In Stillwell, J., & Congdon, P. (Eds.), Migration Models: Macro and Micro Approaches (pp. 34–56). London: Belhaven.
Stouffer, S. A. (1940). Intervening opportunities: a theory relating mobility and distance. American Sociological Review, 5, 845–867. doi:10.2307/2084520
Wilson, A. G. (1970). Entropy in Urban and Regional Modelling. London: Pion.
Chapter 15
Analysing Structures of Interregional Migration in England
James Raymer, University of Southampton, UK
Corrado Giulietti, University of Southampton, UK
ABSTRACT
In this chapter, we explore the age and ethnic structures of interregional migration in England, as measured by the 1991 and 2001 Censuses. In doing so, we first analyse the main effect and two-way interaction components of migration flow tables cross-classified by (1) origin, destination and age and (2) origin, destination and ethnicity. Second, we test the significance of three-way interaction terms over time by comparing various unsaturated log-linear model fits. The aim is to identify the key structures in the migration flow tables and how they have changed over time. This is important for understanding the mechanisms underlying the more general patterns of migration. These analyses could also be used to inform the estimation or projection of migration flows. Our findings are that, despite a large increase in the levels of interregional migration, migration structures in England have remained fairly stable over time. The main changes concern the increases in the relative levels of ethnic migration over time, which have been unequal across space.
DOI: 10.4018/978-1-61520-755-8.ch015

INTRODUCTION
A methodology for identifying key structures in migration flow tables is presented in this chapter. The tables contain flows of persons cross-classified by origin, destination and some other characteristic, such as age or ethnicity. This work is important both for understanding aggregate migration patterns in general and for estimating flows in the context of incomplete migration data. The research contained in this chapter represents an important first step in a larger project funded by the Economic and Social Research Council on combining migration data in England and Wales. In order to combine data, we need to understand the comparability between different data sources, as well as the underlying structures that drive the migration patterns. For example, is an increase in the migration from origin i to destination j caused by an overall increase in the level of mobility within a country, a relative increase in the numbers leaving the origin, a relative increase in the attractiveness of the destination, an increase in the connectivity between the two, or some combination of any or all of these factors (or ‘structures’)?
The approach taken in this study follows from Baydar (1984) and Willekens and Baydar (1986) and, more recently, from Raymer et al. (2006) and Raymer and Rogers (2007), who disaggregated migration flows into multiplicative components for analysis and estimation.
We begin the chapter with a description of a general modelling framework for describing and analysing various structures of interregional migration flows. The approach decomposes observed flows of migration into their multiplicative structures. Two applications follow. First, we illustrate the multiplicative decomposition of the 1991 and 2001 age-specific and ethnic-specific interregional migration flows in England. Second, we identify the key structures contained in the observed 1991 and 2001 age-specific and ethnic-specific interregional migration flows by fitting several unsaturated log-linear models to them. The chapter ends with a summary of key findings and a discussion of how these results may be used.
ANALYSING MIGRATION STRUCTURES
A sequence of recent papers has set out an analytical framework for describing and estimating the age and spatial structures of internal migration, represented by a multiplicative log-linear model (Raymer et al., 2006; Raymer and Rogers, 2007; Rogers et al., 2002; Rogers et al., 2001; 2002b; 2003). This chapter adds to that research in two ways. First, the structures of migration are examined for interregional migration in England over time to identify both continuity and change. Second, the analytical framework is extended to include migration by age and ethnicity, providing a better understanding of population change and redistribution. Age and ethnic differences in the migration patterns of England have been explored in previous studies (see, for example, Bates and Bracken, 1982; 1987, for age-specific migration and Finney and Simpson, 2008; Stillwell and Duke-Williams, 2005, for ethnic-specific migration patterns). We extend this work by analysing the underlying (multiplicative) structures of these patterns.

Multiplicative Component Framework
Two-way tables of interregional migration flows (i.e., origin by destination) can be disaggregated into four separate structures or components (Rogers et al., 2002): an overall component representing the level of migration, an origin component representing the relative ‘pushes’ from each region, a destination component representing the relative ‘pulls’ to each region, and a two-way origin-destination interaction component representing the impacts of physical or social distance between places not explained by the level or main effects of origin and destination. This breakdown is multiplicative, such that

$$n_{ij} = (T)(O_i)(D_j)(OD_{ij}) \tag{1}$$

where $n_{ij}$ is an observed flow of migration from region $i$ to region $j$, $T$ is the total number of migrants (i.e., $n_{++}$), $O_i$ is the proportion of all migrants leaving from region $i$ (i.e., $n_{i+} / n_{++}$), and $D_j$ is the proportion of all migrants moving to region $j$ (i.e., $n_{+j} / n_{++}$). The interaction component $OD_{ij}$ is defined as $n_{ij} / [(T)(O_i)(D_j)]$, or the ratio of observed migration to expected migration (for the case of no interaction). This general type of model is called a multiplicative component model.
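The decomposition in equation 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `decompose_two_way` and the three-region flow table are hypothetical, and the zero diagonal simply mimics the exclusion of intraregional moves.

```python
def decompose_two_way(flows):
    """Split an origin-by-destination table into T, O_i, D_j and OD_ij."""
    n_rows = len(flows)
    n_cols = len(flows[0])
    total = sum(sum(row) for row in flows)            # T = n_++
    origin = [sum(row) / total for row in flows]      # O_i = n_i+ / n_++
    dest = [sum(flows[i][j] for i in range(n_rows)) / total
            for j in range(n_cols)]                   # D_j = n_+j / n_++
    # OD_ij = observed flow / expected flow under no interaction
    interaction = [[flows[i][j] / (total * origin[i] * dest[j])
                    for j in range(n_cols)]
                   for i in range(n_rows)]
    return total, origin, dest, interaction


# Hypothetical flows between three regions (rows = origins, columns = destinations).
flows = [[0, 30, 10],
         [20, 0, 40],
         [50, 50, 0]]

T, O, D, OD = decompose_two_way(flows)
```

In this made-up table the flow from region 0 to region 1 is 30 against an expected 16, so its interaction component is 1.875; multiplying the four components back together returns every observed cell.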
Next, consider three-way tables of interregional migration (i.e., origin by destination by some other variable). The multiplicative component model for this general case is specified as:

$$n_{ijk} = (T)(O_i)(D_j)(X_k)(OD_{ij})(OX_{ik})(DX_{jk})(ODX_{ijk}) \tag{2}$$

where $X_k$ is the proportion of all migrants in group $k$ of variable $X$, such as males and females if $X$ is sex and different age groups if $X$ is age. This model is more complicated because there are now three two-way interaction components and a single three-way interaction component between the origin, destination, and the variable $X$. However, the interpretations of the parameters remain relatively simple and follow the same format as presented for the two-way table (e.g., Raymer et al., 2006).
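Equation 2 leaves the individual components implicit. One way to write them out, by analogy with the two-way case and with the origin-age ratios defined later in the chapter, is sketched below. The residual form of the three-way term is an assumption, chosen so that equation 2 holds exactly.

$$
\begin{aligned}
T &= n_{+++}, \qquad O_i = \frac{n_{i++}}{n_{+++}}, \qquad D_j = \frac{n_{+j+}}{n_{+++}}, \qquad X_k = \frac{n_{++k}}{n_{+++}}, \\
OD_{ij} &= \frac{n_{ij+}}{(T)(O_i)(D_j)}, \qquad OX_{ik} = \frac{n_{i+k}}{(T)(O_i)(X_k)}, \qquad DX_{jk} = \frac{n_{+jk}}{(T)(D_j)(X_k)}, \\
ODX_{ijk} &= \frac{n_{ijk}}{(T)(O_i)(D_j)(X_k)(OD_{ij})(OX_{ik})(DX_{jk})}.
\end{aligned}
$$

Substituting the main effects shows that $OX_{ik} = (n_{i+k} / n_{i++}) / (n_{++k} / n_{+++})$, that is, the group profile of migrants leaving region $i$ relative to the overall group profile.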
Log-Linear Models
As outlined in Raymer and Rogers (2007), the multiplicative component descriptive model set out in equation 2 can be expressed as a saturated log-linear statistical model:

$$\ln(n_{ijk}) = \lambda + \lambda_i^{O} + \lambda_j^{D} + \lambda_k^{X} + \lambda_{ij}^{OD} + \lambda_{ik}^{OX} + \lambda_{jk}^{DX} + \lambda_{ijk}^{ODX} \tag{3}$$

where the $\lambda$s are simply the natural logarithms of the variables appearing in equation 2. In multiplicative form, this model is expressed as:

$$n_{ijk} = \tau \, \tau_i^{O} \, \tau_j^{D} \, \tau_k^{X} \, \tau_{ij}^{OD} \, \tau_{ik}^{OX} \, \tau_{jk}^{DX} \, \tau_{ijk}^{ODX} \tag{4}$$

where the $\tau$s denote the model’s multiplicative parameters or ‘effects’. We use this form to be consistent with the multiplicative component model. The saturated model is expressed as (ODX), using the notation set out in Agresti (2002, p. 320). The parameters of the log-linear model can be analysed using standard statistical techniques for categorical data analysis to identify key structures in the data. Simplified versions of the model set out in equations 3 and 4 are called unsaturated models. For example, the model that only includes the main effects of origin, destination, and variable $X$ is specified as

$$\hat{n}_{ijk} = \tau \, \tau_i^{O} \, \tau_j^{D} \, \tau_k^{X} \tag{5}$$

and is known as the mutual independence model (Agresti, 2002, p. 318). This model assumes independence between each of the categories of origin, destination, and variable $X$ and is designated (O, D, X). A model that includes the interaction between origin and destination plus all of the main effects is designated as (OD, X) and corresponds to the following model:

$$\hat{n}_{ijk} = \tau \, \tau_i^{O} \, \tau_j^{D} \, \tau_k^{X} \, \tau_{ij}^{OD} \tag{6}$$
Such notations are used because these models are hierarchical; that is, for two-way interactions, the main effect parameters must be included, and for three-way interaction terms all the main effects and two-way interactions must be included. In this chapter, we exclude non-migrants or intraregional migrants by incorporating structural zeros (Willekens, 1983).
Various unsaturated models in this chapter are evaluated by using the likelihood-ratio statistic ($G^2$) and the Akaike Information Criterion (AIC). $G^2$ is defined as:

$$G^2 = 2 \sum_{i,j,k} n_{ijk} \ln(n_{ijk} / \hat{n}_{ijk}) \tag{7}$$

where $\hat{n}_{ijk}$ denotes the predicted migration flows, and where values of $G^2$ closest to zero are associated with ‘good’ fits. Furthermore, we adjust for model complexity by dividing $G^2$ by the residual degrees of freedom, obtained by subtracting the number of parameters used in the estimation from the total number of observations. This penalises the model for complexity.
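To make the fitting and evaluation steps concrete, the sketch below fits two unsaturated models to a small table and computes $G^2$ as in equation 7. The chapter does not say which algorithm or software was used; iterative proportional fitting is assumed here because it yields the maximum likelihood fit for hierarchical log-linear models and handles structural zeros simply by leaving those cells out. The function names and the 3 x 3 x 2 table are hypothetical.

```python
import math
from itertools import product


def margin(table, axes):
    """Sum a {(i, j, k): value} table over every axis not listed in `axes`."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out


def fit_loglinear(observed, margins, n_iter=200):
    """Fit a hierarchical log-linear model by iterative proportional fitting.

    `margins` lists the fitted marginal tables as tuples of axis indices,
    e.g. [(0, 1), (2,)] for the model (OD, X). Cells absent from `observed`
    are treated as structural zeros and never receive a fitted value.
    """
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(n_iter):
        for axes in margins:
            target = margin(observed, axes)
            current = margin(fitted, axes)
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                if current[key] > 0:
                    fitted[cell] *= target[key] / current[key]
    return fitted


def g_squared(observed, fitted):
    """Equation 7; cells with a zero count contribute nothing to the sum."""
    return 2.0 * sum(n * math.log(n / fitted[cell])
                     for cell, n in observed.items() if n > 0)


# Hypothetical table: 3 origins x 3 destinations x 2 groups, diagonal excluded.
observed = {(i, j, k): float(10 + 7 * i + 3 * j + 20 * k + 5 * i * j)
            for i, j, k in product(range(3), range(3), range(2)) if i != j}

n_obs = len(observed)  # 12 non-diagonal cells
g2_indep = g_squared(observed, fit_loglinear(observed, [(0,), (1,), (2,)]))
g2_od_x = g_squared(observed, fit_loglinear(observed, [(0, 1), (2,)]))

# Parameter counts for this toy table: 6 for (O, D, X) and 7 for (OD, X).
print(g2_indep / (n_obs - 6), g2_od_x / (n_obs - 7))
```

Because (OD, X) contains every term of (O, D, X), its $G^2$ can never be larger; dividing by the residual degrees of freedom shows whether the extra parameter is worth the improvement.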
AIC is defined as:

$$AIC = -2l + 2p \tag{8}$$

where $l$ is the log-likelihood for the Poisson distribution

$$l = \sum_{i,j,k} \left( n_{ijk} \ln(\hat{n}_{ijk}) - \hat{n}_{ijk} - \ln(n_{ijk}!) \right) \tag{9}$$

and $p$ denotes the number of parameters. Similar to the correction applied to the $G^2$ values, the AIC penalises for model complexity.
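Equations 8 and 9 translate directly into code. The sketch below is illustrative only: the function names and the three observed and fitted values are hypothetical, and `math.lgamma(n + 1)` is used for $\ln(n!)$ so that large counts do not overflow.

```python
import math


def poisson_loglik(observed, fitted):
    """Equation 9: sum of n*ln(n_hat) - n_hat - ln(n!) over all cells."""
    return sum(n * math.log(m) - m - math.lgamma(n + 1)
               for n, m in zip(observed, fitted))


def aic(observed, fitted, n_params):
    """Equation 8: AIC = -2l + 2p."""
    return -2.0 * poisson_loglik(observed, fitted) + 2 * n_params


# Hypothetical observed counts and fitted values for three cells.
observed = [3, 0, 5]
fitted = [2.5, 0.5, 5.0]

print(aic(observed, fitted, n_params=2))
```

For the same fitted values, each additional parameter raises the AIC by exactly two, which is the complexity penalty referred to in the text.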
ENGLAND’S MIGRATION FLOW DATA
Description of Data
The migration data used in this chapter come from censuses carried out in England in 1991 and 2001. The 1991 tables were obtained from the Special Migration Statistics (SMS) dataset called SMSGAPS available on the Centre for Interaction Data Estimation and Research (CIDER) website (http://cider.census.ac.uk/cider). For 2001, the data were obtained from the SMS CD-ROM provided by the Office for National Statistics (ONS, 2004). Both data sets contained origin-destination-age and origin-destination-ethnicity tables (but not origin-destination-age-ethnicity tables) at the local authority level. The geography of England changed from ‘districts’ to ‘local authorities and unitary authority districts’ between 1991 and 2001. To make the data consistent over time, we used the ‘CIDS 1991/2001 common geography’ filter for Great Britain available on the CIDER website. The geography harmonised to 2001 consists of 349 common local authority districts. These data were then aggregated to Government Office Regions (GOR) consisting of the North East, North West, Yorkshire and the Humber, East Midlands, West Midlands, East of England, South East, South West and London. For the purposes of this chapter, migration flows within regions are excluded. The age variable contains sixteen five-year age groups, ranging from 0-4 years to 75+ years. The ethnicity variable contains four ethnic groups, as classified in the 1991 data: White, South Asian, Black and Chinese and Other.
Comparability of 1991 and 2001 Censuses
The comparison of the 1991 and 2001 Census migration data is not straightforward because of changes in the definitions of ethnicity and of the usual residence of students. Compared to 2001, there is less ethnic detail available in the 1991 migration flows. In 1991, only White, Black, Indian, Pakistani and Bangladeshi (which for our purposes we have renamed ‘South Asian’) and Chinese and Other flows were made available. In 2001, White, Indian, Pakistani and other South Asian, Chinese, Black Caribbean, Black African and any other Black background, Mixed and Other ethnic groups were made available. This limited our analyses to the 1991 ethnic group classifications. The 2001 ethnic groups were aggregated as follows: White and Black remain the same; Indian, Pakistani and other South Asian represent South Asian; Chinese, Mixed and Other represent Chinese and Other. Note that the main inconsistency with this aggregation concerns the Mixed ethnic group. In 1991, the Mixed ethnic population may exist in any ethnic group category, whereas in 2001, this group has been merged with Chinese and Other. Thus, the change in migration over time for the Chinese and Other group is likely to be overstated. Although there are techniques to compare 1991 and 2001 Census populations, which for example redistribute the Mixed population across other groups (see Rees, 2004), these seem to be inappropriate in the case of migration data, in that migrants are a non-random subset of the total population.
The second issue concerns the treatment of student migration. In 1991, the addresses of students corresponded with their parents’ usual residence, while in 2001, the addresses were located with their place of study. This is likely to inflate the number of 2001 student-aged migrants relative to 1991 (see Stillwell and Duke-Williams, 2007 for analysis and discussion).
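The ethnic-group aggregation described above amounts to a lookup from 2001 categories to 1991 categories. The sketch below shows one way to apply it; the dictionary keys are hypothetical labels taken from the wording of the text, not the codes used in the SMS tables, and the counts are invented.

```python
from collections import defaultdict

# Hypothetical labels for the 2001 groups, mapped to the 1991 classification.
ETHNIC_2001_TO_1991 = {
    "White": "White",
    "Indian": "South Asian",
    "Pakistani and other South Asian": "South Asian",
    "Black Caribbean, Black African and any other Black background": "Black",
    "Chinese": "Chinese and Other",
    "Mixed": "Chinese and Other",
    "Other": "Chinese and Other",
}


def aggregate_to_1991(flows_2001):
    """Sum 2001 counts by ethnic group into the four 1991 groups."""
    totals = defaultdict(int)
    for group, count in flows_2001.items():
        totals[ETHNIC_2001_TO_1991[group]] += count
    return dict(totals)


# Invented counts for a single origin-destination pair.
example = {"White": 900, "Indian": 30, "Pakistani and other South Asian": 20,
           "Chinese": 10, "Mixed": 25, "Other": 15}
aggregated = aggregate_to_1991(example)
```

An unknown label raises a `KeyError` rather than being silently dropped, which is a deliberate choice when harmonising classifications.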
INTERREGIONAL MIGRATION: ANALYSIS OF MAIN EFFECTS AND TWO-WAY INTERACTIONS
In this section, the interregional migration flows are analysed by describing the main effect and two-way interaction components for three-way tables that include age or ethnicity. The aim is to identify regularities or changes in these components over time, as well as the main contributors to the migration patterns. Driving much of the change in patterns over time is the increase in the overall levels (i.e., $n_{+++}$): interregional migration increased from 630 thousand in 1991 to 920 thousand in 2001.
Interregional Migration
Common to both the age and ethnic migration tables are the aggregate interregional migration flows. The proportions of migration from and to each region are set out in Figure 1. These represent the origin and destination main effect components ($O_i$ and $D_j$, respectively) described earlier. For the most part, there were no major changes over time, with the minor exceptions of migration from London (decrease) and migration to East Midlands (increase), South East (decrease), South West (decrease) and London (increase). The South East and London each sent out about 20% of all migrants. The South East received the highest share of migrants (over 20%), followed by East of England, South West and London (which received about the same levels). The North East stood out in both cases as the region that sent and received the fewest migrants.

Figure 1. The origin and destination main effects of interregional migration in England, 1991 and 2001

The origin-destination interaction components for 1991 and 2001 are set out in Figure 2. These ratios capture the connectivity between regions by comparing the observed level with the expected one. If the ratio is greater than one, the connection between two regions is considered to be relatively strong, whereas the opposite is true for ratios less than one. From the information in Figure 2, we find that the connectivity is stronger between neighbouring regions, for example, between Yorkshire and the Humber and North East, which exhibited ratios of 2.8 in 1991 and 3.2 in 2001. This means that the levels of migration between these two regions were about three times the expected value, which is based on the total number leaving Yorkshire and the Humber and the total number going to the North East, and that the connection between these two places increased over time. The connection between London and Yorkshire and the Humber, on the other hand, was relatively weak, with ratios of 0.54 in 1991 and 0.48 in 2001. Here, the number of migrants was roughly one half of that expected, with the connection between these two places getting slightly weaker over time.
Finally, in Figure 2, we find that the connections between the southern regions and the northern regions have weakened over time, while the connections between northern regions became notably stronger. Also, the connections of the Midlands and South West with London became considerably stronger (over 10% increase).

Figure 2. Table of origin-destination interaction components of interregional migration in England, 1991 and 2001
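The ‘expected’ flow behind these ratios follows from equation 1 with the interaction component set to one. Written out for the aggregate origin-destination table, using the definitions of $T$, $O_i$ and $D_j$ given earlier:

$$
(T)(O_i)(D_j) = n_{++} \cdot \frac{n_{i+}}{n_{++}} \cdot \frac{n_{+j}}{n_{++}} = \frac{n_{i+} \, n_{+j}}{n_{++}},
\qquad
OD_{ij} = \frac{n_{ij} \, n_{++}}{n_{i+} \, n_{+j}}.
$$

A ratio of 2.8 therefore means that the observed flow is 2.8 times the flow that the origin's total out-migration and the destination's total in-migration would imply on their own.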
Interregional Migration by Age
The proportions of migration in each five-year age group are set out in Figure 3 for the 1991 and 2001 flows. Thus the area under each curve sums to unity. These represent the age main effects of the multiplicative component models for the origin by destination by age tables. Specifically, the proportions represent the $A_x$ term in the following model:

$$n_{ijx} = (T)(O_i)(D_j)(A_x)(OD_{ij})(OA_{ix})(DA_{jx})(ODA_{ijx}) \tag{10}$$

Figure 3. The age main effect components of interregional migration in England, 1991 and 2001

Figure 4. The origin-age interaction effects of interregional migration in England, 1991 and 2001, north east, south west and London
Note: Y axes = proportions; X axis = age group.

About 70% of migrants are aged 15-44 years (see Chapter 8 for a more detailed analysis of age-specific rates in 2001). The patterns stayed the same over time for the most part. The exceptions were migrants in the 15-19 year age group, which increased, and migrants in the 5-9 year and 25-29 year age groups, which declined. These changes may have been caused by the differences in the measurement of students in the two censuses.
Next, we consider the $OA_{ix}$ and $DA_{jx}$ interaction components. These values are obtained by dividing the observed age profile of migration from each region by the overall age profile of migration, i.e., $(n_{i+x} / n_{i++}) / (n_{++x} / n_{+++})$ for $OA_{ix}$, and by dividing the observed age profile of migration to each region by the overall age profile of migration, i.e., $(n_{+jx} / n_{+j+}) / (n_{++x} / n_{+++})$ for $DA_{jx}$. Here, the overall age profile of migration is the expected one, which is set out in Figure 3.
Examples of $OA_{ix}$ are set out in Figure 4 for migration from the North East, South West and London. The patterns over time are generally stable. For migration from the North East, the age profile resembles the expected one, with the exception of the relatively higher levels exhibited in the 15-19 and 20-24 year age groups. For migration from the South West, the patterns are very similar to the expected one (Figure 3). For London, however, we see that the patterns differed substantially from the overall age profiles. Here, migrants over the age of 30 years (and their children) were more likely to leave, whereas persons aged 15-24 years were much less likely to leave. Also, the ratios shifted slightly over time from 1991 to 2001, with lower levels in the 60+ age groups and higher levels in the 0-14 and 30-49 age groups.
Examples of $DA_{jx}$ are set out in Figure 5. Migration to the North East generally resembled the overall age pattern found in Figure 3. Migration to the South West and to London, on the other hand, exhibited age patterns of migration that were quite different from the overall age profile. Here, we find that 15-29 year olds were less likely to choose South West and more likely to choose London, while the remaining age groups were less likely to choose London and more likely to choose South West. As for patterns over time, we see a large increase in the ratio for 15-19 year olds migrating to the North East, a slight increase in elderly migration to the South West and a slight shift to the right in the labour force peak for migration to London.

Figure 5. The destination-age interaction effects of interregional migration in England, 1991 and 2001, north east, south west and London

Interregional Migration by Ethnicity
The proportions of migration by each ethnic group are set out in Figure 6 for the 1991 and 2001 flows. These represent the ethnic main effects of the multiplicative component models for the origin by destination by ethnicity tables. Specifically, the proportions represent the $E_z$ term in the following model:

$$n_{ijz} = (T)(O_i)(D_j)(E_z)(OD_{ij})(OE_{iz})(DE_{jz})(ODE_{ijz}) \tag{11}$$
The proportion of White migrants decreased substantially from 95% in 1991 to 90% in 2001. The three ethnic minority groups all experienced substantial increases in their proportions of overall migration, with roughly equal increases across groups (about 80%).

Figure 6. Table of the ethnicity main effects of interregional migration in England, 1991 and 2001

The $OE_{iz}$ and $DE_{jz}$ interaction components are set out in Figure 7 for the four ethnic groups. These values are obtained by dividing the observed proportion of ethnic migration from each region by the overall proportion of ethnic migration, i.e., $(n_{i+z} / n_{i++}) / (n_{++z} / n_{+++})$ for $OE_{iz}$, and by dividing the proportion of ethnic migration to each region by the overall ethnic proportion of migration, i.e., $(n_{+jz} / n_{+j+}) / (n_{++z} / n_{+++})$ for $DE_{jz}$. Here, the overall proportions of ethnic migration are the expected ones set out in Figure 6.
The $OE_{iz}$ and $DE_{jz}$ patterns over time show some interesting and notable differences between the White, South Asian, Black and Chinese and Other ethnic groups. Note that, because the $OE_{iz}$ and $DE_{jz}$ components for White migrants are on average smaller than for the other ethnic groups, the proportions on the Y-axis are represented with a different scale. The proportions of White migrants were lower than expected from/to London and West Midlands, while they were higher than expected from/to North East and the South West. Over time, we find increases in the ratios for North East and South West and decreases in the ratios for London. The proportions of South Asian migrants were substantially higher than expected for migration from/to West Midlands and London but substantially lower than expected for migration from/to South West. Over time, the ratios decreased for West Midlands and London (destination only). The proportions of Black migration were much higher than expected from/to London and much lower than expected for most other regions. Interestingly, most of the ratios changed considerably over time, with London losing its attractiveness for these migrants. Finally, Chinese and Other migrants exhibited a different set of patterns, with London being the only origin and destination with higher than expected proportions.

Figure 7. The origin-ethnicity and destination-ethnicity interaction effects of interregional migration in England, 1991 and 2001, four ethnic groups

In summary, the analysis of the main effect and two-way interaction effects of interregional migration in England by age and ethnicity over time provides several interesting findings. First, connectivity between regions is not uniform across the origins and destinations of England, although it is rather stable over time. Second, the age profile of migration has exhibited strong regularities over time, but there are some differences in the age patterns of migrants leaving from or going to each region. Third, the main effects and two-way interactions of ethnicity differ greatly according to each ethnic group. The importance of these structures, however, has not been assessed. Also, we have not examined the three-way interaction terms. This is carried out in the next section by comparing unsaturated log-linear model fits of the patterns over time with the corresponding observed values.
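The origin and destination ratios used in this section (for age and for ethnicity) share a single computation: a region's group profile divided by the overall group profile. A minimal sketch follows, with a hypothetical function name and an invented two-region table.

```python
def profile_ratio(flows, axis):
    """Ratio of each region's group profile to the overall group profile.

    `flows` maps (origin, destination, group) to a count. With axis=0 the
    result is (n_i+k / n_i++) / (n_++k / n_+++), the origin ratio; with
    axis=1 it is the corresponding destination ratio.
    """
    total = sum(flows.values())
    by_region, by_group, by_both = {}, {}, {}
    for cell, n in flows.items():
        region, group = cell[axis], cell[2]
        by_region[region] = by_region.get(region, 0) + n
        by_group[group] = by_group.get(group, 0) + n
        by_both[(region, group)] = by_both.get((region, group), 0) + n
    return {(r, g): (n / by_region[r]) / (by_group[g] / total)
            for (r, g), n in by_both.items()}


# Invented flows between two regions for two age groups.
flows = {("A", "B", "young"): 60, ("A", "B", "old"): 40,
         ("B", "A", "young"): 20, ("B", "A", "old"): 80}

origin_ratio = profile_ratio(flows, axis=0)
destination_ratio = profile_ratio(flows, axis=1)
```

In this invented table, 60% of migrants leaving region A are young against 40% overall, so the origin ratio for young migrants from A is 1.5; values above one mark groups that are over-represented among a region's migrants.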
Figure 8. Table of unsaturated log-linear model fits of migration tables cross-classified by origin, destination, age and time
LOG-LINEAR ANALYSES OF MIGRATION FLOW TABLES: 1991 AND 2001 CENSUSES
In this section, we analyse unsaturated log-linear models of two four-way tables. These tables represent migration by origin, destination, age and time and migration by origin, destination, ethnicity and time. There are many possible models for four-way tables. We only compare eight selected models, starting with a very simple mutual independence model:
1. O, D, X, T: our simplest model, which assumes mutual independence between variables.
2. OD, X, T: model including only the interaction between origin and destination.
3. OD, OX, DX, T: model with the interactions between origin and variable X and destination and variable X added to the model above.
4. OD, OX, DX, XT: model with the interaction between the variable X and time added to the model above.
5. ODX, T: model with the three-way interaction between origin, destination and variable X.
6. ODX, XT: model with the two-way interaction between variable X and time added to the model above.
7. ODT, X: model with the three-way interaction between origin, destination and time.
8. ODT, XT: model with the two-way interaction between X and time added to the model above.
The comparison of these models allows us to test the importance of various structures in the tables. For example, we can test whether including a three-way interaction term produces better estimates than a model that only includes two-way interaction terms.
Spatial and Age Structures over Time
The unsaturated log-linear models compared in this section focus on the age structures of migration. The results of the eight model fits are set out in Figure 8. These models can be compared by examining the $G^2$ and AIC goodness of fit statistics (see equations 7 and 8, respectively), where values closer to zero imply better fits. These measures are meant to be interpreted relatively. Here, we see that the worst fitting model is [O, D, A, T] and the best fitting model is [ODA, AT]. However, these statistics do not take into account model complexity. We do this by dividing the $G^2$ statistic by the residual degrees of freedom. For the models set out in Figure 8, there are 2304 observations (i.e., 9 origins × 9 destinations × 16 age groups × 2 time points, less the 288 diagonal elements between origin and destination). So, for example, the [O, D, A, T] model is the simplest with only 33 parameters. This model, not surprisingly, fits poorly. The [ODA, AT] model performs well but requires more than twice as many parameters as the simplest model. A good compromise appears to be the [OD, OA, DA, AT] model. The AIC does not appear to penalise as much as the $G^2$ divided by the residual degrees of freedom.
To get a better sense of the differences between the two best models, [OD, OA, DA, AT] and [ODA, AT], we compare the predicted values in Figure 9. Here, we see that the differences between the two are negligible. This means that for estimation purposes, one could use the simpler two-way interaction model and expect good results. Finally, the reason why [ODT, A] and [ODT, AT] do not perform well is that they do not include the interactions between origin and age or destination and age. The interaction between age and time appears to be important.

Figure 9. Estimated age patterns of migration between north east and London and south west and London, 2001, a comparison of two unsaturated log-linear models
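The observation and parameter counts quoted for the age tables can be reproduced with simple arithmetic. The sketch below covers only the mutual independence model; counting parameters for the richer models is not attempted, since the structural zeros on the diagonal are likely to reduce the number of estimable origin-destination terms. Variable names are illustrative.

```python
from math import prod

# Number of categories in each dimension of the origin-destination-age-time table.
levels = {"origin": 9, "destination": 9, "age": 16, "time": 2}

cells = prod(levels.values())                                          # 2592
structural_zeros = levels["origin"] * levels["age"] * levels["time"]   # 288 diagonal cells
n_obs = cells - structural_zeros                                       # 2304

# Mutual independence model: one constant plus (levels - 1) per main effect.
n_params_independence = 1 + sum(n - 1 for n in levels.values())        # 33
residual_df = n_obs - n_params_independence                            # 2271
```

Dividing the $G^2$ of the [O, D, A, T] model by this residual degrees of freedom gives the complexity-adjusted measure discussed above.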
Spatial and Ethnic Structures over Time
The results of the eight unsaturated log-linear models that include ethnicity are set out in Figure 10. Similar to the models for age, it appears that the two best models are [OD, OE, DE, ET] and [ODE, ET]. The predicted values for South Asian migration from North East, South West and London are set out in Figure 11. Again, both models appear to predict similar values, and one could probably use the simpler two-way interaction model for estimation purposes. Notice that the totals do not match up with the observed. This is because the interaction between origin and time was not included in either model. The reason for not including this term was based on the comparison of the main effects set out in Figure 1. The inclusion of the interaction between ethnicity and time improves the model fit considerably.

Figure 10. Unsaturated log-linear model fits of migration tables cross-classified by origin, destination, ethnicity and time
CONCLUSION We have explored the age and ethnic structures of interregional migration in England, as measured by the 1991 and 2001 censuses. Despite a large increase in the level of interregional migration, we find that the structures have remained fairly stable over time with the important exceptions of the main effects and two-way interaction terms involving ethnicity. The analyses of these ethnic structures reveal the large differences existing between the four main groups, in terms of levels and spatial patterns. The most important change that has occurred has been the substantial decrease in the proportion of White interregional migration from 95% in 1991 to 90% in 2001. As for the other structures in the three-way migration tables, we find some broad changes over time, notably the increasing connectivity between Northern regions
Figure 11. Estimated south Asian migration flows from the north east, south west and London, 1991 and 2001: A comparison of two unsaturated log-linear models
and the decreasing connectivity between Northern and Southern regions. The analysis of the age components found some important differences in the patterns across regions, as captured by the two-way interactions with origin and destination. The three way interaction terms, however, did not contribute much additional information. For estimation purposes, these terms could be dropped. The approach described in this chapter for analysing migration structures may be generally classified as demographic migration modelling (Stillwell, 2009). Here, the emphasis is on analysing the macro-level structures contained in the data rather than the development of an explanatory-type model (e.g., Fotheringham et al., 2004). Willekens and Baydar (1986) and Rogers et al. (2001; 2002) provide good examples of the demographic modelling approach. In these works, the emphasis is on modelling the generation and distribution components of interregional migration. The generation component captures the relative levels of migration from each region; the distribution component captures the allocation of that level to the various destinations. Our work takes a slightly different approach by focusing
on the main effects and interaction terms, where the interaction terms are expressed as observed-to-expected ratios. This approach is more in line with standard analyses for categorical data (e.g., Agresti, 2002). In both the generation and distribution model and the multiplicative component model, macro structures exhibit strong stability over time. This means that the modelling of migration may be simplified to focus on the key structures that explain most of the patterns or that capture the most important changes (van Wissen et al., 2009). In conclusion, the results presented in this chapter are important for several reasons. First, this analysis may be used to better understand the factors that drive changes in the aggregate migration patterns over time. Second, these findings may be used to guide the estimation or projection of migration flows where migration data are missing or inadequate. Finally, these findings may be used to develop models for combining migration data obtained from different sources with different measurements (Raymer et al., 2007; Raymer et al., 2008).
ACKNOWLEDGMENT Support for this research came from the Economic and Social Research Council (RES-000-22-2501) and from the University of Southampton’s Annual ‘Adventures in Research’ Grant, 2006-2007. The data were provided by the Office for National Statistics and are Crown Copyright. The authors take full responsibility for the analyses and interpretations.
REFERENCES
Agresti, A. (2002). Categorical Data Analysis. Hoboken, NJ: Wiley. doi:10.1002/0471249688
Bates, J., & Bracken, I. (1982). Estimation of migration profiles in England and Wales. Environment & Planning A, 14, 889–900. doi:10.1068/a140889
Bates, J., & Bracken, I. (1987). Migration age profiles for local authority areas in England, 1971-1981. Environment & Planning A, 19, 521–535. doi:10.1068/a190521
Baydar, N. (1984). Issues in multiregional demographic forecasting. Ph.D. Dissertation, Vrije Universiteit Brussel.
Congdon, P. (2005). Bayesian Models for Categorical Data. Chichester, UK: Wiley. doi:10.1002/0470092394
Finney, N., & Simpson, L. (2008). Internal migration and ethnic groups: evidence for Britain from the 2001 Census. Population Space and Place, 14, 63–83. doi:10.1002/psp.481
Fotheringham, A. S., Rees, P., Champion, T., Kalogirou, S., & Tremayne, A. R. (2004). The development of a migration model for England and Wales: overview and modelling out-migration. Environment & Planning A, 36, 1633–1672. doi:10.1068/a36136
ONS. (2004). Origin-Destination Statistics: Local Authorities (CD-ROM). London: Office for National Statistics.
Raymer, J., Abel, G., & Smith, P. W. F. (2007). Combining census and registration data to estimate detailed elderly migration flows in England and Wales. Journal of the Royal Statistical Society. Series A (General), 170, 891–908.
Raymer, J., Bonaguidi, A., & Valentini, A. (2006). Describing and projecting the age and spatial structures of interregional migration in Italy. Population Space and Place, 12, 371–388. doi:10.1002/psp.414
Raymer, J., & Rogers, A. (2007). Using age and spatial flow structures in the indirect estimation of migration streams. Demography, 44(2), 199–223. doi:10.1353/dem.2007.0016
Raymer, J., Smith, P. W. F., & Giulietti, C. (2008). Combining census and registration data to analyse ethnic migration patterns in England from 1991 to 2007. Paper presented at the European Population Conference, Barcelona, 9-12 July.
Rees, P., & Butt, F. (2004). Ethnic change and diversity in England, 1981-2001. Area, 36(2), 174–186. doi:10.1111/j.0004-0894.2004.00213.x
Rogers, A., Willekens, F. J., Little, J. S., & Raymer, J. (2002). Describing migration spatial structure. Papers in Regional Science, 81, 29–48. doi:10.1007/s101100100090
Rogers, A., Willekens, F. J., & Raymer, J. (2001). Modeling interregional migration flows: Continuity and change. Mathematical Population Studies, 9, 231–263. doi:10.1080/08898480109525506
Rogers, A., Willekens, F. J., & Raymer, J. (2002). Capturing the age and spatial structures of migration. Environment & Planning A, 34, 341–359. doi:10.1068/a33226
Rogers, A., Willekens, F. J., & Raymer, J. (2003). Imposing age and spatial structures on inadequate migration flow datasets. The Professional Geographer, 55(1), 56–69.
Stillwell, J. (2009). Inter-regional migration modelling: a review. In Poot, J., Waldorf, B., & van Wissen, L. (Eds.), Migration and Human Capital (pp. 29–48). Cheltenham, UK: Edward Elgar.
Stillwell, J., & Duke-Williams, O. (2005). Ethnic population distribution, immigration and internal migration in Britain: What evidence of linkage at the district scale? Paper presented at the British Society for Population Studies, University of Kent, Canterbury.
Stillwell, J., & Duke-Williams, O. (2007). Understanding the 2001 UK census migration and commuting data: the effect of small cell adjustment and problems of comparison with 1991 flow datasets. Journal of the Royal Statistical Society A, 170(2), 425–445. doi:10.1111/j.1467-985X.2006.00458.x
Van Wissen, L., van der Gaag, N., Rees, P., & Stillwell, J. (2009). In search of a modelling strategy for projecting internal migration in European countries: Demographic versus economic-geographical approaches. In Poot, J., Waldorf, B., & van Wissen, L. (Eds.), Migration and Human Capital (pp. 49–74). Cheltenham, UK: Edward Elgar.
Willekens, F. J. (1983). Log-linear modelling of spatial interaction. Papers / Regional Science Association. Regional Science Association. Meeting, 52, 187–205. doi:10.1007/BF01944102
Willekens, F. J., & Baydar, N. (1986). Forecasting place-to-place migration with generalized linear models. In Woods, R., & Rees, P. (Eds.), Population structures and models: Developments in spatial demography (pp. 203–245). London: Allen & Unwin.
Chapter 16
Commuting to School: A New Spatial Interaction Modelling Framework
Kirk Harland, University of Leeds, UK
John Stillwell, University of Leeds, UK
ABSTRACT The education sector in England and Wales is becoming increasingly data rich, with the regular collection of the Pupil Level Annual School Census (PLASC) and school preference information, together with the compilation of school performance league tables. However, it is also a rapidly changing environment both in terms of demographic demand as well as policy responses from Government. The latest policy documents require that local education authorities provide fair and equitable admissions policies for all, while at the same time limiting the number of surplus school places. Moreover, funding has to be targeted appropriately in the face of significant changes in the complexion and number of state educated school pupils. Therefore, it is crucial for education planners to be able to interpret the large quantities of data collected each year into valuable intelligence to support planning and decision making. This chapter explores the use of classic spatial interaction models with journey to school data for the purpose of school network planning for the city of Leeds. The limitations associated with the application of spatial interaction models in the education sector will be discussed, and modifications to the computational form will be explored using a genetic algorithm. Spatial interaction models representing pupils from different socio-demographic backgrounds will be calibrated and incorporated into an overarching logic model called the Spatial Education Model (SEM). Finally, the SEM will be used to forecast pupil numbers attending schools in the study area up to the year 2013. DOI: 10.4018/978-1-61520-755-8.ch016
THE EDUCATION SECTOR School education is commonly seen in national news headlines as one of the major political debating topics. Changes in policy since the introduction of the 1988 Education Reform Act (ERA) have created a quasi-competitive market within which schools now operate (Harris & Johnston, 2008). The 1988 ERA also devolved much responsibility away from Local Education Authorities (LEAs) to individual schools, and the 2007 School Admissions Code incorporated mandatory provisions for the first time to ensure that school oversubscription policies were implemented in a manner that does not disadvantage any particular section of society. This has produced an environment where LEAs have less direct control over schools but more responsibility to ensure that education provision is ubiquitous in their areas, in the face of an overall declining pupil population (Figure 1). These conditions make effective planning in the education sector crucial, and the collection of pupil level data, initiated in 2002 with the first collection of the Pupil Level Annual School Census (PLASC), has increased the volume of information that is available to education planners. As explained in Chapter 1, PLASC contains data about each individual pupil, including their address, free school meal eligibility, age, ethnicity and institution attended. This is an invaluable information source for education planning, although the data are affected by errors and inconsistencies of which researchers and education planners should be aware (Harland & Stillwell, 2007a; Ewens, 2005). The PLASC is a very large dataset with over 8,000,000 records collected and collated by the Department for Children, Schools and Families (DCSF) on each collection date (Jones & Elias, 2006).
An additional dataset of significant value in education research and policy evaluation is pupil preference data. Similar to the PLASC, this dataset contains information on the location of each individual pupil, but it also contains information on the preferred schools each individual pupil would like to attend. It provides an insight into the choices made by pupils and their families when selecting a school. Although collected nationally and collated by the DCSF, the dataset is unfortunately not published in its raw form. In compliance with the Information as to Provision of Education (England) Regulations 2008, a report based on the national preference data, showing aggregate school preference and allocation figures, is published on an annual basis (DCSF, 2008).
Academically, studies into ethnic segregation in schools have proliferated in the last decade (Gibson & Asthana, 2000a; 2000b; Gorard, 1999; 2000; 2004; Johnston et al., 2004; 2005; 2006). This is at least in part due to the release of the
Figure 1. 2004-based school-age population projections, England. Source: Government Actuary 2004 (GAD, 2004)
PLASC dataset for academic study through the National Pupil Database (NPD) gateway hosted by the University of Bristol and the DCSF. Other studies have concentrated on pupil mobility at non-conventional times of year (Demie, 2002; Dobson et al., 2000; Dobson & Pooley, 2004; Wilson, 2009) and some have undertaken qualitative research studies of school choice (Pooley et al., 2005; Gereluk, 2005). However, few attempts have been made to apply modelling techniques to this data-rich sector. Many LEAs use direct extrapolation to produce future school roll forecasts on which to base planning decisions, but pupil populations are dynamic (Harland & Stillwell, 2007b). The challenges faced by LEAs in ensuring that high standards of education are available to an ever-changing pupil population demand more sophisticated simulation and forecasting techniques. PLASC data provide information on each pupil’s home location, and pupil preference data facilitate the identification of each school’s relative popularity. It is the resulting interaction between pupils’ homes and schools, as a product of their relative locations in space and each school’s relative attractiveness, that is of interest in school roll forecasting. This type of interaction data analysis lends itself very well to spatial interaction modelling, which is our adopted approach.
SPATIAL INTERACTION MODELLING THEORY Spatial interaction flows are predicted using the mass of an origin, the mass of a destination and the
relative locations in space. These characteristics are represented in Figure 2 where $O_i$ is the mass of the origin area i or the total number of flows leaving area i; $D_j$ is the mass of the destination j or the total number of flows arriving at destination j; and the difference in the relative locations in space is represented by $d_{ij}$. Spatial interaction models work on aggregate data; this is to say that an origin does not represent an individual but rather an area or zone which has an associated mass, measured by the amount of available funds, goods or people in the zone. Destinations can be the same as origins; for example, if modelling migrations between regions in the UK, the origins would be a list of all regions in the UK, and the destinations would be the same list. The only difference between the origin list and destination list would be the associated origin and destination masses. However, destinations need not necessarily be the same as origins. In our case, the destinations are schools and are identified in Figure 2 as a point, in contrast to the rectangle representing the origin area of pupil residence. A spatial interaction model considers each possible origin-destination pair in turn and calculates the level of interaction (predicted interaction is denoted by $\hat{T}_{ij}$, observed interaction by $T_{ij}$) occurring between each origin area and destination school, given the information provided. To achieve this, each possible origin-destination pair is considered with the relevant information for the origin mass, destination mass and difference in relative locations in space being entered into an appropriate equation. Equation (1) is a typical representation of the spatial interaction model
Figure 2. Diagrammatic representation of the component parts of a spatial interaction model
based on the Newtonian gravitational theory, the gravity model:
$$\hat{T}_{ij} = k\,O_i D_j f(d_{ij}) \qquad (1)$$
Equation (1) has two other elements associated with it, the terms k and f. The first of these is a balancing factor which ensures that the results from the model conform to some information known about the data the model is being applied to, such as the total volume of interaction taking place in the system being modelled. The term f is a function that is used to simulate the decreasing interaction between two locations the further apart they are in space, normally referred to in spatial interaction modelling theory as distance decay.
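Equation (1) can be illustrated with a short sketch. This is a hedged example rather than the authors' implementation: the function names and the toy masses and distances are hypothetical, the distance decay function f is assumed to be a negative exponential with an arbitrary parameter `beta`, and k is chosen so that the predicted flows sum to an assumed known system total.

```python
import math

def gravity_flow(k, origin_mass, destination_mass, distance, beta):
    """Equation (1) with an assumed decay function f(d) = exp(-beta * d)."""
    return k * origin_mass * destination_mass * math.exp(-beta * distance)

def predict_flows(origin_masses, destination_masses, distances, total_flow, beta=0.5):
    """Compute unscaled flows, then set k so that all flows sum to total_flow."""
    unscaled = [
        [gravity_flow(1.0, o, w, distances[i][j], beta)
         for j, w in enumerate(destination_masses)]
        for i, o in enumerate(origin_masses)
    ]
    k = total_flow / sum(sum(row) for row in unscaled)
    return [[k * value for value in row] for row in unscaled]

# Hypothetical system: two origin zones, two destinations of equal mass.
origin_masses = [100.0, 200.0]
destination_masses = [150.0, 150.0]
distances = [[1.0, 2.0], [2.0, 1.0]]
flows = predict_flows(origin_masses, destination_masses, distances, total_flow=300.0)

for row in flows:
    print([round(value, 1) for value in row])
```

Because the two destinations have the same mass, each origin sends more flow to its nearer destination, which is the distance decay effect that f is meant to capture.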
Using Constraints for Information Capture Deficiencies in gravity model theory were highlighted by Wilson (1967), who noted that if a particular $O_i$ value doubled and a particular $D_j$ value doubled, the resulting interaction between the two did not double, as one would expect; it actually quadrupled. He proposed a family of four models, each one incorporating certain known information about a system, and applying that information in the form of a constraint. Wilson (1971) clarified the notation used when deriving gravity model equations, stating that if the $O_i$ or $D_j$ terms are unknown, then they should be replaced by a proxy term referred to as an attractiveness value. The mathematical symbols Wilson uses for these terms are $W_i^1$ when $O_i$ is unknown and $W_j^2$ when $D_j$ is unknown. Wilson’s family of spatial interaction models includes the unconstrained model (more accurately referred to as the total constrained model, as an overall constraint is usually applied); the origin or production constrained model; the attraction or destination constrained model; and the double or production-attraction constrained model. In the
context of the journey to school, the origin locations of pupils are known and it is the number of pupils attending each destination school that is of interest. Therefore, an origin constrained model is the most appropriate: the $O_i$ term is known but the $D_j$ term is not, and is therefore replaced by $W_j^2$. Because the $O_i$ term is known, the following constraint is applied:
$$\sum_j \hat{T}_{ij} = O_i \qquad (2)$$
This means that all the simulated flows to destinations for a given origin i must add up to the known origin outflow value $O_i$. To satisfy this constraint, the overall balancing factor k is replaced by the origin-specific balancing factor $A_i$, and the model equation becomes:
$$\hat{T}_{ij} = A_i O_i W_j^2 f(d_{ij}) \qquad (3)$$
with the balancing factor $A_i$ calculated as:
$$A_i = \frac{1}{\sum_j W_j^2 f(d_{ij})} \qquad (4)$$
Further Developments Further developments in the field of spatial interaction modelling include the analogy drawn by Wilson (1970) between the movement of gas particles and the movement of people in space, and his subsequent derivation of the entropy maximisation model. Senior (1979) explains the development of the entropy maximisation model from the gravity model in detail. In summary, an entropy maximising spatial interaction model requires the addition of a constraint relating to the cost of travel; therefore, the overall distance that can be travelled in a system must be known. The existing model constraints are applied alongside
the new travel constraint and the entropy maximisation procedure determines the most probable flow distribution from all the possible distributions that satisfy all the constraints (Eyre, 1999). There are distinct disadvantages to this methodological approach for education planning. Entropy maximisation maximises the uncertainty about micro-level events and behaviour in the data for which the user has no information (Senior, 1979). The entropy maximising procedure will attempt to satisfy as many choice possibilities as possible while remaining within the constraints specified. As stated by Senior (1979, p. 196), when using a basic journey-to-work example, the solution when entropy is maximised “permits the maximum variety of residential choices at the micro level”. The model constraints then limit the resulting distribution to reflect reality. Therefore, it is critical to have significant amounts of observed data about a system to inform the model through the application of constraints. However, in the education sector, inflows to destinations are what education planners are interested in simulating. They need a robust model that facilitates the generation of future projections in response to population changes, changes to the school network (school closures, mergers or openings), or a combination of both. In this sector, it is known where pupils live, but information about school destinations is restricted to attractiveness estimates and the distance travelled constraint will probably be unknown. An alternative approach to spatial interaction modelling is described by Wilson & Bennett (1985), where random utility theory is used to replicate the choice process of individuals. As noted by Eyre (1999), measuring or estimating utility, and ensuring that this measurement or estimate is an appropriate representation of individual behaviour, is a problem with this method.
Pooler (1994) derives spatial interaction models using an information minimisation approach. However, as noted by Eyre (1999) and Roy & Thill (2004), the use of prior information in
the model to improve performance makes future planning, where network or zone changes occur, highly problematic. Other developments, such as the intervening opportunities model first introduced by Stouffer (1940) and the competing destinations model presented by Fotheringham (1983) have more specific relevance to the retail sector than modelling in the education sector. However, these model definitions prompt an interesting discussion regarding the calculation of proxy attractiveness values for stores in retail modelling, and how these attractiveness values relate to consumer choice. Studies by McCarthy (1980), Fotheringham & Trew (1993) and Oppewal et al. (1997) show how store choice is more complicated than the commonly used attractiveness proxy of store size. Subjective factors relating to how a store is perceived by consumers were shown to have a high level of importance as well as objective factors relating to store characteristics, such as store size. This is of particular interest when considering studies relating to school selection by parents and pupils and what makes a school attractive to ‘education consumers’. Studies by Parsons et al. (2000) and Pooley et al. (2005) both indicate that subjective factors, such as the quality of the head teacher or the range of subjects taught, are important choice drivers for ‘education consumers’.
Spatial Interaction Model Limitations Spatial interaction models derived from the gravity model have been criticised for not having an established theoretical base, but rather being established solely on an analogy with Newton’s Law of Gravitation (Foot, 1981). Wilson (1967; 1970; 1971) produced a theoretically based derivation of the gravity model, in the ground-breaking research that also introduced the concepts of constraints and entropy maximisation. Despite this ‘considerable’ achievement (Gould, 1972), spatial interaction models have still had criticism levelled at their aggregate nature and their
fundamental inability to accurately represent the choice-making behaviour of individuals (Huff, 1961; Thomas & Huggett, 1980). Attempts to represent different behaviour in commuting to work more accurately were suggested by Wilson (1971) through the disaggregation of the interaction model to represent different modes of travel in a transport example. This produced a three-dimensional model with the standard two-dimensional Oi by Dj matrix being split over s modes of transport (Figure 3). This disaggregation approach has been adopted to resolve issues of differing consumer demand characteristics, for example, different age or income profiles. It has also been used to represent different supply characteristics such as different supply channels, for example internet banking or in-branch banking (Birkin et al., 2004). Model disaggregation is an approach that will be adopted and expanded upon here, in order to resolve specific issues with journey to school model development for the education sector. Further limitations with using aggregate data for modelling and analysis are issues such as the Modifiable Areal Unit Problem (MAUP), ecological fallacy and distance. The MAUP is a combinatorial problem (Openshaw, 1984) that arises when carrying out analysis and modelling using specific geographical areal units. Openshaw (1996) suggests that the issue will resolve itself once researchers isolate the correct areal unit for studying the problem in hand. The ecological fallacy is an issue relating to the assumptions made about the distribution of a population within an areal unit. The normal assumption is that the population is distributed evenly throughout the zone, but this may not be the case. Additionally, zone classification systems assume that all people in a zone fall into a certain category. In reality, people can differ quite widely by certain characteristics within small geographical distances.
The classification may be true for the majority of the population in a zone, but it is very unlikely that every person in a zone is going to fit a specific classification exactly, especially in today’s diverse and dynamic society.
Figure 3. Theoretical representation of a spatial interaction model with s dimensions
The population distribution problem also has implications when considering measurements of distances for use in a spatial interaction model. Should the closest boundary of the destination zone be used for distance calculations or the distance between the geographically weighted centroid of the origin and the destination zone? In this research, because of the nature of the datasets available in the education sector, the pupil population weighted centroid of zones will be calculated using both Euclidean and network distances to evaluate the performance of each distance calculation.
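A pupil population weighted centroid and the Euclidean distance from it to a school can be sketched as below. This is a minimal illustration only: the function names and coordinates are hypothetical, coordinates are assumed to be in a projected system (so that straight-line distance is meaningful), and network distance is not covered because it would need a road network dataset.

```python
import math

def weighted_centroid(pupil_points):
    """Pupil-weighted mean location of a zone.

    pupil_points is a list of (x, y, pupil_count) tuples, e.g. one per postcode.
    """
    total = sum(count for _, _, count in pupil_points)
    x = sum(px * count for px, _, count in pupil_points) / total
    y = sum(py * count for _, py, count in pupil_points) / total
    return x, y

def euclidean_distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Hypothetical zone: 10 pupils at one location and 30 at another.
zone = [(0.0, 0.0, 10), (4.0, 0.0, 30)]
school = (3.0, 4.0)

centroid = weighted_centroid(zone)
print(centroid)                              # prints: (3.0, 0.0)
print(euclidean_distance(centroid, school))  # prints: 4.0
```

The weighting pulls the centroid towards where most pupils actually live, rather than towards the geometric centre of the zone at (2.0, 0.0).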
MODELLING IN THE EDUCATION SECTOR Spatial interaction models have been applied in many different public sector planning and commercial business development contexts. Each application of spatial interaction models has demanded innovation, as each sector has presented its own specific characteristics and peculiarities. Some innovations, such as the layered approach to representing choice sets adopted by Wilson, are applicable in a journey to school context; others
such as competing destinations theory are less so. However, the education sector does have certain peculiarities and characteristics that present their own challenges to developing a simulation model for education supply and demand interactions.
• School capacities: Schools have a maximum number of pupils that they can accept. The school capacity does not have to be met, although it cannot be exceeded.
• Over-subscription policy: Prior to entry, parents and pupils are entitled to register their preferences for schools in the area, and LEAs allow a number of selections, usually between three and six. In the case where a school is over-subscribed, an over-subscription policy will be invoked. The guidance for valid over-subscription policies is given in the School Admissions Code (DfES, 2007). Each school can have its own policy or it may implement the overarching policy of the LEA. Additionally, each LEA may have a different over-subscription policy, such as policies based on random allocation (better known as ‘lottery admissions’) or Euclidean distances.
• Admissions criteria: Some schools are selective in the pupils that they admit. Although selection on pupil performance is contentious, to say the least, some schools, mainly independent schools, do still select pupils based on performance. Other schools impose specific religious criteria.
• Data rich: The education sector is data rich, with pupil numbers and, more importantly, pupil locations known. However, pupil preference data is not freely available across LEAs.
• School selection behaviour: When selecting a school, different behaviour is exhibited in the primary school sector to that displayed in the secondary school sector. Additionally, family background and social status may affect the way pupils and their parents perceive and select different schools.
• Boundary effects: When simulating interaction flows in a specific area, the boundary of this area can influence the results produced by a model, especially for interactions starting or ending near the edge of the study area. However, in the education sector, although PLASC data are available nationally, at the time when education planners undertake school roll forecasting projects, the national data may not be available. Additionally, boundaries between different LEAs display different propensities for pupils to travel across them, depending on pupil demand and school provision.
Equation Definition Developing spatial interaction models to meet the specific requirements of the education sector is a demanding task. The equation development stage is the most difficult, and although conventional spatial interaction modelling theory does provide us with a selection of possible solutions to certain problems, it is less straightforward to establish whether the re-implementation of a conventional spatial interaction model is the best representation. Few would argue against spatial interaction models having a firm grounding in theoretical analysis of what they are to represent, but, in an information rich age, and having increasingly powerful computers at the researchers’ disposal, advanced computational techniques can be applied to aid and guide model development. Such ‘model breeders’ or ‘model crunchers’ were the subject of research by Openshaw (1983; 1998) and the focus of a PhD thesis by Diplock (1996). The spatial interaction model development undertaken here will draw on traditional empirical analysis and advanced model-breeding techniques presented by Diplock to help give direction to the model development. Model-breeding should not
be considered as providing a definitive solution, but rather as a tool used hand in hand with traditional analysis techniques to highlight trends in equation performance for further investigation.
Model Constraint Separation The first stage in implementing the model-breeding techniques outlined by Diplock (1996; 1998) is to simplify the model equation. An approach to achieving model equation simplification is presented by Openshaw (1998), where he suggests the separation of model equation and model constraint. Openshaw (1998, pp. 1859-1860) shows that an origin constrained spatial interaction model equation (5) can be decomposed into two equations, (7) and (8), and can be applied in two separate stages to give the same output. The distance function has been specified as a negative exponential where β is the parameter to be estimated during calibration.
$$\hat{T}_{ij} = O_i A_i W_j^2 \exp(-\beta d_{ij}) \qquad (5)$$
where
$$A_i = \frac{1}{\sum_j W_j^2 \exp(-\beta d_{ij})} \qquad (6)$$
$$T_{ij}^* = W_j^2 \exp(-\beta d_{ij}) \qquad (7)$$
$$\hat{T}_{ij} = \frac{O_i}{\sum_j T_{ij}^*}\,T_{ij}^* \qquad (8)$$
where $T_{ij}^*$ is the relative flow between i and j. The separation of the constraints from the model equation means that the model equation is greatly simplified, and constraints can be applied to different model forms without altering the constraint calculations. If the example above is considered, the derivation process becomes clear. Equation (6) is simply calculating a balancing factor for each origin i that will make all flows from that origin add up to 1. Therefore, another way to write this is:
$$1 = \sum_j A_i W_j^2 \exp(-\beta d_{ij}) \qquad (9)$$
The balancing factor alters the flows, which are likely to have no meaning in real terms other than the fact that they are relative to each other, into a set of values which sum to one, without altering their relative importance. This could also be expressed as proportionally fitting the relative flows to sum to one. Therefore, when considering the spatial interaction model equation (5), the terms $A_i W_j^2 \exp(-\beta d_{ij})$ are calculating the probability of a flow occurring between a given origin i and all destination zones j. The calculated probability is then multiplied by the known origin value $O_i$ to provide a realistic predicted flow $\hat{T}_{ij}$.

The origin constrained derivation offered by Openshaw (1998) simply calculates the relative flows, $T_{ij}^*$, between the origins and destinations using equation (7). These flows mean little in real terms except that they are relative to each other. However, through the application of equation (8) the relative flows are proportionally fitted to the known origin values, producing a predicted flow $\hat{T}_{ij}$. Because the $O_i$ term is included in the constraint, it is not required in the actual model equation. Additionally, if the spatial interaction model equation (5) is adjusted, the adjustments have to be reflected in the balancing factor calculation, equation (6). Separating the modelling process into two distinct stages resolves this issue. Changes to the interaction model equation (7) do not require alterations to the constraint equation (8), providing much more flexibility in the model building process. Although Openshaw only provides the derivation of an origin constrained spatial interaction model, it is possible to apply this principle of proportionality to the total and destination constrained models.
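The equivalence of the single-stage form, equations (5) and (6), and the two-stage form, equations (7) and (8), can be checked numerically. The Python sketch below is illustrative only: the function names, the small example data and the value of β are assumptions made for the illustration and are not taken from the study.

```python
import math


def single_stage(O, W, d, beta):
    """Equations (5) and (6): the balancing factor A_i sits inside the model."""
    flows = []
    for i, O_i in enumerate(O):
        A_i = 1.0 / sum(W[j] ** 2 * math.exp(-beta * d[i][j]) for j in range(len(W)))
        flows.append([O_i * A_i * W[j] ** 2 * math.exp(-beta * d[i][j])
                      for j in range(len(W))])
    return flows


def relative_flows(W, d, beta):
    """Equation (7): relative flows, with no constraint applied."""
    return [[W[j] ** 2 * math.exp(-beta * row[j]) for j in range(len(W))]
            for row in d]


def origin_constrain(O, T_star):
    """Equation (8): fit each row of relative flows to its known origin total."""
    return [[O_i * t / sum(row) for t in row] for O_i, row in zip(O, T_star)]


# Hypothetical example: two origins, three destinations.
O = [100.0, 50.0]                           # known origin totals
W = [2.0, 1.0, 3.0]                         # destination attractiveness
d = [[1.0, 2.0, 3.0], [2.0, 1.0, 2.0]]      # distances
beta = 0.5

one_step = single_stage(O, W, d, beta)
two_step = origin_constrain(O, relative_flows(W, d, beta))
```

Because the constraint in `origin_constrain` only needs a matrix of relative flows, `relative_flows` could be swapped for a different model form without touching the constraint step.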
Equation Definition Using a Genetic Algorithm
Following the work of Openshaw (1998) and Diplock (1996), a genetic algorithm will be used to explore the vast array of possible spatial interaction models that can be applied to the journey to school flow data. But how can a genetic algorithm be employed to manipulate spatial interaction model equations? First of all it is important to understand that “Genetic algorithms are search algorithms based on the mechanics of natural selection and natural genetics. They combine survival of the fittest among string structures with a structured yet randomized information exchange to form a search algorithm with some of the innovative flair of human search.” (Goldberg, 1989, p. 1)

Equation Encoding
In order to apply a genetic algorithm to ‘breed’ spatial interaction models, the models must first be thought of in a way that is consistent with genetic structures found in the natural environment. Therefore, each equation can be considered a chromosome which is made up of different genes. Each gene is constructed from four component parts, a data item, a parameter (this could also be considered a constant or a balancing factor), a function and an operator. This follows a genetic representation of equations outlined by Openshaw (1998). So, in the gravity model shown in equation (1), the spatial interaction model chromosome is everything to the right of the equals sign and can be broken down into three genes. The first gene is kOi, the second gene is Dj and the third gene is f(dij). Having now identified the chromosome and broken it down into the component genes, the genes have to be encoded so that a genetic algorithm can manipulate the form whilst always producing valid models (Diplock, 1996). Following the process outlined by Openshaw (1998) ensures that valid equations are always built, although whether they are intuitively sensible is another matter. Openshaw states that the component parts of the gene can be encoded into strings, which can be manipulated and then decoded to form valid equations. To encode and decode the genes, four lookup tables can be used (Table 1). The lookup tables presented here are not exhaustive and additional data items and functions could be added as required.

Table 1. Lookup tables for encoding equations
| Data | Code | Parameter | Code | Function | Operator | Code |
|---|---|---|---|---|---|---|
| Oi | 1 | none | 0 | none | + | 1 |
| Dj | 2 | parameter | 1 | exp | - | 2 |
| dij | 3 | constant | 2 | log | × | 3 |
| ∑ Oi | 4 | 1 | 3 | ln | ÷ | 4 |
| ∑ Dj | 5 | -1 | 4 | 1/x | | |
| ∑j dij | 6 | | | x^2 | | |
| ∑i dij | 7 | | | √x | | |
| ∑i∑j dij | 8 | | | x^y | | |

Encoding the three genes extracted from the gravity model equation using the lookup tables produces the codes in Table 2. To ensure that valid equations can be built from the genes no matter what the order, each gene must contain one of each of the four component parts: data item, parameter, function and operator. The final gene in the gravity model used in this example has no operator in the equation. To ensure that the third gene complies with the rules to produce valid equations, an operator must be incorporated which can be selected at random. In this case it is selected to be the same as all the other operators, 3. When the equation is resolved, the final operator in the equation is ignored. Concatenating these three encoded genes together in the order they would be encountered produces the encoded equation 1,3,1,3,2,1,1,3,3,5,2,3.

Table 2. Encoded gravity model equation
| Gene | Code |
|---|---|
| kOi | 1,3,1,3 |
| Dj | 2,1,1,3 |
| exp(-1dij) | 3,5,2,3 |

To enable the formulation of as many equations as possible the encoding procedure also must be able to handle parentheses. This is achieved by assigning each opening and closing parenthesis a specific code, which is handled as a separate case. The rules that apply to parentheses are: i) the gene cannot be altered; and ii) they must form pairs, so each opening parenthesis must have an accompanying closing parenthesis. Additionally, the traditional parameter notation of α and β are not included in the parameter coding lookup. This is simply because it is feasible that equations may contain more than two parameters. Therefore, parameters are applied generically using the notation parn. For example, if an equation contains three parameters, they would be referred to as par1, par2 and par3.
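The decoding step can be illustrated with a short Python sketch that turns the encoded gravity model string back into a readable equation. The lookup dictionaries below are assumptions: they contain only the entries needed for this one example, with code values chosen so that the encoded string quoted in the text decodes to the gravity model. The function names and the choice to combine a parameter with its data item by multiplication are also hypothetical.

```python
# Illustrative lookup tables (assumed values covering this example only).
DATA = {1: "O_i", 2: "D_j", 3: "d_ij"}
PARAMETER = {1: None, 3: "k", 5: "-1"}
FUNCTION = {1: None, 2: "exp"}
OPERATOR = {1: "+", 2: "-", 3: "*", 4: "/"}


def split_genes(codes):
    """Cut the flat code string into genes of four component codes each."""
    if len(codes) % 4 != 0:
        raise ValueError("every gene needs exactly four component codes")
    return [tuple(codes[i:i + 4]) for i in range(0, len(codes), 4)]


def decode_gene(gene):
    """Return the gene's term and the operator that follows it."""
    data, parameter, function, operator = gene
    term = DATA[data]
    if PARAMETER[parameter] is not None:
        term = f"{PARAMETER[parameter]}*{term}"   # assumed: parameter multiplies data
    if FUNCTION[function] is not None:
        term = f"{FUNCTION[function]}({term})"
    return term, OPERATOR[operator]


def decode(codes):
    """Rebuild the equation; the operator of the final gene is ignored."""
    genes = split_genes(codes)
    parts = []
    for position, gene in enumerate(genes):
        term, operator = decode_gene(gene)
        parts.append(term)
        if position < len(genes) - 1:
            parts.append(operator)
    return " ".join(parts)


encoded = [1, 3, 1, 3, 2, 1, 1, 3, 3, 5, 2, 3]
print(decode(encoded))  # k*O_i * D_j * exp(-1*d_ij)
```

Because every gene carries all four component codes, any reordering of whole genes still decodes to a syntactically valid equation, which is the property the encoding is designed to guarantee.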
Genetic Algorithm
The flow control for a genetic algorithm has evolved and been simplified since Holland established the field of study in the early 1970s. An overview of the flow control sequence is given by Davis (1991) (Table 3), and this simple flow control has been further developed by Flake (2001), who proposes creating a new empty population at stage four and populating it with the children from the breeding population. The breeding population is retained and, as better performing population members appear, they replace members of the breeding population. The genetic algorithm runs for a number of ‘generations’, dictating how many times the population will ‘breed’ and therefore evolve. Each generation’s individual population members are assessed for ‘fitness’. In the case presented here, a population member is a spatial interaction model equation and the assessment of the fitness requires the model to be calibrated. The evolutionary process is achieved in the application of mutation and recombination when each generation’s population is created. A diagrammatic representation of these processes is shown in Figure 4. The recombination process shows how genes can be exchanged between different ‘parents’ to produce ‘children’. In much the same way as
Table 3. Overview of a genetic algorithm
The Genetic Algorithm
1. Initialize a population of chromosomes.
2. Evaluate each chromosome in the population.
3. Create new chromosomes by mating current chromosomes; apply mutation and recombination as the parent chromosomes mate.
4. Delete members of the population to make room for the new chromosomes.
5. Evaluate the new chromosomes and insert them into the population.
6. If time is up, stop and return the best chromosome; if not, go to 3.
Source: Davis (1991, p. 5).
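The steps in Table 3 can be sketched in Python for chromosomes built from four-part genes. This is a minimal illustration rather than the algorithm used in the study: the code ranges in `CODE_RANGES`, the function names, the population settings and the rule of keeping the better half of the population as breeders are all assumptions, and the fitness function is left for the caller to supply as an error measure in which lower values are better.

```python
import random

# Assumed inclusive code ranges for the four gene components:
# data item, parameter, function, operator.
CODE_RANGES = [(1, 8), (1, 5), (1, 8), (1, 4)]


def random_gene(rng):
    """Draw one valid gene: one code from each component range."""
    return [rng.randint(lo, hi) for lo, hi in CODE_RANGES]


def recombine(parent_a, parent_b, rng):
    """Exchange whole genes at a random cut point, so the child stays valid."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(chromosome, rate, rng):
    """Redraw individual codes, each within its own component range."""
    return [[rng.randint(lo, hi) if rng.random() < rate else code
             for code, (lo, hi) in zip(gene, CODE_RANGES)]
            for gene in chromosome]


def run_ga(error, n_genes=3, pop_size=20, generations=30, rate=0.1, seed=0):
    """Follow the steps of Table 3; `error` scores a chromosome, lower is better."""
    rng = random.Random(seed)
    # Step 1: initialise a population of chromosomes.
    population = [[random_gene(rng) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate and rank; the better half forms the breed pool.
        breeders = sorted(population, key=error)[:pop_size // 2]
        # Step 3: mate breeders, applying recombination and mutation.
        children = [mutate(recombine(rng.choice(breeders), rng.choice(breeders), rng),
                           rate, rng)
                    for _ in range(pop_size - len(breeders))]
        # Steps 4 and 5: the weaker half is replaced by the children.
        population = breeders + children
    # Step 6: return the best chromosome found.
    return min(population, key=error)
```

In the application described in the text, evaluating `error` would mean calibrating the decoded model equation and returning its goodness of fit statistic, which is why avoiding repeated evaluations matters.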
Figure 4. Diagrammatic representation of the recombination and mutation processes
observed in nature, the resulting child retains characteristics from both parents. The best performing population members have a higher propensity to breed. Therefore, the successful traits in the population are recombined and passed on to the next generation. As is also observed in nature, mutation of genes can occur, although the mutated gene must still be valid. Mutation and recombination are the methods for ensuring ‘survival of the fittest’ chromosomes whilst providing the mechanisms for exploring new valid population configurations (Heppenstall, 2004).

Although genetic algorithms do provide a sophisticated way of searching a universe of possible problem solutions, there are a number of parameters which can influence the efficiency and effectiveness of the search. The mutation frequency is a difficult value to arrive at. If the mutation rate is set too high, most of the child chromosomes will have undergone some mutation, which can lead to good parental characteristics being lost. On the other hand, if the mutation rate is too low, convergence will occur too quickly and not enough of the search space will be covered. These problems also persist when considering the size of the breed population: if it is too large, the ‘survival of the fittest’ process is hampered and the optimal solution may not be found; if it is too small, convergence will occur too soon with little of the search space being covered (Davis, 1991). Additionally, different methods of selecting
the breed pool and the propensity for breed pool members to recombine can influence the outcome from the genetic algorithm. Therefore, the mutation rate, breed pool size, breed pool selection method and mating propensity are all parameters that are influential in the effective execution of the genetic algorithm. However, as noted by Heppenstall (2004), little guidance is given on the most appropriate selection methods, and initial parameter values. On reflection, this is not surprising, as each application of a genetic algorithm will have very different dimensions in terms of population, search space and known information, affecting the application of a genetic algorithm to a specific problem. Therefore, a good selection of parameters in one genetic algorithm application may not be appropriate in another situation. Figure 5 shows a process flow for a genetic algorithm for searching different model equations. The initial population is created at the start. Each model equation is calibrated with the equation configuration, best fit parameters and goodness of fit (GOF) statistics being stored in the inner loop. When all the population equations have been run, the generation status is checked to see if more generations are required. If this is the final specified generation, the final results are stored and the process will end. The equations are scanned to ensure that new equations are being created, and the GOF statistics are assessed to evaluate if a user specified limit has been reached. If new
Figure 5. Genetic algorithm process flow
equations are not being created, the process will store the final results and then terminate because new ground in the search space is not being covered. If a predefined limit in the GOF statistic has been reached or surpassed, the process will store the final results and terminate, because a model consistent with the user’s specification has been generated. If more generations are required, and effective searching is continuing, a breed population is selected in the outer loop. The breed population is used to create a new generation of equations, applying the principles of recombination and mutation, and the equation evaluation process in the inner loop begins again. A problem with search algorithms of this kind is that they can possibly re-cross search ground already covered, and therefore waste resources and time. A similar duplication issue is addressed by
Davis (1991), where he utilises a recombination method that discards duplicate population members. The process of calibrating and assessing a model equation is time consuming. Therefore, the algorithm proposed here, following Davis, uses a scanning routine to ensure that duplicate population members are not retested. Furthermore, by keeping a record of all model equations assessed, the algorithm does not waste time retesting equations that have been produced and assessed in preceding generations. An additional benefit of this scanning technique is that a convergence state can be assessed for each generation and, if the algorithm ceases to cover new search ground, this is recognised and the process is terminated, reporting results to that point.

Calibration of model equations is achieved using a semi-heuristic search algorithm that is independent of the number of parameters contained in the model equation. The user enters an upper and lower limit for the parameters and the maximum number of decimal places required. Every parameter combination is tested on the first iteration using an increment of one. The best performing parameter values are used to adjust the upper and lower limits of each individual parameter in the model equation (the best performing value is used as the mid-point, with the upper limit being the mid-point plus one incremental value and the lower limit being the mid-point minus one incremental value). The new parameter space is searched using a new incremental value equal to the old incremental value divided by ten. This process is executed iteratively until the best performing parameter values are found to the desired number of decimal places.
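The calibration search described above can be sketched as a coarse-to-fine grid search. The Python below is a minimal illustration under stated assumptions: the function name `calibrate`, its arguments and the toy error function in the usage example are hypothetical, and in practice `error_fn` would run the model with the candidate parameters and return the calibration statistic.

```python
import itertools


def calibrate(error_fn, n_params, lower, upper, decimals):
    """Coarse-to-fine grid search over n_params parameters.

    error_fn -- takes a tuple of parameter values, returns an error to minimise
    lower, upper -- initial search limits applied to every parameter
    decimals -- number of decimal places required in the result
    """
    lows = [float(lower)] * n_params
    highs = [float(upper)] * n_params
    step = 1.0                      # the first pass uses an increment of one
    best = None
    for level in range(decimals + 1):
        # Build the grid of candidate values for each parameter.
        grids = []
        for lo, hi in zip(lows, highs):
            n_steps = int(round((hi - lo) / step))
            grids.append([round(lo + k * step, level) for k in range(n_steps + 1)])
        # Test every parameter combination at this resolution.
        best = min(itertools.product(*grids), key=error_fn)
        # Centre the new limits on the best values, one increment either side.
        lows = [value - step for value in best]
        highs = [value + step for value in best]
        step /= 10.0                # the next pass is ten times finer
    return best


# Usage with a toy error surface whose minimum is at (1.23, -0.47).
best = calibrate(lambda p: (p[0] - 1.23) ** 2 + (p[1] + 0.47) ** 2,
                 n_params=2, lower=-3, upper=3, decimals=2)
```

The number of evaluations grows with the number of grid points raised to the power of the number of parameters, so the narrow refinement window after the first pass is what keeps the search affordable for models with three parameters.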
THE SPATIAL EDUCATION MODEL
Model Structure
Disaggregation of spatial interaction models into layers has been successfully used to simulate the
Table 4. Summary of ‘super group’ categories in NCCOA
| Group | Name | Key characteristics | % of pupils |
|---|---|---|---|
| 1 | Blue Collar Communities | Terraced housing/routine, semi-routine employment | 20.2 |
| 2 | City Living | Born outside UK/no central heating | 2.6 |
| 3 | Countryside | 2+ cars/work from home/detached housing | 2.1 |
| 4 | Prospering Suburbs | 2+ cars/detached housing | 22.3 |
| 5 | Constrained by Circumstance | All flats/lone parent households/routine, semi-routine employment/unemployment/public renting | 17.4 |
| 6 | Typical Traits | Terraced housing/work part-time | 21.2 |
| 7 | Multicultural | Born outside UK/public transport to work/students/no central heating/ethnic minority groups | 14.2 |
different choice sets displayed by population groups in previous modelling projects (Wilson, 1971; Birkin et al., 2004), and has proven to be robust (Roy & Thill, 2004). A layered model approach will be adopted here to represent selection behaviour displayed by pupils and parents from different socio-economic backgrounds. Each layer will represent a different category of family which exhibits different choice behaviour. The National Classification for Census Output Areas (NCCOA) (Vickers et al., 2006) will be used to disaggregate the pupil population. The NCCOA contains seven super group categories, shown in Table 4. Each layer is run inside an overarching logic model, which in turn imposes supply-side characteristics on each of the spatial interaction model layers. This ensures that destination schools do not exceed their maximum pupil capacities. The model structure has been named the spatial education model (SEM).
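The idea of running origin constrained layers in sequence under a shared capacity limit can be sketched in Python. This is a simplified illustration under stated assumptions, not the SEM implementation: the function names are hypothetical, relative flows are supplied directly rather than produced by a calibrated model, all relative flows are assumed to be positive, and places at an over-subscribed school are shared pro rata in place of real over-subscription rules.

```python
def run_layer(pupils, weights, capacity):
    """Allocate one layer's pupils to schools without exceeding capacity.

    pupils[i]     -- pupils living in origin zone i
    weights[i][j] -- positive relative flow from zone i to school j
    capacity[j]   -- places still free at school j
    Returns (flows, remaining_capacity).
    """
    n_zones, n_schools = len(pupils), len(capacity)
    left = list(pupils)                      # pupils not yet placed
    free = list(capacity)                    # places not yet taken
    is_open = [c > 0 for c in capacity]
    flows = [[0.0] * n_schools for _ in range(n_zones)]
    while True:
        if sum(left) > sum(free) + 1e-9:
            raise ValueError("not enough school capacity for this layer")
        # Origin constrained step: share each zone's pupils among open schools.
        trial = []
        for i in range(n_zones):
            w = [weights[i][j] if is_open[j] else 0.0 for j in range(n_schools)]
            total = sum(w)
            trial.append([left[i] * x / total if total else 0.0 for x in w])
        demand = [sum(row[j] for row in trial) for j in range(n_schools)]
        worst = max(range(n_schools), key=lambda j: demand[j] - free[j])
        if demand[worst] - free[worst] <= 1e-9:
            # No school is over-subscribed: accept the trial allocation.
            for i in range(n_zones):
                for j in range(n_schools):
                    flows[i][j] += trial[i][j]
            return flows, [f - dmd for f, dmd in zip(free, demand)]
        # Fill the most over-subscribed school pro rata, then close it.
        share = free[worst] / demand[worst]
        for i in range(n_zones):
            admitted = trial[i][worst] * share
            flows[i][worst] += admitted
            left[i] -= admitted
        free[worst] = 0.0
        is_open[worst] = False


def run_sem(layers, capacity):
    """Run layers in the given order; each layer sees the places left over."""
    free = list(capacity)
    results = []
    for pupils, weights in layers:
        flows, free = run_layer(pupils, weights, free)
        results.append(flows)
    return results, free


# Hypothetical example: two layers, two zones, two schools.
layers = [([60, 40], [[3, 1], [1, 1]]),
          ([30, 30], [[1, 1], [1, 3]])]
results, free = run_sem(layers, [80, 100])
```

Because each layer only sees the places left by the layers before it, the order in which layers are run affects the result, with later layers more likely to be diverted away from popular schools.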
Spatial Education Model
In the SEM diagram (Figure 6), there are five different data sources used. The data sources data and network are used solely to read required data from; results is used to write simulation results for each individual model layer; config stores configuration files for each individual layer; and multi-layer stores the configuration files for the overarching SEM and the overall results for the
model. This multiple data source approach has been adopted simply to enforce a logical structure on the data and to minimise the chances of data loss through user error. The number of data sources required is at the user’s discretion and, indeed, a single data source could be used to perform all the tasks outlined above. Each of the model layers represents an individual origin constrained spatial interaction model, calibrated for a specific pupil group. The SEM controls the running of each individual spatial interaction model layer, and although Figure 6 shows seven layers, layers can be added or removed at the user’s discretion.

Figure 6. Concept diagram of the spatial education model structure

Figure 7 shows the process flow for the SEM. The model starts and runs the first layer. The maximum capacity constraint is checked and if it is to be applied, the number of children allocated to each school is assessed. If any of the schools exceed the maximum capacity, the school with the greatest over-subscription has the relevant over-subscription rules applied. Children that meet the rules are removed from the model numbers and the school attractiveness value is set to zero, effectively removing the school from future runs because it is full. The next stage assesses whether there is enough capacity in the model to meet the demand. If there is not, a failure notification is printed to the screen informing the user to check the model inputs, and the model exits. If there is enough capacity, the layer is run again and the process repeated until all the schools meet their maximum capacity constraints. Once the maximum capacity constraint is satisfied for the current layer, the layer results are added to the results from previous layers. If there are more layers to run, the model moves to the next layer and then starts the layer run process again. When the last layer has been processed, the model checks whether a geographical aggregation has been requested by the user. If it has, the aggregation is performed and the results saved to the user’s specified location. If no aggregation is requested, the model saves the results out to the specified location and ends.

Figure 7. Spatial education model process flow

Building a SEM has four broad processes: data preparation; configuration and calibration; model evaluation; and forecasting. Each of these four broad processes involved with SEM development is considered in depth for the secondary education phase.

Data Preparation
The SEM requires several datasets to enable model construction. These are: pupil preference data; PLASC data; school information including identification number, location and net capacities; and a geodemographic database or classification system to inform disaggregation of the pupil population. Each dataset must be analysed for errors and cleaned to ensure consistency and accuracy within the data. Error analysis and data cleaning is an involved process which is reported in Harland & Stillwell (2007a). Therefore, this stage will not be considered in any more depth than to say that school information, preference and PLASC datasets were made available for the years 2004/05 and 2005/06 by Education Leeds. The NCCOA introduced
previously will be used to disaggregate pupil information. In addition to Euclidean distances, network distances between each pupil location and school have been calculated using ESRI’s ArcMap Network Analyst and Ordnance Survey’s MasterMap Integrated Transport Network.
Configuration and Calibration
The first stage in the configuration process is to identify appropriate spatial interaction model equations to represent each of the pupil categories. Applying the model-breeding process as a tool to examine different aspects of the secondary education sector in the Leeds study area revealed that school attractiveness was not linear for most NCCOA groups in the preference dataset. The standardised root mean square error (SRMSE) is used as the calibration statistic, and absolute entropy difference (AED) and R² are additional goodness of fit (GOF) measures (see Knudsen & Fotheringham, 1986, for further discussion). Only NCCOA group two, City Living, performed better using a linear attractiveness value; the remaining six NCCOA groups produced better GOF measures when applying an exponential transformation to the school attractiveness values.

Figure 8. Example transformation curves
The use of an exponential transformation on school attractiveness values does make sense when the shape of the curve is considered. Figure 8 shows the difference between an exponential transformation and an equivalent power curve. The exponential transformation is shallower until the value on the x axis reaches between six and seven and then it becomes very steep (the x axis value is the initial school attractiveness value). Considering this shape in terms of school attractiveness, schools that are relatively equal in attractiveness terms will compete mainly on distance deterrence. However, schools that are very popular will have their attractiveness inflated, enabling the distance deterrence to be overcome and pupils to be attracted to popular schools past less popular schools. This is a process that is observed in the secondary phase of education in the study area, where over half of secondary school pupils prefer to attend a school other than the one closest to them. An alternative distance deterrence function to the standard negative exponential, a modified gamma function known as Tanner’s function, was observed in the model-breeding. Tanner’s distance deterrence function is capable of representing more complex distance deterrence profiles than the smoothly decreasing profile produced by a negative exponential or power function (Openshaw, 1998). Both distance deterrence functions have been applied to Euclidean and network distances to assess the performance of both the functions and the distance calculation. The final model configuration and calibrations for the seven models used in the secondary education phase are shown in Table 5. The SRMSE values for each of the spatial interaction model layers are greater than one. This is due to the size of the origin-destination matrix and the high number of zero values within it.
In the 2004/05 preference data, there are 2,504 possible origin Output Areas (OAs) and 41 possible destination schools giving a matrix size of 102,664 cells with 8,953 pupils. In 2005/06, there are 2,411 possible origin OAs and 40 possible
destination schools giving a matrix size of 96,440 cells with 8,094 pupils. Therefore, the resulting interaction matrices are sparse, resulting in SRMSE values that exceed the usual upper limit of one. Dividing the sum of squared errors by the matrix dimensions results in a decimal value, and taking the square root of this decimal value increases it rather than decreasing it. This produces an effect where the error value is greater than the observations divided by the matrix dimensions, making the upper value of the SRMSE statistic greater than one (Knudsen & Fotheringham, 1986).
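The effect of sparsity on the statistic can be seen in a small worked example. The Python sketch below assumes the usual form of the SRMSE, in which the root mean square error over all cells is divided by the mean observed flow per cell; the function name and the two small matrices are hypothetical and are not drawn from the Leeds data.

```python
import math


def srmse(observed, predicted):
    """Root mean square error over all cells, divided by the mean observed flow."""
    cells = len(observed) * len(observed[0])
    squared_error = sum((o - p) ** 2
                        for row_o, row_p in zip(observed, predicted)
                        for o, p in zip(row_o, row_p))
    mean_flow = sum(sum(row) for row in observed) / cells
    return math.sqrt(squared_error / cells) / mean_flow


# Sparse case: four zones, five schools, one pupil per zone.
# The prediction spreads each pupil evenly across the five schools.
sparse_observed = [[1 if j == i else 0 for j in range(5)] for i in range(4)]
sparse_predicted = [[0.2] * 5 for _ in range(4)]

# Dense case: every cell holds 10 pupils and the model predicts 12.
dense_observed = [[10] * 5 for _ in range(4)]
dense_predicted = [[12] * 5 for _ in range(4)]

sparse_value = srmse(sparse_observed, sparse_predicted)   # approximately 2.0
dense_value = srmse(dense_observed, dense_predicted)      # approximately 0.2
```

In the sparse case the mean squared error is 0.16, its square root is 0.4, and the mean flow per cell is only 0.2, so the statistic is about 2.0 even though every origin total is reproduced exactly.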
Table 5. Summary of final model layer configuration and calibration
Columns par1 to par3 are the calibration parameters; SRMSE, R² and AED are the GOF statistics. The dij column gives the distance measure used (Euc = Euclidean, Net = network).
| NCCOA Group | Equation | dij | par1 | par2 | par3 | SRMSE | R² | AED |
|---|---|---|---|---|---|---|---|---|
| 1 | exp(Dj par1) exp(dij par2) | Euc | 1.01 | -1.40 | - | 3.07 | 0.69 | 0.03 |
| 2 | Dj par1 exp(dij par2) | Euc | 0.45 | -0.67 | - | 5.22 | 0.28 | 0.22 |
| 3 | exp(Dj par1) exp(dij par2) dij^par3 | Euc | 1.59 | -0.60 | -2.15 | 2.92 | 0.81 | 0.03 |
| 4 | exp(Dj par1) exp(dij par2) | Euc | 0.84 | -1.13 | - | 3.36 | 0.69 | 0.07 |
| 5 | exp(Dj par1) exp(dij par2) dij^par3 | Net | 0.96 | -1.10 | 0.43 | 3.52 | 0.58 | 0.08 |
| 6 | exp(Dj par1) exp(dij par2) | Euc | 0.99 | -1.28 | - | 3.71 | 0.63 | 0.06 |
| 7 | exp(Dj par1) exp(dij par2) dij^par3 | Euc | 0.70 | -1.91 | -1.33 | 3.58 | 0.50 | 0.25 |

Model Evaluation
The calibrated individual model layers created in the calibration stage above are now adjusted to use the PLASC dataset as the origin data. The SEM runs each individual layer in the order in which it is entered into the model, so the order of layers is important. To decide on the appropriate execution order, the proportion of pupils from each NCCOA group allocated to attend the school they expressed as their first preference is examined. This analysis provides an execution order of NCCOA group three, six, four, one, two, five and finally seven. Table 6 shows the results from the initial run of the SEM for 2004/05 and 2005/06. The overall results for the SEM outputs are good with SRMSE values of 3.09 and 3.05 respectively and R² of 0.69 and AED of 0.02 for both years. This indicates a good overall model fit. On closer inspection, the GOF statistics for each individual layer indicate that layers representing the NCCOA groups run later in the model have been detrimentally affected by their position in the running order. Each layer in the SEM shows an increase in performance against the calibration statistics (Table 5), except for the first layer to execute, NCCOA group three, and the last layer to execute, NCCOA group seven. These two layers show degraded performance in all GOF statistics with respect to the calibration statistics (Table 6).

Table 6. Goodness of fit statistics for the initial SEM
| Layer | 2004/05 SRMSE | 2004/05 R² | 2004/05 AED | 2005/06 SRMSE | 2005/06 R² | 2005/06 AED |
|---|---|---|---|---|---|---|
| Overall SEM performance (all layers) | 3.09 | 0.69 | 0.02 | 3.05 | 0.69 | 0.02 |
| 3 | 3.40 | 0.76 | 0.29 | 3.48 | 0.77 | 0.32 |
| 6 | 2.98 | 0.71 | 0.21 | 2.94 | 0.72 | 0.22 |
| 4 | 2.70 | 0.79 | 0.17 | 2.65 | 0.79 | 0.17 |
| 1 | 2.55 | 0.74 | 0.06 | 2.51 | 0.75 | 0.06 |
| 2 | 4.13 | 0.51 | 0.02 | 4.02 | 0.53 | 0.00 |
| 5 | 3.08 | 0.66 | 0.17 | 3.06 | 0.65 | 0.15 |
| 7 | 3.88 | 0.45 | 0.40 | 3.81 | 0.48 | 0.41 |

In reality, not all pupils are allocated their first preference of school. NCCOA group three receives the highest first preference percentage at 90% when averaged across 2004/05 and 2005/06 preference data. The SEM allows all pupils in this group to be allocated to the schools that are most attractive to them, effectively providing a 100% allocation to first preference, which is clearly unrealistic. By the time the SEM reaches the final layer, NCCOA group seven, most schools will have reached their maximum capacities in the model and, therefore, pupils from this group will be allocated to destination schools that have space left available. To improve the inequalities observed in the SEM, each of the individual NCCOA layers is further disaggregated into two, an a and b layer, with an appropriate percentage of pupils in each layer. The order of layer execution remains the same, but all the a layers are executed first, followed by all of the b layers. After SEM execution, the NCCOA component a and b layers are aggregated back together. Different disaggregation thresholds were experimented with, but the best performing disaggregation proportion was found to be 75% of pupils from each group in the a layers and 25% of pupils from each group in the b layers.

Table 7. Goodness of fit for 14 layer SEM
| Layer | 2004/05 SRMSE | 2004/05 R² | 2004/05 AED | 2005/06 SRMSE | 2005/06 R² | 2005/06 AED |
|---|---|---|---|---|---|---|
| Overall SEM performance (all layers) | 2.83 | 0.72 | 0.14 | 2.82 | 0.72 | 0.12 |
| 3 | 3.44 | 0.76 | 0.19 | 3.53 | 0.76 | 0.23 |
| 6 | 2.84 | 0.74 | 0.18 | 2.86 | 0.73 | 0.17 |
| 4 | 2.73 | 0.78 | 0.11 | 2.70 | 0.78 | 0.11 |
| 1 | 2.54 | 0.74 | 0.11 | 2.51 | 0.75 | 0.09 |
| 2 | 4.09 | 0.52 | 0.08 | 4.10 | 0.52 | 0.06 |
| 5 | 2.67 | 0.70 | 0.20 | 2.68 | 0.69 | 0.16 |
| 7 | 2.89 | 0.58 | 0.10 | 2.88 | 0.58 | 0.09 |
The results from the 14 layer SEM are shown in Table 7. The overall model performance has improved and the performance of the individual layers is much more balanced with all layers, except NCCOA group 3, showing improved performance over the initial calibration statistics. Figure 9 shows the simulated school roll figures plotted against the observed school roll figures for 2004/05 and 2005/06. All school roll simulations in 2004/05 and all but one in 2005/06 are within 20% of the observed values. Additionally, only
six out of 41 secondary schools in 2004/05 and five (including the school mentioned above) out of 40 secondary schools in 2005/06 are inconsistent with observed values by 10% or more. This indicates that the 14 layer SEM is a good simulation of reality and can therefore be used to forecast possible future scenarios.
Figure 9. Simulated versus observed school roll numbers
Scenario Forecasting
One possible future scenario suitable for examination is to explore what would happen if all primary school children in the study attended secondary schools in the study area as their school career progressed. Assumptions made in this scenario are that:
• no pupils are lost to independent schools or across borders to neighbouring LEAs;
• no pupils are gained from independent schools or across borders from neighbouring LEAs; and
• the school network and school capacities remain consistent over the time period.
These assumptions allow for a consistent progression of primary pupils attending school in the Leeds study area through their school career until 2013. An initial exploration of pupil numbers for the scenario outlined above shows a decline of 3,164, from 41,650 in 2006 to 38,486 in 2013. This 7.6% fall in the secondary school population is a significant reduction. Figure 10 shows that the largest secondary schools in Leeds have approximately 1,500 pupils on roll. Put into the context of school rolls, a decline of this size would therefore equate to the loss of two large secondary schools in the area. Replacing the current secondary school population with the predicted secondary school population in 2013, projected school roll counts can be obtained from the SEM. Using the results
from the SEM for 2013 and the observed school rolls for 2006 the net balance can be produced for each school. The resulting net balances and pupil population changes can be mapped to examine the extent and impact of the expected decline (Figure 10). The geographical distribution of the schools with significantly declining school rolls shows a bias towards the western border of the LEA. There are a number of reasons why this bias may be observed. The first is that the major urban area is located closer to the western border and is much more densely populated than the eastern side of the LEA, thus reflecting greater change in the pupil population. The second possibility is that the western border is more porous and pupils are more likely to travel across this border from neighbouring LEAs to attend school in the study area. Therefore, it is possible that some of the more pronounced decreases in school rolls could be averted if more secondary school pupils from further within the neighbouring LEAs are attracted to attend school in the study area.
Figure 10. Map of pupil population change and SEM school roll forecasts between 2006 and 2013
CONCLUSION
This research has shown that, following Openshaw and Diplock, genetic algorithms can be successfully applied as a tool to aid in the exploration and construction of spatial interaction model equations. Furthermore, a spatial interaction model equation and constraint equation can be, and indeed should be, considered as two distinct stages. Changes to a modelling equation should not result in changes to the constraint equation. The architecture of the SEM, with an overarching logic model controlling the execution of individual origin constrained spatial interaction models, provides a good modelling framework for education planning. On the supply side of the modelling architecture, the SEM framework provides a mechanism to impose individual destination capacity constraints across multiple spatial interaction model layers and over-subscription criteria for each destination school. On the demand side, the disaggregation of the SEM into multiple layers facilitates the representation of different choices made by families from different socio-economic backgrounds. Furthermore, choice set simulation is enhanced using pupil preference data to calibrate each of the individual spatial interaction model layers.
Evaluating the SEM against observed PLASC data for 2004/05 and 2005/06 shows that the model outputs provide a good representation of reality. Using the SEM to forecast possible future school rolls shows that the declining pupil population in the study area will have the most severe impact on secondary schools in the west of the LEA. However, the propensity of pupils to cross the western border of the LEA to attend school means that these figures may be subject to distortions due to the artificial border effect that has been imposed. This border effect demonstrates a limitation of the process presented here. Artificial borders imposed across LEA boundaries that are porous to pupils can affect the reliability of school roll projections for schools in close proximity to such borders. It can be argued, however, that deficiencies in preference data availability, and not the modelling methodology, are the problem here. If both preference data and PLASC data were made available for an LEA and the surrounding area, it would be possible to simulate cross-border flows effectively and gain accurate projections for schools close to porous LEA boundaries. Indeed, the significant projected decreases in boundary school rolls highlight the requirement for close communication during the education planning process between neighbouring LEAs and the need for planning models capable of transcending LEA boundaries to support a cross-border planning process.
REFERENCES
Birkin, M., Clarke, G., Clarke, M., & Culf, R. (2004). Using Spatial Models to Solve Difficult Retail Location Problems (pp. 35–54). Chichester, UK: John Wiley and Sons.
Davis, L. (1991). Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold.
DCSF. (2008, March 3). Secondary Applications and Offers – National Offer Day. Department for Children, Schools and Families, London. Retrieved from http://www.dfes.gov.uk/rsgateway/DB/STA/t000791/AdmissionsStatisticalReportNon-nationalStatsRevised.pdf
Demie, F. (2002). Pupil mobility and education in schools: an empirical analysis. Educational Research, 44(2), 197–215. doi:10.1080/00131880210135304
DfES. (2007). School Admissions Code. London: The Stationery Office.
Diplock, G. J. (1996). The Application of Evolutionary Computing Techniques to Spatial Interaction Modelling. Ph.D. Thesis, School of Geography, University of Leeds, Leeds.
Diplock, G. J. (1998). Building new spatial interaction models by using genetic programming and a supercomputer. Environment & Planning A, 30(10), 1893–1904. doi:10.1068/a301893
Dobson, J., Henthorne, K., & Lynas, Z. (2000). Pupil Mobility in Schools (Final Report). Tech. rep. London: Department of Geography, University College London.
Dobson, J., & Pooley, C. E. (2004). Mobility, Equality, Diversity: A Study of Pupil Mobility in the Secondary School System. London: Technical Report, Department of Geography, University College London.
ERA. (1988). Education Reform Act. Her Majesty’s Stationery Office and Queen’s Printer of Acts of Parliament.
Ewens, D. (2005). The National and London Pupil Datasets: An introductory briefing for researchers and research users. DMAG Briefing 2005/8. London: Data Management and Analysis Group, Greater London Authority, City Hall.
Eyre, H. (1999). Measuring the Performance of Spatial Interaction Models in Practice. Ph.D. Thesis, School of Geography, University of Leeds, Leeds.
Flake, G. W. (2001). The Computational Beauty of Nature: Computer Explorations of Fractals, Chaos, Complex Systems and Adaptation (4th ed.). Cambridge, MA: MIT Press.
Foot, D. (1981). Operational Urban Models. London: Methuen.
Fotheringham, A. S. (1983). A new set of spatial interaction models: the theory of competing destinations. Environment & Planning A, 15(1), 15–36. doi:10.1068/a150015
Fotheringham, A. S., & Trew, R. (1993). Chain image and store-choice modelling: the effects of income and race. Environment & Planning A, 25, 179–196. doi:10.1068/a250179
GAD. (2004). Population projections by the Government Actuary, 2004-based Principal Projection for England. Retrieved from http://www.gad.gov.uk/Demography_Data/Population/Index.asp?y=2004&v=Principal&dataCountry=england&chkDataTable=yy_singyear&subTable=Perform+search
Gereluk, D. (2005). Communities in a changing educational environment. British Journal of Educational Studies, 53(1), 4–18. doi:10.1111/j.1467-8527.2005.00280.x
Gibson, A., & Asthana, S. (2000a). Local markets and the polarization of public-sector schools in England and Wales. Transactions of the Institute of British Geographers, 25(3), 303–319. doi:10.1111/j.0020-2754.2000.00303.x
Gibson, A., & Asthana, S. (2000b). What’s in a number? Commentary on Gorard and Fitz’s Investigating the determinants of segregation between schools. Research Papers in Education, 15(2), 133–153. doi:10.1080/026715200402461
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimisation and Machine Learning. Reading, MA: Addison Wesley.
Gorard, S. (1999). ‘Well. That about wraps it up for school choice research’: a state of the art review. School Leadership & Management, 19, 25–47. doi:10.1080/13632439969320
Gorard, S. (2000). Here we go again: a reply to ‘what’s in a number?’ by Gibson and Asthana. Research Papers in Education, 15(2), 155–162. doi:10.1080/026715200402470
Gorard, S. (2004). Comments on ‘Modelling social segregation’ by Goldstein and Noden. Oxford Review of Education, 30(3), 435–440. doi:10.1080/0305498042000260520
Gould, P. (1972). Pedagogic Review. Annals of the Association of American Geographers, 62(4), 689–700. doi:10.1111/j.1467-8306.1972.tb00896.x
Harland, K., & Stillwell, J. (2007a). Commuting to School in Leeds: How useful is the PLASC? Working Paper 07/02, School of Geography, University of Leeds, Leeds.
Harland, K., & Stillwell, J. (2007b). Evidence of ethnic minority dispersal in Leeds. The Yorkshire and Humber Regional Review, 17(2), 21–23.
Harris, R., & Johnston, R. (2008). Primary schools, markets and choice: studying polarization and the core catchment. Applied Spatial Analysis and Policy, 1(1), 59–84. doi:10.1007/s12061-008-9002-8
Heppenstall, A. J. (2004). Application of Hybrid Intelligent Agents to Modelling a Dynamic, Locally Interacting Retail Market. Unpublished PhD Thesis, School of Geography, University of Leeds, Leeds.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, MI: The University of Michigan Press.
Huff, D. L. (1961). A note on the limitations of intra-urban gravity models. Land Economics, 38, 64–66. doi:10.2307/3144725
Johnston, R., Burgess, S., Wilson, D., & Harris, R. (2006). School and residential ethnic segregation: an analysis of variation across England’s Local Education Authorities. Regional Studies, 40(9), 973–990. doi:10.1080/00343400601047390
Johnston, R., Wilson, D., & Burgess, S. (2004). School segregation in multiethnic England. Ethnicities, 4(2), 237–265. doi:10.1177/1468796804042605
Johnston, R., Wilson, D., & Burgess, S. (2005). England’s multiethnic educational system? A classification of secondary schools. Environment & Planning A, 37, 45–62. doi:10.1068/a36298
Jones, P., & Elias, P. (2006). Administrative data as a research resource: a selected audit. Economic & Social Research Council Regional Review Board Report 43/06. Warwick Institute for Employment Research, University of Warwick, Kenilworth.
Knudsen, D. C., & Fotheringham, A. S. (1982). Matrix comparison, goodness of fit, and spatial interaction modelling. International Regional Science Review, 10, 127–148. doi:10.1177/016001768601000203
McCarthy, P. S. (1980). A study of the importance of generalised attributes in shopping choice behaviour. Environment & Planning A, 12, 1269–1286. doi:10.1068/a121269
Openshaw, S. (1983). From data crunching to model crunching: the dawn of a new era. Environment & Planning A, 15(8), 1011–1012.
Openshaw, S. (1984). The Modifiable Areal Unit Problem. Norwich, UK: Geo Books.
Openshaw, S. (1996). Developing GIS-relevant zone-based spatial analysis methods. In Longley, P., & Batty, M. (Eds.), Spatial Analysis: Modelling in a GIS Environment (pp. 55–73). Cambridge, UK: GeoInformation International.
Openshaw, S. (1998). Neural network, genetic, and fuzzy logic models of spatial interaction. Environment & Planning A, 30(10), 1857–1872. doi:10.1068/a301857
Oppewal, H., Timmermans, H. J. P., & Louviere, J. J. (1997). Modelling the effects of shopping centre size and store variety on consumer choice behaviour. Environment & Planning A, 29, 1073–1090. doi:10.1068/a291073
Parsons, E., Chalkley, B., & Jones, A. (2000). School catchments and pupil movements: a case study in parental choice. Educational Studies, 26, 33–48. doi:10.1080/03055690097727
Pooler, J. (1994a). An extended family of spatial interaction models. Progress in Human Geography, 18, 17–39. doi:10.1177/030913259401800102
Pooley, C., Turnbull, J., & Adams, M. (2005). The journey to school in Britain since the 1940s: continuity and change. Area, 37(1), 43–53. doi:10.1111/j.1475-4762.2005.00605.x
Roy, J., & Thill, J.-C. (2004). Spatial interaction modelling. Papers in Regional Science, 83(1), 339–361. doi:10.1007/s10110-003-0189-4
Senior, M. L. (1979). From gravity modelling to entropy maximizing: a pedagogic guide. Progress in Human Geography, 3(2), 175–210.
Stouffer, S. A. (1940). Intervening opportunities: a theory relating mobility and distance. American Sociological Review, 5, 845–867. doi:10.2307/2084520
Thomas, R. W., & Huggett, R. J. (1980). Modelling in Geography: A Mathematical Approach. London: Harper and Row.
Vickers, D., Rees, P., & Birkin, M. (2005). Creating the National classification of Census Output Areas: data, methods and results. Working Paper 05/02, School of Geography, University of Leeds, Leeds.
Wilson, A. G. (1967). A statistical theory of spatial distribution models. Journal of Transportation Research, 1, 253–269. doi:10.1016/0041-1647(67)90035-4
Wilson, A. G. (1970). Entropy in Urban and Regional Modelling. London: Pion.
Wilson, A. G. (1971). A family of spatial interaction models, and associated developments. Environment and Planning, 3, 1–32. doi:10.1068/a030001
Wilson, A. G., & Bennett, R. J. (1985). Mathematical Methods in Human Geography and Planning. Chichester: John Wiley & Sons.
Wilson, J. (2009). Exploring the dimensions of school change during primary education in England. In Stillwell, J., Coast, E., & Kneale, D. (Eds.), Fertility, Living Arrangements, Care and Mobility: Understanding Population Trends and Processes (Vol. 1, pp. 211–237). Dordrecht, The Netherlands: Springer. doi:10.1007/978-1-4020-9682-2_11
Appendix
Interaction Data: Classroom Activities Adam Dennett University of Leeds, UK
ABSTRACT This appendix contains a set of three activities that allow you to gain some familiarity with handling migration and commuting data and producing flow maps with three different GIS/mapping systems: MapInfo/MapBasic, Flowmap and PostGIS. In all of the exercises you will be using data obtained from the Web-based Interface to Census Interaction Data (WICID). To access WICID, go to the CIDER website: http://cider.census.ac.uk/, click on the ‘WICID’ logo and follow the login instructions. If you have not already done so, you will need to register with the Census Portal before being able to access the data. You may also need to agree to some special conditions to access all features on the site.
ACTIVITY 1: FLOW MAPPING OF STUDENT MIGRATIONS IN ENGLAND
In this exercise you will select an area of England, download the flows of students to this area from all other areas and produce a flow map representing the volume of these flows. To complete this exercise fully, you will need to have MapInfo installed on your computer.
Part 1: Data Extraction from WICID
1.1 Log into WICID. Click on continue and you should be taken to the general query interface to start building your query.
1.2 Start by clicking on the Data tab. When selecting data in this exercise, we will Select by dataset and table > Migration data > 2000-01, and use the 2001 SMS level 1 dataset > Table 5 – Economic Activity by Sex.
1.3 In this table, you should select the Total economically active full-time students as well as the Total economically inactive students (cells 22 and 31). Once ticked, click the Add selected cells button.
1.4 At this point, we are going to derive a new variable from the two just selected. Click the Derive new variables button, select both items and continue.
1.5 Give your new aggregate variable the name ‘Total students’ and Remove the old variables from the list of selected items before clicking continue.
1.6 You are now ready to select your origin and destination geographies. Firstly, click on Geography > Select or edit origins to select the origin geography you wish to use. For this task, choose List selection and then select CIDS 1991/2001 common geography - ‘100 zones’.
1.7 Confirm that you wish to use this geography and then click the Add all areas button as we are going to examine the numbers of students migrating from all 100 zones to our zone of choice.
1.8 Now you will need to select a destination zone. Click select Destinations > List selection and Confirm that you wish to proceed using the same geography.
1.9 Now you may select any destination that interests you. Once you have selected your destination zone by ticking the box next to it, click Add chosen areas to add the area to your destination selection.
1.10 At this point you may wish to click on the Query button at the very top of the page to view the summary of your current query. You should see that 100 origins are selected along with the one destination you have chosen. There should also be a green light on the Geography tab, indicating that you can proceed with data extraction.
1.11 If you are happy with your query, you should click the Run tab to extract your data and then Continue to output pages selecting Tabular output.
1.12 On the tabular output plan page, you should select Origin-destination pair list as the output layout and Comma separated values as the output format. Click on Preview output and download file to download your data selection. Call your file ‘StudentMigrations.csv’ before clicking the button to download the file to a specified directory of your choice.
1.13 Once downloaded, open the file and delete the first four rows, replacing the headings in the first row above the data with the words ‘Origin’, ‘Destination’ and ‘Total students’. Scroll to the bottom of your file and delete the notes and source information as it is not required. Save the file as an Excel (.xls) file and close it.
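If you prefer to script the tidying described in step 1.13 rather than edit the file by hand, the Python sketch below shows one possible approach. The layout of the downloaded file is an assumption here (four preamble rows, one heading row, three-column data rows, then notes), and the function name and sample text are hypothetical, so check them against your own download before relying on it.

```python
import csv
import io


def clean_wicid_pair_list(raw_text, header_rows=4):
    """Drop the preamble and trailing notes from a pair-list CSV.

    Assumption: data rows have exactly three fields and a numeric third
    field; every other row after the preamble is treated as a heading or
    note and skipped.
    """
    rows = list(csv.reader(io.StringIO(raw_text)))[header_rows:]
    cleaned = [["Origin", "Destination", "Total students"]]
    for row in rows:
        if len(row) == 3 and row[2].strip().isdigit():
            cleaned.append([field.strip() for field in row])
    return cleaned


# Hypothetical sample in the assumed layout; not real WICID output.
sample = """WICID output
Dataset: 2001 SMS Level 1
Query: student flows

origin,destination,derived total
Zone A,Leeds,120
Zone B,Leeds,45

Notes: counts are illustrative only
"""

for row in clean_wicid_pair_list(sample):
    print(row)
```

The cleaned rows could then be written back out with csv.writer and opened in a spreadsheet for saving as an .xls file.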
Part 2: Mapping the Information Using MapInfo
2.1 Start MapInfo and Open the file ‘UK100Zones.TAB’ which can be found here: http://cider.census.ac.uk/cider/training/uk100zones.zip. Also open your ‘StudentMigrations.xls’ file and save a copy in MapInfo as ‘StudentMigrations.TAB’ (in newer versions of MapInfo the .TAB file is created automatically when you open the Excel file, so you may not have to save a copy).
Obtaining Zone Centroid Coordinates
2.2 For this mapping exercise you will need to find the centroid of each of the 100 Zones. To do this, you will need to select Tools > Coordinate extractor > Extract coordinates (if the coordinate extractor is not already listed, you will need to go into the Tool manager and tick the box next to coordinate extractor to add it to your list).
2.3 You will need to extract the coordinates from the UK100Zones table. Check that the projection is set to ‘British National Grid’, then click the Create new columns to hold coordinates… button and call the new table columns ‘OriginX’ and ‘OriginY’. Once you click OK, you should be able to view the table with the X and Y coordinates created for each of the 100 Zones.
Adding the Attribute Data to the Boundary Data
2.4 To combine your attribute and boundary data into one mappable table, you will need to use the SQL select function in MapInfo. Go to Query > SQL Select. When the module opens you should select all columns (*) from the tables StudentMigrations and UK100Zones where condition column StudentMigrations.origin = column UK100Zones.Cname. When you click OK your two tables should now be joined together into a single table.
2.5 Save the query as a new table called ‘StudentMigrationFlows.TAB’ and then close all tables.
Creating Destination Zone Coordinates
2.6 Open your new StudentMigrationFlows.TAB file. Also open the UK100Zones file. To begin with, you will need to add two new columns to your StudentMigrationFlows table. To do this, go to Table > Maintenance > Table Structure and select the StudentMigrationFlows table. Add two fields named DestX and DestY of the type ‘Float’. Click OK and then re-open the table. You will see that two new empty columns have been added to the end of your table.
2.7 To create the destination coordinates in these columns, do the following: Select Table > Update column. Firstly, update your new DestX column, and get the value from your UK100Zones table. You will need to join your tables where destination in your StudentMigrationFlows table matches Cname in the UK100Zones table. In this case, you are copying the value of OriginX from where your destination appears in the origin column, so select calculate the value of OriginX, before clicking OK. You should now see your DestX column updated with the X coordinates from your selected destination. To fill your DestY column with coordinates, follow the previous step, substituting OriginY for OriginX where appropriate. Once done, save your table.
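The joins performed in steps 2.4 to 2.7 amount to looking up each zone name twice in the zone table: once for the origin and once for the destination. The Python sketch below illustrates that logic outside MapInfo. The zone names, coordinates and flow values are invented, and the function name is hypothetical.

```python
# Hypothetical zone centroids keyed by zone name (the Cname column),
# given as British National Grid eastings and northings.
zones = {
    "Leeds": (430000.0, 434000.0),
    "York": (460000.0, 452000.0),
    "Hull": (509000.0, 429000.0),
}

# Hypothetical flow records: (origin, destination, total students).
flows = [("York", "Leeds", 310), ("Hull", "Leeds", 120)]


def attach_coordinates(flow_records, zone_centroids):
    """Add origin and destination centroid coordinates to each flow record."""
    table = []
    for origin, destination, total in flow_records:
        origin_x, origin_y = zone_centroids[origin]    # join on origin = Cname
        dest_x, dest_y = zone_centroids[destination]   # join on destination = Cname
        table.append({
            "Origin": origin,
            "Destination": destination,
            "Total students": total,
            "OriginX": origin_x,
            "OriginY": origin_y,
            "DestX": dest_x,
            "DestY": dest_y,
        })
    return table


for record in attach_coordinates(flows, zones):
    print(record)
```

Because there is only one destination in this exercise, every record ends up with the same DestX and DestY values, which is what you should also see in your MapInfo table.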
Mapping the Migration Flows
2.8 Your StudentMigrationFlows table should now contain the UK100Zones boundaries, the data you wish to map and the coordinates of the zone centroids for both the origins and destination you want to map. You now will need to make a copy of StudentMigrationFlows, and call it something like ‘StudentMigrationFlows1’. To do this, use the Save copy as… option in the file menu. You are doing this as it is this version of the table you are going to make the flow lines from.
2.9 Close all tables and open your new StudentMigrationFlows1 table. Go to the Options menu and select Show MapBasic Window. Once the MapBasic window has opened, to create the flow lines, type the following:
Update StudentMigrationFlows1 set obj = CreateLine(OriginX, OriginY, DestX, DestY)
2.10 Once you hit the return button, you should see lines created from the centroid of each of the 100 zone origins to the centroid of your chosen destination. You may find that the map window remains blank. If this occurs, it is because the Map Window settings are incorrect. You can solve this problem by going to the Options menu > Preferences > Map Window and setting both the Table projection and Session Projection to British Coordinate Systems and British National Grid. Then run your MapBasic command again and the flows should appear in the Map Window.
2.11 Currently you will only see the lines displayed on your map. To display the UK100Zones boundaries with your map, you should reopen either the StudentMigrationFlows table or the UK100Zones table. Doing this should automatically add the zone boundaries to your map with flow lines.
2.12 The next stage is to produce a thematic map so that the flow lines are displayed with differing thickness depending on the flow size. Go to the Map menu and select Create thematic map. You should choose Line Ranges, Varying Width style. In step 2, select StudentMigrationFlows1 and choose a variable you wish to see mapped – in this case your only choice is likely to be ‘Total students’. Click Next to go to step 3 where you can customise your map (for example you may wish to alter the colour of some of the lines to add additional clarity to your map, or change the legend). Click OK and a flow map will appear which allows easy interpretation of where students are migrating from to reach your chosen area. Figure A.1 shows the distribution of student flows to Leeds from 100 zones in the rest of England in 2000-01.
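To make the idea behind steps 2.9 and 2.12 concrete, the Python sketch below builds a straight line from each origin centroid to the destination centroid and gives it a width that grows with the flow. It is only an illustration: it uses continuous linear scaling, whereas MapInfo's Line Ranges style groups flows into classes, and the coordinates, flow values and width limits are all invented.

```python
def flow_lines(records, min_width=1.0, max_width=7.0):
    """Return (start, end, width) for each flow record.

    Each record is (origin_x, origin_y, dest_x, dest_y, total).
    Widths are scaled linearly between min_width and max_width.
    """
    totals = [record[4] for record in records]
    low, high = min(totals), max(totals)
    spread = (high - low) or 1  # avoid dividing by zero when all flows are equal
    lines = []
    for origin_x, origin_y, dest_x, dest_y, total in records:
        width = min_width + (max_width - min_width) * (total - low) / spread
        lines.append(((origin_x, origin_y), (dest_x, dest_y), width))
    return lines


# Hypothetical records: three origins sending students to one destination.
example = [
    (0.0, 0.0, 10.0, 10.0, 10),
    (5.0, 0.0, 10.0, 10.0, 55),
    (0.0, 5.0, 10.0, 10.0, 100),
]

for start, end, width in flow_lines(example):
    print(start, "->", end, "width", width)
```

The smallest flow receives the minimum width and the largest the maximum, so the busiest origin stands out on the map in the same way as in Figure A.1.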
Figure A.1. An example flow map showing student flows to Leeds from 100 zones areas in England
ACTIVITY 2: ANALYSING REGIONAL MIGRATION FLOWS IN BRITAIN USING FLOWMAP SOFTWARE
Activity 1 made use of proprietary software in the form of MapInfo to analyse flow data. We recognise that not every user of interaction data has access to commercially developed GIS packages, which are sometimes expensive. Therefore the next two activities make use of freely available software to analyse flow data. In this activity, we make use of Flowmap – a GIS package designed specifically to handle data related to origin and destination pairs. Flowmap can be downloaded from: http://flowmap.geog.uu.nl/. Download and install Flowmap on your computer if you wish to attempt this activity.
Part 1: Data Extraction from WICID
For this exercise you can choose any migration variable you wish.
1.1 Log into WICID. Click on continue and you should be taken to the general query interface to start building your query.
1.2 Start by clicking on the Data tab. When selecting data in this exercise you can try the Select by variable option. This facility allows you to select variables that fall under any of the categories listed. By clicking on one of the categories, the list will be expanded to include all of the datasets and tables that feature variables in this category.
1.3 You will see that datasets are identified by the census years they are associated with, as well as by an SMS, SWS or STS identifier, indicating whether the data are for migration or commuting. In this exercise, we will be using post-1999 Government Office Regions as our geography, so choose a 2001 data set. We will also be looking at migration flows, so choose an ‘SMS’ set (although if you wanted to look at inter-regional commuting, this would be possible), remembering that level 1 data are likely to be more accurate than level 2 or 3 data.
1.4 Click the table you wish to take data from. Once you are on the variable selection screen for your chosen table, choose one variable you wish to analyse (or alternatively a new variable derived from a number of variables – see Activity 1 for instructions on deriving new variables).
1.5 Once you have selected your variable, click Add selected cells and then click the Geography tab to begin selecting your origins and destinations.
1.6 Firstly, click Select or edit origins to select the origin geography you wish to use. For this task choose List selection and then select UK Government Office Regions (1999-).
1.7 Confirm that you wish to use this geography and then tick areas 1-11 (Government Office Regions in England plus Scotland and Wales) before clicking the Add chosen areas button.
1.8 Now you will need to select destination zones. Click select Destinations > Copy selection and click the Origins -> Destinations button to select your destinations to equal your origins.
1.9 At this point you may wish to click on the Query button at the very top of the page to view the summary of your current query. You should see that 11 origins are selected along with the same 11 destinations. Your data item will also be shown. There should now be a green light on the Geography tab, indicating that you can proceed with data extraction.
1.11 If you are happy with your query, you should click the Run tab to extract your data and then Continue to output pages selecting tabular output.
1.12 On the tabular output plan page you should select Origin-destination pair list as the output layout and comma separated values as the output format. Click on Preview output and download to download your data selection. Call your file ‘RegFlow2.csv’ before clicking the button to download the file to a specified directory of your choice.
1.13 Once downloaded, open the file and delete the first four rows, replacing the headings in the first row above the data with the words ‘label1’, ‘label2’ and ‘score’. Scroll to the bottom of your file and delete the notes and source information as it is not required.
1.14 You should now save your file in .dbf format (dbf III if you have a choice) with the same file name (e.g. RegFlow2.dbf).
Notes
(i) If you are using Excel 2007, you will not be able to save in .dbf format. To get around this you will need to save your data as an Excel (.xls or .xlsx) file, and open this file in MS Access. In Access you will be able to save or export the file as a .dbf.
(ii) For Flowmap to recognise the .dbf data file, the file name must be in an exact format. It MUST consist of 7 characters – no more, no less – followed by ‘2.dbf’ (as in RegFlow2.dbf). Any file name not in this precise format will not be recognised by the program.
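A quick way to check a file name before loading it is sketched below in Python. It reads the rule in note (ii) as seven characters followed by ‘2.dbf’, which is how RegFlow2.dbf satisfies it. That reading, the pattern and the function name are assumptions made for illustration and are not part of Flowmap itself.

```python
import re

# Assumed pattern: exactly seven characters, then the digit 2, then ".dbf".
FLOW_FILE_PATTERN = re.compile(r".{7}2\.dbf", re.IGNORECASE)


def is_valid_flow_file_name(name):
    """Return True if the name matches the assumed Flowmap flow-file pattern."""
    return FLOW_FILE_PATTERN.fullmatch(name) is not None


for candidate in ["RegFlow2.dbf", "Flows2.dbf", "RegionalFlow2.dbf"]:
    print(candidate, is_valid_flow_file_name(candidate))
```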
Part 2: Mapping the Data in Flowmap
Preparing the Files
2.1 First you will need to place a number of files into a project directory for Flowmap to read. These additional files can be found on the CIDER website at: http://cider.census.ac.uk/cider/training/training.php under Additional data for use with Flowmap practical.
2.2 Download these files and unzip them into a directory of your choice (preferably the default C:\Program Files\FLOWMAP directory, but any will suffice). The unzipped file should contain two files: GB_Rgion.BNA and Region01.DBF. To the directory where these two files are now located, also add the RegFlow2.dbf file you created in the last section.
2.3 Start Flowmap. The first thing that you will need to do is create a map file from the .BNA import file you have just unzipped. Go to File > Convert Files > BNA > BNA -> Flowmap and locate the GB_Rgion.BNA file. Flowmap should then give you a default file name for the new map file. This will probably be GB_Rgion.006. Click Save. It may ask you to select vector type. Choose Polygon. It may also find some topological errors in the file. This is not a problem for this exercise and you should just choose the Continue regardless option.
2.4 Select File > New Project. The New Project window (Figure A.2) will open. You might need to change your workspace directory so it matches the directory you have just unzipped the additional files into.
2.5 You should now add the files you have created and downloaded to the project. As above, the map file will be GB_Rgion.006; the origin and destination files will be Region01.dbf; and the flow file will be RegFlow2.dbf (with origin set as LABEL1, destination as LABEL2 and flow size as SCORE). You should also tick the three boxes under View Settings and click the Set button. Click Save as and save your project.
Figure A.2. New Project window in Flowmap
Mapping the Data
2.6 Now a map of your flow data can be created. To draw the base map go to Maps > Advanced Display > Draw Map File > Draw Edges/Lines > Uniform Drawing and select an appropriate line style. This should draw your base map.
2.7 Before drawing the flow lines, it is useful to adjust the symbology so that larger flows are more clearly distinguished from smaller flows. To do this go to File > Edit Symbology Settings > Customise Flow Symbology settings and click on one of the triangle symbols. Set the thickness to Max and click Ok.
2.8 To draw the flows go to Maps > Advanced Display > Draw Desire Lines from Flow File and select the symbol you have just adjusted. You may also wish to Set intrazonal interaction as symbols to display intra-zonal flows. Your finished flow map should resemble the example in Figure A.3.
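The distinction drawn in step 2.8 between desire lines and intra-zonal symbols rests on whether a flow starts and ends in the same region. The Python sketch below separates the two kinds of record in a pair list laid out as label1, label2, score. The region names and scores are invented and the function name is hypothetical.

```python
def split_flows(pairs):
    """Separate intra-zonal flows (origin equals destination) from inter-zonal ones.

    Each pair is (label1, label2, score).
    """
    intra = {origin: score for origin, dest, score in pairs if origin == dest}
    inter = {(origin, dest): score for origin, dest, score in pairs if origin != dest}
    return intra, inter


# Hypothetical pair list for two regions.
pairs = [
    ("North East", "North East", 900),
    ("North East", "London", 150),
    ("London", "North East", 80),
    ("London", "London", 4000),
]

intra, inter = split_flows(pairs)
print("Intra-zonal:", intra)
print("Inter-zonal:", inter)
```

Inter-zonal records are the ones that become lines between regions, while intra-zonal records can only be shown as symbols placed on the region itself.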
Figure A.3. Intra- and inter-region flows depicted using Flowmap
ACTIVITY 3: DISPLAYING INTERACTION DATA USING OPEN SOURCE DATABASE AND GIS SOFTWARE: MAPPING THE PREFERRED METHOD OF TRAVEL TO WORK FOR OUTPUT AREAS IN LONDON
In this activity, you will be introduced to two very powerful pieces of free (open source) software: PostgreSQL and Quantum GIS. PostgreSQL is a relational database management system which, importantly, has a spatial extension known as PostGIS. For this exercise you will need to install PostgreSQL with the PostGIS extension. PostgreSQL along with PostGIS can be downloaded and installed from the PostgreSQL website: http://www.postgresql.org/
Note: When you install the program, if you have not used it before it is recommended that you install to the host localhost, on the default port 5432, with a username postgres and a password postgres.
Quantum GIS (QGIS) is a user friendly, graphical interfaced Geographical Information System developed by a community of open source developers. QGIS supports vector, raster and database formats and, importantly for this exercise, has direct support for PostGIS spatially enabled database tables. QGIS can be downloaded from http://www.qgis.org/ - the version used in this example is version 0.11.0. Newer versions of the software may now be available.
Part 1: Downloading your Data from WICID and UKBORDERS
This section will go through, in detail, the process of selecting output area boundary data from UKBORDERS for all output areas in London. It will also describe extracting from WICID 2001 commuting data by all methods of transport (with car passenger and car driver aggregated into one variable) from all output areas as an origin and the London region as a destination (downloaded as a paired list). If this is a familiar process, attempt it yourself and move straight to Part 2; otherwise follow these instructions:
1.1 Customised boundary data downloads can take some time from UKBORDERS, so we will go through this process first in order to get the boundaries by the end of Part 1. Go to the UKBORDERS website http://edina.ac.uk/ukborders/ and log in.
1.2 Choose the Boundary Data Selector and select England > Census boundaries > Post 1999 and click Find. Click English output areas > List areas, then click Greater London > Expand. You now see that the search summary has your target geography as English Output Areas and the selected area as Greater London. Check that the data format is ESRI Shapefile. If it is not, use the Format tab at the top of the page to change the format. Click Extract Boundary Data. You will now be taken to the data extraction screen. It is normal that because of the large number of boundaries being extracted you will be taken to the UKBORDERS bookmarking facility. Bookmark this page in your browser and return to it at the end of Part 1.
1.3 Log into WICID. Click on continue and you should be taken to the general query interface to start building your query.
1.4 Start by clicking on the Data tab. When selecting data in this exercise, we will Select by dataset and table > Commuting data > 2001, and use the 2001 SWS level 3 dataset > Table 1 – Method of travel to work.
1.5 We want to select all persons and all methods of travel, so check boxes 4, 7, 10, 13, …, 34. Once checked, click the Add selected cells button. The status at the top of the page should inform you that eleven data items have been selected. Two of these items (car driver and car passenger) will need to be aggregated into a single item, however, before we continue.
1.6 At the bottom of the page click on the Derive new variables link. This will take you to a new page where the two variables can be selected. Tick the boxes next to car passenger and car driver and click Continue. Call your new variable ‘car’ and check the option to remove the old variables from the list before clicking Continue.
1.7 You are now ready to select your geographies. Click the Geography tab and select origins first. Choose List selection > All geographies > UK Output Areas 2001 > Select a higher level geography > UK Government Office Regions (1999-) > Confirm that you wish to proceed with this geography. Now, by ticking London and clicking the Add chosen areas button you will select all 24,140 output areas within London.
1.8 You can now select your destination. Click Select destinations > List selection > All geographies > UK Government Office Regions (1999-) > Confirm and check London before clicking Add chosen areas. This should select just the one area, London, as your destination.
1.9 You can now click the Run tab to execute your query. Once the query is complete select Continue > Tabular output. On the tabular output planner page make sure that your output layout is a pair list and the format is Comma separated values. Download your file and save it to an appropriate location.
1.10 If you return to your UKBORDERS bookmark, your data should now have extracted and you should be able to download the zip file containing your boundary data. Download this to the same location as your commuting data file.
Part 2: Preparing the Data for Mapping
Preparing Your Data from the Raw Download File
2.1 Open the .csv file you downloaded from WICID. Column A should contain the codes of the output area origins, column B will just contain ‘London’ as a destination and columns C to L should contain counts of commuters by their mode of travel for each output area.
2.2 We now want to convert this data so that, for each output area, we know the most popular method of travel to work. To do this we need the column reference of the method of travel with the highest count for each output area. Assuming that C5 is the first cell with numeric data, type the following formula into cell M5:
=IF(C5=MAX($C5:$L5),COLUMN(),0)
This formula compares the count in C5 with the maximum value between cells C5 and L5. If C5 holds the maximum, the formula returns the numeric column reference of the cell containing the formula (13 for column M); otherwise it returns 0. The $ symbols fix the range to columns C to L, enabling the formula to be copied across and down to other cells.
2.3 Copy this formula to each cell until cell V5. You should find that each cell contains a value 0 except for T5, which contains ‘20’. Highlight this new row of values from M5 to V5. Double click the small black square that appears in the bottom right corner of the selection and the formulas should be copied all the way down to the last row of data.
2.4 The next empty cell should be W5. In this cell type the following formula:
=MAX(M5:V5)
In the same way that you copied the last formula down all rows by double clicking on the small black square in the corner of the cell, copy this formula to all rows in column W. Column W will now contain a reference to the most popular method of travel to work for all output areas in London. You might want to make a note of the method each number relates to, e.g. 13 – Home worker, 14 – Underground et cetera.
2.5 Open a new Excel spreadsheet and copy the output area reference codes from column A and the values from column W, pasting them into columns A and B of the new spreadsheet. Paste column W as values (Paste Special > Values) rather than as formulas, because the formulas refer to cells that do not exist in the new spreadsheet. Do not include a header row. You should now have values in every row down to 24140. Go to Save As and save a copy of the file as Text (Tab delimited) (*.txt) under the name london_commuting.txt (click ‘yes’ if any pop-ups appear).
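Steps 2.2 to 2.5 can also be reproduced outside the spreadsheet. The following Python sketch is one possible way of doing so and is not part of the original exercise. It assumes the layout described in step 2.1 (origin code in column A, destination in column B, ten counts in columns C to L) and that the data begin on row 5, so four header rows are skipped. The function names and the sample rows are hypothetical.

```python
import csv
import io

# Assumption from step 2.2: the first four rows of the download are headers,
# so the counts start on spreadsheet row 5.
HEADER_ROWS = 4
# Column M (13) is the helper cell for column C (list index 0), up to V (22).
FIRST_HELPER_COLUMN = 13


def modal_method(counts):
    """Return the helper-column number (13 to 22) of the largest count.

    This mirrors =IF(C5=MAX($C5:$L5),COLUMN(),0) copied across M5:V5 and
    =MAX(M5:V5) in W5: every tied maximum is flagged and the right-most wins.
    """
    highest = max(counts)
    flagged = [
        FIRST_HELPER_COLUMN + index
        for index, count in enumerate(counts)
        if count == highest
    ]
    return max(flagged)


def to_tab_delimited(csv_text, header_rows=HEADER_ROWS):
    """Build the two-column, header-free text described in step 2.5."""
    rows = list(csv.reader(io.StringIO(csv_text)))[header_rows:]
    lines = []
    for row in rows:
        if len(row) < 12:
            continue  # skip blank or footer rows that lack ten count columns
        counts = [int(value) for value in row[2:12]]
        lines.append(f"{row[0]}\t{modal_method(counts)}")
    return "\n".join(lines)


# Hypothetical sample: four header rows, then two output areas.
sample = "\n".join([
    "header 1", "header 2", "header 3", "header 4",
    "00AAFA0001,London,1,2,3,4,5,6,7,99,0,0",
    "00AAFA0002,London,50,2,3,4,5,6,7,9,0,0",
])
print(to_tab_delimited(sample))
```

As in the spreadsheet, ties resolve to the right-most column, and an output area with no commuters at all is assigned 22. You may want to inspect such rows before mapping them.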
Adding your Data to PostGIS / PostgreSQL
2.6 You now need to set up a PostGIS-enabled PostgreSQL database to store this data in. Go to pgAdmin to start Postgres and double-click on the localhost server you set up when you installed the program. You may be prompted for a password; if you set up the installation as recommended at the beginning of this exercise, this will be postgres. You will now need to set up a new database on this localhost server. Click Edit > New object > New database. Call your database commuting, but make sure you use postgis as a template. Leave the encoding as SQL_ASCII.
2.7 After a moment pgAdmin will have created a new database into which you will be able to put both the interaction data you have just prepared and the boundary data downloaded earlier. Click on commuting and then the + symbol. Next expand + Schemas, + Public and + Tables; this should reveal the six tables, including a table called geometry_columns. If this table exists then your database is spatially enabled.
2.8 We will now copy the london_commuting.txt data into a table called london_commuting in the commuting database. Click the SQL icon in pgAdmin to open up an SQL window. In the box type the following:
CREATE TABLE london_commuting (
    orig_label text,
    method_travel integer
);
COPY london_commuting FROM 'C:/london_commuting.txt';
Now click the green play button to run the query. Note: You will probably need to change the drive and the path to london_commuting.txt to the location where it is stored on your computer. If any error messages appear, check that the london_commuting.txt file has only two columns of data and no column headers. If you click the refresh button in pgAdmin you will see that your new table has appeared. By right-clicking on this table you should be able to select View data > View all rows to see the table.
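The check suggested above (two columns, no header row) can be automated before running COPY. The sketch below is an illustration only: the function name is hypothetical, and it assumes the tab-delimited layout produced in step 2.5, with an integer in the second column to match the method_travel column.

```python
def check_copy_input(text):
    """Return a list of problems likely to make COPY fail for (text, integer) rows."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                f"line {number}: expected 2 columns, found {len(fields)}"
            )
            continue
        try:
            int(fields[1])
        except ValueError:
            # A leftover header row such as 'orig_label<TAB>method_travel' ends up here.
            problems.append(f"line {number}: {fields[1]!r} is not an integer")
    return problems


# Hypothetical file contents: one header row left in by mistake, then one good row.
sample_text = "orig_label\tmethod_travel\n00AAFA0001\t20\n"
for problem in check_copy_input(sample_text):
    print(problem)
```

An empty list means the file has the shape the table expects. It does not guarantee that COPY will succeed, since the path and file permissions on the server can still cause errors.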
Adding your Boundary Data
2.9 We can now add the boundary data downloaded from UKBORDERS to our new commuting PostGIS database. If you have not already done so, un-zip the file downloaded from UKBORDERS. Start Quantum GIS.
2.10 Once QGIS has started, go to Plugins > Plugin manager, check the box next to SPIT and click OK. You are now able to click on the SPIT icon in QGIS to bring up the box shown in Figure A.4. Click New and create a new PostGIS connection using the options shown in Figure A.4.
2.11 Once you have created a connection to your PostGIS database you will need to click the Connect button to set up SPIT for importing the shapefile. After clicking Connect, click Add and navigate to the shapefile containing the boundaries of output areas in London. SPIT will scan the file and then add it to the selection window. Click OK to import the file to your commuting PostGIS database. Note: This may take a few minutes.
Figure A.4. Shapefile to PostGIS Import Tool in QGIS
2.12 If you click the refresh button in pgAdmin you should now see your new boundary data table. This table can be mapped in QGIS, but at the moment your london_commuting data cannot. To map this data you will need to create a new table which combines the two tables using a common identifier, in this case the output area code. The quick way to do this is to input the following into a new SQL window:
CREATE TABLE london_commuting_mappable WITH OIDS AS
SELECT *
FROM london_commuting, england_oa_2001
WHERE london_commuting.orig_label = england_oa_2001.ons_label;
Click the run button. If you refresh the tables list in pgAdmin, your new table should appear.
Note: The WITH OIDS clause is key to this new table, as QGIS will not be able to map a table without a unique identifier for each row.
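The WHERE clause in step 2.12 keeps only pairs of rows whose output area codes match, so any code present in one table but not in the other is silently left out of london_commuting_mappable. The Python sketch below illustrates that matching logic on a few hypothetical rows. It is not how PostgreSQL executes the query, the function names are invented for illustration, and it assumes that each ons_label appears only once in the boundary table.

```python
def join_on_output_area(commuting_rows, boundary_rows):
    """Pair each commuting row with the boundary row that has the same code."""
    # Assumption: ons_label is unique, so a dictionary lookup is sufficient.
    boundary_by_code = {row["ons_label"]: row for row in boundary_rows}
    joined = []
    for row in commuting_rows:
        match = boundary_by_code.get(row["orig_label"])
        if match is not None:
            joined.append({**row, **match})
    return joined


def unmatched_codes(commuting_rows, boundary_rows):
    """List commuting codes with no boundary; these would be absent from the map."""
    known = {row["ons_label"] for row in boundary_rows}
    return [
        row["orig_label"]
        for row in commuting_rows
        if row["orig_label"] not in known
    ]


# Hypothetical rows; "geom" stands in for the real geometry column.
commuting = [
    {"orig_label": "00AAFA0001", "method_travel": 20},
    {"orig_label": "00AAFA0002", "method_travel": 13},
]
boundaries = [{"ons_label": "00AAFA0001", "geom": "POLYGON(...)"}]

print(join_on_output_area(commuting, boundaries))
print(unmatched_codes(commuting, boundaries))
```

If the mappable table ends up with fewer rows than london_commuting, a comparison of this kind is one way to find out which output area codes failed to match.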
Part 3: Mapping your Data
3.1 Now that you have set up your data, you are able to produce a map using QGIS. Go to Layer > Add a PostGIS layer. A connection to commuting should have already been set up when you imported the shapefile in the last section. Select this database from the dropdown list and click Connect. Click on the london_commuting_mappable layer and then click Add. A plain map of all output areas in London should appear.
3.2 To create a thematic map of your data, right-click the layer in the Legend window and select Properties. A box similar to the one in Figure A.5 should appear. In the Symbology tab change legend type to Unique value and classification field to method_travel. Click Classify and the seven unique values, one for each method of travel that is the most popular in at least one output area, should appear. Hold down Shift and highlight all seven before changing Outline style in the Style options to No Pen. Click Apply and OK and your thematic map should now be drawn.
3.3 You can now edit your map and legend, adding additional layers et cetera, so that an end product similar to the one shown in Figure A.6 can be produced.
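Before classifying in step 3.2, it can be useful to know which method codes to expect in the legend. The following sketch, which assumes the two-column tab-delimited file from step 2.5 and uses a hypothetical function name and sample rows, counts how many output areas fall under each code.

```python
from collections import Counter


def method_counts(tab_text):
    """Count output areas per method code in 'code<TAB>method' text."""
    counts = Counter()
    for line in tab_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        _code, method = line.split("\t")
        counts[int(method)] += 1
    return counts


# Hypothetical rows in the format of london_commuting.txt.
sample = "00AAFA0001\t20\n00AAFA0002\t13\n00AAFA0003\t20\n"
for method, total in sorted(method_counts(sample).items()):
    print(method, total)
```

The number of distinct codes reported this way should correspond to the number of classes listed after you click Classify.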
Figure A.5. Layer Properties window in QGIS
Figure A.6. Preferred method of travel by output area in London, 2001
Compilation of References
(1992). InStillwell, J. C. H., Rees, P. H., & Boden, P. (Eds.). Migration Processes and Patterns: Vol. 2. Population Redistribution in the United Kingdom. London: Belhaven Press. (1995). InHills, J. (Ed.). Inquiry into Income and Wealth: Vol. 2. A Survey of the Evidence. York, UK: Joseph Rowntree Foundation. Agresti, A. (2002). Categorical Data Analysis. Chichester, UK: John Wiley. doi:10.1002/0471249688 Al-Hamad, A., Hayes, L., & Flowerdew, R. (1997). Migration of the elderly to join existing households, evidence from the Household SAR. Environment & Planning A, 29(7), 1243–1255. doi:10.1068/a291243 Allison, P. D. (1978). Measures of inequality. American Sociological Review, 43, 865–881. doi:10.2307/2094626 Alvanides, S., Boyle, P. J., Duke-Williams, O., Openshaw, S., & Turton, I. (1996). Modelling migration in England and Wales at the ward level and the problem of estimating inter-ward distances, Geocomputation Conference, Leeds University, Leeds, September 17-19. Atkins, D., Charlton, M., Dorling, D., & Wymer, C. (1993). Connecting the 1981 and 1991 Censuses, NERRL Research Report 93/9, University of Newcastle: Newcastle-upon-Tyne.
Barclay, P. (Chair). (1995). Inquiry into Income and Wealth, Volume 1: Report. York, UK: Joseph Rowntree Foundation. Bates, J., & Bracken, I. (1982). Estimation of migration profiles in England and Wales. Environment & Planning A, 14(7), 889–900. doi:10.1068/a140889 Bates, J., & Bracken, I. (1987). Migration age profiles for local authority areas in England, 1971-1981. Environment & Planning A, 19, 521–535. doi:10.1068/a190521 Bauere, V., Densham, P., Millar, J., & Salt, J. (2007). Migration from Central and Eastern Europe: local geographies. Population Trends, 129, 7–20. Baxter, M. (1982). Similarities in methods of estimating spatial interaction models. Geographical Analysis, 14, 267–272. Baydar, N. (1983). Analysis of the temporal stability of migration in the context of multi-regional forecasting. Working Paper No. 38, Netherlands Interuniversity Demographic Institute, Voorburg. Baydar, N. (1984). Issues in multiregional demographic forecasting. Ph.D. Dissertation, Vrije Universiteit Brussel.
Bailey, N., & Livingston, M. (2007). Population turnover and area deprivation. York, UK: Joseph Rowntree Foundation.
Bell, M., Blake, M., Boyle, P., Duke-Williams, O., Rees, P., Stillwell, J., & Hugo, G. (2002). Cross-national comparison of internal migration: issues and measures. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 165, 435–464. doi:10.1111/1467-985X.00247
Bailey, T. C., & Gatrell, A. C. (1995). Interactive Spatial Data Analysis. Harlow, UK: Longman.
Bell, M., Rees, P., Blake, M., & Duke-Williams, O. (1999). An age-period-cohort database of inter-regional migra-
Copyright © 2010, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Compilation of References
tion in Australia and Britain, 1976-96. Working Paper 99/02, School of Geography, University of Leeds. Bentham, G. (1988). Migration and morbidity: implications for geographical studies of disease. Social Science & Medicine, 26, 49–54. doi:10.1016/0277-9536(88)900445 Bibby, P., & Shepherd, J. (2005). Developing a New Classification of Urban and Rural Areas for Policy Purposes: The Methodology. Final Report to DEFRA, DEFRA, London. Birkin, M., Clarke, G., Clarke, M., & Culf, R. (2004). Using Spatial Models to Solve Difficult Retail Location Problems (pp. 35–54). Chichester, UK: John Wiley and Sons. Blackwell, L., Akinwale, B., Antonatos, A., & Haskey, J. (2005). Opportunities for new research using the post-2001 ONS Longitudinal Study. Population Trends, 121, 8–16. Boden, P. (1989). The analysis of internal migration in the United Kingdom using Census and National Health Service Central Register data. Unpublished PhD Thesis, University of Leeds, Leeds.
Bover, O. A. M. (2002). Learning about migration decisions from the migrants, Using complementary datasets to model intra-regional migrations in Spain. Journal of Population Economics, 15(2), 357–380. doi:10.1007/ s001480100066 Boyle, P. (1995). Public housing as a barrier to longdistance migration. International Journal of Population Geography, 1, 147–164. Boyle, P. (1998). Migration and housing tenure in South East England. Environment & Planning A, 30, 855–866. doi:10.1068/a300855 Boyle, P. (2004). Population geography: migration and inequalities in mortality and morbidity. Progress in Human Geography, 28(6), 767–776. doi:10.1191/0309132504ph518pr Boyle, P. J. (1991). A theoretical and empirical examination of local-level migration: the case of Hereford and Worcester, Unpublished PhD Thesis, Lancaster University, Lancaster. Boyle, P. J. (1993). Modelling the relationship between migration and tenure. Transactions of the Institute of British Geographers, 18, 359–376. doi:10.2307/622465
Boden, P., & Stillwell, J. (2006). New migrant labour in Yorkshire and the Humber. The Regional Review for Yorkshire and the Humber, 16(3), 18–20.
Boyle, P. J. (1994). Metropolitan out-migration in England and Wales 1980-81. Urban Studies (Edinburgh, Scotland), 31, 1707–1722. doi:10.1080/00420989420081591
Boden, P., Stillwell, J., & Rees, P. (1992). How good are the NHSCR data? In Stillwell, J., Rees, P. H., & Boden, P. (Eds.), Migration Processes and Patterns Voume 2 Population Redistribution in the United Kingdom (pp. 13–27). London: Belhaven Press.
Boyle, P. J. (1995). Rural in-migration in England and Wales, 1980-81. Journal of Rural Studies, 11, 65–78. doi:10.1016/0743-0167(94)00058-H
Bogue, D. (1959). Internal migration. In Hauser, P., & Duncan, O. D. (Eds.), The Study of Population (pp. 486–509). Chicago: University of Chicago Press. Bohara, A. K., & Krieg, R. G. (1996). A zero-inflated Poisson model of migration frequency. International Regional Science Review, 19, 211–222. Boundary Committee for England. (2008). Draft Proposal for Unitary Local Government in Devon. London: The Boundary Committee for England.
330
Boyle, P. J., Duke-Williams, O., & Gatrell, A. (2001). Do area-level population change, deprivation and variations in deprivation affect self-reported limiting long-term illness? An individual analysis. Social Science & Medicine, 53, 795–799. doi:10.1016/S0277-9536(00)00373-7 Boyle, P., & Feng, Z. (2002). A method for integrating the 1981 and 1991 GB Census interaction data. Computers, Environment and Urban Systems, 26, 241–256. doi:10.1016/S0198-9715(01)00043-6 Boyle, P., & Flowerdew, R. (1993). Modelling sparse interaction matrices: interward migration in Hereford and
Compilation of References
Worcester and the underdispersion problem. Environment & Planning A, 25, 1201–1209. doi:10.1068/a251201
Bulusu, L. (1990). Internal migration in the United Kingdom, 1989. Population Trends, 62, 33–36.
Boyle, P., & Flowerdew, R. (1997). Improving distance estimates between areal units in migration models. Geographical Analysis, 29, 93–107.
Bulusu, L. (1991). A Review of Migration Data Sources. OPCS Occasional Paper 39. London: Office of Population Censuses and Surveys.
Boyle, P., & Shen, J. (1997). Public housing and migration: a multi-level modelling approach. International Journal of Population Geography, 3, 227–242. doi:10.1002/(SICI)1099-1220(199709)3:33.0.CO;2-W
Bulusu, L. (1991). Review of migration data sources. OPCS Occasional Paper 39. London, OPCS.
Boyle, P., Feijten, P., Feng, Z., Hattersley, L., Huang, Z., Nolan, J., & Raab, G. (2008). Cohort Profile: The Scottish Longitudinal Study (SLS). International Journal of Epidemiology, 2008, 1–8. Boyle, P., Halfacree, K., & Robinson, V. (1998). Exploring Contemporary Migration. Harlow, UK: Longman. Boyle, P., Norman, P., & Rees, P. (2002). Does migration exaggerate the relationship between deprivation and limiting long-term illness? A Scottish analysis. Social Science & Medicine, 55, 21–31. doi:10.1016/S02779536(01)00217-9 Bramley, G., Pawson, H., & Third, H. (2000). Low Demand and Unpopular Housing. London: Department of the Environment, Transport and the Regions. Brewer, M., Goodman, A., Shaw, J., & Sibieta, L. (2006). Poverty and Inequality in Britain 2006. London: Institute of Fiscal Studies. Brimblecombe, N., Dorling, D., & Shaw, M. (2000). Migration and geographical inequalities in health in Britain. Social Science & Medicine, 50, 861–878. doi:10.1016/ S0277-9536(99)00371-8 Buck, N., Gershuny, J., Rose, D., & Scott, J. (Eds.). (1994). Changing Households. The British Households Panel Survey 1990-1992. Colchester: ESRC Centre for Micro-Social Change. Bulusu, L. (1989). Migration in 1988. Population Trends, 58, 33–39.
Cadwallader, M. (1989). A synthesis of macro and micro approaches to explaining migration: evidence from inter-state migration in the United States. Geografiska Annaler B, 2, 85–94. doi:10.2307/490517 Cameron, A. C., & Trivedi, P. K. (1998). Regression Analysis of Count Data. Cambridge, UK: Cambridge University Press. Carstairs, V., & Morris, R. (1989). Deprivation and mortality: an alternative to social class? Community Medicine, 11, 210–219. Cattan, N. (2001). Functional Regions: A Summary of Definitions and Usage in OECD Countries, OECD (DT/ TDPC/TI(2001)6), Paris. Champion, A. (2005b). Population movement within the UK. In R. Chappell (Ed.), Focus on People and Migration 2005 Ed. (pp. 91-113). Basingstoke, UK: Palgrave Macmillan. Champion, A. G. (1994). Population change and migration in Britain since 1981: evidence for continuing deconcentration. Environment & Planning A, 10, 1501–1520. doi:10.1068/a261501 Champion, A. G. (1995). Analysis of change through time. In Openshaw, S. (Ed.), Census Users’ Handbook (pp. 7–35). Cambridge, UK: GeoInformation International. Champion, A. G. (1996). Population review: (3) migration into, from and within the United Kingdom. Population Trends, 83, 5–19. Champion, A. G. (2005). Population movement within the UK. In Chappell, R. (Ed.), Focus on People and Migration (pp. 92–114). Basingstoke, UK: Palgrave Macmillan.
331
Compilation of References
Champion, A. G. (Ed.). (1989). Counterurbanisation: The Changing Pace and Nature of Population Deconcentration. London: Edward Arnold. Champion, A. G., & Coombes, M. (2007). Using the 2001 census to study human capital movements affecting Britain’s larger cities: insights and issues. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 170(2), 1–20. doi:10.1111/j.1467-985X.2006.00459.x Champion, A. G., Coombes, M., Raybould, S., & Wymer, C. (2007). Migration and Socio-economic Change: A 2001 Census Analysis of Britain’s Larger Cities. York, UK: Joseph Rowntree Foundation. Champion, A. G., Fotheringham, A. S., Rees, P., Boyle, P. H., & Stillwell, J. C. H. (1998). The Determinants of Migration Flows in England: a Review of Existing Data and Evidence. Department of Geography, University of Newcastle, for the Department of Environment, Transport and the Regions. Champion, A. G., Fotheringham, A. S., Rees, P., Boyle, P. J., & Stillwell, J. C. H. (1998). The Determinants of Migration Flows in England: A Review of Existing Data and Evidence. University of Newcastle upon Tyne, Newcastle upon Tyne. Champion, A., & Coombes, M. (2007). Using the 2001 census to study human capital movements affecting larger cities: insights and issues. Journal of the Royal Statistical Society A, 170(2), 447–467. doi:10.1111/j.1467985X.2006.00459.x Champion, A., Bramley, G., Fotheringham, A., Macgill, J., & Rees, P. (2003). A migration modelling system to support government decision making. In Geertman, S., & Stillwell, J. (Eds.), Planning Support Systems in Practice (pp. 269–290). Heidelberg: Springer. Champion, A., Coombes, M., Raybould, S., & Wymer, C. (2007). Migration and Socioeconomic Change A 2001 Census Analysis of Britain’s Larger Cities. York, UK: Joseph Rowntree Foundation. Champion, A., Fotheringham, S., Boyle, P., Rees, P., & Stillwell, J. (1998). The Determinants of Migration Flows in England: A Review of Existing Data and Evidence. London: DETR. 332
Champion, T. (2001). Urbanisation, suburbanisation, counterurbanisation and reurbanisation. In Paddison, R. (Ed.), Handbook of Urban Studies (pp. 143–161). London: Sage Publications. Champion, T., & Coombes, M. (2007). Using the 2001 census to study the human capital movements affecting Britain’s larger cities: insights and issues. Journal of the Royal Statistical Society A, 170, 447–467. doi:10.1111/ j.1467-985X.2006.00459.x Champion, T., & Fisher, T. (2004). Migration, residential preferences and the changing environment of cities. In Boddy, M., & Parkinson, M. (Eds.), City Matters: Competitiveness, Cohesion and Urban Governance (pp. 111–128). Bristol, UK: Policy Press. Champion, T., Coombes, M., Raybould, S., & Wymer, C. (2007). Migration and Socio-economic Change: A 2001 Census Analysis of Britain’s Larger Cities. Bristol, UK: Policy Press. Champion, T., Fotheringham, S., Rees, P., Boyle, P., & Stillwell, J. (1998). The Determinants of Migration Flows in England: a Review of Existing Data and Evidence, a report prepared for the Department of the Environment, Transport and the Regions. Department of Geography, University of Newcastle upon Tyne, Newcastle. Chappell, R., Vickers, L., & Evans, H. (2000). The use of patient registers to estimate migration. Population Trends, 101, 19–24. Cole, K., Frost, M., & Thomas, F. (2002). Workplace data from the census. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 269–280). Chichester, UK: Wiley. Congdon, P. (1991). An application of general linear modelling to migration in London and the South East. In Stillwell, J. C. H., & Congdon, P. (Eds.), Migration Models: Macro and Micro Perspectives (pp. 113–136). London: Belhaven Press. Congdon, P. (2005). Bayesian Models for Categorical Data. Chichester, UK: Wiley. doi:10.1002/0470092394
Compilation of References
Coombes, M. (2002). Localities and city regions codebook. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System. Chichester, UK: Wiley.
Dale, A. (1998). The value of the SARs in spatial and area-level research. Environment & Planning A, 30, 767–774. doi:10.1068/a300767
Coombes, M. G. (2000). Defining locality boundaries with synthetic data. Environment & Planning A, 32, 1499–1518. doi:10.1068/a29165
Dale, A., & Marsh, C. (Eds.). (1993). The 1991 Census User’s Guide. London: HMSO Publications.
Coombes, M. G., & Bond, S. (2008). Travel-to-Work Areas: The 2007 Review. London: Office for National Statistics.
Dale, A., & Teague, A. (2002). Microdata from the Census: Samples of Anonymised Records. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 203–212). Chichester, UK: John Wiley.
Coombes, M. G., & Openshaw, S. (1982). The use and definition of Travel-to-Work Areas in Great Britain: some comments. Regional Studies, 16, 141–149. doi:10.1080/09595238200185161
Dale, A., Creeser, R., Dodgeon, B., Gleave, S., & Filakti, H. (1993). An introduction to the OPCS Longitudinal Study. Environment & Planning A, 25, 1387–1398. doi:10.1068/a251387
Coombes, M. G., Green, A. E., & Openshaw, S. (1986). An efficient algorithm to generate official statistical reporting areas: the case of the 1984 Travel-to-Work Areas revision in Britain. The Journal of the Operational Research Society, 37, 943–953.
Dale, A., Fieldhouse, E., & Holdsworth, C. (2000). Analysing Census Microdata. London: Arnold.
Courgeau, D. (1973). Migrants and migrations. Population, 1, 96–129. Courgeau, D. (1976). Quantitative, demographic and geographic approaches to internal migration. Environment & Planning A, 8, 261–269. doi:10.1068/a080261 Cox, L. (1987). A constructive procedure for unbiased controlled rounding. Journal of the American Statistical Association, 82(398), 520–524. doi:10.2307/2289455 Creeser, R., Dodgeon, B., Joshi, H., & Smith, J. (2002). The ONS Longitudinal Study: linked census and event data to 2001. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 221–229). Chichester, UK: John Wiley. Daily Mail. (2008, September 28). Middle classes leading the flight as 250,000 quit London. Daily Mail. Retrieved from http://www.dailymail.co.uk/news/article-1062314/ Middle-classes-leading-flight-250-000-quit-London. html Dale, A. (1993). The OPCS Longitudinal Study. In Dale, A., & Marsh, C. (Eds.), The 1991 Census User’s Guide (pp. 312–329). London: HMSO.
Davies, E., Williamson, P., & Houldsworth, C. (2006). The Leaving of Liverpool: An Examination into the Migratory Characteristics of Liverpool. Mimeo, University of Liverpool. Davis, L. (1991). Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold. DCSF. (2008, March 3). Secondary Applications and Offers – National Offer Day. Department for Children, Schools and Families, London. Retrieved from http:// www.dfes.gov.uk/ rsgateway/DB/STA/t000791/AdmissionsStatisticalReportNon-nationalStatsRevised.pdf Demie, F. (2002). Pupil mobility and education in schools: an empirical analysis. Educational Research, 44(2), 197–215. doi:10.1080/00131880210135304 Denham, C., & Rhind, D. (1983). The 1981 census and its results. In Rhind, D. (Ed.), A Census User’s Handbook (pp. 17–88). London: Methuen. Dennett, A., & Stillwell, J. (2008a). Internal migration in Great Britain – a district level analysis using 2001 data. Working Paper 08/1, School of Geography, University of Leeds, Leeds. Dennett, A., & Stillwell, J. (2008b). Population turnover and churn - enhancing understanding of internal migra-
333
Compilation of References
tion in Britain through measures of stability. Population Trends, 134, 24–41. Dennett, A., Duke-Williams, O., & Stillwell, J. (2007). Interaction data sets in the UK: an audit. Working Paper 07/05, School of Geography, University of Leeds, Leeds. Devis, T. (1984). Population movements measured by the NHSCR. Population Trends, 36, 15–20. Devis, T., & Mills, I. (1986). A comparison a migration data from the National Health Service Central Register and the 1981 Census. OPCS Occasional Paper 35, OPCS, London. DfES. (2007). School Admissions Code. London: The Stationary Office. Diplock, G. J. (1996). The Application of Evolutionary Computing Techniques to Spatial Interaction Modelling. Ph.D. Thesis, School of Geography, University of Leeds, Leeds.
Duke-Williams, O., & Rees, P. (1993). TIMMIG: a program for extracting migration time series tables. Working Paper 93/13, School of Geography, University of Leeds, Leeds. Duke-Williams, O., & Stillwell, J. (2002). Web-based access to complex UK census datasets. In IASSIST 2002 Conference, University of Connecticut, CT. Duke-Williams, O., & Stillwell, J. (2007). Investigating the potential effects of small cell adjustment on interaction data from the 2001 Census. Environment & Planning A, 39, 1079–1100. doi:10.1068/a38143 Dustmann, C., & Faber, F. (2005). Immigrants in the British labour market. Fiscal Studies, 26(4), 423–470. doi:10.1111/j.1475-5890.2005.00019.x Dustmann, C., Fabbri, F., & Preston, I. (2005). The impact of immigration on the British labour market. The Economic Journal, 115(507), 324–341. doi:10.1111/j.14680297.2005.01038.x
Diplock, G. J. (1998). Building new spatial interaction models by using genetic programming and a supercomputer. Environment & Planning A, 30(10), 1893–1904. doi:10.1068/a301893
Ellis, M., & Wright, R. (1998). The balkanisation metaphor in the analysis of US immigration. Annals of the Association of American Geographers. Association of American Geographers, 88(4), 686–698. doi:10.1111/0004-5608.00118
Dobson, J., & Pooley, C. E. (2004). Mobility, Equality, Diversity: A Study of Pupil Mobility in the Secondary School System. London: Technical Report, Department of Geography, University College London.
ERA. (1988). Education Reform Act. Her Majesty’s Stationary Office and Queen’s Printer of Acts of Parliament.
Dobson, J., Henthorne, K., & Lynas, Z. (2000). Pupil Mobility in Schools (Final Report). Tech. rep. London: Department of Geography, University College London. Dorling, D. (1995). A New Social Atlas of Britain. Chichester, UK: John Wiley & Sons. Dorling, D., Rigby, J., Wheeler, B., Ballas, D., Thomas, B., & Fahmy, E. (2007). Poverty, Wealth and Place in Britain, 1968 to 2005. Bristol, UK: Policy Press. Duke-Williams, O. (2004). The development and use of information systems for monitoring and analysing migration in Britain. Unpublished PhD Thesis, University of Leeds, Leeds.
334
Eurostat. (1992). Study on Employment Zones, Eurostat (E/LOC/20), Luxembourg. Ewens, D. (2005). The National and London Pupil Datasets: An introductory briefing for researchers and research users. DMAG Briefing 2005/8. London: Data Management and Analysis Group, Greater London Authority, City Hall. Ewens, D. (2005b). Moving home and changing school: widening the analysis of pupil mobility. Data management and Analysis Group. London: Greater London Authority. Exeter, D. J., Boyle, P., Feng, Z., Flowerdew, R., & Schierloh, N. (2005). The creation of ‘Consistent Areas
Compilation of References
Through Time’ (CATTs) in Scotland, 1981-2001. Population Trends, 119, 28–36. Eyre, H. (1999). Measuring the Performance of Spatial Interaction Models in Practice. Ph.D. Thesis, School of Geography, University of Leeds, Leeds. Faggian, A., McCann, P., & Sheppard, S. (2007). Some evidence that women are more mobile than men: gender differences in UK graduate migration behaviour. Journal of Regional Science, 47(3), 517–539. doi:10.1111/j.14679787.2007.00518.x Farr, M., & Webber, R. (2001). MOSAIC: From and area classification system to individual classification. Journal of Targeting. Measurement and Analysis for Marketing, 10(1), 55–65. doi:10.1057/palgrave.jt.5740033 Farr, W. (1864). Supplement to the 25th Annual Report of the Registrar General. London: HMSO. Fielding, A. (1998). Counterurbanisation and social class. In Boyle, P., & Halfacree, K. (Eds.), Migration in Rural Areas: Theories and Issues. Chichester, UK: John Wiley. Fielding, A. J. (1992). Migration and social mobility: South East England as an escalator region. Regional Studies, 26(1), 1–15. doi:10.1080/00343409212331346741 Findley, S. E. (1988). The directionality and age selectivity of the health-migration relation: evidence from sequences of disability and mobility in the United States. The International Migration Review, 22, 4–29. doi:10.2307/2546583 Finney, N., & Simpson, L. (2008). Internal migration and ethnic groups: evidence for Britain from the 2001 Census. Population Space and Place, 14, 63–83. doi:10.1002/psp.481 Flake, G. W. (2001). The Computational Beauty of Nature: Computer Explorations of Fractals, Chaos, Complex Systems and Adaption (4th ed.). Cambridge, MA: MIT Press. Flowerdew, R. (1982). Fitting the lognormal gravity model to heteroscedastic data. Geographical Analysis, 14, 263–267.
Flowerdew, R. (1991). Poisson regression modelling of migration. In Stillwell, J., & Congdon, P. (Eds.), Migration Models: Macro and Micro Approaches (pp. 92–112). London: Belhaven. Flowerdew, R. (1997). The potential use of moving units in British migration analysis. In Rees, P. (Ed.), Third workshop: the 2001 Census - special datasets: what do we want? Working Paper 97/9. Leeds: School of Geography, University of Leeds. Flowerdew, R., & Aitkin, M. (1982). A method of fitting the gravity model based on the Poisson distribution. Journal of Regional Science, 22, 191–222. doi:10.1111/j.1467-9787.1982.tb00744.x Flowerdew, R., & Green, A. (1993). Migration, transport and workplace statistics from the 1991 census. In Dale, A., & Marsh, C. (Eds.), The 1991 Census User’s Guide (pp. 269–294). London: HMSO. Flowerdew, R., & Green, M. (1992). Developments in aerial interpolation methods and GIS. The Annals of Regional Science, 26, 67–78. doi:10.1007/BF01581481 Flowerdew, R., & Lovett, A. (1988). Fitting constrained Poisson regression models to interurban migration flows. Geographical Analysis, 20, 297–307. Flowerdew, R., & Lovett, A. (1989). Compound and generalised Poisson models for inter-urban migration. In Congdon, P., & Batey, P. (Eds.), Advances in Regional Demography (pp. 246–256). London: Belhaven. Foot, D. (1981). Operational Urban Models. London: Methuen. Forsythe, F. (1992). The nature of migration between Northern Ireland and Great Britain - a preliminary analysis based on the Labour Force Surveys, 1986-88. The Economic and Social Review, 23(2), 105–127. Fotheringham, A. S. (1983). A new set of spatial interaction models: the theory of competing destinations. Environment & Planning A, 15(1), 15–36. doi:10.1068/ a150015 Fotheringham, A. S., & O’Kelly, M. E. (1989). Spatial Interaction Models: Formulations and Applications. Dordrecht, The Netherlands: Kluwer. 335
Compilation of References
Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester, UK: Wiley. Fotheringham, A. S., Rees, P., Champion, T., Kalogirou, S., & Tremayne, A. R. (2004). The development of a migration model for England and Wales: overview and modelling out-migration. Environment & Planning A, 36, 1633–1672. doi:10.1068/a36136 Frey, W. (1996). Immigration, domestic migration and demographic balkanisation in America: new evidence for the 1990s. Population and Development Review, 22(4), 741–763. doi:10.2307/2137808 Frey, W., & Speare, A. (1995). Metropolitan areas as functional communities. In Dahmann, D. C., & Fitzsimmons, J. D. (Eds.), Metropolitan and Nonmetropolitan Areas: New Approaches to Geographical Definition (pp. 139–190). Washington, DC: US Bureau of the Census. Frost, M., Linneker, B., & Spence, N. (1996). The spatial externalities of car-based worktravel emissions in Greater London, 1981 and 1991. Transport Policy, 3, 187–200. doi:10.1016/S0967-070X(96)00027-3 Frost, M., Linneker, B., & Spence, N. (1997). The energy consumption implications of changing worktravel in London, Birmingham and Manchester: 1981 and 1991. Transportation Research Part A, Policy and Practice, 31, 1–19. doi:10.1016/S0965-8564(96)00011-0 GAD. (2004). Population projections by the Government Actuary, 2004-based Principal Projection for England. Retrieved from http://www.gad.gov.uk/Demography_ Data/Population/Index.asp? y=2004&v=Principal&da taCountry=england&chkDataTable=yy_singyear&sub Table=Perform+search Gatrell, A. C. (2002). Geographies of Health: An Introduction. Oxford, UK: Blackwell. Geertman, S., de Jong, T., & Wessels, C. (2003). Flowmap: a support tool for strategic network analysis. In Geertman, S., & Stillwell, J. (Eds.), Planning Support Systems in Practice (pp. 155–175). Heidelberg, Germany: Springer.
Gereluk, D. (2005). Communities in a changing educational environment. British Journal of Educational Studies, 53(1), 4–18. doi:10.1111/j.1467-8527.2005.00280.x Gibson, A., & Asthana, S. (2000a). Local markets and the polarization of public-sector schools in England and Wales. Transactions of the Institute of British Geographers, 25(3), 303–319. doi:10.1111/j.00202754.2000.00303.x Gibson, A., & Asthana, S. (2000b). What’s in a number? Commentary on Gorard and Fitz’s Investigating the determinants of segregation between schools. Research Papers in Education, 15(2), 133–153. doi:10.1080/026715200402461 Gober-Meyers, P. (1978). Migration analysis: the role of geographical scale. The Annals of Regional Science, 12(3), 52–61. doi:10.1007/BF01286122 Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimisation and Machine Learning. Reading, MA: Addison Wesley. Goldstein, S. (1964). The extent of repeated migration: an analysis based on the Danish Population Register. Journal of the American Statistical Association, 59, 1121–1132. doi:10.2307/2282627 Goodchild, M. F., Anselin, L., & Deichmann, U. (1993). A framework for the areal interpolation of socioeconomic data. Environment & Planning A, 25, 383–397. doi:10.1068/a250383 Goodman, J. F. B. (1970). The definition and analysis of local labour markets: some empirical problems. British Journal of Industrial Relations, 8, 179–196. doi:10.1111/j.1467-8543.1970.tb00968.x Gorard, S. (1999). ‘Well. That about wraps it up for school choice research’: a state of the art review. School Leadership & Management, 19, 25– 47. doi:10.1080/13632439969320 Gorard, S. (2000). Here we go again: a reply to ‘what’s in a number?’ by Gibson and Asthana. Research Papers in Education, 15(2), 155–162. doi:10.1080/026715200402470
Gorard, S. (2004). Comments on ‘Modelling social segregation’ by Goldstein and Noden. Oxford Review of Education, 30(3), 435–440. doi:10.1080/0305498042000260520 Gordon, I. (1995). Migration in a segmented labour market. Transactions of the Institute of British Geographers, 20(2), 139–155. doi:10.2307/622428 Gould, P. (1972). Pedagogic Review. Annals of the Association of American Geographers, 62(4), 689–700. doi:10.1111/j.1467-8306.1972.tb00896.x Green, A. (1997). A question of compromise? Case study evidence on the location and mobility strategies of dual career households. Regional Studies, 31(7), 641–657. doi:10.1080/00343409750130731 Green, A. E., Hogarth, T., & Shackleton, R. E. (1999). Long Distance Living: Dual Location Households. Bristol, UK: Policy Press. Greene, W. H. (2003). Econometric Analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall. Gregory, I. N., & Ell, P. S. (2005). Breaking the boundaries: geographical approaches to integrating 200 years of the census. Journal of Royal Statistical Society A, 168, 419–437. doi:10.1111/j.1467-985X.2005.00356.x Gregory, I., Dorling, D., & Southall, R. (2001). A century of inequality in England and Wales using standardized geographical units. Area, 33, 297–311. doi:10.1111/1475-4762.00033 Gregory, I., Southall, H., & Dorling, D. (2000). A century of poverty in England and Wales, 1898-1998: a geographical analysis. In Bradshaw, J. R., & Sainsbury, R. (Eds.), Researching Poverty. Aldershot, UK: Ashgate. Griffith, D., & Haining, R. (2006). Beyond mule kicks: the Poisson distribution in geographical analysis. Geographical Analysis, 38, 123–139. doi:10.1111/j.0016-7363.2006.00679.x GRO Scotland. (2006). Retrieved from http://www.gro-scotland.gov.uk/statistics/publications-and-data/annual-report-publications/annrep/01sect5/estimatingmigration.html
Guy, C. M. (1991). Spatial interaction modelling in retail planning practice: the need for robust statistical methods. Environment and Planning B, 18, 191–203. doi:10.1068/b180191 Haining, R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge, UK: Cambridge University Press. Harding, S. (2003). Social mobility and self-reported limiting long-term illness among West Indian and South Asian migrants living in England and Wales. Social Science & Medicine, 56, 355–361. doi:10.1016/S0277-9536(02)00041-2 Harland, K., & Stillwell, J. (2007a). Commuting to School in Leeds: How useful is the PLASC? Working Paper 07/02, School of Geography, University of Leeds, Leeds. Harland, K., & Stillwell, J. (2007b). Evidence of ethnic minority dispersal in Leeds. The Yorkshire and Humber Regional Review, 17(2), 21–23. Harland, K., & Stillwell, J. (2007b). Using PLASC data to identify patterns of commuting to school, residential migration and movement between schools in Leeds. Working Paper 07/03, School of Geography, University of Leeds, Leeds. Harland, K., Duke-Williams, O., & Stillwell, J. (2006). Commuting to school: an investigation of 2001 Census STS and PLASC data. Presentation at GISRUK’06, University of Nottingham, 10 April. Harris, J., Hayes, J., & Cole, K. (2002). Disseminating census area statistics over the Web, Chapter 8 in Rees, P., Martin, D., & Williamson, P. (Eds.). The Census Data System, Wiley, Chichester, pp. 113-121. Harris, R., & Johnston, R. (2008). Primary schools, markets and choice: studying polarization and the core catchment. Applied Spatial Analysis and Policy, 1(1), 59–84. doi:10.1007/s12061-008-9002-8 Hattersley, L., & Creeser, R. (1995). Longitudinal Study 1971-1991: History, Organisation and Quality of Data. OPCS Series DS 15. London: HMSO.
Hayes, L., & Al-Hamad, A. (1997). Residential movement into elderly person households: evidence from the Household Sample of Anonymised Records. Environment & Planning A, 29(8), 1433–1447. doi:10.1068/a291433 Hayes, L., & Al-Hamad, A. (1999). Residential change: differences in the movements and living arrangements of divorced men and women. In Boyle, P., & Halfacree, K. (Eds.), Migration and Gender in the Developed World (pp. 261–279). London: Routledge. Heppenstall, A. J. (2004). Application of Hybrid Intelligent Agents to Modelling a Dynamic, Locally Interacting Retail Market. Unpublished PhD Thesis, School of Geography, University of Leeds, Leeds. Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, MI: The University of Michigan Press. Home Office. (2006) A Points Based System: Making Migration Work for Britain, Command Paper. http://www.homeoffice.gov.uk/documents/command-pointsbased-migration?view=Binary Home Office. (2008a) Asylum Statistics, Quarter 1 2008. http://www.homeoffice.gov.uk/rds/pdfs08/asylumq108.pdf Home Office. (2008b) Persons Granted British citizenship, United Kingdom, 2007. http://www.homeoffice.gov.uk/rds/pdfs08/hosb0508.pdf Home Office. (2008c) Accession Monitoring Report May 2004-March 2008, A8 Countries, A joint online report between the Border and Immigration Agency, Department for Work and Pensions, HM Revenue and Customs and Communities and Local Government. http://www.bia.homeoffice.gov.uk/sitecontent/documents/aboutus/reports/accession_monitoring_report/report15/may04mar08.pdf?view=Binary Horsfield, G. (2005). International migration, Chapter 7 in Chappell, R. (ed). Focus on People and Migration 2005 Ed., Palgrave Macmillan, Basingstoke, pp. 114-129. House of Commons Treasury Committee. (2008) Counting the Population, Eleventh Report of Session 2007-08. HC 183-1. http://www.publications.parliament.uk/pa/cm200708/cmselect/cmtreasy/183/183.pdf
House of Lords Select Committee on Economic Affairs. (2008a) The Economic Impact of Immigration Volume I: Report, 1st Report of Session 2007-08, HL Paper 82-I. www.publications.parliament.uk/pa/ld200708/ldselect/ ldeconaf/82/82.pdf Huff, D. L. (1961). A note on the limitations of intraurban gravity models. Land Economics, 38, 64–66. doi:10.2307/3144725 Hussain, S., & Stillwell, J. (2008). Internal migration of ethnic groups in England and Wales by age and district type. Working Paper 08/03, School of Geography, University of Leeds, Leeds. Johnson, R., Forrest, J., & Poulsen, M. (2002). Are there ethnic enclaves/ghettos in English cities? Urban Studies (Edinburgh, Scotland), 39(4), 591–618. doi:10.1080/00420980220119480 Johnson, R., Poulsen, M., & Forrest, J. (2001). The ethnic geography of EthniCities: the American model and residential concentration in London. Ethnicities, 2, 209–235. doi:10.1177/1468796802002002657 Johnston, R., Burgess, S., Wilson, D., & Harris, R. (2006). School and residential ethnic segregation: an analysis of variation across England’s Local Education Authorities. Regional Studies, 40(9), 973–990. doi:10.1080/00343400601047390 Johnston, R., Wilson, D., & Burgess, S. (2004). School segregation in multiethnic England. Ethnicities, 4(2), 237–265. doi:10.1177/1468796804042605 Johnston, R., Wilson, D., & Burgess, S. (2005). England’s multiethnic educational system? A classification of secondary schools. Environment & Planning A, 37, 45–62. doi:10.1068/a36298 Jones, P., & Elias, P. (2006). Administrative data as a research resource: a selected audit. Economic & Social Research Council Regional Review Board Report 43/06. Warwick Institute for Employment Research, University of Warwick, Kenilworth. Kalogirou, S. (2005). Examining and presenting trends of internal migration flows within England and Wales.
Population Space and Place, 11(4), 283–297. doi:10.1002/psp.376 Kirkwood, B. R., & Sterne, J. A. C. (2003). Essential Medical Statistics (2nd ed.). Oxford, UK: Blackwell. Knudsen, D. C., & Fotheringham, A. S. (1982). Matrix comparison, goodness of fit, and spatial interaction modelling. International Regional Science Review, 10, 127–148. doi:10.1177/016001768601000203
Lambert, D. (1993). Measures of disclosure risk and harm. Journal of Official Statistics, 9(2), 313–331.
Large, P., & Ghosh, K. (2006). Estimates of the population by ethnic group for areas within England. Population Trends, 124, 8–17.
Large, P., & Ghosh, K. (2006a). A methodology for estimating the population by ethnic group for areas within England. Population Trends, 123, 21–31. Lee (1966). A theory of migration, Demography, 3(1), 47-57. Liffen, K., Maslen, S., & Price, S. (1988). HES Book. London: Department of Health. Lovett, A., & Flowerdew, R. (1989). Analysis of count data using Poisson regression. The Professional Geographer, 41, 190–198. doi:10.1111/j.0033-0124.1989.00190.x Lowry, I. (1966). Migration and Metropolitan Growth: Two Analytical Reports. San Francisco: Chandler. Lupton, R. (2005). Changing Neighbourhoods? Mapping the Geography of Poverty and Worklessness Using the 1991 and 2001 Census, CASE-Brookings Census Brief 3. London: London School of Economics. Machin, S., Telhaj, S., & Wilson, J. (2006). The mobility of English school children, Discussion paper. Centre for Economic Performance. Mackintosh, M. (2005). 2001 Census: The migration patterns of London’s ethnic groups, DMAG Briefing Paper 2005/30. London: Greater London Council Data Management and Analysis Group, Greater London Council. Madouros, V. (2006). Impact of the LFS switch from seasonal to calendar quarters: an overview of the switch of the LFS to calendar quarters and the potential effects of this change on users. London: ONS. Manchester Computing. (1989). MATPAC User’s Manual, NAT 664. Marsh, C. (1993). The Sample of Anonymised Records. In Dale, A., & Marsh, C. (Eds.), The 1991 Census User’s Guide (pp. 295–311). London: HMSO.
Martin, D. (2002). Geography for the 2001 Census in England and Wales. Population Trends, 108, 7–15.
McCarthy, P. S. (1980). A study of the importance of generalised attributes in shopping choice behaviour. Environment & Planning A, 12, 1269–1286. doi:10.1068/a121269
McHugh, K. E., Hogan, T. D., & Happel, S. K. (1995). Multiple residence and cyclical migration - a life-course perspective. The Professional Geographer, 47(3), 251–267. doi:10.1111/j.0033-0124.1995.00251.x Middleton, E. (1995). Samples of Anonymised Records. In Openshaw, S. (Ed.), Census Users’ Handbook (pp. 337–362). Cambridge, UK: GeoInformation International. Migration Statistics Unit. (2007). Using Patient Registers to Estimate Internal Migration, Technical Guidance Notes. Migration Statistics Unit. ONS, Titchfield. Mitchell, R., Dorling, D., Martin, D., & Simpson, L. (2002). Bringing the missing million home: correcting the 1991 small area statistics for undercount. Environment & Planning A, 34, 1021–1035. doi:10.1068/a34161 Nakaya, T. (2001). Local spatial interaction modelling based on the geographically weighted regression approach. GeoJournal, 53, 347–358. doi:10.1023/A:1020149315435 Nelder, J., & Wedderburn, R. W. M. (1972). Generalised linear models. Journal of the Royal Statistical Society A, 135, 370–384. doi:10.2307/2344614 Nielson, T. A. S., & Hovgesen, H. H. (2007). Exploratory mapping of commuter flows in England and Wales. Journal of Transport Geography. Retrieved from doi:10.1016/j. trangeo.2007.04.005
NISRA. (2005). Development of methods/sources to estimate population migration in Northern Ireland. NISRA Paper. Retrieved June 13, 2007, from http:// www.nisra.gov.uk/archive/demography/publications/ dev_est_mig.pdf Norman, P. (2002). Estimating small area populations for use in medical studies: accounting for migration. PhD Thesis, School of Geography, University of Leeds, Leeds. Norman, P. (2003). What are individual-level microdata and aggregate-level area census data? FAQ 11 Individual versus Aggregate. Retrieved from http://www.chcc.ac.uk/ overview/faq11/frame.html Norman, P. (2006). The joys and challenges of attaching area data to the ONS Longitudinal Study. CeLSIUS Newsletter 7. Retrieved from http://celsius.census.ac.uk/ news.html Norman, P., & Bambra, C. (2007). The utility of medically certified sickness absence data as an updatable indicator of population health. Population Space and Place, 13(5), 333–352. doi:10.1002/psp.458 Norman, P., Boyle, P., & Rees, P. (2005). Selective migration, health and deprivation: a longitudinal analysis. Social Science & Medicine, 60(12), 2755–2771. doi:10.1016/j. socscimed.2004.11.008 Norman, P., Rees, P., & Boyle, P. (2003). Achieving data compatibility over space and time: creating consistent geographical zones. International Journal of Population Geography, 9(5), 365–386. doi:10.1002/ijpg.294 O’Brien, L. (1992). Introducing Quantitative Geography: Measurement, Methods and Generalised Linear Models. London: Routledge. ODPM. (2002). Development of a Migration Model. Report prepared by the University of Newcastle upon Tyne, the University of Leeds, and the Greater London Authority/London Research Centre. London: ODPM. Office for National Statistics, General Register Office for Scotland, Northern Ireland Statistics and Research Agency. (2006). UK Statistical Disclosure Control Policy
for 2011 Census Output, November. Retrieved March 1, 2007, from http://www.statistics.gov.uk/census/pdfs/ SDCpolicy.pdf Office for National Statistics. (1999). Gazetteer of the New and Old Geographies of the United Kingdom. Retrieved from http://www.statistics.gov.uk/downloads/ ons_geography/Gazetteer_v3.pdf Office for National Statistics. (2003). Census 2001 Review and Evaluation: Edit and Imputation Evaluation Report. Retrieved from http://www.statistics.gov.uk/ census2001/proj_eai.asp Office for National Statistics. (2004). Area Classification for Statistical wards – methods. Retrieved from http:// www.statistics.gov.uk/about/methodology_by_theme/ area_classification/wards/methodology.asp) Office for National Statistics. (2005). Quality Report for England and Wales (Census 2001). Basingstoke: Palgrave-Macmillan. Office of the Deputy Prime Minister. (2002). Development of a Migration Model. London: ODPM. Office of the Deputy Prime Minister. (2004). Planning Policy Statement 7: Sustainable Development in Rural Areas. London: HMSO. Ogilvy, A. A. (1980). Inter-regional migration since 1971: an appraisal of the data from the National Health Service Central Register and labour Force Surveys. OPCS Occasional Paper 16, OPCS, London. ONS, & Coombes, M. (1998). 1991-based Travel-to-Work Areas. London: Office for National Statistics. ONS. (1999). 1996-based subnational population projections England, ONS PP3 No 10. London: The Stationery Office. ONS. (2001). Census 2001 Classifications. Titchfield. Hampshire, UK: Office for National Statistics. ONS. (2001). Census 2001 Origin-destination Statistics. Final Specifications. ONS. (2003a). Proposals for an Integrated Population Statistics System. Discussion Paper, ONS, October.
ONS. (2003b). International Migration: Migrants Entering or Leaving the United Kingdom and England and Wales, 2001. Series MN no.28. Office for National Statistics, London. Retrieved January 15, 2007, from http://www.statistics.gov.uk/downloads/theme_population/MN28.pdf ONS. (2004). Origin-Destination Statistics: Local Authorities (CD-ROM). London: Office for National Statistics. ONS. (2004a). Proposals for a Continuous Population Survey. Consultation Paper, ONS, July. ONS. (2004b). International Migration: Migrants Entering or Leaving the United Kingdom and England and Wales, 2002. Series MN no.29. Office for National Statistics, London. Retrieved January 15, 2007, from http:// www.statistics.gov.uk/downloads/theme_population/ MN_no_29_v3.pdf ONS. (2005). Census 2001: Quality Report for England and Wales. Basingstoke: Palgrave Macmillan. ONS. (2005a). The 2011 Census: Initial View on Content for England and Wales. Consultation Document. London: ONS. ONS. (2005b). International Migration: Migrants Entering or Leaving the United Kingdom and England and Wales, 2003. Series MN no.30. Office for National Statistics, London. Retrieved January 15, 2007, from http://www.statistics.gov.uk/downloads/theme_population/MN_No30_2003v3.pdf ONS. (2006). Neighbourhood Statistics Programme - Evaluation Report. London: Office for National Statistics. ONS. (2006). Social Trends 36. Basingstoke: Social Trends. ONS. (2006a). Travel Trends. A report on the 2005 International Passenger Survey. Retrieved January 15, 2007, from http://www.statistics.gov.uk/downloads/ theme_transport/traveltrends2005.pdf ONS. (2006b). Methodology for the experimental monthly index of services. Retrieved January 15, 2007, from http://
www.statistics.gov.uk/iosmethodology/downloads/ Whole_Report.pdf ONS. (2006c). International Migration: Migrants Entering or Leaving the United Kingdom and England and Wales, 2004. Series MN no.31. Office for National Statistics, London. Retrieved January 15, 2007, from http://www.statistics.gov.uk/downloads/theme_population/MN31.pdf ONS. (2006d). Report for the Inter-departmental Task Force on Migration Statistics. Retrieved January 15, 2007, from http://www.national-statistics.org.uk/about/data/ methodology/specific/population/future/imps/updates/ downloads/TaskForceReport151206.pdf ONS. (2007). Population turnover figures reveal large changes in the population of seaside towns. Retrieved September 12, 2008, from http://www.neighbourhood. statistics.gov.uk/dissemination/Info.do?page=news/ newsitems/14-march-2007-population-turnover-analysis.htm ONS. (2007a) 2006 Mid-year Estimates. http://www.statistics.gov.uk/statbase/Product.asp?vlnk=601&More=N ONS. (2007b) Improved Methods for Estimating International Migration – Geographical Distribution of Estimates of In-migration. http://www.statistics.gov. uk/downloads/theme_population/Geog_distn_in-migs. pdf ONS. (2007d) Update on the Development of the Integrated Household Survey. http://www.ccsr.ac.uk/esds/ events/2007-03-29/ihs/slides/bennett.ppt ONS. (2008a) National Population Projections: 2006-based. Series PP No 26. http://www.statistics.gov. uk/downloads/theme_population/pp2no26.pdf ONS. (2008b) International Migration. Series MN No 33, 2006 Data. http://www.statistics.gov.uk/downloads/ theme_population/MN33.pdf ONS. (2008c) 2004-based SNPP, Current Data. http://www.statistics.gov.uk/STATBASE/Product. asp?vlnk=997
ONS. (2008d) Updated Short-Term Migration Estimates, mid-2004 and mid-2005. http://www.statistics.gov.uk/ about/data/methodology/specific/population/future/ imps/updates/downloads/STM_Update.pdf ONS/GROS/NISRA. (2001). Census 2001 OriginDestination Statistics (Final Specifications), London. Retrieved May 15, 2007, from http://www.statistics.gov. uk/census2001/pdfs/OriginDest4web.pdf OPCS/GROS. (1992). 1991 Census, Definitions Great Britain. London: HMSO. Openshaw, S. (1983). From data crunching to model crunching: the dawn of a new era. Environment & Planning A, 15(8), 1011–1012. Openshaw, S. (1984). The Modifiable Areal Unit Problem (Concepts and Techniques in Modern Geography 38). Norwich, UK: GeoBooks. Openshaw, S. (1996). Developing GIS-relevant zonebased spatial analysis methods. In Longley, P., & Batty, M. (Eds.), Spatial Analysis: Modelling in a GIS Environment (pp. 55–73). Cambridge, UK: GeoInformation International.
Openshaw, S. (1998). Neural network, genetic, and fuzzy logic models of spatial interaction. Environment & Planning A, 30(10), 1857–1872. doi:10.1068/a301857
Openshaw, S. (Ed.). (1995). Census User’s Handbook. Cambridge: GeoInformation International.
Openshaw, S., & Rao, L. (1995). Algorithms for reengineering 1991 Census geography. Environment & Planning A, 27, 425–446. doi:10.1068/a270425 Oppewal, H., Timmermans, H. J. P., & Louviere, J. J. (1997). Modelling the effects of shopping centre size and store variety on consumer choice behaviour. Environment & Planning A, 29, 1073–1090. doi:10.1068/a291073 Owen, D. (1997). Migration by minority ethnic groups within Great Britain in the early 1990s. In 28th annual conference of the British and Irish section of the Regional Science Association International. Falmouth College of Arts.
Palmer, G., Kenway, P., & Wilcox, S. (2006). Housing and Neighbourhoods Monitor. London: New Policy Institute. Parsons, E., Chalkley, B., & Jones, A. (2000). School catchments and pupil movements: a case study in parental choice. Educational Studies, 26, 33–48. doi:10.1080/03055690097727 Peach, C. (1996b). Does Britain have ghettoes? Transactions. Institute of British Geographers NS, 22, 216–235. doi:10.2307/622934 Peloe, A., & Rees, P. (1999). Estimating ethnic change in London, 1981-91, using a variety of census data. International Journal of Population Geography, 5, 179–194. doi:10.1002/(SICI)1099-1220(199905/06)5:33.0.CO;2-P Petrie, A., & Sabin, C. (2005). Medical Statistics at a Glance (2nd ed.). Oxford, UK: Blackwell. Phillips, D. (1998). Black minority ethnic concentration, segregation and dispersal in Britain. Urban Studies (Edinburgh, Scotland), 35(10), 1681–170. doi:10.1080/0042098984105
Plane, D. (1984). A systematic demographic efficiency analysis of US interstate population exchange. Economic Geography, 60, 294–312. doi:10.2307/143435
Plane, D., & Mulligan, G. (1997). Measuring spatial focusing in a migration system. Demography, 34(1), 251–262. doi:10.2307/2061703
Platt, L., Simpson, L., & Akinwale, B. (2005). Stability and change in ethnic groups in England and Wales. Population Trends, 121, 35–46. Pooler, J. (1994a). An extended family of spatial interaction models. Progress in Human Geography, 18, 17–39. doi:10.1177/030913259401800102 Pooley, C., Turnbull, J., & Adams, M. (2005). The journey to school in Britain since the 1940s: continuity and change. Area, 37(1), 43–53. doi:10.1111/j.14754762.2005.00605.x
Poulain, M. (1996). Confrontation des statistiques de migration intra-Européennes: vers plus d’harmonisation. European Journal of Population, 9(4), 353–381. doi:10.1007/BF01265643 Poynter, K. (2008). Review of Information Security at HM Revenue and Customs - Final report. London: HMSO. Retrieved from http://www.hm-treasury.gov.uk/independent_reviews/poynter_review/poynter_review_index.cfm. Ravenstein, E. G. (1889). The laws of migration. Journal of the Royal Statistical Society, 52, 241–301. doi:10.2307/2979333 Raymer, J., & Rogers, A. (2007). Using age and spatial flow structures in the indirect estimation of migration streams. Demography, 44(2), 199–223. doi:10.1353/dem.2007.0016 Raymer, J., Abel, G., & Smith, P. W. F. (2007). Combining census and registration data to estimate detailed elderly migration flows in England and Wales. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 170(4), 891–908. doi:10.1111/j.1467-985X.2007.00490.x Raymer, J., Bonaguidi, A., & Valentini, A. (2006). Describing and projecting the age and spatial structures of interregional migration in Italy. Population Space and Place, 12, 371–388. doi:10.1002/psp.414 Raymer, J., Smith, P. W. F., & Giulietti, C. (2008). Combining census and registration data to analyse ethnic migration patterns in England from 1991 to 2007. Paper presented at the European Population Conference, Barcelona, 9-12 July. Rees, P. (1977). The measurement of migration from census and other sources. Environment & Planning A, 9, 257–280. doi:10.1068/a090247 Rees, P. H. (1995). Putting the census on the researcher’s desk. In Openshaw, S. (Ed.), Census Users’ Handbook
(pp. 27–81). Cambridge, UK: GeoInformation International. Rees, P. H., & Duke-Williams, O. (1997). Methods for estimating missing data on migrants in the 1991 British Census. International Journal of Population Geography, 3, 323–368. doi:10.1002/(SICI)10991220(199712)3:43.0.CO;2-Z Rees, P. H., & Wilson, A. G. (1977). Spatial Population Analysis. London: Edward Arnold. Rees, P., & Boden, P. (2006) Estimating London’s New Migrant Population: Stage 1 – Review of Methodology, A Report commissioned by the Greater London Authority for the Mayor of London. http://www.london.gov.uk/mayor/refugees/docs/nm-pop.pdf Rees, P., & Butt, F. (2004). Ethnic change and diversity in England, 1981-2001. Area, 36(2), 174–186. doi:10.1111/j.0004-0894.2004.00213.x Rees, P., Durham, H., & Kupiszewski, M. (1996). Internal migration and regional population dynamics in Europe: United Kingdom case study. Working Paper 96/20, School of Geography, University of Leeds, Leeds. Rees, P., Fotheringham, S., & Champion, A. (2002). Modelling migration for policy analysis. In Clarke, G., & Stillwell, J. (Eds.), Applied GIS and Spatial Analysis. Chichester, UK: John Wiley and Sons. Rees, P., Martin, D., & Williamson, P. (Eds.). (2002). The Census Data System. Chichester, UK: Wiley. Rees, P., Thomas, F., & Duke-Williams, O. (2002). Migration data from the census. In Rees, P., Martin, D., & Williamson, P. (Eds.), The Census Data System (pp. 245–267). Chichester: Wiley. Rogers, A. (1966). Matrix methods of population analysis. Journal of the American Institute of Planners, 32, 40–44. Rogers, A. (1967). Matrix analysis of interregional population growth and distribution. Papers / Regional Science Association. Regional Science Association. Meeting, 18, 17–196.
Rogers, A. (1968). Matrix Analysis of Interregional Population Growth and Distribution. Berkeley, CA: University of California Press.
Rogers, A. (1989). Requiem for the net migrant. Population Program Working Paper No. WP-89-5, Institute of Behavioural Science, University of Colorado, Boulder.
Rogers, A. (1990). Requiem for the net migrant. Geographical Analysis, 22, 283–300. Rogers, A. (1995). Multiregional Demography: Principles, Methods, and Extensions. Chichester, UK: Wiley. Rogers, A., & Castro, L. J. (1981). Model migration schedules. Research Report-81-30. International Institute for Applied Systems Analysis, Laxenburg. Rogers, A., & Raymer, J. (1998). The spatial focus of US interstate migration flows. International Journal of Population Geography, 4(1), 63–80. doi:10.1002/(SICI)10991220(199803)4:13.0.CO;2-U Rogers, A., Willekens, F. J., & Raymer, J. (2001). Modeling interregional migration flows: Continuity and change. Mathematical Population Studies, 9, 231–263. doi:10.1080/08898480109525506 Rogers, A., Willekens, F. J., & Raymer, J. (2002). Capturing the age and spatial structures of migration. Environment & Planning A, 34, 341–359. doi:10.1068/a33226 Rogers, A., Willekens, F. J., & Raymer, J. (2003). Imposing age and spatial structures on inadequate migration flow datasets. The Professional Geographer, 55(1), 56–69. Rogers, A., Willekens, F. J., Little, J. S., & Raymer, J. (2002). Describing migration spatial structure. Papers in Regional Science, 81, 29–48. doi:10.1007/s101100100090 Rose, D., & O’Reilly, K. (1998). The ESRC Review of Government Social Classifications. Office for National Statistics, London. Retrieved from www.statistics.gov.uk/downloads/theme_compendia/ESRC_Review.pdf Rosenbaum, M., & Bailey, J. (1991). Movement within England and Wales during the 1980s, as measured by the NHS Central Register. Population Trends, 65, 24–34.
Rotolo, T., & Tittle, C. R. (2006). Population size, change, and crime in U.S. cities. Journal of Quantitative Criminology, 22, 341–368. doi:10.1007/s10940-006-9015-x
Rowland, D. T. (2006). Demographic Methods and Concepts. Oxford, UK: Oxford University Press.
Roy, J., & Thill, J.-C. (2004). Spatial interaction modelling. Papers in Regional Science, 83(1), 339–361. doi:10.1007/s10110-003-0189-4 Salt, J. (2005). International Migration and the United Kingdom - Report of the United Kingdom SOPEMI correspondent to the OECD, 2005. Retrieved January 15, 2007, from http://www.geog.ucl.ac.uk/mru/docs/Sop05fin_20060627.pdf SARs. (2004). What are the Samples of Anonymised Records? Retrieved from http://www.ccsr.ac.uk/sars/guide/introduction/ Scott, A., & Kilbey, T. (1999). Can patient registers give an improved measure of internal migration in England and Wales? Population Trends, 96, 44–55. Scott, A., Pearce, D., & Goldblatt, P. (2001). The sizes and characteristics of the minority ethnic populations of Great Britain – latest estimates. Population Trends, 105, 6–15. Senior, M. L. (1979). From gravity modelling to entropy maximizing: a pedagogic guide. Progress in Human Geography, 3(2), 175–210. Senior, M. L. (1987). The establishment of family planning clinics in south-west Nigeria by 1970: analyses using logit and Poisson regression. Area, 19, 237–245. Senior, M. L., Williams, H., & Higgs, G. (2000). Urbanrural mortality differentials: controlling for material deprivation. Social Science & Medicine, 51, 289–305. doi:10.1016/S0277-9536(99)00454-2 Shepherd, J., Bibby, P., & Frost, M. (2008). Mapping Socio-Economic Flows Across the Region. Final Report to East of England Development Agency, EEDA, Histon. Shields, M. A., & Wheatley Price, S. (1998). The earnings of male immigrants in England: evidence from the quarterly LFS. Applied Economics, 30(9), 1157–1168. doi:10.1080/000368498325057 Silvers, A. L. (1979). Probabilistic income maximising behaviour in regional migration. International Regional Science Review, 2, 29–40. doi:10.1177/016001767700200103 Simmonds, D. C., & Skinner, A. (2003). The South and West Yorkshire strategic land-use/transportation model. In Clarke, G., & Stillwell, J. (Eds.), Applied GIS and Spatial Analysis (pp. 195–214). Chichester, UK: Wiley. doi:10.1002/0470871334.ch11 Simpson, S., & Dorling, D. (1994). Those missing millions: implications for social statistics of non-response to the 1991 Census. Journal of Social Policy, 23, 543–567. doi:10.1017/S0047279400023345 Simpson, S., & Middleton, E. (1997). Who is missed by a national Census? A review of empirical results from Australia, Britain, Canada and the USA. CCSR Working Paper No 2, Centre for Census and Survey Research, University of Manchester, Manchester. Singer, E., Mathiowetz, N., & Couper, M. (1993). The impact of privacy and confidentiality concerns on survey participation: the case of the 1990 Census. Public Opinion Quarterly, 57, 465–482. doi:10.1086/269391 Sloggett, A., & Joshi, H. (1998). Indicators of deprivation in people and places: longitudinal perspectives. Environment & Planning A, 30, 1055–1076. doi:10.1068/a301055 Smart, M. (1974). Labour market areas: uses and definitions. Progress in Planning, 2, 239–353. doi:10.1016/0305-9006(74)90008-7 StataCorp. (2005). Stata Statistical Software: Release 9. Reference R-Z. College Station, TX: StataCorp LP. Statistics Commission. (2007) Foreign Workers in the UK, Statistics Commission Briefing Note. http://www.statscom.org.uk/C_1237.aspx Stewart, J. Q. (1948). Demographic gravitation: evidence and applications. Sociometry, 11, 31–58. doi:10.2307/2785468
Stillwell, J. (1991). Spatial interaction models and the propensity to migrate over distance. In Stillwell, J., & Congdon, P. (Eds.), Migration Models: Macro and Micro Approaches (pp. 34–56). London: Belhaven.
Stillwell, J. (1994). Monitoring intercensal migration in the United Kingdom. Environment & Planning A, 26, 1711–1730. doi:10.1068/a261711
Stillwell, J. (2006a). Providing access to census-based interaction data: that’s WICID. The Journal of Systemics Cybernetics and Informatics, 4(1), 63–68.
Stillwell, J. (2006b). Using WICID (Web-based interface census information data) in the classroom. The Journal of Systemics Cybernetics and Informatics, 4(6), 106–111.
Stillwell, J. (2008). Inter-regional migration modelling: a review and assessment. In Poot, J., Waldorf, B., & Van Wissen, L. (Eds.), Migration and Human Capital: Regional and Global Perspectives. Cheltenham, UK: Edward Elgar.
Stillwell, J. (2009). Inter-regional migration modelling: a review. In Poot, J., Waldorf, B., & van Wissen, L. (Eds.), Migration and Human Capital: Regional and Global Perspectives. Cheltenham, UK: Edward Elgar.
Stillwell, J. C. H. (1978). Interzonal migration: some historical tests of spatial-interaction models. Environment & Planning A, 10(10), 1187–1200. doi:10.1068/a101187
Stillwell, J. C. H., Bell, M., Blake, M., Duke-Williams, O., & Rees, P. (2001). Net migration and migration effectiveness: a comparison between Australia and the United Kingdom, 1976-96. Part 2: age related migration patterns. Journal of Population Research, 18(1), 19–39. doi:10.1007/BF03031953
Stillwell, J. C. H., Duke-Williams, O., & Rees, P. (1995). Time series migration in Britain: the context for 1991 Census analysis. Papers in Regional Science: Journal of the Regional Science Association, 74(4), 341–359.
Stillwell, J., & Congdon, P. (Eds.). (1991). Migration Models: Macro and Micro Approaches. London: Belhaven Press.
Stillwell, J., & Duke-Williams, O. (2001). Web-Based Interface to Census Interaction Data (WICID), Final report and demonstration. ESRC/JISC 2001 Census Development Programme. Leeds: Fourth Workshop.
Stillwell, J., & Duke-Williams, O. (2003). A new web-based interface to British census of population origin-destination statistics. Environment & Planning A, 35(1), 113–132. doi:10.1068/a35155
Stillwell, J., & Duke-Williams, O. (2005). Ethnic population distribution, immigration and internal migration in Britain: What evidence of linkage at the district scale? Paper presented at the British Society for Population Studies, University of Kent, Canterbury.
Stillwell, J., & Duke-Williams, O. (2007). Understanding the 2001 UK census migration and commuting data: the effect of small cell adjustment and problems of comparison with 1991. Journal of the Royal Statistical Society. Series A (General), 170(Part 2), 1–21.
Stillwell, J., & Duke-Williams, O. (2007). Understanding the 2001 UK census migration and commuting data: the effect of small cell adjustment and problems of comparison with 1991 flow datasets. Journal of the Royal Statistical Society A, 170(2), 425–445. doi:10.1111/j.1467-985X.2006.00458.x
Stillwell, J., & Hussain, S. (2008). Ethnic group migration within Britain during 2000-01: a district level analysis. Working Paper, 08/2, School of Geography, University of Leeds, Leeds.
Stillwell, J., Bell, M., Blake, M., Duke-Williams, O., & Rees, P. (2000). A comparison of net migration flows and migration effectiveness in Australia and Britain: Part 1, Total migration patterns. Journal of Population Research, 17(1), 17–41. doi:10.1007/BF03029446
Stillwell, J., Bell, M., Blake, M., Duke-Williams, O., & Rees, P. (2001). A comparison of net migration flows and migration effectiveness in Australia and Britain: Part 2, Age-related migration patterns. Journal of Population Research, 18(1), 19–39. doi:10.1007/BF03031953
Stillwell, J., Bell, M., Blake, M., Duke-Williams, O., & Rees, P. (2000). Net migration and migration effectiveness: a comparison between Australia and the United Kingdom, 1976-96. Part 1: total migration patterns. Journal of Population Research, 17(1), 17–38. doi:10.1007/BF03029446
Stillwell, J., Duke-Williams, O., & Rees, P. (1995). Time series migration in Britain: the context for 1991 Census analysis. Papers in Regional Science: Journal of the Regional Science Association, 74(4), 341–359.
Stillwell, J., Duke-Williams, O., Feng, Z., & Boyle, P. (2005). Delivering Census Interaction Data to the User: Data Provision and Software Development. Working Paper 05/1, School of Geography, University of Leeds, Leeds.
Stillwell, J., Hussain, S., & Norman, P. (2008). The internal migration of ethnic groups in Britain: a study using the census macro and micro data. Paper prepared for the European Association for Population Studies, Barcelona, July.
Stillwell, J., Rees, P., & Duke-Williams, O. (1996). Migration between NUTS level 2 regions in the United Kingdom. In Rees, P., Stillwell, J., Convey, A., & Kupiszewski, M. (Eds.), Population Migration in the European Union. Chichester, UK: Wiley.
Stillwell, J., Rees, P., Eyre, H., & Macgill, J. (2002). Improving the treatment of international migration. In ODPM (2002), Development of a Migration Model (pp. 223-238). London: ODPM. Stouffer, S. A. (1940). Intervening opportunities: a theory relating mobility and distance. American Sociological Review, 5, 845–867. doi:10.2307/2084520 Thomas, R. W., & Huggett, R. J. (1980). Modelling in Geography: A Mathematical Approach. London: Harper and Row. Tobler, W. (1987). Experiments in migration mapping by computer. The American Cartographer, 14(2), 155–163. doi:10.1559/152304087783875273 Tobler, W. R. (1970). A computer model simulation of urban growth in the Detroit region. Economic Geography, 46(2), 234–240. doi:10.2307/143141
Townsend, A. R., Blakemore, M. J., & Nelson, R. (1987). The Nomis database - availability for users and geographers. Area, 19(1), 43–50.
Vickers, D., & Rees, P. (2006). Introducing the area classification of output areas. Population Trends, 125, 15–29.
HM Treasury, BERR & CLG. (2007). Review of Subnational Economic Development and Regeneration. HM Treasury, Department for Business, Enterprise and Regulatory Reform and Communities and Local Government, London.
Vickers, D., & Rees, P. (2007). Creating the UK National Statistics 2001 output area classification. Journal of the Royal Statistical Society. Series A (Statistics in Society), 170(2), 379–403. doi:10.1111/j.1467-985X.2007.00466.x
United Nations Statistics Division. (2006). Statistics and Statistical Methods Publications. http://unstats.un.org/unsd/pubs/gesgrid.asp?ID=116
Vickers, D., Rees, P., & Birkin, M. (2003). A new classification of UK local authorities using 2001 Census key statistics. Working Paper 03/03, School of Geography, University of Leeds, Leeds.
Van der Gaag, N., van Wissen, L., Rees, P., Stillwell, J., & Kupiszewski, M. (2003). Study of Past and Future Interregional Migration Trends and Patterns within European Countries. In Search of a Generally Applicable Explanatory Model. The Hague: Report for Eurostat, Netherlands Interdisciplinary Demographic Institute.
Van Imhoff, E., Van der Gaag, N., Van Wissen, L., & Rees, P. (1997). The selection of internal migration models for European regions. International Journal of Population Geography, 3(2), 137–159. doi:10.1002/(SICI)10991220(199706)3:23.0.CO;2-R
Van Wissen, L., van der Gaag, N., Rees, P., & Stillwell, J. (2009). In search of a modelling strategy for projecting internal migration in European countries: Demographic versus economic-geographical approaches. In Poot, J., Waldorf, B., & van Wissen, L. (Eds.), Migration and Human Capital (pp. 49–74). Cheltenham, UK: Edward Elgar.
Vandeschrick, C. (1992). Le diagramme de Lexis revisité. Population, 92(5), 1241–1262. doi:10.2307/1533940
Vandeschrick, C. (2001). The Lexis diagram, a misnomer. Demographic Research, 4(3), 98–124.
Verheij, R. A., Dike van de Mheen, H., de Bakker, D. H., Groenewegen, P. P., & Mackenbach, J. P. (1998). Urban-Rural variations in health in the Netherlands: does selective migration play a part? Journal of Epidemiology and Community Health, 52, 487–493. doi:10.1136/jech.52.8.487
Vickers, D., Rees, P., & Birkin, M. (2005). Creating the National classification of Census Output Areas: data, methods and results. Working Paper 05/02, School of Geography, University of Leeds, Leeds. Walters, W. H. (2000). Assessing the impact of place characteristics on human migration: the importance of migrants’ intentions and enabling attributes. Area, 32(1), 119–123. doi:10.1111/j.1475-4762.2000.tb00121.x Welton, T. A. (1872). On the effect of migrations in disturbing local rates of mortality, as exemplified in the statistics of London and the surrounding country, for the years 1851-1860. Journal of the Institute of Actuaries, 16, 153. Westefeld, A. (1940). The distance factor in migration. Social Forces, 19, 213–218. doi:10.2307/2571302 Willekens, F. J. (1983). Log-linear modelling of spatial interaction. Papers / Regional Science Association. Regional Science Association. Meeting, 52, 187–205. doi:10.1007/BF01944102 Willekens, F. J., & Baydar, N. (1986). Forecasting placeto-place migration with generalized linear models. In Woods, R., & Rees, P. (Eds.), Population structures and models: Developments in spatial demography (pp. 203–245). London: Allen & Unwin. Williams, M. (2000). Migration and social change in Cornwall 1971-91. In R. Creeser & S. Gleave (Eds.), Migration in England and Wales Using the Longitudinal
Study (pp. 30-39). London: The Stationery Office, ONS Series LS. Williamson, P. (2007). The impact of cell adjustment on the analysis of aggregate census data. Environment & Planning A, 39, 1058–1078. doi:10.1068/a38142 Wilson, A. G. (1967). A statistical theory of spatial distribution models. Journal of Transportation Research, 1, 253–269. doi:10.1016/0041-1647(67)90035-4 Wilson, A. G. (1970). Entropy in Urban and Regional Modelling. London: Pion. Wilson, A. G. (1971). A family of spatial interaction models, and associated developments. Environment and Planning, 3, 1–32. doi:10.1068/a030001 Wilson, A. G. (1972). Papers in Urban and Regional Analysis. London: Pion. Wilson, A. G., & Bennett, R. J. (1985). Mathematical Methods in Human Geography and Planning. Chichester: John Wiley & Sons.
Wilson, J. (2009). Exploring the dimensions of school change during primary education in England. In Stillwell, J., Coast, E., & Kneale, D. (Eds.), Fertility, Living Arrangements, Care and Mobility: Understanding Population Trends and Processes (Vol. 1, pp. 211–237). Dordrecht, The Netherlands: Springer. doi:10.1007/978-1-4020-9682-2_11
Winter, M., & Rushbrook, L. (2003). Literature Review of the English Rural Economy. Report to DEFRA. London: HMSO.
Yano, K., Nakaya, T., Fotheringham, A. S., Openshaw, S., & Ishikawa, Y. (2003). A comparison of migration behaviour in Japan and Britain using spatial interaction models. International Journal of Population Geography, 9(5), 419–431. doi:10.1002/ijpg.297
Zipf, G. K. (1946). The P1P2/D hypothesis: on intercity movement of persons. American Sociological Review, 11, 677–686. doi:10.2307/2087063
About the Contributors
John Stillwell is Professor of Migration and Regional Development in the School of Geography at the University of Leeds and is Director of CIDER. He is also the national Coordinator of the ESRC’s Understanding Population Trends and Processes (UPTAP) programme (2005-09), overseeing a wide range of demographic projects by researchers in different disciplines across the UK. His primary research interest has always been migration, in particular the analysis and modelling of flows of internal migration in the UK, with a series of publications in leading journals and edited books including Contemporary Research in Population Geography (1989), Migration Models: Macro and Micro Approaches (1990), Migration Processes and Patterns Volume 2: Population Redistribution in the United Kingdom (1992), Population Migration in the European Union (1996). He has also co-edited several books on the use of GIS in planning, most recently Planning Support Systems: Best Practice and New Methods (2009) and he is co-editor of the journal Applied Spatial Analysis and Policy (ASAP). Oliver Duke-Williams completed his PhD in the School of Geography at the University of Leeds and is now Senior Research Fellow and Deputy Director of CIDER, working remotely from his home in Walthamstow, London. His research interests include methods and reasons for collecting and using small area interaction data, the effects of disclosure control on Census and survey data, and dissemination of data sets so as to promote their widespread usage. As part of the ESRC Census Programme, he also directs a research network which explores the potential benefits and implications of distributing census data via data feed APIs rather than the traditional bulk methods used with previous censuses. Adam Dennett studied Geography as an undergraduate at Lancaster University before training as a secondary school teacher at the University of Cambridge. 
After some years in the teaching profession, he returned to higher education and completed a Masters degree at the University of Leeds where he now remains as a researcher, working full-time for CIDER and studying part-time for a PhD on the development of a migration-based area classification framework. His research interests lie in the quantitative analysis of population; principally internal migration flows. He has recently published papers in Population Trends and Population Space and Place relating to methodological developments in the analysis of internal migration in Britain and he is currently working on a European Spatial Planning Observatory Network project known as DEMIFER (DEmographic and MIgratory Flows affecting European Regions and cities). ***
Copyright © 2010, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Martin Bell is a population geographer with core interests in the fields of population mobility and demographic forecasting. He graduated from Flinders and has a PhD from Queensland where he is now Professor. He has just completed six years as Head of the School of Geography, Planning and Environmental Management. He is the Director of the Queensland Centre for Population Research which undertakes pure and applied research and provides education and training in demography and population geography. He has written extensively about migration, most recently on Mobility in the New Millennium: Australians on the Move (2009), and the focus of his current research is on cross-national comparisons of internal migration. Peter Boden is a Research Fellow at the University of Leeds and Director of Edge Analytics Ltd. He is a specialist in population analysis, particularly the impact of migration upon local population estimates and projections. He is a former Director of GMAP Ltd, having spent 15 years delivering bespoke geographical modelling solutions to a range of businesses that included WHSmiths, Asda Walmart, Ford Motor Company, Esso, BP, NS&I and HBOS. More recently, he has worked as a specialist in the credit industry to develop a model of indebtedness for all UK households and within the utility industry on the analysis of consumer debt and its relationship to income deprivation. Paul Boyle is Professor of Human Geography and Head of the School of Geography and Geosciences at the University of St Andrews. He is President of the British Society for Population Studies (BSPS). He directs the Longitudinal Studies Centre - Scotland (LSCS), which has established and continues to maintain and support the Scottish Longitudinal Study (SLS), one of the world’s largest longitudinal datasets for health and social science research. 
He is co-Director of the recently funded ESRC Centre for Population Change (CPC); co-applicant on the recently funded ESRC Administrative Data Liaison Service (ADLS); and co-applicant on the Wellcome Trust Scottish Health Informatics Programme (SHIP). Paul has particular expertise in record linkage and the use of routinely collected data in health and social science research. He has published widely on demography and epidemiology. Tony Champion is Emeritus Professor of Population Geography at the University of Newcastle upon Tyne. His research interests include urban and regional changes in population distribution and composition, with particular reference to counterurbanization and population deconcentration in developed countries and the policy implications of changes in local population profiles. He led the IUSSP’s Working Group on Urbanization in 1999-2002 and is author or co-author of several books and reports including New Forms of Urbanization: Beyond the Urban-Rural Dichotomy (2004), The Containment of Urban Britain: Retrospect and Prospect (2002), Urban Exodus (1998), The Population of Britain in the 1990s (1996), and Counterurbanization (1989). Mike Coombes has been Professor of Geographic Information since 1998 at CURDS, and a researcher there since 1977. His primary research area is the analysis of flow datasets to represent the changing ways people use local areas, with a particular interest in the commuting patterns and the links between urban and rural areas which in combination make up labour market areas and city regions. Mike recently undertook, for the third decade running, analyses of commuting data to define Travel to Work Areas (TTWAs), the only British official boundaries defined by academics. 
Mike’s other activities include being a core researcher in the national Spatial Economics Research Centre (from 2008), Regional Studies journal editorial team membership (until 2008), plus appointment to various expert groups advising government Departments and regional authorities.
Zhiqiang Feng has a PhD from Lancaster and is now a Research Fellow in the School of Geography and Geosciences at the University of St Andrews. He also works for the Longitudinal Studies Centre for Scotland (LSCS) and CIDER. His interests are in geography of population, health inequality, longitudinal analysis, migration, and applications of geographical information systems. Robin Flowerdew is Professor of Human Geography at the University of St Andrews in Scotland, where he has been based since 2000. He was born in London and educated at Oxford (BA) and Northwestern (PhD) Universities. Returning to the UK, he worked as a Research Assistant at University College London, studying British census migration data (as well as environmental hazards in Los Angeles and Australian aboriginal demography) and he has been interested in working with interaction data ever since. After moving to a permanent post at Lancaster University in 1976, he continued to work with migration matrices. The statistician Murray Aitken introduced him to Poisson regression analysis, and with Andrew Lovett and Paul Boyle, he has contributed much to the application and development of the method. Other interests include the modifiable areal unit problem and areal interpolation. Martin Frost is Reader in Economic Geography at Birkbeck College, London. For more than thirty years he has specialised in the numerical analysis of geographical data in the support of policy development and evaluation. In recent years he has been part of the Jubilee Line Extension evaluation team, Defra’s Rural Evidence Research Centre, and the Director of Geographical Analyses for the Government’s £21 million National Evaluation of Sure Start. In addition, he has developed a GIS-based approach to the collection and analysis of evidence on the state of the English countryside that has supported the work of both the Countryside Agency and the Commission for Rural Communities. 
Many of these applications have involved the analysis of commuting data drawn principally from various censuses of population. He has the dubious distinction of having conducted detailed analysis on the commuting data produced by every census since 1966. Corrado Giulietti is currently a PhD student in Economics at the University of Southampton. He is also a Research Assistant for the Southampton Statistical Sciences Research Institute. His research interests are the relationships between immigration and labour markets and migration estimation. He has worked on the economic drivers of migration, the consequences for the labour markets of receiving countries and the economic assimilation of new immigrants. He has also collaborated on an ESRC project entitled ‘Combining migration data in England and Wales’. Kirk Harland is a Research Fellow at the University of Leeds. Both his undergraduate degree and PhD were devoted to the study of computational geography with a particular focus on modelling social systems and the application of advanced spatial analysis techniques. The technical demands of computational geography have prompted Kirk to develop a core set of technical competencies in Geographical Information System (GIS), information management, database design and to become a Sun Certified Java Programmer. He has over ten years experience of analysing and modelling spatial phenomena ranging from change detection in national parks to developing decision support systems for major international corporations. Over the last four years his research interests have been concentrated on planning, assessing and managing the impacts of change in education and health services.
Paul Norman is a Lecturer in Human Geography at the School of Geography, University of Leeds. He is a population and health geographer with a particular interest in time-series analysis of both area and individual-level data derived from census, survey and administrative records. His research includes the development of methods to geographically harmonise small area socio-demographic, morbidity and mortality data to enable analyses of demographic and health change and the use of area typologies to understand migration patterns and resulting health outcomes. Paul’s recent research includes two projects under the ESRC’s Understanding Population Trends and Processes programme, an area-based study of the micro-geography of UK demographic change 1991-2001 and the development of UK coverage sub-national ethnic group population projections. James Raymer joined the University of Southampton in January 2004 after completing his PhD degree in the Department of Geography at the University of Colorado at Boulder. He is currently a Lecturer in Demography in Social Statistics, a division within the School of Social Sciences. His main research interests are migration analysis and estimation and population modelling in the context of inadequate, inconsistent or missing data. He has worked on several projects, including the estimation of international migration flows between countries in the European Union, the projection of age-specific interregional migration in Italy, and the estimation of detailed flows by combining census, survey and registration migration data in England and Wales. In February 2009, he started a five-year project on developing dynamic population models for the UK, as part of the new ESRC Centre for Population Change. Phil Rees is Professor of Population Geography at the University of Leeds. 
He is one of the UK’s leading authorities on demographic analysis, who developed methods of population accounting and projection for multi-state systems in the 1970s and 1980s. These methods extend conventional demography, to incorporate state to state transitions. From 1982 to 1992, Phil co-ordinated one of the ESRC’s flagship programmes, the Census Programme, which opened up access to secondary census data, both in table and individual formats, for use by academic researchers, free at the point of use. The Agreements negotiated between the ESRC and National Statistical Offices made the UK a data-rich environment for social science research. He was awarded a CBE in recognition of these efforts in 2004. From 2003 to 2007, Phil Rees served as a member of ESRC’s Research Resources Board. He was responsible for the development and first round commissioning of the ESRC’s Understanding Population Trends and Processes (UPTAP) programme and is principal investigator for an UPTAP large grant (2007-9) on Ethnic group population trends and projections for UK local areas. In 2009 he was awarded the Victoria Medal by the Royal Geographical Society in recognition of his contribution to population geography over four decades.
Index
A administrative sources 2, 18, 19, 23 aggregate data 51, 53, 56, 66, 67 akaike information criteria (AIC) 282 algorithm 2, 24, 229, 233, 234, 241 analytical methods 84 annual population survey (APS) 117 application language 36, 37 archetypes 203 asylum statistics 117 avon total 60
B baseline model 248 best estimate 94 binomial 261, 262, 271, 275, 276, 277 binomial regression 261, 271 birth cohort 4, 5, 13
C cathie marsh centre for census and survey research (CCSR) 136, 148 census 115, 116, 129 census area statistics (CAS) 7, 220, 245 census-based commuting data 214 census coding 96 census data 90, 91, 92, 98, 243, 246, 258 censuses 1, 2, 5, 7, 12, 22, 23, 24 centre for interaction data estimation and research (CIDER) 96, 225, 283 centre for longitudinal study information and user support (CeLSIUS) 141, 148 centroid 75, 265, 272, 273, 277, 278 chi-squared 267, 271
code data 232 cohort studies 12, 13 CommuterView 81 commuting catchment 216 commuting data 2, 6, 7, 12, 18, 23, 29, 38, 41, 46 commuting dataset yielded 230 commuting footprint 219 concentric 243 contiguity variable 265, 268, 272 contracted migrants 275 conurbation 107, 235, 239 correlation statistics 47 counterurbanisation 89, 109 cross-classifying 262 cross-sectional data 133, 140, 141, 147 cross-tabulated 178
D database management system (DBMS) 36, 37, 39 data neutrality 216 data protection act (DPA) 53 datasets 133, 134, 135, 140, 141, 148 decennial census 115 decennial population 196, 197 demographic 134, 135, 136, 140, 142, 148, 265 denominator 264 department for children, schools and families (DCSF) 116 department for environment, food and rural affairs (DEFRA) 212 deprivation 175, 176, 191, 192, 193, 194 deprived quintile 193
design of the tables 214 destination-constrained model 274 destination population 265, 266, 268, 275 dichotomy 197 disaggregation 54, 56, 62 drivers and vehicle licensing agency (DVLA) 19 dummy variable 265, 268, 272, 274, 277
E economically active 75 economically inactive 75 economic and social research council (ESRC) 148 economic migration 112 ED-level 257 EEA 113, 115, 116, 117 empirical analyses 197, 209 entropy maximising 261, 275 enumeration 243, 247 enumeration district (ED) 94, 140, 247 environmental load 213 ethnic dimension 175 ethnic migration 280, 284, 287, 292 ethnic net migration 176 ethnic proportion 287 Euclidean distance 249 european economic area (EEA) 113 european union (EU) 112 explanatory variables 261, 274, 277
F family health service areas (FHSAs) 14 family practitioner committee areas (FPCAs) 14 flow data 229, 243, 244, 246, 257, 258 Flowmap 81, 87
G general register office for scotland (GROS) 141, 148 genetic algorithm 80 gentrification 202, 206, 208, 209 geo-computation 229 geodemographic 162
geographically weighted regression (GWR) 79 geometric centre 272 geo-referencing 58 government office region (GOR) 91, 93, 116, 140, 177, 283 gravity model 265, 266, 267, 268, 269, 270, 277, 278 gross migraproduction rate (gmr) 74 gross reproduction rate 74
H health authority (HA) 97 heterogeneity 58 higher education statistics agency (HESA) 160 higher managerial and professional (HMP) 200, 202 high in-migration 254 high out-migration 254 homoscedasticity 270 hospital episode statistics (HES) 17 household SAR 11
I individual anonymity 213 individual SAR 11 inflow 69, 71, 72, 73, 98, 100, 102, 254 inflow/outflow ratios 163 in-migration 7, 10, 12, 71, 72, 74, 75, 81, 97, 98, 102, 157, 161, 163, 165, 166, 168, 169, 175, 184, 185, 186, 194 in/out ratio 164, 201, 202, 203, 204, 205, 206, 209 interaction data 1, 2, 8, 9, 11, 12, 13, 17, 18, 19, 22, 23, 24, 25, 26, 30, 31, 32, 33, 34, 35, 36, 38, 39, 42, 46, 49, 50, 51, 52, 54, 55, 56, 58, 62, 63, 64, 66, 67, 68, 69, 70, 75, 79, 85, 86, 89, 90, 91, 93, 94, 96, 109, 110, 133, 147 interaction matrix 70, 85 inter-area 70, 71, 72, 74, 78 inter-censal migration 109 inter-district flows 159, 164 inter-district migration 275 inter-district movement 158 inter-LAD flows 244
internal migration 97, 110, 119, 153, 157, 158, 164, 171, 175, 176, 192, 194, 195 internal migration data 74 international migration 111, 112, 113, 115, 116, 118, 119, 121, 124, 128, 129, 130 international passenger survey (IPS) 2, 20, 114, 115 interpolation 245, 246, 248, 259 interregional migration 280, 281, 282, 284, 285, 286, 287, 288, 290, 291, 292 inter-urban migration 274, 279 inter-ward 244, 247, 249, 250, 251, 258 inter-ward flow models 251 inter-ward level 244 inter-zonal flows 80 inter-zonal migrants 273 inter-zone 158 inter-zone migration 273 intra-area flow 44, 70, 71, 75 intra-area migration 74, 158 intra-district flows 159, 164 intra-district migration 158, 159 intra-regional migration 206 intra-ward 244, 247 intrazonal migrants 273 intra-zone 158 item non-response 56
L labour force survey (LFS) 2, 19, 116, 117, 122, 124 labour market 227, 228, 230, 231, 233, 239, 240 labour market structure 220 labour migrants 275 less in-migration 278 limiting long-term illness (LLTI) 137, 143, 144, 145 linear predictor 261, 264, 265 local authority district (LAD) 7, 20 local education authority (LEA) 16, 116 local government district (LGD) 16 logarithm 55, 264, 265, 268, 270, 271 logarithmically 261, 271
logit model 277 log-linear model 79, 81 log-linear statistical model 79 longitudinal data 133, 134, 142, 147 longitudinal study (LS) 12, 30, 133, 135, 140 longitudinal study(LS) 133, 134, 135, 140, 141, 147 long-term migrant 114, 127, 128 lower-level super output areas (LLSOAs) 232 lower professional and managerial (LMP) 200, 205 low out-migration 278
M MapBasic 83 MapInfo 81, 83 marginal totals 71 MATPAC (the matrix analysis package) 33, 34, 35, 50 metadata 31, 35, 36, 38, 39, 48, 49 metadata framework 31, 35 meta_datasets 39 meta_geog 39 methodological 162, 201, 209, 214, 219, 243, 249, 251, 257, 258, 260 micro-data 53, 56, 57, 69, 70, 82, 85, 88, 133, 134, 136, 140, 145, 147, 150 migration data 7, 13, 18, 19, 22, 25, 34, 35, 36, 46, 51, 55, 62, 64 migration matrix 199, 200 migration patterns 153, 154, 156, 164, 173, 174, 278 migration propensities 155, 156, 157, 159, 160, 172, 175, 176, 194 modelling 10, 14, 25, 26, 69, 70, 76, 79, 80, 81, 84, 85, 86, 87, 88, 96, 207, 246, 247, 248, 250, 257, 259, 261, 262, 263, 265, 270, 273, 274, 275, 277, 278, 279 modifiable areal unit problem 273, 277 movement data 4 moving group 6 moving group reference person (MGRP) 6, 200
multiplicative component model 281, 282, 291 multivariate cross-tabulations 57 mutual independence model 282, 289
N National Health Service Central Register (NHSCR) 66 national insurance number (NINo) 18, 24, 117, 118, 119, 120, 121, 122, 124, 125, 126, 127, 128, 129 National Online Manpower Information System (Nomis) 34, 35, 50 national travel survey (NTS) 218 net inflows 184 net migration 71, 72, 73, 74, 75, 77, 82, 88, 98, 99, 100, 101, 102, 104, 106, 107, 108, 109, 153, 157, 160, 161, 163, 164, 165, 166, 167, 169, 170, 171, 175, 176, 182, 183, 184, 185, 186, 188, 189, 190, 191, 192, 193, 194, 252, 254 new migrant databank (NMD) 111, 112, 118, 119, 121, 130 non-truncated model 276 non-zero values 51 Northern Ireland longitudinal study (NILS) 141 Northern Ireland Statistics and Research Agency (NISRA) 141, 148 null model 248, 264, 267, 275
O office for national statistics (ONS) 133, 136, 148, 176, 212, 213, 227, 240, 263 offset 267, 270, 271 ordinary least squares (OLS) 262 origin-constrained model 274 origin-destination data 32 origin-destination flow 69, 70, 77, 242 origin-destination interaction 281, 284, 285 origin population 265, 270, 271, 275 outflow 69, 70, 71, 72, 73, 98, 109, 254 out-migration 7, 71, 72, 73, 74, 75, 81, 97, 98, 101, 102, 157, 161, 163, 164, 165, 166, 169, 183, 184, 185, 186, 194
output areas (OAs) 93 output table 53, 57 overdispersion 271, 275, 276, 277
P Pakistani and other South Asia (POSA) 179, 180, 181, 182, 183, 184, 185, 186, 188, 191 patient register data system (PRDS) 97 pensionable age 156, 163, 164 personal census information 54 point-in-polygon 247 Poisson 261, 262, 263, 264, 265, 266, 267, 268, 270, 271, 273, 274, 275, 276, 277, 278, 279 Poisson regression analysis 262 polycentric labour 230 population at risk (PAR) 46, 48, 71, 72, 165, 200, 203, 224 population concentration 175, 191 population data 34, 46 post-aggregation 57, 58, 63 post-labour force 155, 170 pre-aggregation 57, 63 pre-labour force 155, 170 primary suppression 58, 59 primary unit data (PUD) 14 primary urban areas (PUA) 198 principle of entropy 275 propensity 154, 155, 157, 158, 159, 160, 171 pro-rata 96 pupil census 116 pupil level annual school census (PLASC) 16, 17, 18, 24, 27
Q quantitative revolution 229 quick selection 41
R real data 271 record swapping 214, 215 regression models 262, 278 retirement peak 155
S samples of anonymised records (SARs) 11, 133, 134, 135, 136, 139, 140, 141, 143, 144, 145, 147, 148, 149, 150, 176 saturated 282 Scottish longitudinal study (SLS) 140, 149 SDC 92 secondary suppression 59 sectoral pattern 278 sensitive data 53 SIRs 145, 146 small area microdata (SAM) 136, 140 small cell adjustment mechanism (SCAM) 9, 10, 57, 62, 63, 64, 65, 66, 67, 177, 179, 183, 193, 199, 200, 201, 209, 210, 263, 278 small cell adjustment method (SCAM) 9, 57, 63, 156, 177, 216, 232 SMS 198, 199, 200, 201, 203, 206, 209, 210 socio-demographic 243 socio-economic 102, 134, 142, 153, 154, 156, 162, 196, 197, 198, 199, 200, 201, 203, 204, 205, 206, 207, 208, 209, 210, 214, 220, 221, 245, 252 socio-economic data 229 spatial 33, 34, 35, 37, 42, 43, 44, 49, 69, 70, 71, 73, 78, 80, 81, 85, 86, 87, 88, 89, 91, 92, 93, 94, 96, 98, 100, 103, 104, 108, 109, 110 spatial interaction models 274, 278, 279 spatial pattern 188 special migration statistics (SMS) 8, 134, 156, 176, 177, 198, 283 special travel statistics (STS) 8, 62, 242 special workplace statistics (SWS) 8, 227 speculative 275 standardised illness ratios (SIRs) 145 standardised mortality ratios (SMRs) 145 Standard Region 60, 61 standard table (ST) 7, 245, 251 statistical disclosure control (SDC) 56, 92, 214 subarea module 33 sub-optimal 229 sub-regional economies 228
suppression 57, 58, 59, 60, 61, 62, 65, 214 systematic variation 267
T total international migration (TIM) 116, 118, 119, 120, 121, 122, 124, 125, 126, 127, 128, 130 transition data 4, 14, 22 transpose 33 travel-to-work areas (TTWAs) 227
U unconstrained model 274, 275 underdispersion 271, 278 unique pupil number (UPN) 16 unit non-response 56 unsaturated log-linear model 280, 288, 289 urban decentralisation 202 urban gentrification 202
V vacancy rate 265 vector 39 visualisation 69, 70, 81, 85, 86
W ward-level models 247, 250 web-based interface to census interaction data (WICID) 9, 22, 31, 32, 35, 36, 37, 38, 39, 40, 41, 45, 46, 47, 48, 49, 50, 225, 258, 260 web server 36, 37 workers registration scheme (WRS) 113, 117 work permit 113, 115, 117 WRS profile 121
Z zinb (zero-inflated negative binomial) 277 zonal system 273 zonation 198, 204, 210 zonelist 33