is used to indicate the start of a new paragraph. However, many codes also have a corresponding closing or end tag, marked by a forward slash (/) after the less-than symbol. So the end of a paragraph would be encoded as </p>. Elements may also contain attributes and values. For example, the element <pause dur=4> could be used in a spoken transcription to indicate a pause of 4 seconds during speech. Here, the attribute is dur (duration) and its value is 4 (seconds).

Different forms of SGML have been employed for a range of purposes. For instance, web pages are encoded in HTML (HyperText Markup Language), which uses a predefined set of codes based on the general SGML rules. For example, bold print is specified in HTML with the code pair <b> and </b>. See Bryan (1988) and Goldfarb (1990) for more information about SGML. The Text Encoding Initiative (TEI) is a related system, developed in the mid-1990s, that specifies a set of SGML codes for marking up different types of text (including written and spoken varieties). In addition, XML (Extensible Markup Language) allows users to develop their own codes for different types of data. XML is Unicode-compliant, and because it requires its users to adhere strictly to its rules and syntax, it is rather less forgiving of inconsistencies than HTML.

Corpus builders who encode their corpora are under no obligation to adopt some form of SGML, but it is worth knowing that such codes exist and what they look like, in case they are encountered in other corpora. Also, if an annotation scheme is required, it seems a shame to reinvent the wheel when a perfectly good set of standards already exists. Finally, corpus analysis packages like WordSmith tend to be capable of handling SGML codes, but they may be less well equipped to deal with an ad hoc coding system created by a researcher working alone.
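The relationship between elements, attributes and values can be seen by passing a short marked-up fragment through an XML parser. The following is a minimal sketch in Python using only the standard library. The sample utterance, the <u> wrapper element and its who attribute are hypothetical and are not drawn from any particular corpus or encoding scheme. The pause element is written in XML form, with a quoted value and a self-closing slash, because an XML parser would be expected to reject the unquoted form dur=4.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical transcription fragment. The <u> (utterance) wrapper and the
# "who" attribute are illustrative assumptions, not part of a cited scheme.
SAMPLE = '<u who="A">well <pause dur="4"/> I suppose so</u>'


def describe_elements(markup):
    """Return (element name, attributes) for every element in the markup."""
    root = ET.fromstring(markup)
    return [(el.tag, dict(el.attrib)) for el in root.iter()]


def is_well_formed(markup):
    """Report whether the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def strip_tags(markup):
    """Drop everything between < and > so that only the words remain.

    A rough approach: it assumes no attribute value contains a > character.
    """
    return re.sub(r"<[^>]+>", " ", markup).split()


if __name__ == "__main__":
    print(describe_elements(SAMPLE))
    print(is_well_formed(SAMPLE))
    # Unquoted attribute value and no closing slash: acceptable in looser
    # markup, but not in XML.
    print(is_well_formed("<u>well <pause dur=4> I suppose so</u>"))
    print(strip_tags(SAMPLE))
```

The last function shows why consistent angle-bracket codes are convenient for analysis: the mark-up can be ignored or removed mechanically, leaving only the words of the text to be counted.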