Foundations of Databases

2020-02-27 57浏览

1.Contents Preface PART A 1 2 3 vii ANTECHAMBER 1 Database Systems 3 1.1 1.2 1.3 1.4 1.5 3 5 7 7 8 9 The Main Principles Functionalities Complexity and Diversity Past and Future Ties with This Book Bibliographic Notes Theoretical Background 10 2.1 2.2 2.3 10 13 20 Some Basics Languages, Computability, and Complexity Basics from Logic The Relational Model 28 3.1 3.2 3.3 3.4 29 31 32 34 34 The Structure of the Relational Model Named versus Unnamed Perspectives Conventional versus Logic Programming Perspectives Notation Bibliographic Notes xiii
2.xiv Contents PART B 4 5 6 7 PART C 8BASICS:RELATIONAL QUERY LANGUAGES 35 Conjunctive Queries 37 4.1 4.2 4.3 4.4 4.5 38 40 48 52 61 64 65 Getting Started Logic-Based Perspectives Query Composition and Views Algebraic Perspectives Adding Union Bibliographic Notes Exercises AddingNegation:Algebra and Calculus 70 5.1 5.2 5.3 5.4 5.5 5.6 71 72 73 81 91 93 96 98 The Relational Algebras Nonrecursive Datalog with Negation The Relational Calculus Syntactic Restrictions for Domain Independence Aggregate FunctionsDigression:Finite Representations of Infinite Databases Bibliographic Notes Exercises Static Analysis and Optimization 105 6.1 6.2 6.3 6.4 106 115 122 126 134 136 Issues in Practical Query Optimization Global Optimization Static Analysis of the Relational Calculus Computing with Acyclic Joins Bibliographic Notes Exercises Notes on Practical Languages 142 7.1 7.2 7.3 142 149 152 154 154SQL:The Structured Query Language Query-by-Example and Microsoft Access Confronting the Real World Bibliographic Notes Exercises CONSTRAINTS 157 Functional and Join Dependency 159 8.1 8.2 8.3 159 163 169 Motivation Functional and Key Dependencies Join and Multivalued Dependencies
3.Contents 8.4 9 10 11 PART D 12 13 The Chase Bibliographic Notes Exercises xv 173 185 186 Inclusion Dependency 192 9.1 9.2 9.3 9.4 192 197 202 207 211 211 Inclusion Dependency in Isolation Finite versus Infinite Implication Nonaxiomatizability of fd’s + ind’s Restricted Kinds of Inclusion Dependency Bibliographic Notes Exercises A Larger Perspective 216 10.1 10.2 10.3 10.4 A Unifying Framework The Chase Revisited Axiomatization An Algebraic Perspective Bibliographic Notes Exercises 217 220 226 228 233 235 Design and Dependencies 240 11.1 11.2 11.3 242 251 260 264 266 Semantic Data Models Normal Forms Universal Relation Assumption Bibliographic Notes Exercises DATALOG AND RECURSION 271 Datalog 273 12.1 12.2 12.3 12.4 12.5 276 278 282 286 300 304 306 Syntax of Datalog Model-Theoretic Semantics Fixpoint Semantics Proof-Theoretic Approach Static Program Analysis Bibliographic Notes Exercises Evaluation of Datalog 311 13.1 13.2 13.3 312 316 324 Seminaive Evaluation Top-Down Techniques Magic
4.xvi Contents 13.4 14 15 PART E 16 17 18 Two Improvements Bibliographic Notes Exercises 327 335 337 Recursion and Negation 342 14.1 14.2 14.3 14.4 14.5 344 347 355 360 368 369 370 Algebra + While Calculus + Fixpoint Datalog with Negation Equivalence Recursion in Practical Languages Bibliographic Notes Exercises Negation in Datalog 374 15.1 The Basic Problem 15.2 Stratified Semantics 15.3 Well-Founded Semantics 15.4 Expressive Power 15.5 Negation as Failure in Brief Bibliographic Notes Exercises 374 377 385 397 406 408 410 EXPRESSIVENESS AND COMPLEXITY 415 Sizing Up Languages 417 16.1 16.2 16.3 Queries Complexity of Queries Languages and Complexity Bibliographic Notes Exercises 417 422 423 425 426 First Order, Fixpoint, and While 429 17.1 Complexity of First-Order Queries 17.2 Expressiveness of First-Order Queries 17.3 Fixpoint and While Queries 17.4 The Impact of Order Bibliographic Notes Exercises 430 433 437 446 457 459 Highly Expressive Languages 466 18.1 WhileN —while with Arithmetic 18.2 Whilenew — while with New Values 467 469
5.Contents 18.3 PART F 19 20 21 22 Whileuty—An Untyped Extension of while Bibliographic Notes Exercises xvii 475 479 481 FINALE 485 Incomplete Information 487 19.1 19.2 19.3 19.4 19.5 488 490 493 499 501 504 506 Warm-Up Weak Representation Systems Conditional Tables The Complexity of Nulls Other Approaches Bibliographic Notes Exercises Complex Values 508 20.1 20.2 20.3 20.4 20.5 20.6 20.7 20.8 511 514 519 523 526 531 534 536 538 539 Complex Value Databases The Algebra The Calculus Examples Equivalence Theorems Fixpoint and Deduction Expressive Power and Complexity A Practical Query Language for Complex Values Bibliographic Notes Exercises Object Databases 542 21.1 21.2 21.3 21.4 21.5 543 547 556 563 571 573 575 Informal Presentation Formal Definition of an OODB Model Languages for OODB Queries Languages for Methods Further Issues for OODBs Bibliographic Notes Exercises Dynamic Aspects 579 22.1 22.2 22.3 22.4 22.5 580 584 586 593 600 Update Languages Transactional Schemas Updating Views and Deductive Databases Updating Incomplete Information Active Databases
6.xviii Contents 22.6 Temporal Databases and Constraints Bibliographic Notes Exercises 606 613 615 Bibliography 621 Symbol Index 659 Index 661
7.1Alice:Vittorio:Sergio:Riccardo:Database Systems I thought this was a theory book. Yes, but good theory needs the big picture. Besides, what will you tell your grandfather when he asks what you study? You can’t tell him that you’re studying the fundamental implications of genericity in database queries. C omputers are now used in almost all aspects of human activity. One of their main uses is to manage information, which in some cases involves simply holding data for future retrieval and in other cases serving as the backbone for managing the life cycle of complex financial or engineering processes. A large amount of data stored in a computer is called a database. The basic software that supports the management of this data is called a database management system (dbms). The dbms is typically accompanied by a large and evergrowing body of application software that accesses and modifies the stored information. The primary focus in this book is to present part of the theory underlying the design and use of these systems. This preliminary chapter briefly reviews the field of database systems to indicate the larger context that has led to this theory. 1.1 The Main Principles Database systems can be viewed as mediators between human beings who want to use data and physical devices that hold it (see Fig. 1.1). Early database management was based on explicit usage of file systems and customized application software. Gradually, principles and mechanisms were developed that insulated database users from the details of the physical implementation. In the late 1960s, the first major step in this direction was the development of three-level architecture. This architecture separated database functionalities into physical, logical, and external levels. (See Fig. 1.2. The three views represent various ways of looking at thedatabase:multirelations, universal relation interface, and graphical interface.) The separation of the logical definition of data from its physical implementation is central to the field of databases. One of the major research directions in the field has been the development and study of abstract, human-oriented models and interfaces for specifying the structure of stored data and for manipulating it. These models permit the user to concentrate on a logical representation of data that resembles his or her vision of the reality modeled by the data much more closely than the physical representation. 3
8.4 Database Systems DBMS Figure 1.1: Database as mediator between humans and data Several logical data models have been developed, including the hierarchical, network, relational, and object oriented. These include primarily a data definition language (DDL) for specifying the structural aspects of the data and a data manipulation language (DML) for accessing and updating it. The separation of the logical from the physical has resulted in an extraordinary increase in database usability and programmer productivity. Another benefit of this separation is that many aspects of the physical implementation may be changed without having to modify the abstract vision of the database. This substantially reduces the need to change existing application programs or retrain users. The separation of the logical and physical levels of a database system is usually called the data independence principle. This is arguably the most important distinction between file systems and database systems. The second separation in the architecture, between external and logical levels, is also important. It permits different perspectives, or views, on the database that are tailored to specific needs. Views hide irrelevant information and restructure data that is retained. Such views may be simple, as in the case of automatic teller machines, or highly intricate, as in the case of computer-aided design systems. A major issue connected with both separations in the architecture is the trade-off between human convenience and reasonable performance. For example, the separation between logical and physical means that the system must compile queries and updates directed to the logical representation into “real” programs. Indeed, the use of the relational model became widespread only when query optimization techniques made it feasible. More generally, as the field of physical database optimization has matured, logical models have become increasingly remote from physical storage. Developments in hardware (e.g., large and fast memories) are also influencing the field a great deal by continually changing the limits of feasibility.
9.1.2 Functionalities View 1 External Level View 2 5 View 3 Logical Level Physical Level Figure 1.2: 1.2 Three-level architecture of database systems Functionalities Modern dbms’s include a broad array of functionalities, ranging from the very physical to the relatively abstract. Some functionalities, such as database recovery, can largely be ignored by almost all users. Others (even among the most physical ones, such as indexing) are presented to application programmers in abstracted ways. The primary functionalities of dbms’s are asfollows:Secondary storagemanagement:The goal of dbms’s is the management of large amounts of shared data. By large we mean that the data is too big to fit in main memory. Thus an essential task of these systems is the management of secondary storage, which involves an array of techniques such as indexing, clustering, and resource allocation.Persistence:Data should be persistent (i.e., it should survive the termination of a particular database application so that it may be reused later). This is a clear divergence from standard programming, in which a data structure must be coded in a file to live beyond the execution of an application. Persistent programming languages (e.g., persistent C++) are now emerging to overcome this limitation of programming languages.
10.6 Database Systems Concurrencycontrol:Data is shared. The system must support simultaneous access to shared information in a harmonious environment that controls access conflicts and presents a coherent database state to each user. This has led to important notions such as transaction and serializability and to techniques such as two-phase locking that ensure serializability. Dataprotection:The database is an invaluable source of information that must be protected against human and application program errors, computer failures, and human misuse. Integrity checking mechanisms focus on preventing inconsistencies in the stored data resulting, for example, from faulty update requests. Database recovery and backup protocols guard against hardware failures, primarily by maintaining snapshots of previous database states and logs of transactions in progress. Finally, security control mechanisms prevent classes of users from accessing and/or changing sensitive information. Human-machineinterface:This involves a wide variety of features, generally revolving around the logical representation of data. Most concretely, this encompasses DDLs and DMLs, including both those having a traditional linear format and the emerging visual interfaces incorporated in so-called fourth-generation languages. Graphically based tools for database installation and design are popular.Distribution:In many applications, information resides in distinct locations. Even within a local enterprise, it is common to find interrelated information spread across several databases, either for historical reasons or to keep each database within manageable size. These databases may be supported by different systems (interoperability) and based on distinct models (heterogeneity). The task of providing transparent access to multiple systems is a major research topic of the 1990s. Compilation andoptimization:A major task of database systems is the translation of the requests against the external and logical levels into executable programs. This usually involves one or more compilation steps and intensive optimization so that performance is not degraded by the convenience of using more friendly interfaces. Some of these features concern primarily the physical datalevel:concurrency control, recovery, and secondary storage management. Others, such as optimization, are spread across the three levels. Database theory and more generally, database models have focused primarily on the description of data and on querying facilities. The support for designing application software, which often constitutes a large component of databases in the field, has generally been overlooked by the database research community. In relational systems applications can be written in C and extended with embedded SQL (the standard relational query language) commands for accessing the database. Unfortunately there is a significant distance between the paradigms of C and SQL. The same can be said to a certain extent about fourth-generation languages. Modern approaches to improving application programmer productivity, such as object-oriented or active databases, are being investigated.
11.1.4 Past and Future 1.3 7 Complexity and Diversity In addition to supporting diverse functionalities, the field of databases must address a broad variety of uses, styles, and physical platforms. Examples of this variety include thefollowing:Applications:Financial, personnel, inventory, sales, engineering design, manufacturing control, personal information, etc.Users:Application programmers and software, customer service representatives, secretaries, database administrators (dba’s), computer gurus, other databases, expert systems, etc. Accessmodes:Linear and graphical data manipulation languages, special purpose graphical interfaces, data entry, report generation, etc. Logicalmodels:The most prominent of these are the network, hierarchical, relational, and object-oriented models; and there are variations in each model as implemented by various vendors.Platforms:Variations in host programming languages, computing hardware and operating systems, secondary storage devices (including conventional disks, optical disks, tape), networks, etc. Both the quality and quantity of variety compounds the complexity of modern dbms’s, which attempt to support as much diversity as possible. Another factor contributing to the complexity of database systems is their longevity. Although some databases are used by a single person or a handful of users for a year or less, many organizations are using databases implemented over a decade ago. Over the years, layers of application software with intricate interdependencies have been developed for these “legacy” systems. It is difficult to modernize or replace these databases because of the tremendous volume of application software that uses them on a routine basis. 1.4 Past and Future After the advent of the three-level architecture, the field of databases has become increasingly abstract, moving away from physical storage devices toward human models of information organization. Early dbms’s were based on the network and hierarchical models. Both provide some logical organization of data (in graphs and trees), but these representations closely mirror the physical storage of the data. Furthermore, the DMLs for these are primitive because they focus primarily on navigation through the physically stored data. In the 1970s, Codd’s relational model revolutionized the field. In this model, humans view the data as organized in relations (tables), and more “declarative” languages are provided for data access. Indexes and other mechanisms for maintaining the interconnection between data are largely hidden from users. The approach became increasingly accepted as implementation and optimization techniques could provide reasonable response times in spite of the distance between logical and physical data organization. The relational model also provided the initial basis for the development of a mathematical investigation of databases, largely because it bridges the gap between data modeling and mathematical logic.
12.8 Database Systems Historically dbms’s were biased toward business applications, and the relational model best fitted the needs. However, the requirements for the management of large, shared amounts of data were also felt in a variety of fields, such as computer-aided design and expert systems. These new applications require more in terms of structures (more complex than relations), control (more dynamic environments), and intelligence (incorporation of knowledge). They have generated research and developments at the border of other fields. Perhaps the most important developments are thefollowing:Object-orienteddatabases:'>databases: