Each month, the TSC examines a key emerging technology or its use. This time, we consider an aspect of object technology in databases.
By Sukan Makmuri
As adoption of object technology (OT) accelerates, object persistence presents additional challenges. Persistence is the ability of objects to live beyond the lifetime of the program that created them. As competition and evolving technology change business needs and demand faster development of solutions, object-oriented (OO) solutions promise responsiveness.
However, most companies still have their mission-critical data in non-object-oriented data stores: file-based, hierarchical, networked or relational databases. Introducing OO approaches to these existing components creates a fundamental mismatch between the object model and its storage. In implementing object persistence, one of the most significant issues is integration with relational database management systems (RDBMSs), which currently are the primary choice of data storage for new applications.
Early adopters of OT have had limited options for integrating their applications with existing data because of the mismatch between the object model and the non-OO data store. The same information is represented in different ways in two or more data stores, requiring some translation when moving the data from one store to another. Two major factors cause this problem: the basic mismatch between the object and non-object data models, and the difference in programming paradigms.
The paradigm mismatch may cause a loss of information. The cornerstones of OO approaches (inheritance, polymorphism and other concepts) are not readily supported in relational databases. As a case in point, type extension--required to support inheritance--enables an OO system to add new behaviors to an existing system. This feature facilitates reuse by creating new objects based on predefined objects and data types. For example, an object called Employee can be defined to be a subclass of a Person object, and a Manager object a subclass of an Employee object. An employee inherits a person's attributes (data values held by objects) and methods (functions or transformations that may be applied to or by objects), and may add to or override them. Because there is no straightforward way to represent these features in a non-OO environment, information such as certain relationships, containment or inheritance may be lost.
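The Person, Employee and Manager hierarchy can be sketched in a few lines of Python. This is an illustration only: the attribute names, the describe method and the to_row helper are hypothetical, and to_row stands in for a naive mapping of an object onto a relational-style row.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} (person)"


class Employee(Person):
    # Type extension: adds a salary attribute and overrides describe().
    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary

    def describe(self):
        return f"{self.name} (employee)"


class Manager(Employee):
    # Inherits all attributes and methods from Employee unchanged.
    pass


def to_row(obj):
    # Hypothetical naive flattening: attribute values survive, but the
    # class, its methods and the inheritance chain are not recorded.
    return (obj.name, getattr(obj, "salary", None))


manager = Manager("Ada", 100000)
employee = Employee("Ada", 100000)

print(isinstance(manager, Person))            # True
print(to_row(manager) == to_row(employee))    # True
```

Once flattened, the manager and the employee are indistinguishable. Recovering the type would require extra information in the row, such as a column naming the class.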
A similar situation may occur with polymorphism (the ability of different objects to receive the same message and behave in different ways). For example, an Employee object can have a method RAISE_COMPENSATION consisting of the business rule RAISE_FACTOR * SALARY. A Person object does not have this method, and a Manager object overrides it with the rule 1.5 * RAISE_FACTOR * SALARY. A single message RAISE_COMPENSATION(.10) sent to a person, an employee and a manager would yield no changes, a 10 percent increase and a 15 percent increase, respectively. Again, there is no straightforward RDBMS mechanism to support this feature.
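The RAISE_COMPENSATION example can be sketched as follows. The receive helper is a hypothetical stand-in for message sending, and giving Person a salary attribute is an assumption made only so that "no change" can be observed.

```python
class Person:
    def __init__(self, salary):
        self.salary = salary

    def receive(self, message, *args):
        # Dispatch a message by name; an object without a matching
        # method simply ignores the message.
        handler = getattr(self, message, None)
        if callable(handler):
            handler(*args)


class Employee(Person):
    def raise_compensation(self, raise_factor):
        # Business rule: RAISE_FACTOR * SALARY
        self.salary += raise_factor * self.salary


class Manager(Employee):
    def raise_compensation(self, raise_factor):
        # Overriding rule: 1.5 * RAISE_FACTOR * SALARY
        self.salary += 1.5 * raise_factor * self.salary


person, employee, manager = Person(1000.0), Employee(1000.0), Manager(1000.0)
for obj in (person, employee, manager):
    obj.receive("raise_compensation", 0.10)

print(round(person.salary, 2), round(employee.salary, 2), round(manager.salary, 2))
# 1000.0 1100.0 1150.0
```

The same message produces three different outcomes, chosen by the receiving object's class rather than by the caller.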
As mentioned, the second problematic area is a difference in programming paradigms. For example, there is a difference between a set-oriented language such as Structured Query Language (SQL) and an object-oriented language such as C++ or Smalltalk. An OO language operates on messages that an object receives. A set-oriented language bases its operation on relational algebra and operates on a set of rows in a set of tables. The navigational model differs as well; non-OO environments do not support some object navigation models, such as collections and lists.
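The contrast between the two styles can be sketched with Python's built-in sqlite3 module standing in for an RDBMS. The table layout, names and salaries are made up for illustration.

```python
import sqlite3

# Set-oriented: one SQL statement describes the change for every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", 1000.0), ("Bob", 2000.0)],
)
conn.execute("UPDATE employee SET salary = salary + ? * salary", (0.10,))
sql_salaries = [
    row[0] for row in conn.execute("SELECT salary FROM employee ORDER BY name")
]


# Object-oriented: each object receives a message and updates itself.
class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def raise_compensation(self, raise_factor):
        self.salary += raise_factor * self.salary


employees = [Employee("Ann", 1000.0), Employee("Bob", 2000.0)]
for employee in employees:
    employee.raise_compensation(0.10)
object_salaries = [employee.salary for employee in employees]

print([round(s, 2) for s in sql_salaries])     # [1100.0, 2200.0]
print([round(s, 2) for s in object_salaries])  # [1100.0, 2200.0]
```

Both versions reach the same result here, but the SQL statement cannot vary the rule per row type the way an overriding method can without extra logic in the statement or the application.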
Currently, application developers often use stopgap solutions to attain object persistence. A typical means is to create wrappers (data access classes) using an application programming interface (API) to manipulate and manage the in-memory translation between the object representation of the data and its RDBMS format. Critics of this approach contend that the translation from relational to object paradigm, and vice versa, is similar to disassembling a car for overnight storage in a garage and reassembling it before driving it again. This effect becomes less pronounced as increasingly powerful computing systems proliferate; the performance hit encountered during the translation between the relational and object representations is less visible.
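A minimal sketch of such a wrapper follows, again using Python's sqlite3 module as the RDBMS. The EmployeeStore class, its save and load methods and the kind column are hypothetical names chosen for illustration, not any vendor's API.

```python
import sqlite3


class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary


class Manager(Employee):
    pass


class EmployeeStore:
    """Hypothetical data access class that hides SQL behind save/load."""

    # Maps the stored class name back to a class when loading.
    _classes = {"Employee": Employee, "Manager": Manager}

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS employee ("
            "name TEXT PRIMARY KEY, kind TEXT NOT NULL, salary REAL NOT NULL)"
        )

    def save(self, obj):
        # "Disassemble" the object into column values; the kind column
        # records the class so the type is not lost in storage.
        self.conn.execute(
            "INSERT OR REPLACE INTO employee (name, kind, salary) VALUES (?, ?, ?)",
            (obj.name, type(obj).__name__, obj.salary),
        )

    def load(self, name):
        # "Reassemble" an object of the right class from its row.
        row = self.conn.execute(
            "SELECT name, kind, salary FROM employee WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            return None
        stored_name, kind, salary = row
        return self._classes[kind](stored_name, salary)


store = EmployeeStore(sqlite3.connect(":memory:"))
store.save(Manager("Ada", 5000.0))
restored = store.load("Ada")
print(type(restored).__name__, restored.salary)  # Manager 5000.0
```

Every save and load pays the translation cost the critics describe, and each new class or attribute means more mapping code to maintain.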
Relational databases have proven themselves over some 15 years of industrial exposure. Tremendous improvements in performance, control, concurrency and rollback features for multiuser data management make this an attractive data storage approach.
Current RDBMS releases have powerful engines capable of sophisticated data partitioning and distributed data management, leveraging the full power of symmetric multiprocessors (SMP) and massively parallel processors (MPP), and supporting demanding operational and analytic needs, such as data warehousing. However, it takes considerable programming effort to convert between the object-oriented and relational models, and to emulate OO features not readily available in the RDBMS. Methods may have to be implemented in a combination of stored procedures, triggers or application code. Even then, there will still be limitations in supporting the complete object-oriented approach. On the upside, major RDBMSs are widely used to provide persistence for object-based front-end application development tools, such as PowerBuilder and Visual Basic.
The RDBMS approach is preferred if objects are infrequently retrieved from storage and manipulated mostly in the cache. Cursors may be used to iterate over the cached objects. However, bulk data manipulation may be handled best via the native interface of the RDBMS. The relational approach also is preferred for applications that manage simple to moderately complex objects. RDBMSs will continue to be the preferred storage medium for newer applications, especially since most RDBMS vendors are providing extensions that reduce the technology mismatch (see below).
Beyond the RDBMS approach lie solutions that exploit emerging database technologies: the object and object/relational database management system (ODBMS and ORDBMS).
ODBMSs have been considered the automatic choice for object persistence, because they provide the least mismatch between the object model and object persistence. They are especially capable of storing objects that have complex relationships, such as computer-aided design/manufacture (CAD/CAM) objects. They support all prominent OO features. ODBMSs started becoming available as products in the late 1980s but have never captured extensive market share or the imaginations of users. They have been improved over the years but still do not readily provide an efficient way to store opaque data and objects, such as .gif images.
The ODBMS approach may be overkill for applications that use simple objects. Recently, system robustness and performance for bulk operations have improved, making this option attractive for applications that manage many objects. Some ODBMS products are embedded in other products, but front-end OO tools do not widely use them to provide object persistence.
The Object Data Management Group (ODMG), based in Burnsville, MN, is a consortium of ODBMS vendors and others working on standards to allow the portability of customer application software across ODBMS products. It has published ODMG-93, a standards proposal that has been embraced by the Object Management Group (OMG) of Cambridge, MA, the leading OT consortium. ODMG-93 includes object definition and manipulation languages that are supported by several ODBMSs. Thus, using an ODBMS may also require learning these new SQL-like languages. Most ODBMSs support a C++ API, while some also support Smalltalk.
Although ODBMSs have not enjoyed the explosion once predicted for them, they will continue to grow and play a key role for applications with complex objects. More ODBMSs may also be integrated with distributed object computing application development and execution environments.
Object/relational databases (ORDBMSs) have been touted as the trend for future data storage mechanisms. They are primarily RDBMSs with object extensions. That is, they provide the scalability and performance of an RDBMS as well as efficient management of opaque or binary objects. This is in contrast to the binary large object (BLOB) currently supported by several RDBMSs. Some implementations of BLOBs do not support binary data and may be implemented as character fields, such as a 64K character field.
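For comparison, plain BLOB storage can be sketched with Python's sqlite3 module. The table name, file name and payload bytes are made up; the payload merely starts with a GIF signature and is not a valid image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (name TEXT PRIMARY KEY, data BLOB)")

# Made-up bytes for illustration only, not a real image file.
payload = b"GIF89a" + bytes([0x01, 0x00, 0x01, 0x00, 0x00, 0xFF])
conn.execute("INSERT INTO image VALUES (?, ?)", ("logo.gif", payload))

stored, size = conn.execute(
    "SELECT data, length(data) FROM image WHERE name = ?", ("logo.gif",)
).fetchone()

print(stored == payload, size)  # True 12
```

The database returns the bytes intact but knows nothing about their content beyond their length. Any behavior, such as decoding or resizing the image, has to live in the application.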
The ORDBMS is preferred when applications have to manipulate opaque objects while needing the infrastructure capability of an RDBMS. Imaging and workflow applications may be able to use the ORDBMS as a central repository. As the newest of the database types, the ORDBMS seems poised to address the explosive growth of imaging requirements and should play a key role for the next few years.
There are many challenges and choices in providing object persistence. Users should plan and choose the proper approach based on their system requirements--such as complexity of data types and access patterns--and the optimum use of their existing data investments.
Sukan Makmuri is a vice president at Bank of America and a member of the UniForum Technical Steering Committee. He can be reached at sukan@rahul.net.