Data Persistence¶
Context and Problem Statement¶
In computer science, persistence refers to the characteristic of state of a system that outlives (persists more than) the process that created it. This is achieved in practice by storing the state as data in computer data storage. Programs have to transfer data to and from storage devices and have to provide mappings from the native programming-language data structures to the storage device data structures.
There are three types of database that are mostly used:
- relational databases
Relational databases have been around since the 1970s. The name comes from the way that data is stored in multiple, related tables. Within the tables, data is stored in rows and columns. The relational database management system (RDBMS) is the program that allows you to create, update, and administer a relational database. Structured Query Language (SQL) is the most common language for reading, creating, updating and deleting data. Relational databases are very reliable. They are compliant with ACID (Atomicity, Consistency, Isolation, Durability), which is a standard set of properties for reliable database transactions. Relational databases work well with structured data. Organizations that have a lot of unstructured or semi-structured data should not be considering a relational database.
- nosql databases
NoSQL is a broad category that includes any database that doesn’t use SQL as its primary data access language. These types of databases are also sometimes referred to as non-relational databases. Unlike in relational databases, data in a NoSQL database doesn’t have to conform to a pre-defined schema, so these types of databases are great for organizations seeking to store unstructured or semi-structured data.
- graph databases
Graph databases are a type of NoSQL database that are based on graph theory. Graph-Oriented Database Management Systems (DBMS) software is designed to identify and work with the connections between data points. Therefore graph databases are often used to analyze the relationships between heterogeneous data points.
Object-Relational Mapping (ORM)
Object-relational mapping (ORM) is a programming technique in which a metadata descriptor is used to connect object code to a relational database. Object code is written in object-oriented programming (OOP) languages such as Java or C#. ORM converts data between type systems that are unable to coexist within relational databases and OOP languages.
An object-relational mapper provides an object-oriented layer between relational databases and object-oriented programming languages without having to write SQL queries. It standardizes interfaces reducing boilerplate and speeding development time.
Object-oriented programming includes many states and codes in a format that is complex to understand and interpret. ORMs translate this data and create a structured map to help developers understand the underlying database structure. The mapping explains how objects are related to different tables. ORMs use this information to convert data between tables and generate the SQL code for a relational database to insert, update, create and delete data in response to changes the application makes to the data object. Once written, the ORM mapping will manage the application’s data needs and you will not need to write any more low-level code.
ORMs create a model of the object-oriented program with a high-level of abstraction. In other words, it makes a level of logic without the underlying details of the code. Mapping describes the relationship between an object and the data without knowing how the data is structured. The model can then be used to connect the application with the SQL code needed to manage data activities. This “plumbing” type of code does not have to be rewritten, saving the developer a tremendous amount of time.
Examples: Entity Framework, NHibernate, Hibernate, SQLAlchemy
Decision Drivers¶
- availability, data security, data consistency and performance constrains
- constrains from existing application(s)
- backup and recovery policy
- data center or hosting options
- budget for different phases, including licensing and ongoing maintenance and upgrade costs during your project's lifetime
Considered Options¶
- MSSQL - Microsoft SQL Server 2022 - the most Azure-enabled release yet, with continued performance, security, and availability innovation. Microsoft SQL Server Standard Edition delivers fully featured database capabilities for mid-tier applications and data marts.
- PostgreSQL - PostgreSQL is an advanced, enterprise-class, and open-source relational database system. PostgreSQL supports both SQL (relational) and JSON (non-relational) querying.
Decision Outcome¶
Chosen option: "PostgreSQL" because,
- It is free.
Pros and Cons of the Options¶
MSSQL¶
Relational database management system:
- Commercial product from Microsoft and can be licensed on a per-core model or server and client access level (CAL) model.
- SQL Server offers clustered and non-clustered indexes. Clustered indexes sort and data rows in the table or view based on key values (columns in the index definition).
- SQL Server is case insensitive by default. The case sensitivity can be changed by adjusting the SQL Server’s collation settings. The collation settings for case sensitivity can be set at the database or column level.
- SQL Server has the geography data type for storing geographic spatial data.
- SQL Server replication duplicates data from a Publisher server to a Subscriber.
- SQL Server offers triggers for different types of database events: (DML, DDL, Logon Triggers)
PostgreSQL¶
Object-relational database management system:
- Open source (completely free)
- PostgreSQL offers a number of options for index types.
- PostgreSQL is case sensitive for evaluating strings.
- PostgreSQL does not have a native data type for geographic data. The open-source PostGIS resource offers support for geographic objects.
- PostgreSQL has Primary-Secondary replication. Replication can be synchronous or asynchronous.
- PostgreSQL has advanced triggers. Supported triggering events are AFTER, BEFORE, and INSTEAD OF, and they can be used for INSERT, UPDATE, and DELETE events.