Lesson 1 - Introduction to databases in Java
We're going to start a big topic with this tutorial, which is working with databases in Java.
Why a database?
You may be asking yourself, why databases? You could just store the data in text files, binaries, XMLs or come up with some other alternative. You'd get it to work somehow...right? Or wouldn't you?
The term database is actually imprecise, and in technical literature we encounter the term RDBMS (Relational Database Management System). The database engine isn't just data storage. It's a sophisticated and optimized tool that takes care of a lot of issues for us and is simple to use. We work with databases using the SQL language, whose syntax resembles human-readable sentences. There are also object-oriented extensions available for this language, but let's get back to that later.
Along with data storage, many other things need to be managed, such as security or performance optimization. But an RDBMS does even more. It solves the problem of the same item being edited by multiple users at the same time, which could otherwise cause inconsistencies in the database. In this case, the RDBMS locks the data and unlocks it once the writing is done. It also allows us to combine several queries into a transaction, in which either all the queries in the series are completed or none of them is executed. It can't happen that only a part of them is executed. These features of the database engine are summarized by the ACID acronym. Let's explain it.
ACID is an acronym of the words Atomicity, Consistency, Isolation, and Durability. The individual words have the following meanings:
- Atomicity - The operations within a transaction are performed as a single atomic operation. This means that if any part of the operation fails, the database returns to its original state and no part of the transaction is executed. A real-world example is transferring money between bank accounts. If the money cannot be subtracted from one account, it won't be added to the other account. Otherwise, the database would be inconsistent. If we handled the data transfer by ourselves, this could easily happen to us.
- Consistency - The database state after completing a transaction is always consistent, which means valid according to all the defined rules and constraints. It'll never happen that the database is in an inconsistent state.
- Isolation - The operations are isolated and don't affect each other. If multiple transactions need to write to the same row at the same time, they're executed one after another, as in a queue.
- Durability - Committed data is written to permanent storage (usually a disk) immediately. In case of a power outage or any other interruption of the RDBMS, everything remains as it was before the failure.
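To make atomicity more concrete, here's a minimal sketch of the bank transfer mentioned above, written against the standard java.sql API (JDBC, which this lesson introduces further below). The TransferSketch class, the account table, and its id and balance columns are assumptions made up for this illustration, and the method expects an already open Connection from some database driver.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Sketch only: the "account" table and its "id" and "balance" columns are hypothetical.
final class TransferSketch {

    /** Moves money between two accounts as one all-or-nothing transaction. */
    static void transfer(Connection conn, long fromId, long toId, BigDecimal amount)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // group the following statements into one transaction
        try (PreparedStatement withdraw = conn.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ? AND balance >= ?");
             PreparedStatement deposit = conn.prepareStatement(
                     "UPDATE account SET balance = balance + ? WHERE id = ?")) {

            withdraw.setBigDecimal(1, amount);
            withdraw.setLong(2, fromId);
            withdraw.setBigDecimal(3, amount);
            if (withdraw.executeUpdate() != 1) {
                throw new SQLException("Source account missing or balance too low");
            }

            deposit.setBigDecimal(1, amount);
            deposit.setLong(2, toId);
            if (deposit.executeUpdate() != 1) {
                throw new SQLException("Target account missing");
            }

            conn.commit(); // both updates take effect together
        } catch (SQLException e) {
            conn.rollback(); // undo the withdrawal if anything went wrong
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's setting
        }
    }
}
```

If the deposit fails after the withdrawal has already run, the rollback returns the source account to its original balance, so the money is never subtracted without being added elsewhere.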
So the database (more precisely, the database engine) is a black box that our application communicates with and that stores all the data. It's very simple to use and offers data manipulation features we'd hardly be able to provide ourselves. We don't have to worry about how the data is stored physically. We communicate with the database using the simple SQL query language, as we'll see later. Nowadays, it makes no sense to bother with the issue of storing data. We simply go for a database; there is a wide range of them and most are free. The database is sometimes referred to as the 3rd layer of the application (the 1st layer is the user interface, the 2nd is the business application logic, the 3rd one is the data layer).
So-called relational databases are used almost exclusively for data storage. This term refers to a database based on tables. Each table contains items of one type. So we can have a user table or a comment table, for example. We can imagine such a database table as a table in Excel. The user table might look like this:
| First name | Last name | Date of birth | Number of articles |
We store the items as rows, each representing one user in this case. The columns then specify the attributes (properties, if you'd like) the items have. The database is mostly type specific, meaning that each column is of a fixed data type (we distinguish between numbers, characters, short texts, long texts...) and can contain values of only that type. It's the same as with Java variables. If we want to use a relational database properly, each row in a table should be provided with a unique identifier. The users could be identified by their birth number (a national identification number), but artificial identifiers are used more often. We'll simply assign unique IDs to the users, starting with 1. We'll get to that later.
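Since typed columns behave much like Java variables, here's a small sketch that mirrors the user table as a Java record. The field names, the sample values, and the UserTableSketch class are all made up for illustration, and the list only stands in for a real table.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;

// Sketch only: the field names and sample values are made up for illustration.
// One record instance plays the role of one row; each component is a typed column.
record User(int id, String firstName, String lastName, LocalDate dateOfBirth, int articleCount) {}

final class UserTableSketch {

    /** Looks a row up by its unique identifier, much like a lookup by ID in a table. */
    static Optional<User> findById(List<User> table, int id) {
        return table.stream().filter(user -> user.id() == id).findFirst();
    }

    public static void main(String[] args) {
        List<User> table = List.of(
                new User(1, "Alice", "Example", LocalDate.of(1990, 5, 17), 12),
                new User(2, "Bob", "Sample", LocalDate.of(1985, 11, 2), 3));

        // Passing text where the numeric id belongs would not compile,
        // just as a typed column rejects values of another type.
        findById(table, 2).ifPresent(user ->
                System.out.println(user.firstName() + " " + user.lastName()));
    }
}
```

The artificial id is what lets us pick exactly one row, even if two users happened to share the same name and date of birth.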
The word relational refers to a relationship between tables or between entities in one table. But let's leave that for another time. Until then, we'll work with only one table at a time.
The incompatibility of object-oriented and relational approaches
The object-oriented world and the world of relational databases are very different. They represent two different philosophies, which I dare to say are incompatible.
Relational databases are a proven way of working with data. There are even fully object-oriented databases available out there. However, they're usually not worth investing in, so they're not as popular. After the object-oriented programming revolution, a problem with storing data arose. Since relational databases don't work with objects, they can't store them directly. There are several ways to work around this problem.
1. Non-object-oriented programming
The first option is obviously to program everything entirely without objects. However, we would go against the mainstream, wouldn't be able to use any third-party components, and would end up with very low-quality code. Since Java is an object-oriented language, we wouldn't even be able to program this way.
2. Database Wrapper
The wrapper approach allows us to work with a database as if it were an object. However, we still have to communicate with it in the SQL language. In other words, we mix object-oriented and relational code. This approach is a sort of middle ground and requires us to bend the OOP philosophy a bit. Its advantage is that it keeps the database's performance and features at the price of slightly degrading OOP principles.
We usually get data from a database as values in an array, so we lose the option to add functionality to said data. As a workaround, we'll gather logic into a manager.
In Java, this approach is represented by JDBC (Java Database Connectivity). It's a unified interface that supports all major databases. We just download the connector (the JDBC driver) from the database vendor and we can communicate with the database from Java using JDBC and the database's own SQL dialect. Theoretically, all we need to do is change the connector, and our finished application works with a completely different database without having to change the Java code, as long as the SQL we wrote is supported by both databases. JDBC maps basic column types to Java data types.
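The wrapper approach might be sketched roughly like this. The UserManager class, the users table, and its id, first_name and last_name columns are hypothetical names chosen for this illustration, and the code assumes an open Connection supplied by whichever JDBC driver we use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: UserManager, the "users" table and its columns are hypothetical.
final class UserManager {
    private final Connection connection;

    UserManager(Connection connection) {
        this.connection = connection;
    }

    /** Runs an SQL query and returns each matching row as a plain array of values. */
    List<Object[]> findByLastName(String lastName) throws SQLException {
        String sql = "SELECT id, first_name, last_name FROM users WHERE last_name = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, lastName); // bind the parameter instead of concatenating
            try (ResultSet results = statement.executeQuery()) {
                List<Object[]> rows = new ArrayList<>();
                while (results.next()) {
                    rows.add(new Object[] {
                            results.getInt("id"),             // numeric column -> int
                            results.getString("first_name"),  // text column -> String
                            results.getString("last_name")
                    });
                }
                return rows;
            }
        }
    }

    /** Rows are bare arrays without behaviour, so logic like this ends up in the manager. */
    static String fullName(Object[] row) {
        return row[1] + " " + row[2];
    }
}
```

With a real driver on the classpath, the Connection would typically come from DriverManager.getConnection(url, user, password), where the URL format depends on the database.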
3. Object-relational mapping
Object-relational mapping (ORM) strictly follows the OOP ideology. Instead of arrays, we get objects straight from the database, and these objects provide methods. We don't communicate in the SQL language at all. We treat database tables as object collections and use OOP language syntax to work with them. This way, we're completely shielded from the fact that we're working with a relational database. Sounds great, right?
The catch is that SQL queries are generated automatically in the background and they're not always efficient. Large-scale business applications are often written using ORM, with only the critical parts written directly in JDBC. Another problem with ORM is that it's quite complex (not to use, but to implement). Fortunately, Java offers us a ready-made ORM, so we don't have to solve anything. ORM is specified in Java by JPA (Java Persistence API), which is an interface for object-oriented data manipulation. The most popular concrete implementation is Hibernate. JDO (Java Data Objects) is a separate specification with a similar purpose.
JPA maps tables to specific classes (entities). Mapping can be specified either by XML or by annotations. EntityManager synchronizes the database with the object structure.
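To get a feel for what such a mapping does, here's a hand-written sketch of the row-to-object conversion that an ORM performs for us automatically. This is not the JPA API: the UserEntity and UserMapper classes and the column names are made up, and plain maps stand in for database rows. In JPA itself, the same mapping would be declared with annotations such as @Entity and @Id rather than written by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: UserEntity, UserMapper and the column names are hypothetical,
// and this hand-written mapping merely imitates what an ORM generates.
final class UserEntity {
    private final int id;
    private final String firstName;
    private final String lastName;

    UserEntity(int id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    int id() {
        return id;
    }

    /** Unlike a bare array of values, the object can carry its own behaviour. */
    String fullName() {
        return firstName + " " + lastName;
    }
}

final class UserMapper {

    /** Converts one row (column name -> value) into an object. */
    static UserEntity fromRow(Map<String, Object> row) {
        return new UserEntity(
                (Integer) row.get("id"),
                (String) row.get("first_name"),
                (String) row.get("last_name"));
    }

    /** Converts a whole result into a collection of objects. */
    static List<UserEntity> fromRows(List<Map<String, Object>> rows) {
        List<UserEntity> users = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            users.add(fromRow(row));
        }
        return users;
    }
}
```

Writing and maintaining this kind of code for every table is exactly the work an ORM takes off our hands, at the cost of giving up direct control over the generated SQL.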
Opinions on ORM are sharply divided. Some people say that the idea of ORM is completely wrong, since automatically generated SQL code simply cannot be efficient and we still have to consider the final SQL, so the "blindness" to the relational database is not absolute. Personally, I have a neutral view on ORM. If a language comes with a standardized and functional ORM, I use it. If not, I avoid it.
4. Object-oriented database
Aside from relational databases, as you already know, there are object-oriented databases. These solve the incompatibility between the object-oriented and relational approaches. They provide the same comfort as ORM but internally have no need to convert object data into tables, since it's all stored as objects. Theoretically, there is no performance-related or any other reason why they shouldn't replace today's relational databases. However, they're not used very much at the moment, so we can only hope for that to change. If you're interested in non-relational storage, take a look at the MongoDB project, which is a document database rather than a strictly object-oriented one but likewise stores data without tables.
In the next lesson, Databases in Java JDBC - Printing data and parameters, we'll try out the basics of working with a database in Java using JDBC.