Lesson 4 - Reference and value data types in Java
In the previous lesson, Solved tasks for OOP in Java lesson 3, we created our first regular object in Java, a rolling die. Objects are reference data types, which in certain respects behave differently from primitive data types such as int. It's important to know exactly what is going on inside the program; otherwise, we'd end up with undesired results.
We'll go over primitive types once more before we move on. Generally, primitive types are simple structures, e.g. one number or one character. We use them mostly to get the job done as fast as possible. A program usually contains a lot of them, and each occupies only a small amount of memory, which is why they're often described as "light-weight" structures. They each have a fixed size. Examples of primitive (value) types include int, float, double, char, and boolean.
An application, or more precisely each of its threads, is given memory by the operating system in the form of a stack. It accesses this memory at very high speed, but the application can't control its size; the resources are assigned by the operating system. This small, fast memory is used to store local variables of primitive types, with some exceptions which we'll get into later on. Here's a visual representation of the memory:
The image above shows the memory available to be used by our application.
We've created a variable a of the int data type in the application. Its value is 56 and it was stored directly in the stack. The corresponding code might look something like this:
int a = 56;
You can think of the variable a as having a part of the stack allocated to it, of the size of the int data type (32 bits), where the value 56 is stored.
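The fixed sizes of primitive types can be checked directly. The following is a minimal sketch; the class name PrimitiveSizes is made up for illustration, while the SIZE constants come from the standard wrapper classes:

```java
public class PrimitiveSizes {
    public static void main(String[] args) {
        int a = 56;
        System.out.println("a = " + a);

        // Each primitive type has a fixed size defined by the language,
        // regardless of the value stored in it.
        System.out.println("int:    " + Integer.SIZE + " bits");   // 32
        System.out.println("long:   " + Long.SIZE + " bits");      // 64
        System.out.println("double: " + Double.SIZE + " bits");    // 64
        System.out.println("char:   " + Character.SIZE + " bits"); // 16
    }
}
```

Note that these constants describe the size of the value itself; how the JVM actually lays out its stack is an implementation detail.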
Let's create a new console application and add a simple class that will represent a "user". For clarity, we'll omit comments and won't fine-tune access modifiers; everything will simply be public:
public class User {
    public int age;
    public String name;

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return name;
    }
}
The class has two simple public fields, a constructor and an overridden toString(), so users can be printed simply. Let's create an instance of this class in our main program:
int a = 56;
User u = new User("James Brown", 28);
The variable u is now of a reference data type. Let's see what this situation looks like in memory:
We can see that an object, the value behind a variable of a reference data type, is not stored in the stack, but in a memory area called the heap. The reason is that objects are generally more complicated than primitive data types: they usually contain other fields and occupy more space in memory.
Both the stack and the heap are located in RAM. The difference is in the access and in the size. The heap is almost unlimited, but it's more complicated to access, so it ends up being slower. The stack, on the other hand, is fast but limited in size.
A reference-type variable therefore occupies memory in two places, once in the stack and once in the heap. The stack holds something we call a reference, a link to the place in the heap where the actual object can be found.
Note: In C++, for example, there's a huge difference between pointers and references. Java fortunately doesn't have pointers and only uses the term "reference", whose principles are, paradoxically, closer to those of C++ pointers. The term "reference" as used here means the Java reference and has nothing to do with C++ references.
There are several reasons why things are done this way:
- The stack size is limited.
- When we want to use the same object multiple times, e.g. to pass it as a parameter into several methods, we don't need to copy it. We only have to pass a small primitive type containing the reference to the object instead of copying a whole heavy-weight object.
- Thanks to references, we can easily create structures with a dynamic size, for example array-like structures to which we can add new elements at run-time. These elements reference each other, like a string of objects.
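The last two points can be sketched in a few lines. This is only an illustration under assumptions: the simplified User class, its next field and the rename() and countChain() methods are hypothetical and not part of the lesson's project. It also uses the null value, which is explained later in this lesson, to mark the end of the chain.

```java
public class ReferenceDemo {
    // Simplified, hypothetical version of the User class with a link to another user
    static class User {
        String name;
        User next; // reference to the next user in the chain, or null at the end

        User(String name) {
            this.name = name;
        }
    }

    // The method receives a copy of the reference, not a copy of the object,
    // so the change is visible to the caller.
    static void rename(User user, String newName) {
        user.name = newName;
    }

    // Walks the chain of users by following the references one by one.
    static int countChain(User first) {
        int count = 0;
        for (User current = first; current != null; current = current.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        User james = new User("James Brown");
        rename(james, "James B.");
        System.out.println(james.name); // James B.

        // The chain can grow at run-time, one object at a time.
        User jack = new User("Jack White");
        james.next = jack;
        jack.next = new User("John Doe");
        System.out.println(countChain(james)); // 3
    }
}
```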
Now let's declare two variables of the int type and two variables of the User type:
int a = 56;
int b = 28;
User u = new User("James Brown", 28);
User v = new User("Jack White", 32);
Here's what this would look like in memory:
Now let's assign the b variable to the a variable. We'll also assign the v variable to the u variable. When assigning values, primitive types are simply copied on the stack. With objects, in contrast, only the reference is copied (which is in fact a primitive value too). Assigning references does not create new objects. Now, our code should look something like this:
int a = 56;
int b = 28;
User u = new User("James Brown", 28);
User v = new User("Jack White", 32);
a = b;
u = v;
Memory-wise, it would look like so:
Now, let's verify the reference mechanism so we can confirm that it truly works this way. First, we'll print all 4 variables before and after the re-assignment. We could make a method for the printing, but I haven't shown you how to declare methods in ArenaFight.java (the file containing the main() method) yet, and it's not a common thing to do anyway; for more serious work we use classes. Let's modify the code:
{JAVA_OOP}
public class User {
    public int age;
    public String name;

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return name;
    }
}
{JAVA_MAIN_BLOCK}
// variable declaration
int a = 56;
int b = 28;
User u = new User("James Brown", 28);
User v = new User("Jack White", 32);
System.out.println("a: " + a);
System.out.println("b: " + b);
System.out.println("u: " + u);
System.out.println("v: " + v);
System.out.println();
// assignment
a = b;
u = v;
System.out.println("a: " + a);
System.out.println("b: " + b);
System.out.println("u: " + u);
System.out.println("v: " + v);
{/JAVA_MAIN_BLOCK}
{/JAVA_OOP}
Based on the output alone, we still can't tell the difference between primitive and reference data types:
Console application
a: 56
b: 28
u: James Brown
v: Jack White
a: 28
b: 28
u: Jack White
v: Jack White
However, we do know that while a and b are really two different numbers with the same value, u and v refer to the exact same object. Let's change the name of user v; based on what we know, the change should be reflected in the variable u:
{JAVA_OOP}
public class User {
    public int age;
    public String name;

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return name;
    }
}
{JAVA_MAIN_BLOCK}
// variable declaration
int a = 56;
int b = 28;
User u = new User("James Brown", 28);
User v = new User("Jack White", 32);
System.out.println("a: " + a);
System.out.println("b: " + b);
System.out.println("u: " + u);
System.out.println("v: " + v);
System.out.println();
// assignment
a = b;
u = v;
System.out.println("a: " + a);
System.out.println("b: " + b);
System.out.println("u: " + u);
System.out.println("v: " + v);
System.out.println();
// change
v.name = "John Doe";
System.out.println("u: " + u);
System.out.println("v: " + v);
{/JAVA_MAIN_BLOCK}
{/JAVA_OOP}
We've changed the object in the variable v. Now let's print u and v once more:
Console application
a: 56
b: 28
u: James Brown
v: Jack White
a: 28
b: 28
u: Jack White
v: Jack White
u: John Doe
v: John Doe
The user u changes along with v because both variables point to the same object. If you're asking how to create a true copy of an object, the easiest way is to re-create the object by using the constructor and initializing the new object with the same data. We can also clone objects, but we'll go over that some other time. Let's get back to James Brown:
Now what will happen to him, you ask? He'll be "eaten" by what we call the Garbage collector.
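Before we get to the garbage collector, the difference between sharing a reference and making a true copy can be made concrete. The sketch below is only illustrative: it uses a simplified User class, and the copyOf() helper is a hypothetical name for "re-create the object through the constructor with the same data". The == operator compares references, so it tells us whether two variables point to the same object.

```java
public class CopyDemo {
    // Simplified version of the User class from this lesson
    static class User {
        String name;
        int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Hypothetical helper: builds a new, independent object with the same data.
    static User copyOf(User original) {
        return new User(original.name, original.age);
    }

    public static void main(String[] args) {
        User v = new User("Jack White", 32);
        User u = v;            // copies the reference only
        User copy = copyOf(v); // a separate object in the heap

        System.out.println(u == v);    // true, the same object
        System.out.println(copy == v); // false, a different object

        v.name = "John Doe";
        System.out.println(u.name);    // John Doe
        System.out.println(copy.name); // Jack White
    }
}
```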
Garbage collector and dynamic memory management
We can allocate memory statically in our programs, meaning that we declare how much memory we'll need in the source code. We've done it several times already without any problems: we simply wrote the necessary variables in the source code. Soon, however, we'll make applications where we won't know how much memory we'll need before we run them. Remember the program that computed the average of numbers entered by the user. At run-time, we asked the user how many numbers they were going to enter, so the JVM (see the first lesson for an explanation) had to create an array in memory at run-time. In this case, we dealt with dynamic memory management.
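As a reminder, such a program might look roughly like the sketch below. It is not the original code from that lesson; the class name AverageDemo and the readNumbers() and average() methods are assumptions. To keep the example runnable without typing anything, the Scanner reads from a prepared string that stands in for System.in.

```java
import java.util.Scanner;

public class AverageDemo {
    // Reads the count first, then allocates an array of exactly that size.
    static int[] readNumbers(Scanner scanner) {
        int count = Integer.parseInt(scanner.nextLine().trim());
        int[] numbers = new int[count]; // the size is known only at run-time
        for (int i = 0; i < count; i++) {
            numbers[i] = Integer.parseInt(scanner.nextLine().trim());
        }
        return numbers;
    }

    static double average(int[] numbers) {
        if (numbers.length == 0) {
            return 0; // avoid dividing by zero for empty input
        }
        long sum = 0;
        for (int number : numbers) {
            sum += number;
        }
        return (double) sum / numbers.length;
    }

    public static void main(String[] args) {
        // Stand-in for user input; a real program would use new Scanner(System.in).
        Scanner scanner = new Scanner("3\n10\n20\n30\n");
        int[] numbers = readNumbers(scanner);
        System.out.println("Average: " + average(numbers)); // Average: 20.0
    }
}
```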
In the past, particularly in the era of C, Pascal, and C++, direct memory pointers were used for what we call references in Java. It worked like this: we'd ask the operating system for a piece of memory of a certain size. It would reserve the memory for us and give us its address. We would then create a pointer to this place, through which we worked with the memory. The problem was that no one checked what we put into this memory; the pointer just pointed to the beginning of the reserved block. When we put something larger there, it would simply be stored anyway and overwrite the data beyond the limits of our block, which might belong to another program or even to the operating system (in that case, the OS would probably kill or stop our application). More often, we would overwrite our own program's data in memory and the program would start to behave chaotically. Imagine that you add a user to an array and it ends up changing the color of the user's environment, which has nothing to do with it. You would spend hours checking the code for mistakes, and you would end up finding out that a buffer overflow in the user-creation code spilled into the color values in memory.
The other problem was that when we stopped using an object, we had to free its memory manually, and if we didn't, the memory would remain occupied. If we allocated memory in a method that was called repeatedly and forgot to free it, the application would consume more and more memory (a memory leak) and start to freeze. Eventually, it could bring down the entire operating system. An error like this is very hard to pinpoint. Why does the program stop working after a few hours? Where in thousands of lines of code should we look for the mistake? We have no clue. We can't trace anything, so we'd end up having to look through the entire program line by line or examining the computer memory, which is in binary (cringes). A similar problem occurs when we free memory somewhere and then use the same pointer again, forgetting it has already been freed. It would point to a place where something new might already be stored, and we would corrupt this data. It would lead to uncontrollable behavior in our application and it could even lead to this:
A colleague of mine once said: "The human brain can't even deal with its own memory, so how could we rely on it for program memory management?" Of course, he was right. Except for a small group of geniuses, people became tired of solving constant and needless errors. For the price of a slight performance decrease, managed languages with what we call a garbage collector were developed; these include Java and C#. C++ is still used, of course, but mostly for specific programs, e.g. for parts of operating systems or commercial 3D game engines, where you need to maximize performance. Java is suitable for the vast majority of other applications, mainly due to its automatic memory management.
The garbage collector (GC) is a part of the JVM that runs alongside our application, in a separate thread. It wakes up from time to time and looks in memory for objects to which there is no longer any reference. It removes them and frees the memory. The performance loss is minimal, and it'll significantly reduce the suicide rate of programmers who are trying to debug broken pointers in the evenings. We can even influence how the GC runs from code, although that isn't needed in 99% of cases. Because the language is managed and doesn't work with direct pointers, it isn't possible to corrupt the memory by letting it overflow and so on; the JVM takes care of the memory automatically.
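The behavior described above can be observed, with some caveats, using the standard java.lang.ref.WeakReference class, which lets us watch an object without keeping it alive. This is a hedged sketch, not something the lesson itself uses: the class name GcDemo and the simplified User class are assumptions, and System.gc() is only a request, so the second result is typical rather than guaranteed.

```java
import java.lang.ref.WeakReference;

public class GcDemo {
    // Simplified version of the User class from this lesson
    static class User {
        String name;

        User(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        User user = new User("James Brown");
        // A weak reference does not prevent the GC from removing the object.
        WeakReference<User> watcher = new WeakReference<>(user);

        boolean aliveBefore = watcher.get() != null;
        // The variable user still points to the object, so it can't be collected yet.
        System.out.println(user.name + " reachable: " + aliveBefore);

        user = null;  // drop the only ordinary reference
        System.gc();  // a hint to the JVM; a collection is not guaranteed

        // Usually false after a collection, but the JVM gives no guarantee.
        System.out.println("still in memory: " + (watcher.get() != null));
    }
}
```

In ordinary programs there is no need to call System.gc() at all; it appears here only to make the collection likely enough to observe.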
The null value
The last thing I'll mention here is the null value. Unlike primitive types, reference types can contain a special value, null. In Java, null is a reserved literal which indicates that the reference doesn't point to any data. When we set the variable v to null, we only delete this one reference. If there are still other references to our object, it will continue to exist. If not, the GC will remove the object. Let's change the last lines of our program:
{JAVA_OOP}
public class User {
    public int age;
    public String name;

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public String toString() {
        return name;
    }
}
{JAVA_MAIN_BLOCK}
// variable declaration
int a = 56;
int b = 28;
User u = new User("James Brown", 28);
User v = new User("Jack White", 32);
System.out.println("a: " + a);
System.out.println("b: " + b);
System.out.println("u: " + u);
System.out.println("v: " + v);
System.out.println();
// assignment
a = b;
u = v;
System.out.println("a: " + a);
System.out.println("b: " + b);
System.out.println("u: " + u);
System.out.println("v: " + v);
System.out.println();
// change
v.name = "John Doe";
v = null;
System.out.println("u: " + u);
System.out.println("v: " + v);
{/JAVA_MAIN_BLOCK}
{/JAVA_OOP}
The output:
Console application
a: 56
b: 28
u: James Brown
v: Jack White
a: 28
b: 28
u: Jack White
v: Jack White
u: John Doe
v: null
We can see that the object still exists and the variable u points to it; however, there is no reference in the variable v anymore. Null values are used a lot, both in Java and in databases. We'll get back to reference types in the future. In the next lesson, Solved tasks for OOP in Java lesson 4, we'll program something practical again to gain experience. Spoiler: we're making a warrior object for the arena.
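One practical consequence of null is worth a short sketch: a null reference can be printed and compared, but accessing a field or calling a method through it throws a NullPointerException. The class name NullDemo, the simplified User class and the describe() helper are hypothetical and serve only as an illustration.

```java
public class NullDemo {
    // Simplified version of the User class from this lesson
    static class User {
        String name;

        User(String name) {
            this.name = name;
        }
    }

    // Hypothetical helper: checks for null before touching the object.
    static String describe(User user) {
        if (user == null) {
            return "(no user)";
        }
        return user.name;
    }

    public static void main(String[] args) {
        User v = new User("John Doe");
        User u = v;
        v = null; // only this one reference is removed

        System.out.println("v: " + v);   // v: null
        System.out.println(describe(u)); // John Doe
        System.out.println(describe(v)); // (no user)

        try {
            System.out.println(v.name); // there is no object behind v
        } catch (NullPointerException e) {
            System.out.println("v.name failed: v is null");
        }
    }
}
```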