Lesson 1 - Introduction to the Java language
Lesson highlights
Are you looking for a quick reference on the Java environment instead of a full lesson? Here it is:
Java programs are instructions for the Java Virtual Machine (JVM), which has to be installed as part of the JRE (Java Runtime Environment) in order to run Java apps.
The advantages are:
- Portability
- Revealing errors in source code
- Stability
- Simple development
- Speed
- Low vulnerability
The Java platform includes the Java Standard Edition (Java SE), the Java Enterprise Edition (Java EE) for large web applications, the Java Virtual Machine (JVM), and a complete set of easy-to-use libraries.
Would you like to learn more? A complete lesson on this topic follows.
Welcome to the first lesson of the Java course. We'll go through it step by step, from the very beginning to more complex structures, object models, and databases. With a little patience and persistence, you will become a good programmer.
To fully understand the Java language, we'll have to look to the past and get a good understanding of how programming languages have evolved over the course of time. Doing so will enable us to understand how Java works, and why it is deemed an all-around good programming language to work with.
Evolution of programming languages
1st generation languages - Machine code
Computer processors can perform only a limited number of simple instructions, which are stored as sequences of bits, i.e. numbers. These instructions are usually written in the hexadecimal system to make reading them less of a chore. However, the instructions are so limited that about all you can do is load and store values at memory addresses, add them up, and jump between instructions. Even adding two numbers is not a single step: we load the first number from its address in memory, add the number stored at a second address, and store the result at a third, which takes several instructions. Here's what adding two numbers would look like in hex:
2104 1105 3106 7001 0053 FFFE 0000
The instructions are given to the processor in binary. This sort of code is extremely hard to read and depends on the instruction set of the given CPU. I assure you, programming in this "language" is extremely tedious. Nevertheless, every program must ultimately be translated into this binary form so that a processor can execute it.
2nd generation languages - Assembler
Assembler (ASM for short) is not much simpler than machine code, but at least it's human-readable! Here, the instructions have human-readable text codes, so that people don't have to memorize every single number combination. The instruction codes are later translated (assembled) into binary code. Adding two numbers in ASM would go something like this:
ORG 100
LDA A
ADD B
STA C
HLT
DEC 83
DEC -2
DEC 0
END
It's a bit more human-readable, but most people, including me, would still have no clue how this program works.
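To see how the hex listing and the ASM listing relate, here is a minimal Java sketch that executes the seven hex words. The encoding is an assumption read off by lining up the two listings (first hex digit 2 = LDA, 1 = ADD, 3 = STA, the word 7001 = HLT, remaining three digits = a memory address, program loaded at hex address 100); the original doesn't name the processor, and the class name TinyMachine and its run method are made up for this illustration.

```java
public class TinyMachine {
    // Executes 16-bit words using the encoding assumed above:
    // top hex digit = operation, lower three hex digits = memory address.
    static int run(int[] program, int origin, int resultAddress) {
        int[] memory = new int[4096];
        System.arraycopy(program, 0, memory, origin, program.length);

        int pc = origin;   // address of the next instruction
        int acc = 0;       // accumulator register

        while (true) {
            int word = memory[pc++];
            if (word == 0x7001) {
                break; // HLT: stop the program
            }
            int opcode = word >> 12;
            int address = word & 0x0FFF;
            switch (opcode) {
                case 0x2: // LDA: load a value from memory into the accumulator
                    acc = memory[address];
                    break;
                case 0x1: // ADD: add a value from memory, wrapping at 16 bits
                    acc = (acc + memory[address]) & 0xFFFF;
                    break;
                case 0x3: // STA: store the accumulator into memory
                    memory[address] = acc;
                    break;
                default:
                    throw new IllegalStateException("Unknown instruction: " + Integer.toHexString(word));
            }
        }
        // Interpret the 16-bit result as a signed number.
        return (short) memory[resultAddress];
    }

    public static void main(String[] args) {
        int[] program = {0x2104, 0x1105, 0x3106, 0x7001, 0x0053, 0xFFFE, 0x0000};
        System.out.println(run(program, 0x100, 0x106)); // prints 81
    }
}
```

Under these assumptions the words 0053 and FFFE are the numbers 83 and -2, and the result 81 ends up in the last word at address 106.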
3rd generation languages
Third generation languages finally provide a good amount of abstraction from how the computer sees the program. Rather than forcing us to adapt to the computer's arcane way of thinking, these languages focus more on how we see the program. Numbers are treated as variables, and the code looks almost like mathematical notation.
Adding up two numbers in the C language would go like this:
int main(void)
{
    int a, b, c;
    a = 83;
    b = -2;
    c = a + b;
    return 0;
}
Pretty much anyone could guess what this program does just by looking at it. It sums 83 and -2 and stores the result in a variable named c. The main advantage third generation languages had over all of the previous languages was high readability.
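For comparison, the same addition could be written in Java roughly as follows. This is only a sketch to show that the readability carries over; the class name Addition and the helper method add are chosen for this example and don't come from the lesson.

```java
public class Addition {
    // Adds two whole numbers; the names mirror the C example.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int a = 83;
        int b = -2;
        int c = add(a, b);
        System.out.println(c); // prints 81
    }
}
```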
As time went on and the demand for code optimization grew, object-oriented programming came into play; we will get to it later. Third generation languages are essentially divided into the following categories:
Compiled languages
Compiled (aka unmanaged) languages have their source code written in a form that people can fully understand. The source code must still be translated into machine code so that it can be executed by the processor. This translation is done by a compiler, which compiles the entire program into machine code.
Compiled languages have the following advantages:
- Speed - The only slowdown is the one-time compilation. Once a program is compiled, it runs as quickly as a program written in ASM, or even more quickly thanks to compiler optimizations.
- Inaccessibility of source code - The program is distributed in compiled form, which makes modifying it very difficult if you don't have the source code.
- Easy to detect errors in source code - If there is an error in the source code, the compilation fails and the programmer gets to see where the mistake is. This greatly simplifies software development.
There are, as you may have guessed, disadvantages as well:
- Platform dependency - The compiled program depends on the platform, i.e. on the processor and the operating system. We cannot take a compiled program and run it on another platform without recompiling it and tweaking it a bit.
- Inability to edit - Once the program is compiled into machine code, the only way to change it is to edit the source code and recompile. That also applies to the languages mentioned above.
- Memory management - Because computers execute instructions mechanically, you may occasionally run into memory errors such as overflows. Compiled languages don't have automatic memory management, so they're more of a hassle. Run-time errors are caused mainly by mistakes in manual memory management, which the compiler cannot detect.
Examples of compiled languages include the C language, its object-oriented successor C++, and Pascal/Delphi.
Interpreted languages
Interpreted languages attempt to solve the portability problem and make programmers' lives a bit easier. Interpreters work much like compilers do, but instead of translating the entire program at once, they translate only what is needed at a given moment. The name comes from the profession of interpreting: an interpreter is someone who serves as a "middle man" between people who do not speak the same language, translating what each person says while they speak. Interpreted languages work in much the same way. The source code is read line by line, translated into machine instructions, executed, and then the translation is thrown away.
Because the same code has to be translated again every time it runs, interpretation wastes processor power and is not the fastest way to get things done.
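The line-by-line approach can be sketched with a toy interpreter written in Java. The mini-language here (lines of the form "name = value" or "name = left + right") is invented purely for this illustration, as are the class name LineInterpreter and its methods; real interpreters are far more elaborate.

```java
import java.util.HashMap;
import java.util.Map;

public class LineInterpreter {
    // Reads the program one line at a time: each line is parsed,
    // executed immediately, and then forgotten.
    static Map<String, Integer> run(String[] lines) {
        Map<String, Integer> variables = new HashMap<>();
        for (String line : lines) {
            String[] sides = line.split("=");
            String target = sides[0].trim();
            int sum = 0;
            for (String term : sides[1].split("\\+")) {
                sum += valueOf(term.trim(), variables);
            }
            variables.put(target, sum);
        }
        return variables;
    }

    // A term is either a known variable or a whole number.
    static int valueOf(String term, Map<String, Integer> variables) {
        if (variables.containsKey(term)) {
            return variables.get(term);
        }
        // Throws NumberFormatException if the term is neither.
        return Integer.parseInt(term);
    }

    public static void main(String[] args) {
        String[] program = {"a = 83", "b = -2", "c = a + b"};
        System.out.println(run(program).get("c")); // prints 81
    }
}
```

Note that a faulty line, such as one using an unknown variable, is only reported once execution actually reaches it; every line before it has already run by then.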
What advantages does interpretation have, then?
- Portability: The program is fully portable. If the platform has an interpreter, our program will be able to run on said platform (developing an interpreter is much simpler than developing a compiler).
- Simpler development - We no longer have to deal with manual memory management. All of that is done for us by what is known as the garbage collector (we'll get into that and more in the advanced courses). In some cases, we don't even have to specify data types, which usually leads to more comfortable data structures.
- Stability - Because the interpreter actually understands the code, it spots errors that a compiled program would execute blindly. Running a program under an interpreter is therefore safer than running compiled machine code. Interpreted languages also bring reflection into play, where the program examines itself at run time (more on this later in the courses).
- Easy editing - We can write programs in parts, and upload them to the target destination whenever we want because the code doesn't need to be compiled. In other words, it can easily be edited on the fly.
Interpreters have three major disadvantages:
- Speed - Interpretation can be very slow at times, and the program wouldn't use your computer to its full capacity.
- Difficulties in finding errors - Because the code is translated at run time, errors don't show up until the faulty code is actually executed, which can be very annoying.
- Vulnerability - Since the program is distributed as source code, anyone and everyone can alter or even steal parts of it.
PHP is an interpreted language. Many websites are written in this relatively easy language because it gets the job done. Facebook uses a custom version of PHP; if you're interested, look up the "HipHop for PHP" project.
Languages with a virtual machine
Hmm, now what if we took the best of both approaches and left out most of the disadvantages? Thus, the virtual machine was born! Languages with a virtual machine are the most advanced kind of programming language; they are currently the most widespread and the best choice for developing most applications. Java and C# belong to this category.
First and foremost, the source code is translated into what we call "bytecode", sometimes referred to as intermediate code. It resembles machine code in that it's binary, but it has a considerably simpler instruction set and directly supports object-oriented programming. Thanks to this simplicity, the intermediate code can be interpreted relatively quickly by the virtual machine, i.e. the intermediate code interpreter. In Java, this is the JVM (Java Virtual Machine), which translates the bytecode into instructions for our processor.
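To make the idea of bytecode more tangible, here is a small Java sketch that opens its own compiled form and checks how the file begins. Compiled Java class files start with the four bytes CA FE BA BE. The class name BytecodePeek and the method isClassFileHeader are made up for this example, and the sketch assumes that the program was compiled with javac and that its class file can be found next to the running class.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BytecodePeek {
    // The first four bytes of every compiled Java class file.
    static final int CLASS_FILE_MAGIC = 0xCAFEBABE;

    static boolean isClassFileHeader(int firstFourBytes) {
        return firstFourBytes == CLASS_FILE_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        // Open the compiled bytecode of this very class.
        try (InputStream in = BytecodePeek.class.getResourceAsStream("BytecodePeek.class")) {
            if (in == null) {
                System.out.println("Class file not found as a resource.");
                return;
            }
            int header = new DataInputStream(in).readInt();
            // For a regular class file this prints "cafebabe" and "true".
            System.out.println(Integer.toHexString(header));
            System.out.println(isClassFileHeader(header));
        }
    }
}
```

Compiling with "javac BytecodePeek.java" produces the file BytecodePeek.class containing the bytecode, and "java BytecodePeek" then hands that file to the virtual machine.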
By using a virtual machine, we eliminate most of the interpreter's and the compiler's disadvantages while keeping most of their advantages:
- Revealing errors in source code - Bytecode compilation easily uncovers bugs in the source code.
- Stability - Because the virtual machine understands the code, it keeps us from performing dangerous operations and alerts us with error messages. Reflection is still available if needed.
- Simple development - We have hi-tech data structures and libraries available. Memory management is done for us by the Garbage Collector.
- Good amount of speed - The speed of a virtual machine is somewhere between that of an interpreter and that of a compiled program. The virtual machine is able to cache translated code instead of throwing it away like interpreters usually do. It can also optimize the code when it notices recurring calculations, which ends up speeding the program up. On the other hand, the program is a bit slower than a compiled one because the machine has to translate common libraries at run time.
- Low vulnerability - The application is distributed as bytecode, which isn't human readable.
- Portability - The final program will run on any hardware that has a proper virtual machine installed.
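Two of the points above, stability and reflection, can be tried out with a short Java sketch. The class name RuntimeChecks and both method names are invented for this illustration; the sketch only shows the general behavior and is not a complete picture of what the virtual machine checks.

```java
import java.lang.reflect.Method;

public class RuntimeChecks {
    // Reads past the end of an array. The virtual machine stops the access
    // and reports it as an exception instead of reading unrelated memory.
    static String readOutOfBounds() {
        int[] numbers = {83, -2};
        try {
            return String.valueOf(numbers[2]);
        } catch (ArrayIndexOutOfBoundsException e) {
            return "caught: " + e.getClass().getSimpleName();
        }
    }

    // Reflection: the program inspects one of its own classes at run time.
    static boolean hasMethod(Class<?> type, String name) {
        for (Method method : type.getDeclaredMethods()) {
            if (method.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(readOutOfBounds()); // prints caught: ArrayIndexOutOfBoundsException
        System.out.println(hasMethod(RuntimeChecks.class, "readOutOfBounds")); // prints true
    }
}
```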
Java and JDK
Java is distributed in 3 editions:
- Java SE - The Standard Edition we're going to use for the start
- Java EE - The Enterprise Edition isn't actually another Java, but a set of libraries for Java SE which enables us to develop large web applications. It's quite complicated, but very popular in companies. If you learn it, you'll be a programmer in very high demand.
- Java ME - The Micro Edition runs on SIM cards, washing machines and other electronic devices (Oracle claims that Java powers 3 billion devices)
To run your applications, we'll need the JRE (Java Runtime Environment), which is a package containing the Java Virtual Machine. To develop in Java, we'll need the JDK (Java Development Kit), which contains libraries and tools for developers.
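Once Java is installed, a program can ask the virtual machine which runtime it is running on. The following is a minimal sketch using standard system properties; the class name RuntimeInfo and the method describeRuntime are made up for this example, and the printed values depend on your installation.

```java
public class RuntimeInfo {
    // Builds a short description from standard system properties.
    static String describeRuntime() {
        return "Java " + System.getProperty("java.version")
                + " on " + System.getProperty("java.vm.name")
                + " (" + System.getProperty("os.name") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describeRuntime());
        // The directory where the Java runtime is installed.
        System.out.println(System.getProperty("java.home"));
    }
}
```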
Another advantage of Java is that it's free of charge, so everyone can use it. Java applications can also be launched from the browser with Java Web Start, which keeps them up to date automatically. Note that Java Web Start was removed from Oracle's Java releases starting with Java 11.
Languages with a VM are designed for object-oriented programming and are the most modern way to develop software. There are also languages of the 4th and 5th generation, but they are very specific and we won't cover them today.
Now we know what we're going to work with. In the next lesson, NetBeans IDE and your first console application, we'll work with the NetBeans IDE (Integrated Development Environment) to create our first program.