If you’re reading this, you have probably written a few programs yourself, maybe in languages in loosely typed languages like Ruby and PHP, or strictly typed languages like Java or C, C++. But just what is the difference between them?
When you are getting started in programming, you may find typing difficult. For a beginner this piece of code makes perfect sense:
int i = 1; printf("The number is: " + i);
However, if you compile this in C, you’ll get a warning:
warning: adding ‘int’ to a string does not append to the string
Despite the warning, it will actually compile, but the result will be quite unexpected. Due to how C works, both sides of the addition will be interpreted as numbers and used as a memory address.
Running a similar code in Ruby will yield us a clear error:
i = 1; //Error: no implicit conversion of Fixnum into String (TypeError) puts "The number is: " + i;
On the other hand, we can do things like this in PHP:
$i = 1; //Dot is the string concat operator in PHP print("The number is: " . $i);
What’s going on here? Let me explain.
Almost all modern programming languages have a concept of data types. You can have integers (whole numbers), float (decimal numbers with a floating point), strings (text types), etc, so the language knows how to deal with them. But why?
To understand that we have to go back to the high school IT. Remember, the computer stores everything in binary numbers. Zeroes and ones. These individual binary numbers are called bits. 0 or 1 is one bit, 00, 01, 10 and 11 are two bits, etc.
Now, these bits aren’t just floating around inside the computer. Modern computers organize data into blocks of 8 bits, called a byte. (It used to be 7 for a time.)
That raises the question: how much data can you store on one byte? For one bit, you can have 0 or 1. Two possible values. How about 2 bits? 00, 01, 10 and 11. Four values. If you continue the math, you’ll end up discovering that one byte (8 bits) can hold 256 different values, or 28.
Our modern CPUs are 32 and 64 bits, which means that they can process up to 4 and 8 bytes of data in one instruction. Everything above that has to be split into multiple instructions.
So far so good? Great. However, these are just numbers. To store text, we need to encode it as numbers. There are many different forms of encoding text as numeric values, but the oldest form is ASCII. The conversion table contains the letters of the US alphabet, numbers, control characters, etc. For example, the letter A is encoded as 65, B is 66, etc. The number 0 is 48, 1 is 49, etc.
So if we encode the number 123456789 as a string, we end up with 9 bytes of data. 9 bytes or 72 bits. Remember, our CPU can’t deal with that, so we end up with very inefficient code. A more useful representation would be not encoding the numbers into a string at all, but convert it to binary directly. 123456789 encoded as binary only takes up 26 bits so that it can be dealt with in one instruction. Not to mention the fact that decoding the data from a string representation, then performing an arithmetic instruction and then re-encoding it is a LOT of instructions.
You can hopefully see where this is leading: our high-level programming language needs to know what the data type is to run at a reasonable speed.
Making sense of data
Let’s set the performance issue aside for just a second and imagine we’re storing everything in strings. What would this piece of code mean:
"123" + "456"
Intuitively you’d say the programming language should just add the two numbers, which would yield
579. But what if
we wanted to concatenate it to give us
123456? We’d need a different operator. We’d need to tell the compiler that
we want these two pieces treated as strings rather than numbers.
In other words, even if we store everything as strings, we can’t escape the fact that the compiler needs to know how the data should be treated.
Loose vs. strict typing
Now that we have established that we need types let’s take a look at how languages treat types. Taking our number example again, what happens if we add two a string and a number?
"123" + 456
Some languages will give you an error outright because they’ll tell you that these are incompatible types and cannot be added together. These languages are called strictly or strongly typed. They don’t have internal type juggling rules so you need to state explicitly what data type you are using.
//yields int(579) var_dump("123a" + 456);
While I won’t say that weakly typed languages are bad, you have to keep the type juggling in mind when working with them. They are easier for a beginner to get into, but most weakly typed languages hold various pitfalls when it comes to type juggling. In the example above, I would expect PHP to throw an error as that particular string should not be converted to an integer.
Dynamic vs. static typing
All right, so we have a typing system for our code, but when do we get the error about using incompatible types?
There are two approaches to this. Languages like Python, Ruby, and PHP (when you switch it to strict typing) give you an error when you run the code, and the actual invalid operation is attempted. We call these languages dynamic, or dynamically typed languages. These languages usually also allow you to change the type of a variable mid-run.
Other languages like Java or C/C++, on the other hand, give you an error when you compile the code. All execution paths are analyzed, and invalid type assignments are complained about immediately. These are called statically typed languages.
Both have their strengths and weaknesses. Dynamic languages make it easier to write and read programs because you don’t have to litter your code with type declarations. Also, usually these languages are script languages, so there is no compiling involved in coding, which makes the whole process faster.
Please note: dynamic typing does NOT mean that the language is weakly typed! You can have a strongly and dynamically typed language!
Statically typed languages, on the other hand, can catch many bugs at compile time. This is something you can only achieve by using code analyzers in the case of dynamic languages (if it is possible at all).
Many people think that languages like Python or Ruby are actually weakly typed, which is not true. The confusion stems from the concept of duck typing. That, in turn, comes from the old saying:
If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
In other words, the type check is applied based on behavior rather than the type itself. So this would work perfectly:
class Duck def quack puts "Quaaaak" end end class Person def quack puts "I'm a duck!" end end def makeTheDuckQuack(duck) duck.quack end makeTheDuckQuack Person.new
Ruby will behave nicely as long as the object passed to it has a
quack method. While this behavior makes it easy to
write code, it makes detecting potential bugs with a static code analyzer harder because you don’t have a
declaration of types anywhere, they are evaluated at runtime.
Which is better?
Remember, the purpose of a programming language is to make it easier for us to describe our wishes to the computer. The only valid measurement of how good a programming language is fitness for purpose. If you are writing a forum software, you probably don’t care that much about type. For a finance application, on the other hand, data types are essential because you don’t want to do financial calculations with floating point numbers. But that’s a topic for another day.
You can, of course, write tests to work around the missing code analysis, but at some point, it may become too tedious. For PHP specifically, I would recommend enabling strict typing because the type juggling modes make no sense at all and lead bugs through silent errors. ∎