PHP is not Java

2011-07-16

The fact that PHP is not Java is self-evident and does not need much emphasizing. Nevertheless, developers who come from a Java/OOP background sometimes treat PHP as if it were Java with dollar signs. For example, one finds not just generic design patterns such as singletons, factories, decorators, etc. in contemporary PHP applications, but also patterns directly taken from the JEE world, such as transfer objects, domain stores, filters, session facades, dispatchers. Even typical Java architectures involving separate business, service, persistence, presentation and integration tiers has been brought to the world of PHP application development. This trend began when PHP became a fully fledged OOP language with version 5. PHP developers seem to have looked to the Java world for inspiration ever since. Possibly it’s a new generation of graduates that was taught to think in Java, or maybe it’s just the overwhelming influence of the mainstream OOP body of thought. I can’t say for sure.

I like Java, but I don’t like PHP code that is written as if it were Java. PHP is great because it makes easy things easy. I believe that this simplicity is an asset, not a flaw. PHP provides a straightforward approach to web development, and I dare say that its success is founded on this principle. PHP gives you productivity, ease-of-use, a high level of abstraction, powerful well-tested libraries and backward compatibility. It’s a pragmatic language, not necessarily a beautiful one. PHP’s share-nothing approach may limit its use in specific areas such as concurrency and parallel processing, but it is ideally suited to the request life cycle model of the web, and it frees the programmer from a lot of potential headaches that come with concurrency. This share-nothing approach also implies that scripts are running in a virtual bubble. A malfunctioning or malicious script doesn’t bring down the entire virtual machine like on the Java platform. This property makes it perfect for shared hosting, and consequently, PHP hosting is ubiquitous and cheap.

Now, let’s look at some Java-esque PHP code:

abstract class Observable {

  private $observers = array();

  public function addObserver(Observer &amp; $observer) {
        array_push($this->observers, $observer);
  }

  public function notifyObservers() {
        for ($i = 0; $i < count($this->observers); $i++) {
                $widget = $this->observers[$i];
                $widget->update($this);
        }
    }
}

class DataSource extends Observable {

  private $names;
  private $prices;
  private $years;

  function __construct() {
        $this->names = array();
        $this->prices = array();
        $this->years = array();
  }

  public function addRecord($name, $price, $year) {
        array_push($this->names, $name);
        array_push($this->prices, $price);
        array_push($this->years, $year);
        $this->notifyObservers();
  }

  public function getData() {
        return array($this->names, $this->prices, $this->years);
  }
}

This is an attempt at the observer pattern taken straight from a popular PHP5 textbook. It is obvious that the implementation doesn’t do anything useful. Since the code is meant for illustration, this is forgiveable. The principal flaw is that in PHP, the observer pattern is for the birds. There are two reasons. First, the observer pattern is fundamentally a Java crutch that compensates for the lack of function arguments in Java. If you find yourself writing an observer in PHP, it might have escaped you that PHP supports function parameters and that you can implement the same functionality more concisely with a simple callback function. The second reason goes somewhat deeper into software design considerations. The observer pattern is typically used in an event-driven design, for example for maintaining the state of GUI widgets or in an event dispatcher for container objects. Such designs are rarely ever useful in PHP, because all PHP objects live in the request scope by default. This means that after the request cycle is completed, the objects -including infrastructure objects such as observers, are deallocated.

The fact that PHP objects don’t survive requests has another important implication: the amount of objects in an application is proportional to CPU consumption, because all objects have to be rebuilt at every request, and unlike in Java, there is no class loader that takes care of these things for you. Therefore, incorporating plenty of Java-esque OOP designs into PHP applications has an adverse impact on performance and scalability. Given that the performance of interpreted PHP is already a magnitude below that of JVM bytecode, you can quickly run into scalability issues with an application that serves a large user base. Putting objects into session scope is no solution, because PHP sessions are serialised/deserialised upon each request. The performance penalty of this operation is likely to be worse than explicit object construction.

The solution to the PHP scalability problem lies in using sensible caching, which is of course what every heavy traffic application does. Products such as memcache and APC provide in-memory data stores for objects. Caching, however, bumps up the complexity of persistence logic in your application considerably. It’s not a plug-and-play solution and it doesn’t provide access to shared data. On a personal note, if I was at the point where I had to consider PHP caching, I would probably already regret not having used a JVM based language in the first place. But let’s get back to the topic. Java developers are used to manipulate data with special purpose data structures called collections. PHP developers, on the other hand, generally use only one bread-and-butter data structure: the associative array. This data structure is an ordered map with either integer or string keys. It’s values can be of any type, and of course, arrays can contain mixed value types. Associative arrays can be used to mimic any collection type, such as lists, maps, queues, stacks, etc., even recursive structures such as trees, because array values can be arrays themselves. So there is no need for special purpose collections and this obviously makes things really simple. As a Java programmer, one should probably resist the temptation to define special purpose collections on top of associative arrays. Why? - It’s not the PHP way.

This statement may require some explanation. You might ask: what keeps you from inserting an element at the head or in the middle of a queue if you use associative arrays? Well, technically nothing. The issue is not about what can be done, but what should be done. It’s has to do with the PHP philosophy. PHP is NOT about enforcing programming by contract. It is a dynamic language, that treats data very loosely. This approach has its benefits and its dangers. If you don’t like it, it may be better to use a statically typed language. What I am saying here is, that it is generally wiser to leverage the strengths of a programming language to the greatest degree possible, rather than trying to compensate for its weaknesses. Dynamic languages have much to offer, but forcing a programming by contract approach onto PHP is painful. Consider the following example:

function recordVote($ballotId, $vote, DateTime $timestamp) {
  if (!is_int($ballotId))
    throw new InvalidArgumentException(
      "ballotId must be an integer");
  if (!is_string($vote))
    throw new InvalidArgumentException(
      "vote must be a string");
  ...
  return $something;
}

This method is doing type checking the hard way. Version 5 of the language introduced a partial type checking syntax called type hinting. Unfortunately, this works only for objects, such as the $timestamp parameter in the above code snippet. The introduction of PHP type hinting did not only dissolve the dynamic paradigm of the PHP language, but it’s implementation is also pitifully inconsistent, because it does not provide for the checking of scalar types, such as int or string. In practice, it often the wrong scalar arguments that cause bugs which are difficult to detect. Wrong object arguments are detected easily in 90% of all cases when references to non-existing methods or properties are invoked. The recordVote() method tries to compensate for the former lack by explicitly checking the arguments using is_int() and is_string(). This produces explicit error messages at run time, but the approach is painful and entails some serious trade-offs.

Most importantly, the caller is now forced to pass specific types which require type casts in all places where the method is called. These casts are easily forgotten and therefore can cause bugs themselves. Did I mention they are also cumbersome to insert? Secondly, the method loses the capacity of implicit overloading, which means that its reusability is seriously curtailed. For example, if I wanted the method to accept timestamps in other formats, I would have to either add additional methods with different names and signatures, or create an adapter. Adding different methods for the same task results in code duplication. Creating an adapter makes the code harder to understand. Both solutions increase verbosity and reduce clarity. A more elegant and more PHP-like solution to this problem would be to replace syntactic checks by semantic checks, make the method signature more dynamic, which gives caller a richer yet still exact API with mixed type parameters:

function recordVote($ballotId, $vote, $timestamp) {
  if (!is_numeric($ballotId))
    throw new InvalidArgumentException(
      "ballotId must be numeric");
  $ballotId = (int) $ballotId;
  if (empty($vote))
    throw new InvalidArgumentException(
      "vote must not be empty");
  $vote = (string) $vote;
  if ($timestamp instanceof DateTime)
    $timestamp = doSomething($timestamp);
  else if (is_string($timestamp))
    $timestamp = doSomethingElse($timestamp);
  else throw new InvalidArgumentException(
    "timestamp not valid");
  ...
  return $something;
}

PHP best practices