Object cloning in PHP

2012-01-21


In any complex object-oriented PHP program, there are situations that require copies of objects. Objects are often designed to be mutable, which means they contain state information that can change. Consider a bank account object, for example, which contains state information about balance, credit limit, and the account holder. Let’s assume that there are withdraw() and deposit() methods that change the state of this object. By contrast, an immutable design would require that withdraw() and deposit() return new account objects with updated balance information. This may sound like an irrelevant distinction, but the implications are actually far-reaching, because mutable objects tend to increase complexity in subtle ways. Copying objects is a good example.

$object1 = new Account();
$object2 = $object1;

Assigning an object instance to a new variable, as above, creates only a new reference; the object’s state information is shared by both reference variables. Sometimes, this is all a program needs. If withdraw() is called on $object2, both $object1->getBalance() and $object2->getBalance() return the same value. On other occasions, this behaviour is not desirable. For instance, consider displaying the results of a withdrawal operation on an ATM before the transaction is executed. In this case, we can make a copy of the account object, execute the withdrawal operation on the copy, and display the new balance or an overdraft message to the user without affecting the actual account. For this we need a copy of the object rather than a copy of the reference. PHP provides an intrinsic operation using the keyword clone:

$account = new Account();
$clonedAccount = clone $account;

The $clonedAccount variable contains a copy of the original object. We can now invoke $clonedAccount->withdraw() to display the results and, with a bit of luck, the original $account object remains unaffected. With a bit of luck? Yes, unfortunately things aren’t quite that straightforward. The clone operation creates a so-called shallow copy of the original instance, which means that it constructs a new object with all fields duplicated. Any field that contains a built-in value type, such as an integer, string, float, or array, is copied by value. If the balance is of type float, for example, we should be fine. If the balance field happens to be an object, however, we have another problem, because the clone operation does not copy composite objects but only the references to them. If the account class uses a balance object, a call to $clonedAccount->withdraw() would still affect the state of the original $account object, which is clearly not the desired behaviour.
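The shallow-copy pitfall can be reproduced with a few lines. This is only a sketch: the Balance class, the public fields, and the amounts are assumptions made for illustration and are not part of the account class discussed in this article.

```php
<?php
// Hypothetical classes for illustration only.
class Balance
{
    public $amount;

    public function __construct($amount)
    {
        $this->amount = $amount;
    }
}

class Account
{
    public $limit = 500.0;  // scalar field: duplicated by clone
    public $balance;        // object field: only the reference is duplicated

    public function __construct($amount)
    {
        $this->balance = new Balance($amount);
    }

    public function withdraw($amount)
    {
        $this->balance->amount -= $amount;
    }
}

$account = new Account(100.0);
$clonedAccount = clone $account;

$clonedAccount->limit = 0.0;     // affects the clone only
$clonedAccount->withdraw(30.0);  // changes the Balance object shared by both

var_dump($account->limit);                               // float(500)
var_dump($account->balance->amount);                     // float(70)
var_dump($account->balance === $clonedAccount->balance); // bool(true)
```

The scalar field of the original is untouched, while the withdrawal on the clone shows up in the original account because both objects still point at the same Balance instance.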

This can be remedied by adding a magic method named __clone() to the class. PHP calls __clone() on the newly created copy once the shallow copy has been made, so the method defines what else happens when the object is cloned:

class Account {
    protected $balance;

    function __clone() {
        // $this is the new copy; give it its own balance object
        $this->balance = clone $this->balance;
    }
}

The somewhat odd-looking syntax of the __clone() method above instructs PHP to make a copy of the balance object that the field $balance refers to when the object is cloned. Thus not only the account object itself is copied, but also the balance object that it contains. While this should be okay for our stated purposes, note that this only copies the balance object, and not any other composite objects that the account object might contain. It is not difficult to generalise the code, however. The following even odder-looking syntax makes copies of all composite objects of the account object. It does so by iterating over all fields of the current instance referred to by $this, where $key takes the names of the fields and $value their values:

class Account {
    function __clone()
    {
        foreach ($this as $key => $value) {
            if (is_object($value)) {
                $this->$key = clone $this->$key;
            }
        }
    }
}

The is_object() test in the above code is necessary because clone can only be applied to objects. Without it, the loop would try to clone fields that hold no composite object, i.e. fields whose value is null or a scalar, which would result in an error. Yet, this code still has a minor flaw. What if our object contains array fields whose values are objects? While the array itself would be copied, its elements still contain references and thus would point to the same objects as the array elements in the original object. This flaw can be eliminated by adding a few more lines of code that make explicit copies of the array elements:

function __clone()
{
    foreach ($this as $key => $value) {
        if (is_object($value)) {
            $this->$key = clone $this->$key;
        }
        else if (is_array($value)) {
            $newArray = array();
            foreach ($value as $arrayKey => $arrayValue) {
                $newArray[$arrayKey] = is_object($arrayValue)
                    ? clone $arrayValue
                    : $arrayValue;
            }
            $this->$key = $newArray;
        }
    }
}

This already looks fairly complicated, but unfortunately it is not the end of our troubles. We also have to consider the hierarchical structure of composite objects, which means that the objects in object fields may contain object fields themselves, which may in turn contain objects with other object fields. Thus, creating a fully independent clone requires recursive copying of the object structure, otherwise known as making a “deep” copy. The above method already gives us a way of implicit recursion if all of our objects implement it: the __clone() method of object A is implicitly invoked by the __clone() method of object B when B, which contains A, is cloned. We could use a base class for all of our objects to provide deep copying functionality. Although this works only with our own objects, and not with objects from third-party libraries, it would provide a comprehensive method for object copying. Unfortunately, the recursive approach still contains a flaw. Consider the following object structure:

class Employee {
    /** @var string employee name */
    public $name = null;
    /** @var Employee employee's superior */
    public $superior = null;
    /** @var array of Employee, subordinates */
    public $subordinates = null;
}

This is an example of a class that represents a hierarchical graph of instances in memory. The Employee class defines a tree structure with the variable $superior containing a reference to the parent node and the variable $subordinates containing references to the child nodes. Because of this double linking, the graph contains cycles, and because of these cycles, the above clone method will run into infinite recursion and eventually crash the script. Cycles are fairly common in object graphs, though they are not necessarily as obvious as in the above example. In order to prevent the clone method from running into a cycle death trap, we need to add a cycle detection algorithm, for example by keeping track of the objects that have already been copied. How exactly this is implemented is beyond the scope of this article. Let’s just say it’s not that trivial.
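Without attempting a complete solution, the idea of remembering already-copied objects can be sketched roughly as follows. The deepCopy() helper is hypothetical and not a PHP built-in. The sketch assumes PHP 7.1 or later, classes whose fields are all public, and classes that do not define their own __clone() method. The Employee class is re-declared with a constructor, and the names are made up for the example.

```php
<?php
// Hypothetical helper: copies an object graph and survives cycles by
// remembering which originals have already been copied.
function deepCopy($value, ?SplObjectStorage $copies = null)
{
    if ($copies === null) {
        $copies = new SplObjectStorage(); // maps original => copy
    }
    if (is_array($value)) {
        $result = array();
        foreach ($value as $key => $item) {
            $result[$key] = deepCopy($item, $copies);
        }
        return $result;
    }
    if (!is_object($value)) {
        return $value; // scalars and null need no special treatment
    }
    if (isset($copies[$value])) {
        return $copies[$value]; // seen before: reuse the copy, ending the cycle
    }
    $copy = clone $value;     // shallow copy first
    $copies[$value] = $copy;  // register before descending into the fields
    // Assumption: only public fields are visible here and get deep-copied.
    foreach (get_object_vars($copy) as $name => $field) {
        $copy->$name = deepCopy($field, $copies);
    }
    return $copy;
}

class Employee
{
    public $name;
    public $superior = null;
    public $subordinates = array();

    public function __construct($name)
    {
        $this->name = $name;
    }
}

$boss = new Employee('Alice');
$dev = new Employee('Bob');
$dev->superior = $boss;
$boss->subordinates[] = $dev;

$bossCopy = deepCopy($boss);

var_dump($bossCopy !== $boss);                                // bool(true)
var_dump($bossCopy->subordinates[0] !== $dev);                // bool(true)
var_dump($bossCopy->subordinates[0]->superior === $bossCopy); // bool(true)
```

The important detail is that the copy is registered before its fields are visited, so that a back-reference such as $superior resolves to the copy already under construction instead of triggering another round of cloning. Protected and private fields would need additional work, for example via reflection, which this sketch leaves out.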

If you would rather not implement cycle detection yourself, there is a simple alternative for creating deep copies of an object, one which does not require a __clone() method implementation:

$object1 = new Account();
$object2 = unserialize(serialize($object1));

This takes advantage of the PHP serialize() and unserialize() library functions, which convert an object to a string representation and back. These functions take nested object structure into account, including references that point back to an object already encountered. However, they are expensive operations, both in terms of CPU and memory, and they fail for values that cannot be serialized, such as closures. They should therefore be used with discretion.
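As a quick check of the round trip, here is a small sketch using a cyclic structure. The Employee class with public fields and a constructor, as well as the names, are assumptions made for this example.

```php
<?php
// Hypothetical class for illustration only.
class Employee
{
    public $name;
    public $superior = null;
    public $subordinates = array();

    public function __construct($name)
    {
        $this->name = $name;
    }
}

$boss = new Employee('Alice');
$dev = new Employee('Bob');
$dev->superior = $boss;        // child points to parent
$boss->subordinates[] = $dev;  // parent points to child: a cycle

$copy = unserialize(serialize($boss));

// Changing the copy leaves the original graph alone.
$copy->subordinates[0]->name = 'Carol';

var_dump($copy !== $boss);                            // bool(true)
var_dump($copy->subordinates[0]->superior === $copy); // bool(true)
var_dump($dev->name);                                 // string(3) "Bob"
```

The copy is a separate graph, and the back-reference from the subordinate to its superior points to the copied superior rather than to the original.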