Fault tolerant programming in PHP


Every time your application calls an "external" service, it is exposed to failures in that service. The cause might be a third-party API being down, an unresponsive database, or unexpected errors from a third-party library you are using. With many developers and companies currently interested in composing applications out of microservices, guarding against failures caused by broken dependencies becomes even more important.

[Figure: A public-facing API with dependencies on two internal services]

Typical errors include:

  • a connection timing out
  • no network between your app and the service
  • exceptions from libraries

What happens when these errors occur? Left unhandled, they can cascade through your application. Ideally the end user would not notice the failure at all, and the people administering the systems would know exactly where things are failing.

[Figure: A public-facing API with a failing dependency cascading errors]

Circuit breaker

A way of dealing with these errors is by using a circuit breaker. Every time your application calls an "external" service, the call gets wrapped in a circuit breaker.

[Figure: A circuit breaker between two services]

The circuit breaker is designed to keep your application running even when the risky calls are failing. It keeps track of errors, and when the number of errors from a certain type of call reaches a configured threshold, the circuit breaker is "tripped". Across its normal and tripped states, the circuit breaker will:

  • provide fallbacks
  • fail fast
  • retry
  • monitor
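The error-counting part can be sketched in a few lines of PHP. This is a minimal illustration, not a library API: the class name `SimpleCircuitBreaker`, its methods, and the threshold of 2 are all assumptions made for the example.

```php
<?php
// Minimal sketch: SimpleCircuitBreaker is a hypothetical class, not a library API.
final class SimpleCircuitBreaker
{
    private int $failures = 0;
    private int $threshold;

    public function __construct(int $threshold)
    {
        $this->threshold = $threshold;
    }

    public function isTripped(): bool
    {
        return $this->failures >= $this->threshold;
    }

    /** Runs the risky call and records whether it failed. */
    public function call(callable $riskyCall)
    {
        try {
            $result = $riskyCall();
            $this->failures = 0; // a success resets the error count
            return $result;
        } catch (Throwable $e) {
            $this->failures++;   // count the error, then let it propagate
            throw $e;
        }
    }
}

$breaker = new SimpleCircuitBreaker(2);
foreach ([1, 2] as $attempt) {
    try {
        $breaker->call(function () {
            throw new RuntimeException('connection timed out');
        });
    } catch (RuntimeException $e) {
        // at this stage the caller still sees the error
    }
}

var_dump($breaker->isTripped()); // bool(true)
```

This sketch only tracks state. What the breaker does once it is tripped is covered in the sections that follow.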

Providing fallbacks

In the example shown above, the inventory service is down. Instead of returning a 500, the public shop API can still serve the catalogue part of its response by filling in the inventory amount with a default value. A circuit breaker makes this possible by letting you configure a fallback response for when the service is down.
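As a rough sketch of the idea, assume a hypothetical helper `callWithFallback` and a stand-in closure for the inventory service. The SKU, the product fields, and the default stock level of 0 are made up for illustration.

```php
<?php
// Hypothetical helper: run a risky call and return a fallback value when it fails.
function callWithFallback(callable $riskyCall, callable $fallback)
{
    try {
        return $riskyCall();
    } catch (Throwable $e) {
        return $fallback($e);
    }
}

// Stand-in for the inventory service being down (illustrative only).
$fetchStock = function (string $sku): int {
    throw new RuntimeException('inventory service unavailable');
};

$product = [
    'sku'   => 'ABC-123',
    'name'  => 'Example product',
    'stock' => callWithFallback(
        function () use ($fetchStock) {
            return $fetchStock('ABC-123');
        },
        function (Throwable $e) {
            return 0; // default stock level, an assumption for this example
        }
    ),
];

echo json_encode($product), PHP_EOL;
// {"sku":"ABC-123","name":"Example product","stock":0}
```

Whether 0, `null`, or a cached value is the right default depends on the shop. The point is only that the catalogue response is still served.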

Failing fast

When the circuit breaker is in the tripped state, it blocks calls to the failing service and returns the fallback directly. This is useful for two reasons:

  • Faster response (no need to wait for a known, time-consuming error such as a timeout)
  • Reduced load on the failing API, which gives it room to recover

If there is no fallback available, an exception will be thrown instead. The exception will be raised immediately instead of actually doing the call and waiting for a timeout.
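The behaviour described above could look roughly like this. `FailFastBreaker` and `CircuitOpenException` are hypothetical names chosen for the sketch, and the counter `$calls` exists only to show how often the service is actually reached.

```php
<?php
// Hypothetical exception and class names, for illustration only.
final class CircuitOpenException extends RuntimeException
{
}

final class FailFastBreaker
{
    private int $failures = 0;
    private int $threshold;

    public function __construct(int $threshold)
    {
        $this->threshold = $threshold;
    }

    public function call(callable $riskyCall, ?callable $fallback = null)
    {
        if ($this->failures >= $this->threshold) {
            // Tripped: skip the service call entirely.
            if ($fallback !== null) {
                return $fallback();
            }
            throw new CircuitOpenException('circuit is open, call skipped');
        }

        try {
            $result = $riskyCall();
            $this->failures = 0;
            return $result;
        } catch (Throwable $e) {
            $this->failures++;
            if ($fallback !== null) {
                return $fallback();
            }
            throw $e;
        }
    }
}

$calls = 0;
$slowService = function () use (&$calls) {
    $calls++;
    throw new RuntimeException('timeout');
};

$breaker = new FailFastBreaker(2);
for ($i = 0; $i < 5; $i++) {
    $breaker->call($slowService, function () {
        return 'fallback';
    });
}

echo $calls, PHP_EOL; // 2: the last three requests never reached the service
```

Calling `$breaker->call($slowService)` without a fallback at this point raises `CircuitOpenException` immediately, without invoking the service.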

Retrying

When a circuit is tripped, the actual endpoint must be called again once it becomes available. The circuit breaker achieves this by letting a subset of requests through to the actual service. If the service then returns a valid response, it is flagged as active again and the circuit is closed. All subsequent requests are forwarded to the service again.
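One way to sketch this recovery logic is to let every Nth request through as a probe while the circuit is open. The class `RetryingBreaker` and the parameter `$probeEvery` are assumptions for this example. Real implementations often choose the probe requests differently, for instance based on elapsed time.

```php
<?php
// Hypothetical class: lets every Nth request probe the service while the circuit is open.
final class RetryingBreaker
{
    private bool $open = false;
    private int $failures = 0;
    private int $skipped = 0;
    private int $threshold;
    private int $probeEvery;

    public function __construct(int $threshold, int $probeEvery)
    {
        $this->threshold = $threshold;
        $this->probeEvery = $probeEvery;
    }

    public function isOpen(): bool
    {
        return $this->open;
    }

    public function call(callable $riskyCall, callable $fallback)
    {
        if ($this->open) {
            $this->skipped++;
            if ($this->skipped % $this->probeEvery !== 0) {
                return $fallback(); // still open: fail fast
            }
            // otherwise fall through and let this request probe the service
        }

        try {
            $result = $riskyCall();
            // A valid response closes the circuit again.
            $this->open = false;
            $this->failures = 0;
            $this->skipped = 0;
            return $result;
        } catch (Throwable $e) {
            $this->failures++;
            if ($this->failures >= $this->threshold) {
                $this->open = true;
            }
            return $fallback();
        }
    }
}

$serviceUp = false;
$service = function () use (&$serviceUp) {
    if (!$serviceUp) {
        throw new RuntimeException('service down');
    }
    return 'live';
};
$fallback = function () {
    return 'fallback';
};

$breaker = new RetryingBreaker(2, 3);
$breaker->call($service, $fallback); // failure 1
$breaker->call($service, $fallback); // failure 2, the circuit opens
$serviceUp = true;                   // the service recovers

$results = [];
for ($i = 0; $i < 4; $i++) {
    $results[] = $breaker->call($service, $fallback);
}

echo implode(', ', $results), PHP_EOL; // fallback, fallback, live, live
```

The third request after recovery is the probe. Because it succeeds, the circuit closes and the fourth request goes straight to the service.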

Monitoring

Another big advantage of using a circuit breaker is that you have a uniform way of dealing with remote services. The circuit breaker already keeps track of the status of the external services (down/up, uptime percentage, latency, requests per second, etc.). To get a clear overview of the health of your application, you now only have to look at the status of the circuit breakers.
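Because every external call passes through the same wrapper, collecting these numbers takes little extra code. The sketch below assumes a hypothetical `CallMetrics` collector and a `monitoredCall` helper. The service name and the chosen statistics are illustrative.

```php
<?php
// Hypothetical metrics collector, keyed by service name.
final class CallMetrics
{
    private array $stats = [];

    public function record(string $service, bool $success, float $seconds): void
    {
        if (!isset($this->stats[$service])) {
            $this->stats[$service] = ['calls' => 0, 'failures' => 0, 'totalSeconds' => 0.0];
        }
        $this->stats[$service]['calls']++;
        $this->stats[$service]['totalSeconds'] += $seconds;
        if (!$success) {
            $this->stats[$service]['failures']++;
        }
    }

    /** Assumes at least one call has been recorded for the service. */
    public function summary(string $service): array
    {
        $s = $this->stats[$service];
        return [
            'calls'           => $s['calls'],
            'errorPercentage' => 100 * $s['failures'] / $s['calls'],
            'averageSeconds'  => $s['totalSeconds'] / $s['calls'],
        ];
    }
}

// Hypothetical wrapper: times the call and records the outcome.
function monitoredCall(CallMetrics $metrics, string $service, callable $riskyCall, callable $fallback)
{
    $start = microtime(true);
    try {
        $result = $riskyCall();
        $metrics->record($service, true, microtime(true) - $start);
        return $result;
    } catch (Throwable $e) {
        $metrics->record($service, false, microtime(true) - $start);
        return $fallback();
    }
}

$metrics = new CallMetrics();
monitoredCall($metrics, 'inventory', function () {
    return 5;
}, function () {
    return 0;
});
monitoredCall($metrics, 'inventory', function () {
    throw new RuntimeException('down');
}, function () {
    return 0;
});

$summary = $metrics->summary('inventory');
print_r($summary); // two calls, one of which failed
```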

Phystrix

Phystrix is a latency and fault tolerance library for PHP, inspired by Netflix’s Hystrix. It was created and recently open sourced by oDesk. The library provides an abstraction called commands, which wrap calls in a provided circuit breaker implementation. For details, check out the introduction blog post by one of the library authors: Phystrix: latency and fault tolerance library for PHP.
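To give a feel for the command idea without depending on the library, here is a self-contained sketch. `SketchCommand` and `GetStockCommand` are hypothetical classes written for this post, and the circuit breaker state is left out for brevity. Consult the Phystrix documentation for its actual interface.

```php
<?php
// Hypothetical base class illustrating the command idea; this is not Phystrix code.
abstract class SketchCommand
{
    /** The risky call, implemented by each concrete command. */
    abstract protected function run();

    /** Optional fallback; by default there is none. */
    protected function getFallback()
    {
        throw new LogicException('no fallback available');
    }

    public function execute()
    {
        try {
            return $this->run();
        } catch (Throwable $e) {
            return $this->getFallback();
        }
    }
}

final class GetStockCommand extends SketchCommand
{
    private string $sku;

    public function __construct(string $sku)
    {
        $this->sku = $sku;
    }

    protected function run()
    {
        // Stand-in for a remote call that fails.
        throw new RuntimeException('inventory service unavailable for ' . $this->sku);
    }

    protected function getFallback()
    {
        return 0; // default stock level, an assumption for this example
    }
}

$stock = (new GetStockCommand('ABC-123'))->execute();
var_dump($stock); // int(0)
```

Putting each remote call in its own command class keeps the risky code, its fallback, and its name in one place, which is what makes uniform tripping and monitoring possible.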

What Netflix says about Hystrix:

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.

Conclusion

In our next blog post we will show a demo application for a public API service that uses asynchronous calls to internal services while remaining resilient to and tolerant of failures, by combining the React HTTP client with Phystrix.
