A Scalable Pattern for Displaying Simple Remote Data in Drupal 8

Let's imagine a scenario where you need to display some data from a remote service, Instagram for example, to the user. You want to grab the six most recent posts, pass them through some theming, then output them in a block. How would you go about doing that? In Drupal 7, one possible approach might look like this:

function mymodule_block_info() {
  $blocks = array();
  $blocks['instagram'] = array(
    'info' => t('Instagram'),
  );
  return $blocks;
}

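function mymodule_block_view($delta = '') {
  $block = array();
  // hook_block_view() is called for every block this module defines.
  if ($delta == 'instagram') {
    $block['subject'] = t('The latest in photos...');
    $block['content'] = array(
      // Assumes an 'instagram_feed' theme hook declared in mymodule_theme().
      '#theme' => 'instagram_feed',
      '#data' => mymodule_get_instagram_data(),
    );
  }
  return $block;
}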

function mymodule_get_instagram_data() {
  if ($cache = cache_get('instagram_data')) {
    return $cache->data;
  }
  else {
    // Cache miss: call the slow remote API and store the result.
    $instagram_data = mymodule_fetch_instagram_data();
    cache_set('instagram_data', $instagram_data);
    return $instagram_data;
  }
}

function mymodule_fetch_instagram_data() {
  // HTTP request to Instagram API goes here.
}

This works pretty well. The time-intensive mymodule_fetch_instagram_data() function is only called when the cache is empty, and clearing the cache forces a fresh fetch on the next request. But what happens when Instagram goes down and the cache is cleared? The request may hang, waiting for a response from Instagram. Worse yet, a second and third visitor might request the block in the meantime, leaving three hung requests all trying to fetch the same thing. To work around that, you could add locking around the expensive call, like this:

function mymodule_get_instagram_data() {
  if ($cache = cache_get('instagram_data')) {
    return $cache->data;
  }
  else {
    // Get an exclusive lock on fetching the data.
    if (lock_acquire('instagram_data')) {
      $instagram_data = mymodule_fetch_instagram_data();
      cache_set('instagram_data', $instagram_data);
      lock_release('instagram_data');
      return $instagram_data;
    }
    // Another request holds the lock. Wait for it to be released instead of
    // retrying in a tight loop, then check the cache again.
    lock_wait('instagram_data');
    return mymodule_get_instagram_data();
  }
}

This prevents our stampede problem (where multiple requests all execute the fetch task at once), but at least one request still has to sit and wait while the Instagram data is fetched. What's more, whatever lifetime we give the cached data is fixed. We can make it short and raise the risk of a request being blocked on the fetch task, or we can make it long and accept that we might not have the latest data. Of course, this is a simplification: there are ways to work around these issues, but they required a storage mechanism other than the cache, which greatly raised the barrier to entry for building a truly scalable solution in Drupal 7.
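To make the lifetime trade-off concrete, here is a small plain-PHP simulation rather than Drupal code. The ExpiringCache class, its methods and the request timestamps are all hypothetical; the sketch only models a cache entry that expires after a fixed number of seconds and is refilled inline by whichever request finds it expired.

```php
<?php
// Toy model of a cache entry with a fixed lifetime. ExpiringCache and its
// methods are hypothetical; this is not Drupal's cache API.
final class ExpiringCache
{
    /** @var array<string, array{data: mixed, expires: int}> */
    private array $items = [];

    /** Number of times a request had to run the slow fetch itself. */
    public int $fetches = 0;

    public function __construct(private int $ttl)
    {
    }

    // Return the cached value, or run $fetch inline (blocking this request)
    // if the entry is missing or has expired at time $now.
    public function get(string $cid, int $now, callable $fetch): mixed
    {
        $item = $this->items[$cid] ?? null;
        if ($item !== null && $item['expires'] > $now) {
            return $item['data'];
        }
        $this->fetches++;
        $data = $fetch();
        $this->items[$cid] = ['data' => $data, 'expires' => $now + $this->ttl];
        return $data;
    }
}

$short = new ExpiringCache(60);   // 60-second lifetime
$long = new ExpiringCache(300);   // 300-second lifetime
$latest = null;

// Simulated requests, in seconds since the first one.
foreach ([0, 30, 59, 61, 90, 125] as $t) {
    $short->get('instagram_data', $t, fn () => "posts fetched at $t");
    $latest = $long->get('instagram_data', $t, fn () => "posts fetched at $t");
}

echo $short->fetches, PHP_EOL;              // 3
echo $long->fetches, ' ', $latest, PHP_EOL; // 1 posts fetched at 0
```

With the 60-second lifetime, three of the six simulated requests (at t=0, 61 and 125) had to wait on the fetch. With the 300-second lifetime only one did, but the request at t=125 was still served data from t=0. Neither setting takes the fetch out of the request path.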

So how has the situation changed in Drupal 8? For starters, we have a State API for storing non-volatile data. This means that the Instagram data no longer has to live somewhere that can be emptied at the whim of the site owner; we can more or less depend on State data being there when we need it.

On top of that, the cache API has gained additions that make invalidating true cache items a whole lot easier. Cache tags let us clear cached items based on the data they depend on. For example, when a page contains our Instagram block, and the block renders our Instagram data, invalidating a single tag clears both the page and the block at once, and each will be rebuilt the next time it is requested.
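The idea behind cache tags can be sketched with a toy in-memory cache in plain PHP. TaggedCache and its methods are hypothetical and much simpler than Drupal's cache backends. The sketch assumes each item records how many times each of its tags had been invalidated when it was stored, and treats the item as stale once any of those counters moves. The page entry carries the block's tag because, in Drupal 8, cache tags bubble up from a render array to the page that contains it.

```php
<?php
// Toy model of tag-based invalidation. TaggedCache and its methods are
// hypothetical; this is not Drupal's cache backend.
final class TaggedCache
{
    /** @var array<string, array{data: mixed, tags: array<string, int>}> */
    private array $items = [];

    /** @var array<string, int> How many times each tag has been invalidated. */
    private array $invalidations = [];

    // Store $data under $cid, remembering each tag's invalidation count now.
    public function set(string $cid, mixed $data, array $tags): void
    {
        $snapshot = [];
        foreach ($tags as $tag) {
            $snapshot[$tag] = $this->invalidations[$tag] ?? 0;
        }
        $this->items[$cid] = ['data' => $data, 'tags' => $snapshot];
    }

    // Return the data, or null if any of its tags was invalidated after set().
    public function get(string $cid): mixed
    {
        $item = $this->items[$cid] ?? null;
        if ($item === null) {
            return null;
        }
        foreach ($item['tags'] as $tag => $count) {
            if (($this->invalidations[$tag] ?? 0) !== $count) {
                return null;
            }
        }
        return $item['data'];
    }

    public function invalidateTags(array $tags): void
    {
        foreach ($tags as $tag) {
            $this->invalidations[$tag] = ($this->invalidations[$tag] ?? 0) + 1;
        }
    }
}

$cache = new TaggedCache();
// The block depends on the Instagram data; the page that shows the block
// carries the block's tag plus its own.
$cache->set('block:instagram', '<ul>...</ul>', ['instagram_data']);
$cache->set('page:/about', '<html>...</html>', ['instagram_data', 'node:1']);
$cache->set('page:/contact', '<html>...</html>', ['node:2']);

$cache->invalidateTags(['instagram_data']);

var_dump($cache->get('block:instagram')); // NULL
var_dump($cache->get('page:/about'));     // NULL
var_dump($cache->get('page:/contact'));   // string(16) "<html>...</html>"
```

Invalidating instagram_data wipes out the block and the one page that shows it, while the unrelated page keeps its cache entry.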

Let's look at how that same Instagram block might look in Drupal 8:

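namespace Drupal\mymodule\Plugin\Block;

use Drupal\Core\Block\BlockBase;
use Drupal\Core\Cache\Cache;

/**
 * Displays the latest Instagram posts. The plugin ID is illustrative.
 *
 * @Block(
 *   id = "mymodule_instagram",
 *   admin_label = @Translation("Instagram")
 * )
 */
class MyModuleInstagramBlock extends BlockBase {

  public function build() {
    // Read the data cron stored; this never calls out to Instagram.
    $data = \Drupal::state()->get('instagram_data');

    return [
      '#theme' => 'instagram_feed',
      '#data' => $data,
      '#cache' => [
        // Never expire this entry - we'll invalidate it ourselves.
        'max-age' => Cache::PERMANENT,
        'tags' => ['instagram_data'],
      ],
    ];
  }
}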

/**
 * Implements hook_cron().
 */
function mymodule_cron() {
  $new_data = mymodule_fetch_instagram_data();
  // Assumes the fetch helper returns NULL on failure; keep the old data then.
  if ($new_data === NULL) {
    return;
  }
  $old_data = \Drupal::state()->get('instagram_data');

  if ($new_data !== $old_data) {
    // Instagram data has changed.
    \Drupal::state()->set('instagram_data', $new_data);
    // Clear anything that is tagged with instagram_data.
    \Drupal::service('cache_tags.invalidator')
      ->invalidateTags(['instagram_data']);
  }
}

Notice how we've moved the fetch task out of the critical path: page requests only ever read from State, so a slow or unavailable Instagram API can no longer hold up your site. On cron, we fetch the data, and if it has changed from what we've got stored, we invalidate the instagram_data cache tag. Assuming you run cron frequently and are using a page cache that understands cache tags (Drupal's database-backed page cache, or Varnish configured to purge by tag), you can now have a near-instantaneous refresh of your Instagram feed across multiple pages of your site without compromising cacheability.
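The cron logic can also be exercised outside Drupal. In this plain-PHP sketch, $state is an array standing in for the State API and $invalidate is a closure standing in for the cache_tags.invalidator service. refresh_instagram_data() is a hypothetical name, and the sketch assumes the fetch callable returns null when the remote service fails.

```php
<?php
// Toy model of the cron-refresh pattern. $state stands in for the State API
// and $invalidate for the cache tag invalidator (hypothetical stand-ins).
// Assumption: $fetch returns null when the remote service fails.
function refresh_instagram_data(callable $fetch, array &$state, callable $invalidate): bool
{
    $new = $fetch();
    if ($new === null) {
        // Remote service is down: keep serving the last known good data.
        return false;
    }
    $old = $state['instagram_data'] ?? null;
    if ($new === $old) {
        // Nothing changed, so cached pages stay valid.
        return false;
    }
    $state['instagram_data'] = $new;
    $invalidate(['instagram_data']);
    return true;
}

$state = [];
$invalidated = [];
$invalidate = function (array $tags) use (&$invalidated): void {
    $invalidated = array_merge($invalidated, $tags);
};

// Four cron runs: first fetch, unchanged data, an outage, then a new post.
$responses = [['post-1'], ['post-1'], null, ['post-2', 'post-1']];
foreach ($responses as $response) {
    refresh_instagram_data(fn () => $response, $state, $invalidate);
}

echo count($invalidated), PHP_EOL;                   // 2
echo implode(',', $state['instagram_data']), PHP_EOL; // post-2,post-1
```

Only the first run and the run that saw a new post invalidated the tag. The unchanged run and the outage left the cached pages alone, and the outage did not overwrite the stored posts.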

Of course, this approach (sticking data into the State API on cron) won't work for everything: you have to know in advance what data you need. If what you're working on requires configuration (for instance, if you need to support fetching from arbitrary Instagram feeds), you may be better off using a different approach.