Enrichment services: Access control

Written by Erik Mogensen. Updated: Wednesday, September 28, 2016, 14:11

In this second instalment of enrichment services (see part 1 for an introduction), we'll look at security and how to do access control. When implementing enrichment services, it is often necessary to inspect things in the web service, or even perform web service actions on behalf of the user. Since the enrichment service itself is accessible using HTTP, it is important to protect the enrichment service using standard web technologies. This post shows how enrichment services can harness the user's credentials.

The business requirement: Autopublish related images

The problem is simple: If a user uploads a lot of images, none of them will be published by default. When an image is used in a story, and the story is published, the system should automatically publish the images.

The story goes as follows:

As a journalist or editor, after I publish a story or gallery, the system should automatically publish any related images, should they be unpublished.

This enrichment service needs not only to get to the story or gallery being edited, but it also needs to actually modify other related items to publish them. This requires access to the Content Engine.

In the introductory post on enrichment services, we saw how we could get CUE to show a message box to the user; we will also use the same technique to tell the user if autopublishing failed for whatever reason.

Designing the enrichment service

At a high level, this is what the enrichment service needs to do:

  1. Check that we're getting a POST request with an atom entry, which is what we're expecting
  2. Check that it's getting a story or gallery (the types of content we care about)
  3. Check that there are any related images
  4. Count how many related images are unpublished
  5. Return 204 NO CONTENT if the count is 0 or if any of the checks fail.
  6. For each unpublished image, GET it, and PUT it back with the right state
  7. Return 200 OK and a plain text message indicating how many items were auto published.

It's possible to embellish this to include different policies for different sections, or perhaps to return 400 BAD REQUEST in case there were problems publishing some images (e.g. because the user didn't have access rights), and so on.

Building the actual enrichment service

The enrichment service is the actual URL that will be implementing our business logic. We'll build up the service piece by piece, providing tests along the way, and pretty early we will wire it up in CUE so you can test it as an end user would, to verify each piece as it is being built.

Scaffolding

To try out our enrichment service, you can of course use curl from the command line, to see that it behaves as it should (as we showed in part 1), but it is much more interesting to actually see what CUE does, so it's better to get CUE up and running instead:

First of all, install CUE:

$ sudo apt-get install cue-web-2.0

Add the relevant configuration to CUE by creating the file /etc/escenic/cue-web-2.0/custom.yml:

endpoints:
  escenic: http://whatever.../webservice/index.xml

enrichmentServices:
  - title: Autopublisher
    href: http://localhost:1234/autopublisher
    triggers:
      - name: before-save-state-published
        properties: {}

Note that this requires that your web service supplies appropriate CORS headers (see the CUE documentation for more information), or that you start Chrome with --disable-web-security and --user-data-dir.

We used the trigger before-save-state-published, which fires when the user hits the Publish button. This means the enrichment service publishes the images before the story or gallery itself is published.

Apply the configuration to CUE:

$ sudo dpkg-reconfigure cue-web-2.0

You should now be able to open up CUE using http://localhost/cue-web/ and open a story. When you hit "publish", CUE will make an HTTP POST (or an HTTP OPTIONS if CORS is in play) request to http://localhost:1234/autopublisher. Of course it won't work, because we haven't made the enrichment service yet. But this should be enough to play around with enrichment services. Let's go ahead and start on the enrichment service itself.

I've decided to use PHP to build this, but since it will be a bit more involved with HTTP, I'm going to use a PHP framework that provides easier access to the HTTP stack.

First, create an empty directory and grab silex and some dependencies we'll be using:

$ composer require silex/silex "~2.0"
$ composer require monolog/monolog
$ composer require guzzlehttp/guzzle:~6.0
$ mkdir app web

Here's the boilerplate PHP code for a simple Hello, world enrichment service, supporting CORS requests and all. Call this file app/main.php:

<?php

// app/main.php
require_once __DIR__.'/../vendor/autoload.php';

function cors_headers($request) {
    // Echo the caller's Origin back so the browser accepts the response.
    return array(
        "Access-Control-Allow-Origin" => $request->headers->get("origin"),
        "Access-Control-Allow-Methods" => "POST",
        "Access-Control-Allow-Headers" => "Content-Type",
    );
}

use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use GuzzleHttp\Client;

$config = parse_ini_file(__DIR__.'/../config.ini', true);

$app = new Silex\Application();
$app['debug'] = true;
$app->register(new Silex\Provider\MonologServiceProvider(), array(
    'monolog.logfile' => __DIR__.'/../debug.log',
));

$app->options('/autopublisher', function (Request $request) {
    return new Response("", 200, cors_headers($request));
});

$app->post('/autopublisher', function (Request $request) use ($app) {
    $app['monolog']->addDebug('Autopublisher was invoked');
    return new Response('Hello, world!', 200, cors_headers($request) + ["Content-Type"=>"text/plain"]);
});

$app->run();

You also need to create an empty config.ini file in the project root (next to the app and web directories); we'll be adding stuff to this later.

; nothing to see here

To try this out, start your PHP using the following command:

$ php -e -S localhost:1234 -t web/ app/main.php

We should now get a "Hello, world" message every time we publish a content item in CUE:

Screenshot of a message box saying "Success" and "Hello, world" with a green close button

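The Access-Control-Allow-Origin (ACAO) header tells the browser which web pages may instruct it to make POST requests to the service. Since cors_headers echoes back whatever Origin the request carries, any web page is effectively allowed.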
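If allowing every origin feels too permissive, one option is to compare the Origin against an allowlist before answering. The following is a minimal sketch and not part of the service built in this post: the function name cors_headers_for and the contents of $allowed are assumptions made up for illustration.

```php
<?php
// Sketch only: cors_headers_for() and the allowlist are hypothetical,
// they are not part of CUE or Silex.
function cors_headers_for(?string $origin, array $allowedOrigins): array {
    // Unknown or missing origin: send no CORS headers at all, so the
    // browser refuses to hand the response to the calling page.
    if ($origin === null || !in_array($origin, $allowedOrigins, true)) {
        return [];
    }
    return [
        "Access-Control-Allow-Origin" => $origin,
        "Access-Control-Allow-Methods" => "POST",
        "Access-Control-Allow-Headers" => "Content-Type",
        // The response now differs per Origin, so tell caches about it.
        "Vary" => "Origin",
    ];
}

// Assumed origin of the CUE installation used in this post.
$allowed = ["http://localhost"];
print_r(cors_headers_for("http://localhost", $allowed));
```

An origin that is not on the list simply gets an empty array back, which the handler can pass to the Response unchanged.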

What just happened?

What happens when you hit publish?

  1. CUE opened an editor, configured with an enrichment service
  2. You hit publish
  3. CUE was configured to run the enrichment service before the publish action
  4. CUE POST'ed the atom entry to the enrichment service (possibly with a pre-flight OPTIONS)
  5. The enrichment service responded with 200 OK
  6. CUE received the 200 OK and proceeded to actually publish
  7. CUE shows the message from the enrichment service to the user in a dialog box

It's handy to note that if you respond with 400 BAD REQUEST instead of 200 OK, CUE will not publish. In this way, the enrichment service can inhibit certain operations in CUE.
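The possible outcomes (stay silent, show a message, or block the operation) can be summarised in a small helper. This is only a sketch: enrichment_outcome and its parameters are hypothetical names, and the status codes are simply the ones discussed in this post.

```php
<?php
// Hypothetical helper: maps the result of an enrichment step to an
// HTTP status code and response body.
function enrichment_outcome(bool $block, string $message = ""): array {
    if ($block) {
        // 400 BAD REQUEST: CUE will not go ahead with the publish.
        return ["status" => 400, "body" => $message];
    }
    if ($message === "") {
        // 204 NO CONTENT: nothing to tell the user, CUE carries on.
        return ["status" => 204, "body" => ""];
    }
    // 200 OK: CUE carries on and shows the message in a dialog box.
    return ["status" => 200, "body" => $message];
}

print_r(enrichment_outcome(false, "Hello, world!"));
```

Keeping this decision in a plain function, separate from the Silex route, makes it possible to test it without starting a web server.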

Parsing the incoming request

In the $app->post() call, the enrichment service is processing an HTTP POST request with some data. When the service is first invoked by CUE, this data is an atom entry, so we can parse the request body as XML and do a quick count of how many related images there are. First, though, we should check that the request really contains an atom entry, and that the entry is a story or gallery.

To check the type of content, we should check the <content> of the atom entry, which typically looks like this:

<entry xmlns="..." xmlns:vdf="..." >
  ...
  <content type="application/vnd.vizrt.payload+xml">
    <vdf:payload model="http://.../.../model/content-type/story">

So checking the type of content we're processing amounts to finding the model attribute of that <vdf:payload> element, using the following XPath:

/atom:entry/atom:content/vdf:payload/@model
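To see the XPath at work outside of CUE, here is a small stand-alone sketch. The sample entry and its model URL are made up for illustration; the namespace URIs are the ones registered by the register_namespaces helper in the complete listing at the end of this post.

```php
<?php
// Made-up sample entry; the model URL is a placeholder, not a real endpoint.
$xml = <<<'XML'
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:vdf="http://www.vizrt.com/types">
  <content type="application/vnd.vizrt.payload+xml">
    <vdf:payload model="http://example.com/model/content-type/story"/>
  </content>
</entry>
XML;

$entry = simplexml_load_string($xml);
// XPath needs an explicit prefix, even for the default Atom namespace.
$entry->registerXPathNamespace("atom", "http://www.w3.org/2005/Atom");
$entry->registerXPathNamespace("vdf", "http://www.vizrt.com/types");

$model = $entry->xpath('/atom:entry/atom:content/vdf:payload/@model');
$modelUrl = count($model) === 1 ? (string) $model[0] : "";

// True when the model URL ends in "/story" or "/gallery".
$isStoryOrGallery = substr($modelUrl, -6) === "/story"
    || substr($modelUrl, -8) === "/gallery";

echo $modelUrl, "\n";
var_dump($isStoryOrGallery);
```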

Here is the code (the register_namespaces helper it uses appears in the complete listing at the end of this post):

$app->post('/autopublisher', function (Request $request) use ($app, $config) {
    $app['monolog']->addDebug('Incoming request.');
    $RESPONSE_204 = new Response("", 204, cors_headers($request));
    if ($request -> getContentType() != "atom") return $RESPONSE_204;
    $entry = register_namespaces(simplexml_load_string($request -> getContent()));
    $model = $entry->xpath('/atom:entry/atom:content/vdf:payload/@model');
    if (count($model) != 1) return $RESPONSE_204;
    if (substr($model[0], -6) != "/story" &&
        substr($model[0], -8) != "/gallery") return $RESPONSE_204;
    return new Response(
            "You published a story or gallery!!",
            200,
            cors_headers($request) + [ "Content-Type"=>"text/plain" ]);
});

The code checks the content type, loads the XML, registers a few namespaces and then runs XPath queries to find the type of content, and checks that it's a story or gallery. If any of these fails, it responds with a 204 NO CONTENT response, which effectively tells CUE to continue with whatever it's doing.

At the end it simply states what we have found out so far. If you now publish a story or gallery, it will respond with a message to that effect, while publishing anything else should not result in any message.

Finding the related images

The related items are available in the web service as atom <link> elements, as follows:

<entry>
  ...
  <link href="http://..../.../escenic/content/123123123"
      dcterms:identifier="123123123"
      title="Test"
      rel="related"
      type="application/atom+xml; type=entry"
      metadata:group="galleryimages"
      metadata:synthetic-id="ccd23984-2db3-4f43-ac42-f67958d264d3">
    <vdf:payload model="http://.../.../model/content-summary/image">
      ... (omitted for brevity) ...
    </vdf:payload>
  </link>
  ...
</entry>

The following XPath shows how to get the list of images:

/atom:entry/atom:link[@rel="related"][contains(vdf:payload/@model,"/picture")]/@href
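The sketch below runs this XPath against a hand-written entry. Everything in the sample XML is invented: the URLs are placeholders, and the model URLs were chosen so that one of them matches the "/picture" test. The image content type in your own publication may well be named differently, in which case the contains() test would need to change accordingly.

```php
<?php
// Made-up entry with two relations (a picture and a story) and a self link.
$xml = <<<'XML'
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:vdf="http://www.vizrt.com/types">
  <link rel="related" href="http://example.com/escenic/content/1">
    <vdf:payload model="http://example.com/model/content-summary/picture"/>
  </link>
  <link rel="related" href="http://example.com/escenic/content/2">
    <vdf:payload model="http://example.com/model/content-summary/story"/>
  </link>
  <link rel="self" href="http://example.com/escenic/content/3"/>
</entry>
XML;

$entry = simplexml_load_string($xml);
$entry->registerXPathNamespace("atom", "http://www.w3.org/2005/Atom");
$entry->registerXPathNamespace("vdf", "http://www.vizrt.com/types");

// Only related links whose payload model mentions "/picture" are kept.
$hrefs = $entry->xpath(
    '/atom:entry/atom:link[@rel="related"]' .
    '[contains(vdf:payload/@model,"/picture")]/@href'
);

// Turn the attribute nodes into plain string URLs.
$urls = array_map(function ($href) { return (string) $href; }, $hrefs);
print_r($urls);
```

Only the first link survives the filter: the second is related but not a picture, and the third is not a relation at all.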

To make something testable in CUE, we'll change the message so that it prints out how many related images there are, at the time of publishing.

$app->post('/autopublisher', function (Request $request) use ($app, $config) {
    ...
    // Collect the href of every related link whose payload model is a picture.
    $hrefs = $entry -> xpath('/atom:entry/atom:link[@rel="related"][contains(vdf:payload/@model,"/picture")]/@href');
    return new Response(
            "You have ".count($hrefs)." related images",
            200,
            cors_headers($request) + [ "Content-Type"=>"text/plain" ]);
});


In CUE, you should now be presented with the following dialog box, when you publish any story or gallery with any images.

Screenshot of a message box saying "Success" and "You have 3 related images" with a green close button, overlaying the CUE editor with three related images and a related story

A note on security

There is one large security concern when implementing enrichment services, and that is that one should always treat the incoming HTTP request as hostile. This means that the incoming HTTP request (indeed, any part of it) should be vetted for various things before acting upon it. Before putting this code in production, there are some things that need to be handled properly.
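One check that follows from this: the URLs of the related images are taken from the request body, so it seems prudent to confirm that they point at the expected web service before the enrichment service contacts them. The following is a minimal sketch of such a check. The helper name is_trusted_href and the base URL are hypothetical, and a production service would likely need stricter rules.

```php
<?php
// Hypothetical helper: accept an href only if it has the same scheme,
// host and port as a trusted base URL and lives below its path.
function is_trusted_href(string $href, string $trustedBase): bool {
    $h = parse_url($href);
    $b = parse_url($trustedBase);
    if (!is_array($h) || !is_array($b)) {
        return false;
    }
    foreach (["scheme", "host"] as $part) {
        if (!isset($h[$part], $b[$part])
            || strcasecmp($h[$part], $b[$part]) !== 0) {
            return false;
        }
    }
    if (($h["port"] ?? null) !== ($b["port"] ?? null)) {
        return false;
    }
    $path = $h["path"] ?? "";
    // Reject attempts to climb out of the base path.
    if (strpos($path, "..") !== false) {
        return false;
    }
    $basePath = rtrim($b["path"] ?? "", "/") . "/";
    return strpos($path, $basePath) === 0;
}

// Placeholder base URL; use the address of your own web service.
$base = "http://cms.example.com/webservice/";
var_dump(is_trusted_href("http://cms.example.com/webservice/escenic/content/1", $base));
var_dump(is_trusted_href("http://evil.example/webservice/escenic/content/1", $base));
```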

Finding unpublished images

We now have the actual URLs of the related images in the $hrefs array. The next thing we need to focus on is actually checking and later publishing them.

This is also the most sensitive part of the enrichment service. Up until now, it has only acted on the HTTP request coming from the browser, not able to do any harm. Now it will actually need to make HTTP requests on behalf of the user, and in order to do that, we need the user's credentials. Typically, the credentials are passed using the standard HTTP Authorization header. This goes both for BASIC authentication, and for OAuth authentication. In essence, when we get an Authorization header in, we can simply forward that header as needed. This makes it easy to deploy the enrichment service, since it needs no authorization configuration.

For testing purposes, running an enrichment service on localhost, it might be easier to hard-code an authorization header. Care should be taken that such a configuration doesn't end up in any production environments, since they become easy attack vectors to perform HTTP requests, unless the enrichment service itself is protected by other measures.

To find the unpublished images, we GET each hypertext reference in $hrefs and check if they are published. In order to perform HTTP requests, we use a Guzzle client, which gives us fine control over the HTTP requests and responses:

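function create_client($app, $request, $config) {
    // Prefer a hard-coded header from config.ini, if one is configured.
    $auth = isset($config["authorization"]["header"])
        ? $config["authorization"]["header"] : "";
    if ($auth == "") {
        // Otherwise forward the credentials that came with the request.
        $auth = $request->headers->get("authorization");
    }
    else {
        $app['monolog']->addWarning('Oops, using hard-coded credentials!!');
    }

    return new Client([
        'timeout'  => 2.0,
        'debug' => true, // verbose HTTP logging, for development only
        'headers' => [ "Authorization" => $auth ],
        'http_errors' => false, // don't throw on 4xx/5xx, we check status codes
    ]);
}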
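Since create_client reads $config["authorization"]["header"], a hard-coded test credential would live in an [authorization] section of the .ini file. The sketch below parses such a file from a string to show the resulting array. The user name and password are invented, and in the real service the text would of course sit in config.ini rather than inline.

```php
<?php
// Invented test credentials; never let these reach production.
$header = "Basic " . base64_encode("testuser:testpassword");

// What config.ini could contain for local testing.
$ini = "[authorization]\nheader = \"$header\"\n";

// Passing true keeps the [authorization] section as a nested array.
$config = parse_ini_string($ini, true);

// With an empty config.ini the key is missing, so fall back to "".
$auth = isset($config["authorization"]["header"])
    ? $config["authorization"]["header"]
    : "";

echo $auth === $header ? "hard-coded header loaded\n" : "no hard-coded header\n";
```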

Inside our POST handler, we can now make the GET request:

$app->post('/autopublisher', function (Request $request) use ($app, $config) {
    $client = create_client($app, $request, $config);
    ...
    foreach ($hrefs as &$href) {
        $GETRequest = new GuzzleHttp\Psr7\Request('GET', (string)$href, [
            "accept"=> "application/atom+xml",
            ]);
        $GETResponse = $client->send($GETRequest);
        if ($GETResponse -> getStatusCode() != 200) continue;

Content items have the following markup that declares their state.

<app:control>
  <app:draft>yes</app:draft>
  <vaext:state name="draft" 
               href="http://.../.../state/draft-published/editor"/>
</app:control>

For our purposes we can look at the name attribute to see if it contains the word published or not:

        $entry = register_namespaces(simplexml_load_string($GETResponse -> getBody()));
        $state = $entry->xpath('/atom:entry/app:control/vaext:state');
        if (strpos($state[0]->attributes()["name"], "published") !== false) {
            // Already published: count it as skipped ($skipped starts at 0 before the loop).
            $skipped ++;
            continue;
        }
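The state check can also be tried on its own, without a running Content Engine. In this sketch the sample entry is made up (its href is a placeholder) and is_published is a hypothetical helper that wraps the same XPath and strpos test used in the loop.

```php
<?php
// Made-up image entry in the "draft" state.
$xml = <<<'XML'
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:app="http://www.w3.org/2007/app"
       xmlns:vaext="http://www.vizrt.com/atom-ext">
  <app:control>
    <app:draft>yes</app:draft>
    <vaext:state name="draft" href="http://example.com/state/draft/editor"/>
  </app:control>
</entry>
XML;

// Hypothetical helper: true or false when the state is known,
// null when the entry has no single, unambiguous state element.
function is_published(SimpleXMLElement $entry): ?bool {
    $entry->registerXPathNamespace("atom", "http://www.w3.org/2005/Atom");
    $entry->registerXPathNamespace("app", "http://www.w3.org/2007/app");
    $entry->registerXPathNamespace("vaext", "http://www.vizrt.com/atom-ext");
    $state = $entry->xpath('/atom:entry/app:control/vaext:state');
    if (count($state) !== 1) {
        return null;
    }
    $name = (string) $state[0]->attributes()["name"];
    // Any state name containing "published" counts as published.
    return strpos($name, "published") !== false;
}

var_dump(is_published(simplexml_load_string($xml)));
```

Note that a state named draft-published also contains the word published, so this test treats it as already published.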

Autopublishing images

Performing a state transition of an image is done by doing a GET operation on the content item, and then adding a desired action to the <vaext:state> element, as described in the Integration Guide. Finally, the whole atom entry is then PUT back to the same URL.

        dom_import_simplexml($state[0])->nodeValue = "published";

        $PUTRequest = new GuzzleHttp\Psr7\Request('PUT', (string)$href, [
            "If-Match" => "*",
            "content-type" => "application/atom+xml",
        ], $entry ->asXML());
        $PUTResponse = $client->send($PUTRequest);
    }
    return $RESPONSE_204;

If all goes well, we return a 204 NO CONTENT to indicate success, without any message box.

At this point, CUE should now automatically publish any unpublished related images, but ignore other relations, whenever a story is published.
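To see what is actually sent in the PUT request, the edit to the state element can be reproduced in isolation. The entry below is a made-up, stripped-down example with a placeholder href; a real entry from the Content Engine carries much more, all of which is PUT back unchanged.

```php
<?php
// Made-up unpublished image entry.
$xml = <<<'XML'
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:app="http://www.w3.org/2007/app"
       xmlns:vaext="http://www.vizrt.com/atom-ext">
  <app:control>
    <vaext:state name="draft" href="http://example.com/state/draft/editor"/>
  </app:control>
</entry>
XML;

$entry = simplexml_load_string($xml);
$entry->registerXPathNamespace("atom", "http://www.w3.org/2005/Atom");
$entry->registerXPathNamespace("app", "http://www.w3.org/2007/app");
$entry->registerXPathNamespace("vaext", "http://www.vizrt.com/atom-ext");
$state = $entry->xpath('/atom:entry/app:control/vaext:state');

// Put the desired action inside the state element, as the loop does.
dom_import_simplexml($state[0])->nodeValue = "published";

// This string is what would be PUT back to the image's URL.
$body = $entry->asXML();
echo $body, "\n";
```

The name attribute still says draft in the request body. It is the text inside the element that asks for the transition.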

Handling errors

The code so far does some error checking, but it silently ignores some things that could go wrong, for example when an image cannot be fetched or when the PUT request fails.

In the final solution shown below, we add counters for how many images were published successfully and how many failed, so that we can give feedback when some images could not be published.

<?php
// web/index.php

require_once __DIR__.'/../vendor/autoload.php';

use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use GuzzleHttp\Client;

function register_namespaces($xml) {
    $xml -> registerXPathNamespace("atom", "http://www.w3.org/2005/Atom");
    $xml -> registerXPathNamespace("app", "http://www.w3.org/2007/app");
    $xml -> registerXPathNamespace("vaext", "http://www.vizrt.com/atom-ext");
    $xml -> registerXPathNamespace("vdf", "http://www.vizrt.com/types");
    return $xml;
}

function cors_headers($request) {
    // Echo the caller's Origin back so the browser accepts the response.
    return array(
        "Access-Control-Allow-Origin" => $request->headers->get("origin"),
        "Access-Control-Allow-Methods" => "POST",
        "Access-Control-Allow-Headers" => "Content-Type",
    );
}

function create_client($app, $request, $config) {
    $app['monolog']->addDebug('Incoming request.');

    // prefer hard-coded auth if provided.
    $auth = isset($config["authorization"]["header"])
        ? $config["authorization"]["header"] : "";
    if ($auth == "") {
        $auth = $request->headers->get("authorization");
    }
    else {
        $app['monolog']->addWarning('Oops, using hard-coded credentials!!');
    }

    // Without any credentials we cannot act on the user's behalf;
    // the caller turns a null client into a 401 response.
    if ($auth == "") {
        return null;
    }

    return new Client([
        'timeout'  => 2.0,
        'headers' => [ "Authorization" => $auth ],
        'http_errors' => false,
    ]);
}

$config = parse_ini_file(__DIR__.'/../config.ini', true);
$app = new Silex\Application();

$app->register(new Silex\Provider\MonologServiceProvider(), array(
    'monolog.logfile' => __DIR__.'/../debug.log',
));

// CORS
$app->options('/autopublisher', function (Request $request) {
    return new Response("", 200, cors_headers($request));
});

$app->post('/autopublisher', function (Request $request) use ($app, $config) {
    $RESPONSE_204 = new Response("", 204, cors_headers($request));
    $client = create_client($app, $request, $config);
    if (! $client) {
        return new Response("Not authorized", 401, cors_headers($request)
           + ["Www-Authenticate"=>"BASIC realm=Autopublisher"]);
    }

    if ($request -> getContentType() != "atom") return $RESPONSE_204;
    $entry = register_namespaces(simplexml_load_string($request -> getContent()));
    $model = $entry->xpath('/atom:entry/atom:content/vdf:payload/@model');
    if (count($model) != 1) return $RESPONSE_204;
    if (substr($model[0], -6) != "/story" &&
        substr($model[0], -8) != "/gallery") return $RESPONSE_204;

    $hrefs = $entry -> xpath('/atom:entry/atom:link[@rel="related"]' .
                    '[contains(vdf:payload/@model,"/picture")]/@href');

    $failed = 0;
    $success = 0;

    foreach ($hrefs as &$href) {
        $app['monolog']->addDebug('GET ' . $href);
        $GETRequest = new GuzzleHttp\Psr7\Request('GET', (string)$href, [
            "accept"=> "application/atom+xml",
            ]);
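        $GETResponse = $client->send($GETRequest);
        if ($GETResponse -> getStatusCode() != 200) {
            // Could not fetch the image, so it cannot be published.
            $failed++;
            continue;
        }
        $entry = register_namespaces(simplexml_load_string($GETResponse -> getBody()));
        $state = $entry->xpath('/atom:entry/app:control/vaext:state');
        if (count($state) != 1) continue;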
        if (strpos($state[0]->attributes()["name"], "published") !== false) {
            $app['monolog']->addDebug('Already ' . $state[0]->attributes()["name"] . ' ' . $href. '');
            continue;
        }
        dom_import_simplexml($state[0])->nodeValue = "published";

        $app['monolog']->addDebug('Publishing ' . $href);
        $PUTRequest = new GuzzleHttp\Psr7\Request('PUT', (string)$href, [
            "If-Match" => "*",
            "content-type" => "application/atom+xml",
        ], $entry ->asXML());
        $PUTResponse = $client->send($PUTRequest);
        if ($PUTResponse -> getStatusCode() >= 300) {
          $failed++;
        }
        else {         
          $success ++;
        }
    }

    if ($failed > 0) {
        return new Response(
            $success . " of " . ($failed + $success) . 
            " related images were autopublished.  " . $failed .
            " could not be autopublished.",
            200,
            array("Content-Type"=>"text/plain") + cors_headers($request));
    }
    else {
        // If all were autopublished successfully, no message to the user is needed.
        return $RESPONSE_204;
    }
});

$app['debug'] = true;

$app->run();
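The feedback rule at the end of the handler can be isolated into a small pure function, which makes it easy to verify without CUE or a web server. This is only a sketch: autopublish_message is a hypothetical name, and the wording of the message is copied from the listing above.

```php
<?php
// Hypothetical helper: returns the text for the dialog box, or null
// when every image was published and no message is needed.
function autopublish_message(int $success, int $failed): ?string {
    if ($failed === 0) {
        return null;
    }
    $total = $success + $failed;
    return "$success of $total related images were autopublished.  "
        . "$failed could not be autopublished.";
}

echo autopublish_message(2, 1), "\n";
```

A null result corresponds to the 204 NO CONTENT response, and a string to the 200 OK response with a plain text body.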

Suggested enhancements

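As an exercise for the reader, if you want to extend the enrichment service, here are some ideas: apply different policies to different sections, or return 400 BAD REQUEST when some images could not be published (for example, because the user lacks access rights).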

Conclusion

In this post, we showed how CUE could be extended to be able to automatically publish related images by way of an enrichment service.

The user experience is unobtrusive: the journalist or editor doesn't have to leave CUE, the publishing simply happens at the right time, and they are only notified if there were problems.

As with other enrichment services, these services are stateless, require little or no configuration, and "just work". They can be deployed and upgraded independently of both CUE and Content Engine. This makes them easy for developers to "try out" — even in production environments, and easy for operations to redeploy. They can easily be part of a CI/CD set-up.

They speak HTTP, both inbound and outbound, so they are quite easy to test. They can be tested using curl and simple verifications.

The business logic is contained within the service, so changing the service does not essentially affect other services. If the behaviour needs to be different, this particular enrichment service can grow to accommodate the business' needs.

All in all, enrichment services provide us with a loosely coupled, distributed, and flexible system for extending CUE.