Videos of Content Platform seminar


Here is the video of the seminar we gave recently at Internet World 2011 on the Content Platform we built for TUI Travel. You can see these and other videos on the Priocept YouTube channel.

In the seminar, we cover the following:

  • What was the business problem?
  • Proposed solution
  • How did we design and build the system?
  • Why are these technologies so great?
  • How did we make the system scalable?
  • How do I get value from a Content Platform?

Update: we have now published slides of the Content Platform seminar; a Case Study on the TUI Content Platform is also available.

Part 1:

Part 2:

Transcript

Slide 1

Hello, good afternoon everyone.  Welcome to the Content Management theatre at Internet World 2011.  My name is Matthew Skelton, I work for Priocept and today I’ll be sharing with you our experience of building a scalable content platform for TUI Travel.

We’ve got some Twitter hash tags set up over there if you want to join in the conversation online; we’ll show those again at the end of the presentation.

Slide 2

You’ll be glad to know that there are less than 10 slides here in this presentation, nothing too much to take in, all nice and short.  So, we’ll talk about TUI Travel, who they are, what they do.

The situation back in 2008 when TUI Travel came to Priocept with a business problem.  What was that business problem?  We’ll look at the solution that we proposed, how we then went and designed and built that system, what makes the technologies that were involved so great, and then one slide for the “techies” who are here (everyone else can fall asleep for this one slide), which is: how do we make it scalable?  And then finally a question that you probably want to answer today, which is: how would I get value from a content platform?  Why would I want one?  Why would I build one?

Slide 3

So, at Priocept, we build systems that underpin online business.  We’ve got some very demanding clients, as you can see on the list.  I’m not going to say any more; please come and see us at stand 6036, which is just through there [points] in the content management area.

Slide 4

TUI Travel, the world’s leading leisure travel company, operating in 180 countries, 30 million customers worldwide.  They are the only travel company on the FTSE 100 index on the London Stock Exchange and their revenues last year were some £13 billion.  So obviously quite large. [TUI has several] brands that some of you may recognise, some of you may not.  So [in the UK] Thomson, LateRooms, First Choice.  On the continent we’ve got Marmara, Arke, Hotelopia and so on and so on. 200 different brands and websites across the whole TUI Group.  They own hotels, resorts, aircraft – in fact they own more aircraft than EasyJet.  It’s a big operation.

Slide 5

So the situation back in 2008 was, following a period of expansion and acquisitions, TUI had more than 200 customer facing websites selling holidays, selling flights and so on.  Particularly in the travel sector but obviously in other industry sectors too.  They [TUI] need content, they need high quality content.  In the travel sector there are some specialist suppliers, for example Lonely Planet actually makes more money from selling digital content than selling the paper guide books that you’re probably familiar with.  [There are] Various other suppliers as well, including some internal content suppliers within TUI themselves.

The various different websites that TUI ran needed to source the content, manipulate the content, package the content, deliver it to their customers via their websites.  We’re talking about movies, text, data, photographs, user reviews, weather forecasts, all sorts of things like this – Geodata.  If we turn to this diagram here on the slide [5] – let’s say we’ve got a series of content sources across the top here numbered 1 to 5 and some websites at the bottom, A, B, C, D.  Let’s say content source 1 supplies high quality photographs of destinations or resorts.  Websites A and B both would like a picture of that destination, they’re both selling holidays to that destination so they both need to make a connection to that content supplier in order to retrieve the content and pull it into their system.  Content supplier 3 – this is supposed to be a weather widget – content supplier 3 provides weather data.  Websites A, C and D all want weather data, they all have to connect into this provider.  So you can see here we’ve got a mass of connections between the websites which need to display the content and the content suppliers at the top who need to provide that content.  It gets very, very complicated and we’re talking about 200 websites not just 4 websites here.  So you can see the situation is very, very complex.

Slide 6

So that, in a nutshell, is the business problem: a huge amount of complexity, cost and effort in building, integrating, sourcing, operating this inter-connection of systems and, at the same time, perhaps the same photograph or the same video or the same piece of weather forecast [data] would have been bought by multiple websites, so there’s obviously a cost inefficiency there as well.  So a whole proliferation of lots of activity, not very efficient; we’re re-inventing the wheel each time we add a new website or content source.

So TUI were keen to be able to have a solution that would allow them to negotiate group content deals, so, deals with these content providers that would be valid across the whole TUI group … so reducing costs there.  They would also be able to reduce their time to market for new websites or new parts of websites; if they wanted to add a new user review section or a new photo gallery section – a system that would allow them to do that too.

Slide 7

So working with the ecommerce team within TUI Travel, Priocept proposed a solution that looked something like this (points).  A hub of content, a services-based hub of content, high performance and scalable; we’re talking about 200 websites – that’s what we need to think about supporting.  It was key that we integrated it with their Corporate Master Data; we’ll see a little bit later the actual figures, but we’re talking about hundreds of thousands of hotels and destinations.  This is core business data for TUI, we need to integrate it with that; it’s not enough for us simply to search for a given image, it must be an image of a particular place or hotel.  Various other technical things like caching, statistics and so on; we [also] need to be able to control who has access to this content.

If we look at the diagram here you can see how much simpler this looks compared to the one we saw previously.  Content sources at the top, but the complexity has been vastly reduced because these websites now just have a single connection into the content platform here in the middle.  That’s all they care about: the content platform; they almost don’t even know about these content sources at the top.  So we add a single website, we add one connection; we don’t have to add multiple content connections to all the content sources.  It’s almost like the images here at the top – I’ve put them in sepia – they’re no longer the definitive source of information if you like, that’s now the content platform in the centre.  You can think of this system a little bit like a digital concierge in a hotel, given that it’s a travel company we’re talking about.  You go to the hotel desk, and in any language which you speak you expect to receive a response.  You ask them “What’s the weather going to be like tomorrow in Madrid?” or “What is there to do here in Barcelona?”; it’s going to give you the information that you want.
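To put rough numbers on that simplification, here is a minimal sketch in Python. The 200 websites come from the talk and the five content sources from the earlier diagram; the assumption that each website uses three of those sources is purely illustrative, and the function names are hypothetical.

```python
def direct_connections(websites, sources_per_website):
    """Integrations needed when every website connects to each source it uses."""
    return websites * sources_per_website


def hub_connections(websites, sources):
    """Integrations needed when each website and each source connects once to a hub."""
    return websites + sources


# Assumed figures: 200 websites, 5 content sources, 3 sources used per website.
print(direct_connections(200, 3))  # 600
print(hub_connections(200, 5))     # 205
```

Adding one more website then costs a single new connection to the hub, whatever the number of content sources behind it.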

Slide 8

How do we go about building the system?  What became very clear to us very quickly is that the solution was not an off-the-shelf solution; we’re not talking about a web content management system here.  That might be suitable if we’re talking about a single website, or perhaps a small set of websites which could be delivered from the same WCM solution, but that’s not the case.  We’re talking about very disparate sites in separate geographic locations all around the world which – the common thread between these sites – is sharing the same content; it’s not about running on the same infrastructure.  Digital asset management: there’s an element of that, but there’s so much more that we would need to build around a DAM system that, again, an off-the-shelf DAM wouldn’t have been appropriate.  Perhaps part of an ecommerce system, product information management: again, it doesn’t really fit very well; there are some elements which could be shared but not very many.

The technology which stood out as being most appropriate was Java Content Repository (JCR), and this is an open standard for building systems like this, hubs of content, and managing that.  Out of the box with a Java Content Repository come many features that you’d expect to need, and it saves you having to build them yourself – we’ll go through some of those later.  The 2 main technologies for JCR are Day CRX (Day have now been bought by Adobe) and Apache Jackrabbit.  In fact, the Day CRX system is actually built on top of Apache Jackrabbit.  We did some prototyping, for about a month.  It became clear that the extra features that Day CRX provides weren’t really needed or appropriate for the task that we were trying to solve, so we went with Apache Jackrabbit, which is a free open source technology.  We ended up with a set of technologies that you can see here (points), so Java, Spring, Red Hat Enterprise Linux, VMware for virtualisation and so on.  We’ll have a look at this in a little more detail later, what they provided.

But it’s not just about the technology, is it?  It’s about the process, it’s about the documentation of that, it’s working together with Operations to get that synergy between the development side and operations side.  We owned the product road map as well, we worked with TUI to develop and take that forward and so on and so on.  So, it’s a combination of those technologies and the extra things around it [which was needed].

Slide 9

So what makes these technologies so great for the system that we were building?  As I hinted at, JCR or Jackrabbit and the database [provide] a kind of core for the whole content platform.

Out of the box comes the ability to query, control access, [enable] versioning of those content items, replication, all sorts of things that are actually not very interesting to build yourself – we got that, in effect, for free by using Jackrabbit as the core technology there.  In fact Jackrabbit is actually used as the core data store for many Java-based content management systems, so Alfresco and Magnolia, for example, they use that as the core technology there.  Obviously those are very much focused on web content management but they can be used for something much bigger like this.

What these technologies, then, all provided was this huge simplification of content provision for these 200 websites.  As a single integration point for all of these websites, we could then start to build out more exciting things like server-side mash ups, image galleries for example, whether they’re JavaScript or whether they’re, let’s say, PHP or .NET or Java components.  We can re-size on the fly, we can watermark on the fly, and the concept of content fallback was an interesting one we came up with whereby, let’s say you really would ideally like [a] British English description of a hotel, but you’re happy to accept an American English version, US English version or, at the end of the day, perhaps an Australian English version would be fine for you.  So we developed this concept of content fallback – you specify your first preference and then, if that specific kind of content is not available, then you would fall back to an appropriate kind of content.  That also works with image types and also works with content sources, so you might have a particular preference for one of the content sources that we saw back there because you prefer that or something, but you’d be happy to accept content from another source as well.
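The content fallback idea can be illustrated with a minimal Python sketch. This is not the platform's actual code: the in-memory store, the hotel identifiers, the locale codes and the `find_with_fallback` function are all hypothetical, chosen only to show an ordered list of preferences being tried in turn.

```python
# Hypothetical store: (item id, variant) -> content. In this example the
# variant is a locale, but it could equally be an image type or a content source.
DESCRIPTIONS = {
    ("hotel-123", "en-US"): "A beachfront hotel close to the harbor.",
    ("hotel-123", "en-AU"): "A beachfront hotel close to the harbour.",
    ("hotel-456", "en-GB"): "A quiet hotel in the old town.",
}


def find_with_fallback(item_id, preferences, store=DESCRIPTIONS):
    """Return (variant, content) for the first preferred variant that exists.

    `preferences` is ordered from most to least preferred.
    Returns None when none of the acceptable variants is available.
    """
    for variant in preferences:
        content = store.get((item_id, variant))
        if content is not None:
            return variant, content
    return None


# British English is preferred, but US or Australian English is acceptable.
print(find_with_fallback("hotel-123", ["en-GB", "en-US", "en-AU"]))
# ('en-US', 'A beachfront hotel close to the harbor.')
```

Returning the variant alongside the content lets the calling website see which of its preferences was actually satisfied.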

Another key point was that TUI needed to keep the content fresh and relevant; there’s no way we could have had human beings loading this content in, checking it all the time, managing that – we had to automate it, we had to load on a scheduled basis and on a regular basis.  So, that was a key part of the system as well, making sure the content loading and so on was automated.

Slide 10

If you’re less technical, feel free to fall asleep for this slide and we’ll come back after that in a couple of minutes.

So, how do we make this system scalable?  We needed to plan for these 200 websites connecting.  To start with we were expecting about a terabyte of content, that obviously grows in time, and we’re talking about something like 200,000 hotels and 70,000 locations around the world, each of which may have a whole bunch of content associated with it.  So, we needed clustering, load balancing in the appropriate parts of the system, obviously a separation of content delivery from content editing.  One thing which helped us a lot was the concept of a virtual appliance, a slightly unusual deployment model where we put an entire content delivery package on a single virtual machine.  That gave us some useful advantages; I’ll come back to this in a second.

Some of the content we stored locally in the repository, some of the content we left external to the system.  There are different reasons for that in different cases – sometimes [for some] content suppliers, part of their licensing agreement was that they kept the content, in which case we left the content with them; sometimes it was more appropriate to bring the content within the system and store it locally.  By using Squid Cache, not only did we achieve a huge performance improvement in serving that content out, but we could also develop a way in which the websites which were connecting into the content platform did not need to know where the content was stored.  To them it seems like it’s all stored in the content platform.  As it happens, some content was stored externally, some content internally, in some cases a sort of mixture – the websites connecting in simply didn’t need to know about that; all they needed to know was the content platform, and the way we built it helped to hide the complexity behind there.  I have to say – Squid Cache – if you’re looking for a quick win or a performance [improvement] in a content-rich web system or website, you might try a content delivery network like Akamai, [which can be] quite expensive, quite difficult to set up and manage – try Squid, I’ve used that for 12 years, it’s an incredible piece of technology, it continues to work very, very well – we were caching everything, apart from some more complex things like interactive and streaming media – [Squid Cache is a] very, very good piece of technology.
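A minimal Python sketch of that location transparency follows. It is only an illustration: a plain dictionary stands in for the caching layer (Squid itself is an HTTP caching proxy and is not modelled here), and the class name, content ids and fetcher callables are hypothetical.

```python
class ContentPlatform:
    """Single entry point that hides whether content is held locally or by a supplier."""

    def __init__(self, local_store, external_fetchers):
        self._local = local_store           # content id -> bytes held in the repository
        self._external = external_fetchers  # content id -> callable that fetches from the supplier
        self._cache = {}                    # stand-in for the caching layer
        self.origin_fetches = 0             # how often the cache had to be filled

    def get(self, content_id):
        # Serve repeat requests from the cache without touching any origin.
        if content_id in self._cache:
            return self._cache[content_id]
        self.origin_fetches += 1
        if content_id in self._local:
            content = self._local[content_id]
        elif content_id in self._external:
            content = self._external[content_id]()
        else:
            raise KeyError(content_id)
        self._cache[content_id] = content
        return content


platform = ContentPlatform(
    local_store={"photo-1": b"locally stored image bytes"},
    external_fetchers={"weather-1": lambda: b"forecast kept by the supplier"},
)

# The caller uses the same call for both kinds of content.
for content_id in ["photo-1", "weather-1", "photo-1", "weather-1"]:
    platform.get(content_id)

print(platform.origin_fetches)  # 2
```

The calling website only ever sees `get`; whether the bytes came from the repository, a supplier or the cache is an internal detail.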

This virtual appliance thing allowed us to take instances of the content delivery servers and host them elsewhere; that helps with redundancy and speed so, for example, if they’re hosted in another part of the world that is far away from where the main part is hosted, that also helps to improve the speed for their local markets.

Slide 11

So how do we get value from a content platform?  Why would you look to build one?  If you’re in a situation where you’re aggregating content, you need to build links to business entities, [you have] complex search and retrieval requirements, if you’re thinking of doing server-side mash ups or aggregation or re-presentation of content from the server side, perhaps you’ve got a requirement for a content platform.  If you’ve got a multi-market situation or multi-site and you need to share content within the group, perhaps a content platform is going to be appropriate for you as well.  If you’re looking at subscription models or pay-per-use models for content – again, something like a content platform is going to be appropriate for you.  It’s independent of web based presentation.

So, the 2 main examples of monetisation of a content platform so far – Guardian Open Platform, that’s been very successful, a lot of traffic around that, a lot of excitement there.  New York Times as well.  I know of at least one other major UK-based media organisation who is literally building a content platform right now.

The other area would be User-Generated Content – let’s say you’ve got lots of reviews, people are uploading photographs, commenting on your products, you don’t really want that information inside your ecommerce system or your web content management system.  A content platform is great for that, you can re-purpose the content later, potentially you could sell those reviews on to somebody else or host them for somebody else to pull onto their site.

Looking forward, we have some very exciting technologies which will be coming up in the next few years, we’ll be able to make more of in the next few years, obviously mobile is increasingly large, HTML5, lots more power in there compared to previous web technologies, search technologies like Solr, plugging those into a system like this – we can derive a lot of benefit.

Slide 12

So this is really a list of things that we’ve looked at so far and a few responses to those. The business problem was those repeated complex integrations, lots of time and money they involved. What was the proposed solution? This hub of content with intelligent processing, caching, server-side activities, that sort of thing. How did we design and build it?  I would definitely recommend prototyping these kinds of systems; because it’s not out of the box, because it’s not a product which you go out and buy, make sure you understand the technologies and the limitations or the advantages in specific cases of the technologies you’re proposing to use.  Make sure as well that you bring development and operations together, get them talking to each other at an early stage – you’ll win massively by doing that.

So JCR and Jackrabbit in particular, core features … save us having to build a lot of boiler plate stuff which is pretty tedious to build.  You definitely need it for a content system like this.  How do we make it scalable? First off Squid – definitely your friend, get the clustering right.  Virtualisation helps a lot, if you get that right again, there’s some big advantages there. Finally, we just talked about how you get value from a content platform like this.  From being able to re-use that content, serve it to other people, to being able to re-sell it through different channels, etc., etc.

That is the end of what I’ve got to say; I’d really welcome any questions that you’ve got; we’re here on Twitter if you want to push any questions there.  We’re on stand 6036 just across the way, do come and talk to us if you want to talk about this or anything else, we very much welcome those questions.

Question #1 from audience member:

Hi, could you tell me something about the timescales involved in actually, from initial start up on this project to actually seeing deliverables?

Answer:

So we were approached in 2008, initial prototyping about a month or so and this first phase to the point where it was a live system, that took us about a year and 3 months or so, so 15 months.

Question from another audience member:

You mentioned load balancing and so on; why did you pick load balancing and clustering over, say, Cloud based services?

Answer:

So the question was around – I think the question is, why did you have to go down the route of worrying about load balancing and clustering rather than just using cloud based services.

At the time, obviously back in 2008, we weren’t in a position with Cloud services where that was such an option; I think some of them were coming online then. I think now yes, we would definitely look at that as an option instead of having to look at that kind of detail.  There obviously still are issues with Cloud services – Amazon’s outage recently shows that, so it’s still important to understand why you would configure things in a certain way – but yeah, you’re absolutely right, it would definitely be a consideration these days.

Thank you very much everyone, appreciated.
