Thursday, December 27, 2012

Tour Shopping on the Web: the... Bad and the Ugly

Back in 1994, when I started hand-coding my first big web site about the tourist resources of a certain European country, we browsed the Web, literally. Web masters created web sites "by hand" (using a plain-text editor). They manually added hyperlinks between pages and to other sites (creating the "web effect"). They submitted their sites to a handful of web resource directories/catalogs, which were also compiled and maintained "manually". The number of web sites was, by today's standards, microscopically small, but even back then many realized that maintaining catalogs of web resources manually, as well as finding information on the Web by actually browsing the Web, would soon become impossible. A few short years later, there was a whole bunch of search engines available. We stopped browsing the Web and started browsing search results. More on how we browse or, rather, scan search results and decide which links to click is in my previous post, Chicken Thighs, SEO, Structured Data... and More.

There probably aren't too many people who are satisfied with what they get from search engines even most of the time, let alone always. Here are some of the reasons why:
  • Humans are generally not very good at formulating questions in a clear and unambiguous way. Consider these examples:
Screenshot 1

Screenshot 2

Screenshot 3
  • As shown above, computers are not very good at understanding questions unless they are clear and unambiguous. Besides, search queries, most of the time, are not even real questions, but just a bunch of strings without context. Warning: if you try asking a search engine questions in a natural human language, you are going to be disappointed even more; there are some exceptions, of course:
Screenshot 4. Disambiguation: it's a joke. Google it.
  • Humans are not very good at explaining and describing things, answering questions, etc. in a clear and unambiguous way (at least, not by the computer standards of "clear and unambiguous"). What makes things even more complicated is that humans may create web content that is unclear, ambiguous and/or misleading on purpose.
  • Most, if not all, web content is created with a human reader in mind, yet there is always a machine between the content and the end user, and, yes, machines are not good at understanding content created for humans.
Do I really have to explain what is wrong with this situation?!



Enough theorizing. Let's run a search engine tour shopping experiment.

Everybody knows the basic web-shopping drill: go to a search engine, type in some keywords, scan top 10 or so results, decide which 2-3 links to actually click, make your purchasing decision, buy. We have all used this technique to buy books, tires, laptop batteries and what not. Indeed, "one-piece" items, especially those that have unique identifiers (like a part number or ISBN) are quite convenient to shop for directly from a search engine.

Let's try the same technique to shop for a tour package. Say, I am interested in crossing the Alps from Austria to Italy on a mountain bike in the summer of 2013, and I would like to purchase a guided package tour.

We are going to use the same keywords in four major search engines. Note that, since Yahoo! search is essentially just re-branded Bing, even though there are four screenshots below, technically we are dealing with three result sets.

Screenshot 5: Google search results

Screenshot 6: Yahoo! search results

Screenshot 7: Bing search results

Screenshot 8: Yandex search results


I have no desire to waste my time writing (and yours reading) a detailed, item-by-item analysis of each of the three result sets. So, I'll keep my remarks as brief as possible.

The top three results shown by Google actually are mountain bike tours in the general area of the Alps I am interested in. The problem, however, is that none of them includes both Austria and Italy. Yes, the starting point of tour #1 is very close to the Italy-Austria border, but, still, on the Italian side. Besides, the search engine was unable to tell the difference between "guided" and "self-guided". So, that is not quite good enough. #4 appears to be a road cycling itinerary from Italy to France. Further down the list, the relevance decreases as expected.

The results produced by the Yahoo!/Bing search engine are all over the place. It seems to favor (although not very consistently) pages that list a lot of "stuff", and, as long as they have something to do with tours, riding a bicycle, the Alps, Austria and Italy, considers them relevant. For example, #1 is a very long list of annotated links to descriptions of trails in Austria, which might be useful for someone self-developing a tour, but not for someone looking for a finished product. #2 and #5 are bicycle tours: #2 includes Austria, Switzerland and Italy (geographically, it's a match), #5 - Switzerland, Italy and France, but they both are road cycling tours, so geography doesn't really matter. #3 is, oddly enough, just a general description of Italy as a tourist destination from a company that doesn't offer any active tours. #4 is another list, this time of over 20 European road cycling tours from a major bicycle tour operator (there's not a single MTB tour on the list though). Further down, it gets even less relevant. In the middle of completely irrelevant links, there is #8, which, although not a tour offer, turned out to be kind of a "hidden gem" (a very detailed blog post of someone who had bought a tour like the one we are searching for with a link to the tour operator's site). Finally, under #10, Google's pick #1 pops up.

Now, let's have a look at the search results returned by Yandex. #1 appears to be an offer from a tour operator, but it doesn't include Italy. #2 suggests that Rick Steves runs mountain biking tours (I strongly doubt that). #3 does meet all the requirements. Starting with #4 all the way down, the results are totally irrelevant. Some of them have something to do with the area and bicycles, others - nothing at all, but none of them is actually a tour offer. The order of relevance, in general, also appears inconsistent.

No, I am not going to announce the winner. That was not and is not my intention. Besides, objectively speaking, they are all losers since the number of relevant results each of them produced is somewhere between 0.75 and 1.

Some might argue that, by playing around with my search strings (e.g., replacing "tour package" with "vacation package", "holiday package", putting the word "package" first, dropping it altogether, changing "guided" to "escorted", etc.), I might have gotten better results. That is true, but that is exactly what proves my point: search engines do not "understand" the meaning of the web pages they index and search (in addition to the fact that it's even harder for them to figure out what the search query itself is about). To be fair, I have to say that they are not totally dumb because, as a result of doing some sort of quantitative analysis of huge amounts of text, they can, on a very basic level, "guess" what a piece of text may be about, but they still have a very long way to go (just look at the relevance of Google AdSense ads, which is very much "hit and miss"; and that's how they make money).
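To make the point about "strings without context" a bit more tangible, here is a minimal sketch of purely word-based matching. It is an illustration under simplifying assumptions, not a description of how any real search engine scores pages; the function names and both page snippets are made up for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(query, page_text):
    """Count how many distinct query words occur anywhere in the page."""
    page_words = set(tokenize(page_text))
    return sum(1 for word in set(tokenize(query)) if word in page_words)


query = "guided mountain bike tour Austria Italy"

# Two hypothetical page snippets, invented for illustration only.
guided_mtb = "Guided mountain bike tour across the Alps from Austria to Italy"
self_guided_road = "Self-guided road bike tour: Austria, Italy. Mountain views daily"

# Both pages contain every query word, so a word-level matcher
# cannot tell the guided MTB tour from the self-guided road tour.
print(keyword_score(query, guided_mtb))
print(keyword_score(query, self_guided_road))
```

Both snippets get the same score because "self-guided" still contains the token "guided", and "mountain" appears on the road cycling page in an unrelated sense. Word counts alone carry no information about what the page is actually offering.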

Speaking of money, one has to bear in mind that, even though search engines may be perceived as a public service due to their ubiquity, they are not. They are businesses, which means the following:
  • There are no real incentives for any one search engine to drastically improve search quality, which is going to cost a lot of money (not only in research and development initially, but also in ever-growing demand for processing power for years to come), unless one or more of the others do so. In other words, even if they are secretly working (and they probably are) on some advanced technologies to make their search engines considerably smarter, they are not going to roll them out unless either they see the competition starting to drain away their traffic or they decide to destroy one or more of their major competitors. As long as everybody is more or less "happy" with the status quo business-wise, there will be no major improvements technology-wise.
  • It makes more business sense for search engines to keep improving their demographic and on-line behavior data collection and analysis tools (which are already in place and are not that expensive) in order to improve the relevance of paid ads rather than invest heavily in technologies like natural language processing, AI and the like (beyond the rudimentary level on which these technologies probably are being used already).
  • Major search engines appear to have figured out a way to off-load at least some of the problems caused by their inability to "understand" unstructured web content onto content producers themselves by giving them (somewhat vague) incentives to structure their own web content (more on this below).



Initially, I was not planning to write about tour portals, booking sites, aggregators and the like. However, some might argue that the general-purpose search engines used in the above experiment are not meant for such specific purposes as searching for complex products using complex search criteria and should not be viewed as tour product distribution channels. So, I decided to take a look at a few portal-type sites that specifically deal with packaged tour products.

Specific implementations of portals and similar applications may vary, but the general principle is essentially the same. They all are built on top of some kind of a database that allows storing information (in our case - information about tours) in a structured way. "Structured" simply means that each tour product has a standard set of clearly and uniformly defined attributes, e.g. start and end dates, price, start and end locations, hotel categories, what services are included, etc. This should allow (in theory, at least) a shopper to search/filter tours pretty much the same way he or she would do it while shopping for, say, a computer on a well-designed computer-shopping site.
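As a rough sketch of what "structured" buys you, the snippet below models a tour as a record with uniformly defined attributes and filters a tiny catalog by exact attribute values. The `Tour` fields, the `find_tours` helper and the sample tours are all hypothetical, chosen only for illustration; a real portal would sit on a database and a much richer data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tour:
    """A tour described by a fixed, uniformly defined set of attributes."""
    name: str
    activity: str
    guided: bool
    start_country: str
    end_country: str
    start_date: date
    end_date: date
    price_eur: float


def find_tours(tours, **criteria):
    """Return the tours whose attributes equal every given criterion."""
    return [
        tour for tour in tours
        if all(getattr(tour, attr) == value for attr, value in criteria.items())
    ]


# Made-up sample catalog (not real tour offers).
CATALOG = [
    Tour("Transalp Classic", "mountain biking", True, "Austria", "Italy",
         date(2013, 7, 6), date(2013, 7, 13), 1190.0),
    Tour("Dolomites Loop", "mountain biking", False, "Italy", "Italy",
         date(2013, 7, 20), date(2013, 7, 27), 690.0),
    Tour("Lakes by Road Bike", "road cycling", True, "Austria", "Italy",
         date(2013, 8, 3), date(2013, 8, 10), 1350.0),
]

matches = find_tours(
    CATALOG,
    activity="mountain biking",
    guided=True,
    start_country="Austria",
    end_country="Italy",
)
print([tour.name for tour in matches])
```

Because "guided" and "activity" are separate, typed attributes here, the self-guided loop and the road cycling tour drop out without any guesswork, which is exactly what the keyword search in the experiment above could not do.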

There may be thousands, possibly hundreds of thousands, of such sites on the Internet. I have looked at, maybe, a few dozen. Below are just some of my observations. I don't claim this to be a thorough study of the subject.

The first thing one can't help noticing is that, unlike in the air travel and hotel accommodation markets, there aren't really any web sites listing and/or re-selling package tours that one might call "major players". The fact that nobody dominates the niche may be a good thing. However,
  • in order to get more exposure, tour operators have to submit their offers to multiple sites, while
  • tour shoppers have to check multiple sites to find what they are looking for.
Businesses may be willing to put up with the inconvenience (after all, we all know that making money requires time and effort), but potential customers - not so much. Not being able to find what they are looking for on a few such sites, potential customers go back to general-purpose search engines, and those, as we have already seen, are not of much help either. By the way, in our "search engine tour shopping experiment" above, Google was the only one whose top three search results came from a booking agent site (those weren't exact matches, but that's a different story).

Most of the sites I have looked at expect visitors to browse (by country, by activity, etc.) and not search/filter. If you have 10-15 categories, and each category has about 10-15 items in it, browsing is OK. Once you exceed those limits, nobody is going to browse your site. So, essentially, you limit yourself to somewhere between 100 and 225 tours (call it about 200) if they are more or less evenly distributed across categories.

Since these sites are product-specific in that they deal with tours only, one would expect tour data to have more granularity, which would allow a potential customer to narrow down his or her options by an array of parameters. Using my earlier computer-shopping analogy, it might look kind of like this:

Screenshot 9

To my (and, I am sure, potential customers') surprise and disappointment, most tours are reduced to just 3-4 parameters: destination (country or, at best, country and state), activity type (usually only for active tours), price, and duration. All other relevant information, even if it is available, is buried in a page or more of unstructured text (or, even worse, a PDF file) that one is expected to read in order to (maybe) find out more important details. What can I say? - Good luck with that marketing strategy!

The main reason why this is happening is the inability of the tour operator industry to recognize that a tour, just like any other product, can and should be described using a universal (i.e., industry-wide) data model. In the absence of such a model, generic product data models (applicable to a wide variety of goods and services across different sectors) are being used. Such complex products as tours, by definition, cannot fit into the constraints of a simplified generic schema. As a result, important attributes that describe the product get "stripped". To counter that, tour operators and re-sellers are trying to use the "brochure" approach. It may have worked in the mid-1990s, but it is not working anymore.



Conclusions (sort of):

The 2011 agreement between Google, Bing, and Yahoo! (later joined by Yandex) that gave birth to schema.org, a common vocabulary/ontology used to mark up structured data in HTML documents, may be considered
  • on the one hand - an admission that their late-1990s technologies (even though they have been incrementally improved over the years) cannot reliably extract data from unstructured documents,
  • on the other hand - a signal to the wider Web community (not just scientists) that the transition from the "web of documents" to the "web of data" either has already begun or is about to begin and that they are willing to cooperate with one another and web content creators to make it happen.
I have never been accused of being overly optimistic, but, even to me, it looks like a promising move in the right direction.

In practical terms, for a tour web site owner (tour operator, aggregator, agent or whoever you may be), adoption of schema.org means that, in addition to regular HTML, you insert schema.org mark-up that "tells" the search engines where on the page structured data is and what every little "piece" of it means. The mark-up looks kind of like this:
    <div itemscope itemtype="http://schema.org/Product/TourPackage">
      <h1 itemprop="name">Epic AlpenX MTB Tour</h1>
      <span itemprop="description">Unforgettable experience, breathtaking vistas, adrenaline-pumping downhills... blah...blah...blah...</span>
      <!-- The machine-readable duration goes in the datetime attribute of a <time> element -->
      <time itemprop="tourDuration" datetime="P7D">8 days / 7 nights</time>
      ...
    </div>
This may (and I am deliberately not saying "will") result in one or more of the following benefits:
  • In search results, the snippet of your web page should (in Google's own words) "give users a sense for what’s on the page and why it’s relevant to their query". In my interpretation, the "pieces" of data displayed in search results should allow users looking for package tours to easily separate pages that contain actual tour offers from other pages that just happen to contain similar keywords.
  • Over time, as a result of "consuming" your structured data, search engines may "learn" the specifics of your business, which may lead to higher search engine ranking for queries that specifically search for tours. For example, your page with a tour offer may rank higher than, say, a page of a local visitors' bureau that contains the same keywords, but is not a real tour offer.
  • Your structured data may be "consumed" by other web-data-aware applications. For example, aggregators may be able to aggregate your tour offers without you having to submit them manually. Also, your data may be presented in new and interesting ways, e.g., as map overlay or mashed up with other data sets, like data about tourist attractions or weather, etc.
The first two are what I meant by search engines' "(somewhat vague) incentives" for web site owners "to structure their own web content".
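To give a feel for what "consuming" such mark-up might involve, here is a minimal sketch that pulls `itemprop` values out of a page using only Python's standard `html.parser` module. It assumes a single, flat item with no nested items, and the `ItempropCollector` class is a name invented for this example. It is nowhere near a complete microdata processor, and the sample page is a simplified copy of the mark-up above.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect itemprop values from one flat microdata item (no nesting)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.props = {}
        self._open = []  # (tag, property name) pairs whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and self.item_type is None:
            self.item_type = attrs.get("itemtype")
        name = attrs.get("itemprop")
        if name:
            if "datetime" in attrs:
                # Prefer the machine-readable value over the visible text.
                self.props[name] = attrs["datetime"]
            else:
                self.props[name] = ""
                self._open.append((tag, name))

    def handle_data(self, data):
        if self._open:
            self.props[self._open[-1][1]] += data

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            name = self._open.pop()[1]
            # Collapse runs of whitespace left over from the page layout.
            self.props[name] = " ".join(self.props[name].split())


PAGE = """
<div itemscope itemtype="http://schema.org/Product/TourPackage">
  <h1 itemprop="name">Epic AlpenX MTB Tour</h1>
  <span itemprop="description">Unforgettable experience, breathtaking vistas</span>
  <time itemprop="tourDuration" datetime="P7D">8 days / 7 nights</time>
</div>
"""

collector = ItempropCollector()
collector.feed(PAGE)
print(collector.item_type)
print(collector.props)
```

The point is not the parser itself but the result: an aggregator ends up with named fields (name, description, duration) instead of a blob of text it has to guess about.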

Some more on this (not in tourism-specific context though) is in my previous post, Chicken Thighs, SEO, Structured Data... and More, which gives a non-technical conceptual explanation of what this structured-data-thing in relation to search engines is about and why you might want to know more about it.

Of course, neither structured data in general nor schema.org in particular should be viewed as a magic pill that will instantly bring you hordes of customers. Also, I would like to point out that, even though schema.org is definitely a major step in the right direction, it is not a perfect fit (at least, not in its current state) for tour products. Yes, it may be used as is, but, in order to be able to describe tour products in all their richness, an industry-specific schema (vocabulary, ontology - call it whatever you want) may need to be developed. Luckily, quite a few types/classes and properties/attributes from schema.org and other vocabularies/ontologies that are applicable to the tour operator business may be re-used. Schema.org seems open to the idea of including "extensions that gain significant adoption on the web... into the core schema.org vocabulary". The same seems true for third-party vocabularies/ontologies (for example, GoodRelations and rNews have recently been incorporated into schema.org). Even if that does not happen, and search engines continue to support only the types and properties that are in the schema.org core, the industry will still benefit from having a common machine-understandable language, but that is a topic for another blog post.


P.S. In case this post of mine made you all excited, and now you can't wait to start implementing schema.org markup on your web site, I suggest you first read the post I wrote about ten months later, Describing Package Tours With Schema.org: Too Much Pain, Too Little Gain... If Any.
