Conquer the Omni-Channel with Visual Search
To say that consumers’ purchase journey has been irrevocably altered is an understatement. Consumers now carry the power of connected search in their palms. They seek information as and when they need it and manage their purchase journeys with minimal brand intervention, with just one limiting factor: until now, their power to conduct searches was limited by their ability to articulate what they were looking for as keywords or queries a search engine could comprehend. How well consumers articulated that information governed the quality of their search results.
Visual search technologies enable people to conduct searches with the help of images, eliminating the need to enter complex word queries. By combining advanced image recognition technology with search algorithms, today’s visual search applications allow users to perform highly intuitive queries based solely on what they see, online and in the world around them. Consumers’ insatiable search for knowledge and answers will reach new heights with this technology.
The ability to access information intuitively through images has the potential to change everything. Using images to predict what a consumer might be looking for, and connecting her to the right information through appropriate channels, will truly deliver cross-channel bliss. Consider this scenario: on your way to the office, you notice an advertisement for the latest range of ankle boots by Steve Madden. While waiting to pick up your kids from school, you check styling options on Pinterest for those boots. You conduct a search on Google, find similar-looking boots at Nordstrom, and proceed to the nearest store the next day, only to find that those particular boots are out of stock. Your purchase journey has come to a halt at this point, as there is no direct way of purchasing the boots you saw earlier.
Visual search accomplishes a previously unattainable feat: consumer engagement at the point of consumer interest or inspiration. With visual search, you can quickly take a picture of the boots you saw and immediately see which retailers (online as well as brick-and-mortar stores) stock them, along with details such as price and availability, current promotions or discounts, and any other bundled offerings that might interest you. You can then purchase them online or in a store. As the number of channels and touch-points of customer interaction increases, so does the need to deliver a consistent, integrated experience across channels. Visual search technology, as demonstrated above, has the ability to truly deliver a seamless omni-channel experience.
Do you have any questions? Write to us at firstname.lastname@example.org.
Visual Search: What It Is and Why It Matters
To the tune of some three billion keyword searches a day on Google alone, we have grown more than a little accustomed to text searching on our laptops and desktops over the years. However, as the bulk of Internet usage has swung from computers to smartphones, we have been freed from the keyboard and mouse as our lone mode of input. Smartphones have empowered us with additional input devices, including our voices (through microphones) and our photographs and images (through built-in cameras).
And it’s a good thing we have more options. At best, text-based input on smartphones is slow, awkward, and error-prone. At worst, our best technological efforts to improve this input mode have, for example, inspired numerous Web sites dedicated to the comedic angst of auto-correct failures.
Voice-to-text technology has come a long way towards enabling hands-free typing, and virtual assistants such as Siri and Google Now offer us a limited set of voice-activated commands on our smartphones. While they have made text entry easier on mobile devices, they are still rooted in the limitations of text.
When we search for something using text, what we’re really doing is entering a series of keywords that stand in as a surrogate for the actual concept, person, thing, or idea we are trying to find. Traditional search engines work by matching our entered keywords against the ones they have pre-guessed we will use to find the various Internet resources they index. It’s like trying to agree on a word in English while the search engine and its user must first speak to each other in French, each hoping the other translates it back to the same thing.
This is why it is often a challenge to come up with the right search terms. The result quality of text-based search engines can also suffer because of common misspellings, typographical errors, or words that can have different meanings in different contexts (e.g., is “nickelback” a search for a music CD or a football player’s position?).
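The keyword-matching model described above can be illustrated with a toy inverted index. This is a minimal sketch for illustration only, with made-up documents; real search engines layer ranking, stemming, and spell correction on top of this basic structure, but the surrogate-keyword limitation remains.

```python
# A toy inverted index: each keyword maps to the set of documents containing it.
# Results appear only when the user's words match the indexed surrogates exactly.
docs = {
    1: "steve madden ankle boots in black leather",
    2: "nickelback greatest hits album",
    3: "what does a nickelback do in football",
}

index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def search(query):
    """Return ids of documents containing every query keyword."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*sets) if sets else set()

assert search("ankle boots") == {1}
assert search("nickelback") == {2, 3}   # ambiguous: band or football position?
assert search("ankel boots") == set()   # a simple misspelling finds nothing
```

Note how the last two queries exhibit exactly the two failure modes above: a word with multiple contextual meanings, and a misspelling that silently matches nothing.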
Enter visual search. The ubiquity of camera-ready smartphones has enabled an entirely new form of search input: pictures. And a picture truly is worth a thousand words. Photographs contain details and a level of precision that a string of keywords can only begin to hint at, offering rich detail encompassing shape, color, texture, and pattern. Furthermore, because visual search does not rely on surrogate text keywords, it is not confused by “misspellings” or multiple contextual meanings. It can identify things you can’t even find the words to describe!
But even in the much more common case where you can come up with good keywords, the act of searching from a smartphone is slow, awkward, and error-prone. Who has the patience to correctly type in the text needed to find a particular pair of designer sunglasses? And even if you can speak into your microphone and generate a text search using a virtual assistant such as Siri, visual search allows you to simply take a photo of the sunglasses and search for precise matches with just one click.
Visual search technology has long existed in the form of bar codes and QR codes. While these provide very precise matches, they have proven problematic for most customers to adopt and use successfully. Now, with image-based searching, a click of your smartphone camera is all it takes to match the item and pull up related details -- whether your customers are identifying wine labels, consumer packaging, printed advertisements in magazines and newspapers, or an original artwork. Consumers gain ready access to details such as price, reviews, inventory status, and supporting photos and videos of the item, along with the ability to purchase it over their phones.
Do you think visual search will mark a new standard for how consumers purchase products over the Internet in the future? Write to us at email@example.com
Visual Search: The Now and the Next
In a previous article we introduced visual search technology, and we outlined how it is poised to dramatically change the way customers interact with businesses through their mobile devices. In this article, we identify the industries that will particularly benefit from visual search, what visual search capabilities exist today, where today’s technology still falls a little short, and where these capabilities are headed in the future.
Retail is one of the more obvious potential beneficiaries of visual search technology. Enabling consumers to take a photograph of a product of interest, look up its details (e.g., its specifications, reviews, inventory status), and purchase it straight from their device would remove much of the friction in today’s buying experiences. Applications already exist that support everything from packaged goods to fashion, from home furnishings to appliances. Visual search also offers marketing and customer engagement opportunities, such as triggering consumers to view promotions, enter related sweepstakes or contests, or participate in social media tie-ins. Visual search matches are even triggering Augmented Reality (AR) experiences for customers.
On the other hand, retailers and market intelligence companies are using visual search to help automate the inventory monitoring of store aisles and promotional displays. The technology helps them ensure that physical stores comply with optimized planogram management and inventory programs, providing vast streams of data that can be correlated with point-of-sale data and market intelligence.
Manufacturing, government supply, and Computer Aided Design (CAD) are being revolutionized by visual search. With millions of new and replacement equipment parts in circulation, and millions more existing designs for parts at the ready for new production line requests, the parts industry has historically struggled with a lot of waste and inefficiency. Because individual machine parts are poorly distinguished through text-based descriptions and basic categorization, it has typically been easier to design a new replacement part instead of looking up an existing design from extensive parts catalogs. Today manufacturers have the visual tools to identify, classify, and eliminate redundant parts. They can even sketch a part shape by hand and pull up all its potential matches.
Museums, conference centers, libraries and art galleries are using visual search to provide supplemental information about their physical spaces, their collections, and any events within their walls. Visual search has also been used to help identify potential pathologies in medical image scans. It is even being used to help protect intellectual property, by identifying trademark violations, and crime prevention, such as scanning security videos for license plate numbers of known criminals.
Applications of Visual Search Today
Visual search most often leverages detection algorithms that discriminate based on visual properties such as color, texture, and shape. These can support features such as color search (e.g., “find additional clothing closest to this dress shirt color”) or clustering (e.g., “show me similar hammers or screwdrivers”). Visual searches can be triggered off individual photographs or, as demonstrated by Amazon Firefly, the mobile device can trigger continuous product recognition searches as you point your camera at different items and move it about.
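Color-based discrimination, one of the properties mentioned above, can be sketched with a coarse color histogram. This is an illustrative assumption, not any vendor’s actual algorithm: each RGB channel is quantized into a few buckets, and two images are compared by histogram intersection.

```python
# A minimal sketch of color-based similarity. The 3-bucket quantization and
# the tiny sample "images" (flat pixel lists) are illustrative assumptions.
from collections import Counter

def color_histogram(pixels, buckets=3):
    """Quantize each RGB channel into coarse buckets and count occurrences."""
    step = 256 // buckets
    counts = Counter(
        (min(r // step, buckets - 1),
         min(g // step, buckets - 1),
         min(b // step, buckets - 1))
        for r, g, b in pixels
    )
    return {k: v / len(pixels) for k, v in counts.items()}

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color distributions."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

# Two mostly-red "images" score higher against each other than against a blue one.
red_shirt  = [(250, 10, 10)] * 90 + [(200, 200, 200)] * 10
red_dress  = [(240, 30, 20)] * 80 + [(255, 255, 255)] * 20
blue_jeans = [(20, 30, 220)] * 100

h_shirt, h_dress, h_jeans = map(color_histogram, (red_shirt, red_dress, blue_jeans))
assert histogram_similarity(h_shirt, h_dress) > histogram_similarity(h_shirt, h_jeans)
```

A “find clothing closest to this color” feature amounts to ranking the catalog by this similarity score against the query photo’s histogram.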
Visual search implementations can exist natively on a mobile device, allowing a user to match images to a local database independent of a network connection. However, these implementations are typically limited to 100 or so reference images of items. Most commonly, implementations are hosted in the cloud and require a network connection. The photograph taken with your mobile phone is submitted as a search query to remote servers with plenty of storage and visual matching computing power, and they return the matching results to your device over the network.
These implementations can handle up to millions of items and can support millions of monthly queries. The operating costs of these SaaS implementations are typically the product of how many reference images of items you wish to match against (i.e., your index size/storage) and the volume of customer queries your visual search must support.
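The cost structure described above can be made concrete with a back-of-the-envelope model. The rates below are hypothetical placeholders, not real vendor pricing; only the shape of the formula (index size times storage rate, plus query volume times query rate) reflects the text.

```python
# A hedged sketch of the SaaS cost model: operating cost driven by index
# size (reference images stored) and monthly query volume. Rates are
# hypothetical illustration values, not actual pricing.
def monthly_cost(index_size, monthly_queries,
                 storage_rate_per_image=0.001,   # hypothetical $/image/month
                 query_rate_per_thousand=0.50):  # hypothetical $/1,000 queries
    storage = index_size * storage_rate_per_image
    queries = (monthly_queries / 1000) * query_rate_per_thousand
    return storage + queries

# e.g., a catalog of 1M reference images serving 2M queries per month:
cost = monthly_cost(1_000_000, 2_000_000)
print(f"${cost:,.2f}")  # -> $2,000.00
```

The point of the sketch is that both dimensions scale independently: doubling the catalog or doubling traffic each grows the bill, which is why sizing the reference index matters as much as forecasting query volume.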
While many visual search technologies operate primarily on the raw pixels revealed by your mobile phone camera, some technologies exist that augment it with associated metadata -- such as keywords or contextual social network information (when it can be found). These technologies offer some hybrid solutions between images and text that can provide additional contextual information about the snapped images and improve matching.
Another solution design detail to consider is whether the application should return as many quality matches as possible (as in the case of a best-fit for a machine part) or if it should ideally return one and only one match (as in the case of a specific line of designer shoes).
Technology Capabilities and Limitations
Today’s visual search technology works best on two-dimensional (2D, or printed) images, and it’s also the easiest system to implement, as you are effectively matching a 2D image captured by your camera with a reference-quality 2D image in your matching image database.
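One common way to match a captured 2D photo against reference images is a difference hash (dHash), sketched below. This is an illustrative technique, not necessarily what any particular product uses; images are represented here as tiny row-major grayscale grids, where a real system would first downscale photos to a small fixed size with an imaging library.

```python
# A minimal dHash sketch for 2D image matching: hash each pixel by whether
# it is brighter than its right-hand neighbor, then compare hashes by
# Hamming distance. Sample grids below are illustrative assumptions.
def dhash(gray_rows):
    """Bit per neighbor pair: 1 if the left pixel is brighter than the right."""
    return tuple(
        1 if left > right else 0
        for row in gray_rows
        for left, right in zip(row, row[1:])
    )

def hamming(h1, h2):
    """Number of differing bits; small distances indicate likely matches."""
    return sum(a != b for a, b in zip(h1, h2))

reference = [[10, 40, 90], [200, 120, 30], [60, 60, 250]]
# The same scene photographed slightly brighter: uniform brightness shifts
# cancel out, because only the relative ordering of neighbors matters.
captured  = [[25, 55, 105], [215, 135, 45], [75, 75, 255]]
different = [[250, 10, 10], [10, 250, 10], [10, 10, 250]]

assert hamming(dhash(reference), dhash(captured)) == 0
assert hamming(dhash(reference), dhash(different)) > 0
```

The brightness-invariance shown here is one small example of why 2D matching is tractable: many capture distortions can be engineered out of the comparison.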
Of course, what every CEO of a major retailer wants is flawless 3D visual search where a customer can take a photo of any product in any environment and have the exact match appear on their smartphone. Users also want to be able to take a photo of a dog and find out what kind of breed it is. These are much more difficult problems to solve than the basic 2D match case, and technology labs around the world continue to work diligently on solutions. While there have been some successes in these areas, as we’ll mention later, the technologies that support these applications have not yet been reliable enough to meet consumer expectations.
Thus today’s state-of-the-art in visual search technology is particularly good at identifying items by product packaging, printed catalogs, labels, billboards, magazines and newspapers, artworks/paintings, and other print items. Even the 2D case presents many technical challenges due to the limitations of mobile cameras, such as extensive glare, deeply angled shots, dark or poor lighting, poor object framing, motion blur, shadows, and occlusion (i.e., when there are interfering objects in the foreground). The good news is that a number of these challenges have corrective solutions: flat as well as curved images can be supported, and software can correct for geometric distortions of images (e.g., shape, size, rotation, flips) and photometric distortions (e.g., brightness, contrast, color palette).
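Photometric correction of the kind mentioned above can be illustrated with a simple min-max contrast stretch. This is a hedged stand-in for the correction software described in the text, operating on a flat list of grayscale values for simplicity.

```python
# A minimal sketch of photometric correction: linearly stretch a photo's
# pixel values to span the full 0-255 range, so a dim, low-contrast capture
# can be compared against a well-lit reference. Sample data is illustrative.
def normalize_contrast(gray_pixels):
    """Min-max stretch to the 0-255 range."""
    lo, hi = min(gray_pixels), max(gray_pixels)
    if hi == lo:                       # flat image: nothing to stretch
        return [0] * len(gray_pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in gray_pixels]

# A darker capture with compressed dynamic range normalizes to the reference:
reference   = [0, 128, 255, 64]
dim_capture = [50, 100, 150, 75]

assert normalize_contrast(dim_capture) == normalize_contrast(reference)
```

Geometric corrections (rotation, scaling, perspective) follow the same principle at a higher level of machinery: transform the captured image into the reference’s frame before comparing.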
With the visual search of 3D items, these challenges literally take on an added dimension. For example, just the reference image collection process takes on a geometric complexity. Instead of matching 2D image to 2D image, the visual search software must first generate what amounts to an internal 3D model of an object from multiple visual perspectives. Thus instead of a single image to match against, an item must typically have 15-20 or more highly controlled reference images taken from multiple perspectives: top, bottom, all sides, and various angles in between. The actual visual search that compares a photographed 2D image with the 3D model must then interpolate all the potential projections between the item’s reference photographs while still correcting for many of the limitations of mobile cameras.
The good news is that there are examples where 3D visual search can work reliably. For example, it works when an object has distinctive patterns and logos (such as on clothing, etc.) or when the object is a distinctive landmark (e.g., a famous park, monument, or restaurant façade, etc.). We have made it work well on machine parts and home tools, such as hammers, screwdrivers, and wrenches. There are some who have even applied it to food.
However, there remain a number of “miserable” scenarios where visual search needs to stay in the lab for additional improvement: items with no texture, transparent objects, elastic or bendable objects, tiny text, dynamic (moving) objects, thin or narrow items, and highly deformable items (such as many articles of clothing).
What the Future of Visual Search May Bring
As visual search technologies evolve and improve, we will most certainly see more progress in resolving many of the limitations of mobile cameras and of 3D visual search. For example, Microsoft Research has been developing Project Adam, which can identify the breed of a live dog, with implications for its Bing and Cortana products.
But what other capabilities might the future hold for visual search? For one, how we view and navigate visual search results could change dramatically. In cases where multiple high confidence matches are returned to the user, most search result pages follow the model and structure of their text-based search predecessors: ordered text listings with some graphical context. Future visualizations stand to take advantage of the different type of search content, such as offering spatial representations of images. Users could be prompted to select representative visual groupings of items to help aid the filtering process -- such as item texture, shape, or color. Users might also be empowered to use touch or smartphone gestures to refine visual searches based on specific details within photographs.
Vast projects are also underway in how we use visual search. One of the major search engines could already be on its way to indexing billions of online images to create a global visual search engine. Some pundits predict a future with something like a 3D Wikipedia for everyday objects, where users could trigger entire Augmented Reality experiences off the visual match of an item.
Future applications already underway also include searching for images within entire videos, and not just matching them against reference images of a static object. There are even applications under development that plan to monitor social media streams to trigger responses to what images are being tweeted, posted, and shared across the Internet.
What do you see as a future application of visual search technology? Write to us at firstname.lastname@example.org