This page contains information about my PhD research project. The title of the project is Semantic Web Enhanced Product Search (SWEPS) and it is funded by an NWO Mosaic scholarship (project 017.007.142).
The SWEPS project aims to improve online product search by increasing the precision and recall of product search engines and by reducing the user's search effort. To this end, we first investigate how to design a search engine that supports searching for products by their attributes. Second, we focus on online product reviews for the purpose of review summarization and product attribute extraction. We develop intelligent methods that retrieve, integrate, aggregate, and present product information on the Web. For this purpose, we make use of Semantic Web vocabularies (e.g., GoodRelations, Schema.org).
Online product search, as a tool to help customers find products of interest, has become more important now that consumers purchase more often on the Web. One reason is the increase in product specificity and in the variation of consumer preferences. The technical advancement of recent years allows for flexible product development, which results in many product versions and variations. A second reason is that a general increase in wealth leads consumers to strengthen their preferences: consumers are becoming more aware of their preferences and are narrowing down their product search. The product search space on the Web has also grown, which makes product search more important than ever before. Finally, in the long run, precise product search systems are expected to contribute to the efficiency of the economy by presenting relevant information to consumers at the moment they need it.
There are also other reasons why product search on the Internet has become more important. In the past, marketing was management oriented. Currently, it is becoming more and more important to give customers a central role in the company. If a company is able to establish a successful customer relationship, it is highly likely that sales and revenues will increase. In order to establish such a relationship, companies have to analyze the behavior of the customer. Product search is related to this, as consumers express their preferences through product search. A product search engine that can deal with very precise product queries is therefore also useful for companies that want to analyze the queries in order to learn about customer preferences. Companies would also benefit from a search engine that can, for example, summarize product reviews or contrast two products with each other (based on the reviews). This approach also allows companies to communicate directly with their customers, which in turn results in very valuable data and an image boost for the company in question.
Issues with the current state of product search
There are several issues with the current state of product search on the Web. First, search engines (e.g., Google, Yahoo!) cannot deal properly with synonyms and homonyms. Although in other areas, such as resource tagging, the issue of synonyms and homonyms has been addressed, there is little related work for the e-commerce domain.
Second, there is no good support for multiple languages and, most importantly, the aggregation of Web-wide information is seldom done. In our context, aggregation refers to the process of collecting information from different sources, identifying the entities in the domain to which this information applies, and applying an aggregation operator to this information. For example, one can give the minimum price for a certain product on the Internet. An example of a more complex aggregation operation is the summarization of product reviews, where a set of review texts is summarized in a few positive and negative aspects of that product. The lack of aggregation becomes apparent when we analyze the way we search for products on the Web. We keep switching back and forth between search results in order to compare, for example, prices of a certain product. It would be useful if product information were aggregated and shown to the user in one view.
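The simple aggregation example above (the minimum price of a product across sources) can be sketched as follows. This is only an illustrative sketch, not the SWEPS implementation: the shop names, product identifiers, and prices are made up, and the sketch assumes that the entity identification step has already mapped each offer to a shared product identifier.

```python
from collections import defaultdict

# Hypothetical offers collected from different Web shops (made-up data).
# Assumption: each offer has already been linked to a shared product_id.
offers = [
    {"shop": "shop-a.example", "product_id": "phone-x", "price": 299.0},
    {"shop": "shop-b.example", "product_id": "phone-x", "price": 279.5},
    {"shop": "shop-c.example", "product_id": "phone-y", "price": 149.0},
]


def aggregate(offers, operator):
    """Group offer prices by product and apply an aggregation operator."""
    prices_by_product = defaultdict(list)
    for offer in offers:
        prices_by_product[offer["product_id"]].append(offer["price"])
    return {pid: operator(prices) for pid, prices in prices_by_product.items()}


# The aggregation operator is a parameter: min gives the lowest price,
# but max or an average function could be passed in the same way.
min_prices = aggregate(offers, min)
print(min_prices)
```

Keeping the operator as a parameter reflects the definition in the text: collecting and entity identification stay the same, while the operator applied at the end can vary.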
Third, there is no parametric Web-wide search available. Users cannot use queries like ‘all mobile phones, having a screen resolution of at least 640×480 pixels, and a battery life of at least 10 hours’. The reason for this is that current search engines are designed to retrieve documents (or Web pages), relying mostly on keyword-based search. Because of this, the search engines are not suited for searching product information that is aggregated across different sources of information.
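To make the idea of a parametric query concrete, the example query above can be sketched as a filter over structured product descriptions. The attribute names (`screen_width`, `screen_height`, `battery_hours`) and the product data are assumptions made for this illustration, and "at least 640×480" is interpreted here as a lower bound on both dimensions.

```python
# Hypothetical, already-structured product descriptions (made-up data).
products = [
    {"name": "Phone A", "category": "mobile phone",
     "screen_width": 640, "screen_height": 480, "battery_hours": 12},
    {"name": "Phone B", "category": "mobile phone",
     "screen_width": 480, "screen_height": 320, "battery_hours": 20},
    {"name": "Phone C", "category": "mobile phone",
     "screen_width": 1280, "screen_height": 720, "battery_hours": 8},
    {"name": "Tablet D", "category": "tablet",
     "screen_width": 1280, "screen_height": 800, "battery_hours": 11},
    {"name": "Phone E", "category": "mobile phone",
     "screen_width": 800, "screen_height": 480},  # battery life unknown
    {"name": "Phone F", "category": "mobile phone",
     "screen_width": 800, "screen_height": 480, "battery_hours": 10},
]


def parametric_search(products, category, minimums):
    """Return products of a category whose attributes meet all lower bounds.

    Products that do not state a constrained attribute are excluded.
    """
    results = []
    for product in products:
        if product.get("category") != category:
            continue
        if all(attr in product and product[attr] >= bound
               for attr, bound in minimums.items()):
            results.append(product)
    return results


matches = parametric_search(
    products,
    category="mobile phone",
    minimums={"screen_width": 640, "screen_height": 480, "battery_hours": 10},
)
print([p["name"] for p in matches])
```

A keyword-based document search cannot evaluate such constraints, because the comparison requires typed attribute values rather than matching terms on a page.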
Most data on the Web is currently understandable only by humans, not by machines. By using Semantic Web technology, Web shops do not have to customize their data for each product search engine. This is because Semantic Web technology allows Web shops to semantically annotate their Web pages, which enables multiple search engines to understand the data that is being offered. Allowing machines to understand concepts like persons, companies, products, etc., facilitates automatic aggregation of information over resources. In order to allow for this, Web pages must be properly annotated using information from a shared ontology. For instance, consider one Web page that only describes the battery life of a mobile phone and another Web page that only describes its color. If a computer can understand concepts like ‘battery life’ and ‘color’, and it can identify that the two Web pages are about the same resource (a specific mobile phone), it can aggregate this information.
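The mobile phone example above can be illustrated with a small sketch that merges statements from two pages once both are known to describe the same resource. The statements are represented as simple (subject, property, value) tuples; the URI and the property names are hypothetical and do not come from a particular vocabulary.

```python
# Hypothetical URI identifying one specific mobile phone.
PHONE = "http://example.org/product/phone-x"

# Statements extracted from two annotated Web pages (made-up data).
# Each statement is a (subject, property, value) tuple.
page_1 = [(PHONE, "batteryLifeHours", 10)]
page_2 = [(PHONE, "color", "black")]


def merge_descriptions(*sources):
    """Combine statements from several sources into one view per resource."""
    merged = {}
    for statements in sources:
        for subject, prop, value in statements:
            # A set per property keeps differing values from different sources.
            merged.setdefault(subject, {}).setdefault(prop, set()).add(value)
    return merged


merged = merge_descriptions(page_1, page_2)
print(merged[PHONE])
```

The merge only works because both pages use the same identifier for the phone and the same property names, which is what annotation with a shared ontology is meant to provide.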
The focus of this project is on the improvement of product search on the Internet. As a result of this project, the Semantic Web Enhanced Product Search (SWEPS) framework will be proposed. The main research question of this PhD research is as follows:
‘How can product search be improved for consumers in order for them to find more precisely and efficiently the products of their interest than by using existing Web product search methods?’
The improvement of product search is defined and investigated along the two main topics covered in this research.
First, we are going to investigate how to effectively aggregate product information on the Internet. As already mentioned, we use the term ‘aggregation’ for the process of collecting information from different sources, identifying the entities in the domain to which this information applies, and applying an aggregation operator to this information. Several issues arise with this type of aggregation, one of them being that most information sources carry no semantics. Further, on the Semantic Web, where there are semantics, users can use different ontologies (‘agreements to describe resources’). Another issue is that users do not always annotate Web pages correctly, i.e., they can leave out properties. For example, they do not specify the currency, or they use loosely defined vocabularies (such as the Google Rich Snippets vocabulary). As already mentioned, aggregation also applies to product reviews. In this case, we consider the extraction of product features from review text and the summarization of reviews (with positive and negative aspects). This will enable users to extract much more information from the large amounts of data on the Internet in less time.
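The problem of incomplete annotations, such as a price without a currency, can be illustrated with a simple completeness check. This is a minimal sketch under assumptions: the property names `price` and `priceCurrency` follow Schema.org's Offer type, but the set of required properties and the sample annotations are chosen only for this illustration.

```python
# Assumption for this sketch: the properties every offer should carry.
REQUIRED_PROPERTIES = {"name", "price", "priceCurrency"}


def missing_properties(annotation, required=REQUIRED_PROPERTIES):
    """Return the required properties that are absent or empty, sorted by name."""
    return sorted(
        prop for prop in required
        if annotation.get(prop) in (None, "")
    )


# Hypothetical annotations extracted from two Web pages (made-up data).
annotations = [
    {"name": "Phone X", "price": "279.50", "priceCurrency": "EUR"},
    {"name": "Phone X", "price": "299.00"},  # currency left out
]

# Only complete annotations can safely be compared or aggregated by price.
usable = [a for a in annotations if not missing_properties(a)]
incomplete = [(a["name"], missing_properties(a))
              for a in annotations if missing_properties(a)]
print(incomplete)
```

Without the currency, the second price cannot be compared with the first, so an aggregation operator such as the minimum price would either have to skip it or guess.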
Second, we focus on how to extend current product search by enabling search for products by their features (parametric search). These features are obtained from either structured information sources (e.g., annotated Web pages) or information expressed in natural language (e.g., reviews). An important related goal is to find the optimal user interface design with which users are able to perform parametric search in an easy and intuitive way. Further, it is important to design a query language that is flexible enough to use the semantics of the information to the fullest, while remaining simple enough to be used by users who are not familiar with the Semantic Web or technical query languages like SPARQL.