Systems that ease access to web information and knowledge come with a variety of issues.
The last thirty years have witnessed a technological and cultural revolution in how information and knowledge are generated, shared, and accessed. The birth and progressive evolution of the World Wide Web (WWW) have led to a massive, distributed repository of heterogeneous data and potential information, openly available to everyone. This phenomenon has been further emphasised by the conception and implementation of Web 2.0 technologies, which allow every user to generate content and share it directly with peers through social media, with almost no traditional form of intermediate trusted control.
The availability of an enormous and intangible world of potential answers to a multiplicity of information needs has motivated a wealth of research aimed at defining and developing effective and efficient systems capable of providing the right information to users in a timely manner, helping them find a path through the intricate forest of the WWW. Search engines and recommender systems are today two prominent categories of such systems. Huge efforts have been made in recent years to improve their performance by increasingly accounting for the notion of context and by leveraging user-system interactions in an attempt to automatically learn the real user context and dynamically adapt to it.
To this end, a wide range of techniques has been exploited, including machine learning and other techniques falling under the umbrella of Artificial Intelligence. In recent years, search engines have become able to process multimedia content and to capture some elements of the user context in order to tailor the search outcome to each specific user, thus overcoming the ‘one size fits all’ search paradigm. Moreover, the conversational search paradigm has been introduced to offer users a human-like dialogue interface, which can ease user-system interaction and provide more precise and more relevant information through dialogue. In the domain of recommender systems, where personalisation is a core concept, the role of context has also been recognised and is increasingly considered by the research community.
The idea and foundations of the Semantic Web have promised a further step towards giving a semantic structure to the wealth of data and information on the WWW. The availability of data and content related to various knowledge domains has motivated several attempts to provide formal languages and technologies for representing (domain-specific) knowledge and for reasoning with it, with a view to providing users with structured knowledge representation and management; this offers a means to fill their knowledge gaps with better automated support. Moreover, on top of the WWW, some human-generated resources (such as Wikipedia and linked open data) have made it possible to provide structured content (knowledge) that can be easily accessed, including through more traditional search systems.
However, some important issues must be considered when defining systems and models intended to ease access to information and knowledge on the Web. Namely: is the quality of the results provided by the available systems reasonable? Is there a bias in the processes producing the output offered to users? Is it possible to avoid the filter bubbles in which users could be immersed as a consequence of personalisation processes? Is the intrinsic uncertainty that characterises the formation of information and knowledge adequately modelled? The following sections address these issues.
Information quality and credibility
One of the big challenges in locating Web content useful to specific user needs and tasks is how to assess the quality of the content itself and, consequently, how to retrieve only content of the highest quality. One important dimension of quality is credibility, which can refer both to the source that generated the content and to the content itself. The importance of dealing with information credibility has been emphasised by the spread of User-Generated Content (UGC) in the Social Web, where the absence of traditional intermediaries can lead to the diffusion of inaccurate, false, and misleading information.
In the so-called post-truth era, a great deal of current research tries to address the issue of distinguishing, in an automatic or semi-automatic way, fake content from genuine content. This is helpful to users, as humans may fail to assess the credibility of content, which is affected by several different characteristics.
Another important issue is related to information accessibility and information bias: is the content available on the Web accessible equitably and in a neutral manner? As pointed out in the literature, neutrality is far from being achieved on the Web, as bias takes various forms, including data bias, activity bias, gender bias and algorithmic bias; some of these are technological, but many are also social and educational.
Personalisation bias
As previously outlined, personalisation may induce an implicit bias in both search and recommendation tasks. Defining a user model and using it to identify content useful to a given user carries the risk of immersing that user in a filter bubble, in which they are less likely to come across interesting and informative content that is dissimilar to their previous interactions and choices and does not match their current experiences. This bubble limits serendipity, which, in a metaphor drawn from real life, can be likened to the discovery of unexpected and unimaginable but beneficial aspects of reality while exploring territories in search of something interesting.
This phenomenon has important consequences for the social relations that develop on social media: echo chambers have been observed to arise when the communication and repetition of information inside a closed system amplifies and reinforces people's beliefs, leading to a potential confirmation bias. Echo chambers are therefore known for promoting, under certain circumstances, radicalism and misinformation.
Modelling uncertainty and vagueness in knowledge representation
An important research issue related to the problem of making information and knowledge accessible on the WWW in a system-supported way is how to represent these data, information, and knowledge. Search engines and content-based recommender systems both depend on a formal representation of content (e.g., web pages, texts, images, sound), which is based on content features. Over time, the simplest representations (e.g., the bag-of-words representation of texts) have been complemented by more accurate, semantically richer and potentially more informative ones (e.g., word embeddings).
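As a minimal illustration of the simplest representation mentioned above, the following Python sketch builds bag-of-words vectors and ranks a few documents against a query by cosine similarity. The toy documents, the tokenisation rule and the function names are assumptions made for this example only; they do not describe any particular search engine.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to its term counts, discarding word order."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy collection, invented for illustration.
docs = {
    "d1": "search engines index web pages",
    "d2": "recommender systems suggest items to users",
    "d3": "the web is a repository of data",
}

query = bag_of_words("web search engines")

# Rank document identifiers by decreasing similarity to the query.
ranking = sorted(
    docs,
    key=lambda doc_id: cosine_similarity(query, bag_of_words(docs[doc_id])),
    reverse=True,
)
print(ranking)
```

Because this representation only matches identical tokens, two texts that use different words for the same concept (for instance "car" and "automobile") get a similarity of zero, which is one reason semantically richer representations such as word embeddings were introduced.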
In particular, with the aim of semantically enhancing the representation of information and knowledge, especially domain knowledge (mainly since the birth of the Semantic Web), richer formal languages have been defined and employed, for example to create (domain) ontologies. In 2012, Google introduced the expression “knowledge graph” to refer to its system that represents and combines various kinds of facts with the aim of enhancing search results with additional summarised information.
Although there is still no accepted standard definition of Knowledge Graph, an important issue intrinsic to the long-standing problem of knowledge representation and reasoning in Artificial Intelligence is how to represent and account for the uncertainty that characterises the human cognitive experience.
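One simple way to picture uncertain knowledge is to attach a degree in [0, 1] to each fact of a graph and to combine degrees along a chain of facts. The Python sketch below is only an illustrative assumption: the facts, the degrees, the `Fact` type and the two combination rules (product, as under an independence assumption, and minimum, as in some fuzzy and possibilistic settings) are hypothetical and are not taken from any specific knowledge graph system or from the Lab's own models.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    degree: float  # hypothetical degree of confidence in [0, 1]


# Invented facts and degrees, for illustration only.
FACTS = [
    Fact("page_A", "written_by", "author_X", 0.8),
    Fact("author_X", "expert_in", "medicine", 0.5),
]


def degree_of(facts, subject, predicate, obj):
    """Highest degree stated for a triple; 0.0 if it is not stated at all.

    Treating a missing fact as degree 0.0 is a simplifying assumption.
    """
    matches = [
        f.degree
        for f in facts
        if (f.subject, f.predicate, f.obj) == (subject, predicate, obj)
    ]
    return max(matches, default=0.0)


def combine(degrees, mode="product"):
    """Combine the degrees of the facts along a chain."""
    degrees = list(degrees)
    if not degrees:
        raise ValueError("at least one degree is required")
    if mode == "min":
        return min(degrees)
    if mode == "product":
        result = 1.0
        for d in degrees:
            result *= d
        return result
    raise ValueError(f"unknown mode: {mode}")


# Degree to which page_A can be traced to an expert in medicine.
path = [
    degree_of(FACTS, "page_A", "written_by", "author_X"),
    degree_of(FACTS, "author_X", "expert_in", "medicine"),
]
print(combine(path, "product"))
print(combine(path, "min"))
```

The two rules give different answers for the same chain, which illustrates why the choice of uncertainty model is itself a modelling decision rather than a technical detail.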
Research at the IKR3 Lab
The Information and Knowledge Representation, Retrieval, and Reasoning (IKR3) Lab is a research lab within the Department of Informatics, Systems, and Communication (DISCo) at the University of Milano-Bicocca, Italy. The research undertaken by the Lab is aimed at developing models for representing uncertain information and knowledge and for reasoning with it, and at defining systems capable of exploiting data, information, and knowledge in applications such as search engines and recommender systems.
Some of the research questions addressed by the Lab concern credibility (is the information provided reliable, or an attempt at belief manipulation?), personalisation (how to improve the users' experience without confining them to stereotypes?), and deduction (how to use (uncertain) domain knowledge to improve the results of, for example, recommenders and search engines?).
At the IKR3 Lab, we use a combination of logical, probabilistic, and mathematical tools to provide formal guarantees for our methods, while preserving the flexibility needed to deal with users who do not have a technical background. We strive to develop tools that can be easily and freely used. To achieve this goal, we must identify the representation languages that provide the best performance in terms of efficiency and flexibility for the application at hand; develop suitably flexible query languages to retrieve relevant information; and construct reasoning tools that can expand the explicitly represented knowledge to derive further, implicit conclusions.
Professor Gabriella Pasi
Department of Informatics, Systems and Communication
University of Milano-Bicocca
+39 02 6448 7847
gabriella.pasi@unimib.it