SIGIR-96 Workshop on Networked Information Retrieval

August 22, 1996

Overview

The recent and rapid growth of the Internet and corporate intranets poses new problems for Information Retrieval. There is now a need for tools that help people navigate the network, select which collections to search, and fuse the results returned from searching multiple collections. These problems are being addressed by the international IR research community and by a number of digital library projects around the world, e.g., the U.S. Digital Libraries projects, the ERCIM Digital Libraries projects, and the German MEDOC project. The goal of this workshop was to bring together people from each of these areas to discuss their varying approaches to common problems. Researchers were invited to submit position papers or extended abstracts discussing novel approaches to these problems.

Twelve papers/abstracts were submitted in response to the call for participation. The program committee selected eight of them for talks in the workshop. Fifty-one people registered to attend the workshop.

Talks

Paul Francis, of NTT Japan, gave a talk entitled "A Global, Self-Configuring Information Discovery Infrastructure". His talk described the Ingrid project, in which a document publisher "announces" the availability of a document to web servers that it knows contain similar documents. A server receiving the announcement may choose to create links from similar documents of its own to the new document. Over time, documents on a given subject become linked to other documents on that subject, making it easier for people to browse the Web.
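The announce-and-link idea can be illustrated with a minimal Python sketch. It is not the Ingrid implementation: the Server class, the announce function, the use of Jaccard similarity over term sets, and the 0.3 threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


def jaccard(a, b):
    """Jaccard similarity of two term collections (an assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Server:
    """Hypothetical server holding documents as term sets."""
    name: str
    documents: dict = field(default_factory=dict)  # doc_id -> set of terms
    links: list = field(default_factory=list)      # (local_doc_id, remote_doc_id)
    threshold: float = 0.3                         # assumed linking threshold

    def receive_announcement(self, doc_id, terms):
        # The server may choose to link its own similar documents to the new one.
        for local_id, local_terms in self.documents.items():
            if jaccard(local_terms, terms) >= self.threshold:
                self.links.append((local_id, doc_id))


def announce(doc_id, terms, known_servers):
    """The publisher announces a new document to the servers it knows about."""
    for server in known_servers:
        server.receive_announcement(doc_id, terms)


server = Server("s1", {"a": {"ir", "web", "search"}, "b": {"cooking", "pasta"}})
announce("new", {"ir", "search", "network"}, [server])
print(server.links)
```

In this sketch only document "a" shares enough terms with the announced document to be linked; document "b" is left untouched.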

Marc Rittberger, of the University of Konstanz, Germany, gave a talk entitled "Information Retrieval in a Regional, Distributed Information Area". The talk described the Electronic Mall Bodensee, which is a set of Web pages describing shopping, tourist attractions and other commercial activities in the Lake Konstanz area of Europe. The Electronic Mall provides searching capabilities, but the organization and display of results are based on an on-the-fly analysis of hypertext links in retrieved pages. The intent is to better orient the user within this electronic shopping space.

Charles Nicolas, of the University of Maryland, USA, gave a talk about "Resource Selection in CAFE: An Architecture for Networked Information Retrieval". CAFE is a large-scale information retrieval and filtering system, intended to handle more than a terabyte of data per day, in multiple languages. It is based on a set of processes that perform filtering and/or retrieval, and a broker process that directs arriving queries and/or documents to the appropriate agent. The architecture has been demonstrated on a small scale. It is now being scaled to large volumes of data.

Daan Velthausz, of the Telematics Research Centre, Netherlands, gave a talk on "Multimedia Information Disclosure in a Distributed Environment". The talk described the ADMIRE project, which is intended to provide resource selection in a multimedia environment. ADMIRE will handle both content-based and attribute-based retrieval. The architecture enables a hierarchical organization of attributes and content, which raises a number of difficult questions about how to combine evidence for retrieval.

Norbert Fuhr, of the University of Dortmund, Germany, gave a talk about "Optimum Database Selection in Networked IR". The talk described a decision-theoretic model of networked information retrieval that is based on a probabilistic model. The model uses a location broker to direct queries to the appropriate text database and to merge (fuse) the results returned from different databases. It can be shown that the model minimizes search costs. However, the model requires information that is rarely available in practice. The research challenge is to develop an approximation that behaves similarly with less information.
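To make the idea of cost-minimizing database selection concrete, here is a small Python sketch. It is not the model presented in the talk: it swaps in a simple greedy allocation and assumes that the broker already knows, for each database, the probability that the document at each rank is relevant. That is exactly the kind of information the summary says is rarely available. The Database class, the penalty parameter, and the cost formula are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    relevance_probs: list  # assumed P(relevant) by rank, non-increasing
    cost_per_doc: float    # assumed cost of retrieving one document


def expected_cost(db, rank, penalty):
    """Assumed cost model: retrieval cost plus expected penalty for a non-relevant document."""
    return db.cost_per_doc + (1.0 - db.relevance_probs[rank]) * penalty


def allocate(databases, total_docs, penalty=1.0):
    """Greedily choose how many documents to fetch from each database.

    Because per-database costs do not decrease with rank, always taking the
    cheapest next document minimizes the total expected cost under this model.
    """
    counts = {db.name: 0 for db in databases}
    heap = [(expected_cost(db, 0, penalty), i)
            for i, db in enumerate(databases) if db.relevance_probs]
    heapq.heapify(heap)
    total = 0.0
    for _ in range(total_docs):
        if not heap:
            break  # every database is exhausted
        cost, i = heapq.heappop(heap)
        db = databases[i]
        counts[db.name] += 1
        total += cost
        next_rank = counts[db.name]
        if next_rank < len(db.relevance_probs):
            heapq.heappush(heap, (expected_cost(db, next_rank, penalty), i))
    return counts, total


dbs = [Database("A", [0.9, 0.5, 0.1], 0.1), Database("B", [0.8, 0.7, 0.2], 0.2)]
counts, total = allocate(dbs, 3)
print(counts, round(total, 3))
```

With these made-up numbers the broker takes one document from A and two from B, since B's second document is cheaper in expectation than A's second.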

Kai Grossjohann, of the University of Dortmund, Germany, gave a talk entitled "MeDoc Information Broker - Harnessing the Information in Literature and Full Text Databases". The MeDoc Information Broker must identify which text databases to search for a given query, transform and normalize schemata, and merge results returned from each database. The architecture includes multiple layers, for agents, clients, brokers, and providers. A first prototype of the system supporting WAIS and Z39.50 protocols will be available in the fall of 1996.
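The result-merging step can be sketched as follows. The summary does not say how MeDoc merges results, so this Python example assumes one common approach: min-max normalization of each database's scores followed by a single sorted merge. The function names, source labels, and sample scores are hypothetical.

```python
def normalize(results):
    """Min-max scale a list of (doc_id, score) pairs into the range [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so treat every document as equally good.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge(result_lists, k):
    """Merge per-source result lists into one ranked list of the top k documents.

    result_lists maps a source name to its (doc_id, raw_score) pairs.
    Ties are broken by source name and then by document id.
    """
    merged = []
    for source, results in result_lists.items():
        merged.extend((score, source, doc) for doc, score in normalize(results))
    merged.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(source, doc, score) for score, source, doc in merged[:k]]


lists = {
    "wais": [("w1", 10.0), ("w2", 5.0), ("w3", 0.0)],
    "z3950": [("z1", 0.9), ("z2", 0.3)],
}
for source, doc, score in merge(lists, 3):
    print(source, doc, score)
```

Normalization matters here because the two sources score on different scales; without it, every WAIS result with a raw score above 0.9 would outrank the best Z39.50 result.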

J. Sairamesh, of ICS-Forth, Greece, gave a talk about "Architectures for QoS Based Retrieval in Digital Libraries". The talk described Samos, a networked European Computer Science technical report library. The project focuses on the architectural, resource allocation, and quality of service requirements of a large, scalable, decentralized Digital Library.

Martin Doerr, of ICS-Forth, Greece, gave a talk entitled "Authority Services in Global Information Spaces - A Requirements Analysis and Feasibility Study". This talk argued that as the information available becomes more heterogeneous, there is an increasing need for standard languages and access mechanisms. Thesauri can serve that purpose initially, but the goal is to evolve towards knowledge bases that provide a relatively controlled, but always growing, vocabulary for indexing and retrieval.

Discussion

The presentations were followed by a general discussion of the major problems concerning networked information retrieval. That discussion is summarized here.

Future Workshops

Finally, the participants were polled on when it would be appropriate to have another workshop on networked information retrieval. There was strong agreement that another workshop should be held in conjunction with SIGIR-97 in Philadelphia, PA, USA.