Important Issues in Networked Information Retrieval

The following is a list of some of the important issues in networked information retrieval that were identified by workshop participants. The list was originally written by the workshop organizers, and then modified in response to the discussion.

Representation:
- Language: cross-lingual
- Units: words, stems, concepts, LSI, etc.
- Thesaurus
- Customization: indexing, query creation
- Schemas, object types, structures, relationships
- Multimedia
Resource Selection:
- Quality
- Costs: financial, quality, time, etc.; how to measure?
- "Branding": of location brokers, of publishers
- Metadata
Merging Results / Data Fusion:
- Query creation: expansion, thesauri, domain information
- Local vs global corpus statistics
- Retrieval models
- Resource constraints
- Creating consistent merged rankings, and explaining them
Evaluation:
- TREC collection-merging track
- Internet archive (Brewster Kahle)
- How to evaluate resource selection and data fusion separately
Architecture:
- Modularity: How to make retrieval systems/models more modular.
- Communication: Z39.50, etc.
- Data access controls: WWW (open data) vs Z39.50 (proprietary data).
- Distributed, centralized, hierarchical
- Security: charging, copyright, confidentiality, etc.
- Classes of communication: metadata, queries, documents
Other:
- Interoperability: CSTART @ Stanford
- Standards for Metadata in Web pages (Warwick)
- Intranets: Are they a different problem? If so, how?
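
Illustration of the "Local vs global corpus statistics" item under Merging Results / Data Fusion. The sketch below is not from the workshop; it is a minimal example, under assumed numbers, of why the choice matters. The two collections, their document counts, the hits, and the smoothed idf formula are all hypothetical. It scores one hit from each collection with a simple tf * idf weight, once using each collection's own statistics and once using statistics pooled across both collections.

```python
import math

def idf(doc_freq, num_docs):
    """Smoothed inverse document frequency (one common variant, chosen for illustration)."""
    return math.log((num_docs + 1) / (doc_freq + 0.5))

# Hypothetical collections: size, and number of documents containing the query term.
collections = {
    "A": {"num_docs": 1000, "doc_freq": 10},   # the term is rare in A
    "B": {"num_docs": 1000, "doc_freq": 500},  # the term is common in B
}

# Hypothetical hits: (collection, document id, term frequency in the document).
hits = [("A", "a1", 2), ("B", "b1", 3)]

def merge(hits, use_global):
    """Rank hits by tf * idf, using pooled (global) or per-collection (local) idf."""
    total_docs = sum(c["num_docs"] for c in collections.values())
    total_df = sum(c["doc_freq"] for c in collections.values())
    scored = []
    for coll, doc_id, tf in hits:
        c = collections[coll]
        if use_global:
            weight = idf(total_df, total_docs)
        else:
            weight = idf(c["doc_freq"], c["num_docs"])
        scored.append((tf * weight, doc_id))
    # Highest score first.
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

local_ranking = merge(hits, use_global=False)
global_ranking = merge(hits, use_global=True)
print("local idf: ", local_ranking)
print("global idf:", global_ranking)
```

With these assumed numbers the two merged rankings disagree: local statistics inflate the score of the document from the collection where the term is rare, while pooled statistics give both documents the same term weight and the ordering is decided by term frequency alone. Real systems would face the further question, raised above, of how to obtain global statistics at all.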