Houjun Liu

Information Retrieval

Information Retrieval is the task of finding unstructured material within large collections that satisfies an information need (of structured info).

The volume of unstructured information has grown massively since the turn of the millennium.

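IMPORTANTLY: Information Retrieval is evaluated with Precision/Recall/F measured against the information need, not the query. A document that matches the query terms but does not help with the underlying need still counts as irrelevant.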
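For reference, the usual set-based definitions can be written as follows. The symbols \(TP\), \(FP\), and \(FN\) (relevant documents retrieved, irrelevant documents retrieved, and relevant documents missed) are introduced here and are not from the notes above, and "F" is assumed to mean the balanced \(F_1\) measure:

\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
\]

Relevance in each count is judged against the information need.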

For a ranked system, we can trace out a precision-recall curve by computing precision and recall over the top \(k\) results for increasing \(k\), or summarize the ranking with mean average precision.
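A minimal sketch of these ranked metrics, assuming each ranking is a list of document IDs and relevance is a binary set of IDs judged against the information need. The function names and toy document IDs are hypothetical, and average precision is computed in its common non-interpolated form:

```python
from typing import List, Set, Tuple


def precision_recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision and recall over the top-k results (assumes k >= 1, relevant non-empty)."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


def pr_curve(ranking: List[str], relevant: Set[str]) -> List[Tuple[float, float]]:
    """(precision, recall) points for k = 1 .. len(ranking)."""
    return [precision_recall_at_k(ranking, relevant, k)
            for k in range(1, len(ranking) + 1)]


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant document appears.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average of average_precision over several (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data: d6 is relevant but never retrieved.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
curve = pr_curve(ranking, relevant)
ap = average_precision(ranking, relevant)
```

In this toy example recall never reaches 1 because `d6` is never returned, which is the kind of gap a precision-recall curve makes visible.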

Basic Terminology

collection

a set of documents; it could be static, or documents could be added dynamically

goal

retrieve documents with information that is relevant to the user’s information need and helps the user complete a task

information need

the information need is the actual information the user is after; it is usually translated into a search query, which is what is actually used to search.

query

a query is a computer-accessible piece of text used to search for an answer to an information need.

Stages of Interpolation

  • user task => info need: we may not be looking for the right info
  • info need => query: we may not be using the best methods to get the info we are looking for

Motivation

“what’s wrong with grepping?”

  1. we cannot afford to do a linear search over web-scale data
  2. a “NOT” query is non-trivial
  3. no semantics
  4. we have no ranking, so we don’t know what’s the “best” document

Ranked Approaches

Ranked Information Retrieval