Weekly outline

  • General

    Talks by the speakers Dr. Ophir Frieder and Dr. Carlos Ordoñez,
    from the Annual Workshop of the Databases area of the Graduate Program
    http://www.mcc.unam.mx/avisos/avisoTaller09BD.htm

    Note: Dr. Ophir Frieder's talk will be: Searching in the
    "Real World". Dr. Carlos Ordoñez will give the talk announced in the workshop.

    Searching in the "Real World"

    For many, "searching" is considered a mostly solved problem. In fact, for text processing, this belief is factually based. The problem is that most "real world" search applications involve "complex documents", and such applications are far from solved. Complex documents, or less formally, "real world documents", comprise a mixture of images, text, signatures,
    tables, etc., and are often available only in scanned hardcopy formats. Search systems for such document collections are currently unavailable.

    We describe our efforts at building a complex document information processing prototype. This prototype integrates "point solution" (mature) technologies, such as OCR capability, signature matching and handwritten word spotting techniques, search and mining approaches, among
    others, to yield a system capable of searching "real world documents". The described prototype demonstrates the adage that "the whole is greater than the sum of its parts". Our complex document benchmark development efforts are likewise presented.

    Having described the global approach, we describe some potential future point solutions developed at the IIT IR Lab. These include an Arabic stemmer and a natural language source integration fabric called the IIT Intranet Mediator. In terms of stemming, we developed and licensed an Arabic stemmer and search system. Our approach was evaluated using the
    Arabic TREC collection and favorably compared against the state of the art.

    We also focused on source integration and ease of user interaction. By integrating structured and unstructured sources, we developed and licensed our mediator technology that provides a single, natural language interface for querying distributed sources. Rather than providing a set of links as possible answers, the described approach actually answers the posed question. Both the Arabic stemmer and the mediator efforts are likewise discussed.

    Both talks will run for one hour on Wednesday,
    August 20.

    For more information: Javier García, M.Sc., javgar@servidor.unam.mx

  • August 20 - August 26