Information Retrieval and Publishing

Code 6634
Year 2
Semester S1
ECTS Credits 6
Workload PL(30H)/T(30H)
Scientific area Informatics
Mode of delivery Face to face.
Work placements Not applicable.
Learning outcomes The course covers two fundamental and interconnected modules: the publishing and the retrieval of information in the digital era. The first module focuses on the fundamental concepts of information representation, validation, and transformation using markup languages. Students should acquire the ability to define customized markup languages for specific purposes with XML and related technologies (DTD, XSL, XSLT). The second module covers general topics in Information Retrieval, with special emphasis on the technology behind the construction and maintenance of search engines for large and possibly interconnected document collections, such as the World Wide Web. Fundamental algorithms for efficient indexing and searching are studied.
Syllabus Part 1 — Information Publishing

1.1 Markup languages: SGML, XML, XHTML, SVG, MathML, and RSS.
1.2 Document Validation through DTD and XSD.
1.3 Document transformation through XSLT.

Part 2 — Information Retrieval

2.1 Mathematical models for representing collections of textual documents. Boolean Model and Vector Space Model.
2.2 Indexing and retrieval of textual documents.
2.3 Performance evaluation in Information Retrieval systems.
2.4 Query operations and query expansion.
2.5 “Web Search” — Retrieval in the World Wide Web.
2.6 “Link Analysis” — Topology analysis of a graph of hyperlinked documents.
2.7 Relevant NLP methods for Information Retrieval: document clustering and categorization.
Main Bibliography

D. A. Grossman, O. Frieder, Information Retrieval Algorithms and Heuristics, Kluwer, Boston, 1998, ISBN: 0-7923-8271-4
R. Baeza-Yates & B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley, New York, 1999, ISBN: 0-201-39829-X
Elliotte Rusty Harold & W. Scott Means, XML in a Nutshell, O'Reilly, 2004, ISBN: 0-596-00764-7


Secondary Bibliography

M. Crochemore & W. Rytter, Jewels of Stringology, World Scientific Pub Co, 2002, ISBN: 9810247826
D. Jurafsky et al., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Prentice Hall, 2000, ISBN: 0130950696
C.D. Manning & H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, ISBN: 0262133601
Sal Mangano, XSLT Cookbook, Second Edition, O'Reilly, 2006, ISBN: 0-596-00974-7
Dave Pawson, XSL-FO: Making XML Look Good in Print, O'Reilly, 2002, ISBN: 0-596-00355-2
Language Portuguese. Tutorial support is available in English.
Last updated on: 2012-05-24