Learning outcomes
First, students should acquire the ability to define customized markup languages for specific purposes, together with the associated technologies (XML, DTD, XSL, XSLT). Second, they should become familiar with the fundamental technology behind the construction and maintenance of search engines for large document collections.
The course is organized into two fundamental, interconnected modules that address the publishing and retrieval of information in the present digital era. The first module focuses on fundamental concepts of information representation, validation, and transformation using markup languages (XML); students should acquire the ability to define customized markup languages for specific purposes. The second module covers general topics in Information Retrieval, with special emphasis on the technology involved in building search engines for large and possibly interlinked document collections, such as the World Wide Web. Fundamental algorithms for efficient indexing and searching are studied.
Syllabus
Part 1 — Information Publishing
1.1 Markup languages: SGML, XML, XHTML, SVG, MathML, and RSS.
1.2 Document validation through DTD and XSD.
1.3 Document transformation through XSLT.
Part 2 — Information Retrieval
2.1 Mathematical models for representing collections of textual documents: the Boolean Model and the Vector Space Model.
2.2 Indexing and retrieval of textual documents.
2.3 Performance evaluation in Information Retrieval systems.
2.4 Query operations and query expansion.
2.5 “Web Search”: retrieval in the World Wide Web.
2.6 “Link Analysis”: topology analysis of a graph of hyperlinked documents.
2.7 Relevant NLP methods for Information Retrieval: document clustering and categorization.
Main Bibliography
D. A. Grossman & O. Frieder, Information Retrieval: Algorithms and Heuristics, Kluwer, Boston, 1998, ISBN: 0-7923-8271-4
R. Baeza-Yates & B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, New York, 1999, ISBN: 0-201-39829-X
E. R. Harold & W. S. Means, XML in a Nutshell, O'Reilly, 2004, ISBN: 0-596-00764-7
Secondary Bibliography
M. Crochemore & W. Rytter, Jewels of Stringology, World Scientific, 2002, ISBN: 9810247826
D. Jurafsky et al., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Prentice Hall, 2000, ISBN: 0130950696
C. D. Manning & H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, ISBN: 0262133601
S. Mangano, XSLT Cookbook, 2nd ed., O'Reilly, 2006, ISBN: 0-596-00974-7
D. Pawson, XSL-FO: Making XML Look Good in Print, O'Reilly, 2002, ISBN: 0-596-00355-2