
Large Scale Data Science

Code: 16253
Year: 3
Semester: S2
ECTS Credits: 6
Workload: PL (30 h) / T (30 h)
Scientific area: Informatics
Entry requirements: N/A
Learning outcomes
This course introduces students to the acquisition, processing, storage, and retrieval of large-scale data in support of data science tasks. By the end of the course, students should be able to (1) list the steps involved in a large-scale data science project and describe the function of each; (2) know the data science toolkit; (3) understand the fundamental concepts of big data; (4) apply data acquisition methods using Python packages, APIs, and web scraping; (5) master the storage and retrieval of large-scale data; (6) understand and apply appropriate strategies for large-scale data processing; (7) be familiar with the map-reduce paradigm; (8) know the fundamentals of the most relevant large-scale data processing frameworks; and (9) use the Spark framework to program and process large-scale data.
Syllabus
1. Introduction to Large-Scale Data Science
2. Data Science Toolkit
3. Introduction to Big Data
4. Large-Scale Data Acquisition
5. Large-Scale Storage & Retrieval
6. Large-Scale Data Processing Strategies
7. Programming Large-Scale Applications based on the Map-Reduce Paradigm
8. Large-Scale Data Processing Frameworks – Hadoop; Spark; Dask
9. Large-Scale Data Processing with Spark
Main Bibliography
- Bernard Marr (2022). Data Strategy: How to Profit from a World of Big Data, Analytics and the Internet of Things. Kogan Page. USA.
- Isaac Triguero and Mikel Galar (2023). Large-Scale Data Analytics with Python and Spark. Cambridge University Press. UK.
- Jonathan Rioux (2022). Data Analysis with Python and PySpark. Manning. New York, USA.
- Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills (2022). Advanced Analytics with PySpark. O'Reilly. USA.
- Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia (2015). Learning Spark. O'Reilly. USA.
- Jake VanderPlas (2017). Python Data Science Handbook. O'Reilly.
- Mike Loukides, Hilary Mason and DJ Patil (2018). Ethics and Data Science. O'Reilly.
- Wes McKinney (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly.
Teaching Methodologies and Assessment Criteria
- P1: Project I (individual), 25%
- P2: Project II (individual), 35%
- T: Test, 40%

The final grade is the weighted average of the grades obtained in the assessment components listed above. Students whose final grade is 9.5 or higher (on the 0 to 20 scale) pass the course and are exempt from the exam.

Evaluation by Exam
- Exam: 100% (computer-based test with only partial access to course materials)

Admission requirements (teaching/learning assessment and exams):
- Attendance of at least 70% of classes during the teaching/learning period (student workers are exempt);
- A score of at least 6 points (out of 20) in AE, where AE = 0.25 × P1 + 0.35 × P2 + 0.40 × T.

Failing to meet any of these requirements (including submitting either project after the deadline) prevents the student from passing the course.
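The weighting and thresholds above can be checked with a short calculation. The following is a minimal Python sketch under one reading of the rules, not an official grading tool: it assumes all grades are on the 0 to 20 scale, that the attendance, AE ≥ 6, and on-time submission requirements must all hold before any pass decision, and that the `Student` record, its field names, and the `outcome` labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Weights (in percent) of each assessment component, as listed in the criteria.
WEIGHTS = {"P1": 25, "P2": 35, "T": 40}

MIN_ATTENDANCE = 0.70  # minimum fraction of classes attended
MIN_AE = 6.0           # minimum AE score (0-20 scale) for admission
PASS_MARK = 9.5        # final grade needed to pass without the exam


@dataclass
class Student:
    # Hypothetical record; field names are illustrative only.
    p1: float                      # Project I grade (0-20)
    p2: float                      # Project II grade (0-20)
    t: float                       # Test grade (0-20)
    attendance: float              # fraction of classes attended (0.0-1.0)
    student_worker: bool = False   # student workers are exempt from attendance
    projects_on_time: bool = True  # assumption: late projects block admission


def weighted_average(p1: float, p2: float, t: float) -> float:
    """AE = 0.25*P1 + 0.35*P2 + 0.40*T, using integer weights to limit float noise."""
    return (WEIGHTS["P1"] * p1 + WEIGHTS["P2"] * p2 + WEIGHTS["T"] * t) / 100


def is_admitted(s: Student) -> bool:
    """Assumed admission check: attendance (waived for student workers), on-time projects, AE >= 6."""
    attendance_ok = s.student_worker or s.attendance >= MIN_ATTENDANCE
    return attendance_ok and s.projects_on_time and weighted_average(s.p1, s.p2, s.t) >= MIN_AE


def outcome(s: Student) -> str:
    """Classify a student under the assumed reading of the rules."""
    if not is_admitted(s):
        return "not admitted"
    if weighted_average(s.p1, s.p2, s.t) >= PASS_MARK:
        return "passed, exempt from exam"
    return "admitted to exam"


if __name__ == "__main__":
    # P1 = 12, P2 = 10, T = 9 gives AE = 3.0 + 3.5 + 3.6 = 10.1
    print(outcome(Student(p1=12, p2=10, t=9, attendance=0.9)))
```

Integer percentage weights are used so that, for example, grades of 10 in every component produce exactly 10.0 rather than a value a hair below it, which matters when comparing against the 9.5 threshold.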
Language: Portuguese. Tutorial support is available in English.
Last updated on: 2024-02-22
