Code |
16253
|
Year |
3
|
Semester |
S2
|
ECTS Credits |
6
|
Workload |
PL(30H)/T(30H)
|
Scientific area |
Informatics
|
Entry requirements |
NA
|
Learning outcomes |
This course introduces students to the acquisition, processing, storage, and retrieval of large-scale data in support of data science tasks. By the end of the course, students should be able to: (1) list the steps involved in a large-scale data science project and describe the function of each; (2) be familiar with the data science toolkit; (3) understand the fundamental concepts of big data; (4) apply methods for data acquisition using Python software packages, APIs, and web scraping; (5) master the storage and retrieval of large-scale data; (6) understand and apply appropriate strategies for large-scale data processing; (7) be familiar with the map-reduce paradigm; (8) know the fundamentals of the most relevant large-scale data processing frameworks; and (9) use and program the Spark framework to process large-scale data.
|
Syllabus |
1. Introduction to Large-Scale Data Science
2. Data Science Toolkit
3. Introduction to Big Data
4. Large-Scale Data Acquisition
5. Large-Scale Storage & Retrieval
6. Large-Scale Data Processing Strategies
7. Programming Large-Scale Applications Based on the Map-Reduce Paradigm
8. Large-Scale Data Processing Frameworks: Hadoop, Spark, Dask
9. Large-Scale Data Processing with Spark
|
Main Bibliography |
- Bernard Marr (2022). Data Strategy: How to Profit from a World of Big Data, Analytics and the Internet of Things. Kogan Page. USA.
- Isaac Triguero, Mikel Galar (2023). Large-Scale Data Analytics with Python and Spark. Cambridge University Press. UK.
- Jonathan Rioux (2022). Data Analysis with Python and PySpark. Manning. New York. USA.
- Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills (2022). Advanced Analytics with PySpark. O'Reilly. USA.
- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia (2015). Learning Spark. O'Reilly. USA.
- Jake VanderPlas (2017). Python Data Science Handbook. O'Reilly.
- Mike Loukides, Hilary Mason, DJ Patil (2018). Ethics and Data Science. O'Reilly.
- Wes McKinney (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly.
|
Language |
Portuguese. Tutorial support is available in English.
|