This course familiarizes participants with different aspects of large data sets and how they are managed, both on site and in the Cloud. It begins by defining Big Data and its five Vs: Volume, Velocity, Variety, Veracity, and Value. Emphasis is placed on giving participants hands-on experience across the full workflow, from data ingestion to analysis, for both data at rest and data in motion (streaming data). The course then covers the architectures of distributed databases and storage, as well as ecosystems such as Hadoop and Spark. It concludes with an introduction to Scala, the Spark shell (spark-shell), and PySpark.
This course is part of the Professional Development Certificate in Data Science and Machine Learning.
Prerequisites
Prerequisite course: YCBS 255
Applies Towards the Following Programs
- Professional Development Certificate in Data Science and Machine Learning: Required Courses