This course familiarizes participants with different aspects of large data sets and how they are managed both on site and in the Cloud. Emphasis is placed on providing participants with hands-on experience, from data ingestion to analysis of large data sets, covering both data-at-rest and data-in-motion (streaming data). Topics include the definition of Big Data and its 5 V's: Volume, Velocity, Variety, Veracity, and Value. Architectures of distributed databases and storage, and ecosystems such as Hadoop and Spark, are covered, followed by an introduction to Scala, Spark-Shell, and PySpark.
This class is supported by DataCamp, the most intuitive learning platform for data science. Learn R, Python, and SQL the way you learn best, through a combination of short expert videos and hands-on-the-keyboard exercises. Take more than 100 courses by expert instructors on topics such as importing data, data visualization, or machine learning, and learn faster through immediate and personalised feedback on every exercise.
Prerequisites
Pre-requisite Course: YCBS 255
Applies Towards the Following Programs
- Professional Development Certificate in Data Science and Machine Learning: Required Courses