IT1110 DATA SCIENCE & BIG DATA ANALYTICS
Instructor: Ms. Selva Mary. G
Place: UB 812, Information Technology, SRM University
Email ID: [email protected]
Syllabus: IT1110_Syllabus (New) | IT1110_Lesson-plan

Study Materials
UNIT I – INTRODUCTION TO DATA SCIENCE (6 hours)
Introduction of Data Science – Basic Data Analytics using R – R Graphical User Interfaces – Data Import and Export – Attribute and Data Types – Descriptive Statistics – Exploratory Data Analysis – Visualization Before Analysis – Dirty Data – Visualizing a Single Variable – Examining Multiple Variables – Data Exploration Versus Presentation.
Unit_1_Part_Introduction

UNIT II – ADVANCED ANALYTICAL THEORY AND METHODS (6 hours)
Overview of Clustering – K-means – Use Cases – Overview of the Method – Perform a K-means Analysis using R – Classification – Decision Trees – Overview of a Decision Tree – Decision Tree Algorithms – Evaluating a Decision Tree – Decision Tree in R – Bayes' Theorem – Naïve Bayes Classifier – Smoothing – Naïve Bayes in R.

UNIT III – BIG DATA FROM DIFFERENT PERSPECTIVES (6 hours)
Big Data from Business Perspective: Introduction of Big Data – Characteristics of Big Data – Data in the Warehouse and Data in Hadoop – Importance of Big Data – Big Data Use Cases: Patterns for Big Data Deployment.
Big Data from Technology Perspective: History of Hadoop – Components of Hadoop – Application Development in Hadoop – Getting Your Data into Hadoop – Other Hadoop Components.
Unit_III_notes.pdf | Unit_III_PPT.pdf

UNIT IV – HADOOP DISTRIBUTED FILE SYSTEM ARCHITECTURE (6 hours)
HDFS Architecture – HDFS Concepts – Blocks – NameNode – Secondary NameNode – DataNode – HDFS Federation – Basic File System Operations – Data Flow – Anatomy of File Read – Anatomy of File Write.
Unit-IV_PPT.pdf | Unit-IV_NOTES.PDF

UNIT V – PROCESSING YOUR DATA WITH MAPREDUCE (6 hours)
Getting to know MapReduce – MapReduce Execution Pipeline – Runtime Coordination and Task Management – MapReduce Application – Hadoop Word Count Implementation.
Unit-V.PDF | Unit-V.PPT