Data Science Syllabus



The skill set required for Data Science is a combination of different skills, such as programming, mathematics and people skills.
Course contents:

Module 1: Introduction to Data Science

➢ What is Data Science?

➢ Skillsets required for Data Scientists

➢ Data Science Process

➢ Standard Lifecycle of Data Science Projects

➢ Job opportunities and demand for Data Scientists

➢ What is Business Intelligence

➢ What is Data Mining

➢ What is Analytics

➢ Types of Analytics

➢ Data Science Roles, Responsibilities, Jobs and Market Demand

➢ What is Machine Learning

➢ What is Deep Learning

➢ What is AI

❖ Data

➢ What is Data

➢ Types of Data

➢ Data collection types

➢ Data Architecture

➢ Components of Data Architecture

Module 2: PYTHON for Data Science

❖ Python programming for Data Science

➢ Python Environment Setup and Essentials

➢ Anaconda & Jupyter Notebook Installation

➢ Variable Assignment, operators, Data types

➢ Indexing & Slicing

➢ Data structures: Lists, Tuples, Sets, Dictionaries

➢ Functions

➢ Control flow statements: if, for, while

➢ Map, Filter and Reduce functions

➢ Lambdas and List Comprehensions
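
As a small illustration of the last few topics above (map, filter, reduce, lambdas and list comprehensions), here is a minimal sketch using only the Python standard library. The `prices` list and the 10% tax rate are made-up values, not part of the course material.

```python
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
prices = [120, 80, 45, 200]

# map: apply a lambda to every element (add an assumed 10% tax).
with_tax = list(map(lambda p: round(p * 1.10, 2), prices))

# filter: keep only the elements that satisfy a condition.
expensive = list(filter(lambda p: p >= 100, prices))

# reduce: fold the list into a single value (the total).
total = reduce(lambda acc, p: acc + p, prices, 0)

# List comprehension: the filter and map steps in one expression.
expensive_with_tax = [round(p * 1.10, 2) for p in prices if p >= 100]

print(with_tax, expensive, total, expensive_with_tax)
```

In everyday Python the list comprehension is usually preferred over map/filter with lambdas, but all three forms are worth recognising.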

❖ Numerical Computing using NumPy

➢ ndarray: Purpose, Properties, Types, Axis

➢ Creating 1D, 2D and 3D arrays

➢ Accessing Array Elements

➢ Indexing, Slicing, Iteration, with Boolean and Integer Arrays

➢ Array manipulation

➢ Linear Algebra using NumPy

❖ Data Analysis using PANDAS

➢ Understanding Pandas

➢ Defining Data Structures: Series, DataFrames, Panels (Panel was removed in pandas 0.25; current versions use MultiIndex DataFrames instead)

➢ Working with Series and Data Frames

➢ DataFrame operations

➢ Indexing: .loc and .iloc

➢ DataFrame functions: pipe/apply/applymap

❖ Data Analysis: 

➢ Importing and exporting data

➢ Cleaning data (filtering, removing duplicates, etc.)

➢ Handling missing values

➢ Data wrangling

➢ Grouping and aggregation

➢ Merging, joining and concatenation

❖ Data Visualization using Matplotlib & Seaborn

➢ Features of Matplotlib

Module 3: STATISTICS

Descriptive Statistics

➢ Variables in Statistics

➢ Measuring Central Tendency – Mean, Median, Mode

➢ Measuring Spread – Range, Quartiles, Variance and Standard Deviation

➢ Understanding Numeric Data – Uniform and Normal Distributions

➢ Probability Refresher

➢ Probability density functions

➢ Central Limit Theorem
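
The descriptive measures listed above can be tried out with Python's built-in `statistics` module. This is a minimal sketch on a made-up sample, not course data. It uses the population forms of variance and standard deviation, and `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics

# Hypothetical sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(data)      # 5
median = statistics.median(data)  # 4.5
mode = statistics.mode(data)      # 4

# Spread
data_range = max(data) - min(data)           # 7
quartiles = statistics.quantiles(data, n=4)  # three cut points: Q1, Q2, Q3
pop_variance = statistics.pvariance(data)    # 4
pop_std = statistics.pstdev(data)            # 2.0

print(mean, median, mode, data_range, quartiles, pop_variance, pop_std)
```

For a sample rather than a full population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.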

Hypothesis Testing & Inferential Statistics

➢ Importance of Hypothesis Testing in Business

➢ Null and Alternative Hypotheses

➢ Type 1 and Type 2 Errors

➢ Significance level and Power

➢ Upper Tail Test and Test Statistics

➢ Z-test, t-test and F-test

➢ Chi-Square Test

➢ ANOVA

➢ Correlation and covariance

➢ Linear Regression, Logistic Regression
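
To make the hypothesis-testing topics above concrete, here is a minimal sketch of a one-sample, upper-tail Z-test using only the standard library. The sample values, the null mean `mu_0`, the known population standard deviation `sigma` and the significance level `alpha` are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical numbers, for illustration only.
sample = [52, 55, 49, 58, 54, 53, 57, 51, 56]
mu_0 = 50     # mean under the null hypothesis H0
sigma = 4     # population standard deviation, assumed known
alpha = 0.05  # significance level

# Test statistic: z = (sample mean - mu_0) / (sigma / sqrt(n))
n = len(sample)
z = (mean(sample) - mu_0) / (sigma / sqrt(n))

# Upper-tail p-value: P(Z >= z) under the standard normal distribution.
p_value = 1 - NormalDist().cdf(z)

# Reject H0 in favour of H1 (mean > mu_0) when the p-value is below alpha.
reject_h0 = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

When the population standard deviation is unknown, which is the more common case, a t-test on the sample standard deviation would be used instead.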

Module 4: Exploratory Data Analysis [EDA]

➢ What is EDA

➢ Goals of EDA

➢ Introduction to Statistical Plots

➢ Visualizing Numeric Variables

➢ Visualizing Categorical Variables

➢ One Dimensional Charts

➢ Histograms

➢ Bar Charts

➢ Two Dimensional Charts

➢ Visualizing Relationships – Scatterplots

➢ Box Plots

➢ Multi-Dimensional Plots

Module 5: MACHINE LEARNING

Introduction to Machine Learning using Scikit Learn

➢ What is Machine Learning?

➢ How do Machines Learn?

➢ Abstraction and Knowledge Representation

➢ Generalization

➢ Steps to apply Machine Learning to your Data

➢ Choosing a Machine Learning Algorithm

➢ Introduction to Types of Machine Learning Algorithms

Supervised Learning Techniques and Algorithms

➢ Steps in Supervised Learning Techniques and Algorithms

➢ Understanding Process Flow of Supervised Learning Techniques

➢ Training, Validation and Testing

➢ Regression

➢ Gradient Descent

➢ Classification

➢ Measures of Performance

➢ R-Square and RMSE

➢ Confusion Matrix

➢ Accuracy, Precision and Recall

➢ F-Score

➢ ROC curve (Receiver Operating Characteristic curve)

➢ Bias – Variance tradeoff

➢ Underfitting and Overfitting

➢ Understanding Classification and Prediction

➢ K-NN, Naïve Bayes, Support Vector Machines

➢ Decision Trees and Random Forests
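
The classification measures in the list above (confusion matrix, accuracy, precision, recall, F-score) can be computed by hand for a binary problem. This is a minimal sketch in plain Python; the labels are made up for illustration. In practice, scikit-learn's `sklearn.metrics` module offers equivalent functions.

```python
# Hypothetical true and predicted labels (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

pairs = list(zip(y_true, y_pred))

# The four cells of the binary confusion matrix.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)  # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted positives that are correct
recall = tp / (tp + fn)            # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(tp, tn, fp, fn, accuracy, precision, recall, f1)
```

This sketch assumes at least one predicted positive and one actual positive; otherwise the precision or recall denominator would be zero.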

Unsupervised Learning Techniques & Algorithms

➢ Studying Clustering

➢ Understanding K-means Clustering

➢ What is Hierarchical Clustering?

➢ Hierarchical Clustering Algorithm

➢ Association Rule Mining
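
As an illustration of the K-means topic above, here is a minimal sketch of the assign-then-update loop on one-dimensional data, in plain Python. The function name `k_means_1d`, the sample points and the starting centroids are hypothetical. Real projects would normally use a library implementation such as scikit-learn's `KMeans`.

```python
def k_means_1d(points, centroids, iterations=10):
    """Simple k-means on 1-D data; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
            for x in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical data with two visible groups, and assumed starting centroids.
points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centroids, labels = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, labels)
```

A fixed number of iterations keeps the sketch short. A fuller version would stop once the assignments no longer change.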

Module 6: Deep Learning and Computer Vision

➢ Understanding Neural Networks

➢ Network Topology

➢ Feed-Forward Neural Networks

➢ Recurrent and Gaussian Neural Network

➢ Training Neural Networks with Backpropagation

➢ Artificial Neural Network

➢ Recurrent Neural Network

➢ Introduction to Computer Vision

➢ Convolutional Neural Network

➢ Transfer Learning

➢ Introduction to TensorFlow and Keras

➢ Building Neural Networks using TensorFlow

Module 7: Natural Language Processing (NLP)

➢ NLP Environment Setup & Applications

➢ NLP Sentence Analysis & Libraries

➢ NLTK

➢ Lemmatization

➢ Stemming

➢ Topic modelling

Module 8: Tableau

Module 9: Structured Query Language (SQL)

Module 10: Projects 

* You can use the R language instead of Python.

* A basic understanding of Big Data and cloud systems will give you an additional advantage.






