Data Science

A course by

kamalakar kamsani

Jan/2025 10 lessons English

What you’ll learn

Python for Data Science

Python Basics

Variables, Numbers, Strings
Lists, Dictionaries, Sets, Tuples
Conditional Statements: If-Elif-Else, Match-Case (Python has no switch statement; match-case, available from Python 3.10, fills that role)
Loops: For, While
Functions, Lambda functions
Libraries for Data Science: NumPy, pandas, scikit-learn, Matplotlib

Data Manipulation & Analysis with Python

NumPy: Working with arrays, mathematical operations
Pandas: DataFrames, handling missing data, groupby, merging and joining datasets
Matplotlib/Seaborn: Data visualization, plotting histograms, pie charts, scatter plots, etc.
scikit-learn: Basic machine learning functions, preprocessing

Math and Statistics for Data Science

Basics of Statistics

Descriptive vs Inferential statistics
Continuous vs Discrete Data, Nominal vs Ordinal Data
Measures of Central Tendency: Mean, Median, Mode
Measures of Dispersion: Variance, Standard Deviation
Probability Basics
Distributions: Normal Distribution, Exponential, Binomial
Correlation and Covariance
Central Limit Theorem
Hypothesis Testing: P-value, Confidence Interval, Type 1 vs Type 2 Error, Z-test, T-test, ANOVA
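
As a small illustration of the descriptive measures listed above, here is a minimal sketch using only Python's standard `statistics` module. The sample values are hypothetical and chosen only to keep the arithmetic easy to follow; the course itself may use NumPy or pandas for the same calculations.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion (population versions: divide by n, not n - 1)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", variance, "standard deviation:", std_dev)
```

For a sample rather than a full population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.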

Machine Learning – Preprocessing and Feature Engineering

Data Preprocessing

Handling NA values, outlier treatment
Data Normalization and Standardization
Encoding: One-Hot Encoding, Label Encoding
Feature Engineering and Feature Selection
Train-Test Split, Cross-validation
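
The normalization, standardization, and one-hot encoding topics above can be sketched in plain Python as follows. The function names are hypothetical helpers written for this illustration, not library APIs; in practice scikit-learn provides equivalent preprocessing tools.

```python
import statistics

def min_max_normalize(values):
    """Rescale values to the range [0, 1]. Assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation.
    Assumes the values are not all equal."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(labels):
    """Encode each label as a 0/1 vector, one column per category.
    Columns follow the sorted order of the distinct labels."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

# Hypothetical feature values and labels, for illustration only.
print(min_max_normalize([10, 20, 30]))
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
print(one_hot(["red", "green", "red"]))
```

Normalization bounds a feature to a fixed range, while standardization centers it and scales by its spread; which one fits depends on the model and the data.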

Machine Learning – Model Building

Supervised Learning

Types of Supervised Learning: Regression vs Classification
Linear Models: Linear Regression, Logistic Regression
Gradient Descent: Introduction to optimization
Nonlinear Models (Tree-based Models):
- Decision Tree, Random Forest, XGBoost
Model Evaluation:
- Regression Metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE)
- Classification Metrics: Accuracy, Precision, Recall, F1 Score, ROC Curve, Confusion Matrix
Hyperparameter Tuning: GridSearchCV, RandomizedSearchCV
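
To make the classification metrics listed above concrete, here is a minimal sketch that derives them from the four confusion-matrix counts for a binary problem. The function name and the example labels are hypothetical; scikit-learn's `sklearn.metrics` module offers ready-made versions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute binary classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and the F1 score is their harmonic mean.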

Unsupervised Learning

Clustering: K-means, Hierarchical Clustering
Dimensionality Reduction: Principal Component Analysis (PCA)

Deep Learning

Introduction to Neural Networks

Basics of Artificial Neural Networks (ANN)
Convolutional Neural Networks (CNN): Understanding CNN architecture for image recognition
Recurrent Neural Networks (RNN): LSTM and GRU networks for sequential data

Natural Language Processing (NLP)

Text Processing and Feature Extraction

Regular Expressions (Regex) for text preprocessing
Text Representation:
- Count Vectorizer, TF-IDF, Bag of Words (BOW), Word2Vec
- Introduction to Word Embeddings and Pre-trained models like GloVe
Text Classification: Using Naïve Bayes, SVM, and Logistic Regression for NLP
spaCy & NLTK: Introduction to popular NLP libraries for text processing and feature extraction
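
As a rough sketch of how regex preprocessing feeds a Bag of Words representation, the example below tokenizes text with a regular expression and counts terms per document. The helper names and the sample sentences are hypothetical, and the tokenizer is deliberately simplistic (lowercase letters, digits, and apostrophes only); scikit-learn's CountVectorizer covers the same idea with many more options.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(documents):
    """Return a sorted vocabulary and one term-count vector per document."""
    vocabulary = sorted({tok for doc in documents for tok in tokenize(doc)})
    vectors = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        # A Counter returns 0 for terms that do not occur in the document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

# Hypothetical documents, for illustration only.
docs = ["The cat sat.", "The cat saw the dog!"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

TF-IDF builds on these raw counts by down-weighting terms that appear in many documents.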

End-to-End Machine Learning & Deep Learning Projects

Project 1: Predictive Analytics for Sales Forecasting using Linear Regression and Time-Series Forecasting
Project 2: Customer Segmentation using K-Means Clustering and Hierarchical Clustering
Project 3: Sentiment Analysis using NLP and Deep Learning models (e.g., LSTM, BERT)
Capstone Project: End-to-End Data Science Project to solve a real-world business problem, including data collection, cleaning, modeling, evaluation, and deployment.

Model Deployment and Cloud Computing

Model Deployment: Using Flask/Django to create REST APIs for model deployment
Introduction to Docker for containerization
Deployment on Cloud Platforms (AWS, GCP, or Azure) using services like Amazon SageMaker or Azure Machine Learning
CI/CD Pipelines for Machine Learning models: Automating the process of retraining and redeployment

Data Science Tools and Technologies

Version Control: Using Git/GitHub for managing code
Big Data: Introduction to handling big data with Hadoop and Spark
Cloud Computing: Overview of using AWS, Google Cloud, and Azure for scalable infrastructure and deployments
Data Visualization: Advanced visualization techniques with Tableau, Power BI, and Plotly for interactive dashboards

Career Guidance and Job Preparation

Resume building and job application strategies for data science roles
Common Interview Questions: SQL, Python, Machine Learning, and Statistics
Mock Interviews with Data Science professionals
Portfolio Development: Building a strong GitHub profile with your projects
Interview Coaching: Tips on handling coding rounds, case studies, and behavioral questions

End Goal of the Course

This course is designed to give you the skills needed to work as a Data Scientist. By covering programming, statistical analysis, machine learning, deep learning, and data visualization, it prepares you to solve complex problems, analyze data efficiently, and deploy predictive models. The three guided projects and the Capstone Project let you showcase your skills and build a strong portfolio for future employers. By the end of the course, you will have the practical knowledge needed to pursue roles such as Data Scientist, Machine Learning Engineer, Data Analyst, and AI Engineer.