Data School – Machine Learning with Text in Python
Solve text-based data science problems using Machine Learning and Natural Language Processing!
Are you trying to master Machine Learning in Python, but tired of wasting your time on courses that don’t move you towards your goal? Do you recognize the enormous value of text-based data, but don’t know how to apply the right Machine Learning and Natural Language Processing techniques to extract that value?
In this Data School course, you’ll gain hands-on experience using Machine Learning and Natural Language Processing to solve text-based data science problems. By the end of the course, you’ll be able to confidently apply these techniques to your own data science problems.
What You’ll Learn In Machine Learning with Text in Python
Module 1: Working with Text Data in scikit-learn
By the end of this module, you’ll be able to confidently perform the basic workflow for Machine Learning with text: creating a dataset, extracting features from unstructured text, building and evaluating models, and inspecting models for further insight. You’ll also gain an understanding of Unicode, enabling you to troubleshoot encoding-based errors.
- Extracting features from unstructured text using CountVectorizer
- Building a MultinomialNB model for text classification
- Examining a model for further insight
- Model evaluation
- Comparing MultinomialNB with LogisticRegression
- Building a new dataset from individual text files using pandas
- Unicode basics
- Handling Unicode errors
Module 2: Applying Natural Language Processing Techniques to Machine Learning
By the end of this module, you’ll be able to apply a handful of Natural Language Processing techniques to Machine Learning problems in order to improve the effectiveness of your models. You’ll also learn how to perform sentiment analysis and build a simple document summarization tool for your own corpus of text.
- What is Natural Language Processing (NLP)?
- NLP terminology and examples
- Tuning CountVectorizer for better model performance
- Term Frequency-Inverse Document Frequency (TF-IDF) using TfidfVectorizer
- Text summarization
- Sentiment analysis using TextBlob
Module 3: Parsing Text Data Using Regular Expressions
By the end of this module, you’ll be able to extract text features from messy data sources using regular expressions. You’ll learn the basic rules and syntax that can be applied across programming languages, and you’ll master the most important Python functions and options for working with regular expressions.
- Basic rules and principles
- Searching with re.search
- Metacharacters
- Greedy and lazy quantifiers
- Match groups
- Character classes
- Alternatives
- Substitution with re.sub
- Anchors
- Option flags
- Efficiently searching for multiple matches with re.findall
- Improving performance with re.compile
- Writing readable regular expressions with re.VERBOSE
Module 4: Workflow for a Text-Based Data Science Problem
By the end of this module, you’ll be able to create an end-to-end workflow for solving a text-based data science problem using scikit-learn and pandas. You’ll gain experience with data exploration, feature engineering, proper model evaluation, model tuning, and generating predictions for new observations.
- Data exploration and visualization
- Feature engineering using pandas
- Custom tokenization using regular expressions
- Multi-class classification
- Model evaluation
- Searching for optimal tuning parameters using GridSearchCV
- Chaining steps into a Pipeline
- Making predictions for out-of-sample data
Module 5: Advanced Machine Learning Techniques
By the end of this module, you’ll be able to apply advanced Machine Learning techniques to improve the accuracy of your models and the efficiency of your workflow. You’ll learn how to build and tune a multi-step, multi-layer Machine Learning pipeline, as well as how to ensemble and stack your models.
- Using a Pipeline for proper cross-validation
- Tuning a Pipeline with GridSearchCV
- Efficiently searching for tuning parameters using RandomizedSearchCV
- Stacking sparse and dense feature matrices using SciPy
- Combining the results of multiple feature extraction processes using FeatureUnion
- Building multi-level pipelines and feature unions
- Building custom transformers using FunctionTransformer
- Improving classifier performance through ensembling
- Unsupervised document clustering using cosine similarity
- Basic strategies for model stacking
More courses from the same author: Data School
Reviews
There are no reviews yet.