Yury Kashnitsky: Firing a cannon at sparrows: BERT vs. logreg
Data Fest Online 2020
Catalyst Workshop Track
There is a Golden Rule in NLP, at least when it comes to classification tasks: “Always start with a tf-idf + logreg baseline”. In other words, build a logistic regression model on top of a tf-idf (term frequency-inverse document frequency) text representation.
This typically works fairly well, is simpler to deploy than a neural net, and is often what is already deployed and working day and night while you are struggling with fancy transformers. In this presentation, we will go through a couple of real-world text classification problems and speculate on the reasons to resort to BERT as opposed to good old tf-idf & logreg.
Along the way, we will also discuss a Catalyst text classification pipeline built with HuggingFace.
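For readers unfamiliar with the baseline named above, here is a minimal sketch of a tf-idf + logistic regression classifier. It assumes scikit-learn is installed; the toy texts, labels, and variable names are made up for illustration and are not taken from the talk.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy dataset: 1 = positive review, 0 = negative review.
train_texts = [
    "great movie, loved it",
    "wonderful acting and great plot",
    "terrible movie, hated it",
    "awful plot and terrible acting",
]
train_labels = [1, 1, 0, 0]

# tf-idf turns each text into a sparse weighted bag-of-words vector;
# logistic regression then fits a linear classifier on those vectors.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(train_texts, train_labels)

# Score two unseen texts with the fitted pipeline.
test_texts = ["loved the wonderful plot", "hated the awful movie"]
predictions = baseline.predict(test_texts)
print(list(predictions))
```

Because the whole model is a single fitted pipeline object, it can be serialized and served without a GPU, which is part of why such a baseline is easy to deploy.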