Practical Uses of BERT

Sayan Chakraborty
4 min read · Jan 4, 2020

BERT (Bidirectional Encoder Representations from Transformers) is one of the most significant recent advances in pre-trained models for Natural Language Processing. This transfer-learning model makes it easy to fine-tune NLP tasks for specific fields of human interest (science, technology, commerce, etc.). This technology review analyzes different use cases of BERT and how it is applied to solve NLP problems.

What is BERT & why is it useful?

BERT is built on a couple of strong building blocks from the latest state-of-the-art NLP techniques: the self-attention mechanism, Transformers, and bidirectional encoders[1]. The full design detail of the BERT model is outside the scope of this technology review, but we will discuss some important aspects to understand how it helps in several NLP tasks. There are two steps in the BERT framework: pre-training and fine-tuning.

During pre-training, the model is trained deeply and bidirectionally on unlabeled text from Wikipedia and BookCorpus. This is done with two unsupervised tasks: (i) Masked Language Model (MLM), which masks 15% of the input tokens of sentences/paragraphs and predicts them, and (ii) Next Sentence Prediction (NSP), because many NLP tasks depend on understanding the relationship between sentences. After pre-training, the transfer model is ready for several NLP tasks in the fine-tuning step.
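
To make the MLM idea concrete, here is a minimal sketch using the Hugging Face transformers library (the library and the checkpoint name are assumptions of this illustration, not something the BERT paper prescribes): a pre-trained checkpoint fills in a masked token using context from both directions.

```python
# Minimal sketch of the masked language model (MLM) objective, using the
# Hugging Face `transformers` library (an assumption of this example,
# not part of the original BERT release).
from transformers import pipeline

# Load a pre-trained BERT checkpoint together with its MLM head.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked token from both its left and right context.
for prediction in fill_mask("The doctor prescribed a new [MASK] for the patient."):
    print(prediction["token_str"], round(prediction["score"], 3))
```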

During fine-tuning, the pre-trained model is fed labelled data for different NLP tasks and all the BERT parameters are updated. Each NLP task therefore ends up with its own fine-tuned BERT model (a minimal fine-tuning sketch follows the list below). So, as we can see, the practical usability of BERT comes mainly from the following characteristics –

  • A pre-trained transfer model is ready to use; there is no need to train from scratch.
  • The fine-tuning process makes the model useful for targeted NLP tasks like question answering, named entity recognition, auto-summarization, etc.
  • Fine-tuning is very fast.
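
Below is a minimal fine-tuning sketch, assuming PyTorch and the Hugging Face transformers library; the toy data, label set, and hyperparameters are illustrative, not values from the paper. A classification head is added on top of pre-trained BERT and all parameters are updated on labelled examples.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Pre-trained BERT plus a randomly initialised 2-class classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labelled data for illustration only (1 = positive, 0 = negative).
texts = ["the movie was wonderful", "a complete waste of time"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few epochs are usually enough when fine-tuning
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()  # gradients flow through every BERT parameter
    optimizer.step()
```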

Applications

· BERTSUM or Text Summarization[2]:

BERT can be usefully applied to text summarization, and BERTSUM[2] proposes a general framework on top of it for both extractive and abstractive models.

Extractive summarization systems create a summary by identifying (and subsequently concatenating) the most important sentences in a document. A neural encoder creates sentence representations and a classifier predicts which sentences should be selected for the summary.
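
This is not BERTSUM itself, but the extractive idea can be sketched with pre-trained BERT alone: embed each sentence, then keep the sentences whose embeddings are closest to an embedding of the whole document. The library (Hugging Face transformers), the mean-pooling, and the cosine-similarity scoring rule are simplifying assumptions; BERTSUM instead inserts a [CLS] token before each sentence and trains a classifier on top.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text):
    """Mean-pooled BERT token embeddings as a crude sentence vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

sentences = [
    "BERT is a pre-trained bidirectional Transformer encoder.",
    "It was trained on Wikipedia and BookCorpus.",
    "Unrelated filler text about the weather.",
]
doc_vec = embed(" ".join(sentences))
scores = [torch.cosine_similarity(embed(s), doc_vec, dim=0) for s in sentences]

# The extractive "summary": the top-scoring sentence(s), kept in document order.
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```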

Abstractive summarization, on the other hand, generates novel sentences by rephrasing or using new words, instead of simply extracting the most important sentences. Neural approaches to abstractive summarization conceptualize the task as a sequence-to-sequence problem.

· Google Smart Search[3]

With BERT, Google Search can now understand the intention behind a search query and provide more relevant results. This is possible because BERT uses the Transformer to process all the tokens of a query in parallel, with self-attention relating each token to every other token.

See Google's blog post[3] for examples of how search results changed.

· SciBERT[4]

The exponential increase in the volume of scientific publications in the past decades has made NLP an essential tool for large-scale knowledge extraction and machine reading of these documents. SciBERT focuses on scientific NLP tasks rather than general language modelling. The model was trained on a random sample of 1.14 million papers from Semantic Scholar; the corpus consists of 18% papers from computer science and 82% from the broad biomedical domain. SciBERT outperforms BERT-Base on several scientific and medical NLP tasks.
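
If the Hugging Face transformers library is used (an assumption of this sketch), switching from the general-domain checkpoint to SciBERT is essentially a one-line change, since AllenAI publishes the weights as allenai/scibert_scivocab_uncased:

```python
from transformers import AutoTokenizer, AutoModel

# SciBERT weights and its scientific-text vocabulary, published by AllenAI.
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

# The vocabulary was built from scientific papers, so domain terms are split
# into fewer word pieces than with bert-base-uncased.
print(tokenizer.tokenize("The perovskite photovoltaic cell degraded under UV exposure."))
```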

· BioBERT[5]

Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora.

BioBERT is a domain-specific language representation model pre-trained on large-scale biomedical corpora. BioBERT significantly outperforms BERT and previous state-of-the-art models on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts.
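
As a rough sketch of how BioBERT is adapted to a task such as biomedical named entity recognition: load the domain pre-trained weights, add a token-classification head, and fine-tune on a labelled corpus such as NCBI-disease. The hub id dmis-lab/biobert-v1.1 and the BIO label set below are assumptions made for illustration.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-Disease", "I-Disease"]  # BIO tags for a disease-NER task
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModelForTokenClassification.from_pretrained(
    "dmis-lab/biobert-v1.1", num_labels=len(labels)
)

# Before fine-tuning, the NER head is randomly initialised; it is trained
# exactly like the classification sketch above, but with one label per token.
tokens = tokenizer("Mutations in BRCA1 are associated with breast cancer.",
                   return_tensors="pt")
logits = model(**tokens).logits  # shape: (1, seq_len, num_labels)
```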

· ClinicalBERT[6]

ClinicalBERT models clinical notes and predicts hospital readmission from contextual embeddings of clinical text. ClinicalBERT uncovers high-quality relationships between medical concepts, as judged by humans, and outperforms baselines on 30-day hospital readmission prediction using both discharge summaries and the first few days of notes in the intensive care unit. Used this way, it can save money, time, and lives.

· Question Answering and ChatBot

BERT pushed the SQuAD (Stanford Question Answering Dataset) v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and the SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement). SQuAD is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable (v2.0).[1]
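
A minimal extractive question-answering sketch, assuming the Hugging Face transformers pipeline and a BERT checkpoint already fine-tuned on SQuAD (the checkpoint name below is one shipped with that library, used here purely for illustration):

```python
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

context = (
    "BERT was pre-trained on unlabeled text from Wikipedia and BookCorpus "
    "using masked language modelling and next sentence prediction."
)
result = qa(question="What data was BERT pre-trained on?", context=context)

# The answer is a span copied from the context, exactly as in SQuAD.
print(result["answer"], result["score"])
```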

The same question-answering capability of BERT can be extended to build chatbots over small to large bodies of text.

References

[1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[2] Text Summarization with Pretrained Encoders

[3] Google Blog on search

[4] SciBERT: A Pretrained Language Model for Scientific Text

[5] BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining

[6] ClinicalBert: Modeling Clinical Notes and Predicting Hospital Readmission
