Overview of Extractive Summarization Techniques

I. Introduction

Extractive summarization selects sentences directly from the source text and assembles them into a summary.


The challenge lies in capturing the context of each sentence and identifying the relationships between sentences and words.

II. Techniques

Algorithm: TextRank
Description: TextRank is a graph-based ranking algorithm inspired by PageRank. It builds a graph whose nodes are words or sentences. For keyword extraction, words are connected when they appear near each other in the text; for summarization, sentences are connected with a weight based on the number of words they share. The nodes are then ranked as in PageRank.
Weakness: Similarity is based on word overlap only, so it may not capture more complex relationships between sentences accurately.
Link: TextRank: A Graph-Based NLP Algorithm (Networks Course blog for INFO 2040/CS 2850/Econ 2040/SOC 2090)

Algorithm: LexRank
Description: LexRank is similar to TextRank but measures sentence similarity as the cosine similarity of TF-IDF sentence vectors. It is more tailored towards extracting information from multiple texts written about the same topic.
Weakness: May not perform well on a set of unclustered or unrelated documents.
Link: LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

Algorithm: LSA
Description: LSA (Latent Semantic Analysis) creates a term-sentence matrix (the frequency of each word within each sentence of the document), then applies SVD (Singular Value Decomposition) to learn about relationships between words and sentences.
Weakness: Struggles with polysemy (one word with several meanings) and captures synonyms only partially.
Link: Latent Semantic Analysis
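
To make the TextRank and LexRank descriptions above more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a reference implementation: the function names (tokenize, overlap_similarity, tfidf_vectors, cosine, rank, summarize) are hypothetical, the regex tokenizer is a simplification, the damping factor of 0.85 and the fixed 50 iterations are assumed defaults, and the smoothed IDF formula is one common choice among several. The "lexrank" mode uses raw cosine weights as edges without any similarity threshold.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (simplified)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def overlap_similarity(a, b):
    """TextRank-style similarity: shared words, normalized by log lengths."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per sentence."""
    n = len(token_lists)
    df = Counter(w for tokens in token_lists for w in set(tokens))
    # Assumption: smoothed IDF so that no term gets a zero weight.
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [{w: tf * idf[w] for w, tf in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(weights, damping=0.85, iterations=50):
    """PageRank-style scoring on a weighted similarity matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to edge weight.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, k=2, method="textrank"):
    """Return the k highest-ranked sentences in their original order."""
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    if method == "lexrank":
        vectors = tfidf_vectors(tokens)
        sim = lambda i, j: cosine(vectors[i], vectors[j])
    else:
        sim = lambda i, j: overlap_similarity(tokens[i], tokens[j])
    weights = [[sim(i, j) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    scores = rank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(sentences, k=3, method="lexrank") on a list of sentence strings switches the edge weights from word overlap to TF-IDF cosine similarity while the ranking step stays the same, which reflects how closely the two algorithms are related. A sentence that shares no words with any other sentence receives only the base score and is ranked last.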

III. Improvements

Co-reference resolution





For a quick search of relevant research papers with Q&A (question answering): https://typeset.io/

Quick Note

This page will be frequently updated. If you are passionate about NLP or an expert in the field, please reach out! I am happy to hear your input and advice!