Comparing TF-IDF, GloVe, and SBERT

July 06, 2023 (updated on February 28, 2025)

#machinelearning

I’m exploring better algorithms for findlike. At the moment, only lexical search algorithms have been implemented, but I know for a fact that semantic similarity trumps lexical search by a landslide.

So I ran a quick test to see how large the difference is between a context-aware model (SBERT), a context-free model (GloVe), and statistical algorithms (TF-IDF, BM25).

Table 1: Algorithms and the pre-trained models used in this experiment.

Algorithm	Pre-trained model
GloVe	glove-wiki-gigaword-50
SBERT	all-MiniLM-L6-v2
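
For reference, here is a minimal sketch of how these two pre-trained models can be loaded and compared, assuming the `gensim` and `sentence-transformers` packages are installed. The helper names (`load_glove`, `load_sbert`, `average_word_vectors`, `cosine`) and the regex tokenizer are made up for illustration, and this is not necessarily the exact code behind the numbers in this post.

```python
import math
import re


def cosine(u, v):
    """Cosine similarity between two equal-length sequences of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def load_glove(name="glove-wiki-gigaword-50"):
    # Imported lazily: gensim downloads the vectors on first use.
    import gensim.downloader as api
    return api.load(name)


def load_sbert(name="all-MiniLM-L6-v2"):
    # Imported lazily: the model weights are downloaded on first use.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(name)


def average_word_vectors(vectors, sentence):
    """Context-free sentence embedding: the plain mean of its word vectors.

    `vectors` is anything that supports `token in vectors` and
    `vectors[token]`, such as gensim KeyedVectors or a plain dict.
    The tokenizer is an assumption: lowercase alphabetic runs only.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token of the sentence is in the vocabulary")
    dim = len(known[0])
    return [sum(float(vec[i]) for vec in known) / len(known) for i in range(dim)]


# Usage (downloads both models the first time it runs):
#   glove, sbert = load_glove(), load_sbert()
#   cosine(average_word_vectors(glove, ref), average_word_vectors(glove, other))
#   cosine(sbert.encode(ref), sbert.encode(other))
```

Averaging discards word order entirely, whereas SBERT encodes the sentence as a whole, which is the difference this experiment is trying to measure.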

When comparing NLP models and algorithms, people tend to use the same cliché phrase pairs like “I like to watch television / I’m wearing a wristwatch”. So let’s perk things up a bit and use more realistic sentences, preferably ones that could well have come straight out of my digital garden:

Reference) “In fact, writing down what matters is an art, as students make a lot of effort to process information and shape it into something useful.”

Sentence A) “Summarizing and transforming huge chunks of text into meaningful knowledge is a rewarding, albeit demanding craft.”

Sentence B) “The Zettelkasten method is the preferred personal knowledge management system for avid note takers nowadays.”

Sentence C) “As a matter of fact, given this useful information, a lot of art students are down to make some effort and get into shape, he writes.”

Notice that:

Sentence A conveys more or less the same idea as the reference sentence, but with different wording.
Sentence B is loosely related to the reference sentence and belongs to the same overarching subject (information processing).
Sentence C has nothing to do with the reference sentence, but shares a lot of its word roots: “fact”, “write”, “matter”, “art”, “student”, “effort”, “information”, “shape”, “useful”.

Results

Sentence	Expected similarity	TF-IDF	BM25	GloVe (averaged)	GloVe + SIF	SBERT
A	medium/high	0.0	0.0	0.96	-0.29	0.39
B	medium/low	0.0	0.0	0.93	-0.63	0.17
C	low	0.75	1.59	0.99	0.47	0.69

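Unsurprisingly, the statistical approaches TF-IDF and BM25 see zero similarity between the reference sentence and A and B, and a lot of similarity between the reference and C, given that almost every content word in the former is also present in the latter. (BM25 is an unbounded relevance score rather than a cosine similarity, which is why its value for C can exceed 1.)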
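
To make the lexical effect concrete, here is a small standard-library sketch of TF-IDF with cosine similarity over the four sentences. The tokenizer, the tiny stop-word list, and the smoothed IDF formula (the form scikit-learn uses by default) are my assumptions for this illustration, and there is no stemming, so the value for C will not match the 0.75 in the table. What it does reproduce is the pattern: A and B share no content words with the reference, while C shares many.

```python
import math
import re
from collections import Counter

# A deliberately tiny stop-word list, chosen for this example only.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "for", "he", "in", "into",
    "is", "it", "of", "the", "this", "to", "what",
}

SENTENCES = {
    "ref": "In fact, writing down what matters is an art, as students make "
           "a lot of effort to process information and shape it into "
           "something useful.",
    "A": "Summarizing and transforming huge chunks of text into meaningful "
         "knowledge is a rewarding, albeit demanding craft.",
    "B": "The Zettelkasten method is the preferred personal knowledge "
         "management system for avid note takers nowadays.",
    "C": "As a matter of fact, given this useful information, a lot of art "
         "students are down to make some effort and get into shape, he writes.",
}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stop words (no stemming)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf * idf} dict."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return {
        name: {t: count * idf[t] for t, count in Counter(toks).items()}
        for name, toks in tokens.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[t] for t, weight in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = tfidf_vectors(SENTENCES)
similarity = {name: cosine(vectors["ref"], vectors[name]) for name in ("A", "B", "C")}
print(similarity)
```

Because the score depends only on shared terms, no amount of reweighting can lift A or B above zero here: their dot product with the reference is empty.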

However, I confess that I expected more from pre-trained SBERT. It wasn’t able to capture the essence of each sentence, especially for Reference and C: it scored C (0.69) well above A (0.39). Although Reference and C overlap lexically, the two sentences have significantly different themes.

Bruno Arine
