Embedding

Embedding - the process of converting words (more precisely, the tokens produced by tokenization, which happens first) into vectors of numbers
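A minimal sketch of the idea, assuming a tiny made-up vocabulary and hand-written vector values (real models learn these values during training). Each word gets an integer ID, and embedding is then a lookup of that ID's row in an embedding matrix. The names `vocab`, `embedding_matrix` and `embed` are hypothetical.

```python
# Toy vocabulary: each word (token) is mapped to an integer ID,
# as a tokenizer would do. Hypothetical values for illustration.
vocab = {"cat": 0, "dog": 1, "car": 2}

# Toy embedding matrix: one row (vector) per vocabulary entry.
# Real models learn these numbers; these are made up.
embedding_matrix = [
    [0.8, 0.1, 0.3],  # "cat"
    [0.7, 0.2, 0.3],  # "dog"
    [0.1, 0.9, 0.6],  # "car"
]

def embed(words):
    """Convert a list of words to vectors by looking up each word's row."""
    return [embedding_matrix[vocab[w]] for w in words]

print(embed(["dog", "car"]))  # [[0.7, 0.2, 0.3], [0.1, 0.9, 0.6]]
```

Note that "cat" and "dog" were deliberately given similar vectors and "car" a different one; in a trained model such closeness emerges from the training data.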

Word embedding

LLMs use word embeddings to represent the meaning of words; these representations are used in, among other things, generating responses.

Embedding is a way to describe the relationships between words using numbers: words with related meanings end up with vectors that are close to each other.
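To illustrate how relationships can be captured as numbers, here is a sketch with hand-picked 3-dimensional vectors whose dimensions are assumed to mean roughly [royalty, male, female]. It reproduces the well-known "king - man + woman ≈ queen" pattern reported for learned word embeddings; the vectors, dimension meanings and helper names are all hypothetical.

```python
import math

# Hand-picked vectors; dimensions loosely mean [royalty, male, female].
# Real embeddings are learned and their dimensions have no fixed labels.
vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def nearest(target, exclude):
    """Return the word whose vector has the smallest Euclidean distance to target."""
    candidates = (w for w in vectors if w not in exclude)
    return min(candidates, key=lambda w: math.dist(vectors[w], target))

# king - man + woman: remove "male", add "female", keep "royalty".
result = add(sub(vectors["king"], vectors["man"]), vectors["woman"])
print(nearest(result, exclude={"king", "man", "woman"}))  # queen
```

The input words are excluded from the search because, in real embeddings, the closest vector to the result is often one of the inputs themselves.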

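The GPT-3 models describe each token with 768 dimensions in the smallest variant (125M parameters) and with 12,288 dimensions in the largest (175B parameters).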
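For a sense of scale: an embedding table stores one vector per vocabulary entry, so it holds vocabulary size × dimensions numbers. The sketch below assumes a vocabulary of 50,257 tokens (the size of the GPT-2 BPE vocabulary, which GPT-3 reuses) and compares 768 dimensions with 12,288 (the width of the largest GPT-3 model). The function name is hypothetical.

```python
# Assumption: 50,257-token vocabulary (GPT-2 BPE vocabulary, reused by GPT-3).
VOCAB_SIZE = 50_257

def embedding_table_size(vocab_size, dims):
    """Number of values in an embedding table: one per (token, dimension) pair."""
    return vocab_size * dims

for dims in (768, 12_288):
    size = embedding_table_size(VOCAB_SIZE, dims)
    print(f"{dims:>6} dims -> {size:,} numbers in the embedding table")
```

This prints 38,597,376 numbers for 768 dimensions and 617,558,016 for 12,288 dimensions, which shows why wider embeddings make the model noticeably larger.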

Sentence embedding - the process of describing the meaning of longer content, such as whole sentences or paragraphs, with a single vector. This process takes into account:

In this setup, the similarity between two vectors (measured, for example, with cosine similarity) reflects the similarity of the information they represent.
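A minimal sketch of sentence similarity, assuming the simplest possible sentence embedding: averaging the word vectors of the sentence (mean pooling). Dedicated sentence-embedding models are trained specifically for this task and work differently; the word vectors and function names here are made up for illustration.

```python
import math

# Hypothetical toy word vectors (real models learn these).
word_vectors = {
    "the":   [0.1, 0.1, 0.1],
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.0],
    "sat":   [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "fell":  [0.1, 0.6, 0.5],
}

def sentence_embedding(sentence):
    """Mean pooling: average the vectors of the words in the sentence."""
    vecs = [word_vectors[w] for w in sentence.lower().split()]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

s1 = sentence_embedding("the cat sat")
s2 = sentence_embedding("the dog sat")
s3 = sentence_embedding("the stock fell")

print(round(cosine_similarity(s1, s2), 3))  # ≈ 0.996 (similar meaning)
print(round(cosine_similarity(s1, s3), 3))  # ≈ 0.497 (different meaning)
```

Cosine similarity is used rather than raw distance because it compares the direction of the vectors and ignores their length.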

This type of embedding is used when working with vector databases, which can be used to:

Sentence embeddings also often have a larger number of dimensions.
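The core operation of a vector database, as mentioned above, is finding the stored vectors most similar to a query vector. Below is a minimal sketch of that idea, assuming a hypothetical in-memory `ToyVectorStore` with brute-force search; production vector databases typically use approximate nearest-neighbour indexes instead. The stored vectors are made-up stand-ins for embeddings that a real system would obtain from an embedding model.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class ToyVectorStore:
    """Hypothetical minimal in-memory vector store using brute-force search."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self.items.append((text, vector))

    def search(self, query_vector, k=1):
        """Return the k stored texts whose vectors are most similar to the query."""
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(item[1], query_vector),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore()
# Made-up 3-dimensional "sentence embeddings".
store.add("How do I reset my password?", [0.9, 0.1, 0.0])
store.add("Shipping takes 3-5 business days.", [0.0, 0.2, 0.9])
store.add("Forgot login credentials", [0.8, 0.3, 0.1])

query = [0.85, 0.2, 0.05]  # pretend embedding of "I can't log in"
print(store.search(query, k=2))
```

The query matches the two account-related entries and skips the shipping one, even though it shares no words with them: only the vectors are compared.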