Token window - the limit on how much content can be processed at any given time.
This includes both the input data and the data generated by the model (the output).
Controlling tokens
When using the GPT API, we need to handle the token window on our own. We can use tiktoken to estimate the number of tokens used (tiktoken provides an approximate value, since the models are updated over time).
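A minimal sketch of such an estimate, assuming the tiktoken package is installed:

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Estimate how many tokens `text` will use for the given model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fall back to a default encoding for models tiktoken does not know yet.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

print(count_tokens("Hello, token window!"))
```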
For gpt-3.5-turbo and gpt-4, we must also account for the structure of ChatML messages, which consumes additional tokens.
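As a rough illustration of that overhead, the sketch below follows the approximation from the OpenAI cookbook; the exact framing costs (3 tokens per message, 1 per name field, 3 to prime the reply) differ between model versions, so treat the result as an estimate rather than an exact count:

```python
import tiktoken

def count_chat_tokens(messages: list[dict], model: str = "gpt-3.5-turbo") -> int:
    """Approximate the prompt tokens of a chat completion request,
    including the ChatML framing around each message."""
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # ChatML framing overhead per message (cookbook value)
    tokens_per_name = 1     # extra token when a "name" field is present
    total = 0
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():  # values are strings: role, content, name
            total += len(encoding.encode(value))
            if key == "name":
                total += tokens_per_name
    total += 3  # every reply is primed with <|start|>assistant<|message|>
    return total
```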
Knowing the number of tokens before sending a request is not sufficient on its own, though. You may need to take action to control the number of tokens in the prompt, e.g. by:
- Using a model that supports more tokens (e.g. gpt-3.5-turbo-16k)
- Choosing different versions of the prompt or its parts
- Reducing the context:
  - cutting off earlier conversation messages (see the sketch after this list)
  - compressing the information in the current context (e.g. summarizing the conversation with the model up to this point)
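One possible sketch of the cut-off approach, reusing the count_chat_tokens helper from above; max_tokens is a hypothetical budget you pick for your application, not an API parameter:

```python
def trim_history(messages: list[dict], max_tokens: int,
                 model: str = "gpt-3.5-turbo") -> list[dict]:
    """Drop the oldest non-system messages until the prompt fits the budget."""
    trimmed = list(messages)
    while count_chat_tokens(trimmed, model) > max_tokens and len(trimmed) > 1:
        # Preserve the system prompt (index 0) and remove the oldest message after it.
        drop_index = 1 if trimmed[0]["role"] == "system" else 0
        del trimmed[drop_index]
    return trimmed
```

Note that this keeps the system prompt intact by design; if even the system prompt alone exceeds the budget, you would need one of the other strategies, such as summarization.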
INFO
Controlling the token window means finding the balance between providing meaningful information for the current conversation and keeping its volume in check.
Even though Claude 2 allows up to 100k tokens of context, filling it may be expensive, and an enormous context can be prone to noise.