Parts of a typewriter. Photo by Florian Klauer.

The token <|endoftext|> is a special token used as a document separator for OpenAI GPT models. If you look closely, it turns up in many places: it has been used since GPT-2 and remains present in the OpenAI API for their latest models. Their tokenizer package, tiktoken, includes logic for handling text that contains these special tokens. The <| and |> markers also appear throughout the code bases of LangChain and text-generation-webui. …
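As a minimal sketch of that handling (assuming the tiktoken package is installed), note that encode() refuses text containing special tokens by default and only emits the special token id when it is explicitly allowed:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Explicitly allow "<|endoftext|>" so it is encoded as the special token id.
ids = enc.encode("Hello<|endoftext|>World", allowed_special={"<|endoftext|>"})
print(ids)            # the middle id matches enc.eot_token
print(enc.eot_token)  # id of <|endoftext|> in this encoding

# Without allowing it, encode() raises a ValueError, guarding against
# special tokens being injected via untrusted input text.
try:
    enc.encode("Hello<|endoftext|>World")
except ValueError as err:
    print("refused:", err)
```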