Intercepting OpenAI token counts in LlamaIndex and LangChain applications

Filippo Toso
1 min read · Jul 23, 2023
How to obtain the correct token counts for OpenAI calls in LlamaIndex and LangChain

Over the last week, I have been working on a knowledge base solution that uses LlamaIndex and LangChain to implement a Q&A workflow over custom documents (text, HTML, PDF, Word, PowerPoint, etc.).
This solution uses OpenAI for embeddings and completions. It will be integrated into a multi-tenant chat platform, so I need accurate tracking of the tokens used in order to invoice correctly at the end of each month.

None of the solutions I found online gave me the right data. In some cases the token counts were off. In other cases there was no distinction between input and output tokens (depending on the model, they are priced differently). And in every example I found, the tokens used to embed the user's prompt were never accounted for.
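To see why lumping input and output tokens together skews the bill, here is a minimal sketch of per-direction pricing. The rates below are illustrative only (real OpenAI prices vary by model and change over time), and the `cost` helper is my own name, not part of any SDK.

```python
# Illustrative per-1K-token rates -- NOT current OpenAI pricing.
# Check the official pricing page before using these for invoicing.
PRICES = {
    "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
    "text-embedding-ada-002": {"input": 0.0001, "output": 0.0},
}

def cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Bill input and output tokens at their separate per-1K rates."""
    rates = PRICES[model]
    return (prompt_tokens * rates["input"]
            + completion_tokens * rates["output"]) / 1000

# Same model, but input and output tokens contribute different amounts:
print(round(cost("gpt-3.5-turbo", 1000, 500), 6))  # -> 0.0025
```

Note that embedding calls only ever consume input tokens, which is exactly why they are easy to forget when metering usage.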

So, armed with my complete and utter ignorance of Python development, I wrote the worst code you'll ever see that does the job I need.

The Usage class monkey patches the OpenAI SDK methods I needed to intercept, extracting the model name and token usage statistics from each response for later use.
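The original code isn't reproduced here, but the approach can be sketched as follows. This is my own minimal reconstruction, assuming the pre-1.0 `openai` SDK shape where calls like `ChatCompletion.create` return a dict-like response carrying `model` and a `usage` block with `prompt_tokens` and `completion_tokens`. The `FakeChatCompletion` stub stands in for the real SDK class so the example is self-contained.

```python
from collections import defaultdict

class Usage:
    """Accumulates token usage per model by wrapping SDK call methods.

    Sketch of the monkey-patching approach: replace a method with a
    wrapper that forwards the call, then records the model and token
    counts found in the response before returning it unchanged.
    """

    def __init__(self):
        # {model: {"prompt_tokens": int, "completion_tokens": int}}
        self.totals = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0}
        )
        self._originals = []

    def patch(self, owner, method_name):
        """Replace owner.method_name with a usage-recording wrapper."""
        original = getattr(owner, method_name)
        self._originals.append((owner, method_name, original))

        def wrapper(*args, **kwargs):
            response = original(*args, **kwargs)
            usage = response.get("usage", {})
            model = response.get("model", kwargs.get("model", "unknown"))
            self.totals[model]["prompt_tokens"] += usage.get("prompt_tokens", 0)
            self.totals[model]["completion_tokens"] += usage.get("completion_tokens", 0)
            return response

        setattr(owner, method_name, wrapper)

    def unpatch(self):
        """Restore every patched method to its original implementation."""
        for owner, name, original in self._originals:
            setattr(owner, name, original)
        self._originals.clear()


# Stand-in for openai.ChatCompletion, so the sketch runs without the SDK.
class FakeChatCompletion:
    @staticmethod
    def create(**kwargs):
        return {"model": kwargs["model"],
                "usage": {"prompt_tokens": 12, "completion_tokens": 30}}


usage = Usage()
usage.patch(FakeChatCompletion, "create")
FakeChatCompletion.create(model="gpt-3.5-turbo", messages=[])
print(usage.totals["gpt-3.5-turbo"])
# -> {'prompt_tokens': 12, 'completion_tokens': 30}
```

In the real class, the same `patch` call would target the chat, completion, and embedding entry points of the SDK, which is what makes the embedding tokens from the user's prompt show up in the totals alongside the completion tokens.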

It’s ugly code, but it does what I need. Enjoy!


Filippo is a software developer who helps medium-sized businesses automate and optimize their internal processes through AI-powered interactive bots.