8 Comments
Luq:

Another banger post, Shivani! One question: since most models predict one token at a time, does that mean that after each token is generated, the model repeats the entire process for the next token?

Shivani Virdi:

Yes Luq, at inference time the model first generates a token, appends it to the context, and then runs inference again on the updated context.

Luq:

So that means a prompt like “show reasoning step by step, then give the answer” is better than “think step by step, but show only the answer”. Since the reasoning is now ‘fixed’ in the context during each new token update, there is less room for the model to go down a different reasoning path at each new inference step. Is this a correct understanding? Also, thanks for entertaining my questions ☺️

Shivani Virdi:

That’s right, Luq. “Thinking” on its own means nothing; models “think” in tokens, and only if those tokens are in the context will the model pay attention to them! Even in reasoning models, that’s exactly what’s happening (the thinking might be hidden from us in the UI, but the model’s context is enriched with it)!
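A tiny sketch of that UI-versus-context distinction: the “hidden” reasoning span is stripped from what the user sees, but it is still part of the token sequence the model attends over. The `<think>`/`</think>` tag names below are illustrative, not any specific model’s actual format.

```python
# Sketch: hidden reasoning tokens are removed for display only.
# The model's context still contains every token, so later tokens
# can attend to the reasoning.

def render_for_ui(context):
    """What the user sees: the thinking span is filtered out."""
    visible, hidden = [], False
    for tok in context:
        if tok == "<think>":
            hidden = True
        elif tok == "</think>":
            hidden = False
        elif not hidden:
            visible.append(tok)
    return visible

context = ["Q:", "2+2?", "<think>", "2", "plus", "2", "is", "4", "</think>", "A:", "4"]
print(render_for_ui(context))  # user sees: ['Q:', '2+2?', 'A:', '4']
print(len(context))            # model attends over all 11 tokens
```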

Happy to answer all questions 😊

Luq:

Thank you, that helps a lot! You’ve gained a new fan here 😝 Looking forward to your next post!

Rohit Kumar Tiwari:

Loved the breakdown @Shivani Virdi. Thanks for sharing!

Shivani Virdi:

So glad to hear that, Rohit!

Akash Agarwal:

Really liked your explanation!