Another banger post Shivani! One question: since most models predict one token at a time, does that mean that after each token is generated, the model repeats the entire process for the next token?
Yes Luq, at inference time the model first generates a token, appends it to the context, and then runs inference again over the updated context
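That generate-append-repeat loop can be sketched like this (a toy `next_token_fn` stands in for the model's forward pass; all names here are illustrative, not any real library's API):

```python
def generate(prompt_tokens, next_token_fn, max_new_tokens=10, eos="<eos>"):
    """Minimal autoregressive decoding loop (illustrative sketch)."""
    context = list(prompt_tokens)        # the growing context window
    for _ in range(max_new_tokens):
        token = next_token_fn(context)   # one inference pass over the full context
        context.append(token)            # append the new token, then repeat
        if token == eos:                 # stop once the model emits end-of-sequence
            break
    return context

# Toy stand-in for the model: just emits a canned continuation.
canned = iter(["The", "answer", "is", "42", "<eos>"])
result = generate(["Q:", "what", "is", "6*7?"], lambda ctx: next(canned))
print(result)
```

The key point the loop makes explicit: every generated token becomes part of the input for the very next inference step, which is why tokens written into the context (including visible reasoning) shape everything generated after them.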
So, that means a prompt like “show reasoning step by step, then give the answer” is better than “think step by step, but show only the answer”. Since the reasoning context is now ‘fixed’ during each new token update, there’s less room for the model to go down a different reasoning path at each new inference step. Is this a correct understanding? Also, thanks for entertaining my questions ☺️
That’s right Luq! Unexpressed “thinking” means nothing; models “think” in tokens, and only if those tokens are in the context will the model pay attention to them! Even in reasoning models, that’s exactly what’s happening (the thinking might be hidden from us in the UI, but the model’s context is enriched with it)!
Happy to answer all questions 😊
Thank youu, that helps a lot! You’ve gained a new fan here 😝 Looking forward to your next post!
Loved the breakdown @Shivani Virdi. Thanks for sharing!
So glad to hear that, Rohit!