<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>ChatGPT on Simple Enough Blog</title><link>https://blog-dev.simpleenough.net/tags/chatgpt/</link><description>Recent content in ChatGPT on Simple Enough Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 01 Apr 2025 17:52:00 +0100</lastBuildDate><atom:link href="https://blog-dev.simpleenough.net/tags/chatgpt/index.xml" rel="self" type="application/rss+xml"/><item><title>How to Count Tokens Effectively</title><link>https://blog-dev.simpleenough.net/blog/token/</link><pubDate>Tue, 01 Apr 2025 17:52:00 +0100</pubDate><guid>https://blog-dev.simpleenough.net/blog/token/</guid><description>&lt;h2 id="i-what-is-a-token" class="heading">I. What Is a Token?&lt;a href="#i-what-is-a-token" aria-labelledby="i-what-is-a-token">
 &lt;svg class="svg-inline--fa fas fa-link anchor" fill="currentColor" aria-hidden="true" role="img" viewBox="0 0 640 512">&lt;use href="#fas-link">&lt;/use>&lt;/svg>&amp;nbsp;
 &lt;/a>
&lt;/h2>
&lt;p>A &lt;strong>token&lt;/strong> is the unit of text that the model processes. It can be a whole word, part of a word, a punctuation mark, or a special character.&lt;/p>




&lt;h3 id="concrete-examples" class="heading">Concrete Examples&lt;a href="#concrete-examples" aria-labelledby="concrete-examples">
 &lt;svg class="svg-inline--fa fas fa-link anchor" fill="currentColor" aria-hidden="true" role="img" viewBox="0 0 640 512">&lt;use href="#fas-link">&lt;/use>&lt;/svg>&amp;nbsp;
 &lt;/a>
&lt;/h3>






 






&lt;table class="table">
 &lt;thead>
 
 
 &lt;tr>
 &lt;th >Text&lt;/th>
 &lt;th >Number of Tokens&lt;/th>
 &lt;/tr>
 
 &lt;/thead>
 &lt;tbody>
 
 
 &lt;tr>
 &lt;td >Hello&lt;/td>
 &lt;td >1&lt;/td>
 &lt;/tr>
 
 
 
 &lt;tr>
 &lt;td >I am a developer&lt;/td>
 &lt;td >4&lt;/td>
 &lt;/tr>
 
 
 
 &lt;tr>
 &lt;td >Artificial intelligence is fascinating!&lt;/td>
 &lt;td >5&lt;/td>
 &lt;/tr>
 
 
 
 &lt;tr>
 &lt;td >GPT is a powerful model.&lt;/td>
 &lt;td >6&lt;/td>
 &lt;/tr>
 
 &lt;/tbody>
&lt;/table>



&lt;h3 id="specifics-of-tokenization" class="heading">Specifics of Tokenization&lt;a href="#specifics-of-tokenization" aria-labelledby="specifics-of-tokenization">
 &lt;svg class="svg-inline--fa fas fa-link anchor" fill="currentColor" aria-hidden="true" role="img" viewBox="0 0 640 512">&lt;use href="#fas-link">&lt;/use>&lt;/svg>&amp;nbsp;
 &lt;/a>
&lt;/h3>
&lt;ul>
&lt;li>In English, short common words are often a single token (e.g., &lt;code>&amp;quot;Hello&amp;quot;&lt;/code> = 1 token).&lt;/li>
&lt;li>In French and many other languages, longer words are often split into several tokens (e.g., &lt;code>&amp;quot;développeur&amp;quot;&lt;/code> or &lt;code>&amp;quot;intelligence&amp;quot;&lt;/code> = 2 tokens).&lt;/li>
&lt;li>Punctuation marks count as tokens too.&lt;/li>
&lt;li>A space is usually attached to the word that follows it rather than counted as a separate token.&lt;/li>
&lt;li>Common acronyms are usually a single token, though less common ones may be split.&lt;/li>
&lt;/ul>
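&lt;p>As a rough illustration of two of the rules above (a space travels with the following word, and each punctuation mark stands alone), here is a minimal sketch in Python. It is not a real tokenizer: the pattern and the helper names &lt;code>split_into_pieces&lt;/code> and &lt;code>count_pieces&lt;/code> are assumptions made for this example, and it never splits a long word into sub-word pieces the way an actual model tokenizer can.&lt;/p>

```python
import re

# Illustrative pattern only (an assumption, not a model's real tokenizer):
# an optional leading space glued to a run of word characters,
# or a single punctuation mark on its own.
PIECE_PATTERN = re.compile(r" ?\w+|[^\w\s]")


def split_into_pieces(text: str) -> list[str]:
    """Split text into rough token-like pieces."""
    return PIECE_PATTERN.findall(text)


def count_pieces(text: str) -> int:
    """Count the rough pieces produced by split_into_pieces."""
    return len(split_into_pieces(text))


if __name__ == "__main__":
    samples = [
        "Hello",
        "I am a developer",
        "Artificial intelligence is fascinating!",
        "GPT is a powerful model.",
    ]
    for sample in samples:
        # Print each sample with its pieces and the piece count.
        print(sample, split_into_pieces(sample), count_pieces(sample))
```

&lt;p>For the four sample sentences in the table under Concrete Examples, this heuristic produces 1, 4, 5, and 6 pieces. Exact counts always depend on the tokenizer of the model in use, so treat the result as an estimate only.&lt;/p>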
&lt;p>On average, 100 tokens correspond to roughly 75 words of English text (about 4 tokens for every 3 words), though the ratio varies with the language and writing style.&lt;/p></description></item></channel></rss>