<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>OpenAI on Simple Enough Blog</title><link>https://blog-dev.simpleenough.net/tags/openai/</link><description>Recent content in OpenAI on Simple Enough Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 26 Mar 2025 09:05:00 +0100</lastBuildDate><atom:link href="https://blog-dev.simpleenough.net/tags/openai/index.xml" rel="self" type="application/rss+xml"/><item><title>1000: The Magic Number in the World of LLMs</title><link>https://blog-dev.simpleenough.net/blog/chunk/</link><pubDate>Wed, 26 Mar 2025 09:05:00 +0100</pubDate><guid>https://blog-dev.simpleenough.net/blog/chunk/</guid><description>&lt;h2 id="i-chunk-size-why-1000-tokens" class="heading">I. Chunk Size: Why ~1000 Tokens?&lt;a href="#i-chunk-size-why-1000-tokens" aria-labelledby="i-chunk-size-why-1000-tokens">
 &lt;svg class="svg-inline--fa fas fa-link anchor" fill="currentColor" aria-hidden="true" role="img" viewBox="0 0 640 512">&lt;use href="#fas-link">&lt;/use>&lt;/svg>&amp;nbsp;
 &lt;/a>
&lt;/h2>
&lt;p>The default value of &lt;strong>1000 tokens per chunk&lt;/strong> is not arbitrary:&lt;/p>
&lt;ul>
&lt;li>A chunk of this size is usually &lt;strong>large enough to remain semantically coherent&lt;/strong>, typically holding a complete idea, without becoming unwieldy.&lt;/li>
&lt;li>It remains &lt;strong>compatible with the context window&lt;/strong> of modern LLMs (4k, 8k, 32k, or even 1M tokens), leaving room for several chunks in a single prompt.&lt;/li>
&lt;li>It helps avoid &lt;strong>diluting meaning&lt;/strong> (a risk when chunks are too large) or breaking semantic units (a risk when chunks are too small).&lt;/li>
&lt;/ul>
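&lt;p>As a rough illustration of fixed-size chunking, here is a minimal sketch. It assumes whitespace-separated words as a stand-in for tokens; a real pipeline would count tokens with the model's own tokenizer. The name &lt;code>chunk_text&lt;/code> is hypothetical and not taken from any library.&lt;/p>

```python
def chunk_text(text, chunk_size=1000):
    """Split text into chunks of at most chunk_size tokens.

    Assumption: words separated by whitespace stand in for tokens.
    Real token counts depend on the tokenizer of the target model.
    """
    if not chunk_size > 0:
        raise ValueError("chunk_size must be positive")
    tokens = text.split()
    # Step through the token list in strides of chunk_size.
    return [
        " ".join(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]


# A 2500-token sample yields two full chunks and one shorter remainder.
sample = " ".join(f"w{i}" for i in range(2500))
chunks = chunk_text(sample)
print([len(c.split()) for c in chunks])  # prints [1000, 1000, 500]
```

&lt;p>This sketch cuts purely by count, so a chunk boundary can still fall in the middle of a sentence; splitting on paragraph or sentence boundaries first would be one way to reduce that.&lt;/p>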
&lt;p>In some cases, other sizes may be more appropriate:&lt;/p></description></item></channel></rss>