Segmenter
Free API for tokenizing text and segmenting long text into chunks.
What is a Segmenter?
A segmenter converts text into tokens or chunks, the units of data that an embedding model, reranker, or LLM processes. Tokens can represent whole words, parts of words, or even individual characters.
Chunking long documents, lightning fast!
You can also use the Segmenter API to split long documents into smaller chunks, which makes them easier to process with embedding or reranker models. We combine common structural cues with a set of rules and heuristics that perform well across diverse types of content, such as Markdown, HTML, LaTeX, and text in CJK languages.
The maximum chunk length sets the maximum number of characters in each chunk. In practice, a chunk can be shorter than this value if the text has a good boundary before the limit.
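How a character limit interacts with a preference for good boundaries can be illustrated with a small Bash function. This is a hypothetical, much-simplified sketch, not the rule set the Segmenter API actually uses: it treats line breaks as the only "good boundary", and the function name `chunk_text` and the `---` separator are illustrative choices.

```bash
#!/usr/bin/env bash
# chunk_text MAX: hypothetical, simplified chunker (NOT the Segmenter API's heuristics).
# Reads text on stdin and prints chunks, each followed by a "---" separator line.
# A chunk holds at most MAX characters (joining newlines included); it prefers to
# end at a line break and only hard-cuts a single line longer than MAX.
# Note: ${#var} counts characters in a UTF-8 locale and bytes in the C locale.
chunk_text() {
  local max="$1" chunk="" line

  # Print the pending chunk, if any, and start a new one.
  _flush_chunk() {
    if [[ -n "$chunk" ]]; then
      printf '%s\n---\n' "$chunk"
      chunk=""
    fi
  }

  while IFS= read -r line || [[ -n "$line" ]]; do
    # A line that cannot fit in any chunk is cut into MAX-sized pieces.
    while (( ${#line} > max )); do
      _flush_chunk
      printf '%s\n---\n' "${line:0:max}"
      line="${line:max}"
    done

    if [[ -z "$chunk" ]]; then
      chunk="$line"
    elif (( ${#chunk} + 1 + ${#line} <= max )); then
      chunk+=$'\n'"$line"   # still fits: keep growing the chunk
    else
      _flush_chunk          # break at the line boundary instead of mid-line
      chunk="$line"
    fi
  done
  _flush_chunk
}
```

For example, `printf 'aaa\nbb\ncccc\ndddddddddd\n' | chunk_text 8` yields the chunks "aaa\nbb", "cccc", "dddddddd", and "dd": the first two lines share a chunk, "cccc" starts a new one because appending it would exceed 8 characters, and the 10-character line is hard-cut because no boundary exists inside it.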
The Segmenter API is free!
Providing your API key gives you access to a higher rate limit, and no tokens are charged to your key.
Rate Limit
Product | API Endpoint | Description | Allowed Methods | Without API Key (RPM) | With API Key (RPM) | With Premium API Key (RPM) | Average Latency (s) | Token Usage Counting
---|---|---|---|---|---|---|---|---
Segmenter API | https://segment.jina.ai | Tokenize and segment long text | GET/POST | 20 | 200 | 1000 | 0.3 | Tokens are not counted as usage.
Reader API | https://r.jina.ai | Convert URL to LLM-friendly text | GET/POST | 20 | 200 | 1000 | 1.6 | Counts tokens in the output response.
Reader API | https://s.jina.ai | Search the web and convert results to LLM-friendly text | GET/POST | 5 | 40 | 100 | 7.7 | Counts tokens in the output response.
Embedding API | https://api.jina.ai/v1/embeddings | Convert text/images to fixed-length vectors | POST | Blocked | 60 | 300 | Depends on input size | Counts tokens in the input request.
Reranker API | https://api.jina.ai/v1/rerank | Rerank documents by relevance to a query | POST | Blocked | 60 | 300 | Depends on input size | Counts tokens in the input request.
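Each RPM limit in the table translates into a minimum spacing between requests: 60,000 ms divided by the RPM, e.g. 3,000 ms at 20 RPM or 300 ms at 200 RPM. The Bash sketch below paces a sequential loop accordingly; `min_interval_ms` and `paced_run` are hypothetical helper names, and client-side pacing is just one simple way to stay under a limit.

```bash
#!/usr/bin/env bash
# min_interval_ms RPM: smallest gap in milliseconds (rounded up) between
# request starts that keeps one sequential client at or below RPM.
min_interval_ms() {
  local rpm="$1"
  echo $(( (60000 + rpm - 1) / rpm ))
}

# paced_run RPM CMD [ARGS...]: hypothetical helper. Reads one item per line
# from stdin and runs "CMD ARGS... ITEM" for each, sleeping between calls.
paced_run() {
  local rpm="$1"; shift
  local gap_ms first=1 item
  gap_ms=$(min_interval_ms "$rpm")
  while IFS= read -r item; do
    if (( ! first )); then
      # Fractional seconds are accepted by GNU coreutils and BSD sleep.
      sleep "$(printf '%d.%03d' $(( gap_ms / 1000 )) $(( gap_ms % 1000 )))"
    fi
    first=0
    # Redirect stdin so CMD cannot swallow the remaining items.
    "$@" "$item" < /dev/null
  done
}
```

Sleeping after each call returns keeps the gap between request starts at least `gap_ms`, even when an individual request is slow.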
Segmenter API
Our Segmenter API helps keep LLM input within context limits and optimize model performance. It lets developers count tokens and extract relevant text segments, which supports efficient data processing and cost management.
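Fitting input into a context window is simple arithmetic on the token count: the input budget is the context window minus the tokens reserved for the model's output, and anything above that budget must be trimmed, for example by keeping only the last N tokens. A minimal Bash sketch, assuming you already have the token count (e.g. from the Segmenter API); the function names and example numbers are hypothetical.

```bash
#!/usr/bin/env bash
# input_budget WINDOW RESERVED: tokens available for the input once RESERVED
# tokens are set aside for the model's output (never below 0).
input_budget() {
  local window="$1" reserved="$2"
  local budget=$(( window - reserved ))
  echo $(( budget > 0 ? budget : 0 ))
}

# tokens_to_trim INPUT WINDOW RESERVED: how many input tokens must be dropped
# so that INPUT fits the budget (0 if it already fits).
tokens_to_trim() {
  local input="$1" budget excess
  budget=$(input_budget "$2" "$3")
  excess=$(( input - budget ))
  echo $(( excess > 0 ? excess : 0 ))
}
```

With an 8,192-token window and 1,024 tokens reserved for output, the budget is 7,168 tokens, so a 9,000-token input must lose 1,832 tokens; keeping the last 7,168 tokens is one way to do that.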
Use a GET request to count tokens; use a POST request for more features, such as returning only the last N tokens.
The tokenizer selected for the example below is cl100k_base.
Example request (Bash):
curl -X POST 'https://segment.jina.ai/' \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Jina AI: Your Search Foundation, Supercharged! 🚀\nIhrer Suchgrundlage, aufgeladen! 🚀\n您的搜索底座,从此不同!🚀\n検索ベース,もう二度と同じことはありません!🚀"
  }'
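JSON strings cannot contain unescaped double quotes, backslashes, or control characters such as line breaks, so arbitrary text has to be escaped before it goes into the "content" field of the POST request shown above. The Bash sketch below does the escaping in pure Bash and wraps that request. The helper names (`json_escape`, `segmenter_body`, `segment_text`), the `JINA_API_KEY` environment variable, and sending the key as an `Authorization: Bearer` header are assumptions for illustration; `json_escape` handles only quotes, backslashes, newlines, carriage returns, and tabs, not every control character.

```bash
#!/usr/bin/env bash
# json_escape TEXT: escape TEXT for use inside a JSON string literal.
# Handles ", \, newline, carriage return and tab only (a simplification).
json_escape() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      '"')   out+='\"' ;;
      '\')   out+='\\' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *)     out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}

# segmenter_body TEXT: JSON request body using the "content" field from the example.
segmenter_body() {
  printf '{"content": "%s"}' "$(json_escape "$1")"
}

# segment_text TEXT: POST TEXT to the Segmenter API. If JINA_API_KEY is set,
# it is sent as a Bearer token (assumed header format) for the higher rate limit.
segment_text() {
  local auth=()
  if [[ -n "${JINA_API_KEY:-}" ]]; then
    auth=(-H "Authorization: Bearer ${JINA_API_KEY}")
  fi
  curl -sS -X POST 'https://segment.jina.ai/' \
    -H 'Content-Type: application/json' \
    "${auth[@]}" \
    -d "$(segmenter_body "$1")"
}
```

For example, `segment_text "$(cat notes.md)"` sends the contents of a local file (notes.md is a placeholder name); note that `$(...)` drops trailing newlines from the file.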
Segmenter-related common questions
How much does the Segmenter API cost?
If I don't provide an API key, what is the rate limit?
If I provide an API key, what is the rate limit?
Will you charge the tokens from my API key?
Does the Segmenter API support multiple languages?
What is the difference between GET and POST requests?
What is the maximum length I can tokenize per request?
How does the chunking feature work? Is it semantic chunking?
How do you handle special tokens such as 'endoftext' in the Segmenter API?
Does chunking support languages other than English?
API-related common questions
Can I use the same API key for the embedding, reranking, reader, and fine-tuning APIs?
Can I monitor the token usage of my API key?
What should I do if I forget my API key?
Do API keys expire?
Why is the first request for some models slow?
Is user input data used for training your models?
Billing-related common questions
Is billing based on the number of sentences or requests?
Is there a free trial available for new users?
Are tokens charged for failed requests?
What payment methods are accepted?
Is invoicing available for token purchases?