Newsroom
Crafting AI innovations, one word at a time.
Featured
September 18, 2024 • 10 minutes read
Jina Embeddings V3: A Frontier Multilingual Embedding Model
jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an 8192-token input length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.
September 11, 2024 • 12 minutes read
Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown
Reader-LM-0.5B and Reader-LM-1.5B are two novel small language models inspired by Jina Reader, designed to convert raw, noisy HTML from the open web into clean markdown.
August 30, 2024 • 10 minutes read
Jina ColBERT v2: Multilingual Late Interaction Retriever for Embedding and Reranking
Jina ColBERT v2 supports 89 languages with superior retrieval performance, user-controlled output dimensions, and an 8192-token input length.
Latest
August 26, 2024 • 13 minutes read
The What and Why of Text-Image Modality Gap in CLIP Models
August 22, 2024 • 8 minutes read
Late Chunking in Long-Context Embedding Models
August 14, 2024 • 17 minutes read
By Hoovering Up the Web, AI Is Poisoning Itself
Academic Publications
arXiv
September 18, 2024
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
arXiv
September 07, 2024
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
arXiv
August 30, 2024
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
arXiv
June 21, 2024
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
ICML 2024
May 30, 2024
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
arXiv
February 26, 2024
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings
arXiv
October 30, 2023
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
EMNLP 2023
July 20, 2023
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
8 publications in total.
August 26, 2024 • 13 minutes read
The What and Why of Text-Image Modality Gap in CLIP Models
You can't just use a CLIP model to retrieve text and images and sort the results by score. Why? Because of the modality gap. What is it, and where does it come from?
August 22, 2024 • 8 minutes read
Late Chunking in Long-Context Embedding Models
Chunking long documents while preserving contextual information is challenging. We introduce "late chunking," a method that leverages long-context embedding models to generate contextual chunk embeddings for better retrieval applications.
August 14, 2024 • 17 minutes read
By Hoovering Up the Web, AI Is Poisoning Itself
What does it mean for LLMs when the web has been strip-mined clean, content providers have locked their doors, and there’s barely a trickle of new data to scrape?
August 07, 2024 • 10 minutes read
What We Learned at ICML2024 ft. PLaG, XRM, tinyBenchmark, MagicLens, Prompt Sketching etc.
We had a blast at ICML 2024 in Vienna, and we want to share with you everything we said, saw, and learned.
July 31, 2024 • 17 minutes read
Rephrased Labels Improve Zero-Shot Text Classification by 30%
When using embedding models for zero-shot classification, rephrasing the class label to "This is seriously about 'LABEL'" gives higher accuracy vs. using LABEL alone. But how, and why?
July 24, 2024 • 10 minutes read
Can Embedding/Reranker Models Compare Numbers?
A lot of LLMs can't figure out that 9.11 is actually smaller than 9.9. Can our embedding and reranker models do any better?
July 19, 2024 • 22 minutes read
Is Romance Generative AI's Killer App? We Hope Not
Are AI boyfriends and girlfriends GenAI's killer app? AI romance is no Jane Austen novel, but "social chatbots" are one of the few generative AI businesses with a clear path to profit. Take an up-close and personal look with us.
July 18, 2024 • 11 minutes read
No. You Can't Use Reranker to Improve SEO
But if you work in SEO, it could be interesting to see things from the other side of the table and understand how embeddings and rerankers play their roles in modern search systems.
June 28, 2024 • 7 minutes read
Handcrafting Image Prompts Is Dead: Reverse Engineer Midjourney-style Images with PromptPerfect
From Punk Einstein to Turbo Pigeons: Use PromptPerfect Interactive to reverse engineer prompts from pictures and generate Midjourney-style images with real-time feedback.
Offices
Berlin, Germany (HQ)
Prinzessinnenstraße 19-20, 10969 Berlin, Germany
Geschäftsanschrift: Leipzigerstr. 96, 10117 Berlin, Germany
Beijing, China
Level 5, Building 6, No.48 Haidian West St. Beijing Haidian, China
Shenzhen, China
402, Floor 4, Fu'an Technology Building, Shenzhen Nanshan, China
Jina AI GmbH © 2020-2024.