Pinecone: A Vector Database for Machine Learning Applications

March 30, 2022 by Phu Nguyen

As more applications use machine learning and artificial intelligence for tasks such as rating, recommendation, anomaly detection, and duplicate removal, companies face a trade-off between development cost and performance as they try to force traditional databases to do work they weren’t designed for.

That’s according to Pinecone founder and CEO Edo Liberty, who left Amazon Web Services with an eye toward building new technology to alleviate this pain.

At AWS, he led Amazon’s AI lab, including the team that built SageMaker, Amazon’s cloud machine learning platform. Before that, he ran Yahoo’s Scalable Machine Learning Platforms group and did doctoral and postdoctoral work on big data and machine learning frameworks.

“It was obvious to me that the world of kind of machine learning and databases were in a head-on collision path where machine learning was representing data as these new objects called vectors that no database was really able to handle. And as time went by, more and more jobs, and more and more applications, were using machine learning to run things like recommendation, personalization, all these things, and they just needed the infrastructure to be able to run it, and it didn’t exist,” Liberty said.

He described his idea for Pinecone, previously called HyperCube, as the “connective tissue between the production world of databases and the continuous and fluid kind of more experimental side of machine learning.”

Forrester projects that the AI software market will grow to $37 billion by 2025, becoming a new middleware category of algorithms, data sets, and tools that enable embedding AI functionality in all software products.

Machine learning models take data such as documents, videos, or user behaviors and convert it into vector embeddings, which encode semantic similarity as proximity: similar objects and concepts map to points that lie close to each other in a vector space. These embeddings are usually long lists of floating-point numbers, and the rows and tables of conventional databases don’t accommodate them efficiently.
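To make “closeness” concrete, here is a minimal Python sketch that scores pairs of embeddings with cosine similarity, one common way to measure how near two vectors are. The three 4-dimensional vectors and their labels are invented for illustration; the article does not say which similarity measure any particular model or database uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings; real models typically emit hundreds or thousands of floats.
embeddings = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "kitten": [0.85, 0.75, 0.2, 0.05],
    "truck": [0.1, 0.0, 0.9, 0.8],
}

if __name__ == "__main__":
    # Related concepts sit close together, so their similarity score is higher.
    print(round(cosine_similarity(embeddings["cat"], embeddings["kitten"]), 3))
    print(round(cosine_similarity(embeddings["cat"], embeddings["truck"]), 3))
```

With these toy values, “cat” scores far closer to “kitten” than to “truck”, which is the property a vector database exploits when it retrieves similar items.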

Applications that need to filter and rank large collections of vectors in real time require highly specialized data infrastructure to answer queries such as nearest-neighbor and max-dot-product search accurately and within milliseconds.
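A brute-force baseline shows what these two query types compute. The sketch below is an illustration under simple assumptions, not Pinecone’s implementation: it scans every stored vector, ranking by Euclidean distance for nearest-neighbor search and by inner product for max-dot-product search. The function names and toy data are hypothetical.

```python
import heapq
import math

Vector = list[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line distance between two points of the same dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot(a: Vector, b: Vector) -> float:
    """Inner product; larger means more aligned and/or longer vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_neighbors(query: Vector, vectors: dict[str, Vector], k: int) -> list[str]:
    """Exact k-nearest-neighbor search: ids of the k vectors with the smallest distance."""
    best = heapq.nsmallest(k, vectors.items(), key=lambda item: euclidean(query, item[1]))
    return [vid for vid, _ in best]


def max_dot_product(query: Vector, vectors: dict[str, Vector], k: int) -> list[str]:
    """Exact maximum-inner-product search: ids of the k vectors with the largest dot product."""
    best = heapq.nlargest(k, vectors.items(), key=lambda item: dot(query, item[1]))
    return [vid for vid, _ in best]


if __name__ == "__main__":
    vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [3.0, 3.0]}
    query = [1.0, 0.2]
    print(nearest_neighbors(query, vectors, k=1))  # ['a']: closest point
    print(max_dot_product(query, vectors, k=1))    # ['c']: largest inner product
```

The two rankings can disagree: for the query [1.0, 0.2], point a is nearest, but the longer vector c has the largest dot product. A full scan costs time proportional to the number of vectors times their dimension on every query, which is why large-scale systems generally rely on specialized, often approximate, index structures instead.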

“When a database that is schematized data, and the way you select out of it is with SQL or some other logic, right, based on keys and values. And so with a search engine with the collection of documents, the way you select from them is specifying terms in the documents. And you can kind of use the intersection of those documents that contain those terms,” Liberty explained.

“When you have high-dimensional vectors, the object is just a very long list of numbers, say 1,024 numbers, just literally floating points, right? Just 0.8, 1.6 so on. You don’t have the table to do like SQL on and you don’t have the documents, and so really the tools and the languages that we have to specify what we’re interested in, just don’t hold anymore,” he said. “The way you fetch from a collection of data, a collection of vectors, has its own logic, and it speaks the language of geometry, like nearest neighbor or in a box.”
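Liberty’s “in a box” example corresponds to a range query: return every vector whose coordinates all fall between given lower and upper bounds. A minimal sketch, assuming an axis-aligned box with inclusive bounds and a plain dictionary of hypothetical points:

```python
Vector = list[float]


def in_box(v: Vector, lower: Vector, upper: Vector) -> bool:
    """True if every coordinate of v lies within [lower[i], upper[i]] (inclusive)."""
    if not (len(v) == len(lower) == len(upper)):
        raise ValueError("vector and box bounds must have the same dimension")
    return all(lo <= x <= hi for x, lo, hi in zip(v, lower, upper))


def box_query(vectors: dict[str, Vector], lower: Vector, upper: Vector) -> list[str]:
    """Ids of all vectors inside the axis-aligned box, sorted for stable output."""
    return sorted(vid for vid, v in vectors.items() if in_box(v, lower, upper))


if __name__ == "__main__":
    points = {"p1": [0.2, 0.3], "p2": [0.8, 0.9], "p3": [0.5, 0.1]}
    print(box_query(points, lower=[0.0, 0.0], upper=[0.6, 0.6]))  # ['p1', 'p3']
```

Unlike a SQL predicate on named columns, the filter here is purely geometric: membership depends only on where each point sits in the space.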

While it’s possible to homebrew infrastructure to accomplish this, it’s too labor-intensive for most companies, Liberty said.

“I’ve seen many companies kind of between a rock and a hard place, you know, they want some really cool application, they want to unleash machine learning in real-time. And they see a big potential business improvement, but they have to pay for it with many months of development or some compromise on the quality or simply poor performance. And it’s always a painful self-negotiation they have to go through. With Pinecone, we really try to liberate them from that,” he said.

Speed and Scale

There are three parts to Pinecone. The first is a core index, converting high-dimensional vectors from third-party data sources into a machine-learning ingestible format so they can be saved and searched accurately and efficiently.
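The article does not describe how Pinecone’s core index works internally. As one illustration of how an index can make vector search efficient, the sketch below uses random-hyperplane locality-sensitive hashing (LSH), a well-known approximate technique for cosine similarity, swapped in here purely as an example. The class name, parameters, and defaults are hypothetical.

```python
import math
import random

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Exact cosine similarity, used to re-rank the candidates LSH returns."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / norm


class RandomHyperplaneLSH:
    """Hypothetical approximate index using random-hyperplane LSH for cosine similarity.

    Each random hyperplane contributes one bit (which side of the plane a vector is on).
    Vectors pointing in similar directions tend to share the full bit pattern, so they
    land in the same bucket and a query only re-ranks that bucket.
    """

    def __init__(self, dim: int, num_planes: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so the hyperplanes are reproducible
        self.dim = dim
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets: dict[tuple[int, ...], list[str]] = {}
        self.vectors: dict[str, Vector] = {}

    def _signature(self, v: Vector) -> tuple[int, ...]:
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in self.planes)

    def add(self, vid: str, v: Vector) -> None:
        if len(v) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(v)}")
        if vid in self.vectors:
            # Drop the old bucket entry so re-adding an id does not leave a stale copy.
            self.buckets[self._signature(self.vectors[vid])].remove(vid)
        self.vectors[vid] = list(v)
        self.buckets.setdefault(self._signature(v), []).append(vid)

    def query(self, q: Vector, k: int = 1) -> list[str]:
        # Only the query's own bucket is searched; exact cosine scores re-rank it.
        candidates = self.buckets.get(self._signature(q), [])
        ranked = sorted(candidates, key=lambda vid: cosine_similarity(q, self.vectors[vid]), reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    index = RandomHyperplaneLSH(dim=3, num_planes=6, seed=42)
    index.add("x", [1.0, 2.0, 3.0])
    print(index.query([2.0, 4.0, 6.0]))  # ['x']: same direction, same bucket
```

The trade-off is accuracy: a true neighbor that lands in a different bucket is never returned. Production approximate-nearest-neighbor systems typically reduce such misses with multiple hash tables or use other structures altogether, such as graph-based or quantization-based indexes.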

The second part, container distribution, dynamically maintains performance regardless of scale, handling load balancing, replication, namespacing, sharding, and more, with latencies below 50 milliseconds for queries, updates, and embeddings. Because it is fully serverless, Pinecone can run on as many nodes as needed.
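Sharding splits a collection across machines, so a query must visit every shard and merge the partial answers. This toy scatter-gather sketch assumes hash partitioning by id and exact Euclidean search on each shard; it is a generic illustration of the pattern, not Pinecone’s distribution layer, and every name in it is hypothetical. Shards here are plain in-process dictionaries rather than separate nodes.

```python
import hashlib
import heapq
import math

Vector = list[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard_for(vid: str, num_shards: int) -> int:
    """Deterministically map an id to a shard (Python's built-in str hash is salted per process)."""
    digest = hashlib.sha256(vid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedIndex:
    """Toy scatter-gather index: vectors are hash-partitioned; queries fan out and merge."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, Vector]] = [{} for _ in range(num_shards)]

    def upsert(self, vid: str, v: Vector) -> None:
        # The same id always hashes to the same shard, so an upsert overwrites in place.
        self.shards[shard_for(vid, len(self.shards))][vid] = v

    def query(self, q: Vector, k: int) -> list[str]:
        partial: list[tuple[float, str]] = []
        for shard in self.shards:
            # Each shard returns its local top k; a real system would query shards in parallel.
            partial.extend(heapq.nsmallest(k, ((euclidean(q, v), vid) for vid, v in shard.items())))
        # Merge the partial results into the global top k.
        return [vid for _, vid in heapq.nsmallest(k, partial)]


if __name__ == "__main__":
    index = ShardedIndex(num_shards=4)
    for i in range(20):
        index.upsert(f"v{i}", [float(i), 0.0])
    print(index.query([7.2, 0.0], k=3))  # ['v7', 'v8', 'v6']
```

Merging only each shard’s local top k is safe because any vector in the global top k must also rank in the top k of its own shard.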

“There’s absolutely nothing that prevents us from running on 100 billion objects. It’s definitely designed to be able to do that,” Liberty said.

The company claims real-time indexing speeds 30 times faster than those of open source libraries.

The third component is a fully automated cloud management layer that frees users from having to procure and manage hardware or install anything. You can just start an index and pump data into it and start querying. The Python-based API enables updating and querying vector indexes from anywhere, including Jupyter notebooks.
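The workflow described here (create an index, load vectors, query) can be mimicked locally. The class below is a hypothetical in-memory stand-in written for this illustration; it is not Pinecone’s Python client, and its method names (upsert, query, delete) and the namespace parameter are assumptions chosen to mirror concepts the article mentions.

```python
import math
from collections import defaultdict

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / norm


class InMemoryVectorIndex:
    """Hypothetical stand-in for a hosted vector index; NOT Pinecone's client API."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        # namespace -> {id -> vector}; namespaces keep datasets or tenants separate.
        self._data: defaultdict[str, dict[str, Vector]] = defaultdict(dict)

    def upsert(self, items: list[tuple[str, Vector]], namespace: str = "") -> None:
        """Insert new ids or overwrite existing ones; changes are visible to the next query."""
        for vid, vec in items:
            if len(vec) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
            self._data[namespace][vid] = list(vec)

    def delete(self, vid: str, namespace: str = "") -> None:
        self._data[namespace].pop(vid, None)

    def query(self, vector: Vector, top_k: int = 1, namespace: str = "") -> list[str]:
        """Ids of the top_k most cosine-similar vectors in one namespace (ties broken by id)."""
        scored = [(cosine_similarity(vector, v), vid) for vid, v in self._data[namespace].items()]
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [vid for _, vid in scored[:top_k]]


if __name__ == "__main__":
    index = InMemoryVectorIndex(dim=3)
    index.upsert([("doc1", [1.0, 0.0, 0.0]), ("doc2", [0.0, 1.0, 0.0])], namespace="docs")
    print(index.query([0.9, 0.1, 0.0], top_k=1, namespace="docs"))  # ['doc1']
```

Because upsert overwrites an existing id, updates are visible to the very next query, and vectors in one namespace never appear in another namespace’s results.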

It’s designed for self-service, with consumption-based pricing to enable companies to build proofs of concept with little overhead and to scale effortlessly.

The company recently raised a $10 million seed round led by Wing Venture Capital, one of the major backers of startups including the data warehouse-as-a-service offering Snowflake and the service control platform Kong.

“The world abounds with databases and it is reasonable to ask why it needs another. The answer lies in the distinctive requirements of AI-powered applications,” Peter Wagner, founding partner at Wing Venture Capital, wrote in a blog post.

“New workloads and their core data types have always been the catalysts for the creation of new data platforms. ML and its vectors are next in line[…] Looking ahead, it is hard to imagine many interesting applications that aren’t grounded in AI in some fundamental way. AI will be a pervasive property of modern software, as ubiquitous and important as oxygen.”

Most of the people who care about a vector database aren’t the scientists and engineers themselves, though they do care about being able to get to production, Liberty said.

“The people who really care about it are the engineers and the ML infrastructure [people], who build those systems and need to run them day in day out,” Liberty said.

“It’s a sigh of relief because they don’t have to figure out like 1,000 different pieces of software and they don’t have to build a distributed system from scratch, or they don’t have to integrate like 10 different tools. … They are able to enable their scientists and engineers and provide the right way to support [them].”

AWS is a sponsor of InApps.

Source: InApps.net

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise in software development to emerging tech trends like AI and IoT—my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.