
Build with
generative AI in JavaScript

Phil Nash
Developer relations engineer at DataStax

Phil Nash

🦋 @philna.sh

𝕏 twitter.com/philnash

🐘 @philnash@mastodon.social

💼 linkedin.com/in/philnash

🏡 philna.sh/links

Phil Nash DataStax logo

Prerequisites

Sign up for a free DataStax account: https://dtsx.io/4ixFRwn

Sign up for a free DataStax account

Prerequisites

Create a new database to be used later.

Prerequisites

Sign in to Google AI Studio https://ai.dev

Sign in to Google AI Studio

Prerequisites

Create a new repo from this template:
https://github.com/philnash/build-genai-with-js

Prerequisites

Create the new repo from the template

Install dependencies with npm install

Copy the .env.example file to .env

Fill in your GOOGLE_API_KEY from AI Studio

Build with generative AI in JavaScript

Why?

GenAI is
✨ brand new ✨
and changing all the time

Some good news

LLMs

LLM APIs

Gemini logo
Anthropic logo
OpenAI logo

https://artificialanalysis.ai/

Gemini

import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

const response = await genAI.models.generateContent({
  model: "gemini-2.0-flash",
  contents: "Write a whole presentation on getting started with AI and JavaScript",
});
const text = response.text;

Models are slow

Streaming responses

import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

const response = await genAI.models.generateContentStream({
  model: "gemini-2.0-flash",
  contents: "Write a whole presentation on getting started with AI and JavaScript",
});

for await (const chunk of response) {
  console.log(chunk.text);
}

Exercise

  1. Use the user input to generate content from the Gemini API
  2. Update to stream the content as it is received

You can import an authenticated genAI object from ./src/bot.js

Pretty cool
not very useful
...yet

OK, how do these things actually work?

Tokens

https://platform.openai.com/tokenizer

Fancy autocomplete

The model generates the next token in the sequence. e.g.

"Once upon a time"

", in"

"Once upon a time, in"

" a"

"Once upon a time, in a"

" land"
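This loop can be sketched in a few lines. The lookup table below is a made-up stand-in for a real model, which predicts a probability for every token in its vocabulary:

```javascript
// Toy "fancy autocomplete": repeatedly append the predicted next token.
// nextToken is a hypothetical lookup table, not a real model.
const nextToken = {
  "Once upon a time": ", in",
  "Once upon a time, in": " a",
  "Once upon a time, in a": " land",
};

function generate(prompt, steps) {
  let text = prompt;
  for (let i = 0; i < steps; i++) {
    const token = nextToken[text];
    if (!token) break; // a real model always has a next token to offer
    text += token;
  }
  return text;
}

console.log(generate("Once upon a time", 3));
// => "Once upon a time, in a land"
```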

Changing the behaviour

topK: the number of most likely tokens to consider when generating the next token

topP: the cumulative probability threshold for the tokens to consider when generating the next token

temperature: the randomness in choosing from the selected tokens

Rational

Keep temperature down or restrict topK or topP

Creative

Turn the temperature up and don't restrict topK and topP

Suggestions

Starting point

Temp: 0.2 | top-P 0.95 | top-K 30

Creative

Temp: 0.9 | top-P 0.99 | top-K 40

Rational

Temp: 0.1 | top-P 0.9 | top-K 20
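These parameters are knobs on the sampling step. A simplified sketch of how they interact (real implementations differ in detail, and the candidate probabilities here are invented for illustration):

```javascript
// Simplified next-token sampling: topK and topP filter the candidate pool,
// then temperature controls how randomly we pick from what's left.
// With temperature 0 we take the most likely survivor (deterministic).
function pickToken(candidates, { topK, topP, temperature }) {
  // candidates: [{ token, prob }]
  const sorted = [...candidates].sort((a, b) => b.prob - a.prob);
  // topK: keep only the K most likely tokens
  let pool = sorted.slice(0, topK);
  // topP: keep tokens until their cumulative probability reaches P
  let cumulative = 0;
  pool = pool.filter((c) => {
    const keep = cumulative < topP;
    cumulative += c.prob;
    return keep;
  });
  if (temperature === 0) return pool[0].token; // greedy choice
  // Higher temperatures flatten the distribution; sketched here
  // as a uniform random choice from the remaining pool.
  return pool[Math.floor(Math.random() * pool.length)].token;
}

const candidates = [
  { token: " land", prob: 0.5 },
  { token: " galaxy", prob: 0.3 },
  { token: " spreadsheet", prob: 0.2 },
];

console.log(pickToken(candidates, { topK: 2, topP: 0.9, temperature: 0 }));
// => " land"
```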

https://www.kaggle.com/whitepaper-prompt-engineering

Beyond the model

The models don't know

So we need to tell them

Prompt engineering

Zero shot prompting

Instruction

"Tell me a story about a friendly goblin"

Pipkin was, to put it lightly, an anomaly. He was a goblin, yes, green-skinned and possessing the characteristically pointy nose and ears. But unlike the rest of his clan, the Grungle-Knack, Pipkin didn’t enjoy hoarding rusty spoons, tormenting travelers, or generally causing mayhem. Pipkin loved flowers.

Zero shot prompting

Question

"What is the capital of France?"

The capital of France is Paris.

One/few shot prompting

"Australia: Canberra. France: Paris. UK: London. USA:"

You got it! The answer is Washington D.C.

Let me know if you want to try another country! 😊

— Gemini 1.5 Flash

System prompt

System:
"You are a capital city assistant. Only respond with the name of the capital city for the country or state you are provided."

Prompt:
"Australia: Canberra. France: Paris. UK: London. USA:"

Washington, D.C.

Harder challenges

Prompt:
A farmer is on one side of a river with a wolf, a goat, and a cabbage. When he is crossing the river in a boat, he can only take one item with him at a time. The wolf will eat the goat if left alone together, and the goat will eat the cabbage if left alone together. How can the farmer transport only the goat across the river without it being eaten?

Harder challenges

This is a classic riddle! Here's how the farmer can safely transport the goat across the river:

1. **Take the goat across the river.** Leave the wolf and cabbage behind.

2. **Return alone.**

3. **Take the wolf across the river.**

4. **Bring the goat back to the original side.** The goat has to come back to prevent the wolf from eating it on the other side.

5. **Take the cabbage across the river.** Leave it with the wolf.

6. **Return alone.**

7. **Take the goat across the river.**

Now the farmer, wolf, goat, and cabbage are all safely on the other side!

Chain of thought

System:
"Think about your answer step by step and pay attention to the goal you are asked to achieve."

Prompt:
"A farmer is on one side of a river with a wolf, a goat, and a cabbage. When he is crossing the river in a boat, he can only take one item with him at a time. The wolf will eat the goat if left alone together, and the goat will eat the cabbage if left alone together. How can the farmer transport only the goat across the river without it being eaten?"

Chain of thought

Okay, I understand the classic farmer, wolf, goat, and cabbage riddle. The goal is to get *only* the goat across the river, leaving the wolf and cabbage behind. This means the farmer needs to make sure the wolf and cabbage are together. Here's the solution:

1. **Take the goat across the river.** The farmer takes the goat to the other side and returns alone.

Now the goat is on the other side, and the wolf and cabbage are safely on the original side. The farmer has successfully transported *only* the goat.

Chat

Chat

To chat with a model, we give the model the conversation context.

import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });
const chat = genAI.chats.create({ model: "gemini-2.0-flash" });

let userInput;

// getUserInput is a placeholder for however you collect input, e.g. readline
while ((userInput = getUserInput())) {
  const result = await chat.sendMessage({ message: userInput });
  console.log(result.text);
}

Exercise

  1. Update the app to use the chat API instead of the content API
  2. Experiment with different temperature, top-K, top-P, system prompts

You can use the Bot class from ./src/bot.js

Break

Retrieval-Augmented Generation

RAG: the idea

  • Store our data
  • User makes query
  • Retrieve relevant data
  • Provide data as context to model with the original query
  • Model generates response
  • ???
  • Profit!
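Wired together, the flow looks something like this. `retrieve` and `generate` are hypothetical stubs standing in for the vector search and model calls covered later:

```javascript
// A minimal RAG pipeline with stubbed retrieval and generation.
// In a real app, retrieve() does a vector search and generate() calls the model.
async function retrieve(query) {
  return ["Doodads are fixed by turning them off and on again."];
}

async function generate(prompt) {
  return `(model response to: ${prompt.slice(0, 40)}...)`;
}

function buildPrompt(context, query) {
  return `Given the following context:\n${context.join("\n")}\nAnswer: ${query}`;
}

async function ragAnswer(query) {
  const context = await retrieve(query);      // retrieve relevant data
  const prompt = buildPrompt(context, query); // augment the prompt
  return generate(prompt);                    // generate the response
}

ragAnswer("How do I fix a broken doodad?").then((answer) => console.log(answer));
```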

Vector search

(diagram: content embedded as vectors and plotted from the origin, 0,0)

RAG: the idea

  • Store our data
  • User makes query
  • Retrieve relevant data
  • Provide data as context to model with the original query
  • Model generates response
  • ???
  • Profit!

How do we get the data?

Where does it come from?

    PDFs

    Web pages

    Video

    Audio

    Word docs

    Existing databases

    ...anywhere data lives

PDFs

PDF choices

Library

Parse the PDF into text, try to get the order right

pdf-parse or Mozilla's pdf.js

Multimodal model

Can also describe tables and images

Gemini Flash or Mistral OCR

Open source services

Community support

Unstructured, Docling

Paid services

Continually improving

Unstructured, Azure AI Document Intelligence, LlamaParse

Web pages

Open source

Fetch data yourself and parse the page

fetch, html-to-markdown

Paid services

Continually improving

Firecrawl, Spider

fetch

import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";

const url = "https://www.datastax.com/blog/how-to-create-vector-embeddings-in-node-js";
const html = await fetch(url).then((res) => res.text());

const doc = new JSDOM(html, { url });
const reader = new Readability(doc.window.document);
const article = reader.parse();

console.log(article);
Readability example

https://dtsx.io/3Ym2OLT

Chunking

Chunking

Each embedded piece of data should capture something close to a single meaning

So we break the data up into chunks

Chunking

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const text = loadText(); // get some text to split

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1024,
  chunkOverlap: 128,
});
const documents = await splitter.splitText(text);

https://chunkers.vercel.app
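Under the hood, a splitter is doing something close to the following simplified fixed-size version (RecursiveCharacterTextSplitter is smarter, preferring paragraph and sentence boundaries):

```javascript
// Naive fixed-size chunker with overlap: each chunk starts
// (chunkSize - chunkOverlap) characters after the previous one,
// so neighbouring chunks share chunkOverlap characters of context.
function chunkText(text, chunkSize, chunkOverlap) {
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const chunks = chunkText("a".repeat(2500), 1024, 128);
console.log(chunks.length);    // => 3
console.log(chunks[0].length); // => 1024
```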

How do we turn content into vectors?

import { GoogleGenAI } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

const response = await genAI.models.embedContent({
  model: "text-embedding-004",
  contents: "Build with generative AI in JavaScript...",
});

const embedding = response.embeddings[0].values;
console.log(embedding);
// => [-0.0018657326, -0.00950177, -0.062905475, 0.011513614, -0.043369178, ... ]

What do we do
with the vectors?

Vector search


Cosine similarity
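Cosine similarity measures the angle between two vectors, ignoring their length, so vectors pointing the same way score close to 1. A direct implementation:

```javascript
// Cosine similarity: dot product of the vectors divided by the
// product of their magnitudes. 1 = same direction, 0 = orthogonal.
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // => 1
console.log(cosineSimilarity([1, 0], [0, 1])); // => 0
```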

Database with Vector Index

Vector indexes implement efficient
approximate nearest neighbour (ANN) search over vectors

And let you store metadata

And possibly more
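Conceptually, vector search is just "find the stored vectors closest to the query". A brute-force sketch (with invented example data) makes this concrete; a vector index finds approximately the same set via ANN without scanning every document:

```javascript
// Brute-force nearest-neighbour search: measure the distance from the
// query vector to every stored vector and return the closest matches.
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

function search(store, queryVector, limit) {
  return [...store]
    .sort((a, b) => distance(a.vector, queryVector) - distance(b.vector, queryVector))
    .slice(0, limit);
}

// Each record carries its vector plus metadata (here, the content itself)
const store = [
  { content: "goblins", vector: [0.9, 0.1] },
  { content: "flowers", vector: [0.1, 0.9] },
  { content: "gardens", vector: [0.2, 0.8] },
];

console.log(search(store, [0, 1], 2).map((doc) => doc.content));
// => ["flowers", "gardens"]
```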

Storing vectors in Astra DB

import { DataAPIClient } from "@datastax/astra-db-ts";

// astraDb holds your token, API endpoint and collection name
export const client = new DataAPIClient(astraDb.token);
export const db = client.db(astraDb.endpoint);
export const collection = db.collection(astraDb.collection);

const data = [
  {
    content: "Build with generative AI in JavaScript...",
    $vector: vector
  },
  ...
];

await collection.insertMany(data);

Creating vectors with Astra DB

import { DataAPIClient } from "@datastax/astra-db-ts";

export const client = new DataAPIClient(astraDb.token);
export const db = client.db(astraDb.endpoint);
export const collection = db.collection(astraDb.collection);

const data = ["Build with generative AI in JavaScript...", ...];

await collection.insertMany(
  data.map((content) => ({ $vectorize: content }))
);

Exercise

Let's create a collection for our database

How do we use the Vector Database?

import { DataAPIClient } from "@datastax/astra-db-ts";

export const client = new DataAPIClient(astraDb.token);
export const db = client.db(astraDb.endpoint);
export const collection = db.collection(astraDb.collection);

const userQuery = "How do I fix a broken doodad?";
const queryVector = await embed(userQuery);

const context = await collection
  .find(
    {},
    {
      sort: { $vector: queryVector }
    }
  )
  .toArray();

How do we use the Vector Database?

import { DataAPIClient } from "@datastax/astra-db-ts";

export const client = new DataAPIClient(astraDb.token);
export const db = client.db(astraDb.endpoint);
export const collection = db.collection(astraDb.collection);

const userQuery = "How do I fix a broken doodad?";

const context = await collection
  .find(
    {},
    {
      sort: { $vectorize: userQuery }
    }
  )
  .toArray();

Exercise

  1. Create a file to vectorize and ingest data into the database
  2. Create a function that you can use to perform a vector search

See data/faqs.md for your data and the exercise README for more details

Augmentation

It's about prompting

const userQuery = "How do I fix a broken doodad?";
const context = await getContext(userQuery);

const prompt = `Given the following context:

---
${context}
---

Answer the following question:

---
${userQuery}
---

If you do not know the answer, say "Have you tried turning it on and off again?".`;

Generation

Pass the prompt to the model

Exercise

  1. Use your vector search function to get the context for the user
  2. Create a prompt that includes the context and the user query
  3. Pass the prompt to the model
  4. Experiment with:
    • The model parameters
    • the number of documents to return
    • The prompt

Evaluation

Evaluation

How do we know if our application is working?

We need to evaluate it

TDD for LLMs

Except TDD is deterministic

And LLMs aren't

So we need to test differently

Evaluation

We know that:

  • Results are sensitive to prompts
  • Models are always improving
  • There are many ways we can handle data for RAG

All of this needs a test suite that tells us whether we are improving our application when we change it

Evaluation

There are so many tools for this

There are so many tools for everything here

New ones being built every day

Promptfoo

Open source

JavaScript based

Promptfoo

prompts:
  - "Translate the following text to {{language}}: {{input}}"
providers:
  - openai:gpt-4o-mini
  - google:gemini-2.0-flash
tests:
  - vars:
      language: French
      input: Hello world
  - vars:
      language: German
      input: How's it going?

Promptfoo

prompts:
  - "Translate the following text to {{language}}: {{input}}"
providers:
  - openai:gpt-4o-mini
  - google:gemini-2.0-flash
tests:
  - vars:
      language: French
      input: Hello world
    assert:
      - type: equal
        value: "Bonjour le monde"
  - vars:
      language: German
      input: How's it going?
    assert: 
      - type: equal
        value: "Wie geht's?"

Exercise

  1. Use the example promptfooconfig.yaml file to run the evals
  2. Experiment with comparing results from different Gemini models
  3. Experiment with different deterministic assertions

There are links to docs in the README file for the exercise

RAG Metrics

We need to evaluate:

  • the retrieval
  • the augmentation
  • The generation

Evaluating retrieval

We need to evaluate that the retrieved context is:

  • relevant to the query (context relevance)
  • contains the answer to the query (context recall)

Context relevance

Analyzes the query and context

Breaks down the context into individual statements

Evaluates each statement's relevance to the query

Calculates a relevance score based on the proportion of relevant statements
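The final score in these statement-based metrics is just a proportion. A sketch, where the per-statement judgments (a stub array of booleans here) would come from an LLM judge in a real evaluator:

```javascript
// Proportion-based scoring used by metrics like context relevance:
// score = relevant statements / total statements.
function relevanceScore(judgments) {
  const relevant = judgments.filter(Boolean).length;
  return relevant / judgments.length;
}

// e.g. 3 of 4 context statements judged relevant to the query
console.log(relevanceScore([true, true, false, true])); // => 0.75
```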

Context relevance demo

Context recall

Takes a ground truth statement and the retrieved context

Breaks down the ground truth into individual statements

Checks which statements are supported by the context

Calculates a recall score based on the proportion of supported statements

Context recall demo

Evaluating retrieval

If either of these metrics is failing, we need to reconsider:

  • chunking strategy
  • embedding model
  • number of documents to retrieve
  • data quality

Evaluating augmentation and generation

We need to evaluate that the generated output is:

  • factual according to the ground truth (factuality)
  • relevant to the query (answer relevance)
  • correct according to the context (context faithfulness)

Factuality

Takes the LLM output and the ground truth and checks the following relationships

Output is a subset of the reference and is fully consistent

Output is a superset of the reference and is fully consistent

Output contains all the same details as the reference

Output and reference differ, but differences don't matter for factuality

Output and reference disagree

Factuality demo

Answer Relevance

Takes the LLM output and checks that it is relevant to the original query

Uses an LLM to generate potential questions that the output could be answering

Compares these questions with the original query using embedding similarity

Calculates a relevance score based on the similarity scores

Answer Relevance demo

Context faithfulness

Takes the LLM output and the context and ensures the output doesn't include ideas not contained within the context

Extracts claims and statements from the LLM's output

Verifies each statement against the provided context

Calculates a faithfulness score based on the proportion of supported statements

Context faithfulness demo

Evaluating augmentation and generation

If these metrics are failing we need to reconsider:

  • prompting strategy
  • model hyper-parameters
  • model quality

In the test suite

In the test suite

You can run promptfoo

But it can also be integrated into your test suite

Exercise

  1. Experiment with the example yaml promptfoo tests
  2. Check out the available assertions in the test suite
  3. Write some evaluations for your application

Reranking

Reranking

We can use a model to rerank the retrieved documents

Reranker models take both the query and the retrieved documents

Reranking can improve the quality of the retrieved context

Reranking in Astra DB

import { DataAPIClient } from "@datastax/astra-db-ts";

export const client = new DataAPIClient(astraDb.token);
export const db = client.db(astraDb.endpoint);
export const collection = db.collection(astraDb.collection);

const userQuery = "How do I fix a broken doodad?";
const queryVector = await embed(userQuery);

const context = await collection
  .findAndRerank(
    {},
    {
      sort: { $hybrid: { $vector: queryVector } },
      rerankOn: "content",
      rerankQuery: userQuery,
      hybridLimits: 10
    }
  )
  .toArray();

Agents

What is an agent?

“...a Generative AI agent can be defined as an application that attempts to achieve a goal by observing the world and acting upon it using the tools that it has at its disposal”

Source: Wiesinger, Marlow, Vuskovic; “Agents”, Google

“...AI's that can perceive, reason, plan, and act”

Jensen Huang Keynote at CES 2025

What is a tool?

“...tools bridge the gap between the agent’s internal capabilities and the external world”

Source: Wiesinger, Marlow, Vuskovic; “Agents”, Google

To consider

Not all models can perform function calling

Tool names and descriptions are important

Execution time may increase

You don't always need an agent

Defining tools

function add({ a, b }) {
  return { additionResult: a + b };
}

Defining tools

import { Type } from "@google/genai";

const addFunctionDeclaration = {
  name: "add",
  description:
    "Add two numbers together. Use this for accurate addition.",
  parameters: {
    type: Type.OBJECT,
    description: "The numbers to add together",
    required: ["a", "b"],
    properties: {
      a: {
        type: Type.NUMBER,
        description: "The first number",
      },
      b: {
        type: Type.NUMBER,
        description: "The second number",
      },
    },
  },
};

Using tools

import { GoogleGenAI, FunctionCallingConfigMode } from "@google/genai";

const genAI = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

const response = await genAI.models.generateContent({
  model: "gemini-2.0-flash",
  contents: "Add 2 and 3 together",
  config: {
    toolConfig: {
      functionCallingConfig: {
        mode: FunctionCallingConfigMode.ANY
      }
    },
    tools: [{ functionDeclarations: [addFunctionDeclaration] }],
  }
});
console.log(response.functionCalls);

The tool loop

1. The agent receives a prompt

2. The agent generates a plan

3. The agent executes the plan

4. The agent receives the result

5. The agent generates a new plan

6. Repeat until the goal is achieved

The tool loop

const response = await bot.sendMessage(prompt);
if (!response.functionCalls) {
  output.write(`${response.text}\n`);
}

let functionCalls = response.functionCalls;

while (functionCalls && functionCalls.length > 0) {
  const functionResponses = await Promise.all(
    functionCalls.map(async (call) => {
      const { name, args } = call;
      const response = await functions[name](args);
      return {
        functionResponse: {
          name,
          response,
        },
      };
    })
  );

  const newResponse = await bot.sendMessage(functionResponses);
  if (!newResponse.functionCalls) {
    output.write(newResponse.text);
  }
  functionCalls = newResponse.functionCalls;
}

Exercise

  1. Create more functions that can be called from the agent (more maths, getting today's date/time)
  2. Create a tool that searches the vector database
  3. Create function definitions for the tools
  4. Create a tool that calls another model

See the exercise README for more details

Extras

Libraries

npm install ai

Vercel's AI SDK

Unified interface to AI providers

Simplifies streaming responses for Next.js, Svelte, Nuxt, Node

Libraries

npm install langchain
npm install llamaindex

Popular (originally Python) libraries that provide many tools for AI engineers

Including: unified interface to AI providers, strategies for data ingestion, useful patterns

Langflow

Exercise

  1. Install Langflow
  2. Run an example flow
  3. Build your own agent

Thanks

Don't forget to evaluate the session!

🦋 @philna.sh

𝕏 twitter.com/philnash

🐘 @philnash@mastodon.social

💼 linkedin.com/in/philnash

🏡 philna.sh/links

Phil Nash DataStax logo