Moving Beyond Exact Matches
Discover how to build a quiz capable of recognizing that “Paris is the capital of France” and “The capital of France is Paris” share the same core meaning.
Follow this guide to implement semantic answer grading, made possible entirely in the browser by Transformers.js.
Traditional Quiz Limitations:
- ❌ Exact match only (case sensitive): if the answer is “New York City,” writing “new york city” is marked wrong.
- ❌ No paraphrasing: you must match the source exactly. If the correct answer in the system is “The capital of France is Paris” but you write “Paris is the capital of France,” you are still marked wrong. Your own words don’t count.
- ❌ Regex/maintenance headaches for the quiz creator: whoever creates the quiz has to anticipate every possible way a user might phrase the correct answer.
- ❌ Limited feedback: the only feedback you get is right or wrong. It can’t offer suggestions, show how close your answer was, or explain why your phrasing failed the exact-match test.
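Under the hood, a traditional grader boils down to a strict string comparison, which is where all of these limitations come from. Here is a minimal, hypothetical sketch (not from any particular quiz library) to contrast with the semantic approach below:

```javascript
// A hypothetical exact-match grader: all it can do is compare strings.
function gradeExact(userAnswer, correctAnswer) {
  // Case-sensitive, whitespace-sensitive, and word-order-sensitive.
  return userAnswer === correctAnswer ? 'Correct' : 'Wrong';
}

gradeExact('new york city', 'New York City');           // 'Wrong' (case)
gradeExact('Paris is the capital of France',
           'The capital of France is Paris');            // 'Wrong' (word order)
```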
Enter Transformers.js
Hugging Face’s library for running transformer models directly in the browser using WebAssembly (WASM).
It brings models like Mistral and all-MiniLM-L6-v2 to the client side, running entirely in your browser.
Why Browser-Based
| Benefits | Description |
|---|---|
| Privacy | Answers never leave the user’s device |
| Zero cost | No inference servers to pay for; the app can be hosted as a static site |
| Offline | Works without internet after initial load |
| Low latency | No network round-trips |
How Semantic Grading Works
The process happens in three steps:
1. Text Embeddings
We convert text into embedding vectors: lists of numbers that capture meaning. The model we use (all-MiniLM-L6-v2) produces 384-dimensional vectors.
```javascript
import { pipeline } from '@huggingface/transformers';

// --- Model Setup ---
// Load the embedding model once, when the script starts.
// 'feature-extraction' is the task: turning text into numerical feature vectors.
// 'Xenova/all-MiniLM-L6-v2' is a small, fast, highly effective sentence-transformer model.
// dtype: 'q8' requests 8-bit quantization, which makes the model file much smaller
// and faster to load in the browser.
const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2',
  { dtype: 'q8' }
);

// --- The Core Embedding Function ---
/**
 * Converts a piece of text (a correct answer or a user's answer)
 * into its numerical vector (the "fingerprint" of its meaning).
 *
 * @param {string} text - The input sentence (e.g., "Paris is the capital").
 * @returns {Promise<number[]>} The sentence embedding as a plain array.
 */
async function embed(text) {
  const output = await embedder(text, {
    // 'mean' pooling averages all the token vectors into a single sentence vector.
    pooling: 'mean',
    // 'normalize' scales the vector to length 1, which keeps cosine similarity well behaved.
    normalize: true
  });
  // Convert the model's tensor data into a standard JavaScript array
  // so we can use it directly in cosineSimilarity().
  return Array.from(output.data);
}
```
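As a quick sanity check (an illustrative snippet; the exact values will vary), the returned array should have 384 entries:

```javascript
const vector = await embed('Paris is the capital of France');
console.log(vector.length);      // 384 dimensions
console.log(vector.slice(0, 3)); // e.g. [0.03, -0.07, 0.01]: small floats that encode meaning
```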
2. Cosine Similarity
To compare two answers, we calculate the cosine similarity between their vectors. This measures the angle between them: a smaller angle means more similar meaning.
```javascript
/**
 * Calculates the cosine similarity between two vectors (A and B).
 * In our case, these are the numerical "fingerprints" of the two sentences.
 *
 * @param {number[]} a - The vector for the correct answer.
 * @param {number[]} b - The vector for the user's answer.
 * @returns {number} A similarity score (at most 1; close paraphrases score near 1).
 */
function cosineSimilarity(a, b) {
  // 1. Initialize the pieces of the formula.
  // dotProduct accumulates the A · B part (the comparison).
  let dotProduct = 0;
  // normA and normB accumulate the squared length of each vector (for the denominator).
  let normA = 0;
  let normB = 0;

  // 2. Loop through every dimension (feature) of the two vectors.
  for (let i = 0; i < a.length; i++) {
    // Dot product: multiply corresponding elements and add them up.
    dotProduct += a[i] * b[i];
    // Squared length of vector A.
    normA += a[i] * a[i];
    // Squared length of vector B.
    normB += b[i] * b[i];
  }

  // 3. Apply the full formula: (A · B) / (||A|| * ||B||).
  // Taking the square roots turns the squared sums into the actual lengths ||A|| and ||B||.
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
```
The formula:

$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

To see how close two answers are, we calculate the cosine similarity. Think of this as a measurement of the “distance” between the two meanings.

The formula looks a bit intimidating, but don’t worry about solving it by hand. The code above handles the math instantly.

Where, for our purposes:

- $A \cdot B$ (the dot product) compares the individual dimensions (features) of the two vectors.
- $\|A\|$ and $\|B\|$ are the lengths of the vectors, which normalize the score so it never exceeds 1.

Simply put: the closer the calculated score is to 1, the more confident we are that the user’s answer is correct.
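For intuition, here is a tiny worked example using the cosineSimilarity function above with made-up 3-dimensional vectors (real sentence embeddings have 384 dimensions):

```javascript
// Made-up 3-dimensional vectors, just to make the arithmetic visible:
const a = [1, 2, 2]; // ||a|| = sqrt(1 + 4 + 4) = 3
const b = [2, 1, 2]; // ||b|| = sqrt(4 + 1 + 4) = 3

// a · b = 1*2 + 2*1 + 2*2 = 8, so similarity = 8 / (3 * 3) ≈ 0.89
console.log(cosineSimilarity(a, b)); // ≈ 0.889
```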
3. Grade Based on Similarity Score
We map similarity scores to grades:
| Similarity | Rating | Points | Meaning |
|---|---|---|---|
| ≥ 85% | Excellent! | 1.0 | Core concept captured accurately |
| ≥ 70% | Good! | 0.8 | Good understanding with minor gaps |
| ≥ 55% | Partial | 0.5 | Some relevant concepts mentioned |
| < 55% | Try Again | 0 | Answer doesn’t match expected concept |
Let’s build it step by step:
1. Project Setup
```bash
npm create vite@latest browser-quiz
cd browser-quiz
npm install @huggingface/transformers
```
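With the dependencies installed, Vite's standard dev server command runs the app locally:

```bash
npm run dev
```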
2. Quiz Data Structure
Define questions with expected answers and hints. I chose simple, universally known questions so anyone can test the semantic matching:
```javascript
const quizData = [
  {
    question: "What is the capital city of France?",
    answer: "Paris is the capital city of France, located in the northern part of the country along the Seine River.",
    hint: "It's famous for the Eiffel Tower and is known as the City of Light."
  },
  {
    question: "What do plants need to make their own food?",
    answer: "Plants need sunlight, water, and carbon dioxide to perform photosynthesis and produce glucose for energy.",
    hint: "Think about what plants get from the sun, soil, and air."
  },
  {
    question: "Why is the sky blue?",
    answer: "The sky appears blue because sunlight is scattered by the atmosphere, and blue light is scattered more than other colors due to its shorter wavelength.",
    hint: "It has to do with how light interacts with air molecules."
  },
  // ... more questions
];
```
3. Pre-compute Answer Embeddings
For efficiency, we compute answer embeddings once at startup:
```javascript
// This array will store the numerical "fingerprints" (embeddings)
// for all the correct answers in the quiz data.
let answerEmbeddings = [];

/**
 * Initializes the quiz by pre-calculating the semantic embeddings
 * for all official correct answers. This only needs to be done once
 * when the quiz loads, which saves time during grading.
 */
async function initializeQuizEmbeddings() {
  // Let the console show that the model is working in the background.
  console.log('Pre-computing quiz answer embeddings...');

  // Loop through every question object in the 'quizData' array.
  for (const quiz of quizData) {
    // Await is crucial here: we wait for the 'embed' function to process
    // the correct answer text and return its numerical vector.
    const embedding = await embed(quiz.answer);

    // Store the resulting vector (the "fingerprint" of the correct meaning).
    answerEmbeddings.push(embedding);
  }

  // Confirm that all correct-answer vectors are ready for grading.
  console.log('Quiz embeddings ready!');
}
```
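Here is one way to wire this up at startup (an illustrative sketch; the element id matches the answer textarea shown in the UI section below): keep the answer box disabled until the embeddings are ready.

```javascript
// Hypothetical startup wiring: pre-compute the answer embeddings,
// then enable the (initially disabled) answer textarea.
initializeQuizEmbeddings().then(() => {
  document.getElementById('quiz-answer').disabled = false;
});
```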
4. Evaluate User Answers
When a user submits an answer, we compare it to the expected answer:
```javascript
/**
 * Main grading function. It takes the user's text and the question's index,
 * and uses semantic comparison to return a grade, points, and similarity score.
 *
 * @param {string} userAnswer - The text the user typed (e.g., "Paris is the capital").
 * @param {number} questionIndex - The position of the current question in the quiz.
 * @returns {Promise<object>} An object containing the label, points, and similarity score.
 */
async function evaluateAnswer(userAnswer, questionIndex) {
  // 1. Get the embedding for the user's answer:
  // turn the user's text into its numerical vector (fingerprint).
  const userEmbedding = await embed(userAnswer);

  // 2. Compare with the expected answer:
  // retrieve the pre-computed vector for the correct answer
  // from the array we filled in initializeQuizEmbeddings().
  const expectedEmbedding = answerEmbeddings[questionIndex];

  // Run the cosineSimilarity function we defined to get the final score.
  const similarity = cosineSimilarity(userEmbedding, expectedEmbedding);

  // 3. Map the continuous similarity score to discrete grades.
  // Note: these thresholds (0.85, 0.70, 0.55) can be adjusted.
  if (similarity >= 0.85) {
    // 85% or higher: the meanings are highly similar.
    return { label: 'Excellent!', points: 1.0, similarity };
  } else if (similarity >= 0.70) {
    // 70% or higher: good match, maybe slight variation.
    return { label: 'Good!', points: 0.8, similarity };
  } else if (similarity >= 0.55) {
    // 55% or higher: some relevant meaning, but only partial.
    return { label: 'Partial', points: 0.5, similarity };
  } else {
    // Below 55%: the meaning is too different from the correct answer.
    return { label: 'Try Again', points: 0, similarity };
  }
}
```
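Called with the first question from quizData, the result looks like this (the similarity value is illustrative; see the real measurements later in this post):

```javascript
const result = await evaluateAnswer('Paris is the capital of France', 0);
// e.g. { label: 'Excellent!', points: 1.0, similarity: 0.92 }
console.log(`${result.label} (${Math.round(result.similarity * 100)}% similar)`);
```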
5. The Complete UI
Here’s how I structured the quiz interface:
```html
<div class="quiz-section">
  <h2>Interactive ML Quiz</h2>

  <div class="quiz-header">
    <span id="quiz-progress">Question 1 of 5</span>
    <span id="quiz-score">Score: 0/0 (0%)</span>
  </div>

  <div class="quiz-question" id="quiz-question">
    Loading quiz...
  </div>

  <textarea
    id="quiz-answer"
    placeholder="Type your answer here..."
    disabled
  ></textarea>

  <div class="quiz-buttons">
    <button id="submit-answer-btn">Submit Answer</button>
    <button id="show-hint-btn">Show Hint</button>
    <button id="next-question-btn" style="display: none;">
      Next Question
    </button>
  </div>

  <div id="quiz-feedback"></div>
</div>
```
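To connect the markup to the grading logic, a minimal submit handler might look like this (a sketch that assumes a currentQuestion index variable maintained by the rest of the quiz code):

```javascript
// Minimal wiring between the UI and evaluateAnswer().
// 'currentQuestion' is assumed to track the index of the question on screen.
let currentQuestion = 0;

document.getElementById('submit-answer-btn').addEventListener('click', async () => {
  const userAnswer = document.getElementById('quiz-answer').value.trim();
  if (!userAnswer) return;

  const { label, points, similarity } = await evaluateAnswer(userAnswer, currentQuestion);

  // Show the grade and how close the answer was.
  document.getElementById('quiz-feedback').textContent =
    `${label} (${Math.round(similarity * 100)}% similar, ${points} points)`;
});
```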
Real World Examples
Here’s how semantic matching performs on actual answers.
I chose simple questions so you can easily verify the results:
Question: “What is the capital city of France?”
| User Answer | Similarity | Grade |
|---|---|---|
| Paris is the capital of France | 92% | Excellent |
| The capital is Paris | 85% | Excellent |
| Paris | 78% | Good |
The model understands that different phrasings of the same concept should receive credit!
Performance Considerations
Model Loading
The all-MiniLM-L6-v2 model is ~25MB (quantized). First load downloads it, but subsequent loads use the browser cache:
```javascript
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
  dtype: 'q8', // 8-bit quantization for smaller size
  progress_callback: (progress) => {
    // The callback also fires for other lifecycle events (initiate, done, ready),
    // so only log the percentage when a download is actually in progress.
    if (progress.status === 'progress') {
      console.log(`Loading: ${Math.round(progress.progress)}%`);
    }
  }
});
```
Embedding Speed
On my machine, generating an embedding takes ~50-100ms per sentence. For a quiz with 5-10 questions, pre-computing embeddings adds 0.5-1 second to startup.
Memory Usage
The model uses ~100-150MB of memory. This is reasonable for modern devices but worth considering for mobile users.
Further Reading & Experiments
Interested in trying this in Python? Check out the transformers-python repository. It contains the Python version of these experiments, perfect for comparing client-side vs server-side implementations.