Object detection · Faster R-CNN + MobileNet V3

Detect and label objects in a photo

Upload an image and a neural network will draw bounding boxes around recognised objects with category labels and confidence scores. Useful for cataloguing photos, accessibility descriptions, or simply seeing what a model can spot.

What object detection is

Object detection is one of the foundational tasks of computer vision. Given an image, the model has to do two things at once: find the objects (where in the frame are they?) and recognise them (what is each one?). The output is a list of bounding boxes, each with a class label such as person, car, dog, or chair, and a confidence score between 0 and 1.
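As a rough illustration of that output, a single detection can be modelled as a small record. This is only a sketch of the shape of the data: the `Detection` class, its field names, and the (x_min, y_min, x_max, y_max) pixel convention for the box are assumptions made for the example, not the format this tool uses internally, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical structure, for illustration only)."""

    label: str    # class name, e.g. "person"
    score: float  # confidence between 0 and 1
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels

    def __post_init__(self) -> None:
        # Reject values that cannot be a valid detection.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0 and 1")
        x_min, y_min, x_max, y_max = self.box
        if x_max < x_min or y_max < y_min:
            raise ValueError("box corners are in the wrong order")

    @property
    def area(self) -> float:
        """Area of the bounding box in square pixels."""
        x_min, y_min, x_max, y_max = self.box
        return (x_max - x_min) * (y_max - y_min)


# Made-up example values, not real model output.
detections = [
    Detection("person", 0.97, (34.0, 20.0, 210.0, 380.0)),
    Detection("dog", 0.71, (220.0, 240.0, 400.0, 390.0)),
]

for d in detections:
    print(f"{d.label} ({d.score:.2f}) at {d.box}")
```

Keeping the label, score, and box together in one record makes it easy to sort, filter, or draw the detections later without juggling parallel lists.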

It is a different task from classification (which assigns a label to the image as a whole, typically its dominant subject) and from segmentation (which produces a per-pixel mask). Detection sits in the middle: coarser than segmentation, more informative than classification.

The model we use

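The detector is a Faster R-CNN with a MobileNet V3 backbone. To unpack that: Faster R-CNN is a two-stage detector, in which a region proposal network first suggests candidate boxes and a second stage classifies each candidate and refines its coordinates. MobileNet V3 is a lightweight convolutional network, designed to run quickly on modest hardware, that serves as the feature extractor (the backbone). The model is trained on COCO, a standard detection dataset with 80 object categories.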

What categories it knows

The model recognises the 80 categories of the COCO dataset. These include people, animals (cat, dog, horse, elephant, bear, etc.), vehicles (car, bicycle, bus, train, airplane, boat), kitchen items and food (cup, fork, knife, bowl, banana, apple, sandwich), furniture (chair, couch, bed, dining table), electronics (TV, laptop, mouse, keyboard, cell phone), sports equipment (frisbee, skateboard, surfboard, tennis racket), and a handful of others. The model does not know about specific brands, individual people, fictional or cartoon characters, fine-grained species (it knows "bird," not "robin"), or anything outside the 80 trained classes.

What it gets right

Where it confuses itself

Practical uses

A note on accuracy

Object detection models report a confidence score with each detection. We currently hide detections below a cut-off threshold and show every box above it, so you may notice some labels you disagree with. That is the model being honest: it is showing you its best guess, even when its confidence is only moderate. The score in the label is between 0 and 1; treat anything below 0.6 as a maybe and anything above 0.9 as almost certain.
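The filtering step and the rule of thumb for reading scores could be sketched as follows. The function names, the `(label, score)` tuple shape, and the default threshold of 0.5 are assumptions for illustration; the threshold this tool actually applies is not stated here. The word "likely" for the band from 0.6 to 0.9 is also an assumption, since the guidance only names the two ends. The sample scores are made up.

```python
def filter_detections(detections, threshold=0.5):
    """Keep only detections whose score is above the threshold.

    `detections` is a list of (label, score) tuples. The default threshold
    of 0.5 is a placeholder, not the value used by this tool.
    """
    return [(label, score) for label, score in detections if score > threshold]


def describe_confidence(score):
    """Map a score in [0, 1] to a rough verbal reading."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.6:
        return "maybe"
    if score > 0.9:
        return "almost certain"
    return "likely"  # assumed wording for the middle band


# Made-up scores, not real model output.
sample = [("person", 0.97), ("dog", 0.71), ("chair", 0.55), ("cup", 0.32)]

kept = filter_detections(sample)
for label, score in kept:
    print(f"{label}: {score:.2f} ({describe_confidence(score)})")
```

With these sample values the cup is dropped by the filter, while the chair is kept but reads as a "maybe", which is the kind of label you might reasonably disagree with.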