How colorization actually works
A grayscale image carries one piece of information per pixel: brightness, which corresponds to the L (lightness) channel of the Lab colour space. To turn it back into a colour image, the model has to invent the two remaining channels (a and b) for every single pixel. There is no general way to recover the original colours; they are simply not in the file. What the model can do is make educated guesses: things shaped like sky tend to be blue, things shaped like grass tend to be green, and things shaped like skin tend to fall in a narrow range of warm pinks and browns.
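To make the channel split concrete, here is a minimal sketch of an sRGB-to-Lab conversion in plain numpy, using the standard CIE formulas and D65 white point. (This is illustrative, not the pipeline's actual conversion code.) The L value is all a grayscale file retains; a and b are exactly what the model must invent.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert one sRGB colour (values in 0..1) to CIE Lab (D65 white point)."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma curve to get linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ (sRGB primaries, D65 white).
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    x, y, z = m @ linear
    # Normalise by the D65 white point, then apply the Lab nonlinearity.
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    L = 116 * fy - 16      # lightness: the only channel a grayscale file keeps
    a = 500 * (fx - fy)    # green-red axis: must be predicted
    b = 200 * (fy - fz)    # blue-yellow axis: must be predicted
    return L, a, b

L, a, b = srgb_to_lab([1.0, 0.0, 0.0])   # pure red: L ~53, a ~80, b ~67
```

Drop the a and b values from the output and you have, in effect, a grayscale image: that is the state the colorizer starts from.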
The specific model we run is the Colorful Image Colorization network published by Zhang, Isola, and Efros at ECCV 2016. It is a deep convolutional network trained on roughly a million colour photographs from ImageNet. During training, each image is converted to grayscale, the model is asked to predict the colour, and the loss function compares the prediction to the original. After enough iterations, the network internalises a probability distribution over plausible colours for every patch of the image, conditioned on the surrounding content.
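One detail worth knowing: the published network treats colour prediction as classification, not regression. The ab plane is divided into discrete bins (313 in-gamut bins on a grid of width 10 in the paper), and the network predicts a probability distribution over those bins for each pixel. Here is a hedged sketch of how a ground-truth ab pair becomes a class label; for brevity it keeps every cell of a square grid rather than only the in-gamut subset the paper uses.

```python
import numpy as np

# Simplified sketch: quantize the ab plane into 10-unit bins over [-110, 110).
# The real model keeps only the ~313 bins inside the sRGB gamut; keeping the
# whole grid here is an assumption made for brevity.
GRID = np.arange(-110, 110, 10)          # 22 bin edges per axis

def ab_to_bin(a, b):
    """Map a ground-truth (a, b) pair to a single class index."""
    ia = np.clip(np.searchsorted(GRID, a, side="right") - 1, 0, len(GRID) - 1)
    ib = np.clip(np.searchsorted(GRID, b, side="right") - 1, 0, len(GRID) - 1)
    return ia * len(GRID) + ib           # flatten the 2-D bin into one label

def cross_entropy(pred_probs, target_bin):
    """Training loss: penalise low predicted probability on the true bin."""
    return -np.log(pred_probs[target_bin] + 1e-12)

# A pixel whose true colour is a=25, b=-40 becomes one class label; training
# pushes the network to put probability mass on that label.
label = ab_to_bin(25.0, -40.0)
```

Classification is what lets the model express "this patch is probably blue, but could be grey": it keeps several colour hypotheses alive instead of averaging them into a washed-out mean.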
Why colour predictions can never be "correct"
Information theory makes this clear. A grayscale photo of a red sweater and a grayscale photo of a green sweater of the same brightness are identical at the pixel level. No model, no matter how good, can tell them apart. The best the model can do is guess the more likely option based on the rest of the image — and if the rest of the image is also grayscale, the prior is doing most of the work.
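The ambiguity is easy to verify numerically. Using the common Rec. 601 luma weights as a stand-in for whatever grayscale conversion the original film or scanner applied (the sweater colours below are made-up values chosen to collide):

```python
def to_gray(r, g, b):
    """Rec. 601 luma, rounded to an 8-bit value (illustrative weights)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

red_sweater = (170, 30, 30)
green_sweater = (30, 101, 30)

# Two very different colours, one identical grayscale pixel:
assert to_gray(*red_sweater) == to_gray(*green_sweater)   # both 72
```

Once both sweaters are stored as the same value, no amount of model capacity can separate them again; only context elsewhere in the image can tip the guess one way or the other.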
This is why colorized historical photos can look slightly off in specific ways. Military uniforms come back in greens or browns when they should be field-grey. The colorizer guesses based on what most uniforms in the training data look like, not what that particular regiment wore. Similarly, period-specific clothing dyes, custom car paint colours, and unusual food preparations can be miscoloured.
What the model does well
- Outdoor scenes. Sky, grass, trees, water, sand — all of these have strong colour priors. Landscapes and group photos taken outdoors generally come back looking very natural.
- Skin tones. Faces are extremely well-represented in the training data, so the model is confident about pinks, browns, and the warm range of skin in different lighting.
- Common objects. Wood, brick, food, plants, fur — anything the model has seen many examples of in colour will get a believable prediction.
- Daytime lighting. The model was trained predominantly on natural daylight; daylight scenes look most coherent.
What it gets wrong
- Saturated, unusual, or branded colours. Bright reds, electric blues, neon greens — these were under-represented in training and the model often chooses a duller alternative.
- Indoor and night scenes. Mixed colour-temperature lighting confuses the network. Tungsten interiors can come back oddly cool; mixed daylight–fluorescent rooms can look greenish.
- Hair colour on adults. The model tends toward brown for ambiguous tones, even when the subject is clearly blonde or red-haired.
- Vehicles and uniforms. See above — these often have specific historical colours the model cannot recover.
Tips for old family photos
- Scan at high resolution first. A 600 DPI scan with the full tonal range intact gives the model more to work with than a low-resolution JPEG copy.
- Repair scratches and damage before colorizing. Use the Photo Healer first if the photo has visible damage; the colorizer can interpret a scratch as an edge and produce strange colour artefacts along it.
- Crop tightly. The 512px downscale we apply works against you on a wide group photo. Crop to one or two faces at a time for portrait work, then re-assemble in an editor if needed.
- Treat the result as a starting point. If the colorized image is mostly right but has one or two areas that look wrong, open it in an image editor and adjust those regions by hand. The model gives you a draft, not a final.
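The arithmetic behind the cropping tip is simple. Assuming a 4000 px-wide scan, the 512 px downscale mentioned above, and a made-up 400 px face width for illustration:

```python
def face_pixels_after_downscale(scan_width, face_width, target=512):
    """How wide a face ends up after the image's width is scaled to `target`.

    Assumes the downscale is applied to the width; the face width here is an
    illustrative number, not a measured one.
    """
    return round(face_width * target / scan_width)

# A face 400 px wide in a 4000 px group scan...
wide = face_pixels_after_downscale(4000, 400)      # 51 px: little detail left

# ...versus the same face after cropping to a 1200 px-wide portrait region.
cropped = face_pixels_after_downscale(1200, 400)   # 171 px: far more to work with
```

Three times the face pixels means the model sees eyes, lips, and hair as distinct regions rather than a blur, which is exactly where colour priors are strongest.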
A note on historical accuracy
If you are colorizing photographs for documentary, archival, or genealogical purposes, please make it clear in any caption that the colour is generated, not original. The same applies to public-domain historical material: a colorized photograph of a 1920s street scene is, strictly speaking, a 21st-century artwork derived from a historical document, and the ethics of presentation matter.