Overview
There are only a few steps, and I tested all of them in an Xcode playground.
Sample Code
Prepare the receipt images.
I found a set of receipt images on GitHub; it contains almost 200 receipts.
You can download the receipt images below.
Let’s start coding.
Only a few lines of code are needed. I used a playground, so let’s follow the steps.
Create a playground file in Xcode (File -> New -> Playground) and choose the iOS or macOS platform.
Drag the receipt images into the Resources folder, then click the Sources folder and add a new Helper.swift file.
Helper.swift
I added functions to load the receipt image files and to draw the bounding boxes of the detected text areas.
Helper.swift file for macOS
import Cocoa
import Vision

public func getTestReceiptImageName(_ number: Int) -> String {
    String(format: "%d-receipt", number)
}

//https://www.swiftbysundell.com/tips/making-uiimage-macos-compatible/
public extension NSImage {
    var cgImage: CGImage? {
        var proposedRect = CGRect(origin: .zero, size: size)
        return cgImage(forProposedRect: &proposedRect,
                       context: nil,
                       hints: nil)
    }
}

//https://www.udemy.com/course/machine-learning-with-core-ml-2-and-swift
public func visualization(
    _ image: NSImage,
    observations: [VNDetectedObjectObservation],
    boundingBoxColor: NSColor
) -> NSImage {
    var transform = CGAffineTransform.identity
    transform = transform.scaledBy(x: image.size.width, y: image.size.height)

    image.lockFocus()
    let context = NSGraphicsContext.current?.cgContext
    context?.saveGState()
    context?.setLineWidth(2)
    context?.setLineJoin(CGLineJoin.round)
    context?.setStrokeColor(.black)
    context?.setFillColor(boundingBoxColor.cgColor)
    observations.forEach { observation in
        let bounds = observation.boundingBox.applying(transform)
        context?.addRect(bounds)
    }
    context?.drawPath(using: CGPathDrawingMode.fillStroke)
    context?.restoreGState()
    image.unlockFocus()
    return image
}
Helper.swift file for iOS
import Foundation
import UIKit
import Vision
public func getTestReceiptImageName(_ number: Int) -> String {
String.init(format: "%d-receipt.jpg", number)
}
//https://www.udemy.com/course/machine-learning-with-core-ml-2-and-swift
public func visualization(_ image: UIImage, observations: [VNDetectedObjectObservation]) -> UIImage {
var transform = CGAffineTransform.identity
.scaledBy(x: 1, y: -1)
.translatedBy(x: 1, y: -image.size.height)
transform = transform.scaledBy(x: image.size.width, y: image.size.height)
UIGraphicsBeginImageContextWithOptions(image.size, true, 0.0)
let context = UIGraphicsGetCurrentContext()
image.draw(in: CGRect(origin: .zero, size: image.size))
context?.saveGState()
context?.setLineWidth(2)
context?.setLineJoin(CGLineJoin.round)
context?.setStrokeColor(UIColor.black.cgColor)
context?.setFillColor(red: 0, green: 1, blue: 0, alpha: 0.3)
observations.forEach { observation in
let bounds = observation.boundingBox.applying(transform)
context?.addRect(bounds)
}
context?.drawPath(using: CGPathDrawingMode.fillStroke)
context?.restoreGState()
let resultImage = UIGraphicsGetImageFromCurrentImageContext()
UIGraphicsEndImageContext()
return resultImage!
}
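The iOS helper is used the same way as the macOS one in the steps below. Here is a minimal usage sketch, assuming a VNRecognizeTextRequest has already produced an observations array:
//iOS usage sketch: load a receipt and draw the detected text boxes
let receiptImage = UIImage(named: getTestReceiptImageName(1000))!
let annotatedImage = visualization(receiptImage, observations: observations)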
Vision Framework
The Vision framework enables me to detect text in a receipt image, and I can read the recognized strings once OCR finishes.
Step 1. Load a receipt image from the Resources folder
//Step 1. Load a receipt image from the Resources folder
import Vision
import Cocoa
let image = NSImage(imageLiteralResourceName: getTestReceiptImageName(1000))
Step 2. Declare a VNRecognizeTextRequest to detect text in the receipt image
let recognizeTextRequest = VNRecognizeTextRequest { (request, error) in
    guard let observations = request.results as? [VNRecognizedTextObservation] else {
        print("Error: \(error! as NSError)")
        return
    }
    for currentObservation in observations {
        let topCandidate = currentObservation.topCandidates(1)
        if let recognizedText = topCandidate.first {
            //OCR Results
            print(recognizedText.string)
        }
    }
    let fillColor: NSColor = NSColor.green.withAlphaComponent(0.3)
    let result = visualization(image, observations: observations, boundingBoxColor: fillColor)
}
recognizeTextRequest.recognitionLevel = .accurate
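Besides recognitionLevel, VNRecognizeTextRequest has a couple of other knobs worth knowing about. A short sketch with example values:
//Optional request settings (example values)
recognizeTextRequest.usesLanguageCorrection = true   //let Vision apply language-based correction
recognizeTextRequest.minimumTextHeight = 0.03        //ignore text smaller than 3% of the image height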
Step 3. Process the receipt image using VNImageRequestHandler.
func request(_ image: NSImage) {
    guard let cgImage = image.cgImage else {
        return
    }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try handler.perform([recognizeTextRequest])
        }
        catch let error as NSError {
            print("Failed: \(error)")
        }
    }
}
request(image)
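The visualization helper applies its own scaling transform, but if you only need an observation's rectangle in image coordinates, Vision's VNImageRectForNormalizedRect utility does the conversion for you. A minimal sketch; it would run inside the request's completion handler, where observations is in scope:
//Convert a normalized bounding box (origin at bottom-left) into image coordinates
if let cgImage = image.cgImage {
    for observation in observations {
        let pixelRect = VNImageRectForNormalizedRect(observation.boundingBox,
                                                     cgImage.width,
                                                     cgImage.height)
        print("Text box: \(pixelRect)")
    }
}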
Step 4. Check the result
To open the result image, click the Quick Look button in the playground's results sidebar.
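If you also want the annotated image as a file, you can write it out from inside the Step 2 completion handler, where result is in scope. A minimal sketch for macOS (the output path is just an example):
//Write the annotated image to disk as a PNG (example path)
if let tiffData = result.tiffRepresentation,
   let bitmap = NSBitmapImageRep(data: tiffData),
   let pngData = bitmap.representation(using: .png, properties: [:]) {
    try? pngData.write(to: URL(fileURLWithPath: "/tmp/receipt-result.png"))
}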
Can the Vision framework detect other languages?
Yes, but it only supports a few. Take a look at the supported languages:
let supportedLanguages = try VNRecognizeTextRequest.supportedRecognitionLanguages(for: .accurate, revision: VNRecognizeTextRequestRevision2)
print("\(supportedLanguages.count) Languages are available. -> \(supportedLanguages)")
//8 Languages are available. -> ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR", "zh-Hans", "zh-Hant"]
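If your receipts are in one of those languages, you can hint the request with its recognitionLanguages property (listed in priority order). For example:
//Prefer French results when recognizing text (example)
recognizeTextRequest.recognitionLanguages = ["fr-FR"]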
NaturalLanguage framework
You can detect the lexical class of a string using the NaturalLanguage framework. I modified the Step 2 code to print the tag of each token.
import NaturalLanguage

let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
let recognizeTextRequest = VNRecognizeTextRequest { (request, error) in
    guard let observations = request.results as? [VNRecognizedTextObservation] else {
        print("Error: \(error! as NSError)")
        return
    }
    let ocrResults = observations.compactMap { $0.topCandidates(1).first?.string }.joined(separator: "\n")
    tagger.string = ocrResults
    tagger.enumerateTags(in: ocrResults.startIndex..<ocrResults.endIndex,
                         unit: NLTokenUnit.word,
                         scheme: NLTagScheme.nameTypeOrLexicalClass,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, range in
        print("Tag: \(tag?.rawValue ?? "unknown") -> \(ocrResults[range])")
        return true
    }
    let fillColor: NSColor = NSColor.green.withAlphaComponent(0.3)
    let result = visualization(image, observations: observations, boundingBoxColor: fillColor)
}
recognizeTextRequest.recognitionLevel = .accurate
It gives me more information, such as Noun, Number, PlaceName, PersonalName, and more.
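As a next step, you could filter on these tags to pull structured data out of the OCR text; for example, collecting the tokens tagged as Number is a quick way to find candidate prices and quantities. A minimal sketch, assuming it runs right after the enumeration above, where tagger and ocrResults are in scope:
//Collect only the tokens tagged as Number (candidate prices and quantities)
var numbers: [String] = []
tagger.enumerateTags(in: ocrResults.startIndex..<ocrResults.endIndex,
                     unit: .word,
                     scheme: .nameTypeOrLexicalClass,
                     options: [.omitPunctuation, .omitWhitespace]) { tag, range in
    if tag == .number {
        numbers.append(String(ocrResults[range]))
    }
    return true
}
print("Numbers found: \(numbers)")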
Conclusion
I’m very impressed with the Vision framework. I can implement OCR features in just a few lines of code, and it works entirely on the device (no internet connection is needed).
I hope Apple supports more languages, such as Korean, Japanese, Thai, and others.