Overview

overview.png

There are only three steps, and I tested them all using a playground app.

 

Sample Code

Download

 

Prepare the receipt images.

I found a collection of receipt images on GitHub; it contains almost 200 receipts.

You can download the receipt images below.

 

Let’s start coding.

Only a few lines of code are needed. I used a playground app, so let’s follow the steps.

Create a playground file in Xcode (File -> New -> Playground). Choose the iOS or macOS platform.

Drag the receipt images into the Resources folder. Then click the Sources folder and add a new Helper.swift file.

 

Helper.swift

I added functions to load the receipt image files and to draw the bounding boxes of the detected text areas.

Helper.swift file for macOS

import Cocoa
import Vision

public func getTestReceiptImageName(_ number: Int) -> String {
    String(format: "%d-receipt", number)
}

//https://www.swiftbysundell.com/tips/making-uiimage-macos-compatible/
public extension NSImage {
    var cgImage: CGImage? {
        var proposedRect = CGRect(origin: .zero, size: size)
        return cgImage(forProposedRect: &proposedRect,
                       context: nil,
                       hints: nil)
    }
}

//https://www.udemy.com/course/machine-learning-with-core-ml-2-and-swift
public func visualization(
    _ image: NSImage,
    observations: [VNDetectedObjectObservation],
    boundingBoxColor: NSColor
) -> NSImage {
    var transform = CGAffineTransform.identity
    transform = transform.scaledBy(x: image.size.width, y: image.size.height)

    image.lockFocus()
    let context = NSGraphicsContext.current?.cgContext
    context?.saveGState()

    context?.setLineWidth(2)
    context?.setLineJoin(CGLineJoin.round)
    context?.setStrokeColor(.black)
    context?.setFillColor(boundingBoxColor.cgColor)

    observations.forEach { observation in
        let bounds = observation.boundingBox.applying(transform)
        context?.addRect(bounds)
    }

    context?.drawPath(using: CGPathDrawingMode.fillStroke)
    context?.restoreGState()
    image.unlockFocus()
    return image
}

Helper.swift file for iOS

import Foundation
import UIKit
import Vision

public func getTestReceiptImageName(_ number: Int) -> String {
    String(format: "%d-receipt.jpg", number)
}

//https://www.udemy.com/course/machine-learning-with-core-ml-2-and-swift
public func visualization(_ image: UIImage, observations: [VNDetectedObjectObservation]) -> UIImage {
    //Vision returns normalized coordinates with the origin at the bottom-left,
    //so flip the y-axis and then scale up to the image size.
    var transform = CGAffineTransform.identity
        .scaledBy(x: 1, y: -1)
        .translatedBy(x: 0, y: -image.size.height)
    transform = transform.scaledBy(x: image.size.width, y: image.size.height)

    UIGraphicsBeginImageContextWithOptions(image.size, true, 0.0)
    let context = UIGraphicsGetCurrentContext()

    image.draw(in: CGRect(origin: .zero, size: image.size))
    context?.saveGState()

    context?.setLineWidth(2)
    context?.setLineJoin(CGLineJoin.round)
    context?.setStrokeColor(UIColor.black.cgColor)
    context?.setFillColor(red: 0, green: 1, blue: 0, alpha: 0.3)

    observations.forEach { observation in
        let bounds = observation.boundingBox.applying(transform)
        context?.addRect(bounds)
    }

    context?.drawPath(using: CGPathDrawingMode.fillStroke)
    context?.restoreGState()
    let resultImage = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    return resultImage!
}

 

Vision Framework

The Vision framework enables me to detect text in a receipt image and to get the recognized strings after OCR processing.

Step 1. Load a receipt image from the Resources folder

//Step 1. Load a receipt image from the Resources folder
import Vision
import Cocoa

let image = NSImage(imageLiteralResourceName: getTestReceiptImageName(1000))

 

Step 2. Declare a VNRecognizeTextRequest to detect text in the receipt image

let recognizeTextRequest = VNRecognizeTextRequest { (request, error) in
    guard let observations = request.results as? [VNRecognizedTextObservation] else {
        print("Error: \(error! as NSError)")
        return
    }
    for currentObservation in observations {
        let topCandidate = currentObservation.topCandidates(1)
        if let recognizedText = topCandidate.first {
            //OCR Results
            print(recognizedText.string)
        }
    }
    let fillColor: NSColor = NSColor.green.withAlphaComponent(0.3)
    let result = visualization(image, observations: observations, boundingBoxColor: fillColor)
}
recognizeTextRequest.recognitionLevel = .accurate

 

Step 3. Process the receipt image using VNImageRequestHandler.

func request(_ image: NSImage) {
    guard let cgImage = image.cgImage else {
        return
    }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try handler.perform([recognizeTextRequest])
        }
        catch let error as NSError {
            print("Failed: \(error)")
        }
    }
}
request(image)

 

Step 4. Check the result

To open the result image, click the Quick Look (eye) button in the playground’s results sidebar.

Results.png
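
If Quick Look is inconvenient, you can also write the visualization result to a file and open it in Preview. This is a minimal sketch of my own, not part of the sample code; savePNG and the output path are hypothetical.

import Cocoa

//Hypothetical helper: write an NSImage to disk as a PNG so it can be inspected outside the playground.
public func savePNG(_ image: NSImage, to url: URL) throws {
    guard let tiffData = image.tiffRepresentation,
          let bitmap = NSBitmapImageRep(data: tiffData),
          let pngData = bitmap.representation(using: .png, properties: [:]) else {
        return
    }
    try pngData.write(to: url)
}

//Usage inside the VNRecognizeTextRequest completion handler, after `result` is created:
//try? savePNG(result, to: URL(fileURLWithPath: "/tmp/receipt-result.png"))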

 

Can Vision Framework detect other languages?

Yes, but it supports only a few. Take a look at the supported languages:

let supportedLanguages = try VNRecognizeTextRequest.supportedRecognitionLanguages(for: .accurate, revision: VNRecognizeTextRequestRevision2)
print("\(supportedLanguages.count) Languages are available. -> \(supportedLanguages)")

//8 Languages are available. -> ["en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR", "zh-Hans", "zh-Hant"]
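
If your receipts use one of these languages, you can tell the request which ones to prefer. A minimal sketch; the language codes below are just examples taken from the list above.

//Restrict recognition to specific supported languages (example codes).
recognizeTextRequest.recognitionLanguages = ["fr-FR", "en-US"]
//Optionally let Vision apply language-based correction to the recognized strings.
recognizeTextRequest.usesLanguageCorrection = true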

 

NaturalLanguage framework

You can detect the lexical class of a string using the NaturalLanguage framework. I modified the Step 2 code to print out the tag of each token.

import NaturalLanguage
let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
let recognizeTextRequest = VNRecognizeTextRequest { (request, error) in
    guard let observations = request.results as? [VNRecognizedTextObservation] else {
        print("Error: \(error! as NSError)")
        return
    }
    let ocrResults = observations.compactMap { $0.topCandidates(1).first?.string }.joined(separator: "\n")
    tagger.string = ocrResults
    tagger.enumerateTags(in: ocrResults.startIndex..<ocrResults.endIndex, unit: NLTokenUnit.word, scheme: NLTagScheme.nameTypeOrLexicalClass, options: [.omitPunctuation, .omitWhitespace]) { tag, range in
        print("Tag: \(tag?.rawValue ?? "unknown") -> \(ocrResults[range])")
        return true
    }

    let fillColor: NSColor = NSColor.green.withAlphaComponent(0.3)
    let result = visualization(image, observations: observations, boundingBoxColor: fillColor)
}
recognizeTextRequest.recognitionLevel = .accurate

It gives me more information, such as Noun, Number, PlaceName, PersonalName, and more.
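
For example, if you only care about numeric tokens (prices and totals on the receipt), you can filter on the .number tag. This is a minimal sketch of my own; extractNumbers is a hypothetical helper that assumes the joined OCR text from the closure above.

import NaturalLanguage

//Hypothetical helper: collect every token that the tagger classifies as a number.
public func extractNumbers(from ocrResults: String) -> [String] {
    let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
    tagger.string = ocrResults
    var numbers: [String] = []
    tagger.enumerateTags(in: ocrResults.startIndex..<ocrResults.endIndex,
                         unit: .word,
                         scheme: .nameTypeOrLexicalClass,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, range in
        if tag == .number {
            numbers.append(String(ocrResults[range]))
        }
        return true
    }
    return numbers
}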

 

Conclusion

I’m very impressed with the Vision framework. I can implement OCR features in just a few lines of code, and it works entirely on-device, so no internet connection is needed.

I hope Apple supports more languages like Korean, Japanese, Thai, and others.

Quote of the week

"People ask me what I do in the winter when there's no baseball. I'll tell you what I do. I stare out the window and wait for spring."

~ Rogers Hornsby