Text detection in screen images with a Convolutional Neural Network
Published at
The Journal of Open Source Software
2017
Use deep learning to find text in screen images. Learned from millions of papers.
Abstract
The repository contains a set of scripts to implement text detection from screen
images. The idea is that we use a Convolutional Neural Network (CNN) to predict
a heatmap of the probability of text in an image. The network outputs a heatmap
for text with 64 × 64 pixels and is implemented in Darknet. To train the
network, we use a set of pairs of images and training labels. We obtain the
training data by extracting figures with embedded text from research papers in
PDF form and generated pixel masks from them. With the code, we also provide a
dataset of around 500K labeled images extracted from 1M papers from arXiv and
the ACL anthology.