-->
Friday
13 November 2009

OCRopus is a free document analysis and optical character recognition system released under the Apache License

OCRopus(tm) is a state-of-the-art document analysis and OCR system featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multilingual capabilities.

OCRopus is a free document analysis and optical character recognition system released under the Apache License, Version 2.0. Its design is highly modular: components are implemented as plugins, so they can be swapped out easily.

OCRopus is currently developed under the lead of Thomas Breuel of the German Research Centre for Artificial Intelligence in Kaiserslautern, Germany, and is sponsored by Google.

OCRopus is developed for Linux; however, users have reported success running it on Mac OS X, and an application called TakOCR[1] installs OCRopus on Mac OS X and provides a simple droplet interface.

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the US Census Bureau, and novel high-performance layout analysis methods.

OCRopus development is sponsored by Google, and the system is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.

Releases

The current release is ocropus-0.4.3; it is still an alpha release, so don't expect stability or high performance yet. We will not be providing new tarballs until the beta release. To obtain and install ocropus-0.4.3, use something like the following commands:

# iulib is required by OCRopus, so build and install it first
mkdir ~/build
cd ~/build
hg clone https://iulib.googlecode.com/hg/ iulib
cd iulib
hg update -r ocropus-0.4.3
scons
sudo scons install
# then build and install OCRopus itself from the same release tag
cd ~/build
hg clone https://ocropus.googlecode.com/hg/ ocropus
cd ocropus
hg update -r ocropus-0.4.3
scons
sudo scons install


That should work on Ubuntu 9.04 if you have all the necessary packages installed; if not, have a look at the DevInstall page or the Google Group pages.
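The build steps above can also be wrapped in a small shell script, so that it stops at the first failing step and can be previewed before anything is installed. This is a minimal sketch, not part of the OCRopus distribution: the helper names (`run`, `check_tools`, `build_hg_project`, `build_all`) and the `DRY_RUN` and `BUILD_DIR` settings are hypothetical, and it only checks for the tools the commands themselves call (hg, scons, sudo), not for the library packages described on the DevInstall page.

```shell
#!/bin/sh
# Sketch only: wraps the ocropus-0.4.3 build steps in functions.
# run, check_tools, build_hg_project and build_all are hypothetical helper
# names; DRY_RUN and BUILD_DIR are illustrative settings, not OCRopus options.

BUILD_DIR="${BUILD_DIR:-$HOME/build}"
TAG="ocropus-0.4.3"

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print each tool that is not on PATH; return non-zero if any is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Clone one Mercurial repository, switch to the release tag, build, install.
# hg -R and scons -C are used instead of cd, so each step names its directory.
build_hg_project() {
    dir="$BUILD_DIR/$1"
    run mkdir -p "$BUILD_DIR" &&
    run hg clone "$2" "$dir" &&
    run hg -R "$dir" update -r "$TAG" &&
    run scons -C "$dir" &&
    run sudo scons -C "$dir" install
}

# iulib must be installed before OCRopus is built.
build_all() {
    build_hg_project iulib https://iulib.googlecode.com/hg/ &&
    build_hg_project ocropus https://ocropus.googlecode.com/hg/
}

# Only start the real build when invoked as: sh build-ocropus.sh build
if [ "${1:-}" = "build" ]; then
    check_tools hg scons sudo && build_all
fi
```

Saved as, say, build-ocropus.sh, it runs with `sh build-ocropus.sh build`; after sourcing it, `DRY_RUN=1 build_all` prints the commands instead of executing them. Because of the `&&` chain, a failed clone or build stops the script before `sudo scons install` runs. Note that `hg clone` refuses to clone into an existing directory, so remove old checkouts before re-running.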


Resources


Related Projects
  • iulib library (required; you need to install this)

  • hOCR Tools -- tools for manipulating OCR output

  • DECAPOD -- camera-based document capture and tagged PDF generation

  • PyOpenFST -- Python bindings for OpenFST (for language modeling)

Documentation

The following is the most important documentation:

If you want to contribute to the primary documentation, please clone the documentation repository with hg clone https://wiki.ocropus.googlecode.com/hg and submit patches against it. Additional links you may find useful are here:
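For readers new to Mercurial, the documentation patch workflow just described might look roughly like this. It is a sketch under assumptions: the wiki URL comes from the text, but the function names, the `DRY_RUN` switch, the directory name and the patch file name are hypothetical, and how the resulting patch should be sent to the project is not specified here.

```shell
#!/bin/sh
# Sketch of a Mercurial workflow for preparing a documentation patch.
# WIKI_URL is the address given in the text; fetch_docs, export_docs_patch
# and DRY_RUN are hypothetical names used only for this example.

WIKI_URL="https://wiki.ocropus.googlecode.com/hg"

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: get a local copy of the documentation repository.
fetch_docs() {
    run hg clone "$WIKI_URL" "$1"
}

# Step 2 (after editing the pages): commit locally, then write that
# changeset ("tip") to a patch file that can be sent to the developers.
export_docs_patch() {
    repo="$1" message="$2" patch="$3"
    run hg -R "$repo" commit -m "$message" &&
    run hg -R "$repo" export -o "$patch" tip
}
```

For example: `fetch_docs ocropus-docs`, edit the pages, then `export_docs_patch ocropus-docs "Fix typo in install page" docs.patch`. `hg commit` only records files Mercurial already tracks, so run `hg add` on any new pages first.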

Copyright © 2014 Linuxlandit & The Conqueror Penguin
-->