Python-tesseract: a Python wrapper for the Tesseract OCR (optical character recognition) engine

jopen, 13 years ago

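Python-tesseract is a Python wrapper for the Tesseract OCR (optical character recognition) engine. It can read any common image file (JPG, GIF, PNG, TIFF, etc.) and decode it into readable text. No temporary files are created during OCR processing.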
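Example 1:

    import tesseract

    # Create the engine with data path "." and the English ("eng") language data.
    api = tesseract.TessBaseAPI()
    api.Init(".", "eng", tesseract.OEM_DEFAULT)
    # Restrict recognition to digits and lowercase letters.
    api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
    api.SetPageSegMode(tesseract.PSM_AUTO)

    # Read the image into memory and run OCR directly on the buffer.
    mImgFile = "eurotext.jpg"
    with open(mImgFile, "rb") as f:
        mBuffer = f.read()
    result = tesseract.ProcessPagesBuffer(mBuffer, len(mBuffer), api)
    print("result(ProcessPagesBuffer)= %s" % result)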
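The whitelist string passed to `tessedit_char_whitelist` in Example 1 does not have to be typed by hand. As a minimal sketch, it can be assembled from Python's standard `string` constants. The helper name `build_whitelist` and its `include_upper` parameter are illustrative assumptions and are not part of python-tesseract.

```python
import string


def build_whitelist(include_upper=False):
    """Return the characters recognition should be restricted to.

    Hypothetical helper for illustration; not part of python-tesseract.
    """
    # Digits followed by lowercase letters, matching the string in Example 1.
    chars = string.digits + string.ascii_lowercase
    if include_upper:
        # Optionally allow uppercase letters as well.
        chars += string.ascii_uppercase
    return chars


whitelist = build_whitelist()
print(whitelist)  # 0123456789abcdefghijklmnopqrstuvwxyz
```

The resulting string could then be passed as the second argument of `api.SetVariable("tessedit_char_whitelist", ...)`, as in Example 1.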

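Example 2:

    import cv2.cv as cv
    import tesseract

    # Create the engine with data path "." and the English ("eng") language data.
    api = tesseract.TessBaseAPI()
    api.Init(".", "eng", tesseract.OEM_DEFAULT)
    api.SetPageSegMode(tesseract.PSM_AUTO)

    # Load the image as grayscale with OpenCV and hand it to Tesseract.
    image = cv.LoadImage("eurotext.jpg", cv.CV_LOAD_IMAGE_GRAYSCALE)
    tesseract.SetCvImage(image, api)
    text = api.GetUTF8Text()     # recognized text
    conf = api.MeanTextConf()    # mean confidence of the recognized text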
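Example 2 stops once `text` and `conf` have been obtained. In Tesseract, `MeanTextConf()` reports an average confidence between 0 and 100, so one possible next step is to discard low-confidence results. The sketch below shows that idea under stated assumptions: the function `accept_text` and the threshold of 60 are hypothetical choices for illustration, not something the library prescribes.

```python
def accept_text(text, conf, min_conf=60):
    """Return the stripped text if conf reaches min_conf, otherwise None.

    Hypothetical helper; min_conf=60 is an arbitrary illustrative threshold.
    """
    if conf < min_conf:
        # Confidence too low: treat the recognition result as unusable.
        return None
    # Remove leading and trailing whitespace from the recognized text.
    return text.strip()


# Stand-in values in place of api.GetUTF8Text() and api.MeanTextConf().
print(accept_text("  The quick brown fox\n", 85))
print(accept_text("x#@!", 12))
```

A suitable threshold depends on the images being processed and would need to be tuned on real data.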

Project homepage: http://www.open-open.com/lib/view/home/1352354768500
