paperwork - A simple way to digitize paper documents with a scanner and OCR
jopen
9 years ago
Paperwork
Description
Paperwork is a personal document manager for scanned documents (and PDFs).
It's designed to be easy and fast to use. The idea behind Paperwork is "scan & forget": You should be able to just scan a new document and forget about it until the day you need it again.
In other words, let the machine do most of the work for you.
Screenshots
Main Window
Search Suggestions
Labels
Settings window
Main features
- Scan
- Automatic detection of page orientation
- OCR
- Document labels
- Automatic guessing of the labels to apply to new documents
- Search
- Keyword suggestions
- Quick edit of scans
- PDF support
Installation
Contact/Help
Details
Papers are organized into documents. Each document contains pages.
Paperwork mainly uses:
- Sane/Pyinsane: To scan the pages
- Tesseract/Pyocr: To extract the words from the pages (OCR)
- GTK: For the user interface
- Whoosh: To index and search documents, and provide keyword suggestions
- Simplebayes: To guess the labels
- Pillow: Image manipulation
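To make the document/page organization and the keyword search more concrete, here is a minimal sketch in plain Python. It is not Paperwork's code and does not use Whoosh, Pyocr or any of the libraries listed above: the names `Page`, `Document` and `TinyIndex` are hypothetical, and the page text is typed in by hand where real OCR output would normally go. It only illustrates the idea of indexing the words of each page so that documents can be found by keyword and keywords can be suggested from a prefix.

```python
from collections import defaultdict
from dataclasses import dataclass, field
import re


@dataclass
class Page:
    # Words extracted from one scanned page (hand-written here, OCR output in practice).
    text: str


@dataclass
class Document:
    # Hypothetical model: a document is an identifier plus an ordered list of pages.
    doc_id: str
    pages: list = field(default_factory=list)
    labels: set = field(default_factory=set)


class TinyIndex:
    """Toy inverted index: keyword -> ids of the documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc):
        # Index every lowercase word of every page under the document id.
        for page in doc.pages:
            for word in re.findall(r"\w+", page.text.lower()):
                self._postings[word].add(doc.doc_id)

    def search(self, query):
        """Return the ids of documents that contain every keyword of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        hits = [self._postings.get(w, set()) for w in words]
        return set.intersection(*hits)

    def suggest(self, prefix):
        """Return the indexed keywords that start with the prefix, sorted."""
        prefix = prefix.lower()
        return sorted(w for w in self._postings if w.startswith(prefix))


index = TinyIndex()
index.add(Document("invoice-2015", [Page("Electricity invoice"),
                                    Page("Total due: 42 EUR")]))
index.add(Document("contract", [Page("Rental contract for the apartment")]))

print(index.search("invoice total"))  # documents matching both keywords
print(index.suggest("con"))           # keyword suggestions for a prefix
```

A real index also has to handle ranking, persistence and OCR noise, which is presumably why a dedicated library is used instead of a dictionary like this one.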
Licence
GPLv3 or later. See COPYING.
Archives
GitHub can automatically provide .tar.gz and .zip files if required. However, they are not needed to install Paperwork; they are listed here as a convenience for package maintainers.
- Paperwork 0.3.0.1
- Paperwork 0.3.0
- Paperwork 0.2.5
- Paperwork 0.2.4
- Paperwork 0.2.3
- Paperwork 0.2.2
- Paperwork 0.2.1
- Paperwork 0.2
- Paperwork 0.1.3
- Paperwork 0.1.2
- Paperwork 0.1.1
- Paperwork 0.1
Development
All the information can be found on the wiki.