JSearch: a Java library for extracting text from documents (Office, PDF, HWP)
jopen
9 years ago
Download & Installation
JSearch.jar
Just add JSearch.jar to your project's classpath.
Requirements
- It should work with various document types, e.g. HWP, PDF, and Office.
- It should support extracting text from documents and quickly finding keywords in them.
- It is distributed as a JAR library.
- All functions are synchronous.
- The result of an extraction contains the full text.
- The result of a search contains the word count.
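The word-count requirement can be illustrated with a small counting routine that runs over text that has already been extracted (for example by extractContentsFromFile). This is a hypothetical helper, not part of JSearch's documented API; the class name KeywordCount and the choice of case-sensitive, non-overlapping matching are assumptions made for the sketch.

```java
// Hypothetical helper, NOT part of JSearch: counts how often a keyword
// occurs in already-extracted text, the kind of "word count" a search
// result could carry. Matching is case-sensitive and non-overlapping.
final class KeywordCount {
    private KeywordCount() {}

    /** Counts non-overlapping, case-sensitive occurrences of keyword in text. */
    static int countOccurrences(String text, String keyword) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return 0; // nothing meaningful to count
        }
        int count = 0;
        int from = 0;
        while (true) {
            int idx = text.indexOf(keyword, from);
            if (idx < 0) {
                break; // no more matches
            }
            count++;
            // Skip past this match so matches never overlap.
            from = idx + keyword.length();
        }
        return count;
    }
}
```

For example, countOccurrences("annual report: report", "report") returns 2, and countOccurrences("aaaa", "aa") also returns 2 because matches do not overlap.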
Class
public class JSearch
JSearch supports various document types by delegating to open-source engines.
The library provides three kinds of functions: extract...(), isContainsKeyword...(), and getFileList...().
The supported document types are HWP, DOC, PPT, EXCEL, TEXT, PDF, and UNKNOWN.
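To make the list of supported types concrete, here is a sketch of routing a file to one of them. The enum constants mirror the list above, but the name DocType, the method fromFileName, and the extension mapping are all assumptions for illustration; the documentation does not say how JSearch actually detects a file's type.

```java
import java.util.Locale;

// Hypothetical sketch: constants mirror the types listed for JSearch, but
// this extension-based mapping is an assumption, not JSearch's real logic.
enum DocType {
    HWP, DOC, PPT, EXCEL, TEXT, PDF, UNKNOWN;

    /** Guesses the document type from the file name's extension. */
    static DocType fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return UNKNOWN; // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "hwp":
                return HWP;
            case "doc":
            case "docx":
                return DOC;
            case "ppt":
            case "pptx":
                return PPT;
            case "xls":
            case "xlsx":
                return EXCEL;
            case "txt":
                return TEXT;
            case "pdf":
                return PDF;
            default:
                return UNKNOWN; // fall back for anything unrecognized
        }
    }
}
```

Extension checks are cheap but easy to fool; inspecting the file's leading bytes would be more robust, at the cost of opening every file.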
Modifier and Type | Method | Description |
---|---|---|
static java.lang.String | extractContentsFromFile(java.io.File target) | Extracts the text content of the given file. |
static java.lang.String | extractContentsFromFile(java.lang.String filePath) | Extracts the text content of the file at the given path. |
static java.util.List | getFileListContainsKeywordFromDirectory(java.lang.String dirPath, java.lang.String keyword) | Returns a list of the files in the directory that contain the keyword. |
static java.util.List | getFileListContainsKeywordFromDirectory(java.lang.String dirPath, java.lang.String keyword, boolean recursive) | Returns a list of the files in the directory that contain the keyword; recursive controls whether subdirectories are searched. |
static boolean | isContainsKeywordFromFile(java.io.File file, java.lang.String keyword) | Returns true if the file contains the keyword, false otherwise. |
static boolean | isContainsKeywordFromFile(java.lang.String filePath, java.lang.String keyword) | Returns true if the file at the given path contains the keyword, false otherwise. |
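With the library on the classpath, a recursive search is a single call such as JSearch.getFileListContainsKeywordFromDirectory(dirPath, keyword, true). The sketch below only illustrates what the recursive flag plausibly means, using the JDK alone; it is not JSearch's implementation. KeywordFileSearch and findFilesContaining are hypothetical names, and the sketch handles plain-text files decoded as UTF-8 only, whereas JSearch also extracts text from HWP, Office, and PDF files. Like JSearch's functions, it is synchronous.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, JDK-only sketch of a directory keyword search.
// Plain-text files only; NOT how JSearch extracts HWP/Office/PDF content.
final class KeywordFileSearch {
    private KeywordFileSearch() {}

    /** Returns files under dir whose UTF-8 text contains keyword. */
    static List<Path> findFilesContaining(Path dir, String keyword, boolean recursive)
            throws IOException {
        // Depth 1 visits only direct children; MAX_VALUE descends into subdirectories.
        int depth = recursive ? Integer.MAX_VALUE : 1;
        List<Path> candidates;
        // Files.walk holds open directory handles, so close the stream.
        try (Stream<Path> paths = Files.walk(dir, depth)) {
            candidates = paths.filter(Files::isRegularFile)
                              .sorted()
                              .collect(Collectors.toList());
        }
        List<Path> matches = new ArrayList<>();
        for (Path file : candidates) {
            // new String(bytes, UTF_8) replaces malformed bytes instead of throwing.
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (text.contains(keyword)) {
                matches.add(file);
            }
        }
        return matches;
    }
}
```

Reading each file fully into memory keeps the sketch short; a real search over large documents would stream the extracted text instead.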