Friday, 25 March 2016

What are the different Java APIs for accessing different file formats?


MS Access
JDBC/ODBC bridge 
JDBC driver for ODBC databases. It shipped as part of the JDK up to Java 7 and was removed in Java 8. On Linux, you'll have to get ODBC up and running first: http://www.unixodbc.org/

Jackcess 
Library to read and write MDB files

HXTT Access 
Commercial pure Java JDBC driver for MS Access


CGM
cgmva
An applet to display CGM files; comes with source code


CHM
JChm
Library to read CHM files


MS Excel
Apache POI
Library to read and write XLS files

Ostermiller Utils, CSVObjects, CSVBeans, opencsv, Java CSV
Libraries to read and write CSV files. CSV is not as easy to read and write as it first looks - once all the special cases are considered, one might as well use a library.
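
To make the "special cases" concrete, here is a minimal sketch in plain Java (no library) of a single-line CSV splitter that handles quoted fields and doubled quotes. The class and method names are made up for illustration. It assumes a comma delimiter and one record per line; it does not handle line breaks inside quoted fields, other delimiters or character encodings, which is exactly where the libraries above earn their keep.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSketch {

    /** Splits one CSV record into fields, honoring double-quoted fields. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside a quoted field is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are ordinary characters here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        String line = "42,\"Smith, John\",\"He said \"\"hi\"\"\"";
        // A naive split cuts the quoted name in two and yields 4 pieces.
        System.out.println(line.split(",").length);
        // The quote-aware version yields the 3 intended fields.
        System.out.println(parseLine(line).size());
    }
}
```

The naive `split(",")` call is the version that "first looks" sufficient; the quoted comma in the second field is enough to break it.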

JExcelAPI
Library to read and write XLS files.

jXLS
Library for writing XLS files based on templates

Java2Excel  
Library for creating Excel files based on Collections.

It is also possible to use JDBC to read Excel files.

Obba  
works with Excel spreadsheets on Windows

OpenXLS 
OpenXLS is the open source version of ExtenXLS, a Java spreadsheet SDK that allows you to read, modify and create Excel spreadsheets from your Java applications.


Gedcom
GDBI, GenJ


HDF (Hierarchical Data Format)
Java products by the HDF Group


Images
ImageJ 
Java image processing application and library that has plugins for lots of image file formats

JIMI 
Library to read and write BMP, CUR, GIF, ICO, JPEG, PICT, PNG, PSD, Sun Raster, TGA, TIFF, XBM and XPM. There's a plugin for using JIMI with ImageJ, which also includes a couple of JIMI patches.

GIF write, and TIFF, RAW, PNM and JPEG2000 read/write support for ImageIO
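
ImageIO itself is part of the standard library (javax.imageio), and extra formats are picked up as plugins once they are on the classpath. The sketch below, with made-up class and method names, round-trips an image through PNG in memory and lists the formats the current installation can read. Which names show up in that list depends on the JDK and on any plugins installed, so no output is promised here.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class ImageIoSketch {

    /** Encodes an image; ImageIO.write returns false when no writer exists for the format. */
    static byte[] encode(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No ImageIO writer for format: " + format);
        }
        return out.toByteArray();
    }

    /** Decodes image bytes; ImageIO.read returns null when no reader recognizes the data. */
    static BufferedImage decode(byte[] data) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
        if (image == null) {
            throw new IOException("No ImageIO reader recognized the data");
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        img.setRGB(1, 2, 0xFF0000); // one red pixel

        BufferedImage copy = decode(encode(img, "png"));
        System.out.println(copy.getWidth() + "x" + copy.getHeight());

        // Formats readable right now; plugins add entries to this list.
        System.out.println(String.join(", ", ImageIO.getReaderFormatNames()));
    }
}
```

Checking the boolean from `ImageIO.write` and the null from `ImageIO.read` matters in practice, since a missing plugin otherwise fails silently.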


INI
ini4j "is a simple Java API for handling configuration files in Windows .ini format."
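
For a feel of what such a file contains, here is a rough plain-Java sketch that reads sections and key=value pairs into nested maps. This is not ini4j's API; the class name, the method name and the choices made (";" and "#" start a comment, keys before any section header go under the empty section name) are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IniSketch {

    /** Parses INI text into section name -> (key -> value). */
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        String current = ""; // keys that appear before any [section] header
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";") || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                current = line.substring(1, line.length() - 1).trim();
                sections.computeIfAbsent(current, k -> new LinkedHashMap<>());
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Not a key=value line: " + raw);
            }
            sections.computeIfAbsent(current, k -> new LinkedHashMap<>())
                    .put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return sections;
    }

    public static void main(String[] args) {
        String ini = "; sample\n[database]\nhost = localhost\nport=5432\n\n[ui]\ntheme = dark\n";
        System.out.println(parse(ini).get("database").get("port"));
    }
}
```

Quoting, escapes, multi-line values and writing files back out are all left to a real library.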


Matlab
JMatIO - Matlab's MAT-file I/O in JAVA


OpenDocument (ODF)
basic Java code for reading ODF files is here
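
As a rough idea of what such basic code can look like: an ODF document is a ZIP archive whose main content lives in an entry named content.xml, so the standard java.util.zip classes are enough to get at it. The sketch below is not the code linked above; the class and method names are invented, it assumes content.xml is UTF-8 encoded, and its tag stripping is deliberately crude (XML entities are left untouched). The libraries listed next give proper access to the document model.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class OdfContentSketch {

    /** Returns the text of content.xml from an ODF package, or null if the entry is missing. */
    static String readContentXml(InputStream odf) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("content.xml")) {
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buffer.write(chunk, 0, n);
                    }
                    // Assumption: the XML is UTF-8 encoded.
                    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    /** Crude text extraction: replaces every tag with a space and collapses whitespace. */
    static String stripTags(String xml) {
        return xml.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OdfContentSketch some-document.odt
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String xml = readContentXml(in);
            System.out.println(xml == null ? "no content.xml found" : stripTags(xml));
        }
    }
}
```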

ODFDOM 
A Java library for accessing ODF files.

jDocument.org 
has an open-source library for accessing all Open Document file types.

Obba  
works with OpenOffice spreadsheets

Office2FO 
converts ODF documents to XSL-FO documents, making possible further transformations (like conversion to PDF using FOP)


Office Open XML
These are the new XML-based Microsoft Office formats.
OpenXML4J

docx4j 
create and edit docx documents using a JAXB content model matching the WordML schema

Apache POI  
implements these formats.


OpenOffice Java API
OpenOffice can read a number of file formats, and makes them accessible through its API. A starting point might be this article, this article and of course the OO developer site
Some introductory information about the OO file format can be found here and here

oooview 
An OO Viewer written in Java.

JODConverter
A Java library that uses the OO Java API to perform document conversions between any formats supported by OO


Outlook
The Apache POI project developed some code that can read the textual contents of Outlook's MSG files. This page talks about that.

Xena  
converts multiple file formats (including MSG) to XML. Either the result of that conversion, or Xena's source code, may be helpful.

JPST  
can read and extract PST files.


PDF
PDF is a hard format to read programmatically. 
When reading, the best one can usually do is extract the text contained in a PDF file.

iText
Library to create PDFs; see ItextExample for a code example. The older version iText 2 (which uses a more permissive license) is also available: jar file, javadocs

FOP  
Library to create PDFs (and other formats) from XML by using XSL-FO transformations

FlyingSaucer
Library to convert CSS-styled XHTML to PDF

PDFBox
Library that can merge, split and print PDFs, extract text, create images from PDFs, encrypt/decrypt PDFs, fill in PDF forms and more

PDF Clown  
General-purpose library to read/create/modify PDF files. 
It features a rich multi-layered object model that allows access even to each single content stream instruction.

JPedal
Library for viewing and printing PDFs, can also extract text (how to print PDFs); commercial (the LGPL version provides PDF viewing only)

PDFTextStream
Commercial library to extract text from PDFs

Adobe AcrobatViewer for JavaBean
Freeware, library to display and print PDFs
This library hasn't been updated in a long time and has problems displaying files that were created with recent PDF versions.
Don't use this for anything new.

PDF Renderer  
A more up-to-date PDF viewer that renders using Java2D. Download, Examples, Printing PDFs

ICEPdf  
Another library that can render PDFs.

Qoppa  
offers numerous libraries for PDF-related tasks

Aspose.Pdf  
A commercial library for reading and writing PDFs


MS PowerPoint
The Apache POI project developed some code that can open and (to a limited extent) edit PPT files. This page talks about it.


Project
The MPXJ library can work with several Project file formats.


PST
LibPST 
C library that could be used through JNI.

Xena  
can convert multiple file formats (including PST) to XML. 
Either the result of that conversion, or Xena's source code, may be helpful.

java-libpst  
A pure Java library that can access 64-bit PST files.


QIF (used by Microsoft Money and Quicken)
Buddi and Eurobudget are Java applications that can import and export QIF files (and thus contain code you may be able to use in your application). Both are licensed under the GPL.


RTF
jRTF can create RTFs

iText 2 can create RTFs: jar file, javadocs

JavaCC
A lexer/parser for which an RTF grammar is available. From that an RTF reader can be constructed.


MS Visio
The Apache POI project developed some code that can read Visio files.


MS Word
POI  
Library to read and write DOC and DOCX files. It can also be used for extracting the text of a document.

WordApi.exe  
Native Windows component with a Java interface, which lets you create Word documents and alter Word templates. Some impressions about it can be found here.


Something else?
If you encounter an obscure format for which no library is available, it may be feasible to create a reader for it if you have a file format description (which may be available on Wotsit). 
Several tools, so-called lexer and parser generators, are available that help in creating a reader, especially if the file format is text-based (ASCII) rather than binary. 
You will need some knowledge of regular expressions, though. 
File formats that have been tackled using this approach include RTF, CSV, HPGL and PBM/PGM/PPM. Lexers, which only split the input into tokens, are easier to start with, but parsers, which also check how the tokens fit together, can do more of the work for you. 

All these have ready-to-use examples on their web sites.
Lexers: JFlex (introductory article in the JavaRanch Journal)
Parsers: ANTLR, SableCC, JavaCC
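
To illustrate the lexer/parser split without any of these tools, here is a hand-rolled sketch for the plain-text PGM format (magic number P2, then width, height, maximum gray value and the pixel values, with # starting a comment). It uses java.util.regex instead of a generated lexer, the class and method names are invented, and it ignores corner cases of the format, so treat it as a starting point rather than a complete reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PgmLexerSketch {

    // One alternative per token type; comments and whitespace are matched so they can be skipped.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<comment>#[^\\r\\n]*)|(?<space>\\s+)|(?<magic>P[1-6])|(?<number>\\d+)");

    /** Lexer step: turns the raw text into the list of meaningful tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            if (m.group("magic") != null || m.group("number") != null) {
                tokens.add(m.group());
            }
            pos = m.end();
        }
        return tokens;
    }

    /** Parser step: checks the token order required by plain PGM and builds the pixel grid. */
    static int[][] parsePlainPgm(String text) {
        List<String> t = tokenize(text);
        if (t.size() < 4 || !t.get(0).equals("P2")) {
            throw new IllegalArgumentException("Not a plain PGM (P2) file");
        }
        int width = Integer.parseInt(t.get(1));
        int height = Integer.parseInt(t.get(2));
        int maxValue = Integer.parseInt(t.get(3));
        if (t.size() != 4 + width * height) {
            throw new IllegalArgumentException("Expected " + (width * height) + " pixel values");
        }
        int[][] pixels = new int[height][width];
        for (int i = 0; i < width * height; i++) {
            int value = Integer.parseInt(t.get(4 + i));
            if (value > maxValue) {
                throw new IllegalArgumentException("Pixel value above maximum: " + value);
            }
            pixels[i / width][i % width] = value; // row-major order
        }
        return pixels;
    }

    public static void main(String[] args) {
        String pgm = "P2\n# a 3x2 test image\n3 2\n255\n0 128 255\n255 128 0\n";
        System.out.println(Arrays.deepToString(parsePlainPgm(pgm)));
    }
}
```

The tokenizer knows nothing about image dimensions and the parser knows nothing about comments or whitespace. The generators listed above produce that same separation from a grammar file instead of hand-written code.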
