Apache UIMA Sandbox v2.2.2 Release Notes
Contents
1. What is UIMA?
2. What is the Apache UIMA annotator package?
3. Major Changes in this Release
4. How to Get Involved
5. How to Report Issues
6. List of JIRA Issues Fixed in this Release
1. What is UIMA?
Unstructured Information Management applications are
software systems that analyze large volumes of
unstructured information in order to discover knowledge
that is relevant to an end user. UIMA is a framework and
SDK for developing such applications. An example UIM
application might ingest plain text and identify
entities, such as persons, places, organizations; or
relations, such as works-for or located-at. UIMA enables
such an application to be decomposed into components,
for example "language identification" -> "language
specific segmentation" -> "sentence boundary
detection" -> "entity detection (person/place names
etc.)". Each component must implement interfaces defined
by the framework and must provide self-describing
metadata via XML descriptor files. The framework manages
these components and the data flow between them.
Components are written in Java or C++; the data that
flows between components is designed for efficient
mapping between these languages. UIMA additionally
provides capabilities to wrap components as network
services, and can scale to very large volumes by
replicating processing pipelines over a cluster of
networked nodes.
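The component decomposition described above can be pictured with a small, self-contained Java sketch. It does not use the UIMA API. `Document`, `Component` and `PipelineSketch` are hypothetical stand-ins: in UIMA the shared structure is the CAS, and components are described by XML descriptors. The two toy components only mimic the "language identification" and "sentence boundary detection" steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared data structure that flows between
// components (in UIMA this role is played by the CAS).
final class Document {
    final String text;
    final List<String> annotations = new ArrayList<>();

    Document(String text) {
        this.text = text;
    }
}

// Hypothetical component contract; real UIMA annotators implement
// framework-defined interfaces and ship an XML descriptor.
interface Component {
    void process(Document doc);
}

final class PipelineSketch {
    // The "framework" part: create the document once and pass it through
    // each component in order, so every step sees the earlier results.
    static Document run(String text, List<Component> components) {
        Document doc = new Document(text);
        for (Component component : components) {
            component.process(doc);
        }
        return doc;
    }

    // Toy "language identification": always reports English (illustrative only).
    static final Component LANGUAGE_ID = doc -> doc.annotations.add("lang=en");

    // Toy "sentence boundary detection": counts non-blank '.'-separated segments.
    static final Component SENTENCES = doc -> {
        int count = 0;
        for (String segment : doc.text.split("\\.")) {
            if (!segment.trim().isEmpty()) {
                count++;
            }
        }
        doc.annotations.add("sentences=" + count);
    };
}
```

For example, running `PipelineSketch.run("John works for IBM. He lives in Boston.", List.of(PipelineSketch.LANGUAGE_ID, PipelineSketch.SENTENCES))` yields a document whose annotations are `lang=en` followed by `sentences=2`. Components never call each other directly. The runner owns the order, which loosely mirrors how the framework manages the data flow.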
Apache UIMA is an Apache-licensed open source
implementation of the UIMA specification (that
specification is, in turn, being developed concurrently
by a technical committee within OASIS, a standards organization). We invite and encourage you
to participate in both the implementation and
specification efforts.
UIMA is a component framework for analysing unstructured
content such as text, audio and video. It comprises an
SDK and tooling for composing and running analytic
components written in Java and C++, with some support
for Perl, Python and TCL.
2. What is the Apache UIMA annotator package?
The Apache UIMA annotator package is an add-on package for the base UIMA release.
The add-on package contains annotator components developed for Apache UIMA. The
add-on package fits the Apache UIMA directory structure and adds a directory
called "addons/annotator" that contains the following annotator components:
- DictionaryAnnotator
- RegularExpressionAnnotator
- Tagger
- WhitespaceTokenizer
3. Major Changes in this Release
The Apache UIMA annotator package release version 2.2.2 is the first release
of this package. The package contains the following components:
- DictionaryAnnotator
- RegularExpressionAnnotator
- Tagger
- WhitespaceTokenizer
- SimpleServer
- PearPackagingAntTask
- PearPackagingMavenPlugin
For a list of all JIRA issues fixed in the current Sandbox release,
please refer to chapter 6, "List of JIRA Issues Fixed in this Release".
4. How to Get Involved
The Apache UIMA project really needs and appreciates any contributions,
including documentation help, source code and feedback. If you are interested
in contributing, please visit
http://incubator.apache.org/uima/get-involved.html.
5. How to Report Issues
The Apache UIMA project uses JIRA for issue tracking. Please report any
issues you find at
http://issues.apache.org/jira/browse/uima
6. List of JIRA Issues Fixed in this Release
Release Notes - UIMA - Version 2.2.2
Bug
- [UIMA-444] - Sandbox projects build only when in same directory as uimaj projects
- [UIMA-588] - fix RegularExpressionAnnotator tests - add type priorities to get the same results for all JVMs
- [UIMA-612] - add License and Notice files
- [UIMA-613] - remove compiler warnings after moving to Java 1.5
- [UIMA-614] - remove compiler warnings after moving to Java 1.5
- [UIMA-617] - change POM to work with Java 1.5
- [UIMA-620] - switch concept file parsing from File to InputStream
- [UIMA-621] - change the way to add the compiled sources to the PEAR package
- [UIMA-625] - update DictionaryAnnotator message catalog
- [UIMA-646] - remove classpath as required argument for the PEAR packaging plugin
- [UIMA-653] - allow feature normalization also on non-String based features
- [UIMA-725] - case sensitive dictionaries do not work correctly
- [UIMA-757] - Tagger throws ClassCastException
- [UIMA-760] - add regex annotator performance test
- [UIMA-762] - rename xmltypes.jar
- [UIMA-765] - fix email address regex - escape "-" in regular expression
- [UIMA-768] - rename xmltypes.jar
- [UIMA-773] - Some files missing license headers
- [UIMA-775] - fix Findbugs issues
- [UIMA-776] - fix Findbugs issues
- [UIMA-778] - fix Findbugs issues
- [UIMA-795] - dictionaries created by DictionaryCreator cannot be used
- [UIMA-803] - change whitespace character definition
- [UIMA-804] - change default multi token separator from \t to |
- [UIMA-808] - SimpleServerServlet throws NullPointerException if no parameter was specified in doGet or doPost
- [UIMA-812] - Dictionary annotator does not work with several dictionaries in a single descriptor
- [UIMA-819] - rename all shipped jars to start with uima-
- [UIMA-820] - fix classpath entry "null;" if no classpath was specified
- [UIMA-827] - fix NPE for integer-based feature values that are null
- [UIMA-834] - replace special characters with XML entities when generating dictionaries
- [UIMA-855] - java.lang.ArrayIndexOutOfBoundsException in Tagger
- [UIMA-864] - update version from 2.2.2-incubating-SNAPSHOT to 2.2.2-incubating
- [UIMA-882] - rename Tagger XML descriptor and add license header
- [UIMA-883] - Missing license headers in JCas files and some XML files of simple server
- [UIMA-887] - minor update for money detection regular expression
- [UIMA-909] - Model files are contained twice in pear file
- [UIMA-917] - Simple Server test annotator misses last character in text
- [UIMA-940] - change way of deleting files in the PearPackagingMavenPlugin
- [UIMA-942] - Regex performance test doesn't run on Linux
- [UIMA-943] - DictionaryAnnotator tests don't run on Linux
- [UIMA-944] - simple server notice file contains redundant uima reference
- [UIMA-945] - move LICENSE and NOTICE files to toplevel dir for annotator package
- [UIMA-947] - Documentation: resources path in web.xml incorrect
- [UIMA-953] - "\" in regex variables are not escaped
- [UIMA-970] - update annotator package release files
- [UIMA-971] - minor documentation updates for PearPackagingMavenPlugin
- [UIMA-972] - fix jar file name in pear ant task documentation
- [UIMA-973] - annotator package jars do not have correct Manifest information
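UIMA-765 above concerns escaping "-" in the email address regular expression. The Java sketch below illustrates the general pitfall with `java.util.regex`. The pattern is a hypothetical simplification and not the expression shipped with the RegularExpressionAnnotator. Inside a character class, an unescaped `-` between two characters defines a range, so `[\w.-_]` means "word characters plus everything from `.` to `_`" rather than "word characters, dot, hyphen and underscore".

```java
import java.util.regex.Pattern;

final class HyphenEscapeDemo {
    // Unescaped: ".-_" is read as the range '.' (0x2E) to '_' (0x5F). That range
    // contains '@', ':' and '?', but NOT the hyphen '-' (0x2D) itself.
    static final Pattern UNESCAPED = Pattern.compile("[\\w.-_]+");

    // Escaped: "\\-" is a literal hyphen, so the class is exactly
    // word characters, '.', '-' and '_'.
    static final Pattern ESCAPED = Pattern.compile("[\\w.\\-_]+");

    // True if the whole input consists only of characters from the pattern's class.
    static boolean matchesWhole(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }
}
```

With the unescaped class, `first-name` is rejected while `john@example` is accepted as a single token. The escaped class behaves the other way round, which is what an email local-part pattern needs.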
Improvement
- [UIMA-350] - add performance test for WhitespaceTokenizer
- [UIMA-550] - Sandbox components: use UIMA artifacts from the repository
- [UIMA-577] - split up the Sandbox documentation build
- [UIMA-590] - change the way the RegularExpressionAnnotator loads the configuration files
- [UIMA-592] - add feature value normalization for RegEx Annotator
- [UIMA-594] - update RegexAnnotator with custom annotation validator
- [UIMA-602] - add PEAR packaging task for RegexAnnotator
- [UIMA-610] - minor documentation updates - added some real world examples
- [UIMA-615] - update DictionaryBuilder tests to work with XML dictionary formats
- [UIMA-618] - add documentation infrastructure for the DictionaryAnnotator
- [UIMA-631] - Switch dictionary file parsing from File input to InputStream
- [UIMA-634] - improve DictionaryAnnotator exception handling
- [UIMA-635] - add documentation for the PearPackagingMavenPlugin
- [UIMA-637] - add multi-word separator configuration for the DictionaryAnnotator
- [UIMA-644] - update RegexAnnotator tests after test coverage analysis
- [UIMA-647] - add DictionaryAnnotator tests
- [UIMA-666] - update feature normalization interface - add additional information
- [UIMA-691] - add DictionaryCreator command line
- [UIMA-696] - build documentation automatically during the component build
- [UIMA-717] - minor performance improvements
- [UIMA-719] - Current Version of the HMM Tagger
- [UIMA-728] - add money amount detection for regex annotator - use match group names
- [UIMA-753] - Some improvements in the algorithm, structural changes as well as docbook update
- [UIMA-758] - Make the tagger runtime read its properties from the descriptor, not a properties file
- [UIMA-763] - Automatically build PEAR file for Tagger
- [UIMA-779] - Some modifications in the tagger code (esp. in the implementation of the SuffixTree.EDGE class)
- [UIMA-791] - Patch containing some improvements
- [UIMA-806] - use Java NumberFormat to convert string numbers to float or integer
- [UIMA-840] - change uima-as version to 2.2.2-incubating-SNAPSHOT (match uimaj version), add script to update version
- [UIMA-877] - Reverse multiple copyright statements in docbooks, per request at previous release vote
- [UIMA-918] - Fix version number in sandbox docs
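UIMA-806 above mentions using Java's `NumberFormat` to convert string numbers to float or integer values. A minimal sketch of that general technique follows. The class name `NumberNormalization` and the choice of `Locale.US` are assumptions for illustration, not the annotator's actual code.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class NumberNormalization {
    // Parses a number string with a locale-aware NumberFormat (Locale.US is an
    // assumption here), so grouping separators such as "," are accepted.
    // Note: parsing stops at the first character that cannot belong to a number.
    static Number parse(String text) {
        try {
            return NumberFormat.getInstance(Locale.US).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    // Converts the parsed value to a float feature value.
    static float toFloat(String text) {
        return parse(text).floatValue();
    }

    // Converts the parsed value to an integer feature value.
    static int toInt(String text) {
        return parse(text).intValue();
    }
}
```

`NumberFormat.parse` returns a `Long` when the value is integral and a `Double` otherwise. The sketch therefore converts through `Number` instead of casting to a specific type.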
New Feature
- [UIMA-95] - add sandbox infrastructure
- [UIMA-151] - Add project for uima whitespace tokenizer implementation
- [UIMA-384] - create a pear packaging ant task
- [UIMA-539] - implement UIMA RegularExpressionAnnotator
- [UIMA-555] - add documentation for the RegularExpressionAnnotator
- [UIMA-595] - add Rule to the RegexAnnotator to detect credit card numbers
- [UIMA-600] - add new DictionaryAnnotator implementation
- [UIMA-601] - initial import of the PEAR packaging maven plugin
- [UIMA-603] - update Sandbox documentation build
- [UIMA-604] - Create HMM POS project in the sandbox
- [UIMA-605] - UIMA Sandbox tagger initial code drop
- [UIMA-642] - allow RegularExpressionAnnotator to match on featurePath values
- [UIMA-645] - minor code updates for WhitespaceTokenizer
- [UIMA-651] - add regex variables to the concept file syntax
- [UIMA-669] - update WhitespaceTokenizer to be sofa aware
- [UIMA-685] - Create documentation for SimpleServer
- [UIMA-692] - allow DictionaryAnnotator to match on featurePath values
- [UIMA-695] - allow DictionaryAnnotator to filter the inputMatch annotations
- [UIMA-697] - add DictionaryAnnotator documentation
- [UIMA-724] - allow match group names for regular expressions
- [UIMA-770] - add PEAR build to WhitespaceTokenizer POM
- [UIMA-771] - call documentation build from POM
- [UIMA-772] - add new Sandbox-dist project that contains the Sandbox build
- [UIMA-884] - Add default output abilities to simple server
- [UIMA-907] - Add SimpleServer to sandbox distribution
Task
- [UIMA-682] - update Sandbox components to work on the new uimaj-2.2.1-incubating release