STAM Library for Python - API Documentation
STAM is a data model for stand-off text annotation, described in detail here. This is a Python library (more specifically, a Python binding written in Rust) for working with the model.
What can you do with this library?
- Keep, build and manipulate an efficient in-memory store of texts and annotations on texts.
- Search in annotations, data and text:
  - Search annotations by data, textual content, or relations between text fragments (overlap, embedding, adjacency, etc.).
  - Search in text (including via regular expressions) and find annotations targeting the found text selections.
  - Search in data (set, key, value) and find annotations that use the data.
- Perform elementary text operations with regard for text offsets (splitting text on a delimiter, stripping text).
- Convert between different kinds of offsets (absolute, relative to other structures, UTF-8 bytes vs Unicode codepoints, etc.).
- Read and write resources and annotations from/to STAM JSON, STAM CSV, or an optimised binary (CBOR) representation.
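To illustrate why converting between offset kinds matters, here is a minimal sketch in plain Python. It uses only the standard library and is not this library's API; the function names `codepoint_to_byte_offset` and `byte_to_codepoint_offset` are hypothetical and chosen for this example. It maps Unicode codepoint offsets to UTF-8 byte offsets and back for a text containing a multi-byte character.

```python
def codepoint_to_byte_offset(text: str, cp_offset: int) -> int:
    """Return the UTF-8 byte offset that corresponds to a codepoint offset."""
    if not 0 <= cp_offset <= len(text):
        raise ValueError("codepoint offset out of range")
    # Encode only the prefix; its byte length is the byte offset.
    return len(text[:cp_offset].encode("utf-8"))


def byte_to_codepoint_offset(text: str, byte_offset: int) -> int:
    """Return the codepoint offset that corresponds to a UTF-8 byte offset."""
    data = text.encode("utf-8")
    if not 0 <= byte_offset <= len(data):
        raise ValueError("byte offset out of range")
    # Decoding raises UnicodeDecodeError if the offset falls inside
    # a multi-byte character, i.e. it is not a valid character boundary.
    return len(data[:byte_offset].decode("utf-8"))


text = "café au lait"
begin, end = 5, 7  # codepoint offsets selecting "au"

byte_begin = codepoint_to_byte_offset(text, begin)
byte_end = codepoint_to_byte_offset(text, end)

# The same selection, expressed in both kinds of offsets.
selected_by_codepoints = text[begin:end]
selected_by_bytes = text.encode("utf-8")[byte_begin:byte_end].decode("utf-8")
```

Because "é" occupies two bytes in UTF-8, the byte offsets of everything after it are shifted by one relative to the codepoint offsets. Mixing the two kinds silently selects the wrong text, which is why an explicit conversion step is useful.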
The underlying STAM model aims to be clear and simple. It is flexible and does not commit to any vocabulary or annotation paradigm other than stand-off annotation.
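The general idea of stand-off annotation with a free vocabulary can be sketched in plain Python. This is a conceptual illustration only, not the STAM data model and not this library's API: the class `StandoffAnnotation`, the helper `annotated_text` and the vocabulary name `my-vocab` are all hypothetical. The point it shows is that the text is never modified, and annotations merely point at offsets and carry user-defined (set, key, value) data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffAnnotation:
    """A hypothetical, simplified stand-off annotation (not the STAM model)."""
    begin: int    # start offset in Unicode codepoints (inclusive)
    end: int      # end offset in Unicode codepoints (exclusive)
    dataset: str  # name of a user-defined vocabulary (the "set")
    key: str      # a key within that vocabulary
    value: str    # the value assigned to the key


# The text is stored once and left untouched; no inline markup is added.
text = "Hello world"

# Annotations live apart from the text and reference it by offset.
annotations = [
    StandoffAnnotation(0, 5, "my-vocab", "pos", "interjection"),
    StandoffAnnotation(6, 11, "my-vocab", "pos", "noun"),
]


def annotated_text(text: str, annotation: StandoffAnnotation) -> str:
    """Return the text fragment an annotation points at."""
    return text[annotation.begin:annotation.end]


# Search in data, then resolve the matching annotations back to text.
nouns = [
    annotated_text(text, a)
    for a in annotations
    if a.key == "pos" and a.value == "noun"
]
```

Nothing in this sketch prescribes what `pos` or `noun` mean. The vocabulary is supplied entirely by the user, which is the sense in which a stand-off model can stay independent of any particular annotation paradigm.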
This STAM library is intended as a foundation upon which further applications can be built that deal with stand-off annotations on text. We implement all the low-level logic in dealing with this so you no longer have to and can focus on your actual application.
This library offers a higher-level interface than the underlying Rust library. We aim to implement the full model and most extensions.
Tutorial
A tutorial for working with this API is available in the form of an interactive Jupyter Notebook: STAM Tutorial: Standoff Text Annotation for Pythonistas.