stam
Module Contents
Classes
Annotation – Annotation represents a particular instance of annotation and is the central concept of the model.
AnnotationData – AnnotationData holds the actual content of an annotation; a key/value pair.
AnnotationDataSet – An AnnotationDataSet stores the keys (DataKey) and values (AnnotationData) that are used by annotations.
AnnotationStore – An Annotation Store is a collection of annotations, resources and annotation data sets.
Annotations – An Annotations object holds an arbitrary collection of annotations.
Cursor – A cursor points to a specific point in a text. It is used to select offsets. Units are Unicode codepoints (not bytes!)
Data – A Data object holds an arbitrary collection of annotation data.
DataKey – The DataKey class defines a vocabulary field.
DataValue – Encapsulates a value and its type.
Offset – Text selection offset. Specifies begin and end offsets to select a range of a text.
Selector – A Selector identifies the target of an annotation.
SelectorKind – An enumeration of possible selector types.
TextResource – This holds the textual resource to be annotated. It holds the full text in memory.
TextSelection – This holds a slice of a text.
TextSelectionOperator – The TextSelectionOperator, simply put, allows comparison of text selections.
TextSelections – A TextSelections object holds an arbitrary collection of text selections.
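Since the Cursor summary stresses that offsets are counted in Unicode codepoints and not in bytes, a small plain-Python illustration of the difference may help. It does not use the stam library at all; the sample string and the offsets are made up for this sketch.

```python
# Illustration only: plain Python strings, no stam objects involved.
text = "na\u00efve caf\u00e9"  # "naive cafe" with two accented characters

codepoints = len(text)                  # length in Unicode codepoints
utf8_bytes = len(text.encode("utf-8"))  # length in bytes once encoded as UTF-8

# Slicing a Python str is codepoint-based too, so (begin, end) = (6, 10)
# selects the second word regardless of how many bytes precede it.
begin, end = 6, 10
selected = text[begin:end]

print(codepoints, utf8_bytes, selected)
```

The two lengths diverge as soon as the text contains non-ASCII characters, which is why byte offsets taken from another tool cannot be reused as codepoint offsets without conversion.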
- class stam.Annotation
Annotation represents a particular instance of annotation and is the central concept of the model. Annotations can be considered the primary nodes of the graph model. The instance of annotation is strictly decoupled from the data or key/value of the annotation (AnnotationData). After all, multiple instances can be annotated with the same label (multiple annotations may share the same annotation data). Moreover, an Annotation can have multiple annotation data associated. The result is that multiple annotations with the exact same content require less storage space, and searching and indexing is facilitated.
This structure is not instantiated directly, only returned. Use AnnotationStore.annotate() to instantiate a new Annotation.
- __iter__() Iterator[AnnotationData]
Returns an iterator over all data (AnnotationData) in this annotation. This has little overhead but is less suitable if you want to do further filtering; use data() instead for that.
- Return type:
Iterator[AnnotationData]
- __len__() int
Returns the number of data items (AnnotationData) in this annotation.
- Return type:
int
- __str__() str
Returns the text of the annotation. If the annotation references multiple text slices, they will be concatenated with a space as a delimiter, but note that in reality the different parts may be non-contiguous!
Use text() instead to retrieve a list of texts.
- Return type:
str
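The difference between the space-joined string form and the list form can be pictured with plain Python slicing. This is only a sketch of the behaviour described above, not the library's code; the text and the (begin, end) offsets are invented for the illustration.

```python
# Plain-Python illustration; the text and offsets are made up for the example.
resource_text = "Hello wonderful world"

# Two separate slices of the text, given as (begin, end) codepoint offsets.
slices = [(0, 5), (16, 21)]

parts = [resource_text[begin:end] for begin, end in slices]  # analogous to text()
joined = " ".join(parts)                                     # analogous to str()

print(parts)
print(joined)
```

Note that the joined string does not occur anywhere in the underlying text, which is the caveat the description of `__str__()` warns about.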
- annotations(*args, **kwargs) Annotations
Returns annotations (Annotations containing Annotation instances) that are referring to this annotation (i.e. others using an AnnotationSelector).
The annotations can be filtered using positional and/or keyword arguments.
- Parameters:
*args (tuple, optional) –
These arguments can be any of the following types:
DataKey
Returns annotations with data matching this key.
AnnotationData
Returns only annotations that have this exact data.
Annotations | Annotation
Returns only annotations that match any of those specified here.
Data | AnnotationData
Returns only annotations with data matching any of those specified here.
**kwargs (dict, optional) –
- limit: (Optional[int] = None)
The maximum number of results to return (default: unlimited)
- set: (Optional[Union[str,AnnotationDataSet]] = None)
An ID of a dataset (or an AnnotationDataSet instance), only needed when specifying key as a string
- key: (Optional[Union[str,DataKey]] = None)
An ID of a key (or a DataKey instance); make sure to specify set as well if you use a string value for this parameter.
- Return type:
Annotations
Example
Filter by data key and value:
    key = store.dataset("linguistic-set").key("part-of-speech")
    for annotation in store.annotations(key, value="noun"):
        ...
But if you already have the key, like in the example above, you may just as well do (more efficient):
    for annotation in key.annotations(value="noun"):
        ...
- annotations_in_targets(*args, **kwargs) Annotations
Returns annotations (Annotations containing Annotation instances) this annotation refers to (i.e. using an AnnotationSelector).
The annotations can be filtered using positional and/or keyword arguments; see annotations() for full documentation. One extra keyword argument is available for this method (see below).
Annotations will be returned in textual order unless recursive is set or a DirectionalSelector is involved.
- Keyword Arguments:
recursive (bool) – Follow AnnotationSelectors recursively (default False)
- Return type:
Annotations
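What "following AnnotationSelectors recursively" amounts to can be sketched as a graph traversal. The snippet below is a plain-Python model under stated assumptions, not the library's implementation: the `targets` mapping and the function name `targeted_annotations` are hypothetical, and the depth-first visiting order is a choice made for this sketch only.

```python
from typing import Dict, List

# Hypothetical stand-in: annotation ID -> IDs of the annotations it targets.
targets: Dict[str, List[str]] = {
    "phrase": ["word1", "word2"],
    "word1": ["token1"],
    "word2": [],
    "token1": [],
}


def targeted_annotations(annotation_id: str, recursive: bool = False) -> List[str]:
    """Return targeted annotation IDs, optionally following targets of targets."""
    found: List[str] = []
    seen = set()
    # A stack gives a depth-first walk; reversed() keeps the listed order.
    stack = list(reversed(targets.get(annotation_id, [])))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # never report the same annotation twice
        seen.add(current)
        found.append(current)
        if recursive:
            stack.extend(reversed(targets.get(current, [])))
    return found


print(targeted_annotations("phrase"))
print(targeted_annotations("phrase", recursive=True))
```

Without `recursive` only the direct targets are reported; with it, indirect targets are interleaved with the direct ones, which gives an intuition for why a plain textual ordering is not guaranteed in that mode.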
- data(*args, **kwargs) Data
Returns annotation data (Data containing AnnotationData) used by this annotation.
The data can be filtered using keyword arguments. If you don’t care for any filtering and just want a simple iterator over the data, then just iterating over the annotation directly (__iter__()) will be more efficient. Do note that implementing any filtering yourself in Python is much less performant than letting this data method do it for you.
- Parameters:
*args (tuple, optional) –
Filter arguments, these can be of the following types:
DataKey
Returns data matching this key
Annotation
Returns data referenced by the mentioned annotation
AnnotationData
Returns only this exact data. Not very useful, use test_data() instead.
Annotations | Annotation
Returns data referenced by annotations in the provided collection.
**kwargs (dict, optional) –
- limit: Optional[int] = None
The maximum number of results to return (default: unlimited)
- set: Optional[Union[str,AnnotationDataSet]] = None
An ID of a dataset (or an AnnotationDataSet instance), only needed when specifying key as a string
- key: Optional[Union[str,DataKey]] = None
An ID of a key (or a DataKey instance); make sure to specify set as well if you use a string value for this parameter.
- value: Optional[Union[str,int,float,bool,List[Union[str,int,float,bool]]]]
Search for data matching a specific value. This holds the exact value to search for. Further variants of this keyword are listed below:
- value_not: Optional[Union[str,int,float,bool]]
Value must not match
- value_greater: Optional[Union[int,float]]
Value must be greater than specified (int or float)
- value_less: Optional[Union[int,float]]
Value must be less than specified (int or float)
- value_greatereq: Optional[Union[int,float]]
Value must be greater than specified or equal (int or float)
- value_lesseq: Optional[Union[int,float]]
Value must be less than specified or equal (int or float)
- value_in: Optional[Tuple[Union[str,int,float,bool]]]
Value must match any in the tuple (this is a logical OR statement)
- value_not_in: Optional[Tuple[Union[str,int,float,bool]]]
Value must not match any in the tuple
- value_in_range: Optional[Tuple[Union[int,float]]]
Must be a numeric 2-tuple with min and max (inclusive) values
- Return type:
Data
Example
Get all part-of-speech data pertaining to this annotation:
    key = store.dataset("linguistic-set").key("part-of-speech")
    for data in annotation.data(filter=key):
        ...
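The meaning of the value keyword variants listed for data() can be modelled in a few lines of plain Python. This is a reading of the descriptions above and not the library's implementation: the helper name `value_matches` is hypothetical, `value` is treated as a simple equality test (the list form is not modelled), and combining several keywords as a logical AND is an assumption of this sketch.

```python
from typing import Any


def value_matches(candidate: Any, **filters: Any) -> bool:
    """Return True if candidate satisfies every value_* constraint given."""
    checks = {
        "value": lambda v, arg: v == arg,
        "value_not": lambda v, arg: v != arg,
        "value_greater": lambda v, arg: v > arg,
        "value_less": lambda v, arg: v < arg,
        "value_greatereq": lambda v, arg: v >= arg,
        "value_lesseq": lambda v, arg: v <= arg,
        "value_in": lambda v, arg: v in arg,          # logical OR over the tuple
        "value_not_in": lambda v, arg: v not in arg,
        "value_in_range": lambda v, arg: arg[0] <= v <= arg[1],  # inclusive bounds
    }
    for name, arg in filters.items():
        if name not in checks:
            raise TypeError(f"unknown filter keyword: {name}")
        if not checks[name](candidate, arg):
            return False
    return True


print(value_matches(5, value_in_range=(1, 5)))
print(value_matches("noun", value_in=("noun", "verb")))
```

The point of letting the library evaluate these constraints, rather than a Python loop like this one, is the performance remark made in the description of data().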
- datasets(limit: int | None = None) List[AnnotationDataSet]
Returns a list of annotation data sets (AnnotationDataSet) this annotation refers to. This only returns the ones referred to via a DataSetSelector, i.e. as metadata.
- Parameters:
limit (Optional[int] = None) – The maximum number of results to return (default: unlimited)
- Return type:
List[AnnotationDataSet]
- id() str | None
Returns the public ID (by value, aka a copy). Don’t use this for extensive ID comparisons; use has_id() instead as it is more performant (no copy).
- Return type:
Optional[str]
- offset() Offset | None
Returns the offset this annotation’s selector targets, exactly as specified.
- Return type:
Optional[Offset]
- related_text(operator: TextSelectionOperator, *args, **kwargs) TextSelections
Applies a TextSelectionOperator to find all other text selections that are in a specific relation with the ones from the current annotation. Returns a collection (TextSelections) containing all matching TextSelection instances.
Text selections will be returned in textual order. They may be filtered via positional and/or keyword arguments. See Annotation.textselections().
If you are interested in the annotations associated with the found text selections, then add .annotations() to the result.
- Parameters:
operator (TextSelectionOperator) – The operator to apply when comparing text selections
- Keyword Arguments:
limit (Optional[int] = None) – The maximum number of results to return (default: unlimited)
- Return type:
TextSelections
See Annotation.textselections() for further keyword arguments to filter.
Examples
Find all text selections that overlap with the annotation:
    for textselection in annotation.related_text(TextSelectionOperator.overlaps()):
        ...
If you want to get the annotations instead, just add .annotations():
    for annotations in annotation.related_text(TextSelectionOperator.overlaps()).annotations():
        ...
Assume sentence is an annotation representing a sentence; we can find text selections inside (embedded in) the sentence as follows:
    for textselection in sentence.related_text(TextSelectionOperator.embeds()):
        ...
Like above, but now we actively look for annotations that are marked as words, effectively selecting all words in a sentence:
    data_word = store.dataset("structural-set").key("type").data(value="word", limit=1)[0]
    for word in sentence.related_text(TextSelectionOperator.embeds()).annotations(filter=data_word):
        ...
- resources(limit: int | None = None) List[TextResource]
Returns a list of resources (TextResource) this annotation refers to.
- Parameters:
limit (Optional[int] = None) – The maximum number of results to return (default: unlimited)
- Return type:
List[TextResource]
- selector_kind() SelectorKind
Returns the type of the selector of this annotation.
- Return type:
SelectorKind
- target() Selector
Returns the target selector (Selector) for this annotation. This is mainly useful if you want to add another annotation pointing to the same target.
- Return type:
Selector
- test_annotations(*args, **kwargs) bool
Tests whether there are annotations (Annotations containing Annotation) that are referring to this annotation (i.e. others using an AnnotationSelector). This method is like annotations(), but only tests and does not return the annotations; as such it is more performant.
The annotations can be filtered using keyword arguments. See Annotation.annotations().
Example
Filter by data key and value:
    key = store.dataset("linguistic-set").key("part-of-speech")
    for annotation in store.annotations_in_targets(filter=key, value="noun"):
        ...
- Return type:
bool
- test_data(*args, **kwargs) bool
Tests whether certain annotation data is used by this annotation. The data can be filtered using positional and/or keyword arguments. See data(). Unlike data(), this method merely tests without returning the data, and as such is more performant.
- Return type:
bool
- text() List[str]
Returns the text of the annotation. Note that this will always return a list (even if it only contains a single element), as an annotation may reference multiple texts.
If you are sure an annotation only references a single contiguous text slice, or are okay with slices being concatenated, then you can use the str() function instead.
- Return type:
List[str]
- textselections(**kwargs) TextSelections
Returns a collection of all text selections (TextSelection) referenced by the annotation (i.e. via a TextSelector). Note that this will always return a collection (even if it only contains a single element), as an annotation may reference multiple text selections.
Text selections will be returned in textual order, except if a DirectionalSelector was used.
Text selections may be filtered using the following positional and/or keyword arguments:
- Parameters:
*args (tuple, optional) –
Filter arguments, can be of the following types:
DataKey
Returns text selections referenced by annotations with data matching this key
AnnotationData
Returns text selections referenced by annotations that have this exact data
Annotations | Annotation
Returns text selections referenced by any annotations that are already in the provided Annotations collection (intersection)
Data | AnnotationData
Returns only text selections referenced by annotations with data that is in the provided collection.
**kwargs (dict, optional) –
- limit: Optional[int] = None
The maximum number of results to return (default: unlimited)
- value: Optional[Union[str,int,float,bool]]
Constrain the search to text selections referenced by annotations with data of a certain value. This is usually used together with passing a DataKey as filter in the positional arguments. This holds the exact value to search for; there are other variants of this keyword available, see data() for a full list.
- Return type:
TextSelections
- class stam.AnnotationData
AnnotationData holds the actual content of an annotation: a key/value pair (the term feature is regularly seen for this in certain annotation paradigms). Annotation Data is deliberately decoupled from the actual Annotation instances so multiple annotation instances can point to the same content without causing any overhead in storage. Moreover, it facilitates indexing and searching. The annotation data is part of an AnnotationDataSet, which effectively defines a certain user-defined vocabulary.
Once instantiated, instances of this type are, by design, largely immutable. The key and value cannot be changed. Create a new AnnotationData and new Annotation for edits. This class is not instantiated directly.
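The decoupling described here, where several annotations point at one immutable key/value pair, can be modelled with ordinary dataclasses. The classes `DataItem` and `Note` below are hypothetical stand-ins invented for this sketch; they are not the stam classes and say nothing about how the library stores things internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataItem:
    """Hypothetical stand-in for an immutable key/value pair."""
    key: str
    value: str


@dataclass
class Note:
    """Hypothetical stand-in for an annotation that references data."""
    target: str
    data: List[DataItem] = field(default_factory=list)


noun = DataItem(key="pos", value="noun")

# Both annotations reference the same DataItem object; nothing is copied.
first = Note(target="world", data=[noun])
second = Note(target="store", data=[noun])
shared = first.data[0] is second.data[0]

# Because DataItem is frozen, changing it in place is rejected;
# an "edit" means creating a new item (and a new annotation) instead.
try:
    noun.value = "verb"  # type: ignore[misc]
    mutated = True
except AttributeError:
    mutated = False

print(shared, mutated)
```

Sharing one object is what keeps storage flat when many annotations carry the same label, and immutability is what makes that sharing safe.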
- annotations(*args, **kwargs) Annotations
Returns annotations (Annotations containing Annotation) that make use of this data.
The annotations can be filtered using positional and/or keyword arguments.
- Parameters:
*args (tuple, optional) –
Filter arguments, can be any of the following types:
DataKey
Returns annotations with data matching this key.
AnnotationData
Returns only annotations that have this exact data.
Annotations | Annotation
Returns only annotations that match any of those specified here.
Data | AnnotationData
Returns only annotations with data matching any of those specified here.
**kwargs (dict, optional) –
- limit: (Optional[int] = None)
The maximum number of results to return (default: unlimited)
- set: (Optional[Union[str,AnnotationDataSet]] = None)
An ID of a dataset (or an AnnotationDataSet instance), only needed when specifying key as a string
- key: (Optional[Union[str,DataKey]] = None)
An ID of a key (or a DataKey instance); make sure to specify set as well if you use a string value for this parameter.
- value: (Optional[Union[str,int,float,bool]])
Constrain the search to annotations with data of a certain value. This can only be used when you also pass a DataKey as filter. This holds the exact value to search for; there are other variants of this keyword available, see data() for a full list.
- Return type:
Annotations
- annotations_len(limit: int | None = None) int
Returns the number of annotations (Annotation) that use this data. Note that this is much faster than doing len(annotations())!
- Parameters:
limit (Optional[int] = None) – The maximum number of results to return (default: unlimited)
- Return type:
int
- dataset() AnnotationDataSet
Returns the AnnotationDataSet this data is part of.
- Return type:
AnnotationDataSet
- id() str | None
Returns the public ID (by value, aka a copy). Don’t use this for extensive ID comparisons; use has_id() instead as it is more performant (no copy).
- Return type:
Optional[str]
- test_annotations(*args, **kwargs) bool
Tests whether there are any annotations that make use of this data. This method is like annotations(), but only tests and does not return the annotations; as such it is more performant.
The annotations can be filtered using keyword arguments. See Annotation.annotations().
- Return type:
bool
- class stam.AnnotationDataSet
An AnnotationDataSet stores the keys (DataKey) and values (AnnotationData, which in turn encapsulates DataValue) that are used by annotations.
It effectively defines a certain vocabulary, i.e. key/value pairs. The AnnotationDataSet does not store the Annotation instances; those are in the AnnotationStore. The datasets themselves are also held by the AnnotationStore.
Use AnnotationStore.add_dataset() to instantiate a new AnnotationDataSet; it cannot be constructed directly.
- __iter__() Iterator[AnnotationData]
Returns an iterator over all AnnotationData in the dataset. If you want to do any filtering, use data() instead.
- Return type:
Iterator[AnnotationData]
- add_data(key: str, value: DataValue | str | float | int | list | bool, id: str | None = None) AnnotationData
Create a new AnnotationData instance and add it to the dataset. Returns the added data.
- annotationdata(id: str) AnnotationData
Basic retrieval method to obtain annotation data from a dataset, by ID.
- Parameters:
id (str) –
- Return type:
AnnotationData
- data(*args, **kwargs) Data
Returns annotation data (Data containing AnnotationData) in this dataset.
The data can be filtered using positional and/or keyword arguments. See Annotation.data(). If you don’t intend to do any filtering at all, then just using __iter__() may be faster.
- Return type:
Data
- id() str | None
Returns the public ID (by value, aka a copy). Don’t use this for extensive ID comparisons; use has_id() instead as it is more performant (no copy).
- Return type:
Optional[str]
- keys() Iterator[DataKey]
Returns an iterator over all DataKey instances in the dataset.
- Return type:
Iterator[DataKey]
- select() Selector
Returns a selector pointing to this annotation dataset (via a DataSetSelector).
- Return type:
Selector
- test_data(*args, **kwargs) bool
Tests whether certain annotation data exists in this set. The data can be filtered using positional and/or keyword arguments. See Annotation.data(). This method is like data(), but merely tests without returning the data, and as such is more performant.
- Return type:
bool
- class stam.AnnotationStore(id=None, file=None, string=None, config=None)
An Annotation Store is a collection of annotations, resources and annotation data sets. It can be seen as the root of the graph model and the glue that holds everything together. It is the entry point for any STAM model.
To instantiate an AnnotationStore, at least one of id, file or string must be specified as keyword arguments:
- Keyword Arguments:
id (Optional[str], default: None) – The public ID for a new store
file (Optional[str], default: None) – The STAM JSON, STAM CSV or STAM CBOR file to load
string (Optional[str], default: None) – STAM JSON as a string
config (Optional[dict]) –
A Python dictionary containing configuration parameters:
- use_include: Optional[bool], default: True
Use the @include mechanism to point to external files, if unset, all data will be kept in a single STAM JSON file.
- debug: Optional[bool], default: False
Enable debug mode, outputs extra information to standard error output (verbose!)
- annotation_annotation_map: Optional[bool], default: True
Enable/disable index for annotations that reference other annotations
- resource_annotation_map: Optional[bool], default: True
Enable/disable reverse index for TextResource => Annotation. Holds only annotations that directly reference the TextResource (via a ResourceSelector), i.e. metadata
- dataset_annotation_map: Optional[bool], default: True
Enable/disable reverse index for AnnotationDataSet => Annotation. Holds only annotations that directly reference the AnnotationDataSet (via DataSetSelector), i.e. metadata
- key_annotation_metamap: Optional[bool], default: True
Enable/disable reverse index for DataKey => Annotation. Holds only annotations that directly reference the DataKey (via DataKeySelector), i.e. metadata
- data_annotation_metamap: Optional[bool], default: True
Enable/disable reverse index for AnnotationData => Annotation. Holds only annotations that directly reference the AnnotationData (via AnnotationDataSelector), i.e. metadata
- textrelationmap: Optional[bool], default: True
Enable/disable the reverse index for text, it maps TextResource => TextSelection => Annotation
- generate_ids: Optional[bool], default: False
Generate pseudo-random public identifiers when missing (during deserialisation). Each will consist of 21 URL-friendly ASCII symbols after a prefix of A for Annotations, S for DataSets, D for AnnotationData, R for resources
- strip_temp_ids: Optional[bool], default: True
Strip temporary IDs during deserialisation. Temporary IDs start with an exclamation mark, a capital ASCII letter denoting the type, and a number
- shrink_to_fit: Optional[bool], default: True
Shrink data structures to optimize memory (at the cost of longer deserialisation times)
- milestone_interval: Optional[int], default: 100
Milestone placement interval (in unicode codepoints) in indexing text resources. A low number above zero increases search performance at the cost of memory and increased initialisation time.
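The trade-off behind milestone_interval can be pictured with a small plain-Python model. It rests on an assumption that is not stated in the text above, namely that milestones record the byte position of every N-th codepoint so that a codepoint offset can be resolved by jumping to the nearest milestone and scanning forward. The function names are hypothetical and this is not the library's code.

```python
from typing import List


def build_milestones(text: str, interval: int) -> List[int]:
    """Record the UTF-8 byte offset of every interval-th codepoint."""
    milestones: List[int] = []
    byte_pos = 0
    for index, char in enumerate(text):
        if index % interval == 0:
            milestones.append(byte_pos)
        byte_pos += len(char.encode("utf-8"))
    return milestones


def codepoint_to_byte(data: bytes, milestones: List[int], interval: int, codepoint: int) -> int:
    """Resolve a codepoint offset to a byte offset via the nearest earlier milestone."""
    if not milestones:
        return 0
    index = min(codepoint // interval, len(milestones) - 1)
    pos = milestones[index]
    remaining = codepoint - index * interval
    while remaining > 0:
        pos += 1
        # Skip UTF-8 continuation bytes (bit pattern 10xxxxxx).
        while pos < len(data) and (data[pos] & 0xC0) == 0x80:
            pos += 1
        remaining -= 1
    return pos


text = "h\u00e9llo w\u00f6rld"
data = text.encode("utf-8")
milestones = build_milestones(text, interval=4)
print(milestones, codepoint_to_byte(data, milestones, 4, 7))
```

In this model a smaller interval means more stored milestones (more memory, longer set-up) but fewer bytes to scan per lookup, which mirrors the trade-off named in the parameter description.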
Example
Load a store from file:
    from stam import AnnotationStore, Selector, Offset

    store = AnnotationStore(file="hamlet.store.json")
Instantiate a store from scratch and populate it with a resource and annotation:
    store = AnnotationStore(id="test")
    resource = store.add_resource(id="testres", text="Hello world")
    store.annotate(id="A1",
                   target=Selector.textselector(resource, Offset.simple(6, 11)),
                   data={"id": "D1", "key": "pos", "value": "noun", "set": "testdataset"})
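The ID conventions mentioned under the strip_temp_ids and generate_ids options can be made concrete with a short sketch. The regular expression is read directly off the description of temporary IDs; the alphabet used for generated IDs is an assumption (the text only says "URL-friendly ASCII symbols"), and both helper names are hypothetical rather than part of the library.

```python
import re
import secrets
import string

# Read off the description: "!", one capital ASCII letter, then a number.
TEMP_ID = re.compile(r"^![A-Z][0-9]+$")

# Assumed alphabet; the actual set of URL-friendly symbols is not given in the text.
ALPHABET = string.ascii_letters + string.digits + "_-"


def is_temp_id(public_id: str) -> bool:
    """Return True if the ID has the shape of a temporary ID."""
    return TEMP_ID.match(public_id) is not None


def generate_id(prefix: str) -> str:
    """Return prefix followed by 21 random URL-friendly symbols."""
    return prefix + "".join(secrets.choice(ALPHABET) for _ in range(21))


print(is_temp_id("!A42"), is_temp_id("A1"))
print(generate_id("A"))
```

Here the prefix would be A for annotations, S for datasets, D for annotation data and R for resources, as listed for generate_ids.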
- __iter__() Iterator[Annotation]
Returns an iterator over all annotations (Annotation) in this store.
This iterator has little runtime overhead but does not provide any filtering options. Use annotations() instead if you plan to do any filtering, or use the equally named method on other objects for more constrained and filterable annotations (e.g. DataKey.annotations(), AnnotationDataSet.annotations(), TextResource.annotations()).
- Return type:
Iterator[Annotation]
- add_dataset(id: str) AnnotationDataSet
Create a new AnnotationDataSet and add it to the store. Returns the added instance.
- Parameters:
id (str) –
- Return type:
AnnotationDataSet
- add_resource(filename: str | None = None, text: str | None = None, id: str | None = None) TextResource
Create a new TextResource and add it to the store. Returns the added instance.
- Parameters:
filename (Optional[str]) –
text (Optional[str]) –
id (Optional[str]) –
- Return type:
TextResource
- annotate(target: Selector, data: dict | List[dict] | AnnotationData | List[AnnotationData], id: str | None = None) Annotation
Adds a new annotation. Returns the Annotation instance that was just created.
- Parameters:
target (Selector) – A target selector that determines the object of annotation
data (Union[dict,List[dict],AnnotationData,List[AnnotationData]]) – A dictionary or list of dictionaries with data to set. The dictionary may have fields: id (optional), key, set, and value. Alternatively, you can pass an existing AnnotationData instance.
id (Optional[str]) – The public ID for the annotation. If unset, one may be autogenerated if this was explicitly enabled in the configuration.
- Return type:
Annotation
Example
Add an annotation on an existing resource in the store:
    store.annotate(id="A1",
                   target=Selector.textselector(store.resource("testres"), Offset.simple(6, 11)),
                   data={"id": "D1", "key": "pos", "value": "noun", "set": "testdataset"})
- annotation(id: str) Annotation
Basic retrieval method that returns an Annotation by ID. Raises an exception if not found.
- Parameters:
id (str) –
- Return type:
Annotation
- annotationdata(set_id: str, data_id: str) AnnotationData
Shortcut retrieval method that returns an AnnotationData by ID.
- Parameters:
set_id (str) –
data_id (str) –
- Return type:
AnnotationData
- annotations(*args, **kwargs) Annotations
Returns an iterator over all annotations (Annotation) in this store.
Filtering can be applied using positional arguments and/or keyword arguments. It is recommended to only use this method if you apply further filtering, otherwise the memory overhead may be very large if you have many annotations. If you need no filtering, you can fall back to the more low-level iterator, __iter__(), instead.
- Parameters:
*args (tuple, optional) –
Filter arguments. These can be any of the following types:
DataKey
Returns annotations with data matching this key.
AnnotationData
Returns only annotations that have this exact data.
Annotations | Annotation
Returns only annotations that match any of those specified here.
Data | AnnotationData
Returns only annotations with data matching any of those specified here.
**kwargs (dict, optional) –
- limit: (Optional[int] = None)
The maximum number of results to return (default: unlimited)
- set: (Optional[Union[str,AnnotationDataSet]] = None)
An ID of a dataset (or an AnnotationDataSet instance), only needed when specifying key as a string
- key: (Optional[Union[str,DataKey]] = None)
An ID of a key (or a DataKey instance); make sure to specify set as well if you use a string value for this parameter.
- Return type:
Annotations
- annotations_len() int
Returns the number of annotations in the store (not subtracting deletions).
- Return type:
int
- data(*args, **kwargs) Data
Returns an iterator over all data (AnnotationData) in this store.
Filtering can be applied using positional arguments and/or keyword arguments. It is recommended to only use this method if you apply further filtering, otherwise the memory overhead may be very large if you have a lot of data.
- Parameters:
*args (tuple, optional) –
Filter arguments, these can be of the following types:
DataKey
Returns data matching this key
Annotation
Returns data referenced by the mentioned annotation
AnnotationData
Returns only this exact data. Not very useful, use test_data() instead.
Annotations | Annotation
Returns data referenced by annotations in the provided collection.
**kwargs (dict, optional) –
- limit: Optional[int] = None
The maximum number of results to return (default: unlimited)
- set: Optional[Union[str,AnnotationDataSet]] = None
An ID of a dataset (or an AnnotationDataSet instance), only needed when specifying key as a string
- key: Optional[Union[str,DataKey]] = None
An ID of a key (or a DataKey instance); make sure to specify set as well if you use a string value for this parameter.
- value: Optional[Union[str,int,float,bool,List[Union[str,int,float,bool]]]]
Search for data matching a specific value. This holds the exact value to search for. Further variants of this keyword are listed below:
- value_not: Optional[Union[str,int,float,bool]]
Value must not match
- value_greater: Optional[Union[int,float]]
Value must be greater than specified (int or float)
- value_less: Optional[Union[int,float]]
Value must be less than specified (int or float)
- value_greatereq: Optional[Union[int,float]]
Value must be greater than specified or equal (int or float)
- value_lesseq: Optional[Union[int,float]]
Value must be less than specified or equal (int or float)
- value_in: Optional[Tuple[Union[str,int,float,bool]]]
Value must match any in the tuple (this is a logical OR statement)
- value_not_in: Optional[Tuple[Union[str,int,float,bool]]]
Value must not match any in the tuple
- value_in_range: Optional[Tuple[Union[int,float]]]
Must be a numeric 2-tuple with min and max (inclusive) values
- Return type:
Data
- dataset(id: str) AnnotationDataSet
Basic retrieval method that returns an AnnotationDataSet by ID. Raises an exception if not found.
- Parameters:
id (str) –
- Return type:
AnnotationDataSet
- datasets() Iterator[AnnotationDataSet]
Returns an iterator over all annotation data sets (AnnotationDataSet) in this store.
- Return type:
Iterator[AnnotationDataSet]
- datasets_len() int
Returns the number of annotation data sets in the store (not subtracting deletions).
- Return type:
int
- key(set_id: str, key_id: str) DataKey
Shortcut retrieval method that returns a DataKey by ID. Raises an exception if not found.
- query(query: str, **kwargs) list
Query the data using STAMQL.
- Parameters:
query (str) – Query in STAMQL. Note that you MUST specify a variable to bind to in your SELECT statement (this is normally optional but is required for calling from Python).
**kwargs (dict, optional) – You can bind extra context variables using keyword arguments. The keys correspond to the variable names that these will be bound to and which you can subsequently use in the STAMQL query. These keys should not carry the ‘?’ prefix you may be accustomed to in STAMQL. The values must be instances of STAM objects such as Annotation, AnnotationData, DataKey, TextSelection etc. These context variables are available to the query but not propagated to the output.
- Return type:
list
A query returns a list consisting of dictionaries, each corresponding to one result row. The keys in the dictionaries match the variable names in the STAMQL query; the values are result instances of whatever type the query returns, i.e. Annotation, AnnotationData, TextResource, TextSelection, AnnotationDataSet.
Examples
Query for annotations with a certain kind of data:
    for row in store.query('SELECT ANNOTATION ?a WHERE "some-set" "pos" = "noun";'):
        # each row is a dictionary keyed by variable name (without the '?' prefix);
        # just print out the text of the annotation
        print(str(row['a']))
- resource(id: str) TextResource
Basic retrieval method that returns a TextResource by ID. Raises an exception if not found.
- Parameters:
id (str) –
- Return type:
TextResource
- resources() Iterator[TextResource]
Returns an iterator over all text resources (TextResource) in this store.
- Return type:
Iterator[TextResource]
- resources_len() int
Returns the number of text resources in the store (not subtracting deletions).
- Return type:
int
- save() None
Saves the annotation store to the same file it was loaded from or last saved to.
- Return type:
None
- set_filename(filename: str) None
Set the filename for the annotation store. The format is derived from the extension, which can be .json or .csv.
- Parameters:
filename (str) –
- Return type:
None
- shrink_to_fit()
Reallocates internal data structures to tight fits to conserve memory space (if necessary). You can use this after having added lots of annotations to possibly reduce the memory consumption.
- class stam.Annotations
An Annotations object holds an arbitrary collection of annotations. The annotations are references to items in an AnnotationStore, not copies. You can iterate over it to retrieve Annotation instances.
- __getitem__(int) Annotation
Returns an annotation in the collection by index.
- Return type:
Annotation
- __iter__() Iterator[Annotation]
Iterator over all annotations in this collection.
- Return type:
Iterator[Annotation]
- annotations(*args, **kwargs) Annotations
Returns annotations (Annotations containing Annotation) that reference annotations in the current collection (e.g. annotations that target any of the current annotations using an AnnotationSelector).
The annotations can be filtered using positional and/or keyword arguments; see Annotation.annotations(). If no filters are set (default), all annotations are returned (without duplicates) in chronological order.
Example
Say annotation represents a word; we can get all annotations with key “part-of-speech” that point to this annotation:
    key = store.dataset("linguistic-set").key("part-of-speech")
    for pos_annotation in annotation.annotations(filter=key):
        data = pos_annotation.data(filter=key, limit=1)[0]
        ...
- Return type:
Annotations
- annotations_in_targets(*args, **kwargs) Annotations
Returns annotations (Annotations containing Annotation) that are being referenced by annotations in the current collection (e.g. annotations we target using an AnnotationSelector).
The annotations can be filtered using positional and/or keyword arguments; see Annotation.annotations(). One extra keyword argument is available and explained below. If no filters are set (default), all annotations are returned (without duplicates). Annotations are returned in chronological order.
- Keyword Arguments:
recursive (bool) – Follow AnnotationSelectors recursively (default False)
- Return type:
Annotations
- data(*args, **kwargs) Data
Returns annotation data (Data containing AnnotationData) used by annotations in this collection.
The data can be filtered using positional and/or keyword arguments; see Annotation.data(). If no filters are set (default), all data from all annotations are returned (without duplicates).
- Return type:
Data
- is_sorted() bool
Returns a boolean indicating whether the annotations in this collection are sorted chronologically (earlier annotations before later ones). Note that this is distinct from any textual ordering.
- Return type:
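The difference between chronological and textual order can be shown with plain tuples. The sketch below is an illustration, not the library's implementation: the Item type and its handle field (a creation index, lower meaning added earlier) are assumptions made for the example.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    handle: int  # hypothetical creation index: lower means added earlier
    begin: int   # begin offset in the text, in codepoints
    end: int     # non-inclusive end offset


def is_sorted(items: List[Item]) -> bool:
    """True if the items are in chronological (creation) order."""
    return all(a.handle <= b.handle for a, b in zip(items, items[1:]))


def textual_order(items: List[Item]) -> List[Item]:
    """Return the items sorted by their position in the text."""
    return sorted(items, key=lambda item: (item.begin, item.end))


items = [Item(0, 10, 15), Item(1, 0, 4), Item(2, 5, 9)]
by_text = textual_order(items)
print(is_sorted(items), is_sorted(by_text))
```

The first annotation created happens to cover the last span in the text, so sorting textually breaks the chronological order.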
- related_text(operator: TextSelectionOperator, **kwargs) TextSelections
Applies a
TextSelectionOperator
to find all other text selections that are in a specific relation with any from the current collection of annotations. Returns a collection of all matching TextSelection
instances.
Text selections will be returned in textual order. They may be filtered via keyword arguments. See
Annotation.textselections()
.
See
Annotation.related_text()
for allowed parameters/keyword arguments and examples.
- Parameters:
operator (TextSelectionOperator) –
- Return type:
- test_annotation(*args, **kwargs) bool
Tests whether certain annotations reference any annotation in this collection. The annotations can be filtered using positional and/or keyword arguments. See
annotations()
. Unlike annotations()
, this method merely tests without returning the annotations, and as such is more performant.
- Return type:
- test_annotations_in_targets(*args, **kwargs) Annotations
Tests whether annotations in this collection target the specified annotation. The annotations can be filtered using positional and/or keyword arguments. See
annotations()
. Unlike annotations_in_targets()
, this method merely tests without returning the annotations, and as such is more performant.
- Return type:
- test_data(*args, **kwargs) bool
Tests whether certain annotation data is used by any annotation in this collection. The data can be filtered using keyword arguments. See
data()
. Unlike data()
, this method merely tests without returning the data, and as such is more performant.
- Return type:
- textselections(limit: int | None = None) TextSelections
Returns a collection of all textselections associated with the annotations in this collection.
- Parameters:
limit (Optional[int]) –
- Return type:
- textual_order() Annotations
Sorts the annotations in textual order (provided they refer to any text at all).
This has some performance cost, so avoid calling it on the results of methods like
Annotation.annotations_in_targets()
which already produce textual order (in most cases).
- Return type:
- class stam.Cursor(index, endaligned: bool = False)
A cursor points to a specific point in a text. It is used to select offsets. Units are unicode codepoints (not bytes!) and are 0-indexed.
The cursor can be either begin-aligned or end-aligned: BeginAlignedCursor(0) is the first unicode codepoint in a referenced text, and EndAlignedCursor(0) is the end of the text (the position just past the last codepoint).
- Parameters:
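Two points about cursors are easy to get wrong: positions count unicode codepoints rather than bytes, and an end-aligned cursor has to be resolved against the text length. The following is a minimal sketch in plain Python; resolve_cursor is a hypothetical helper, and it assumes that end-aligned values are 0 or negative, with 0 resolving to the text length.

```python
def resolve_cursor(index: int, endaligned: bool, textlen: int) -> int:
    """Resolve a cursor to a begin-aligned codepoint position.

    Assumption: an end-aligned cursor counts back from the end of the text,
    so it must be 0 or negative, and 0 resolves to `textlen`.
    """
    if endaligned:
        if index > 0:
            raise ValueError("end-aligned cursors must be 0 or negative")
        position = textlen + index
    else:
        position = index
    if not 0 <= position <= textlen:
        raise ValueError("cursor is out of bounds")
    return position


text = "na\u00efve caf\u00e9"
codepoints = len(text)                  # Python str length counts codepoints
utf8_bytes = len(text.encode("utf-8"))  # the two accented letters take 2 bytes each
last_word = text[resolve_cursor(-4, True, codepoints):]
print(codepoints, utf8_bytes, last_word)
```

Because Python strings are indexed by codepoint as well, a resolved cursor can be used directly as a string index.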
- class stam.Data
A Data object holds an arbitrary collection of annotation data. The data are references to items in an AnnotationStore, not copies. You can iterate over it to retrieve
AnnotationData
instances.
- __getitem__(int) AnnotationData
Returns data in the collection by index
- Return type:
- __iter__() Iterator[AnnotationData]
Iterator over all data in this collection
- Return type:
Iterator[AnnotationData]
- annotations(*args, **kwargs) Annotations
Returns annotations (
Annotations
containing Annotation
) that make use of any of the data in this collection.
The annotations can be filtered using positional and/or keyword arguments. See
Annotation.annotations()
.
- Return type:
- test_annotations(*args, **kwargs) bool
Tests whether there are any annotations that make use of any of the data in this collection. This method is like
annotations()
, but only tests and does not return the annotations; as such it is more performant.
The annotations can be filtered using positional and/or keyword arguments. See
Annotation.annotations()
.
- Return type:
- class stam.DataKey
The DataKey class defines a vocabulary field; it belongs to a certain
AnnotationDataSet
. An AnnotationData
instance in turn makes reference to a DataKey and assigns it a value.
- annotations(*args, **kwargs) Annotations
Returns annotations (
Annotations
containing Annotation
) that make use of this key.
The annotations can be filtered on value using keyword arguments. See
Annotation.annotations()
, but note that not all keyword arguments apply in this context (set and key are predetermined already).
Example
Assume the key represents part-of-speech tags, get all annotations for value “noun”:
for annotation in key.annotations(value="noun"):
    ...
- Return type:
- annotations_count(limit: int | None = None) int
Returns the number of annotations (
Annotation
) that use this key. Note that this is much faster than doing len(annotations())! This method has suffix _count instead of _len because it is not O(1) but does actual counting (O(n) at worst).
- Parameters:
limit (Optional[int] = None) – The maximum number of results to return (default: unlimited)
- Return type:
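Why counting can be cheaper than collecting results and taking their length can be sketched without the library: a counter walks the matches once and never stores them, and a limit lets it stop early. The count helper and the lazy matches generator below are hypothetical stand-ins, not stam calls.

```python
from itertools import islice
from typing import Iterable, Optional


def count(items: Iterable[object], limit: Optional[int] = None) -> int:
    """Count items in a single pass without storing them; stop early at `limit`."""
    return sum(1 for _ in islice(items, limit))


# Hypothetical lazy result set: nothing is materialised into a list.
matches = (n for n in range(1_000_000) if n % 3 == 0)
first_ten = count(matches, limit=10)
everything = count(range(7))
print(first_ten, everything)
```

The work is still proportional to the number of items visited, which matches the O(n) caveat above.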
- data(*args, **kwargs) Data
Returns annotation data (
Data
containing AnnotationData
) used by this key.
The data can be filtered using positional and/or keyword arguments. See
Annotation.data()
. Note that only a subset makes sense in this context, set and key are already fixed.
Example
Assume the key represents part-of-speech tags, get all data for value “noun”:
for data in key.data(value="noun"):  # returns only one
    ...
- Return type:
- dataset() AnnotationDataSet
Returns the
AnnotationDataSet
this key is part of.
- Return type:
- id() str | None
Returns the public ID (by value, aka a copy). Don’t use this for extensive ID comparisons, use
has_id()
instead as it is more performant (no copy).
- Return type:
Optional[str]
- test_annotations(*args, **kwargs) bool
Tests whether there are any annotations that make use of this key. This method is like
annotations()
, but only tests and does not return the annotations; as such it is more performant.
The annotations can be filtered using keyword arguments. See
Annotation.annotations()
.
Example
Assume the key represents part-of-speech tags, test if there are annotations for data value “noun”:
if key.test_annotations(value="noun"):
    ...
- Return type:
- test_data(*args, **kwargs) bool
Tests whether certain annotation data exists for this key. The data can be filtered using keyword arguments. See
Annotation.data()
. Note that only a subset makes sense in this context, set and key are already fixed.
This method is like
data()
, but merely tests without returning the data, and as such is more performant.
Example
Assume the key represents part-of-speech tags, test whether data with value “noun” exists:
if key.test_data(value="noun"):
    # value exists
    ...
- Return type:
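The reason a test can be cheaper than a retrieval is that it may stop at the first hit. The plain-Python sketch below illustrates that idea only; retrieve, exists_any and the inspected bookkeeping list are hypothetical, and the library's internals may differ.

```python
from typing import Callable, Iterable, List

inspected: List[str] = []  # records which values the predicate looked at


def is_noun(value: str) -> bool:
    inspected.append(value)
    return value == "noun"


def retrieve(values: Iterable[str], predicate: Callable[[str], bool]) -> List[str]:
    """Collect every matching value: always visits all values."""
    return [value for value in values if predicate(value)]


def exists_any(values: Iterable[str], predicate: Callable[[str], bool]) -> bool:
    """Stop at the first match: nothing is collected."""
    return any(predicate(value) for value in values)


values = ["noun", "verb", "adjective", "noun"]
exists = exists_any(values, is_noun)
inspected_by_test = len(inspected)
inspected.clear()
found = retrieve(values, is_noun)
inspected_by_retrieve = len(inspected)
print(exists, inspected_by_test, inspected_by_retrieve)
```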
- class stam.DataValue(value: str | bool | int | float | List)
Encapsulates a value and its type. Held by
AnnotationData
. This type is not a reference but holds the actual value.
You can instantiate a new DataValue from a supported Python type, but you usually don’t need to do this explicitly.
- class stam.Offset(begin: Cursor, end: Cursor)
Text selection offset. Specifies begin and end offsets to select a range of a text, via two
Cursor
instances. The end-point is non-inclusive.
You can instantiate a new offset on the basis of two
Cursor
instances
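A non-inclusive end-point means an offset behaves like a half-open range, the same convention Python slices use. The sketch below uses plain (begin, end) tuples as a stand-in for offsets; the Span alias and select helper are hypothetical and not part of the library.

```python
from typing import Tuple

Span = Tuple[int, int]  # (begin, end) in codepoints; end is non-inclusive


def select(text: str, span: Span) -> str:
    """Return the part of `text` covered by a half-open span."""
    begin, end = span
    if not 0 <= begin <= end <= len(text):
        raise ValueError("offset out of bounds")
    return text[begin:end]


text = "Hello world"
first = select(text, (0, 5))
second = select(text, (6, 11))
# Adjacent half-open spans tile the text without gaps or double counting.
rebuilt = "".join(select(text, span) for span in [(0, 5), (5, 6), (6, 11)])
print(first, second)
```

With this convention the length of a selection is simply end minus begin.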
- class stam.Selector
A Selector identifies the target of an annotation and the part of the target that the annotation applies to. Selectors can be considered the labelled edges of the graph model, tying all nodes together. There are multiple types of selectors, all captured in this class. There are several static methods available to instantiate a specific type of selector.
- annotation(store: AnnotationStore) Annotation | None
Returns the annotation this selector points at, if any. Works only for AnnotationSelector, returns None otherwise. The store must be passed explicitly so the annotation can be found.
- Parameters:
store (AnnotationStore) –
- Return type:
Optional[Annotation]
- static annotationselector(annotation: Annotation, offset: Offset | None = None) Selector
Creates an AnnotationSelector - A selector pointing to another annotation. This we call higher-order annotation and is very common in STAM models. If the annotation that is being targeted eventually refers to a text (TextSelector), then offsets MAY be specified that select a subpart of this text. These offsets are now relative to the annotation.
- Parameters:
annotation (Annotation) – The target annotation
offset (Optional[Offset]) – If set, references a subpart of the annotation’s text. If set to None, it applies to the annotation as such.
- Return type:
Example
Instantiation:
Selector.annotationselector(store.annotation("A1"), Offset.whole())
- static compositeselector(*subselectors: Selector) Selector
Creates a CompositeSelector - A selector that consists of multiple other selectors (subselectors), used to select more complex targets that transcend the idea of a single simple selection. This MUST be interpreted as the annotation applying equally to the conjunction as a whole; its parts are interdependent, and each of them MUST NOT be omitted for the annotation to make sense.
Example
Instantiation of a composite selector over two annotation selectors:
Selector.compositeselector(
    Selector.annotationselector(self.store.annotation("A1"), Offset.whole()),
    Selector.annotationselector(self.store.annotation("A2"), Offset.whole()),
)
- dataset(store: AnnotationStore) AnnotationDataSet | None
Returns the annotation dataset this selector points at, if any. Works only for DataSetSelector, returns None otherwise. The store must be passed explicitly so the dataset can be found.
- Parameters:
store (AnnotationStore) –
- Return type:
Optional[AnnotationDataSet]
- static datasetselector(dataset: AnnotationDataSet) Selector
Creates a DataSetSelector - A selector pointing to an annotation dataset as a whole. This type of annotation can be interpreted as metadata.
- Parameters:
dataset (AnnotationDataSet) – The annotation data set.
- Return type:
Example
Instantiation:
Selector.datasetselector(store.dataset("my-dataset"))
- static directionalselector(*subselectors: Selector) Selector
Creates a DirectionalSelector - Another selector that consists of multiple other selectors, but with an explicit direction (from -> to), used to select more complex targets that transcend the idea of a single simple selection.
- is_kind(kind: SelectorKind) bool
Tests whether a selector is of a particular type
- Parameters:
kind (SelectorKind) –
- Return type:
- kind() SelectorKind
Returns the type of selector
- Return type:
- static multiselector(*subselectors: Selector) Selector
Creates a MultiSelector - A selector that consists of multiple other selectors (subselectors) to select multiple targets. This MUST be interpreted as the annotation applying to each target individually, without any relation between the different targets.
- offset() Offset | None
Return offset information in the selector. Works for TextSelector and AnnotationSelector, returns None for others.
- Return type:
Optional[Offset]
- resource(store: AnnotationStore) TextResource | None
Returns the resource this selector points at, if any. Works only for TextSelector and ResourceSelector, returns None otherwise. Requires to explicitly pass the store so the resource can be found.
- Parameters:
store (AnnotationStore) –
- Return type:
Optional[TextResource]
- static resourceselector(resource: TextResource) Selector
Creates a ResourceSelector - A selector pointing to a resource as a whole. This type of annotation can be interpreted as metadata.
- Parameters:
resource (TextResource) – The resource
- Return type:
Example
Instantiation:
Selector.resourceselector(store.resource("my-resource"))
- static textselector(resource: TextResource, offset: Offset) Selector
Creates a TextSelector. Selects a target resource and a text span within it.
- Parameters:
resource (TextResource) – The text resource
offset (Offset) – An offset pointing to the slice of the text in the resource
- Return type:
Example
Instantiation:
Selector.textselector(store.resource("testres"), Offset.simple(6,11))
- class stam.SelectorKind
An enumeration of possible selector types
- ANNOTATIONDATASELECTOR: SelectorKind
- ANNOTATIONSELECTOR: SelectorKind
- COMPOSITESELECTOR: SelectorKind
- DATAKEYSELECTOR: SelectorKind
- DATASETSELECTOR: SelectorKind
- DIRECTIONALSELECTOR: SelectorKind
- MULTISELECTOR: SelectorKind
- RESOURCESELECTOR: SelectorKind
- TEXTSELECTOR: SelectorKind
- exception stam.StamError
Bases:
Exception
STAM Error
Initialize self. See help(type(self)) for accurate signature.
- class stam.TextResource
This holds the textual resource to be annotated. It holds the full text in memory.
The text SHOULD be in Unicode Normalization Form C (NFC) (https://www.unicode.org/reports/tr15/) but MAY be in another Unicode normalization form.
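Normalization matters here because offsets count codepoints: the same visible text has different lengths in different normalization forms. The snippet below uses only Python's standard unicodedata module to normalise a string to NFC before it would be handed to a resource; the is_nfc helper is a hypothetical convenience, not a stam function.

```python
import unicodedata

decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent
composed = unicodedata.normalize("NFC", decomposed)


def is_nfc(text: str) -> bool:
    """True if normalising to NFC leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


print(len(decomposed), len(composed))
```

The decomposed string is one codepoint longer, so any offset past the accent would point at a different position than in the NFC form.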
- __getitem__(slice: TextResource.__getitem__.slice) str
Returns a text slice
- Parameters:
slice (TextResource.__getitem__.slice) –
- Return type:
- __iter__() Iterator[TextSelection]
Iterates over all known textselections in this resource, in sorted order. This is a low-level iterator,
textselections()
provides a higher-level interface.
- Return type:
Iterator[TextSelection]
- __str__() str
Returns the text of the resource (by value, aka a copy), same as
text()
- Return type:
- annotations(*args, **kwargs) Annotations
Returns a collection of annotations (
Annotation
) that reference this resource via a TextSelector (if any). Does NOT include those that use a ResourceSelector; use annotations_as_metadata()
for those instead.
The annotations can be filtered using positional and/or keyword arguments. See
Annotation.annotations()
.- Return type:
- annotations_as_metadata(*args, **kwargs) Annotations
Returns a collection of annotations (
Annotation
) that reference this resource via a ResourceSelector (if any). Does NOT include those that use a TextSelector; use annotations()
for those instead.
The annotations can be filtered using positional and/or keyword arguments. See
Annotation.annotations()
.- Return type:
- beginaligned_cursor(endalignedcursor: int) int
Converts an end-aligned cursor to a begin-aligned cursor, resolving all relative end-aligned positions. The parameter value must be 0 or negative.
- find_text(fragment: str, limit: int | None = None, case_sensitive: bool | None = None) List[TextSelection]
Searches for the text fragment and returns a list of
TextSelection
instances with all matches (or up to the specified limit).
- Parameters:
- Return type:
List[TextSelection]
- find_text_regex(expressions: List[str], allow_overlap: bool | None = False, limit: int | None = None) List[dict]
Searches the text using one or more regular expressions, returns a list of dictionaries like:
{
    "textselections": [TextSelection],
    "expression_index": int,
    "capturegroups": [int]
}
Passing multiple regular expressions at once is more efficient than calling this function anew for each one. If capture groups are used in the regular expression, only those parts will be returned (the rest is context). If none are used, the entire expression is returned. The regular expressions are passed as strings and must follow this syntax: https://docs.rs/regex/latest/regex/#syntax , which may differ slightly from Python’s regular expressions!
The allow_overlap parameter determines if the matching expressions are allowed to overlap. If you are doing some form of tokenisation, you likely want this set to false. All of this only matters if you supply multiple regular expressions.
Results are returned in the exact order they are found in the text
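The shape of the result and the effect of allow_overlap can be approximated in plain Python. This is a rough sketch under several assumptions, not the library's behaviour: it uses Python's re module rather than the Rust regex syntax linked above, reports (begin, end) tuples under a hypothetical "spans" key instead of TextSelection instances, uses an empty capture group list when no groups are present, and resolves overlaps by keeping the earliest match.

```python
import re
from typing import Dict, List, Optional


def find_regex(text: str, expressions: List[str], allow_overlap: bool = False,
               limit: Optional[int] = None) -> List[Dict]:
    candidates = []
    for index, expression in enumerate(expressions):
        pattern = re.compile(expression)
        for match in pattern.finditer(text):
            # Capture groups that took part in the match, if any.
            groups = [g for g in range(1, pattern.groups + 1)
                      if match.span(g) != (-1, -1)]
            # With groups only those parts are reported, else the whole match.
            spans = [match.span(g) for g in groups] or [match.span()]
            candidates.append((match.span(), {
                "spans": spans,
                "expression_index": index,
                "capturegroups": groups,
            }))
    # Report matches in the order in which they occur in the text.
    candidates.sort(key=lambda item: (item[0][0], item[1]["expression_index"]))
    results: List[Dict] = []
    covered_until = 0
    for (begin, end), result in candidates:
        if not allow_overlap and begin < covered_until:
            continue  # overlaps an earlier accepted match
        results.append(result)
        covered_until = max(covered_until, end)
        if limit is not None and len(results) >= limit:
            break
    return results


matches = find_regex("tel: 555-1234", [r"[a-z]+", r"(\d{3})-(\d{4})"])
strict = find_regex("abcd", ["abc", "bcd"])
loose = find_regex("abcd", ["abc", "bcd"], allow_overlap=True)
print(matches)
```

The second expression uses two capture groups, so only the two digit runs are reported for it while the hyphen between them is treated as context.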
- id() str | None
Returns the public ID (by value, aka a copy). Don’t use this for extensive ID comparisons, use
has_id()
instead as it is more performant (no copy).
- Return type:
Optional[str]
- range(begin, end) Iterator[TextSelection]
Iterates over all known textselections that start in the specified range, in sorted order.
- Return type:
Iterator[TextSelection]
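Looking up selections that start in a range is cheap when they are kept sorted, because both boundaries can be found by binary search. The sketch below shows this with Python's bisect module on plain (begin, end) tuples. It is an illustration only: the spans_starting_in helper is hypothetical, and treating the range as half-open (begin included, end excluded) is an assumption.

```python
from bisect import bisect_left
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) in codepoints


def spans_starting_in(sorted_spans: List[Span], begin: int, end: int) -> List[Span]:
    """Return the spans whose begin lies in [begin, end).

    `sorted_spans` must already be sorted by (begin, end).
    """
    # A 1-tuple sorts before every 2-tuple that shares its first element.
    low = bisect_left(sorted_spans, (begin,))
    high = bisect_left(sorted_spans, (end,))
    return sorted_spans[low:high]


spans = [(0, 5), (0, 11), (6, 11), (12, 15)]
print(spans_starting_in(spans, 0, 6))
print(spans_starting_in(spans, 6, 13))
```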
- split_text(delimiter: str, limit: int | None = None) List[TextSelection]
Returns a list of
TextSelection
instances that split the text according to the specified delimiter.
- Parameters:
- Return type:
List[TextSelection]
- strip_text(chars: str) TextSelection
Trims all occurrences of any character in chars from both the beginning and end of the text, returning a
TextSelection
. No text is modified.
- Parameters:
chars (str) –
- Return type:
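Both split_text() and strip_text() return selections rather than new strings, which is why no text is modified. The idea can be sketched in plain Python by computing offsets instead of copies; strip_span and split_spans below are hypothetical helpers working on (begin, end) tuples, not the library's implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) in codepoints; end is non-inclusive


def strip_span(text: str, chars: str) -> Span:
    """Offsets of `text` with leading and trailing `chars` trimmed away."""
    begin, end = 0, len(text)
    while begin < end and text[begin] in chars:
        begin += 1
    while end > begin and text[end - 1] in chars:
        end -= 1
    return (begin, end)


def split_spans(text: str, delimiter: str) -> List[Span]:
    """Offsets of the parts of `text` that lie between delimiters."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    spans, begin = [], 0
    while True:
        found = text.find(delimiter, begin)
        if found == -1:
            spans.append((begin, len(text)))
            return spans
        spans.append((begin, found))
        begin = found + len(delimiter)


text = "  one, two, three  "
begin, end = strip_span(text, " ")
parts = split_spans(text[begin:end], ", ")
print((begin, end), parts)
```

The original string is untouched throughout; only positions into it are returned.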
- test_annotations(*args, **kwargs) bool
Tests whether there are any annotations that reference the text of this resource (via a TextSelector).
This method is like
annotations()
, but only tests and does not return the annotations; as such it is more performant.
The annotations can be filtered using positional and/or keyword arguments. See
Annotation.annotations()
.
- Return type:
- test_annotations_as_metadata(*args, **kwargs) bool
Tests whether there are any annotations that reference this resource as metadata (via a ResourceSelector).
This method is like
annotations_as_metadata()
, but only tests and does not return the annotations; as such it is more performant.
The annotations can be filtered using positional and/or keyword arguments. See
Annotation.annotations()
.
- Return type:
- textlen() int
Returns the length of the resource’s text in unicode codepoints (same as len(self.text()) but more performant)
- Return type:
- textselection(offset: Offset) TextSelection
Returns a
TextSelection
instance covering the specified offset.
- Parameters:
offset (Offset) –
- Return type:
- textselections() TextSelections
Iterates over all known textselections in this resource, in sorted order.
- Return type:
- class stam.TextSelection
This holds a slice of a text.
- __getitem__(slice: TextSelection.__getitem__.slice) str
Returns a text slice
- Parameters:
slice (TextSelection.__getitem__.slice) –
- Return type:
- __str__() str
Returns the text of the resource (by value, aka a copy), same as
text()
- Return type:
- annotations(**kwargs) Annotations
Returns annotations (
Annotations
containing Annotation
) that reference this text selection via a TextSelector (if any).
The annotations can be filtered using keyword arguments. See
Annotation.annotations()
- Return type:
- annotations_len() int
Returns the number of annotations this text selection references
- Return type:
- beginaligned_cursor(endalignedcursor: int) int
Converts an end-aligned cursor to a begin-aligned cursor, resolving all relative end-aligned positions. The parameter value must be 0 or negative.
- find_text(fragment: str, limit: int | None = None, case_sensitive: bool | None = None) List[TextSelection]
Searches for the text fragment and returns a list of
TextSelection
instances with all matches (or up to the specified limit).
- Parameters:
- Return type:
List[TextSelection]
- find_text_regex(expressions: List[str], allow_overlap: bool | None = False, limit: int | None = None) List[dict]
Searches the text using one or more regular expressions, returns a list of dictionaries like:
{
    "textselections": [TextSelection],
    "expression_index": int,
    "capturegroups": [int]
}
Passing multiple regular expressions at once is more efficient than calling this function anew for each one. If capture groups are used in the regular expression, only those parts will be returned (the rest is context). If none are used, the entire expression is returned. The regular expressions are passed as strings and must follow this syntax: https://docs.rs/regex/latest/regex/#syntax , which may differ slightly from Python’s regular expressions!
The allow_overlap parameter determines if the matching expressions are allowed to overlap. If you are doing some form of tokenisation, you likely want this set to false. All of this only matters if you supply multiple regular expressions.
Results are returned in the exact order they are found in the text.
- find_text_sequence(fragments: List[str], case_sensitive: bool | None = None, allow_skip_whitespace: bool | None = True, allow_skip_punctuation: bool | None = True, allow_skip_numeric: bool | None = True, allow_skip_alphabetic: bool | None = False) List[TextSelection]
Searches for multiple text fragments in sequence. Returns a list of
TextSelection
instances.
Matches must appear in the exact order specified, but may have other intervening text, determined by the allow_skip_* parameters.
Returns an empty list if the sequence does not match.
- Parameters:
fragments (List[str]) – The fragments to search for, in sequence
case_sensitive (Optional[bool] = None) – Match case sensitive or not (default: True)
allow_skip_whitespace (Optional[bool] = True) – Allow gaps consisting of whitespace (space, tabs, newline, etc) (default: True)
allow_skip_punctuation (Optional[bool] = True) – Allow gaps consisting of punctuation (default: True)
allow_skip_numeric (Optional[bool] = True) – Allow gaps consisting of numbers (default: True)
allow_skip_alphabetic (Optional[bool] = False) – Allow gaps consisting of alphabetic/ideographic characters (default: False)
- Return type:
List[TextSelection]
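The matching rule described above (fragments in order, with gaps made up only of skippable characters) can be sketched in plain Python. This is an approximation under stated assumptions rather than the library's algorithm: find_sequence and is_skippable are hypothetical helpers, matching is case sensitive, results are (begin, end) tuples, and any character that is not whitespace, numeric or alphabetic is treated as punctuation.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) in codepoints; end is non-inclusive


def is_skippable(gap: str, whitespace: bool = True, punctuation: bool = True,
                 numeric: bool = True, alphabetic: bool = False) -> bool:
    """True if every character in the gap belongs to an allowed class."""
    for char in gap:
        if char.isspace():
            allowed = whitespace
        elif char.isnumeric():
            allowed = numeric
        elif char.isalpha():
            allowed = alphabetic
        else:
            allowed = punctuation  # simplification: anything else is punctuation
        if not allowed:
            return False
    return True


def find_sequence(text: str, fragments: List[str], **skip: bool) -> List[Span]:
    """Return one span per fragment, or an empty list if the sequence fails."""
    if not fragments:
        return []
    start = text.find(fragments[0])
    while start != -1:
        spans = [(start, start + len(fragments[0]))]
        position = spans[0][1]
        for fragment in fragments[1:]:
            found = text.find(fragment, position)
            if found == -1 or not is_skippable(text[position:found], **skip):
                break  # this attempt fails; retry from a later first fragment
            spans.append((found, found + len(fragment)))
            position = found + len(fragment)
        else:
            return spans
        start = text.find(fragments[0], start + 1)
    return []


text = "Hello, big world! Hello, world"
default = find_sequence(text, ["Hello", "world"])
permissive = find_sequence(text, ["Hello", "world"], alphabetic=True)
print(default, permissive)
```

With the defaults the first "Hello" is rejected because the gap contains the word "big"; allowing alphabetic gaps makes that first occurrence match.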
- related_text(operator: TextSelectionOperator, **kwargs) TextSelections
Applies a
TextSelectionOperator
to find all other text selections that are in a specific relation with this one. Returns all matching TextSelection
instances in a collection TextSelections
.
Text selections will be returned in textual order. They may be filtered via keyword arguments. See
Annotation.textselections()
.
- Parameters:
operator (TextSelectionOperator) – The operator to apply when comparing text selections
- Return type:
See
Annotation.related_text()
for allowed keyword arguments and examples.
- relative_offset(container: TextSelection) Offset
Returns the offset of this text selection relative to another in which it is embedded. Raises a StamError exception if it is not embedded in the other, or if the two do not belong to the same resource.
- Parameters:
container (TextSelection) –
- Return type:
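The arithmetic behind a relative offset is a simple shift by the container's begin position. The sketch below shows it on plain (begin, end) tuples, together with the inverse mapping back to absolute coordinates as a round-trip check. Both helpers are hypothetical; the real method raises StamError where this sketch raises ValueError, and the same-resource check is left out.

```python
from typing import Tuple

Span = Tuple[int, int]  # (begin, end) in codepoints; end is non-inclusive


def relative_offset(inner: Span, container: Span) -> Span:
    """Offset of `inner` expressed relative to the start of `container`."""
    if not (container[0] <= inner[0] and inner[1] <= container[1]):
        raise ValueError("selection is not embedded in the container")
    return (inner[0] - container[0], inner[1] - container[0])


def absolute_offset(relative: Span, container: Span) -> Span:
    """Inverse mapping: from container-relative to absolute coordinates."""
    return (container[0] + relative[0], container[0] + relative[1])


text = "Hello world"
world = (6, 11)  # "world"
inner = (7, 9)   # "or"
relative = relative_offset(inner, world)
print(relative, absolute_offset(relative, world))
```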
- resource() TextResource
Returns the
TextResource
this textselection is from.
- Return type:
- split_text(delimiter: str, limit: int | None = None) List[TextSelection]
Returns a list of
TextSelection
instances that split the text according to the specified delimiter.
- Parameters:
- Return type:
List[TextSelection]
- strip_text(chars: str) TextSelection
Trims all occurrences of any character in chars from both the beginning and end of the text, returning a
TextSelection
. No text is modified.
- Parameters:
chars (str) –
- Return type:
- test(operator: TextSelectionOperator, other: TextSelection) bool
Tests whether a specific spatial relation (as expressed by the passed operator) holds between this TextSelection and another. A boolean is returned with the test result.
- Parameters:
operator (TextSelectionOperator) –
other (TextSelection) –
- Return type:
- test_annotations(**kwargs) bool
Tests whether there are any annotations that reference this text selection via a TextSelector (if any).
This method is like
annotations()
, but only tests and does not return the annotations; as such it is more performant.
The annotations can be filtered using keyword arguments. See
Annotation.annotations()
.
- Return type:
- test_data(**kwargs) bool
Tests whether there are any annotations that reference this text selection with data that passes the provided filters. The result is functionally equivalent to doing .annotations().test_data(), but this shortcut method is implemented much more efficiently and therefore recommended.
The data can be filtered using keyword arguments. See
Annotations.data()
.
- Return type:
- textlen() int
Returns the length of the resource’s text in unicode codepoints (same as len(self.text()) but more performant)
- Return type:
- textselection(offset: Offset) TextSelection
Returns a
TextSelection
that corresponds to the offset WITHIN the current textselection. This returns a TextSelection
with absolute coordinates in the resource.
- Parameters:
offset (Offset) –
- Return type:
- class stam.TextSelectionOperator
The TextSelectionOperator, simply put, allows comparison of two
TextSelection
instances. It allows testing for all kinds of spatial relations (as embodied by this class) in which two TextSelection
instances can be.
Rather than operate on single
TextSelection
instances, the implementation goes a bit further and can also act on multiple TextSelection
instances as a set, allowing you to compare two sets, each containing possibly multiple TextSelections, at once.
The operator is instantiated via one of its static methods.
- static after(all: bool | None = False, negate: bool | None = False, limit: int | None = None) TextSelectionOperator
Create an operator to test if one textselection (or set) comes after another. Each TextSelection in A comes after a TextSelection in B. If modifier all is set: all TextSelections in A come after all TextSelections in B. There is no overlap (cf. textfabric’s >>).
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
limit (Optional[int]) – Constrain the lookup to at most this many unicode codepoints (increases performance)
- Return type:
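The difference between the default behaviour and the all modifier comes down to which quantifier is applied over set B. The following plain-Python sketch shows that distinction for the "after" relation on (begin, end) tuples. The holds and after helpers are hypothetical, and defining "after" as "A begins at or beyond the point where B ends" is an assumption consistent with the no-overlap remark above.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (begin, end) in codepoints; end is non-inclusive


def after(a: Span, b: Span) -> bool:
    """a starts at or beyond the point where b ends, so they cannot overlap."""
    return a[0] >= b[1]


def holds(relation: Callable[[Span, Span], bool], set_a: List[Span],
          set_b: List[Span], require_all: bool = False) -> bool:
    """Each span in A must relate to any (default) or all spans in B."""
    quantifier = all if require_all else any
    return all(quantifier(relation(a, b) for b in set_b) for a in set_a)


set_a = [(10, 12), (20, 25)]
set_b = [(0, 5), (15, 18)]
any_match = holds(after, set_a, set_b)
all_match = holds(after, set_a, set_b, require_all=True)
print(any_match, all_match)
```

Every span in A comes after the first span in B, but (10, 12) does not come after (15, 18), so the stricter test fails.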
- static before(all: bool | None = False, negate: bool | None = False, limit: int | None = None) TextSelectionOperator
Create an operator to test if one textselection (or set) comes before another. Each TextSelection in A comes before a TextSelection in B. If modifier all is set: all TextSelections in A come before all TextSelections in B. There is no overlap (cf. textfabric’s <<).
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
limit (Optional[int]) – Constrain the lookup to at most this many unicode codepoints (increases performance)
- Return type:
- static embedded(all: bool | None = False, negate: bool | None = False, limit: int | None = None) TextSelectionOperator
Create an operator to test if one textselection (or set) is embedded in another. All TextSelections in A are embedded by a TextSelection in B (cf. textfabric’s ]]). If modifier all is set: all TextSelections in A are embedded by all TextSelections in B.
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
limit (Optional[int]) – Constrain the lookup to at most this many unicode codepoints (increases performance)
- Return type:
- static embeds(all: bool | None = False, negate: bool | None = False) TextSelectionOperator
Create an operator to test if one textselection (or set) embeds another. All TextSelections in B are embedded by a TextSelection in A (cf. textfabric’s [[). If modifier all is set: all TextSelections in B are embedded by all TextSelections in A.
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
- Return type:
- static equals(all: bool | None = False, negate: bool | None = False) TextSelectionOperator
Create an operator to test if two textselections (or sets) cover the exact same TextSelections, with all of them covered (cf. textfabric’s ==). The relation is commutative and transitive.
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
- Return type:
- static overlaps(all: bool | None = False, negate: bool | None = False) TextSelectionOperator
Create an operator to test if two textselections (or sets) overlap. Each TextSelection in A overlaps with a TextSelection in B (cf. textfabric’s &&). If modifier all is set: each TextSelection in A overlaps with all TextSelections in B. The relation is commutative.
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
- Return type:
- static precedes(all: bool | None = False, negate: bool | None = False) TextSelectionOperator
Create an operator to test if one textselection (or set) is to the immediate left of (precedes) another. Each TextSelection in A ends where at least one TextSelection in B begins. If modifier all is set: the rightmost TextSelection in A ends where the leftmost TextSelection in B begins (cf. textfabric’s <:).
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
- Return type:
- static samebegin(all: bool | None = False, negate: bool | None = False) TextSelectionOperator
Create an operator to test if two textselections (or sets) have the same begin position. Each TextSelection in A starts where a TextSelection in B starts. If modifier all is set: the leftmost TextSelection in A starts where the leftmost TextSelection in B starts (cf. textfabric’s =:).
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
- Return type:
- static sameend(all: bool | None = False, negate: bool | None = False) TextSelectionOperator
Create an operator to test if two textselections (or sets) have the same end position. Each TextSelection in A ends where a TextSelection in B ends. If modifier all is set: the rightmost TextSelection in A ends where the rightmost TextSelection in B ends (cf. textfabric’s :=).
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
- Return type:
- static succeeds(all: bool | None = False, negate: bool | None = False) TextSelectionOperator
Create an operator to test if one textselection (or set) is to the immediate right of (succeeds) another. Each TextSelection in A begins where at least one TextSelection in B ends. If modifier all is set: the leftmost TextSelection in A starts where the rightmost TextSelection in B ends (cf. textfabric’s :>).
- Parameters:
all (Optional[bool]) – If this is set, then for each TextSelection in A, the relationship must hold with ALL of the text selections in B. By default (false), a match with any item suffices (and may be returned).
negate (Optional[bool]) – Inverts the operator (turns it into a negation).
- Return type:
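For two single text selections, the relations named by the static methods above reduce to comparisons of begin and end positions. The sketch below spells them out on plain (begin, end) tuples with a non-inclusive end. These are hypothetical helper functions, not the library's operators, and the boundary cases (for instance whether identical spans embed each other) are assumptions.

```python
from typing import Tuple

Span = Tuple[int, int]  # (begin, end) in codepoints; end is non-inclusive


def equals(a: Span, b: Span) -> bool:
    return a == b


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def embeds(a: Span, b: Span) -> bool:
    return a[0] <= b[0] and b[1] <= a[1]


def embedded(a: Span, b: Span) -> bool:
    return embeds(b, a)


def precedes(a: Span, b: Span) -> bool:
    return a[1] == b[0]  # a ends exactly where b begins


def succeeds(a: Span, b: Span) -> bool:
    return a[0] == b[1]  # a begins exactly where b ends


def samebegin(a: Span, b: Span) -> bool:
    return a[0] == b[0]


def sameend(a: Span, b: Span) -> bool:
    return a[1] == b[1]


# Spans over the text "Hello world"
hello, space, world, whole = (0, 5), (5, 6), (6, 11), (0, 11)
print(precedes(hello, space), embeds(whole, world), overlaps(hello, world))
```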
- class stam.TextSelections
A TextSelections object holds an arbitrary collection of text selections. You can iterate over it to retrieve TextSelection instances.
- __getitem__(int) TextSelection
Returns a text selection in the collection by index.
- Return type:
TextSelection
- __iter__() Iterator[TextSelection]
Iterator over all text selections in this collection
- Return type:
Iterator[TextSelection]
- __str__() str
Returns the text of all text selections.
The results are space-delimited; use text_join() instead if you want another delimiter.
- Return type:
str
- annotations(*args, **kwargs) Annotations
Returns annotations (Annotations containing Annotation) that refer to any of the text selections in this collection.
The annotations can be filtered using positional and/or keyword arguments. See Annotation.annotations().
- Return type:
Annotations
- data(*args, **kwargs) Data
Returns annotation data (Data containing AnnotationData) used by annotations referring to the text selections in this collection.
The data can be filtered using positional and/or keyword arguments; see Annotation.data(). If no filters are set (default), all data from all annotations on all text selections are returned (without duplicates).
- Return type:
Data
Applies a TextSelectionOperator to find all other text selections that are in a specific relation with the ones in the current collection. Returns a collection of all matching TextSelection instances.
Text selections will be returned in textual order. They may be filtered via positional and/or keyword arguments. See Annotation.textselections().
If you are interested in the annotations associated with the found text selections, then add .annotations() to the result.
See Annotation.related_text() for allowed keyword arguments and examples.
- Parameters:
operator (TextSelectionOperator) –
- Return type:
- test_annotations(**kwargs) bool
Tests whether there are any annotations that refer to any of the text selections in this collection.
This method is like annotations(), but it only tests and does not return the annotations; as such it is more performant.
The annotations can be filtered using positional and/or keyword arguments. See Annotation.annotations().
- Return type:
bool
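One plausible reason a pure test can be cheaper than retrieval, shown as a generic Python sketch and not as the library's actual implementation: an existence check may stop at the first match, whereas retrieval has to visit every candidate. All names below (candidates, is_match, visited) are hypothetical.

```python
from typing import Iterator, List

visited: List[int] = []


def candidates() -> Iterator[int]:
    # Hypothetical stream of annotation ids; records each id as it is visited.
    for annotation_id in range(1000):
        visited.append(annotation_id)
        yield annotation_id


def is_match(annotation_id: int) -> bool:
    # Hypothetical filter: ids 3 and up count as matches.
    return annotation_id >= 3


# Retrieval: every candidate is visited and all matches are collected.
matches = [a for a in candidates() if is_match(a)]
visited_for_retrieval = len(visited)

# Test only: any() stops as soon as the first match is seen.
visited.clear()
found = any(is_match(a) for a in candidates())
visited_for_test = len(visited)

print(found, visited_for_retrieval, visited_for_test)  # True 1000 4
```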
- test_data(*args, **kwargs) bool
Tests whether there are any annotations that reference any of the text selections in this collection, with data that passes the provided filters. The result is functionally equivalent to doing .annotations().test_data(), but this shortcut method is implemented much more efficiently and is therefore recommended.
The data can be filtered using positional and/or keyword arguments. See Annotations.data().
- Return type:
bool
- text_join(delimiter: str) str
Returns the text of all text selections, separated by the provided delimiter. This is more efficient than calling .text().join() yourself.
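As a rough illustration of how __str__() and text_join() relate, the sketch below uses a hypothetical stand-in class (not the real stam.TextSelections) that stores the text of each selection as a plain string. It assumes that __str__() behaves like text_join() with a single space, which is what the descriptions above suggest.

```python
from typing import List


class FakeTextSelections:
    """Hypothetical stand-in holding the text of each selection as a str."""

    def __init__(self, texts: List[str]) -> None:
        self._texts = list(texts)

    def text_join(self, delimiter: str) -> str:
        # Join the text of all selections with the given delimiter.
        return delimiter.join(self._texts)

    def __str__(self) -> str:
        # Assumed equivalent to joining with a single space.
        return self.text_join(" ")


selections = FakeTextSelections(["Hello", "world"])
print(str(selections))               # Hello world
print(selections.text_join(" | "))   # Hello | world
```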
- textual_order() TextSelections
Sorts the text selections in textual order.
This has some performance cost, so avoid calling this method on the results of methods that already promise to return textual order (which most textselection methods do!).
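Textual order is not defined in this section. A common reading, assumed here, is ascending begin offset with ties broken by end offset. The sketch uses hypothetical (begin, end) tuples rather than real TextSelection objects, and the function is a stand-in rather than the library method.

```python
from typing import List, Tuple

# Hypothetical stand-in for a text selection: (begin, end) codepoint offsets.
Span = Tuple[int, int]


def textual_order(spans: List[Span]) -> List[Span]:
    # Assumption: order by begin offset first, then by end offset.
    return sorted(spans, key=lambda span: (span[0], span[1]))


unsorted = [(6, 11), (0, 5), (0, 11)]
print(textual_order(unsorted))  # [(0, 5), (0, 11), (6, 11)]
```

Sorting costs O(n log n), so under this reading it is wasted work on a collection that is already ordered.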
- Return type: