OSS Scan #4 — Spatial+Temporal & Swedish NLP

Scope: Viken cluster investigative substrate. Permissive licenses preferred (MIT/Apache/BSD). One Swedish journalist user.

Slice 1 — Spatial + Temporal (area filter, not point)

1. deck.gl (+ DataFilterExtension + MaskExtension)

URL: https://deck.gl / https://github.com/visgl/deck.gl
License: MIT
Status: Active (Uber/vis.gl org, frequent releases)
Lift: DataFilterExtension for range filters (incl. time as filter dim) + MaskExtension to clip layers to a user-drawn polygon. PolygonLayer + TripsLayer for the actual rendering. There is no canonical TimeLayer — the pattern is "push currentTime as a uniform / filter range on every frame."
Concerns: You build the timeline UI yourself; deck.gl gives primitives, not a finished spatiotemporal app. WebGL-heavy on older Macs. Pairs awkwardly with SSR.

2. Kepler.gl

URL: https://kepler.gl / https://github.com/keplergl/kepler.gl
License: MIT
Status: Active (3.x line, latest 3.1.x). Originally Uber, now community-maintained — slower cadence than deck.gl.
Lift: Drop-in finished UI with time playback slider AND polygon-filter on GeoJsonLayer. Could be the whole frontend for a v0. Export config as JSON.
Concerns: Heavy React+Redux blob, hard to customize past surface theming. Polygon filter exists but historically the "polygon + time" combo was a known gap (issue #949). Bundling Kepler into a Svelte/Astro shell is painful; it wants its own world. If you outgrow it, the migration to deck.gl is non-trivial.

3. MapLibre GL JS + temporal plugins

URLs:
- https://github.com/opengeos/maplibre-gl-time-slider
- https://github.com/mug-jp/maplibre-gl-temporal-control
- https://github.com/OpenHistoricalMap/maplibre-gl-dates
License: BSD-3 (MapLibre core), MIT (plugins, mostly)
Status: Active core, plugins mixed (OpenHistoricalMap one is the most maintained; opengeos slider is small but works).
Lift: MapLibre handles polygon-draw via maplibre-gl-draw; combine with a slider plugin. The maplibre-gl-dates plugin manipulates Style expressions to filter features by date range — cheap and clean.
Concerns: You assemble four libraries (map, draw, slider, your data layer). No one of them is "the timeline" — that's your code. Best fit if you want to own the stack.
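
A minimal sketch of the "filter features by date range with a style expression" idea, built server-side as JSON. This is not the maplibre-gl-dates implementation; the property name ts_epoch (Unix seconds on each feature) and the function name are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def date_range_filter(start: datetime, end: datetime, prop: str = "ts_epoch") -> list:
    """Build a MapLibre filter expression keeping features with start <= prop < end.

    `prop` is a hypothetical feature property holding Unix seconds.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    lo = int(start.timestamp())
    hi = int(end.timestamp())
    # "all" combines the two comparisons; ["get", prop] reads the feature property.
    return [
        "all",
        [">=", ["get", prop], lo],
        ["<", ["get", prop], hi],
    ]


if __name__ == "__main__":
    expr = date_range_filter(
        datetime(1975, 1, 1, tzinfo=timezone.utc),
        datetime(1980, 1, 1, tzinfo=timezone.utc),
    )
    print(json.dumps(expr))
```

On the client the resulting array would be handed to map.setFilter(layerId, expr) whenever the slider moves; the polygon clip stays a separate step.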

4. PostGIS + TimescaleDB (the storage combo)

URL: https://postgis.net / https://www.timescale.com
License: GPL-2 (PostGIS), Apache-2 (TimescaleDB core; the Community-edition features fall under the Timescale License, TSL)
Status: Both very active.
Lift: Store events as (geom geometry, ts timestamptz, …), hypertable on ts, GiST index on geom. Polygon+time query is a one-liner: ST_Within(geom, :polygon) AND ts <@ :range. This is the boring correct answer for an internal tool.
Concerns: GPL on PostGIS is sometimes flagged; irrelevant for an internal tool, but worth knowing. The Timescale License (TSL), which covers TimescaleDB's Community-edition features, restricts offering it as a managed service; also irrelevant here. Two extensions to keep current.
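
A sketch of the storage shape and the polygon+time query described above, kept as SQL strings plus a small parameter builder so it can be handed to any driver that accepts %(name)s placeholders (psycopg does). The table name events, the props column, and the function name are assumptions; nothing here connects to a database.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema: one row per event, hypertable on ts, GiST index on geom.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE IF NOT EXISTS events (
    geom  geometry(Geometry, 4326) NOT NULL,
    ts    timestamptz NOT NULL,
    props jsonb NOT NULL DEFAULT '{}'
);
SELECT create_hypertable('events', 'ts', if_not_exists => TRUE);
CREATE INDEX IF NOT EXISTS events_geom_gist ON events USING GIST (geom);
"""

# Area filter (user-drawn polygon) combined with a half-open time range.
AREA_TIME_SQL = """
SELECT ts, ST_AsGeoJSON(geom) AS geometry, props
FROM events
WHERE ST_Within(geom, ST_SetSRID(ST_GeomFromGeoJSON(%(polygon)s::text), 4326))
  AND ts <@ tstzrange(%(start)s::timestamptz, %(end)s::timestamptz, '[)')
ORDER BY ts
"""


def area_time_query(polygon: dict, start: datetime, end: datetime) -> tuple[str, dict]:
    """Return (sql, params) for events inside `polygon` during [start, end)."""
    if polygon.get("type") not in ("Polygon", "MultiPolygon"):
        raise ValueError("area filter needs a Polygon or MultiPolygon geometry")
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if start >= end:
        raise ValueError("start must be before end")
    params = {"polygon": json.dumps(polygon), "start": start, "end": end}
    return AREA_TIME_SQL, params
```

With a driver installed, usage would be roughly cursor.execute(*area_time_query(polygon, start, end)). Whether the range operator or two explicit comparisons on ts plans better is worth checking with EXPLAIN on real data.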

5. MobilityDB

URL: https://mobilitydb.com / https://github.com/MobilityDB/MobilityDB
License: PostgreSQL License (BSD-ish, permissive)
Status: Active (ULB academic, OSGeo project, 1.3 series).
Lift: First-class temporal point types (tgeompoint, tgeogpoint) for trajectories. If "exploration trip" or "drill campaign moves over weeks" matters, this models it natively. STBox queries handle polygon+temporal range in one predicate.
Concerns: Almost certainly overkill for an events-with-points-and-polygons corpus. Designed for moving objects (GPS traces). If Viken events are static-location-with-timestamp, plain PostGIS+Timescale is simpler.

6. Apache Sedona

URL: https://sedona.apache.org
License: Apache-2
Status: Active Apache TLP.
Lift: Spark-based, ST_Intersects(polygon, point) plus time filters. Right answer if the corpus is hundreds of millions of rows.
Concerns: Spark for one journalist's investigation is absurd. Skip unless you're suddenly doing a nationwide scrape of all mining filings since 1900.

7. GeoMesa

URL: https://www.geomesa.org / https://github.com/locationtech/geomesa
License: Apache-2
Status: Active LocationTech project, but slower than it was.
Lift: Z3 space-filling-curve index is purpose-built for "events at a place at a time." Backends: Accumulo/HBase/Cassandra/Redis/Kafka. Streaming-friendly.
Concerns: Same problem as Sedona — enterprise-scale infrastructure for journalist-scale data. Mention to be thorough; do not deploy.

8. vis-timeline (UI only)

URL: https://github.com/visjs/vis-timeline
License: Apache-2 / MIT dual
Status: Active, vis-timeline 8.5.x released May 2026.
Lift: The actual timeline widget — items, ranges, groups, zoom, drag. Pair with MapLibre on the other half of the screen, sync a currentTime store both ways. This is the obvious frontend recipe.
Concerns: Old-school imperative API (not React-native), but a Svelte wrapper is ~50 lines. Don't confuse with TimelineJS (Knight Lab, MIT) — that one is storytelling-oriented, single-narrative, and inappropriate here.

Bonus — event data model

SpatioTemporal Asset Catalog (STAC) — https://stacspec.org, Apache-2, active. Originally for raster, but the Item schema (geometry + datetime/start_datetime+end_datetime + properties) is a clean event model that's already standardized. Worth lifting the JSON schema even if you don't adopt the whole spec.
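
A sketch of what a STAC-shaped event could look like on the wire, following the Item fields named above (geometry, datetime or start_datetime+end_datetime, properties). The function name and the example property keys are assumptions; the output is not validated against the official JSON schema here.

```python
from datetime import datetime, timezone
from typing import Optional


def _iso(dt: datetime) -> str:
    """Format a timezone-aware datetime as a UTC RFC 3339 string."""
    if dt.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _positions(coords):
    """Yield every [x, y] position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from _positions(part)


def event_item(event_id: str, geometry: dict, when: Optional[datetime] = None,
               start: Optional[datetime] = None, end: Optional[datetime] = None,
               properties: Optional[dict] = None) -> dict:
    """Build a STAC-Item-shaped dict for a point-in-time or ranged event."""
    if when is None and (start is None or end is None):
        raise ValueError("give either `when` or both `start` and `end`")
    props = dict(properties or {})
    # STAC allows datetime to be null only when the start/end pair is present.
    props["datetime"] = _iso(when) if when is not None else None
    if start is not None and end is not None:
        props["start_datetime"] = _iso(start)
        props["end_datetime"] = _iso(end)
    xs, ys = zip(*((p[0], p[1]) for p in _positions(geometry["coordinates"])))
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": event_id,
        "geometry": geometry,
        "bbox": [min(xs), min(ys), max(xs), max(ys)],
        "properties": props,
        "links": [],
        "assets": {},
    }
```

A ranged event (say, a drilling campaign) passes start and end and gets datetime set to null; a dated filing passes when only.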

Recommendation for Slice 1

PostGIS+Timescale storage, MapLibre+maplibre-gl-draw+vis-timeline frontend, STAC-shaped event JSON on the wire. Skip Kepler unless you want a v0 in a weekend and accept the lock-in.

Slice 2 — Swedish NLP / NER / address / orgnr

1. KBLab BERT NER (National Library of Sweden)

URL: https://huggingface.co/KBLab/bert-base-swedish-cased-ner and https://huggingface.co/KB/bert-base-swedish-cased-ner
License: MIT on the KBLab model cards; check each card (the older KB-namespace models were historically CC0-ish).
Status: Active — KBLab is KB's official ML group, the canonical Swedish NER source. Multiple variants: cased, reallysimple-ner (cleaner tag set), lowermix.
Lift: This is the production answer. Tags: PRS (persons), ORG (organisations), LOC (locations), TME (time), EVN (events). Fine-tuned on SUC 3.0 / SUCX 3.0.
Concerns: BERT-base latency on CPU is fine for batch but sluggish per-doc interactive. Tag set is coarse — no MISC, no PRODUCT, no MONEY. Historical/19th-century Swedish text is out-of-domain (model trained on modern news/wiki/gov). You'll need regex on top for orgnr, dates, money.
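
A sketch of the regex layer that would sit on top of the NER output for orgnr, dates, and money. The patterns are illustrative assumptions, not a vetted grammar: the orgnr pattern only finds hyphenated candidates (and will also hit personnummer), so checksum validation is still needed afterwards.

```python
import re
from datetime import date
from decimal import Decimal

MONTHS = ["januari", "februari", "mars", "april", "maj", "juni", "juli",
          "augusti", "september", "oktober", "november", "december"]

# NNNNNN-NNNN candidates; validate the check digit in a later step.
ORGNR_RE = re.compile(r"\b\d{6}-\d{4}\b")
ISO_DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
SV_DATE_RE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)
# Swedish number style: space (or no-break space) thousands, comma decimals.
MONEY_RE = re.compile(
    r"(?<![\d,])(\d{1,3}(?:[ \u00a0]\d{3})+|\d+)(?:,(\d{1,2}))?[ \u00a0]?"
    r"(mkr|tkr|kronor|kr|sek)\b",
    re.IGNORECASE,
)
UNIT_FACTOR = {"kr": 1, "kronor": 1, "sek": 1, "tkr": 1_000, "mkr": 1_000_000}


def extract_structured(text: str) -> dict:
    """Pull orgnr candidates, dates, and SEK amounts out of Swedish text."""
    dates = []
    for m in ISO_DATE_RE.finditer(text):
        try:
            dates.append(date(int(m[1]), int(m[2]), int(m[3])))
        except ValueError:
            pass  # e.g. 1978-13-45 is not a real date
    for m in SV_DATE_RE.finditer(text):
        try:
            dates.append(date(int(m[3]), MONTHS.index(m[2].lower()) + 1, int(m[1])))
        except ValueError:
            pass
    amounts = []
    for m in MONEY_RE.finditer(text):
        whole = re.sub(r"[ \u00a0]", "", m[1])
        value = Decimal(whole + "." + (m[2] or "0"))
        amounts.append(value * UNIT_FACTOR[m[3].lower()])
    return {
        "orgnr_candidates": ORGNR_RE.findall(text),
        "dates": sorted(dates),
        "amounts_sek": amounts,
    }
```

Older material will need more patterns (abbreviated months, "milj. kr", two-digit years); the point is that this layer stays cheap and separately testable from the model.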

2. spaCy sv_core_news_lg

URL: https://spacy.io/models/sv / https://huggingface.co/spacy/sv_core_news_lg
License: MIT (model + spaCy)
Status: Active, shipped since spaCy 3.3, current with 3.x. Alpha-era feedback existed but it's a real release now.
Lift: Drop-in spacy.load("sv_core_news_lg"). Pipeline includes tagger, parser, lemmatizer, NER. Easier integration than KB-BERT for a Python service.
Concerns: NER quality is noticeably below KB-BERT on Swedish in published comparisons (one community-trained Swedish spaCy hit F1 ~90, but the official sv_core_news_lg is meaningfully lower). Use it when ergonomics matter more than recall.

3. KB-Whisper / KB OCR work (historical newspaper OCR)

URLs:
- https://huggingface.co/KBLab (KB-Whisper for speech, plus OCR-related models)
- Kubhist 2 corpus (5.5B tokens of 19thC Swedish newspapers, KB)
- https://spraakbanken.gu.se (Språkbanken Text — Gothenburg)
License: Mixed; KBLab tends MIT/CC.
Status: Active. KB has a stated push on transformer-based OCR post-correction for 19th-century news.
Lift: For modern PDFs (Bolagsverket filings, regulator decisions) you don't need this; Tesseract with the swe traineddata is fine. For scanned 1960s–1980s prospectus material, KB's post-correction pipelines (or simply Tesseract fine-tuned on swe, plus a Fraktur traineddata where Fraktur type appears) are where to look.
Concerns: Few of KB's OCR models are released as ready-to-use checkpoints; a lot is "described in a paper, code on request." Plan for Tesseract + LLM post-correct as the realistic path.

4. python-stdnum (orgnr + personnummer validation)

URL: https://github.com/arthurdejong/python-stdnum / https://pypi.org/project/python-stdnum
License: LGPL-2.1
Status: Active, broad library.
Lift: from stdnum.se import orgnr, personnummer, vat — Luhn validation, format normalization. Covers orgnr XXXXXX-XXXX, personnummer, Swedish VAT.
Concerns: LGPL, not MIT/Apache — fine for use, awkward if you're statically embedding. For an internal tool: non-issue. JS equivalent: organisationsnummer npm package (MIT).

5. organisationsnummer.dev (JS/multi-lang)

URL: https://organisationsnummer.dev / https://github.com/organisationsnummer/js
License: MIT
Status: Active, spec-versioned (v1.1).
Lift: If the frontend needs to validate orgnr at input time (or you want a JS lift to copy).
Concerns: Tiny library, code-lift is two functions. Honest answer: just write 20 lines.
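
For scale, the "20 lines" look roughly like this in Python: strip separators, require ten digits, run the Luhn check. This is a sketch, not a port of the organisationsnummer package; the function names are made up here, and it does not tell an orgnr apart from a personnummer.

```python
import re


def luhn_ok(digits: str) -> bool:
    """Luhn checksum over a digit string, check digit included."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def normalize_orgnr(raw: str) -> str:
    """Return the number as NNNNNN-NNNN, or raise ValueError if it is invalid."""
    compact = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"[0-9]{10}", compact):
        raise ValueError(f"expected 10 digits, got {raw!r}")
    if not luhn_ok(compact):
        raise ValueError(f"checksum failed for {raw!r}")
    return f"{compact[:6]}-{compact[6:]}"
```

Normalizing to one canonical string at ingest makes orgnr usable as a join key across sources.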

6. libpostal (+ Pelias)

URL: https://github.com/openvenues/libpostal / https://github.com/pelias/pelias
License: MIT (libpostal), MIT (Pelias)
Status: libpostal is in low-maintenance mode (the project's main bottleneck for years); Pelias is community-maintained after Mapzen folded.
Lift: libpostal parses an address such as "Storgatan 12, 831 30 Östersund" into components (street, house number, postcode, city) and covers roughly 60 languages, Swedish included. Pelias does the geocode if you self-host with Lantmäteriet + OSM imports.
Concerns: libpostal Swedish parsing is OK but not great — trained on OSM, which is sparse in Sweden vs e.g. Germany. For high-quality Swedish address geocoding, Lantmäteriet's own data is the answer (see #8). Pelias self-host is a sysadmin chore.

7. OpenAleph / Aleph NER stack

URL: https://openaleph.org / https://github.com/openaleph/openaleph
License: Unconfirmed. Recent statements suggest MIT, but Aleph core is noted here as historically AGPL; check the LICENSE file in the OpenAleph repository before lifting code.
Status: Active fork by DARC (post-OCCRP), 2025–2026 cadence.
Lift: Reference architecture for "investigative document platform." Their entity extraction is spaCy with per-language small models (en_core_web_sm, es_core_news_sm, etc.), so for Swedish they'd default to sv_core_news_sm if at all. The Aleph docs page on entity extraction is candid about this: Swedish is not a strength.
Concerns: Don't expect Aleph/OpenAleph to give you Swedish NER quality out of the box. The lift is the architecture (FollowTheMoney entity schema, ingest pipeline shape), not the model. Replace their NER step with KB-BERT in your fork.
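
A sketch of where the replacement NER step would plug in: mapping KB-BERT tags onto FollowTheMoney-shaped entities as plain dicts. The schema and property names (Person, Organization, name, registrationNumber) are taken from the public FtM model as I understand it and should be checked against the followthemoney library; the ID scheme and function name are assumptions.

```python
import hashlib
from typing import Optional

# KB-BERT tag -> FtM schema name. Only the two unambiguous mappings are
# included; LOC, TME and EVN are deliberately left unmapped in this sketch.
TAG_TO_SCHEMA = {"PRS": "Person", "ORG": "Organization"}


def to_ftm_entity(tag: str, name: str, orgnr: Optional[str] = None) -> Optional[dict]:
    """Return an FtM-shaped dict for a PRS/ORG mention, or None for other tags."""
    schema = TAG_TO_SCHEMA.get(tag)
    if schema is None:
        return None
    # FtM properties are multi-valued, hence the lists.
    properties = {"name": [name]}
    if schema == "Organization" and orgnr:
        properties["registrationNumber"] = [orgnr]
    # Hypothetical ID scheme: hash the orgnr when known, else the folded name,
    # so repeated mentions of the same company collapse to one entity.
    key = "|".join([schema, orgnr or name.casefold()])
    entity_id = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return {"id": entity_id, "schema": schema, "properties": properties}
```

Keying organisations on orgnr rather than name is the design choice that matters: spelling variants across filings then merge without fuzzy matching.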

8. Lantmäteriet open data (the actual Swedish geocoder)

URL: https://opendata.lantmateriet.se / https://www.lantmateriet.se/oppnadata
License: CC0 (since Feb 2025 — this is the big change; it used to be paid).
Status: Active. Official source.
Lift: Belägenhetsadress dataset — every Swedish address with coordinates. Load into PostGIS, query directly. This is the ground truth for öäå-correct Swedish addresses, postnummer, ort.
Concerns: It's data, not a service. You build the geocoder (trie lookup, fuzzy match) — or import into Pelias. For Viken, you may also want Fastighetsregistret (real-property register) which has historical owner transitions — relevant for mining-claims archeology.
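
A sketch of the "you build the geocoder" part using only the standard library: exact lookup on a normalized key, with a fuzzy fallback on street and city only within the same postcode and house number. The row layout (street, number, postcode, city, lat, lon) is a made-up stand-in for whatever the Belägenhetsadress export actually contains, and the coordinates in the usage example are placeholders.

```python
import difflib
from typing import Iterable, Optional


def _norm(value: str) -> str:
    """Casefold and collapse whitespace; å, ä and ö are kept as-is."""
    return " ".join(value.split()).casefold()


class AddressIndex:
    """In-memory address lookup: exact first, then fuzzy on street+city."""

    def __init__(self, rows: Iterable[tuple]):
        self._slots: dict = {}
        for street, number, postcode, city, lat, lon in rows:
            slot = (_norm(postcode).replace(" ", ""), _norm(number))
            name = f"{_norm(street)} {_norm(city)}"
            self._slots.setdefault(slot, {})[name] = (lat, lon)

    def lookup(self, street: str, number: str, postcode: str, city: str,
               cutoff: float = 0.85) -> Optional[tuple]:
        slot = (_norm(postcode).replace(" ", ""), _norm(number))
        names = self._slots.get(slot, {})
        name = f"{_norm(street)} {_norm(city)}"
        if name in names:
            return names[name]
        # Fuzzy fallback never crosses postcode or house number, so a typo in
        # the street name cannot silently land on a neighbouring address.
        close = difflib.get_close_matches(name, list(names), n=1, cutoff=cutoff)
        return names[close[0]] if close else None


if __name__ == "__main__":
    index = AddressIndex([("Storgatan", "12", "831 30", "Östersund", 63.18, 14.64)])
    print(index.lookup("Storgatan", "12", "83130", "Ostersund"))
```

For the full national dataset the same two-step logic is probably better expressed in PostGIS (exact match, then a trigram index), but this is enough to test match rates on a sample.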

Bonus — SUC / Språkbanken

Språkbanken Text (Gothenburg) — https://spraakbanken.gu.se — corpora and tools. SUC 3.0 is the gold-standard Swedish NER training set; if you want to evaluate KB-BERT vs spaCy on your domain, SUC test split is the benchmark.
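
A sketch of the comparison harness: entity-level, exact-match precision/recall/F1, micro-averaged over documents. It assumes gold and predicted entities have already been reduced to (start, end, label) spans per document; loading SUC and mapping each model's tag set onto a common one is left out.

```python
from typing import Iterable

Span = tuple[int, int, str]  # (start offset, end offset, label)


def score_ner(gold_docs: Iterable[set], pred_docs: Iterable[set]) -> dict:
    """Micro-averaged exact-match scores over parallel lists of span sets."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # same offsets and same label
        fp += len(pred - gold)   # predicted but not in gold
        fn += len(gold - pred)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Running the same function over a few hundred hand-labelled Viken documents gives the domain eval; a span with the right offsets but the wrong label counts as both a false positive and a false negative, which is the strict convention.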

Recommendation for Slice 2

KB-BERT for NER as the primary extractor, spaCy sv_core_news_lg as a fallback/cheap-path, python-stdnum for orgnr+personnummer, Lantmäteriet CC0 data loaded into PostGIS for address geocoding (skip libpostal/Pelias unless you have a sysadmin week to burn), Tesseract swe for OCR with LLM post-correction for historical material. Architecture-lift from OpenAleph (FollowTheMoney schema, ingest shape) — but replace its NER step.

Skeptical bottom line

The exotic options (Sedona, GeoMesa, MobilityDB) are tempting but inappropriate at journalist-scale. Don't deploy big-data infra for an N=hundreds-of-thousands corpus.
Kepler.gl is the fastest path to a "looks impressive" demo, and the slowest path to a tool you actually own.
KB-BERT NER is genuinely good for modern Swedish — but Bolagsverket-style legal/financial text and 1960s–1980s prospectus prose are both out-of-domain in different ways. Budget for a domain eval before trusting it.
Lantmäteriet going CC0 in Feb 2025 is the single biggest infrastructure win for Swedish geo work — don't waste effort on workarounds that predate it.