Summary: This is a thought experiment about how to unify and upgrade the flow of data from astronomical observatories into an accessible stream of observations, events, and collections of various sizes. v.4
Status: Thought experiment for discussion, with a glossary of terms and a cross-field glossary of related terms from the world of astro event brokers.
An observatory network might produce a few million observations a night. Most telescopes capture much more data than they preserve; they make a best effort to save what might be useful to their own future work, or to others who could use something they’ve seen in the context of other observations.
As a result, astronomical ‘event brokers’ like ANTARES listen to the observation streams of a network and filter for potential events that different audiences [such as other observatories] might be interested in, so as to co-associate further observations or annotations.
What does the network need to do to turn a subset of observations into structured collections?
What does a broker need to do to maintain its own registry (possibly large, largely unused, speculative datasets — costs of storage covered by its brokerage service) for original + filtered collections?
Telescopes are real-world data sources with
hardware type
configuration
calibration functions tied to the {type, config}
calibration data updated at least nightly, tuning those functions
weather data, if terrestrial, informing this
location
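The telescope properties above can be sketched as a record. A minimal sketch in Python, with illustrative field names, and a single gain factor standing in for a real calibration function tied to the {type, config} pair:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class TelescopeSource:
    """Hypothetical record for one real-world data source in the network."""
    hardware_type: str        # e.g. "CCD-reflector-2m" (illustrative)
    config: Dict[str, float]  # current instrument configuration
    location: str             # site identifier (or orbital element set)
    # Calibration curve keyed by {type, config}; here a trivial identity,
    # tuned by nightly calibration data (and weather, if terrestrial).
    calibration: Callable[[float], float] = lambda raw: raw
    calibration_updated: str = "1970-01-01"  # date of last nightly update

    def recalibrate(self, gain: float, updated: str) -> None:
        """Install a new calibration curve from tonight's calibration data."""
        self.calibration = lambda raw: raw * gain
        self.calibration_updated = updated
```

A real calibration function would be far richer (flat fields, dark frames, atmospheric terms); the point is only that it is data-driven state attached to the source, refreshed at least nightly.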
An observation from a scope has
target location
angular magnification
field angle (field of view)
time
duration (time over which light is integrated)
this allows things to be somewhat fuzzy: you could compile many observations of different durations from a raw video- or image-stream.
implicit noise: mechanical, electrical, astronomical, cosmic rays.
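These observation properties suggest a record like the following sketch, with an assumed crude overlap check (real pipelines use proper spherical geometry; the rectangular comparison here is only illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """One observation from a scope; field names are illustrative."""
    target_ra_deg: float      # target location: right ascension
    target_dec_deg: float     # target location: declination
    magnification: float      # angular magnification
    field_of_view_deg: float  # field angle
    start_time: float         # time (UNIX timestamp, assumed)
    duration_s: float         # time over which light is integrated

    def overlaps(self, other: "Observation") -> bool:
        """Crude check that two observations cover overlapping sky + time."""
        dra = abs(self.target_ra_deg - other.target_ra_deg)
        ddec = abs(self.target_dec_deg - other.target_dec_deg)
        radius = (self.field_of_view_deg + other.field_of_view_deg) / 2
        same_sky = dra <= radius and ddec <= radius
        same_time = (self.start_time < other.start_time + other.duration_s and
                     other.start_time < self.start_time + other.duration_s)
        return same_sky and same_time
```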
An event is the appearance of something new in the sky. It has
time of first observation (a lower bound)
duration (if transient)
classification
weight / likelihood of being a real event, as judged across most locations that happened to observe the same target (this concept is not standardized)
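An event record along these lines might look like the sketch below. Since the weighting concept is not standardized, the combination rule here (independent per-site likelihoods) is purely an illustrative assumption:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Event:
    """Sketch of an event: the appearance of something new in the sky."""
    first_observed: float  # time of first observation (a lower bound)
    classification: str
    duration_s: Optional[float] = None  # set if the event is transient
    per_site_likelihoods: List[float] = field(default_factory=list)

    @property
    def likelihood(self) -> float:
        """Combine per-site likelihoods assuming independent noise:
        P(real) = 1 - prod(1 - p_i). An assumption, not a standard."""
        p_noise = 1.0
        for p in self.per_site_likelihoods:
            p_noise *= (1.0 - p)
        return 1.0 - p_noise
```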
An event stream from a set of sources is a feed of timestamped events that cross some likelihood threshold for being interesting.
In addition to communication protocols for managing a shared distributed feed, aligning events across different scopes requires cross-calibration.
Where the scopes use the same config, this is just a comparison of parameters.
Where setups differ, it requires a mapping function that projects both into comparable space+time coordinates.
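A minimal sketch of that projection, where a per-site offset and clock skew stand in for a real astrometric and timing solution (all parameter names and tolerances are assumptions):

```python
def to_shared_frame(obs, site_offset_deg=(0.0, 0.0), clock_skew_s=0.0):
    """Project an observation's (ra, dec, t) into a shared space+time frame.
    The offset/skew parameters are stand-ins for a real calibration model."""
    ra, dec, t = obs
    return (ra + site_offset_deg[0], dec + site_offset_deg[1], t + clock_skew_s)

def aligned(obs_a, obs_b, cal_a, cal_b, pos_tol_deg=0.01, time_tol_s=5.0):
    """Decide whether two observations from differently configured scopes
    refer to the same sky location and moment, after cross-calibration."""
    ra_a, dec_a, t_a = to_shared_frame(obs_a, **cal_a)
    ra_b, dec_b, t_b = to_shared_frame(obs_b, **cal_b)
    return (abs(ra_a - ra_b) <= pos_tol_deg and
            abs(dec_a - dec_b) <= pos_tol_deg and
            abs(t_a - t_b) <= time_tol_s)
```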
An event broker is a service that subscribes to event streams from many observatories, lets researchers and observatories register interest in certain kinds of events, and submits requests to participating observatories to follow up on a possible event by reanalyzing their own observations from that location, or making new ones.
For instance, traditional scopes may subscribe to the event feed of LIGO to look at points in the sky where there was a possible gravitational event. Or a broker with access to many feeds could track where each scope is looking and, if anyone else recorded an observation in that region, prompt it to reanalyze its raw data before discarding it, in case it holds additional observations of the same event.
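The subscribe / register-interest / follow-up loop just described can be sketched minimally (class and method names are hypothetical, not any real broker's API):

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EventBroker:
    """Minimal sketch of a broker: ingest events from subscribed streams,
    match them against registered interests, queue follow-up requests."""

    def __init__(self):
        self.filters: Dict[str, Callable[[dict], bool]] = {}
        self.followups: Dict[str, List[dict]] = defaultdict(list)

    def register_interest(self, subscriber: str,
                          predicate: Callable[[dict], bool]) -> None:
        """A researcher or observatory registers a filter for event shapes."""
        self.filters[subscriber] = predicate

    def ingest(self, event: dict) -> None:
        """On each incoming event, queue a follow-up request for every
        subscriber whose registered filter matches it."""
        for subscriber, predicate in self.filters.items():
            if predicate(event):
                self.followups[subscriber].append(event)
```

In practice the follow-up queue would become outbound requests to observatories to reanalyze or make new observations; here it is just an in-memory list.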
Past examples:
VOEventNet (2007),
SkyAlert (2009),
CRTS (2009),
automated classification of transients (2011)
Current examples:
the Large Synoptic Survey Telescope (’15): a raw feed of 1M potential transient events a night; a simple ‘event broker’ to help flag events of interest to common users; and access to the full stream (for a fee) granted to external brokers who flag different subsets for different audiences.
the ANTARES and GROWTH projects — both developing machine-learning tools for automated filtering + classification of events, allowing automated telescopes to choose where to look based on the results
“Maximizing Science in the LSST Era” (pp.138-40, 145-9) has many detailed examples, including data sharing tools, community brokers w/ local data filters, cross-matching of alerts, and the need for common protocols of communication b/t brokers and other data services (e.g., SIMBAD and NED)
Let’s focus on a specific hypothetical example: a network of scopes called Macduff.
Inputs. Say we have six telescopes in our network, Ta–Tf,
run by Anubis, Balor, Charon, Dante, Eurydice, and Frank.
Ta and Tb are the same hardware in different locations.
Tc is different hardware with roughly the same settings.
Td is a scope capturing a continuous image stream, converted later into chunked observations.
Te is an orbital scope.
Tf is a virtual-aperture scope (combined data from multiple scopes on different sides of the planet)
…
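A scope like Td converts a continuous stream into chunked observations; a minimal sketch of that chunking, assuming fixed-duration windows (real chunking might instead follow the fuzziness noted earlier and compile observations of varying durations):

```python
def chunk_stream(frame_times, chunk_s=30.0):
    """Group a continuous stream of frame timestamps into fixed-duration
    chunked observations. Returns (start_time, duration, frames) tuples."""
    chunks = {}
    for t in frame_times:
        chunks.setdefault(int(t // chunk_s), []).append(t)
    return [(k * chunk_s, chunk_s, frames)
            for k, frames in sorted(chunks.items())]
```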
Some of the outputs desired from this network: raw data, existence, interest, alignment, combination, and replication.
Raw data - The observational data synthesized into inferred events. Much of this is not stored for long, but feedback from others may lead to reanalysis of observations that would otherwise be ignored. References to observations and their bulk properties are important.
Cleaned data - Cleaning tends to happen on-detector, in-scope, and in the detection software. Then a normalized, binned, error-corrected output — taking weather, mechanical, and other conditions into consideration — might be the official datastream on the detecting machine.
Compressed data - For high-volume feeds, the raw data may be compressed into a longer-lasting file, destined for preservation for some initial period of time as a reference for other work. This can include discarding streaming data without saving to permanent storage media, or overwriting the media used to store a day’s data while compressing the most interesting bits into an archival file that is transferred to a data center.
Summarization
Existence / event list - What events have been observed in this region / within this time-range? With what level of confidence that this was a real event and not noise?
Ambient conditions - Metadata about the region and observing context, and [slower] changes over the course of observation
Alignment with a network of related observers
Interest - What demand is there for events of different shapes? What observatories have registered monitoring scripts with event brokers to help strengthen coverage of certain events?
Sky survey -
(Existence) Did anyone else observe something close to this {time/place/ classification}?
(Significance) Did anyone observe something slightly sub-threshold?
(Brokering) How have brokers prioritized events for others to see/replicate?
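The existence and significance queries can be sketched as one lookup over an event catalog, with the sub-threshold case handled by lowering the likelihood cut (field names and tolerances are illustrative):

```python
def nearby_events(catalog, ra, dec, t, pos_tol=0.5, time_tol=60.0,
                  min_likelihood=0.0):
    """Existence query: events close to this {time, place}. Set
    min_likelihood below the usual cut to also surface slightly
    sub-threshold detections (the significance question)."""
    return [e for e in catalog
            if abs(e["ra"] - ra) <= pos_tol
            and abs(e["dec"] - dec) <= pos_tol
            and abs(e["t"] - t) <= time_tol
            and e["likelihood"] >= min_likelihood]
```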
Combination - An aggregate feed from many sources. Combined events: events where the combined observations from multiple sources/spectra pass a combined threshold. A combined feed of only combined-event data.
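The combined-event test above can be sketched as follows, again assuming independent per-source likelihoods (an assumption, since no combination rule is standardized):

```python
def combined_feed(detections, combined_cut=0.95):
    """detections: {candidate_id: [per-source likelihoods]}.
    Emit only candidates whose combined likelihood passes the combined
    threshold, even if no single observation passes it alone."""
    feed = []
    for cand, probs in detections.items():
        p_noise = 1.0
        for p in probs:
            p_noise *= (1.0 - p)  # chance every detection is noise
        if 1.0 - p_noise >= combined_cut:
            feed.append(cand)
    return feed
```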
Inference + Synthesis - Given data from the network, what inferred or synthetic datasets have been made? This includes virtual-aperture datasets from pre-hoc, closely synchronized observations and more loosely connected post-hoc approaches to cleaning data from many scopes.
Real-time replication - A prioritized feed of recent events that merit further observation, often time-sensitive [transient events]. A streamlined feed of high-priority events and their followups, perhaps segmented or filtered by the types of scopes that could respond [which spectra are most interesting to observe, what features are needed].
This is where current concepts of astronomical event brokers often focus: there is opportunity cost to wasted scope time and a desire to align multiple scopes + spectra on interesting parts of the sky during significant events.
(It takes time to slew a large scope’s focus across the sky to a new target — if you have an IR scope and have just finished an observation and want to know where to look next, you may want to know if any other scopes near your current field are looking for IR confirmation of a potential event.)
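A replication feed segmented by responding-scope capability, like the IR example above, can be sketched as a filter plus priority sort (field names are illustrative):

```python
def replication_feed(events, scope_spectra):
    """Prioritized feed of recent events for one responding scope: keep
    only events whose requested spectrum the scope can observe, then
    order highest-priority first so slew time isn't wasted."""
    matches = [e for e in events if e["wanted_spectrum"] in scope_spectra]
    return sorted(matches, key=lambda e: -e["priority"])
```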