Information is selection from a domain

This project has been shortlisted for the DPH Innovation Prize – Best Data Driven Innovation


Team: Wolfgang Orthuber (NumericSearch)

Outline: In this general approach we provide a precise definition of information that can be realized efficiently in practice by defining a domain. The word “domain” denotes an ordered set of possibilities that is shared by the sender and the receiver of information. Before any information exchange, both sender and receiver must know the domain (e.g. a common vocabulary). Information is then transported as a selection from the domain (digitally, as a sequence of numbers).

Information is thus defined by the domain (the common ordered set of possibilities) and the selection (a sequence of numbers). It is therefore important that computer and information experts are familiar with the definition of domains and of the numbers that select from them. An important domain is, for example, the “common vocabulary” that is a precondition for communication by language. This set can be ordered alphabetically or by frequency of usage, for example. An application-specific ontological, multidimensional ordering by meaning is also possible.
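The idea of a vocabulary as a domain can be illustrated with a minimal Python sketch. The five-word vocabulary and the function names `encode` and `decode` are assumptions made only for illustration; they are not part of any existing system.

```python
# Hypothetical shared domain: a small vocabulary, ordered alphabetically.
# Sender and receiver must both know this list before any exchange.
DOMAIN = sorted(["domain", "from", "information", "is", "selection"])

def encode(words, domain):
    """Sender side: replace each word by its position in the shared domain."""
    index = {word: i for i, word in enumerate(domain)}
    return [index[w] for w in words]

def decode(numbers, domain):
    """Receiver side: look each number up in the same shared domain."""
    return [domain[n] for n in numbers]

message = ["information", "is", "selection", "from", "domain"]
selection = encode(message, DOMAIN)   # only these numbers are transported
restored = decode(selection, DOMAIN)
```

Only the number sequence `selection` has to be transmitted. Without the shared, identically ordered domain the numbers carry no meaning.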

Further important examples of domains are the value ranges of variables or parameters. Here the order of the domain is given and is directly represented by the order of the selecting number. The natural order is preserved because similar values are represented by similar numbers, which makes them directly searchable. Users can define variables or numbers and thereby also define the search criteria. Multidimensional similarity search becomes possible once appropriate multidimensional domains with distance functions have been defined. These domains are metric spaces and are called “Domain Spaces”. At http://numericsearch.com users can define such domains, and multidimensional search can be demonstrated. Once enough data has been collected, search results could be very helpful for decision support. For example, medical findings can be made systematically comparable and searchable by criteria set by medical experts. For important diagnoses (e.g. from ICD-10), diagnosis-specific parameters can be defined, e.g. age, weight, all results of laboratory measurements, and also results of combined measurements, e.g. on radiographic images. All of this can be represented as a comparable sequence of numbers.
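A similarity search in such a domain can be sketched in a few lines of Python. The dimension names, the scaling factors, the choice of a scaled Euclidean distance and the sample records are all assumptions made up for illustration; they do not describe how numericsearch.com is implemented.

```python
import math

# Hypothetical domain definition: one (name, scale) pair per dimension.
# The scale makes differences comparable across dimensions.
DIMENSIONS = [("age_years", 10.0), ("weight_kg", 15.0), ("lab_value", 1.0)]

def distance(a, b):
    """Scaled Euclidean distance between two number sequences of the domain."""
    return math.sqrt(sum(((x - y) / scale) ** 2
                         for x, y, (_, scale) in zip(a, b, DIMENSIONS)))

def nearest(query, records, k=2):
    """Return the k records most similar to the query."""
    return sorted(records, key=lambda r: distance(query, r))[:k]

# Invented example records: (age_years, weight_kg, lab_value).
records = [(34, 70, 5.1), (36, 72, 5.0), (70, 90, 8.2), (12, 40, 4.4)]
query = (35, 71, 5.0)
result = nearest(query, records)
```

Because similar findings map to nearby number sequences, sorting by distance returns the most comparable records first. In a real application the distance function would be chosen by the experts who define the domain.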

After standardized online definition, information can be transported in the short form “UL plus sequence of numbers” (where “UL” is an efficient pointer to the online definition of the sequence of numbers). This new data structure is identified, globally uniform, comparable, interoperable, language independent, and efficiently searchable and transportable; it enables maximal competence in the definition of data and avoids redundancy. Online definitions can be nested and reused globally, and they can also define the interfaces of algorithms. For example, AI algorithms can learn directly from such globally defined data.
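The short form “UL plus sequence of numbers” might be modelled as follows. This is only a sketch under stated assumptions: the class name `Record`, the example URL, and the local dictionary standing in for an online definition are hypothetical, since no standard for such definitions exists yet.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    ul: str          # pointer to the (online) definition of the numbers
    numbers: tuple   # the selection: one number per defined dimension

# Local stand-in for online definitions; a real system would resolve the UL
# over the network. The URL and the dimension names are invented.
DEFINITIONS = {
    "https://example.org/def/patient-basics": ("age_years", "weight_kg"),
}

def interpret(record):
    """Attach the defined meaning to each number of the record."""
    names = DEFINITIONS[record.ul]
    if len(names) != len(record.numbers):
        raise ValueError("number sequence does not match its definition")
    return dict(zip(names, record.numbers))

rec = Record(ul="https://example.org/def/patient-basics", numbers=(35, 71.0))
meaning = interpret(rec)
```

The record itself contains no field names or units. These live once in the definition, which is what avoids redundancy and makes records with the same UL directly comparable.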

So far, however, the great technical potential of this new online-defined data structure is almost unknown. So far we cannot define domains online and thus globally. The self-evident conclusion is:

Let’s define domains online (globally)!

We urgently propose establishing a group of experts from science and industry to create the standard for online definitions that globally define domains together with their selecting number sequences.