Wikipedia covers this under the term Canonicalization:
A process for converting data that has more than one possible representation into a "standard" canonical representation. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.
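A minimal sketch of that idea in Python (the fraction example is my own, not from the quoted text): a fraction has many representations, and reducing each to lowest terms with a positive denominator gives a canonical form that makes equivalence checks trivial.

```python
from math import gcd

def canonical_fraction(num, den):
    """Canonical form of num/den: lowest terms, sign carried by the numerator."""
    if den == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    g = gcd(num, den)  # math.gcd ignores signs, so g is always positive
    num, den = num // g, den // g
    if den < 0:  # normalize the sign into the numerator
        num, den = -num, -den
    return (num, den)

# 2/4 and -3/-6 are different representations of the same value;
# after canonicalization they compare equal.
print(canonical_fraction(2, 4) == canonical_fraction(-3, -6))  # True
```

The same pattern supports the other uses the definition lists: canonical tuples can be counted, deduplicated, or sorted directly.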
The Unicode example made the most sense to me:
Variable-length encodings in the Unicode standard, in particular UTF-8, have more than one possible encoding for most common characters. This makes string validation more complicated, since every possible encoding of each string character must be considered. A software implementation which does not consider all character encodings runs the risk of accepting strings considered invalid in the application design, which could cause bugs or allow attacks. The solution is to allow a single encoding for each character. Canonicalization is then the process of translating every string character to its single allowed encoding. An alternative is for software to determine whether a string is canonicalized, and then reject it if it is not. In this case, in a client/server context, the canonicalization would be the responsibility of the client.
In summary: canonicalization gives you a standard form of representation for the data. From that form you can then convert to any other representation you may need.
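A related illustration using Python's standard unicodedata module (note this shows Unicode normalization forms, i.e. precomposed vs. combining-character sequences, rather than the overlong UTF-8 byte encodings the quote also alludes to):

```python
import unicodedata

# "é" has two valid Unicode representations: a single precomposed
# code point, or "e" followed by a combining acute accent.
precomposed = "\u00e9"    # é as one code point
decomposed = "e\u0301"    # e + COMBINING ACUTE ACCENT

print(precomposed == decomposed)  # False: code-point-for-code-point they differ

def canon(s):
    # Translate every character sequence to its single allowed (NFC) encoding.
    return unicodedata.normalize("NFC", s)

print(canon(precomposed) == canon(decomposed))  # True
```

The alternative mentioned in the quote corresponds to `unicodedata.is_normalized("NFC", s)` (Python 3.8+): the server rejects non-canonical input and the client is responsible for normalizing.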
"Canonical" is an informal term often used in mathematics. Sometimes it means that you and your neighbour would come up with the same map. Sometimes it means the construction involves no arbitrary choice. Sometimes it means it does involve a choice, but the result is independent of that choice.
As noted in the paper Imperative Functional Programming (1993) by Peyton Jones and Wadler (both part of the group of researchers that created Haskell):
We focus on purely-functional solutions, which rule out side effects, for two reasons. Firstly, the absence of side effects permits unrestricted use of equational reasoning and program transformation...
The focus is on the absence of side effects because it enables program transformations (i.e. compiler optimisations).
What are side effects? This paper in turn points to Integrating functional and imperative programming (1986) by Gifford and Lucassen, which distinguishes four effect classes: Pure, Function, Observer and Procedure. The term "pure function" derives from this paper:
- a PURE function is referentially transparent: it does not cause side effects, its value is not affected by side effects, and it returns the same value each time it is evaluated.
Note, however, that Peyton Jones and Wadler mention shortcomings in this approach. Worth noting, they say, is the programming language Clean, which uses linear types to introduce side effects in a safe manner (i.e. safe for the compiler). Basically, Clean threads the World as a variable through all I/O-related functions, including the main entry point.
With that, it is possible for a pure functional language to interact with the world and have side effects (I/O, the OS, the windowing system, etc.), partially contradicting your Wikipedia definition. One can in fact say that Clean is one of Haskell's influences, although Haskell departs from linear types and uses another type-level construct (monads) to guarantee linearity, i.e. a single reference at all times.
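The world-threading idea can be sketched as a toy model (my own illustration, in Python; the `World`, `read_line` and `write_line` names are hypothetical). Every "I/O" function takes the current World and returns a result together with a new World. Real Clean enforces with uniqueness (linear) types that the old World can never be reused; Python cannot enforce that, so this only shows the data flow.

```python
class World:
    """Toy stand-in for Clean's World value: here, just buffered stdin/stdout."""
    def __init__(self, stdin, stdout=()):
        self.stdin = tuple(stdin)    # pending input lines
        self.stdout = tuple(stdout)  # lines written so far

def read_line(world):
    # Consume one input line; return it plus the successor World.
    line, *rest = world.stdin
    return line, World(rest, world.stdout)

def write_line(text, world):
    # Append one output line; return the successor World.
    return None, World(world.stdin, world.stdout + (text,))

def main(world):
    # The World is threaded linearly: each step consumes the previous
    # World and produces the next one, so effects stay explicit and ordered.
    name, world = read_line(world)
    _, world = write_line("hello " + name, world)
    return world

final = main(World(stdin=["clean"]))
print(final.stdout)  # ('hello clean',)
```

Haskell's `IO` monad hides exactly this plumbing: conceptually, `IO a` is a function from one world state to the next paired with a result, and the monad guarantees the single-reference discipline instead of a linear type.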