">

Language


The transition from speech to written text is not always straightforward. Here, Thomas Aquinas dictating, according to a drawing published in Punch, c. 1955.

Work in progress.

Last revised 9/4/2014. Return to Major concepts. See literature, programming.

1. The central role of language
2. Metrics on languages and texts
3. Terminology and ontology
4. Extension and comprehension
5. Paradigm, theory, system
6. Core words and generation/factors
7. Arts and languages

1. The central role of language

Language is the second digital level constituting mankind, above the first one, deeply hidden in our DNA. Its use is so natural to us that we generally forget:
- the considerable effort we made in our youth to acquire it, from basic vocabulary to reading and spelling, the rules of grammar, the customary courtesies, and for some of us the professional skills of a writer or speaker;
- the fact that it is fundamentally digital, and even binary, as was shown by Ferdinand de Saussure [Saussure] at the end of the 19th century, who saw in it a "system of oppositions", in today's words, bits.

Over the centuries, language developed increasingly formalized forms so that logical operations could work correctly. It then took mathematical forms to allow calculation (see for instance La révolution symbolique. La constitution de l'écriture symbolique mathématique by Michel Serfati, Editions Petra, Paris, 2005).

language_emergence.jpg
Language emergence, a tentative scheme

Very early, language was perceived to bear some magic: from the Bible (the first chapters of Genesis and of St John) to Harry Potter, not forgetting Ali Baba.

At last came programming, opening the wide horizons of the information and Internet age. And, as we show in our paper form and matter, there is here really something magic: from a finite object, a program, we generate another finite object, a (pixelized) image for instance, using a finite object (the computer), but through mathematical functions whose continuity cannot be defined without calling on infinity (topologically defined).

The linguistic model is highly influential in art analysis and criticism. Saussure is the main reference, but the approach really took off with structuralism and the generative grammars of the 1970s.

Languages are powerful toolsets for creating beings, whether abstract, representational or otherwise. Articulated language was created by mankind, but prepared by a progression in communication systems among animals. Languages are constitutive of computers.

A major, constitutive feature of languages is the fact that they are shared among populations of beings, thus allowing shorter and more efficient communications. This implies conventions, established by trial-and-error learning (animal and early human language), possibly completed by explicit standardization efforts (modern languages). Sharing, the externalization of the code, works better in digital form, owing to the very low error rate in copying.

Two robots creating an ad hoc language to describe what they see. Image from Kaplan (Sony CSL).

Languages can be created automatically by artificial beings ([Kaplan 2001]).

Languages have major disadvantages:
- they are conventional, not directly intuitive for living beings; they must therefore be learned, or be built into the system (if one may speak of an intra-neuronal language), or require from the using machine a specific translation tool (interpreter, compiler);
- they differ from one community to another, and so demand translation, which generally entails losses (even from computer to computer).

But they have an exclusive ability: reference to signified beings and, still more crucial, self-reference and hence recursion, which brings the infinite inside the finite and nests complexity into what at first looks linear.
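The self-reference mentioned here is exactly what recursion provides in programming languages: a finite rule that names itself and thereby generates arbitrarily deep nesting. A minimal illustrative sketch (the function and example are ours, not from the source):

```python
def expand(term, depth):
    """A definition that refers to itself: a finite rule
    generating arbitrarily deep (nested) structure."""
    if depth == 0:
        return term
    # the term reappears inside its own expansion
    return f"({term} {expand(term, depth - 1)})"

print(expand("x", 3))  # -> (x (x (x x)))
```

The rule itself stays a few lines long, yet the structures it can produce are unbounded: the complexity is nested, not spelled out.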

However, languages are never totally digital. They always encapsulate some residual analog features, fuzzy parts, even leaving aside secondary aspects of speech (alliteration, prosody) and typography (font, page setting).

The first presentation of the Star Wars robots used sophisticated linguistic research. But, as far as we know, this track has not been explored further in the series.

Formal languages: BNF (Backus-Naur form).
Generative languages. Balpe. Generative grammars.

Is MIDI a language?
- Gau, 4 pages about relations between digital cinema and language.

A programming language does not contain assertions. Declarative languages are used for AI.

Text and image may be rather firmly distinguished from each other. An image is a copy of some part of the world: a raster. It can be interpreted only by copying or by elaborate recognition functions. By contrast, text is coded. When a processor gets an image as input, it can only copy it and transmit it bit by bit, unless it analyzes it through an elaborate process. For text, parsing affords operations. But a text/operations mapping works perfectly only with formal languages, and with languages "understood" by that processor.
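The contrast (parsing affords operations; a raster can only be copied) can be sketched in Python. The tiny expression evaluator below is an illustration of "text/operations mapping" for a formal language, not a claim about any particular processor:

```python
import ast
import operator

# Text is coded: a parser turns it into a tree we can operate on.
OPS = {ast.Add: operator.add, ast.Mult: operator.mul}

def evaluate(node):
    """Walk the parse tree of a small arithmetic expression."""
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError("unsupported node")

tree = ast.parse("2 + 3 * 4", mode="eval").body
print(evaluate(tree))  # -> 14

# An image, by contrast, is a raster: without elaborate
# recognition functions, a processor can only copy it bit by bit.
pixels = bytes([0, 127, 255])
copy = bytes(pixels)  # the only "interpretation" available is duplication
```

The evaluator works because the language is formal and "understood" by the processor; no comparable mapping exists for the raw pixel bytes.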

Between text and image there is streaming time: series of images, of symbols.

Images are basically 2D, but may refer to 3D, or be "abstract" images, without reference to a Euclidean space, let alone a physical one.

Why, and how far, are text structures richer than images?
A written text is a kind of image, with a particular structure.

More or less formal languages
- natural,
- grammar/dictionary,
- prose/verse,
- specialized languages (law, military, poetic, love... mystical).

Quantify the formalization:
length of grammars, dictionaries, alphabets, axioms.

langage.jpg
Various tracks for language formalization.

2. Metrics on languages and texts

Thesis: a representation may be much larger than the being represented. Scale 1 is not at all a limit. That is true in the digital universe in general, but also for representations of the physical world (for instance micro-photography, medical images, images of atoms...). Example: "the bit at address 301523".
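The example can be made concrete: the phrase "the bit at address 301523" designates a single bit, yet its own encoding is two hundred bits long. A quick check:

```python
# A representation may be much larger than the being represented:
# this phrase names exactly one bit, but costs 25 characters,
# i.e. 200 bits, to write down.
description = "the bit at address 301523"
bits_of_description = len(description.encode("utf-8")) * 8
print(bits_of_description, "bits used to name 1 bit")
```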

Number of letters in the alphabet; of phonemes.
Number of words in the dictionary (Catach).
Length of the grammar (...)

The question of optima, and the evaluation of the different languages (and writing systems).

For programming languages, one may count:
- the number of different instructions (CISC and RISC; relations with the word length of the target machine);
- the number of data types, i.e. typing power;
- the number of accepted characters (interesting for the old character codes, which did not accept accents; that has changed with Unicode);
- the size of the necessary interpreter or compiler;
- the size of the machines able to compile and execute them.

The NIR (French Social Security number) example shows a limit case: there exists an optimum (or perhaps a semi-optimum) in coding economy. In other cases, we can devise more delicate, and probably more interesting, structures. One idea would be, at least as a research perspective, to aim at a methodical cleaning of the remaining analogisms in our languages.

Function points (see Wikipedia) and more generally metrics on programming.

See Zipf's law (Wikipedia).
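Zipf's law states that the frequency of the word of rank r in a corpus is roughly proportional to 1/r, so the product rank x frequency stays roughly constant. A rough sketch on a toy sentence (the sample text is ours; real corpora show the effect far more clearly):

```python
from collections import Counter

def zipf_check(text):
    """Rank words by frequency; under Zipf's law the product
    rank * frequency stays roughly constant."""
    ranked = Counter(text.lower().split()).most_common()
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

sample = "the cat sat on the mat and the dog sat on the rug"
for rank, word, freq, product in zipf_check(sample)[:3]:
    print(rank, word, freq, product)
```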

3. Terminology and ontology

The word. Some thousands in one language.
How many bits for a name? Directly, 10 characters x 5 bits, around 50 bits for one word, which is far more than needed to index the number of words...
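The arithmetic behind this remark: spelling a word costs about 50 bits, but merely indexing a lexicon costs far fewer. A minimal check (the lexicon size of 100,000 is an illustrative order of magnitude, not a figure from the source):

```python
import math

# Naive spelling cost: 10 characters at 5 bits each.
spelled_bits = 10 * 5
# Information-theoretic need: enough bits to index the lexicon.
lexicon_size = 100_000      # illustrative order of magnitude
index_bits = math.ceil(math.log2(lexicon_size))
print(spelled_bits, "vs", index_bits)  # -> 50 vs 17
```

Spelling is thus highly redundant as a code, which is part of what makes natural language robust.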

Structures can be found in the dictionary: thesauri, valuations (number of uses, from basic to very rare), a central language plus dedicated languages (medicine, law, divinity, computing...).

A definition is a text assigned to a word (or sometimes to a small sequence of words, an expression). In formal languages, a word can be replaced by its definition without changing the meaning of the phrase. In other languages, this substitution does not work correctly. In natural languages, dictionaries are made for human users and are more a sort of intuitive orientation, including contexts and possibly images (bitmaps).

Petit Larousse: "Statement of the essential characteristics, of the qualities proper to a being or a thing. Meaning of the word that designates it." In TV: the number of lines subdividing the image to be transmitted.
- a discriminating, sometimes ideological character,
- statistical characteristics,
- the expression of a strategy.

Webster (College edition):
1. a defining or being defined
2. a statement of what a thing is
3. a statement or explanation of what a word or phrase means or has meant
4. a putting or being in clear, sharp outline
5. the power of a lens to show (an object) in clear, sharp outline
6. the degree of distinctness of a photograph, etc.
in radio and television, the degree of accuracy with which sounds or images are reproduced.

A nomenclature is a sort of (semi-)formal dictionary, with a "record" structure. There is a typology by P. Berthier (in L'informatique dans la gestion de production; editor and date unknown):
- on a technical basis, letters and figures alternating, giving for example the part dimensions;
- based on ranges: letters and figures pre-assigned to part families and sub-families;
- sequential, incremental (numeric or alphanumeric).

4. Extension and comprehension

Extension and comprehension are defined for terms, which we shall here consider synonymous with concepts. Extension gives the list of occurrences of the concept, or some criterion for finding them. Comprehension groups all the features specific to the concept.

In our "there is that" model, extension may be seen as a list of addresses (instantiations), and comprehension as the type or class. See 9

If we know how to measure, for each concept:
- its comprehension (the length of the associated text in the dictionary, for instance; more formally, we could limit ourselves to the length of the definition itself, leaving out the "encyclopedic" part of the dictionary, as, in Law, the statute proper versus the jurisprudence);
- its extension, i.e. the number of occurrences of the word in the digital universe (or in any world, e.g. six billion human beings, one hundred Gothic cathedrals);
then we have a ratio giving the yield of the concept.

For a given database, and a given global capacity in bits, the base may comprise a large number of elements with few bits each (large extension, small comprehension) or a few words with large records.

Extension and comprehension can be taken as a kind of metric. Extension would be a negative pressure (see below); Google transforms it into a positive pressure and temperature.

5. Paradigm, theory, system

Paradigms and theories are high-level assemblies: a long text, or a series of coordinated words.
Question: how may a theory be used by a being? Expert systems.

A paradigm may be defined as a disciplinary matrix, literally framing research and giving the scientists gathered under it "typical problems and solutions"; it defines the study space and "draws passageways in the set of possible research" (La mesure du social. Informations sociales 1-2/79, CNAF, Paris, 1979).

Also: story, narration, saga, assertion, command.

Assertion + modality
deduction/inference
formal languages

“there is that”

Method, algorithm, description.

6. Core words and generation/factors

When the core of the dictionary is heavy, the other words are deduced from it with short definitions, and conversely (step by step). ??? In which language?

If we admit that the words in the core are not defined (like reserved words in programming languages?), and that the other ones are totally defined using core words, or again defined from each other in an intuitive way, with constant length, then we have:

Circles in the core.

If we admit that concatenation is a sort of definition, we can have only one word in the dictionary, plus the + operator. Then, if we have n words, they can be numbered from 1 to n.

definitions.jpg
Relation between the size of the core and the size of definitions. The larger the core, the shorter the definitions can be.
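The trade-off in the figure can be quantified under a simple assumption (ours, for illustration): if definitions are sequences of core words, a core of size k must satisfy k^L >= n to distinguish a vocabulary of n words with definitions of length L.

```python
def definition_length(core_size, vocabulary):
    """Smallest L with core_size ** L >= vocabulary: definitions
    of L core words suffice to distinguish every word."""
    length, reach = 1, core_size
    while reach < vocabulary:
        length += 1
        reach *= core_size
    return length

vocab = 100_000  # illustrative vocabulary size
for core in (10, 100, 1000):
    print(core, definition_length(core, vocab))
```

As the core grows from 10 to 1000 words, the minimal definition length drops from 5 to 2, which is the relation the figure sketches.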

There is redundancy, etc. Example: from a catalogue of works, make the most diverse, or the most representative, selection.

Topological problems of a dictionary. Since a definition in a dictionary can only be given with other words of the dictionary, there are necessarily loops, unless some words are given as "axiomatic" and left undefined. Graph theory applies here. Indirect (loop) recursion.
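The loop problem can be checked mechanically: treat the dictionary as a directed graph, each word pointing to the words used in its definition, and search for a cycle with a depth-first traversal. A minimal sketch, with a toy dictionary of our own:

```python
def find_cycle(definitions):
    """definitions: word -> list of words used in its definition.
    Depth-first search for a loop (a word that, followed through
    its definitions, leads back to itself). Words absent from the
    mapping are treated as 'axiomatic' (undefined)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {w: WHITE for w in definitions}

    def dfs(word, path):
        color[word] = GREY                      # on the current path
        for used in definitions.get(word, []):
            if color.get(used, WHITE) == GREY:  # back edge: loop found
                return path + [used]
            if color.get(used, WHITE) == WHITE and used in definitions:
                found = dfs(used, path + [used])
                if found:
                    return found
        color[word] = BLACK                     # fully explored
        return None

    for w in definitions:
        if color[w] == WHITE:
            found = dfs(w, [w])
            if found:
                return found
    return None

# A toy dictionary where no word is axiomatic: loops are inevitable.
toy = {"big": ["large"], "large": ["great"], "great": ["big"]}
print(find_cycle(toy))  # -> ['big', 'large', 'great', 'big']
```

As soon as one word per loop is declared axiomatic (removed from the mapping), the search reports no cycle.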

Track to recursion. The best wt: maximal difference. But if there is repetition, other choices based on relations between beings (WFE, well-formed expressions; grammar). Prime numbers.

We must analyze a given being:
- the rules of what it understands (language, WFEs);
- its concrete environment: what fits at a given time, for instance the puzzle piece;
- its comprehension, the "that" and the details inside, E = {x ; x > 1};
- its extension, the list of locations where "that" is instantiated, E = {a, b, c...}.

If we have all the beings (filling all the n.2^n bits), the comprehension is total, and extension adds nothing. If they are well ordered, each being is identical to its number, and the whole space is of no interest.

On the other hand, if the beings have few common parts (in some way, there are far fewer beings than 2^n), comprehension will be but a small part of each being, and free, disposable parts remain.

Inside a being, several types may combine.
Reduction of a being: "division" by all its types; the remainder bears its individual features, local and irreducible.
This local part can come from:
- locality stricto sensu (geographical location, proximity with a high-temperature S, software "localization");
- other abstract factors.

e.g. male gender, more than 45 years old, born in Afghanistan, bread maker, plays the flute, Labour, married, 3 children.

If the type is a very long abstract description (comprehension), a few types may be sufficient, with a short code to call them, though a type may call a long sequence (strong comprehension). In this view, that has no meaning for the system itself, which has no memory of its own past, of its traces; but it can be meaningful for other parts of the digital universe.

The excess bits of a concrete S, relative to the minimum necessary for the n.2^n beings, may be laid out:
- either canonically, with a bit zone giving a number, possibly split into several zones;
- or, on the contrary, with the factors distributed over the whole length of the being.

If we suppose that the system has been built by a succession of generating factors, it will in general be impossible to do the division to get them back (Bayesian analysis, public-key coding).

We normally consider that "natural" beings bear an indefinite quantity of bits. Among them, we can select some bits for identification (extension) and description (comprehension). The extension (in locality) is implicitly given by the position of the being in the universe.

For a given population of beings, find all the factors, or a sufficient generative set of factors.
Let us take a frequent case: far more concrete bits than strictly needed, for instance 500 images of 1 MB on a CD of 500 MB. Then pure differentiation takes only 9 bits; all the rest is "local".
We can then analyze these 500 images, with equivalence classes giving generators.

Thesis. To have them all, it suffices that the product of the cardinals of the classes be greater than the cardinal of the whole. Again, 9 classes of 2 elements are sufficient.
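The arithmetic behind the 9-bit figure, as a quick check:

```python
import math

# 500 items can be told apart with ceil(log2(500)) = 9 bits of
# pure differentiation; equivalently, 9 binary equivalence classes
# (classes of 2 elements) suffice, since the product of the class
# cardinals, 2**9 = 512, exceeds 500.
n_items = 500
bits_needed = math.ceil(math.log2(n_items))
product = 2 ** bits_needed
print(bits_needed, product)  # -> 9 512
```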

Thesis. More generally, we maximize the analytical effect (root) with generators of the same length and equiprobable.
Once the factor family is given, we can compute the volume of all the beings if all of them were generated. Normally it will be much larger than the actual concrete, or than the available space.
Seen this way, the concrete is much smaller than the abstract: imagination goes beyond reality.

Are “natural languages” so natural? Progressive emergence. The host community.

7. Arts and languages

See literature, programming.
- Laban notation for choreography.
- Siteswap notation for juggling (mathematical notation).
