Characteristic samples

From Wikipedia, the free encyclopedia
Characteristic samples are a concept in the field of grammatical inference, related to passive learning. In passive learning, an inference algorithm I is given a set S of pairs of strings and labels, and returns a representation R that is consistent with S. Characteristic samples address the scenario in which the goal is not only to find a representation consistent with S, but to find one that recognizes a specific target language.

A characteristic sample of language L is a set of pairs of the form (s,l(s)) where:

  1. l(s) = 1 if and only if s ∈ L
  2. l(s) = −1 if and only if s ∉ L

Given the characteristic sample S, I's output on it is a representation R, e.g. an automaton, that recognizes L.

Formal Definition

The Learning Paradigm associated with Characteristic Samples

There are three entities in the learning paradigm connected to characteristic samples: the adversary, the teacher and the inference algorithm.

Given a class of languages and a class of representations for those languages, the paradigm goes as follows:

  • The adversary A selects a language L and reports it to the teacher
  • The teacher T then computes a set of strings and labels them correctly according to L, trying to make sure that the inference algorithm will compute L
  • The adversary can add correctly labeled words to the set in order to confuse the inference algorithm
  • The inference algorithm I gets the sample and computes a representation R consistent with the sample.

The goal is that when the inference algorithm receives a characteristic sample for a language L, or a sample that subsumes a characteristic sample for L, it will return a representation that recognizes exactly the language L.
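
The three roles can be sketched in Python on a deliberately tiny example. Everything here is an assumption made for illustration and does not come from the article: the hypothetical toy class contains the languages L_k = {a^n : n ≥ k}, each represented by the integer k, and the names `teacher`, `learner` and `run_paradigm` are invented.

```python
def in_language(k, word):
    """Membership in the hypothetical toy language L_k = {a^n : n >= k}."""
    return len(word) >= k


def teacher(k):
    """Teacher: a small, correctly labeled sample for L_k."""
    sample = {("a" * k, 1)}
    if k > 0:
        sample.add(("a" * (k - 1), -1))
    return sample


def learner(sample):
    """Inference algorithm: the length of the shortest positive example."""
    return min(len(word) for word, lab in sample if lab == 1)


def run_paradigm(k, extra_words):
    """The adversary picks k and later adds correctly labeled words."""
    sample = teacher(k)
    sample |= {(w, 1 if in_language(k, w) else -1) for w in extra_words}
    return learner(sample)


result = run_paradigm(3, ["", "aaaaaaa", "aaaa"])
```

In this toy setting the learner recovers k no matter which correctly labeled words the adversary adds, because no positive example can be shorter than a^k.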

Sample

A sample S is a set of pairs of the form (s, l(s)) such that l(s) ∈ {1, −1}.

Sample consistent with a language

We say that a sample S is consistent with language L if for every pair (s,l(s)) in S:

  1. l(s) = 1 if and only if s ∈ L
  2. l(s) = −1 if and only if s ∉ L
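
As a minimal Python sketch of this definition, a sample can be stored as a set of (string, label) pairs and checked against a membership predicate. The function names and the example language (strings with an even number of a's) are assumptions chosen for illustration.

```python
def label(s, in_language):
    """Return 1 if s is in the language and -1 otherwise."""
    return 1 if in_language(s) else -1


def is_consistent(sample, in_language):
    """True if every pair (s, l(s)) in the sample carries the correct label."""
    return all(lab == label(s, in_language) for s, lab in sample)


# Hypothetical target language: strings with an even number of a's.
def even_a(s):
    return s.count("a") % 2 == 0


sample = {("", 1), ("a", -1), ("ab", -1), ("aba", 1)}
consistent = is_consistent(sample, even_a)
```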

Characteristic sample

Given an inference algorithm I and a language L, a sample S that is consistent with L is called a characteristic sample of L for I if:

  • I's output on S is a representation R that recognizes L.
  • For every sample D that is consistent with L and also fulfils S ⊆ D, I's output on D is a representation R that recognizes L.
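
The second condition quantifies over every consistent superset D of S, which cannot be enumerated in general. The Python sketch below only tries the supersets obtained by adding words from a finite pool, so it can refute the property but never prove it. The toy class L_k = {a^n : n ≥ k}, the learner and all names are assumptions made for illustration.

```python
from itertools import combinations


def label(word, in_language):
    return 1 if in_language(word) else -1


def looks_characteristic(sample, in_language, learner, recognizes, pool):
    """Run the learner on S and on every D = S ∪ (labeled subset of pool)."""
    for size in range(len(pool) + 1):
        for extra in combinations(pool, size):
            D = set(sample) | {(w, label(w, in_language)) for w in extra}
            if not recognizes(learner(D)):
                return False  # found a consistent superset that misleads I
    return True


# Hypothetical toy setting: target L_3, representations are integers k.
def in_L3(word):
    return len(word) >= 3


def learner(sample):
    return min(len(w) for w, lab in sample if lab == 1)


def recognizes_L3(k):
    return k == 3


pool = ["", "a", "aaaa", "aaaaa"]
good = looks_characteristic({("aaa", 1)}, in_L3, learner, recognizes_L3, pool)
bad = looks_characteristic({("aaaa", 1)}, in_L3, learner, recognizes_L3, pool)
```

Here `{("aaaa", 1)}` is consistent with L_3 but already fails the first condition, since the learner returns 4 on it.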

A class of languages is said to have characteristic samples if every language L in the class has a characteristic sample.

Related theorems

Theorem

If equivalence is undecidable for a class of languages over an alphabet Σ of cardinality greater than 1, then that class does not have characteristic samples.[1]

Proof

Given a class of representations for which equivalence is undecidable, for every polynomial p(x) and every n there exist two representations r1 and r2, of sizes bounded by n, that recognize different languages but cannot be separated by any string of length bounded by p(n). If this were not the case, we could decide whether r1 and r2 are equivalent by simulating their runs on all strings of length bounded by p(n), contradicting the assumption that equivalence is undecidable.
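
The simulation step in this argument can be sketched in Python. DFAs are used here only because they are easy to simulate (their equivalence is in fact decidable); the dictionary encoding, the function names and the two example automata are assumptions made for illustration.

```python
from itertools import product


def accepts(dfa, word):
    """Run a complete DFA, given as a dict, on a word."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]


def separating_string(r1, r2, alphabet, max_len):
    """Return a string of length <= max_len on which r1 and r2 disagree, or None."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if accepts(r1, word) != accepts(r2, word):
                return word
    return None


# Hypothetical example automata over {a, b}.
even_a = {"start": 0, "accepting": {0},
          "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}}
all_strings = {"start": 0, "accepting": {0},
               "delta": {(0, "a"): 0, (0, "b"): 0}}

witness = separating_string(even_a, all_strings, "ab", 3)
```

The search enumerates exponentially many strings in `max_len`, which is acceptable for the decidability argument but not an efficient equivalence test.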

Theorem

If S1 is a characteristic sample for L1 and is also consistent with L2, then every characteristic sample of L2 is inconsistent with L1.[1]

Proof

Given a class that has characteristic samples, let R1 and R2 be representations that recognize L1 and L2 respectively. Suppose that S1 is a characteristic sample for L1 that is also consistent with L2, and assume for contradiction that there exists a characteristic sample S2 for L2 that is consistent with L1. By the definition of a characteristic sample, the inference algorithm I must return a representation that recognizes the language whenever it is given a consistent sample that subsumes the characteristic sample. The sample S1 ∪ S2 is consistent with both languages and subsumes both S1 and S2, so the output of I on it would have to recognize both L1 and L2, which is a contradiction when L1 ≠ L2.

Theorem

If a class is polynomially learnable by example-based queries, it is learnable with characteristic samples.[2]

Polynomially characterizable classes

Regular languages

The proof that DFAs are learnable using characteristic samples relies on the fact that every regular language has a finite number of equivalence classes with respect to the right congruence relation ∼_L (where x ∼_L y for x, y ∈ Σ* if and only if ∀z ∈ Σ*: xz ∈ L ⟺ yz ∈ L). Note that if x and y are not congruent with respect to ∼_L, there exists a string z such that xz ∈ L but yz ∉ L, or vice versa; this string is called a separating suffix.[3]
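
A separating suffix for two strings can be found with a breadth-first search over pairs of states of a DFA for L. The Python sketch below is an illustration under assumptions: the dictionary encoding of the DFA, the function names and the example language (strings over {a, b} ending in ab) are not taken from the article.

```python
from collections import deque


def run(dfa, word):
    """Return the state a complete DFA reaches after reading word."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state


def separating_suffix(dfa, alphabet, x, y):
    """Shortest z with exactly one of xz, yz in L, or None if x and y are congruent."""
    start = (run(dfa, x), run(dfa, y))
    queue, seen = deque([(start, "")]), {start}
    while queue:
        (p, q), z = queue.popleft()
        if (p in dfa["accepting"]) != (q in dfa["accepting"]):
            return z
        for symbol in alphabet:
            nxt = (dfa["delta"][(p, symbol)], dfa["delta"][(q, symbol)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, z + symbol))
    return None


# Hypothetical example: strings over {a, b} that end in "ab".
ends_ab = {"start": 0, "accepting": {2},
           "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
                     (1, "b"): 2, (2, "a"): 1, (2, "b"): 0}}

z = separating_suffix(ends_ab, "ab", "", "a")
```

For this example the suffix b separates the empty string from a, since b is not in the language while ab is.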

Constructing a characteristic sample

The teacher constructs a characteristic sample for a language L as follows. First, by running a depth-first search on a deterministic automaton A recognizing L (taken to be minimal, so that distinct states are not congruent), starting from its initial state, we obtain a prefix-closed set of words W, one word reaching each state, ordered in shortlex order. From the fact above, we know that for every two distinct states of the automaton there exists a separating suffix that separates every two strings on which the runs of A end in the respective states. We refer to the set of separating suffixes as S. The labeled set (sample) of words the teacher gives the adversary is {(w, l(w)) | w ∈ W·S ∪ W·Σ·S}, where l(w) is the correct label of w (whether it is in L or not). We may assume that ϵ ∈ S.
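
The teacher's construction can be sketched in Python as follows. Two choices here are assumptions of this sketch and not statements from the article: the automaton is taken to be a minimal complete DFA encoded as a dictionary, and the access words are collected by breadth-first search so that each one is the shortlex-smallest word reaching its state. All function names are invented.

```python
from collections import deque
from itertools import combinations


def accepts(dfa, word):
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]


def access_words(dfa, alphabet):
    """Map each reachable state to the shortlex-smallest word reaching it."""
    access, queue = {dfa["start"]: ""}, deque([dfa["start"]])
    while queue:
        state = queue.popleft()
        for symbol in sorted(alphabet):
            target = dfa["delta"][(state, symbol)]
            if target not in access:
                access[target] = access[state] + symbol
                queue.append(target)
    return access


def state_separator(dfa, alphabet, p, q):
    """Shortest suffix leading exactly one of the states p, q to acceptance."""
    queue, seen = deque([((p, q), "")]), {(p, q)}
    while queue:
        (a, b), z = queue.popleft()
        if (a in dfa["accepting"]) != (b in dfa["accepting"]):
            return z
        for symbol in sorted(alphabet):
            nxt = (dfa["delta"][(a, symbol)], dfa["delta"][(b, symbol)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, z + symbol))
    return None  # equivalent states; does not happen in a minimal DFA


def characteristic_sample(dfa, alphabet):
    access = access_words(dfa, alphabet)
    W = list(access.values())
    S = {""}  # the empty suffix is always included
    for p, q in combinations(access, 2):
        z = state_separator(dfa, alphabet, p, q)
        if z is not None:
            S.add(z)
    words = {w + s for w in W for s in S}
    words |= {w + a + s for w in W for a in alphabet for s in S}
    return {(w, 1 if accepts(dfa, w) else -1) for w in words}


# Hypothetical example: strings over {a, b} that end in "ab".
ends_ab = {"start": 0, "accepting": {2},
           "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
                     (1, "b"): 2, (2, "a"): 1, (2, "b"): 0}}
sample = characteristic_sample(ends_ab, "ab")
```

For the example automaton the access words are ϵ, a and ab, and the separating suffixes are ϵ and b.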

Constructing a deterministic automaton

Given the sample W received from the adversary, the inference algorithm I starts constructing the automaton by defining P = prefix(W) and S = suffix(W), the sets of prefixes and suffixes of the words in W respectively. The algorithm then constructs a matrix M whose rows are indexed by the elements of P and whose columns are indexed by the elements of S, both in shortlex order. Next, the cell for prefix p_i and suffix s_j is filled as follows:

  1. If p_i s_j ∈ W, then M_ij = l(p_i s_j)
  2. Otherwise, M_ij = 0

We say that rows i and t are distinguishable if there exists an index j such that M_ij = −1 × M_tj with M_ij ≠ 0, that is, the two cells hold opposite labels. The next stage of the inference algorithm is to construct the set Q of distinguishable rows of M, by initializing Q with ϵ and iterating over the rows of M from the first row downwards, doing the following for row r_i:

  1. If r_i is distinguishable from all elements in Q, add it to Q
  2. Otherwise, skip it and move on to the next row

From the way the teacher constructed the sample it passed to the adversary, we know that for every s ∈ Q and every σ ∈ Σ, the row sσ exists in M, and from the construction of Q, there exists a row s′ ∈ Q such that s′ and sσ are indistinguishable. The output automaton is defined as follows:

  • The set of states is Q.
  • The initial state is the state corresponding to row ϵ ∈ Q.
  • The set of accepting states is {s ∈ Q | l(s) = 1}.
  • The transition function is defined by δ(s, σ) = s′, where s′ is the element of Q that is indistinguishable from sσ.
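
The inference procedure described above can be sketched in Python. The function names, the dictionary encoding of the output automaton and the example sample are assumptions made for illustration. The sample is one that the teacher's construction could produce for the language of strings over {a, b} ending in ab, and the sketch assumes that its input subsumes such a sample.

```python
def shortlex(words):
    return sorted(words, key=lambda w: (len(w), w))


def learn_dfa(sample, alphabet):
    labels = dict(sample)
    P = shortlex({w[:i] for w in labels for i in range(len(w) + 1)})
    S = shortlex({w[i:] for w in labels for i in range(len(w) + 1)})
    # M[p][j] is the label of p + S[j] if that word is in the sample, else 0.
    M = {p: [labels.get(p + s, 0) for s in S] for p in P}

    def distinguishable(p, q):
        # Some column holds opposite nonzero labels for the two rows.
        return any(x * y == -1 for x, y in zip(M[p], M[q]))

    Q = [""]
    for p in P:
        if all(distinguishable(p, q) for q in Q):
            Q.append(p)

    delta = {}
    for q in Q:
        for a in alphabet:
            # Assumes q + a is a row of M and matches some element of Q,
            # which holds when the input subsumes a characteristic sample.
            delta[(q, a)] = next(t for t in Q if not distinguishable(q + a, t))
    accepting = {q for q in Q if labels.get(q) == 1}
    return {"start": "", "accepting": accepting, "delta": delta}


def accepts(dfa, word):
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]


positives = ["ab", "aab", "abab"]
negatives = ["", "a", "b", "aa", "bb", "aba", "abb", "abbb"]
sample = {(w, 1) for w in positives} | {(w, -1) for w in negatives}
dfa = learn_dfa(sample, "ab")
```

On this sample the algorithm keeps the rows ϵ, a and ab as states, and the resulting automaton accepts exactly the strings ending in ab.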

Other polynomially characterizable classes

  • Class of languages recognizable by multiplicity automata[4]
  • Class of languages recognizable by tree automata[5]
  • Class of languages recognizable by multiplicity tree automata[6]
  • Class of languages recognizable by Fully-Ordered Lattice Automata[7]
  • Class of languages recognizable by Visibly One-Counter Automata[8]
  • Class of fully informative omega regular languages[9][10]

Non-polynomially characterizable classes

Some classes do not have polynomial-size characteristic samples. For example, it follows from the first theorem in the Related theorems section that the following classes of languages do not have polynomial-size characteristic samples:

  • ℂ𝔽𝔾 - The class of context-free grammar languages over Σ of cardinality larger than 1[1]
  • 𝕃𝕀ℕ𝔾 - The class of linear grammar languages over Σ of cardinality larger than 1[1]
  • 𝕊𝔻𝔾 - The class of simple deterministic grammar languages[1]
  • ℕ𝔽𝔸 - The class of nondeterministic finite automata languages[1]

Relations to other learning paradigms

Classes of representations that have characteristic samples relate to the following learning paradigms:

Class of semi-poly teachable languages

A representation class is semi-poly T/L teachable if there exist three polynomials p, q, r, a teacher T and an inference algorithm I such that for any adversary A the following holds:[2]

  • A selects a representation R of size n from the class
  • T computes a sample that is consistent with the language that R recognizes, of size bounded by p(n), with the strings in the sample of length bounded by q(n)
  • A adds correctly labeled strings to the sample computed by T, making the new sample of size m
  • I then computes a representation equivalent to R in time bounded by r(m)

A class of languages for which there exists a polynomial-time algorithm that, given a sample, returns a representation consistent with the sample is called consistency easy.
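
As an illustration that is not taken from the article, a prefix tree acceptor is one standard way to build a deterministic automaton consistent with a given sample in time polynomial in the total length of the sample. The Python sketch below assumes that no word appears with both labels, and all names in it are invented.

```python
def prefix_tree_acceptor(sample):
    """Build a partial DFA whose states are the prefixes of the sample words."""
    delta, accepting = {}, set()
    for word, lab in sample:
        for i in range(len(word)):
            delta[(word[:i], word[i])] = word[:i + 1]
        if lab == 1:
            accepting.add(word)
    return {"start": "", "accepting": accepting, "delta": delta}


def accepts(pta, word):
    state = pta["start"]
    for symbol in word:
        state = pta["delta"].get((state, symbol))
        if state is None:
            return False  # missing transition: the word is rejected
    return state in pta["accepting"]


sample = {("ab", 1), ("aab", 1), ("", -1), ("a", -1), ("abb", -1)}
pta = prefix_tree_acceptor(sample)
consistent = all(accepts(pta, w) == (lab == 1) for w, lab in sample)
```

The acceptor is consistent with the sample but generalizes to nothing outside it, which is why consistency alone does not identify the target language.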

Polynomially characterizable languages

Given a representation class and a set of identification algorithms for it, the class is polynomially characterizable for that set of algorithms if every representation R in the class has a characteristic sample S, of size polynomial in the size of R, such that the output of every algorithm I in the set on S is R.

Relations between the paradigms

Theorem

A consistency-easy class has characteristic samples if and only if it is semi-poly T/L teachable.[1]

Proof

Assuming the class has characteristic samples, for every representation R its characteristic sample S satisfies the conditions on the sample computed by the teacher, and by the definition of a characteristic sample the output of I on every sample S′ such that S ⊆ S′ is equivalent to R.

Assuming that the class is semi-poly T/L teachable, for every representation R the sample S computed by the teacher is a characteristic sample for R.

Theorem

If a class has characteristic samples, then it is polynomially characterizable.[1]

Proof

Assume for contradiction that the class is not polynomially characterizable; then there are two non-equivalent representations R1 and R2 with characteristic samples S1 and S2 respectively. From the definition of characteristic samples, any inference algorithm I needs to infer from the sample S1 ∪ S2 a representation compatible with both R1 and R2, which is a contradiction.

References

  1. [citation not recoverable from the source]
  2. [citation not recoverable from the source]
  3. [citation not recoverable from the source]
  4. [citation not recoverable from the source]
  5. [citation not recoverable from the source]
  6. [citation not recoverable from the source]
  7. [citation not recoverable from the source]
  8. [citation not recoverable from the source]
  9. [citation not recoverable from the source]
  10. [citation not recoverable from the source]