Evaluating the Application of Regular Grammars for Bulgarian Verb Forms
Some Preliminary Evaluation of the Grammar Application
Applying Recognizer 2 correctly delimits and marks up the majority of the longest compound verb form patterns. Experiments with a newspaper corpus show that, in communicatively unmarked written prose, the chunks identified by Recognizer 1 are often adjacent within the longest compound verb form patterns. The paradigmatic representation of word order within the verb complex, which takes into account the “communicative organization of Bulgarian sentences” as outlined in Avgustinova (1997), supports this conclusion.
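The longest-match grouping of adjacent chunks can be illustrated with a minimal sketch. It is not the grammar used in the experiments: the one-letter chunk codes (`A` for an auxiliary chunk, `M` for a main verb chunk, `S` for a small word chunk, `O` for anything else), the pattern `A[AS]*M`, and the function name `find_compound_forms` are all assumptions made for illustration only.

```python
import re

# Hypothetical pattern: an auxiliary chunk, any number of further auxiliary
# or small-word chunks, then a main verb chunk. The real grammar may differ.
COMPOUND = re.compile(r"A[AS]*M")


def find_compound_forms(chunk_tags):
    """Return (start, end) index pairs of adjacent chunk runs matching COMPOUND.

    chunk_tags is a list of one-letter chunk codes (an assumed encoding of
    the first recognizer's output). End indices are exclusive.
    """
    if any(len(tag) != 1 for tag in chunk_tags):
        raise ValueError("each chunk tag must be a single character")
    encoded = "".join(chunk_tags)
    # finditer scans left to right and returns non-overlapping matches, so
    # each compound form is reported once, from its first auxiliary chunk
    # to its main verb chunk.
    return [match.span() for match in COMPOUND.finditer(encoded)]


tags = ["O", "A", "S", "M", "O", "M", "A", "A", "M"]
spans = find_compound_forms(tags)
print(spans)
```

In this toy input the main verb chunk at index 5 stands alone and is left as a separate chunk, while the two runs that start with an auxiliary chunk are grouped. A discontinuous form, with an `O` chunk between the auxiliary and the main verb, would not be matched by this pattern.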
Some figures on the application of the regular grammars:
In a 4292-word text, Recognizer 1 identifies 536 occurrences of main verbs with or without small words, 164 occurrences of auxiliary verbs with or without small words, and 5 occurrences of small word groups recognized as a separate chunk.
Recognizer 2 recognizes 77 compound verb forms as longest matches. Notably, the current grammar does not treat the combination of the copula and primary predicatives as a pattern to be recognized; they remain separate entities. Manual evaluation of the 77 occurrences revealed one erroneous identification, in which the verbal part of a subordinate clause was incorrectly combined into a verb complex with a preceding copula. The grammar also faces difficulties with discontinuous compound verb forms.
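As a rough worked check of these figures, the following sketch computes the share of each chunk type among the chunks from Recognizer 1 and the precision of Recognizer 2 implied by the manual evaluation. The counts come from the text; the variable names are illustrative. Recall cannot be derived from the figures given, so it is not computed.

```python
# Chunk counts reported for the 4292-word text (Recognizer 1).
chunk_counts = {
    "main verb chunks": 536,
    "auxiliary verb chunks": 164,
    "small word chunks": 5,
}
total_chunks = sum(chunk_counts.values())

# Share of each chunk type among all recognized chunks.
shares = {name: count / total_chunks for name, count in chunk_counts.items()}

# Recognizer 2: 77 longest matches, one of them judged erroneous.
compound_forms = 77
erroneous = 1
precision = (compound_forms - erroneous) / compound_forms

print(f"total chunks: {total_chunks}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"precision of compound form recognition: {precision:.1%}")
```

Under these figures, main verb chunks account for roughly three quarters of the 705 chunks, and 76 of the 77 compound forms (about 98.7%) were identified correctly.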
The results for the different segment types remain relatively consistent across text documents; that is, the ratios between their occurrence counts are largely preserved.
The initial phase of grammar development for parsing compound verb forms offers valuable insights for subsequent phases. Several conclusions are drawn, serving as a foundation for further grammar development.