Enhancing Grammar for Bulgarian Verb Forms: A Flexible Approach

Performance of the Grammar

The grammar performs well in recognizing complex tense forms that combine auxiliaries with full-content verbs. A comprehensive grammar for compound verb forms, however, must also learn from rare syntagmatic patterns, because these are what extend its paradigmatic knowledge.

Handling Discontinuity

Discontinuous compound verb forms, in which adverbial or nominal inserts separate the parts of the verb complex, pose a challenge for the current grammar. To address this, rules can first identify shorter segments within the verb complex (auxiliary chunks and main verb chunks), which are then related to each other. This approach, moving from syntagmatic realization to paradigmatic knowledge, is consistent with methodologies used for other languages. Discontinuity within main verb forms themselves, such as passive participles separated from the rest of the form by adverbials, also needs to be explored to refine the grammar.
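The two-step idea above (chunk the shorter segments first, then relate them across any inserts) can be illustrated with a minimal Python sketch. It is not the grammar described here: the tag names (AUX, V, ADV, N), the function names, and the max_gap limit are all assumptions made for illustration, and the sketch only handles the order auxiliary before main verb.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word form, coarse tag); the tag set is hypothetical

# Assumed insert categories: adverbial and nominal material.
INSERT_TAGS = {"ADV", "N"}


def chunk_runs(tokens: List[Token], tag: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of maximal runs carrying `tag`."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, (_, t) in enumerate(tokens):
        if t == tag and start is None:
            start = i
        elif t != tag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


def link_chunks(tokens: List[Token], max_gap: int = 2) -> List[Dict[str, object]]:
    """Pair each auxiliary chunk with the nearest following main-verb chunk,
    provided that only insert tokens (at most `max_gap`) stand between them."""
    aux_spans = chunk_runs(tokens, "AUX")
    verb_spans = chunk_runs(tokens, "V")
    results: List[Dict[str, object]] = []
    for a_start, a_end in aux_spans:
        for v_start, v_end in verb_spans:
            if v_start < a_end:
                continue  # verb chunk precedes the auxiliary; not handled here
            gap = tokens[a_end:v_start]
            if len(gap) <= max_gap and all(t in INSERT_TAGS for _, t in gap):
                results.append({
                    "aux": [w for w, _ in tokens[a_start:a_end]],
                    "inserts": [w for w, _ in gap],
                    "main": [w for w, _ in tokens[v_start:v_end]],
                    "discontinuous": bool(gap),
                })
            break  # only the nearest following verb chunk is considered
    return results


# "Той е вече прочел книгата" ("He has already read the book"), hand-tagged.
sentence = [("Той", "PRON"), ("е", "AUX"), ("вече", "ADV"),
            ("прочел", "V"), ("книгата", "N")]
complexes = link_chunks(sentence)
```

For the sample sentence, the sketch links the auxiliary "е" to the participle "прочел" and records "вече" as an adverbial insert, so the form is marked as discontinuous. Keeping the chunking and linking steps separate means the set of permitted inserts can be changed without touching the chunk rules.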

Leveraging Treebank Construction

In the treebank-building process, a core set of sentences taken from Bulgarian grammar books, each manually assigned a syntactic structure, becomes a valuable resource. It aids in classifying the possible inserts in compound verb forms: forms with or without small words, and forms whose parts are adjacent or discontinuous. This classification in turn supplies the grammar with rules for recognizing diverse compound verb forms.
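As a rough illustration of such a classification, the sketch below sorts observed verb forms by the material found between their parts and tallies each class. The class labels, the tag names (ADV, N), and the function names are hypothetical, and the observations are invented toy input rather than treebank data.

```python
from collections import Counter
from typing import Iterable, List


def classify_insert(insert_tags: List[str]) -> str:
    """Label a compound verb form by the tags of the tokens between its parts.
    An empty list means the auxiliary and main verb are adjacent."""
    if not insert_tags:
        return "adjacent"
    kinds = set(insert_tags)
    if kinds <= {"ADV"}:
        return "adverbial insert"
    if kinds <= {"N"}:
        return "nominal insert"
    return "mixed insert"


def tally_inserts(observations: Iterable[List[str]]) -> Counter:
    """Count how often each insert class occurs in a set of observed forms."""
    return Counter(classify_insert(tags) for tags in observations)


# Toy input: one list of insert tags per observed compound verb form.
observed = [[], ["ADV"], ["N"], ["ADV", "N"], []]
counts = tally_inserts(observed)
```

With this toy input, counts records two adjacent forms and one each of the adverbial, nominal, and mixed classes. A tally of this kind is one simple way to see which insert types are frequent enough to deserve their own rules.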

Future Development

The grammar development process described here runs within an integrated XML framework, which makes it flexible to augment datasets with linguistic information tailored to a specific application or research question. Further testing is necessary, along with a detailed analysis of how frequently the different construction types occur. This work should be coordinated with the grammars the team has developed for other linguistic entities. Identifying the idiosyncratic cases in which the grammar falls short will show where it needs refinement. A significant direction for future development is seamless integration between shallow parsing and more advanced linguistic analysis, which would streamline the treebank construction process.


