Columbia University

Technology Ventures

Database and Annotation Tool for Computational Modeling of Arabic Nominal Gender, Number and Rationality Morphosyntax

Technology #cu14137

Nizar Habash
Managed By
Richard Nguyen

This technology is a linguistic database of functional gender, functional number, and rationality for Arabic nominals. These features are important for modeling Arabic morphosyntactic agreement. The technology also includes a tool for annotating the Linguistic Data Consortium (LDC) Arabic treebanks with this morphosyntactic information. Arabic has complex agreement patterns and irregular morphology, yet current LDC Arabic treebanks represent nominal gender and number by shallow (non-functional) forms and do not include nominal rationality. The database and annotation tool can improve computational modeling of Arabic for natural language processing and linguistics research applications.

The annotation tool requires that researchers obtain Arabic corpora from the LDC.
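
The difference between shallow (form-based) and functional features described above can be illustrated with a minimal Python sketch. It is not the database's actual schema or the annotation tool's logic: the `Nominal` record, its field names, the transliterated entries, and the `expected_adjective_agreement` helper are all hypothetical, and the rule encodes only the familiar Modern Standard Arabic pattern in which adjectives modifying irrational (non-human) plural nouns appear in the feminine singular.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nominal:
    """Hypothetical lexicon record; field names are illustrative only."""
    lemma: str         # simplified transliteration
    form_gender: str   # "M" or "F", as suggested by the surface suffix
    form_number: str   # "S", "D", or "P", as suggested by the surface suffix
    func_gender: str   # gender the word actually functions with
    func_number: str   # number the word actually functions with
    rational: bool     # True for human referents


def expected_adjective_agreement(noun: Nominal) -> tuple[str, str]:
    """Return the (gender, number) expected on an attributive adjective.

    Simplified assumption: irrational plurals trigger feminine singular
    agreement; every other noun agrees by its functional features.
    """
    if noun.func_number == "P" and not noun.rational:
        return ("F", "S")
    return (noun.func_gender, noun.func_number)


# "kutub" (books): a broken plural with no plural suffix, so its form looks
# masculine singular, but it functions as a masculine plural, irrational noun.
books = Nominal("kutub", "M", "S", "M", "P", False)

# "khalifa" (caliph): carries the feminine suffix in form but functions as a
# masculine singular, rational noun.
caliph = Nominal("khalifa", "F", "S", "M", "S", True)

for noun in (books, caliph):
    print(noun.lemma, expected_adjective_agreement(noun))
```

In both entries the form-based features alone would predict the wrong adjective agreement, which is the gap that functional gender, functional number, and rationality annotations are meant to close.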

Lead Inventor:

Nizar Habash, Ph.D., Sarah Alkuhlani


Applications:

  • Annotate Arabic corpora computationally with correct morphosyntactic agreement.
  • Build computational models of Arabic morphology and syntax.
  • Engineer Arabic language processing systems.
  • Study Arabic linguistic phenomena.
  • Translate Arabic with correct nominal gender, number, and rationality agreement.


Advantages:

  • Annotates LDC treebanks with missing information regarding nominal gender, number, and rationality agreement.
  • Improves computational modeling of Arabic morphosyntax for natural language processing applications.

Tech Ventures Reference: IR CU14137

Related Publications: