The project's linguistic corpus draws its material from the Arabic-language sources listed in the bibliography of the first time period. The corpus contains a large quantity of texts that reflect the state of the Arabic language in its first seven centuries, across its varied environments and its cultural, scientific and civilizational centers, and that trace the evolution of its lexical and phrasal semantics. The project was launched according to the following phased plan:

1. Collecting documents and texts from the chosen historical period that are available on websites.

2. Comparing the corpus sources to select the best versions available in electronic format.

3. Determining the best title on the basis of printed and edited editions.

4. Compiling a list of documents not available in digital format.

5. Digitizing the documents not available in digital format.

6. Entering the contents of the documents listed in the bibliography.

7. Unifying the textual form of all the documents.

8. Matching the titles of the corpus documents with the titles in the bibliography.

9. Designing a database for the corpus.

10. Developing a search interface for the corpus database.
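
The plan does not say how the textual form of the documents was unified (step 7). As a rough illustration only, a unification pass over Arabic text might look like the sketch below. The function name `unify_text`, the `strip_diacritics` option, and the specific rules (Unicode NFC normalization, removal of invisible formatting characters and of the tatweel elongation character, whitespace collapsing) are assumptions made for this example, not the project's documented procedure.

```python
import re
import unicodedata

# Hypothetical normalization rules; the project's actual rules are not documented here.
TATWEEL = "\u0640"                                   # Arabic elongation character
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")     # fathatan..sukun, superscript alef
INVISIBLE = re.compile("[\u200b\u200e\u200f\ufeff]") # zero-width space, LRM, RLM, BOM
WHITESPACE = re.compile(r"\s+")


def unify_text(text: str, strip_diacritics: bool = False) -> str:
    """Return text in one consistent form (illustrative sketch only)."""
    # Use a single Unicode normalization form so equal strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Drop invisible formatting characters often left behind by web sources.
    text = INVISIBLE.sub("", text)
    # Remove tatweel, which is purely typographic.
    text = text.replace(TATWEEL, "")
    # Diacritics are kept by default; stripping them is optional.
    if strip_diacritics:
        text = DIACRITICS.sub("", text)
    # Collapse runs of whitespace (including no-break spaces) to one space.
    return WHITESPACE.sub(" ", text).strip()
```

Diacritics are preserved by default in this sketch because vocalization can carry lexical information; whether to strip them would depend on how the search interface is meant to match words.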

The corpus underwent a number of automated and manual reviews in order to minimize spelling, typographical, and other errors. Members of the Doha Historical Dictionary's academic board took part in these reviews. The Executive Committee is working to make the corpus searchable once the technical and software requirements are in place.
