SciELO journals

The Corpus of Portuguese from Academic Journals

dataset
posted on 2021-03-24, 08:31 authored by Tanara Zingano Kuhn, José Pedro Ferreira

ABSTRACT The present study describes the challenges faced and the solutions found in compiling the Corpus de Português Escrito em Periódicos (CoPEP), which contains approximately 40 million words, is balanced in word count between the Brazilian and European varieties of Portuguese, and covers six broad areas of knowledge. First, we present the context in which CoPEP was created, namely the making of an online dictionary of Portuguese for university students, for which CoPEP served as the primary source of linguistic evidence. The characteristics of this lexicographic project therefore informed the design criteria for CoPEP and the decisions that followed. Next, we describe the data acquisition methodology, with a special focus on the challenges faced and the solutions found. We conclude with a description of the final compilation phase, which involved procedures for balancing the corpus.


    DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada
