FETA Corpus: French Eye-TrAcking Corpus

This page is regularly updated as the documentation evolves 😊

The FETA corpus is available; its documentation is still being completed, and further details, examples, and metadata descriptions will be added over time. If you are interested in the dataset or in specific parts of it, please contact me by email. Depending on the type of data you need, some resources may already be available and can be shared on request.

FETA is a French eye-tracking corpus designed to study reading behaviour across text types, text complexity, reader expertise, and text simplification.

The corpus contains word-level eye-tracking data collected from native French readers while they read texts from different domains and versions, including original and manually simplified texts.


The dataset includes:

The data were collected in an eye-tracking experiment in which participants read texts presented screen by screen. Eye movements were recorded and exported at the word level, with each word defined as an area of interest (AOI).


Corpus Overview

Category            Description
Language            French
Data type           Eye-tracking reading data
Annotation level    Word-level AOI
Text domains        General, medical, clinical
Text versions       Original, simplified
Participant groups  General readers, speech and language therapy students
File format         TSV
Eye tracker         Tobii Pro Spectrum
Sampling rate       600 Hz

Text Types

The corpus contains three main text domains:

Text type  Description
general    General-domain texts
medical    Medical-domain texts
clinical   Clinical case texts

Each original text has a corresponding manually simplified version:

Version     Description
original    Original version of the text
simplified  Manually simplified version of the text

The simplified versions were manually produced following plain-language and text simplification principles.

Participant Groups

The corpus includes two participant groups:

Group label     Description
General         Participants without medical or speech-language therapy training
SpeechStudents  Speech and language therapy students

Participants were native French readers with normal or corrected-to-normal vision.


Dataset Structure

The dataset is organized by experimental set, subset, and participant group.

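Folder names indicate the experimental set (SET_1 or SET_2), the subset (A or B), and the participant group (general or orthophoniste). "Orthophoniste" is French for speech and language therapist; folders with this suffix hold the data of the speech and language therapy students.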

Example structure:

FETA/
├── SET_1_A_general/
├── SET_1_A_orthophoniste/
├── SET_1_B_general/
├── SET_1_B_orthophoniste/
├── SET_2_A_general/
├── SET_2_A_orthophoniste/
├── SET_2_B_general/
└── SET_2_B_orthophoniste/
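The naming scheme above can be decoded programmatically. The following is a minimal sketch, not part of the corpus tooling: the names `parse_folder_name`, `FolderInfo`, and `GROUP_LABELS` are hypothetical, and the mapping from folder suffix to group label is an assumption inferred from the Participant Groups section rather than something the documentation states.

```python
import re
from typing import NamedTuple

# Pattern inferred from the eight folder names listed above,
# e.g. "SET_1_A_general" or "SET_2_B_orthophoniste".
FOLDER_RE = re.compile(
    r"^SET_(?P<set>[12])_(?P<subset>[AB])_(?P<group>general|orthophoniste)$"
)

# Assumption: folder suffixes correspond to the documented group labels.
GROUP_LABELS = {"general": "General", "orthophoniste": "SpeechStudents"}


class FolderInfo(NamedTuple):
    experimental_set: int  # 1 or 2
    subset: str            # "A" or "B"
    group: str             # suffix as it appears in the folder name
    group_label: str       # assumed matching label from the group table


def parse_folder_name(name: str) -> FolderInfo:
    """Split a FETA folder name into set, subset, and participant group."""
    match = FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected folder name: {name!r}")
    group = match["group"]
    return FolderInfo(int(match["set"]), match["subset"], group, GROUP_LABELS[group])


if __name__ == "__main__":
    print(parse_folder_name("SET_2_B_orthophoniste"))
```

Raising an error on unexpected names, rather than skipping them silently, makes it easier to notice if the archive layout differs from the example structure shown here.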

Download

The FETA corpus is available as a compressed archive.

The archive contains the cleaned TSV files.
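Once the archive is extracted, the TSV files can be inspected with the Python standard library alone. The sketch below is illustrative only and makes several assumptions: the files use the `.tsv` extension, are UTF-8 encoded, and start with a header row. The helper names `iter_tsv_files` and `read_tsv` are hypothetical. Column names are deliberately not assumed, since they are not documented on this page; the code simply reports whatever header each file contains.

```python
import csv
from pathlib import Path


def iter_tsv_files(root):
    """Yield every .tsv file below root, in sorted order."""
    yield from sorted(Path(root).rglob("*.tsv"))


def read_tsv(path):
    """Return (column_names, rows); every value is kept as a string."""
    # Assumption: UTF-8 encoding and a header row in the first line.
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters as ordinary text, so a word
        # that is itself a quotation mark cannot break the parsing.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = list(reader)
        return list(reader.fieldnames or []), rows
```

For example, looping over `iter_tsv_files("FETA")` and calling `read_tsv` on each path gives the header and the number of rows per file, which is a quick way to check which measures are present before any analysis.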

Citation

If you use the FETA corpus, please cite the following papers.

For more information about the participant groups and the comparison between general readers and speech and language therapy students, please cite:

@inproceedings{ivchenko-etal-2026-comparing,
  title = {Comparing Reading Behavior across Reader Expertise and Text Complexity: Insights from the French Eye-Tracking Corpus (FETA)},
  author = {Ivchenko, Oksana and Grabar, Natalia},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
  month = {May},
  year = {2026},
  pages = {6144--6154},
  address = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
  doi = {10.63317/2xr5zj2u6h7p}
}

For more information about the corpus construction and experimental design, please cite:

@inproceedings{ivchenko-grabar-2025-french,
    title = "A {F}rench Eye-Tracking Corpus of Original and Simplified Medical, Clinical, and General Texts - {FETA}",
    author = "Ivchenko, Oksana  and
      Grabar, Natalia",
    editor = "Acarturk, Cengiz  and
      Nasir, Jamal  and
      Can, Burcu  and
      Coltekin, Cagr{\i}",
    booktitle = "Proceedings of the First International Workshop on Gaze Data and Natural Language Processing",
    month = sep,
    year = "2025",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, BULGARIA",
    url = "https://aclanthology.org/2025.gaze4nlp-1.5/",
    pages = "37--43"
}

Contact

For questions or access requests, please contact: oksana.ivchenko.etu@univ-lille.fr