Russian Sign Language Corpus

Texts

The corpus comprises spontaneous speech (monologues and dialogues), texts elicited with stimulus materials (cartoon retellings and picture-based storytelling) and some materials based on linguistic questionnaires.

Some of the texts are studio recordings. However, not all signers had the opportunity (or desire) to be recorded in a studio, so the corpus also includes texts recorded in classrooms or at signers’ homes. Although this affected the quality of the video recordings, in such cases we were able to obtain more natural and diverse narratives.

In some recordings, the addressee was also an RSL signer, to minimize the influence of spoken Russian on the signer’s speech. However, we did not seek to obtain “ideal” texts representing only “pure” sign language (without code-mixing and code-switching) produced solely by “ideal” RSL signers. RSL exists in close contact with spoken Russian, so, depending on the topic and the communicative situation, the same signer may use Sign-Supported Russian not at all or quite extensively. Moreover, language competence varies among signers. It depends greatly on the degree of deafness, the age and conditions of sign language acquisition, the level of competence in spoken Russian, how much RSL and spoken Russian the signer uses in everyday communication, attitudes toward sign language, etc. Therefore, both texts in “pure” sign language produced by “qualified” signers (who acquired RSL from deaf parents and have a large RSL vocabulary) and texts that include a considerable amount of Sign-Supported Russian reflect how RSL actually functions at the current stage of its existence. In addition, texts that include Sign-Supported Russian are important material for studying language contact between RSL and spoken Russian and the mechanisms of code-switching, code-mixing and borrowing in RSL.

Signers

The RSL corpus comprises texts by male and female RSL signers aged 18 to 63 with varying hearing status: deaf, hard-of-hearing and CODA (Child of Deaf Adults, i.e., a hearing child of deaf parents whose first language is RSL).

Most of the signers currently live in Novosibirsk; some of them previously lived for a long time in other regions, mostly in Siberia: the Tomsk, Kemerovo, Sverdlovsk and Krasnoyarsk regions, the Altai region, the Republic of Altai, the Republic of Sakha (Yakutia), the Republic of Buryatia, the Republic of Khakassia and north-eastern Kazakhstan. The remaining signers live in Moscow.

Each RSL signer who participated in this project gave consent to the processing of his or her personal data, including the use of the video recordings.

Notation conventions of metadata in the file name

The file name generally consists of the following components, joined by hyphens: language and place of text recording, type of text, signer’s code, addressee’s code, type of markup.

Language and place of text recording

RSL – Russian Sign Language

N – Novosibirsk

M – Moscow

Type of text

a – linguistic questionnaire-based materials

b – picture-based storytelling

d – conversation

j – interview

n – spontaneous monologue

m – cartoon retelling

When several texts of the same type were recorded from the same signer, they are numbered, for example: n1, b4, m3.

Signer’s code

the letter s followed by the signer’s number, for example: s2

Addressee’s code

d – RSL signer

h – hearing person

Type of markup

std – standard markup

dem – simplified markup

For example, a spontaneous monologue recorded in Novosibirsk from signer s2, addressed to an RSL signer and provided with standard markup, is named RSLN-n-s2-d-std.
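The naming convention is regular enough to be checked mechanically. The following Python sketch splits a file name into its components and expands the codes listed above. It is not part of the corpus software: the names parse_name and FileMeta are hypothetical, the pattern covers only the general form described here (files that deviate from it are rejected), and it assumes that the Moscow code is a Latin M and that any file extension can simply be stripped.

```python
import os
import re
from dataclasses import dataclass
from typing import Optional

# Code tables transcribed from the conventions above.
PLACES = {"N": "Novosibirsk", "M": "Moscow"}  # assumes a Latin "M"
TEXT_TYPES = {
    "a": "linguistic questionnaire-based materials",
    "b": "picture-based storytelling",
    "d": "conversation",
    "j": "interview",
    "n": "spontaneous monologue",
    "m": "cartoon retelling",
}
ADDRESSEES = {"d": "RSL signer", "h": "hearing person"}
MARKUPS = {"std": "standard markup", "dem": "simplified markup"}

# General form only: RSL<place>-<type><optional number>-s<number>-<addressee>-<markup>
NAME_RE = re.compile(
    r"RSL(?P<place>[NM])"
    r"-(?P<type>[abdjnm])(?P<index>\d*)"
    r"-(?P<signer>s\d+)"
    r"-(?P<addressee>[dh])"
    r"-(?P<markup>std|dem)"
)


@dataclass
class FileMeta:  # hypothetical record type, for illustration only
    place: str
    text_type: str
    index: Optional[int]  # None when no number follows the text type
    signer: str
    addressee: str
    markup: str


def parse_name(name: str) -> FileMeta:
    """Parse a corpus file name; an extension, if any, is stripped first (assumption)."""
    stem = os.path.splitext(name)[0]
    m = NAME_RE.fullmatch(stem)
    if m is None:
        raise ValueError(f"not in the general RSL file-name form: {name!r}")
    return FileMeta(
        place=PLACES[m["place"]],
        text_type=TEXT_TYPES[m["type"]],
        index=int(m["index"]) if m["index"] else None,
        signer=m["signer"],
        addressee=ADDRESSEES[m["addressee"]],
        markup=MARKUPS[m["markup"]],
    )


if __name__ == "__main__":
    # Prints: FileMeta(place='Novosibirsk', text_type='spontaneous monologue', index=None, signer='s2', addressee='RSL signer', markup='standard markup')
    print(parse_name("RSLN-n-s2-d-std"))
```

Because each name becomes a plain record, a list of file names can then be filtered or grouped by any field, for example keeping only cartoon retellings or only the texts of one signer.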

Options (watching videos and annotations, data sorting, search)

viewing annotated video texts
changing the video playback speed
repeating playback of a selected video fragment
stretching the annotation lines
viewing signers’ data (gender, age, degree of deafness, regions of residence, age and conditions of sign language acquisition)
sorting texts by metadata (type of text, place of recording, topic, year and month of recording, signer)
searching for tokens on selected annotation layers in a single file or in a group of files sorted by metadata