
Unsupervised learning for automatic text simplification

Sabina Gorenc (2022) Unsupervised learning for automatic text simplification. MSc thesis.


    Abstract

    To increase the accessibility and variety of easy-to-read material in Slovenian, which involves stylistic and linguistic adaptations, we created a prototype of a system that automatically simplifies texts. It is the first system for automatically converting Slovenian sentences and texts into a simpler form. We also prepared a Slovenian dataset of aligned simple and complex sentences, which can be used for further development of text simplification models for Slovenian. We used the Slovene T5 model, which is pretrained on other tasks; it is a deep neural network with an encoder-decoder architecture and relies on transfer learning. To find good hyperparameter values and to evaluate the system, we used the automatic measures ROUGE and BERTScore; the scores were high, which indicates good performance. The system generates single-clause or simple multi-clause sentences and does not use adverbs or special symbols. From the point of view of syntactic simplicity the system is successful, but we assessed it in more detail through human evaluation, using a questionnaire that could also be used in further studies to check the comprehensibility and meaningfulness of automatically generated sentences. The questionnaire showed that the model was not successful at generating comprehensible paragraphs: most reviewers found them almost or completely unintelligible. We also investigated comprehensibility criteria for automatically generated texts and found the important ones to be conciseness, linguistic correctness, lexical simplicity, syntactic simplicity, coherence and summary relevance. Our system performed best in syntactic simplicity and lexical simplicity, and worst in summary relevance, coherence and conciseness.

    The system is partly useful as an aid to human simplifiers, and could potentially be combined with summarization to provide simpler vocabulary and simple syntactic structure.
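
    The abstract names ROUGE as one of the automatic measures. As an illustration only, the unigram variant (ROUGE-1) can be sketched as follows. This is not the thesis code: the function name `rouge1_f1`, the lowercasing, and the whitespace tokenization are assumptions made for this example, and the ROUGE variant and implementation actually used are not stated here.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated and a reference sentence.

    Assumption: tokens are lowercased and split on whitespace.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it
    # appears in both sentences.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sentence pair, not taken from the thesis dataset.
    print(rouge1_f1("the cat sat", "the cat sat on the mat"))
```

    A measure of this kind rewards word overlap with a reference simplification but does not check whether the output is comprehensible, which is consistent with the gap the abstract reports between the automatic scores and the human evaluation.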

    Item Type: Thesis (MSc thesis)
    Keywords: natural language processing, text simplification in Slovene, easy reading, deep neural networks, sequence-to-sequence model, T5 model, text comprehensibility, comprehensibility criteria
    Number of Pages: 52
    Language of Content: Slovenian
    Mentor / Comentors:
    prof. dr. Marko Robnik Šikonja (Mentor)
    prof. dr. Marko Stabej (Comentor)
    Link to COBISS: https://plus.si.cobiss.net/opac7/bib/peflj/121697027
    Institution: University of Ljubljana
    Department: Faculty of Education
    Item ID: 7371
    Date Deposited: 16 Sep 2022 14:07
    Last Modified: 16 Sep 2022 14:07
    URI: http://pefprints.pef.uni-lj.si/id/eprint/7371
