Text Mining and Unsupervised Learning Techniques for Tweet Classification in the Peruvian Social Context
DOI:
https://doi.org/10.70504/ijepe.v1i1.10923
Keywords:
Text mining, unsupervised learning, tweet classification, social upheaval, Peru, quit
Abstract
Background: The volume of unstructured data has grown exponentially, driven largely by the use of social networks. Technological progress has enabled processes, techniques, and methods for extracting information from these data.
Objective: This work aims to analyze and classify Tweets in the context of Peru's social upheaval, using text mining (TM) and unsupervised learning (UL) techniques.
Methods: More than 268k Tweets were collected and processed. They were gathered within a radius of 1,000 km of the city of Lima and carried the hashtags that trended in the first two weeks of February 2023: #ParoNacional, #RenunciaYa, and #Renuncia. Data cleaning and feature selection techniques were applied. Then, UL techniques, such as clustering and sentiment analysis, were used to classify the Tweets with the Latent Dirichlet Allocation (LDA) model.
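The filtering and cleaning steps can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the regular expressions, the choice to drop hashtags from the cleaned text, and the tiny Spanish stopword list are all assumptions made for illustration.

```python
import re

# The three hashtags named in the Methods, lowercased for matching.
TRACKED_HASHTAGS = {"#paronacional", "#renunciaya", "#renuncia"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Anything that is not a lowercase Spanish letter or whitespace.
NON_LETTER_RE = re.compile(r"[^a-záéíóúüñ\s]")

# Hypothetical, deliberately tiny stopword list (assumption).
STOPWORDS = {"el", "la", "los", "las", "de", "que", "y", "a", "en"}


def has_tracked_hashtag(text):
    """Return True if the tweet carries at least one tracked hashtag."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    return bool(tags & TRACKED_HASHTAGS)


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions, hashtags, and
    punctuation, then drop stopwords. Returns a list of tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Assumption: hashtags are removed because they were the
    # collection criterion and appear in every tweet.
    text = HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    sample = "¡Fuera! #RenunciaYa @user https://t.co/abc la corrupción"
    if has_tracked_hashtag(sample):
        print(clean_tweet(sample))
```

The resulting token lists are the kind of input that a feature selection or topic modeling step would consume.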
Results: The analysis of Tweets related to the social upheaval in Peru shows polarized user opinions, with one group supporting the protests and another criticizing them. Recurring themes include the resignation of the president, Lima centralism, the shutdown of Congress, corruption, and social inequality. The sentiment analysis shows a mix of positive and negative emotions, with words associated with negative sentiment occurring most often. Sentiment was classified into three polarity categories: positive (37%), negative (58%), and neutral (5%). The Tweets also criticize the government for its inaction and for police violence during demonstrations.
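To show how polarity shares of this kind can be tallied, the following Python sketch labels tokenized Tweets with a word-list approach and computes the fraction in each category. The abstract does not state which sentiment method was used, so the lexicon approach, the word lists, and the function names here are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicons (assumption, not the authors' resources).
POSITIVE = {"apoyo", "justicia", "esperanza"}
NEGATIVE = {"corrupción", "violencia", "represión"}

LABELS = ("positive", "negative", "neutral")


def polarity(tokens):
    """Label a token list by the sign of (positive hits - negative hits)."""
    score = sum((tok in POSITIVE) - (tok in NEGATIVE) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_shares(token_lists):
    """Return the fraction of tweets that fall in each polarity category."""
    counts = Counter(polarity(tokens) for tokens in token_lists)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


if __name__ == "__main__":
    tweets = [
        ["apoyo", "protesta"],
        ["violencia", "policía"],
        ["corrupción", "represión"],
        ["lima", "congreso"],
    ]
    print(polarity_shares(tweets))
```

A simple word count like this ignores negation and irony, which is one reason real pipelines usually rely on a trained or validated sentiment resource.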
Conclusions: UL techniques proved effective in classifying Tweets according to their polarity.