- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 50995 (31)
- University: Sharif University of Technology
- Department: Languages and Linguistics Center
- Advisor(s): Bahrani, Mohammad
- Abstract:
- Text messages are the most common form of communication on the internet and on social networking websites; users typically communicate by posting some form of text. These messages or posts are usually short, and their text may not follow standard language conventions, which makes them difficult to process. Different age groups use language differently, and this is reflected in the way each group writes. Advances in natural language processing and computational linguistics make it possible to predict an author's age group by analyzing their writing. This study focuses on methods for automatically recognizing the age group of Facebook users. A corpus of 120 thousand words was collected. Extracting suitable linguistic features and applying machine learning algorithms such as Support Vector Machines, K-Nearest Neighbors, and Artificial Neural Networks, which are considered among the strongest classifiers, allows the author's age group to be predicted from social media texts. The best accuracy, 70.5%, was obtained with Artificial Neural Networks when all features except the unigram language model were used. With the unigram language model, the best Artificial Neural Network accuracy was 66.4%, achieved with hidden layers of 100 and 50 neurons.
- Keywords:
- Social Media ; Natural Language Processing ; Data Mining ; Age Groups ; Author Identification ; Automatic Author Age Identification
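The pipeline summarized in the abstract (extract linguistic features from short texts, then classify by age group) can be illustrated with a minimal sketch. This is not the thesis's implementation: the four surface features, the function names `extract_features` and `knn_predict`, the age-group labels, and the toy messages are all hypothetical and chosen only for illustration. K-Nearest Neighbors is used because it is one of the classifiers named in the abstract and is simple enough to write with the Python standard library alone.

```python
import math
from collections import Counter


def extract_features(text):
    """Map a short message to a small vector of surface style features.

    The features are illustrative assumptions, not those used in the thesis:
    average word length, punctuation ratio, uppercase ratio, word count.
    """
    words = text.split()
    n_words = max(len(words), 1)   # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    avg_word_len = sum(len(w) for w in words) / n_words
    punct_ratio = sum(ch in "!?.," for ch in text) / n_chars
    upper_ratio = sum(ch.isupper() for ch in text) / n_chars
    return [avg_word_len, punct_ratio, upper_ratio, float(len(words))]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up messages and labels, for demonstration only (not from the corpus).
samples = [
    ("omg lol!!! cant wait", "younger"),
    ("so cool!!! c u soon", "younger"),
    ("haha yes!! lets go", "younger"),
    ("I would appreciate your considered opinion on this matter.", "older"),
    ("Thank you, everyone, for the thoughtful birthday wishes.", "older"),
    ("We are travelling to the countryside this weekend.", "older"),
]
train = [(extract_features(text), label) for text, label in samples]

prediction = knn_predict(train, extract_features("lol yes!!! so fun"), k=3)
print(prediction)
```

In this sketch the features are left unscaled, so the word count dominates the distance. A real system would normalize the features and evaluate on held-out data before reporting accuracy.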