Detecting cyberstalking from social media platform(s) using data mining analytics

Mirto, Aimee (2022) Detecting cyberstalking from social media platform(s) using data mining analytics. Doctoral thesis, University of West London.

Mirto - PhD Thesis Final (June 22).pdf - Accepted Version



Cybercrime is a growing activity that can lead to cyberstalking, which makes the use of data mining algorithms to detect or prevent cyberstalking on social media platforms imperative for this study. The aim of this study was to determine the prevalence of cyberstalking on social media platforms, using Twitter. To achieve this aim, machine learning models that perform data mining were used alongside security metrics to detect cyberstalking on social media platforms.

The derived security metrics were used to flag any suspicious cyberstalking content. Two datasets of detailed tweets were analysed using NVivo and R. The dominant occurrence of cyberstalking was assessed by introducing fifteen unigrams identified from the preliminary dataset: “abuse”, “annoying”, “creep or creepy”, “fear”, “follow or followers”, “gender”, “harassment”, “messaging”, “relationships p/p”, “scared”, “stalker”, “technology”, “unwanted”, “victim”, and “violent”. Ordinal regression was used to analyse the use of the fifteen unigrams, which were categorised according to their degree of relationship or link to cyberstalking on Twitter.
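The keyword-based flagging described above can be illustrated with a minimal sketch. It is written in Python for illustration only; the thesis itself used NVivo and R, and this is not the author's code. The function names, the single-word splitting of entries such as “creep or creepy”, the reduction of “relationships p/p” to “relationships”, and the example tweets are all assumptions.

```python
import re

# The fifteen unigrams listed in the abstract, split into single-word variants.
# Assumption: "relationships p/p" is reduced to "relationships" for matching.
UNIGRAMS = {
    "abuse", "annoying", "creep", "creepy", "fear", "follow", "followers",
    "gender", "harassment", "messaging", "relationships", "scared",
    "stalker", "technology", "unwanted", "victim", "violent",
}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def flag_tweet(text):
    """Return the sorted list of listed unigrams that occur in the tweet."""
    return sorted(set(tokenize(text)) & UNIGRAMS)


# Hypothetical example tweets, invented for illustration only.
tweets = [
    "He keeps messaging me and it is getting creepy",
    "Lovely weather in London today",
]
for tweet in tweets:
    print(flag_tweet(tweet))
```

A tweet is treated as flagged when the returned list is non-empty. Exact token matching is the simplest choice here; stemming or phrase matching would be needed to catch inflected forms.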

Moreover, two lightweight machine learning algorithms were used to model cyberstalking-indicative content. K Nearest Neighbour (KNN) and K Means Clustering were both coded in R for the extraction, refinement, analysis and visualisation stages of this research. Results showed that emotional terms such as “bad”, “sad” and “hate” were attached to the unigrams linked to cyberstalking. Each emotional term was flagged in correspondence with one of the fifteen unigrams in tweets containing cyberstalking-indicative content, indicating that one accompanies the other.
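As a rough illustration of how K Nearest Neighbour classifies a tweet, the sketch below votes among the closest labelled examples. It is in Python rather than the R used in the thesis, and it is not the author's model. The two features (number of unigram hits and number of emotional-term hits per tweet), the labels, and the training points are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (unigram hits, emotional-term hits) per tweet.
train = [
    ((3, 2), "indicative"),
    ((2, 1), "indicative"),
    ((4, 1), "indicative"),
    ((0, 0), "benign"),
    ((1, 0), "benign"),
    ((0, 1), "benign"),
]
print(knn_predict(train, (3, 1)))
print(knn_predict(train, (0, 0)))
```

An odd `k` avoids tied votes when there are only two classes.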

K Means Clustering results showed that the terms “bad” and “sad” appeared in 100 percent of the clustering results, whereas the term “hate” appeared in only 60 percent. Results also revealed that the accuracy of the KNN algorithm was up to 40% in predicting key-term-based cyberstalking content in a real Twitter dataset consisting of 1 million data points.
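For readers unfamiliar with K Means Clustering, the following minimal sketch shows the assign-and-update loop (Lloyd's algorithm) on a handful of points. It is a Python illustration under stated assumptions, not the R code from the thesis: the two-dimensional features, the seeding from the first `k` points, and the sample data are hypothetical, and the sketch does not reproduce the reported percentages.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Hypothetical features: (unigram hits, emotional-term hits) per tweet.
points = [(0, 0), (5, 5), (0, 1), (5, 6), (1, 0), (6, 5)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

Seeding from the first `k` points keeps the example deterministic; practical implementations usually use random or k-means++ initialisation with several restarts.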

This study emphasises the persistent relationship between the fifteen unigrams, the emotional terms, and the tweets across the datasets used in this research. It gives a general picture that cyberstalking-indicative content does occur on Twitter at a vast rate, with corresponding links or relationships that support the detection of cyberstalking.

Item Type: Thesis (Doctoral)
Subjects: Computing > Information security > Cyber security
Depositing User: Aimee Mirto
Date Deposited: 16 Jun 2022 13:44
Last Modified: 16 Jun 2022 13:46

