Detection of False Agricultural Climate Information Based on Sentiment Analysis and Random Forest: Methods and Implications
By Tailai Wang – MSc, Capacity Development & Extension, University of Guelph
(The blog was written by the author based on his Major Research Paper)
As global agricultural systems face increasingly severe climate change, unverified climate misinformation on social media is complicating agricultural decision-making. For example, inaccurate predictions of rainfall or severe weather may lead farmers to misjudge planting times, which can directly reduce crop yields (Daume et al., 2023). To address this, we investigated whether the VADER sentiment analysis tool, combined with a random forest model, can effectively detect false information in agricultural climate-related text on Twitter.
We built a composite detection model based on sentiment, linguistic, and metadata features. First, sentiment polarity measures (the compound sentiment score and the frequency of sentiment shifts) are extracted from each tweet using an extended VADER lexicon. Next, the text is represented as a 5000-dimensional TF-IDF vector, and the numbers of exclamation marks and question marks and the frequency of agricultural terms are counted. Metadata features include user influence (the log-transformed follower count), a seasonal weight, and the time of publication. Once all features have been concatenated into a 1510-dimensional vector, a random forest is trained on them, with hyperparameters tuned by grid search and class imbalance handled by SMOTE over-sampling.
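The hand-crafted features described above can be sketched roughly as follows. This is a minimal illustration and not the study's code: the term list `AGRI_TERMS`, the function names, the use of `log1p` for the follower transform, and the definition of a sentiment shift as a sign change between consecutive sentence scores are all assumptions. The per-sentence scores are passed in directly; in the study they would come from the extended VADER lexicon.

```python
import math
import re

# Hypothetical term list; the study's actual agricultural vocabulary is not given.
AGRI_TERMS = {"crop", "harvest", "drought", "rainfall", "planting", "yield", "irrigation"}


def sentiment_shift_count(sentence_scores):
    """Count sign changes between consecutive non-neutral sentence scores.

    Assumed definition of a "sentiment shift"; the study may define it differently.
    """
    signs = [1 if s > 0 else -1 for s in sentence_scores if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def handcrafted_features(text, followers, sentence_scores):
    """Build the non-TF-IDF features for one tweet as a dictionary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    agri_hits = sum(1 for t in tokens if t in AGRI_TERMS)
    return {
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        # Share of tokens that are agricultural terms (0.0 for empty text).
        "agri_term_freq": agri_hits / len(tokens) if tokens else 0.0,
        # log1p avoids log(0) for accounts with no followers (an assumption).
        "log_followers": math.log1p(followers),
        "sentiment_shifts": sentiment_shift_count(sentence_scores),
    }


# Example tweet and scores invented for illustration.
features = handcrafted_features(
    "Drought will destroy every crop!!! Is rainfall gone for good?",
    followers=120,
    sentence_scores=[-0.6, 0.3, -0.2],
)
```

In a full pipeline, these values would be appended to the TF-IDF vector before the classifier is trained.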
The experiment used 16,465 agricultural climate tweets, and the model performed well in both accuracy (0.912) and recall (0.931). The study found that misinformation was highly emotionally polarized, with a median compound sentiment score of -0.4, compared with 0.2 for authentic content. False tweets also contained an average of 3 exclamation marks, and exclamation mark frequency had a correlation of 0.86 with the false label, the highest of any feature, making it the strongest discriminative indicator. Misinformation was negatively associated with user influence (r = -0.79), implying that it was more likely to be propagated by low-influence accounts. The frequency of sentiment shifts was also positively correlated with misinformation (r = 0.67), suggesting that false content tends to sway readers through abrupt changes in mood.
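The coefficients above relate a numeric feature to a binary true/false label. One common way to compute such a value is the Pearson correlation with the label coded as 0 or 1 (the point-biserial correlation). The post does not state which coefficient the study used, so this sketch is an assumption, and the toy values are invented for illustration rather than taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values invented for illustration, not data from the study.
exclamation_counts = [0, 1, 0, 3, 4, 2]
is_false = [0, 0, 0, 1, 1, 1]  # 1 = labelled false, 0 = authentic

r = pearson_r(exclamation_counts, is_false)
```

A value near 1 means the feature rises with the false label, and a value near -1 means it falls, as reported for user influence.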
The real-time monitoring tool developed in this study is now being deployed and can quickly flag high-risk content (e.g., tweets with 4 or more exclamation marks and an emotion score of 0.7 or less) during rumour waves (e.g., claims that extreme weather will cause total crop failure). Nonetheless, the current model is limited in three respects. First, its data covers only English-language Twitter posts, so it is unlikely to represent multilingual agricultural regions. Second, the seasonal feature contributed little to the model, which suggests that the distribution of false information is not tied to the farming calendar. Third, the fixed sentiment lexicon cannot keep up with changing topics, so a mechanism for updating it in real time, for example one based on TF-IDF, still needs to be developed.
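The high-risk rule quoted above can be expressed as a simple threshold check. This is only a sketch: the function name `is_high_risk` and its parameter names are hypothetical, and the default thresholds are copied from the example in the text rather than from the deployed tool.

```python
def is_high_risk(exclamation_count, emotion_score,
                 min_exclamations=4, max_emotion_score=0.7):
    """Flag a tweet when both thresholds from the text's example are met.

    Defaults mirror the example in the post; the deployed tool's actual
    thresholds and scoring scale are not specified here.
    """
    return (exclamation_count >= min_exclamations
            and emotion_score <= max_emotion_score)


# Example inputs invented for illustration.
flagged = is_high_risk(exclamation_count=5, emotion_score=-0.8)
not_flagged = is_high_risk(exclamation_count=2, emotion_score=-0.8)
```

Keeping the thresholds as parameters makes it easy to tighten or relax the rule during a rumour wave without changing the logic.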
In the next phase, the research may focus on developing multilingual models, integrating image-based sentiment analysis, and designing time-series features aligned with crop growth cycles.
References
Daume, S., Galaz, V., & Bjersér, P. (2023). Automated framing of climate change? The role of social bots in the Twitter climate change discourse during the 2019/2020 Australia bushfires. Social Media + Society, 9(2).
Choudhary, A., & Arora, A. (2021). Linguistic feature based learning model for fake news detection and classification. Expert Systems with Applications, 169, 114171.
Al-Rawi, A., et al. (2021). Twitter’s fake news discourses around climate change and global warming. Frontiers in Communication, 6, 729818.