Can AI Detect Hate Speech in Local Languages? This Groundbreaking Research Says Yes

A new AI system trained on 29,361 real social media comments achieves an F1-score of 0.97 (97%) in detecting hostile content in Gujarati, making it one of the first large-scale efforts of its kind for a low-resource Indian language.

By Dr. Jagruti Boda  |  Research Area: Hostile Post Detection, Gujarati NLP, Transformer Models

The Growing Threat of Harmful Content on Social Media

Social media has become an important space where people share opinions, news, and ideas. Unfortunately, it is also increasingly used to spread harmful content such as hate speech, fake news, offensive language, and defamation. Because millions of posts are shared every day, human moderators cannot review all of them manually. This creates a growing need for automated systems that can detect harmful posts.

Why Gujarati? Bridging the NLP Resource Gap for 55 Million Speakers

My research focuses on detecting hostile content in the Gujarati language, which is spoken by more than 55 million people. Despite its wide usage, Gujarati has very limited datasets and research resources compared to English and other major languages.

Introducing GuHPD: A New Gujarati Hostile Post Detection Dataset

To address this gap, I developed a new dataset called GuHPD (Gujarati Hostile Post Detection) containing 29,361 Gujarati social media comments. Each comment was manually labeled as either Hostile or Non-Hostile, and hostile posts were further categorized into Hate, Fake, Offensive, and Defamation.
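
The two-level labelling scheme can be pictured as a coarse label plus a multi-hot vector over the four hostile categories. The Python sketch below is illustrative only: `LabeledPost`, `make_example`, and the rule that a post is Hostile exactly when it carries at least one fine category are assumptions, as is allowing several categories per post (suggested by the multi-label results the article reports, but not a documented GuHPD file format).

```python
from dataclasses import dataclass
from typing import List, Sequence

# Fine-grained categories named in the article; the tuple order fixes vector positions.
FINE_LABELS = ("Hate", "Fake", "Offensive", "Defamation")


@dataclass
class LabeledPost:
    text: str
    coarse: str       # "Hostile" or "Non-Hostile"
    fine: List[int]   # multi-hot vector aligned with FINE_LABELS


def make_example(text: str, categories: Sequence[str]) -> LabeledPost:
    """Build a hypothetical two-level label record for one comment."""
    unknown = set(categories) - set(FINE_LABELS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    fine = [1 if name in categories else 0 for name in FINE_LABELS]
    # Assumption: a post is Hostile exactly when it has at least one fine category.
    coarse = "Hostile" if any(fine) else "Non-Hostile"
    return LabeledPost(text=text, coarse=coarse, fine=fine)
```

For example, `make_example("example post", ["Hate", "Offensive"])` yields a Hostile record with `fine == [1, 0, 1, 0]`, while an empty category list yields a Non-Hostile record with an all-zero vector.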

Smart Data Augmentation: How Transliteration Outperformed Translation

Since labeled Gujarati data is limited, I used data augmentation techniques to expand the dataset. One method, back-translation, translated Gujarati text into English and then back into Gujarati, while another used transliteration (converting Gujarati text into Latin script and then back into Gujarati script). The transliteration approach performed better because it preserved the meaning of the original sentences more effectively.
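
Both strategies are round trips: a forward conversion followed by a backward one, with the original label copied to the new text. A minimal, label-preserving sketch in Python, assuming the converters are passed in as plain functions; `augment_round_trip`, `forward`, and `backward` are hypothetical names, and the article does not say which translation or transliteration tools were used.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, label)


def augment_round_trip(
    data: List[Example],
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> List[Example]:
    """Return the originals plus round-trip variants that differ from every existing text."""
    seen = {text for text, _ in data}
    augmented = list(data)
    for text, label in data:
        # e.g. Gujarati -> English -> Gujarati, or Gujarati script -> Latin -> Gujarati script
        variant = backward(forward(text))
        # Skip empty outputs and exact duplicates: they add no new training signal.
        if variant and variant not in seen:
            seen.add(variant)
            # Assumption: the round trip preserves meaning, so the label is reused unchanged.
            augmented.append((variant, label))
    return augmented
```

Dropping variants that come back unchanged avoids inflating the dataset with exact duplicates, which matters when a round trip is close to lossless, as a good transliteration often is.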

From Preprocessing to Prediction: The Machine Learning Pipeline

After cleaning and preprocessing the data, which included converting emojis into descriptive text, I trained several machine learning models to detect hostile content. Traditional models performed reasonably well, and I improved on them with ensemble methods and transformer-based models such as Multilingual BERT, IndicBERT, and GujaratiBERT.
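
Converting emojis to words lets a text model read signals it might otherwise drop or treat as unknown tokens. A rough, stdlib-only Python sketch, assuming emojis can be located by their Unicode category ("So", symbol other) and named with `unicodedata.name`. This heuristic also catches non-emoji symbols such as ©, and the article does not say which tool the author used; the third-party `emoji` package's `demojize` function is a common alternative.

```python
import re
import unicodedata


def emoji_to_text(text: str) -> str:
    """Replace pictographic symbols with their lowercase Unicode names."""
    parts = []
    for ch in text:
        if ch == "\ufe0f":
            # VARIATION SELECTOR-16 only requests emoji-style rendering; drop it.
            continue
        if unicodedata.category(ch) == "So":  # "Symbol, other" covers most emoji
            name = unicodedata.name(ch, "")
            parts.append(f" {name.lower()} " if name else " ")
        else:
            parts.append(ch)
    # Collapse the extra whitespace introduced around replaced symbols.
    return re.sub(r"\s+", " ", "".join(parts)).strip()
```

For instance, `emoji_to_text("સરસ 😂")` returns `"સરસ face with tears of joy"`, leaving the Gujarati text untouched.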

GujaratiBERT Achieves 0.97 F1-Score in Coarse-Grained and 0.90 in Fine-Grained Classification

Among these, GujaratiBERT performed best, reaching an F1-score of 0.97 for binary (Hostile vs. Non-Hostile) classification and 0.90 for fine-grained multi-label classification into the four hostile categories.
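
F1 is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). The Python sketch below computes it for the binary task and, for the fine-grained task, averages per-category F1 scores (macro averaging). The averaging scheme is an assumption: the article does not state whether its 0.90 is macro-, micro-, or weighted-averaged, and `f1_binary` and `f1_macro_multilabel` are illustrative names.

```python
from typing import List, Sequence


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for one positive class, with labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention: F1 = 0 when the class never occurs and is never predicted.
    return 2 * tp / denom if denom else 0.0


def f1_macro_multilabel(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Unweighted mean of per-category F1 over multi-hot label vectors."""
    n_labels = len(y_true[0])
    per_label = [
        f1_binary([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels
```

With `y_true = [1, 1, 0, 0, 1]` and `y_pred = [1, 0, 0, 1, 1]` there are 2 true positives, 1 false positive, and 1 false negative, so `f1_binary` returns 4/6 ≈ 0.667. Macro averaging weights every category equally, so a rare category such as Defamation counts as much as a frequent one.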

Making AI Explainable: How LIME Reveals Why a Post Is Flagged as Harmful

To make the system more transparent and trustworthy, I also used LIME (Local Interpretable Model-agnostic Explanations), an Explainable AI technique that highlights the specific words in a post that most influence the model’s decision. This helps researchers and policymakers understand why the system flags a post as harmful.
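
LIME explains a single prediction by perturbing the input (here, dropping words), querying the model on each variant, and fitting a locally weighted linear model whose coefficients act as word importances. The sketch below is a from-scratch, stdlib-only illustration of that idea, not the `lime` package's API or the author's code. `predict_hostile` stands in for any model that returns a hostility probability, and the cosine-distance kernel, kernel width, and ridge strength are assumptions that only loosely follow LIME's defaults for text.

```python
import math
import random
from typing import Callable, List, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lime_word_weights(
    text: str,
    predict_hostile: Callable[[str], float],  # hypothetical model wrapper
    num_samples: int = 500,
    kernel_width: float = 0.25,
    ridge: float = 1.0,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Return (word, weight) pairs, most hostility-increasing first."""
    words = text.split()
    d = len(words)
    if d == 0:
        return []
    rng = random.Random(seed)
    masks = [[1] * d]  # the unperturbed post is always included
    for _ in range(num_samples - 1):
        k = rng.randint(1, d)  # how many words to drop
        dropped = set(rng.sample(range(d), k))
        masks.append([0 if i in dropped else 1 for i in range(d)])

    rows, ys, ws = [], [], []
    for z in masks:
        kept = " ".join(w for w, keep in zip(words, z) if keep)
        ones = sum(z)
        # Cosine distance between this mask and the all-ones (original) mask.
        dist = 1.0 - (math.sqrt(ones / d) if ones else 0.0)
        ws.append(math.sqrt(math.exp(-(dist ** 2) / kernel_width ** 2)))
        rows.append([1.0] + [float(v) for v in z])  # intercept + word indicators
        ys.append(predict_hostile(kept))

    # Weighted ridge regression: (X^T W X + ridge * I') beta = X^T W y
    p = d + 1
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for x, y, w in zip(rows, ys, ws):
        for i in range(p):
            xtwy[i] += w * x[i] * y
            for j in range(p):
                xtwx[i][j] += w * x[i] * x[j]
    for i in range(1, p):  # the intercept is not penalised
        xtwx[i][i] += ridge
    beta = _solve(xtwx, xtwy)
    return sorted(zip(words, beta[1:]), key=lambda t: t[1], reverse=True)
```

Given a toy model that scores any text containing "bad" as hostile, `lime_word_weights("you are bad", toy_model)` ranks "bad" first with a clearly positive weight. The ridge term keeps the linear system solvable even when some words are rarely dropped. For real use, the open-source `lime` package provides a `LimeTextExplainer` class for this purpose.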

A Blueprint for Safer Online Spaces and Low-Resource Indian Language AI

Overall, this research provides one of the first large-scale resources and AI approaches for detecting hostile content in Gujarati social media. The work can help support safer online environments, digital literacy efforts, and responsible AI systems, and it can also be extended to other Indian languages with limited resources.

Read the full research paper here:

https://www.ije.ir/article_187454.html

Dr. Jagruti Boda Sarkhedi is an Assistant Professor in the Department of Computer and IT Engineering at UPL University of Sustainable Technology, Gujarat. She completed her Ph.D. at Gujarat Technological University (GTU) in Artificial Intelligence and Natural Language Processing. Her research focuses on hostile content detection in Indian languages, particularly Gujarati social media analysis using machine learning and deep learning techniques. She has published research papers in Scopus- and Web of Science-indexed journals and has received the Best Research Paper Award for her work on AI-based social media analysis.
