Can AI Detect Hate Speech in Local Languages? This Groundbreaking Research Says Yes

A new AI system trained on 29,361 real social media comments achieves an F1-score of 0.97 in detecting hostile content in Gujarati, making it one of the first large-scale efforts for low-resource Indian languages.

By Dr. Jagruti Boda  |  Research Area: Hostile Post Detection, Gujarati NLP, Transformer Models

The Growing Threat of Harmful Content on Social Media

Social media has become an important space where people share opinions, news, and ideas. Unfortunately, it is also increasingly used to spread harmful content such as hate speech, fake news, offensive language, and defamation. Because millions of posts are shared every day, it is impossible for humans to manually check and control all such content. Therefore, there is a growing need for automated systems that can detect harmful posts.

Why Gujarati? Bridging the NLP Resource Gap for 55 Million Speakers

My research focuses on detecting hostile content in the Gujarati language, which is spoken by more than 55 million people. Despite its wide usage, Gujarati has very limited datasets and research resources compared to English and other major languages.

Introducing GuHPD: A New Gujarati Hostile Post Detection Dataset

To address this gap, I developed a new dataset called GuHPD (Gujarati Hostile Post Detection) containing 29,361 Gujarati social media comments. Each comment was manually labeled as either Hostile or Non-Hostile, and hostile posts were further categorized into Hate, Fake, Offensive, and Defamation.
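As a rough illustration of this two-level labeling scheme, the sketch below stores one comment with a coarse Hostile/Non-Hostile label and a multi-hot vector over the four hostile categories. The class name `LabeledComment`, its fields, and the placeholder text are assumptions made for illustration; they are not taken from the GuHPD release, and the sketch assumes a hostile comment may carry more than one category.

```python
from dataclasses import dataclass, field

# Fine-grained hostile categories named in the article.
CATEGORIES = ("hate", "fake", "offensive", "defamation")


@dataclass
class LabeledComment:
    """One annotated comment (hypothetical record layout)."""
    text: str
    categories: set = field(default_factory=set)  # empty set means Non-Hostile

    def __post_init__(self):
        unknown = self.categories - set(CATEGORIES)
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")

    @property
    def is_hostile(self) -> bool:
        # Coarse label: hostile if at least one fine-grained category applies.
        return bool(self.categories)

    def multi_hot(self) -> list:
        # Fine-grained target: one 0/1 entry per category, in CATEGORIES order.
        return [int(c in self.categories) for c in CATEGORIES]


example = LabeledComment("<comment text>", {"hate", "offensive"})
print(example.is_hostile, example.multi_hot())  # True [1, 0, 1, 0]
```

Deriving the coarse label from the fine-grained set keeps the two levels consistent by construction: a comment cannot be Non-Hostile and still carry a hostile category.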

Smart Data Augmentation: How Transliteration Outperformed Translation

Since labeled Gujarati data is limited, I used data augmentation techniques to expand the dataset. One method was back-translation (translating Gujarati text into English and back again), while another used transliteration (converting Gujarati text into Latin script and then back to Gujarati script). The transliteration approach performed better because it preserved the meaning of the original sentences more effectively.
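Both augmentation methods share the same round-trip shape: pass the text through a forward step, then a backward step, and keep the result if it differs from the original. The sketch below shows only that shape. The function `round_trip_augment`, the toy lookup tables, and the placeholder tokens are all hypothetical; a real pipeline would plug in an actual translation or transliteration tool, which the article does not name.

```python
from typing import Callable, List


def round_trip_augment(
    texts: List[str],
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> List[str]:
    """Return the originals plus any round-tripped variants that differ from them."""
    augmented = list(texts)
    for text in texts:
        variant = backward(forward(text))
        # Keep only variants that add something new to the dataset.
        if variant != text and variant not in augmented:
            augmented.append(variant)
    return augmented


# Toy stand-ins (hypothetical): two source spellings share one romanized form,
# and the reverse step always picks the first spelling.
FORWARD = {"word_v1": "word", "word_v2": "word"}
BACKWARD = {"word": "word_v1"}


def forward(text: str) -> str:
    return " ".join(FORWARD.get(token, token) for token in text.split())


def backward(text: str) -> str:
    return " ".join(BACKWARD.get(token, token) for token in text.split())


print(round_trip_augment(["word_v2 here"], forward, backward))
# ['word_v2 here', 'word_v1 here']
```

Passing the two steps in as callables means the same loop can be reused for either method by swapping the pair of functions.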

From Preprocessing to Prediction: The Machine Learning Pipeline

After cleaning and preprocessing the data — including converting emojis into descriptive text — I trained several machine learning models to detect hostile content. While traditional models showed reasonable performance, I improved the results using ensemble methods and advanced transformer-based AI models such as Multilingual BERT, IndicBERT, and GujaratiBERT.
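The emoji-to-text step mentioned above can be sketched with Python's standard library alone. This is a minimal illustration, not the preprocessing code used in the study: the function name `emojis_to_text` and the `:name:` output format are assumptions, and the check on Unicode category "So" is a simplification that does not handle skin-tone modifiers or variation selectors.

```python
import unicodedata


def emojis_to_text(text: str) -> str:
    """Replace each emoji-like symbol with its Unicode name, e.g. ':grinning_face:'."""
    out = []
    for ch in text:
        # Most emoji fall in Unicode category "So" (Symbol, other).
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "").lower().replace(" ", "_")
            out.append(f" :{name}: " if name else " ")
        else:
            out.append(ch)
    # Collapse the extra spaces introduced around replacements.
    return " ".join("".join(out).split())


print(emojis_to_text("great 😀"))  # great :grinning_face:
```

Turning emojis into words lets a text model treat them as ordinary tokens instead of dropping them or mapping them to an unknown symbol.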

GujaratiBERT Achieves F1-Scores of 0.97 (Coarse) and 0.90 (Fine-Grained)

Among these, GujaratiBERT achieved the best performance, reaching an F1-score of 0.97 for binary (Hostile vs. Non-Hostile) classification and 0.90 for fine-grained multi-label classification across the Hate, Fake, Offensive, and Defamation categories.
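For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on a small made-up example; the label lists are invented for illustration and are not results from the study. The article does not say which averaging scheme was used for the multi-label score, so none is assumed here.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = Hostile, 0 = Non-Hostile.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(round(f1_score(y_true, y_pred), 2))  # 0.75
```

Because it combines precision and recall, F1 penalizes a model that flags everything as hostile as well as one that misses most hostile posts.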

Making AI Explainable: How LIME Reveals Why a Post Is Flagged as Harmful

To make the system more transparent and trustworthy, I also used an Explainable AI technique called LIME, which highlights the specific words in a post that influence the model’s decision. This helps researchers and policymakers understand why the system identifies a post as harmful.
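The idea of attributing a prediction to individual words can be illustrated with a much simpler technique than LIME. LIME fits a local surrogate model over many randomly perturbed versions of the input; the sketch below instead removes one word at a time and measures how much the hostile score drops. It is a leave-one-word-out illustration, not LIME itself, and the `toy_model` word list is a hypothetical stand-in for a trained classifier.

```python
from typing import Callable, List, Tuple


def word_importance(
    text: str, predict_hostile: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score each word by how much removing it lowers the hostile score."""
    words = text.split()
    base = predict_hostile(text)
    scores = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((word, base - predict_hostile(reduced)))
    # Largest drop first: these words pushed the prediction toward "hostile".
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for a trained classifier (hypothetical): a fixed word list.
FLAGGED = {"idiot", "liar"}


def toy_model(text: str) -> float:
    return 1.0 if any(w in FLAGGED for w in text.lower().split()) else 0.0


print(word_importance("you are a liar", toy_model)[0][0])  # liar
```

In practice the `predict_hostile` callable would wrap the trained model's probability output, and the highest-scoring words are the ones to highlight for a reviewer.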

A Blueprint for Safer Online Spaces and Low-Resource Indian Language AI

Overall, this research provides one of the first large-scale resources and AI approaches for detecting hostile content in Gujarati social media. The work can help support safer online environments, digital literacy efforts, and responsible AI systems, and it can also be extended to other Indian languages with limited resources.

Read the Full Research here:

https://www.ije.ir/article_187454.html

Dr. Jagruti Boda Sarkhedi is an Assistant Professor in the Department of Computer and IT Engineering at UPL University of Sustainable Technology, Gujarat. She earned her Ph.D. from Gujarat Technological University (GTU) in the area of Artificial Intelligence and Natural Language Processing. Her research focuses on hostile content detection in Indian languages, particularly the analysis of Gujarati social media using machine learning and deep learning techniques. She has published research papers in Scopus- and Web of Science-indexed journals and has received the Best Research Paper Award for her work in AI-based social media analysis.
