Data Scientists from India and Japan Assemble in Haryana to Tackle Twitter Spam
The Indian finance and technology hub of Gurgaon recently hosted around 100 data science students, accredited academicians and seasoned industry captains at the inaugural World Data Science Forum. Held at the ITC Grand Bharat, the event brought together students from the Indian Institute of Technology (IIT) Delhi and the University of Tokyo to discuss the prospects of the data science industry and how the technology can be applied to solve real-world problems.
With the theme of ‘The Power of Data Science for the Future’, the forum welcomed academicians from India and Japan – as well as veterans of the IT, automotive, telecommunications and healthcare industries from both countries – to discuss how innovations in the internet of things (IoT), artificial intelligence (AI) and the blockchain will bring disruption for individuals, businesses and institutions in the years to come.
While the forum allowed the academicians and industry captains to share their expert insights with a new generation of data scientists, the students were given the opportunity to take part in a Student Assignment, in which they attempted to tackle one of the most pressing problems faced by data analysts today: Twitter spam.
With spam tweets flooding Twitter feeds every day, the students were tasked with applying their data analytics knowledge to distinguish automated tweets from thousands of real ones, working from a set of 10,000 anonymous Twitter users described by 50 parameters.
Nimish Joseph (1st Place, IIT Delhi) described the challenge as interesting and complex. “I had to identify spammers from a set of 10,000 anonymous Twitter users. Although the questionnaire gave us parameters such as their followers, friends, number of tweets posted, topics discussed and emotional diversity, such information did not give us any information on whether the profile was real or not. Since the data was unlabelled, I decided to proceed with an unsupervised learning approach. This involved a better anomaly detection mechanism where I had to clean up the dataset provided to ensure that tweets with special characters and null characters are treated properly.”
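The two steps mentioned in the quote, cleaning text fields and scoring unlabelled accounts for anomalies, can be illustrated with a minimal Python sketch. It is not the contestant's actual method: the function names, the sample values and the choice of a median-based score with a 3.5 cut-off are all assumptions made for illustration.

```python
import re
import statistics


def clean_text(text):
    """Remove null and other non-printable characters, then collapse whitespace."""
    if text is None:
        return ""
    text = text.replace("\x00", "")
    # Keep printable characters and whitespace; drop control characters.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return re.sub(r"\s+", " ", text).strip()


def robust_scores(values):
    """Score each value by its distance from the median, scaled by the
    median absolute deviation (a 'modified z-score'). No labels are needed."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical tweets-per-day counts for five accounts (illustrative only).
tweets_per_day = [10, 12, 11, 13, 5000]
scores = robust_scores(tweets_per_day)
# Flag accounts whose score exceeds an assumed cut-off of 3.5.
flagged = [i for i, s in enumerate(scores) if abs(s) > 3.5]
```

A median-based score is used here because a single extreme account barely shifts the median, whereas it would distort a mean and standard deviation.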
Meanwhile, Purva Grover (2nd Place, IIT Delhi) described the forum as a profound and enjoyable experience. “I am extremely grateful to have been given this opportunity to be a part of the World Data Science Forum”, said Grover. Explaining her approach to the Student Assignment, she said: “The data set which was given to us to identify the spammers was not tagged or classified and that is the reason why I chose to use a Semi-Supervised Learning algorithm called the Local Outlier Factor; whereby the learnings of all data must be classified into spammers and non-spammers. Through this algorithm, I identified four attributes – like words used often by these user IDs – and classified users with any of those attributes as a spammer.”
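For readers unfamiliar with the Local Outlier Factor (LOF), the sketch below shows the idea in plain Python: each point's local density is compared with that of its nearest neighbours, and points far sparser than their neighbours receive scores well above 1. As commonly defined, LOF scores points without needing labels. This is an illustrative implementation under stated assumptions, not the contestant's code; the two-dimensional sample points and the choice of k = 2 are hypothetical.

```python
import math


def local_outlier_factor(points, k=2):
    """Return one LOF score per point. Scores near 1 mean the point is about
    as dense as its neighbours; much larger scores suggest an outlier."""
    n = len(points)
    neighbours = []  # for each point: list of (distance, index) within its k-distance
    k_dist = []      # for each point: distance to its k-th nearest neighbour
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        kd = dists[k - 1][0]
        k_dist.append(kd)
        # Include ties at the k-distance, as in the standard definition.
        neighbours.append([(d, j) for d, j in dists if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        mean_reach = max(sum(reach) / len(reach), 1e-12)  # guard against duplicates
        lrd.append(1.0 / mean_reach)

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for _, j in neighbours[i]) / len(neighbours[i])
        for i in range(n)
    ]


# Hypothetical feature vectors: four similar accounts and one far from the rest.
accounts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = local_outlier_factor(accounts, k=2)
```

In this toy data the four clustered points score 1, while the isolated point scores far higher; in practice a threshold on the score has to be chosen for the data at hand.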
Prof. Arpan Kar – one of the forum’s speakers who also judged the Student Assignment contest – applauded the competition for being insightful and for testing the analytical capabilities of the participants. “We received extremely interesting propositions from all the students to the problem presented to them. Although the same data analytics problem can be solved through 50 different approaches, the underpinning idea is not to create the most ‘suitable’ approach; but a workable approach using a mix of different methods. I truly appreciate all the effort put in by these students in solving this case”, said Kar.
Following the success of the World Data Science Forum, organiser bitgrit, Inc. will seek to use it to better realise its vision to build an Asia-centred data scientist network platform that utilises blockchain technology to allow for the optimal application of data science and AI within societal contexts.
“Data analytics is becoming more integral to our daily lives, and the demand for data scientists to dissect it has never been higher”, said Kazuya Saginawa, the co-founder and CEO of bitgrit, Inc. “By continuing to expand our network and matching experts with stakeholders and future innovators through initiatives such as the World Data Science Forum, we aim to build a data scientist network that is firmly rooted in Asia yet has the capacity to solve tomorrow’s pressing societal challenges”.