Big Data - Social Media analysis

Resources for the class on Big Data and Social Media analysis.

Overall introduction

The concept of Big Data, its challenges, and examples of applications. A specific overview of large-scale social network analysis is also provided.

Social network analysis

The basics of social network analysis, from graph theory to global metrics and measures of influence and centrality. An application to the analysis of a specific community is provided.

Practical work 1: Many systems can be modeled as graphs. Among the best-known examples are the Web, social networks, neural networks, and cellular networks. In this first lab, we will study in detail several real graphs, which you will choose from those available for download at https://snap.stanford.edu/data/ and https://github.com/gephi/Gephi/wiki/Datasets. We will apply the basic measures of network science to draw first observations and conclusions about how these networks function.

The graph dataset can be downloaded here, and the instructions here.
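
As a starting point for Practical work 1, the basic measures can be sketched in plain Python. This is only a minimal illustration, assuming an undirected graph stored as a SNAP-style edge list (one pair of node identifiers per line, with `#` marking comment lines); the function names `load_edges` and `basic_measures` and the toy graph are hypothetical and are not taken from the lab instructions.

```python
from collections import defaultdict


def load_edges(lines):
    """Parse an edge list: one 'u v' pair per line, '#' starts a comment."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        if u != v:  # ignore self-loops
            edges.add(frozenset((u, v)))  # undirected: (u, v) equals (v, u)
    return edges


def basic_measures(edges):
    """Return node count, edge count, density and degree statistics."""
    adjacency = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)
    n = len(adjacency)
    m = len(edges)
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    return {
        "nodes": n,
        "edges": m,
        # Density of an undirected simple graph: 2m / (n (n - 1)).
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degrees, default=0),
    }


# Toy graph made up for illustration: a triangle a-b-c plus a pendant node d.
sample = ["# toy graph", "a b", "b c", "c a", "c d"]
measures = basic_measures(load_edges(sample))
print(measures)
```

For a real dataset, the lines of the downloaded file could be passed to `load_edges` in place of `sample`. Nodes without any edge do not appear in an edge list, so they are not counted here.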

Practical work 2: On centrality
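
To make the notion of centrality concrete, here is a hedged sketch of two common measures, degree centrality and closeness centrality, in plain Python. It assumes an undirected, connected graph given as an adjacency dictionary; the toy graph and function names are hypothetical, and the lab may well use other measures or a dedicated tool.

```python
from collections import deque


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}


def closeness_centrality(adj):
    """Number of reachable nodes divided by the sum of shortest-path lengths."""
    result = {}
    for source in adj:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Toy graph made up for illustration: a triangle a-b-c plus a pendant node d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree)
print(closeness)
```

In this toy graph, node `c` is adjacent to every other node, so it scores highest on both measures.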

MongoDB

This class introduces the principles of MongoDB, a NoSQL database. It provides a comprehensive view of the overall system and examples of the query language.

Practical work 2: NoSQL is a technology at the heart of Big Data. This paradigm of distributed, non-relational data storage can deliver high performance on very large amounts of data. In this lab, we guide you through installing MongoDB and using it on two Twitter data collections.

The geolocated tweets dataset for practising with MongoDB is available here for download.

The users dataset containing profile information is available here.
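
To give a flavour of the query language, the sketch below builds a filter document and an aggregation pipeline as plain Python dictionaries, in the form accepted by the PyMongo driver. The field names (`coordinates`, `lang`, `user.screen_name`), the database name `twitter` and the collection name `tweets` are assumptions for illustration; the actual schema of the lab datasets may differ.

```python
def geolocated_filter(lang=None):
    """Filter for tweets whose (assumed) 'coordinates' field exists and is not null."""
    query = {"coordinates": {"$ne": None}}
    if lang is not None:
        query["lang"] = lang  # assumed language field
    return query


def top_users_pipeline(limit=10):
    """Aggregation pipeline counting geolocated tweets per (assumed) screen name."""
    return [
        {"$match": geolocated_filter()},
        {"$group": {"_id": "$user.screen_name", "tweets": {"$sum": 1}}},
        {"$sort": {"tweets": -1}},
        {"$limit": limit},
    ]


def run_examples(uri="mongodb://localhost:27017", db_name="twitter", coll_name="tweets"):
    """Run both queries; requires the pymongo package and a running MongoDB server."""
    from pymongo import MongoClient  # third-party driver, imported only when needed

    collection = MongoClient(uri)[db_name][coll_name]
    projection = {"text": 1, "coordinates": 1}
    for doc in collection.find(geolocated_filter("fr"), projection).limit(5):
        print(doc)
    for row in collection.aggregate(top_users_pipeline(5)):
        print(row)


print(geolocated_filter("fr"))
print(top_users_pipeline(5))
```

Calling `run_examples()` only makes sense once MongoDB is installed and the datasets are imported; until then, the two builder functions can be inspected on their own.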

Map/Reduce

The instructions and scripts can be downloaded there.
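
Independently of the provided scripts, the Map/Reduce idea can be illustrated with a small in-memory sketch in plain Python: a map step emits key/value pairs, a shuffle step groups values by key, and a reduce step combines each group. The hashtag-counting task, the function names and the sample tweets are hypothetical; a real Map/Reduce system distributes these steps across machines.

```python
from collections import defaultdict


def map_tweet(text):
    """Map step: emit (hashtag, 1) for every hashtag in one tweet text."""
    for word in text.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the values emitted for one key."""
    return key, sum(values)


def map_reduce(records):
    """Chain the three steps over an iterable of tweet texts."""
    mapped = (pair for record in records for pair in map_tweet(record))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


# Sample tweets made up for illustration.
tweets = ["Learning #BigData today", "#bigdata and #NoSQL", "no tags here"]
counts = map_reduce(tweets)
print(counts)
```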

Introduction to data mining

This class provides an introduction to data mining processes. It gives an overview of the best-known machine learning approaches.

Practical work: Weka is a data mining software package that integrates the main techniques and steps required for data mining. In this lab, we will try out Weka on several sample datasets and apply the classification algorithms seen in class to them. The lab ends with the automatic detection of abnormal profiles on Twitter.

The scripts for harvesting Twitter data and filtering fields can be downloaded here.
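
The lab itself is done in Weka, but the principle of classification can be sketched in a few lines of plain Python. The example below uses k-nearest neighbours purely as a simple stand-in; it is not necessarily one of the algorithms covered in class. The two numeric features and the labelled points are made up for illustration and do not come from the Twitter data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up training set: (feature vector, label). The features are hypothetical,
# for instance two normalised activity indicators of a profile.
train = [
    ((0.1, 0.2), "normal"),
    ((0.2, 0.1), "normal"),
    ((0.3, 0.3), "normal"),
    ((0.9, 0.8), "abnormal"),
    ((0.8, 0.9), "abnormal"),
    ((0.7, 0.7), "abnormal"),
]
prediction = knn_predict(train, (0.85, 0.85))
print(prediction)
```

With real profiles, the features would have to be extracted and scaled first, and the classifier evaluated on data held out from training.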

Data visualisation

This class introduces the concepts of data visualisation. It gives some important pointers on good practices and theoretical recommendations for data visualisation.

Practical work: Create a data visualisation with Tableau software.