The Web Science course focuses on the study of large-scale socio-technical systems associated with the World Wide Web. It considers the relationship between people and technology, the ways that society and technology complement one another, and the way they affect broader society. These analyses are inherently tied to Big Data management issues.
The course is organised in four parts.
1. Syntax
In the first part, the course introduces the basics of content analysis. It focuses on the syntactic aspects, covering the fundamentals of natural language processing and text mining. It describes the structure and typical characteristics of the different web sources, spanning search results, social media content, social network structures, Web APIs, and so on. It also provides an overview of the basic Web analysis techniques applied in Web search and Web recommendation.
2. Semantics
In the second part, the course presents semantic technologies. These technologies matter because they address the "variety" dimension of Big Data, i.e., they enable the integration of multiple, diverse sources of information, which is typical of the modern Web. Covered topics include:
- RDF - a flexible data model to represent heterogeneous data
- OWL - a flexible ontological language to model heterogeneous data sources
- SPARQL - a query language for RDF.
The course shows how to put all the pieces together in order to achieve interoperability among heterogeneous information sources.
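As a rough illustration of the idea, the sketch below models two hypothetical sources as sets of (subject, predicate, object) triples, merges them, and answers a SPARQL-like basic graph pattern in plain Python. The data, the `ex:` names, and the `match` and `query` helpers are assumptions made for this example only; they are not the API of any RDF library, and a real exercise would rely on an RDF store and a SPARQL engine.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# All identifiers below (ex:... names, source names) are hypothetical.

sensor_source = {
    ("ex:sensor1", "ex:locatedIn", "ex:Milan"),
    ("ex:sensor1", "ex:measures", "ex:Temperature"),
}
social_source = {
    ("ex:post42", "ex:mentions", "ex:Milan"),
    ("ex:post42", "ex:author", "ex:alice"),
}

# Both sources share the same data model, so integration is a set union.
graph = sensor_source | social_source


def match(graph, pattern, binding=None):
    """Yield variable bindings for one triple pattern; variables start with '?'."""
    binding = binding or {}
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def query(graph, patterns):
    """Join several triple patterns, in the spirit of a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(graph, pattern, b)]
    return bindings


# Which posts mention a place where a sensor is located?
results = query(graph, [
    ("?sensor", "ex:locatedIn", "?place"),
    ("?post", "ex:mentions", "?place"),
])
print(results)
```

The shared variable `?place` is what joins the sensor data with the social media data; neither source needs to know about the other's schema.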
3. Time
The third part covers time-dependent data. The topics covered here address the "velocity" dimension of Big Data. This part shows why many Big Data analysis scenarios need to process data streams, coming for instance from Internet of Things (IoT) and social media sources, and it describes how to apply semantic and syntactic techniques to time-dependent information. For instance, it shows how to extend RDF to model RDF streams, how to extend SPARQL to continuously process RDF streams, and how to reason on those RDF streams.
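A minimal sketch of continuous processing, assuming a stream of timestamped triples and a time-based sliding window: every new item triggers eviction of expired items and re-evaluation of the query over the window. The `WindowedCounter` class, the `ex:` names, and the sample stream are hypothetical and do not reproduce the syntax or semantics of any specific RDF stream processing language.

```python
from collections import deque


class WindowedCounter:
    """Keep the triples of the last `width` time units and count mentions per object.

    Assumes items arrive with non-decreasing timestamps.
    """

    def __init__(self, width):
        self.width = width
        self.window = deque()  # (timestamp, triple) pairs, oldest first

    def push(self, timestamp, triple):
        """Add one stream item, evict expired ones, and re-evaluate the query."""
        self.window.append((timestamp, triple))
        # An item expires once its timestamp is <= now - width.
        while self.window and self.window[0][0] <= timestamp - self.width:
            self.window.popleft()
        return self.evaluate()

    def evaluate(self):
        """Count how often each object is mentioned inside the current window."""
        counts = {}
        for _, (_, predicate, obj) in self.window:
            if predicate == "ex:mentions":
                counts[obj] = counts.get(obj, 0) + 1
        return counts


# Hypothetical stream of (timestamp, triple) items.
stream = [
    (1, ("ex:post1", "ex:mentions", "ex:Milan")),
    (3, ("ex:post2", "ex:mentions", "ex:Milan")),
    (4, ("ex:post3", "ex:mentions", "ex:Rome")),
    (12, ("ex:post4", "ex:mentions", "ex:Rome")),
]

counter = WindowedCounter(width=10)
answers = [counter.push(ts, triple) for ts, triple in stream]
for (ts, _), answer in zip(stream, answers):
    print(ts, answer)
```

Unlike a one-shot query, the answer changes over time: when the item at time 12 arrives, the item from time 1 has left the window, so the count for `ex:Milan` drops.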
4. Applications
In the fourth part, the course focuses on specific application scenarios and presents the typical settings and problems where the presented techniques can be applied. This part discusses settings such as: big data analysis for smart cities; data analytics for brand monitoring (marketing) and event monitoring; data analysis for trend detection and user engagement; and so on.
Exercise and Laboratory Classes
Exercise and laboratory classes describe how to use all these techniques together in practice, and how to fuse and analyse data coming from multiple sensor networks (e.g., IoT), social network APIs, and information crawled from the Web and from mobile applications (e.g., through social login and log analysis).
References
[1] http://ec.europa.eu/isa/
[2] http://www.opengeospatial.org/
[3] https://developers.google.com/kml/?hl=en
[4] http://www.w3.org/RDF/
[5] http://www.w3.org/TR/owl-overview/
[6] http://www.w3.org/TR/sparql11-overview/
[7] https://en.wikipedia.org/wiki/Ontology-based_data_integration
[8] http://www.espertech.com/esper/release-5.2.0/esper-reference/html/index.html
[9] http://www.opengeospatial.org/standards/sos
[10] http://www.w3.org/2005/Incubator/ssn/
[11] http://sioc-project.org/
[12] http://streamreasoning.org/
[13] http://jol.telecomitalia.com/jolskil/tag/city-sensing/