Extensible Crawler for DevOps Repositories as a Service (XC4DORaaS)

DevOps is an emerging paradigm in the software development community that aims to bridge the gap between developers and operations personnel by enabling tight collaboration between them. A key element of this effort is the end-to-end automation of development and operations activities by means of a wide variety of tools, artifacts, services, and frameworks. The open source communities created around such technologies in many cases publicly share reusable software artifacts to package, deploy, and operate software components. As a result, DevOps knowledge is available on a large scale, but it is not systematically captured or managed [Wettinger et al. 2015]. It is therefore especially important to provide solutions for holistic DevOps knowledge management in order to increase efficiency and the capacity for collaboration. [Wettinger et al. 2015] and [Wettinger et al. 2016] discuss a methodology towards this goal, which relies on retrieving DevOps technology-specific knowledge and recording it in a knowledge base (KB). The KB is built from a combination of sources processed manually by experts and sources processed automatically through specialized repository crawlers. An (incomplete but already large) example of such a KB is available online.

The goal of this project is to adopt and adapt the methodology discussed above in order to offer, as a RESTful service (see e.g. [Fielding and Taylor 2002]), the capability of crawling through a DevOps repository provided as input and returning a metadata-rich index of its contents. A set of appropriate resources, together with their interactions through HTTP verbs and hyperlinks, will be developed for this purpose. The developed approach must be extensible, allowing an open-ended number of repository types to be supported by means of a repository-specific plug-in mechanism. At least two repository types are to be supported in order to demonstrate this extensibility, but more types are desirable for the final implementation. Special emphasis will be given to performance efficiency, and especially to the scaling of the service through, for example, caching of results and multi-threaded crawling.
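To make the extensibility and scaling requirements more concrete, the following is a minimal Python sketch of one possible shape for the plug-in mechanism, combined with cached, multi-threaded crawling. It is an illustration under stated assumptions, not a prescribed design: the names PLUGINS, register, RepositoryCrawler, ExampleCrawler, and crawl are all hypothetical, and the example plug-in returns fixed data instead of contacting a real repository.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical registry mapping a repository type name to its plug-in class.
PLUGINS = {}


def register(repo_type):
    """Class decorator that adds a crawler plug-in to the registry."""
    def decorator(cls):
        PLUGINS[repo_type] = cls
        return cls
    return decorator


class RepositoryCrawler(ABC):
    """Assumed plug-in contract: one subclass per repository type."""

    @abstractmethod
    def list_artifacts(self, repo_url):
        """Return the identifiers of all artifacts found in the repository."""

    @abstractmethod
    def describe(self, artifact_id):
        """Return a metadata dictionary for a single artifact."""


@register("example")
class ExampleCrawler(RepositoryCrawler):
    """Stand-in plug-in that returns fixed data instead of using the network."""

    def list_artifacts(self, repo_url):
        return [f"{repo_url}/artifact-{i}" for i in range(3)]

    def describe(self, artifact_id):
        return {"id": artifact_id, "name": artifact_id.rsplit("/", 1)[-1]}


@lru_cache(maxsize=128)  # in-memory cache keyed on the call arguments
def crawl(repo_type, repo_url, workers=4):
    """Crawl a repository with the matching plug-in, using a thread pool."""
    crawler = PLUGINS[repo_type]()
    artifacts = crawler.list_artifacts(repo_url)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input list.
        return tuple(pool.map(crawler.describe, artifacts))
```

Under these assumptions, supporting a new repository type only requires a new subclass decorated with register; the crawl function itself stays unchanged. A production service would likely replace the in-process cache with one that supports expiry and invalidation, since repository contents change over time.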

The basic research question of the project is: How can knowledge extraction from DevOps repositories be automated and offered as a service to other applications and tools in an extensible and scalable manner?

The described work is divided into two parts. The first part focuses on the design of the KB schema and of the associated resources that allow the envisioned functionality to be exposed as a service. The second part is concerned with the implementation of the crawling service and its evaluation using two example repositories. Any modern programming language can be used for the implementation, but there is a clear mandate for the use of existing frameworks for RESTful service development (e.g. Restlet, Spring, Jersey, Django) and lightweight Web technologies (e.g. JSON).
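As a rough illustration of what a JSON representation of a crawl result with hyperlinks might look like, the following Python sketch builds such a document using only the standard library. The URI layout (/crawls/{id}/artifacts/{name}), the field names, and the function index_resource are assumptions made for this example; the actual resource design is part of the project work.

```python
import json


def index_resource(base_url, crawl_id, artifact_names):
    """Build a hypothetical JSON-ready representation of a crawl index."""
    self_url = f"{base_url}/crawls/{crawl_id}"
    return {
        "id": crawl_id,
        "artifacts": [
            # Each entry links to its own resource so clients can navigate
            # by following hyperlinks instead of constructing URIs.
            {"name": name, "links": {"self": f"{self_url}/artifacts/{name}"}}
            for name in artifact_names
        ],
        "links": {"self": self_url, "collection": f"{base_url}/crawls"},
    }


# Serialize the representation as it might appear in an HTTP response body.
body = json.dumps(
    index_resource("https://service.invalid", "42", ["nginx", "mysql"]),
    indent=2,
)
```

In a framework such as Django or Jersey, a document of this kind would typically be returned from a GET handler on the crawl resource, while a POST on the collection would start a new crawl.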

If you are interested in this project or have any questions, please do not hesitate to contact Vasilios Andrikopoulos.

Fielding, Roy T., and Richard N. Taylor. "Principled design of the modern Web architecture." ACM Transactions on Internet Technology (TOIT) 2, no. 2 (2002): 115-150.

Wettinger, Johannes, Vasilios Andrikopoulos, and Frank Leymann. "Automated capturing and systematic usage of devops knowledge for cloud applications." In Cloud Engineering (IC2E), 2015 IEEE International Conference on, pp. 60-65. IEEE, 2015.

Wettinger, Johannes, Uwe Breitenbücher, Michael Falkenthal, and Frank Leymann. "Collaborative gathering and continuous delivery of DevOps solutions through repositories." Computer Science-Research and Development (2016): 1-10.