Scraper Documentation

A web scraper for retrieving data that is not available through an API.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You need a current version of Node.js and npm.

Installing

After cloning the repository, install the dependencies. You can then run the project.

# install dependencies
npm install

# serve at localhost:8000
npm start

Overview

The bot is essentially a Node.js backend built with Express. Each incoming request is checked and handled by the route it matches.

Structure

The bot is split into two files. index.js contains the core logic. At the moment there is only one resource, but we expect to add more. Each resource is represented by a route. When a user requests the list of study rooms at our university, the script picks up the request at the corresponding route and prepares a JSON response.
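
As an illustration of the route-per-resource idea, here is a minimal sketch and not the project's actual index.js. The route path `/rooms`, the response shape, the 502 status on failure, and the names `makeRoomsHandler`, `createApp`, and `scrapeRooms` are all assumptions made for this example. The handler is kept separate from the Express wiring so it can be exercised without starting a server.

```javascript
// Sketch only: names, route path, and response shape are hypothetical.

// Builds an Express-style handler around a scraping function.
// `scrapeRooms` stands in for whatever scrape.js exports.
function makeRoomsHandler(scrapeRooms) {
  return async (req, res) => {
    try {
      const rooms = await scrapeRooms();
      res.json({ rooms });
    } catch (err) {
      // The scraped site may be unreachable or its markup may have changed.
      res.status(502).json({ error: 'Could not scrape study rooms' });
    }
  };
}

// Wires the handler to a route. Express is loaded here so that
// makeRoomsHandler can be used on its own.
function createApp(scrapeRooms) {
  const express = require('express');
  const app = express();
  app.get('/rooms', makeRoomsHandler(scrapeRooms)); // assumed path
  return app;
}
```

With a real scraping function in hand, `createApp(scrapeRooms).listen(8000)` would serve the route on the port used by `npm start` above.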

The second script, scrape.js, handles the actual web scraping: it requests the given URL with axios and parses the response with cheerio.
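
The fetch-then-parse flow could look roughly like the sketch below. It is not the project's scrape.js: the function names `extractTexts` and `scrapeTexts` are hypothetical, and the CSS selector is passed in by the caller because the markup of the target page is not described here. Only `axios.get`, `cheerio.load`, and cheerio's `toArray()` and `text()` are real library calls.

```javascript
// Sketch only: function names and the selector parameter are hypothetical.

// Collects the trimmed text of every element matching `selector`.
// `$` is a document loaded with cheerio.load().
function extractTexts($, selector) {
  return $(selector)
    .toArray()
    .map((el) => $(el).text().trim())
    .filter((text) => text.length > 0); // drop empty entries
}

// Downloads the page at `url` and extracts the matching texts.
// The libraries are loaded here so that extractTexts stays usable
// without a network connection.
async function scrapeTexts(url, selector) {
  const axios = require('axios');
  const cheerio = require('cheerio');
  const response = await axios.get(url);
  const $ = cheerio.load(response.data); // response.data holds the HTML
  return extractTexts($, selector);
}
```

Keeping the extraction step separate from the HTTP request means a change in the page markup only affects the selector, not the request code.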

Functionalities

Study Rooms

When the resource is requested, we scrape the ASTA website and try to return a list of the rooms at our university that are available to students.

Further Development

Add a new route in index.js for every new resource, and add functions to scrape.js that scrape the requested data from the relevant websites.

Built With

- Node.js
- Express.js
- Axios
- Cheerio

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

- Tobias Klatt - Initial work - GitHub

See also the list of contributors who participated in this project.