===== Scraper Documentation =====

A web scraper for requesting data that is not available through an API.

==== Table of Contents ====

  - [[wiki:software:beuthbot:webscraper|Scraper Documentation]]
  - [[wiki:software:beuthbot:webscraper#table_of_contents|Table of Contents]]
  - [[wiki:software:beuthbot:webscraper#getting_started|Getting Started]]
    - [[wiki:software:beuthbot:webscraper#prerequisites|Prerequisites]]
    - [[wiki:software:beuthbot:webscraper#installing|Installing]]
  - [[wiki:software:beuthbot:webscraper#overview|Overview]]
  - [[wiki:software:beuthbot:webscraper#structure|Structure]]
  - [[wiki:software:beuthbot:webscraper#functionalities|Functionalities]]
    - [[wiki:software:beuthbot:webscraper#study_rooms|Study Rooms]]
  - [[wiki:software:beuthbot:webscraper#further_development|Further Development]]
  - [[wiki:software:beuthbot:webscraper#built_with|Built With]]
  - [[wiki:software:beuthbot:webscraper#versioning|Versioning]]
  - [[wiki:software:beuthbot:webscraper#authors|Authors]]

==== Getting Started ====

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

=== Prerequisites ===

You will need a current version of [[https://nodejs.org/en/|Node.js and npm]].

=== Installing ===

After cloning the repository, install the dependencies. You can then run the project.

  # install dependencies
  npm install
  # serve at localhost:8000
  npm start

==== Overview ====

The scraper is essentially a //Node//-//Express// backend. Each incoming request is checked against the defined routes and handled by the matching route.

==== Structure ====

The scraper is split into two files. ''index.js'' contains the core logic. At the moment there is only one resource, but we expect to add more. Each resource is represented by a //route//. When a user requests the list of study rooms at our university, the script picks up the request at the corresponding route and prepares a //JSON// response. The second script, ''scrape.js'', takes care of the actual web scraping.
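
The split between route handling and scraping described above could be sketched roughly as follows. This is only an illustration and not the project's actual code: the route path ''/studyrooms'', the names ''createStudyRoomsHandler'' and ''scrapeStudyRooms'', the response shape and the 502 status code are all assumptions. The scraper function is passed in as a parameter so that the handler can be tried out without network access.

```javascript
// Sketch only: names, route path and response shape are assumptions,
// not taken from the actual index.js / scrape.js.

// Builds an Express-style (req, res) handler around an injected scraper function.
// scrapeStudyRooms is expected to return a Promise resolving to a list of rooms.
function createStudyRoomsHandler(scrapeStudyRooms) {
  return async function studyRoomsHandler(req, res) {
    try {
      const rooms = await scrapeStudyRooms();
      // Prepare the JSON response for the requested resource.
      res.status(200).json({ rooms });
    } catch (err) {
      // The scraped website may be unreachable or its markup may have changed.
      res.status(502).json({ error: 'Could not scrape study rooms' });
    }
  };
}

// Hypothetical wiring in index.js (requires express to be installed):
//
//   const express = require('express');
//   const { scrapeStudyRooms } = require('./scrape');
//   const app = express();
//   app.get('/studyrooms', createStudyRoomsHandler(scrapeStudyRooms));
//   app.listen(8000);
```

With this layout, a further resource would need one more handler of the same shape and one more ''app.get(...)'' line, while the scraping details stay in ''scrape.js''.
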
The given URL is requested with //axios//, and the returned HTML is then parsed with //cheerio//.

==== Functionalities ====

=== Study Rooms ===

When the resource is requested, we scrape the [[https://asta.studis-bht.de/service/lernraeume/|ASTA website]] and try to return a list of the rooms at our university that are available to students.

==== Further Development ====

To support a new resource, add a route for it in ''index.js'' and add a function in ''scrape.js'' that scrapes the requested data from the relevant website.

==== Built With ====

  - [[https://nodejs.org/en/|Node.js]]
  - [[https://expressjs.com/|Express.js]]
  - [[https://github.com/axios/axios|Axios]]
  - [[https://github.com/cheeriojs/cheerio|Cheerio]]

==== Versioning ====

We use [[http://semver.org/|SemVer]] for versioning. For the available versions, see the [[https://github.com/beuthbot/scraper/tags|tags on this repository]].

==== Authors ====

  - **Tobias Klatt** - //Initial work// - [[https://github.com/T0biWan/|GitHub]]

See also the list of [[https://github.com/beuthbot/scraper/graphs/contributors|contributors]] who participated in this project.