===== Scraper Documentation =====
A web scraper that retrieves data which is not available through an API.
==== Table of Contents ====
- [[wiki:software:beuthbot:webscraper|Scraper Documentation]]
- [[wiki:software:beuthbot:webscraper#table_of_contents|Table of Contents]]
- [[wiki:software:beuthbot:webscraper#getting_started|Getting Started]]
- [[wiki:software:beuthbot:webscraper#prerequisites|Prerequisites]]
- [[wiki:software:beuthbot:webscraper#installing|Installing]]
- [[wiki:software:beuthbot:webscraper#overview|Overview]]
- [[wiki:software:beuthbot:webscraper#structure|Structure]]
- [[wiki:software:beuthbot:webscraper#functionalities|Functionalities]]
- [[wiki:software:beuthbot:webscraper#study_rooms|Study Rooms]]
- [[wiki:software:beuthbot:webscraper#further_development|Further Development]]
- [[wiki:software:beuthbot:webscraper#built_with|Built With]]
- [[wiki:software:beuthbot:webscraper#versioning|Versioning]]
- [[wiki:software:beuthbot:webscraper#authors|Authors]]
==== Getting Started ====
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
=== Prerequisites ===
You will need a current version of [[https://nodejs.org/en/|Node.js and npm]].
=== Installing ===
After cloning the repository, install the dependencies. You can then run the project.
<code bash>
# install dependencies
npm install
# serve at localhost:8000
npm start
</code>
==== Overview ====
The scraper is essentially a //Node// backend built with //Express//. Incoming requests are checked against the defined routes and handled by the matching route handler.
==== Structure ====
The code is split into two scripts. ''index.js'' contains the core logic: every resource is represented by a //route//. At the moment there is only one resource, but we expect to add more. When a user requests the list of study rooms at our university, the script picks up the request at the corresponding route and prepares a //JSON// response.
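The following is a minimal sketch of the route-per-resource idea, not the project's actual ''index.js''. So that it runs without installing anything, it uses Node's built-in ''http'' module instead of Express; the path ''/studyrooms'' and the stub ''getStudyRooms()'' are hypothetical. Adding a resource amounts to adding one more entry to the ''routes'' map.

```javascript
// Sketch of the route-per-resource idea described for index.js.
// ASSUMPTIONS (not taken from the project): Node's built-in http module stands in
// for Express, the path "/studyrooms" is hypothetical, and getStudyRooms() is a
// stub for the real scraping function in scrape.js.
const http = require("node:http");

// Hypothetical stub; the real function would scrape the ASTA website.
async function getStudyRooms() {
  return ["Example room"];
}

// One route per resource: path -> async function that produces the JSON payload.
const routes = new Map([
  ["/studyrooms", getStudyRooms],
]);

// Look up the route for a request path and build the status code and JSON body.
async function handle(pathname) {
  const handler = routes.get(pathname);
  if (!handler) {
    return { status: 404, body: { error: "Unknown resource" } };
  }
  try {
    return { status: 200, body: await handler() };
  } catch (err) {
    // The source website could not be fetched or parsed.
    return { status: 502, body: { error: "Could not scrape the source website" } };
  }
}

// Wire handle() into an HTTP server that always answers with JSON.
function createServer() {
  return http.createServer(async (req, res) => {
    const { pathname } = new URL(req.url, "http://localhost");
    const { status, body } = await handle(pathname);
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
  });
}
```

Calling ''createServer().listen(8000)'' serves the sketch at ''http://localhost:8000/studyrooms''. In Express, each ''routes'' entry would instead be registered with ''app.get(path, handler)'' and answered with ''res.json(...)''.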
The second script, ''scrape.js'', takes care of the actual web scraping. The given URL is requested with //axios// and then parsed with //cheerio//.
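A sketch of this request-then-parse flow, assuming Node 18 or later: the built-in ''fetch'' stands in for axios, and the parser is passed in as a function so that cheerio, or a stub in tests, can be plugged in. ''fetchHtml'' and ''scrape'' are illustrative names, not the project's API.

```javascript
// Sketch of the request-then-parse flow described for scrape.js.
// ASSUMPTIONS: the global fetch of Node 18+ stands in for axios, and the parse
// step is passed in as a function, so cheerio (or a test stub) can be plugged in.

// Download a page and return its HTML as a string.
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.text();
}

// Generic pipeline: load the HTML (from the web by default), then parse it.
async function scrape(url, parse, load = fetchHtml) {
  const html = await load(url);
  return parse(html);
}
```

With the project's dependencies, ''load'' could be ''async (url) => (await axios.get(url)).data'', and ''parse'' would typically start with ''const $ = cheerio.load(html)'' before selecting the relevant elements.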
==== Functionalities ====
=== Study Rooms ===
When this resource is requested, we scrape the [[https://asta.studis-bht.de/service/lernraeume/|ASTA website]] and try to return a list of the study rooms that our university makes available to students.
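Text extracted from a web page usually contains stray whitespace and empty cells, so some cleanup before building the JSON response is likely needed. The helper below is hypothetical (its name and rules are assumptions, since the structure of the ASTA page is not documented here); it normalizes raw entries into a de-duplicated list of room names.

```javascript
// Hypothetical post-processing step for the study-room scraper: the helper name
// and the cleanup rules are assumptions, not taken from the project.
// Turns raw text snippets extracted from the page into a JSON-ready list.
function toRoomList(rawEntries) {
  const rooms = rawEntries
    .map((entry) => entry.replace(/\s+/g, " ").trim()) // collapse whitespace and line breaks
    .filter((entry) => entry.length > 0);             // drop empty cells
  return [...new Set(rooms)];                          // remove duplicates, keep first-seen order
}
```

For example, ''toRoomList(["  Room A ", "", "Room A"])'' returns ''["Room A"]''.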
==== Further Development ====
To add a resource, define a new route for it in ''index.js'' and add a function to ''scrape.js'' that scrapes the requested data from the source website.
==== Built With ====
- [[https://nodejs.org/en/|Node.js]]\\
- [[https://expressjs.com/|Express.js]]\\
- [[https://github.com/axios/axios|Axios]]\\
- [[https://github.com/cheeriojs/cheerio|Cheerio]]\\
==== Versioning ====
We use [[http://semver.org/|SemVer]] for versioning. For the versions available, see the [[https://github.com/beuthbot/scraper/tags|tags on this repository]].
==== Authors ====
- **Tobias Klatt** - //Initial work// - [[https://github.com/T0biWan/|GitHub]]
See also the list of [[https://github.com/beuthbot/scraper/graphs/contributors|contributors]] who participated in this project.