Web scraping NASA

Uses Selenium, Flask, and MongoDB to create a news aggregator of NASA information.

web-scraping-nasa

Author: Erin James Wills, ejw.data@gmail.com

Banner image: photo by NASA on Unsplash


Overview


This repo uses Requests, Splinter (with Selenium), and BeautifulSoup to collect data from several NASA websites. The information collected includes the latest news stories, images, and tables related to Mars. The web scraping for this project was tested in a Jupyter Notebook called mission_to_mars.ipynb; no other tools are needed to run that logic. See the instructions below for more details.
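As a rough illustration of the notebook's pattern (not the repo's exact code; the URL and CSS selector below are placeholders), the scrape combines Splinter for browser automation with BeautifulSoup for parsing the rendered HTML:

```python
from bs4 import BeautifulSoup
from splinter import Browser
from webdriver_manager.chrome import ChromeDriverManager

NEWS_URL = "https://example.com/mars-news"  # placeholder, not the repo's real URL

# webdriver_manager downloads a matching chromedriver automatically
executable_path = {"executable_path": ChromeDriverManager().install()}
browser = Browser("chrome", **executable_path, headless=True)

try:
    browser.visit(NEWS_URL)                            # render the page, JavaScript included
    soup = BeautifulSoup(browser.html, "html.parser")  # hand the rendered HTML to BeautifulSoup
    title = soup.find("div", class_="content_title")   # assumed selector; adjust per site
    print(title.get_text(strip=True) if title else "no title found")
finally:
    browser.quit()
```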

To create a user-friendly version, the scrape logic was moved into a Python script (scrape_mars.py) that is called by a Flask app (app.py). The app serves a simple webpage with a button that triggers the scrape, and all data collected is stored in a MongoDB collection. The Flask app can be run locally if you also have MongoDB installed; instructions are provided below.
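A minimal sketch of that app.py pattern, assuming scrape_mars.scrape() returns a dictionary; the route names and template details are illustrative, not the repo's exact code:

```python
from flask import Flask, redirect, render_template
from flask_pymongo import PyMongo
import scrape_mars  # the repo's scrape module

app = Flask(__name__)
app.config["MONGO_URI"] = "mongodb://localhost:27017/mars_app"
mongo = PyMongo(app)

@app.route("/")
def index():
    # Show whatever the last scrape stored (None on a fresh database)
    data = mongo.db.listings.find_one()
    return render_template("index.html", data=data)

@app.route("/scrape")
def scrape():
    result = scrape_mars.scrape()  # assumed to return a dict of scraped values
    # Upsert so repeated scrapes overwrite the single stored document
    mongo.db.listings.update_one({}, {"$set": result}, upsert=True)
    return redirect("/")

if __name__ == "__main__":
    app.run(debug=True)
```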


Technologies

  • Python
  • MongoDB
  • Splinter
  • BeautifulSoup
  • Requests

Data Source

The dataset is generated by scraping the following websites:

Note: The above links could change and cause the project to break. This project does not emphasize long-term reliability; instead it is an exercise in testing different technologies, such as deciding when to use Splinter versus BeautifulSoup.
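To illustrate that distinction: Requests plus BeautifulSoup is enough when the content is already present in the raw HTML, while Splinter drives a real browser for pages that only render their content with JavaScript. The URLs below are placeholders, not the project's sources:

```python
import requests
from bs4 import BeautifulSoup
from splinter import Browser

# Static page: the content is already in the HTTP response body
static_html = requests.get("https://example.com/static-page").text
static_soup = BeautifulSoup(static_html, "html.parser")

# JavaScript-rendered page: a real browser must execute the scripts first
browser = Browser("chrome", headless=True)
browser.visit("https://example.com/dynamic-page")
dynamic_soup = BeautifulSoup(browser.html, "html.parser")
browser.quit()
```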


Setup and Installation

  1. Environment needs the following:
    • Python 3.6+
    • pandas
    • webdriver-manager (provides webdriver_manager.chrome)
    • splinter
    • beautifulsoup4 (BeautifulSoup)
    • requests
    • time (Python standard library; no install needed)
    • flask
    • flask_pymongo
  2. Verify that your MongoDB server is running by starting MongoDB Compass and connecting to the service.
  3. Activate your environment
  4. Clone the repo to your local machine
  5. Start Jupyter Notebook within the environment from the repo
  6. To run and/or troubleshoot the scraping, run mission_to_mars.ipynb.
  7. To run the Flask app:
    • Open a new terminal whose working directory is your repo folder
    • Activate your environment
    • In the terminal, run: python app.py
    • You should be able to open http://localhost:5000 in a browser and see the scrape website.
    • Clicking the scrape button runs the scrape and updates both the MongoDB database and the webpage.
  8. The MongoDB database is created automatically if it does not already exist; the database is named mars_app and its collection is named listings (see the sketch after this list).
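For reference, here is a minimal standalone sketch (using pymongo directly, which flask_pymongo wraps) of how that database and collection come into existence lazily on the first write; the inserted document is just an example:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.mars_app                      # database is created lazily, on first write
db.listings.insert_one({"status": "ok"})  # first write also creates the collection
print(db.list_collection_names())         # ['listings']
```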

Collecting NASA webpage parts
