Word search engine

Word search engine that scrapes the HTML source code of a web page, integrated with a CI/CD pipeline and k8s orchestration.

Input:

  • Query to search for (a word).
  • Source to search in (a web page).

Output:

  • Number of times the word appears in the HTML source.

Constraints:

  • Count only the given word, not other words that contain it.
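
The constraint rules out a plain substring count, which would also match the word inside longer words. A minimal Python sketch of a whole-word count, assuming a case-sensitive match and Python regex word boundaries (letters, digits and underscore count as word characters); the project's actual matching rule is not shown here:

```python
import re


def count_whole_word(html: str, word: str) -> int:
    """Count occurrences of word in html that are not part of a longer word.

    Assumption: case-sensitive, with word boundaries as Python's re module defines them.
    """
    # re.escape keeps characters such as "." or "+" in the query literal.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return len(pattern.findall(html))


if __name__ == "__main__":
    source = "<p>Flask is a micro framework. Flasks and flask-like tools.</p>"
    print(source.count("Flask"))              # naive substring count
    print(count_whole_word(source, "Flask"))  # whole-word count
```

Here the naive count returns 2 because "Flasks" contains "Flask", while the whole-word count returns 1.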

Architecture

TODO: architecture

API

The base API contract is stored in the doc folder. The UI is served at http://<ENDPOINT>:<PORT>/ and the live documentation at http://<ENDPOINT>:<PORT>/swagger.json.

The request body can either be strict or use limiters.

If the query is strict, the search engine looks for an exact match.

If the strict value is false, you must provide limiters. In that case the word is counted only when it is enclosed between these limiters.

You can provide more than one limiter.
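
The two modes can be sketched in Python as follows. The keys "query", "strict" and "limiters" are hypothetical (the real contract lives in the doc folder), as are three further assumptions: strict mode means a whole-word match, strict defaults to true, and the start or end of the document counts as a limiter.

```python
import re
from typing import Iterable


def count_between_limiters(text: str, word: str, limiters: Iterable[str]) -> int:
    """Count occurrences of word that have a limiter immediately before and after.

    Assumptions: limiters may be longer than one character, and the very start
    and end of the text also count as limiters.
    """
    limiters = [lim for lim in limiters if lim]  # drop empty strings
    if not word:
        return 0
    count = 0
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        # text[:start] must end with a limiter, text[end:] must start with one.
        before_ok = start == 0 or any(text.endswith(lim, 0, start) for lim in limiters)
        after_ok = end == len(text) or any(text.startswith(lim, end) for lim in limiters)
        if before_ok and after_ok:
            count += 1
        start = text.find(word, start + 1)
    return count


def count_for_request(body: dict, html: str) -> int:
    """Dispatch on a request body with hypothetical keys query, strict, limiters."""
    word = body["query"]
    if body.get("strict", True):
        # Strict mode read as a whole-word match using regex word boundaries.
        return len(re.findall(r"\b" + re.escape(word) + r"\b", html))
    limiters = body.get("limiters") or []
    if not limiters:
        raise ValueError("limiters are required when strict is false")
    return count_between_limiters(html, word, limiters)
```

For example, with the HTML '<a title="apple">apple pie</a> pineapple', strict mode counts 2, and the limiters '"', " " and ">" also count 2, since "pineapple" is preceded by a letter.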

Run

$ FLASK_APP=src/app flask run
# or
$ python src/app.py
$ curl -X PUT http://<ENDPOINT>:<PORT>/search \
      -H 'content-type: application/json' \
      -d @'base.template.json'
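
For callers written in Python, the same PUT request can be built with the standard library. A sketch, assuming a server at http://localhost:8000 and a hypothetical body with "query", "url" and "strict" keys; the real body is whatever base.template.json contains:

```python
import json
import urllib.request


def build_search_request(endpoint: str, body: dict) -> urllib.request.Request:
    """Python equivalent of the curl call: PUT <endpoint>/search with a JSON body."""
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send_search(endpoint: str, body: dict, timeout: float = 10.0):
    """Send the request and decode the JSON response (its shape is not documented here)."""
    with urllib.request.urlopen(build_search_request(endpoint, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: send_search("http://localhost:8000", {"query": "flask", "url": "https://example.com", "strict": True}).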

Deploy

The deployment is automated by the CI/CD pipeline, but you can always run it on your local machine.

# Build docker
$ docker build -t word-search-engine:latest .
$ docker run -p 8000:8000 -e PORT=8000 --name word-search-engine word-search-engine:latest
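
The docker run command passes the port through the PORT environment variable. One way the app might consume it, sketched under assumptions (a default of 8000 and a hypothetical HOST override): inside a container the server should bind to 0.0.0.0, because Flask's development server listens on 127.0.0.1 by default and the published port would not reach it.

```python
import os
from typing import Mapping, Tuple


def server_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Resolve host and port from PORT (and the hypothetical HOST) variables."""
    host = env.get("HOST", "0.0.0.0")  # listen on all interfaces inside the container
    raw_port = env.get("PORT", "8000")
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"PORT must be an integer, got {raw_port!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return host, port
```

The result would then be passed to Flask as app.run(host=host, port=port).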

Pull the latest version from Docker Hub on any machine with Docker installed. You can automate the process with Terraform and a CI/CD pipeline if you are using ECS, or create a deployment/rollback in K8s.

$ docker pull jorgechato/word-search-engine:latest
# example with k8s
# do not forget to export the ENV_VARIABLES for the DB connection first
$ kubectl apply -f deploy/k8s.yml
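
The names of the DB environment variables are not listed here. A sketch of a fail-fast check at startup, using the hypothetical names DB_HOST, DB_PORT, DB_USER, DB_PASSWORD and DB_NAME, so that a pod with missing variables stops immediately instead of failing on the first query:

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; replace them with the ones deploy/k8s.yml actually sets.
REQUIRED_DB_VARS = ("DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD", "DB_NAME")


def load_db_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the DB settings, or raise listing every missing variable at once."""
    missing = [name for name in REQUIRED_DB_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing DB environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_DB_VARS}
```

Raising at startup makes the container exit, so Kubernetes restarts it and the pod visibly fails instead of silently serving errors.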

Requirements

Must have

Recommendation for development

Install dependencies

# with anaconda
$ conda env create -f environment.yml # create virtual environment
$ conda activate backend # enter VE
# or
$ source activate backend
(backend) $ conda deactivate # exit VE

FAQ

Can the word be part of any html tag, css or js embedded in the source code of the page?

See also:

  • [business] MVP Questions (#1)
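
This question is still open (see issue #1). If the answer turns out to be "no", one possible approach, sketched with the standard-library html.parser and assuming Python regex word boundaries, is to count only text nodes outside script and style elements. Tag names, attribute values and comments are skipped automatically because they never reach handle_data.

```python
import re
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_in_visible_text(html: str, word: str) -> int:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so words from adjacent elements are not glued together.
    text = " ".join(parser.chunks)
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))
```

On a page where "apple" appears in a CSS selector, a JS variable, a class attribute and once in a paragraph, counting the raw source gives 4 while this sketch gives 1.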

Does the scraping search cover the whole sitemap of the domain?

No, the search engine only searches the endpoint provided by the requester.
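
A minimal sketch of that single fetch with the standard library, assuming UTF-8 when the server declares no charset and a hypothetical User-Agent value (the project's actual HTTP client is not shown here):

```python
import urllib.request


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Download the HTML of one URL; links in the page are not followed."""
    request = urllib.request.Request(url, headers={"User-Agent": "word-search-engine"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Note that urllib's default opener still follows HTTP redirects; only links inside the page are ignored.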

Why K8S and not AWS lambdas?

P 1: On serverless platforms, the first invocation of a function takes extra time (a cold start) because the code needs to be initialized. This service needs fast responses, since it will be integrated into a stack of microservices (MS).

P 2: Kubernetes may offer better scalability than some serverless platforms, since it is more mature and even provides high availability (HA) across zones, which not all serverless platforms offer yet. This matters because we plan to expand our business to other zones.

P 3: Kubernetes may be easier to use for more complex applications because the platform is more mature. Since we plan to use a database to store the results of the search, this makes sense.

P 4: Serverless does not automatically mean lower costs, for example when your applications need to run 24/7. There can also be hidden costs, such as extra charges for API management or for the function invocations made by tests.

P 5: The monitoring capabilities for Kubernetes applications are much more mature than those of serverless platforms.