MarketMinds
Welcome to MarketMinds
As part of the challenge task for the module Distributed Systems (DSy FS-2024) at the Eastern Switzerland University of Applied Sciences, we created MarketMinds. MarketMinds is a financial news aggregator that collects news from various selected sources. News articles are tagged with named entities, extracted by a named entity recognition model, and can be filtered by these tags. Furthermore, we provide a sentiment analysis of the news articles, performed by a sentiment analysis model trained on a dataset of German financial news articles.
This repository contains the source code of MarketMinds.
As required by the assignment, the application contains:
- A simple React frontend written in TypeScript
- Traefik as a load balancer
- Two instances each of the frontend and the Go backend, load-balanced by Traefik
- A scheduled task that runs every minute, polls news from the predefined RSS feed sources, and performs AI analysis on them using two different language models
- And of course a database that provides persistent storage for all the aggregated news articles, channels, and analysis results
Infrastructure
The diagram below gives an overview of the infrastructure of the dockerized application. Details can be found in the sections below and in the respective repository folder of each component.
Backend
Technologies used: Golang, pgxv5, Gin-Gonic, gRPC (client), Make
The backend is written in Go and can be run using make. More information about how to start the component can be found in the README.md of the backend folder.
The backend communicates with a Postgres database via pgx. As web framework we use Gin-Gonic to provide an API for our service.
The backend also includes an RSS parser that regularly (via cron) fetches the newest articles from various channels. After importing the articles, we run an AI analysis on their headlines and descriptions. For this, the backend implements a gRPC client that communicates with the AI-Services module of our application (the gRPC server; more in the section below).
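To illustrate the import step: the actual parser is part of the Go backend, but the Python sketch below (standard library only, with a made-up sample feed) shows the basic idea of pulling titles and descriptions out of an RSS 2.0 document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; real feeds come from the configured RSS sources.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Finance News</title>
    <item>
      <title>Markets rally</title>
      <description>Stocks climbed on Monday.</description>
      <link>https://example.com/a1</link>
    </item>
  </channel>
</rss>"""

def parse_rss(xml_text):
    """Extract title, description, and link from each item of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "link": item.findtext("link", default=""),
        })
    return items

articles = parse_rss(SAMPLE_FEED)
print(articles[0]["title"])  # Markets rally
```

In the real pipeline, each extracted title and description is then sent to the AI-Services component over gRPC.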
REST API
The REST API of the backend component is documented as follows:
Method | Endpoint | Description |
---|---|---|
GET | /healthz | Returns a health status for checks |
GET | /api/status | Returns health status for backend and AI-Services component |
GET | /api/import | Starts a new import of news articles from RSS sources |
GET | /api/news-channel | Returns all news channels |
GET | /api/news-channel/:id | Returns a specified news channel |
GET | /api/news | Returns a paginated result of news articles |
GET | /api/news/:id | Returns a specified article by id |
GET | /api/news/search | Returns a paginated result of news articles by a search query |
GET | /api/tags | Returns a list of all named entities (tags) |
GET | /api/tag/:id | Returns a specified named entity (tag) with the news articles that mention it |
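Endpoints such as /api/news return paginated results. The real handlers are written with Gin-Gonic in Go; the following Python sketch (the parameter names page and per_page are assumptions, not the documented API) only illustrates the pagination logic:

```python
def paginate(items, page=1, per_page=10):
    """Return one page of a result list plus paging metadata."""
    total = len(items)
    start = (page - 1) * per_page
    return {
        "page": page,
        "per_page": per_page,
        "total": total,
        "items": items[start:start + per_page],
    }

# Example: page 2 of 25 articles, 10 per page -> items 10..19
result = paginate(list(range(25)), page=2, per_page=10)
print(result["items"])  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```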
Database
Technologies used: PostgreSQL, PgAdmin
The database runs on a single instance. We chose Postgres for its performance, reliability, and simplicity.
Entity-relationship Model
Frontend
Technologies used: Node, React, Redux-Toolkit, Typescript, MUI
The frontend is built as a Single Page Application (SPA). It interacts with the backend via API calls.
The code is built with npm and can then be served statically using the http-server package, which can be executed with the npm built-in npx.
We call the backend API using fetch, which is built into JavaScript and makes requests easy to handle.
AI Services
Technologies used: Python, Hugging Face (transformers and pipelines), gRPC (server), click
This service, written in Python, implements a gRPC server that the backend calls for analysis. The connections are defined in two proto files, which specify the endpoints and request messages implemented by server and client. Find out more in the "protos" folder of the project.
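As an illustration of what such a definition can look like, here is a hypothetical proto fragment (service and message names are made up; the real definitions live in the "protos" folder):

```proto
syntax = "proto3";

// Hypothetical shape of one of the two analysis services.
service SentimentAnalysis {
  rpc Analyze (AnalyzeRequest) returns (AnalyzeResponse);
}

message AnalyzeRequest {
  string text = 1;   // article title or description
}

message AnalyzeResponse {
  string label = 1;  // e.g. "positive" / "neutral" / "negative"
  float score = 2;   // model confidence
}
```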
The AI Services component performs two different tasks on the news article titles and descriptions:
- Named Entity Recognition (part of Token Classification)
- Sentiment Analysis (part of Text Classification)
Load Balancer
Technologies: Traefik
The load balancer exposes the ports HTTP 80 and HTTPS 443. Traffic on HTTP port 80 is automatically redirected to HTTPS.
For simplicity, and because we only run on a single machine, a real certificate is not implemented yet. One for local testing has been created, but it is not signed by a trusted certificate authority. Therefore, a certificate warning can be displayed. A proper certificate could of course be added in the future.
The service running on port 443 is configured with HTTP/3. The traffic is then forwarded to the frontend on port 3000.
The frontend requests data using the prefix /api; the load balancer routes all traffic with this prefix to the backend service on port 8080.
The backend service consists of two backend containers; the service forwards the traffic to port 8080 on each of them.
The backend service includes a health check that uses /healthz to determine whether the backend containers are still running and healthy. If a container fails, it takes up to 10 seconds until all traffic is routed to the healthy one.
The configuration is placed in the "dockerized" folder of the project root. The certificate for running the application locally with SSL is stored there as well.
Containerized System
Technologies used: Docker (compose)
With the docker compose command as described below, the application can be started locally. Each component of the application is built as an image from a Dockerfile in its respective subfolder. With docker compose, everything is started together in the same network, marketminds-network.
The database is health-checked before the backend component starts, because the Go backend runs a small migration script against the database container at startup.
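In docker-compose, this startup ordering can be expressed with a health-checked dependency. The excerpt below is a hypothetical sketch (the image tag and check command are assumptions; the database service name matches the one used in the .env example):

```yaml
# Hypothetical excerpt; see the "dockerized" folder for the real setup.
services:
  marketminds-postgres-db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
  backend:
    depends_on:
      marketminds-postgres-db:
        condition: service_healthy   # wait until the DB passes its health check
```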
Usage
First of all, we need to create a .env file in the root folder of the project, containing the following variables:
DATABASE_URL=postgres://postgres:postgres@marketminds-postgres-db:5432/marketminds?sslmode=disable
MIGRATION_PATH=file:///app/db/migrations
AI_SERVICES_GRPC_URL=dns:///ai-services:50051
After creating the file, we can go on to the next step: starting the application with Docker.
Running on localhost
The application can be started using:
docker-compose up
After starting, the frontend will be available under https://localhost/. Please note that the docker compose command might run into issues if the default ports (80 or 443) are already in use.
Building / Running from Dockerfile
Builds from a Dockerfile can be done by executing this command:
docker buildx build --tag <tag-of-component> .
And can be run by executing the following:
docker run <tag-of-component>
Find out more in the READMEs inside the component folders.
Development
For development purposes, the project can be launched in VS Code in a devcontainer. For this there is a folder called .devcontainer that contains a separate docker-compose.yml defining the dev setup. When opening the project in VS Code, the program will ask you to open the environment in a container if Docker Desktop is installed on your machine.