MarketMinds
Welcome to MarketMinds
As part of the challenge task for the module Distributed Systems (DSy FS-2024) at the Eastern Switzerland University of Applied Sciences, we created MarketMinds. MarketMinds is a financial news aggregator that collects news from various selected sources. News articles are tagged with named entities, extracted by a named entity recognition model, and can be filtered by these tags. Furthermore, we provide a sentiment analysis of the news articles, performed by a sentiment analysis model trained on a dataset of German financial news articles.
This repository contains the source code of MarketMinds.
As required by the assignment, the application contains:
- A simple React frontend written in TypeScript
- Traefik as a load balancer
- Two instances each of the frontend and the Go backend, load-balanced by Traefik
- A scheduled task that runs every minute, polls news from the predefined RSS feed sources, and performs AI analysis on them using two different language models
- And of course a database that provides persistent storage for all the aggregated news articles, channels, and analysis results
Infrastructure
The diagram below gives an overview of the infrastructure of the dockerized application. Details can be found in the sections below and in the respective repository folder of each component.
Backend
Technologies used: Golang, pgxv5, Gin-Gonic, gRPC (client), Make
The backend is written in Go and can be run using make. More information about how to start the component can be found in the README.md of the backend folder.
The backend communicates with a Postgres database via pgx. As web framework we use Gin-Gonic to provide an API for our service.
The backend also includes an RSS parser that regularly (via cron) fetches the newest articles from various channels. After importing the articles, we run an AI analysis on their headlines and descriptions. For this, the backend implements a gRPC client that communicates with the AI-Services module of our application (the gRPC server; more in the section below).
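To illustrate the import step: the actual parser is part of the Go backend, but the Python sketch below (standard library only, with a made-up sample feed) shows the basic idea of pulling titles and descriptions out of an RSS 2.0 document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; real feeds come from the configured RSS sources.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Finance News</title>
    <item>
      <title>Markets rally</title>
      <description>Stocks climbed on Monday.</description>
      <link>https://example.com/a1</link>
    </item>
  </channel>
</rss>"""

def parse_rss(xml_text):
    """Extract title, description, and link from each item of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "link": item.findtext("link", default=""),
        })
    return items

articles = parse_rss(SAMPLE_FEED)
print(articles[0]["title"])  # Markets rally
```

In the real pipeline, each extracted title and description is then sent to the AI-Services component over gRPC.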
REST API
The REST API of the backend component is documented as follows:
Method | Endpoint | Description |
---|---|---|
GET | /healthz | Returns a health status for checks |
GET | /api/status | Returns health status for backend and AI-Services component |
GET | /api/import | Starts a new import of news articles from RSS sources |
GET | /api/news-channel | Returns all news channels |
GET | /api/news-channel/:id | Returns a specified news channel |
GET | /api/news | Returns a paginated result of news articles |
GET | /api/news/:id | Returns a specified article by id |
GET | /api/news/search | Returns a paginated result of news articles by a search query |
GET | /api/tags | Returns a list of all named entities (tags) |
GET | /api/tag/:id | Returns a specified named entity (tag) with the news articles that mention it |
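Endpoints such as /api/news return paginated results. The real handlers are written with Gin-Gonic in Go; the following Python sketch (the parameter names page and per_page are assumptions, not the documented API) only illustrates the pagination logic:

```python
def paginate(items, page=1, per_page=10):
    """Return one page of a result list plus paging metadata."""
    total = len(items)
    start = (page - 1) * per_page
    return {
        "page": page,
        "per_page": per_page,
        "total": total,
        "items": items[start:start + per_page],
    }

# Example: page 2 of 25 articles, 10 per page -> items 10..19
result = paginate(list(range(25)), page=2, per_page=10)
print(result["items"])  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```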
Database
Technologies used: PostgreSQL, PgAdmin
The database runs on a single instance. We chose Postgres for its performance, reliability, and simplicity.
Entity-relationship Model
Frontend
Technologies used: Node, React, Redux-Toolkit, Typescript, MUI
The frontend is built as a Single Page Application (SPA). It interacts with the backend via API calls.
The code is built with npm and can then be served statically using the http-server package, which can be executed with the npm built-in npx.
We call the backend API using fetch, which is built into JavaScript and makes requests easy to handle.
AI Services
Technologies used: Python, Hugging Face (transformers and pipelines), gRPC (server), click
This service, written in Python, implements a gRPC server that the backend calls for analysis. The connections are defined in two proto files, which specify the endpoints and request messages implemented by server and client. Find out more in the "protos" folder of the project.
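As an illustration of what such a definition can look like, here is a hypothetical proto fragment (service and message names are made up; the real definitions live in the "protos" folder):

```proto
syntax = "proto3";

// Hypothetical shape of one of the two analysis services.
service SentimentAnalysis {
  rpc Analyze (AnalyzeRequest) returns (AnalyzeResponse);
}

message AnalyzeRequest {
  string text = 1;   // article title or description
}

message AnalyzeResponse {
  string label = 1;  // e.g. "positive" / "neutral" / "negative"
  float score = 2;   // model confidence
}
```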
The AI Services component performs two different tasks on the news article titles and descriptions:
- Named Entity Recognition (part of Token Classification)
- Sentiment Analysis (part of Text Classification)
Load Balancer
Technologies: Traefik
The load balancer exposes the ports HTTP 80 and HTTPS 443. Traffic on HTTP port 80 is automatically redirected to HTTPS.
For simplicity, and because we only run on a single machine, a real certificate is not implemented yet. One for local testing has been created, but it is not signed by a trusted certificate authority. Therefore, a certificate warning can be displayed. A proper certificate could of course be added in the future.
The service running on port 443 is configured with HTTP/3. The traffic is then forwarded to the frontend on port 3000.
The frontend requests data using the prefix /api; the load balancer routes all traffic with this prefix to the backend service on port 8080.
The backend service consists of two backend containers; the service forwards the traffic to port 8080 on each of them.
The backend service includes a health check that uses /healthz to determine whether the backend containers are still running and healthy. If a container fails, it takes up to 10 seconds until all traffic is routed to the healthy one.
The configuration is placed in the "dockerized" folder of the project root. The certificate for running the application locally with SSL is stored there as well.
Containerized System
Technologies used: Docker (compose)
With the docker compose command as described below, the application can be started locally. Each component of the application is built as an image from a Dockerfile in its respective subfolder. With docker compose, everything is started together in the same network, marketminds-network.
The database is health-checked before the backend component starts, because the Go backend runs a small migration script against the database container at startup.
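In docker-compose, this startup ordering can be expressed with a health-checked dependency. The excerpt below is a hypothetical sketch (the image tag and check command are assumptions; the database service name matches the one used in the .env example):

```yaml
# Hypothetical excerpt; see the "dockerized" folder for the real setup.
services:
  marketminds-postgres-db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
  backend:
    depends_on:
      marketminds-postgres-db:
        condition: service_healthy   # wait until the DB passes its health check
```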
Usage
First of all, we need to create a .env file in the root folder of the project, containing the following variables:
DATABASE_URL=postgres://postgres:postgres@marketminds-postgres-db:5432/marketminds?sslmode=disable
MIGRATION_PATH=file:///app/db/migrations
AI_SERVICES_GRPC_URL=dns:///ai-services:50051
After creating the file, we can go on to the next step: starting the application with Docker.
Running on localhost
The application can be started using:
docker-compose up
After starting, the frontend will be available under https://localhost/. Please note that the docker compose command might run into issues if the default ports (80 or 443) are already in use.
Building / Running from Dockerfile
Builds from a Dockerfile can be done by executing this command:
docker buildx build --tag <tag-of-component> .
And can be run by executing the following:
docker run <tag-of-component>
Find out more in the READMEs inside the component folders.
Development
For development purposes, the project can be launched in VS Code in a devcontainer. For this there is a folder called .devcontainer that contains a separate docker-compose.yml defining the dev setup. When opening the project in VS Code, the program will ask you to open the environment in a container if Docker Desktop is installed on your machine.