# Learn to Drive from Pixels Only
**Autonomous Driving with an RC Car using Reinforcement Learning and Evolution Strategies**

Master Thesis - Autonomous RC Car Driving with RL

In this work, an RC car learns to drive autonomously through reinforcement learning (RL) and evolution strategies.
## Getting started
This work is structured as follows:
- *In [yolo](./yolo/)* an object detector is implemented to detect cones.
- *In [rl](./rl/)* the reinforcement learning algorithms are implemented.
- *In [evo](./evo/)* the evolution strategies with a master-worker framework are implemented.
- *In [simulations](./simulations/)* the Unity simulations for training the car are located.
## MA Autonomous Driving with RC Car
## Project Steps
- [ ] Cone detection
- [ ] Cone keypoint regression
- [ ] Basic driving algorithm
- [x] Basic simulator (Unity)
- [ ] Driving with RL in the simulator
- [ ] Driving with the basic algorithm in the simulator
- [ ] Extend the simulator (point cloud)
## Week 4 (15.3 - 23.3)
- Bounding box extraction from Unity
- YOLO implementation
- YOLO modification
- Papers
- Dataset
## Week 5 (23.3 - 29.3)
- YOLO training on the dataset
- Reinforcement learning with the ML-Agents API
- 3D scan
- Point cloud to mesh
## Week 6 (30.3 - 5.4)
- YOLO bounding boxes as feature input to the RL algorithm
- Implement a trainer with the ML-Agents Python API
- Improve the mesh
# A Master-Worker Architecture for Training Models with Evolution Strategies (ES)
A master-worker architecture for parallel, distributed training of neural networks (NN)
with Evolution Strategies (ES). The implemented ES algorithm is <cite>[Covariance Matrix Adaptation Evolution Strategy][1]</cite> (CMA-ES).
The implementation is based on <cite>[[2]]</cite> and <cite>[[3]]</cite>.
In this work, the model is a hard-attention vision model that solves a control task from image input. The models can be found in [solution.py](./algorithm/solution.py) and the task is defined in [task.py](./algorithm/task.py).
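The training follows the standard CMA-ES ask/evaluate/tell loop, with the fitness evaluations distributed to the workers. The following is a minimal sketch of that pattern using the `pycma` package; the `pool` object and its `evaluate` method are hypothetical placeholders for the worker communication and not the actual API of this repository.
```
# Minimal sketch of the master's CMA-ES loop (illustrative only).
# `pool` stands for a hypothetical worker pool that evaluates parameter
# vectors on the remote workers and returns the episode returns.
import cma

def train(initial_params, pool, population_size=64, init_sigma=0.1,
          max_iter=2000, n_repeat=16):
    es = cma.CMAEvolutionStrategy(initial_params, init_sigma,
                                  {"popsize": population_size})
    for iteration in range(max_iter):
        candidates = es.ask()                       # sample a population of parameter vectors
        # each candidate is rolled out n_repeat times; the mean return is its fitness
        returns = pool.evaluate(candidates, n_repeat=n_repeat)
        es.tell(candidates, [-r for r in returns])  # CMA-ES minimizes, so negate the returns
        if iteration % 10 == 0:
            print(f"iteration {iteration}: best return {max(returns):.2f}")
    return es.result.xbest                          # best parameter vector found
```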
## Start a Training
An example configuration can be found in [rc_car_config.yaml](./rc_car_config.yaml).
First, the workers have to be started on all machines:
```
bash start_workers.sh --config rc_car_config.yaml --num-workers 10
```
The number of workers `--num-workers` should be the same for all machines.
The master is then started with:
```
python run_master.py --config rc_car_config.yaml --worker-ip ip_1 ip_2 ... ip_n --num-worker 10 --exp-name 'experiment_name' --track
```
The option `--track` enables [Weights & Biases](https://wandb.ai/) tracking.
For more options, run `python run_master.py --help`.
## Evaluate
To evaluate a trained model, run:
```
python test.py --log-dir ./runs/ppo/exp_name --n-episodes 10
```
With `--env UnityEditor`, the Unity editor is used as the environment.
For more options, run `python test.py --help`.
The default model is `best_model.npz`; with `--model-filename` another model can be chosen.
With the option `--overplot`, the input image with the overlaid attention patches is shown.
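The attention patches shown by `--overplot` come from the hard patch-attention mechanism of the vision model: the input image is cut into overlapping patches, each patch is scored with self-attention, and only the `top_k` patch positions are passed to the controller (see the `patch_size`, `patch_stride` and `top_k` entries in the configuration). The sketch below is a simplified illustration of this idea in the spirit of [3], not the code in [solution.py](./algorithm/solution.py).
```
# Simplified illustration of hard patch attention (not the repository's solution.py).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extract_patches(img, patch_size=16, patch_stride=8):
    """Cut an (H, W, C) image into overlapping, flattened patches."""
    h, w, _ = img.shape
    patches, centers = [], []
    for y in range(0, h - patch_size + 1, patch_stride):
        for x in range(0, w - patch_size + 1, patch_stride):
            patches.append(img[y:y + patch_size, x:x + patch_size].reshape(-1))
            centers.append((y + patch_size / 2, x + patch_size / 2))
    return np.stack(patches), np.array(centers)

def top_k_patch_positions(img, w_query, w_key, top_k=10):
    """Score the patches with single-head self-attention and keep the top_k positions.
    w_query and w_key are the (evolved) projection matrices."""
    patches, centers = extract_patches(img)
    q, k = patches @ w_query, patches @ w_key
    scores = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)
    importance = scores.sum(axis=0)                # how much attention each patch receives
    idx = importance.argsort()[::-1][:top_k]       # indices of the most attended patches
    return centers[idx] / np.array(img.shape[:2])  # positions normalized to [0, 1]
```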
[1]: https://arxiv.org/abs/1604.00772
[2]: https://github.com/lerrytang/es_on_gke
[3]: https://github.com/google/brain-tokyo-workshop/tree/master/AttentionAgent
# CMA-ES configuration
CMA:
  population_size: 64          # number of candidate solutions per generation
  init_sigma: 0.1              # initial CMA-ES step size
Worker:
  simulation: simulation/rccar_outdoor_track
Master:
  seed: 0
  n_repeat: 16
  max_iter: 2000               # maximum number of training iterations
  eval_every_n_iter: 10        # evaluate the current solution every 10 iterations
  n_eval_roll_outs: 100        # number of roll-outs per evaluation
  timeout: 3600
Solution:
  image_size: 256
  query_dim: 4
  output_dim: 2
  output_activation: "tanh"
  activation: "tanh"
  num_hiddens: [64]
  l2_coefficient: 0
  patch_size: 16               # size of the image patches scored by self-attention
  patch_stride: 8
  top_k: 10                    # number of attended patches passed to the controller
  data_dim: 3
  normalize_positions: True
  use_lstm_controller: False
import sys
sys.path.append(".")
import argparse
import os
import numpy as np
import shutil
import yaml
import utils
from task import CarRacingTask
from solution import VisionTaskSolution
UNITY_PORT = 5012
import numpy as np
import algorithm.utils as utils
import yaml
from algorithm.solution import VisionTaskSolution
from algorithm.task import CarRacingTask
def run_solution(config):
@@ -20,8 +16,8 @@ def run_solution(config):
logger = utils.create_logger(name="test_solution", debug=False)
task = CarRacingTask(logger=logger)
task.init_task(sim_path=settings["Worker"]["simulation"], unity_port=UNITY_PORT)
    env_path = settings["Worker"]["simulation"] if config.env is None else config.env
task.init_task(sim_path=env_path, unity_port=config.port)
task.seed(config.seed)
@@ -100,6 +96,14 @@ if __name__ == "__main__":
parser.add_argument(
"--seed", help="Random seed for evaluation.", type=int, default=1
)
parser.add_argument(
"--port", help="Uniyt communication port", type=int, default=7050
)
parser.add_argument(
"--env",
default=None,
help="Path to Unity simulation. With 'UnityEditor' the editor scene is used",
)
args, _ = parser.parse_known_args()
run_solution(args)
# Publications - Autonomous driving and Reinforcement Learning
## Autonomous Driving
### Perception pipeline with classic path planning
[AMZ Driverless: The Full Autonomous Racing System](https://arxiv.org/pdf/1905.05150.pdf)
Visual cone detection: Mono and stereo camera. LIDAR cone detection.
Cone detection with YOLO and Keypoint regression for 3d pose estimate.
### Reinforcement Learning
[Reinforcement Learning Approach for Formula Student Technion Driverless](https://gip.cs.technion.ac.il/projects/uploads/180_preport_7.pdf)
CNN with a VAE as state encoder. Best model: SAC with a continuous action space. Stacked 4 consecutive frames. Learned a model for steering only.
[Deep Reinforcement Learning for Autonomous Driving](https://arxiv.org/abs/1811.11329)
DDPG algorithm. TORCS simulator. Simple features from the simulator. Reward: R = V\_x \* cos(angle) - V\_y \* sin(angle) - gamma \* |trackPos| - beta \* V\_x \* |trackPos|
[High-speed Autonomous Drifting with Deep Reinforcement Learning](https://arxiv.org/pdf/2001.01377.pdf)
State-Space model. SAC drift controller. CARLA Simulator. Features: state variables and errors between reference and current state.
[End-to-End Race Driving with Deep Reinforcement Learning](https://arxiv.org/pdf/1807.02371.pdf)
Features: raw RGB image. Simulator: World Rally Championship game. A3C algorithm, CNN+LSTM encoder. Reward: R = v \* (cos(alpha) - d), where alpha is the angle to the track axis and d the distance to the track center.
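As a small worked example of this kind of progress-based reward shaping (my own illustration, not code from either paper):
```
import numpy as np

def progress_reward(v, alpha, d):
    """R = v * (cos(alpha) - d): forward speed along the track axis is rewarded,
    misalignment alpha and normalized distance d from the track center are penalized."""
    return v * (np.cos(alpha) - d)

# driving at 5 m/s, 10 degrees off the track axis, 0.2 track half-widths from the center
print(progress_reward(5.0, np.deg2rad(10.0), 0.2))  # ~3.92
```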
### Imitation Learning
[Generative Adversarial Imitation Learning](https://arxiv.org/pdf/1606.03476.pdf)
### Supervised policy learning
## Reinforcement Learning
[Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347.pdf)
[Soft Actor-Critic](https://arxiv.org/pdf/1801.01290.pdf)
[Decision Transformer](https://arxiv.org/pdf/2106.01345.pdf)<br />
Reinforcement Learning via Sequence Modeling
## Misc.
[YOLOv3](https://arxiv.org/pdf/1804.02767.pdf)
[YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles](https://arxiv.org/pdf/2112.11798.pdf)
[Real-time 3D Pose Estimation with a Monocular Camera Using Deep Learning and Object Priors](https://arxiv.org/pdf/1809.10548.pdf)
[Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions](https://arxiv.org/pdf/2007.13971.pdf)
from flask import Flask, render_template, Response, request, json
import numpy as np
import cv2
import sys
import time
import torch
import pathlib
import yaml
import serial
from queue import Queue, Empty
from threading import Thread
import multiprocessing
from multiprocessing import Value, Process, Lock
sys.path.append("..")
from rl.utils import detect_cones
from yolo.model import TinyYoloModified
yolo = TinyYoloModified(2, 416)
if torch.cuda.is_available():
yolo = yolo.to("cuda")
yolo.load_weights("../yolo/checkpoints/TinyYoloModified_yellow.pth")
app = Flask(__name__)
camera = cv2.VideoCapture("/dev/video0") # use 0 for web camera
commands = multiprocessing.Queue()
global speed_cmd
speed_cmd = Value("d", 0)
global steer_cmd
steer_cmd = Value("d", 0)
cmd_lock = Lock()
image_queue = Queue()
global img_counter
img_counter = 0
np.set_printoptions(suppress=True, precision=4)
# real footage from Marshall camera with 220 deg fisheye
LENS_FOV_DEGREE = 220
LENS_IMAGE_CIRCLE_DIAMETER = 5.1  # mm
SENSOR_WIDTH = 5.76  # mm
SENSOR_HEIGHT = SENSOR_WIDTH / 3840 * 2160  # mm, derived from the 3840x2160 pixel sensor
SENSOR_PIXEL_WIDTH = 3840
SENSOR_PIXEL_HEIGHT = 2160
# remote control config
config_file = pathlib.Path(__file__).parent / "config.yml"
with open(config_file, "r") as f:
config = yaml.safe_load(f)
comport = config["serial_port"]
baudrate = config["serial_baudrate"]
ser = serial.Serial(comport, baudrate=baudrate, timeout=0.2)
speed_factor = float(config["speed_factor"])
steer_factor = float(config["steer_factor"])
def map_azimuth_elevation(img, outshape, pix_per_deg):
    # Remap the fisheye frame onto an azimuth/elevation grid: every output pixel
    # corresponds to a fixed viewing angle, which is projected back through the
    # equidistant (f-theta) lens model onto sensor pixel coordinates.
    LENS_MM_PER_DEG = 1.92 / 82.50  # lens projection: image height in mm per degree of field angle
    h, w = img.shape[:2]
pix_x, pix_y = np.meshgrid(
np.arange(0, outshape[1], dtype=np.float32),
np.arange(0, outshape[0], dtype=np.float32),
)
rad_x = (pix_x - outshape[1] // 2) / pix_per_deg * np.pi / 180
rad_y = (pix_y - outshape[0] // 2) / pix_per_deg * np.pi / 180
y = np.sin(rad_y)
x = np.sin(rad_x) * np.cos(y)
z = np.cos(rad_x) * np.cos(y)
phi = np.arctan2(y, x)
theta = np.arccos(z)
dist = LENS_MM_PER_DEG * (180 / np.pi) * theta
pos_x_mm = np.cos(phi) * dist
pos_y_mm = np.sin(phi) * dist
pix_x = pos_x_mm / SENSOR_WIDTH * w + w / 2
pix_y = pos_y_mm / SENSOR_HEIGHT * h + h / 2
out_img = cv2.remap(img, pix_x, pix_y, cv2.INTER_LINEAR)
return out_img
def send_control_cmd(speed, steer, gear):
"""
speed :float -1...1
steer :float -1...1
gear: bool
"""
# print("speed", speed, "steer:", steer)
    # quadratic throttle mapping with a minimum command offset outside the dead band
    if speed > 0.1:
        speed = speed**2 * 0.9 * abs(speed_factor) + 10
    elif speed < -0.1:
        speed = -(speed**2) * 0.9 * abs(speed_factor) - 10
    speed = int(speed if speed_factor >= 0 else -speed)
    steer = int(steer * steer_factor)
    # negative values are sent as unsigned bytes (two's complement in one byte)
    speed = speed + 256 if speed < 0 else speed
    steer = steer + 256 if steer < 0 else steer
    bits = 1 * gear
    # command frame: two hex digits each for speed, steer and the gear flag, newline-terminated
    s = f"{speed:02x}{steer:02x}{bits:02x}\n"
    ser.write(s.encode("utf8"))
def control_loop(spe_cmd, ste_cmd, lock):
gear = False
while True:
start = time.time()
# ser.flush()
with lock:
speed = spe_cmd.value
steer = ste_cmd.value
# send_control_cmd(spe_cmd.value, ste_cmd.value, gear)
# print(speed, steer)
# send_control_cmd(speed, steer, gear)
# print(1./25-(time.time()-start))
time.sleep(max(1.0 / 2 - (time.time() - start), 0))
def capture_frames(): # generate frame by frame from camera
pix_per_deg = 10
outshape = (65 * pix_per_deg, 87 * pix_per_deg)
while True:
# Capture frame-by-frame
success, frame = camera.read() # read the camera frame
if not success:
break
else:
frame = map_azimuth_elevation(frame, outshape, pix_per_deg)
frame1 = cv2.resize(frame, (768, 512), interpolation=cv2.INTER_AREA)
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
detection = detect_cones(
yolo, frame, img_size=416, conf_thres=0.1, nms_thres=0.5
)
for box in detection:
width = int(box[2] * 0.5)
height = int(box[3] * 0.5)
x = int(box[0])
y = int(box[1])
color = (0, 0, 255) if box[4] == 0 else (230, 220, 10)
frame = cv2.rectangle(
frame,
(x, y),
(x + width, y + height),
color,
3,
)
frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
image_queue.put_nowait(frame1)
time.sleep(0.05)
def gen_images():
frame = None
while True:
try:
frame = image_queue.get_nowait()
ret, buffer = cv2.imencode(".jpg", frame)
frame = buffer.tobytes()
yield (
b"--frame\r\n" b"Content-Type: image/jpeg\r\n\r\n" + frame + b"\r\n"
) # concat frame one by one and show result
except Empty:
pass
@app.route("/video_feed")
def video_feed():
# Video streaming route. Put this in the src attribute of an img tag
return Response(gen_images(), mimetype="multipart/x-mixed-replace; boundary=frame")
@app.route("/drive", methods=["GET"])
def drive():
with cmd_lock:
speed_cmd.value = float(request.args.get("speed"))
steer_cmd.value = float(request.args.get("steer"))
# commands.put({"speed": speed, "steer": steer})
return app.response_class(json.dumps(True), content_type="application/json")
@app.route("/capture", methods=["GET"])
def capture():
global img_counter
print("capture")
if not image_queue.empty():
img_counter += 1
frame = image_queue.get_nowait()
cv2.imwrite(f"frames/frame{img_counter}.png", frame)
return app.response_class(json.dumps(True), content_type="application/json")
@app.route("/")
def index():
"""Video streaming home page."""
return render_template("index.html")
if __name__ == "__main__":
# Thread(target=control_loop, daemon=True).start()
# Process(target=capture_frames).start()
# speed_cmd.value = 0.3
# steer_cmd.value = 0.5
    p = multiprocessing.Process(
        target=control_loop, args=(speed_cmd, steer_cmd, cmd_lock)
    )
    p.start()
Thread(target=capture_frames, daemon=True).start()
app.run(host="0.0.0.0", debug=True)
import sys
sys.path.append("../evo2")
sys.path.append("../rl")
import os
import yaml
import cv2
import numpy as np
import base64
from rl.ppo_continous import Agent
from rl.cardriver_env import DummyEnv
from evo2.solution import VisionTaskSolution
import torch
import abc
import time
frame_rate = 20
def load_attention_agent(path, model, overplot=False):
with open(os.path.join(path, "config.yaml"), "r") as f:
settings = yaml.safe_load(f)
solution = VisionTaskSolution(
image_size=settings["Solution"]["image_size"],
query_dim=settings["Solution"]["query_dim"],
output_dim=settings["Solution"]["output_dim"],
output_activation=settings["Solution"]["output_activation"],
num_hiddens=settings["Solution"]["num_hiddens"],
l2_coefficient=settings["Solution"]["l2_coefficient"],
patch_size=settings["Solution"]["patch_size"],
patch_stride=settings["Solution"]["patch_stride"],
top_k=settings["Solution"]["top_k"],
data_dim=settings["Solution"]["data_dim"],
activation=settings["Solution"]["activation"],
normalize_positions=settings["Solution"]["normalize_positions"],
use_lstm_controller=settings["Solution"]["use_lstm_controller"],
use_conv_features=settings["Solution"]["use_conv_features"],
show_overplot=overplot,
)
model_file = os.path.join(path, model + ".npz")
solution.load(model_file)
solution.set_log_dir("runs/exp_outdoor")
return solution
class BasePolicyController:
def __init__(self, cam, speed, steer, stop):
self.speed = speed
self.steer = steer
self.stop = stop
self.agent = None
self.cam = cam
def drive(self):
raise NotImplementedError()
class RLPolicyController(BasePolicyController):
def __init__(self, path, cam, speed, steer, stop, n_stacked_obs=8):
super(RLPolicyController, self).__init__(cam, speed, steer, stop)
dummy_env = DummyEnv()
self.agent = Agent(dummy_env)
self.agent.load_agent(path)
self.state = np.zeros((n_stacked_obs, 6, 3))
def drive(self):
self.stop.value = False
while True:
start = time.time()
# print("Speed:", speed.value, "Steer:", steer.value)
detections = self.cam.get_detections()
            obs = self.get_obs_from_detections(detections)
obs = torch.Tensor(obs)
action = self.agent.get_action(obs)
action = action.numpy()
self.speed.value = np.clip(action[0], -0.2, 1.0)
self.steer.value = np.clip(action[1], -1.0, 1.0)
print("speed action:", action[0])
if self.stop.value:
self.speed.value = 0
break
time.sleep(max(1.0 / frame_rate - (time.time() - start), 0))
    def get_obs_from_detections(self, detection):
        # shift the stacked observations by one slot; slot 0 receives the newest frame
        self.state[1:, :, :] = self.state[:-1, :, :]
        # sort detection by box size
bb_area = np.prod(detection[:, 2:4] - detection[:, :2], axis=1)
idx_sorted = bb_area.argsort()[::-1] # reversed order
detection = detection[idx_sorted]
        # separate blue/yellow cones
blue = detection[detection[:, 5] == 0]
yellow = detection[detection[:, 5] == 1]
# blue cones
if len(blue) > 0:
obs = 3 if len(blue) > 3 else len(blue)
bb_box_size = blue[:, 2:4] - blue[:, :2]
self.state[0, :obs, :2] = (
blue[:obs, :2] + bb_box_size[:obs, :2] * 0.5
) # Position normalized
self.state[0, :obs, 0] /= 768
self.state[0, :obs, 1] /= 512
self.state[0, :obs, 2] = blue[:obs, 4] # object confidence
# yellow cones
if len(yellow) > 0:
obs = 3 if len(yellow) > 3 else len(yellow)
bb_box_size = yellow[:, 2:4] - yellow[:, :2]
self.state[0, 3 : 3 + obs, :2] = (
yellow[:obs, :2] + bb_box_size[:obs, :2] * 0.5
) # Position
self.state[0, 3 : 3 + obs, 0] /= 768
self.state[0, 3 : 3 + obs, 1] /= 512
self.state[0, 3 : 3 + obs, 2] = yellow[:obs, 4] # object confidence
return self.state.reshape(-1)
class AttentionPolicyController(BasePolicyController):
def __init__(self, path, model, cam, speed, steer, stop):
super(AttentionPolicyController, self).__init__(cam, speed, steer, stop)
self.agent = load_attention_agent(path, model, overplot=True)
def drive(self):
self.stop.value = False
alpha = 0.9
while True:
start = time.time()
# print("Speed:", speed.value, "Steer:", steer.value)
obs = self.cam.get_frame()
action = self.agent.get_output(obs)
            # low-pass filter the speed command to smooth acceleration
            self.speed.value = self.speed.value * alpha + (1 - alpha) * np.clip(action[0], -0.6, 0.7)
            self.steer.value = np.clip(action[1], -1.0, 1.0)
print("speed action:", action[0])
if self.stop.value:
self.speed.value = 0
break
time.sleep(max(1.0 / frame_rate - (time.time() - start), 0))
# Reinforcement Learning Library for Training the RC Car
A small library for training the RC car with reinforcement learning algorithms.
The two algorithms Proximal Policy Optimization <cite>[PPO][1]</cite> and Soft Actor-Critic <cite>[SAC][2]</cite> are implemented.
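As a reminder of the core idea behind PPO, the clipped surrogate objective fits in a few lines. The snippet below is a generic sketch of that loss, not the loss implementation of this library.
```
# Generic PPO clipped surrogate loss (illustrative sketch, not this library's code).
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """L = -E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)] with r = pi_new / pi_old."""
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```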
## Train
A new experiment can be started by running `train.py`:
```
python train.py --config train_ppo.yaml --log-dir runs/ppo --exp-name 'experiment_name' --track --port 8001 --cuda
```
The option `--track` enables [Weights & Biases](https://wandb.ai/) tracking and
`--port` defines the communication port for the UnityEnvironment.
For more options, run `python train.py --help`.
Example configurations can be found in [train_ppo.yaml](./train_ppo.yaml) and [train_sac.yaml](./train_sac.yaml).
The environments are saved in [../simulations](../simulations)
## Evaluate
To evaluate a trained model, run:
```
...
```
The default model is `best_model.pth`; with `--model-filename` another model can be chosen.
With `--env UnityEditor`, the Unity editor is used as the environment.
The file [policy_gradient.py](./policy_gradient.py) contains a simple from-scratch implementation of the policy gradient algorithm in NumPy, as an example to understand the theorem and the algorithm.
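For readers who want the idea without opening the file, here is a minimal REINFORCE-style policy gradient sketch in NumPy for a discrete-action environment with the classic Gym-style `reset`/`step` interface. It only illustrates the algorithm and is not the contents of [policy_gradient.py](./policy_gradient.py).
```
# Minimal REINFORCE sketch with a linear softmax policy (illustrative only).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce(env, obs_dim, n_actions, lr=0.01, gamma=0.99, episodes=500):
    theta = np.zeros((obs_dim, n_actions))              # linear policy parameters
    for _ in range(episodes):
        states, actions, rewards = [], [], []
        obs, done = env.reset(), False
        while not done:
            probs = softmax(obs @ theta)                 # pi(a | s)
            action = np.random.choice(n_actions, p=probs)
            next_obs, reward, done, _ = env.step(action)
            states.append(obs); actions.append(action); rewards.append(reward)
            obs = next_obs
        # discounted return G_t for every time step
        returns, running = np.zeros(len(rewards)), 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        # gradient ascent on E[log pi(a_t | s_t) * G_t]
        for s, a, g in zip(states, actions, returns):
            probs = softmax(s @ theta)
            grad_log_pi = -np.outer(s, probs)            # d log pi / d theta, all actions
            grad_log_pi[:, a] += s                       # extra term for the taken action
            theta += lr * g * grad_log_pi
    return theta
```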
......
numpy
torch
torchvision
tensorboard
tqdm
imgaug
Pillow
import sys
sys.path.append(".")
import argparse
import os
import numpy as np
import shutil
import yaml
import utils
from cardriver_env import CarDriverEnvCont, CarDriverEnvCont2
from algorithm.ppo import PPOAgent, LSTM_PPOAgent
from algorithm.ppo import LSTM_PPOAgent, PPOAgent
from algorithm.sac import SACAgent
from cardriver_env import CarDriverEnvCont
#from ppo_test import Agent
@@ -62,7 +61,7 @@ def run_solution(config):
num_features=settings["env"]["num_features"],
worker_id=config.port,
)
env.unity_env.seed(config.seed)
env.env.seed(config.seed)
rewards = []
......
# A YOLO Implementation for Detecting Cones
A PyTorch YOLO implementation for detecting cones. The detected cones were used as features for the reinforcement learning algorithms.
## Train
A new training can be started with:
```
python train.py -m config/tiny_yolo.cfg -d config/cones.data -e 300 --evaluation_interval 10 --run_id 'experiment_name' --pretrained_weights weights/yolov3-tiny.weights
```
For more options, run `python train.py --help`.
An example configuration can be found in [config/tiny_yolo.cfg](config/tiny_yolo.cfg), and the dataset is defined in [config/cones.data](config/cones.data).
## Evaluate
To evaluate a trained model on a test set, run:
```
python test.py -w checkpoints/saved_model.pth -d ./config/test_s.data
```
For more options, run `python test.py --help`.
numpy
torch
torchvision
tensorboard
tqdm
imgaug
Pillow