Hello friends! We are back with another useful topic as usual. Have you heard about DeepRacer? 🤔🤔🤔 Don't worry if you haven't yet; today we will take a deep look at a DeepRacer event through one of our past results. AWS DeepRacer is a service created by Amazon Web Services (AWS) to make learning machine learning (ML) easy. It uses a type of learning called reinforcement learning, and with it we can build autonomous-driving models for vehicles in both virtual and physical environments. All the details are in the AWS DeepRacer Developer Guide. Here we are going to share the experiences we gathered during the training period and on the virtual race day; some are good and some are bad, but all of them should be useful for future racers.
AWS DeepRacer League
Make your car
The first thing we need to do in the console is create an agent; in DeepRacer this is a 1/18th-scale car. The Your garage section lists all the cars that have already been built and also provides a button to create new ones.
If we click that button, it asks for the car's specifications, action space, and personalization. The DeepRacer team recommends selecting the Camera only radio button under Sensor modification.
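For context, the action space is simply the list of (steering angle, speed) combinations the agent is allowed to pick from at every step. The snippet below is only our own illustrative sketch of that idea; the console builds the real action space for you through the UI, and these values are made up for the example.

# Illustrative sketch only - the console configures this through the UI,
# and these (steering angle, speed) pairs are invented for the example.
example_action_space = [
    {'steering_angle': -30.0, 'speed': 1.0},
    {'steering_angle': -15.0, 'speed': 2.0},
    {'steering_angle': 0.0, 'speed': 3.0},
    {'steering_angle': 15.0, 'speed': 2.0},
    {'steering_angle': 30.0, 'speed': 1.0},
]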
Make your model
Reward functions
1] Time trial - follow the center line (Default)
def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''

    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    return float(reward)
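If you want to sanity-check a reward function before submitting it, you can simply call it locally with a hand-made params dictionary. The values below are invented for illustration and only cover the keys this particular function reads.

# Hand-made input for a quick local test (values are illustrative only)
sample_params = {
    'track_width': 1.0,
    'distance_from_center': 0.2,  # between marker_1 (0.1) and marker_2 (0.25)
}
print(reward_function(sample_params))  # prints 0.5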
2] Time trial - stay inside the two borders
def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''

    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']

    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and
    # the agent is somewhere in between the track borders
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return float(reward)
3] Time trial - prevent zig-zag
def reward_function(params):
    '''
    Example of penalizing steering, which helps mitigate zig-zag behaviors
    '''

    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    steering = abs(params['steering_angle'])  # Only need the absolute steering angle

    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the agent is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the agent is steering too much
    if steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)
4] Object avoidance and head-to-head - stay in one lane and avoid crashing (default for OA and h2h)
def reward_function(params):
    '''
    Example of rewarding the agent to stay inside two borders
    and penalizing getting too close to the objects in front
    '''

    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    objects_distance = params['objects_distance']
    _, next_object_index = params['closest_objects']
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']

    # Initialize reward with a small number but not zero
    # because zero means off-track or crashed
    reward = 1e-3

    # Reward if the agent stays inside the two borders of the track
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward_lane = 1.0
    else:
        reward_lane = 1e-3

    # Penalize if the agent is too close to the next object
    reward_avoid = 1.0

    # Distance to the next object
    distance_closest_object = objects_distance[next_object_index]

    # Decide if the agent and the next object are on the same lane
    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center

    if is_same_lane:
        if 0.5 <= distance_closest_object < 0.8:
            reward_avoid *= 0.5
        elif 0.3 <= distance_closest_object < 0.5:
            reward_avoid *= 0.2
        elif distance_closest_object < 0.3:
            reward_avoid = 1e-3  # Likely crashed

    # Calculate reward by putting different weights on
    # the two aspects above
    reward += 1.0 * reward_lane + 4.0 * reward_avoid

    return reward
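The object-related parameters are the least obvious ones, so here is a hand-made example of what they could look like for a single step; the values are invented purely for illustration.

# Invented example input for one step (illustration only)
sample_params = {
    'all_wheels_on_track': True,
    'track_width': 1.0,
    'distance_from_center': 0.1,              # safely inside the borders
    'objects_distance': [2.5, 0.4],           # distance along the track to each object
    'closest_objects': [0, 1],                # indices of the objects behind and in front
    'objects_left_of_center': [True, False],
    'is_left_of_center': False,               # same lane as the object in front
}
print(reward_function(sample_params))  # ~1.801 = 1e-3 + 1.0*1.0 + 4.0*0.2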
Function that we used in training before the race day (model M20)
import math

def reward_function(params):
    # Read input variables
    progress = params['progress']
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    reward = 1.0

    # Big bonus for completing the lap
    if progress == 100:
        reward += 100

    # Calculate the direction of the center line based on the closest waypoints
    next_point = waypoints[closest_waypoints[1]]
    prev_point = waypoints[closest_waypoints[0]]

    # Calculate the direction in radians, arctan2(dy, dx), the result is (-pi, pi)
    track_direction = math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    # Convert to degrees
    track_direction = math.degrees(track_direction)

    # Calculate the difference between the track direction and the heading direction of the car
    direction_diff = abs(track_direction - heading)

    # Penalize the reward if the difference is too large
    DIRECTION_THRESHOLD = 10.0
    malus = 1
    if direction_diff > DIRECTION_THRESHOLD:
        malus = 1 - (direction_diff / 50)
        if malus < 0 or malus > 1:
            malus = 0
    reward *= malus

    return reward
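To see how the direction penalty behaves, the small loop below reproduces the malus calculation for a few deviation angles (the angles are just illustrative values): up to 10 degrees there is no penalty, the reward is halved at a 25-degree deviation, and it drops to zero once the deviation reaches 50 degrees or more.

# Reproducing the malus calculation for a few illustrative angles
for direction_diff in (5, 25, 60):
    malus = 1 - (direction_diff / 50) if direction_diff > 10.0 else 1
    if malus < 0 or malus > 1:
        malus = 0
    print(direction_diff, malus)  # 5 -> 1, 25 -> 0.5, 60 -> 0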
Function that we used on the race day (model M11)
import math

def reward_function(params):
    '''
    Use square root for center line - ApiDragons-M11
    '''
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    speed = params['speed']
    progress = params['progress']
    all_wheels_on_track = params['all_wheels_on_track']

    SPEED_THRESHOLD = 3

    # Reward decreases with the fourth power of the normalised distance from the center line
    reward = 1 - (distance_from_center / (track_width / 2)) ** 4
    if reward < 0:
        reward = 0

    # Penalize going faster than the speed threshold
    if speed > SPEED_THRESHOLD:
        reward *= 0.8

    # No reward if any wheel is off the track
    if not all_wheels_on_track:
        reward = 0

    # Big bonus for completing the lap
    if progress == 100:
        reward += 100

    return float(reward)
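The centre-line term of this function falls off with the fourth power of the normalised distance, so the reward stays close to 1 over a wide band around the middle of the track and only collapses near the edges. The snippet below simply evaluates that term for a few illustrative distances, assuming a 1.0 m track width.

# Evaluating the centre-line term only (track_width assumed to be 1.0 m)
track_width = 1.0
for distance_from_center in (0.0, 0.1, 0.3, 0.45):
    reward = 1 - (distance_from_center / (track_width / 2)) ** 4
    print(distance_from_center, max(reward, 0))
    # roughly: 0.0 -> 1.0, 0.1 -> 0.998, 0.3 -> 0.87, 0.45 -> 0.344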
Model training and evaluation
- We used 3 m/s as the speed, but we should have chosen the maximum value, 4 m/s, for the action space.
- We trained for long periods such as 8 or 10 hours, but training for 2 to 4 hours is enough to get better results.
- We did not trust the default reward functions and always looked for a complex one, but that was a mistake; the results would have been better if the reward function had been a little simpler than the one we used.
- Before the race day we were only concerned about the best lap time on the leaderboard, so we selected model M11 for the race, but it was not very stable. If we had gone with model M20, which had already completed 5 laps at 100%, the final results would have been more joyful even though its lap time was slower, because it was a more stable model than M11.