
Friday, November 13, 2020

Predicting qualification ranking based on practice session performance for Formula 1 Grand Prix

If you’re a Formula 1 (F1) fan, have you ever wondered why F1 teams have very different performances between qualifying and practice sessions? Why do they have multiple practice sessions in the first place? Can practice session results actually tell something about the upcoming qualifying race? In this post, we answer these questions and more. We show you how we can predict qualifying results based on practice session performances by harnessing the power of data and machine learning (ML). These predictions are being integrated into the new “Qualifying Pace” insight for each F1 Grand Prix (GP). This work is part of the continuous collaboration between F1 and the Amazon ML Solutions Lab to generate new F1 Insights powered by AWS.

Each F1 GP consists of several stages. The event starts with three practice sessions (P1, P2, and P3), followed by a qualifying (Q) session, and then the final race. Teams approach practice and qualifying sessions differently because these sessions serve different purposes. The practice sessions are the teams’ opportunities to test out strategies and tire compounds to gather critical data in preparation for the final race. They observe the car’s performance with different strategies and tire compounds, and use this to determine their overall race strategy.

In contrast, qualifying sessions determine the starting position of each driver on race day. Teams focus solely on obtaining the fastest lap time. Because of this shift in tactics, Friday and Saturday practice session results often fail to accurately predict the qualifying order.

In this post, we introduce deterministic and probabilistic methods to model the time difference between the fastest lap time in practice sessions and the qualifying session (∆t = tq-tp). The goal is to more accurately predict the upcoming qualifying standings based on the practice sessions.
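
To make ∆t concrete, here is how the gap would be computed from hypothetical fastest-lap times (the drivers and times below are made up for illustration):

```python
# Illustrative computation of the practice-to-qualifying gap
# delta_t = t_q - t_p, using hypothetical fastest-lap times in seconds.
practice_best = {"HAM": 64.31, "VER": 64.58, "BOT": 64.40}
qualifying_best = {"HAM": 62.95, "VER": 63.22, "BOT": 63.10}

delta_t = {d: qualifying_best[d] - practice_best[d] for d in practice_best}
for driver, dt in sorted(delta_t.items(), key=lambda kv: kv[1]):
    print(f"{driver}: {dt:+.2f}s")
```

A negative ∆t means the qualifying lap was faster than the best practice lap, which is the typical case once fuel is minimized and the softest tires are fitted.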

Error sources of ∆t

The delta of the fastest lap time between practice and qualifying sessions (∆t) comes primarily from variations in fuel level and tire grip.

A higher fuel level adds weight to the car and reduces the speed of the car. For practice sessions, teams vary the fuel level as they please. For the second practice session (P2), it’s common to begin with a low fuel level and run with more fuel in the latter part of the session. During qualifying, teams use minimal fuel levels in order to record the fastest lap time. The impact of fuel on lap time varies from circuit to circuit, depending on how many straights the circuit has and how long these straights are.

Tires also play a significant role in an F1 car’s performance. During each GP event, the tire supplier brings various tire types with varying compounds suitable for different racing conditions. Two of these are for wet circuit conditions: intermediate tires for light standing water and wet tires for heavy standing water. The remaining dry running tires can be categorized into three compound types: hard, medium, and soft. These tire compounds provide different grips to the circuit surface. The more grip the tire provides, the faster the car can run.

Past racing results showed that car performance dropped significantly when wet tires were used. For example, in the 2018 Italian GP, because the P1 session was wet and the qualifying session was dry, the fastest lap time in P1 was more than 10 seconds slower than in the qualifying session.

Among the dry running types, the hard tire provides the least grip but is the most durable, whereas the soft tire has the most grip but is the least durable. Tires degrade over the course of a race, which reduces the tire grip and slows down the car. Track temperature and moisture affect the progression of degradation, which in turn changes the tire grip. As with fuel level, the tire impact on lap time changes from circuit to circuit.

Data and attempted approaches

Given this understanding of factors that can impact lap time, we can use fuel level and tire grip data to estimate the final qualifying lap time based on known practice session performance. However, as of this writing, data records to directly infer fuel level and tire grip during the race are not available. Therefore, we take an alternative approach with data we can currently obtain.

The data we used in the modeling were records of fastest lap times for each GP since 1950 and partial years of weather data for the corresponding sessions. The lap time data included the fastest lap time for each session (P1, P2, P3, and Q) of each GP, along with the driver, car, team, and circuit name (publicly available on F1’s website). Track wetness and temperature for each corresponding session were available in the weather data.

We explored two implicit methods with the following model inputs: the team and driver name, and the circuit name. Method one was a rule-based empirical model that attributed the observed ∆t to circuits and teams. We estimated the latent parameter values (fuel level and tire grip differences specific to each team and circuit) based on their known lap time sensitivities. These sensitivities were provided by F1 and calculated through simulation runs on each circuit track. Method two was a regression model with driver and circuit indicators. The regression model learned the sensitivity of ∆t for each driver on each circuit without explicitly knowing the fuel level and tire grip exerted. We developed and compared deterministic models using XGBoost and AutoGluon, and probabilistic models using PyMC3.

We built models using race data from 2014 to 2019, and tested against race data from 2020. We excluded data from before 2014 because there were significant car development and regulation changes over the years. We removed races in which either the practice or qualifying session was wet because ∆t for those sessions were considered outliers.
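
The filtering rules above can be sketched as follows (the records are hypothetical; the real data set holds one row per session):

```python
# Sketch of the data-filtering rules: keep 2014-2019 for training,
# 2020 for testing, and drop wet sessions, whose delta_t are outliers.
records = [
    {"year": 2013, "gp": "Monaco", "wet": False},   # pre-2014 regulations -> excluded
    {"year": 2016, "gp": "Monza", "wet": False},
    {"year": 2018, "gp": "Monza", "wet": True},     # wet session -> excluded
    {"year": 2020, "gp": "Austria", "wet": False},  # held out for testing
]

train = [r for r in records if 2014 <= r["year"] <= 2019 and not r["wet"]]
test = [r for r in records if r["year"] == 2020 and not r["wet"]]
```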

Managed model training with Amazon SageMaker

We trained our regression models on Amazon SageMaker.

Amazon SageMaker is a fully managed service that enables developers and data scientists to build, train, and deploy ML models quickly. For model training in particular, it provides several features that assisted our process.

For our use case, we iterated over multiple choices of model feature sets and hyperparameters. Recording and comparing the model metrics of interest was critical to choosing the most suitable model. The Amazon SageMaker API allowed us to define customized metrics before launching a training job and to retrieve them easily after the job was complete. Using the automatic model tuning feature reduced the mean squared error (MSE) on the test data by 45% compared to the default hyperparameter choices.
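
The shape of that setup can be summarized as plain configuration data. The regex-style metric definition and the tuning ranges below mirror what SageMaker's hyperparameter tuning expects, but the names and values are made up for this sketch, not the project's actual settings:

```python
# Illustrative configuration for a SageMaker-style tuning job.
# A custom metric is scraped from the training logs via a regex,
# and the tuner searches the declared hyperparameter ranges.
metric_definitions = [
    {"Name": "validation:mse", "Regex": "validation-mse=([0-9\\.]+)"},
]
hyperparameter_ranges = {
    "max_depth": {"type": "integer", "min": 3, "max": 10},
    "eta": {"type": "continuous", "min": 0.01, "max": 0.3},
}
tuning_job = {
    "objective_metric": "validation:mse",
    "objective_type": "Minimize",   # lower MSE is better
    "max_jobs": 20,
    "metric_definitions": metric_definitions,
    "ranges": hyperparameter_ranges,
}
```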

We trained an XGBoost model using Amazon SageMaker’s built-in implementation, which allowed us to run model training through a general estimator interface. This approach provided better logging, superior hyperparameter validation, and a larger set of metrics than the original open-source implementation.

Rule-based model

In the rule-based approach, we reason that the differences in lap times ∆t come primarily from systematic variations of tire grip for each circuit and fuel level for each team between practice and qualifying sessions. After accounting for these known variations, we assume the residuals are small random numbers with a mean of zero. ∆t can be modeled with the following equation:

∆t = ∆t_f(c)·f(t,c) + ∆t_g(c)·g(c) + ϵ

∆t_f(c) and ∆t_g(c) are the known sensitivities of lap time to fuel mass and tire grip, and ϵ is the residual. A hierarchy exists among the factors contained in the equation. We assume grip variations for each circuit (g(c)) are at the top level. Under each circuit, there are variations of fuel level across teams (f(t,c)).

To further simplify the model, we neglect the residual ϵ because we assume it is small. We further assume the fuel variation for each team is the same across all circuits (that is, f(t,c) = f(t)). This simplifies the model to the following:

∆t = ∆t_f(c)·f(t) + ∆t_g(c)·g(c)

Because ∆t_f(c) and ∆t_g(c) are known, we can estimate the team fuel variations f(t) and the circuit grip variations g(c) from the data.
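
A minimal sketch of this estimation, with made-up sensitivities and lap-time gaps: the simplified model is written as a linear system and solved by least squares. (A global fuel/grip trade-off is not identifiable from ∆t alone, so the check is on the fitted predictions rather than on the individual parameters.)

```python
# Rule-based sketch: delta_t(t, c) = s_fuel(c) * f(t) + s_grip(c) * g(c),
# where the sensitivities s_fuel and s_grip are known per circuit and
# the latent f(t), g(c) are recovered by least squares.
import numpy as np

teams, circuits = ["A", "B"], ["Monza", "Singapore"]
s_fuel = {"Monza": 0.030, "Singapore": 0.035}   # s per kg of fuel (assumed)
s_grip = {"Monza": -1.2, "Singapore": -2.0}     # s per unit of grip (assumed)

# Observed fastest-lap gaps per (team, circuit), in seconds (made up).
obs = {("A", "Monza"): -0.30, ("B", "Monza"): 0.00,
       ("A", "Singapore"): -1.65, ("B", "Singapore"): -1.30}

pairs = list(obs)
X = np.zeros((len(pairs), len(teams) + len(circuits)))
y = np.array([obs[p] for p in pairs])
for i, (t, c) in enumerate(pairs):
    X[i, teams.index(t)] = s_fuel[c]                  # fuel term for f(t)
    X[i, len(teams) + circuits.index(c)] = s_grip[c]  # grip term for g(c)

params, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ params
print("max fit error:", np.max(np.abs(pred - y)))
```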

The differences in the sensitivities depend on the characteristics of circuits. From the following track maps, we can observe that the Italian GP circuit has fewer corner turns and the straight sections are longer compared to the Singapore GP circuit. Additional tire grip gives a larger advantage in the Singapore GP circuit.

 

ML regression model

For the ML regression method, we don’t directly model the relation between ∆t and the fuel level and grip variations. Instead, we fit the following regression model with just the circuit, team, and driver indicator variables:

∆t = Σc αc·Ic + Σt βt·It + Σd γd·Id + ϵ

Ic, It, and Id represent the indicator variables for circuits, teams, and drivers, and αc, βt, and γd are the corresponding coefficients learned from the data.
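
The structure of this indicator (one-hot) regression can be sketched on made-up data; this is not the production pipeline, only the shape of the model:

```python
# Indicator-variable regression sketch: one-hot encode circuit, team,
# and driver, then fit delta_t by least squares.
import numpy as np

rows = [  # (circuit, team, driver, delta_t) -- hypothetical observations
    ("Monza", "A", "HAM", -1.4), ("Monza", "B", "VER", -1.1),
    ("Spa", "A", "HAM", -0.9), ("Spa", "B", "VER", -0.6),
]
circuits = sorted({r[0] for r in rows})
teams = sorted({r[1] for r in rows})
drivers = sorted({r[2] for r in rows})

def one_hot(value, vocab):
    return [1.0 if value == v else 0.0 for v in vocab]

X = np.array([one_hot(c, circuits) + one_hot(t, teams) + one_hot(d, drivers)
              for c, t, d, _ in rows])
y = np.array([r[3] for r in rows])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # min-norm solution
pred = X @ coef
```

In the real data set, each driver appears on many circuits, so the model can separate a driver's typical pace change from circuit-specific effects.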

Hierarchical Bayesian model

Another challenge with modeling the race pace was the noise in lap time measurements. The magnitude of the random effect (ϵ) of ∆t could be non-negligible. Such randomness might come from drivers accidentally drifting off their normal line at turns, or from random variations in drivers’ efforts during practice sessions. The deterministic approaches didn’t appropriately capture this random effect. Ideally, we wanted a model that could quantify uncertainty about the predictions. Therefore, we explored Bayesian sampling methods.

With a hierarchical Bayesian model, we account for the hierarchical structure of the error sources. As with the rule-based model, we assume grip variations for each circuit (g(c)) are at the top level. The additional benefit of a hierarchical Bayesian model is that it incorporates individual-level variations when estimating group-level coefficients. It’s a middle ground between two extreme views of the data. One extreme is to pool the data for every group (circuit and driver) without considering the intrinsic variations among groups. The other extreme is to train a separate regression model for each circuit or driver; with 21 circuits, this amounts to 21 regression models. With a hierarchical model, we have a single model that considers the variations simultaneously at the group and individual levels.
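
The partial-pooling idea behind this middle ground can be illustrated with a simple shrinkage estimate. The real model uses full Bayesian inference with PyMC3; the numbers and the shrinkage weight here are made up:

```python
# Partial pooling sketch: each circuit's mean delta_t is pulled toward
# the global mean, with less shrinkage for circuits with more data.
delta_t_by_circuit = {
    "Monza": [-1.4, -1.3, -1.5],
    "Singapore": [-2.0],          # few observations -> shrinks more
}
all_obs = [x for xs in delta_t_by_circuit.values() for x in xs]
grand_mean = sum(all_obs) / len(all_obs)

tau = 2.0  # pseudo-count controlling shrinkage strength (assumed)
pooled = {}
for circuit, xs in delta_t_by_circuit.items():
    n = len(xs)
    group_mean = sum(xs) / n
    weight = n / (n + tau)   # more data -> trust the group mean more
    pooled[circuit] = weight * group_mean + (1 - weight) * grand_mean
```

With three observations, Monza's estimate stays close to its own mean; Singapore's single observation is pulled noticeably toward the global mean, which is exactly the behavior the hierarchical model provides automatically.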

We can mathematically describe the underlying statistical model for the hierarchical Bayesian approach as the following varying intercepts model:

∆t_i = μ_j[i],k[i] + β_p·w_p,i + β_q·w_q,i + β_T·∆T_i + ϵ_i,   with μ_jk ~ Normal(θ_k, σ_μ²)

Here, i is the index of each data observation, j the index of each driver, and k the index of each circuit. μ_jk is the varying intercept for each driver under each circuit, and θ_k is the varying intercept for each circuit. w_p and w_q are the wetness levels of the track during the practice and qualifying sessions, and ∆T is the track temperature difference.

Test models in the 2020 races

After predicting ∆t, we added it to the practice lap times to generate predicted qualifying lap times. We determined the final ranking based on the predicted qualifying lap times. Finally, we compared the predicted lap times and rankings with the actual results.
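
These steps can be sketched as follows, with hypothetical practice times and predicted gaps in seconds:

```python
# Turn predicted delta_t into a predicted qualifying order:
# predicted qualifying time = practice best + predicted delta_t.
practice_best = {"HAM": 64.31, "VER": 64.58, "BOT": 64.40}
predicted_delta_t = {"HAM": -1.30, "VER": -1.45, "BOT": -1.25}

predicted_q = {d: practice_best[d] + predicted_delta_t[d] for d in practice_best}
predicted_order = sorted(predicted_q, key=predicted_q.get)  # fastest first
```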

The following figure compares the predicted rankings and the actual rankings for all three practice sessions for the Austria, Hungary, and Great Britain GPs in 2020 (we exclude P2 for the Hungary GP because the session was wet).

For the Bayesian model, we generated predictions with an uncertainty range based on the posterior samples. This enabled us to rank the drivers by the posterior median while accounting for unexpected outcomes in the drivers’ performances.

The following figure shows an example of predicted qualifying lap times (in seconds) with an uncertainty range for selected drivers at the Austria GP. If two drivers’ prediction profiles are very close (such as MAG and GIO), it’s not surprising that either driver might be the faster one in the upcoming qualifying session.

Metrics on model performance

To compare the models, we used mean squared error (MSE) and mean absolute error (MAE) for lap time errors. For ranking errors, we used rank discounted cumulative gain (RDCG). Because only the top 10 drivers gain points during a race, we used RDCG to apply more weight to errors in the higher rankings. For the Bayesian model output, we used the posterior median to generate the metrics.
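
The lap-time metrics are straightforward to compute. The exact RDCG formula isn't reproduced in this post, so the snippet below pairs MSE and MAE with one plausible log-discounted agreement score, purely for illustration of the "weight the front of the grid more" idea:

```python
# MSE and MAE over lap-time errors, plus an illustrative rank-discounted
# score in which agreement near the front of the order counts more.
import math

pred_times = [63.0, 63.2, 63.5, 63.4]   # hypothetical predictions (s)
true_times = [62.9, 63.3, 63.4, 63.6]   # hypothetical actuals (s)

n = len(pred_times)
mse = sum((p - t) ** 2 for p, t in zip(pred_times, true_times)) / n
mae = sum(abs(p - t) for p, t in zip(pred_times, true_times)) / n

def rank_of(times):
    """Return 1-based ranks, fastest time = rank 1."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ranks = [0] * len(times)
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos
    return ranks

pred_rank, true_rank = rank_of(pred_times), rank_of(true_times)
# Normalized discounted agreement: exact-rank matches, discounted by position.
score = sum((pred_rank[i] == true_rank[i]) / math.log2(1 + true_rank[i])
            for i in range(n)) / sum(1.0 / math.log2(1 + r) for r in true_rank)
```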

The following table shows the resulting metrics of each modeling approach for the test P2 and P3 sessions.

MODEL                  MSE            MAE            RDCG
                       P2      P3     P2      P3     P2     P3
Practice raw           2.822   1.053  1.544   0.949  0.92   0.95
Rule-based             0.349   0.186  0.462   0.346  0.88   0.95
XGBoost                0.358   0.141  0.472   0.297  0.91   0.95
AutoGluon              0.567   0.351  0.591   0.459  0.90   0.96
Hierarchical Bayesian  0.431   0.186  0.521   0.332  0.87   0.92

All models reduced the qualifying lap time prediction errors significantly compared to using the practice session results directly. Without any pace correction, the MSE of the predicted qualifying lap time was as high as 2.8. With ML methods that automatically learned pace variation patterns for teams and drivers on different circuits, we brought the MSE down to less than half a second. The resulting predictions more accurately represented the pace in the qualifying session. In addition, the models improved the ranking predictions by a small margin. However, no single approach outperformed all the others, which highlights the effect of random errors in the underlying data.

Summary

In this post, we described a new Insight developed by the Amazon ML Solutions Lab in collaboration with Formula 1 (F1).

This work is part of the six new F1 Insights powered by AWS being released in 2020, as F1 continues to use AWS for advanced data processing and ML modeling. Fans can expect to see this new Insight unveiled at the 2020 Turkish GP, providing predictions of the upcoming qualifying session during the practice sessions.

If you’d like help accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab.

 


About the Author

Guang Yang is a data scientist at the Amazon ML Solutions Lab where he works with customers across various verticals and applies creative problem solving to generate value for customers with state-of-the-art ML/AI solutions.



from AWS Machine Learning Blog https://ift.tt/3nqbL22
via A.I .Kung Fu


Thursday, November 12, 2020

AWS expands language support for Amazon Lex and Amazon Polly

At AWS, our mission is to enable developers and businesses with no prior machine learning (ML) expertise to easily build sophisticated, scalable, ML-powered applications with our AI services. Today, we’re excited to announce that Amazon Lex and Amazon Polly are expanding language support. You can build ML-powered applications that fit the language preferences of your users. These easy-to-use services allow you to add intelligence to your business processes, automate workstreams, reduce costs, and improve the user experience for your customers and employees in a variety of languages.

New and improved features

Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex now supports French, Spanish, Italian and Canadian French. With the addition of these new languages, you can build and expand your conversational experiences to better understand and engage your customer base in a variety of different languages and accents. Amazon Lex can be applied to a diverse set of use cases such as virtual agents, conversational IVR systems, self-service chatbots, or application bots. For a full list of languages, please go to Amazon Lex languages.

Amazon Polly, a service that turns text into lifelike speech, offers voices for all Amazon Lex languages. Our first Australian English voice, Olivia, is now generally available in Neural Text-to-Speech (NTTS). Olivia’s voice has a unique vocal personality and sounds expressive, natural, and easy to follow. You can now choose among three Australian English voices: Russell, Nicole, and Olivia. For a full list of Amazon Polly’s voices, please go to Amazon Polly voices.

“Growing demand for conversational experiences led us to launch Amazon Lex and Amazon Polly to enable businesses to connect with their customers more effectively,” shares Julien Simon, AWS AI/ML evangelist.

“Amazon Lex uses automatic speech recognition and natural language understanding to help organizations understand a customer’s intent, fluidly manage conversations and create highly engaging and lifelike interactions. We are delighted to advance the language capabilities of Lex and Polly. These launches allow our customers to take advantage of AI in the area of conversational interfaces and voice AI,” Simon says.

“Amazon Lex is a core AWS service that enables Accenture to deliver next-generation, omnichannel contact center solutions, such as our Advanced Customer Engagement (ACE+) platform, to a diverse set of customers. The addition of French, Italian, and Spanish to Amazon Lex will further enhance the accessibility of our global customer engagement solutions, while also vastly enriching and personalizing the overall experience for people whose primary language is not English. Now, we can quickly build interactive digital solutions based on Amazon’s deep learning expertise to deflect more calls, reduce contact center costs and drive a better customer experience in French, Italian, and Spanish-speaking markets. Amazon Lex can now improve customer satisfaction and localized brand awareness even more effectively,” says J.C. Novoa, Advanced Customer Engagement (ACE+) for Accenture.

Another example is Clevy, a French start-up and AWS customer. François Falala-Sechet, the CTO of Clevy adds, “At Clevy, we have been utilizing Amazon Lex’s best-in-class natural language processing services to help bring customers a scalable low-code approach to designing, developing, deploying and maintaining rich conversational experiences with more powerful and more integrated chatbots. With the addition of Spanish, Italian and French in Amazon Lex, Clevy can now help our developers deliver chatbot experiences to a more diverse audience in our core European markets.”

Eudata helps customers implement effective contact and management systems. Andrea Grompone, the Head of Contact Center Delivery at Eudata says, “Ora Amazon Lex parla in italiano! We are excited about the new opportunities this opens for Eudata. Amazon Lex simplifies the process of creating automated dialog-based interactions to address challenges we see in the market. The addition of Italian allows us to build a customer experience that ensures both service speed and quality in our markets.”

Using the new features

To use the new Amazon Lex languages, simply choose the language when creating a new bot via the  Amazon Lex console or AWS SDK. The following screenshot shows the console view.

To learn more, visit the Amazon Lex Development Guide.

You can use new Olivia voice in the Amazon Polly console, the AWS Command Line Interface (AWS CLI), or AWS SDK. The feature is available across all AWS Regions supporting NTTS. For the full list of available voices, see Voices in Amazon Polly, or log in to the Amazon Polly console to try it out for yourself.

Summary

Use Amazon Lex and Amazon Polly to build more self-service bots, to voice-enable applications, and to create an integrated voice and text experience for your customers and employees in a variety of languages. Try them out for yourself!

 


About the Author

Esther Lee is a Product Manager for AWS Language AI Services. She is passionate about the intersection of technology and education. Out of the office, Esther enjoys long walks along the beach, dinners with friends and friendly rounds of Mahjong.



from AWS Machine Learning Blog https://ift.tt/32AVoHK
via A.I .Kung Fu

AWS expands language support for Amazon Lex and Amazon Polly

At AWS, our mission is to enable developers and businesses with no prior machine learning (ML) expertise to easily build sophisticated, scalable, ML-powered applications with our AI services. Today, we’re excited to announce that Amazon Lex and Amazon Polly are expanding language support. You can build ML-powered applications that fit the language preferences of your users. These easy-to-use services allow you to add intelligence to your business processes, automate workstreams, reduce costs, and improve the user experience for your customers and employees in a variety of languages.

New and improved features

Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex now supports French, Spanish, Italian, and Canadian French. With the addition of these new languages, you can build and expand conversational experiences that better understand and engage your customer base across a variety of languages and accents. Amazon Lex can be applied to a diverse set of use cases such as virtual agents, conversational IVR systems, self-service chatbots, and application bots. For a full list of languages, go to Amazon Lex languages.

Amazon Polly, a service that turns text into lifelike speech, offers voices for all Amazon Lex languages. Our first Australian English voice, Olivia, is now generally available in Neural Text-to-Speech (NTTS). Olivia’s voice sounds expressive and natural, and is easy to follow. You can now choose among three Australian English voices: Russell, Nicole, and Olivia. For a full list of Amazon Polly’s voices, go to Amazon Polly voices.

“Growing demand for conversational experiences led us to launch Amazon Lex and Amazon Polly to enable businesses to connect with their customers more effectively,” shares Julien Simon, AWS AIML evangelist.

“Amazon Lex uses automatic speech recognition and natural language understanding to help organizations understand a customer’s intent, fluidly manage conversations and create highly engaging and lifelike interactions. We are delighted to advance the language capabilities of Lex and Polly. These launches allow our customers to take advantage of AI in the area of conversational interfaces and voice AI,” Simon says.

“Amazon Lex is a core AWS service that enables Accenture to deliver next-generation, omnichannel contact center solutions, such as our Advanced Customer Engagement (ACE+) platform, to a diverse set of customers. The addition of French, Italian, and Spanish to Amazon Lex will further enhance the accessibility of our global customer engagement solutions, while also vastly enriching and personalizing the overall experience for people whose primary language is not English. Now, we can quickly build interactive digital solutions based on Amazon’s deep learning expertise to deflect more calls, reduce contact center costs and drive a better customer experience in French, Italian, and Spanish-speaking markets. Amazon Lex can now improve customer satisfaction and localized brand awareness even more effectively,” says J.C. Novoa, Global Technical Evangelist – Advanced Customer Engagement (ACE+) for Accenture.

Another example is Clevy, a French start-up and AWS customer. François Falala-Sechet, the CTO of Clevy adds, “At Clevy, we have been utilizing Amazon Lex’s best-in-class natural language processing services to help bring customers a scalable low-code approach to designing, developing, deploying and maintaining rich conversational experiences with more powerful and more integrated chatbots. With the addition of Spanish, Italian and French in Amazon Lex, Clevy can now help our developers deliver chatbot experiences to a more diverse audience in our core European markets.”

Eudata helps customers implement effective contact and management systems. Andrea Grompone, the Head of Contact Center Delivery at Eudata says, “Ora Amazon Lex parla in italiano! We are excited about the new opportunities this opens for Eudata. Amazon Lex simplifies the process of creating automated dialog-based interactions to address challenges we see in the market. The addition of Italian allows us to build a customer experience that ensures both service speed and quality in our markets.”

Using the new features

To use the new Amazon Lex languages, simply choose the language when creating a new bot via the Amazon Lex console or the AWS SDK. The following screenshot shows the console view.

To learn more, visit the Amazon Lex Development Guide.
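As a minimal sketch, you can also select the language from the AWS CLI by setting the bot's locale. The bot name below is hypothetical, and the command assumes CLI credentials with Amazon Lex model-building permissions:

```shell
# Sketch only: creates an empty French-locale bot shell.
# "OrderFlowersFR" is a hypothetical name; add intents before building.
aws lex-models put-bot \
  --name OrderFlowersFR \
  --locale fr-FR \
  --no-child-directed
```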

You can use the new Olivia voice in the Amazon Polly console, the AWS Command Line Interface (AWS CLI), or the AWS SDK. The feature is available in all AWS Regions that support NTTS. For the full list of available voices, see Voices in Amazon Polly, or log in to the Amazon Polly console to try it out for yourself.
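For example, a hedged AWS CLI sketch (assuming credentials with Amazon Polly access) that synthesizes a short clip with the neural Olivia voice:

```shell
# Sketch only: writes an MP3 sample of the Olivia neural voice.
aws polly synthesize-speech \
  --engine neural \
  --voice-id Olivia \
  --output-format mp3 \
  --text "G'day! This is Olivia, Amazon Polly's Australian English voice." \
  olivia.mp3
```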

Summary

Use Amazon Lex and Amazon Polly to build more self-service bots, to voice-enable applications, and to create an integrated voice and text experience for your customers and employees in a variety of languages. Try them out for yourself!

 


About the Author

Esther Lee is a Product Manager for AWS Language AI Services. She is passionate about the intersection of technology and education. Out of the office, Esther enjoys long walks along the beach, dinners with friends and friendly rounds of Mahjong.



from AWS Machine Learning Blog https://ift.tt/32AVoHK
via A.I .Kung Fu

Join the Final Lap of the 2020 DeepRacer League at AWS re:Invent 2020

AWS DeepRacer is the fastest way to get rolling with machine learning (ML). It’s a fully autonomous 1/18th scale race car driven by reinforcement learning, a 3D racing simulator, and a global racing league. Throughout 2020, tens of thousands of developers honed their ML skills and competed in the League’s virtual circuit via the AWS DeepRacer console and 14 AWS Summit online events.

The AWS DeepRacer League’s 2020 season is nearing the final lap with the Championship at AWS re:Invent 2020. From November 10 through December 15, there are three ways to join in the racing fun: learn how to develop a competitive reinforcement learning model through our sessions, enter and compete in the racing action for a chance to win prizes, and watch to cheer on other developers as they race for the cup. More than 100 racers have already qualified for the Championship Cup, but there is still time to compete. Log in today for a chance to win the Championship Cup by entering the Wildcard round, which offers the top five racers spots in the Knockout Rounds. The Knockout Rounds begin December 1, when racers compete all the way to the checkered flag and the Championship Cup. The Grand Prize winner will receive a choice of either 10,000 USD in AWS promotional credits and a chance to win an expenses-paid trip to an F1 Grand Prix in the upcoming 2021 season, or a Coursera online machine learning degree scholarship with a maximum value of 20,000 USD. See our AWS DeepRacer 2020 Championships Official Rules for more details.

Watch the latest episode of DRTV news to learn more about how the Championship at AWS re:Invent 2020 will work.

Congratulations to our 2020 AWS re:Invent Championship Finalists!

Thanks to the thousands of developers who competed in the 2020 AWS DeepRacer League. Below is the list of our Virtual and Summit Online Circuit winners who qualified for the Championship at AWS re:Invent 2020.

Last chance for the Championship: Enter the Wildcard

Are you yet to qualify for the Championship Cup this season? Are you brand new to the league and want to take a shot at the competition? You have one last chance to qualify with the Wildcard. The open-play wildcard race runs through November. It is a traditional virtual circuit-style time trial race, taking place in the AWS DeepRacer console. Participants have until 11:59pm UTC November 30 (6:59pm EST, 3:59pm PST) to submit their fastest model. The top five competitors from the wildcard race will advance to the Championship Cup knockout.

Don’t worry if you don’t advance to the next round. There are chances for developers of all skill levels to compete and win at AWS re:Invent, including the AWS DeepRacer League open racing and special live virtual races. Visit our DeepRacer page for complete race schedule and additional details.

Here’s an overview of how the Championships are organized and how many racers participate in each round from qualifying through to the Grand Prix Finale.

Round 1: Live Group Knockouts

On December 1, racers need to be ready for anything in the championships, no matter what roadblocks they come across. In Round 1, competitors participate in a brand-new live racing format on the console. Racers submit their best models and control maximum speed remotely from anywhere in the world, while their autonomous models attempt to navigate the track, complete with objects to avoid. They have 3 minutes to achieve their single best lap and top the leaderboard. Racers are split into eight groups based on their time zone, with start order determined by the warmup round (the fastest racers from the warmup go last in their group). The top four times in each group advance to the bracket round. Tune in to AWS DeepRacer TV throughout AWS re:Invent to catch the championship action.

Round 2: Bracket Elimination

The top 32 remaining competitors will be placed into a single elimination bracket, where they face off against one another in a head-to-head format in a five-lap race. Head-to-head virtual matchups will proceed until eight racers remain. Results will be released on the AWS DeepRacer League page and in the console. 

Round 3: Grand Prix Finale

The final race will take place before the closing keynote on December 15 as an eight-person virtual Grand Prix. Similar to the F1 ProAm in May, our eight finalists will submit their model on the console and the AWS DeepRacer team will run the Grand Prix, where the eight racers simultaneously face off on the track in simulation, to complete five laps. The first car to successfully complete 5 laps and cross the finish line will be crowned the 2020 AWS DeepRacer Champion and officially announced at the closing keynote.

More Options for your ML Journey

If you’re ready to get over the starting line on your ML journey, AWS DeepRacer re:Invent sessions are the best place to learn ML fast.  In 2020, we have not one, not two, but three levels of ML content for aspiring developers to go from zero to hero in no time! Register now for AWS re:Invent to learn more about session schedules when they become available.

  • Get rolling with Machine Learning on AWS DeepRacer (200L). Get hands-on with AWS DeepRacer, including exciting announcements and enhancements coming to the league in 2021. Learn about the basics of machine learning and reinforcement learning (a machine learning technique ideal for autonomous driving). In this session, you can build a reinforcement learning model and submit that model to the AWS DeepRacer League for a chance to win prizes and glory.
  • Shift your Machine Learning model into overdrive with AWS DeepRacer analysis tools (300L). Make your way from the middle of the pack to the top of the AWS DeepRacer podium! This session extends your machine learning skills by exploring how human analysis of reinforcement learning through logs will improve your performance through trend identification and optimization to better prepare for new racing divisions coming to the league in 2021.
  • Replicate AWS DeepRacer architecture to master the track with SageMaker Notebooks (400L). Complete the final lap on your machine learning journey by demystifying the underlying architecture of AWS DeepRacer using Amazon SageMaker, AWS RoboMaker, and Amazon Kinesis Video Streams. Dive into SageMaker notebooks to learn how others have applied the skills acquired through AWS DeepRacer to real-world use cases and how you can apply your reinforcement learning models to relevant use cases.

You can take all the courses live during re:Invent or learn at your own speed on-demand. It’s up to you.  Visit the DeepRacer page at AWS re:Invent to register and find out more on when sessions will be available.

As you can see, there are many opportunities to up-level your ML skills, join in the racing action and cheer on developers as they go for the Championship Cup. Watch this page for schedule and video updates all through AWS re:Invent 2020!

 


About the Author

Dan McCorriston is a Senior Product Marketing Manager for AWS Machine Learning. He is passionate about technology, collaborating with developers, and creating new methods of expanding technology education. Out of the office he likes to hike, cook and spend time with his family.



from AWS Machine Learning Blog https://ift.tt/36yqVer
via A.I .Kung Fu


Tuesday, November 10, 2020

Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation

Amazon SageMaker is a fully managed service that provides every machine learning (ML) developer and data scientist with the ability to build, train, and deploy ML models quickly. Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models. Amazon SageMaker Studio provides all the tools you need to take your models from experimentation to production while boosting your productivity. You can write code, track experiments, visualize data, and perform debugging and monitoring within a single, integrated visual interface.

This post outlines how to configure access control for teams or groups within Amazon SageMaker Studio using attribute-based access control (ABAC). ABAC is a powerful approach that you can use to configure Studio so that different ML and data science teams have complete isolation of team resources.

We provide guidance on how to configure Amazon SageMaker Studio access for both AWS Identity and Access Management (IAM) and AWS Single Sign-On (AWS SSO) authentication methods. This post helps you set up IAM policies for users and roles using ABAC principles. To demonstrate the configuration, we set up two teams as shown in the following diagram and showcase two use cases:

  • Use case 1 – Only User A1 can access their studio environment; User A2 can’t access User A1’s environment, and vice versa
  • Use case 2 – Team B users cannot access artifacts (experiments, etc.) created by Team A members

You can configure policies according to your needs. You can even include a project tag in case you want to further restrict user access by projects within a team. The approach is very flexible and scalable.

Authentication

Amazon SageMaker Studio supports the following authentication methods for onboarding users. When setting up Studio, you can pick an authentication method that you use for all your users:

  • IAM – Includes the following:
    • IAM users – Users managed in IAM
    • AWS account federation – Users managed in an external identity provider (IdP)
  • AWS SSO – Users managed in an external IdP federated using AWS SSO

Data science user personas

The following table describes two different personas that interact with Amazon SageMaker Studio resources and the level of access they need to fulfill their duties. We use this table as a high-level requirement to model IAM roles and policies to establish desired controls based on resource ownership at the team and user level.

User persona: Admin user

  • Create, modify, and delete any IAM resource.
  • Create Amazon SageMaker Studio user profiles with a tag.
  • Sign in to the Amazon SageMaker console.
  • Read and describe Amazon SageMaker resources.

User persona: Data scientists or developers

  • Launch an Amazon SageMaker Studio IDE assigned to a specific IAM or AWS SSO user.
  • Create Amazon SageMaker resources with the necessary tags. For this post, we use the team tag.
  • Update, delete, and run resources created with a specific tag.
  • Sign in to the Amazon SageMaker console (IAM users only).
  • Read and describe Amazon SageMaker resources.

Solution overview

We use the preceding requirements to model roles and permissions required to establish controls. The following flow diagram outlines the different configuration steps:

Applying your policy to the admin user

You should apply the following policy to the admin user who creates Studio user profiles. This policy requires the admin to include the studiouserid tag. You could use a different name for the tag if need be. The Studio console doesn’t allow you to add tags when creating user profiles, so we use the AWS Command Line Interface (AWS CLI).

For admin users managed in IAM, attach the following policy to the user. For admin users managed in an external IdP, add the following policy to the role that the user assumes upon federation. The following policy enforces the studiouserid tag to be present when the sagemaker:CreateUserProfile action is invoked.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CreateSageMakerStudioUserProfilePolicy",
            "Effect": "Allow",
            "Action": "sagemaker:CreateUserProfile",
            "Resource": "*",
            "Condition": {
                "ForAnyValue:StringEquals": {
                    "aws:TagKeys": [
                        "studiouserid"
                    ]
                }
            }
        }
    ]
}

AWS SSO doesn’t require this policy; it performs the identity check.
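Because the console doesn't accept tags at profile-creation time, the admin creates tagged user profiles with the AWS CLI. A minimal sketch, in which the domain ID and profile name are placeholders:

```shell
# Sketch only: d-xxxxxxxxxxxx and user-a1 are hypothetical values.
# The studiouserid tag value must match the IAM user name.
aws sagemaker create-user-profile \
  --domain-id d-xxxxxxxxxxxx \
  --user-profile-name user-a1 \
  --tags Key=studiouserid,Value=user-a1
```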

Assigning the policy to Studio users

The following policy limits Studio access to the respective users by requiring the resource tag to match the user name for the sagemaker:CreatePresignedDomainUrl action. When a user tries to access the Amazon SageMaker Studio launch URL, this check is performed.

For IAM users, attach the following policy to the user. Use the user name for the studiouserid tag value.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonSageMakerPresignedUrlPolicy",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreatePresignedDomainUrl"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "sagemaker:ResourceTag/studiouserid": "${aws:username}" 
                }
            }
        }
    ]
}
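To make the effect of this condition concrete, here is a purely illustrative shell sketch (not an AWS API) of the comparison IAM performs for use case 1:

```shell
# Illustrative only: compares the caller's IAM user name against the
# studiouserid tag on the target Studio user profile.
may_open_studio() {
  # $1 = IAM user name, $2 = value of the profile's studiouserid tag
  [ "$1" = "$2" ]
}

may_open_studio user-a1 user-a1 && echo allow || echo deny   # prints "allow"
may_open_studio user-a2 user-a1 && echo allow || echo deny   # prints "deny"
```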

For AWS account federation, attach the following policy to the role that the user assumes after federation:

{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Sid": "AmazonSageMakerPresignedUrlPolicy",
           "Effect": "Allow",
           "Action": [
                "sagemaker:CreatePresignedDomainUrl"
           ],
           "Resource": "*",
           "Condition": {
                  "StringEquals": {
                      "sagemaker:ResourceTag/studiouserid": "${aws:PrincipalTag/studiouserid}"
                 }
            }
      }
  ]
}

Add the following statement to this policy in the Trust Relationship section. This statement defines the allowed transitive tag.

"Statement": [
    {
        --Existing statements
    },
    {
        "Sid": "IdentifyTransitiveTags",
        "Effect": "Allow",
        "Principal": {
            "Federated": "arn:aws:iam::<account id>:saml-provider/<identity provider>"
        },
        "Action": "sts:TagSession",
        "Condition": {
            "ForAllValues:StringEquals": {
                "sts:TransitiveTagKeys": [
                    "studiouserid"
                ]
            }
        }
    }
]

For users managed in AWS SSO, this policy is not required. AWS SSO performs the identity check.

Creating roles for the teams

To create roles for your teams, you must first create the policies. For simplicity, we use the same policies for both teams. In most cases, you just need one set of policies for all teams, but you have the flexibility to create different policies for different teams. In the second step, you create a role for each team, attach the policies, and tag the roles with appropriate team tags.
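For the second step, the tagging happens at role creation. A hedged AWS CLI sketch, where the role name and trust policy file are hypothetical and the team tag identifies Team A:

```shell
# Sketch only: SageMakerStudioTeamA and trust-policy.json are placeholders.
aws iam create-role \
  --role-name SageMakerStudioTeamA \
  --assume-role-policy-document file://trust-policy.json \
  --tags Key=team,Value=TeamA
```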

Creating the policies

Create the following policies. For this post, we split them into three policies for readability, but you can organize them according to your needs.

Policy 1: Amazon SageMaker read-only access

The following policy gives privileges to List and Describe Amazon SageMaker resources. You can customize this policy according to your needs.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonSageMakerDescribeReadOnlyPolicy",
            "Effect": "Allow",
            "Action": [
                "sagemaker:Describe*",
                "sagemaker:GetSearchSuggestions"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerListOnlyPolicy",
            "Effect": "Allow",
            "Action": [
                "sagemaker:List*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerUIandMetricsOnlyPolicy",
            "Effect": "Allow",
            "Action": [
                "sagemaker:*App",
                "sagemaker:Search",
                "sagemaker:RenderUiTemplate",
                "sagemaker:BatchGetMetrics"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerEC2ReadOnlyPolicy",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcEndpoints",
                "ec2:DescribeVpcs"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerIAMReadOnlyPolicy",
            "Effect": "Allow",
            "Action": [
                "iam:ListRoles"
            ],
            "Resource": "*"
        }
    ]
}

Policy 2: Amazon SageMaker access for supporting services

The following policy gives privileges to create, read, update, and delete access to Amazon Simple Storage Service (Amazon S3), Amazon Elastic Container Registry (Amazon ECR), and Amazon CloudWatch, and read access to AWS Key Management Service (AWS KMS). You can customize this policy according to your needs.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonSageMakerCRUDAccessS3Policy",
            "Effect": "Allow",
            "Action": [
"s3:PutObject",
"s3:GetObject",
"s3:AbortMultipartUpload",
"s3:DeleteObject",
"s3:CreateBucket",
"s3:ListBucket",
"s3:PutBucketCORS",
"s3:ListAllMyBuckets",
"s3:GetBucketCORS",
                "s3:GetBucketLocation"         
              ],
            "Resource": "<S3 BucketName>"
        },
        {
            "Sid": "AmazonSageMakerReadOnlyAccessKMSPolicy",
            "Effect": "Allow",
            "Action": [
                "kms:DescribeKey",
                "kms:ListAliases"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerCRUDAccessECRPolicy",
            "Effect": "Allow",
            "Action": [
"ecr:Set*",
"ecr:CompleteLayerUpload",
"ecr:Batch*",
"ecr:Upload*",
"ecr:InitiateLayerUpload",
"ecr:Put*",
"ecr:Describe*",
"ecr:CreateRepository",
"ecr:Get*",
                        "ecr:StartImageScan"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerCRUDAccessCloudWatchPolicy",
            "Effect": "Allow",
            "Action": [
"cloudwatch:Put*",
"cloudwatch:Get*",
"cloudwatch:List*",
"cloudwatch:DescribeAlarms",
"logs:Put*",
"logs:Get*",
"logs:List*",
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:ListLogDeliveries",
"logs:Describe*",
"logs:CreateLogDelivery",
"logs:PutResourcePolicy",
                        "logs:UpdateLogDelivery"
            ],
            "Resource": "*"
        }
    ]
} 

Policy 3: Amazon SageMaker Studio developer access

The following policy gives privileges to create, update, and delete Amazon SageMaker Studio resources. It also enforces the team tag requirement at creation time, and restricts start, stop, update, and delete actions to resources owned by the caller's team.

The team tag validation condition in the following code makes sure that the team tag value matches the principal’s team; see the Condition blocks in the policy for the specifics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonSageMakerStudioCreateApp",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateApp"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerStudioIAMPassRole",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerInvokeEndPointRole",
            "Effect": "Allow",
            "Action": [
                "sagemaker:InvokeEndpoint"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerAddTags",
            "Effect": "Allow",
            "Action": [
                "sagemaker:AddTags"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonSageMakerCreate",
            "Effect": "Allow",
            "Action": [
                "sagemaker:Create*"
            ],
            "Resource": "*",
            "Condition": { "ForAnyValue:StringEquals": { "aws:TagKeys": [ "team" ] }, "StringEqualsIfExists": { "aws:RequestTag/team": "${aws:PrincipalTag/team}" } }
        },
        {
            "Sid": "AmazonSageMakerUpdateDeleteExecutePolicy",
            "Effect": "Allow",
            "Action": [
                "sagemaker:Delete*",
                "sagemaker:Stop*",
                "sagemaker:Update*",
                "sagemaker:Start*",
                "sagemaker:DisassociateTrialComponent",
                "sagemaker:AssociateTrialComponent",
                "sagemaker:BatchPutMetrics"
            ],
            "Resource": "*",
            "Condition": { "StringEquals": { "aws:PrincipalTag/team": "${sagemaker:ResourceTag/team}" } }
        }
    ]
}
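To build intuition for how the two Condition blocks behave, the following is a minimal local simulation. This is an illustration only (the real evaluation happens inside IAM, not in your code), but the logic mirrors the conditions above: creation requires a team tag that matches the principal's team tag, and modification requires the resource's team tag to match:

```python
# Sketch: local simulation of the team-tag ABAC checks in Policy 3.
# Illustration only; IAM performs the real policy evaluation.

def can_create(principal_team, request_tags):
    """Mirrors the Create* condition: a 'team' tag must be present in the
    request, and its value must equal the principal's team tag."""
    if "team" not in request_tags:
        return False
    return request_tags["team"] == principal_team

def can_modify(principal_team, resource_tags):
    """Mirrors the Delete*/Stop*/Update*/Start* condition: the resource's
    'team' tag must equal the principal's team tag."""
    return resource_tags.get("team") == principal_team

# A Team A principal can create resources tagged for Team A...
assert can_create("TeamA", {"team": "TeamA"})
# ...but not resources tagged for another team, or with no team tag at all.
assert not can_create("TeamA", {"team": "TeamB"})
assert not can_create("TeamA", {})
# Only Team A members can modify or delete Team A's resources.
assert can_modify("TeamA", {"team": "TeamA"})
assert not can_modify("TeamB", {"team": "TeamA"})
```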

Creating and configuring the roles

You can now create a role for each team and attach these policies to it. Tag each role on the IAM console or with the AWS CLI; the steps are the same for all three authentication types. For example, tag the role for Team A with the key team and the value <Team Name>.
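Tagging a role could be scripted as follows with boto3 (a sketch; the role and team names are placeholders, and the `iam:TagRole` call requires boto3 and suitable credentials):

```python
# Sketch: tag a team role so the ABAC conditions can match against it.
# Role and team names below are placeholders.

def team_role_tags(team_name):
    """Build the tag list expected by iam.tag_role and the ABAC policies."""
    return [{"Key": "team", "Value": team_name}]

def tag_team_role(role_name, team_name):
    """Apply the team tag to an existing IAM role (requires boto3/credentials)."""
    import boto3

    iam = boto3.client("iam")
    iam.tag_role(RoleName=role_name, Tags=team_role_tags(team_name))

# Example (placeholder role name):
# tag_team_role("SageMakerStudioDeveloperTeamARole", "TeamA")
```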

Creating the Amazon SageMaker Studio user profile

In this step, we add the studiouserid tag when creating Studio user profiles. The steps are slightly different for each authentication type.

IAM users

For IAM users, you create Studio user profiles for each user by including the role that was created for the team the user belongs to. The following code is a sample CLI command. As of this writing, including a tag when creating a user profile is available only through AWS CLI.

aws sagemaker create-user-profile --domain-id <domain id> --user-profile-name <unique profile name> --tags Key=studiouserid,Value=<aws user name> --user-settings ExecutionRole=arn:aws:iam::<account id>:role/<Team Role Name>
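The CLI call maps directly onto the SDK. If your boto3 version supports tags on profile creation (the note above says tagging at creation was CLI-only as of this writing), a scripted sketch could look like the following; every identifier here is a placeholder:

```python
# Sketch: build the request for sagemaker.create_user_profile, mirroring
# the CLI call above. Domain ID, names, account ID, and role are placeholders.

def user_profile_request(domain_id, profile_name, aws_user_name, role_arn):
    """Build kwargs for create_user_profile, including the studiouserid tag
    that the profile-access policy matches against."""
    return {
        "DomainId": domain_id,
        "UserProfileName": profile_name,
        "Tags": [{"Key": "studiouserid", "Value": aws_user_name}],
        "UserSettings": {"ExecutionRole": role_arn},
    }

def create_user_profile(**kwargs):
    """Create the Studio user profile (requires boto3 and AWS credentials)."""
    import boto3

    sm = boto3.client("sagemaker")
    return sm.create_user_profile(**kwargs)

req = user_profile_request(
    "d-xxxxxxxxxxxx",  # placeholder domain ID
    "user-a1",         # placeholder profile name
    "user-a1",         # placeholder IAM user name
    "arn:aws:iam::111122223333:role/SageMakerStudioDeveloperTeamARole",
)
# create_user_profile(**req)
```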

AWS account federation

For AWS account federation, you create a user attribute (studiouserid) in an external IdP with a unique value for each user. The following code shows how to configure the attribute in Okta:

On Okta’s Sign On methods screen, configure the following SAML 2.0 attributes:

Attribute 1:
Name: https://aws.amazon.com/SAML/Attributes/PrincipalTag:studiouserid 
Value: user.studiouserid

Attribute 2:
Name: https://aws.amazon.com/SAML/Attributes/TransitiveTagKeys
Value: {"studiouserid"}

The following screenshot shows the attributes on the Okta console.

Next, create the user profile using the following command. Use the user attribute value in the preceding step for the studiouserid tag value.

aws sagemaker create-user-profile --domain-id <domain id> --user-profile-name <unique profile name> --tags Key=studiouserid,Value=<user attribute value> --user-settings ExecutionRole=arn:aws:iam::<account id>:role/<Team Role Name>

AWS SSO

For instructions on assigning users in AWS SSO, see Onboarding Amazon SageMaker Studio with AWS SSO and Okta Universal Directory.

Update the Studio user profile to include the appropriate execution role that was created for the team that the user belongs to. See the following CLI command:

aws sagemaker update-user-profile --domain-id <domain id> --user-profile-name <user profile name> --user-settings ExecutionRole=arn:aws:iam::<account id>:role/<Team Role Name> --region us-west-2

Validating that only assigned Studio users can access their profiles

When a user tries to access a Studio profile that doesn’t have studiouserid tag value matching their user name, an AccessDeniedException error occurs. You can test this by copying the link for Launch Studio on the Amazon SageMaker console and accessing it when logged in as a different user. The following screenshot shows the error message.

Validating that only respective team members can access certain artifacts

In this step, we show how to configure Studio so that members of a given team can’t access artifacts that another team creates.

In our use case, a Team A user creates an experiment and tags that experiment with the team tag. This limits access to this experiment to Team A users only. See the following code:

import sys
!{sys.executable} -m pip install sagemaker
!{sys.executable} -m pip install sagemaker-experiments

import time
import sagemaker
from smexperiments.experiment import Experiment

demo_experiment = Experiment.create(experiment_name = "USERA1TEAMAEXPERIMENT1",
                                    description = "UserA1 experiment",
                                    tags = [{'Key': 'team', 'Value': 'TeamA'}])

If a user who is not in Team A tries to delete the experiment, Studio denies the delete action. See the following code:

#command run from TeamB User Studio Instance
import time
from smexperiments.experiment import Experiment
experiment_to_cleanup = Experiment.load(experiment_name="USERA1TEAMAEXPERIMENT1")
experiment_to_cleanup.delete()

[Client Error]
An error occurred (AccessDeniedException) when calling the DeleteExperiment operation: User: arn:aws:sts::<AWS Account ID>:assumed-role/SageMakerStudioDeveloperTeamBRole/SageMaker is not authorized to perform: sagemaker:DeleteExperiment on resource: arn:aws:sagemaker:us-east-1:<AWS Account ID>:experiment/usera1teamaexperiment1
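If you automate this negative test, the denial surfaces as a botocore ClientError whose error code you can inspect. The helper below is a sketch that only examines a botocore-style error dict, so the expected denial can be distinguished from other failures:

```python
# Sketch: classify a botocore-style error response. Illustration only; the
# dict shape follows botocore's ClientError.response convention.

def is_access_denied(error_response):
    """Return True if the error dict carries an AccessDeniedException code."""
    return error_response.get("Error", {}).get("Code") == "AccessDeniedException"

denied = {
    "Error": {
        "Code": "AccessDeniedException",
        "Message": "User ... is not authorized to perform: sagemaker:DeleteExperiment ...",
    }
}
assert is_access_denied(denied)
assert not is_access_denied({"Error": {"Code": "ValidationException"}})
```

In practice you would wrap the `experiment_to_cleanup.delete()` call in a try/except on `botocore.exceptions.ClientError` and pass `err.response` to this helper.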

Conclusion

In this post, we demonstrated how to isolate Amazon SageMaker Studio access using the ABAC technique. We showcased two use cases: restricting access to a Studio profile to only the assigned user (using the studiouserid tag) and restricting access to Studio artifacts, such as experiments, to members of the owning team (using the team tag). You can further customize policies by applying more tags to create more complex hierarchical controls.

Try out this solution for isolating resources by teams or groups in Amazon SageMaker Studio. For more information about using ABAC as an authorization strategy, see What is ABAC for AWS?


About the Authors

Vikrant Kahlir is a Senior Solutions Architect in the Solutions Architecture team. He works with AWS strategic customers’ product and engineering teams to help them build technology solutions using AWS services for managed databases, AI/ML, HPC, autonomous computing, and IoT.

Rakesh Ramadas is an ISV Solution Architect at Amazon Web Services. His focus areas include AI/ML and Big Data.

Rama Thamman is a Software Development Manager with the AI Platforms team, leading the ML Migrations team.



from AWS Machine Learning Blog https://ift.tt/38vljnT