Cross Region Inference
Cross-Region Inference
To manage traffic during on-demand inferencing by utilising compute across regions, you can use AWS Bedrock Cross-Region Inference. While setting model ID in TrueFoundry, use the Inference Profile ID instead of the model ID.
You can more information about cross-region inferencing here.
Use Inference Profile ID as model ID while adding model to TFY AI Gateway
How Cross-Region Inference Works
AWS Bedrock supports cross-Region inference using system-defined inference profiles, which help manage unexpected traffic spikes by distributing inference workloads across multiple AWS Regions. This lets you maintain availability and performance by leveraging compute resources beyond the region where the request originated.
For example
Cross-Region Inference
As we can see in the above example, we can wee that model us us.anthropic.claude-sonnet-4-20250514-v1:0 is availabile in us-west-2, us-east-1, us-east-2 regions. When you make a request to this model from any of these regions (us-west-2, us-east-1, us-east-2), AWS Bedrock can automatically reroute the request to any of the other destination regions based on factors like:
- Current load and capacity in each region
- Network latency and performance
- Regional availability and health
This automatic routing helps ensure optimal performance and reliability by distributing traffic across multiple regions, even if the original request region is experiencing high load or other issues.
If you want to know more about regions supported by AWS Bedrock for models, you can check here.
Access Control
When using system-defined inference profiles for cross-Region routing, you must ensure proper access control configuration:
- Grant model access permissions in all destination Regions where traffic may be routed for inference
- Access permissions in just the source Region are not sufficient
This is important because AWS Bedrock may dynamically route requests to any of the configured destination Regions based on factors like load and availability. Without proper permissions in all potential destination Regions, some inference requests may fail even if the source Region permissions are correctly set up.
To give access to the model in a region, you need to add the model in that region. For example: Below is the model access page in eu-central-1 region. https://eu-central-1.console.aws.amazon.com/bedrock/home?region=eu-central-1#/modelaccess