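SAA Exam Daily Practice - 2024/11/28

Source: Amazon AWS Certified Solutions Architect - Associate SAA-C03 Exam
15 questions (No. 101 to No. 115), for my own review only.
If anything here infringes your rights, please contact me and I will remove it.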

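🌟 Vocabulary:

retrieve_{v. to get back, recover, fetch; to look up (data); to rescue, remedy | n. recovery, retrieval}
indicate_{v. to show, make clear; to hint at; to be a sign or symbol of}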

1. Internet access for private subnets

A solutions architect is designing a VPC with public and private subnets. The VPC and subnets use IPv4 CIDR blocks. There is one public subnet and one private subnet in each of three Availability Zones (AZs) for high availability. An internet gateway is used to provide internet access for the public subnets. The private subnets require access to the internet to allow Amazon EC2 instances to download software updates. What should the solutions architect do to enable Internet access for the private subnets?

✅ Create three NAT gateways, one for each public subnet in each AZ. Create a private route table for each AZ that forwards non-VPC traffic to the NAT gateway in its AZ.
Create three NAT instances, one for each private subnet in each AZ. Create a private route table for each AZ that forwards non-VPC traffic to the NAT instance in its AZ.
Create a second internet gateway on one of the private subnets. Update the route table for the private subnets that forward non-VPC traffic to the private internet gateway.
Create an egress-only internet gateway on one of the public subnets. Update the route table for the private subnets that forward non-VPC traffic to the egress-only Internet gateway.

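✨ Keywords: private subnets require access to the internet

1️⃣ ✅

💡 Analysis: The VPC spans three Availability Zones, each with one public subnet and one private subnet. The EC2 instances in the private subnets need internet access to download software updates.
Either a NAT gateway or a NAT instance can solve this. Both must be deployed in a public subnet, and each private subnet's route table must send internet-bound traffic to the corresponding gateway or instance.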
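The per-AZ layout described above can be sketched as a small routing plan. This is a minimal illustration in plain Python, not an AWS API call: the AZ names, subnet IDs, NAT names, and the `build_nat_plan` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzSubnets:
    """Hypothetical description of one AZ: one public and one private subnet."""
    az: str
    public_subnet: str
    private_subnet: str


def build_nat_plan(azs):
    """Return one NAT gateway and one private route table per AZ.

    Each NAT gateway sits in the AZ's public subnet, and the private subnet's
    default route (0.0.0.0/0) points at the NAT gateway in the same AZ.
    """
    plan = []
    for item in azs:
        nat_name = f"nat-{item.az}"  # placeholder name, not a real resource ID
        plan.append({
            "az": item.az,
            "nat_gateway": {"name": nat_name, "subnet": item.public_subnet},
            "private_route_table": {
                "associated_subnet": item.private_subnet,
                "routes": [{"destination": "0.0.0.0/0", "target": nat_name}],
            },
        })
    return plan


azs = [
    AzSubnets("az-a", "subnet-public-a", "subnet-private-a"),
    AzSubnets("az-b", "subnet-public-b", "subnet-private-b"),
    AzSubnets("az-c", "subnet-public-c", "subnet-private-c"),
]
plan = build_nat_plan(azs)
```

Keeping each private route table pointed at the NAT gateway in its own AZ means the loss of one AZ does not cut internet access for the other two.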

👨‍👨‍👦‍👦 Community discussion: NAT Instances - OUTDATED BUT CAN STILL APPEAR IN THE EXAM!
However, given that A provides the newer option of NAT Gateway, then A is the correct answer.
B would be correct if NAT Gateway wasn’t an option.

(The commenter above made a mistake here: 2️⃣ is wrong because NAT instances must be created in a public subnet.)
🙋‍♂️ Reply: A NAT instance or NAT gateway is always created in a public subnet to provide internet access to a private subnet. In option B, they are creating the NAT instance in a private subnet, which is not correct.

2. DataSync

A company wants to migrate an on-premises data center to AWS. The data center hosts an SFTP server that stores its data on an NFS-based file system. The server holds 200 GB of data that needs to be transferred. The server must be hosted on an Amazon EC2 instance that uses an Amazon Elastic File System (Amazon EFS) file system.
Which combination of steps should a solutions architect take to automate this task? (Choose two.)

Launch the EC2 instance into the same Availability Zone as the EFS file system.
✅ Install an AWS DataSync agent in the on-premises data center.
Create a secondary Amazon Elastic Block Store (Amazon EBS) volume on the EC2 instance for the data.
Manually use an operating system copy command to push the data to the EC2 instance.
✅ Use AWS DataSync to create a suitable location configuration for the on-premises SFTP server.

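✨ Keywords: automate

2️⃣ 5️⃣ ✅

💡 Analysis: 200 GB of files sit on an NFS-based file system and are served over SFTP. The data must be moved to AWS, where the service will run on an EC2 instance backed by EFS. The question asks how to automate this.
EFS is highly available across Availability Zones by default, but it can also be deployed in a single Availability Zone, so 1️⃣ is not logically wrong. It is just not necessary here: doing it or not would at most have a minor effect on transfer speed and quality.
Transferring the data requires DataSync, so 2️⃣ is fine.
3️⃣ creates a new EBS volume attached to the EC2 instance, which serves no purpose.
4️⃣ pushes the data to the EC2 instance by running commands manually, whereas 5️⃣ configures DataSync for the on-premises SFTP server. 5️⃣ clearly works and is more elegant.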

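To confirm the EFS details, see: What is Amazon Elastic File System?

The service is designed to be highly scalable, highly available, and highly durable. Amazon EFS offers the following file system types to meet your availability and durability needs:

Regional (recommended): Regional file systems store data redundantly across multiple geographically separated Availability Zones within the same AWS Region. Storing data across multiple Availability Zones provides continuous availability to the data, even when one or more Availability Zones in an AWS Region are unavailable.

One Zone: One Zone file systems store data within a single Availability Zone. Storing data in a single Availability Zone provides continuous availability to the data. In the unlikely case of the loss of or damage to all or part of the Availability Zone, however, data stored in these types of file systems might be lost.

Somewhat surprisingly, EFS can in fact be deployed in a single Availability Zone, so 1️⃣ is not logically wrong after all.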

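What is AWS DataSync?

The following are the main use cases for DataSync:
…

Migrate data - Quickly move active datasets over the network into AWS storage services. DataSync includes automatic encryption and data integrity validation to help make sure that your data arrives securely, intact, and ready to use.

Replicate data - Copy data into any Amazon S3 storage class, choosing the most cost-effective storage class for your needs. You can also send data to Amazon EFS, FSx for Windows File Server, FSx for Lustre, or FSx for OpenZFS for a standby file system.

By using DataSync, you get the following benefits:
…

Automate data movement - DataSync makes it easier to move data over the network between storage systems and services. DataSync automates both the management of the data transfer process and the infrastructure required for high-performance, secure data transfer.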

👨‍👨‍👦‍👦 Community discussion: Answer and HOW-TO B. Install an AWS DataSync agent in the on-premises data center.
E. Use AWS DataSync to create a suitable location configuration for the on-premises SFTP server.

To automate the process of transferring the data from the on-premises SFTP server to an EC2 instance with an EFS file system, you can use AWS DataSync. AWS DataSync is a fully managed data transfer service that simplifies, automates, and accelerates transferring data between on-premises storage systems and Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server.
To use AWS DataSync for this task, you should first install an AWS DataSync agent in the on-premises data center. This agent is a lightweight software application that you install on your on-premises data source. The agent communicates with the AWS DataSync service to transfer data between the data source and target locations.

3. AWS Glue

A company has an AWS Glue extract, transform, and load (ETL) job that runs every day at the same time. The job processes XML data that is in an Amazon S3 bucket. New data is added to the S3 bucket every day. A solutions architect notices that AWS Glue is processing all the data during each run.
What should the solutions architect do to prevent AWS Glue from reprocessing old data?

✅ Edit the job to use job bookmarks.
❌ Edit the job to delete data after the data is processed.
Edit the job by setting the NumberOfWorkers field to 1.
Use a FindMatches machine learning (ML) transform.

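✨ Keywords: Glue is processing all the data during each run.

2️⃣ ❌ -> 1️⃣ ✅

💡 Analysis: The company runs a Glue ETL job at the same time every day. It processes XML data in an S3 bucket, and new data is added to the bucket daily. The architect notices that Glue processes all the data on every run. What should be done so that Glue does not reprocess old data?
Deleting data once it has been processed looks like a reasonable idea, but Glue provides job bookmarks for exactly this purpose: they persist state information and prevent old data from being reprocessed.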

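What is AWS Glue?

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. You can use it for analytics, machine learning, and application development. It also includes additional productivity and data ops tooling for authoring, running jobs, and implementing business workflows.

With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Also, you can immediately search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

Tracking processed data using job bookmarks

AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data. With job bookmarks, you can process new data when rerunning on a scheduled interval. A job bookmark is composed of the states of various job elements, such as sources, transformations, and targets. For example, your ETL job might read new partitions in an Amazon S3 file. AWS Glue tracks which partitions the job has processed successfully, to prevent duplicate processing and duplicate data in the job's target data store.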
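The idea behind a bookmark can be shown with a toy model. This is not how Glue implements bookmarks internally; it is only a sketch of "persist what was processed, skip it next run", and the `run_job` helper and object keys are made up for illustration. In a real Glue job, bookmarks are switched on with the job parameter `--job-bookmark-option` set to `job-bookmark-enable`.

```python
def run_job(all_keys, bookmark):
    """Process only keys that the bookmark has not seen, then update it.

    `bookmark` is a set of already-processed object keys. Returning the
    enlarged set mimics committing the bookmark after a successful run.
    """
    new_keys = sorted(k for k in all_keys if k not in bookmark)
    processed = [f"transformed:{k}" for k in new_keys]  # stand-in for the ETL step
    return processed, bookmark | set(new_keys)


bookmark = set()
# Day 1: the bucket holds one file.
day1, bookmark = run_job(["data/2024-11-27.xml"], bookmark)
# Day 2: a new file was added; the old one is skipped.
day2, bookmark = run_job(["data/2024-11-27.xml", "data/2024-11-28.xml"], bookmark)
```

Unlike deleting processed data (option 2️⃣), this keeps the source files intact while still avoiding reprocessing.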

👨‍👨‍👦‍👦 Community discussion: This is the purpose of bookmarks: “AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data.”
https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

4. DDoS attack

A solutions architect must design a highly available infrastructure for a website. The website is powered by Windows web servers that run on Amazon EC2 instances. The solutions architect must implement a solution that can mitigate a large-scale DDoS attack that originates from thousands of IP addresses. Downtime is not acceptable for the website.
Which actions should the solutions architect take to protect the website from such an attack? (Choose two.)

✅ Use AWS Shield Advanced to stop the DDoS attack.
Configure Amazon GuardDuty to automatically block the attackers.
✅ Configure the website to use Amazon CloudFront for both static and dynamic content.
Use an AWS Lambda function to automatically add attacker IP addresses to VPC network ACLs.
Use EC2 Spot Instances in an Auto Scaling group with a target tracking scaling policy that is set to 80% CPU utilization.

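✨ Keywords: HA, Windows web servers, DDoS attack from thousands of IP addresses, Downtime is not acceptable

1️⃣ 3️⃣ ✅

💡 Analysis: The website runs on Windows servers and must stay online under a DDoS attack launched from thousands of IP addresses.
First, Shield Advanced is a must for DDoS protection.
Putting CloudFront in front is also a good choice, because it keeps the origin's IP address from being exposed. The community also pointed out that with CloudFront, traffic is spread across edge locations in many regions, which relieves the pressure of a DDoS attack.

Amazon GuardDuty: Protect your AWS accounts, workloads, and data with intelligent threat detection

Amazon GuardDuty combines ML with integrated threat intelligence from AWS and leading third parties to help protect your AWS accounts, workloads, and data from threats.

Amazon GuardDuty is only a threat detection tool; it does not block an attack by itself.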

👨‍👨‍👦‍👦 Community discussion: I think it is AC, reason is they require a solution that is highly available. AWS Shield can handle the DDoS attacks. To make the solution HA you can use CloudFront. AC seems to be the best answer imo.
AB seem like redundant answers. How do those answers make the solution HA?

CloudFront is a content delivery network (CDN) that integrates with other Amazon Web Services products, such as Amazon S3 and Amazon EC2, to deliver content to users with low latency and high data transfer speeds. By using CloudFront, the solutions architect can distribute the website’s content across multiple edge locations, which can help absorb the impact of a DDoS attack and reduce the risk of downtime for the website.

5. Lambda policy

A company is preparing to deploy a new serverless workload. A solutions architect must use the principle of least privilege to configure permissions that will be used to run an AWS Lambda function. An Amazon EventBridge (Amazon CloudWatch Events) rule will invoke the function.
Which solution meets these requirements?

Add an execution role to the function with lambda:InvokeFunction as the action and * as the principal.
Add an execution role to the function with lambda:InvokeFunction as the action and Service: lambda.amazonaws.com as the principal.
Add a resource-based policy to the function with lambda:* as the action and Service: events.amazonaws.com as the principal.
✅ Add a resource-based policy to the function with lambda:InvokeFunction as the action and Service: events.amazonaws.com as the principal.

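✨ Keywords: PoLP for AWS Lambda function, EventBridge

4️⃣ ✅

💡 Analysis: Apply the principle of least privilege to a Lambda function so that EventBridge can invoke it.
1️⃣ creates an execution role with lambda:InvokeFunction as the action and * as the principal. The result is that anyone can invoke the function.
2️⃣ creates an execution role with lambda:InvokeFunction as the action and Service: lambda.amazonaws.com as the principal. The result is that the Lambda service itself can invoke the function.
3️⃣ creates a resource-based policy with lambda:* as the action and Service: events.amazonaws.com as the principal. This lets EventBridge perform any action on the function.
4️⃣ creates a resource-based policy with lambda:InvokeFunction as the action and Service: events.amazonaws.com as the principal. This lets EventBridge invoke the function and nothing more.
The answer is clearly 4️⃣.

The official documentation gives an identical example: AWS Lambda permissions
To invoke your AWS Lambda function by using an EventBridge rule, add the following permission to the policy of your Lambda function.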
{
  "Effect": "Allow",
  "Action": "lambda:InvokeFunction",
  "Resource": "arn:aws:lambda:region:account-id:function:function-name",
  "Principal": {
    "Service": "events.amazonaws.com"
  },
  "Condition": {
    "ArnLike": {
      "AWS:SourceArn": "arn:aws:events:region:account-id:rule/rule-name"
    }
  },
  "Sid": "InvokeLambdaFunction"
}
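The statement above can also be attached programmatically. The sketch below only builds the keyword arguments for the Lambda `add_permission` operation as exposed by boto3. The helper name `eventbridge_invoke_permission` is hypothetical, the function name and rule ARN are the same placeholders used in the policy above, and no AWS call is made here.

```python
def eventbridge_invoke_permission(function_name, rule_arn,
                                  statement_id="InvokeLambdaFunction"):
    """Build keyword arguments for a Lambda add_permission call.

    The grant is limited to one action (lambda:InvokeFunction), one service
    principal (EventBridge), and one source rule, in line with least privilege.
    """
    return {
        "FunctionName": function_name,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        "Principal": "events.amazonaws.com",
        "SourceArn": rule_arn,
    }


kwargs = eventbridge_invoke_permission(
    "function-name",
    "arn:aws:events:region:account-id:rule/rule-name",
)
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("lambda").add_permission(**kwargs)
```

Restricting `SourceArn` to a single rule mirrors the `Condition` block in the JSON policy, so other EventBridge rules cannot invoke the function.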

👨‍👨‍👦‍👦 Community discussion: Best way to check it… The question is taken from the example shown here in the documentation:
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-use-resource-based.html#eb-lambda-permissions

Why other options are wrong
Option A is incorrect because it grants the lambda:InvokeFunction action to any principal (*), which would allow any entity to invoke the function and goes beyond the minimum permissions needed.
Option B is incorrect because it grants the lambda:InvokeFunction action to the Service: lambda.amazonaws.com principal, which would allow any Lambda function to invoke the function and goes beyond the minimum permissions needed.
Option C is incorrect because it grants the lambda:* action to the Service: events.amazonaws.com principal, which would allow Amazon EventBridge to perform any action on the function and goes beyond the minimum permissions needed.

6. KMS

A company is preparing to store confidential data in Amazon S3. For compliance reasons, the data must be encrypted at rest.
Encryption key usage must be logged for auditing purposes. Keys must be rotated every year.
Which solution meets these requirements and is the MOST operationally efficient?

Server-side encryption with customer-provided keys (SSE-C)
Server-side encryption with Amazon S3 managed keys (SSE-S3)
Server-side encryption with AWS KMS keys (SSE-KMS) with manual rotation
✅ Server-side encryption with AWS KMS keys (SSE-KMS) with automatic rotation

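✨ Keywords: Encryption key usage must be logged

4️⃣ ✅

💡 Analysis: Confidential data must be stored encrypted in S3, the use of the encryption key must be logged, and the key must be rotated every year. The question asks for the most operationally efficient solution.
Use KMS, which handles key rotation and logs key usage (through AWS CloudTrail), and turn on automatic rotation.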
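As a rough sketch, the bucket-side half of this setup is a default encryption configuration that points at a KMS key. The code below only builds that configuration dictionary in the shape accepted by the S3 `put_bucket_encryption` operation. The helper name and the key alias `alias/example-key` are assumptions for illustration, and nothing is sent to AWS.

```python
def default_sse_kms_config(kms_key_id):
    """Build a default-encryption configuration that uses SSE-KMS."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_id,
                }
            }
        ]
    }


config = default_sse_kms_config("alias/example-key")  # hypothetical key alias
# With boto3 and credentials available, the two steps could look like:
#   s3.put_bucket_encryption(Bucket="...", ServerSideEncryptionConfiguration=config)
#   kms.enable_key_rotation(KeyId="...")  # takes a key ID or key ARN
```

With automatic rotation enabled, nobody has to remember the yearly rotation, which is what makes option 4️⃣ more operationally efficient than manual rotation.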

👨‍👨‍👦‍👦 Community discussion: The MOST operationally efficient one is D.
Automating the key rotation is the most efficient.
Just to confirm, the A and B options don’t allow automating the rotation, as explained here:
https://aws.amazon.com/kms/faqs/#:~:text=You%20can%20choose%20to%20have%20AWS%20KMS%20automatically%20rotate%20KMS,KMS%20custom%20key%20store%20feature

7. Data storing and retrieving

A bicycle sharing company is developing a multi-tier architecture to track the location of its bicycles during peak operating hours. The company wants to use these data points in its existing analytics platform. A solutions architect must determine the most viable multi-tier option to support this architecture. The data points must be accessible from the REST API.
Which action meets these requirements for storing and retrieving_{fetching} location data?

Use Amazon Athena with Amazon S3.
Use Amazon API Gateway with AWS Lambda.
❌ Use Amazon QuickSight with Amazon Redshift.
✅ Use Amazon API Gateway with Amazon Kinesis Data Analytics.

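✨ Keywords: existing analytics platform, data points must be accessible from the REST API

3️⃣ ❌ -> 4️⃣ ✅

💡 Analysis: A multi-tier architecture tracks bicycle location data, which is analyzed in the existing analytics platform. The data must be accessible through a REST API. The solution has to cover both storing and retrieving.
QuickSight is a business analytics service and Redshift is a data warehouse. If Redshift supports REST API access, this looks like the best of the four options.
Amazon Kinesis Data Analytics is also for analyzing data streams, but that conflicts with the requirement to use the existing analytics platform.
However, the community vote is B 48%, D 45%, A 7%, which I can't make sense of.

It seems that once "REST API" appears, API Gateway is mandatory, which rules out A and C.
Then, because the existing analytics platform must be used, 4️⃣ can be ruled out.
That leaves only 2️⃣.

The dispute is really about data storage: supporters of 4️⃣ tend to argue that Lambda cannot store data.
I find 4️⃣ more reasonable, because the official documentation states explicitly that Amazon Kinesis Data Analytics supports reading, processing, and storing data: What Is Amazon Kinesis Data Analytics for SQL Applications?

With Amazon Kinesis Data Analytics, you can quickly author SQL code that continuously reads, processes, and stores data in near real time. Using standard SQL queries on the streaming data, you can construct applications that transform and provide insights into your data. Following are some example scenarios for using Kinesis Data Analytics:

It is the only solution when API Gateway is required.

Using the Amazon Redshift Data API

The Amazon Redshift Data API simplifies access to your Amazon Redshift data warehouse by removing the need to manage database drivers, connections, network configurations, data buffering, credentials, and more. You can run SQL statements using the Data API operations with the AWS SDK.

The exam is the exam; in a real-world scenario I would still choose Amazon Redshift.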

👨‍👨‍👦‍👦 Community discussion: API Gateway is needed to get the data so option A and C are out. “The company wants to use these data points in its existing analytics platform” so there is no need to add Kinesis. Option D is also out.
This leaves us with option B as the correct one.

🙅 Objection: I don't understand why you would vote B. How are you going to store data with just Lambda?

Which action meets these requirements for storing and retrieving location data

In this use case there will obviously be a ton of data and you want to get real-time location data of the bicycles, and to analyze all these info kinesis is the one that makes most sense here.

8. RDS update and notification

A company has an automobile sales website that stores its listings in a database on Amazon RDS. When an automobile is sold, the listing needs to be removed from the website and the data must be sent to multiple target systems.
Which design should a solutions architect recommend?

✅ Create an AWS Lambda function triggered when the database on Amazon RDS is updated to send the information to an Amazon Simple Queue Service (Amazon SQS) queue for the targets to consume.
Create an AWS Lambda function triggered when the database on Amazon RDS is updated to send the information to an Amazon Simple Queue Service (Amazon SQS) FIFO queue for the targets to consume.
Subscribe to an RDS event notification and send an Amazon Simple Queue Service (Amazon SQS) queue fanned out to multiple Amazon Simple Notification Service (Amazon SNS) topics. Use AWS Lambda functions to update the targets.
❌ Subscribe to an RDS event notification and send an Amazon Simple Notification Service (Amazon SNS) topic fanned out to multiple Amazon Simple Queue Service (Amazon SQS) queues. Use AWS Lambda functions to update the targets.

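✨ Keywords: RDS, notification

4️⃣ ❌ -> 1️⃣ ✅

💡 Analysis: Car listings are stored in RDS. When a car is sold, its listing must be removed from the site and the data sent to multiple systems.
Notifying multiple systems fits the publish/subscribe model, which calls for SNS.
3️⃣ and 4️⃣ differ in the order of SQS and SNS. The standard fan-out pattern is SNS first, then SQS, so I chose 4️⃣.

In the community, about two thirds chose 1️⃣ and one third chose 4️⃣. The debate centers on whether RDS has an event type for updates to data rows.
Listing Amazon RDS event notification types
AWS RDS notification when record is added to a table
Based on the existing documentation and discussion, RDS indeed has no notification events for operations on data rows, so the only choice is 1️⃣.
That said, as a comment on Stack Overflow suggests, you can watch the RDS logs to detect row-level operations.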

👨‍👨‍👦‍👦 Community discussion: Interesting point that Amazon RDS event notification doesn’t support any notification when data inside DB is updated.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.overview.html

So subscription to RDS events doesn’t give any value for Fanout = SNS => SQS
B is out because FIFO is not required here.
A is left as correct answer

9. S3 Legal holds

A company needs to store data in Amazon S3 and must prevent the data from being changed. The company wants new objects that are uploaded to Amazon S3 to remain unchangeable for a nonspecific amount of time until the company decides to modify the objects. Only specific users in the company’s AWS account can have the ability to delete the objects.
What should a solutions architect do to meet these requirements?

Create an S3 Glacier vault. Apply a write-once, read-many (WORM) vault lock policy to the objects.
❌ Create an S3 bucket with S3 Object Lock enabled. Enable versioning. Set a retention period of 100 years. Use governance mode as the S3 bucket’s default retention mode for new objects.
Create an S3 bucket. Use AWS CloudTrail to track any S3 API events that modify the objects. Upon notification, restore the modified objects from any backup versions that the company has.
✅ Create an S3 bucket with S3 Object Lock enabled. Enable versioning. Add a legal hold to the objects. Add the s3:PutObjectLegalHold permission to the IAM policies of users who need to delete the objects.

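✨ Keywords: S3, prevent the data from being changed, have the ability to delete the objects

2️⃣ ❌ -> 4️⃣ ✅

💡 Analysis: Once uploaded to S3, objects must stay unmodifiable for an unspecified amount of time. After that, users with the right permission can delete them.
S3 Object Lock in governance mode (mnemonic: a corrupt government official, anyone with the permission can still change the files) looked like it would solve this.

The community is split between 2️⃣ and 4️⃣, with more people choosing 4️⃣.
This touches on how S3 Object Lock and IAM policies interact: S3 Object Lock legal hold

You can use S3 Batch Operations with Object Lock to add legal holds to many Amazon S3 objects at once. To do so, specify a list of target objects in your manifest and submit that list to Batch Operations. Your S3 Batch Operations job with Object Lock legal hold runs until completion, until cancellation, or until a failure state is reached.

Before processing any objects in the manifest, S3 Batch Operations verifies that Object Lock is enabled on your S3 bucket. To perform the object operations and bucket-level validation, S3 Batch Operations needs s3:PutObjectLegalHold and s3:GetBucketObjectLockConfiguration in an AWS Identity and Access Management (IAM) role. These permissions allow S3 Batch Operations to call S3 Object Lock on your behalf.

That page is a bit roundabout, so here is a more direct one: Locking objects with Object Lock

Legal holds
With Object Lock, you can also place a legal hold on an object version. Like a retention period, a legal hold prevents an object version from being overwritten or deleted. However, a legal hold doesn't have an associated fixed amount of time and remains in effect until removed. Legal holds can be freely placed and removed by any user who has the s3:PutObjectLegalHold permission.

As you can see, legal holds fit the requirements exactly: objects stay locked for an indefinite period, and users with the permission can freely remove the hold.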
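The two pieces of option 4️⃣ can be sketched as plain data: the request that places or removes a legal hold, and an IAM policy that grants only `s3:PutObjectLegalHold`. The helper names, the bucket name `example-bucket`, and the object key are hypothetical, and the code only builds dictionaries without calling AWS.

```python
def legal_hold_request(bucket, key, on=True):
    """Build keyword arguments for the S3 put_object_legal_hold operation."""
    return {
        "Bucket": bucket,
        "Key": key,
        "LegalHold": {"Status": "ON" if on else "OFF"},
    }


def legal_hold_policy(bucket):
    """Build an IAM policy letting a user place and remove legal holds."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObjectLegalHold",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


place_hold = legal_hold_request("example-bucket", "records/report.pdf")
release_hold = legal_hold_request("example-bucket", "records/report.pdf", on=False)
policy = legal_hold_policy("example-bucket")
```

Only users who hold this permission can switch the hold to `OFF`, after which the object can be modified or deleted. That matches the "nonspecific amount of time" in the question better than a fixed 100-year retention period.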

👨‍👨‍👦‍👦 Community discussion: A - No as “specific users can delete”
B - No as “nonspecific amount of time”
C - No as “prevent the data from being changed”
D - The answer: “The Object Lock legal hold operation enables you to place a legal hold on an object version. Like setting a retention period, a legal hold prevents an object version from being overwritten or deleted. However, a legal hold doesn’t have an associated retention period and remains in effect until removed.” https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-legal-hold.html

10. Reduce coupling

A social media company allows users to upload images to its website. The website runs on Amazon EC2 instances. During upload requests, the website resizes the images to a standard size and stores the resized images in Amazon S3. Users are experiencing slow upload requests to the website.
The company needs to reduce coupling within the application and improve website performance. A solutions architect must design the most operationally efficient process for image uploads.
Which combination of actions should the solutions architect take to meet these requirements? (Choose two.)

Configure the application to upload images to S3 Glacier.
✅ Configure the web server to upload the original images to Amazon S3.
Configure the application to upload images directly from each user’s browser to Amazon S3 through the use of a presigned URL
✅ Configure S3 Event Notifications to invoke an AWS Lambda function when an image is uploaded. Use the function to resize the image.
Create an Amazon EventBridge (Amazon CloudWatch Events) rule that invokes an AWS Lambda function on a schedule to resize uploaded images.

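✨ Keywords: reduce coupling

2️⃣ 4️⃣ ✅

💡 Analysis: Users upload images to a social media website. During the upload request the application resizes the image and then stores it in S3. Uploads are slow, and the company wants to reduce coupling.
My first thought was to use SQS to handle the upload and the processing asynchronously, but no option offers that.
One option stores the original image first and then uses an event notification to invoke a Lambda function that processes it, which also looks workable.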
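The event-driven half of this design can be sketched as a Lambda handler that reads the S3 event notification. The resize step itself is left out. The handler below only works out which object was uploaded and where a resized copy would go. The bucket names and the `-resized` suffix are assumptions made for this example.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Hypothetical Lambda entry point: plan where each resized copy goes."""
    return [
        {"source": f"{bucket}/{key}", "target": f"{bucket}-resized/{key}"}
        for bucket, key in uploaded_objects(event)
    ]


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "photos/my+cat.jpg"}}}
    ]
}
result = handler(sample_event, None)
```

Writing the resized image to a different bucket (or prefix) than the one that triggers the notification avoids the function invoking itself in a loop.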

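Many in the community chose 3️⃣: use a presigned URL so users upload files directly to the S3 bucket, bypassing the backend.
But presigned URLs are meant for temporary access: Download and upload objects with presigned URLs

You can use presigned URLs to grant time-limited access to objects in Amazon S3 without updating your bucket policy. A presigned URL can be entered in a browser or used by a program to download an object. The credentials used by the presigned URL are those of the AWS user who generated the URL.

In my view, exposing automatically generated presigned URLs for an S3 bucket to customers on a public web page is not a good idea from a security standpoint.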

👨‍👨‍👦‍👦 Community discussion: To meet the requirements of reducing coupling within the application and improving website performance, the solutions architect should consider taking the following actions:
C. Configure the application to upload images directly from each user’s browser to Amazon S3 through the use of a pre-signed URL. This will allow the application to upload images directly to S3 without having to go through the web server, which can reduce the load on the web server and improve performance.
D. Configure S3 Event Notifications to invoke an AWS Lambda function when an image is uploaded. Use the function to resize the image. This will allow the application to resize images asynchronously, rather than having to do it synchronously during the upload request, which can improve performance.

11. HA for MQ application

A company recently migrated a message processing system to AWS. The system receives messages into an ActiveMQ queue running on an Amazon EC2 instance. Messages are processed by a consumer application running on Amazon EC2. The consumer application processes the messages and writes results to a MySQL database running on Amazon EC2. The company wants this application to be highly available with low operational complexity.
Which architecture offers the HIGHEST availability?

Add a second ActiveMQ server to another Availability Zone. Add an additional consumer EC2 instance in another Availability Zone. Replicate the MySQL database to another Availability Zone.
Use Amazon MQ with active/standby brokers configured across two Availability Zones. Add an additional consumer EC2 instance in another Availability Zone. Replicate the MySQL database to another Availability Zone.
Use Amazon MQ with active/standby brokers configured across two Availability Zones. Add an additional consumer EC2 instance in another Availability Zone. Use Amazon RDS for MySQL with Multi-AZ enabled.
✅ Use Amazon MQ with active/standby brokers configured across two Availability Zones. Add an Auto Scaling group for the consumer EC2 instances across two Availability Zones. Use Amazon RDS for MySQL with Multi-AZ enabled.

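✨ Keywords: ActiveMQ, MySQL database running on Amazon EC2, HA

4️⃣ ✅

💡 Analysis: The current architecture uses a self-managed MQ and a self-managed database, and the consumer also runs on EC2. The company wants the new architecture to be highly available with less operational work.
The managed Amazon MQ and RDS services solve this, and 4️⃣ additionally runs the consumers in an Auto Scaling group, which makes it the most complete.

Amazon MQ: Fully managed service for open-source message brokers

Message brokers allow software systems, which often use different programming languages on various platforms, to communicate and exchange information. Amazon MQ is a managed message broker service for Apache ActiveMQ and RabbitMQ that streamlines setup, operation, and management of message brokers on AWS. With a few steps, Amazon MQ can provision your message broker with support for software version upgrades.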

👨‍👨‍👦‍👦 Community discussion: Answer is D as the “HIGHEST available” and less “operational complex”
The “Amazon RDS for MySQL with Multi-AZ enabled” option excludes A and B
The “Auto Scaling group” is more available and reduces operational complexity in case of incidents (as remediation it is automated) than just adding one more instance. This excludes C.
C and D to choose from based on
D over C since is configured

12. Container Auto Scaling

A company hosts a containerized web application on a fleet of on-premises servers that process incoming requests. The number of requests is growing quickly. The on-premises servers cannot handle the increased number of requests. The company wants to move the application to AWS with minimum code changes and minimum development effort.
Which solution will meet these requirements with the LEAST operational overhead?

✅ Use AWS Fargate on Amazon Elastic Container Service (Amazon ECS) to run the containerized web application with Service Auto Scaling. Use an Application Load Balancer to distribute the incoming requests.
Use two Amazon EC2 instances to host the containerized web application. Use an Application Load Balancer to distribute the incoming requests.
Use AWS Lambda with a new code that uses one of the supported languages. Create multiple Lambda functions to support the load. Use Amazon API Gateway as an entry point to the Lambda functions.
Use a high performance computing (HPC) solution such as AWS ParallelCluster to establish an HPC cluster that can process the incoming requests at the appropriate scale.

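✨ Keywords: containerized

1️⃣ ✅

💡 Analysis: Traffic to a containerized web application is growing fast and the application must move to AWS, with minimal code changes and development effort. It also needs to scale elastically.
Since the application is already containerized, the migration points to ECS or EKS, and AWS Fargate makes elastic scaling easy to achieve.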

👨‍👨‍👦‍👦 Community discussion: Less operational overhead means A: Fargate (no EC2), move the containers on ECS, autoscaling for growth and ALB to balance consumption.
B - requires configure EC2
C - requires add code (developers)
D - seems like the most complex approach, like re-architecting the app to take advantage of an HPC platform.

13. AWS Snow Family and Glue

A company uses 50 TB of data for reporting. The company wants to move this data from on premises to AWS. A custom application in the company’s data center runs a weekly data transformation job. The company plans to pause the application until the data transfer is complete and needs to begin the transfer process as soon as possible.
The data center does not have any available network bandwidth for additional workloads. A solutions architect must transfer the data and must configure the transformation job to continue to run in the AWS Cloud.
Which solution will meet these requirements with the LEAST operational overhead?

Use AWS DataSync to move the data. Create a custom transformation job by using AWS Glue.
Order an AWS Snowcone device to move the data. Deploy the transformation application to the device.
✅ Order an AWS Snowball Edge Storage Optimized device. Copy the data to the device. Create a custom transformation job by using AWS Glue.
❌ Order an AWS Snowball Edge Storage Optimized device that includes Amazon EC2 compute. Copy the data to the device. Create a new EC2 instance on AWS to run the transformation application.

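✨ Keywords: 50 TB of data, not have any available network bandwidth for additional workloads, transformation job to continue to run in the AWS Cloud

4️⃣ ❌ -> 3️⃣ ✅

💡 Analysis: 50 TB of on-premises data must be transferred to AWS. A weekly on-premises job uses this data, and the company plans to pause that application until the transfer completes. Afterwards the transformation job must also run on AWS. The question asks for the simplest solution.
With no spare network bandwidth, DataSync is out. Snowcone holds at most 14 TB, which is not enough. A single Snowball Edge device is enough for 50 TB.
3️⃣ and 4️⃣ differ in how they handle the transformation.
3️⃣ uses AWS Glue, a data integration (collect and move) tool, which does support data transformation.
4️⃣ deploys the application on EC2, which is reasonable but takes more operational work.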
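A quick back-of-the-envelope calculation shows why a network transfer is unattractive even if some bandwidth were available. The 100 Mbps and 1 Gbps figures are hypothetical (the question says there is no spare bandwidth at all), and the estimate assumes decimal units and a fully saturated link with no protocol overhead.

```python
def transfer_days(data_tb, bandwidth_mbps):
    """Estimate days needed to move `data_tb` terabytes over a link.

    Assumes 1 TB = 10**12 bytes and that the link runs at full speed
    for the whole transfer.
    """
    bits = data_tb * 10**12 * 8
    seconds = bits / (bandwidth_mbps * 10**6)
    return seconds / 86400


days_at_100mbps = transfer_days(50, 100)   # roughly 46 days
days_at_1gbps = transfer_days(50, 1000)    # roughly 4.6 days
```

Compared with the roughly one week for shipping a Snowball Edge device mentioned in the community discussion, only a dedicated gigabit-class link would be competitive, and the data center has none to spare.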

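What is AWS Glue?

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. You can use it for analytics, machine learning, and application development. It also includes additional productivity and data ops tooling for authoring, running jobs, and implementing business workflows.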

With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Also, you can immediately search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

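What is ETL (extract, transform, load)?

Extract, transform, and load (ETL) is the process of combining data from multiple sources into a large, central repository called a data warehouse. ETL uses a set of business rules to clean and organize raw data and prepare it for storage, data analytics, and machine learning (ML). You can address specific business intelligence needs through data analytics (such as predicting the outcome of business decisions, generating reports and dashboards, reducing operational inefficiency, and more).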

👨‍👨‍👦‍👦 Community discussion: A. Use AWS DataSync to move the data. Create a custom transformation job by using AWS Glue. - No BW available for DataSync, so “asap” will be weeks/months (?)
B. Order an AWS Snowcone device to move the data. Deploy the transformation application to the device. - Snowcone will just store 14TB (SSD configuration).
C. Order an AWS Snowball Edge Storage Optimized device. Copy the data to the device. Create a custom transformation job by using AWS Glue. - SnowBall can store 80TB (ok), takes around 1 week to move the device (faster than A), and AWS Glue allows to do ETL jobs. This is the answer.
D. Order an AWS Snowball Edge Storage Optimized device that includes Amazon EC2 compute. Copy the data to the device. Create a new EC2 instance on AWS to run the transformation application. - Same as C, but the ETL job requires the deployment/configuration/maintenance of an EC2 instance, while Glue is serverless. This means D has more operational overhead than C.

14. Scale

A company has created an image analysis application in which users can upload photos and add photo frames to their images.
The users upload images and metadata to indicate_{show} which photo frames they want to add to their images. The application uses a single Amazon EC2 instance and Amazon DynamoDB to store the metadata.
The application is becoming more popular, and the number of users is increasing. The company expects the number of concurrent users to vary significantly depending on the time of day and day of week. The company must ensure that the application can scale to meet the needs of the growing user base.
Which solution meets these requirements?

Use AWS Lambda to process the photos. Store the photos and metadata in DynamoDB.
Use Amazon Kinesis Data Firehose to process the photos and to store the photos and metadata.
✅ Use AWS Lambda to process the photos. Store the photos in Amazon S3. Retain DynamoDB to store the metadata.
Increase the number of EC2 instances to three. Use Provisioned IOPS SSD (io2) Amazon Elastic Block Store (Amazon EBS) volumes to store the photos and metadata.

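✨ Keywords: scale

3️⃣ ✅

💡 Analysis: The image-processing application consists of EC2 and DynamoDB. The number of users is growing, so it needs an elastic solution.
Use Lambda to process the photos, store them in S3, and keep the metadata in DynamoDB.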

👨‍👨‍👦‍👦 Community discussion: Solution C offloads the photo processing to Lambda. Storing the photos in S3 ensures scalability and durability, while keeping the metadata in DynamoDB allows for efficient querying of the associated information.
Option A does not provide an appropriate solution for storing the photos, as DynamoDB is not suitable for storing large binary data like images.
Option B is more focused on real-time streaming data processing and is not the ideal service for processing and storing photos and metadata in this use case.
Option D involves manual scaling and management of EC2 instances, which is less flexible and more labor-intensive compared to the serverless nature of Lambda. It may not efficiently handle the varying number of concurrent users and can introduce higher operational overhead.

In conclusion, option C provides the best solution for scaling the application to meet the needs of the growing user base by leveraging the scalability and durability of Lambda, S3, and DynamoDB.

15. Gateway endpoint

A medical records company is hosting an application on Amazon EC2 instances. The application processes customer data files that are stored on Amazon S3. The EC2 instances are hosted in public subnets. The EC2 instances access Amazon S3 over the internet, but they do not require any other network access.
A new requirement mandates that the network traffic for file transfers take a private route and not be sent over the internet.
Which change to the network architecture should a solutions architect recommend to meet this requirement?

Create a NAT gateway. Configure the route table for the public subnets to send traffic to Amazon S3 through the NAT gateway.
Configure the security group for the EC2 instances to restrict outbound traffic so that only traffic to the S3 prefix list is permitted.
✅ Move the EC2 instances to private subnets. Create a VPC endpoint for Amazon S3, and link the endpoint to the route table for the private subnets.
Remove the internet gateway from the VPC. Set up an AWS Direct Connect connection, and route traffic to Amazon S3 over the Direct Connect connection.

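✨ Keywords: private route to S3

3️⃣ ✅

💡 Analysis: The EC2 instances need no other network access, but they currently reach S3 over the public internet. The new requirement is to use a private route instead.
First move the EC2 instances to private subnets (or remove the internet route from the current subnets' route table), then create a gateway endpoint for S3 in the VPC and associate it with the subnets' route table so the instances can reach S3.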
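The endpoint step can be sketched as the request passed to the EC2 `create_vpc_endpoint` operation. The code below only builds the keyword arguments. The helper name, VPC ID, region, and route table ID are placeholders chosen for illustration, and no AWS call is made.

```python
def s3_gateway_endpoint_request(vpc_id, region, route_table_ids):
    """Build keyword arguments for creating an S3 gateway endpoint.

    A gateway endpoint is attached to route tables, so traffic from the
    associated subnets to S3 stays on the AWS network.
    """
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


request = s3_gateway_endpoint_request(
    "vpc-example",            # hypothetical VPC ID
    "us-east-1",              # assumed region
    ["rtb-private-example"],  # hypothetical private route table
)
# With boto3 and credentials available:
#   boto3.client("ec2").create_vpc_endpoint(**request)
```

Because the endpoint is linked to the private subnets' route table, no NAT gateway or internet gateway route is needed for S3 traffic.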

👨‍👨‍👦‍👦 Community discussion: Option A (creating a NAT gateway) would not meet the requirement since it still involves sending traffic to S3 over the internet.
NAT gateway is used for outbound internet connectivity from private subnets, but it doesn’t provide a private route for accessing S3.

Option B (configuring security groups) focuses on controlling outbound traffic using security groups. While it can restrict outbound traffic, it doesn’t provide a private route for accessing S3.

Option D (setting up Direct Connect) involves establishing a dedicated private network connection between the on-premises environment and AWS. While it offers private connectivity, it is more suitable for hybrid scenarios and not necessary for achieving private access to S3 within the VPC.

In summary, option C provides a straightforward solution by moving the EC2 instances to private subnets, creating a VPC endpoint for S3, and linking the endpoint to the route table for private subnets. This ensures that file transfer traffic between the EC2 instances and S3 remains within the private network without going over the internet.