SAA Exam Daily Practice - 2024/12/01

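Source: Amazon AWS Certified Solutions Architect - Associate SAA-C03 Exam
30 questions (No.191 ~ No.220). I only recorded the 10 questions that were new to me, that I got wrong, or that I had doubts about. For my own review only.
If this infringes any rights, please contact me and it will be removed.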

🌟 Vocabulary:

comprehend_{v. to understand; to grasp}
transcribe_{v. to record or copy down; to convert into another written form; to represent in phonetic symbols}

1. Medical information

A hospital wants to create digital copies for its large collection of historical written records. The hospital will continue to add hundreds of new documents each day. The hospital’s data team will scan the documents and will upload the documents to the AWS Cloud.
A solutions architect must implement a solution to analyze the documents, extract the medical information, and store the documents so that an application can run SQL queries on the data. The solution must maximize scalability and operational efficiency.
Which combination of steps should the solutions architect take to meet these requirements? (Choose two.)

Write the document information to an Amazon EC2 instance that runs a MySQL database.
✅ Write the document information to an Amazon S3 bucket. Use Amazon Athena to query the data.
Create an Auto Scaling group of Amazon EC2 instances to run a custom application that processes the scanned files and extracts the medical information.
Create an AWS Lambda function that runs when new documents are uploaded. Use Amazon Rekognition to convert the documents to raw text. Use Amazon Transcribe Medical to detect and extract relevant medical information from the text.
✅ Create an AWS Lambda function that runs when new documents are uploaded. Use Amazon Textract to convert the documents to raw text. Use Amazon Comprehend Medical to detect and extract relevant medical information from the text.

✨ Keywords: documents, SQL queries, medical information

2️⃣ 5️⃣ ✅

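💡 Analysis: There is already a large collection of documents, and hundreds more arrive every day. The document data must be queryable with SQL, and the medical information must be extracted.
A quick review of the AI services involved:

Amazon Rekognition - image and video recognition

Amazon Transcribe Medical - medical speech-to-text

Amazon Textract - document text detection and analysis, information extraction

Amazon Comprehend Medical - medical text analysis and information extraction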

What is Amazon Rekognition?

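Amazon Rekognition is a cloud-based image and video analysis service that makes it easy to add advanced computer vision capabilities to your applications. The service is powered by proven deep learning technology and requires no machine learning expertise to use. Amazon Rekognition includes a simple, easy-to-use API that can quickly analyze any image or video file stored in Amazon S3.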

Amazon Transcribe Medical

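Amazon Transcribe Medical is an automatic speech recognition (ASR) service that makes it easy to add medical speech-to-text capabilities to voice-enabled applications. Conversations between health care providers and patients form the basis of a patient's diagnosis and treatment plan as well as of clinical documentation workflows, so it is critically important that this information is accurate. However, accurate medical transcription solutions, such as dictation recorders and scribes, are expensive, time-consuming, and disruptive to the patient experience. Some organizations use existing medical transcription software but find it inefficient and low in quality.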

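What is Amazon Textract?

Amazon Textract makes it easy to add document text detection and analysis to your applications.
Common use cases for Amazon Textract include:

Creating smart search indexes

Using intelligent text extraction for natural language processing (NLP)

Accelerating the capture and normalization of data from different sources

Automatically capturing data from forms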

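What is Amazon Comprehend Medical?

Amazon Comprehend Medical detects and returns useful information in unstructured clinical text such as physician's notes, discharge summaries, test results, and case notes. Amazon Comprehend Medical uses natural language processing (NLP) models to detect entities, which are textual references to medical information such as medical conditions, medications, or Protected Health Information (PHI). For the complete list of detected entities, see Detect entities (version 2). Amazon Comprehend Medical also lets users link these detected entities to standardized medical knowledge bases, such as RxNorm and ICD-10-CM, through ontology linking operations.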
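To see how the pieces of option 5️⃣ fit together, here is a minimal Python sketch of the upload-triggered Lambda logic. It assumes the Lambda receives a standard S3 event notification and that the caller passes in boto3 clients (`boto3.client("textract")`, `boto3.client("comprehendmedical")`); the function names `extract_medical_entities` and `to_json_lines` are hypothetical. It uses the synchronous `detect_document_text` call, which suits single-page documents; multi-page PDFs would need Textract's asynchronous text-detection API instead.

```python
import json
import urllib.parse


def extract_medical_entities(event, textract, comprehend_medical):
    """Sketch of the upload-triggered Lambda logic from option 5.

    `textract` and `comprehend_medical` are injected so the logic can run without AWS;
    inside Lambda you would pass boto3.client("textract") and
    boto3.client("comprehendmedical").
    """
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys (a space arrives as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Synchronous OCR of a single-page document stored in S3.
        ocr = textract.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        # Keep only LINE blocks; PAGE and WORD blocks are skipped.
        lines = [b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE"]
        text = "\n".join(lines)

        # Medical entity detection: conditions, medications, PHI, and so on.
        response = comprehend_medical.detect_entities_v2(Text=text)
        results.append({
            "document": f"s3://{bucket}/{key}",
            "entities": [
                {"text": e["Text"], "category": e["Category"], "type": e["Type"]}
                for e in response["Entities"]
            ],
        })
    return results


def to_json_lines(results):
    """Serialize one JSON object per line (JSON Lines)."""
    return "\n".join(json.dumps(r) for r in results)
```

Writing the JSON Lines output to S3 and defining an Athena table over it would cover the SQL side (option 2️⃣); that part is left out of this sketch.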

👨‍👨‍👦‍👦 Community discussion: B and E are correct. Textract extracts text from files. Rekognition can also be used for text detection, but option D follows Rekognition with Transcribe, and Transcribe is for speech to text, so option D may not be valid.

2. DynamoDB: keep data for 30 days

A company runs an application on a large fleet of Amazon EC2 instances. The application reads and writes entries into an Amazon DynamoDB table. The size of the DynamoDB table continuously grows, but the application needs only data from the last 30 days. The company needs a solution that minimizes cost and development effort.
Which solution meets these requirements?

❌ Use an AWS CloudFormation template to deploy the complete solution. Redeploy the CloudFormation stack every 30 days, and delete the original stack.
Use an EC2 instance that runs a monitoring application from AWS Marketplace. Configure the monitoring application to use Amazon DynamoDB Streams to store the timestamp when a new item is created in the table. Use a script that runs on the EC2 instance to delete items that have a timestamp that is older than 30 days.
Configure Amazon DynamoDB Streams to invoke an AWS Lambda function when a new item is created in the table. Configure the Lambda function to delete items in the table that are older than 30 days.
✅ Extend the application to add an attribute that has a value of the current timestamp plus 30 days to each new item that is created in the table. Configure DynamoDB to use the attribute as the TTL attribute.

✨ Keywords: DynamoDB, needs only data from the last 30 days

1️⃣ ❌ -> 4️⃣ ✅

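💡 Analysis: DynamoDB needs to get rid of data older than 30 days.
I had forgotten what AWS CloudFormation is: What is AWS CloudFormation?

AWS CloudFormation is a service that helps you model and set up your AWS resources so that you can spend less time managing those resources and more time on the applications that run in AWS. You create a template that describes all the AWS resources you want (such as Amazon EC2 instances or Amazon RDS DB instances), and CloudFormation takes care of provisioning and configuring those resources for you. You don't need to individually create and configure AWS resources and figure out what depends on what; CloudFormation handles that.

It is essentially an IaC (Infrastructure as Code) tool like Terraform, so 1️⃣ is definitely out: redeploying the stack and deleting the original one would typically delete the whole table, including the last 30 days of data the application still needs.
3️⃣ is actually the approach I use in one of my own projects: whenever new data comes in, delete the old data at the same time. The problem is what happens if no new data arrives for a long time.
4️⃣ does require a code change, but if DynamoDB cannot itself tell when an item was inserted, it is already the best option.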

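Let's look at what DynamoDB TTL is: Using time to live (TTL) in DynamoDB

Time to Live (TTL) for DynamoDB is a cost-effective way to delete items that are no longer relevant. TTL lets you define a per-item expiration timestamp that indicates when an item is no longer needed. DynamoDB automatically deletes expired items within a few days of their expiration time, without consuming write throughput.

To use TTL, first enable it on the table and then define a specific attribute to store the TTL expiration timestamp. The timestamp must be stored in Unix epoch time format with seconds granularity. Each time an item is created or updated, you can compute the expiration time and save it in the TTL attribute.

This is clearly what the question is testing.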
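A minimal sketch of option 4️⃣ in Python, assuming a hypothetical table named `orders` and a hypothetical TTL attribute named `expire_at`: the application stamps each new item with "now + 30 days" as a Unix epoch timestamp in seconds, and the one-time `update_time_to_live` call (shown only as a comment, since it needs AWS credentials) tells DynamoDB which attribute to watch.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def with_ttl(item, days=30, now=None, attribute="expire_at"):
    """Return a copy of `item` with a TTL attribute set to now + `days`.

    "expire_at" is a hypothetical attribute name; it must match the attribute
    configured as the table's TTL attribute. The value is Unix epoch seconds.
    """
    now = int(time.time()) if now is None else int(now)
    stamped = dict(item)  # do not mutate the caller's dict
    stamped[attribute] = now + days * SECONDS_PER_DAY
    return stamped


# Enabling TTL is a one-time table setting, e.g. with boto3:
#   boto3.client("dynamodb").update_time_to_live(
#       TableName="orders",
#       TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_at"},
#   )
# After that, every write just carries the attribute, e.g. with the resource API:
#   boto3.resource("dynamodb").Table("orders").put_item(
#       Item=with_ttl({"pk": "order#1", "total": 42}))
```

Deletion happens in the background after the timestamp passes, so a read can still return an expired item for a while; filtering on `expire_at` at read time covers that gap if it matters.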

👨‍👨‍👦‍👦 Community discussion: Changing my answer to D after researching a bit.
The DynamoDB TTL feature allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming any write throughput.

3. Amazon Transcribe

A telemarketing company is designing its customer call center functionality on AWS. The company needs a solution that provides multiple speaker recognition and generates transcript files. The company wants to query the transcript files to analyze the business patterns. The transcript files must be stored for 7 years for auditing purposes.
Which solution will meet these requirements?

Use Amazon Rekognition for multiple speaker recognition. Store the transcript files in Amazon S3. Use machine learning models for transcript file analysis.
✅ Use Amazon Transcribe for multiple speaker recognition. Use Amazon Athena for transcript file analysis.
Use Amazon Translate for multiple speaker recognition. Store the transcript files in Amazon Redshift. Use SQL queries for transcript file analysis.
Use Amazon Rekognition for multiple speaker recognition. Store the transcript files in Amazon S3. Use Amazon Textract for transcript file analysis.

✨ Keywords: generates transcript files from voice, file analysis

2️⃣ ✅

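💡 Analysis: Speech must be converted to text, and the text must be stored and analyzed.
Amazon Rekognition is an image and video analysis service, suited to tasks such as labeling content or detecting whether an object is present; it does not transcribe speech.
AI services involved in the question:

Amazon Transcribe - speech-to-text

Amazon Translate - machine-learning-based translation service

Amazon Textract - document text detection and analysis, information extraction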

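What is Amazon Transcribe?

Amazon Transcribe is an automatic speech recognition service that uses machine learning models to convert audio to text. You can use Amazon Transcribe as a standalone transcription service or add speech-to-text capabilities to any application.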
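A sketch of what "multiple speaker recognition" looks like when starting a batch job: the dict below is the keyword arguments for boto3's `start_transcription_job`, with speaker diarization turned on through `Settings`. The job name, media URI, and bucket name are hypothetical, and `transcription_job_request` is just a helper name for this note.

```python
def transcription_job_request(job_name, media_uri, output_bucket,
                              max_speakers=2, language_code="en-US"):
    """Build kwargs for boto3 transcribe.start_transcription_job with speaker labels."""
    if max_speakers < 2:
        raise ValueError("speaker diarization needs MaxSpeakerLabels >= 2")
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
        # The transcript JSON is written to this bucket, where it can be kept and queried.
        "OutputBucketName": output_bucket,
        "Settings": {
            "ShowSpeakerLabels": True,         # tag segments with speaker labels
            "MaxSpeakerLabels": max_speakers,  # required together with ShowSpeakerLabels
        },
    }


# Usage (needs AWS credentials):
#   boto3.client("transcribe").start_transcription_job(
#       **transcription_job_request("call-0001", "s3://example-calls/call-0001.wav",
#                                   "example-transcripts", max_speakers=2))
```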

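What is Amazon Translate?

Amazon Translate is a text translation service that uses advanced machine learning technologies to provide high-quality translation on demand. You can use Amazon Translate to translate unstructured text documents or to build applications that work in multiple languages.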

👨‍👨‍👦‍👦 Community discussion: The correct answer is B: Use Amazon Transcribe for multiple speaker recognition. Use Amazon Athena for transcript file analysis.
Amazon Transcribe is a service that automatically transcribes spoken language into written text. It can handle multiple speakers and can generate transcript files in real time or asynchronously. These transcript files can be stored in Amazon S3 for long-term storage.
Amazon Athena is a query service that allows you to analyze data stored in Amazon S3 using SQL. You can use it to analyze the transcript files and identify patterns in the data.

Option A is incorrect because Amazon Rekognition is a service for analyzing images and videos, not transcribing spoken language.
Option C is incorrect because Amazon Translate is a service for translating text from one language to another, not transcribing spoken language.
Option D is incorrect because Amazon Textract is a service for extracting text and data from documents and images, not transcribing spoken language.
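To make the Athena half concrete, a sketch of the keyword arguments for boto3's `start_query_execution`. It assumes, hypothetically, that the transcripts have already been flattened into a table named `call_transcripts` with a `speaker` column; the raw transcript JSON from Transcribe is nested, so some preparation step or a table definition that handles the nesting would be needed first. Database name and result location are placeholders.

```python
# Hypothetical table `call_transcripts` (one row per utterance) defined over S3.
SPEAKER_SQL = """
SELECT speaker, count(*) AS utterances
FROM call_transcripts
GROUP BY speaker
ORDER BY utterances DESC
"""


def athena_query_request(sql, database, results_s3_uri):
    """Build kwargs for boto3 athena.start_query_execution."""
    if not results_s3_uri.startswith("s3://"):
        raise ValueError("Athena writes query results to an S3 location")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": results_s3_uri},
    }


# Usage (needs AWS credentials):
#   boto3.client("athena").start_query_execution(
#       **athena_query_request(SPEAKER_SQL, "callcenter", "s3://example-athena-results/"))
```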

4. Amazon Cognito and API access

A company hosts its application on AWS. The company uses Amazon Cognito to manage users. When users log in to the application, the application fetches required data from Amazon DynamoDB by using a REST API that is hosted in Amazon API Gateway. The company wants an AWS managed solution that will control access to the REST API to reduce development efforts.
Which solution will meet these requirements with the LEAST operational overhead?

Configure an AWS Lambda function to be an authorizer in API Gateway to validate which user made the request.
For each user, create and assign an API key that must be sent with each request. Validate the key by using an AWS Lambda function.
Send the user’s email address in the header with every request. Invoke an AWS Lambda function to validate that the user with that email address has proper access.
✅ Configure an Amazon Cognito user pool authorizer in API Gateway to allow Amazon Cognito to validate each request.

✨ Keywords: Amazon Cognito, control access to the REST API

4️⃣ ✅

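💡 Analysis: Users are managed with Amazon Cognito; after logging in, they fetch DynamoDB data through an API exposed by Amazon API Gateway. The company now wants an AWS service to handle access control for that API.
Using an Amazon Cognito user pool to control user access to a REST API is the solution AWS documents for this.
It breaks down into roughly 3 steps:

Create a user pool in the Amazon Cognito console

In the API Gateway console, create an API Gateway authorizer that uses the selected user pool

In the API Gateway console, enable the authorizer on the chosen API methods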

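What is Amazon Cognito?

Amazon Cognito is an identity platform for web and mobile apps. It is a user directory, an authentication server, and an authorization service for OAuth 2.0 access tokens and AWS credentials. With Amazon Cognito, you can authenticate and authorize users from the built-in user directory, from your enterprise directory, and from consumer identity providers such as Google and Facebook.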

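Control access to a REST API using an Amazon Cognito user pool as the authorizer

As an alternative to using IAM roles and policies or Lambda authorizers (formerly known as custom authorizers), you can use an Amazon Cognito user pool to control who can access your API in Amazon API Gateway.

To use an Amazon Cognito user pool with your API, you must first create an authorizer of the COGNITO_USER_POOLS type and then configure an API method to use that authorizer. After the API is deployed, the client must first sign the user in to the user pool, obtain an identity token or access token for the user, and then call the API method with one of the tokens, typically set in the request's Authorization header. The API call succeeds only if the required token is supplied and is valid; otherwise the client is not authorized to make the call, because it has no credentials that can be authorized.

The identity token is used to authorize API calls based on the identity claims of the signed-in user. The access token is used to authorize API calls based on the custom scopes of the specified access-protected resources.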

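To create and configure an Amazon Cognito user pool for your API, perform the following tasks:

Use the Amazon Cognito console, CLI/SDK, or API to create a user pool, or use a user pool owned by another AWS account.

Use the API Gateway console, CLI/SDK, or API to create an API Gateway authorizer with the chosen user pool.

Use the API Gateway console, CLI/SDK, or API to enable the authorizer on the selected API methods.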
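The three tasks above can be sketched as boto3 request payloads. The REST API ID, user pool ARN, authorizer ID, and helper function names are hypothetical; the payloads are meant for `apigateway.create_authorizer` and `apigateway.update_method`, and the client then sends the user's token in the `Authorization` header.

```python
def cognito_authorizer_request(rest_api_id, user_pool_arn, name="CognitoAuth"):
    """kwargs for boto3 apigateway.create_authorizer (REST API, Cognito user pool)."""
    return {
        "restApiId": rest_api_id,
        "name": name,
        "type": "COGNITO_USER_POOLS",
        "providerARNs": [user_pool_arn],
        # API Gateway reads the ID or access token from this request header.
        "identitySource": "method.request.header.Authorization",
    }


def method_auth_patch(authorizer_id):
    """patchOperations for boto3 apigateway.update_method to require the authorizer."""
    return [
        {"op": "replace", "path": "/authorizationType", "value": "COGNITO_USER_POOLS"},
        {"op": "replace", "path": "/authorizerId", "value": authorizer_id},
    ]


def auth_headers(token):
    """Client side: send the Cognito ID or access token in the Authorization header."""
    return {"Authorization": token}
```

After a method's authorization is changed, the API still has to be redeployed to a stage (for example with `apigateway.create_deployment`) before the change takes effect.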

👨‍👨‍👦‍👦 Community discussion: KEYWORD: LEAST operational overhead
To control access to the REST API and reduce development efforts, the company can use an Amazon Cognito user pool authorizer in API Gateway. This will allow Amazon Cognito to validate each request and ensure that only authenticated users can access the API. This solution has the LEAST operational overhead, as it does not require the company to develop and maintain any additional infrastructure or code.
Therefore, Option D is the correct answer.
Option D. Configure an Amazon Cognito user pool authorizer in API Gateway to allow Amazon Cognito to validate each request.

5. SMS messages

A company is developing a marketing communications service that targets mobile app users. The company needs to send confirmation messages with Short Message Service (SMS) to its users. The users must be able to reply to the SMS messages.
The company must store the responses for a year for analysis.
What should a solutions architect do to meet these requirements?

❌ Create an Amazon Connect contact flow to send the SMS messages. Use AWS Lambda to process the responses.
✅ Build an Amazon Pinpoint journey. Configure Amazon Pinpoint to send events to an Amazon Kinesis data stream for analysis and archiving.
Use Amazon Simple Queue Service (Amazon SQS) to distribute the SMS messages. Use AWS Lambda to process the responses.
Create an Amazon Simple Notification Service (Amazon SNS) FIFO topic. Subscribe an Amazon Kinesis data stream to the SNS topic for analysis and archiving.

✨ Keywords: send SMS and get the reply

1️⃣ ❌ -> 2️⃣ ✅

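💡 Analysis: The company needs to send SMS messages and receive replies, and the replies must be kept for a year for analysis.

What is Amazon Pinpoint?

Amazon Pinpoint is an AWS service that you can use to engage with your customers across multiple messaging channels. You can use Amazon Pinpoint to send push notifications, in-app notifications, emails, text messages, voice messages, and messages over custom channels. It includes segmentation, campaign, and journey features that help you send the right message to the right customer at the right time over the right channel.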

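Amazon Pinpoint can clearly send SMS, but what about receiving the responses?
That requires two-way SMS: Set up two-way SMS messaging for a phone number in AWS End User Messaging SMS

AWS End User Messaging SMS includes support for two-way SMS. When you set up two-way SMS, you can receive incoming messages from your customers. You can also use two-way messaging with other AWS services, such as Lambda and Amazon Lex, to create interactive text messaging experiences.

When one of your customers sends a message to your phone number, the message body is sent to an Amazon SNS topic or an Amazon Connect instance for processing.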
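A sketch of the processing side for replies that arrive through the SNS topic: a Lambda subscribed to the topic receives the SMS as a JSON string inside each SNS record. The field names `originationNumber`, `destinationNumber`, and `messageBody` follow the two-way SMS inbound message format, but it is worth checking them against a real payload; `parse_inbound_sms` is a hypothetical helper name.

```python
import json


def parse_inbound_sms(sns_event):
    """Extract replies from an SNS-triggered Lambda event carrying two-way SMS messages."""
    replies = []
    for record in sns_event["Records"]:
        # SNS delivers the inbound SMS payload as a JSON-encoded string.
        message = json.loads(record["Sns"]["Message"])
        replies.append({
            "from": message["originationNumber"],   # customer's phone number
            "to": message["destinationNumber"],     # your two-way SMS number
            "body": message["messageBody"],         # the reply text
        })
    return replies
```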

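As for keeping the data for a year, Amazon Kinesis Data Streams can handle that: Changing the data retention period

Amazon Kinesis Data Streams supports changing the data record retention period of a data stream. A Kinesis data stream is an ordered sequence of data records meant to be written to and read from in real time, so data records are stored in the shards of your stream only temporarily. The time period from when a record is added to when it is no longer accessible is called the retention period. A Kinesis data stream stores records for 24 hours by default, and the retention period can be increased up to 8,760 hours (365 days).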
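Raising retention to a full year is a single API call. A sketch of the keyword arguments for boto3's `increase_stream_retention_period`, with a guard for the 24-hour default and the 8,760-hour maximum quoted above; the stream name `sms-events` is hypothetical.

```python
HOURS_PER_DAY = 24
DEFAULT_RETENTION_HOURS = 24                 # Kinesis Data Streams default
MAX_RETENTION_HOURS = 365 * HOURS_PER_DAY    # 8760 hours


def retention_request(stream_name, days):
    """kwargs for boto3 kinesis.increase_stream_retention_period."""
    hours = days * HOURS_PER_DAY
    # An increase must go above the default and cannot exceed the maximum.
    if hours <= DEFAULT_RETENTION_HOURS or hours > MAX_RETENTION_HOURS:
        raise ValueError(f"retention must be above 24 and at most {MAX_RETENTION_HOURS} hours")
    return {"StreamName": stream_name, "RetentionPeriodHours": hours}


# Usage (needs AWS credentials):
#   boto3.client("kinesis").increase_stream_retention_period(
#       **retention_request("sms-events", 365))
```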

👨‍👨‍👦‍👦 Community discussion: By using Pinpoint, the company can effectively send SMS messages to its mobile app users. Additionally, Pinpoint allows the configuration of journeys, which enable the tracking and management of user interactions. The events generated during the journey, including user responses to SMS, can be captured and sent to a Kinesis data stream. This data stream can then be used for analysis and archiving purposes.

A. Creating an Amazon Connect contact flow is primarily focused on customer support and engagement, and it lacks the capability to store and process SMS responses for analysis.
C. SQS is a message queuing service and is not specifically designed for handling SMS responses or capturing them for analysis.
D. Creating an SNS FIFO topic and subscribing a Kinesis data stream is not the most appropriate solution for capturing and storing SMS responses, as SNS is primarily used for message publishing and distribution.

In summary, option B is the best choice, as it leverages Pinpoint to send SMS messages and captures user responses for analysis and archiving using a Kinesis data stream.

6. Data Lake and fine-grained permissions

An online retail company has more than 50 million active customers and receives more than 25,000 orders each day. The company collects purchase data for customers and stores this data in Amazon S3. Additional customer data is stored in Amazon RDS.
The company wants to make all the data available to various teams so that the teams can perform analytics. The solution must provide the ability to manage fine-grained permissions for the data and must minimize operational overhead.
Which solution will meet these requirements?

Migrate the purchase data to write directly to Amazon RDS. Use RDS access controls to limit access.
Schedule an AWS Lambda function to periodically copy data from Amazon RDS to Amazon S3. Create an AWS Glue crawler. Use Amazon Athena to query the data. Use S3 policies to limit access.
✅ Create a data lake by using AWS Lake Formation. Create an AWS Glue JDBC connection to Amazon RDS. Register the S3 bucket in Lake Formation. Use Lake Formation access controls to limit access.
Create an Amazon Redshift cluster. Schedule an AWS Lambda function to periodically copy data from Amazon S3 and Amazon RDS to Amazon Redshift. Use Amazon Redshift access controls to limit access.

✨ Keywords: get data from S3 and RDS, fine-grained permissions

3️⃣ ✅

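💡 Analysis: Data in S3 and RDS must be brought together, with fine-grained permission control.
This clearly calls for a data lake or a data warehouse.
Let's look at AWS Lake Formation: What is AWS Lake Formation?

AWS Lake Formation helps you centrally govern, secure, and globally share data for analytics and machine learning. With Lake Formation, you can manage fine-grained access control for your data lake data on Amazon Simple Storage Service (Amazon S3) and its metadata in the AWS Glue Data Catalog.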
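Fine-grained here can go down to the column level. A sketch of the keyword arguments for boto3's `lakeformation.grant_permissions` that lets one team's role `SELECT` only certain columns of a cataloged table; the role ARN, database, table, and column names are hypothetical.

```python
def column_grant_request(principal_arn, database, table, columns):
    """kwargs for boto3 lakeformation.grant_permissions: SELECT on selected columns only."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),  # columns outside this list stay hidden
            }
        },
        "Permissions": ["SELECT"],
    }


# Usage (needs AWS credentials):
#   boto3.client("lakeformation").grant_permissions(**column_grant_request(
#       "arn:aws:iam::111122223333:role/marketing-analysts",
#       "retail", "customers", ["customer_id", "region"]))
```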

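A perfect match for the requirements.
Now compare Redshift, the data warehouse option: What is Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift Serverless lets you access and analyze data without any of the configuration work of a provisioned data warehouse.

Database security

You manage database security by controlling which users have access to which database objects. You can assign roles or groups to users, and the privileges granted to users, roles, or groups determine which database objects they can access.

4️⃣ also looks like it could do the job, but it takes more operational work than 3️⃣: a Redshift cluster plus scheduled Lambda copy jobs have to be run and maintained.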

👨‍👨‍👦‍👦 Community discussion: Answer: C, keyword "manage fine-grained"
https://aws.amazon.com/blogs/big-data/manage-fine-grained-access-control-using-aws-lake-formation/

Lake Formation enables the creation of a secure and scalable data lake on AWS, allowing centralized access controls for both S3 and RDS data. By using Lake Formation, the company can manage permissions effectively and integrate RDS data through the AWS Glue JDBC connection. Registering the S3 bucket in Lake Formation ensures unified access control. This solution reduces operational overhead while providing fine-grained permissions management.
A. Directly writing purchase data to Amazon RDS with RDS access controls lacks comprehensive permissions management for both S3 and RDS data.
B. Periodically copying data from RDS to S3 using Lambda and using AWS Glue and Athena for querying does not offer fine-grained permissions management and introduces data synchronization complexities.
D. Creating a Redshift cluster and copying data from S3 and RDS to Redshift adds complexity and operational overhead without the flexibility of Lake Formation's permissions management capabilities.

7. EC2 connect to S3

A company needs to move data from an Amazon EC2 instance to an Amazon S3 bucket. The company must ensure that no API calls and no data are routed through public internet routes. Only the EC2 instance can have access to upload data to the S3 bucket.
Which solution will meet these requirements?

✅ Create an interface VPC endpoint for Amazon S3 in the subnet where the EC2 instance is located. Attach a resource policy to the S3 bucket to only allow the EC2 instance’s IAM role for access.
❌ Create a gateway VPC endpoint for Amazon S3 in the Availability Zone where the EC2 instance is located. Attach appropriate security groups to the endpoint. Attach a resource policy to the S3 bucket to only allow the EC2 instance’s IAM role for access.
Run the nslookup tool from inside the EC2 instance to obtain the private IP address of the S3 bucket’s service API endpoint. Create a route in the VPC route table to provide the EC2 instance with access to the S3 bucket. Attach a resource policy to the S3 bucket to only allow the EC2 instance’s IAM role for access.
Use the AWS provided, publicly available ip-ranges.json file to obtain the private IP address of the S3 bucket’s service API endpoint. Create a route in the VPC route table to provide the EC2 instance with access to the S3 bucket. Attach a resource policy to the S3 bucket to only allow the EC2 instance’s IAM role for access.

✨ Keywords: move data from an Amazon EC2 instance to an Amazon S3 bucket, no data are routed through public internet

2️⃣ ❌ -> 1️⃣ ✅

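💡 Analysis: Data from EC2 must reach the S3 bucket over a private network path.
The community is sharply divided between 1️⃣ and 2️⃣. The crux is the phrase in 2️⃣, "Attach appropriate security groups to the endpoint".
If that is possible, 2️⃣ is clearly the best choice; if it is not, 1️⃣ is the only option left.
Access an AWS service using an interface VPC endpoint

Prerequisites
Create a security group for the endpoint network interface that allows the expected traffic from the resources in your VPC. For example, to ensure that the AWS CLI can send HTTPS requests to the AWS service, the security group must allow inbound HTTPS traffic.

So an interface VPC endpoint can definitely have security groups, and a hands-on test in the console confirmed it.

What about gateway endpoints? Unfortunately the AWS documentation does not say so explicitly, so I tried it in practice.

There is no security group setting either while creating the gateway endpoint or after it is created, so the conclusion is that gateway endpoints do not support security groups. Choose 1️⃣.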
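For the "only the EC2 instance can upload" half of the answer, a sketch of a bucket policy built in Python. The bucket name, instance role ARN, and VPC endpoint ID are hypothetical. The two `Deny` statements are kept separate on purpose: conditions inside one statement are ANDed, and here either condition alone (wrong principal, or not via the endpoint) should block the upload.

```python
import json


def upload_only_from_endpoint_policy(bucket, role_arn, vpce_id):
    """Bucket policy: allow PutObject only for `role_arn` and only through `vpce_id`."""
    object_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowUploadsFromInstanceRole",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:PutObject",
                "Resource": object_arn,
            },
            {
                # Any other principal is denied, even if its IAM policy allows PutObject.
                "Sid": "DenyUploadsFromOtherPrincipals",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": object_arn,
                "Condition": {"ArnNotEquals": {"aws:PrincipalArn": role_arn}},
            },
            {
                # Requests that do not arrive through the VPC endpoint are denied.
                "Sid": "DenyUploadsOutsideTheEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": object_arn,
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            },
        ],
    }


# Usage (needs AWS credentials):
#   boto3.client("s3").put_bucket_policy(
#       Bucket="example-upload-bucket",
#       Policy=json.dumps(upload_only_from_endpoint_policy(
#           "example-upload-bucket", "arn:aws:iam::111122223333:role/ec2-uploader",
#           "vpce-0example")))
```

Because the `Deny` statements use `"Principal": "*"`, they also block uploads by administrators who are not using that role.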

👨‍👨‍👦‍👦 Community discussion: I think the answer should be A and not B,
as we cannot "attach security groups to a gateway endpoint."

8. File conversion

A company’s reporting system delivers hundreds of .csv files to an Amazon S3 bucket each day. The company must convert these files to Apache Parquet format and must store the files in a transformed data bucket.
Which solution will meet these requirements with the LEAST development effort?

Create an Amazon EMR cluster with Apache Spark installed. Write a Spark application to transform the data. Use EMR File System (EMRFS) to write files to the transformed data bucket.
✅ Create an AWS Glue crawler to discover the data. Create an AWS Glue extract, transform, and load (ETL) job to transform the data. Specify the transformed data bucket in the output step.
Use AWS Batch to create a job definition with Bash syntax to transform the data and output the data to the transformed data bucket. Use the job definition to submit a job. Specify an array job as the job type.
❌ Create an AWS Lambda function to transform the data and output the data to the transformed data bucket. Configure an event notification for the S3 bucket. Specify the Lambda function as the destination for the event notification.

✨ Keywords: convert files from .csv to Apache Parquet format, S3

4️⃣ ❌ -> 2️⃣ ✅

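💡 Analysis: The .csv files in the S3 bucket must be converted to Apache Parquet format and stored in another bucket.
4️⃣ would certainly solve the problem, but it is too cumbersome: the conversion code has to be written and maintained by hand.

Glue is the officially recommended approach: Three AWS Glue ETL job types for converting data to Apache Parquet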

On the Amazon Web Services (AWS) Cloud, AWS Glue is a fully managed extract, transform, and load (ETL) service. AWS Glue makes it cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams.
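A sketch of the two Glue resources in option 2️⃣, written as keyword arguments for boto3's `glue.create_crawler` and `glue.create_job`. Names, role ARN, and S3 paths are hypothetical. The ETL script itself is not shown, since it only runs inside Glue; it would read the crawled table and write it to the transformed data bucket with `glueContext.write_dynamic_frame.from_options(..., connection_type="s3", format="parquet")`.

```python
def glue_crawler_request(name, role_arn, database, raw_path):
    """kwargs for boto3 glue.create_crawler: catalog the raw .csv files."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the table
        "Targets": {"S3Targets": [{"Path": raw_path}]},
    }


def glue_job_request(name, role_arn, script_s3_uri, output_path):
    """kwargs for boto3 glue.create_job: a Spark ETL job ("glueetl")."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_s3_uri, "PythonVersion": "3"},
        "GlueVersion": "4.0",
        # Job arguments use a "--" prefix; the script reads them with getResolvedOptions.
        "DefaultArguments": {"--output_path": output_path},
    }
```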

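One more point worth adding: Glue also supports streaming data: Using streaming data sources

You can create streaming extract, transform, and load (ETL) jobs that run continuously and consume data from streaming sources such as Amazon Kinesis Data Streams, Apache Kafka, and Amazon Managed Streaming for Apache Kafka (Amazon MSK).

When I got to 1️⃣ I hesitated for a moment, having forgotten what EMR is again: What is Amazon EMR?

Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Amazon EMR is a managed platform for big data frameworks.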

👨‍👨‍👦‍👦 Community discussion: AWS Glue is a fully managed ETL service that simplifies the process of preparing and transforming data for analytics. Using AWS Glue requires minimal development effort compared to the other options.

Option A requires more development effort, as it involves writing a Spark application to transform the data. It also introduces additional infrastructure management with the EMR cluster.
Option C requires writing and managing custom Bash scripts for data transformation. It requires more manual effort and does not provide the built-in capabilities of AWS Glue for data transformation.
Option D requires developing and managing a custom Lambda function for data transformation. While Lambda can handle the transformation, it requires more effort than AWS Glue, which is specifically designed for ETL operations.

Therefore, option B provides the easiest path with the least development effort by leveraging AWS Glue's capabilities for data discovery, transformation, and output to the transformed data bucket.

9. Second infrastructure for DR

A company runs a global web application on Amazon EC2 instances behind an Application Load Balancer. The application stores data in Amazon Aurora. The company needs to create a disaster recovery solution and can tolerate up to 30 minutes of downtime and potential data loss. The solution does not need to handle the load when the primary infrastructure is healthy.
What should a solutions architect do to meet these requirements?

✅ Deploy the application with the required infrastructure elements in place. Use Amazon Route 53 to configure active-passive failover. Create an Aurora Replica in a second AWS Region.
Host a scaled-down deployment of the application in a second AWS Region. Use Amazon Route 53 to configure active-active failover. Create an Aurora Replica in the second Region.
Replicate the primary infrastructure in a second AWS Region. Use Amazon Route 53 to configure active-active failover. Create an Aurora database that is restored from the latest snapshot.
Back up data with AWS Backup. Use the backup to create the required infrastructure in a second AWS Region. Use Amazon Route 53 to configure active-passive failover. Create an Aurora second primary instance in the second Region.

✨ Keywords: DR

1️⃣ ✅

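💡 Analysis: The company needs a disaster recovery solution and can tolerate 30 minutes of downtime and data loss, but the DR setup (the standby architecture) should not serve traffic while the primary infrastructure is healthy.
To keep it idle while the primary is healthy, use Amazon Route 53 active-passive failover.
On the difference between active-active and active-passive: Active-active and active-passive failover

Active-active failover
Use this failover configuration when you want all of your resources to be available the majority of the time. When a resource becomes unavailable, Route 53 can detect that it is unhealthy and stop including it when responding to queries.
In active-active failover, all the records that have the same name, the same type (such as A or AAAA), and the same routing policy (such as weighted or latency) are active unless Route 53 considers them unhealthy. Route 53 can respond to a DNS query using any healthy record.

Active-passive failover
Use an active-passive failover configuration when you want a primary resource or group of resources to be available the majority of the time, and a secondary resource or group of resources to be on standby in case all the primary resources become unavailable. When responding to queries, Route 53 includes only the healthy primary resources. If all the primary resources are unhealthy, Route 53 includes only the healthy secondary resources in its responses to DNS queries.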
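A sketch of the active-passive record pair behind option 1️⃣, as the `ChangeBatch` for boto3's `route53.change_resource_record_sets`. Both records are alias A records pointing at each Region's load balancer; the domain, load balancer DNS names, the load balancers' canonical hosted zone IDs, and the health check ID are hypothetical, and the health check is optional in this sketch. With `EvaluateTargetHealth` on, Route 53 answers with the primary while it is healthy and falls back to the secondary only when it is not.

```python
def failover_record(name, role, set_id, alias_dns, alias_zone_id, health_check_id=None):
    """One UPSERT change: an alias A record that takes part in failover routing."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("failover role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,  # distinguishes the two records with the same name
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alias_zone_id,  # the load balancer's hosted zone ID
            "DNSName": alias_dns,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def active_passive_change_batch(name, primary, secondary):
    """`primary`/`secondary` are (alias_dns, alias_zone_id, health_check_id) tuples."""
    return {
        "Changes": [
            failover_record(name, "PRIMARY", "primary-region", *primary),
            failover_record(name, "SECONDARY", "dr-region", *secondary),
        ]
    }
```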

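4️⃣ can certainly rebuild the infrastructure as well. Someone in the community mentioned that AWS Backup's RTO is measured in hours; I could not find detailed documentation on that, but it will certainly not recover as fast as option 1️⃣.
The multi-primary database in 4️⃣ only applies to the MySQL engine, and it is not needed here anyway: while the application is down, keeping a second database running serves no purpose.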

👨‍👨‍👦‍👦 Community discussion: Anything that is not instant recovery is active-passive.
In active-passive we have:

AWS Backup (least op overhead) - RTO/RPO = hours

Pilot Light (basic infra is already deployed, but needs to be fully implemented) - RTO/RPO = 10s of minutes

Warm Standby (basic infra + runs small loads; might need to add auto scaling) - RTO/RPO = minutes

(ACTIVE-ACTIVE): Multi AZ option: instant

Here we can tolerate 30 mins,
hence B and C (both active-active) are incorrect. AWS Backup is in hours, hence D is incorrect.
Therefore A.

10. In-memory tasks

A company’s application is having performance issues. The application is stateful and needs to complete in-memory tasks on Amazon EC2 instances. The company used AWS CloudFormation to deploy infrastructure and used the M5 EC2 instance family.
As traffic increased, the application performance degraded. Users are reporting delays when the users attempt to access the application.
Which solution will resolve these issues in the MOST operationally efficient way?

Replace the EC2 instances with T3 EC2 instances that run in an Auto Scaling group. Make the changes by using the AWS Management Console.
Modify the CloudFormation templates to run the EC2 instances in an Auto Scaling group. Increase the desired capacity and the maximum capacity of the Auto Scaling group manually when an increase is necessary.
Modify the CloudFormation templates. Replace the EC2 instances with R5 EC2 instances. Use Amazon CloudWatch built-in EC2 memory metrics to track the application performance for future capacity planning.
✅ Modify the CloudFormation templates. Replace the EC2 instances with R5 EC2 instances. Deploy the Amazon CloudWatch agent on the EC2 instances to generate custom application latency metrics for future capacity planning.

✨ Keywords: in-memory tasks

4️⃣ ✅

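💡 Analysis: M5 EC2 instances running in-memory tasks have hit a performance bottleneck. The question asks for the most operationally efficient fix.
1️⃣ and 2️⃣ scale out horizontally; 3️⃣ and 4️⃣ scale up vertically and add monitoring to inform future capacity planning.
3️⃣ and 4️⃣ are clearly more reasonable.
A quick review of the EC2 instance families: Amazon EC2 instance types

M family (general purpose instances) - provide a balance of compute, memory, and networking resources and can be used for a variety of workloads.

These instances are ideal for applications that use these resources in equal proportions, such as web servers and code repositories.

C family (compute optimized instances) - ideal for compute-bound applications that benefit from high-performance processors.

Well suited for batch processing workloads, media transcoding, high-performance web servers, high performance computing (HPC), scientific modeling, dedicated gaming servers and ad server engines, machine learning inference, and other compute-intensive applications.

R family (memory optimized instances) - designed to deliver fast performance for workloads that process large data sets in memory.

Memory-intensive workloads such as open-source databases, in-memory caches, and real-time big data analytics.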

👨‍👨‍👦‍👦 Community discussion: D is the correct answer.
"in-memory tasks" => need the "R" EC2 instance type to achieve memory optimization. So we are concerned about C & D.
Because EC2 instances don't send memory metrics to CloudWatch by default, we have to install the CloudWatch agent to achieve this.
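A sketch of the CloudWatch agent configuration the comment points at, built as a Python dict and serialized to the JSON the agent reads. It collects only `mem_used_percent` on Linux and tags it with the instance ID; the 60-second interval is an arbitrary choice, and `cloudwatch_agent_config` is a hypothetical helper name. Custom application latency metrics, as in option 4️⃣, would be added on top; the agent can also receive custom metrics through its StatsD or collectd sections.

```python
import json


def cloudwatch_agent_config(interval_seconds=60):
    """Minimal CloudWatch agent config (Linux) that publishes memory utilization."""
    return {
        "metrics": {
            "namespace": "CWAgent",
            # The agent substitutes the real instance ID for this placeholder.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            },
        }
    }


def agent_config_json(interval_seconds=60):
    """The JSON text to save as the agent's configuration file."""
    return json.dumps(cloudwatch_agent_config(interval_seconds), indent=2)
```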