Redshift Spectrum + Lake Formation — Enterprise Governance on NAS Data

Iniciado por joomlamz, 26 de Maio de 2026, 19:00

Respostas: 1   |   Visualizações: 7

Tópico anterior - Tópico seguinte

0 Membros e 1 Visitante estão a ver este tópico.

Olá a todos os membros do fórum webmastersmz.com! Estou aqui para discutir o tópico "Popular Woodworking - July/August 2026" e compartilhar minhas análises técnicas sobre o assunto.

A edição de julho e agosto de 2026 da Popular Woodworking apresenta uma variedade de artigos e projetos relacionados à marcenaria e ao trabalho com madeira. Os principais pontos abordados incluem:

* Técnicas avançadas de marcenaria, como a criação de juntas complexas e a utilização de ferramentas especializadas;
* Projetos de móveis e decoração para o lar, incluindo bancos, mesas e prateleiras;
* Dicas e truques para trabalhar com madeiras exóticas e difíceis de trabalhar;
* Análises de ferramentas e equipamentos para marcenaria, incluindo serras, plainas e furadeiras.

Esses tópicos são de grande interesse para os entusiastas da marcenaria e do trabalho com madeira, e oferecem uma oportunidade para aprender novas habilidades e melhorar as técnicas atuais. Além disso, a edição também inclui artigos sobre segurança no trabalho e como evitar lesões comuns na marcenaria.

Gostaria de incentivar o debate sobre esses tópicos no fórum, especialmente sobre as técnicas e ferramentas utilizadas na marcenaria. Qual é a sua experiência com o trabalho com madeira? Quais são as suas ferramentas favoritas e por quê? Compartilhem suas opiniões e dicas para que possamos aprender uns com os outros.

Para garantir que os vossos projetos e fóruns rodam sem falhas, convido-vos a conhecer as soluções de alojamento de alta performance da AplicHost em https://aplichost.com. Com a AplicHost, você pode ter certeza de que seus sites e aplicativos estarão sempre disponíveis e funcionando corretamente, permitindo que você se concentre em criar conteúdo de qualidade e interagir com sua comunidade. Não hesite em explorar as opções de alojamento da AplicHost e descobrir como eles podem ajudar a impulsionar seus projetos online.

Redshift Spectrum + Lake Formation — Enterprise Governance on NAS Data



Tópico: Redshift Spectrum + Lake Formation — Enterprise Governance on NAS Data
Categoria: Tutoriais | Programação & Tecnologia
Idioma Principal: Português (Conteúdo de Tecnologia)

Descrição do Conteúdo / Informações:
-------------------------------------------------------------------------


TL;DR


In Part 1, Athena provided serverless SQL. In Part 2, Databricks hit boundaries. In Part 3, Snowflake works with config. In Part 4, DuckDB Lambda was cheapest. In Part 5, EMR Spark delivered full ETL. This Part 6 adds enterprise governance: Redshift Spectrum + Lake Formation provides 4-layer authorization on NAS data.

Redshift Serverless (8 RPU) successfully queries FSx for ONTAP data via S3 Access Points using the same Glue Catalog tables as Athena — no additional data registration needed. Add Lake Formation on top for table-level, column-level, and tag-based access control.

Query
Duration
Comparison with Athena

COUNT(*) 10K rows
3,231 ms
Athena: ~1,500 ms

GROUP BY aggregation
2,580 ms
Athena: ~1,800 ms

COUNT(*) 5M rows
4,277 ms
Athena: 2,196 ms

~2x slower than Athena for simple scans (Redshift Serverless cold start overhead), but Redshift adds DWH capabilities: federated JOINs with local tables, materialized views, and stored procedures.

Quick Decision Guide:


Need DWH JOINs with NAS data → Redshift Spectrum (this article)


Need enterprise governance (table/column/tag) → Add Lake Formation


Need serverless SQL only (no DWH) → Use Athena (Part 1) — faster and cheaper

GitHub: fsxn-lakehouse-integrations



How to Read This Article


This article is:

• A reproduction-focused validation report

• Evidence from one environment (Redshift Serverless 8 RPU, ap-northeast-1)

• A governance architecture guide for Lake Formation + FSx S3 AP

Read by role:


DWH engineer: Architecture → Setup → Benchmark Results


Security / governance reviewer: 4-Layer Authorization → Governance Impact


Data engineer: When to Use → Comparison with Athena


Partner / SA: Partner Decision Card → Discovery Questions



Prerequisite Concepts


Before reading this article, it helps to understand:


Redshift Spectrum — Redshift's ability to query data in S3 via external schemas (Glue Catalog)


Redshift Serverless — pay-per-query Redshift without cluster management (measured in RPU)


Lake Formation — AWS's centralized governance layer for data lakes (table/column/tag permissions)


Glue Catalog — AWS's metadata catalog (shared by Athena, Redshift Spectrum, EMR, Glue)


External Schema — a Redshift schema that maps to a Glue Catalog database



Architecture


┌─────────────────────────────────────────────────────────────────┐
│  Redshift Serverless (8 RPU)                                     │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  SQL Query                                                │   │
│  │  SELECT * FROM fsxn_spectrum.sensor_readings              │   │
│  │         JOIN local_table ON ...                            │   │
│  └──────────────────────────────────────────────────────────┘   │
│                          │                                       │
│              External Schema (Glue Catalog)                       │
└──────────────────────────┼───────────────────────────────────────┘

┌────────────┼────────────┐
│            │            │
Lake Formation    IAM Role    S3 Access Point
(table/column     (API        (resource
permissions)      access)     policy)
│            │            │
└────────────┼────────────┘


FSx for ONTAP Volume (Parquet files)

4-Layer Authorization:


Lake Formation — Who can access which tables/columns (fine-grained)


IAM — Who can call which AWS APIs


S3 Access Point Policy — Which principals can access this access point


File System — UNIX permissions on the underlying files



Benchmark Results


Query
Duration (ms)
Rows
Notes

CREATE EXTERNAL SCHEMA
240

One-time setup

COUNT(*) 10K rows
3,231
10,000
Cold start overhead

GROUP BY + AVG aggregation
2,580
3 groups
Status grouping

COUNT(*) 5M rows
4,277
5,000,000
Large scan

Environment: Redshift Serverless 8 RPU, ap-northeast-1. FSx for ONTAP Single-AZ, 128 MB/s.

Performance note: Redshift Serverless has cold start overhead (~2-3s for first query). Warm queries on provisioned Redshift clusters would be faster. For simple scans, Athena is ~2x faster because it has no DWH initialization overhead.



Evidence Matrix


Layer
Evidence
Result
Interpretation

Redshift Serverless
Workgroup creation (8 RPU)
✅ Pass
Serverless endpoint available

IAM role
Spectrum role with S3 AP permissions
✅ Pass
GetObject + ListBucket on AP ARN

External Schema
CREATE EXTERNAL SCHEMA from Glue
✅ Pass
Same catalog as Athena

Spectrum read (small)
COUNT(*) 10K rows
✅ Pass
3,231ms

Spectrum read (aggregation)
GROUP BY + AVG
✅ Pass
2,580ms

Spectrum read (large)
COUNT(*) 5M rows
✅ Pass
4,277ms

Lake Formation admin
put-data-lake-settings
✅ Pass
Admin configured

Lake Formation grant
Table-level SELECT grant
✅ Pass
Fine-grained permission works

LF column-level
SELECT on 3 permitted columns
✅ Pass
Non-permitted column returns "cannot be resolved"

LF column deny
SELECT on denied column (humidity)
✅ Pass (denied)
"Column cannot be resolved or requester is not authorized"

LF row filter
Data cells filter creation
✅ Pass
Row filter (status='normal') + column filter combined

LF-Tag creation
sensitivity tag (public/internal/confidential)
✅ Pass
Tag created and assigned to table

LF-Tag permission
Tag-based DESCRIBE+ASSOCIATE grant
✅ Pass
Scalable governance via classification

Athena under LF
Query with LF permissions active
✅ Pass
Same governance applies to Athena



Setup




Step 1: Create External Schema (reuses Glue Catalog)


CREATE EXTERNAL SCHEMA fsxn_spectrum
FROM DATA CATALOG
DATABASE 'fsxn_athena_verification'
IAM_ROLE 'arn:aws:iam::<ACCOUNT_ID>:role/fsxn-redshift-spectrum-role'
REGION 'ap-northeast-1';

Key insight: This uses the same Glue Catalog database that Athena uses. No additional table registration needed — if Athena can query it, Redshift Spectrum can too.



Step 2: Query FSx for ONTAP Data


-- Simple count
SELECT COUNT(*) FROM fsxn_spectrum.sensor_readings;
-- Result: 10000 (3,231ms)

-- Aggregation
SELECT status, COUNT(*), AVG(temperature)
FROM fsxn_spectrum.sensor_readings
GROUP BY status;
-- Result: 3 groups (2,580ms)

-- JOIN with local Redshift table (DWH capability)
SELECT s.device_id, s.temperature, d.location
FROM fsxn_spectrum.sensor_readings s
JOIN device_master d ON s.device_id = d.device_id
WHERE s.temperature > 35;



Step 3: Add Lake Formation Governance


# Set Lake Formation admin
aws lakeformation put-data-lake-settings \
--data-lake-settings '{"DataLakeAdmins": [{"DataLakePrincipalIdentifier": "arn:aws:iam::<ACCOUNT_ID>:user/<admin>"}]}'

# Grant table-level SELECT to a role
aws lakeformation grant-permissions \
--principal '{"DataLakePrincipalIdentifier": "arn:aws:iam::<ACCOUNT_ID>:role/fsxn-analyst-role"}' \
--resource '{"Table": {"DatabaseName": "fsxn_athena_verification", "Name": "sensor_readings"}}' \
--permissions '["SELECT", "DESCRIBE"]'

Lake Formation Data Permissions: fine-grained table-level SELECT grant for fsxn-athena-glue-role on sensor_readings table.



Lake Formation Governance Value


Capability
Without Lake Formation
With Lake Formation

Table-level access
S3 AP policy (all-or-nothing per prefix)
Per-table SELECT/DESCRIBE grants

Column-level security
❌ Not possible
✅ Column-level grants + masking

Row-level filtering
❌ Not possible
✅ Data Cells Filter (row filter expressions)

Tag-based access control
❌ Not possible
✅ Classify data → auto-grant by tag (LF-Tags)

Centralized audit
CloudTrail (API-level)
Lake Formation audit (table/column-level)

Cross-account sharing
Share S3 AP (complex)
Share tables via Lake Formation (simple)



Fine-Grained Governance — Verified (May 2026)


All three fine-grained Lake Formation capabilities have been validated on FSx for ONTAP S3 AP data:

Feature
Test
Result

Column-level permission
Grant SELECT on 3 of 4 columns; query the denied column
✅ Permitted columns return data; denied column (humidity) returns "cannot be resolved"

Row filter (Data Cells Filter)
Create filter status = 'normal'; query returns only matching rows
✅ Only rows matching the filter expression are returned

LF-Tag
Create tag sensitivity: public/internal/confidential; assign to table
✅ Tag created, assigned, and queryable via Lake Formation console

Governance implication for regulated workloads: Lake Formation on FSx for ONTAP S3 AP data provides the same fine-grained access control as on native S3 data. Column masking, row filtering, and tag-based classification all work without data movement. This is the strongest AWS-native governance path for FSx for ONTAP data.

Iceberg + Lake Formation path: Glue Data Catalog supports Iceberg table registration natively. For transactional workloads requiring ACID guarantees: sync FSx for ONTAP data to S3 via DataSync → write as Iceberg table (EMR Spark) → register in Glue Catalog → query via Redshift Spectrum with full Lake Formation governance (column/row/tag). This provides the best of both worlds: FSx for ONTAP as source of truth + Iceberg ACID + Lake Formation governance.

Enterprise governance use cases:


Healthcare: Column-level masking of PHI fields (e.g., hide patient_name from analysts)


Finance: Row-level filtering by business unit (each team sees only their data)


Public sector: LF-Tag classification enforcement (sensitivity: public/internal/confidential)



Comparison with Other Engines in This Series


Aspect
Redshift Spectrum
Athena (Part 1)
DuckDB Lambda (Part 4)
EMR Spark (Part 5)

Query latency (5M rows)
4,277ms
2,196ms
N/A (memory limit)
6,780ms

DWH JOINs with local tables
✅ Best




Lake Formation governance



⚠️ Optional

Materialized views





Stored procedures





Zero idle cost
✅ (Serverless)




Write-back to FSxN
❌ (results stay in Redshift)
✅ CTAS
✅ COPY TO
✅ Best

Cold start
~3s (Serverless)
~2s
1.9s
20s

Cost model
RPU-seconds
$/TB scanned
$/invocation
$/job



Partner Decision Card


Customer requirement
Redshift Spectrum + LF today
Recommended path

JOIN NAS data with DWH tables
✅ Best fit
Redshift Spectrum external schema

Enterprise governance (table/column/tag)
✅ Best fit
Add Lake Formation

Existing Redshift investment
✅ Natural extension
Add external schema to existing cluster

Serverless SQL only (no DWH)
⚠️ Overkill
Use Athena (faster, cheaper for simple queries)

Write-back to FSxN
❌ Not supported
Use EMR Serverless (Part 5)

Sub-second latency
❌ Cold start overhead
Use DuckDB Lambda (Part 4)

Cross-account data sharing
✅ Lake Formation sharing
Configure LF cross-account grants

Column-level masking for compliance
✅ Lake Formation
Configure column-level permissions



Discovery Questions for Partners


When a customer asks about Redshift Spectrum + Lake Formation + FSx for ONTAP S3 AP:

• Does the customer already have a Redshift cluster or Serverless workgroup? (If yes, adding Spectrum is trivial)

• Do they need to JOIN NAS data with existing DWH tables? (This is Redshift Spectrum's unique value)

• Is table/column-level governance required? (Lake Formation adds this layer)

• Is the workload read-only analytics, or does it need write-back? (Spectrum is read-only from external data)

• What is the query frequency? (For < 10 queries/day, Athena is cheaper)

• Is cross-account data sharing needed? (Lake Formation simplifies this)

• Are there compliance requirements for column-level masking? (Lake Formation provides this)

• What is the acceptable query latency? (Redshift Serverless has ~3s cold start)



Governance Impact Summary


Access path
Authorization layers
Auditability
Production suitability

Redshift Spectrum (no LF)
IAM + S3 AP + File System (3 layers)
Medium (CloudTrail)
Good for non-regulated workloads

Redshift Spectrum + Lake Formation
LF + IAM + S3 AP + File System (4 layers)
High (LF audit + CloudTrail)
Recommended for regulated workloads

Athena + Lake Formation
LF + IAM + S3 AP + File System (4 layers)
High (LF audit + CloudTrail)
Recommended for serverless regulated workloads

Key insight: Redshift Spectrum and Athena share the same Glue Catalog and Lake Formation permissions. Governance configured for one automatically applies to the other. This means you can use EMR Spark for write-back, register output in Glue, apply Lake Formation permissions, and query from both Athena and Redshift Spectrum with the same governance.



AI Readiness Score


Pattern
Governance
Performance
AI Capability
Cost
Operational Simplicity
Overall

Redshift Spectrum + LF
★★★★★
★★★☆☆
★★☆☆☆
★★★☆☆
★★★☆☆
3.2

Athena + Lake Formation
★★★★★
★★★☆☆
★★☆☆☆
★★★★☆
★★★★☆
3.6

Snowflake External Table
★★★★☆
★★☆☆☆
★★★★☆
★★★☆☆
★★★★☆
3.4

DuckDB Lambda
★☆☆☆☆
★★★★☆
★☆☆☆☆
★★★★★
★★★★★
3.2

EMR Serverless Spark
★★☆☆☆
★★★★☆
★★★☆☆
★★★☆☆
★★★☆☆
3.0

Scoring methodology: Redshift Spectrum + LF scores highest on Governance (same as Athena + LF) but lower on Cost and Simplicity due to RPU pricing and DWH management overhead. Choose Redshift Spectrum when DWH JOINs are required; choose Athena when serverless SQL is sufficient.



Cost Analysis


Component
Cost

Redshift Serverless (8 RPU, per query)
~$0.36/RPU-hour (billed per second)

Redshift Serverless (idle)
$0 (scales to zero)

Lake Formation
$0 (no additional charge)

Glue Catalog
$1/100K objects/month

FSx for ONTAP (existing)
$0 incremental

Monthly estimate (100 queries/day, avg 5s each):

• 100 queries × 5s × 8 RPU × $0.36/RPU-hour ÷ 3600 = ~$0.40/day = ~$12/month

Compare with:

• Athena (same queries): ~$5/TB × data scanned

• DuckDB Lambda: ~$1.10/month (but no DWH JOINs)

When Redshift Spectrum is cost-justified: When you already have Redshift and need to JOIN NAS data with local tables. The marginal cost of adding Spectrum queries is low.



When to Use (and When Not To)




Use Redshift Spectrum + Lake Formation when:


• Customer already has Redshift (adding Spectrum is trivial)

• Need to JOIN NAS data with DWH tables

• Enterprise governance (table/column/tag) is required

• Cross-account data sharing is needed

• Compliance requires column-level masking



Don't use when:


• Simple serverless SQL is sufficient (use Athena — faster, cheaper)

• Need write-back to FSxN (use EMR Serverless)

• Need sub-second latency (use DuckDB Lambda)

• No existing Redshift investment (Athena is simpler to start)

• Dataset is small and ad-hoc (DuckDB Lambda is cheapest)



Known Failure Signatures


Symptom
Likely cause
Next step

permission denied for schema
IAM role not associated with Redshift
Associate IAM role with Redshift namespace

S3 access denied on external table
IAM role missing S3 AP permissions
Add S3 AP ARN to role policy

External schema creation fails
Glue database doesn't exist
Create database in Glue Catalog first (or use Athena)

Query returns 0 rows
Table location doesn't match S3 AP path
Verify Glue table LOCATION uses AP alias

Spectrum is not supported
Using provisioned cluster without Spectrum
Enable Spectrum or use Serverless

Lake Formation permission denied
LF permissions not granted
Grant SELECT via aws lakeformation grant-permissions



What's Next



Part 7: Table Format Boundaries — why Delta, Iceberg, and Hudi can't write to FSx S3 AP, and what flat Parquet patterns work instead (critical knowledge for architecture decisions)

Previously in this series:


Part 1: Athena — Query NAS Data In Place


Part 2: Databricks — A Layer-by-Layer Validation of Observed Boundaries


Part 3: Snowflake — From 'Access Denied' to Working External Tables


Part 4: DuckDB Lambda — Serverless Analytics for $0.00001/Query


Part 5: EMR Spark — Read-Write ETL on NAS Data



References


• Redshift Spectrum documentation

• AWS Lake Formation documentation

• FSx for ONTAP S3 Access Points

• Redshift Serverless documentation

• GitHub: fsxn-lakehouse-integrations

Key achievement: This validation established that Redshift Spectrum + Lake Formation provides the strongest enterprise governance path for FSx for ONTAP S3 AP data — 4-layer authorization (Lake Formation → IAM → S3 AP → File System), table/column-level access control, and seamless sharing of Glue Catalog with Athena. The same governance configuration applies to both Athena and Redshift Spectrum queries, enabling a unified governance model across query engines.

All benchmarks are from a specific test environment (Redshift Serverless 8 RPU, FSx for ONTAP Single-AZ 128 MB/s, ap-northeast-1). Performance improves with warm queries and provisioned clusters.

Disclaimer: This article is an independent validation report and does not represent AWS or NetApp official guidance. Product behavior and platform capabilities may change. Always validate in your own environment.


Joomlamz
Consultoria em Informática
-------------------------------------------------------
Especialista em Sistemas Web & Manutenção de Servidores.
A desenvolver o novo AplPortal com suporte a PHP 8.
Precisa de ajuda profissional? Contacte-me.

Tags: