
Compressing your data, using columnar file formats like Apache Parquet, and partitioning it can save you a lot of query time and money in the long run. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. A data warehouse is a central repository that holds all your data in one place. Athena charges based on the amount of data it scans on S3, around $5 per TB. In the worst case, it took us a few hours to get data. The two products sound similar from their descriptions, but one main difference is that Athena lets you query data in S3 and nothing more, while Redshift is a full data warehouse.
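To make the pricing point concrete, here is a rough sketch of how compression, columnar formats, and partitions shrink the bill. It uses the figure of about $5 per TB scanned from the text. The function names, the three reduction fractions, and the choice of 1 TB = 1024^4 bytes are illustrative assumptions, not anything Athena itself exposes.

```python
# Rough Athena cost estimate. The price comes from the text (about $5 per TB
# scanned); everything else here is an illustrative assumption.
ATHENA_USD_PER_TB = 5.0
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def athena_query_cost(bytes_scanned: float, usd_per_tb: float = ATHENA_USD_PER_TB) -> float:
    """Estimate the cost in USD of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


def bytes_scanned_after_optimizations(
    raw_bytes: float,
    compression_ratio: float = 1.0,   # compressed size / raw size (hypothetical)
    column_fraction: float = 1.0,     # share of columns the query reads (hypothetical)
    partition_fraction: float = 1.0,  # share of partitions the query touches (hypothetical)
) -> float:
    """Scale the raw data size by each optimization's reduction factor."""
    for factor in (compression_ratio, column_fraction, partition_fraction):
        if not 0.0 <= factor <= 1.0:
            raise ValueError("each fraction must be between 0 and 1")
    return raw_bytes * compression_ratio * column_fraction * partition_fraction


if __name__ == "__main__":
    raw = 2 * BYTES_PER_TB  # a 2 TB dataset of uncompressed CSV
    print("full scan:", athena_query_cost(raw))
    optimized = bytes_scanned_after_optimizations(
        raw, compression_ratio=0.25, column_fraction=0.2, partition_fraction=0.1
    )
    print("optimized:", athena_query_cost(optimized))
```

The fractions multiply, which is why combining all three techniques tends to matter far more than any one of them alone. Real savings depend on your data and queries, so treat the numbers as placeholders.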


dbt can interact with Amazon Redshift Spectrum to create external tables, refresh external table partitions, and access raw data in an Amazon S3-based data lake from the data warehouse. Redshift Spectrum is a powerful feature that enables data querying in Redshift directly from S3. It is a logical extension of Redshift that lets you query data in Redshift as well as in Amazon S3 data lakes. The external tables exist in an external data catalog, which can be AWS Glue, the data catalog that comes with Amazon Athena, or an Apache Hive metastore. We have used Athena with several terabytes of data, and AWS says it can easily scale to petabytes. Athena costs $5 per terabyte of data scanned, while Redshift's costs range from $0.25 to $4.80 per hour for a DC instance and from $0.85 to $6.80 per hour for a DS instance. If your data sits in Amazon S3 in a format such as CSV, TSV, JSON, or Apache Parquet, Athena can be set up in a few minutes and lets you query that data with SQL in a few seconds. It is based on Presto, which was originally developed at Facebook.
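The two pricing models are easier to compare with a quick break-even calculation. The sketch below uses the prices quoted above ($5 per TB for Athena, $0.25 per hour for the smallest DC instance). The function names and the figure of 730 hours per month are assumptions made for illustration, and the model ignores storage, Spectrum scan charges, and reserved-instance discounts.

```python
# Break-even between Athena's per-TB pricing and an always-on Redshift node.
# Prices come from the text; HOURS_PER_MONTH is an assumption (average month).
HOURS_PER_MONTH = 730


def redshift_monthly_cost(hourly_rate_usd: float, nodes: int = 1,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of keeping `nodes` Redshift nodes running for `hours`."""
    return hourly_rate_usd * nodes * hours


def break_even_tb_scanned(hourly_rate_usd: float, nodes: int = 1,
                          athena_usd_per_tb: float = 5.0,
                          hours: float = HOURS_PER_MONTH) -> float:
    """TB scanned per month at which Athena costs as much as the cluster."""
    return redshift_monthly_cost(hourly_rate_usd, nodes, hours) / athena_usd_per_tb


if __name__ == "__main__":
    # Smallest DC instance from the text: $0.25 per hour.
    print(redshift_monthly_cost(0.25))   # monthly cluster cost in USD
    print(break_even_tb_scanned(0.25))   # TB per month where Athena costs the same
```

Below the break-even volume, paying per query is likely cheaper. Above it, a cluster may start to pay for itself, though performance and operational effort also belong in that decision.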

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Redshift Spectrum is handy for accessing data stored in Amazon S3 without having to load it into the Redshift cluster, but it will not be as fast as accessing data stored natively in Redshift. Therefore, it is useful for rarely accessed data or for one-off queries on a dataset without having to import the data into Redshift. To get the most from this course, you should have a basic understanding of Amazon Redshift, Amazon Athena, AWS Glue, and data analytics concepts.
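As a minimal sketch of what "pay only for the queries that you run" looks like in practice, the helper below submits a SQL statement to Athena and polls until it finishes. It expects an Athena client such as the one returned by `boto3.client("athena")` and relies on that client's `start_query_execution` and `get_query_execution` calls. The helper name `run_athena_query`, the polling defaults, and the database, table, and bucket names in the usage comment are hypothetical.

```python
import time

# States in which an Athena query has stopped running.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit `sql` to Athena and wait until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")


# Usage (requires boto3 and AWS credentials; all names below are hypothetical):
#   import boto3
#   athena = boto3.client("athena")
#   run_athena_query(
#       athena,
#       "SELECT COUNT(*) FROM events WHERE year = '2023'",
#       database="my_database",
#       output_location="s3://my-bucket/athena-results/",
#   )
```

Passing the client in as an argument keeps the helper easy to exercise without AWS access. Filtering on a partition column, as the sample query does, is one way to limit how much data a query scans.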
