Here I show three ways to create Amazon Athena tables. More importantly, I show when to use which one (and when not to) depending on the case, with comparison and tips, and a sample data flow architecture implementation. Also, I have a short rant about redundant AWS Glue features.

Amazon Athena and data

Amazon Athena is a serverless AWS service to run SQL queries on files stored in S3 buckets. It's used for Online Analytical Processing (OLAP) when you have Big Data, or rather ALotOfData™, and want to get some information from it. It's also great for scalable Extract, Transform, Load (ETL) processes.

To run a query you don't load anything from S3 to Athena. The only things you need are table definitions representing your files' structure and schema.

Data Catalogs, Databases and Tables

Before we begin, we need to make clear what the table metadata is exactly and where we will keep it.

The metadata is organized into a three-level hierarchy:

Data Catalog is a place where you keep all the metadata. The default one is to use the AWS Glue Data Catalog. As the name suggests, it's a part of the AWS Glue service. The alternative is to use an existing Apache Hive metastore if we already have one.

Databases here are just a logical structure containing Tables. It makes sense to create at least a separate Database per (micro)service and environment.

Tables contain all the metadata Athena needs to access the data.

We create a separate table for each dataset. Let's say we have a transaction log and product data stored in S3. They may be in one common bucket or two separate ones.
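To make the idea of a table definition more concrete, here is a minimal sketch of what the two table definitions for the transaction log and the product data could look like. Everything specific in it is an assumption for illustration only: the `TableSpec` helper, the `shop_dev` database, the `example-shop-dev-data` bucket, the column names and types, and the choice of JSON files read with the OpenX JSON SerDe. The code only builds the `CREATE EXTERNAL TABLE` statements as strings, so it runs without an AWS account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical description of one dataset stored in S3."""
    database: str            # logical grouping, e.g. one per service and environment
    name: str                # table name inside the database
    columns: dict[str, str]  # column name -> Athena data type
    location: str            # S3 prefix that holds the files


def create_table_ddl(spec: TableSpec) -> str:
    """Build a CREATE EXTERNAL TABLE statement, assuming JSON files."""
    cols = ",\n".join(
        f"  `{col}` {dtype}" for col, dtype in spec.columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {spec.database}.{spec.name} (\n"
        f"{cols}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{spec.location}'"
    )


# One table per dataset; here both live in one common (made-up) bucket.
transactions = TableSpec(
    database="shop_dev",
    name="transactions",
    columns={
        "transaction_id": "string",
        "product_id": "string",
        "amount": "double",
    },
    location="s3://example-shop-dev-data/transactions/",
)

products = TableSpec(
    database="shop_dev",
    name="products",
    columns={"product_id": "string", "name": "string", "price": "double"},
    location="s3://example-shop-dev-data/products/",
)

for spec in (transactions, products):
    print(create_table_ddl(spec))
    print()
```

Each statement carries only metadata: where the files are, how to read them, and which columns they have. No data is copied anywhere. One possible way to run such a statement is to pass the string as `QueryString` to the `start_query_execution` call of the boto3 Athena client, though the Athena console works just as well.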