Tuberculosis - Total number of cases in the US | CDC

The Centers for Disease Control and Prevention (CDC) provides free and open access to a variety of health-related data. This release contains the total number of tuberculosis cases reported in the United States, by region and by state, in accordance with the current method of displaying WONDER data. Data for the United States excludes counts from US territories. The data is available for the past 2 years.

Available in

AWS Marketplace

Purchase this listing from Webvar in AWS Marketplace using your AWS account. In AWS Marketplace, you can quickly launch pre-configured software with just a few clicks. AWS handles billing and payments, and the charges appear on your AWS bill.

About

CDC works 24/7 to protect America from health, safety and security threats, both foreign and in the U.S. Whether diseases start at home or abroad, are chronic or acute, curable or preventable, human error or deliberate attack, CDC fights disease and supports communities and citizens to do the same. As the nation’s health protection agency, CDC saves lives and protects people from health threats. To accomplish its mission, CDC conducts critical science and provides health information that protects against expensive and dangerous health threats, and responds when these arise.

This release contains the total number of tuberculosis cases reported in the United States, by region and by state, in accordance with the current method of displaying WONDER data. Data for the United States excludes counts from US territories. The data updates every quarter and is available for the past 2 years. This data is anonymized/aggregated.

# More Information:

* [Source - Division of STD Prevention, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention](https://data.cdc.gov/NNDSS/NNDSS-Table-II-Tuberculosis/5avu-ff58)

* [Schema Definitions](https://s3.amazonaws.com/rearc-data-provider/tuberculosis-cdc/public/tuberculosis-cdc-schema.docx)

* [Sample Dataset](https://s3.amazonaws.com/rearc-data-provider/tuberculosis-cdc/public/sample.csv)

* [Terms of Use](https://www.usa.gov/government-works)

* [CDC Data Homepage](https://data.cdc.gov/)

* Frequency: Annual

# What's included?

You will receive access to the following:

* Total number of Tuberculosis cases reported in the US (tuberculosis-cdc.csv)

* CloudFormation template that sets up automatic revision updates plus AWS analytics services such as AWS Glue and Amazon Athena (cloudformation.yaml)

* AWS Lambda code for revision updates (post-processing-code.zip)

*Please note, in the post processing code, we use a Lambda layer that extends the AWS Python SDK (boto3) that is built into the Lambda Python runtime by adding the AWS Data Exchange and AWS Marketplace Catalog API SDKs as of November 13, 2019. Once the public SDKs are updated to include AWS Data Exchange APIs, we will update the code to remove this Lambda layer.*

# Deploy CloudFormation template to set up automatic revision updates and AWS Analytics services

Assuming you have subscribed to this product listing, follow the steps below to deploy the CloudFormation template:

(*Please note that you will need IAM permissions for CloudFormation, AWS Data Exchange, IAM, Lambda, Glue, Athena and QuickSight, in order to deploy the CloudFormation template.*)

* Under the product listing, scroll down to the `Data sets` section and click on the data set name

* Under the `Revisions` section, click on the most recent revision

* Under `Assets`, select `tuberculosis-cdc/automation/post-processing-code.zip` and click `Export to S3`

* Choose the S3 bucket where you would like to store the dataset. Select only the bucket itself, because the asset comes with a predefined directory structure

* Under `Assets`, select `tuberculosis-cdc/automation/cloudformation.yaml` and click either `Export to S3` or `Export to computer`

* If you exported `cloudformation.yaml` to S3, open the S3 console and navigate to the location where `cloudformation.yaml` is stored. Click on the file and copy the URL shown under `Object URL`

* From your AWS Management Console, open the AWS CloudFormation console and click `Create Stack`

* Under `Choose a template`, either upload the template from your local computer or specify the S3 object URL, then click `Next`

* Provide a friendly stack name in the `Stack name` text box

* In the `SourceS3Bucket` field, enter the name of the S3 bucket that you chose earlier to store the `tuberculosis-cdc/automation/post-processing-code.zip` file

* Leave the rest of the fields as is

* Click `Next`

* In the `Options` screen, click `Next`

* Tick the `I acknowledge that AWS CloudFormation might create IAM resources.` box

* Click `Create`
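
The console steps above can also be approximated from a script. The following is a minimal sketch using boto3, not the publisher's own tooling. It assumes the template was exported to S3, that `SourceS3Bucket` is the only parameter you need to set (all others keep their defaults), and that `CAPABILITY_IAM` is sufficient. If the template declares named IAM resources, `CAPABILITY_NAMED_IAM` would be required instead. The function names, stack name, and bucket name are hypothetical.

```python
def build_create_stack_args(stack_name, template_url, source_s3_bucket):
    """Assemble keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        # Object URL of cloudformation.yaml after exporting it to S3.
        "TemplateURL": template_url,
        "Parameters": [
            # Bucket that holds tuberculosis-cdc/automation/post-processing-code.zip.
            {"ParameterKey": "SourceS3Bucket", "ParameterValue": source_s3_bucket},
        ],
        # Scripted equivalent of ticking the IAM acknowledgement box.
        # Assumption: the template does not create named IAM resources.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def deploy_stack(stack_name, template_url, source_s3_bucket):
    """Create the stack and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    cloudformation = boto3.client("cloudformation")
    response = cloudformation.create_stack(
        **build_create_stack_args(stack_name, template_url, source_s3_bucket)
    )
    return response["StackId"]
```

For example, `deploy_stack("tuberculosis-cdc", object_url, "my-example-bucket")` would start the deployment, where `object_url` is the value copied from `Object URL` and `my-example-bucket` is a placeholder for your own bucket.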

## At a high level, CloudFormation sets up the following resources automatically.

* Lambda function to set up automatic AWS Data Exchange revision updates for this dataset

* CloudWatch Event rule that will automatically trigger the Lambda function every time a new revision update is published

* Another Lambda function to set up AWS Glue and Amazon Athena

* Necessary IAM roles and permissions

If you are interested in looking at the AWS Lambda code or the CloudFormation template, feel free to inspect files inside `tuberculosis-cdc/automation/post-processing-code.zip` and `tuberculosis-cdc/automation/cloudformation.yaml`
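
To illustrate how the event rule and the Lambda function fit together, here is a minimal sketch of a handler that reads a revision notification. It is not the code shipped in `post-processing-code.zip`. It assumes the event follows the shape AWS Data Exchange uses for its "Revision Published To Data Set" notification, with the data set ID in `resources` and the new revision IDs in `detail.RevisionIds`. The function names are hypothetical, so check the bundled code for the actual logic.

```python
def extract_revision_info(event):
    """Pull the data set ID and revision IDs out of a notification event.

    Assumed event shape (not taken from the bundled Lambda code):
      event["resources"]            -> [data set ID]
      event["detail"]["RevisionIds"] -> [revision ID, ...]
    """
    detail = event.get("detail", {})
    resources = event.get("resources", [])
    return {
        "data_set_id": resources[0] if resources else None,
        "revision_ids": list(detail.get("RevisionIds", [])),
    }


def lambda_handler(event, context):
    """Entry point the event rule would invoke for each published revision."""
    info = extract_revision_info(event)
    print(
        f"Data set {info['data_set_id']}: "
        f"{len(info['revision_ids'])} new revision(s) published"
    )
    # A real handler would go on to export the revision's assets to S3.
    return info
```

Keeping the parsing in a separate function makes it possible to check the handler locally with a hand-written event before deploying it.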

# Analytics & Visualizations

In addition to the source data, this product listing provides an easy way to interact with the dataset and extract value from it. Native AWS analytics services such as AWS Glue, Amazon Athena and Amazon QuickSight offer different ways to query and visualize the data. The included AWS CloudFormation template sets up AWS Glue and Amazon Athena automatically in your AWS account.

[Data Analysis - This diagram shows how all the AWS services interact](https://s3.amazonaws.com/rearc-data-provider/tuberculosis-cdc/public/data-analysis.png)

## Using AWS Glue and Amazon Athena to run interactive queries against the dataset

Once the CloudFormation template is successfully deployed, the data is immediately searchable, queryable, and available on Athena. You can go to the Athena UI from the AWS Management Console and run SQL queries on the dataset.

### Here are some sample Athena SQL queries you can try on the dataset.

**# List the total number of Tuberculosis cases reported for 2018, based on quarterly updated data**

```sql

SELECT "reporting_area", "mmwr_quarter", "tuberculosis_cum_2018" FROM "tuberculosis_cdc"."data" ORDER BY "tuberculosis_cum_2018" DESC;

```

**# Compare the total number of Tuberculosis cases reported for 2018 and 2019, based on quarterly updated data**

```sql

SELECT "reporting_area", "mmwr_quarter", "tuberculosis_cum_2018", "tuberculosis_cum_2019" FROM "tuberculosis_cdc"."data" ORDER BY "reporting_area" ASC;

```
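
The same queries can be submitted programmatically. The sketch below uses boto3's Athena client to run the 2018 versus 2019 comparison and poll until it finishes. It assumes the `tuberculosis_cdc` database and `data` table created by the CloudFormation template, and an S3 location of your choosing for query results. The function names and the output location are hypothetical.

```python
import time

# Query copied from the comparison example above.
QUERY = (
    'SELECT "reporting_area", "mmwr_quarter", '
    '"tuberculosis_cum_2018", "tuberculosis_cum_2019" '
    'FROM "tuberculosis_cdc"."data" ORDER BY "reporting_area" ASC;'
)


def build_query_request(output_location, database="tuberculosis_cdc"):
    """Assemble keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": QUERY,
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to this S3 location (placeholder value).
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(output_location):
    """Run the query and return the first page of result rows.

    Requires boto3 and AWS credentials with Athena and S3 access.
    """
    import boto3  # imported lazily so the helper above works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        **build_query_request(output_location)
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    # The first row of the result set contains the column headers.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

A call such as `run_query("s3://my-example-bucket/athena-results/")` returns only the first page of results. Larger result sets would need pagination.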

## Set up Amazon QuickSight to create visualizations on the dataset

Follow the steps below to analyze the dataset using Amazon QuickSight:

* From your AWS Management Console, log onto Amazon QuickSight

* Click `Manage data`

* Click `New data set`

* If you ran the provided CloudFormation template, you should already have your database and table with schema created in AWS Glue and Athena

* Click on `Athena` to connect to your data source

* Provide a name for your QuickSight `Data source name` and click `Create data source`

* In the `Database: contain sets of table` dropdown, choose database as `tuberculosis_cdc` and under `Tables: contain the data you can visualize`, choose table as `data`

* At this point, you can `Edit/Preview data` if you like

* You can then click on `Select`

* In the `Finish data set creation` screen, select `Visualize` to finish creating the data set

* Visualize the data set by selecting the `Horizontal bar chart` from the `Visual types`

* Drag the `reporting_area` field to the `Y axis` in `Field wells` and, for example, drag the `tuberculosis_cum_2018` field into the `Value` block to chart the data

You are now ready to start analyzing and visualizing the dataset.

## Contact Information

If you have questions about the source data, please contact cdcinfo@cdc.gov. If you have any questions about the CloudFormation stack, Lambda code or any of the AWS services being used, please contact data@rearc.io.

## About Rearc

Rearc is a cloud, software and services company. We believe that empowering engineers drives innovation. Cloud-native architectures, modern software and data practices, and the ability to safely experiment can enable engineers to realize their full potential. We have partnered with several enterprises and startups to help them achieve agility. Our approach is simple — empower engineers with the best tools possible to make an impact within their industry.
