Code
import pandas as pd
import numpy as np
Rebekah Chuang
August 20, 2023
The following csv files are the 4 given original sample data files:
brands.csv
receipt_items.csv
receipts.csv
users.csv
The goals of this project are:
Review CSV data and diagram a new structured relational data model
Generate queries that answer predetermined business question(s)
Choose something noteworthy to share with a non-technical stakeholder
Skills: Python
(NumPy
, Pandas
), SQL
, Tableau
# library(reticulate)
# install python and create virtual env for the first time
# version = "3.9.12"
# install_python(version = version)
# virtualenv_create("my_python_env", python_version = version)
# install packages
# virtualenv_install(envname = "my_python_env", "pandas", ignore_installed = FALSE, pip_options = character())
# virtualenv_install(envname = "my_python_env", "numpy", ignore_installed = FALSE, pip_options = character())
# virtualenv_install(envname = "my_python_env", "matplotlib", ignore_installed = FALSE, pip_options = character())
# virtualenv_install(envname = "my_python_env", "plotly", ignore_installed = FALSE, pip_options = character())
# virtualenv_install(envname = "my_python_env", "seaborn", ignore_installed = FALSE, pip_options = character())
# virtualenv_install(envname = "my_python_env", "jupyter", ignore_installed = FALSE, pip_options = character())
# run this if env already exists
library(reticulate)
use_virtualenv("my_python_env")
First, I cleaned the data using Python
(NumPy
, Pandas
). I dropped useless columns, converted datatype, and saved them into new files.
When saving new files, I add a suffix _new
to differentiate them from the original files. However, I changed the file name to the original one when importing csv files into SQL Server to make it more convenient when running queries.
ID | BARCODE | BRAND_CODE | CPG_ID | CATEGORY | CATEGORY_CODE | NAME | ROMANCE_TEXT | RELATED_BRAND_IDS | |
---|---|---|---|---|---|---|---|---|---|
0 | 5a8c35dde4b0ccf165fac9e6 | 511111904175 | PEPPERIDGE FARM | 5a734034e4b0d58f376be874 | Grocery | GROCERY | Pepperidge Farm | Pepperidge Farm has been making exceptional co... | NaN |
1 | 6234af8f4e09b6067c237adb | 511111212997 | CHEX MIX | 6233966e8942a67af4934aa1 | Snacks | SNACKS | CHEX MIX | NaN | NaN |
2 | 5332f7d3e4b03c9a25efd14e | 511111803393 | NaN | 5332f5f2e4b03c9a25efd0aa | Snacks | SNACKS | Cheez-It | NaN | NaN |
3 | 5d6412e9a3a018514994f426 | 511111304982 | BETTER HOMES & GARDENS | 53e10d6368abd3c7065097cc | Magazines | MAGAZINES | Better Homes & Gardens Magazine | Better Homes & Gardens offers beautiful photog... | 5d6413156d5f3b23d1bc790a : 5d6417dda3a01851499... |
4 | 621e777eacedc065cefa99a7 | 511111912859 | TRUVIA | 621e7754d759b10969cbcc08 | Baking | BAKING | Truvia® | NaN | NaN |
ID | STORE_NAME | PURCHASE_DATE | PURCHASE_TIME | DATE_SCANNED | TOTAL_SPENT | REWARDS_RECEIPT_STATUS | USER_ID | USER_VIEWED | PURCHASED_ITEM_COUNT | ... | PENDING_DATE | MODIFY_DATE | FLAGGED_DATE | PROCESSED_DATE | FINISHED_DATE | REJECTED_DATE | NEEDS_FETCH_REVIEW | DIGITAL_RECEIPT | DELETED | NON_POINT_EARNING_RECEIPT | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 62868f660a72546bef0b2dd0 | TOWN OF ROCKY MOUNT | 2022-05-19T00:00:00Z | 2:05 PM | 2022-05-19T18:41:42.53Z | 859.87 | FINISHED | 61375682c625197c5db73602 | True | 1.0 | ... | 2022-05-19T18:57:57.228Z | 2022-05-22T11:20:57.998Z | 2022-05-19T18:42:57.042Z | NaN | 2022-05-19T18:57:57.336Z | NaN | NaN | False | NaN | True |
1 | 6096b7370a7216d316001149 | NaN | NaN | NaN | 2021-05-08T16:07:19.03Z | NaN | SUBMITTED | 60047e8a2d7db612a69d2e18 | True | NaN | ... | NaN | 2021-05-08T16:07:19.03Z | NaN | NaN | NaN | NaN | NaN | False | NaN | NaN |
2 | 6269a4ea0a7241077408b6e1 | FAMILY DOLLAR | 2022-04-27T00:00:00Z | 4:15 PM | 2022-04-27T20:17:46.09Z | 11.00 | FINISHED | 61578e96c2fb9321213720fc | True | 2.0 | ... | 2022-04-27T20:17:54.214Z | 2022-04-27T20:17:55.521Z | NaN | NaN | 2022-04-27T20:17:54.394Z | NaN | NaN | False | NaN | False |
3 | 625b25e70a723eb9730d2c9c | PUBLIX | 2022-04-15T00:00:00Z | 4:45 PM | 2022-04-16T20:24:07.259Z | 10.67 | FINISHED | 6048de2ec9106b12d2b1ff61 | True | 4.0 | ... | 2022-04-16T20:24:23.399Z | 2022-04-16T20:24:24.064Z | NaN | NaN | 2022-04-16T20:24:23.598Z | NaN | NaN | False | NaN | False |
4 | 60e3bd7e0a7215bd550fb8cc | COSTCO | 2021-06-30T00:00:00Z | 1:16 PM | 2021-07-06T02:18:38.495Z | 61.90 | FINISHED | 6048de2ec9106b12d2b1ff61 | True | 10.0 | ... | 2021-07-06T02:18:49.55Z | 2021-07-06T02:18:51.906Z | NaN | NaN | 2021-07-06T02:18:50.283Z | NaN | NaN | False | NaN | False |
5 rows × 21 columns
REWARDS_RECEIPT_ID | ITEM_INDEX | REWARDS_RECEIPT_ITEM_ID | DESCRIPTION | BARCODE | BRAND_CODE | QUANTITY_PURCHASED | TOTAL_FINAL_PRICE | POINTS_EARNED | REWARDS_GROUP | ORIGINAL_RECEIPT_ITEM_TEXT | MODIFY_DATE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 60bb28c10a720d557b128262 | 0 | 1efd6d7c75ecbae32214acb6cda41d12 | RLGULAR SALE | NaN | NaN | 1.0 | 9.99 | NaN | NaN | RLGULAR SALE | 2021-06-05T07:33:36.91Z |
1 | 60bb28c10a720d557b128262 | 1 | 79482a8fa3bd0eef3d626f1c862042e8 | 82 GOURMET HOUSEW | 000240292012 | NaN | 1.0 | 4.99 | NaN | NaN | 82 GOURMET HOUSEW | 2021-06-05T07:33:36.91Z |
2 | 627151230a724d730825106a | 0 | b26669cf4ce90cc9d7d3b0ab588cb04b | GOLDILOCKS NOPIA R BLAGK | NaN | NaN | 1.0 | 4.75 | NaN | NaN | GOLDILOCKS NOPIA R BLAGK | 2022-05-03T15:58:41.769Z |
3 | 627151230a724d730825106a | 1 | b4fafd04d8274a1e95b97155edaade2f | KURI-IRI DORAYAKI CAKE | NaN | NaN | 1.0 | 5.99 | NaN | NaN | KURI-IRI DORAYAKI CAKE | 2022-05-03T15:58:41.769Z |
4 | 627151230a724d730825106a | 2 | 39694b0880b511e8a12bfb76cf2c20f3 | YIZMANG FISH BALL | NaN | NaN | 1.0 | 12.00 | NaN | NaN | YIZMANG FISH BALL | 2022-05-03T15:58:41.769Z |
CREATED_DATE | BIRTH_DATE | GENDER | LAST_REWARDS_LOGIN | STATE | SIGN_UP_PLATFORM | SIGN_UP_SOURCE | ID | |
---|---|---|---|---|---|---|---|---|
0 | 2021-12-20T00:29:17.118Z | 1984-03-20T00:00:00Z | transgender | 2023-01-04T16:32:15Z | FL | NaN | Apple | 61bfce5d6655417f803b6538 |
1 | 2021-10-21T17:15:25.825Z | 1987-08-08T05:00:00Z | prefer_not_to_say | 2023-01-04T16:04:33Z | PA | unknown | 6171a02d8034995a3d858fb3 | |
2 | 2021-10-23T19:19:18.305Z | 1995-06-18T05:00:00Z | male | 2023-01-04T16:13:13Z | FL | NaN | Apple | 617460366e2684498143e769 |
3 | 2021-03-30T02:35:41.249Z | 1999-08-23T07:00:00Z | transgender | 2023-01-04T16:09:51Z | MI | ios | 60628e7d85cdff011de41113 | |
4 | 2021-04-26T23:15:54.375Z | 1992-10-28T16:16:23Z | male | 2023-01-04T16:24:18Z | CA | andrioid | 608749aac63a95130a45fbf4 |
brands
BARCODE | BRAND_CODE | CATEGORY | CATEGORY_CODE | NAME | |
---|---|---|---|---|---|
0 | 511111904175 | PEPPERIDGE FARM | Grocery | GROCERY | Pepperidge Farm |
1 | 511111212997 | CHEX MIX | Snacks | SNACKS | CHEX MIX |
2 | 511111803393 | NaN | Snacks | SNACKS | Cheez-It |
3 | 511111304982 | BETTER HOMES & GARDENS | Magazines | MAGAZINES | Better Homes & Gardens Magazine |
4 | 511111912859 | TRUVIA | Baking | BAKING | Truvia® |
BRAND_CODE | CATEGORY | CATEGORY_CODE | NAME | |
---|---|---|---|---|
BARCODE | ||||
511111904175 | PEPPERIDGE FARM | Grocery | GROCERY | Pepperidge Farm |
511111212997 | CHEX MIX | Snacks | SNACKS | CHEX MIX |
511111803393 | NaN | Snacks | SNACKS | Cheez-It |
511111304982 | BETTER HOMES & GARDENS | Magazines | MAGAZINES | Better Homes & Gardens Magazine |
511111912859 | TRUVIA | Baking | BAKING | Truvia® |
receipts
ID | STORE_NAME | PURCHASE_DATE | PURCHASE_TIME | DATE_SCANNED | TOTAL_SPENT | REWARDS_RECEIPT_STATUS | USER_ID | PURCHASED_ITEM_COUNT | CREATE_DATE | FINISHED_DATE | REJECTED_DATE | DIGITAL_RECEIPT | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 62868f660a72546bef0b2dd0 | TOWN OF ROCKY MOUNT | 2022-05-19T00:00:00Z | 2:05 PM | 2022-05-19T18:41:42.53Z | 859.87 | FINISHED | 61375682c625197c5db73602 | 1.0 | 2022-05-19T18:41:42.53Z | 2022-05-19T18:57:57.336Z | NaN | False |
1 | 6096b7370a7216d316001149 | NaN | NaN | NaN | 2021-05-08T16:07:19.03Z | NaN | SUBMITTED | 60047e8a2d7db612a69d2e18 | NaN | 2021-05-08T16:07:19.03Z | NaN | NaN | False |
2 | 6269a4ea0a7241077408b6e1 | FAMILY DOLLAR | 2022-04-27T00:00:00Z | 4:15 PM | 2022-04-27T20:17:46.09Z | 11.00 | FINISHED | 61578e96c2fb9321213720fc | 2.0 | 2022-04-27T20:17:46.09Z | 2022-04-27T20:17:54.394Z | NaN | False |
3 | 625b25e70a723eb9730d2c9c | PUBLIX | 2022-04-15T00:00:00Z | 4:45 PM | 2022-04-16T20:24:07.259Z | 10.67 | FINISHED | 6048de2ec9106b12d2b1ff61 | 4.0 | 2022-04-16T20:24:07.259Z | 2022-04-16T20:24:23.598Z | NaN | False |
4 | 60e3bd7e0a7215bd550fb8cc | COSTCO | 2021-06-30T00:00:00Z | 1:16 PM | 2021-07-06T02:18:38.495Z | 61.90 | FINISHED | 6048de2ec9106b12d2b1ff61 | 10.0 | 2021-07-06T02:18:38.495Z | 2021-07-06T02:18:50.283Z | NaN | False |
STORE_NAME | PURCHASE_DATE | PURCHASE_TIME | DATE_SCANNED | TOTAL_SPENT | REWARDS_RECEIPT_STATUS | USER_ID | PURCHASED_ITEM_COUNT | CREATE_DATE | FINISHED_DATE | REJECTED_DATE | DIGITAL_RECEIPT | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | ||||||||||||
62868f660a72546bef0b2dd0 | TOWN OF ROCKY MOUNT | 2022-05-19T00:00:00Z | 2:05 PM | 2022-05-19T18:41:42.53Z | 859.87 | FINISHED | 61375682c625197c5db73602 | 1.0 | 2022-05-19T18:41:42.53Z | 2022-05-19T18:57:57.336Z | NaN | False |
6096b7370a7216d316001149 | NaN | NaN | NaN | 2021-05-08T16:07:19.03Z | NaN | SUBMITTED | 60047e8a2d7db612a69d2e18 | NaN | 2021-05-08T16:07:19.03Z | NaN | NaN | False |
6269a4ea0a7241077408b6e1 | FAMILY DOLLAR | 2022-04-27T00:00:00Z | 4:15 PM | 2022-04-27T20:17:46.09Z | 11.00 | FINISHED | 61578e96c2fb9321213720fc | 2.0 | 2022-04-27T20:17:46.09Z | 2022-04-27T20:17:54.394Z | NaN | False |
625b25e70a723eb9730d2c9c | PUBLIX | 2022-04-15T00:00:00Z | 4:45 PM | 2022-04-16T20:24:07.259Z | 10.67 | FINISHED | 6048de2ec9106b12d2b1ff61 | 4.0 | 2022-04-16T20:24:07.259Z | 2022-04-16T20:24:23.598Z | NaN | False |
60e3bd7e0a7215bd550fb8cc | COSTCO | 2021-06-30T00:00:00Z | 1:16 PM | 2021-07-06T02:18:38.495Z | 61.90 | FINISHED | 6048de2ec9106b12d2b1ff61 | 10.0 | 2021-07-06T02:18:38.495Z | 2021-07-06T02:18:50.283Z | NaN | False |
STORE_NAME | PURCHASE_DATE | PURCHASE_TIME | DATE_SCANNED | TOTAL_SPENT | REWARDS_RECEIPT_STATUS | USER_ID | PURCHASED_ITEM_COUNT | CREATE_DATE | FINISHED_DATE | REJECTED_DATE | DIGITAL_RECEIPT | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | ||||||||||||
62868f660a72546bef0b2dd0 | TOWN OF ROCKY MOUNT | 2022-05-19T00:00:00Z | 2:05 PM | 2022-05-19T18:41:42.53Z | 859.87 | FINISHED | 61375682c625197c5db73602 | 1 | 2022-05-19T18:41:42.53Z | 2022-05-19T18:57:57.336Z | NaN | False |
6096b7370a7216d316001149 | NaN | NaN | NaN | 2021-05-08T16:07:19.03Z | NaN | SUBMITTED | 60047e8a2d7db612a69d2e18 | <NA> | 2021-05-08T16:07:19.03Z | NaN | NaN | False |
6269a4ea0a7241077408b6e1 | FAMILY DOLLAR | 2022-04-27T00:00:00Z | 4:15 PM | 2022-04-27T20:17:46.09Z | 11.00 | FINISHED | 61578e96c2fb9321213720fc | 2 | 2022-04-27T20:17:46.09Z | 2022-04-27T20:17:54.394Z | NaN | False |
625b25e70a723eb9730d2c9c | PUBLIX | 2022-04-15T00:00:00Z | 4:45 PM | 2022-04-16T20:24:07.259Z | 10.67 | FINISHED | 6048de2ec9106b12d2b1ff61 | 4 | 2022-04-16T20:24:07.259Z | 2022-04-16T20:24:23.598Z | NaN | False |
60e3bd7e0a7215bd550fb8cc | COSTCO | 2021-06-30T00:00:00Z | 1:16 PM | 2021-07-06T02:18:38.495Z | 61.90 | FINISHED | 6048de2ec9106b12d2b1ff61 | 10 | 2021-07-06T02:18:38.495Z | 2021-07-06T02:18:50.283Z | NaN | False |
receipt_items
REWARDS_RECEIPT_ID | ITEM_INDEX | DESCRIPTION | BARCODE | BRAND_CODE | QUANTITY_PURCHASED | TOTAL_FINAL_PRICE | |
---|---|---|---|---|---|---|---|
0 | 60bb28c10a720d557b128262 | 0 | RLGULAR SALE | NaN | NaN | 1.0 | 9.99 |
1 | 60bb28c10a720d557b128262 | 1 | 82 GOURMET HOUSEW | 000240292012 | NaN | 1.0 | 4.99 |
2 | 627151230a724d730825106a | 0 | GOLDILOCKS NOPIA R BLAGK | NaN | NaN | 1.0 | 4.75 |
3 | 627151230a724d730825106a | 1 | KURI-IRI DORAYAKI CAKE | NaN | NaN | 1.0 | 5.99 |
4 | 627151230a724d730825106a | 2 | YIZMANG FISH BALL | NaN | NaN | 1.0 | 12.00 |
REWARDS_RECEIPT_ID | ITEM_INDEX | DESCRIPTION | BARCODE | BRAND_CODE | QUANTITY_PURCHASED | TOTAL_FINAL_PRICE | |
---|---|---|---|---|---|---|---|
0 | 60bb28c10a720d557b128262 | 0 | RLGULAR SALE | NaN | NaN | 1 | 9.99 |
1 | 60bb28c10a720d557b128262 | 1 | 82 GOURMET HOUSEW | 000240292012 | NaN | 1 | 4.99 |
2 | 627151230a724d730825106a | 0 | GOLDILOCKS NOPIA R BLAGK | NaN | NaN | 1 | 4.75 |
3 | 627151230a724d730825106a | 1 | KURI-IRI DORAYAKI CAKE | NaN | NaN | 1 | 5.99 |
4 | 627151230a724d730825106a | 2 | YIZMANG FISH BALL | NaN | NaN | 1 | 12.00 |
ITEM_INDEX | DESCRIPTION | BARCODE | BRAND_CODE | QUANTITY_PURCHASED | TOTAL_FINAL_PRICE | |
---|---|---|---|---|---|---|
REWARDS_RECEIPT_ID | ||||||
60bb28c10a720d557b128262 | 0 | RLGULAR SALE | NaN | NaN | 1 | 9.99 |
60bb28c10a720d557b128262 | 1 | 82 GOURMET HOUSEW | 000240292012 | NaN | 1 | 4.99 |
627151230a724d730825106a | 0 | GOLDILOCKS NOPIA R BLAGK | NaN | NaN | 1 | 4.75 |
627151230a724d730825106a | 1 | KURI-IRI DORAYAKI CAKE | NaN | NaN | 1 | 5.99 |
627151230a724d730825106a | 2 | YIZMANG FISH BALL | NaN | NaN | 1 | 12.00 |
users
CREATED_DATE | BIRTH_DATE | GENDER | STATE | SIGN_UP_PLATFORM | SIGN_UP_SOURCE | ID | |
---|---|---|---|---|---|---|---|
0 | 2021-12-20T00:29:17.118Z | 1984-03-20T00:00:00Z | transgender | FL | NaN | Apple | 61bfce5d6655417f803b6538 |
1 | 2021-10-21T17:15:25.825Z | 1987-08-08T05:00:00Z | prefer_not_to_say | PA | unknown | 6171a02d8034995a3d858fb3 | |
2 | 2021-10-23T19:19:18.305Z | 1995-06-18T05:00:00Z | male | FL | NaN | Apple | 617460366e2684498143e769 |
3 | 2021-03-30T02:35:41.249Z | 1999-08-23T07:00:00Z | transgender | MI | ios | 60628e7d85cdff011de41113 | |
4 | 2021-04-26T23:15:54.375Z | 1992-10-28T16:16:23Z | male | CA | andrioid | 608749aac63a95130a45fbf4 |
CREATED_DATE | BIRTH_DATE | GENDER | STATE | SIGN_UP_PLATFORM | SIGN_UP_SOURCE | |
---|---|---|---|---|---|---|
ID | ||||||
61bfce5d6655417f803b6538 | 2021-12-20T00:29:17.118Z | 1984-03-20T00:00:00Z | transgender | FL | NaN | Apple |
6171a02d8034995a3d858fb3 | 2021-10-21T17:15:25.825Z | 1987-08-08T05:00:00Z | prefer_not_to_say | PA | unknown | |
617460366e2684498143e769 | 2021-10-23T19:19:18.305Z | 1995-06-18T05:00:00Z | male | FL | NaN | Apple |
60628e7d85cdff011de41113 | 2021-03-30T02:35:41.249Z | 1999-08-23T07:00:00Z | transgender | MI | ios | |
608749aac63a95130a45fbf4 | 2021-04-26T23:15:54.375Z | 1992-10-28T16:16:23Z | male | CA | andrioid |
The next step is diagramming a new structured relational data model. After cleaning the data, I created a new ER diagram(Entity-Relationship Diagram) using Lucidchart for this database(shown below).
WITH cte AS
(
SELECT
b.BARCODE,
b.NAME AS brand_name,
ri.TOTAL_FINAL_PRICE,
r.PURCHASE_DATE,
MONTH(r.PURCHASE_DATE) AS month
FROM brands AS b
JOIN receipt_items AS ri
ON b.BARCODE = ri.BARCODE
JOIN receipts AS r
ON r.ID = ri.REWARDS_RECEIPT_ID
)
SELECT
TOP 1 brand_name,
SUM(TOTAL_FINAL_PRICE) AS price,
month
FROM cte
GROUP BY brand_name, month
HAVING MONTH = 6
ORDER BY price DESC;
WITH cte AS
(
SELECT
u.ID AS user_id,
MONTH(r.PURCHASE_DATE) AS month,
r.TOTAL_SPENT
FROM users AS u
JOIN receipts AS r
ON u.ID = r.USER_ID
)
SELECT
TOP 1 user_id,
month,
SUM(TOTAL_SPENT) AS total_spent
FROM cte
GROUP BY user_id, month
HAVING month = 8
ORDER BY total_spent DESC;
WITH cte AS
(
SELECT
u.ID AS user_id,
r.ID AS receipt_id,
ri.DESCRIPTION AS item_name,
CAST(ROUND(ri.TOTAL_FINAL_PRICE, 2) AS DECIMAL(10,2)) AS total_final_price,
ri.QUANTITY_PURCHASED AS quantity_purchased,
CAST(ROUND(total_final_price/quantity_purchased, 2) AS FLOAT) AS item_price
FROM users AS u
JOIN receipts AS r
ON u.ID = r.USER_ID
JOIN receipt_items AS ri
ON r.ID = ri.REWARDS_RECEIPT_ID
WHERE (total_final_price IS NOT NULL)
AND (total_final_price != 0)
AND (quantity_purchased IS NOT NULL)
AND (quantity_purchased != 0)
)
SELECT
TOP 1 user_id,
item_name,
item_price
FROM cte
ORDER BY item_price DESC;
WITH cte AS
(
SELECT
u.ID AS user_id,
MONTH(r.DATE_SCANNED) AS month_scanned
FROM users AS u
JOIN receipts AS r
ON u.ID = r.USER_ID
)
SELECT
CASE WHEN month = 1 THEN 'JAN'
WHEN month = 2 THEN 'FEB'
WHEN month = 3 THEN 'MAR'
WHEN month = 4 THEN 'APR'
WHEN month = 5 THEN 'MAY'
WHEN month = 6 THEN 'JUN'
WHEN month = 7 THEN 'JUL'
WHEN month = 8 THEN 'AUG'
WHEN month = 9 THEN 'SEP'
WHEN month = 10 THEN 'OCT'
WHEN month = 11 THEN 'NOV'
ELSE 'DEC' END AS month,
user_id_count
FROM
(
SELECT
TOP 12 month_scanned AS month,
COUNT(user_id) AS user_id_count
FROM cte
GROUP BY month_scanned
ORDER BY month
)sub;
SELECT
TOP 10 COUNT(NAME) AS brand_count,
CATEGORY AS category
FROM brands
GROUP BY category
HAVING category IS NOT NULL
ORDER BY brand_count DESC;
SELECT
TOP 20 STORE_NAME AS store_name,
ROUND(SUM(TOTAL_SPENT), 3) AS revenue
FROM receipts
WHERE YEAR(PURCHASE_DATE) = 2022
GROUP BY store_name
ORDER BY revenue DESC;
WITH cte AS
(
SELECT
*,
CASE WHEN age BETWEEN 0 AND 10 THEN '0-10'
WHEN age BETWEEN 11 AND 20 THEN '11-20'
WHEN age BETWEEN 21 AND 30 THEN '21-30'
WHEN age BETWEEN 31 AND 40 THEN '31-40'
WHEN age BETWEEN 41 AND 50 THEN '41-50'
WHEN age BETWEEN 51 AND 60 THEN '51-60'
WHEN age BETWEEN 61 AND 70 THEN '61-70'
WHEN age BETWEEN 71 AND 80 THEN '71-80'
END AS age_range
FROM
(
SELECT
*,
2023 - YEAR(BIRTH_DATE) AS age
FROM users
) sub
)
SELECT
age_range,
COUNT(*) AS age_range_count
FROM cte
GROUP BY age_range;
SELECT
SIGN_UP_SOURCE AS sign_up_source,
COUNT(*) AS source_count
FROM users
GROUP BY sign_up_source;
For the last part, I used Tableau to create visualizations and dashboards to help non-technical stakeholders easily understand my analysis.
The visualization below shows the distribution of registered users. From this visualization, we know that Florida State has the most registered users since the color of Florida is the darkest on the whole map.
From the highlight table below, we can know the number and percentage of the receipt that is FINISHED/FLAGGED/PENDING/REJECTED/SUBMITTED. For example, among all the rewards receipts, 91.15% of them are finished(which is 64,350). The darker color represents the higher percentage/number.
I created a treemap in the following dashboard to visualize the store’s revenue(top 20) each year. Users can use the slider above the dashboard to filter the year they want. The bigger the area, the larger the revenue.
Users can see the corresponding store name and revenue that year when they hover over the dashboard.
This visualization can be used to answer Question 6. From this treemap, we know that Beverages is the category with the most brand among all the categories. Users can see the category and brand count when they hover over the visualization.
The dashboard below can be used to answer Question 5 and helps stakeholders visualize the number of users scanned in a specific year or quarter. Users can use the slider to filter quarters and the dropdown to filter years.