added solution for working with data II

Commit 2b0ac20a authored 3 months ago by Jöran Frey
--- a/unit-5/Exercises/Week_7/Solution/70_Working_with_Data_II_SOLUTION.ipynb
+++ b/unit-5/Exercises/Week_7/Solution/70_Working_with_Data_II_SOLUTION.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "b8ad1699",
+   "metadata": {},
+   "source": [
+    "# 7 Working with Data II"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bac932a5",
+   "metadata": {},
+   "source": [
+    "## 1 List Comprehensions\n",
+    "A list comprehension is a concise way to create a new list by applying an expression to each element of an existing list or other iterable. It replaces an explicit loop with a single, readable expression and keeps the code compact.\n",
+    "\n",
+    "**General form**\n",
+    "\n",
+    "[**Expression** for **Element** in **Iterable_element** if **Condition**]\n",
+    "\n",
+    "  - **Expression:** The expression that produces each element of the new list.\n",
+    "  - **Element:** The variable bound to the current item of the iterable on each iteration.\n",
+    "  - **Iterable_element:** The existing list or other iterable that is iterated over.\n",
+    "  - **Condition** (optional): A condition that determines whether the element is included in the new list.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f664b431",
+   "metadata": {},
+   "source": [
+    "### 1.1 Create a list with the squares of the numbers from 1 to 10\n",
+    "Use a list comprehension to do this.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "527f8b71",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]\n"
+     ]
+    }
+   ],
+   "source": [
+    "squares = [x**2 for x in range(1,11,1)]\n",
+    "print(squares)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4be7b5f1",
+   "metadata": {},
+   "source": [
+    "**Task:** Extend the script so that only the squares of even numbers are output."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "0aa05216",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[4, 16, 36, 64, 100]\n"
+     ]
+    }
+   ],
+   "source": [
+    "squares = [x**2 for x in range(1, 11, 1) if x%2 == 0]\n",
+    "print(squares)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "74199360",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[4, 16, 36, 64, 100]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# without list comprehension\n",
+    "\n",
+    "even_squares = []\n",
+    "\n",
+    "for x in range(1,11):\n",
+    "    if x % 2 == 0:\n",
+    "        even_squares.append(x**2)\n",
+    "        \n",
+    "print(even_squares)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e50cf222",
+   "metadata": {},
+   "source": [
+    "### 1.2 Filtering words\n",
+    "Given a list of words, use a list comprehension to create a new list containing only the words that are longer than 5 characters and end with a vowel. <br>\n",
+    "**TIP:** The length of a string is returned by the built-in function `len`:\n",
+    "```Python\n",
+    "len('string')\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "2262f180",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "words = [\"banana\", \"apple\", \"orange tree\", \"cherry\", \"lemon\", \"melon\", \"water\", \"computer\", \"information\", \"keyboard\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "1c9fe26e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['banana', 'orange tree']\n"
+     ]
+    }
+   ],
+   "source": [
+    "filtered_words = [word for word in words if len(word) > 5 and word[-1] in \"aeiou\"]\n",
+    "print(filtered_words)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "5b97a72a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['banana', 'orange tree']\n"
+     ]
+    }
+   ],
+   "source": [
+    "vowels = \"aeiouAEIOU\"\n",
+    "shortlist = [word for word in words if len(word) > 5 and word[-1] in vowels]\n",
+    "print(shortlist)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ba7035e8",
+   "metadata": {},
+   "source": [
+    "### 1.3 Nested list comprehensions\n",
+    "Given a matrix (a list of lists) of numbers, create a flat list of all numbers in the matrix that are divisible by 3. Use a nested list comprehension for this."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7d65faa",
+   "metadata": {},
+   "source": [
+    "**Outer Loop:** ```for row in matrix```\n",
+    "\n",
+    "Iterates over each inner list (row) of the matrix, one row at a time.\n",
+    "\n",
+    "**Inner Loop:** ```for num in row```\n",
+    "\n",
+    "For each row, iterates over every number (`num`) in that row, so the condition is checked for one number at a time. The `for` clauses are written in the same order as the equivalent nested loops."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "e1bc07a0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Matrix (list of lists) used for the nested list comprehension\n",
+    "matrix = [\n",
+    "    [1, 2, 3, 4, 5, 6],\n",
+    "    [7, 8, 9, 10, 11, 12],\n",
+    "    [13, 14, 15, 16, 17, 18],\n",
+    "    [19, 20, 21, 22, 23, 24]\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "a30b7942",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[3, 6, 9, 12, 15, 18, 21, 24]\n"
+     ]
+    }
+   ],
+   "source": [
+    "numbers = [num for row in matrix for num in row if num % 3 == 0]\n",
+    "print(numbers)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c655dd9c",
+   "metadata": {},
+   "source": [
+    "## 2 Data Cleaning with Pandas"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9b0139a2",
+   "metadata": {},
+   "source": [
+    "### 2.1 Remove NaN values\n",
+    "**Task:** Load the file ```10_Word_filtering.csv``` into a DataFrame. Remove every row that has more than one missing value, i.e. keep only rows with at least four non-NaN entries across the five example columns. Display the resulting DataFrame.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "500bf1b6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "Filtered DataFrame:\n",
+      "   Example 1 Example 2 Example 3 Example 4 Example 5\n",
+      "0      Word1    Word21    Word41       NaN    Word81\n",
+      "1      Word2    Word22    Word42       NaN    Word82\n",
+      "2      Word3    Word23    Word43    Word63       NaN\n",
+      "5      Word6    Word26    Word46    Word66       NaN\n",
+      "8      Word9    Word29    Word49    Word69    Word89\n",
+      "9     Word10    Word30    Word50       NaN    Word90\n",
+      "10    Word11       NaN    Word51    Word71    Word91\n",
+      "11    Word12    Word32    Word52    Word72    Word92\n",
+      "12    Word13    Word33    Word53    Word73       NaN\n",
+      "13    Word14    Word34       NaN    Word74    Word94\n",
+      "16    Word19    Word39    Word59       NaN    Word99\n"
+     ]
+    }
+   ],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Load the data\n",
+    "df_profiles = pd.read_csv(\"./Testdata/10_Word_filtering.csv\")\n",
+    "\n",
+    "\n",
+    "# Drop rows that have fewer than four non-NaN values\n",
+    "filtered_df = df_profiles.dropna(thresh=4)  # keep rows with at least 4 of 5 values present\n",
+    "\n",
+    "print(\"\\nFiltered DataFrame:\")\n",
+    "print(filtered_df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b864d54d",
+   "metadata": {},
+   "source": [
+    "### 2.2 Analyze Sales Data by Product and Region"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ffb536ed",
+   "metadata": {},
+   "source": [
+    "Given a sales dataset ```20_sales_data_2024.csv``` for a fictional company, analyze \n",
+    "* **1. Total Sales by Product**\n",
+    "* **2. Average Sales by Region**\n",
+    "* **3. Top 3 Best-Selling Products Overall**\n",
+    "* **4. Highest Revenue Region for Each Product**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "id": "ca1e2c11",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "         Date    Product         Region  Sales Amount  Quantity Sold\n",
+      "0  2024-04-12  Product D           Asia        743.12              3\n",
+      "1  2024-12-14  Product E  South America       1420.23             11\n",
+      "2  2024-09-27  Product C         Africa       4859.09             53\n",
+      "3  2024-04-16  Product C  North America       1690.17             46\n",
+      "4  2024-03-12  Product A  North America       2436.10             30\n",
+      "Total Sales by Product:\n",
+      "     Product  Sales Amount\n",
+      "0  Product A     510064.23\n",
+      "1  Product B     492452.92\n",
+      "2  Product C     496559.03\n",
+      "3  Product D     470472.52\n",
+      "4  Product E     519674.26\n",
+      "\n",
+      "Average Sales by Region:\n",
+      "          Region  Sales Amount\n",
+      "0         Africa   2572.376368\n",
+      "1           Asia   2171.946957\n",
+      "2         Europe   2605.275885\n",
+      "3  North America   2672.068906\n",
+      "4  South America   2421.155281\n",
+      "\n",
+      "Top 3 Best-Selling Products Overall:\n",
+      "     Product  Sales Amount\n",
+      "4  Product E     519674.26\n",
+      "0  Product A     510064.23\n",
+      "2  Product C     496559.03\n",
+      "\n",
+      "Highest Revenue Region for Each Product:\n",
+      "      Product         Region  Sales Amount\n",
+      "0   Product A         Africa     127460.48\n",
+      "9   Product B  South America     151319.93\n",
+      "10  Product C         Africa     126428.13\n",
+      "18  Product D  North America     118419.80\n",
+      "23  Product E  North America     124364.07\n"
+     ]
+    }
+   ],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "df_sales = pd.read_csv(\"./Testdata/20_sales_data_2024.csv\")\n",
+    "\n",
+    "# Preview the first rows of the DataFrame\n",
+    "print(df_sales.head(5))\n",
+    "\n",
+    "# 1. Total Sales by Product\n",
+    "total_sales_by_product = df_sales.groupby('Product')['Sales Amount'].sum().reset_index()\n",
+    "print(\"Total Sales by Product:\")\n",
+    "print(total_sales_by_product)\n",
+    "\n",
+    "# 2. Average Sales by Region\n",
+    "average_sales_by_region = df_sales.groupby('Region')['Sales Amount'].mean().reset_index()\n",
+    "print(\"\\nAverage Sales by Region:\")\n",
+    "print(average_sales_by_region)\n",
+    "\n",
+    "# 3. Top 3 Best-Selling Products Overall\n",
+    "top_3_products = total_sales_by_product.sort_values(by='Sales Amount', ascending=False).head(3)\n",
+    "print(\"\\nTop 3 Best-Selling Products Overall:\")\n",
+    "print(top_3_products)\n",
+    "\n",
+    "\n",
+    "# 4. Highest Revenue Region for every Product\n",
+    "highest_revenue_region_by_product = df_sales.groupby(['Product', 'Region'])['Sales Amount'].sum().reset_index()\n",
+    "highest_revenue_region_by_product = highest_revenue_region_by_product.sort_values(['Product', 'Sales Amount'], ascending=[True, False])\n",
+    "highest_revenue_region_by_product = highest_revenue_region_by_product.drop_duplicates(subset=['Product'], keep='first')\n",
+    "print(\"\\nHighest Revenue Region for Each Product:\")\n",
+    "print(highest_revenue_region_by_product)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
+%% Cell type:markdown id:b8ad1699 tags:
+# 7 Working with Data II
+%% Cell type:markdown id:bac932a5 tags:
+## 1 List Comprehensions
+A list comprehension is a concise way to create a new list by applying an expression to each element of an existing list or other iterable. It replaces an explicit loop with a single, readable expression and keeps the code compact.
+**General form**
+[**Expression** for **Element** in **Iterable_element** if **Condition**]
+  - **Expression:** The expression that produces each element of the new list.
+  - **Element:** The variable bound to the current item of the iterable on each iteration.
+  - **Iterable_element:** The existing list or other iterable that is iterated over.
+  - **Condition** (optional): A condition that determines whether the element is included in the new list.
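To make the four parts concrete, the following minimal sketch maps each of them onto a small example and compares the result with the equivalent `for` loop. The list `names` and the length filter are made up for illustration and are not part of the exercise data.

```python
# Hypothetical input data, chosen only for illustration
names = ["Ada", "Grace", "Linus", "Guido"]

# Expression: name.upper()   Element: name
# Iterable:   names          Condition: len(name) == 5
upper_names = [name.upper() for name in names if len(name) == 5]

# Equivalent explicit loop that builds the same list step by step
upper_names_loop = []
for name in names:
    if len(name) == 5:
        upper_names_loop.append(name.upper())

print(upper_names)
print(upper_names == upper_names_loop)
```

Both variants produce the same list; the comprehension simply folds the loop, the condition, and the `append` call into one expression.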
+%% Cell type:markdown id:f664b431 tags:
+### 1.1 Create a list with the squares of the numbers from 1 to 10
+Use a list comprehension to do this.
+%% Cell type:code id:527f8b71 tags:
+``` python
+squares = [x**2 for x in range(1,11,1)]
+print(squares)
+```
+%% Output
+    [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
+%% Cell type:markdown id:4be7b5f1 tags:
+**Task:** Extend the script so that only the squares of even numbers are output.
+%% Cell type:code id:0aa05216 tags:
+``` python
+squares = [x**2 for x in range(1, 11, 1) if x%2 == 0]
+print(squares)
+```
+%% Output
+    [4, 16, 36, 64, 100]
+%% Cell type:code id:74199360 tags:
+``` python
+# without list comprehension
+even_squares = []
+for x in range(1,11):
+    if x % 2 == 0:
+        even_squares.append(x**2)
+print(even_squares)
+```
+%% Output
+    [4, 16, 36, 64, 100]
+%% Cell type:markdown id:e50cf222 tags:
+### 1.2 Filtering words
+Given a list of words, use a list comprehension to create a new list containing only the words that are longer than 5 characters and end with a vowel. <br>
+**TIP:** The length of a string is returned by the built-in function `len`:
+```Python
+len('string')
+```
+%% Cell type:code id:2262f180 tags:
+``` python
+words = ["banana", "apple", "orange tree", "cherry", "lemon", "melon", "water", "computer", "information", "keyboard"]
+```
+%% Cell type:code id:1c9fe26e tags:
+``` python
+filtered_words = [word for word in words if len(word) > 5 and word[-1] in "aeiou"]
+print(filtered_words)
+```
+%% Output
+    ['banana', 'orange tree']
+%% Cell type:code id:5b97a72a tags:
+``` python
+vowels = "aeiouAEIOU"
+shortlist = [word for word in words if len(word) > 5 and word[-1] in vowels]
+print(shortlist)
+```
+%% Output
+    ['banana', 'orange tree']
+%% Cell type:markdown id:ba7035e8 tags:
+### 1.3 Nested list comprehensions
+Given a matrix (a list of lists) of numbers, create a flat list of all numbers in the matrix that are divisible by 3. Use a nested list comprehension for this.
+%% Cell type:markdown id:a7d65faa tags:
+**Outer Loop:** ```for row in matrix```
+Iterates over each inner list (row) of the matrix, one row at a time.
+**Inner Loop:** ```for num in row```
+For each row, iterates over every number (`num`) in that row, so the condition is checked for one number at a time. The `for` clauses are written in the same order as the equivalent nested loops.
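The relationship between the two `for` clauses and ordinary nested loops can be sketched as follows. The small `grid` is a hypothetical stand-in, not the matrix used in the exercise.

```python
# Small hypothetical matrix, used only for this illustration
grid = [[1, 2, 3], [4, 5, 6]]

# Nested comprehension: outer clause first, inner clause second
flat = [num for row in grid for num in row]

# Equivalent nested loops, written in the same order
flat_loop = []
for row in grid:        # outer loop: one row at a time
    for num in row:     # inner loop: one number at a time
        flat_loop.append(num)

print(flat)
print(flat == flat_loop)
```

Reading the comprehension from left to right after the expression gives the loops from the outermost to the innermost, which is a handy check when the clause order is unclear.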
+%% Cell type:code id:e1bc07a0 tags:
+``` python
+# Matrix (list of lists) used for the nested list comprehension
+matrix = [
+    [1, 2, 3, 4, 5, 6],
+    [7, 8, 9, 10, 11, 12],
+    [13, 14, 15, 16, 17, 18],
+    [19, 20, 21, 22, 23, 24]
+]
+```
+%% Cell type:code id:a30b7942 tags:
+``` python
+numbers = [num for row in matrix for num in row if num % 3 == 0]
+print(numbers)
+```
+%% Output
+    [3, 6, 9, 12, 15, 18, 21, 24]
+%% Cell type:markdown id:c655dd9c tags:
+## 2 Data Cleaning with Pandas
+%% Cell type:markdown id:9b0139a2 tags:
+### 2.1 Remove NaN values
+**Task:** Load the file ```10_Word_filtering.csv``` into a DataFrame. Remove every row that has more than one missing value, i.e. keep only rows with at least four non-NaN entries across the five example columns. Display the resulting DataFrame.
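One way to express "at most k missing values per row" in pandas is the `thresh` parameter of `DataFrame.dropna`, which counts the non-missing values a row must have. The sketch below uses a hypothetical three-column frame (not the course file) and checks the result against an explicit count of non-NaN cells per row.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column frame, not the course data
df = pd.DataFrame({
    "a": ["x", "y", np.nan],
    "b": ["x", np.nan, np.nan],
    "c": ["x", "y", "z"],
})

# thresh counts NON-missing values: keep rows that have at least 2 of them,
# i.e. rows with at most 3 - 2 = 1 missing value
kept = df.dropna(thresh=2)

# The same filter written out via the per-row count of non-NaN cells
kept_explicit = df[df.notna().sum(axis=1) >= 2]

print(kept)
print(kept.equals(kept_explicit))
```

For a frame with five columns, allowing one missing value per row corresponds to `thresh=4` by the same arithmetic.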
+%% Cell type:code id:500bf1b6 tags:
+``` python
+import pandas as pd
+# Load the data
+df_profiles = pd.read_csv("./Testdata/10_Word_filtering.csv")
+# Drop rows that have fewer than four non-NaN values
+filtered_df = df_profiles.dropna(thresh=4)  # keep rows with at least 4 of 5 values present
+print("\nFiltered DataFrame:")
+print(filtered_df)
+```
+%% Output
+    Filtered DataFrame:
+       Example 1 Example 2 Example 3 Example 4 Example 5
+    0      Word1    Word21    Word41       NaN    Word81
+    1      Word2    Word22    Word42       NaN    Word82
+    2      Word3    Word23    Word43    Word63       NaN
+    5      Word6    Word26    Word46    Word66       NaN
+    8      Word9    Word29    Word49    Word69    Word89
+    9     Word10    Word30    Word50       NaN    Word90
+    10    Word11       NaN    Word51    Word71    Word91
+    11    Word12    Word32    Word52    Word72    Word92
+    12    Word13    Word33    Word53    Word73       NaN
+    13    Word14    Word34       NaN    Word74    Word94
+    16    Word19    Word39    Word59       NaN    Word99
+%% Cell type:markdown id:b864d54d tags:
+### 2.2 Analyze Sales Data by Product and Region
+%% Cell type:markdown id:ffb536ed tags:
+Given a sales dataset ```20_sales_data_2024.csv``` for a fictional company, analyze
+* **1. Total Sales by Product**
+* **2. Average Sales by Region**
+* **3. Top 3 Best-Selling Products Overall**
+* **4. Highest Revenue Region for Each Product**
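For task 4, a common pandas pattern is to aggregate per (product, region) pair and then pick the row with the largest value inside each product group using `idxmax`. The sketch below is one possible approach on a hypothetical miniature table; only the column names are taken from the exercise file, the rows are invented.

```python
import pandas as pd

# Hypothetical miniature sales table; rows are invented for illustration
sales = pd.DataFrame({
    "Product": ["A", "A", "A", "B", "B"],
    "Region": ["Asia", "Europe", "Asia", "Asia", "Europe"],
    "Sales Amount": [100.0, 250.0, 200.0, 50.0, 75.0],
})

# Total revenue per (Product, Region) pair
revenue = sales.groupby(["Product", "Region"], as_index=False)["Sales Amount"].sum()

# For each product, idxmax returns the row label of its largest revenue
best_rows = revenue.groupby("Product")["Sales Amount"].idxmax()
best_region = revenue.loc[best_rows].reset_index(drop=True)

print(best_region)
```

Sorting by revenue and dropping duplicate products yields the same rows; `idxmax` avoids the sort and states the intent ("the maximum per group") more directly.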
+%% Cell type:code id:ca1e2c11 tags:
+``` python
+import pandas as pd
+df_sales = pd.read_csv("./Testdata/20_sales_data_2024.csv")
+# Preview the first rows of the DataFrame
+print(df_sales.head(5))
+# 1. Total Sales by Product
+total_sales_by_product = df_sales.groupby('Product')['Sales Amount'].sum().reset_index()
+print("Total Sales by Product:")
+print(total_sales_by_product)
+# 2. Average Sales by Region
+average_sales_by_region = df_sales.groupby('Region')['Sales Amount'].mean().reset_index()
+print("\nAverage Sales by Region:")
+print(average_sales_by_region)
+# 3. Top 3 Best-Selling Products Overall
+top_3_products = total_sales_by_product.sort_values(by='Sales Amount', ascending=False).head(3)
+print("\nTop 3 Best-Selling Products Overall:")
+print(top_3_products)
+# 4. Highest Revenue Region for every Product
+highest_revenue_region_by_product = df_sales.groupby(['Product', 'Region'])['Sales Amount'].sum().reset_index()
+highest_revenue_region_by_product = highest_revenue_region_by_product.sort_values(['Product', 'Sales Amount'], ascending=[True, False])
+highest_revenue_region_by_product = highest_revenue_region_by_product.drop_duplicates(subset=['Product'], keep='first')
+print("\nHighest Revenue Region for Each Product:")
+print(highest_revenue_region_by_product)
+```
+%% Output
+             Date    Product         Region  Sales Amount  Quantity Sold
+    0  2024-04-12  Product D           Asia        743.12              3
+    1  2024-12-14  Product E  South America       1420.23             11
+    2  2024-09-27  Product C         Africa       4859.09             53
+    3  2024-04-16  Product C  North America       1690.17             46
+    4  2024-03-12  Product A  North America       2436.10             30
+    Total Sales by Product:
+         Product  Sales Amount
+    0  Product A     510064.23
+    1  Product B     492452.92
+    2  Product C     496559.03
+    3  Product D     470472.52
+    4  Product E     519674.26
+    Average Sales by Region:
+              Region  Sales Amount
+    0         Africa   2572.376368
+    1           Asia   2171.946957
+    2         Europe   2605.275885
+    3  North America   2672.068906
+    4  South America   2421.155281
+    Top 3 Best-Selling Products Overall:
+         Product  Sales Amount
+    4  Product E     519674.26
+    0  Product A     510064.23
+    2  Product C     496559.03
+    Highest Revenue Region for Each Product:
+          Product         Region  Sales Amount
+    0   Product A         Africa     127460.48
+    9   Product B  South America     151319.93
+    10  Product C         Africa     126428.13
+    18  Product D  North America     118419.80
+    23  Product E  North America     124364.07
--- a/unit-5/Exercises/Week_7/Solution/Testdata/10_Word_filtering.csv
+++ b/unit-5/Exercises/Week_7/Solution/Testdata/10_Word_filtering.csv
+Example 1,Example 2,Example 3,Example 4,Example 5
+Word1,Word21,Word41,,Word81
+Word2,Word22,Word42,,Word82
+Word3,Word23,Word43,Word63,
+Word4,Word24,,,Word84
+Word5,,Word45,,Word85
+Word6,Word26,Word46,Word66,
+Word7,,,Word67,Word87
+Word8,Word28,,,
+Word9,Word29,Word49,Word69,Word89
+Word10,Word30,Word50,,Word90
+Word11,,Word51,Word71,Word91
+Word12,Word32,Word52,Word72,Word92
+Word13,Word33,Word53,Word73,
+Word14,Word34,,Word74,Word94
+,Word37,Word57,,Word97
+Word18,Word38,,,
+Word19,Word39,Word59,,Word99
+Word20,Word40,,,
--- a/unit-5/Exercises/Week_7/Solution/Testdata/20_sales_data_2024.csv
+++ b/unit-5/Exercises/Week_7/Solution/Testdata/20_sales_data_2024.csv
--- a/unit-5/Exercises/Week_7/Solution/Testdata/testdatagenerator/generatorsalesdata.py
+++ b/unit-5/Exercises/Week_7/Solution/Testdata/testdatagenerator/generatorsalesdata.py
+import pandas as pd
+import numpy as np
+# Define parameters for dataset creation
+np.random.seed(42)
+num_records = 1000  # Number of records
+# Generate random dates within the year 2024
+date_range = pd.date_range(start="2024-01-01", end="2024-12-31", freq='D')
+dates = np.random.choice(date_range, num_records)
+# Define product names and regions
+products = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']
+regions = ['North America', 'Europe', 'Asia', 'South America', 'Africa']
+# Generate random data
+product_choices = np.random.choice(products, num_records)
+region_choices = np.random.choice(regions, num_records)
+sales_amount = np.round(np.random.uniform(50, 5000, num_records), 2)  # round to two decimals (currency)
+quantity_sold = np.random.randint(1, 100, num_records)
+# Create the DataFrame
+sales_data = pd.DataFrame({
+    'Date': dates,
+    'Product': product_choices,
+    'Region': region_choices,
+    'Sales Amount': sales_amount,
+    'Quantity Sold': quantity_sold
+})
+print(sales_data)
+# Save the DataFrame to a CSV file
+sales_data.to_csv("./20_sales_data_2024.csv", index=False)
+print("Dataset saved as '20_sales_data_2024.csv'")
--- a/unit-6/images/anatomyofplot.webp
+++ b/unit-6/images/anatomyofplot.webp
--- a/unit-6/lecture.ipynb
+++ b/unit-6/lecture.ipynb
@@ -18,6 +18,44 @@
    "* Of Figures and Plots"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## **Matplotlib**\n",
+    "\n",
+    "* **What is Matplotlib?**\n",
+    "    * **A Powerful Visualization Library:** Matplotlib is a core Python library that allows you to create a wide range of data visualizations, from simple plots to complex graphics.\n",
+    "\n",
+    "* **Why Use Matplotlib?**\n",
+    "    * **Highly Customizable:** Offers flexibility to adjust colors, styles, and layouts.\n",
+    "    * **Works Seamlessly with Other Libraries:** Integrates well with data tools such as pandas and NumPy.\n",
+    "    * **Industry Standard:** Widely used in data science, research, and engineering, making it a valuable skill in many technical fields."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img src=\"./images/anatomyofplot.webp\" width=\"700\"/>\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Introduction to Matplotlib\n",
+    "\n",
+    "**What is Matplotlib?**\n",
+    "\n",
+    "Matplotlib is a popular Python library for creating static, interactive, and animated plots and visualizations.\n",
+    "It is widely used in data analysis because visual representations make data easier to understand.\n",
+    "\n",
+    "**Why Use Matplotlib?**\n",
+    "\n",
+    "It helps turn complex data into charts, making patterns and trends easier to see.\n",
+    "It is an essential tool in data science, machine learning, and research."
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,

 %% Cell type:markdown id: tags:
 # Matplotlib
 %% Cell type:markdown id: tags:
 ## Agenda
 * Introduction to Matplotlib
 * Elements of a Figure
 * Of Figures and Plots
+%% Cell type:markdown id: tags:
+## **Matplotlib**
+* **What is Matplotlib?**
+    * **A Powerful Visualization Library:** Matplotlib is a core Python library that allows you to create a wide range of data visualizations, from simple plots to complex graphics.
+* **Why Use Matplotlib?**
+    * **Highly Customizable:** Offers flexibility to adjust colors, styles, and layouts.
+    * **Works Seamlessly with Other Libraries:** Integrates well with data tools such as pandas and NumPy.
+    * **Industry Standard:** Widely used in data science, research, and engineering, making it a valuable skill in many technical fields.
+%% Cell type:markdown id: tags:
+<img src="./images/anatomyofplot.webp" width="700"/>
+%% Cell type:markdown id: tags:
+### Introduction to Matplotlib
+**What is Matplotlib?**
+Matplotlib is a popular Python library for creating static, interactive, and animated plots and visualizations.
+It is widely used in data analysis because visual representations make data easier to understand.
+**Why Use Matplotlib?**
+It helps turn complex data into charts, making patterns and trends easier to see.
+It is an essential tool in data science, machine learning, and research.
 %% Cell type:code id: tags:
 ``` python
 # Official Matplotlib website: https://matplotlib.org/
 ```