<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      <title>CISB5143 Topic 1 Introduction to Data Engineering by </title>
      <link>https://padlet.com/laila178/9k4ws7vtrybva32l</link>
      <description>Key concepts of data engineering</description>
      <language>en-us</language>
      <pubDate>2025-09-29 01:04:58 UTC</pubDate>
      <lastBuildDate>2025-09-29 05:17:35 UTC</lastBuildDate>
      <webMaster>hello@padlet.com</webMaster>
      <image>
         <url>https://padlet.net/icons/png/1f4ac.png</url>
      </image>
      <item>
         <title></title>
         <author>laila178</author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608180880</link>
         <description><![CDATA[<ol><li><p>What is the difference between traditional data infrastructure and cloud data infrastructure?</p></li><li><p>Refer to the <strong>five stages</strong> of the Data Engineering Lifecycle: Data Generation, Data Ingestion, Data Storage, Data Transformation and Data Serving. For each stage, find at least one Google Cloud Platform (GCP) service that can be used.</p></li><li><p>Define key terms in data engineering evolution: data warehouse, distributed computing, big data, cloud computing.</p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 01:08:34 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608180880</guid>
      </item>
      <item>
         <title></title>
         <author>syasyamasturina</author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608542956</link>
         <description><![CDATA[<p><strong>Nur Syasya Masturina binti Ibrahim (IS01083881)</strong></p><p><br/></p><ol><li><p>Difference between traditional data infrastructure and cloud data infrastructure</p><ol><li><p>Traditional data infrastructure</p><ol><li><p>Hardware such as storage and servers is physically located on the organization's premises and must be maintained by the organization itself.</p></li><li><p>Not very scalable, as scaling requires buying and installing new physical hardware.</p></li><li><p>High upfront hardware costs, plus ongoing costs for power, space, and IT staff.</p></li></ol></li><li><p>Cloud data infrastructure</p><ol><li><p>Hardware is owned and maintained by the cloud service provider. Resources can be accessed remotely over the internet.</p></li><li><p>More scalable, as resources can be scaled up or down within minutes. Organizations only pay for the capacity they actually use.</p></li><li><p>No big upfront investment.</p></li></ol></li></ol></li></ol><p><br/></p><ol start="2"><li><p>GCP Service</p><ol><li><p>Data Generation : Cloud Pub/Sub</p></li><li><p>Data Ingestion : Dataflow</p></li><li><p>Data Storage : BigQuery</p></li><li><p>Data Transformation : Dataflow</p></li><li><p>Data Serving : BigQuery</p></li></ol><p><br/></p></li><li><p>Key terms</p><ol><li><p>Data warehouse : A centralized repository that stores structured data for analysis, reporting, and business intelligence.</p></li><li><p>Distributed computing : Method of making multiple computers work together to solve a common problem.</p></li><li><p>Big data : Refers to data that is large, fast, and complex. Associated with the 3 Vs (volume, velocity, variety).</p></li><li><p>Cloud computing : On-demand availability of computing resources and services over the internet.</p></li></ol></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:00:21 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608542956</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608543329</link>
         <description><![CDATA[<p><strong>IS01082945</strong></p><p><br/></p><ol><li><p><strong>Traditional data infrastructure</strong> uses in-house servers, storage systems, and network devices to collect, store, process, and manage data. Meanwhile, <strong>cloud data infrastructure</strong> uses cloud storage provided by a third party, which is more cost-effective.</p><p><br/></p></li><li><p><strong>Data Generation: </strong>BigQuery</p><p><strong>Data Ingestion: </strong>Cloud Pub/Sub</p><p><strong>Data Storage: </strong>Cloud Storage</p><p><strong>Data Transformation: </strong>Dataflow</p><p><strong>Data Serving: </strong>Cloud SQL</p><p><br/></p></li><li><p><strong>Data Warehouse: </strong>A centralized system specifically designed for storing and querying large volumes of structured data, typically used for business intelligence.</p><p><strong>Distributed Computing: </strong>Handling compute tasks via a network of computers or servers, rather than relying on a single computer and processor.</p><p><strong>Big Data: </strong>Datasets that are too large, fast, or diverse for traditional data processing tools to handle effectively. It is often defined by the 5 Vs.</p><p><strong>Cloud Computing: </strong>Provides on-demand access to IT services, such as servers, storage, and applications, over the internet, on a pay-per-use basis.</p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:00:39 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608543329</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608543446</link>
         <description><![CDATA[<p><strong>IS01082956</strong></p><p><br/></p><p><strong>Traditional data infrastructure:</strong></p><p>Computing relies on local servers, hardware, and maintenance. It uses a physical data center to store data and assets, which makes it very costly and limits accessibility.</p><p><br/></p><p><strong>Cloud data infrastructure:</strong></p><p>Services are delivered over the internet from a shared pool of configurable systems. This provides more scalable and flexible operation and saves on IT maintenance costs.</p><p><br/></p><p><strong>GCP service for each stage of the Data Engineering Lifecycle:</strong></p><ul><li><p>Data Generation - Raw data (created/extracted)</p></li><li><p>Data Ingestion - Cloud Pub/Sub, Cloud Dataflow</p></li><li><p>Data Storage - BigQuery, Cloud SQL</p></li><li><p>Data Transformation - Dataproc, Cloud Dataflow</p></li><li><p>Data Serving - BigQuery</p></li></ul><p><br/></p><p><strong>Key Terms</strong></p><ul><li><p>Data warehouse: Aggregates data from various sources into a central data store optimized for querying and analysis, enabling an organization to centralize data from disparate systems.</p></li><li><p>Distributed computing: A system that consists of multiple software components running on multiple computers but operating as a single system.</p></li><li><p>Big data: Data volumes exceeding traditional database limits (&gt;1-2TB). It is massive, complex data that comes from diverse sources and that traditional data management systems cannot handle.</p></li><li><p>Cloud Computing: On-demand access to remote servers.</p></li></ul>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:00:41 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608543446</guid>
      </item>
      <item>
         <title></title>
         <author>is01082943</author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608543995</link>
         <description><![CDATA[<p><strong>1. Difference between traditional data infrastructure and cloud data infrastructure</strong></p><ul><li><p><strong>Traditional Data Infrastructure</strong>:</p><ul><li><p>Runs on on-premise servers and hardware.</p></li><li><p>Requires large upfront investment (CAPEX) in hardware, storage, and maintenance.</p></li><li><p>Scaling is limited and slow since it needs physical resources.</p></li></ul></li><li><p><strong>Cloud Data Infrastructure</strong>:</p><ul><li><p>Operates on a pay-as-you-go model (OPEX), reducing upfront costs.</p></li><li><p>Highly scalable and flexible; resources can be provisioned instantly.</p></li><li><p>Cloud provider handles maintenance, updates, and much of the security.</p></li></ul></li></ul><p><br/></p><p>2. Five Stages of the Data Engineering Lifecycle with Google Cloud Platform (GCP) Services</p><p><br/></p><ul><li><p><strong>Data Generation</strong> – This stage involves the creation of data from various sources such as applications, devices, sensors, or user activities.</p><ul><li><p><em>GCP Service</em>: <strong>Cloud Pub/Sub</strong>, which provides real-time event streaming and reliable message delivery from diverse data sources.</p></li></ul></li><li><p><strong>Data Ingestion</strong> – This stage focuses on collecting and transporting data from its sources into the data pipeline.</p><ul><li><p><em>GCP Service</em>: <strong>Cloud Dataflow</strong>, which supports both batch and stream processing for Extract, Transform, Load (ETL) tasks. 
</p></li></ul></li><li><p><strong>Data Storage</strong> – At this stage, data is securely stored in suitable repositories for both raw and processed forms.</p><ul><li><p><em>GCP Service</em>: <strong>Cloud Storage</strong> is used for object and unstructured data, while <strong>BigQuery</strong> serves as a highly scalable and fully managed data warehouse for structured analytical data.</p></li></ul></li><li><p><strong>Data Transformation</strong> – This stage involves cleansing, aggregating and enriching data to make it meaningful and usable for analytics.</p><ul><li><p><em>GCP Service</em>: <strong>Cloud Dataflow</strong> provides transformation capabilities through stream and batch processing.</p></li></ul></li><li><p><strong>Data Serving</strong> – The final stage ensures that the processed data is accessible for analysis, visualization, and decision-making.</p><ul><li><p><em>GCP Service</em>: <strong>BigQuery</strong> enables fast, SQL-based analytics on large datasets.</p></li></ul></li></ul><p>3. Key terms in data engineering evolution</p><ul><li><p><strong>Data Warehouse</strong>: A centralized repository optimized for storing structured data, used for analytics and reporting.</p></li><li><p><strong>Distributed Computing</strong>: A computing model where tasks are divided across multiple machines to improve processing speed and scalability.</p></li><li><p><strong>Big Data</strong>: Extremely large and complex datasets (structured, semi-structured, unstructured) that require specialized tools and architectures to process and analyze.</p></li><li><p><strong>Cloud Computing</strong>: Delivery of computing services (storage, processing, networking, analytics) over the internet, enabling on-demand scalability and cost efficiency.</p></li></ul><p><br/></p><p>Aidil (IS01082943) :)</p>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:01:05 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608543995</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608544034</link>
         <description><![CDATA[<p>IS01082776</p><p><br/></p><ol><li><p>Traditional data infrastructure is hosted in-house and managed by IT staff, so security is maintained under strict access control by the company itself.</p><p>Cloud data infrastructure is managed by a third party or service provider and may be exposed to security breaches.</p></li><li><p>Data Generation : BigQuery</p><p>Data Ingestion : Pub/Sub</p><p>Data Storage : Cloud SQL</p><p>Data Transformation : Dataflow</p><p>Data Serving : Cloud Spanner</p></li><li><p>Data Warehouse : A centralized repository that stores structured, historical, and current data from multiple sources.</p><p><br/></p><p>Distributed Computing : A computing model where processing tasks are divided across multiple machines that work together to solve large-scale problems.</p><p><br/></p><p>Big Data : Large and diverse datasets that are huge in volume and grow rapidly over time.</p><p><br/></p><p>Cloud Computing : The delivery of computing services, including servers, storage, databases, networking, and software, over the internet.</p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:01:07 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608544034</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608544399</link>
         <description><![CDATA[<p><strong>Question 1</strong></p><p>Traditional Data Infrastructure</p><p>Setup: Physical servers in the building</p><p>Scaling: Hard to scale and takes time</p><p>Cost: Big upfront investment</p><p>Access: Only from specific locations</p><p>Speed of Innovation: Slower and tied to hardware</p><p><br/></p><p>Cloud Data Infrastructure</p><p>Setup: Virtual servers hosted by cloud providers</p><p>Scaling: Easy to scale up/down automatically</p><p>Cost: Pay only for what is used, similar to utility bills</p><p>Access: Accessible from anywhere with internet</p><p>Speed of Innovation: Faster, with access to new tools and technology</p><p><br/></p><p><strong>Question 2</strong></p><p>Data Generation: Cloud Pub/Sub (handles real-time messages)</p><p>Data Ingestion: Dataflow (for streaming and batch processing)</p><p>Data Storage: Cloud Storage (for raw files), BigQuery (for analysis purposes)</p><p>Data Transformation: Dataprep (for prepping data)</p><p>Data Serving: Looker (dashboards)</p><p><br/></p><p><strong>Question 3</strong></p><p>Data Warehouse</p><p>A big, central place to store clean, structured data so teams can run reports and analysis. Example: BigQuery</p><p><br/></p><p>Distributed Computing</p><p>Breaks big tasks into smaller ones and runs them on many computers at once. Example: Apache Spark</p><p><br/></p><p>Big Data</p><p>Data that's too large or messy for regular tools to handle. You need special tools to process and understand it.</p><p>Example: large (MB/GB-scale) videos, pictures, and files</p><p><br/></p><p>Cloud Computing</p><p>Using the internet to access computing power, storage, and services – instead of buying and maintaining your own hardware. Example: GCP</p><p><br/></p><p>Surthikkaa Laavanya (IS01083061)</p>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:01:21 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608544399</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608544589</link>
         <description><![CDATA[<p>Amirul Irfaan Bin Mohd.Hishamuddin (IS01083864)</p><ol><li><p>Traditional data infrastructure is owned by organizations, which manage physical servers and data warehouses. Expanding it requires the organization to buy expensive hardware and maintain it through IT teams. Meanwhile, cloud data infrastructure is service-based, where organizations subscribe to providers like AWS or Google Cloud. It offers more flexibility since the cloud vendors handle infrastructure, security, and redundancy.</p></li><li><p><strong>Data Generation</strong>: Pub/Sub</p><p><strong>Data Ingestion</strong>: Dataflow</p><p><strong>Data Storage</strong>: Cloud Storage</p><p><strong>Data Transformation</strong>: Dataproc</p><p><strong>Data Serving</strong>: BigQuery</p></li><li><p><strong>Data Warehouse</strong>: A centralized system for storing structured data from multiple sources, optimized for reporting and analysis.</p><p><strong>Distributed Computing</strong>: A model where processing is spread across multiple servers/machines to handle very large-scale data.</p><p><strong>Big Data</strong>: Extremely large and complex datasets that exceed traditional database limits, often characterized by the <em>3 Vs</em>.</p><p><strong>Cloud Computing</strong>: Delivery of computing resources over the internet, which enables scalability and flexibility.</p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:01:27 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608544589</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608547207</link>
         <description><![CDATA[<p>TUN DANIAL ADLI BIN TUN ALI</p><p>(IS01083502)</p><ol><li><p>Traditional data infrastructure requires the organization to manage everything on its own: hardware, scaling, maintenance, backup, and disaster recovery. Cloud data infrastructure is hosted by service providers such as Google, AWS, and Alibaba. It requires no physical hardware on site because the provider supplies everything; the organization can choose resources based on its needs and scale them up or down to meet its requirements. It also gives the customer analytical dashboards and data.</p></li><li><p>Data Generation = Google Cloud Pub/Sub</p><p>Data Ingestion = Cloud Dataflow</p><p>Data Storage = Cloud Bigtable</p><p>Data Transformation = Dataproc</p><p>Data Serving = Looker</p></li><li><p>Data Warehouse = A centralized system to store structured data for analytics and reporting.</p><p>Distributed Computing = Splitting large data tasks across multiple machines working in parallel to ease the workload when processing data.</p><p>Big Data = A huge and complex dataset that can't be stored by traditional databases. It can be characterized by the 3 Vs.</p><p>Cloud Computing = On-demand access to computing, storage, and services over the internet.</p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:03:12 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608547207</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608547260</link>
         <description><![CDATA[<p>IS01082871</p><ol><li><p>Traditional Data Infrastructure vs Cloud Data Infrastructure : </p><p><br/></p><p>Traditional Data Infrastructure</p><p>i. On-premise (physical servers/data centers)</p><p>ii. Limited and costly to scale</p><p>iii. High upfront costs (hardware, maintenance)</p><p><br/></p><p>Cloud Data Infrastructure </p><p>i. Hosted on cloud platforms (AWS)</p><p>ii. Easily scalable on-demand</p><p>iii. Pay-as-you-go model</p><p><br/></p></li><li><p>Five Stages </p><p>i. <strong>Data Generation</strong></p><ul><li><p><em>Description:</em> Creation of raw data from various sources. </p></li><li><p>Example : Firebase</p></li></ul><p>ii. <strong>Data Ingestion</strong></p><ul><li><p><em>Description:</em> Bringing data into the system.</p></li><li><p>Example : Cloud Pub/Sub</p></li></ul><p>iii. <strong>Data Storage</strong></p><ul><li><p><em>Description:</em> Storing raw and processed data.</p></li><li><p>Example : Cloud Storage</p></li></ul><p>iv. <strong>Data Transformation</strong></p><ul><li><p><em>Description:</em> Cleaning, enriching, and transforming data.</p></li><li><p>Example : Cloud Dataflow</p></li></ul><p>v. <strong>Data Serving</strong></p><ul><li><p><em>Description:</em> Making data available for analytics, BI, or applications.</p></li><li><p>Example : Looker</p><p><br/></p></li></ul></li><li><p><strong>Data Warehouse:</strong><br>A centralized repository that stores structured data from multiple sources, optimized for fast querying and analytics. Example: BigQuery.</p><p><strong>Distributed Computing:</strong><br>A model where computation is spread across multiple machines to handle large workloads efficiently. Frameworks like Apache Hadoop or GCP's Dataflow utilize this.</p><p><strong>Big Data:</strong><br>Extremely large datasets that traditional data processing tools can’t handle efficiently. 
Characteristics are often summarized as the 4 Vs: Volume, Velocity, Variety, and Veracity.</p><p><strong>Cloud Computing:</strong><br>Delivery of computing services (servers, storage, databases, networking, software) over the internet. Enables scalability, flexibility, and cost-efficiency.</p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:03:14 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608547260</guid>
      </item>
      <item>
         <title></title>
         <author>roshankumarasan15</author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608547743</link>
         <description><![CDATA[<p>IS01083065</p><p><br/></p><p>1. Traditional data infrastructure relies on physical servers and storage owned and maintained by a company, requiring high upfront costs, manual scaling, and ongoing maintenance. Cloud data infrastructure is hosted by providers, offering on-demand resources, easy scalability, lower upfront costs, and pay-as-you-go flexibility without the need to manage hardware.</p><p><br/></p><p>2. <strong>Data Generation</strong> – The stage where raw data is created from applications, devices, or users.</p><ul><li><p><strong>Data Ingestion</strong> – The process of collecting and moving data into the system.</p></li><li><p><strong>Data Storage</strong> – Storing data in a secure and scalable system for later use.</p></li><li><p><strong>Data Transformation</strong> – Cleaning, structuring, and processing data to make it useful.</p></li><li><p><strong>Data Serving</strong> – Making processed data available for analysis, reporting, or applications.</p></li></ul><p><br/></p><p>3. <strong>Data Warehouse</strong> – A centralized system that stores structured data from different sources for reporting and analysis.</p><ul><li><p><strong>Distributed Computing</strong> – A method of processing data using multiple computers working together as a single system to handle large tasks efficiently.</p></li><li><p><strong>Big Data</strong> – Extremely large and complex datasets that cannot be managed or processed by traditional databases, often characterized by high volume, velocity, and variety.</p></li><li><p><strong>Cloud Computing</strong> – The delivery of computing resources over the internet on a pay-as-you-go basis.</p></li></ul>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:03:25 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608547743</guid>
      </item>
      <item>
         <title></title>
         <author>arfanoornasna</author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608547802</link>
         <description><![CDATA[<p>Nur Arfa Sakinah Binti Noor Ahmad (IS01083890)</p><p><br/></p><ol><li><p><strong>Difference between traditional data infrastructure and cloud data infrastructure:</strong></p></li></ol><ul><li><p>Traditional data infrastructure is managed on-site, where organizations must buy, install, and maintain their own servers, storage, and networking equipment. This requires higher costs and more effort in maintenance.</p></li><li><p>Cloud data infrastructure, however, is managed by providers such as Google Cloud. Organizations only need to rent storage and computing resources through the internet. It is more flexible, scalable, and cost-effective since hardware ownership and maintenance are handled by the provider.</p></li></ul><p><br/></p><ol start="2"><li><p><strong>Five stages of the data engineering lifecycle with GCP services:</strong></p></li></ol><ul><li><p>Data Generation: Google Cloud Pub/Sub</p></li><li><p>Data Ingestion: Cloud Dataflow</p></li><li><p>Data Storage: BigQuery</p></li><li><p>Data Transformation: Dataprep or Dataflow</p></li><li><p>Data Serving: BigQuery</p></li></ul><p><br/></p><ol start="3"><li><p><strong>Key terms in data engineering evolution:</strong></p><ul><li><p>Data warehouse: A centralized storage system designed for reporting and analysis of structured data.</p></li><li><p>Distributed computing: A method where large tasks are divided across multiple computers that work simultaneously, speeding up processing.</p></li><li><p>Big data: Very large and complex datasets that cannot be processed efficiently with traditional data tools.</p></li><li><p>Cloud computing: The delivery of computing services such as storage, servers, and databases over the internet instead of relying on physical hardware.</p></li></ul></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:03:27 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608547802</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608548089</link>
         <description><![CDATA[<p><strong>IS01082817</strong></p><p><br/></p><ol><li><p>a) One of the differences is how the data is <strong>stored</strong>. Traditional data infrastructure stores data in on-premise relational databases, which have limited support for semi-structured/unstructured data. Cloud data infrastructure uses cloud-native storage like Amazon S3, where storage is virtually unlimited and scales automatically.</p><p><br/></p><p>b) Next is how the data is <strong>processed</strong>. In traditional data infrastructure, ETL jobs run on local servers with tools like Informatica, SSIS, or custom scripts, with limited parallelism and scalability. Cloud data infrastructure uses distributed processing frameworks like Apache Spark, and pipelines are often orchestrated with cloud-native tools.</p><p><br/></p><p>c) Lastly, they differ in their data <strong>warehouses</strong>. Traditional data infrastructure uses on-premise data warehouses, where scaling requires buying more hardware. Cloud data infrastructure uses cloud-native warehouses like Snowflake, BigQuery, and Redshift, which scale compute &amp; storage elastically.</p></li></ol><p><br/></p><ol start="2"><li><p>5 stages of the data engineering lifecycle:</p><p><br/></p><p>a) <strong>Data generation.</strong> This is where the data is created. A Google service that can be used is Google Analytics.</p><p><br/></p><p>b) <strong>Data ingestion.</strong> This involves moving data from its source into the cloud. A GCP service that can be used is Cloud Dataflow.</p><p><br/></p><p>c) <strong>Data storage.</strong> Here, data is stored for further processing or querying. A GCP service that can be used is Google Cloud Storage.</p><p><br/></p><p>d) <strong>Data transformation.</strong> Here, raw data is processed into clean, usable formats. A GCP service that can be used is Cloud Dataflow.</p><p><br/></p><p>e) <strong>Data serving</strong>. 
Here, transformed data is made available to end users, dashboards, ML models, or applications. A GCP service that can be used is BigQuery.</p></li></ol><p><br/></p><ol start="3"><li><p>Key terms definition:</p></li></ol><p><br/></p><p>a) <strong>Data warehouse</strong>: Centralized system used for storing structured data from multiple sources, optimized for querying and reporting.</p><p><br/></p><p>b) <strong>Distributed Computing</strong>: Computing model where tasks are divided and run across multiple machines to improve performance, fault tolerance, and scalability.</p><p><br/></p><p>c) <strong>Big Data</strong>: Datasets that are too large or complex for traditional systems to store, process, or analyze effectively.</p><p><br/></p><p>d) <strong>Cloud computing</strong>: Delivery of computing services over the internet, on a pay-as-you-go basis.</p>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:03:37 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608548089</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608548266</link>
         <description><![CDATA[<p>IS01082893</p><p><br/></p><p>1. Scaling traditional data infrastructure requires new hardware and possibly more physical space, whereas cloud data infrastructure offers flexible options. Traditional IT systems offer a level of control that some companies prefer. In cloud computing, security measures include encryption, multi-factor authentication, and compliance with various industry standards.</p><p><br/></p><p>2. i. data generation: Cloud Pub/Sub: Provides low-latency, real-time messaging for event-driven architectures, enabling efficient data generation from diverse sources like IoT devices or application logs.</p><p><br/></p><p>ii. data storage: Cloud Storage: Offers various storage classes (Standard, Nearline, Coldline, Archive) to optimize cost and performance based on data access patterns and retention needs.</p><p><br/></p><p>iii. data ingestion: Cloud Dataflow: A fully managed service for executing Apache Beam pipelines, enabling high-throughput, low-latency data ingestion and processing for both batch and stream data.</p><p><br/></p><p>iv. data transformation: Cloud Dataflow: Processes and transforms data at scale, performing complex ETL (Extract, Transform, Load) operations efficiently.</p><p><br/></p><p>v. data serving: BigQuery: Serves as a high-performance data source for analytics, reporting, and machine learning models, enabling fast query responses for end-users and applications.</p><p><br/></p><p>3. i. data warehouse: Aggregates data from various sources into a central data store optimized for querying and analysis.</p><p><br/></p><p>ii. distributed computing: Distributed computing is the method of making multiple computers work together to solve a common problem. It makes a computer network appear as a powerful single computer that provides large-scale resources to deal with complex challenges.</p><p><br/></p><p>iii. 
big data: Big data refers to extremely large and complex data sets that cannot be easily managed or analyzed with traditional data processing tools, particularly spreadsheets.</p><p><br/></p><p>iv. cloud computing: The practice of using a network of remote servers hosted on the internet to store, manage, and process data, rather than a local server or a personal computer.</p><p><br/></p>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:03:46 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608548266</guid>
      </item>
      <item>
         <title></title>
         <author>is01083876</author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608548723</link>
         <description><![CDATA[<p>Kaminy Varma [IS01083876]</p><p><br/></p><p><strong>1. Difference between</strong></p><p><strong>Traditional Data Infrastructure: </strong></p><ul><li><p>Data is stored on physical hardware that's owned and managed by the organisation</p></li><li><p>Requires high upfront costs &amp; maintenance</p></li><li><p>Provides maximum control</p></li></ul><p><strong>Cloud Data Infrastructure:</strong></p><ul><li><p>Data is stored on remote servers provided by third parties</p></li><li><p>Accessible over the internet</p></li><li><p>Cost-effective &amp; reduced operational overhead</p><p><br/></p></li></ul><p><strong>2. Five stages of Data Engineering Lifecycle &amp; GCP service:</strong></p><ol><li><p><strong>Data Generation:</strong> Cloud Firestore, Cloud SQL (GCP doesn't directly generate data)</p></li><li><p><strong>Data Ingestion</strong>: Cloud Pub/Sub, Cloud Dataflow</p></li><li><p><strong>Data Storage: </strong>Cloud Storage, BigQuery, Bigtable</p></li><li><p><strong>Data Transformation:</strong> BigQuery, Dataproc</p></li><li><p><strong>Data Serving:</strong> BigQuery, Looker</p><p><br/></p></li></ol><p>3. Define:</p><ul><li><p><strong>Data Warehouse:</strong> A central, structured repository that stores &amp; organizes vast amounts of historical data from various sources</p></li><li><p><strong>Distributed Computing:</strong> Breaks down complex tasks, which are executed across multiple independent computers that communicate &amp; coordinate with each other over a network. 
(work together as single system to achieve a common goal)</p></li><li><p><strong>Big Data: </strong>enormous, complex data sets, characterized by their high Volume, Velocity, and Variety, that traditional software cannot manage.</p></li><li><p><strong>Cloud Computing:</strong> on-demand, pay-as-you-go use of computing resources like servers, storage &amp; databases over the internet, allowing users to access these services without buying or managing their own physical hardware.</p></li></ul><p><br/></p>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:03:55 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608548723</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608549240</link>
         <description><![CDATA[<p>Muhammad Iskandar Bin Mohd Yusri (IS01083541)</p><p><br/></p><p>1. Traditional data infrastructure</p><ul><li><p>Data is generated from on-premises apps such as ERP systems, IoT devices and servers.</p></li><li><p>Storage is limited by hardware and expensive to scale.</p></li></ul><p>Cloud data infrastructure</p><ul><li><p>Data is generated from cloud-native apps, SaaS, IoT and global digital services.</p></li><li><p>Elastic capacity and much cheaper.</p></li></ul><p><br/></p><p>2. One GCP service for each of the 5 stages of the Data Engineering Lifecycle</p><ol><li><p>Data Generation - Google Analytics</p></li><li><p>Data Ingestion - Transfer Service (batch load)</p></li><li><p>Data Storage - Cloud Storage</p></li><li><p>Data Transformation - Dataflow (ETL/ELT pipelines)</p></li><li><p>Data Serving - BigQuery (SQL queries)</p></li></ol><p><br/></p><p>3. Key terms in data engineering</p><ol><li><p>Data warehouse - Storage place where a company keeps all the important and organized data.</p></li><li><p>Distributed Computing - Divides the work so that each computer handles a certain part, making the process much faster.</p></li><li><p>Big Data - Data that is too large and complex for a normal computer to handle and requires specific tools to process it.</p></li><li><p>Cloud Computing - Using the internet to access computing power, storage and software.</p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:04:17 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608549240</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608549250</link>
         <description><![CDATA[<ol><li><p>Traditional data infrastructure relies on on-site physical hardware like servers and storage owned and managed by the organization, offering high control but requiring significant upfront investment and ongoing maintenance. In contrast, cloud data infrastructure uses virtualization and virtualized services delivered over the internet by third-party providers, offering greater scalability, flexibility, and a pay-as-you-go model with lower upfront costs.</p><p><br></p></li><li><p>i. Data Generation: Cloud Pub/Sub, for collecting and ingesting data streams from user interactions, devices, and logs.</p><p>ii. Data Ingestion: Cloud Dataflow, to build and manage ETL/ELT pipelines for batch and streaming data processing.</p><p>iii. Data Storage: BigQuery, a fully-managed data warehouse for storing and querying large structured datasets.</p><p>iv. Data Transformation: Cloud Dataflow, for processing and transforming raw data into clean, enriched, and analytical formats.</p><p>v. Data Serving: Looker (or Data Studio), for visualizing processed data and delivering insights through dashboards and reports.</p><p><br></p></li><li><p>i. Data warehouse: A centralized storage system that integrates data from multiple sources.</p><p>ii. Distributed Computing: Using multiple computers working together to process tasks faster and handle larger workloads. </p><p>iii. Big Data: Very large and complex datasets that traditional systems cannot process efficiently.</p><p>iv. Cloud Computing: Delivery of computing resources (servers, storage, databases, etc.) over the internet on a pay-as-you-go basis.</p><p><br></p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:04:18 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608549250</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608549329</link>
         <description><![CDATA[<p>Zalin Zalifah (IS01083447)</p><p><br/></p><ol><li><p>Traditional data infrastructure relies on on-site physical servers managed at the location of the business itself, which requires expertise to maintain and manage. On the other hand, cloud data infrastructure uses already existing hardware and resources that are managed by third-party cloud providers offering IaaS, PaaS, and SaaS. </p></li><li><p>Data Generation: Dataproc, BigQuery</p><p>Data Ingestion: Dataflow</p><p>Data Storage: Bigtable, Cloud SQL</p><p>Data Transformation: Dataflow</p><p>Data Serving: Looker Studio</p></li><li><p>a) Data warehouse: A data warehouse is a centralised data repository that contains structured data collected from different data sources.</p><p>b) Distributed computing: Distributed computing refers to a model where IT tasks are divided and done by multiple machines that work together as a single system.</p><p>c) Big data: Big data refers to huge and complex datasets that exceed traditional database limits.</p><p>d) Cloud computing: Cloud computing refers to computing resources being offered as a service over the internet. </p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:04:21 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608549329</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608549522</link>
         <description><![CDATA[<p>IS01082919</p><p>Question 1:</p><p>A. Traditional Data Infrastructure</p><p><br/></p><p>-<strong>On-premises setup</strong>: Servers and equipment are kept in your own building or office.</p><p>-<strong>Capital-intensive</strong>: Investment is required to purchase hardware, licenses, networking equipment, and data center space. Additional ongoing costs include power, cooling, and dedicated IT staff.</p><p>-<strong>Limited scalability</strong>: Scaling up means purchasing and installing more physical servers.</p><p>-<strong>Manual management</strong>: IT teams are responsible for performing updates, applying security patches, managing backups, and monitoring performance, often using manual tools.</p><p>-<strong>Rigid architecture</strong>: Integration across public, private, and hybrid environments is limited. Data and applications are harder to move between systems, which makes modern development workflows like DevOps more difficult to support.</p><p>-<strong>Disaster recovery challenges</strong>: Backup and recovery solutions are often complex and costly.</p><p><br/></p><p>B. Cloud Data Infrastructure</p><p>-<strong>Hosted remotely</strong>: Data and services are stored and managed in cloud providers’ data centers.</p><p>-<strong>Pay-as-you-go model</strong>: Organizations pay only for the resources they use, reducing capital expenditure.</p><p>-<strong>Highly scalable</strong>: Resources can be scaled up or down instantly based on demand.</p><p>-<strong>Automated management</strong>: Cloud platforms offer automated updates, backups, and security features.</p><p>-<strong>Flexible and integrated</strong>: Supports hybrid and multi-cloud environments, enabling seamless data flow.</p><p>-<strong>Robust disaster recovery</strong>: Built-in tools help you recover fast if something goes wrong.</p><p><br/></p><p>Question 2:</p><p>A. 
Data Generation: </p><ul><li><p><strong>What happens?</strong><br>Data is created by apps, websites, devices (like sensors), or users.</p></li><li><p><strong>GCP Tool:</strong> <strong>Cloud Pub/Sub</strong><br>Think of it like a message collector: it grabs data as soon as it happens (real-time).</p><p><br/></p></li></ul><p>B. Data Storage:</p><ul><li><p><strong>What happens?</strong><br>Data is stored in a way that makes it easy to access later.</p></li><li><p><strong>GCP Tools:</strong></p><ul><li><p><strong>BigQuery</strong> – For structured data (like tables, numbers, etc.)</p></li><li><p><strong>Cloud Storage</strong> – For files, images, logs, and other unstructured data.</p></li></ul></li></ul><p>C. Data Transformation:</p><ul><li><p><strong>What happens?</strong><br>Raw data is cleaned, fixed, and reshaped so it’s ready to analyze.</p></li><li><p><strong>GCP Tools:</strong></p><ul><li><p><strong>Dataprep by Trifacta</strong> – A no-code, drag-and-drop tool for cleaning data</p></li><li><p><strong>Dataflow</strong> – Can also be used to transform data with coding (Apache Beam)</p></li></ul></li></ul><p>D.<strong> </strong>Data Serving:</p><ul><li><p><strong>What happens?</strong><br>Final, clean data is used in dashboards, reports, or apps.</p></li><li><p><strong>GCP Tools:</strong></p><ul><li><p><strong>BigQuery</strong> – Run fast SQL queries on big data</p></li><li><p><strong>Looker</strong> – Build dashboards and visualize insights</p></li></ul></li></ul><p><br/></p><p>Question 3:</p><p>A. <strong>Data Warehouse</strong></p><ul><li><p>A big, organized place to store data from different sources.</p></li><li><p><strong>Example: Google BigQuery</strong> – Serverless data warehouse used to analyze large datasets using SQL.</p></li></ul><p>B. 
<strong>Distributed Computing</strong></p><ul><li><p>Big jobs are split up and shared across many computers.</p></li><li><p><strong>Example: Apache Hadoop</strong> – Breaks big data into smaller chunks and processes them across clusters.</p></li></ul><p>C. <strong>Big Data</strong></p><ul><li><p>Really <strong>large and complex</strong> sets of data.</p></li><li><p>Comes in all shapes and forms — like text, videos, logs, etc.</p></li><li><p>Known for the <strong>3 Vs</strong>:</p><ul><li><p><strong>Volume</strong> – A lot of data</p></li><li><p><strong>Velocity</strong> – Data comes in fast</p></li><li><p><strong>Variety</strong> – Different types of data</p></li></ul></li><li><p>Needs special tools like <strong>Hadoop</strong>, <strong>Spark</strong>, or <strong>NoSQL</strong> databases.</p></li></ul><p>D. <strong>Cloud Computing</strong></p><ul><li><p>You <strong>rent computing power and storage</strong> instead of buying it.</p></li><li><p>Run apps, store files, and process data <strong>online</strong>, without owning servers.</p></li><li><p><strong>Example: Google Cloud Platform (GCP)</strong> – Used for data storage, machine learning, and hosting apps.</p></li></ul><p><br/></p><p><br/></p>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:04:30 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608549522</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608553239</link>
         <description><![CDATA[<p>IS01083877</p><p><br/></p><p>Traditional data infrastructure:</p><p>-Data is stored on physical servers.</p><p>-Costly and time-consuming</p><p>-Less flexible </p><p><br/></p><p>Cloud data infrastructure:</p><p>-Data is stored on remote servers accessed via the internet</p><p>-Data is accessible from anywhere with an internet connection.</p><p><br/></p><p>1) Firebase, IoT Core, Cloud Logging</p><p>2) Dataflow</p><p>3) BigQuery, Cloud Storage</p><p>4) Dataflow</p><p>5) Cloud SQL</p><p><br/></p><p><br/></p><p>Data Warehouse: stores structured data from multiple sources for reporting and analysis.</p><p><br/></p><p>Distributed Computing: processing tasks are divided across multiple machines that work together as a single system.</p><p><br/></p><p>Big Data: very large data that cannot be easily managed, analyzed and processed using traditional methods.</p><p><br/></p><p>Cloud Computing: delivery of servers, storage, and databases over the internet.</p><p><br/></p>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:06:17 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608553239</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608555075</link>
         <description><![CDATA[<p>IS01084550</p><p><br/></p><p>1) Question 1</p><p><strong>Traditional data infrastructure:</strong></p><ul><li><p>Systems run on physical servers and storage that a company owns and maintains. </p></li><li><p>Scaling them up means buying and installing more hardware, which takes time and money. </p></li><li><p>The IT team also has to handle upgrades, maintenance, and security.</p></li></ul><p><strong>Cloud data infrastructure:</strong></p><ul><li><p>Systems run on platforms like Google Cloud, AWS, or Azure. </p></li><li><p>It’s flexible, meaning resources can be scaled up or down instantly and users only pay for what they use. </p></li><li><p>The cloud provider takes care of most of the maintenance, updates, and security, so businesses can focus on using the data instead of managing hardware.</p></li></ul><p><br/></p><p>2) Question 2</p><p>-<strong>Data Generation:</strong> Cloud Pub/Sub</p><p>-<strong>Data Ingestion: </strong>Dataflow</p><p>-<strong>Data Storage: </strong>BigQuery</p><p>-<strong>Data Transformation:</strong> Dataflow</p><p>-<strong>Data Serving:</strong> BigQuery, Looker Studio</p><p><br/></p><p>3) Question 3</p><p>-<strong>Data Warehouse:</strong> A single place where data from different sources is collected and structured to make it easier to run reports and analysis.</p><p>-<strong>Distributed Computing:</strong> Breaking big tasks into smaller pieces and running them across many machines at once, which makes handling large datasets faster and more efficient.</p><p>-<strong>Big Data: </strong>Large and complex datasets that go beyond what traditional databases can handle. It is often described using the three Vs: Volume (size), Variety (different types), and Velocity (speed of data).</p><p>-<strong>Cloud Computing: </strong>Using the internet to access computing power, storage, and services on demand.</p>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:07:32 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608555075</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608555251</link>
         <description><![CDATA[<p>Batrisyaa Iszween binti Azman (IS01083867)</p><p><br/></p><ol><li><p>What is the difference between traditional data infrastructure and cloud data infrastructure?  </p><ul><li><p><strong>Traditional data infrastructure </strong></p><p>Companies manage their own servers, storage, and databases, giving them full control and direct oversight of their systems. </p></li><li><p><strong>Cloud data infrastructure </strong></p><p>Stream-based and runs online through platforms like GCP, offering flexible pay-as-you-go services, automatic scalability, and built-in reliability and security, which makes it easier to support growing data needs and modern analytics.</p></li></ul></li><li><p>Refer to the <strong>five stages</strong> of the Data Engineering Lifecycle: Data Generation, Data Ingestion, Data Storage, Data Transformation and Data Serving. For each stage, find at least one Google Cloud Platform (GCP) service that can be used.</p><ul><li><p><strong>Data Generation</strong> → Firebase, IoT Core.</p></li><li><p><strong>Data Storage</strong> → BigQuery, Cloud Storage, Spanner.</p></li><li><p><strong>Data Ingestion</strong> → Pub/Sub, Dataflow.</p></li><li><p><strong>Data Transformation</strong> → Dataflow, Dataproc, BigQuery SQL.</p></li><li><p><strong>Data Serving</strong> → Looker Studio, Vertex AI.</p></li></ul></li><li><p>Define key terms in data engineering evolution: data warehouse, distributed computing, big data, cloud computing.</p><ul><li><p><strong>Data Warehouse</strong> → Centralized system that stores structured and historical data kept separate from operational databases, mainly used for reporting and analytics.</p></li><li><p><strong>Distributed Computing</strong> → A way of processing data by splitting tasks across many machines so that big workloads can be done faster and more efficiently.</p></li><li><p><strong>Big Data</strong> → Very large, fast, and diverse datasets that traditional databases cannot handle, usually 
described by the three Vs: volume, velocity, and variety.</p></li><li><p><strong>Cloud Computing</strong> → The use of internet-based services that provide computing, storage, and processing power on demand, where you only pay for what you use and can scale easily.</p></li></ul></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:07:41 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608555251</guid>
      </item>
      <item>
         <title></title>
         <author></author>
         <link>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608555875</link>
         <description><![CDATA[<p><strong>IS01083874</strong></p><p><br/></p><ol><li><p><strong>Traditional data infrastructure</strong> relies on physical servers and storage that are maintained by an organization. Scaling requires purchasing and installing additional hardware, which often results in high upfront costs and longer deployment times.</p><p><strong>Cloud data infrastructure </strong>is built on virtualized resources provided by cloud providers such as Google Cloud. It offers on-demand scalability, where organizations can scale resources instantly without investing in physical hardware.</p></li><li><p><strong>i)</strong> <strong>Data Generation : </strong>Cloud Pub/Sub</p><p><strong>ii) Data Ingestion : </strong>Cloud Pub/Sub</p><p><strong>iii)</strong> <strong>Data Storage : </strong>BigQuery, Cloud Storage</p><p><strong>iv)</strong> <strong>Data Transformation : </strong>Cloud Dataflow</p><p><strong>v)</strong> <strong>Data Serving : </strong>Looker</p></li><li><p><strong>i)</strong> <strong>Data warehouse : </strong>Centralized system for storing and analyzing structured data from various sources.</p><p><strong>ii)</strong> <strong>Distributed computing : </strong>A method of processing data across multiple machines to handle large-scale workloads.</p><p><strong>iii)</strong> <strong>Big data :</strong> Big and complex data sets, characterized by their Volume, Variety, and Velocity.</p><p><strong>iv)</strong> <strong>Cloud computing : </strong>The delivery of computing resources, such as servers, storage, databases, analytics, and machine learning over the internet, rather than relying on local hardware.</p></li></ol>]]></description>
         <enclosure url="" />
         <pubDate>2025-09-29 05:08:09 UTC</pubDate>
         <guid>https://padlet.com/laila178/9k4ws7vtrybva32l/wish/3608555875</guid>
      </item>
   </channel>
</rss>
