Build a synthetic data pipeline using Gretel and Apache Airflow (2022)

Hey folks, my name is Drew, and I'm a software engineer here at Gretel. I've recently been thinking about patterns for integrating Gretel APIs into existing tools so that it's easy to build data pipelines where security and customer privacy are first-class features, not just an afterthought or box to check.

One data engineering tool that is popular amongst Gretel engineers and customers is Apache Airflow. It also happens to work great with Gretel. In this blog post, we'll show you how to build a synthetic data pipeline using Airflow, Gretel and PostgreSQL. Let's jump in!

What is Airflow?

Airflow is a workflow automation tool commonly used to build data pipelines. It enables data engineers or data scientists to programmatically define and deploy these pipelines using Python and other familiar constructs. At the core of Airflow is the concept of a DAG, or directed acyclic graph. An Airflow DAG provides a model and set of APIs for defining pipeline components, their dependencies and execution order.
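To make the DAG idea concrete, here is a minimal sketch that uses only Python's standard library (3.9 or later) rather than Airflow itself. It models the three tasks of the pipeline built later in this post as a dependency graph and derives a valid execution order. The `dependencies` mapping and the `execution_order` helper are illustrative assumptions; real Airflow DAGs are declared with Airflow's own APIs.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (illustrative, not Airflow's API).
dependencies = {
    "extract_features": set(),
    "generate_synthetic_features": {"extract_features"},
    "upload_synthetic_features": {"generate_synthetic_features"},
}


def execution_order(graph):
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dependencies)
```

Because the graph must be acyclic, a dependency loop is rejected up front instead of leaving tasks waiting on each other forever.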

You might find Airflow pipelines replicating data from a product database into a data warehouse. Other pipelines might execute queries that join normalized data into a single dataset suitable for analytics or modeling. Yet another pipeline might publish a daily report aggregating key business metrics. A common theme shared amongst these use cases: coordinating the movement of data across systems. This is where Airflow shines.

Leveraging Airflow and its rich ecosystem of integrations, data engineers and scientists can orchestrate any number of disparate tools or services into a single unified pipeline that is easy to maintain and operate. With these integration capabilities in mind, we can now look at how Gretel might be integrated into an Airflow pipeline to improve common data ops workflows.

How does Gretel fit in?

At Gretel, our mission is to make data easier and safer to work with. Talking to customers, one pain point we often hear about is the time and effort required to get data scientists access to sensitive data. Using Gretel Synthetics, we can reduce the risk of working with sensitive data by generating a synthetic copy of the dataset. By integrating Gretel with Airflow, it’s possible to create self-serve pipelines that make it easy for data scientists to quickly get the data they need without requiring a data engineer for every new data request.

To demonstrate these capabilities, we’ll build an ETL pipeline that extracts user activity features from a database, generates a synthetic version of the dataset, and saves the dataset to S3. With the synthetic dataset saved in S3, it can then be used by data scientists for downstream modeling or analysis without compromising customer privacy.

To kick things off, let’s first take a bird’s eye view of the pipeline. Each node in this diagram represents a pipeline step, or “task” in Airflow terms.

[Figure: diagram of the pipeline, with one node per task]


We can break the pipeline up into 3 stages, similar to what you might find in an ETL pipeline:

  • Extract - The `extract_features` task will query a database, and transform the data into a set of features that can be used by data scientists for building models.
  • Synthesize - `generate_synthetic_features` will take the extracted features as input, train a synthetic model, and then generate a synthetic set of features using Gretel APIs and cloud services.
  • Load - `upload_synthetic_features` saves the synthetic set of features to S3 where it can be ingested into any downstream model or analysis.

In the next few sections we’ll dive into each of these three steps in greater detail. If you wish to follow along with each code sample, you can head over to gretelai/gretel-airflow-pipelines and download all the code used in this blog post. The repo also contains instructions you can follow to start an Airflow instance and run the pipeline end to end.

Additionally, it may be helpful to view the Airflow pipeline in its entirety before we dissect each component; it lives under dags/ in the repository. The code snippets in the following sections are extracted from the linked user booking pipeline.

Extract Features

The first task, `extract_features`, is responsible for extracting raw data from the source database and transforming it into a set of features. This is a common feature engineering problem you might find in any machine learning or analytics pipeline.

In our example pipeline we will provision a PostgreSQL database and load it with booking data from an Airbnb Kaggle Competition.

This dataset contains two tables, `Users` and `Sessions`. `Sessions` contains a foreign key reference, `user_id`. Using this relationship, we’ll create a set of features containing various booking metrics aggregated by user. The following figure represents the SQL query used to build the features.

    WITH session_features_by_user AS (
        SELECT
            user_id,
            count(*) AS number_of_actions_taken,
            count(DISTINCT action_type) AS number_of_unique_actions,
            round(avg(secs_elapsed)) AS avg_session_time_seconds,
            round(max(secs_elapsed)) AS max_session_time_seconds,
            round(min(secs_elapsed)) AS min_session_time_seconds,
            (
                SELECT count(*)
                FROM sessions s
                -- Qualify the outer column so the subquery is correlated per user.
                WHERE s.user_id = sessions.user_id
                    AND s.action_type = 'booking_request'
            ) AS total_bookings
        FROM sessions
        GROUP BY user_id
    )
    SELECT
        -- Assumption: the key column of users is id (the identifier is missing in the source).
        u.id AS user_id,
        u.gender,
        u.age,
        u.language,
        u.signup_method,
        u.date_account_created,
        s.number_of_actions_taken,
        s.number_of_unique_actions,
        s.avg_session_time_seconds,
        s.min_session_time_seconds,
        s.max_session_time_seconds
    FROM session_features_by_user s
    LEFT JOIN users u ON u.id = s.user_id
    LIMIT 5000
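As a rough illustration of what this rollup produces, the sketch below runs a simplified version of the aggregation against an in-memory SQLite database instead of PostgreSQL. The session rows are made-up toy data, only a few of the feature columns are computed, and the `build_features` helper is hypothetical. The booking count is expressed with a `CASE` expression here, which is a substitution for the subquery used above.

```python
import sqlite3

ROLLUP_SQL = """
SELECT
    user_id,
    count(*) AS number_of_actions_taken,
    count(DISTINCT action_type) AS number_of_unique_actions,
    round(avg(secs_elapsed)) AS avg_session_time_seconds,
    sum(CASE WHEN action_type = 'booking_request' THEN 1 ELSE 0 END) AS total_bookings
FROM sessions
GROUP BY user_id
ORDER BY user_id
"""


def build_features(rows):
    """Load toy session rows into SQLite and return one feature row per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (user_id TEXT, action_type TEXT, secs_elapsed REAL)"
    )
    conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)
    features = conn.execute(ROLLUP_SQL).fetchall()
    conn.close()
    return features


# Toy data: user u1 has three actions (one booking request), user u2 has one.
toy_sessions = [
    ("u1", "view", 10.0),
    ("u1", "booking_request", 30.0),
    ("u1", "view", 20.0),
    ("u2", "click", 5.0),
]
features = build_features(toy_sessions)
```

Each returned tuple corresponds to one user, which is the shape the downstream synthetic model is trained on.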

The SQL query is then executed from our Airflow pipeline and written to an intermediate S3 location using the following task definition.

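    from pathlib import Path
    from tempfile import NamedTemporaryFile

    from airflow.decorators import task
    from airflow.operators.python import get_current_context

    # `postgres` and `s3` are hooks configured elsewhere in the DAG file.

    @task()
    def extract_features(sql_file: str) -> str:
        context = get_current_context()
        sql_query = Path(sql_file).read_text()
        # Scope the S3 key to the current DAG run.
        key = f"{context['dag_run'].run_id}_booking_features.csv"
        with NamedTemporaryFile(mode="r+", suffix=".csv") as tmp_csv:
            # Copy the query results into the temporary CSV file.
            # Assumption: the file argument (missing in the source) is tmp_csv.name.
            postgres.copy_expert(
                f"copy ({sql_query}) to stdout with csv header",
                tmp_csv.name,
            )
            # Upload the CSV to the intermediate S3 location.
            s3.load_file(
                filename=tmp_csv.name,
                key=key,
            )
        return key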

The input to the task, `sql_file`, determines which query to run on the database. The query is read into the task and executed against the database. The results are then written to S3, and the remote file key is returned as the task’s output.
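The pattern of writing to a temporary file and then uploading it under a run-scoped key can be sketched without Airflow or AWS. In the sketch below, a plain dictionary named `fake_bucket` stands in for S3 and `export_rows` is a hypothetical helper; neither is part of the actual pipeline. It shows how the run identifier scopes the object key so that each DAG run writes its own file.

```python
import csv
from tempfile import NamedTemporaryFile

fake_bucket = {}  # stand-in for an S3 bucket: key -> file contents


def export_rows(run_id, header, rows):
    """Write rows to a temporary CSV, 'upload' it under a run-scoped key, return the key."""
    key = f"{run_id}_booking_features.csv"
    with NamedTemporaryFile(mode="w+", suffix=".csv", newline="") as tmp_csv:
        writer = csv.writer(tmp_csv)
        writer.writerow(header)
        writer.writerows(rows)
        # Rewind so the contents just written can be read back for the upload.
        tmp_csv.seek(0)
        fake_bucket[key] = tmp_csv.read()
    return key


key = export_rows("manual__2022-01-01", ["user_id", "age"], [["u1", 34], ["u2", 27]])
```

Returning only the key, rather than the data itself, keeps the value passed between tasks small; the next task uses the key to fetch the file.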

The screenshot below shows a sample result set of the extraction query from above. We will describe how to create a synthetic version of this dataset in the next section.

[Screenshot: sample result set of the extraction query]

Synthesize Features using Gretel APIs

To generate a synthetic version of each feature, we must first train a synthetic model, and then run the model to generate synthetic records. Gretel has a set of Python SDKs that make it easy to integrate into Airflow tasks.

In addition to the Python Client SDKs, we’ve created a Gretel Airflow Hook that manages Gretel API connections and secrets. After setting up a Gretel Airflow Connection, connecting to the Gretel API is as easy as calling `get_project()` on the hook: `project = gretel.get_project()`.

For more information about how to configure Airflow connections, please refer to our GitHub repository README.

The `project` variable in the example above can be used as the main entrypoint for training and running synthetic models using Gretel’s API. For more details, you can check out our Python API docs.

Referring back to the booking pipeline, we’ll now review the `generate_synthetic_features` task. This step is responsible for training the synthetic model using the features extracted in the previous task.


    @task()
    def generate_synthetic_features(data_source: str) -> str:
        project = gretel.get_project()
        # Configure a synthetic model on the features extracted in the previous task.
        model = project.create_model_obj(
            model_config="synthetics/default",
            data_source=s3.download_file(data_source),
        )
        # Submit to Gretel Cloud, then block until the job finishes.
        model.submit_cloud()
        poll(model)
        return model.get_artifact_link("data_preview")

Looking at the method signature, you will see it takes a path, `data_source`. This value points to the S3 features extracted in the previous step. In a later section we’ll walk through how all these inputs and outputs are wired together.

When creating the model using `project.create_model_obj`, the `model_config` param represents the synthetic model configuration used to generate the model. In this pipeline, we’re using our default model config, but many other configuration options are available.

After the model has been configured, we call `model.submit_cloud()`. This will submit the model for training and record generation using Gretel Cloud. Calling `poll(model)` will block the task until the model has completed training.
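The blocking behaviour of `poll(model)` can be pictured as a simple status loop. The sketch below is not Gretel's implementation: `FakeJob`, its `refresh()` method, and the status strings are invented for illustration, and `wait_for_completion` is a hypothetical helper.

```python
import time


class FakeJob:
    """Toy job that reports 'active' for a few checks before 'completed'."""

    def __init__(self, checks_until_done):
        self.remaining = checks_until_done
        self.status = "active"

    def refresh(self):
        # Each refresh simulates one status request to a remote service.
        self.remaining -= 1
        if self.remaining <= 0:
            self.status = "completed"


def wait_for_completion(job, interval=0.01, timeout=5.0):
    """Block until the job leaves the 'active' state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while job.status == "active":
        if time.monotonic() > deadline:
            raise TimeoutError("job did not finish in time")
        time.sleep(interval)
        job.refresh()
    return job.status


final_status = wait_for_completion(FakeJob(checks_until_done=3))
```

A timeout is worth having in any loop like this so that a stuck remote job eventually fails the task instead of occupying a worker indefinitely.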

Now that the model has been trained, we’ll use `get_artifact_link` to return a link to download the generated synthetic features.


This artifact link will be used as an input to the final `upload_synthetic_features` step.

Load Synthetic Features

The original features have been extracted, and a synthetic version has been created. Now it’s time to upload the synthetic features so they can be accessed by downstream consumers. In this example, we’re going to use an S3 bucket as the final destination for the dataset.

    @task()
    def upload_synthetic_features(data_set: str):
        context = get_current_context()
        with open(data_set, "rb") as synth_features:
            s3.load_file_obj(
                file_obj=synth_features,
                key=f"{..._booking_features_synthetic.csv",
            )

This task is pretty straightforward. The `data_set` input value contains a signed HTTP link to download the synthetic dataset from Gretel’s API. The task will read that file into the Airflow worker, and then use the already configured S3 hook to upload the synthetic feature file to an S3 bucket where downstream consumers or models can access it.
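The hand-off from a download link to an upload call can be sketched with the standard library alone. In this sketch, a `file://` URL pointing at a temporary file stands in for the signed HTTP link, and the `synthetic_bucket` dictionary and the local `load_file_obj` function are stand-ins for the S3 bucket and hook; all three are assumptions made so the example runs offline.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

synthetic_bucket = {}  # stand-in for the destination S3 bucket


def load_file_obj(file_obj, key):
    """Stand-in for an S3 upload: read the stream and store its bytes under key."""
    synthetic_bucket[key] = file_obj.read()


def upload_from_link(link, key):
    """Open the link as a byte stream and hand the stream to the uploader."""
    with urlopen(link) as synth_features:
        load_file_obj(file_obj=synth_features, key=key)


with tempfile.TemporaryDirectory() as tmp_dir:
    # A local file exposed as a file:// URL plays the role of the signed link.
    local_copy = Path(tmp_dir) / "data_preview.csv"
    local_copy.write_bytes(b"user_id,age\nsynthetic-1,31\n")
    upload_from_link(local_copy.as_uri(), key="run1_booking_features_synthetic.csv")
```

Passing an open stream to the uploader, rather than a path, means the worker never needs to know where the bytes came from.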

Orchestrating the Pipeline

Over the last three sections we’ve walked through all the code required to extract, synthesize and load a dataset. The last step is to tie each of these tasks together into a single Airflow pipeline.

As you may recall from the beginning of this post, we briefly mentioned the concept of a DAG. Using Airflow’s TaskFlow API, we can compose these three Python methods into a DAG that defines the inputs, outputs, and order in which each step will run.

    feature_path = extract_features(
        "/opt/airflow/dags/sql/session_rollups__by_user.sql"
    )
    synthetic_data = generate_synthetic_features(feature_path)
    upload_synthetic_features(synthetic_data)
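One way to picture how these calls become a graph: calling a decorated task does not run it immediately but returns a placeholder, and passing that placeholder to another task records a dependency. The toy `task` decorator and `Node` class below are assumptions made for illustration and are far simpler than Airflow's TaskFlow API; the task bodies just return strings.

```python
class Node:
    """Placeholder for a task call whose result has not been computed yet."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    @property
    def upstream(self):
        # Any argument that is itself a Node is a dependency of this task.
        return [arg for arg in self.args if isinstance(arg, Node)]

    def run(self, log):
        # Resolve upstream nodes first, then call this task with their results.
        resolved = [arg.run(log) if isinstance(arg, Node) else arg for arg in self.args]
        log.append(self.func.__name__)
        return self.func(*resolved)


def task(func):
    """Toy decorator: calling the task builds a Node instead of executing it."""

    def build(*args):
        return Node(func, args)

    return build


@task
def extract_features(sql_file):
    return f"features from {sql_file}"


@task
def generate_synthetic_features(data_source):
    return f"synthetic({data_source})"


@task
def upload_synthetic_features(data_set):
    return f"uploaded {data_set}"


feature_path = extract_features("rollups.sql")
synthetic_data = generate_synthetic_features(feature_path)
final = upload_synthetic_features(synthetic_data)

run_log = []
result = final.run(run_log)
```

The three assignment lines mirror the wiring shown above: the graph is implied entirely by which return value is passed to which call.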

If you follow the path of these method calls, you will eventually get a graph that looks like our original feature pipeline.

[Figure: graph of the assembled pipeline]

If you want to run this pipeline and see it in action, head over to the accompanying GitHub repository. There you will find instructions on how to start an Airflow instance and run the pipeline end to end.

Wrapping things up

If you’ve made it this far, you’ve seen how Gretel can be integrated into a data pipeline built on Airflow. By combining Gretel’s developer-friendly APIs with Airflow’s powerful system of hooks and operators, it’s easy to build ETL pipelines that make data more accessible and safer to use.

We also talked about a common feature engineering use case where sensitive data may not be readily accessible. By generating a synthetic version of the dataset, we reduce the risk of exposing any sensitive data, but still retain the utility of the dataset while making it quickly available to those who need it.


Thinking about the feature pipeline in more abstract terms, we now have a pattern that can be repurposed for any number of new SQL queries. By deploying a new version of the pipeline and swapping out the initial SQL query, we can front any potentially sensitive query with a synthetic dataset that preserves customer privacy. The only line of code that needs to change is the path to the SQL file. No complex data engineering required.

Thanks for reading

Send us an email or come join us in Slack if you have any questions or comments. We’d love to hear how you’re using Airflow and how we can best integrate with your existing data pipelines.



