# Web Data Extraction

## Overview

<figure><img src="/files/54JYM5kgP5MJXeywQcrl" alt=""><figcaption><p>Get URL Block</p></figcaption></figure>

Web Data Extraction is a block that extracts metadata and full-page content from web pages. It is useful for applications such as web scraping, data mining, and content analysis.

You choose which output data points to extract, such as the page metadata, the full-page text, or the raw HTML.

Whether you are a researcher, marketer, or data analyst, Web Data Extraction can help you extract valuable insights from the web.

The block works best in combination with web search components and knowledge extraction.

## How to Set Up

1. Provide a URL data point as the input. It can come from another action block or from a manual input block.
2. Select the output data points you want to extract.

{% hint style="warning" %}
Web content longer than **4,000 characters** (around 1,000 English words) can't be processed live during the workflow.
{% endhint %}

{% hint style="info" %}
To work with longer web page content, first collect it in the "Documents" section, then run an "Internal Search" across it from the workflow.
{% endhint %}
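
The two hints above amount to a simple routing rule, which can be sketched in Python as follows. This is only an illustration: the function names and the route labels `"live"` and `"documents"` are hypothetical, and the sketch assumes the limit is counted with Python's `len()`, since how the platform counts characters is not documented here.

```python
LIVE_CHAR_LIMIT = 4_000  # limit quoted in the warning hint


def fits_live_limit(text: str, limit: int = LIVE_CHAR_LIMIT) -> bool:
    """Return True if the text is short enough to be processed live.

    Assumption: characters are counted with len(); the platform's own
    counting method may differ.
    """
    return len(text) <= limit


def choose_route(text: str) -> str:
    """Return a hypothetical route label for the extracted page text."""
    # "live": use the text directly in the workflow.
    # "documents": collect it to Documents and use Internal Search instead.
    return "live" if fits_live_limit(text) else "documents"
```

Content of exactly 4,000 characters is treated as within the limit here, because the warning only excludes content with more than 4,000 characters.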

## Inputs and Outputs

<table><thead><tr><th width="142">Input</th><th width="229.33333333333331">Output</th><th>Output Description</th></tr></thead><tbody><tr><td>Target link (URL)</td><td>Meta Title (Text)</td><td>Title of the web page</td></tr><tr><td></td><td>Meta Description (Text)</td><td>Description of the page</td></tr><tr><td></td><td>Meta Image (Image)</td><td>Social media image of the page</td></tr><tr><td></td><td>Full-page text (Text)</td><td>Text extracted from the full-page HTML, structured by headlines or paragraphs</td></tr><tr><td></td><td>Full-page HTML (Text)</td><td>HTML of the target web page</td></tr><tr><td></td><td>Links to media on the page (URL) - Soon</td><td>Links to all the images or videos from the page</td></tr><tr><td></td><td>Links to other pages/websites on the page (URL) - Soon</td><td>Links to all other web pages from the target page</td></tr></tbody></table>
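
To give a sense of where the three metadata outputs usually live in a page's HTML, here is a minimal Python sketch using only the standard library. It is not the block's implementation, which is not documented here. It assumes the title comes from `<title>`, the description from `<meta name="description">`, and the social media image from the Open Graph `og:image` tag. The class name, function name, and dictionary keys are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text plus description and og:image meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Meta tags identify themselves with either name= or property=.
            key = attr.get("name") or attr.get("property")
            if key == "description":
                self.description = attr.get("content")
            elif key == "og:image":
                self.image = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside <title>...</title>.
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    """Return a dict with hypothetical keys mirroring the metadata outputs."""
    parser = MetaExtractor()
    parser.feed(html)
    return {
        "meta_title": parser.title.strip(),
        "meta_description": parser.description,
        "meta_image": parser.image,
    }


sample = """<html><head>
<title>Example Page</title>
<meta name="description" content="A sample page.">
<meta property="og:image" content="https://example.com/cover.png">
</head><body><h1>Hello</h1></body></html>"""

metadata = extract_metadata(sample)
```

Pages that omit a tag simply yield `None` (or an empty title) for that field in this sketch; how the block itself handles missing metadata is not specified above.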


