Part Zero: Getting Started

If you have an invite code, go to www.sheet0.com to log in using your code. If you don’t have a code, join our Discord server or fill out this form to request access.

Part One: Submitting Your First Task

Whenever you want to scrape data from the internet, simply enter your task description in the input box on Sheet0 and click the “Run” button. For example:
Extract the companies from the three most recent batches on https://www.ycombinator.com/companies, and collect information about each company’s founders.
Sheet0 will automatically break down the task, generate a data collection workflow, and start executing it. You can view the progress and results in the chat window.

Advanced Tutorial: Describe Tasks More Effectively

To improve the accuracy and efficiency of your tasks, consider these tips:
  • Provide Specific URLs: If you want to scrape data from a specific website, provide the URL so Sheet0 can better understand your needs. Without a URL, Sheet0 will try to find relevant information online, which may lead to inaccurate results.
  • Break Down Complex Tasks: If your task involves multiple pages or steps, break it down into smaller subtasks and describe the dependencies between them. For example, with the YCombinator task, we have two steps: first, scrape the list of companies, then gather founder information from each company’s detail page. The second step depends on the results of the first step (i.e., the URLs of the company detail pages), so you can describe it as:
    1. Scrape the list of recent companies from https://www.ycombinator.com/companies and extract the detail page URL for each company.
    2. Gather founder information from each company’s detail page.
  • Avoid Using Non-Existent Filters: When describing tasks, make sure the filters you refer to actually exist on the source webpage and can be used. If a filter is not available, you can scrape all the data first and then perform the filtering via a chat message, as sketched after this list. For example, if you want to scrape @elonmusk’s tweets from the past week but Twitter doesn’t provide such a filter, you can first scrape all tweets and then ask Sheet0 to analyze and filter the data:
    Task description: Scrape all tweets from @elonmusk
    After task completion: Help me filter @elonmusk’s tweets from the past week
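
Behind the scenes, a follow-up request like this is answered with a query over the scraped table (see Part Five: Data Analysis). A minimal sketch of what such a filter could look like, assuming a hypothetical tweets table with author and posted_at columns (the real schema comes from the columns generated for your task):
    -- Filter the scraped tweets down to the past week.
    -- Table and column names here are illustrative assumptions, not Sheet0's actual schema.
    SELECT *
    FROM tweets
    WHERE author = '@elonmusk'
      AND posted_at >= DATE('now', '-7 days');  -- SQLite-style; Postgres would use NOW() - INTERVAL '7 days'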

Example of a Complex Task

In the following example, we’ll scrape from YCombinator a list of companies whose founders are “serial entrepreneurs.” Sounds complex? Don’t worry, with a little patience, Sheet0 will help you get it done!

Task Breakdown and Description

First, we need to define what a “serial entrepreneur” is. In this task, we’ll define it as a founder who has started two or more companies. We can obtain this data from the founder’s LinkedIn profile. Therefore, our task can be broken down into the following steps:
  1. Scrape the list of recent companies from https://www.ycombinator.com/companies and extract the detail page URL for each company.
  2. Find the founder’s name and LinkedIn profile link on each company’s detail page. Since the main LinkedIn profile page may omit some work experiences, we construct the URL of the dedicated work experience page to get the complete list (a sketch of this URL construction appears after this list). The LinkedIn work experience page URL format is {linkedin_profile_url}/details/experience/.
  3. Extract all experiences from the work experience page, including company name, position, description, start date, end date, etc.
Please note that since LinkedIn work experience pages do not have position-related filters, we need to scrape all experiences first and then perform data analysis.
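
As a concrete illustration of step 2, the URL construction can be expressed as a minimal SQL sketch (the founders table and its columns are hypothetical; Sheet0 performs this step inside its own workflow, so you never need to write it yourself):
    -- Derive each founder's work experience page URL from their LinkedIn profile URL.
    -- RTRIM guards against a trailing slash on the stored profile URL.
    SELECT
      founder_name,
      linkedin_profile_url,
      RTRIM(linkedin_profile_url, '/') || '/details/experience/' AS experience_page_url
    FROM founders;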

Data Analysis and Filtering

After scraping all experiences, we need to analyze and filter the data to identify “serial entrepreneurs.” At this point, we can tell Sheet0 to:
List the founders who are serial entrepreneurs. A serial entrepreneur is someone who holds a founder or co-founder title (case-insensitive) at two or more companies. Output each founder’s name, their LinkedIn profile URL, and the number of companies they founded.
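
When Sheet0 turns this request into SQL (see Part Five: Data Analysis), the generated query could look roughly like the sketch below. The table and column names (experiences, founder_name, linkedin_profile_url, position, company_name) are illustrative assumptions that depend on the columns generated for your task:
    -- Founders holding a founder or co-founder title at two or more distinct companies.
    SELECT
      founder_name,
      linkedin_profile_url,
      COUNT(DISTINCT company_name) AS companies_founded
    FROM experiences
    WHERE LOWER(position) LIKE '%founder%'  -- matches "Founder" and "Co-Founder", case-insensitively
    GROUP BY founder_name, linkedin_profile_url
    HAVING COUNT(DISTINCT company_name) >= 2;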

Part Two: Task Execution and Monitoring

During task execution, you can view the progress and results in the chat window on Sheet0. Sheet0 will automatically generate a data collection workflow and seek your confirmation at the following critical steps:

Critical Step: Task Breakdown

When executing a task, Sheet0 will automatically break it down into multiple subtasks and ask for your confirmation before proceeding. You can review the broken-down description in the “Goal Breakdown” message card; if you’re not satisfied with it, edit it and click the “Approve” button to submit the revised version.

Critical Step: Scrape Limit

After analyzing the page content and the task goals, Sheet0 will automatically suggest a scrape quantity limit for the current subtask. If you’re not satisfied with the suggestion, you can set the limit to “Unlimited” or enter a new value directly in the message card. When there are multiple subtasks, each may have its own limit; Sheet0 lists the suggested limit for each subtask in its respective message card, and you can accept or modify each one. Note: when a page has a lot of content, setting the limit to “Unlimited” may prolong task execution considerably. For example, scraping tweets on Twitter with an “Unlimited” limit may cause the task to run indefinitely; in such cases, it’s recommended to set a reasonable limit, such as 100 tweets.

Critical Step: Column Generation

Before scraping, Sheet0 will automatically generate table columns based on your task description and page content. You can see the highlighted columns on the right side of the chat window and click “Column Menu” -> “Edit” to modify the column definitions, including column names, data types, and prompts for scraping. The prompt is the hint Sheet0 will use when scraping data. You can modify the prompt as needed to better capture the required data.
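
For illustration only, the columns generated for the first YCombinator subtask might map onto a table shaped roughly like the sketch below. The column names, types, and prompts are hypothetical examples; in Sheet0 you edit them through the Column Menu rather than writing any SQL:
    -- Hypothetical column layout for the "scrape the company list" subtask.
    -- Each column's scraping prompt is shown as a comment, since prompts are a
    -- Sheet0 concept with no SQL equivalent.
    CREATE TABLE companies (
      company_name TEXT,  -- prompt: "the company's name as shown in the list"
      batch        TEXT,  -- prompt: "the YC batch label, e.g. W24"
      detail_url   TEXT   -- prompt: "the URL of the company's detail page"
    );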

Part Three: User Takeover

When Sheet0 encounters situations it cannot handle, such as captchas or logins, it will automatically hand the task over to you. Click the “Takeover” button to operate the browser remotely, and once you’ve finished, click the “Return Control to Sheet0” button to hand control back. Since Sheet0 is still in beta, your operations may experience high latency; we will optimize this experience in future versions.

Part Four: Data Scraping

After the above steps, Sheet0 will begin scraping data. You can watch the data being updated in real time in the table.

Part Five: Data Analysis

Once data scraping is complete, you can analyze and filter the data by chatting with Sheet0 in natural language. Sheet0 will automatically generate SQL queries based on your request and execute them in the background. This process may take some time, so please be patient.
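
As a small example, a request such as “count how many companies were collected from each batch” might be translated into a query along these lines (the companies table and batch column are hypothetical and depend on your generated columns):
    -- Count scraped companies per batch.
    SELECT batch, COUNT(*) AS company_count
    FROM companies
    GROUP BY batch
    ORDER BY company_count DESC;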

Part Six: Share Your Experience

After completing a task, you can share a replay link by clicking the “Share” button in the top right corner. The replay records every operation performed during the task, so others can follow how it was executed and see the results.