Documentation

Website Data Source

Introduction

Adding public website data to your Expert on EveryAnswer keeps its knowledge comprehensive and up to date. This feature extracts information from publicly accessible websites and integrates it into your Expert, making it more knowledgeable and versatile. This guide explains how to add public website data to your Expert.

Accessing the Feature

  1. In the left navigation, under Settings, click Experts.
  2. From the list of Experts, click the pencil icon on the Expert you'd like to add data to.
  3. Click on the Data tab.
  4. Click on the Web Data sub-tab.
  5. Click on the Add Website button.

Process to Add Public Website Data

  1. In the popup, enter a URL in the Website URL field.
  2. Optionally, under Excluded Paths, enter paths to exclude (e.g., /path or https://www.example.com/path).
  3. Click the Scan Now button.
  4. Review the automatically detected pages under Discovered Pages. Uncheck pages you don't want to include.
  5. Optionally go back and add more Excluded Paths.
  6. Once satisfied, click Add to Expert.
  7. EveryAnswer will process the pages, converting each into an AI-optimized format and training your Expert.
  8. Once complete, your Expert will know that information until you remove it.
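
The effect of Excluded Paths on the Discovered Pages list can be pictured with a short sketch. This is an illustration only: the function names `normalize_path` and `filter_excluded` are hypothetical, and matching a page when its path is at or below an excluded path is an assumed rule, not EveryAnswer's documented behavior.

```python
from urllib.parse import urlsplit


def normalize_path(entry: str) -> str:
    """Reduce an entry (/path or a full URL) to a bare path without a trailing slash."""
    # Full URLs such as https://www.example.com/path carry a scheme;
    # bare entries such as /path do not.
    path = urlsplit(entry).path if "://" in entry else entry
    return "/" + path.strip("/")


def filter_excluded(pages: list[str], excluded: list[str]) -> list[str]:
    """Keep pages whose path is not at or below any excluded path (assumed rule)."""
    prefixes = [normalize_path(e) for e in excluded]
    kept = []
    for page in pages:
        path = normalize_path(page)
        blocked = any(path == p or path.startswith(p + "/") for p in prefixes)
        if not blocked:
            kept.append(page)
    return kept
```

Under this assumed rule, excluding /path drops https://www.example.com/path and https://www.example.com/path/a but keeps https://www.example.com/pathology, because only whole path segments are compared.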

Specific URLs and Sitemaps

When entering a non-root URL (e.g., www.domain.com/blog), EveryAnswer will attempt to determine whether there are other pages under it or if it's a stand-alone page. This adjustment helps optimize the import process automatically.
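
One way to picture the stand-alone check is the sketch below. It is a guess at the general idea, not EveryAnswer's actual logic: the names `pages_under` and `is_standalone` are hypothetical, the list of discovered links is assumed to be available already, and URLs are assumed to include a scheme such as https://.

```python
from urllib.parse import urlsplit


def pages_under(start_url: str, discovered: list[str]) -> list[str]:
    """Return discovered URLs on the same host whose path sits below start_url's path."""
    start = urlsplit(start_url)
    base = start.path.rstrip("/") + "/"  # e.g. "/blog" becomes "/blog/"
    result = []
    for url in discovered:
        parts = urlsplit(url)
        # A child page shares the host and has extra path text after the base.
        if (
            parts.netloc == start.netloc
            and parts.path.startswith(base)
            and len(parts.path) > len(base)
        ):
            result.append(url)
    return result


def is_standalone(start_url: str, discovered: list[str]) -> bool:
    """Treat the URL as a single page when nothing was found beneath it."""
    return not pages_under(start_url, discovered)
```

With this sketch, https://www.domain.com/blog/post-1 counts as a page under https://www.domain.com/blog, while https://www.domain.com/blogging does not.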

EveryAnswer uses sitemaps whenever available to speed up the process of page discovery. If a sitemap is found, it's used to create the list of pages. If not, EveryAnswer will attempt to crawl the site. Sitemaps allow for importing an unlimited number of pages (limited only by plan storage), whereas without a sitemap, EveryAnswer can crawl up to 300 pages.
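
The sitemap-first behavior described above can be sketched as follows. The 300-page cap comes from this guide; everything else is an assumption for illustration, including the names `parse_sitemap` and `discover_pages` and the `crawl` callable that stands in for whatever crawler produces URLs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Optional

CRAWL_PAGE_CAP = 300  # crawl limit stated in this guide

# Namespace used by the standard sitemap XML format.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def discover_pages(
    sitemap_xml: Optional[str],
    crawl: Callable[[], Iterable[str]],
) -> list[str]:
    """Prefer the sitemap; otherwise crawl and stop at the cap."""
    if sitemap_xml is not None:
        return parse_sitemap(sitemap_xml)  # no page-count cap on this path
    pages: list[str] = []
    for url in crawl():
        if len(pages) >= CRAWL_PAGE_CAP:
            break
        pages.append(url)
    return pages
```

Passing `None` for `sitemap_xml` models a site without a sitemap, in which case at most 300 URLs from the crawler are kept.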

Page Updates and Rescanning

Keeping your Expert's knowledge current is critical. EveryAnswer can detect new pages and rescan existing pages for updates. These capabilities vary by subscription plan.

Detecting New Pages

  • EveryAnswer can periodically check for new pages on the websites you've already imported. This helps keep your Expert up-to-date with the latest information.

Rescanning for Updated Content

  • Existing pages can be rescanned to detect and import any changes or updates, ensuring your Expert's knowledge remains current.

Plan Limitations

  • The Starter (free) plan does not include automatic detection of new pages or rescanning for updates.
  • Higher-tier plans offer these features, with the frequency of checks determined by the specific plan level. More advanced plans typically offer more frequent checks for new pages and content updates.

Manual Updates

  • Regardless of the plan, you can manually trigger a rescan of a website to update your Expert's knowledge. Note that there is a 24-hour cooldown period between manual rescans of the same page to prevent excessive server load.
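
To make the cooldown concrete, the sketch below computes how long remains before a page can be rescanned again. Only the 24-hour figure comes from this guide; the function name `time_until_rescan` and the idea of tracking a last-rescan timestamp per page are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

RESCAN_COOLDOWN = timedelta(hours=24)  # cooldown stated in this guide


def time_until_rescan(last_rescan: Optional[datetime], now: datetime) -> timedelta:
    """Return the remaining wait; a zero result means a manual rescan is allowed."""
    if last_rescan is None:
        return timedelta(0)  # the page has never been manually rescanned
    remaining = last_rescan + RESCAN_COOLDOWN - now
    return max(remaining, timedelta(0))
```

For example, a page rescanned at 09:00 and checked again at 15:00 the same day would have 18 hours remaining.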

Limitations and Considerations

  • EveryAnswer cannot extract from private or password-protected websites.
  • The number of websites you can extract from depends on your subscription plan's character limit.
  • Ensure compliance with legal and ethical standards when importing website data.
  • EveryAnswer uses advanced crawling techniques and a distributed server network to reduce the chance of being blocked or rate-limited by the sites it scans.
  • EveryAnswer does not honor URL arguments (components after '#', '?', or '&') when importing pages to prevent duplicate content.
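
The last point can be illustrated with a small sketch that strips the query string and fragment from each URL and then removes duplicates. The function names `strip_url_arguments` and `unique_pages` are hypothetical, and this shows the general idea only, not EveryAnswer's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_url_arguments(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_pages(urls: list[str]) -> list[str]:
    """Deduplicate URLs after stripping arguments, preserving first-seen order."""
    return list(dict.fromkeys(strip_url_arguments(u) for u in urls))
```

Under this sketch, https://www.example.com/docs?page=2&sort=asc and https://www.example.com/docs#install both reduce to https://www.example.com/docs and count as one page.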

For further information on managing your Experts or other features, refer to the relevant sections in your documentation, such as General Knowledge Data Source, Forms, and Lead Capture.

Frequently Asked Questions

How does EveryAnswer handle page discovery for websites?
EveryAnswer handles page discovery for websites through a combination of automated crawling and sitemap detection. When you add a website, EveryAnswer will first look for a sitemap associated with the URL provided. If a sitemap is found, EveryAnswer uses it to generate a comprehensive list of pages to import, which allows for a faster and more complete extraction of content. This process helps in importing an unlimited number of pages, limited only by your plan's storage constraints. If no sitemap is available, EveryAnswer will perform an automated crawl of the website to discover pages. During this crawl, EveryAnswer can detect and list up to a maximum of 300 pages. This ensures that even without a sitemap, a significant amount of content can still be imported. The platform's intelligent scanning engine leverages a global network of proxies and advanced crawling techniques to effectively discover and import both static and dynamic web content, ensuring a robust and comprehensive knowledge base for your Expert.
What happens if I enter a non-root URL for import?
If you enter a non-root URL for import, EveryAnswer will attempt to determine whether there are other pages under it or if it is a stand-alone page. For example, if you enter a URL like "www.domain.com/blog," EveryAnswer will check if there are additional pages linked under this specific path. The import process will adjust accordingly, treating it either as a part of a larger structure with multiple pages or as a single page. This flexibility ensures that even specific sections of a website or individual pages can be accurately imported and incorporated into your Expert's knowledge base. This approach helps to capture all relevant content, whether from a broad site scope or a focused segment.
Can I import content from websites in multiple languages?
Yes. EveryAnswer can import webpages written in different languages and understand the information regardless of the source language. Your Expert can then use the imported content to answer questions in the user's native language, which makes it well suited to multilingual support. Note that while EveryAnswer can ingest content and respond in any language, citations link to the original source URL, which may be in a different language from the one the user is reading.
How does EveryAnswer prevent duplicate content during the import process?
EveryAnswer prevents duplicate content during the import process by ignoring URL arguments such as components after '?', '&', or '#'. This approach ensures that only unique content is imported into your Expert's knowledge base, avoiding redundancy. By treating pages with identical content but different URL arguments as a single entry, EveryAnswer maintains a clean and efficient knowledge base.
How does EveryAnswer handle pages with JavaScript content?
EveryAnswer handles pages with JavaScript content by waiting for the DOM (Document Object Model) to fully load before processing the page. This ensures that any dynamic content generated by JavaScript, such as interactive elements or asynchronously loaded data, is captured accurately. Additionally, EveryAnswer can see and process content within hidden divs, such as those used in tabbed interfaces. This means that even if content is not immediately visible on the page (e.g., content hidden under a tab), it will still be included in the imported data, ensuring a comprehensive capture of all relevant information on the page.
Can EveryAnswer import data from password-protected websites?
No, currently EveryAnswer cannot extract data from private or password-protected websites.
Can EveryAnswer automatically detect new pages on imported websites?
Yes, EveryAnswer can automatically detect new pages on imported websites, but this capability varies based on your subscription plan. Higher-tier plans offer features to periodically check for new pages on websites you've already imported, helping your Expert stay up-to-date with the latest information. The frequency of these checks depends on your specific plan level. The Starter (free) plan does not include automatic detection of new pages. Regardless of your plan, you can manually trigger a rescan of a website to update your Expert's knowledge, but there is a 24-hour cooldown period between manual rescans of the same page. Please refer to your specific plan details or contact EveryAnswer support for more information about the update frequencies available on your current subscription.
Does EveryAnswer offer the ability to rescan existing pages for updates?
Yes, existing pages can be rescanned and removed individually. This feature allows you to manage your Expert's knowledge base with precision, ensuring that specific pages are updated or removed as necessary. You can manually trigger a rescan for any individual page to ensure it reflects the most current content from the source website, or remove a page entirely if the information is no longer relevant. There is a 24-hour cooldown period between manual rescans or removals of the same page to prevent excessive server load and ensure fair usage across all users. Automatic rescanning and removal capabilities, along with the frequency of these actions, vary based on your subscription plan, with higher-tier plans offering more frequent and automated updates and removals. For more detailed information about these features available on your current subscription, please refer to your plan details or contact EveryAnswer support.
Are there any limitations on the automatic detection of new pages and rescanning?
Yes, there are limitations on the automatic detection of new pages and rescanning with EveryAnswer. The availability and frequency of these features depend on your subscription plan. The Starter (free) plan does not include automatic detection of new pages or automatic rescanning for updates. Higher-tier plans offer these capabilities, with the frequency of checks determined by the specific plan level. Additionally, there is a 24-hour cooldown period between manual rescans of the same page to prevent excessive server load and ensure fair usage across all users. EveryAnswer also cannot extract data from private or password-protected websites, and the number of websites you can import from depends on your subscription plan's character limit. For more detailed information about the limitations and features available on your current subscription, please refer to your plan details or contact EveryAnswer support.
Can I manually trigger a website rescan to update my Expert's knowledge?
Yes, you can manually trigger a website rescan to update your Expert's knowledge. This allows you to ensure that your Expert has the most current information from the source website. To do this, simply initiate a manual rescan for the desired website or individual pages. Keep in mind that there is a 24-hour cooldown period between manual rescans of the same page to prevent excessive server load and ensure fair usage across all users. This feature is available regardless of your subscription plan, although automatic rescanning capabilities and frequencies vary based on the plan level. For more detailed information about rescanning features available on your current subscription, please refer to your plan details or contact EveryAnswer support.
What are the limitations on the number of websites I can import data from?
The number of websites you can import data from using EveryAnswer depends on the character limit specified by your subscription plan. Each plan has a different character limit that determines the volume of data you can import and process. Additionally, while EveryAnswer can extract data from any publicly accessible webpage, it cannot extract data from private or password-protected websites. Without a sitemap, EveryAnswer will crawl up to 300 pages, but if a sitemap is available, there is no limit to the number of pages that can be imported, constrained only by the plan's storage limits. For detailed information about the specific limitations of your plan, please refer to your plan details or contact EveryAnswer support.
How does EveryAnswer handle sitemaps during the import process?
During the import process, EveryAnswer uses sitemaps to efficiently discover and import pages from a website. When you provide a URL, EveryAnswer will automatically search for a sitemap associated with that website. If a sitemap is found, EveryAnswer uses it to generate a comprehensive list of pages to import, which allows for a faster and more complete extraction of content. This process helps in importing an unlimited number of pages, limited only by your plan's storage constraints. If no sitemap is available, EveryAnswer will perform a crawl of the website, up to a maximum of 300 pages. This ensures that even without a sitemap, a significant amount of content can still be imported. Sitemaps are particularly useful for large websites with many pages, as they provide a structured and detailed map of the site's content, enabling EveryAnswer to efficiently import data and keep your Expert's knowledge base up-to-date. For more information on how sitemaps are utilized based on your subscription plan, please refer to your plan details or contact EveryAnswer support.
Do I need permission to import data from publicly accessible websites?
Yes, you need to ensure that you have permission to import data from publicly accessible websites. While EveryAnswer can extract data from any publicly available webpage, it is important to comply with legal and ethical standards when importing website data. This means you should verify that you are allowed to use the content according to the website's terms of service and copyright laws. Importing data without proper authorization could result in legal issues or breaches of terms of service agreements. Always ensure you have the right to use the content you are importing to maintain compliance and avoid potential problems.
What if a website doesn't have a sitemap?
If a website doesn't have a sitemap, EveryAnswer will perform a crawl of the website to discover and import pages. In this case, EveryAnswer can crawl up to a maximum of 300 pages. This ensures that even without a sitemap, a significant amount of content can still be imported to enhance your Expert's knowledge base. However, using a sitemap, if available, is always preferable as it allows for a more comprehensive and efficient import process, enabling EveryAnswer to discover and import an unlimited number of pages, limited only by your plan's storage constraints. For more detailed information about how the crawling process works without a sitemap and the limitations based on your subscription plan, please refer to your plan details or contact EveryAnswer support.
How does EveryAnswer handle internal anchor links within webpages?
EveryAnswer processes internal anchor links within webpages by treating them as part of the page content. Internal anchors, indicated by a '#' followed by a specific section identifier, do not affect the import process. These links are used for navigation within the same page and do not create separate entries in the Expert's knowledge base. Consequently, the entire page content, including sections referenced by internal anchors, is imported as a single entity, ensuring a comprehensive capture of the webpage's information.
How long must I wait before I can rescan a webpage?
You must wait 24 hours before you can rescan the same webpage on EveryAnswer. This 24-hour cooldown period helps prevent excessive server load and ensures fair usage across all users. This means that after manually triggering a rescan for a specific page, you will need to wait 24 hours before you can initiate another rescan for that same page. This cooldown period applies to all users, regardless of their subscription plan. For more detailed information about rescanning policies and features available on your current subscription, please refer to your plan details or contact EveryAnswer support.
How does EveryAnswer handle static and dynamic web content?
EveryAnswer captures both static and dynamic web content to build a comprehensive knowledge base for your Expert. Static content, such as the fixed text, images, and links on a page, is extracted and converted into an AI-ready format. Dynamic content, such as JavaScript-generated content, interactive elements, and data that loads asynchronously, is also processed. However, EveryAnswer cannot import pages that are distinguished only by URL arguments (the parts after '?', '&', or '#') rather than by their own URLs. Content differentiated only by URL arguments is not recognized as separate pages, which prevents duplicate content but may limit the import of some dynamically generated pages. If such content matters, make sure it is reachable through unique URLs.
Can I adjust the paths to exclude after the initial scan?
Yes, you can adjust the paths to exclude even after the initial scan. During the website import process, after reviewing the automatically detected pages under "Discovered Pages," you have the option to go back and add or modify the Excluded Paths. This allows you to refine the content that you want to exclude from your Expert's knowledge base during the detection process. However, exclusions are only used during page detection and cannot be used to remove pages once they are added to the Expert. Once webpages are added to the Expert, you can either remove each page individually or delete the entire site altogether. This ensures you have precise control over the content within your Expert, allowing you to maintain only the most relevant and up-to-date information.
How does EveryAnswer handle URL arguments during the import process?
EveryAnswer does not honor URL arguments when importing pages. This means that components after characters like '?', '&', or '#' are ignored during the import process. This approach helps prevent duplicate content, as URL arguments often result in pages with identical content. As a result, pages are imported without these URL arguments to ensure unique content in your Expert's knowledge base. If your content relies heavily on URL arguments for differentiation, it's advisable to ensure that important dynamic content is accessible through unique URLs.
What should I do if I want to exclude certain pages from being imported?
If you want to exclude certain pages from being imported, you can specify paths to exclude during the website import process. When you add a website, you can enter URLs or specific paths in the "Excluded Paths" field. This will prevent EveryAnswer from importing content from these paths. You can refine these exclusions after the initial scan by reviewing the automatically detected pages under "Discovered Pages" and going back to add or modify the Excluded Paths as needed. Remember, exclusions are only used during page detection and cannot be used to remove pages once they are added to the Expert. If you need to remove pages after they have been added, you will need to remove each page individually or delete the entire site from your Expert.
How does EveryAnswer process the pages once they are added to an Expert?
Once pages are added to an Expert, EveryAnswer processes them by converting each page into an AI-optimized format and training your Expert with this information. This involves extracting the content from the pages, including text, images, and other relevant data, and transforming it into a format that the AI can effectively use to answer questions and provide information. The Expert will then incorporate this knowledge into its responses, allowing it to utilize the newly added data. This processed information remains in the Expert's knowledge base until you decide to remove it. If you remove a webpage or an entire website, the corresponding information will be deleted from the Expert, and the Expert will no longer know that information.
Last Updated: October 8, 2024