Anatomy of the Search Engine by CHAITYA SHAH

Question : Chaitya Shah, Aditya Pingale

chaitya_shah — 2018-08-27 11:11:55 UTC

The database consists of indexes to a large amount of content, and web pages change on a daily or even hourly basis. What criteria and factors are considered when updating data that is already stored in the database, and how is the update done?

Drishti Shah, Nida Shah

2018-09-02 09:34:25 UTC

The pages can be crawled and indexed as and when you want them to be. The Google index stores the data alphabetically, so as soon as you type a keyword, the relevant webpages surface on the SERPs.

Unnati Mistry

unnati_mistry — 2018-09-06 17:47:30 UTC

For sites that have already been indexed, search bots/crawlers wait for update signals from them (i.e. links to updated content). This means that if you have created or updated content and it is linked from a page that is already indexed, that link is a signal to bots to index the updated content. Bots can also be notified of updates using sitemaps and robots.txt files.
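
One way robots.txt and sitemaps fit together is that a robots.txt file can carry a `Sitemap:` line pointing crawlers at the sitemap. The sketch below reads such a line with Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). The robots.txt content, the `example.com` URL and the helper name `sitemaps_from_robots` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com and the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Sitemap: https://example.com/sitemap.xml
"""


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemaps_from_robots(ROBOTS_TXT))
```

A crawler that finds the sitemap this way can then read it to learn which pages exist and when they were last changed.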

Roll no.: 1624017

Bhakti Kantariya

bhakti_kantariya — 2018-09-07 09:17:28 UTC

Google indexes website updates based on factors such as the popularity of a site, whether the content is crawlable and the site structure. If you have made any changes to your URL, you can ask Google to re-crawl it. But before that, make sure that there are no rendering errors. You can also help Google find your updated content using sitemaps. The content discovered by bots is then sent back to Google's servers, where it is added to the database.

Rollno: 1624002

Rachana Gandhi

rachana_gandhi — 2018-09-07 19:04:02 UTC

In order to update your content, you will want the spider to re-crawl your website. This can be done in two ways:
The first is submitting a sitemap, and the second is using the Fetch as Google option in Webmaster Tools.
Google "favors" some websites, though, so you might see websites about dogs and cats getting updated faster, as they have more views, popularity and content frequency than yours. It takes more time to remove data that is already present than to add new data. It often takes more than a month for content that has gone offline to be removed from Google's index.
So the major factors for updating can be: site structure, content frequency, popularity, the quality of your website and also the data to be updated.
Roll No: 1624001

Shreya Parikh

shreya_parikh — 2018-09-09 12:17:02 UTC

It is obvious that the content of an active website will change periodically. So it is really important to have control over crawling and indexing.
Factors to control crawling and indexing are:
1) Avoiding duplicate content
2) Consolidating relevancy and authority signals
3) Quality of website
4) Popularity of website
The attributes we can use to handle these factors are:
1) Pagination attributes
2) Mobile attribute
3) robots.txt
4) Hreflang attribute

Roll No : 1514099

Nida Shah

nida_shah — 2018-09-09 12:47:20 UTC

When updating the content of a web page, one thing to keep in mind is that the content should be as original as possible. Also, if the page is popular and the URL is SEO friendly, there is a high probability that bots will be driven to the website. Crawlers visit websites when they are updated, and the hyperlinks they find are added to the list of URLs to be visited. This list is known as the frontier. The links on the frontier are visited recursively according to the crawler's algorithm.
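
The frontier idea can be sketched in a few lines of Python. This is only an illustration: the post does not say which algorithm is used, so the sketch assumes a plain breadth-first queue rather than recursion, and the `LINKS` dictionary is a made-up link graph standing in for real page fetches.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for real HTTP fetches.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": [],
}


def crawl(seed: str, links: dict[str, list[str]]) -> list[str]:
    """Visit pages breadth-first and return them in visit order."""
    frontier = deque([seed])  # URLs waiting to be visited
    seen = {seed}             # URLs already queued, so none is visited twice
    visited = []
    while frontier:
        url = frontier.popleft()
        visited.append(url)
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl("/", LINKS))
```

A real crawler would likely order the frontier by priority (for example popularity or expected change rate) instead of strict first-in, first-out.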

Roll No: 1514112

Arvind Ganesh

arvindganesh_a — 2018-09-09 17:27:18 UTC

One can add details in robots.txt and make sure the content is unique or has been given its due credit.
We can also tell the crawler to skip the old webpage or URL and focus on the new and updated webpage.
Roll No: 1514126

Malvika Parulekar

malvika_p — 2018-09-10 03:24:16 UTC

For any changes that we make on our sites, having sitemaps helps crawlers crawl and index them faster. A sitemap is like a map that lists our content for the bots, so whenever we update our website, it helps the changes get crawled faster. Other options are asking the search engine to fetch your site after you have updated it, or enabling webmaster tools that allow a limited number of updates to be fetched faster (paid). Crawling also depends largely on factors such as popularity, site meta, content appropriation, etc. So if you have relevant data, the chances of your website being crawled sooner are greater.
Roll No : 1624007

Mehul Monani

mehul_monani — 2018-09-10 07:26:19 UTC

So if you are changing your content frequently, then you can specify that in the sitemap.xml or robots.txt file, which indicates how frequently you want your website to be indexed by the crawler.
The tags available within each <url> entry of the sitemap are:
1. changefreq
2. priority
3. lastmod

You can also create multiple XML sitemaps.
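
As a rough illustration of where these three tags sit, the Python sketch below builds a one-page sitemap with the standard `xml.etree.ElementTree` module. The helper name `build_sitemap`, the `example.com` URL and the tag values are placeholders, not taken from any real site.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[dict[str, str]]) -> str:
    """Build sitemap XML from dicts with loc, lastmod, changefreq, priority."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Only emit the tags that were actually supplied for this page.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page; the URL and values are placeholders.
print(build_sitemap([{
    "loc": "https://example.com/blog",
    "lastmod": "2018-09-10",
    "changefreq": "daily",
    "priority": "0.8",
}]))
```

Note that the sitemap protocol describes changefreq and priority as hints, so a crawler may still choose its own schedule.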

Rollno:- 1624006

Parth Thakker

parth_kt — 2018-09-10 07:50:53 UTC

In order to increase the performance of the website, it is advisable to update the website at regular intervals.
There are two options. The first one is using the Fetch as Google option in Webmaster Tools. Here are detailed instructions:

Go to: https://www.google.com/webmasters/tools/ and log in
If you haven't already, add and verify the site with the "Add a Site" button
Click on the site name for the one you want to manage
Click Crawl -> Fetch as Google
Optional: if you want to do a specific page only, type in the URL
Click Fetch
Click Submit to Index
Select either "URL" or "URL and its direct links"
Click OK and you're done.

The second option is through robots.txt or sitemap.xml.

Ankit Ramani

2018-09-10 08:02:39 UTC

When the content of a page changes, the sitemaps record that change. The fact that content is updated is also reported to the bot through the robots.txt file. Crawlers can access the sitemaps to go through this updated content. Crawling is also majorly based on factors such as popularity, site meta and content appropriation.

Ashwinikumar,

viral_vora — 2018-09-10 08:08:08 UTC

The two most important factors regarding updates to the database are frequency and automation.

Do you need data to be live and constantly in sync with your other systems, or would daily or even weekly updates to the database be sufficient? Consider that in order to automate the update process, you will typically need a consistent data source, i.e. the field types, and the files supplied each time must be the same. You should consider how often source data is likely to change, if you are ever going to import additional data and if so how your chosen software will deal with this.
Roll no:1514122

Viraj Shah

2018-09-10 09:34:13 UTC

The two most important factors regarding updates to the database are frequency and automation.

Consider that in order to automate the update process, you will typically need a consistent data source, i.e. the field types, and the files supplied each time must be the same. You should consider how often source data is likely to change, if you are ever going to import additional data and if so how your chosen software will deal with this.

These are the most important factors to be considered when updates are made to the already present large indexed databases of the search engines.
Roll Number : 1514114

Saurabh Ughade

2018-09-10 09:42:54 UTC

Eliminating unnecessary processing. Eliminating redundant processing. Using more efficient processing in place of less efficient processing.

Roll No. 1514121

Aakash Zaveri

aakash_zaveri — 2018-09-17 09:04:14 UTC

Roll No:-1514125

The factors to be considered when updating the data can be specified in the robots.txt file, which tells crawlers which pages to crawl and which not to. That way only the pages where changes were made are crawled and the database is updated accordingly, saving time and memory.
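
A minimal sketch of how a crawler might apply such rules, using Python's standard `urllib.robotparser`. The rules, the paths and the helper name `allowed_paths` are invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def allowed_paths(robots_txt: str, paths: list[str], agent: str = "*") -> list[str]:
    """Return only the paths that the given user agent may crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if parser.can_fetch(agent, path)]


print(allowed_paths(ROBOTS_TXT, ["/news/today", "/archive/2017"]))
```

Here the old `/archive/` section is skipped, so crawl effort goes to the pages that are still allowed.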

Shreyash Sharma

shreyash_sharma — 2018-09-17 09:05:48 UTC

Initially, a web page can be given a “freshness” score based on its inception date, which decays over time. This freshness score may boost a piece of content for certain search queries, but degrades as the content becomes older. Whether the content is fresh or old therefore makes an important difference to the score of the website. Tools such as Google Webmaster Tools are used to change or update the data on the web. Duplication and redundancy in the data should also be avoided.
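
To make the idea of a decaying score concrete, here is a toy Python sketch. It assumes exponential decay with a 30-day half-life. Both the formula and the half-life are illustrative assumptions, not how any search engine is known to compute freshness.

```python
def freshness_score(age_days: float, half_life_days: float = 30.0) -> float:
    """Toy freshness score: 1.0 for new content, halving every half_life_days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


for age in (0, 30, 60):
    print(age, freshness_score(age))
```

Under this assumption, updating a page would reset its age and restore the score, which matches the point that fresh content gets a boost.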

Roll No : 1514115

Viral Vora

viral_vora — 2018-09-17 09:06:34 UTC

Web pages are bound to be updated and have their content changed on an almost regular basis, so one obviously wants the website to be crawled whenever the content changes. One way to achieve this is to add details in the robots.txt file. But one should make sure that the updated content is unique and fresh. Redundancy reduces the score of the webpage. One can also ask the crawler not to crawl the older content.
Roll No: 1514123