Saturday, September 20, 2008

Architecture of search engines:

Spider – a browser-like program that downloads web pages.

Crawler – a program that automatically follows all of the links on each web page.

Indexer – a program that analyzes web pages downloaded by the spider and the crawler.

Database – storage for downloaded and processed pages.

Results engine – extracts search results from the database.

Web server – a server that is responsible for interaction between the user and other search engine components.
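To make the division of labor more concrete, here is a minimal sketch in Python of how these components could fit together. It is not any real engine's code: the starting URL is a placeholder, the "database" is just an in-memory dictionary, and every component is stripped down to a few lines.

    # Minimal sketch: spider downloads pages, crawler extracts links,
    # indexer fills a simple word -> URLs "database", and a results
    # lookup reads it back. Placeholder URL; not production code.
    import re
    import urllib.request
    from collections import defaultdict

    def spider(url):
        # Spider: download the raw HTML of one page.
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8", errors="ignore")

    def crawler(html):
        # Crawler: collect the links on the page so they can be followed.
        return re.findall(r'href="(http[^"]+)"', html)

    def indexer(url, html, index):
        # Indexer: record which words appear on which pages.
        for word in re.findall(r"[a-z]+", html.lower()):
            index[word].add(url)

    index = defaultdict(set)            # Database: word -> set of URLs
    to_visit = ["http://example.com/"]  # hypothetical starting point
    seen = set()

    while to_visit and len(seen) < 5:   # stay polite: only a few pages
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        html = spider(url)
        indexer(url, html, index)
        to_visit.extend(crawler(html))

    # Results engine: look up which pages mention a query word.
    print(index.get("example", set()))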

How do Search Engines Work?

All search engines consist of three main parts:

  • The Spider (or worm)
  • The Index
  • The Search Algorithm.

The spider (or worm) continuously ‘crawls’ web space, following links that lead either to different websites or stay within the limits of the same website. The spider ‘reads’ each page’s content and passes the data to the index.

The index is the next stage of a search engine, after crawling. The index is a storage area for spidered web pages and is of a huge magnitude: Google’s index, for example, is said to consist of more than three billion pages.

The search algorithm is the third and most sophisticated step of a search engine system. It is a very complicated mechanism that sorts an immense database within a few seconds and produces the results list. The more relevant the search engine considers a web page, the nearer it appears to the top of the list. Site owners and webmasters should therefore pay close attention to their site’s relevancy to its target keywords.

The algorithm is unique to each and every search engine, and it is a trade secret, kept hidden from the public.

Most modern web search engines combine these systems to produce their results.
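As a toy illustration of the ranking step described above, the Python sketch below (with invented pages) simply scores each indexed page by how often the query terms appear in it and sorts the results list accordingly; real search algorithms weigh hundreds of additional signals.

    # Toy ranking: score pages by query-term frequency, most relevant first.
    # Real algorithms are far more complicated; this is only a sketch.
    def rank(pages, query):
        terms = query.lower().split()
        def score(text):
            words = text.lower().split()
            return sum(words.count(t) for t in terms)
        return sorted(pages, key=lambda p: score(pages[p]), reverse=True)

    pages = {
        "page-a": "seo tips and seo tools for webmasters",
        "page-b": "cooking recipes and kitchen tools",
    }
    print(rank(pages, "seo tools"))   # ['page-a', 'page-b']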

Wednesday, August 6, 2008

Hit
A hit is a somewhat misleading measure of traffic to a web site. One hit is recorded for each file request in a web server’s access log. If a user visits a page with four images, one hit will be recorded for each graphic image file plus another for the page’s HTML file. A better measure of traffic volume is the number of pages/HTML files accessed.
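To see the difference in practice, here is a small Python sketch that counts every request in an access log as a hit but only HTML requests (and bare directory requests) as page views. The file name and the Apache-style log format are assumptions, not a standard you must follow.

    # Count hits (every file request) vs. page views (HTML pages only)
    # from an Apache-style access log. File name/format are assumptions.
    hits = 0
    page_views = 0
    with open("access.log") as log:
        for line in log:
            parts = line.split('"')
            if len(parts) < 2:
                continue
            request = parts[1].split()      # e.g. ['GET', '/index.html', 'HTTP/1.1']
            path = request[1] if len(request) > 1 else ""
            hits += 1
            if path.endswith((".html", ".htm")) or path.endswith("/"):
                page_views += 1
    print(hits, "hits,", page_views, "page views")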

HTML
The acronym HTML stands for HyperText Markup Language, the authoring language used to create pages on the World Wide Web. HTML is a set of codes or HTML tags that provide a web browser with directions on how to structure a web page’s information and features.

Hyperlink
Also known as link or HTML link, a hyperlink is an image or portion of text that, when clicked by a user, opens another web page or jumps the browser to a different portion of the current page. Inbound Links with keyword-relevant Link Text are an important part of Search Engine Optimization Strategy.

Index
An index is a Search Engine’s database. It contains all of the information that a Crawler has identified, particularly copies of World Wide Web pages. When a user performs a Query, the search engine uses its indexed pages and Algorithm set to provide a ranked list of the most relevant pages. In the case of a Directory, the index consists of titles and summaries of registered sites that have been categorized by the directory’s editors.

Inbound Links
Also known as back link, backward link, or backlinks, inbound links are all of the links on other websites that direct the users who click on them to your site. Inbound links can significantly improve your site’s search rankings, particularly if they contain Anchor Text keywords relevant to your site and are located on sites with high Page Rank.

Monday, August 4, 2008

Google AdSense
Google AdSense is an ad-serving program operated by Google that provides relevant text, image, and video-based advertisements to enrolled site owners. Advertisers register via Google AdWords and pay for ads on a Pay-Per-Click, Cost-Per-Thousand or Cost-Per-Action basis. This revenue is shared with Google AdSense host sites, typically on a PPC basis (which sometimes leads to Click Fraud). Google uses its search Algorithms and Contextual Link Inventory to display the most appropriate ads based on site content, Query relevancy, ad “quality scores,” and other factors.

Google AdWords
Google AdWords is the Keyword Submission program that determines the advertising rates and keywords used in the Google AdSense program. Advertisers bid on the keywords that are relevant to their businesses. Ranked ads then appear as sponsored links on Google Search Engine Results Pages (SERPS) and Google AdSense host sites.

Graphical Search Inventory (GSI)
Graphical Search Inventory is the visual equivalent of Contextual Link Inventory. GSI is non-text-based advertising such as Banner Ads, pop-up ads, browser toolbars, animation, sound, video and other media that is synchronized to relevant Keyword queries.

Gray Hat SEO
Gray hat SEO refers to Search Engine Optimization strategies that fall in between Black Hat SEO and White Hat SEO. Gray hat SEO techniques can be legitimate in some cases and illegitimate in others. Such techniques include Doorway Pages, Gateway Pages, Cloaking and duplicate content.

Hidden Text
Hidden text is a generally obsolete form of Black Hat SEO in which pages are filled with a large amount of text that is the same color as the background, rendering keywords invisible to the human eye but detectable to a search engine Crawler. Multiple Title Tags or HTML comments are alternative hidden text techniques. Hidden text is easily detectable by search engines and will result in Blacklisting or reduced Rank.

Thursday, July 24, 2008

What type of content are you adding to your Web site?

Your Web content needs to be high-quality content that is
well-written and engaging. We talk a lot about the richness and
value of providing useful content that holds high interest for
your audience of visitors.

Tip: Remember the power of using nostalgia where appropriate.

By using nostalgia, you can often connect with your readers by
activating their positive memories and past experiences. After all,
regardless of what your online business is about, the Web is really
all about connecting with others.

Wednesday, July 23, 2008

Search Engine Optimization And Sitemaps Effects

Search Engine Optimization is the process of improving the amount of traffic that a website receives naturally from the search engines. A website gets traffic from search engines when it ranks high for its targeted keywords. A ranking in the search results is not permanent, as search engines frequently change their algorithms in order to provide the best search results, so you need to work on your site consistently to maintain and improve its rankings.

However, it can take a good amount of time to see the desired results, as there are already a large number of websites on the Internet and new ones are being launched at regular intervals. So you need to work consistently, without deviating from your target, because you're competing against a large number of websites.

On-page optimization and off-page optimization are the two forms of Search Engine Optimization, and both of them need to be considered while optimizing a website. In on-page optimization, you have control over the page and you modify its internal aspects in order to optimize it. In off-page optimization, you don't have control over the pages that link to your website.

There are a number of factors to consider while optimizing a website in order to improve its ranking. The title, keyword density, unique content, interlinking, anchor text, backlinks, and the sitemap are some of the key factors. Each factor has its own importance and needs to be used properly in order to rank high in the search results.

What's a Sitemap and what are its Benefits: Using a sitemap is one of the tricks that is usually underestimated when optimizing a website. If you're wondering what a sitemap is, it is simply a map of the site: a page that shows the different sections of the website, its articles, and how the different sections are linked together.

A sitemap is very important because it is used to communicate with search engines: an XML sitemap is meant for search engines, whereas an HTML sitemap is meant for human visitors. Sitemaps inform search engines about changes to your site, and this helps the changes get indexed faster than they would on a site without a sitemap. In addition to faster indexing, a sitemap also helps you find and fix broken internal links. A website without a sitemap can still achieve high rankings, as a sitemap is not a strict requirement for achieving them, but a regularly updated sitemap helps improve rankings at a better rate than having none at all.

Now, if you're wondering how a sitemap is created and where it is placed: you can use sitemap generator tools to generate a sitemap for your website. Once the sitemap is ready, you need to upload it to your server. Before uploading it, make sure the sitemap is absolutely correct, as an improper sitemap can cause de-indexing of the website. You can use online tools to check whether the sitemap has been created properly.
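If you would rather script it than rely on an online generator, the short Python sketch below writes a minimal sitemap in the standard sitemaps.org XML format. The URLs are placeholders for your own pages.

    # Write a minimal sitemap.xml in the sitemaps.org format.
    # The URLs below are placeholders for your own pages.
    urls = [
        "http://www.example.com/",
        "http://www.example.com/about.html",
        "http://www.example.com/articles/seo-basics.html",
    ]

    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in urls:
        lines.append("  <url><loc>%s</loc></url>" % url)
    lines.append("</urlset>")

    with open("sitemap.xml", "w") as f:
        f.write("\n".join(lines))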

Also, adding a link to the sitemap from the website's pages helps improve the rate at which the sitemap is crawled by search engine spiders. You should also add the sitemap to your Google Webmaster account, as this decreases your reliance on external links for improving the indexing rate. So, a sitemap is a very important aspect that should be given proper consideration while optimizing a website.

Saturday, July 19, 2008

Dynamic Content
Dynamic content is web content, such as Search Engine Results Pages (SERPS), that is generated or changed based on database information or user activity. Web pages that remain the same for all visitors in every context contain “static content.” Many e-commerce sites create dynamic content based on purchase history and other factors. Search engines have a difficult time indexing dynamic content if the page includes a session ID number, and will typically ignore URLs that contain the “?” variable marker. Search engines will punish sites that use deceptive or invasive means to create dynamic content.

Flash Optimization
Flash is a vector graphics-based animation program developed by Macromedia. Most corporate sites feature Flash movies/animation, yet because search engine Crawlers were designed to index HTML text, sites that favor Flash over text are difficult or even impossible for crawlers to read. Flash Optimization is the process of reworking the Flash movie and surrounding HTML code to be more “crawlable” for Search Engines.

Gateway Page
Also known as a doorway page or jump page, a gateway page is a URL with minimal content designed to rank highly for a specific keyword and redirect visitors to a homepage or designated Landing Page. Some search engines frown on gateway pages as a softer form of Cloaking or Spam. However, gateway pages may be legitimate landing pages designed to measure the success of a promotional campaign, and they are commonly allowed in Paid Listings.

Geographical Targeting
Geographical targeting is the focusing of Search Engine Marketing on states, counties, cities and neighborhoods that are important to a company’s business. One basic aspect of geographical targeting is adding the names of relevant cities or streets to a site’s keywords, e.g. Hyde Street, Chicago.

Geographic Segmentation
Geographic segmentation is the use of Analytics to categorize a site’s web traffic by the physical locations from which it originated.

Wednesday, July 9, 2008

Crawler
Also known as Spider or Robot, a crawler is a search engine program that “crawls” the web, collecting data, following links, making copies of new and updated sites, and storing URLs in the search engine’s Index. This allows search engines to provide faster and more up-to-date listings.

Delisted
Also known as banned or blacklisted, a delisted site is a URL that has been removed from a search engine’s Index, typically for engaging in Black Hat SEO. Delisted sites are ignored by search engines.

Description Tag
Also known as a meta description tag, a description tag is a short HTML paragraph that provides search engines with a description of a page’s content for search engine Index purposes. The description tag is not displayed on the website itself, and may or may not be displayed in the search engine’s listing for that site. Search engines are now giving less importance to description tags in favor of actual page content.
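For example, a crawler-style script can read the tag straight out of a page's HTML. This sketch uses Python's standard html.parser module and a hypothetical local file name.

    # Pull the content of the meta description tag out of a page's HTML,
    # roughly the way an indexer would. "page.html" is a placeholder.
    from html.parser import HTMLParser

    class DescriptionFinder(HTMLParser):
        description = None
        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name", "").lower() == "description":
                self.description = attrs.get("content")

    finder = DescriptionFinder()
    finder.feed(open("page.html").read())
    print(finder.description)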

Directory
A directory is an Index of websites compiled by people rather than a Crawler. Directories can be general or divided into specific categories and subcategories. A directory’s servers provide relevant lists of registered sites in response to user queries. Directory Registration is thus an important method for building inbound links and improving SEO performance. However, the decision to include a site and its directory rank or categorization is determined by directory editors rather than an Algorithm. Some directories accept free submissions while others require payment for listing. The most popular directories include Yahoo!, The Open Directory Project, and LookSmart.

Doorway Page
Also known as a gateway page or jump page, a doorway page is a URL with minimal content designed to rank highly for a specific keyword and redirect visitors to a homepage or designated Landing Page. Some search engines frown on doorway pages as a softer form of Cloaking or Spam. However, doorway pages may be legitimate landing pages designed to measure the success of a promotional campaign, and they are commonly allowed in Paid Listings.

Friday, July 4, 2008

Cost-Per-Acquisition (CPA)
Cost-per-acquisition (CPA) is a return on investment model in which return is measured by dividing total click/marketing costs by the number of Conversions achieved. Total acquisition costs ÷ number of conversions = CPA. CPA is also used as a synonym for Cost-Per-Action.
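A quick worked example of that formula, with made-up numbers:

    # CPA = total acquisition costs / number of conversions (figures invented).
    total_cost = 500.00    # total click/marketing spend in dollars
    conversions = 25       # purchases, sign-ups, etc.
    cpa = total_cost / conversions
    print("CPA: $%.2f" % cpa)   # CPA: $20.00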

Cost-Per-Action (CPA)
In a cost-per-action advertising revenue system, advertisers are charged a Conversion-based fee, i.e. each time a user buys a product, opens an account, or requests a free trial. CPA is also known as cost-per-acquisition, though the term cost-per-acquisition can be confusing because it also refers to a return on investment model.

Cost-Per-Click (CPC)
Also known as pay-per-click or pay-for-performance, cost-per-click is an advertising revenue system used by search engines and ad networks in which advertising companies pay an agreed amount for each click of their ads. This Click-Through Rate-based payment structure is considered by some advertisers to be more cost-effective than the Cost-Per-Thousand payment structure, but it can at times lead to Click Fraud.

Cost-Per-Thousand (CPM)
Also known as cost-per-impression or CPM for cost-per-mille (mille is the Latin word for thousand), cost-per-thousand is an advertising revenue system used by search engines and ad networks in which advertising companies pay an agreed amount for every 1,000 users who see their ads, regardless of whether a click-through or conversion is achieved. CPM is typically used for Banner Ad sales, while Cost-Per-Click is typically used for text link advertising.
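To see how the two billing models compare, here is a small sketch with invented figures showing what the same campaign would cost under CPM versus CPC pricing:

    # Compare CPM and CPC billing for the same (made-up) campaign.
    impressions = 100000
    clicks = 1500             # roughly a 1.5% click-through rate
    cpm_rate = 5.00           # dollars per 1,000 impressions
    cpc_rate = 0.40           # dollars per click

    cpm_cost = impressions / 1000 * cpm_rate   # 100 * 5.00  = $500.00
    cpc_cost = clicks * cpc_rate               # 1500 * 0.40 = $600.00
    print("CPM billing: $%.2f, CPC billing: $%.2f" % (cpm_cost, cpc_cost))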