ChataTechnology: Information

Tips for Your Web

5 Web Files That Will Improve Your Website

The amount of code that developers encounter regularly is staggering. At any one time, a single site can make use of five or more web languages and technologies (e.g. MySQL, PHP, JavaScript, CSS, HTML). There are also a number of lesser-known and underused ways to enhance your site with a few simple but powerful files. This article highlights five of these unsung heroes. They’re easy to use and understand, which makes them great additions to the websites you deploy or currently run.

An Overview

Which files are we going to examine (and produce)? Deciding which files to cover was not an easy task, and there are many other files you can implement to give your website a boost (such as .htaccess, which we won’t cover). The files I’ll talk about here were chosen for their usefulness as well as their ease of implementation: maximum bang for our buck. We’re going to cover robots.txt, favicon.ico, sitemap.xml, dublin.rdf and opensearch.xml. Their purposes range from helping search engines index your site accurately to acting as usability and interoperability aids. Let’s start with the most familiar one: robots.txt.

Robots.txt

The primary function of a robots.txt file is to declare which parts of your site should be off-limits for crawling. By definition, this file is an opt-out mechanism. If your website has no robots.txt file, then by default the whole site is fair game for web robots, such as search engine crawlers, to access and index. While you can also state exclusion rules within an individual HTML document through a robots meta tag, controlling excluded pages through a single text file is easier to maintain.

Note: Obeying the robots.txt file isn’t mandatory for crawlers, so it’s not a good privacy mechanism.

Figure: How the robots.txt file mediates between a search engine and your website.
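To illustrate the opt-out behavior, here is a minimal sketch using Python's standard urllib.robotparser module. It assumes a hypothetical crawler name (ExampleBot) and a placeholder URL on example.com; an empty rule set stands in for a site that publishes no rules at all.

```python
from urllib.robotparser import RobotFileParser


def allowed_without_rules(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return whether user_agent may fetch url when no rules are declared.

    "ExampleBot" is a hypothetical crawler name used for illustration only.
    """
    parser = RobotFileParser()
    # Parsing an empty list of lines models a site with no exclusion rules.
    parser.parse([])
    return parser.can_fetch(user_agent, url)


# With nothing disallowed, every path is open to crawling.
print(allowed_without_rules("https://www.example.com/any/page.html"))
```

The check only reflects what a well-behaved crawler would do; nothing in it prevents a robot from ignoring the file.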
Creating a Robots.txt File

To create a robots.txt file, the first and most obvious thing you will need is a text editor. The file must be named robots.txt (or it won’t work), and it needs to sit in the root directory of your website, because that’s where web robots look for it.

The next thing to do is work out a list of instructions for the search engine spiders to follow. In many ways, the structure of robots.txt is similar to CSS in that it is composed of field and value pairs that dictate rules. You can also include comments in your robots.txt file by starting them with the # (hash) character. This is handy for documenting your work.

Here’s a basic example telling web robots not to crawl the /members/ and /private/ directories:

    User-agent: *
    Disallow: /members/
    Disallow: /private/

The original robots.txt exclusion standard has only two directives (there are also a few non-standard directives, like Crawl-delay, which we’ll cover shortly). The first standard directive is User-agent. Each group of rules in a robots.txt file should begin by declaring a User-agent value that states which web robots (e.g. search crawlers) the rules apply to. Using * as the value of User-agent indicates that all web robots should follow the rules; * acts as a wildcard match.

The Disallow directive points to the locations on your server that shouldn’t be accessed. It can point to a directory (e.g. /myprivatefolder/) or to a particular file (e.g. /myfolder/folder1/myprivatefile.html).

There is a specification for robots.txt, but the rules and syntax are exceptionally simple.
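As a hedged sketch of how a polite crawler might apply the basic rules above, the Python snippet below derives the robots.txt location from a page URL and checks two paths against the example file. The example.com host, the ExampleBot name and the helper function names are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# The basic example from the article, with a comment line added.
BASIC_RULES = """\
# Keep crawlers out of member-only and private areas
User-agent: *
Disallow: /members/
Disallow: /private/
"""


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL at the root of the site hosting page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(BASIC_RULES)
print(robots_url("https://www.example.com/blog/post.html"))
print(parser.can_fetch("ExampleBot", "https://www.example.com/members/profile.html"))
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))
```

Because the rules use User-agent: *, the same answers apply to any crawler name passed to can_fetch.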
Robots.txt Non-Standard Directives

While listing the crawlers and files you want excluded is useful, there are a few extensions to the original robots.txt specification that further boost its value to you and your website. Although these directives were not part of the original standard, the major search crawlers recognize most of them. Support does vary by crawler, however (Google, for example, ignores Crawl-delay), so check each crawler’s documentation.

Some of the more popular non-standard directives are:

    Sitemap: where your sitemap.xml file is located
    Allow: the opposite of Disallow
    Crawl-delay: the number of seconds spiders should wait between requests to your server

There are other, less widely supported directives such as Visit-time, which restricts web robots to crawling your site only between certain hours of the day.

Here’s an example of a more complex robots.txt file using non-standard directives:
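As a hedged sketch of how these directives can combine, the Python snippet below embeds a hypothetical robots.txt file and reads it with the standard urllib.robotparser module. The paths, the ExampleBot name and the sitemap URL are illustrative assumptions, not taken from a real site. Python's parser applies the first matching rule, so the Allow line is placed before the broader Disallow line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file combining standard and non-standard directives.
EXTENDED_RULES = """\
User-agent: *
Crawl-delay: 10
Allow: /private/press-kit.html
Disallow: /private/
Disallow: /members/

Sitemap: https://www.example.com/sitemap.xml
"""

extended = RobotFileParser()
extended.parse(EXTENDED_RULES.splitlines())

# Seconds a crawler is asked to wait between requests.
print(extended.crawl_delay("ExampleBot"))

# Sitemap locations declared in the file (site_maps() needs Python 3.8+).
print(extended.site_maps())

# The Allow rule opens one file inside an otherwise disallowed directory.
print(extended.can_fetch("ExampleBot", "https://www.example.com/private/press-kit.html"))
print(extended.can_fetch("ExampleBot", "https://www.example.com/private/notes.html"))
```

Other crawlers may resolve overlapping Allow and Disallow rules differently, so it is worth testing a file like this against the crawlers you care about.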
