Man page - libwget-robots(3)
NAME
libwget-robots - Robots Exclusion file parser
SYNOPSIS
Data Structures
struct wget_robots_st
Macros
#define parse_record_field(d, f) parse_record_field(d, f, sizeof(f) - 1)
Functions
int wget_robots_parse (wget_robots **_robots, const char *data, const char *client)
void wget_robots_free (wget_robots **robots)
int wget_robots_get_path_count (wget_robots *robots)
wget_string * wget_robots_get_path (wget_robots *robots, int index)
int wget_robots_get_sitemap_count (wget_robots *robots)
const char * wget_robots_get_sitemap (wget_robots *robots, int index)
Detailed Description
The purpose of this set of functions is to parse a Robots Exclusion Standard file into a data structure for easy access.
Function Documentation
int wget_robots_parse (wget_robots ** _robots, const char * data, const char * client)
Parameters
_robots Pointer to a wget_robots pointer that receives the allocated structure
data Memory with robots.txt content (with trailing 0-byte)
client Name of the client / user-agent
Returns
Returns an int status code. The allocated wget_robots structure is handed back through _robots rather than as the return value.
The function parses the robots.txt data in accordance with https://www.robotstxt.org/orig.html#format and produces a wget_robots structure that holds a list of the disallowed paths and a list of the sitemap files.
The wget_robots structure has to be freed by calling wget_robots_free().
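To make the format concrete, here is a minimal standalone sketch of the kind of line scan such a parser performs. It honors the first record whose User-agent value is * or starts with the client name, collects that record's Disallow paths, and collects Sitemap lines anywhere in the file. This is not libwget's implementation: robots_scan, scan_result, field_value, copy_token and the fixed-size arrays are hypothetical names chosen for illustration, and the sketch assumes one User-agent line per record.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 16   /* illustrative fixed capacity */
#define MAX_LEN   128  /* illustrative maximum value length */

/* Hypothetical result container; not a libwget type. */
struct scan_result {
	char paths[MAX_ITEMS][MAX_LEN];
	char sitemaps[MAX_ITEMS][MAX_LEN];
	int npaths, nsitemaps;
};

/* If line starts with field (ignoring case), return the value that
 * follows any blanks; otherwise return NULL. */
static const char *field_value(const char *line, const char *field)
{
	for (; *field; line++, field++)
		if (tolower((unsigned char) *line) != tolower((unsigned char) *field))
			return NULL;
	while (*line == ' ' || *line == '\t')
		line++;
	return line;
}

/* Copy the value up to the next whitespace, truncating to MAX_LEN - 1. */
static void copy_token(char *dst, const char *src)
{
	size_t n = strcspn(src, " \t\r\n");
	if (n >= MAX_LEN)
		n = MAX_LEN - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}

static void robots_scan(const char *data, const char *client, struct scan_result *out)
{
	int state = 0; /* 0 = looking for our record, 1 = collecting, 2 = done */
	const char *v;

	memset(out, 0, sizeof(*out));

	while (data && *data) {
		if (state < 2 && (v = field_value(data, "User-agent:"))) {
			if (state == 1)
				state = 2; /* next record starts: stop collecting */
			else if (*v == '*' || (client && *client && field_value(v, client)))
				state = 1; /* wildcard, or value starts with the client name */
		} else if (state == 1 && (v = field_value(data, "Disallow:"))) {
			if (*v == '\0' || *v == '\r' || *v == '\n') {
				out->npaths = 0; /* empty value: nothing is disallowed */
				state = 2;
			} else if (out->npaths < MAX_ITEMS) {
				copy_token(out->paths[out->npaths++], v);
			}
		} else if ((v = field_value(data, "Sitemap:"))) {
			if (out->nsitemaps < MAX_ITEMS)
				copy_token(out->sitemaps[out->nsitemaps++], v);
		}
		data = strchr(data, '\n'); /* advance to the next line */
		if (data)
			data++;
	}
}
```

The fixed-size arrays keep the sketch short; a real parser would grow its lists dynamically and report allocation failures.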
void wget_robots_free (wget_robots ** robots)
Parameters
robots Pointer to pointer to wget_robots structure
wget_robots_free() frees the formerly allocated wget_robots structure.
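A pointer-to-pointer parameter lets a free function clear the caller's handle as well as release the memory. The standalone sketch below shows that general idiom only; demo_obj and demo_free are hypothetical stand-ins, and whether wget_robots_free() itself resets the caller's pointer is an assumption that this page does not confirm.

```c
#include <stdlib.h>

/* Hypothetical object standing in for an opaque library structure. */
struct demo_obj {
	int value;
};

/* Release *obj and reset the caller's pointer.
 * Safe to call with NULL and safe to call twice on the same handle. */
static void demo_free(struct demo_obj **obj)
{
	if (obj && *obj) {
		free(*obj);
		*obj = NULL;
	}
}
```

With this shape, a second call on the same handle is a harmless no-op instead of a double free.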
int wget_robots_get_path_count (wget_robots * robots)
Parameters
robots Pointer to instance of wget_robots
Returns
Returns the number of paths listed in robots
wget_string * wget_robots_get_path (wget_robots * robots, int index)
Parameters
robots Pointer to instance of wget_robots
index Index of the wanted path
Returns
Returns the path at index or NULL
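The wget_string type is declared elsewhere in libwget and is not described on this page. Assuming it is a pointer-plus-length pair (the member names p and len below are an assumption), a caller would typically test a URL path against each returned entry by prefix, since the original robots.txt format disallows any URL that starts with a listed value. The sketch uses a stand-in type, path_view, and a hypothetical helper, path_is_disallowed, so that it compiles without the library.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for wget_string; the member names p and len are an assumption. */
struct path_view {
	const char *p;
	size_t len;
};

/* Return 1 if url_path begins with the disallowed prefix, else 0. */
static int path_is_disallowed(const char *url_path, const struct path_view *rule)
{
	if (!url_path || !rule || !rule->p)
		return 0;
	return strlen(url_path) >= rule->len
		&& memcmp(url_path, rule->p, rule->len) == 0;
}
```

Comparing with an explicit length avoids relying on the entry being 0-terminated.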
int wget_robots_get_sitemap_count (wget_robots * robots)
Parameters
robots Pointer to instance of wget_robots
Returns
Returns the number of sitemaps listed in robots
const char * wget_robots_get_sitemap (wget_robots * robots, int index)
Parameters
robots Pointer to instance of wget_robots
index Index of the wanted sitemap URL
Returns
Returns the sitemap URL at index or NULL
Author
Generated automatically by Doxygen for wget2 from the source code.