Global Names Services¶
The gn module provides access to Global Names services including the Global Names Index (GNI) and Global Names Resolver (GNR).
::: pytaxize.gn
Overview¶
Global Names services provide comprehensive tools for taxonomic name recognition, parsing, and resolution. The gn module includes two main sub-modules:
- GNI (Global Names Index): Search and parse scientific names
- GNR (Global Names Resolver): Resolve names against multiple data sources
Sub-modules¶
Global Names Index (GNI)¶
::: pytaxize.gn.gni
The Global Names Index provides functions for parsing and searching scientific names.
parse¶
::: pytaxize.gn.gni.parse
Parse scientific names to extract their components.
Parameters:
- names (list): List of scientific names to parse
Returns: - List of dictionaries containing parsed name components
Examples:
from pytaxize import gn
# Parse scientific names
names = ['Cyanistes caeruleus', 'Helianthus annuus', 'Homo sapiens']
results = gn.gni.parse(names=names)
for result in results:
print(f"Name: {result.get('scientificName', 'Unknown')}")
print(f"Genus: {result.get('genus', 'Unknown')}")
print(f"Species: {result.get('species', 'Unknown')}")
search¶
::: pytaxize.gn.gni.search
Search the Global Names Index for names matching a pattern.
Parameters:
- search_term (str): Search pattern (supports wildcards). Default: "ani*"
- per_page (int): Results per page. Default: 30
- page (int): Page number to retrieve. Default: 1
Returns: - Dictionary containing search results and metadata
Examples:
from pytaxize import gn
# Search for names starting with 'Quer'
results = gn.gni.search(search_term="Quer*", per_page=10)
print(f"Found {len(results.get('names', []))} results")
for name in results.get('names', []):
print(f" {name}")
details¶
::: pytaxize.gn.gni.details
Get detailed information about a specific name string ID.
Parameters:
- id (int): Name string ID. Default: 17802847
- all_records (int): Include all records (1) or just best (0). Default: 1
Returns: - Dictionary containing detailed name information
Examples:
from pytaxize import gn
# Get details for a specific name ID
details = gn.gni.details(id=17802847)
print(f"Name: {details.get('name', 'Unknown')}")
print(f"Canonical: {details.get('canonical', 'Unknown')}")
Global Names Resolver (GNR)¶
::: pytaxize.gn.gnr
The Global Names Resolver resolves scientific names against multiple taxonomic data sources.
resolve¶
::: pytaxize.gn.gnr.resolve
Resolve scientific names against taxonomic databases.
Parameters:
- names (str or list): Scientific name(s) to resolve. Default: "Homo sapiens"
- source (str, optional): Specific data source to use
- format (str): Response format. Default: "json"
- resolve_once (str): Resolve each name only once. Default: "false"
- with_context (str): Include context information. Default: "false"
- best_match_only (str): Return only best matches. Default: "false"
- header_only (str): Return headers only. Default: "false"
- preferred_data_sources (str): Use preferred data sources. Default: "false"
- http (str): HTTP method to use. Default: "get"
Returns: - Dictionary containing resolution results
Examples:
from pytaxize import gn
# Resolve a single name
result = gn.gnr.resolve(names="Homo sapiens")
print(result)
# Resolve multiple names
names = ["Quercus alba", "Pinus strobus", "Acer saccharum"]
results = gn.gnr.resolve(names=names)
for name_result in results.get('data', []):
supplied_name = name_result.get('supplied_name_string')
print(f"\nSupplied: {supplied_name}")
for result in name_result.get('results', []):
matched_name = result.get('canonical_form')
data_source = result.get('data_source_title')
print(f" Match: {matched_name} (Source: {data_source})")
datasources¶
::: pytaxize.gn.gnr.datasources
Get information about available data sources in GNR.
Returns: - List of dictionaries containing data source information
Examples:
from pytaxize import gn
# Get all available data sources
sources = gn.gnr.datasources()
print(f"Found {len(sources)} data sources:")
for source in sources[:10]: # Show first 10
print(f" {source['id']}: {source['title']}")
Usage Examples¶
Basic Name Resolution Workflow¶
from pytaxize import gn
def resolve_species_names(species_list):
"""Resolve a list of species names using GNR"""
# First, resolve against all data sources
resolution = gn.gnr.resolve(names=species_list)
results = {}
for name_result in resolution.get('data', []):
supplied_name = name_result.get('supplied_name_string')
results[supplied_name] = {
'matches': [],
'best_match': None
}
# Process all matches
for match in name_result.get('results', []):
match_info = {
'canonical_form': match.get('canonical_form'),
'data_source': match.get('data_source_title'),
'score': match.get('score', 0)
}
results[supplied_name]['matches'].append(match_info)
# Find best match (highest score)
if results[supplied_name]['matches']:
best = max(results[supplied_name]['matches'],
key=lambda x: x['score'])
results[supplied_name]['best_match'] = best
return results
# Example usage
species = ["Quercus alba", "Homo sapiens", "Invalid species"]
resolved = resolve_species_names(species)
for name, info in resolved.items():
print(f"\nName: {name}")
if info['best_match']:
best = info['best_match']
print(f" Best match: {best['canonical_form']}")
print(f" Source: {best['data_source']}")
print(f" Score: {best['score']}")
else:
print(" No matches found")
Name Parsing and Validation¶
from pytaxize import gn
def validate_and_parse_names(names_list):
"""Parse and validate scientific names"""
# Parse the names first
parsed = gn.gni.parse(names=names_list)
validation_results = []
for i, name in enumerate(names_list):
parse_result = parsed[i] if i < len(parsed) else {}
# Check if parsing was successful
is_parsed = bool(parse_result.get('parsed'))
result = {
'original': name,
'parsed_successfully': is_parsed,
'canonical': parse_result.get('canonical'),
'genus': parse_result.get('genus'),
'species': parse_result.get('species'),
'authorship': parse_result.get('authorship')
}
# Additional validation
if is_parsed:
# Check if it's a binomial name
parts = result['canonical'].split() if result['canonical'] else []
result['is_binomial'] = len(parts) == 2
result['is_valid_format'] = result['is_binomial'] and result['genus'] and result['species']
else:
result['is_binomial'] = False
result['is_valid_format'] = False
validation_results.append(result)
return validation_results
# Example usage
test_names = [
"Homo sapiens", # Valid binomial
"Quercus alba L.", # Valid with authority
"Quercus", # Genus only
"invalid name format", # Invalid
"Canis lupus familiaris" # Trinomial
]
validated = validate_and_parse_names(test_names)
for result in validated:
print(f"\nOriginal: {result['original']}")
print(f"Valid format: {result['is_valid_format']}")
if result['canonical']:
print(f"Canonical: {result['canonical']}")
if result['authorship']:
print(f"Author: {result['authorship']}")
Finding Data Sources¶
from pytaxize import gn
def find_relevant_datasources(keywords):
"""Find data sources relevant to specific keywords"""
all_sources = gn.gnr.datasources()
relevant_sources = []
for source in all_sources:
title = source.get('title', '').lower()
description = source.get('description', '').lower()
# Check if any keyword appears in title or description
for keyword in keywords:
if keyword.lower() in title or keyword.lower() in description:
relevant_sources.append({
'id': source['id'],
'title': source['title'],
'description': source.get('description', 'No description')
})
break
return relevant_sources
# Find sources for specific groups
plant_sources = find_relevant_datasources(['plant', 'flora', 'botanical'])
animal_sources = find_relevant_datasources(['animal', 'fauna', 'zoological'])
print("Plant-related data sources:")
for source in plant_sources[:5]:
print(f" {source['id']}: {source['title']}")
print("\nAnimal-related data sources:")
for source in animal_sources[:5]:
print(f" {source['id']}: {source['title']}")
Targeted Resolution with Specific Sources¶
from pytaxize import gn
def resolve_with_specific_sources(names, source_ids):
"""Resolve names against specific data sources"""
# Convert source IDs to string format required by API
source_string = '|'.join(map(str, source_ids))
# Resolve with specific sources
results = gn.gnr.resolve(
names=names,
preferred_data_sources=source_string,
best_match_only="true"
)
resolved_names = {}
for name_result in results.get('data', []):
supplied_name = name_result.get('supplied_name_string')
matches = name_result.get('results', [])
if matches:
best_match = matches[0] # Since we requested best_match_only
resolved_names[supplied_name] = {
'matched_name': best_match.get('canonical_form'),
'source': best_match.get('data_source_title'),
'source_id': best_match.get('data_source_id'),
'score': best_match.get('score')
}
else:
resolved_names[supplied_name] = None
return resolved_names
# Example: Resolve against ITIS and Catalogue of Life only
species = ["Quercus alba", "Pinus strobus"]
itis_col_sources = [3, 1] # Example source IDs for ITIS and COL
resolved = resolve_with_specific_sources(species, itis_col_sources)
for name, match in resolved.items():
print(f"\nName: {name}")
if match:
print(f" Resolved to: {match['matched_name']}")
print(f" Source: {match['source']}")
print(f" Score: {match['score']}")
else:
print(" No match found in specified sources")
Error Handling¶
from pytaxize import gn
from pytaxize.gn.gni import NoResultError
def safe_name_resolution(names):
"""Safely resolve names with comprehensive error handling"""
results = {
'successful': [],
'failed': [],
'errors': []
}
try:
# Try to resolve names
resolution = gn.gnr.resolve(names=names)
if resolution and 'data' in resolution:
for name_result in resolution['data']:
name = name_result.get('supplied_name_string')
if name_result.get('results'):
results['successful'].append(name)
else:
results['failed'].append(name)
results['resolution_data'] = resolution
else:
results['errors'].append("No data returned from API")
except NoResultError as e:
results['errors'].append(f"No results error: {e}")
except Exception as e:
results['errors'].append(f"Unexpected error: {e}")
return results
# Test error handling
test_names = ["Valid name", "Invalid name", "Homo sapiens"]
results = safe_name_resolution(test_names)
print(f"Successful: {len(results['successful'])}")
print(f"Failed: {len(results['failed'])}")
if results['errors']:
print(f"Errors: {results['errors']}")
API Information¶
Global Names Index (GNI)¶
- Base URL:
http://gni.globalnames.org/ - Rate Limits: No explicit limits, but please be respectful
- Authentication: None required
- Coverage: Comprehensive index of scientific names
Global Names Resolver (GNR)¶
- Base URL:
http://resolver.globalnames.org/ - Rate Limits: No explicit limits, but please be respectful
- Authentication: None required
- Data Sources: 100+ taxonomic databases
Notes and Limitations¶
- Network Dependency: Requires internet connection for all functions
- Response Time: Large queries may take time to process
- Data Quality: Results depend on quality of underlying data sources
- Name Variants: May not catch all spelling variations or synonyms
- Taxonomic Coverage: Coverage varies by taxonomic group
Best Practices¶
- Batch Processing: Process multiple names at once when possible
- Error Handling: Always include try-catch blocks for network operations
- Result Validation: Check scores and multiple sources for important names
- Caching: Cache results to avoid repeated API calls
- Rate Limiting: Add delays for large batch operations
Related Functions¶
tax.scrapenames: Extract names from text using Global Names servicesIds: Get taxonomic IDs which can be cross-referenced with GN resultsscicomm: Convert between scientific and common names