Tax Module¶
The tax module provides core taxonomic functions including random name generation, taxonomic name extraction, and search capabilities.
::: pytaxize.tax
Functions¶
names_list¶
::: pytaxize.tax.names_list
Generate random taxonomic names for testing and examples.
Parameters:
rank(str): Taxonomic rank - one of 'species', 'genus', 'family', 'order'. Default: 'genus'size(int): Number of names to return. Default: 10as_dataframe(bool): Return as pandas DataFrame. Default: False
Returns:
- List of taxonomic names or DataFrame if
as_dataframe=True
Examples:
from pytaxize import tax
# Get 10 random genus names
genera = tax.names_list(size=10)
# Get species names
species = tax.names_list('species', size=5)
# Get as DataFrame
df = tax.names_list('family', size=20, as_dataframe=True)
vascan_search¶
::: pytaxize.tax.vascan_search
Search the CANADENSYS Vascan API for Canadian plant taxa.
Parameters:
q(str or list): Query term(s) - taxonomic names to search forformat(str): Response format - 'json' or 'xml'. Default: 'json'raw(bool): Return raw response. Default: False
Returns:
- Dictionary containing search results or raw response if
raw=True
Examples:
from pytaxize import tax
# Single name search
result = tax.vascan_search(q=["Helianthus annuus"])
# Multiple names
result = tax.vascan_search(q=["Helianthus annuus", "Crataegus dodgei"])
# Raw response
raw_result = tax.vascan_search(q=["Helianthus annuus"], raw=True)
# XML format
xml_result = tax.vascan_search(q=["Helianthus annuus"], format="xml")
scrapenames¶
::: pytaxize.tax.scrapenames
Extract taxonomic names from text, URLs, or files using the Global Names Recognition and Discovery service.
Parameters:
url(str, optional): URL to a web page, PDF, or documentfile(str, optional): Path to a local filetext(str, optional): Text content to analyzeengine(int, optional): Recognition engine - 0 (both), 1 (TaxonFinder), 2 (NetiNeti). Default: 0unique(bool, optional): Return unique names only. Default: Trueverbatim(bool, optional): Include verbatim strings. Default: Falsedetect_language(bool, optional): Enable language detection. Default: Trueall_data_sources(bool, optional): Resolve against all data sources. Default: Falsedata_source_ids(str, optional): Pipe-separated list of data source IDsas_dataframe(bool, optional): Return as DataFrame. Default: False
Returns:
- Dictionary with 'meta' and 'data' keys containing metadata and extracted names
Examples:
from pytaxize import tax
# Extract names from a website
result = tax.scrapenames(url='https://en.wikipedia.org/wiki/Spider')
print(result['data']) # Extracted names
print(result['meta']) # Metadata
# Extract from PDF
pdf_result = tax.scrapenames(
url='http://www.example.com/paper.pdf'
)
# Extract from text
text_result = tax.scrapenames(
text='A spider named Pardosa moesta Banks, 1892'
)
# With additional options
detailed_result = tax.scrapenames(
url='https://en.wikipedia.org/wiki/Spider',
unique=True,
all_data_sources=True,
as_dataframe=True
)
Helper Functions¶
names_list_helper¶
Internal helper function for names_list(). Not intended for direct use.
Usage Examples¶
Getting Test Data¶
from pytaxize import tax
# Get different taxonomic ranks
species = tax.names_list('species', 10)
genera = tax.names_list('genus', 10)
families = tax.names_list('family', 10)
orders = tax.names_list('order', 10)
print(f"Species: {species}")
print(f"Genera: {genera}")
print(f"Families: {families}")
print(f"Orders: {orders}")
Text Mining for Names¶
from pytaxize import tax
# Analyze a scientific paper
paper_url = "https://example.com/biodiversity-paper.pdf"
names = tax.scrapenames(url=paper_url)
# Extract unique scientific names
scientific_names = [item['scientificName'] for item in names['data']
if 'scientificName' in item]
print(f"Found {len(scientific_names)} scientific names")
Canadian Flora Search¶
from pytaxize import tax
# Search for Canadian plants
canadian_plants = ["Acer saccharum", "Betula papyrifera", "Picea glauca"]
results = tax.vascan_search(q=canadian_plants)
for plant in results:
print(f"Plant: {plant}")
Data Sources¶
names_list Data Sources¶
- Species names: Plant species from botanical databases
- Genus names: Plant genera
- Family names: APG (Angiosperm Phylogeny Group) families
- Order names: APG orders
vascan_search Data Source¶
- CANADENSYS Vascan: Database of Canadian vascular plants
- API endpoint:
http://data.canadensys.net/vascan/api/
scrapenames Data Source¶
- Global Names Recognition and Discovery:
https://finder.globalnames.org/ - Supports multiple recognition engines and data sources
- Can process various document formats (HTML, PDF, images, etc.)
Error Handling¶
from pytaxize import tax
try:
# This might fail if the service is down
result = tax.scrapenames(url="https://invalid-url.com")
except Exception as e:
print(f"Error occurred: {e}")
# Always check for results
result = tax.vascan_search(q=["Nonexistent species"])
if result and 'results' in result:
print("Found results")
else:
print("No results found")
Notes¶
- The
scrapenamesfunction requires an internet connection - Large documents may take time to process
- Some functions may have rate limits - avoid rapid successive calls
- Results may vary depending on the quality and format of input text/documents