vendor: replace third_party/nature-id gitlink with tracked files

NODA1 System
2026-02-21 11:00:42 +01:00
parent a91309de11
commit 69486a92be
18 changed files with 1376 additions and 1 deletions

Submodule third_party/nature-id deleted from 5e9468d65a

6
third_party/nature-id/.gitignore vendored Normal file

@@ -0,0 +1,6 @@
__pycache__/
*.py[cod]
*$py.class
*.csv
*.tflite
*.zip

10
third_party/nature-id/LICENSE vendored Normal file

@@ -0,0 +1,10 @@
MIT License
Copyright (c) 2020, joergmlpts
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

372
third_party/nature-id/README.md vendored Normal file

@@ -0,0 +1,372 @@
# Identify Plants, Birds, and Insects in Photos
This repository provides Python code that identifies plants, birds, and insects in photos.
This project was inspired by the amazing progress that [iNaturalist](https://iNaturalist.org) has made in recent years in identifying plants, animals, and fungi from photographs. The iNaturalist team has trained machine learning models with their large collection of photos and research-grade identifications. In 2019, iNaturalist released [Seek by iNaturalist](https://www.inaturalist.org/pages/seek_app), which identifies photos offline on the phone and falls back to a rank higher than species when a species identification cannot be made.
Google provides three models that have been trained with iNaturalist data - classification models for plants, birds, and insects. These Google models can be downloaded and used with Google's `TensorFlow` and `TensorFlow Lite` tools.
This code is based on the trained models provided by Google. It was written to experiment with identifying species from photos and to try out Seek's approach to calculating scores (probabilities) across the taxonomic hierarchy.
The tool `nature_id.py` has been tested on Linux and Windows. It should also work on macOS.
## Usage
This is a command-line tool. It is invoked with images or directories containing images and identifies the plants, birds, and insects in those images.
Here is an example. This is the command for Linux and macOS:
```
./nature_id.py -m plants plant_images/Persicaria_amphibia.jpg
```
On Windows the command is:
```
python .\nature_id.py -m plants plant_images\Persicaria_amphibia.jpg
```
![Smartweed](/plant_images/Persicaria_amphibia.jpg)
The above image results in this identification:
```
Classification of 'plant_images/Persicaria_amphibia.jpg' took 0.2 secs.
100.0% kingdom Plants (Plantae)
100.0% phylum Tracheophytes (Tracheophyta)
100.0% subphylum Flowering Plants (Angiospermae)
99.6% class Dicots (Magnoliopsida)
99.2% order Pinks, Cactuses, and Allies (Caryophyllales)
98.8% family Knotweed Family (Polygonaceae)
98.8% subfamily Polygonoideae
98.8% tribe Persicarieae
98.8% subtribe Persicariinae
98.8% genus Smartweeds (Persicaria)
97.6% species Water Smartweed (Persicaria amphibia)
```
These scores can be used to guide identification: define a threshold and report as result the taxon with the lowest score that is larger than or equal to this threshold. In this example for a threshold of 95% an identification to species *Persicaria amphibia* has been achieved. For a threshold of 99%, this is only an identification to order *Caryophyllales*. 95% and 99% would be unusually high thresholds; Seek, I believe, uses a threshold of 70%.
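
The threshold rule can be sketched as follows. This is a minimal illustration, not code from `nature_id.py`; the function name `pick_taxon` and the list-of-tuples input are assumptions, with entries ordered from kingdom down to the most specific rank as in the output above.

```python
from typing import List, Optional, Tuple

# Hypothetical input: (score in percent, rank, scientific name), ordered from
# kingdom down to the most specific rank, as in the sample output above.
Lineage = List[Tuple[float, str, str]]

def pick_taxon(lineage: Lineage,
               threshold: float) -> Optional[Tuple[float, str, str]]:
    """Return the most specific taxon whose score is >= threshold, if any."""
    result = None
    for entry in lineage:
        if entry[0] >= threshold:
            result = entry  # keep descending while the score still qualifies
        else:
            break           # scores do not increase further down the lineage
    return result

lineage = [
    (100.0, 'kingdom', 'Plantae'),
    (99.6, 'class', 'Magnoliopsida'),
    (99.2, 'order', 'Caryophyllales'),
    (98.8, 'family', 'Polygonaceae'),
    (98.8, 'genus', 'Persicaria'),
    (97.6, 'species', 'Persicaria amphibia'),
]
print(pick_taxon(lineage, 95.0))  # species-level identification
print(pick_taxon(lineage, 99.0))  # only an order-level identification
```

With the abbreviated lineage above, the two calls reproduce the outcomes described in the text for the 95% and 99% thresholds.
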
## Command-line Options
This script is a command-line utility. It is called with options, filenames and directory names as arguments. These options are supported:
```
usage: nature_id.py [-h] [-m MODEL] [-a] [-l] [-s] [-r RESULT_SIZE] file/directory [file/directory ...]

positional arguments:
  file/directory        Image files or directories with images.

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Model to load to identify organisms.
  -a, --all_common_names
                        Show all common names and not just one.
  -l, --label_scores_only
                        Compute and display only label scores, do not propagate scores up the hierarchy.
  -s, --scientific_names_only
                        Only use scientific names, do not load common names.
  -r RESULT_SIZE, --result_size RESULT_SIZE
                        Number of labels and their scores to report in results.
```
### Option -m MODEL, --model MODEL
The `-m` and `--model` options select a classification model. Possible models are `plants`, `birds`, and `insects`. These models must be installed in the `classifiers` directory. This option is required if more than one classifier is installed.
### Option -a, --all_common_names
The `-a` and `--all_common_names` options cause all common names to be displayed, not just one. Multiple common names are separated by semicolons. The output with this option looks like this:
![Phyla_nodiflora.jpg](/plant_images/Phyla_nodiflora.jpg)
```
Classification of 'plant_images/Phyla_nodiflora.jpg' took 0.2 secs.
100.0% kingdom Plants; Flora; Green Plants; Greenery; Foliage; Vegetation; Salpichlaena Papyrus; Trees; Bushes; Shrubs; Vines (Plantae)
100.0% phylum Tracheophytes; Seed Plants; Vascular Plants (Tracheophyta)
100.0% subphylum Flowering Plants; Angiosperms; Flowers; Basal Angiosperms; True Dicotyledons; Basal True Dicots; Rose Dicots; Daisy Dicots (Angiospermae)
100.0% class Dicots; Dicots; Dicotyledons; Eudicots (Magnoliopsida)
98.2% order Mints, Plantains, Olives, and Allies (Lamiales)
97.4% family Verbena Family; Lantanas (Verbenaceae)
97.4% tribe Lantaneae
85.5% genus Frogfruits; Fogfruits (Phyla)
85.5% species Turkey Tangle; Lippia; Common Lippia; Turkey Tangle Frogfruit; Sawtooth Fogfruit; Carpet Weed; Roundleaf Frogfruit; Texas Frogfruit; Cape Weed; Sawtooth Frogfruit; Lipia; Turkey Tangle Fogfruit; Daisy Lawn; Fog Grass (Phyla nodiflora)
```
### Option -l, --label_scores_only
The `-l` and `--label_scores_only` options switch from the taxonomic hierarchy view to a flat list of labels and their scores. The output with this option looks like this:
![Solidago_velutina_ssp_californica.jpg](/plant_images/Solidago_velutina_ssp_californica.jpg)
```
Classification of 'plant_images/Solidago_velutina_ssp_californica.jpg' took 0.2 secs.
86.1% Canada Goldenrod (Solidago canadensis)
9.8% Late Goldenrod (Solidago altissima)
1.6% Flat-Topped Goldenrod (Euthamia graminifolia)
1.2% Northern Seaside Goldenrod (Solidago sempervirens)
0.4% Stiff-Leaved Goldenrod (Solidago rigida)
```
Five labels with decreasing scores are shown by default. The `-r` and `--result_size` options can be used to request fewer or more labels.
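
For comparison, the hierarchy view used in the other examples also assigns scores to ancestors. A minimal sketch of one way to do that follows, assuming an ancestor's score is the sum of the label scores beneath it. That assumption is consistent with the sample outputs but is not taken from the `nature_id.py` source; the `parent` mapping and the function name `propagate_scores` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional

def propagate_scores(label_scores: Dict[str, float],
                     parent: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Add every label's score to the label itself and to all its ancestors."""
    totals: Dict[str, float] = defaultdict(float)
    for label, score in label_scores.items():
        node: Optional[str] = label
        while node is not None:
            totals[node] += score
            node = parent.get(node)  # None once the top of the tree is passed
    return dict(totals)

# Hypothetical miniature taxonomy: child -> parent, None marks the top.
parent = {
    'Solidago canadensis': 'Solidago',
    'Solidago altissima': 'Solidago',
    'Euthamia graminifolia': 'Euthamia',
    'Solidago': 'Asteraceae',
    'Euthamia': 'Asteraceae',
    'Asteraceae': None,
}
scores = {'Solidago canadensis': 86.1, 'Solidago altissima': 9.8,
          'Euthamia graminifolia': 1.6}
totals = propagate_scores(scores, parent)
print(round(totals['Solidago'], 1), round(totals['Asteraceae'], 1))
```

Under this assumption a genus can reach a high score even when no single species does, which is why the hierarchy view can report a confident genus for an uncertain species.
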
### Option -s, --scientific_names_only
The `-s` and `--scientific_names_only` options disable common names; only the scientific names are displayed. The output with this option looks like this:
![Trichostema_lanceolatum.jpg](/plant_images/Trichostema_lanceolatum.jpg)
```
Classification of 'plant_images/Trichostema_lanceolatum.jpg' took 0.2 secs.
100.0% kingdom Plantae
100.0% phylum Tracheophyta
100.0% subphylum Angiospermae
100.0% class Magnoliopsida
99.6% order Lamiales
99.6% family Lamiaceae
99.2% subfamily Ajugoideae
99.2% genus Trichostema
99.2% species Trichostema lanceolatum
```
### Option -r RESULT_SIZE, --result_size RESULT_SIZE
The `-r` and `--result_size` options modify the number of labels displayed when a flat list of labels is requested with the `-l` or `--label_scores_only` options. The default is 5. Options `-r` and `--result_size` allow you to choose a number between 1 and 100.
This is an example with 15 labels. The command line for Linux is:
```
./nature_id.py -m plants -l -r 15 plant_images/Primula_hendersonii.jpg
```
![Primula_hendersonii.jpg](/plant_images/Primula_hendersonii.jpg)
```
Classification of 'plant_images/Primula_hendersonii.jpg' took 0.2 secs.
50.4% Henderson's Shooting Star (Primula hendersonii)
37.2% Eastern Shooting Star (Primula meadia)
2.5% Dark-Throated Shooting Star (Primula pauciflora)
1.7% Red Ribbons (Clarkia concinna)
1.2% Ruby Chalice Clarkia (Clarkia rubicunda)
0.8% Purple Paintbrush (Castilleja purpurea)
0.8% Fireweed (Chamaenerion angustifolium)
0.4% Western Fairy-Slipper (Calypso bulbosa occidentalis)
0.4% Texas Skeleton Plant (Lygodesmia texana)
0.4% Rhodora (Rhododendron canadense)
0.4% Ragged-Robin (Silene flos-cuculi)
0.4% Hemp Dogbane (Apocynum cannabinum)
0.4% Garden Cosmos (Cosmos bipinnatus)
0.4% Farewell-To-Spring (Clarkia amoena)
0.4% Dwarf Fireweed (Chamaenerion latifolium)
```
## Dependencies
Several things need to be installed in order for `nature_id.py` to run. Some Python packages are required, classification models need to be downloaded and installed into the `classifiers` directory, and finally the taxonomy and common names need to be downloaded into the `inaturalist-taxonomy` directory.
### Python Packages
This code is written in Python 3. Besides Python 3, the packages `Pillow` and `requests` are used to load and process images and to access the iNaturalist API.
These packages, as well as `TensorFlow Lite`, can be installed on Ubuntu Linux and other Debian-based distributions with the commands
```
sudo apt install python3-pillow python3-requests
pip3 install tflite-runtime
```
and on other platforms with the command
```
pip install Pillow requests tflite-runtime
```
Where appropriate `pip3` should be called instead of `pip` to avoid accidentally installing Python 2 packages.
### Classification Models
The classification models and their labelmap files have to be downloaded from Kaggle and they go into directory `classifiers`.
The classifiers can be downloaded from these links:
* [classifier for plants](https://www.kaggle.com/models/google/aiy/tensorFlow1/vision-classifier-plants-v1/1)
* [classifier for birds](https://www.kaggle.com/models/google/aiy/tensorFlow1/vision-classifier-birds-v1/1)
* [classifier for insects](https://www.kaggle.com/models/google/aiy/tensorFlow1/vision-classifier-insects-v1/1)
Each classifier consists of a `.tflite` model and a `.csv` labelmap file. Both are required. Click on `Model Variations` under `TensorFlow Lite` to download the TFLite model. Please also note the paragraphs at the bottom of these web pages about appropriate and inappropriate use cases and licensing.
These are the links to download the labelmaps: [aiy_insects_V1_labelmap.csv](https://www.gstatic.com/aihub/tfhub/labelmaps/aiy_insects_V1_labelmap.csv), [aiy_birds_V1_labelmap.csv](https://www.gstatic.com/aihub/tfhub/labelmaps/aiy_birds_V1_labelmap.csv), and [aiy_plants_V1_labelmap.csv](https://www.gstatic.com/aihub/tfhub/labelmaps/aiy_plants_V1_labelmap.csv). On Windows, the default action for a .csv file may be to open it in Excel; be sure to save the downloaded file to disk.
### Taxonomy and Common Names Files
The trained models come with scientific names as labels and many of these scientific names are already outdated. The common names and the current taxonomy are obtained from this file: [https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip](https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip) This tool expects this zip archive in the `inaturalist-taxonomy` directory.
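
Because the archive stays zipped, its members are read in place. The sketch below shows the general technique with Python's standard library. The helper name `read_zip_csv` is an assumption, and the demo builds a small in-memory archive with a made-up row instead of using the real download.

```python
import csv
import io
import zipfile

def read_zip_csv(archive, member, encoding='utf-8'):
    """Yield the rows of a CSV member of a zip archive as dictionaries."""
    with zipfile.ZipFile(archive, 'r') as zf:
        with zf.open(member, 'r') as raw:
            # Wrap the binary stream so csv can read it as text.
            with io.TextIOWrapper(raw, encoding=encoding, newline='') as text:
                yield from csv.DictReader(text)

# Demo with an in-memory archive; the row is made up for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('taxa.csv', 'id,scientificName,taxonRank\n1,Exampleae,kingdom\n')
buffer.seek(0)
rows = list(read_zip_csv(buffer, 'taxa.csv'))
print(rows[0]['scientificName'])
```
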
## Example Images
Example pictures of plants are provided in the `plant_images` directory. The filenames indicate the species that I think is in the photo. Note that these examples lead to successful identifications only to varying degrees. The *Mentzelia lindleyi* is certainly not correctly identified.
## Messages
The first call with a model transforms the labels into a taxonomic hierarchy. Each label is replaced with its representation in the current taxonomy and all its ancestors are added. This process takes some time and results in many messages. Once the hierarchy has been successfully computed, it is written to disk. Future calls to `nature_id.py` will load the taxonomic hierarchy from disk instead of reading the labels and computing the taxonomy again.
This is what the first call looks like. Again, we use the plant model as an example. The bird and insect models are smaller and result in fewer messages.
```
PS C:\nature-id> python .\nature_id.py -m plants .\plant_images
Read 2,102 labels from 'classifiers\aiy_plants_V1_labelmap.csv' in 0.0 secs.
Loading iNaturalist taxonomy...
Loaded iNaturalist taxonomy of 993,552 taxa in 15.2 secs.
Info: Taxon for label 'background' not found, inserting as pseudo-kingdom.
Info: Taxon 'Eichhornia crassipes' changed to 'Pontederia crassipes', iNat taxa id 962637.
Info: Taxon 'Potentilla anserina' changed to 'Argentina anserina', iNat taxa id 158615.
Info: Taxon 'Stenosiphon linifolius' changed to 'Oenothera glaucifolia', iNat taxa id 914092.
Info: Taxon 'Sophora secundiflora' changed to 'Dermatophyllum secundiflorum', iNat taxa id 499559.
Info: Taxon 'Mimulus bigelovii' changed to 'Diplacus bigelovii', iNat taxa id 701989.
Info: Taxon 'Botrychium dissectum' changed to 'Sceptridium dissectum', iNat taxa id 122085.
Info: Taxon 'Trientalis borealis' changed to 'Lysimachia borealis', iNat taxa id 204174.
Info: Taxon 'Hyptis emoryi' changed to 'Condea emoryi', iNat taxa id 489286.
Info: Taxon 'Opuntia engelmannii lindheimeri' changed to 'Opuntia lindheimeri', iNat taxa id 119980.
Info: Taxon 'Aquilegia caerulea' changed to 'Aquilegia coerulea', iNat taxa id 501742.
Info: Taxon 'Fuscospora cliffortioides' changed to 'Nothofagus cliffortioides', iNat taxa id 404204.
Info: Taxon 'Cooperia drummondii' changed to 'Zephyranthes chlorosolen', iNat taxa id 554401.
Info: Taxon 'Dracopis amplexicaulis' changed to 'Rudbeckia amplexicaulis', iNat taxa id 200073.
Info: Taxon 'Dodecatheon meadia' changed to 'Primula meadia', iNat taxa id 549981.
Info: Taxon 'Aptenia cordifolia' changed to 'Mesembryanthemum cordifolium', iNat taxa id 589815.
Info: Taxon 'Chamerion latifolium' changed to 'Chamaenerion latifolium', iNat taxa id 564970.
Info: Taxon 'Echinocereus mojavensis' changed to 'Echinocereus triglochidiatus mojavensis', iNat taxa id 858352.
Warning: multiple taxa named 'Aquilegia vulgaris': species 51807, complex 1042772; choosing species.
Info: Taxon 'Dodecatheon pulchellum' changed to 'Primula pauciflora', iNat taxa id 498086.
Info: Taxon 'Mimulus lewisii' changed to 'Erythranthe lewisii', iNat taxa id 777190.
Info: Taxon 'Sambucus nigra canadensis' changed to 'Sambucus canadensis', iNat taxa id 84300.
Info: Taxon 'Asyneuma prenanthoides' changed to 'Campanula prenanthoides', iNat taxa id 851072.
Info: Taxon 'Anemone quinquefolia' changed to 'Anemonoides quinquefolia', iNat taxa id 950598.
Info: Taxon 'Hedypnois cretica' changed to 'Hedypnois rhagadioloides', iNat taxa id 492864.
Warning: multiple taxa named 'Achillea millefolium': species 52821, complex 1105043; choosing species.
Info: Taxon 'Anagallis arvensis' changed to 'Lysimachia arvensis', iNat taxa id 791928.
Info: Taxon 'Hieracium caespitosum' changed to 'Pilosella caespitosa', iNat taxa id 711086.
Info: Taxon 'Potentilla anserina pacifica' changed to 'Argentina pacifica', iNat taxa id 524900.
Info: Taxon 'Sambucus nigra caerulea' changed to 'Sambucus cerulea', iNat taxa id 143799.
Info: Taxon 'Polygala californica' changed to 'Rhinotropis californica', iNat taxa id 876453.
Info: Taxon 'Calylophus berlandieri' changed to 'Oenothera berlandieri', iNat taxa id 359779.
Info: Taxon 'Mimulus cardinalis' changed to 'Erythranthe cardinalis', iNat taxa id 319974.
Info: Taxon 'Callistemon citrinus' changed to 'Melaleuca citrina', iNat taxa id 77976.
Info: Taxon 'Liatris mucronata' changed to 'Liatris punctata mucronata', iNat taxa id 371814.
Warning: multiple taxa named 'Stellaria media': species 53298, complex 1087592; choosing species.
Info: Taxon 'Anemone americana' changed to 'Hepatica americana', iNat taxa id 741014.
Info: Taxon 'Anemone occidentalis' changed to 'Pulsatilla occidentalis', iNat taxa id 60482.
Info: Taxon 'Orobanche fasciculata' changed to 'Aphyllon fasciculatum', iNat taxa id 802543.
Info: Taxon 'Mimulus primuloides' changed to 'Erythranthe primuloides', iNat taxa id 635401.
Info: Taxon 'Polygala paucifolia' changed to 'Polygaloides paucifolia', iNat taxa id 497911.
Warning: multiple taxa named 'Campanula rotundifolia': species 62312, complex 984576; choosing species.
Info: Taxon 'Cissus incisa' changed to 'Cissus trifoliata', iNat taxa id 133333.
Info: Taxon 'Schinus terebinthifolius' changed to 'Schinus terebinthifolia', iNat taxa id 130872.
Info: Taxon 'Cooperia pedunculata' changed to 'Zephyranthes drummondii', iNat taxa id 120026.
Info: Taxon 'Scabiosa atropurpurea' changed to 'Sixalix atropurpurea', iNat taxa id 372376.
Info: Taxon 'Sphenosciadium capitellatum' changed to 'Angelica capitellata', iNat taxa id 704166.
Info: Taxon 'Trientalis latifolia' changed to 'Lysimachia latifolia', iNat taxa id 496537.
Warning: multiple taxa named 'Spiranthes cernua': species 773385, complex 931407; choosing species.
Info: Taxon 'Spartina pectinata' changed to 'Sporobolus michauxianus', iNat taxa id 772984.
Info: Taxon 'Centaurea americana' changed to 'Plectocephalus americanus', iNat taxa id 699778.
Info: Taxon 'Fuscospora solandri' changed to 'Nothofagus solandri', iNat taxa id 70246.
Info: Taxon 'Heliotropium tenellum' changed to 'Euploca tenella', iNat taxa id 769888.
Info: Taxon 'Blechnum spicant' changed to 'Struthiopteris spicant', iNat taxa id 774894.
Info: Taxon 'Fallopia japonica' changed to 'Reynoutria japonica', iNat taxa id 914922.
Info: Taxon 'Echinocactus texensis' changed to 'Homalocephala texensis', iNat taxa id 870496.
Info: Taxon 'Gaura parviflora' changed to 'Oenothera curtiflora', iNat taxa id 78241.
Info: Taxon 'Parentucellia viscosa' changed to 'Bellardia viscosa', iNat taxa id 537967.
Info: Taxon 'Anemone nemorosa' changed to 'Anemonoides nemorosa', iNat taxa id 950603.
Info: Taxon 'Hieracium aurantiacum' changed to 'Pilosella aurantiaca', iNat taxa id 711103.
Info: Taxon 'Anemone hepatica' changed to 'Hepatica nobilis', iNat taxa id 639660.
Info: Taxon 'Merremia dissecta' changed to 'Distimake dissectus', iNat taxa id 907480.
Info: Taxon 'Anemone canadensis' changed to 'Anemonastrum canadense', iNat taxa id 881527.
Info: Taxon 'Chamerion angustifolium' changed to 'Chamaenerion angustifolium', iNat taxa id 564969.
Info: Taxon 'Lychnis flos-cuculi' changed to 'Silene flos-cuculi', iNat taxa id 740984.
Throttling API calls, sleeping for 44.5 seconds.
Info: Taxon 'Ampelopsis brevipedunculata' changed to 'Ampelopsis glandulosa brevipedunculata', iNat taxa id 457553.
Info: Taxon 'Anemone acutiloba' changed to 'Hepatica acutiloba', iNat taxa id 179786.
Info: Taxon 'Pennisetum setaceum' changed to 'Cenchrus setaceus', iNat taxa id 430581.
Info: Taxon 'Mimulus guttatus' changed to 'Erythranthe guttata', iNat taxa id 470643.
Info: Taxon 'Blechnum fluviatile' changed to 'Cranfillia fluviatilis', iNat taxa id 700995.
Info: Taxon 'Blechnum discolor' changed to 'Lomaria discolor', iNat taxa id 403546.
Info: Taxon 'Andropogon gerardii' changed to 'Andropogon gerardi', iNat taxa id 121968.
Info: Taxon 'Ferocactus hamatacanthus' changed to 'Hamatocactus hamatacanthus', iNat taxa id 855937.
Info: Taxon 'Gaura lindheimeri' changed to 'Oenothera lindheimeri', iNat taxa id 590726.
Info: Taxon 'Gaura suffulta' changed to 'Oenothera suffulta', iNat taxa id 521639.
Info: Taxon 'Glottidium vesicarium' changed to 'Sesbania vesicaria', iNat taxa id 890511.
Info: Taxon 'Acacia farnesiana' changed to 'Vachellia farnesiana', iNat taxa id 79472.
Warning: multiple taxa named 'Rubus fruticosus': complex 55911, species 1090496; choosing species.
Info: Taxon 'Othocallis siberica' changed to 'Scilla siberica', iNat taxa id 862704.
Info: Taxon 'Mimulus aurantiacus' changed to 'Diplacus', iNat taxa id 777236.
Info: Taxon 'Phoradendron tomentosum' changed to 'Phoradendron leucarpum', iNat taxa id 49668.
Info: Taxon 'Orobanche uniflora' changed to 'Aphyllon uniflorum', iNat taxa id 802714.
Info: Taxon 'Rosmarinus officinalis' changed to 'Salvia rosmarinus', iNat taxa id 636795.
Info: Taxon 'Cynoglossum grande' changed to 'Adelinia grande', iNat taxa id 769151.
Computed taxonomic tree from labels in 64.8 secs: 4,091 taxa including 2,102 leaf taxa.
Taxonomy written to file 'classifiers\aiy_plants_V1_taxonomy.csv'.
Reading common names from 'inaturalist-taxonomy\inaturalist-taxonomy.dwca.zip' member 'VernacularNames-english.csv'...
Read 203,093 common names in 1.5 secs, loaded 3,071 in language "en_US" for 4,091 taxa.
```
### Messages Explained
```
Read 2,102 labels from 'classifiers\aiy_plants_V1_labelmap.csv' in 0.0 secs.
```
`nature-id` reads a label file. If no errors occur, a taxonomy will be written for these labels and further runs will load `classifiers\aiy_plants_V1_taxonomy.csv` instead.
```
Loading iNaturalist taxonomy...
Loaded iNaturalist taxonomy of 993,552 taxa in 15.2 secs.
```
The entire iNaturalist taxonomy of about 1 million taxa is loaded. `nature-id` will look up the labels in this taxonomy and insert them, along with all their ancestors, into a taxonomy for the labels.
```
Info: Taxon for label 'background' not found, inserting as pseudo-kingdom.
```
Label `background` was not found. It is not a species, but denotes something else in the Google model. It is treated as a kingdom in the taxonomy; it has no ancestors.
```
Info: Taxon 'Potentilla anserina' changed to 'Argentina anserina', iNat taxa id 158615.
```
In the current taxonomy, this species belongs to a different genus. The numeric ID in this message is useful for getting more information. This number can be prefixed with `https://www.inaturalist.org/taxa/` and opened in a browser: [https://www.inaturalist.org/taxa/158615](https://www.inaturalist.org/taxa/158615).
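
Turning these messages into links can be automated. The sketch below parses an `Info: Taxon ... changed to ...` line with a regular expression and builds the URL. The pattern and the function name `taxon_link` are illustrative assumptions based on the message format shown above.

```python
import re

TAXON_URL_PREFIX = 'https://www.inaturalist.org/taxa/'

# Pattern written against the message format shown above (an assumption).
CHANGED = re.compile(
    r"Info: Taxon '(?P<old>[^']+)' changed to '(?P<new>[^']+)', "
    r"iNat taxa id (?P<id>\d+)\.")

def taxon_link(message):
    """Return (old name, new name, URL) for a rename message, else None."""
    match = CHANGED.search(message)
    if match is None:
        return None
    return match['old'], match['new'], TAXON_URL_PREFIX + match['id']

msg = ("Info: Taxon 'Potentilla anserina' changed to 'Argentina anserina', "
       "iNat taxa id 158615.")
print(taxon_link(msg))
```
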
```
Warning: multiple taxa named 'Achillea millefolium': species 52821, complex 1105043; choosing species.
```
The label name for this common yarrow is not unique; there are several taxa with this scientific name. `nature-id` assumes that the species is the one we want.
```
Throttling API calls, sleeping for 44.5 seconds.
```
This message is followed by about 45 seconds of silence. When a name is not found in the current taxonomy (the one loaded earlier with about 1 million taxa), iNaturalist API calls are made to look up the inactive scientific name. The iNaturalist team asks that API calls be throttled to no more than 60 calls per minute, and this delay implements that request.
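
The throttling idea can be sketched as a sliding window over recent call times. This is a simplified illustration rather than the tool's own class. The class name `SlidingWindowThrottle` and the injectable `clock` and `sleep` parameters, added so the demo runs without real delays, are assumptions.

```python
import time
from collections import deque

class SlidingWindowThrottle:
    """Allow at most max_calls calls within any window of `interval` seconds."""

    def __init__(self, max_calls=60, interval=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls, self.interval = max_calls, interval
        self.clock, self.sleep = clock, sleep
        self.call_times = deque()

    def wait(self):
        now = self.clock()
        # Forget calls that have already left the window.
        while self.call_times and self.call_times[0] <= now - self.interval:
            self.call_times.popleft()
        if len(self.call_times) >= self.max_calls:
            # Sleep until the oldest remembered call leaves the window.
            self.sleep(self.call_times[0] + self.interval - now)
            self.call_times.popleft()
        self.call_times.append(self.clock())

# Demo with a fake clock so nothing actually sleeps.
fake_now = [0.0]
slept = []

def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds

throttle = SlidingWindowThrottle(max_calls=2, interval=60.0,
                                 clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    throttle.wait()
print(slept)
```

In the demo the third call exceeds the limit of two calls per window, so it is the only one that triggers a sleep.
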
```
Info: Taxon 'Mimulus aurantiacus' changed to 'Diplacus', iNat taxa id 777236.
```
The species *Mimulus aurantiacus* in the label file is replaced with the genus *Diplacus* and not with the current species *Diplacus aurantiacus*. This looks like a bug and hence deserves a closer look.
The reason for this decision of `nature_id` is that *Mimulus aurantiacus* consisted of several varieties *Mimulus aurantiacus aurantiacus*, *Mimulus aurantiacus grandiflorus*, *Mimulus aurantiacus parviflorus*, and 3 more.
In the current taxonomy, these varieties are species *Diplacus aurantiacus*, *Diplacus grandiflorus*, and *Diplacus parviflorus*. *Diplacus aurantiacus* does not replace *Mimulus aurantiacus*; it replaces the variety *Mimulus aurantiacus aurantiacus*.
Another way to understand this issue is to realize that photos of all varieties *Mimulus aurantiacus aurantiacus*, *Mimulus aurantiacus grandiflorus*, *Mimulus aurantiacus parviflorus* and the 3 others were used to train the classification model to recognize *Mimulus aurantiacus*. In the current taxonomy, this label is triggered for each of the species *Diplacus aurantiacus*, *Diplacus grandiflorus*, and *Diplacus parviflorus*. `nature_id` cannot say which of the current species it sees. It can only identify images as genus *Diplacus*.
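
One way to picture this fallback: when an outdated name maps to several current taxa, the most specific taxon that contains all of them is their lowest common ancestor. The sketch below computes that over a hypothetical child-to-parent mapping. It illustrates the reasoning and is not the lookup code `nature_id` uses.

```python
from typing import Dict, List, Optional

def lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the taxon followed by its ancestors, most specific first."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path

def common_ancestor(taxa: List[str],
                    parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the most specific taxon shared by the lineages of all taxa."""
    shared = lineage(taxa[0], parent)
    for taxon in taxa[1:]:
        others = set(lineage(taxon, parent))
        shared = [node for node in shared if node in others]
    return shared[0] if shared else None

# Hypothetical fragment of the current taxonomy (child -> parent).
parent = {
    'Diplacus aurantiacus': 'Diplacus',
    'Diplacus grandiflorus': 'Diplacus',
    'Diplacus parviflorus': 'Diplacus',
    'Diplacus': 'Phrymaceae',
    'Phrymaceae': None,
}
print(common_ancestor(['Diplacus aurantiacus', 'Diplacus grandiflorus',
                       'Diplacus parviflorus'], parent))
```
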
```
Taxonomy written to file 'classifiers\aiy_plants_V1_taxonomy.csv'.
```
A taxonomy for the scientific names in the label file has been successfully computed and this taxonomy was written to disk. Future calls will load this taxonomy instead of loading the labels and re-computing the taxonomy.
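
The compute-once, load-afterwards pattern described here can be sketched as follows. The column names and function names are assumptions for illustration; the actual format of `aiy_plants_V1_taxonomy.csv` is not shown in this README.

```python
import csv
import os
import tempfile

def load_or_compute(path, compute):
    """Load rows from path if it exists; otherwise compute and write them."""
    if os.path.exists(path):
        with open(path, newline='', encoding='utf-8') as f:
            return list(csv.DictReader(f)), 'loaded'
    rows = compute()
    with open(path, 'w', newline='', encoding='utf-8') as f:
        # Hypothetical columns, chosen only for this demo.
        writer = csv.DictWriter(f, fieldnames=['id', 'name'])
        writer.writeheader()
        writer.writerows(rows)
    return rows, 'computed'

def expensive_computation():
    # Stand-in for the slow step that builds the taxonomy from the labels.
    return [{'id': '1', 'name': 'Plantae'}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'taxonomy.csv')
    first = load_or_compute(path, expensive_computation)
    second = load_or_compute(path, expensive_computation)
print(first[1], second[1])
```

Deleting the written file forces the slow path again, which is a simple way to rebuild after the label file or the taxonomy archive changes.
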
```
Reading common names from 'inaturalist-taxonomy\inaturalist-taxonomy.dwca.zip' member 'VernacularNames-english.csv'...
Read 203,093 common names in 1.5 secs, loaded 3,071 in language "en_US" for 4,091 taxa.
```
Common names have been read. The common names are always selected for the local language, not necessarily for English as shown here.


@@ -0,0 +1,13 @@
# Download Instructions
The [TensorFlow Lite](https://www.tensorflow.org/lite/guide) classifiers that go in this directory can be downloaded from these websites:
* [classifier for plants](https://tfhub.dev/google/aiy/vision/classifier/plants_V1/1)
* [classifier for birds](https://tfhub.dev/google/aiy/vision/classifier/birds_V1/1)
* [classifier for insects](https://tfhub.dev/google/aiy/vision/classifier/insects_V1/1)
Each classifier consists of a `.tflite` model and a `.csv` labelmap file. Both are required.
On each of the above websites scroll down and under `Output` click on `labelmap` to download the labels. Then scroll back up and under `Model formats` switch to `TFLite (aiyvision/classifier/...)`. There click on `Download` to get the `.tflite` file.
If you happen to have the classifier included in [Seek](https://www.inaturalist.org/pages/seek_app), it can go in this directory as well. It consists of two files `optimized_model_v1.tflite` and `taxonomy_v1.csv`.

110
third_party/nature-id/inat_api.py vendored Normal file

@@ -0,0 +1,110 @@
import json, os, pickle, requests, shelve, sys, time
#############################################################################
# #
# API calls to obtain taxonomic information. Used in case of name changes. #
# #
# See documentation at https://api.inaturalist.org/v1/docs/#/Taxa #
# #
# We throttle the number of calls to less than 60 per minute. We also #
# implement a cache to avoid repeated lookups of the same taxa across runs. #
# Cache entries include time stamps and they expire after two weeks. #
# #
#############################################################################
API_HOST = "https://api.inaturalist.org/v1"
CACHE_EXPIRATION = 14 * 24 * 3600 # cache expires after 2 weeks
TOO_MANY_API_CALLS_DELAY = 60 # wait this long after error 429
# The cache stores the json responses.
if sys.platform == 'win32':
    DATA_DIR = os.path.join(os.path.expanduser('~'),
                            'AppData', 'Local', 'inat_api')
else:
    DATA_DIR = os.path.join(os.path.expanduser('~'), '.cache', 'inat_api')
if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)
cache = shelve.open(os.path.join(DATA_DIR, 'api.cache'))
# API call throttling.
class Throttle:
    API_MAX_CALLS = 60 # max 60 calls per minute
    API_INTERVAL = 60 # 1 minute
    def __init__(self):
        self.callTimes = [] # times of api calls
    # wait if necessary to avoid more than API_MAX_CALLS in API_INTERVAL
    def wait(self):
        while len(self.callTimes) >= self.API_MAX_CALLS:
            waitTime = self.callTimes[0] - (time.time() - self.API_INTERVAL)
            if waitTime > 0:
                print('Throttling API calls, '
                      f'sleeping for {waitTime:.1f} seconds.')
                time.sleep(waitTime)
                continue
            self.callTimes = self.callTimes[1:]
        self.callTimes.append(time.time())
api_call_throttle = Throttle()
# argument is an id or a list of id's
def get_taxa_by_id(id):
    if type(id) is list:
        url = API_HOST + '/taxa/' + '%2C'.join([str(i) for i in id])
    else:
        url = API_HOST + f'/taxa/{id}'
    tim = time.time()
    if not url in cache or cache[url][0] < tim - CACHE_EXPIRATION:
        delay = TOO_MANY_API_CALLS_DELAY
        headers = {'Content-type' : 'application/json' }
        while True:
            api_call_throttle.wait()
            response = requests.get(url, headers=headers)
            if response.status_code == requests.codes.too_many:
                time.sleep(delay)
                delay *= 2
            else:
                break
        if response.status_code == requests.codes.ok:
            cache[url] = (tim, response.json())
        else:
            print(response.text)
            return None
    return cache[url][1]
# returns taxa by name
def get_taxa(params):
    url = API_HOST + '/taxa'
    for key, val in params.items():
        if type(val) == bool:
            params[key] = 'true' if val else 'false'
    key = pickle.dumps((url, params)).hex()
    tim = time.time()
    if not key in cache or cache[key][0] < tim - CACHE_EXPIRATION:
        delay = TOO_MANY_API_CALLS_DELAY
        headers = {'Content-type' : 'application/json' }
        while True:
            api_call_throttle.wait()
            response = requests.get(url, headers=headers, params=params)
            if response.status_code == requests.codes.too_many:
                time.sleep(delay)
                delay *= 2
            else:
                break
        if response.status_code == requests.codes.ok:
            cache[key] = (tim, response.json())
        else:
            print(response.text)
            return None
    return cache[key][1]
if __name__ == '__main__':
    assert not 'Not a top-level Python module!'

318
third_party/nature-id/inat_taxonomy.py vendored Normal file

@@ -0,0 +1,318 @@
import csv, sys, os, time, locale, zipfile, io
import inat_api
from dataclasses import dataclass
from typing import List, Dict
# The directory where this Python script is located.
INSTALL_DIR = os.path.dirname(__file__)
while os.path.islink(INSTALL_DIR):
    INSTALL_DIR = os.path.join(INSTALL_DIR,
                               os.path.dirname(os.readlink(INSTALL_DIR)))
# This zip file contains the taxonomy and all common names.
# Download https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip and
# leave this zip file in directory 'inaturalist-taxonomy'. Do not extract the
# files from this zip archive.
INAT_TAXONOMY = os.path.join(INSTALL_DIR, 'inaturalist-taxonomy',
'inaturalist-taxonomy.dwca.zip')
# A special node represents the root of the tree, the parent of kingdoms.
ROOT_TAXON_ID = 48460
ROOT_NAME = 'Life'
ROOT_RANK_LEVEL = 100
# maps rank-level to its name
gRankLevel2Name = {
ROOT_RANK_LEVEL : 'stateofmatter', # used for the parent of kingdoms
70 : 'kingdom',
67 : 'subkingdom',
60 : 'phylum',
57 : 'subphylum',
53 : 'superclass',
50 : 'class',
47 : 'subclass',
45 : 'infraclass',
44 : 'subterclass',
43 : 'superorder',
40 : 'order',
37 : 'suborder',
35 : 'infraorder',
34.5: 'parvorder',
34 : 'zoosection',
33.5: 'zoosubsection',
33 : 'superfamily',
32 : 'epifamily',
30 : 'family',
27 : 'subfamily',
26 : 'supertribe',
25 : 'tribe',
24 : 'subtribe',
20 : 'genus',
19 : 'genushybrid', # changed, was same as genus in iNaturalist
15 : 'subgenus',
13 : 'section',
12 : 'subsection',
11 : 'complex',
10 : 'species',
9 : 'hybrid', # changed, was same as species in iNaturalist
5 : 'subspecies',
4 : 'variety', # changed, was same as subspecies in iNaturalist
3 : 'form', # changed, was same as subspecies in iNaturalist
2 : 'infrahybrid' # changed, was same as subspecies in iNaturalist
}
# maps rank name to numeric rank-level
gName2RankLevel = {}
for key, value in gRankLevel2Name.items():
    gName2RankLevel[value] = key
KINGDOM_RANK_LEVEL = gName2RankLevel['kingdom']
def get_rank_level(rank):
    assert rank in gName2RankLevel
    return gName2RankLevel[rank]
def get_rank_name(rank_level, default_name = 'clade'):
    return gRankLevel2Name[rank_level] if rank_level in gRankLevel2Name \
        else default_name
@dataclass(frozen=True)
class Taxon:
    id : int
    parent_id : int
    name : str
    rank_level: float
# iNaturalist taxa, only loaded when a taxonomic tree needs
# to be computed from a label file.
gName2Taxa: Dict[str,List[Taxon]] = {}
"maps taxon name to list of taxa"
gId2Taxon: Dict[int,Taxon] = {}
"maps taxon id to taxon"
def load_inat_taxonomy():
    "Load all iNaturalist taxa from file 'taxa.csv'."
    global gName2Taxa
    global gId2Taxon

    if gName2Taxa and gId2Taxon:
        return True # already loaded

    print('Loading iNaturalist taxonomy...')
    start_time = time.time()
    gName2Taxa = {}
    gId2Taxon = {}

    try:
        with zipfile.ZipFile(INAT_TAXONOMY, 'r') as zf:
            with zf.open('taxa.csv', 'r') as zfile:
                with io.TextIOWrapper(zfile, encoding = 'latin-1') as csvfile:
                    reader = csv.DictReader(csvfile)
                    for row in reader:
                        id = int(row['id'])
                        parent_id = row['parentNameUsageID'].split('/')[-1]
                        parent_id = int(parent_id) if parent_id else \
                            ROOT_TAXON_ID if id != ROOT_TAXON_ID else None
                        name = row['scientificName']
                        rank = row['taxonRank']
                        if not rank in gName2RankLevel:
                            # unknown rank, ask the iNaturalist API for its level
                            response = inat_api.get_taxa_by_id(id)
                            if response and 'results' in response:
                                rank_level = response['results'][0]\
                                                     ['rank_level']
                                gName2RankLevel[rank] = rank_level
                                if not rank_level in gRankLevel2Name:
                                    gRankLevel2Name[rank_level] = rank
                                print(f"Please add rank '{rank}' to gName2Rank"
                                      f"Level, numeric value {rank_level}.")
                            else:
                                gName2RankLevel[rank] = -1
                        rank_level = gName2RankLevel[rank]
                        inat_taxon = Taxon(id, parent_id, name, rank_level)
                        if name in gName2Taxa:
                            gName2Taxa[name].append(inat_taxon)
                        else:
                            gName2Taxa[name] = [inat_taxon]
                        assert not id in gId2Taxon
                        gId2Taxon[id] = inat_taxon
                        # progress indicator: a dot per 10,000 taxa
                        if len(gId2Taxon) % 10000 == 0:
                            print(f' {len(gId2Taxon):,} ' if len(gId2Taxon) %
                                  100000 == 0 else '.', end='')
                            sys.stdout.flush()

        assert ROOT_TAXON_ID in gId2Taxon
        print(f' {len(gId2Taxon):,}.')
        print(f'Loaded iNaturalist taxonomy of {len(gId2Taxon):,} taxa '
              f'in {time.time()-start_time:.1f} secs.')
        return True

    except Exception as e:
        print("Cannot load taxonomy 'taxa.csv' from archive "
              f"'{INAT_TAXONOMY}': {str(e)}.")
        gName2Taxa = {}
        gId2Taxon = {}
        return False

def beautify_common_name(name):
    "Capitalize (most) words in common name; helper function for common names."
    if name.endswith(' [paraphyletic]'):
        name = name[:-15] # fix dicots
    # word[:1] instead of word[0] tolerates empty parts, e.g. in 'a--b'
    name = '-'.join(word[:1].upper() + word[1:]
                    for word in name.split('-'))
    return ' '.join(word if word == 'and' or word.endswith('.')
                    else word[:1].upper() + word[1:]
                    for word in name.split())

def annotate_common_names(id2taxon, all_common_names = False):
    """
    Load the common names in our language, annotate taxonomic tree with them.
    The parameter `id2taxon' includes the taxa we are interested in.
    """
    start_time = time.time()
    language, _ = locale.getdefaultlocale()

    # getdefaultlocale() returns None when no locale can be determined
    if not language or language in ['C', 'C.UTF-8', 'POSIX']:
        language = 'en'

    if not os.path.isfile(INAT_TAXONOMY):
        print("Cannot load common names, archive "
              f"'{INAT_TAXONOMY}' does not exist.")
        return

    try:
        with zipfile.ZipFile(INAT_TAXONOMY, 'r') as zf:
            perfect_match = []
            other_matches = []

            # check all common names files for names in our language
            for fname in zf.namelist():
                if fname.startswith("VernacularNames-") and \
                   fname.endswith(".csv"):
                    with zf.open(fname, 'r') as zfile:
                        with io.TextIOWrapper(zfile, encoding='utf-8') as csvf:
                            reader = csv.DictReader(csvf)
                            # only the first row is inspected for the language
                            for row in reader:
                                lang = row['language']
                                if lang == language:
                                    perfect_match.append(fname) # en vs en
                                elif len(lang) < len(language) and \
                                     lang == language[:len(lang)]:
                                    other_matches.append(fname) # en vs en_US
                                break

            if not perfect_match and not other_matches:
                print(f"Cannot find common names for language '{language}'.")
                return

            # annotate the taxa with common names
            total_names = loaded_names = 0
            for fname in perfect_match + other_matches:
                print(f"Reading common names from '{INAT_TAXONOMY}' "
                      f"member '{fname}'...")
                with zf.open(fname, 'r') as zfile:
                    with io.TextIOWrapper(zfile, encoding='utf-8') as csvf:
                        reader = csv.DictReader(csvf)
                        for row in reader:
                            total_names += 1
                            id = int(row['id'])
                            if id in id2taxon and (all_common_names or \
                                    id2taxon[id].common_name is None):
                                loaded_names += 1
                                cname = beautify_common_name(row['vernacular'
                                                                 'Name'])
                                if id2taxon[id].common_name is None:
                                    id2taxon[id].common_name = cname
                                else:
                                    id2taxon[id].common_name += '; ' + cname

        print(f'Read {total_names:,} common names in '
              f'{time.time()-start_time:.1f} secs, loaded {loaded_names:,} '
              f'in language "{language}" for {len(id2taxon)-1:,} taxa.')

    except Exception as e:
        print(f"Cannot load common names from archive '{INAT_TAXONOMY}':"
              f" {str(e)}.")

def get_ancestors(id, ancestors):
    """
    Ancestors are a list of instances of Taxon; they are ordered from the
    kingdom down.
    """
    taxon = gId2Taxon[id]
    if taxon.rank_level < KINGDOM_RANK_LEVEL:
        get_ancestors(taxon.parent_id, ancestors)
    ancestors.append(taxon)

def lookup_id(name, desired_ranks = ['species', 'subspecies']):
    """
    Lookup by name, returns a pair, a Taxon and its ancestors, a list of
    Taxon. Desired_ranks are returned in case of ambiguities (duplicate names).
    """
    if not gName2Taxa:
        return None # taxonomy not loaded
    if name in gName2Taxa:
        taxa = gName2Taxa[name]
        if len(taxa) > 1:
            print(f"Warning: multiple taxa named '{name}':", end='')
            prefix = ' '
            taxon = None
            for t in taxa:
                rank = get_rank_name(t.rank_level)
                print(f"{prefix}{rank} {t.id}", end='')
                if rank in desired_ranks:
                    taxon = t
                prefix = ', '
            if not taxon:
                taxon = taxa[0]
            rank = get_rank_name(taxon.rank_level)
            print(f"; choosing {rank}.")
        else:
            taxon = taxa[0]
        ancestors = []
        if taxon.rank_level < KINGDOM_RANK_LEVEL:
            get_ancestors(taxon.parent_id, ancestors)
        return (taxon, ancestors)
    else:
        # likely taxon change, query iNat API
        response = inat_api.get_taxa({ 'q' : name,
                                       'all_names' : 'true',
                                       'per_page' : 200 })
        if not response:
            print(f"API lookup for name '{name}' failed.")
            return
        taxa = response['results']
        if len(taxa) > 1:
            # more than one taxon, find the one that used to have this name
            exact_matches = [taxon for taxon in taxa for nam in taxon['names']
                             if nam['locale'] == 'sci' and nam['name'] == name]
            if exact_matches:
                taxa = exact_matches
        ids = [taxon['id'] for taxon in taxa]
        taxa = set([gId2Taxon[id] for id in ids if id in gId2Taxon])
        if not taxa:
            return
        while len(taxa) > 1:
            # multiple taxa, find their common ancestor
            min_rank_level = min([taxon.rank_level for taxon in taxa])
            new_taxa = set()
            for taxon in taxa:
                new_taxon = gId2Taxon[taxon.parent_id] \
                            if taxon.rank_level == min_rank_level \
                            else taxon
                if not new_taxon in new_taxa:
                    new_taxa.add(new_taxon)
            taxa = new_taxa
        taxon = taxa.pop()
        ancestors = []
        if taxon.rank_level < KINGDOM_RANK_LEVEL:
            get_ancestors(taxon.parent_id, ancestors)
        return (taxon, ancestors)

if __name__ == '__main__':
    assert not 'Not a top-level Python module!'
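
The fallback branch of `lookup_id` reduces several candidate taxa to their common ancestor by repeatedly replacing the lowest-ranked candidates with their parents. The following is a minimal, self-contained sketch of that loop. The helper name `common_ancestor`, the toy tree, and its ids are made up for illustration and are not part of `inat_taxonomy.py`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass(frozen=True)          # frozen makes instances hashable for set()
class Taxon:
    id: int
    parent_id: Optional[int]
    name: str
    rank_level: float

def common_ancestor(taxa: Iterable[Taxon],
                    id2taxon: Dict[int, Taxon]) -> Taxon:
    """Replace the lowest-ranked taxa by their parents until one remains."""
    taxa = set(taxa)
    if not taxa:
        raise ValueError('need at least one taxon')
    while len(taxa) > 1:
        min_rank_level = min(t.rank_level for t in taxa)
        taxa = {id2taxon[t.parent_id] if t.rank_level == min_rank_level else t
                for t in taxa}
    return taxa.pop()

# Hypothetical toy tree; the ids are illustrative, not iNaturalist ids.
toy = [
    Taxon(1, None, 'Life', 100),
    Taxon(2, 1, 'Plantae', 70),
    Taxon(3, 2, 'Rosaceae', 30),
    Taxon(4, 3, 'Rosa', 20),
    Taxon(5, 3, 'Rubus', 20),
    Taxon(6, 4, 'Rosa canina', 10),
    Taxon(7, 5, 'Rubus idaeus', 10),
]
id2taxon = {t.id: t for t in toy}
print(common_ancestor([id2taxon[6], id2taxon[7]], id2taxon).name)
```

Because only the taxa at the current minimum rank level move up in each round, candidates of different ranks meet at the first taxon that contains all of them.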

View File

@@ -0,0 +1,3 @@
The .zip archive with the taxonomy and common names belongs in this directory.
Download https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip to this directory. Do not unpack this archive.
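
The archive can stay packed because Python's `zipfile` module reads members in place. Here is a minimal sketch of that access pattern, using a tiny in-memory archive as a stand-in for the real download. The helper name `read_csv_member` and the two sample rows are assumptions for illustration; only the member name `taxa.csv` and the column names come from `inat_taxonomy.py`.

```python
import csv
import io
import zipfile

def read_csv_member(archive, member, encoding='utf-8'):
    """Return the rows of one CSV member of a zip archive as dicts."""
    with zipfile.ZipFile(archive, 'r') as zf:
        with zf.open(member, 'r') as zfile:
            # zf.open() yields bytes; wrap it to get text for the csv module
            with io.TextIOWrapper(zfile, encoding=encoding) as csvfile:
                return list(csv.DictReader(csvfile))

# Build a stand-in archive in memory; the rows are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('taxa.csv',
                'id,scientificName,taxonRank\n'
                '1,Animalia,kingdom\n'
                '2,Plantae,kingdom\n')
buffer.seek(0)

rows = read_csv_member(buffer, 'taxa.csv')
print([row['scientificName'] for row in rows])
```

Passing a file name instead of the `BytesIO` buffer works the same way, since `zipfile.ZipFile` accepts either a path or a file-like object.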

View File

@@ -0,0 +1,4 @@
#!/bin/sh
rm -f inaturalist-taxonomy.dwca.zip
curl https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip \
     -o inaturalist-taxonomy.dwca.zip
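
On systems without `curl`, the same download could be done in Python. This sketch is an alternative under stated assumptions, not part of the repository: the function name `download_taxonomy` is hypothetical. Unlike the script above, it writes to a temporary file first, so a failed transfer leaves an existing archive untouched.

```python
import os
import shutil
import tempfile
import urllib.request

TAXONOMY_URL = ('https://www.inaturalist.org/taxa/'
                'inaturalist-taxonomy.dwca.zip')

def download_taxonomy(url=TAXONOMY_URL,
                      destination='inaturalist-taxonomy.dwca.zip'):
    """Download `url` to `destination`, replacing it only on success."""
    directory = os.path.dirname(os.path.abspath(destination))
    # Temporary file in the same directory so os.replace() stays on one
    # file system; note that mkstemp creates it with mode 0600.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix='.part')
    try:
        with os.fdopen(fd, 'wb') as tmp, \
             urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_name, destination)
    except Exception:
        os.remove(tmp_name)      # do not leave a partial download behind
        raise
    return destination

# Usage (performs a network request, hence commented out):
# download_taxonomy()
```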

537
third_party/nature-id/nature_id.py vendored Executable file
View File

@@ -0,0 +1,537 @@
#!/usr/bin/env python3

import numpy as np
from PIL import Image, ImageOps
import csv, sys, os, time

import inat_taxonomy

try:
    # try importing TensorFlow Lite first
    import tflite_runtime.interpreter as tflite
except Exception:
    try:
        # TensorFlow Lite not found, try to import full TensorFlow
        import tensorflow.lite as tflite
    except Exception:
        print('Error: TensorFlow Lite could not be loaded.', file=sys.stderr)
        print(' Follow instructions at https://www.tensorflow.org/lite/'
              'guide/python to install it.', file=sys.stderr)
        sys.exit(1)

# The directory where this Python script is located.
INSTALL_DIR = inat_taxonomy.INSTALL_DIR
# This directory contains models, label files, and taxonomy files.
CLASSIFIER_DIRECTORY = os.path.join(INSTALL_DIR, 'classifiers')
# These flags can be modified with command-line options.
scientific_names_only = False # only scientific names or also common names
label_scores_only = False # scores for labels or hierarchical
all_common_names = False # show only one or all common names
result_sz = 5 # result size (for label_scores_only)
# This class is used by class Taxonomy.
class Taxon:

    def __init__(self, taxon_id):
        self.taxon_id = taxon_id  # for internal lookups and iNat API calls
        self.rank_level = None    # taxonomic rank, e.g. species, genus, family
        self.name = None          # scientific name
        self.common_name = None   # common name or None
        self.children = []        # list of child taxa
        self.leaf_class_ids = []  # list of indices into scores; there
                                  # can be more than one when we use old models
                                  # whose taxa have since been lumped together

    def add_child(self, child_taxon):
        self.children.append(child_taxon)

    # get taxonomic rank as a string
    def get_rank(self):
        if self.taxon_id < 0: # pseudo-kingdom?
            assert self.rank_level == inat_taxonomy.KINGDOM_RANK_LEVEL
            return ''
        return inat_taxonomy.get_rank_name(self.rank_level)

    # get the name to display; customize here to show common names differently
    def get_name(self):
        if self.common_name:
            return f'{self.common_name} ({self.name})'
        else:
            return self.name

# This taxonomy is represented in terms of instances of class Taxon.
class Taxonomy:

    def __init__(self):
        # The taxonomy file may contain multiple trees, one for each kingdom.
        # In order to have a single tree for prediction, we add a node for
        # Life as the parent of all kingdoms. This will be the root of our tree.
        self.root = Taxon(inat_taxonomy.ROOT_TAXON_ID)
        self.root.name = inat_taxonomy.ROOT_NAME
        self.root.rank_level = inat_taxonomy.ROOT_RANK_LEVEL
        self.id2taxon = { self.root.taxon_id : self.root }
        self.idx2label = {}

    def reset(self):
        self.root.children = []
        self.id2taxon = { self.root.taxon_id : self.root }
        self.idx2label = {}

    def taxonomy_available(self):
        return len(self.root.children) > 0

    def read_taxonomy(self, filename):
        start_time = time.time()
        self.reset()
        with open(filename, newline='', encoding='latin-1') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                if 'id' in row: # this is a label file
                    self.idx2label[int(row['id'])] = row['name']
                    continue

                taxon_id = int(row['taxon_id'])
                if taxon_id in self.id2taxon:
                    taxon = self.id2taxon[taxon_id] # inserted earlier as parent
                else:
                    self.id2taxon[taxon_id] = taxon = Taxon(taxon_id)
                taxon.name = row['name']
                if row['rank_level'].isdigit():
                    taxon.rank_level = int(row['rank_level'])
                else:
                    taxon.rank_level = float(row['rank_level'])
                if len(row['leaf_class_id']):
                    for leaf_class_id in row['leaf_class_id'].split(';'):
                        leaf_class_id = int(leaf_class_id)
                        taxon.leaf_class_ids.append(leaf_class_id)
                        self.idx2label[leaf_class_id] = taxon.name
                if len(row['parent_taxon_id']):
                    parent_taxon_id = int(row['parent_taxon_id'])
                else:
                    parent_taxon_id = self.root.taxon_id
                if not parent_taxon_id in self.id2taxon:
                    self.id2taxon[parent_taxon_id] = Taxon(parent_taxon_id)
                self.id2taxon[parent_taxon_id].add_child(taxon)

        if not self.taxonomy_available():
            # We parsed a label file; unless told otherwise, we use these
            # labels to build a taxonomic tree.
            print(f"Read {len(self.idx2label):,} labels from '{filename}' "
                  f"in {time.time() - start_time:.1f} secs.")
            if not label_scores_only:
                self.compute_taxonomic_tree()
                if self.taxonomy_available():
                    self.write_taxonomic_tree(filename.replace('labelmap',
                                                               'taxonomy'))
        else:
            print(f"Read taxonomy from '{filename}' in "
                  f"{time.time() - start_time:.1f} secs: "
                  f"{len(self.id2taxon) - 1:,} taxa including "
                  f"{len(self.idx2label):,} leaf taxa.")

        if not scientific_names_only and self.taxonomy_available():
            inat_taxonomy.annotate_common_names(self.id2taxon, all_common_names)

        if label_scores_only:
            self.annotate_labels_with_common_names()
            del self.id2taxon # not needed anymore

    # augment labels with common names
    def annotate_labels_with_common_names(self):
        for taxon in self.id2taxon.values():
            for leaf_class_id in taxon.leaf_class_ids:
                self.idx2label[leaf_class_id] = taxon.get_name()

    # write one row to taxonomy file
    def write_row(self, writer, taxon, parent_taxon_id):
        writer.writerow([parent_taxon_id, taxon.taxon_id, taxon.rank_level,
                         ';'.join([str(id) for id in taxon.leaf_class_ids]),
                         taxon.name])
        for child in taxon.children:
            self.write_row(writer, child, taxon.taxon_id)

    # write taxonomy file
    def write_taxonomic_tree(self, filename):
        try:
            with open(filename, 'w', newline='', encoding='latin-1') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow(['parent_taxon_id', 'taxon_id', 'rank_level',
                                 'leaf_class_id', 'name'])
                for child in self.root.children:
                    self.write_row(writer, child, '')
            print(f"Taxonomy written to file '{filename}'.")
        except Exception as e:
            print(f"Failure writing taxonomy to file '{filename}':", str(e))
            try:
                os.remove(filename)
            except Exception:
                pass

    # Called after loading label file for Google's AIY Vision Kit.
    # Adds all the labels' direct and indirect ancestors to compute
    # the taxonomic tree.
    def compute_taxonomic_tree(self):
        global label_scores_only
        if not inat_taxonomy.load_inat_taxonomy():
            label_scores_only = True
            return

        start_time = time.time()
        new_id = 0 # id's we add on the fly for pseudo-kingdoms
        for idx, name in self.idx2label.items():
            inat_taxa = inat_taxonomy.lookup_id(name)
            if not inat_taxa:
                print(f"Info: Taxon for label '{name}' not found, "
                      "inserting as pseudo-kingdom.")
                new_id -= 1
                taxon_id = new_id
                self.id2taxon[taxon_id] = taxon = Taxon(taxon_id)
                taxon.rank_level = inat_taxonomy.KINGDOM_RANK_LEVEL
                taxon.name = name
                taxon.leaf_class_ids = [idx]
                self.root.add_child(taxon)
                continue

            inat_taxon, ancestors = inat_taxa
            if name != inat_taxon.name:
                print(f"Info: Taxon '{name}' changed to "
                      f"'{inat_taxon.name}', iNat taxa "
                      f"id {inat_taxon.id}.")

            # ancestor taxa
            prev_ancestor = self.root
            for ancestor in ancestors:
                if ancestor.id in self.id2taxon:
                    prev_ancestor = self.id2taxon[ancestor.id]
                else:
                    self.id2taxon[ancestor.id] = ancestor_taxon = Taxon(ancestor.id)
                    ancestor_taxon.name = ancestor.name
                    ancestor_taxon.rank_level = ancestor.rank_level
                    prev_ancestor.add_child(ancestor_taxon)
                    prev_ancestor = ancestor_taxon

            # this taxon
            if inat_taxon.id in self.id2taxon:
                taxon = self.id2taxon[inat_taxon.id]
                assert taxon.name == inat_taxon.name
                assert taxon.rank_level == inat_taxon.rank_level
            else:
                self.id2taxon[inat_taxon.id] = taxon = Taxon(inat_taxon.id)
                taxon.name = inat_taxon.name
                taxon.rank_level = inat_taxon.rank_level
                prev_ancestor.add_child(taxon)
            taxon.leaf_class_ids.append(idx)

        print("Computed taxonomic tree from labels in "
              f"{time.time() - start_time:.1f} secs: {len(self.id2taxon)-1:,} "
              f"taxa including {len(self.idx2label):,} leaf taxa.")

    # propagate scores to taxon and all below
    def assign_scores(self, taxon, scores):
        taxon.score = 0.0
        for leaf_class_id in taxon.leaf_class_ids:
            taxon.score += scores[leaf_class_id]
        for child in taxon.children:
            self.assign_scores(child, scores)
            taxon.score += child.score

    # Returns list of 4-tuples (score, taxon_id, taxonomic rank, name)
    # ordered by taxonomic rank from kingdom down to e.g. species. The
    # name is the scientific name, preceded by the common name if one
    # is known (see Taxon.get_name).
    # Returns pairs (score, scientific name) if label_scores_only
    # is set.
    def prediction(self, scores):
        if label_scores_only:
            # return list of pairs (score, scientific name)
            total = np.sum(scores)
            indices = np.argpartition(scores, -result_sz)[-result_sz:]
            results = [(scores[i] / total, self.idx2label[i])
                       for i in indices if scores[i] != 0]
            results.sort(reverse=True)
            return results

        # annotate all taxa across the hierarchy with scores.
        self.assign_scores(self.root, scores)

        # return one hierarchical path guided by scores
        path = []
        taxon = self.root
        while taxon.children:
            # Find child with highest score.
            best_child = None
            for child in taxon.children:
                if not best_child or child.score > best_child.score:
                    best_child = child
            # Truncate path if all the other children combined are better
            if best_child.score < 0.5 * taxon.score:
                break
            path.append((best_child.score / self.root.score,
                         best_child.taxon_id, best_child.get_rank(),
                         best_child.get_name()))
            taxon = best_child
        return path

#
# Offline image classification.
#
class OfflineClassifier:

    def __init__(self, filenames):
        self.min_pixel_value = 0.0
        self.max_pixel_value = 255.0

        if os.path.split(filenames[0])[1] in ['optimized_model.tflite',
                                              'optimized_model_v1.tflite']:
            self.min_pixel_value = -1.0
            self.max_pixel_value = 1.0

        # Load TFLite model and allocate tensors.
        self.mInterpreter = tflite.Interpreter(model_path=filenames[0])
        self.mInterpreter.allocate_tensors()

        # Get input and output tensors.
        self.mInput_details = self.mInterpreter.get_input_details()
        self.mOutput_details = self.mInterpreter.get_output_details()

        # Read labels or taxonomy
        self.mTaxonomy = Taxonomy()
        self.mTaxonomy.read_taxonomy(filenames[1])

    def classify_image(self, image_filename):
        start_time = time.time()
        try:
            img = Image.open(image_filename)
        except Exception:
            print(f"Error: cannot load image '{image_filename}'.")
            return []

        if img.mode != 'RGB':
            print(f"Error: image '{image_filename}' is of mode '{img.mode}',"
                  " only mode RGB is supported.")
            return []

        # rotate image if needed as it may contain EXIF orientation tag
        img = ImageOps.exif_transpose(img)

        model_size = tuple(self.mInput_details[0]['shape'][1:3])
        # square target shape expected by crop code below
        assert model_size[0] == model_size[1]
        if img.size != model_size:
            # We need to scale and maybe want to crop image.
            width, height = img.size
            if width != height:
                # Before scaling, we crop image to square shape.
                left = 0
                right = width
                top = 0
                bottom = height
                if width < height:
                    top = (height - width) / 2
                    bottom = top + width
                else:
                    left = (width - height) / 2
                    right = left + height
                img = img.crop((left, top, right, bottom))

            # scale image
            img = img.resize(model_size)
            #img.show()

        # pixels are in range 0 ... 255, turn into numpy array
        input_data = np.array([np.array(img, self.mInput_details[0]['dtype'])])
        if self.mInput_details[0]['dtype'] == np.float32:
            # map 0 ... 255 to min_pixel_value ... max_pixel_value
            input_data *= (self.max_pixel_value - self.min_pixel_value) / 255.0
            input_data += self.min_pixel_value

        self.mInterpreter.set_tensor(self.mInput_details[0]['index'],
                                     input_data)
        self.mInterpreter.invoke()
        output_data = self.mInterpreter.get_tensor(self.mOutput_details[0]
                                                   ['index'])
        path = self.mTaxonomy.prediction(output_data[0])
        print()
        print(f"Classification of '{image_filename}' took "
              f"{time.time() - start_time:.1f} secs.")
        return path

# Returns a dictionary that maps available classifiers to a pair of filenames.
def get_installed_models():
    if not os.path.isdir(CLASSIFIER_DIRECTORY):
        print("Cannot load classifiers, directory "
              f"'{CLASSIFIER_DIRECTORY}' does not exist.")
        sys.exit(1)

    choices = [ 'birds', 'insects', 'plants']
    models = {}
    for filename in os.listdir(CLASSIFIER_DIRECTORY):
        model = None
        if filename.endswith(".csv"):
            if filename == 'taxonomy_v2_13.csv':
                model = 'v2_13'
            elif filename == 'taxonomy_v1.csv':
                model = 'Seek'
            else:
                for m in choices:
                    if filename.find(m) != -1:
                        model = m
                        break
            if model:
                filename = os.path.join(CLASSIFIER_DIRECTORY, filename)
                if model in models:
                    # a taxonomy file takes precedence over a label map
                    if not models[model][1] or models[model][1].\
                       endswith('labelmap.csv'):
                        models[model] = (models[model][0], filename)
                else:
                    models[model] = (None, filename)
        elif filename.endswith(".tflite"):
            if filename == 'optimized_model_v2_13.tflite':
                model = 'v2_13'
            elif filename == 'optimized_model_v1.tflite':
                model = 'Seek'
            else:
                for m in choices:
                    if filename.find(m) != -1:
                        model = m
                        break
            if model:
                filename = os.path.join(CLASSIFIER_DIRECTORY, filename)
                if model in models:
                    models[model] = (filename, models[model][1])
                else:
                    models[model] = (filename, None)

    delete_elements = [] # postponed deletion, cannot delete during iteration
    for name, files in models.items():
        if not files[0] or not files[1]:
            tf_missing = ".csv file but no .tflite file"
            csv_missing = ".tflite file but no .csv file"
            print("Installation issue: Excluding incomplete classifier for"
                  f" '{name}': {tf_missing if files[1] else csv_missing}.")
            delete_elements.append(name)

    for element in delete_elements:
        del models[element]

    if not models:
        print(f"No classifiers found in directory '{CLASSIFIER_DIRECTORY}'; "
              "follow instructions in "
              f"'{os.path.join(CLASSIFIER_DIRECTORY,'README.md')}'"
              " to install them.", file=sys.stderr)
        sys.exit(1)
    return models

def identify_species(classifier, filename):
    result = classifier.classify_image(filename)
    if result:
        # Print list of tuples (score, taxon id, taxonomic rank, name)
        # ordered by taxonomic rank from kingdom down to species.
        for entry in result:
            if len(entry) == 2: # labels only
                print(f'{100 * entry[0]:5.1f}% {entry[1]}')
                continue
            print(f'{100 * entry[0]:5.1f}% {entry[2]:11s} {entry[3]}')

# command-line parsing

models = get_installed_models()

def model_parameter_check(arg):
    if not arg in models:
        msg = f"Model '{arg}' not available. Available "\
              f"model{'' if len(models)==1 else 's'}:"
        prefix = ' '
        for m in models:
            msg += f"{prefix}'{m}'"
            prefix = ', '
        msg += '.'
        raise argparse.ArgumentTypeError(msg)
    return arg

def result_size_check(arg):
    if arg.isdigit() and int(arg) > 0 and int(arg) <= 100:
        return int(arg)
    raise argparse.ArgumentTypeError(f"'{arg}' is not a number "
                                     "between 1 and 100.")

def file_directory_check(arg):
    if os.path.isdir(arg) or os.path.isfile(arg):
        return arg
    raise argparse.ArgumentTypeError(f"'{arg}' is not a file or directory.")

#
# Identify species for picture files and directories given as command line args
#
if __name__ == '__main__':
    import argparse

    preferred1 = 'v2_13' # default if this model is available
    preferred2 = 'Seek'  # second preference

    parser = argparse.ArgumentParser()
    if len(models) == 1 or preferred1 in models or preferred2 in models:
        default_model = preferred1 if preferred1 in models else \
                        preferred2 if preferred2 in models else \
                        next(iter(models))
        parser.add_argument("-m", "--model", type=model_parameter_check,
                            default=default_model,
                            help="Model to load to identify organisms.")
    else: # no default for classification model
        parser.add_argument("-m", "--model", type=model_parameter_check,
                            required=True,
                            help="Model to load to identify organisms.")
    parser.add_argument('-a', '--all_common_names', action="store_true",
                        help='Show all common names and not just one.')
    parser.add_argument('-l', '--label_scores_only', action="store_true",
                        help='Compute and display only label scores, '
                        'do not propagate scores up the hierarchy.')
    parser.add_argument('-s', '--scientific_names_only', action="store_true",
                        help='Only use scientific names, do not load common '
                        'names.')
    parser.add_argument('-r', '--result_size', type=result_size_check,
                        default=result_sz, help='Number of labels and their '
                        'scores to report in results.')
    parser.add_argument('files_dirs', metavar='file/directory',
                        type=file_directory_check, nargs='+',
                        help='Image files or directories with images.')
    args = parser.parse_args()

    scientific_names_only = args.scientific_names_only
    label_scores_only = args.label_scores_only
    all_common_names = args.all_common_names
    result_sz = args.result_size

    # make classifier instance
    classifier = OfflineClassifier(models[args.model])

    # process photos
    for arg in args.files_dirs:
        if os.path.isfile(arg):
            identify_species(classifier, arg)
        elif os.path.isdir(arg):
            for file in os.listdir(arg):
                ext = os.path.splitext(file)[1].lower()
                if ext in ['.jpg', '.jpeg', '.png']:
                    identify_species(classifier, os.path.join(arg, file))
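
The hierarchical prediction in `Taxonomy.prediction` can be illustrated on a toy tree. Leaf scores are summed upward, and the path follows the best child at each level for as long as that child holds at least half of its parent's score. The sketch below is a simplified stand-in rather than the code above: the class `Node`, the function `best_path`, the tree, and the scores are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    leaf_class_ids: List[int] = field(default_factory=list)
    children: List['Node'] = field(default_factory=list)
    score: float = 0.0

def assign_scores(node, scores):
    """Set node.score to the sum of all leaf scores at or below node."""
    node.score = sum(scores[i] for i in node.leaf_class_ids)
    for child in node.children:
        assign_scores(child, scores)
        node.score += child.score

def best_path(root, scores):
    """Follow the highest-scoring child while it keeps a majority."""
    assign_scores(root, scores)
    path, node = [], root
    while node.children:
        best = max(node.children, key=lambda child: child.score)
        if best.score < 0.5 * node.score:
            break  # all the other children combined are better
        path.append((best.score / root.score, best.name))
        node = best
    return path

# Invented toy tree with four leaf classes (score indices 0 to 3).
root = Node('Life', children=[
    Node('Plantae', children=[
        Node('Rosa', children=[Node('Rosa canina', [0]),
                               Node('Rosa rugosa', [1]),
                               Node('Rosa gallica', [2])]),
        Node('Rubus idaeus', [3]),
    ]),
])

for score, name in best_path(root, [0.25, 0.25, 0.25, 0.25]):
    print(f'{100 * score:5.1f}% {name}')
```

With four equal scores, no single rose species holds half of the genus score, so the path stops at `Rosa`. When one species clearly dominates, for example with scores `[0.5, 0.125, 0.125, 0.25]`, the path continues down to that species.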

Binary file not shown. Size: 196 KiB

Binary file not shown. Size: 399 KiB

Binary file not shown. Size: 257 KiB

Binary file not shown. Size: 254 KiB

Binary file not shown. Size: 189 KiB

Binary file not shown. Size: 168 KiB

Binary file not shown. Size: 198 KiB

View File

@@ -0,0 +1,3 @@
Pillow
requests
tflite-runtime