This function ingests optionally georegistered polygon labels in geojson
format alongside image(s) and generates .json files per the
`COCO dataset specification`_. Some models, like many Mask R-CNN
implementations, require labels to be in this format. The function assumes
you're providing image file(s) and geojson file(s) to create the dataset.
If the number of images and geojsons are both > 1 (e.g. with a SpaceNet
dataset), you must provide a regex pattern to extract matching substrings
to match images to label files.

.. _COCO dataset specification: http://cocodataset.org/

Arguments
---------
image_src : :class:`str` or :class:`list` or :class:`dict`
    Source image(s) to use in the dataset. This can be::

        1. a string path to an image,
        2. the path to a directory containing a bunch of images,
        3. a list of image paths,
        4. a dictionary corresponding to COCO-formatted image records, or
        5. a string path to a COCO JSON containing image records.

    If a directory, the `recursive` flag will be used to determine whether
    or not to descend into sub-directories.
label_src : :class:`str` or :class:`list`
    Source labels to use in the dataset. This can be a string path to a
    geojson, the path to a directory containing multiple geojsons, or a
    list of geojson file paths. If a directory, the `recursive` flag will
    determine whether or not to descend into sub-directories.
output_path : str, optional
    The path to save the JSON-formatted COCO records to. If not provided,
    the records will only be returned as a dict, and not saved to file.
image_ext : str, optional
    The string to use to identify images when searching directories. Only
    has an effect if `image_src` is a directory path. Defaults to
    ``".tif"``.
matching_re : str, optional
    A regular expression pattern to match filenames between `image_src`
    and `label_src` if both are directories of multiple files. This has no
    effect if those arguments do not both correspond to directories or
    lists of files.
    Will raise a ``ValueError`` if multiple files are provided for both
    `image_src` and `label_src` but no `matching_re` is provided.
category_attribute : str, optional
    The name of an attribute in the geojson that specifies which category
    a given instance corresponds to. If not provided, it's assumed that
    only one class of object is present in the dataset, which will be
    termed ``"other"`` in the output json.
score_attribute : str, optional
    The name of an attribute in the geojson that specifies the prediction
    confidence of a model.
preset_categories : :class:`list` of :class:`dict`, optional
    A pre-set list of categories to use for labels. These categories
    should be formatted per `the COCO category specification`_.
    Example::

        [{'id': 1, 'name': 'Fighter Jet', 'supercategory': 'plane'},
         {'id': 2, 'name': 'Military Bomber', 'supercategory': 'plane'},
         ... ]

include_other : bool, optional
    If set to ``True``, and `preset_categories` is provided, objects that
    don't fall into the specified categories will not be removed from the
    dataset. They will instead be passed into a category named ``"other"``
    with its own associated category ``id``. If ``False``, objects whose
    categories don't match a category from `preset_categories` will be
    dropped.
info_dict : dict, optional
    A dictionary with the following key-value pairs::

        - ``"year"``: :class:`int` year of creation
        - ``"version"``: :class:`str` version of the dataset
        - ``"description"``: :class:`str` string description of the dataset
        - ``"contributor"``: :class:`str` who contributed the dataset
        - ``"url"``: :class:`str` URL where the dataset can be found
        - ``"date_created"``: :class:`datetime.datetime` when the dataset
          was created

license_dict : dict, optional
    A dictionary containing the licensing information for the dataset,
    with the following key-value pairs::

        - ``"name"``: :class:`str` the name of the license.
        - ``"url"``: :class:`str` a link to the dataset's license.

    *Note*: This implementation assumes that all of the data uses one
    license. If multiple licenses are provided, the image records will not
    be assigned a license ID.
recursive : bool, optional
    If `image_src` and/or `label_src` are directories, setting this flag
    to ``True`` will cause solaris to descend into subdirectories to find
    files. By default, solaris does not traverse the directory tree.
explode_all_multipolygons : bool, optional
    Explode the multipolygons into individual geometries using
    ``sol.utils.geo.split_multi_geometries``. Be sure to inspect which
    geometries are multigeometries, since the individual geometries within
    them may represent artifacts rather than true labels.
remove_all_multipolygons : bool, optional
    Filters MultiPolygons and GeometryCollections out of each tile
    geodataframe. Alternatively, you can manually edit each geometry into
    a single polygon before converting to COCO format.
verbose : int, optional
    Verbose text output.
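
Examples
--------
The sketch below illustrates two behaviours described above: pairing image
and label files through `matching_re`, and resolving category IDs from
`preset_categories` with `include_other`. It is not the solaris
implementation. The helpers ``pair_by_regex`` and ``resolve_category_id``,
the file names, and the choice of giving ``"other"`` the next unused ID are
all assumptions made for illustration.

```python
import re
from pathlib import Path


def pair_by_regex(image_paths, label_paths, matching_re):
    """Pair images and labels whose file names share the same regex match.

    Hypothetical helper for illustration only; not part of solaris.
    """
    pattern = re.compile(matching_re)

    def key(path):
        # Search only the file name, not the directory part of the path.
        match = pattern.search(Path(path).name)
        return match.group(0) if match else None

    # Index label files by their matched substring, skipping non-matches.
    labels_by_key = {}
    for label in label_paths:
        k = key(label)
        if k is not None:
            labels_by_key[k] = label

    # Keep only the images whose matched substring also indexes a label.
    pairs = {}
    for image in image_paths:
        k = key(image)
        if k is not None and k in labels_by_key:
            pairs[image] = labels_by_key[k]
    return pairs


def resolve_category_id(name, preset_categories, include_other=False):
    """Return the category id for `name`, or None if it should be dropped.

    Hypothetical helper. Assumption: "other" takes the next unused id.
    """
    ids_by_name = {cat["name"]: cat["id"] for cat in preset_categories}
    if name in ids_by_name:
        return ids_by_name[name]
    if include_other:
        return max(ids_by_name.values()) + 1
    return None


# Made-up file names in a SpaceNet-like naming scheme.
images = ["imgs/RGB_img10.tif", "imgs/RGB_img11.tif", "imgs/RGB_img12.tif"]
labels = ["labels/buildings_img11.geojson", "labels/buildings_img10.geojson"]
pairs = pair_by_regex(images, labels, r"img\d+")

categories = [
    {"id": 1, "name": "Fighter Jet", "supercategory": "plane"},
    {"id": 2, "name": "Military Bomber", "supercategory": "plane"},
]
known_id = resolve_category_id("Fighter Jet", categories)
other_id = resolve_category_id("Glider", categories, include_other=True)
dropped = resolve_category_id("Glider", categories, include_other=False)
```

In this sketch ``imgs/RGB_img12.tif`` is left unpaired because no label file
name yields the substring ``img12``; whether solaris skips such images or
raises an error is not stated above.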