Resolving and Extending Urls

The primary interface is appurls.parse_url`(), which will find and construct a appurl.url.Url for a string. The function will select a Url class using two selection criteria. The first is the appurl.urls entry point. Here is the entrypoint configuration for the appurl package:

  entry_points = {
    'appurl.urls' : [ "\* = appurl.url:Url",
        #
        "http: = appurl.web.web:WebUrl",
        "https: = appurl.web.web:WebUrl",
        "s3: = appurl.web.s3:S3Url",
        "socrata+ = appurl.web.socrata:SocrataUrl",
        #
        # Archive Urls
        ".zip = appurl.archive.zip:ZipUrl",
        #
        # File Urls
        ".csv = appurl.file.csv:CsvFileUrl",
        "file: = appurl.file.file:FileUrl",
    ]
}

The key of each configuration like is a string the indicate the first round of matching, and the value is the class to use for that matcher. The match strings are:

  • ‘proto:’ The URL protocol, which is based on either the URL scheme or scheme extension.
  • ‘.format’ The target file format
  • ‘schemeext+’ The URL scheme extension.
  • ‘*’ Any URL.

Because these configurations are in the entry pointy, you can extend the AppUrls by including these entry points in Python packages.

The rowgenerators.appurls.parse_url collects all of the URL classes that pass the initial matchers and sorts them by the Url.match_priority class property. Then, it iterates through the matched classes in priority order, calling the Url.match method. The function contructs and returns the first match.