Row Generators¶
This module provides services for creating references to data files, downloading those files, and iterating through them as a sequence of rows. For instance, you can define a URL to a CSV file within a ZIP archive on the web as:
After constructing this URL, the module provides an interface to access the CSV file as rows, downloading the archive, caching it, and extracting the inner CSV file.
Additionally, the module provides services for transforming data during iteration, to set default value cast to specific types, extract components of dates, and many other transformations.
The components of this module include:
- Application Urls: exensible URLs for referencing row data
- Row Generators: Objects that yield rows that are referenced by an Application Url
- Row Transforms: Construct pipelines that transform the value of columns while rows are being iterated
- Value Types: Rich types for column values, allowing sophisticated interactions and transformations with Row Transforms.
- Row Pipes: Composable functions that alter and filter entire rows.
Example¶
from rowgenerators import parse_app_url
from os.path import exists
url_str = 'http://public.source.civicknowledge.com/'\
'example.com/sources/test_data.zip#simple-example.csv'
url = parse_app_url(url_str) # Parse a string to an Application url
resource_url = url.get_resource() # Download the .zip file
target_url = resource_url.get_target() # Extract `file.csv` from the .zip
assert(target_url.path) # The path to file.csv
generator = target_url.generator
rows = list(generator) # Fetch all of the rows. First row is header
# Iterate rows as dicts
float_sum = sum(float(row['float']) for row in generator.iter_dict)
# Iterate with RowProxy objects
int_sum = sum(int(row.int) for row in generator.iter_rp)
print(len(rows), float_sum, int_sum)
Install¶
Use pip:
$ pip install rowgenerators
Or, from github:
$ pip install git+https://github.com/CivicKnowledge/rowgenerators.git