Introduction¶

To use BriteETL in a project:

import brite_etl

FrameSet¶

For an overview on Frames & FrameSets, please see Frames & Frame Sets.

To start we’ll create a FrameSet and set our frame data source.

from brite_etl.core.io.frame_sources import CsvSource

# Create a frameset to work with...
cfm = brite_etl.lib.FrameSet('cfm')

#Set the source of our csvs (can also pass BriteDataFrameSource)...
cfm.set_data_sources(source=CsvSource('/tmp/df_cache_root'), prepared_source=CsvSource('/tmp/df_prep_cache_root'))

Now that have our frameset ready, we can start using it! This call will return PropertyItems, populated with the data from the CSV source that we did above.

pi = cfm.frames.get('property_items')
pi.df # This is the actual pandas.DataFrame

Frame Operations¶

Functions that are frame-specific can be called directly, because they should be defined in the Frames class itself.

cfm.frames.get('prepared.claims').function_that_only_applies_to_prepared_claims_and_nothing_else()

Operations (Not frame-specific)¶

Universal functions that are not frame-specific, but are still only used on one frame at a time can be called like this.

Note that in this example, I’m getting the revisions dataframe from the brite_etl.FrameSet, but you can also manually read the csv and pass that if you’d prefer (see bottom of page for example).

from brite_etl.core.operations import hash_cols
revs = cfm.frames.get('revisions').df

result = hash_cols(revs, cols=['policyId', 'revisionId'])

You can also get the frame chain, which will chain the frame.df for you along multiple functions, without having to import them.

_rev = cfm.frames.get('revisions').chain

result = _rev.hash_cols(cols=['policyId', 'revisionId']).another_universal_function().value()

Computations¶

Computations are basically mini-reports. They take multiple frames, do some stuff to them, then return a pandas DataFrame.

To call directly:

from brite_etl.core.computations import get_item_transactions

_frames = {
    'revisions': cfm.frames.get('revisions'),
    'property_items': cfm.frames.get('property_items'),
    'revision_items': cfm.frames.get('revision_items'),
    'prepared': {
        'accounting': cfm.frames.get('prepared')
    }
}

item_trans = get_item_transactions(_frames)

Or, be cool and chain the whole frameset. The frames needed will be fetched and resolved automatically. Don’t even have to import the function you’re calling:

_cfm = cfm.chain
item_trans = _cfm.get_item_transactions().value()

Quick Note About Frame Sets¶

Every frame stored within a specific frameset is a singleton.

_cfm = cfm.chain
item_trans1 = _cfm.get_item_transactions().value()

rev = cfm.frames.get('revisions')

# Do a bunch of stuf to rev...

item_trans2 = _cfm.get_item_transactions().value()

item_trans1 == item_trans2 # False!!!

This is done to ensure the frames inside of a frameset are exactly what you want them to be.

If you want to get a fresh copy of the frame, with data straight from the csv:

new_rev = cfm.frames.get('revisions', fresh=True)

You also don’t have to use a frameset if you don’t want to:

# inside a jupyter report...
from reports.utils import BriteDataFrame
from brite_etl.frames import Policies

bdf = BriteDataFrame()
df = bdf.get_dataframe('policies')

policies = Policies(policies_df)

_policies = polices.chain # Can still chain universal operations, without having to import brite_etl as a whole