demand_acep package

Submodules

demand_acep.demand_acep module

This module contains code for the demand_acep package. More documentation to come.

This is a test docstring for the whole module

demand_acep.demand_acep.build_interpolation(y_values, n_val)[source]

This function performs the actual 1-d interpolation. If the number of consecutive missing points is less than 3, a linear interpolation is used; otherwise, a cubic interpolation is used.

Parameters:
y_values :

y_values are the values on which the interpolation function is built, that is, y_values = f(x).

n_val :

n_val is the number of consecutive missing points that need to be filled.

Returns:
y_interp

Array of interpolated values equal in length to the supplied number of missing data points (n_val).
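A minimal sketch of the linear-vs-cubic rule described above, using scipy's interp1d. The helper name, the assumption that the gap sits in the middle of the known samples, and the example values are illustrative, not the package's actual implementation.

import numpy as np
from scipy.interpolate import interp1d

def sketch_build_interpolation(y_values, n_val):
    # Known samples are assumed to sit on either side of a gap of n_val points.
    y_values = np.asarray(y_values, dtype=float)
    half = len(y_values) // 2
    x_known = np.concatenate([np.arange(half),
                              np.arange(half + n_val, len(y_values) + n_val)])
    x_missing = np.arange(half, half + n_val)
    kind = "linear" if n_val < 3 else "cubic"   # rule stated in the docstring
    f = interp1d(x_known, y_values, kind=kind)
    return f(x_missing)                         # length n_val, as documented

print(sketch_build_interpolation([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], n_val=2))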

demand_acep.demand_acep.compute_interpolation(df)[source]

This function imputes missing measurement data (NaN) in a series using 1-d interpolation.

Parameters:
df :

df is a pandas series containing missing measurement values.

Returns:
Series

Filled pandas series with no missing values.
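A hedged usage sketch, assuming the package is importable under this path; the DatetimeIndex and example values are assumptions for illustration.

import numpy as np
import pandas as pd
from demand_acep.demand_acep import compute_interpolation

# Example series with two missing measurements (NaN).
idx = pd.date_range("2018-07-02 08:00", periods=6, freq="1T")
series = pd.Series([10.0, np.nan, np.nan, 13.0, 14.0, 15.0], index=idx)

filled = compute_interpolation(series)   # series with the NaN values imputed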

demand_acep.demand_acep.data_impute(impute_df)[source]

This function imputes missing measurements in a dataframe using 1-d interpolation. If the number of consecutive missing points is less than 3, a linear interpolation is used; otherwise, a cubic interpolation is used.

Parameters:
impute_df :

impute_df can either be a dataframe or a dictionary of dataframes containing missing measurement values.

Returns:
Dataframe

Filled pandas dataframe with no missing values.
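A hedged usage sketch, assuming the package is importable; the column name, dictionary key, and example values are hypothetical.

import numpy as np
import pandas as pd
from demand_acep.demand_acep import data_impute

idx = pd.date_range("2018-07-02 08:00", periods=5, freq="1T")
df = pd.DataFrame({"Voltage": [120.1, np.nan, np.nan, 119.8, 120.0]}, index=idx)

filled_df = data_impute(df)                 # a single dataframe
filled_dict = data_impute({"MeterA": df})   # or a dictionary of dataframes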

demand_acep.demand_acep.data_resample(df, sample_time='1T')[source]

This function downsamples a sample-time indexed pandas dataframe containing measurement channel values, based on the sample time supplied. It takes the mean of the values within each resolution interval, using the pandas DataFrame method df.resample.

Parameters:
df :

df is a sample-time indexed pandas dataframe containing measurement values from the different channels of each meter.

sample_time :

sample_time determines the desired resolution of the downsampled data: '1T' for 1 minute, '1H' for 1 hour, '1D' for 1 day, '1M' for 1 month, etc. The default is 1 minute ('1T').

Returns:
Dataframe

Resampled time-indexed pandas dataframe containing the downsampled measurement values from the given dataframe.
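The pandas operation described above can be illustrated on its own; this is a sketch of the same idea with placeholder data, not the package's exact code.

import numpy as np
import pandas as pd

# One hour of second-resolution measurements with random placeholder values.
idx = pd.date_range("2018-07-02 08:00", periods=3600, freq="1S")
df = pd.DataFrame({"Current": np.random.rand(3600)}, index=idx)

df_1min = df.resample("1T").mean()    # same idea as data_resample(df, "1T")
df_hourly = df.resample("1H").mean()  # sample_time="1H"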

demand_acep.demand_acep.extract_data(dirpath, filename)[source]

This function reads and extracts the NetCDF format data of the given meter channel using the xarray package.

Parameters:
dirpath :

dirpath is the directory path of the NetCDF file to be read.

filename :

filename is the NetCDF format meter channel file to be read.

Returns:
Dataframe

Sample-time indexed pandas dataframe containing measurement values from the given file.
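A hedged sketch of the xarray-based read described above; the directory path is a placeholder and the variable layout inside the file is not documented here.

import os
import xarray as xr

dirpath = "/path/to/netcdf"   # placeholder location
filename = "PokerFlatResearchRange-PokerFlat-PkFltM1AntEaDel@2018-07-02T081007Z@PT23H@PT146F.nc"

ds = xr.open_dataset(os.path.join(dirpath, filename))
df = ds.to_dataframe()        # sample-time indexed pandas dataframe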

demand_acep.demand_acep.extract_ppty(filename, meter_name)[source]

This function parses the given filename (a string) to determine the meter name and the measurement channel/type contained in the file.

Parameters:
filename :

filename is the NetCDF format meter channel file whose name contains information such as location, date, meter type, measurement channel/type and sampling frequency. An example filename is: ‘PokerFlatResearchRange-PokerFlat-PkFltM1AntEaDel@2018-07-02T081007Z@PT23H@PT146F.nc’

meter_name :

meter_name is a list containing the names of each of the meters at Poker Flat.

Returns:
meter : string

meter is the meter name of the NetCDF format file given.

channel : string

channel is the measurement type contained in the NetCDF format file given.
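A hedged sketch of the kind of filename parsing described above; the splitting rules and the meter names in the example list are assumptions, not the package's actual logic.

def sketch_extract_ppty(filename, meter_names):
    head = filename.split("@")[0]          # e.g. 'PokerFlatResearchRange-PokerFlat-PkFltM1AntEaDel'
    prefix = head.split("-")[-1]           # meter name followed by the channel
    for meter in meter_names:
        if prefix.startswith(meter):
            channel = prefix[len(meter):]  # remainder is the measurement channel
            return meter, channel
    raise ValueError("No known meter name found in filename")

meter, channel = sketch_extract_ppty(
    "PokerFlatResearchRange-PokerFlat-PkFltM1AntEaDel@2018-07-02T081007Z@PT23H@PT146F.nc",
    ["PkFltM1Ant", "PkFltM2Tel"],          # hypothetical meter name list
)
# meter == 'PkFltM1Ant', channel == 'EaDel'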

demand_acep.demand_acep.long_missing_data_prep(dirpath, filename)[source]

This function prepares a csv-format dataset with missing days, months or years for interpolation by the data_impute function. It fills in the missing times as a ‘DateTimeIndex’ and assigns a value of NaN to the missing data points.

Parameters:
dirpath :

dirpath is the directory path of the csv file containing the missing data points, already down-sampled to a 1-minute interval.

filename :

filename is the csv file containing the missing data points to be read.

Returns:
Dataframe

Pandas dataframe with a ‘DateTimeIndex’ and a value of NaN assigned to the missing data points.
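A hedged sketch of the preparation step described above; the csv filename and its column names are hypothetical.

import pandas as pd

df = pd.read_csv("meter_1min.csv", parse_dates=["time"], index_col="time")  # hypothetical file

# Reindex against a complete 1-minute range so missing timestamps become NaN rows.
full_index = pd.date_range(df.index.min(), df.index.max(), freq="1T")
df_prepped = df.reindex(full_index)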

demand_acep.create_db_schema module

This file contains functions that create the schema in the database.

demand_acep.create_db_schema.create_schema_from_source_files(sql_engine, config)[source]

This function reads the source files and extracts the table names and corresponding column names.

Parameters:
sql_engine : SQLAlchemy engine

sql_engine should support database operations.

channel_metadata_file : string

channel_metadata_file represents the absolute path of the channel metadata file including the filename.

years_file : string

years_file represents the absolute path of the file containing the years for which the tables should be created. This file should be a text file with one year per row and the column header “years”.

Returns:
List of table names in the database.
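A hedged usage sketch, assuming the package is importable; the connection string is a placeholder and config stands for the project's configuration object, whose exact form is not documented above.

from sqlalchemy import create_engine
from demand_acep.create_db_schema import create_schema_from_source_files

engine = create_engine("postgresql://user:password@localhost:5432/demand_acep")
config = ...  # project configuration object pointing at the metadata and years files

table_names = create_schema_from_source_files(engine, config)
print(table_names)   # list of table names created in the database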

demand_acep.extract_data_to_csv module

The functions in this file convert the NetCDF files to dataframes and then write them to disk using multiprocessing, so that on a multi-core system the extraction can happen in parallel.

demand_acep.extract_data_to_csv.extract_csv_for_date(config, data_date)[source]

This function writes the data for the data_date specified to a csv on disk.

Parameters:
config :

config contains the configuration (paths, etc.) needed by the functions. Passing it as an argument makes it possible to switch between production and test configurations.

data_date : string

The data_date string is used to extract the year and the path to the data. It should be in the ‘mm/dd/yyyy’ format.

Returns:
list

Names of the csv files written to disk.
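A hedged sketch of running the extraction over several days in parallel, in the spirit of the multiprocessing note above; config is a placeholder for the project's configuration object.

from functools import partial
from multiprocessing import Pool

import pandas as pd
from demand_acep.extract_data_to_csv import extract_csv_for_date

config = ...  # project configuration object (paths etc.), not shown here
dates = pd.date_range("07/01/2018", "07/07/2018").strftime("%m/%d/%Y")

# One worker per core; each call writes that day's csv files to disk.
with Pool() as pool:
    csv_names = pool.map(partial(extract_csv_for_date, config), dates)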

demand_acep.extract_data_to_csv.printResult(result)[source]

demand_acep.timescale_parallel_copy module

Functions in this file insert the data into the TimescaleDB database using timescaledb-parallel-copy, a Go utility provided by TimescaleDB.

demand_acep.timescale_parallel_copy.parallel_copy_data_for_date(config, data_date)[source]

This function copies the data for the specified data_date in parallel to the appropriate table of the TimescaleDB database (TSDB).

Parameters:
config :

config contains the configuration (paths, etc.) needed by the functions. Passing it as an argument makes it possible to switch between production and test configurations.

data_date : string

The data_date string is used to extract the year and the path to the data. It should be in the ‘mm/dd/yyyy’ format.

Returns:
None - prints the number of rows copied into the database for the particular day.
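A hedged single-day usage sketch; as above, config is a placeholder for the project's configuration object.

from demand_acep.timescale_parallel_copy import parallel_copy_data_for_date

config = ...  # project configuration object, not shown here
parallel_copy_data_for_date(config, "07/02/2018")   # prints the row count copied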
demand_acep.timescale_parallel_copy.parallel_copy_data_for_dates(config, start, end)[source]

This function copies the data to the database for each date in the specified date range.

Parameters:
config :

config contains the configuration (paths, etc.) needed by the functions. Passing it as an argument makes it possible to switch between production and test configurations.

start : str or datetime-like

Left bound for generating dates.

end : str or datetime-like

Right bound for generating dates.

Returns:
None
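A hedged usage sketch for a whole date range; config is again a placeholder for the project's configuration object.

from demand_acep.timescale_parallel_copy import parallel_copy_data_for_dates

config = ...  # project configuration object, not shown here
parallel_copy_data_for_dates(config, start="07/01/2018", end="07/07/2018")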

Module contents