- Current operation of FilterList
- Desired operation of FilterList
- Store interface bulk action filtration
- Implementation
- Ungeneralised operation description
- Generalisations of operation
- Application of generalisations in operation description
- How incremental filtering will be implemented
- How versioned data can be split into its own data items and passed through the filtration structures in the store and FilterList
- Synchronisation of grouped filtrations
- Filtration of pieces that are desired to be filtered in memory
- Order problem demonstration
- Filter operation mechanism hinting
- Version handling
- New filters
- Component filter logic gates matching
- Code changes
So a lot of bulk actions take a list of filters as arguments
How should versioned data be handled? Possible options: exclude versioned data, retrieve only the latest version, the nth version, or the first version
Current operation of FilterList
So currently, in a FilterList, we hold a list of filters, and we can either filter our operating store id pairs to a specific index or filter given store id pairs to a specific index
This filtration is done in memory using the filters defined in Filters.py
currently a filter is passed the entire set of store id pairs
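As a rough sketch of the current shape (class and method names here are illustrative assumptions, not the real Filters.py API):

class FilterList:
    def __init__(self, filters, operatingStoreIdPairs):
        self.filters = filters  # filter instances defined in Filters.py
        self.operatingStoreIdPairs = operatingStoreIdPairs  # [(store, [ids]), ...]

    def filterToIndex(self, index, storeIdPairs=None):
        # filter either our operating store id pairs or given store id pairs
        pairs = storeIdPairs if storeIdPairs is not None else self.operatingStoreIdPairs
        for filterObj in self.filters[: index + 1]:
            pairs = filterObj.filter(pairs)  # each filter receives the entire store id pairs
        return pairs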
Desired operation of FilterList
Filtration that can occur within a store should be done within the store to increase speed. Data that is loaded from the store into memory may have been modified in memory, so this data should either be filtered using python functions or be saved to the store before filtration. The store's ability to handle the filtration operations needs to be tested.
When a store filters ids, it is nice to make sure the programmer of the store interface doesn't have to worry about whether or not a filter changes its behaviour based upon information from data outside of its own stored data. If such a filter existed, then we would have to synchronise the filter index between all of the store id pairs so that that filter could be run with all store id pairs at that level; all the other types of filters would be run solely with one store at a time. This allows multiple filters to be passed to a store at once, and the store interface then has the opportunity to group the behaviour together.

Is the complexity of synchronisation worth the ability to implement filters whose behaviour changes based on information outside a single store? This problem also presents itself in splitting up (versioned, in memory, other) data items. This is discussed in ## Commonalities
don't finish filtration prematurely if a blank id list results partway through, as later filters could add data items
so we are given a command to filter our operating ids to index 5
we have operating ids: [ [ store0, [ 1, 2, 3, 4, 5, 6, 7]], [ store1, [ 1, 2, 3, 4, 5]], [ store2, [ 1, 2, 3, 4, 5, 6, 7, 8, 9]] ]
we have filters a, s, d, f, g, h, j
so we are filtering to h (the filter at index 5)
so if we were only operating in one store: [ store0, [ 1, 2, 3, 4, 5, 6, 7]]
it would be nice to be able to pass multiple filters to the store at once
So: we need to pass the filters to the store, and the store needs to determine which ones it can group and apply. We need to organize filters into rearrangeable and non-rearrangeable at some point. Those which couldn't be called by the store need to be called using the python implementation.
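A minimal sketch of that grouping step, assuming a store.supports capability check (an invented name):

def blockFilters(store, filters):
    # split the filter sequence into alternating blocks: each block is
    # either fully store-runnable or must use the python implementation
    blocks = []  # [(runsInStore, [filters]), ...]
    for f in filters:
        inStore = store.supports(f)
        if blocks and blocks[-1][0] == inStore:
            blocks[-1][1].append(f)
        else:
            blocks.append((inStore, [f]))
    return blocks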
this should be designed in tandem with the way in which the bulk actions use filtration
Currently filters are instantiated to construct runtime information for the gui widgets, so it seems to make sense not to pass the data to an initialiser, and instead to define the data format and pass the data along with the class; a custom func or the class init can be called to construct the gui. This allows the stores to retrieve the necessary data in a manner that is observable using the filter's data specification.
Store interface bulk action filtration
What are the places where filtration is used? Bulk editing, bulk presence manipulation, data retrieval.
data retrieval can also retrieve solely ids given a sequence of filters
So bulk editing, in mongo, could pass the filters to the update func if the passed filters are fully supported; else it could resolve them to an (id, version) pair list which is used in the filter slot of the update. This is subject to the same need to either save all in-memory data and filter in the store, or filter in memory.
same thing in presence manipulation
same thing in data retrieval
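A hedged sketch of the bulk-edit case with pymongo; convertFilters and resolveIdsInMemory are assumed helpers standing in for the conversion and in-memory fallback described above:

def bulkEdit(collection, filters, update):
    findFilter = convertFilters(filters)  # assumed: returns a find dict, or None
    if findFilter is not None:
        # fully supported: pass the converted filters straight to the update func
        collection.update_many(findFilter, update)
    else:
        # resolve to an id list in memory, then use it in the filter slot
        ids = resolveIdsInMemory(collection, filters)  # assumed helper
        collection.update_many({"_id": {"$in": ids}}, update)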
So these need to observe the filter list and take action; it is behaviour specific to the store. From this example, the store needs access to the full id-retrieval-from-self functionality that the FilterList needs, including separation of versioned data and memory presence handling.
So it seems like this could either be done in the FilterList, or in a separate line of functionality apart from the store and the filter list; it could honestly also be done in the store.
So the filter type needs to be passed along with any data needed, because the graphical list needs to update the FilterList's understanding of what data is being matched against. It maybe makes sense to store the data in a specified format in a .data field.
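A sketch of that shape; dataSpec, buildGui, and the (class, data) entry format are assumptions, not existing names:

class HasAttr:
    dataSpec = {"attribute": str}  # specified data format, observable by stores and gui

    @classmethod
    def buildGui(cls, data):
        ...  # a custom func or the class init constructs the gui widget info

# a FilterList entry then carries the class plus its data, not an instance:
filterEntry = (HasAttr, {"attribute": "cheese"})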
Implementation
Ungeneralised operation description
Mongo store bulk retrieval: convertedDict = attempt conversion of the filter sequence into a filter dict. We can't tell if the filter list will match versioned data unless a filter which specifically excludes versioned data was passed, or we have an exclude-versioned option. Continuing the design as if versioned data is unsupported and no data is currently in memory.

If full conversion: find filter = convertedDict. Else we need to construct a filter which can be used with a find. Either the convertedDict has a mix of converted filters or it has none, so we need to look at each block of filtration, each separated by being either memory filters or a filter dict. If the start block is a filter dict then the initial ids need to be retrieved using that; if the start block is memory filters then we need to load all the ids in. Each subsequent block gets as input the previous step's output. Upon handling a memory filters block, we need to call each filter's function, passing the resultant ids between them. Upon handling a filter dict block, we need to call find with an id projection. We then end up with the resultant ids, which we can use to construct the find filter.

We then use the find filter to retrieve the desired information.

If we can't be sure that versioned data isn't present, then versioned data should be loaded into memory and processed there. This is split off before the full convertedDict is constructed. All versions should be loaded, as our units include each version. If a filter is specified which denotes specific versions, such as (the 2 newest), then this can influence the process. If there is no versioned data then we don't need that processing; if there is no non-versioned data then we don't need the other. The results of the version filtration can be merged with the non-versioned results. This process could be optimised by handling all of the full-data version entries in the store.

Data that is already present in memory holds newer information than that in the store and should be referred to. When unversioned, in-memory representations should be respected over store representations. When versioned, there will never be a store-saved version which is also loaded in memory. !!In-memory versions should represent a new version and not a change to the one they were loaded from; they should still store the one they were loaded from, so the intrinsics become: (version, loaded version, loaded commit time, loaded type). This doesn't mean that we don't have to process in-memory versioned data, as unversioned memory data should be processed separately. Should it be excluded from main processing? Should main processing be amended with its results? Versioned memory data may
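A sketch of the block walk described above against pymongo, ignoring the versioned and in-memory handling; blocks is an alternating sequence of (isFindDict, payload) pairs where payload is either a mongo find dict or a list of python filters (assumed to expose filterIds), and every name here is illustrative:

def bulkRetrieve(collection, blocks):
    ids = None  # None means "not yet constrained to an id list"
    for isFindDict, payload in blocks:
        if isFindDict:
            # filter dict block: call find with an id projection
            query = dict(payload)
            if ids is not None:
                query["_id"] = {"$in": ids}
            ids = [doc["_id"] for doc in collection.find(query, {"_id": 1})]
        else:
            # memory filter block: load all ids if this is the start block,
            # then call each filter's function, passing the resultant ids along
            if ids is None:
                ids = [doc["_id"] for doc in collection.find({}, {"_id": 1})]
            for f in payload:
                ids = f.filterIds(ids)
    # the resultant ids construct the final find filter
    return list(collection.find({"_id": {"$in": ids}}))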
need to figure out versioned and memory presence separation and merging
also, presence separation denies the ability of filters to reliably base their filtration upon the presence of more than one piece of data
given that we are separating the data, synchronisation is required
Let's do this specific to mongo first
So if we aren't sure that versioned data isn't present, we need to process it separately; filtration is already discussed above
We need to load all of the versioned data in to do this; we can use a filter to do this
we could apply a negative of this filter when loading the initial ids for the main filtration
not doing so could lead to skewed results in the main filtration, and the processing time wouldn't be much slower with the negative filter in place
We can also filter the items present in memory in the same filtration as the versioned data, to simplify the potential synchronisation process
a negative filter can be applied to exclude these from the main load as well
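For mongo that could look like the following sketch; the "Versions" field name is borrowed from the scenario further down and is an assumption:

versionedFilter = {"Versions": {"$exists": True}}
# versioned (and, similarly, in-memory) items get loaded and filtered separately
versionedDocs = list(collection.find(versionedFilter))
# the negative of the same filter keeps them out of the main filtration's initial ids
mainInitialIds = [doc["_id"] for doc in collection.find({"$nor": [versionedFilter]}, {"_id": 1})]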
Generalisations of operation
Application of generalisations in operation description
How incremental filtering will be implemented
How versioned data can be split into its own data items and passed through the filtration structures in the store and FilterList
Synchronisation of grouped filtrations
so to filter the data there are different ways of stepping through the filters
So when one reaches a synchronisation point, all the others have to be filtered to that point too. Then, when all are done up to the synchronisation point, the results can be collated and then filtered by the synchronising filter. I currently don't have a need to do this anywhere but default memory. We can do this for all filters requiring synchronisation in that block.
the filtration now needs to be divided up again between its original participants
So in order to determine how to split the now-joined results up, we can look at the results of each pathway before the joined filter. For each result, if the same result is present in the output of the joined filter then it is passed on; if not, then it is not passed on. So it is the intersection of both sequences. For this reason I want to enforce that filters requiring synchronisation don't add any data to their result.
The split results can then be lined up as inputs to the different methods, and the filtration can be resumed. See the sketch below.
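A sketch of that collate, filter, and intersection split; pathways are (store, ids) pairs and syncFilter.filterIds is an assumed method:

def runSyncPoint(pathways, syncFilter):
    # collate the results of every pathway at the synchronisation point
    collated = [id_ for _, ids in pathways for id_ in ids]
    passed = set(syncFilter.filterIds(collated))  # must not add any data
    # split the joined result back up by intersecting with each pathway's input
    return [(store, [i for i in ids if i in passed]) for store, ids in pathways]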
The synchronisation events are in the same order and number for each simultaneous filtration, so it could be that we aim to run each till the end in order, and then when a synchronisation event is found in the first process, the others are brought up to it.
I would prefer to determine the final course of action before taking it: so if we are filtering g, h, j, then we know to filter g to 4, ... So I envision a controlling software aspect which has a generic interface to all types of filtration.
Filtration of pieces that are desired to be filtered in memory
So there is a problem to solve
when filtering ids which we know we want to filter in memory, what do we do?
the in-memory filter implementations shouldn't implement their own mechanisms to check if the data is in memory,
the result of their actions should be either using in-memory data or loading data into memory
Currently filters call out to the store for information, which may partially load the data in, or use in-memory data if present. In this scenario we know that the filtered ids are present in memory. Should two software aspects be allowed to operate on data at once? Removal from memory should be disallowed during this, or this aspect must make compromises.
just realised that in-memory filtration can't be completely disregarded even if no data or tags are processed by the filter sequence
this is because versions in memory are not saved to the store yet, and their ids therefore wouldn't get captured by the filter; they must be considered
So when filtering pieces that are known to be in memory, we don't need to worry about conserving memory space, as those pieces are already in quick memory. When filtering pieces that are in the store into memory, we can balance the loading and the filtration: I imagine loading a batch in, filtering that completely, then loading another batch in. This would prevent repeated loading of the same piece, and prevents repeated deconstruction of version pieces. An issue here is that synchronisation requires all ids to be filtered at once. A way around this is
Another way is for filters to call a batch loader, and the batch loader functionality handles returning in-memory items. How can this be made to play with not continuously deconstructing and loading in pieces? Well, for the non-memory ones that must be filtered into memory, they can still be done in batches, and then the filters can call the batch load; the batch loader then should return the in-memory items. The other alternative would be passing the data to the filter directly.
{ Well, so option ¶ is to not load pieces before running the filters. This means that each time a filter is run, the piece will be loaded and processed, the id is returned, and the piece has no references and is garbage collected; then another filter in the chain is called and loads that same piece in again.
Option ŧ, where pieces are loaded in before running the filters and pieces are accessed generically from the filter. This can be done by batch loading before running the filters, so that batch is in memory, and then filtering that batch. This would have to be done in between synchronisations, as synchronisations require all of the data. Upon a synchronisation the filter would be called with all ids; as the filter method uses a generic method of obtaining data, it can call that and still obtain the needed information.
Option ←, where pieces are loaded in before running and then passed to the filter for use. To retain optimisations where the whole data does not need to be loaded, the sequence could be scanned and then the required data passed to it. This would require the filter to specify what it needs, as if they were arguments to the bulk load function. This may have an effect on other aspects of the software.
Known bad ways: loading all pieces in memory and then filtering. } I feel most comfortable with ŧ. A method of quickly retrieving the in-memory instances is important when calling bulk load imo. Maybe this could be special behaviour for a filter which desires multiple ids, idk.
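A sketch of option ŧ under those assumptions; store.bulkLoad and filterIds are invented names:

def filterInBatches(store, ids, memoryFilters, batchSize=256):
    # batch load between synchronisations, then filter each batch completely,
    # so the same piece is never repeatedly loaded and deconstructed
    surviving = []
    for start in range(0, len(ids), batchSize):
        batch = ids[start : start + batchSize]
        store.bulkLoad(batch)  # quickly returns instances already in memory
        batchIds = batch
        for f in memoryFilters:
            batchIds = f.filterIds(batchIds)  # filters access the batch generically
        surviving.extend(batchIds)
    return surviving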
This is easy to confuse with previous approaches, so building up the existing structure sounds good, to resist unwanted ideas.
Order problem demonstration
Filter grouping only works on filters whose behaviour changes based only upon properties of each input item. If behaviour is based upon values outside of a single item, then a rearrangement of the filter's position in the order can change the results.
we have people described by their body shape and hair color
if we have a list of people:
z= fat, ginger
x= fat, blonde
c= skinny, blonde
v= skinny, brown
b= skinny, ginger
n= middle, ginger
and we pass them through the filters:
q= not blonde
w= is skinny
in the filter order q, w we get (in stages): z, v, b, n → v, b
in the filter order w, q we get: c, v, b → v, b
If we replace filter q with: return the first two results of the hair color field in alphabetical order
in order q, w we get: x, c → c
in order w, q we get: c, v, b → c, v
Here the results are different, as the filtration was dependent upon factors other than the set properties of each evaluated item. The new filter q should be marked as such. This marking allows discerning bodies to preserve the specified order of q, w, and in this scenario the consistent result of c will be given upon each examination.
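The demonstration above as runnable code, with the filters as plain functions:

people = {"z": ("fat", "ginger"), "x": ("fat", "blonde"),
          "c": ("skinny", "blonde"), "v": ("skinny", "brown"),
          "b": ("skinny", "ginger"), "n": ("middle", "ginger")}

def q(names):  # not blonde: depends only on each evaluated item
    return [n for n in names if people[n][1] != "blonde"]

def w(names):  # is skinny: depends only on each evaluated item
    return [n for n in names if people[n][0] == "skinny"]

def q2(names):  # first two by hair colour alphabetically: order dependent
    return sorted(names, key=lambda n: people[n][1])[:2]

print(w(q(list(people))), q(w(list(people))))    # ['v', 'b'] ['v', 'b']
print(w(q2(list(people))), q2(w(list(people))))  # ['c'] ['c', 'v']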
Filter operation mechanism hinting
So something that I need to do: if a filter requires information from all simultaneously passed data ids, then it can specify this with q. If q is set, then this filter can't be analysed with some ids in separation with the results being combined later; they must all be analysed together. In the mongo example, provided the stated operation is occurring synchronously, we must wait at each filter with q and then run the other held filters; this also means that filters marked with q shouldn't be grouped. At this step, if no memory items needing filtration are present, then this filter can be handled in the store; if memory items are needing filtration, then all filtration for this filter must be done in memory.
q is also the same property which determines whether the filter can be reordered or not
It may be useful to know what contents of a data piece a filter cares about (id, tags, data); this helps speed up deciding whether or not versioned pieces can be passed. However the detection mechanisms can get complicated; might still be good.
Maybe q is better described as the filter relying on data which can change as a result of the positioning of the filter in the list. Then it can be explained that... no, because synchronisation is needed when the filter relies on all items being present, and not just on positioning.
When a filter is adding data: if it adds 5 pieces each time it is run, then this means it must be synchronised.
It is useful to know whether a filter filters out all versioned data, as this can be used to exclude versioned processing, which can be difficult due to its delta nature; this is shown in w. If a filter which adds data items is present after one which implements w, then it is still unclear whether or not versioned data may need to be processed. This could be a marker about whether it may add versioned items, but I think it is good to have a more general marker without adding too many; this is shown in e.
when queried with a bool-mechanism question, maybe the filter could return an unknown if that is the case
current in flux extra filter hints
q= whether or not the filter requires information from all input (data items including splits along version id borders) (default: doesn't require info from all input)
w= whether or not the filter filters out all versioned data (default: doesn't filter all)
e= whether or not the filter adds new data items (default: doesn't add)
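One possible shape for these hints, as class-level defaults (names invented):

UNKNOWN = object()  # returned when a bool answer can't be given

class Filter:
    requiresAllInput = False     # q: needs information from all input items at once
    filtersAllVersioned = False  # w: guaranteed to filter out all versioned data
    addsDataItems = False        # e: may add new data items to its result

class FirstTwoByHairColour(Filter):
    requiresAllInput = True  # its result depends on the whole input set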
Version handling
Option to ignore versions. Maybe we want an operating store id pairs with no versions in it to act just as quickly as if no versioned data was supported. Versioned data likely can't be queried by the store, so it needs to be handled in memory; I still think the store should make this call, so the store needs an easy way of either specifying that certain ids must be done in memory, or an easy way of filtering them in memory.
So most operations operate upon specific versions within a piece of versioned data. However: deletion and duplication can make sense to operate on whole versioned data ids; movement could move individual versions but makes more sense to move whole versioned data ids; addition is unknown, as its interface is unknown.
This is seemingly just an issue for presence manipulation, and not retrieval or editing. Filter lists see (unversioned wholes), (versioned versions) as individual pieces. So what if filters that act upon data are passed to a presence manipulation operation?
scenario:
store = {
    "3j90435802nikw3": {
        "Versions": {
            "INIT": {
                "cheese": 34
            },
            "43rjnjk3309": {
                "deltaRemove": "cheese"
            }
        }
    },
    "590rj3klscsxcnxkj": {
        "Data": {
            "cheese": "ff"
        }
    },
    "kjfim4momjv0": {
        "Data": {
            "brushes": 90
        }
    },
}
delete(
    filterList=[(HasAttr, "cheese")],
    knownQuantity=None,
)
So here, if we are operating on whole versioned data items, what would we do?
We could check versions at a specific position,
we could check a range of versions and combine their results,
we could pass all versioned data through the data filters,
we could deny all versioned data through the data filters,
so in the filtration step:
if we are passing all versioned data through the filters,
versioned data will only be removed by non-data processes;
if we are able to detect that there are only data filters, then we can return all versioned data
maybe we just filter all as normal but disregard the versions for now; one should know when calling the function that this is how it will operate
so for now, presence manipulation can treat all filtration of versioned data by operating on the whole versioned data and not individual versions
New filters
Version is nth version, including negative index and slices, with an option to pass or reject unversioned data
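A hedged sketch of that filter; the (pieceId, versionId) id shape and the versionsOf lookup are assumptions about the surrounding machinery:

class NthVersion:
    def __init__(self, index, passUnversioned=True):
        self.index = index  # an int (negative allowed) or a slice
        self.passUnversioned = passUnversioned

    def filterIds(self, ids, versionsOf):
        # versionsOf(pieceId) -> ordered version ids, or None if unversioned
        out = []
        for pieceId, versionId in ids:
            versions = versionsOf(pieceId)
            if versions is None:
                if self.passUnversioned:
                    out.append((pieceId, versionId))
                continue
            picked = versions[self.index]
            picked = picked if isinstance(picked, list) else [picked]
            if versionId in picked:
                out.append((pieceId, versionId))
        return out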
Component filter logic gates matching
The name of the combination mode is based upon the condition that needs to be met for a piece to pass
logicGates = {
    "AND": lambda a, b: bool(a) and bool(b),  # pass all | piece passes if it passes all components
    "OR": lambda a, b: bool(a) or bool(b),  # pass any | piece passes if it passes any component
    "NOR": lambda a, b: not a and not b,  # fail all | piece passes if it fails all components
    "NAND": lambda a, b: not (bool(a) and bool(b)),  # fail any | piece passes if it fails any component
    "XNOR": lambda a, b: bool(a) == bool(b),  # all pass or all fail | piece passes if all components pass or all fail
    "XOR": lambda a, b: bool(a) != bool(b),  # pass exactly once | piece passes if it passes exactly one component; will only fire correctly if <= 2 component filters are present
}
for gate in logicGates:
    print("\n" + gate)
    for a in range(2):
        for b in range(2):
            print(a, b, logicGates[gate](a, b))
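To combine more than two component results, the gates can be folded with functools.reduce; note that XOR folded this way becomes parity, which is why it should only fire with <= 2 component filters:

from functools import reduce

def combine(gate, componentResults):
    return reduce(logicGates[gate], componentResults)

print(combine("AND", [True, True, False]))  # False
print(combine("XOR", [True, True, True]))   # True: parity, not "passed exactly once"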
Code changes
To filters
Need to specify if behaviour is dependent on factors other than the state of a single evaluated data item. This includes if a field could possibly vary over time; uh oh, if multiple sources are allowed to hold memory references then this could be a wide occurrence.
Although anyone passing the filters to the filter list should not change the data themselves during operation, unplanned changes in the results would have occurred regardless.
The mark still needs to be placed if, e.g., a single evaluated data item's property is called which runs a custom function that queries the societal time.
Component filter takes filters as children not components
Only one component filter, which takes logic gates as an argument, rather than separate filters named after the logic gates