Discussion:
[Openinteract-dev] Review of proposed changes and a question
Vsevolod (Simon) Ilyushchenko
2004-10-22 01:41:13 UTC
Permalink
Chris and Ray,

Now that Chris has some time (hopefully) to consider the changes, I'd
like to submit a patch that would add has_many, auto-save/auto-delete
(optional) and full cache support (which is necessary for the
consistency circular references). Before I do that, I'd like to get
Chris's approval on the following architectural decisions:

1. There will be a basic cache which will always return the cached
object, not its copy.
2. The fetch_many() now will also use cache. It will still make a call
to the database, because it won't know which ids to request from the
cache, though.
3. Auto-save and auto-delete are off by default. This is not what Ray
requested, but it's the only way to stay backwards compatible. With
auto-save on, the graph of object in memory will be traversed to
determine what has been changed. (Thus, I have deep_changed() instead of
changed().)
4. It will be possible to use a separate class that corresponds to the
linking table in links_to (think ClubMembership). The syntax I came up
with is:
Club => {
class => 'Club',
...
links_to =>
{ 'Person' =>
{ table => 'ClubMembership',
link_class => 'ClubMembership',
link_class_alias => 'memberships',
alias => 'members',
to_id_field => 'member_id',
from_id_field => 'club_id' },

The ClubMembership class can have extra attributes (like date), but they
will be specified in a regular way in the config file. The new attribute
'link_class_alias' will return an arrayref of the ClubMembership
instances. Again, this is not as elegant as Ray proposed, but backwards
compatible.

An open question is what to do with has_a. Currently it's implemented in
ClassFactory, not ClassFactory/DBI, but adding auto-save and auto-delete
functionality requires the has_a code to be DBI-aware. I can add extra
functionality into ClassFactory/DBI, but I am not sure which effect it
would have on LDAP storage, for example.

Oh, and a style question. I have marked all of my changes with "tags":
#Simon
#/Simon
and I commented out, not deleted, all replaced code, for ease of
reference. Is this necessary?

I will grudgingly remove lc-ing of field names. I love it, but, as
pointed out before, it will break existing code.

Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Ray Zimmerman
2004-10-22 11:13:00 UTC
Permalink
Hi Simon,

It's good to hear that this work is coming along ...
Post by Vsevolod (Simon) Ilyushchenko
1. There will be a basic cache which will always return the cached
object, not its copy.
This sounds like a good idea. This ensures that, within a given
process, there is only ever one copy of an object in memory no matter
how many times and in what contexts you fetch it, right?

Just curious, at what level are you implementing this cache? Is it
*always* used for all SPOPS objects? Only for DBI? Only for DBI when
you are using the new functionality?
Post by Vsevolod (Simon) Ilyushchenko
2. The fetch_many() now will also use cache. It will still make a call
to the database, because it won't know which ids to request from the
cache, though.
3. Auto-save and auto-delete are off by default. This is not what Ray
requested, but it's the only way to stay backwards compatible. With
auto-save on, the graph of object in memory will be traversed to
determine what has been changed. (Thus, I have deep_changed() instead
of changed().)
4. It will be possible to use a separate class that corresponds to the
linking table in links_to (think ClubMembership). The syntax I came up
Club => {
class => 'Club',
...
links_to =>
{ 'Person' =>
{ table => 'ClubMembership',
link_class => 'ClubMembership',
link_class_alias => 'memberships',
alias => 'members',
to_id_field => 'member_id',
from_id_field => 'club_id' },
The ClubMembership class can have extra attributes (like date), but
they will be specified in a regular way in the config file. The new
attribute 'link_class_alias' will return an arrayref of the
ClubMembership instances. Again, this is not as elegant as Ray
proposed, but backwards compatible.
I'd like to give my feedback, but I'm not sure I have a clear
understanding of which pieces of my proposal you implemented exactly,
which pieces you implemented with small modifications, and for which
pieces you chose an alternative approach entirely.

Do you have a documentation detailing the new syntax for the
functionality you are adding? Actually, possibly the simplest thing
(and probably the most helpful thing for me) would be to just take the
examples at the end of my proposal and modify them to your syntax so I
could see an example of what each case looks like.

I forgot, did you implement any lazy loading features or is it just
manual and auto?

If I understand correctly, one of the fundamental differences between
your implementation and my proposal is that for has_many and links_to
relationships the definition of the relationship is no longer specified
in the class containing the linking field, but rather in a class that
is linked to. I'm a bit concerned that with the way I use SPOPS (not to
create classes but simply to define the relationships and persistence
of traditionally defined *.pm-file classes, where linking classes have
not only extra fields, but also extra functionality), some of the
functionality I need which was present in my proposal will be missing
from your redesign ... but I guess I won't know until I see the
details.

In any case, your actual working implementation is far better than any
un-implemented proposal, so don't get me wrong, I really appreciate all
the work you've done even if it doesn't end up covering everything I
would have liked.
Post by Vsevolod (Simon) Ilyushchenko
An open question is what to do with has_a. Currently it's implemented
in ClassFactory, not ClassFactory/DBI, but adding auto-save and
auto-delete functionality requires the has_a code to be DBI-aware. I
can add extra functionality into ClassFactory/DBI, but I am not sure
which effect it would have on LDAP storage, for example.
#Simon
#/Simon
and I commented out, not deleted, all replaced code, for ease of
reference. Is this necessary?
Might be nice initially for Chris to see this, but I'd think eventually
he'll want to remove it. But of course, that's Chris's call.
Post by Vsevolod (Simon) Ilyushchenko
I will grudgingly remove lc-ing of field names. I love it, but, as
pointed out before, it will break existing code.
Thanks, this is appreciated since it was my existing code that was
breaking. :-)

Thanks again for your contributions,

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645
Vsevolod (Simon) Ilyushchenko
2004-10-31 23:38:25 UTC
Permalink
Ray, Chris, and others,

My current progress - code pretty much finalized, and a .t file with 64
tests (I'll have to add a few more tests to cover all combinations of
features). I'd like to do some more internal testing before I send the
official patch, but if anyone has time to play with the new features
(yeah, right :), I'd be happy to share it. It's completely backwards
compatible.
Post by Ray Zimmerman
Just curious, at what level are you implementing this cache? Is it
*always* used for all SPOPS objects? Only for DBI? Only for DBI when you
are using the new functionality?
It's only for DBI.
Post by Ray Zimmerman
Do you have a documentation detailing the new syntax for the
functionality you are adding? Actually, possibly the simplest thing (and
probably the most helpful thing for me) would be to just take the
examples at the end of my proposal and modify them to your syntax so I
could see an example of what each case looks like.
See below.
Post by Ray Zimmerman
I forgot, did you implement any lazy loading features or is it just
manual and auto?
I actually did not implement manual loading - I just can't see where it
can be used (and a bit fuzzy on how it would work). So there is only
auto and lazy loading.

Autosaving is always off by default, to preserve compatibility.

OTOH, there are three types of removes - 'auto', 'manual' and 'forget'.
'Auto' means complete removal of dependent objects, 'forget' -
nullifying id fields pointing to the removed objects, and 'manual' - no
action. The default should logically be 'forget', but it may conflict
with no autosaving, so I'll have to set it to 'manual'.
Post by Ray Zimmerman
If I understand correctly, one of the fundamental differences between
your implementation and my proposal is that for has_many and links_to
relationships the definition of the relationship is no longer specified
in the class containing the linking field, but rather in a class that is
linked to. I'm a bit concerned that with the way I use SPOPS (not to
create classes but simply to define the relationships and persistence of
traditionally defined *.pm-file classes, where linking classes have not
only extra fields, but also extra functionality), some of the
functionality I need which was present in my proposal will be missing
from your redesign ... but I guess I won't know until I see the details.
As I mentioned, I am flexible on the matter of has_many. If you want, I
can create an additional config option (you call it 'auto_by' etc,
I think) and put it into the other class.

Links_to is harder, because to preserve backward compatibility it has to
stay in one of the edge classes. But if you want to add more variables
to the linking class, you'll have to define it for SPOPS anyway, and
there I can probably also implement your suggestion as an option.

Below is the usage example that you posted originally, and after
*** - my syntax.

$CONF = {
X_alias => {
class => "X",
field => [ qw/ x_id x_data myA myB / ],
has_a => {
myA => {
class => "A",
fetch => {
type =>
"manual|auto|lazy|manual_by|auto_by|lazy_by",
name => "someA",
list_field => "list_of_Xs",
},
link => {
type => "manual|auto|lazy",
field => "myB"
name => "someB",
list_field => "linked_Bs",
},
remove => {
type =>

"manual|auto|manual_by|auto_by|manual_null|auto_null"
name => "removeSomeA",
list_field => "list_of_Xs",
}
},
myB => {
class => "B",
fetch => {
type => "manual|auto|lazy|auto_by|lazy_by",
name => "someB",
list_field => "linked_Bs",
},
link => {
type => "manual|auto|lazy",
field => "myA"
name => "someA",
list_field => "list_of_As",
},
remove => {
type =>

"manual|auto|manual_by|auto_by|manual_null|auto_null"
name => "removeSomeB",
list_field => "list_of_Xs",
}
},
}
},
A_alias => {
class => "A",
field => [ qw/ a_id a_data / ],
list_field => { X => ["list_of_Xs"],
B => ["linked_Bs"] }
},
B_alias => {
class => "B",
field => [ qw/ b_id b_data / ],
},
}

***

$CONF = {
X_alias => {
class => "X",
field => [ qw/ x_id x_data myA myB / ],
has_a => {
{A =>
{alias => "mya",
fetch => { type => 'auto|lazy', save =>1 },
remove => { type => 'auto|manual' }
reverse_remove => {type => 'manual|forget' }
}
},
has_many => {
B => {
fetch =>
{ type => 'auto|lazy',
save => 1,
alias => 'list_of_Xs',
reverse_alias => 'myx'},
remove => { type => 'auto|manual|forget' }
}
},
links_to => {
'D' =>
{ table => 'spops_x_d',
alias => 'list_of_Ds',
link_class => 'Membership',
link_class_alias => 'memberships',
}
}
..
},
Membership_alias =>
{
class => 'Membership',
field => ['id, 'x_id', 'd_id', 'start_date',
'end_date'],
...
}


Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Ray Zimmerman
2004-11-01 21:24:14 UTC
Permalink
Hi Simon,
Post by Vsevolod (Simon) Ilyushchenko
My current progress - code pretty much finalized, and a .t file with
64 tests (I'll have to add a few more tests to cover all combinations
of features). I'd like to do some more internal testing before I send
the official patch, but if anyone has time to play with the new
features (yeah, right :), I'd be happy to share it.
Cool. I'd like to check it out.
Post by Vsevolod (Simon) Ilyushchenko
It's completely backwards compatible.
Is this a good thing? :-) (more below in this)

I assume that's why you changed the key values in the 'has_a' spec to
the class name instead of the field name? But isn't this a problem if
you have multiple has_a fields of the same class, with different
behavior for fetching/saving/removing? Or do you just use an arrayref
of hashrefs instead of the single hashref in that case? Blech.

For example, I'm thinking of something like a Transfer object that has
a 'from_account' and a 'to_account' field, both of class 'Account',
where you want to auto-fetch the 'from_account' and lazy-fetch (or
manual fetch) the 'to_account'.

This is something I did not like about the original has_a spec. It
seems so backwards to me. I want to specify relationship behavior for
each individual field. Whether or not two has_a fields belong (or not)
to the same class is irrelevant and therefore (to me) it makes no sense
to organize them by class.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
Just curious, at what level are you implementing this cache? Is it
*always* used for all SPOPS objects? Only for DBI? Only for DBI when
you
Post by Ray Zimmerman
are using the new functionality?
It's only for DBI.
And you say this full cache is necessary for the consistency circular
references. You mean to avoid infinite loops when you have A set to
auto-fetch B which is set to auto-fetch A? Seems to me that you should
be able to detect this type of thing when the classes are configured.
While I think a cache is a nice option that I may very well use, I
don't think it should be mandatory unless it's absolutely necessary.
Can you give me an example where consistency makes it absolutely
necessary?

I've used only application level caching, never SPOPS-level caching so
I'm not clear on how this works. Does this mean that SPOPS objects no
longer go out of scope and get destroyed until the program ends or the
cache is manually flushed? In a mod_perl environment then, do you flush
the cache at the end of every request or what? Seems like you could get
some huge apache children pretty quickly if you're not careful.
Post by Vsevolod (Simon) Ilyushchenko
I actually did not implement manual loading - I just can't see where
it can be used (and a bit fuzzy on how it would work). So there is
only auto and lazy loading.
It was for completeness and to offer a mode that is equivalent to
current has_a behavior, that is, the field normally just gives you an
id, but you also have a convenience method for fetching the object as
well. My idea was that any 'has_a' spec, including 'manual', would
create convenience methods for fetching the related objects. The 'auto'
and 'lazy' options would simply call these methods automatically at the
appropriate time and stash the return values in the object. So in the
way I was picturing things, implementing 'manual' would simply be the
first step in implementing 'auto' and 'lazy'.

Without the manual option, you can't specify a relationship at all
without having it define auto-fetching behavior. You can't, for
example, auto-remove an object without having it also auto-fetched
(which I can imagine you might want if you typically only need to deal
with the ID of the secondary object).

Just curious, does your implementation of 'auto' generate a public
'fetch_myA' method, for example?
Post by Vsevolod (Simon) Ilyushchenko
Autosaving is always off by default, to preserve compatibility.
I'm not sure I follow. Auto-fetching is new so there is no previous
corresponding save behavior to be backward compatible with. Classes
defined without any auto-fetch/auto-remove behavior, could behave as
always. Classes defining new auto behavior could have whatever default
'save' behavior we think makes sense. So I'm not sure there is a
backward compatibility issue here.

And the save behavior described in my updated proposal posted 4 Jan
2002 still seems to be the most consistent and make the most sense to
me.
Post by Vsevolod (Simon) Ilyushchenko
OTOH, there are three types of removes - 'auto', 'manual' and
'forget'. 'Auto' means complete removal of dependent objects, 'forget'
- nullifying id fields pointing to the removed objects, and 'manual' -
no action. The default should logically be 'forget', but it may
conflict with no autosaving, so I'll have to set it to 'manual'.
OK, but what is the 'reverse_remove'? Is specifying 'reverse_remove' =>
'forget' in a 'has_a' the same as specifying 'remove' => 'forget' in
the corresponding 'has_many'? If so, which one takes precedence if they
are inconsistent? It looks like 'reverse_remove' => 'forget' is
equvalent to what I called 'null_by', right?. I personally think that
having multiple (and possibly conflicting) ways/places of defining the
behavior for a single relationship is asking for trouble. I think it
will make it difficult to write correct and clear documentation and it
will create some debugging nightmares. (More on this below)
Post by Vsevolod (Simon) Ilyushchenko
As I mentioned, I am flexible on the matter of has_many. If you want,
I can create an additional config option (you call it 'auto_by' etc,
I think) and put it into the other class.
Borrowing from some of your configuration terminology, we could call it
'reverse_auto', 'reverse_lazy' and 'reverse_manual'. That might be more
clear. And for removes we could use 'manual_forget' and 'auto_forget'
in place of my 'manual_null' and 'auto_null' if you prefer. Again,
having more than one way to do it seems like a bad idea to me. (More on
this below).
Post by Vsevolod (Simon) Ilyushchenko
Links_to is harder, because to preserve backward compatibility it has
to stay in one of the edge classes. But if you want to add more
variables to the linking class, you'll have to define it for SPOPS
anyway, and there I can probably also implement your suggestion as an
option.
I suppose if you want to enhance the existing links_to syntax you are
correct. (More below on why I wouldn't do this).

Why do you include both 'link_class' and 'link_class_alias'? Aren't
they redundant? (see [1] below).
And I suppose the 'table' is only necessary if you don't specify the
'link_class' and vice versa, right?


After looking over your configuration syntax I started trying to
convert my examples to your syntax to see if all the bases are covered.
Here are some random comments, based on my understanding of what you're
doing:

* for both forward and reverse direction, auto-remove is not possible
without auto-fetch (missing 'manual' mode)
* for both forward and reverse direction, auto/lazy-fetch looks fine
(except for default behavior of save)
* forward direction manual/auto remove looks fine
* reverse direction auto/forget remove looks fine, but I don't see an
equivalent to manual_null (manual_forget)
* is use of 'alias' and 'reverse_alias' consistent with the way alias
is normally used in SPOPS? (see [1] below)
* didn't see any mention of the 'name' option for explicitly specifying
the name of generated methods
* not clear to me what auto/lazy fetching, auto removing, etc options
are implemented for links_to

In conclusion, here are a few of my perspectives.

I've been trying to keep an open mind toward the differences between
your implementation and my proposed design, adopting a wait and see
approach. I hesitate to state my opinions too strongly since you're
contributing useful code and I'm just talking about design. On the
other hand, I suppose you probably want honest input.

I think that both links_to and has_many share a fundamental flaw. They
both allow relationships to be defined from the "other" end, making it
possible to define conflicting behavior for a single relationship. This
unnecessarily complicates the error checking or precedence rules and
the documentation. I think always forcing the relationship to be
defined by the class with the linking field, and having a single syntax
(has_a) for defining it, makes for a much simpler, cleaner, more
consistent, easier to understand/document/implement, less error prone
design. Including both approaches (e.g. has_many and reverse_auto or
auto_by), in my opinion, only complicates and confuses things further.

For this reason, I think we should forget trying to make the new syntax
backward compatible. I think we should leave the old syntax in place
(but deprecated) as long as necessary for backward compatibility and
add a completely redesigned new syntax that folks can migrate to
gradually or as they need the new features. If it were up to me I
wouldn't touch links_to at all, and I wouldn't touch the old has_a
either. I'd just add a new configuration handler for the case where the
key of a 'has_a' spec matches one of the field names, in which case you
assume it is the new syntax I proposed, otherwise you assume it's a
class and use the old has_a semantics.

I don't know if Chris has enough spare cycles to give an opinion on
this at the moment, but I'd be interested in his perspective.

While I appreciate all your work, I have to say that the above issues
are leading me to a growing preference for my original proposed design.

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645

[1] I confess I never really did understand the purpose of the alias.
What is the difference between the alias and the class? Isn't one of
them redundant?
Vsevolod (Simon) Ilyushchenko
2004-11-02 12:52:03 UTC
Permalink
Post by Ray Zimmerman
I've been trying to keep an open mind toward the differences between
your implementation and my proposed design, adopting a wait and see
approach. I hesitate to state my opinions too strongly since you're
contributing useful code and I'm just talking about design. On the other
hand, I suppose you probably want honest input.
Ray,

Honest input is, of course, appreciated. I'll look over the your
questions over the weekend, but it seems that compatibility is the major
question, as it drives the syntax that you don't like. I had to stay
compatible by default, and I think only Chris can authorize a new
syntax. So I would also like to hear what he has to say.

My implementation is not written in stone - I personally don't really
care what the syntax is as long as it provides the full fuctionality.

Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Vsevolod (Simon) Ilyushchenko
2004-11-07 17:20:43 UTC
Permalink
Ray,

I've mulled over your comments, and I think I'm starting to see the
light. :) I've looked at other OOP frameworks (Alzabo, Tangram,
Hibernate, XORM), and they all seem to define the relationship in the
class that has the ID field which refers to the other class.

I continue the discussion of cache and the 'manual' keyword below, but
when we converge to an opinion on those, I'll redo the code your way.
It's less work than it seems.

However, Chris has not replied yet. I'll ask him again about the
compatibility issue. However, if I simply extend the has_a syntax the
way you propose and smartly determine whether the old or the new syntax
is used (I do it anyway now), it should not be a problem.
Post by Ray Zimmerman
I assume that's why you changed the key values in the 'has_a' spec to
the class name instead of the field name? But isn't this a problem if
you have multiple has_a fields of the same class, with different
behavior for fetching/saving/removing? Or do you just use an arrayref of
hashrefs instead of the single hashref in that case? Blech.
I agree, it's detestable.
Post by Ray Zimmerman
And you say this full cache is necessary for the consistency circular
references. You mean to avoid infinite loops when you have A set to
auto-fetch B which is set to auto-fetch A? Seems to me that you should
be able to detect this type of thing when the classes are configured.
While I think a cache is a nice option that I may very well use, I don't
think it should be mandatory unless it's absolutely necessary. Can you
give me an example where consistency makes it absolutely necessary?
For example, you have a Book object with many Authors. If the
application loads a book with the list of authors, adds another author
to this book and asks the new author about its parent book, the current
SPOPS implementation will re-fetch the book object, potentially ignoring
the changes that were made to original book object.

However, I only now realized that you suggest saving both the
parent-to-child reference and the reverse reference in the object
fields. (I distinguish between $author->{book_id}, a number, and
$author->{book}, an object. Let me know if I understood this correctly.)

Thus, once $book->{list_of_authors} is populated, adding a new author to
the book should add the new object to this list, plus it should set the
field $author->{book} to the original book object. This will make the
above situation impossible.

However, inconsistencies still may occur if:

1) I create a new author-book relationship by setting the field
$author->{book_id} instead of saying $book->add_author($author). This
can be discouraged as an incorrect way of altering data, of course, but
logically both make sense, and I'd like to be able to use them both.

Or, 2) if I am working with a second relationship, say books-to-artists
(illustrators). In this case, in one place in my code, I could retrieve
a book object by saying $artist->book, and then in another place I'll
call $author->book, and even though they may refer to the same book,
they will always be two different objects.

So looks like cache is still necessary.
Post by Ray Zimmerman
I've used only application level caching, never SPOPS-level caching so
I'm not clear on how this works. Does this mean that SPOPS objects no
longer go out of scope and get destroyed until the program ends or the
cache is manually flushed? In a mod_perl environment then, do you flush
the cache at the end of every request or what? Seems like you could get
some huge apache children pretty quickly if you're not careful.
It has to be flushed, of course.
Post by Ray Zimmerman
It was for completeness and to offer a mode that is equivalent to
current has_a behavior, that is, the field normally just gives you an
id, but you also have a convenience method for fetching the object as
well. My idea was that any 'has_a' spec, including 'manual', would
create convenience methods for fetching the related objects. The 'auto'
and 'lazy' options would simply call these methods automatically at the
appropriate time and stash the return values in the object. So in the
way I was picturing things, implementing 'manual' would simply be the
first step in implementing 'auto' and 'lazy'.
I feel dumb - I still don't quite get it. However, in your original
examples the method X->myA returns the id of A in the case of manual
fetch and A itself in the case of lazy/auto fetch, right? In my view,
X->myA always return the id and X->fetch_myA always returns the object
(I tend to use them like $author->book_id and $author->book in my
applications). So there is no need for manual fetching.

I think that having X->myA return inconsistent values may be confusing.
Let me know what you think. Perhaps I am still missing the utility of
manual fetching.
Post by Ray Zimmerman
Without the manual option, you can't specify a relationship at all
without having it define auto-fetching behavior. You can't, for example,
auto-remove an object without having it also auto-fetched (which I can
imagine you might want if you typically only need to deal with the ID of
the secondary object).
But in this case you still have to fetch the dependent object, because
it may define its own rules of auto-removal of even more objects.
Post by Ray Zimmerman
Just curious, does your implementation of 'auto' generate a public
'fetch_myA' method, for example?
See above - even if the fetch method name ('alias' in the current
terminology') is not specified in the configuration, it'll be
auto-created by using the name given to the target class. (I mean the
name of config hash key for the target class, not its Perl name. In the
example I sent you, X_alias is such a name.)
Post by Ray Zimmerman
Post by Vsevolod (Simon) Ilyushchenko
Autosaving is always off by default, to preserve compatibility.
I'm not sure I follow. Auto-fetching is new so there is no previous
corresponding save behavior to be backward compatible with. Classes
defined without any auto-fetch/auto-remove behavior, could behave as
always. Classes defining new auto behavior could have whatever default
'save' behavior we think makes sense. So I'm not sure there is a
backward compatibility issue here.
And the save behavior described in my updated proposal posted 4 Jan 2002
still seems to be the most consistent and make the most sense to me.
Yes, if we change the syntax, we will be free to follow you rules.
Post by Ray Zimmerman
Post by Vsevolod (Simon) Ilyushchenko
OTOH, there are three types of removes - 'auto', 'manual' and
'forget'. 'Auto' means complete removal of dependent objects, 'forget'
- nullifying id fields pointing to the removed objects, and 'manual' -
no action. The default should logically be 'forget', but it may
conflict with no autosaving, so I'll have to set it to 'manual'.
OK, but what is the 'reverse_remove'? Is specifying 'reverse_remove' =>
'forget' in a 'has_a' the same as specifying 'remove' => 'forget' in the
corresponding 'has_many'? If so, which one takes precedence if they are
inconsistent? It looks like 'reverse_remove' => 'forget' is equvalent to
what I called 'null_by', right?. I personally think that having multiple
(and possibly conflicting) ways/places of defining the behavior for a
single relationship is asking for trouble. I think it will make it
difficult to write correct and clear documentation and it will create
some debugging nightmares. (More on this below)
This should not be a problem, because in my current proposal the
programmer specifies either has_a or has_many (which implies the reverse
has_a), so no conflicts should be possible. However, if we change the
syntax, this issue will go away.
Post by Ray Zimmerman
Why do you include both 'link_class' and 'link_class_alias'? Aren't they
redundant? (see [1] below).
'Link_class' refers to the Perl class name, 'link_class_alias' - to the
method name used to retrieve its instances (this is your 'list_field' in
the 'link' hash).
Post by Ray Zimmerman
And I suppose the 'table' is only necessary if you don't specify the
'link_class' and vice versa, right?
Yup. I am a little unhappy that in your proposal one has to have a Perl
class for the linking table even if one is never going to use it, but I
guess this is necessary for the sake of the uniform syntax.
Post by Ray Zimmerman
* didn't see any mention of the 'name' option for explicitly specifying
the name of generated methods
It's called 'alias'.
Post by Ray Zimmerman
* not clear to me what auto/lazy fetching, auto removing, etc options
are implemented for links_to
If we use your syntax, they will be the same as in the fetch_by case.
Post by Ray Zimmerman
[1] I confess I never really did understand the purpose of the alias.
What is the difference between the alias and the class? Isn't one of
them redundant?
The alias is used to generate access methods in other classes referring
to this one. In your configuration examples you always give a value to
the 'name' key, but if it's omitted, methods are given names like
'fetch_X_alias'.

Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Ray Zimmerman
2004-11-08 21:20:06 UTC
Permalink
Simon,
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
And you say this full cache is necessary for the consistency circular
references. You mean to avoid infinite loops when you have A set to
auto-fetch B which is set to auto-fetch A? Seems to me that you
should be able to detect this type of thing when the classes are
configured. While I think a cache is a nice option that I may very
well use, I don't think it should be mandatory unless it's absolutely
necessary. Can you give me an example where consistency makes it
absolutely necessary?
For example, you have a Book object with many Authors. If the
application loads a book with the list of authors, adds another author
to this book and asks the new author about its parent book, the
current SPOPS implementation will re-fetch the book object,
potentially ignoring the changes that were made to original book
object.
However, I only now realized that you suggest saving both the
parent-to-child reference and the reverse reference in the object
fields. (I distinguish between $author->{book_id}, a number, and
$author->{book}, an object. Let me know if I understood this
correctly.)
I think you've got it right, except I'm not sure what you mean about
the reverse reference case. Maybe the myA vs fetch_myA stuff below
clarifies what I'm suggesting.
Post by Vsevolod (Simon) Ilyushchenko
Thus, once $book->{list_of_authors} is populated, adding a new author
to the book should add the new object to this list, plus it should set
the field $author->{book} to the original book object. This will make
the above situation impossible.
Right. Except that 'list_of_authors' in $book implies that you've
specified a reverse fetch of the book field in author, so
$author->{book} is just an id, not an object. More on object vs id
below.
Post by Vsevolod (Simon) Ilyushchenko
1) I create a new author-book relationship by setting the field
$author->{book_id} instead of saying $book->add_author($author). This
can be discouraged as an incorrect way of altering data, of course,
but logically both make sense, and I'd like to be able to use them
both.
Or, 2) if I am working with a second relationship, say
books-to-artists (illustrators). In this case, in one place in my
code, I could retrieve a book object by saying $artist->book, and then
in another place I'll call $author->book, and even though they may
refer to the same book, they will always be two different objects.
Right. Without a cache these inconsistencies are always possible. But
this is a "global" issue with OOP frameworks like SPOPS. I think Chris
has taken the right approach in letting the developer decide at the
application level whether s/he needs to always maintain that
consistency and if so whether to use SPOPS level caching or application
level caching/logic to ensure consistency. (And I understand there was
a bug in SPOPS caching which prevented this from working correctly).

But I don't think there is anything in the new has_a design that
REQUIRES one to maintain consistency though an SPOPS level cache,
right? I think the only thing you need to do is ensure that you don't
have any auto-fetching loops (might even want to include lazy-fetching)
when you do the configuration. In other words, you don't want to have a
book auto-fetch it's list of authors AND have the author set to
auto-fetch its book, creating an infinite loop. Even if you use a cache
this would cause circular references which cause a problem for garbage
collection unless you use weak references.

I think this checking is important, but I haven't honestly given any
thought to how to implement it.
Post by Vsevolod (Simon) Ilyushchenko
So looks like cache is still necessary.
Only if you need to guarantee a single in-memory copy for a process. I
argue that this is an arbitrary requirement. In a read-only
environment, it really doesn't matter (except for resource usage) if
you have multiple copies of the same object in memory. And in a web
environment, even using a simple cache doesn't guarantee consistency
across multiple processes (apache children), so you still need a higher
level synchronization mechanism to ensure consistency.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
It was for completeness and to offer a mode that is equivalent to
current has_a behavior, that is, the field normally just gives you an
id, but you also have a convenience method for fetching the object as
well. My idea was that any 'has_a' spec, including 'manual', would
create convenience methods for fetching the related objects. The
'auto' and 'lazy' options would simply call these methods
automatically at the appropriate time and stash the return values in
the object. So in the way I was picturing things, implementing
'manual' would simply be the first step in implementing 'auto' and
'lazy'.
I feel dumb - I still don't quite get it. However, in your original
examples the method X->myA returns the id of A in the case of manual
fetch and A itself in the case of lazy/auto fetch, right? In my view,
X->myA always return the id and X->fetch_myA always returns the object
(I tend to use them like $author->book_id and $author->book in my
applications). So there is no need for manual fetching.
I think that having X->myA return inconsistent values may be confusing.
Let me know what you think. Perhaps I am still missing the utility of
manual fetching.
My thought was that specifying 'auto' or 'lazy' are equivalent to
saying "this field is an object". Specifying 'manual' is equivalent to
saying "this field is an object id". So X->myA always returns the value
stored in the field and X->fetch_myA always returns the object.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
Without the manual option, you can't specify a relationship at all
without having it define auto-fetching behavior. You can't, for
example, auto-remove an object without having it also auto-fetched
(which I can imagine you might want if you typically only need to
deal with the ID of the secondary object).
But in this case you still have to fetch the dependent object, because
it may define its own rules of auto-removal of even more objects.
But the fetch only happens for the purpose of correctly doing the
remove ... the 'manual' specifier still means that the field holds an
object id, not an object.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
Just curious, does your implementation of 'auto' generate a public
'fetch_myA' method, for example?
See above - even if the fetch method name ('alias' in the current
terminology') is not specified in the configuration, it'll be
auto-created by using the name given to the target class. (I mean the
name of config hash key for the target class, not its Perl name. In
the example I sent you, X_alias is such a name.)
The problem I see with this is that it generates clashes when you have
multiple fields with the same class. We need a method name that is
unique for the field we want to fetch, not just for the class we use to
fetch it. I think using the class alias is left-over from the old has_a
config which used the class as the hash key (which you agreed is
detestable :-).

I vote once again that we stick with my proposal to use 'fetch_'
prepended to the name of the field, by default, and allow an option to
explicitly specify a method name.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
Post by Vsevolod (Simon) Ilyushchenko
OTOH, there are three types of removes - 'auto', 'manual' and
'forget'. 'Auto' means complete removal of dependent objects,
'forget' - nullifying id fields pointing to the removed objects, and
'manual' - no action. The default should logically be 'forget', but
it may conflict with no autosaving, so I'll have to set it to
'manual'.
OK, but what is the 'reverse_remove'? Is specifying 'reverse_remove'
=> 'forget' in a 'has_a' the same as specifying 'remove' => 'forget'
in the corresponding 'has_many'? If so, which one takes precedence if
they are inconsistent? It looks like 'reverse_remove' => 'forget' is
equvalent to what I called 'null_by', right?. I personally think that
having multiple (and possibly conflicting) ways/places of defining
the behavior for a single relationship is asking for trouble. I think
it will make it difficult to write correct and clear documentation
and it will create some debugging nightmares. (More on this below)
This should not be a problem, because in my current proposal the
programmer specifies either has_a or has_many (which implies the
reverse has_a), so no conflicts should be possible. However, if we
change the syntax, this issue will go away.
But since you can put a 'has_many' in Book and a 'has_a' in Author, for
example, where Author has a 'book' field, I think they can be
inconsistent. In my proposal you, for the 'has_a' in Author, you either
specify a forward or reverse direction with no way to specify something
conflicting in the Book class.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
Why do you include both 'link_class' and 'link_class_alias'? Aren't
they redundant? (see [1] below).
'Link_class' refers to the Perl class name, 'link_class_alias' - to
the method name used to retrieve its instances (this is your
'list_field' in the 'link' hash).
But can't you always get the one, given the other?
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
And I suppose the 'table' is only necessary if you don't specify the
'link_class' and vice versa, right?
Yup. I am a little unhappy that in your proposal one has to have a
Perl class for the linking table even if one is never going to use it,
but I guess this is necessary for the sake of the uniform syntax.
Which is why I wouldn't protest too much of we decided to leave the old
'links_to' syntax in untouched, at least for the time being. You would
only need to define the linking class if you needed the
auto-fetching/removing behavior.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
[1] I confess I never really did understand the purpose of the
alias. What is the difference between the alias and the class? Isn't
one of them redundant?
The alias is used to generate access methods in other classes
referring to this one. In your configuration examples you always give
a value to the 'name' key, but if it's omitted, methods are given
names like 'fetch_X_alias'.
Ah ... right ... detestable :-) Let's use something tied to the field
name, not the class, as I mentioned above.

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645
Vsevolod (Simon) Ilyushchenko
2004-11-11 23:41:05 UTC
Permalink
Ray,
Post by Ray Zimmerman
Right. Except that 'list_of_authors' in $book implies that you've
specified a reverse fetch of the book field in author, so
$author->{book} is just an id, not an object. More on object vs id below.
Okay, this is my last stumbling block. I just don't get why you want to
sometimes refer to the object and sometimes to the id. If it's
implemented this way, I'll have to say $author->book->id instead of
$author->book_id for auto/lazy, if I want to to just get the id value.
Also, if I switch from auto/lazy to manual fetch, I'll have to change
my code.

Can you give me an example of manual fetch that cannot be implemented
otherwise or that makes life more convenient? Manual remove, OTOH, is
clearly different from the other two methods of removal.
Post by Ray Zimmerman
But I don't think there is anything in the new has_a design that
REQUIRES one to maintain consistency though an SPOPS level cache, right?
Aww, who am I to argue with developers? I'll make sure both cases work.
Post by Ray Zimmerman
I think the only thing you need to do is ensure that you don't have any
auto-fetching loops (might even want to include lazy-fetching) when you
do the configuration. In other words, you don't want to have a book
auto-fetch it's list of authors AND have the author set to auto-fetch
its book, creating an infinite loop. Even if you use a cache this would
cause circular references which cause a problem for garbage collection
unless you use weak references.
Hmmm... circular references will actually come up even without circular
configuration. If an author refers to a list of books, and the books
refer back to the author, there you have it...
Post by Ray Zimmerman
I think this checking is important, but I haven't honestly given any
thought to how to implement it.
I'm currently doing this checking during saving/removing objects. The
configuration allows loops - I'm afraid that if this is prohibited at
the configuration stage, the potential for error is too great.
Post by Ray Zimmerman
I vote once again that we stick with my proposal to use 'fetch_'
prepended to the name of the field, by default, and allow an option to
explicitly specify a method name.
Okay.
Post by Ray Zimmerman
Post by Vsevolod (Simon) Ilyushchenko
This should not be a problem, because in my current proposal the
programmer specifies either has_a or has_many (which implies the
reverse has_a), so no conflicts should be possible. However, if we
change the syntax, this issue will go away.
But since you can put a 'has_many' in Book and a 'has_a' in Author, for
example, where Author has a 'book' field, I think they can be
inconsistent. In my proposal you, for the 'has_a' in Author, you either
specify a forward or reverse direction with no way to specify something
conflicting in the Book class.
Who's to stop the developer from saying that Author has_a Book and Book
has_a Author?
Post by Ray Zimmerman
Post by Vsevolod (Simon) Ilyushchenko
'Link_class' refers to the Perl class name, 'link_class_alias' - to
the method name used to retrieve its instances (this is your
'list_field' in the 'link' hash).
But can't you always get the one, given the other?
No, the alias can be overriden by the user.

Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Ray Zimmerman
2004-11-15 13:51:01 UTC
Permalink
Post by Vsevolod (Simon) Ilyushchenko
Okay, this is my last stumbling block. I just don't get why you want
to sometimes refer to the object and sometimes to the id. If it's
implemented this way, I'll have to say $author->book->id instead of
$author->book_id for auto/lazy, if I want to to just get the id value.
Also, if I switch from auto/lazy to manual fetch, I'll have to change
my code.
But you'll have to change code anyway if you switch from auto/lazy to
manual if you have any code that accesses the book object (since it
will no longer be auto/lazy-fetched). I view the specification of the
auto/lazy vs manual as part of specifying the 'type' of the field. If
you change the 'type' you should expect to have to change code in order
to get at the same data (e.g. the id).

If I understand your proposal you would always have the field hold the
id and have it store the auto/lazy-fetched object into some other key
in the parent object? What key? This approach requires you to specify
for each auto/lazy fetched field another name by which you access the
object. You have to worry about inconsistency between the id and the
object. When you go to save, if the id in the field doesn't match the
id of the object in corresponding key, which do you save?

I just think it is simpler and more straightforward to use the single
field and think of the auto/lazy vs manual as part of the type
definition for that field.
Post by Vsevolod (Simon) Ilyushchenko
Can you give me an example of manual fetch that cannot be implemented
otherwise or that makes life more convenient? Manual remove, OTOH, is
clearly different from the other two methods of removal.
Well, I think the example that I gave in a previous e-mail ... wait, I
now see that that example made no sense. Sorry. For some reason I was
thinking that the remove spec appeared inside the fetch, not at the
same level. Nevermind.

OK, I think I understand your point. IF you use separate field names
for the id (persistent field) and the auto/lazy-fetched object (temp
non-persistent field), I suppose there is no need for manual fetch.

But if you use the same field for the id and the auto-fetched object,
then you DO need the manual option. I still think this is a cleaner
approach since you're not cluttering up your object with extra fields
unnecessarily and it allows you treat some fields as ids (manual) and
others as objects (auto/lazy), which I find useful conceptually.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
I think the only thing you need to do is ensure that you don't have
any auto-fetching loops (might even want to include lazy-fetching)
when you do the configuration. In other words, you don't want to have
a book auto-fetch it's list of authors AND have the author set to
auto-fetch its book, creating an infinite loop. Even if you use a
cache this would cause circular references which cause a problem for
garbage collection unless you use weak references.
Hmmm... circular references will actually come up even without
circular configuration. If an author refers to a list of books, and
the books refer back to the author, there you have it...
Right, I agree. After thinking about this a bit more, I think there are
two separate issues here. They are related to one another and both
affect, or are affected by, the presence or absence of caching. One is
the circular configuration which could result in auto-fetching loops
and the other is circular references in the objects which prevent
automatic garbage collection.

I think the first one should be handled by SPOPS and it should be at
the earliest possible stage (configuration of the class if possible,
run-time if not).

I think the second one is outside the scope of SPOPS. The developer
needs to be aware of which objects hold references which other objects.
It is certainly possible for an object A to reference B which
references C which in turn references A causing a circular reference
involving more than just two classes. Another thing to keep in mind is
that sometimes circular references are needed/desired, but the
developer needs to handle the breaking of the circular ref manually
before destruction. I don't think it is the job of SPOPS to prevent
circular references.

When implementing an SPOPS cache it may be possible to handle circular
references in an intelligent way during flushing of the cache for
example, but I view this as a separate issue that should not be mixed
in with the new has-a semantics.
Post by Vsevolod (Simon) Ilyushchenko
I'm currently doing this checking during saving/removing objects. The
configuration allows loops - I'm afraid that if this is prohibited at
the configuration stage, the potential for error is too great.
Are you talking about circular references or auto-fetch loops? If the
latter, why is it more error prone at the configuration stage? I think
the circular reference issue is fundamentally a run-time issue since it
deals with instances of objects referring to one another, and for the
reasons stated above, I don't think SPOPS should try to tackle this
except maybe w.r.t. caching. On the other hand I think the auto-fetch
loop issue is about class (as opposed to instance) behavior and is
therefore fundamentally a configuration issue. No class should be able
to auto-fetch a field which through further auto-fetch configuration
ends up auto-fetching an object of the original class.

It may be easier to implement this checking at run-time, but it is
still a configuration level issue and I think it should be handled
during configuration if possible. In other words, if a developer
creates an auto-fetch loop, he has a bug. If the checking is done at
the config stage, the bug will be exposed the first time the classes
are created. If the checking is done during fetching, the class might
be used successfully for a while without the bug be exposed if the
particular field in question isn't populated for example. It's always
best to detect errors at the earliest possible stage.
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
But since you can put a 'has_many' in Book and a 'has_a' in Author,
for example, where Author has a 'book' field, I think they can be
inconsistent. In my proposal you, for the 'has_a' in Author, you
either specify a forward or reverse direction with no way to specify
something conflicting in the Book class.
Who's to stop the developer from saying that Author has_a Book and
Book has_a Author?
True, true. But I think of a 'has_many' and a reverse 'has_a' as two
ways of defining the behavior of a single uni-directional link between
the two classes (the Author's book field). In that sense, allowing both
makes it possible to define contradicting behavior for that single
link. I view your example, on the other hand, as defining the behavior
of two different uni-directional (and obviously circular) links. I
don't think this is a contradictory configuration, but it does require
SPOPS to ensure that it doesn't create an infinite auto-fetch loop, and
the developer needs to be aware of the implications for circular
references (which will depend on manual vs lazy and cache
implementation).


Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645
Chris Winters
2004-12-01 17:23:02 UTC
Permalink
Post by Ray Zimmerman
...
But you'll have to change code anyway if you switch from auto/lazy to
manual if you have any code that accesses the book object (since it
will no longer be auto/lazy-fetched). I view the specification of the
auto/lazy vs manual as part of specifying the 'type' of the field. If
you change the 'type' you should expect to have to change code in order
to get at the same data (e.g. the id).
If I understand your proposal you would always have the field hold the
id and have it store the auto/lazy-fetched object into some other key
in the parent object? What key? This approach requires you to specify
for each auto/lazy fetched field another name by which you access the
object. You have to worry about inconsistency between the id and the
object. When you go to save, if the id in the field doesn't match the
id of the object in corresponding key, which do you save?
I just think it is simpler and more straightforward to use the single
field and think of the auto/lazy vs manual as part of the type
definition for that field.
...
OK, I think I understand your point. IF you use separate field names
for the id (persistent field) and the auto/lazy-fetched object (temp
non-persistent field), I suppose there is no need for manual fetch.
But if you use the same field for the id and the auto-fetched object,
then you DO need the manual option. I still think this is a cleaner
approach since you're not cluttering up your object with extra fields
unnecessarily and it allows you treat some fields as ids (manual) and
others as objects (auto/lazy), which I find useful conceptually.
Just to be clear what we're talking about here:

book:
book_id, title, publisher_id

publisher:
publisher_id, name

book_author:
book_id, author_id

author:
author_id, name

Book 1 ---> 1 Publisher
Book 1 ---> * Author

my $book = Book->fetch(42);
$book->publisher == object (whether manual/lazy fetch)
$book->publisher_id == ID
$book->author == list of objects (whether manual/lazy fetch)

Is that right? Because it sounds from the discussion as if:

$book->publisher == sometimes ID, sometimes object,
depending on the configuration

which IMO is very confusing.

Chris
--
Chris Winters (***@cwinters.com)
Building enterprise-capable snack solutions since 1988.
Vsevolod (Simon) Ilyushchenko
2004-12-01 17:59:04 UTC
Permalink
Post by Chris Winters
my $book = Book->fetch(42);
$book->publisher == object (whether manual/lazy fetch)
$book->publisher_id == ID
$book->author == list of objects (whether manual/lazy fetch)
$book->publisher == sometimes ID, sometimes object,
depending on the configuration
which IMO is very confusing.
Yes, yes, yes! My point exactly!

Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Ray Zimmerman
2004-12-01 20:06:03 UTC
Permalink
Post by Vsevolod (Simon) Ilyushchenko
Post by Chris Winters
my $book = Book->fetch(42);
$book->publisher == object (whether manual/lazy fetch)
$book->publisher_id == ID
$book->author == list of objects (whether manual/lazy fetch)
$book->publisher == sometimes ID, sometimes object,
depending on the configuration
which IMO is very confusing.
Yes, yes, yes! My point exactly!
Aw, c'mon Simon ... I thought I'd convinced you on this one too. Now I
have to convince Chris too? :-)

Chris, have you gotten through our whole previous discussion? I think
I've explained why I think this is NOT confusing (at least to me :-) I
can try to restate my reasons if you think it would help.

Ray
Chris Winters
2004-12-01 20:20:04 UTC
Permalink
Post by Ray Zimmerman
Post by Vsevolod (Simon) Ilyushchenko
Post by Chris Winters
my $book = Book->fetch(42);
$book->publisher == object (whether manual/lazy fetch)
$book->publisher_id == ID
$book->author == list of objects (whether manual/lazy fetch)
$book->publisher == sometimes ID, sometimes object,
depending on the configuration
which IMO is very confusing.
Yes, yes, yes! My point exactly!
Aw, c'mon Simon ... I thought I'd convinced you on this one too. Now I
have to convince Chris too? :-)
Chris, have you gotten through our whole previous discussion? I think
I've explained why I think this is NOT confusing (at least to me :-) I
can try to restate my reasons if you think it would help.
I believe I read through everything. I just think you're overestimating
how much time and effort people are willing to spend to sort these out. We
who have been using SPOPS for a while see the relationship between the
configuration and the code as blindingly obvious.

But for new people -- or even people who only use SPOPS occasionally --
it's not. And the idea that the behavior of an existing method could
change based on configuration is very disconcerting. People like to think
of their objects as somewhat stable -- even Perl people! -- and when we
change the meaning under the hood like that it's untrustworthy.

OTOH, the idea that something *new* happens -- like adding new methods --
as a result of configuration changes isn't so bad because you have the
choice if you want to use the new feature or not.

Chris
--
Chris Winters (***@cwinters.com)
Building enterprise-capable snack solutions since 1988.
Ray Zimmerman
2004-12-03 17:44:00 UTC
Permalink
Chris,

I only got a chance to skim over your e-mails the other day. Now that
I've looked at your example more carefully, I don't think you got
exactly what I'm proposing yet, so I'll try to clarify and then respond
to some of your points.
Post by Chris Winters
book_id, title, publisher_id
publisher_id, name
book_id, author_id
author_id, name
Book 1 ---> 1 Publisher
Book 1 ---> * Author
my $book = Book->fetch(42);
$book->publisher == object (whether manual/lazy fetch)
$book->publisher_id == ID
$book->author == list of objects (whether manual/lazy fetch)
$book->publisher == sometimes ID, sometimes object,
depending on the configuration
which IMO is very confusing.
In this example it appears that book has a 'publisher_id' field only,
so there would be no 'publisher' method. For the sake of describing my
proposal let me rename the 'publisher_id' field to 'publisher', which
will still hold the publisher's id in the database. My proposal is as
follows:

First, regardless of the configuration ...
1. The publisher column in the database holds the ID of a publisher
object.
2. The book object has a fetch_publisher() method for fetching the
object (config can optionally specify different method name).
$book->fetch_publisher == object
3. The book object has a publisher() method which always returns the
value of the field in the book object.
i.e. $book->publisher == $book->{publisher}

For manual fetch configuration, the publisher field in the book object
is ALWAYS the ID stored in publisher column in the database.
$book->publisher == ID

For auto/lazy fetch configuration, the publisher field in the book
object is ALWAYS the publisher OBJECT whose ID is stored in the
publisher column in the database.
$book->publisher == object
In the case of auto-fetch, the assignment, $book->{publisher} =
$book->fetch_publisher, is executed when the $book object is fetched.
In the lazy-fetch case it happens the first time $book->publisher is
called (or $book->{publisher} is accessed). Again, the purpose of this
is to allow me to think of my book's publisher field as a Publisher
object.
Post by Chris Winters
I believe I read through everything. I just think you're overestimating
how much time and effort people are willing to spend to sort these out. We
who have been using SPOPS for a while see the relationship between the
configuration and the code as blindingly obvious.
But for new people -- or even people who only use SPOPS occasionally --
it's not. And the idea that the behavior of an existing method could
change based on configuration is very disconcerting. People like to think
of their objects as somewhat stable -- even Perl people! -- and when we
change the meaning under the hood like that it's untrustworthy.
OTOH, the idea that something *new* happens -- like adding new methods --
as a result of configuration changes isn't so bad because you have the
choice if you want to use the new feature or not.
I agree completely. But we are not changing the behavior of any
existing methods. The publisher method has always returned the value
held in the publisher field of the Perl object and that is what it
still returns. What we are doing is adding the *new* ability to
configure a field to be an auto/lazy-fetched object instead of simply
the ID. This is a new feature which is why I agree that it is a good
idea to use a name other than 'has_a' for this config.

I really don't like the idea of a 'publisher_id' field configured to
auto-fetch an object into a 'publisher' field. I think using two fields
clutters up the object unnecessarily and creates other ambiguities that
have to be resolved/documented. Since this wasn't envisioned when I
wrote up the detailed description of my original design, I haven't
thought through this in detail, but here are two issues off the top of
my head:

- What do you call the field where you stash the auto-fetched object?
Always specify explicit name in config?
- When auto-saving how should SPOPS handle inconsistencies between
the id of the object in the publisher field and the value in the
publisher_id field? Or to put it another way, if I want to re-assign a
book's publisher, do I assign a new id to the publisher_id field or do
I assign a new object to the publisher field?

Seems much more messy to me for something that I consider to be
completely new functionality (no backward compatibility issues to worry
about). While it does eliminate the need for the manual fetch option,
as Simon mentioned earlier (and probably even lazy fetch), it also
eliminates one of the main features for me, which is the ability to
take an existing field and treat it conceptually as an object.

I suppose if someone insisted on being able to auto-fetch into a second
field I wouldn't object too strongly to permitting that as an option,
but I think it adds unnecessary complexity which would need to be
documented.

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645
Vsevolod (Simon) Ilyushchenko
2004-12-04 13:58:00 UTC
Permalink
Hi,
Post by Ray Zimmerman
For auto/lazy fetch configuration, the publisher field in the book
object is ALWAYS the publisher OBJECT whose ID is stored in the
publisher column in the database.
$book->publisher == object
In the case of auto-fetch, the assignment, $book->{publisher} =
$book->fetch_publisher, is executed when the $book object is fetched. In
the lazy-fetch case it happens the first time $book->publisher is called
(or $book->{publisher} is accessed). Again, the purpose of this is to
allow me to think of my book's publisher field as a Publisher object.
I don't think you can persuade me that this approach is neater. One of
the best things about SPOPS is that it lets me think accurately in terms
of my tables and my objects at the same time. Each database field name
corresponds to an object variable name, period. Having the 'publisher'
id column correspond to a 'publisher' object will break the neat mental
picture. Plus, there is a huge downside that if I want to find out
parent_id from a child (for the auto/lazy fetch), I'll have to call
child->parent->id, which means that for the lazy fetch I have to
retrieve the parent to get parent's id, though it's present in the child
anyway. However, see more below.
Post by Ray Zimmerman
- What do you call the field where you stash the auto-fetched object?
Always specify explicit name in config?
This can be auto-deduced from class names (config hash keys, not Perl
names). Currently SPOPS supports the config key {main_alias} to be used
in such cases.
Post by Ray Zimmerman
- When auto-saving how should SPOPS handle inconsistencies between the
id of the object in the publisher field and the value in the
publisher_id field? Or to put it another way, if I want to re-assign a
book's publisher, do I assign a new id to the publisher_id field or do I
assign a new object to the publisher field?
It's possible to generate methods in such a way that all related id
fields and objects are updated when a change happens. However, this is
implementing caching all over again. This makes me think that maybe
storing objects in fields is not such a good idea after all. (Currently
SPOPS does not do it, and the code that I've written before Ray's
objections does not do it either.)

This brings up another problem, though. How should save() work in the
many-to-many configuration when the parent is saved? Currently links_to
calling addChild() method already causes an update of the linking table.
Hence, when parent->save() is called, there is no need to update the
linking table, though the framework can go ahead and save the children
objects if auto-save is specified. Oh, wait! We don't have the children
objects because we don't like storing objects in the fields! We have to
re-fetch the children and then save them. Fortunately, I have the
"cache_only" option for fetch_group() which will only return objects in
cache, but we still have to make a database call because we don't know
what IDs we want.

In the one-to-many configuration, the addChild method updates the
parent_id field in the child object and saves it, which amounts to
auto-save, even though it may not have been requested. The code can be
changed, though, to only save the new parent_id of the child object.

The problem lies in the fact that we perform database saves not just in
the save() method, but also in the add/removeChild operations. What
happens if the add/remove operations do not access the database? We have
to either 1) maintain a hidden list of dependent ids and work with it,
or 2) go Ray's way and maintain a list of dependent objects. To maintain
consistency we can either A) add logic to generated methods to keep
those ids in sync, or B) prohibit saying child->parent_id if it can be
obtained as child->parent->id. How do we deal, though, with manual fetch
(when a dependent object is not stored) plus auto-save (when a dependent
object/id should be stored to be saved)?

I'll try to install Hibernate this weekend and see what they do there.
I've looked at Fowler's "Patterns of Enterprise Application
Architechture", but I don't think he brings up the problem of saves.

Fowler gave me an idea, though - the various id fields pointing to other
objects should be read-only. This way there is less chance of creating
confusion. Currently the methods that return objects (parent() or
children()) are only getters, but if we want to be really
object-oriented, it would be nice to make them setters too, and if we
can't set ids directly any more, setters may become manageable.
Post by Ray Zimmerman
Seems much more messy to me for something that I consider to be
completely new functionality (no backward compatibility issues to worry
about). While it does eliminate the need for the manual fetch option, as
Simon mentioned earlier (and probably even lazy fetch), it also
eliminates one of the main features for me, which is the ability to take
an existing field and treat it conceptually as an object.
But I like having parent_id() and parent() methods separate, because I
should not completely forget that I'm dealing with a database.

Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Ray Zimmerman
2004-12-06 17:37:05 UTC
Permalink
Post by Vsevolod (Simon) Ilyushchenko
Again, the purpose of this is to allow me to think of my book's
publisher field as a Publisher object.
I don't think you can persuade me that this approach is neater.
Maybe not ... but I hope you don't mind my trying :-)
Post by Vsevolod (Simon) Ilyushchenko
One of the best things about SPOPS is that it lets me think accurately
in terms of my tables and my objects at the same time. Each database
field name corresponds to an object variable name, period. Having the
'publisher' id column correspond to a 'publisher' object will break
the neat mental picture.
Your mental picture is a totally valid one, and may often be the best
one, but IMHO it is not the only valid one. The value in the database
is the persistent version of the corresponding field in the objects,
but need not be in an identical form. Developers should be able to
define how one maps to the other in a way that is most useful to them.
Take a look at SPOPS::Tool::DateConvert for an example of how SPOPS
already allows this type of behavior as an option. (Btw, Chris, this a
good example of where changing configuration forces you to change what
you expect to find in that particular field of your object ... which I
still find quite reasonable).

I would say that if you want the "value in database eq value in object"
mental picture, then all you're saying is that you want to configure
for manual fetching.
Post by Vsevolod (Simon) Ilyushchenko
Plus, there is a huge downside that if I want to find out parent_id
from a child (for the auto/lazy fetch), I'll have to call
child->parent->id, which means that for the lazy fetch I have to
retrieve the parent to get parent's id, though it's present in the
child anyway. However, see more below.
Again, I think you're just saying that you want manual fetching. Why
would you configure it for auto/lazy fetch if all you wanted was the
id? If you only need the object sometimes and want the parent() method
to always return an id, you can still always get to the object using
the fetch_parent() method (which you can even rename if you like).
Post by Vsevolod (Simon) Ilyushchenko
- What do you call the field where you stash the auto-fetched
object? Always specify explicit name in config?
This can be auto-deduced from class names (config hash keys, not Perl
names). Currently SPOPS supports the config key {main_alias} to be
used in such cases.
Unless I don't understand what you're suggesting, I think this goes
back to our previous discussion about what to use for the config key,
alias (old has_a syntax) or field name (new has_a syntax). Those
aliases are only unique to the class, but we need something unique to
the field to be able to handle the case where you have two fields which
belong to the same class.
Post by Vsevolod (Simon) Ilyushchenko
- When auto-saving how should SPOPS handle inconsistencies between
the id of the object in the publisher field and the value in the
publisher_id field? Or to put it another way, if I want to re-assign
a book's publisher, do I assign a new id to the publisher_id field or
do I assign a new object to the publisher field?
It's possible to generate methods in such a way that all related id
fields and objects are updated when a change happens. However, this is
implementing caching all over again. This makes me think that maybe
storing objects in fields is not such a good idea after all.
(Currently SPOPS does not do it, and the code that I've written before
Ray's objections does not do it either.)
No storing objects in fields? You mean you want to eliminate
auto/lazy-fetch as an option?

Ah ... wait a minute ... are you assuming a cache? I suppose with a
cache you could do auto/lazy-fetching without storing the auto-fetched
objects in the primary object. But as I mentioned before (and I seem to
remember Chris saying the same), I think caching and the new has_a (or
relates_to) functionality should be treated separately. I have
certainly been thinking of these as completely separate. For the
purposes of designing this new functionality, I've *always* assumed no
caching. Caching can be added as an independent option later.
Post by Vsevolod (Simon) Ilyushchenko
This brings up another problem, though. How should save() work in the
many-to-many configuration when the parent is saved? Currently
links_to calling addChild() method already causes an update of the
linking table. Hence, when parent->save() is called, there is no need
to update the linking table, though the framework can go ahead and
save the children objects if auto-save is specified. Oh, wait! We
don't have the children objects because we don't like storing objects
in the fields! We have to re-fetch the children and then save them.
Fortunately, I have the "cache_only" option for fetch_group() which
will only return objects in cache, but we still have to make a
database call because we don't know what IDs we want.
In my proposal, auto-saving is off by default for 'link'. So you don't
save either the link or the child object. Details are in
<http://sourceforge.net/mailarchive/forum.php?
thread_id=110632&forum_id=3222>.
Post by Vsevolod (Simon) Ilyushchenko
In the one-to-many configuration, the addChild method updates the
parent_id field in the child object and saves it, which amounts to
auto-save, even though it may not have been requested. The code can be
changed, though, to only save the new parent_id of the child object.
It's only an auto-save if it happens automatically as a result of
saving the object that fetched it. I view calling an add_*() method as
an explicit (i.e. "manual") save operation on the child and think it
should be documented as such.
Post by Vsevolod (Simon) Ilyushchenko
The problem lies in the fact that we perform database saves not just
in the save() method, but also in the add/removeChild operations. What
happens if the add/remove operations do not access the database? We
have to either 1) maintain a hidden list of dependent ids and work
with it, or 2) go Ray's way and maintain a list of dependent objects.
To maintain consistency we can either A) add logic to generated
methods to keep those ids in sync, or B) prohibit saying
child->parent_id if it can be obtained as child->parent->id.
I think all of these options are ugly, which is why I propose that we
stick with saving the child object in the add_*() methods and
documenting that clearly.
Post by Vsevolod (Simon) Ilyushchenko
How do we deal, though, with manual fetch (when a dependent object is
not stored) plus auto-save (when a dependent object/id should be
stored to be saved)?
This issue is also addressed in my proposal in the e-mail referenced
above. Since you can't auto-save an object you don't have, auto-saving
with manual fetch makes no sense, and attempting to configure a field
that way should throw an error.
Post by Vsevolod (Simon) Ilyushchenko
Seems much more messy to me for something that I consider to be
completely new functionality (no backward compatibility issues to
worry about). While it does eliminate the need for the manual fetch
option, as Simon mentioned earlier (and probably even lazy fetch), it
also eliminates one of the main features for me, which is the ability
to take an existing field and treat it conceptually as an object.
But I like having parent_id() and parent() methods separate, because I
should not completely forget that I'm dealing with a database.
So why not just use ...

relates_to => {
parent_id => {
class => 'Parent::Class',
fetch => {
type => 'manual'
name => 'parent'
}
}
}

Doesn't that give you what you want? Turn on caching and you even have
your version of lazy-loading, right? The parent_id() method always
gives you the id and the parent() method always gives you the (possibly
cached) object. I think the only thing missing is auto-fetching (into
cache only) without stashing the objects in a field. This 'auto-cache'
option, to give it a name, obviously requires that caching be turned on
and would have to either throw an error or revert to 'manual' if a
cache were not present. But, I honestly don't see the need for this
option. If you ALWAYS want to auto-fetch the secondary object(s), then
you're never in the situation where you need to fetch them just to get
the id. Both are always available. So why not just auto-fetch them into
the field as I proposed?

I think it's clear that there are differences in the way you and I like
to think about our objects/data and the way we intend to use SPOPS, and
this new functionality in particular. And this is all perfectly
reasonable. My hope is that we can find a clean, consistent,
easy-to-document/understand design that is as flexible as possible, and
in particular, flexible enough to encompass both of our requirements.
After all this discussion, I still think my original design, in all of
its gory detail, accomplishes this. If there is a favorite mental
picture or usage scenario that is excluded by my proposed design (or
even made more difficult or confusing to configure), I don't yet see
it. ( And I am trying :-)


Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645
Vsevolod (Simon) Ilyushchenko
2004-12-07 00:12:01 UTC
Permalink
Chris,

Your approval is eventually needed here.

Ray,
Post by Ray Zimmerman
Again, I think you're just saying that you want manual fetching. Why
would you configure it for auto/lazy fetch if all you wanted was the
id? If you only need the object sometimes and want the parent() method
to always return an id, you can still always get to the object using
the fetch_parent() method (which you can even rename if you like).
I think this finally got through. Even before reading the end of your
email, I realized that, indeed, I am describing what you call "manual
fetching". I was objecting to the presence of the word 'fetch', but I
should be able to rename the method in the config.
Post by Ray Zimmerman
Unless I don't understand what you're suggesting, I think this goes
back to our previous discussion about what to use for the config key,
alias (old has_a syntax) or field name (new has_a syntax). Those
aliases are only unique to the class, but we need something unique to
the field to be able to handle the case where you have two fields which
belong to the same class.
Well, class alias and field name can be combined. My point is that a
field is an integer and an object is an object, and I don't like mixing
their names. In fact, the idea of substituting objects for ids (in your
auto-save case) still weirds me out, even though I see where you are
going with it. I'd like Chris's input on this.

Another weirdness-inducing thing is that with your auto- or
lazy-fetching, we effectively have a limited cache, as the fetched
objects are stored in the hidden fields, but with manual fetching we do
not have a hidden cache. Thus, simply changing SPOPS config will change
the code behavior, and it may not even be noticed at once during
compilation (unlike the id-to-object change), because the lack of
caching only rears its head when you fetch the same object more than
once. No matter how much you document this, it's not obvious for a
beginner, and thus the warnings will be ignored.

Your example with DateConvert which is also a configuration change is
not quite in the same league, because then you are clearly changing a
datum type. But a switch from auto- to manual fetching would look very
innocent to a beginner.
Post by Ray Zimmerman
No storing objects in fields? You mean you want to eliminate
auto/lazy-fetch as an option?
Ah ... wait a minute ... are you assuming a cache?
No, that would be even more weird. However, note my comment in the
previous email that the foreign key ids may be made read-only. This will
eliminate a way to create inconsistencies.
Post by Ray Zimmerman
Post by Vsevolod (Simon) Ilyushchenko
How do we deal, though, with manual fetch (when a dependent object is
not stored) plus auto-save (when a dependent object/id should be
stored to be saved)?
This issue is also addressed in my proposal in the e-mail referenced
above. Since you can't auto-save an object you don't have, auto-saving
with manual fetch makes no sense, and attempting to configure a field
that way should throw an error.
Actually, I can think of a situation when it does make sense. Suppose
you edit a web page listing info for an author and all the books he
wrote. First you retrieve the old author and book objects from the
database, then you replace some values, then you save the author objects
and would like the book objects become updated to. Even if they have not
been fetched yet, they would get fetched when get an array of children
to iterate over.

You'll say that this is more suited for auto-fetching, but I may still
want to use the manual fetching for the "neatness" reasons. :)

Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Ray Zimmerman
2004-12-07 16:25:08 UTC
Permalink
Simon,
Post by Vsevolod (Simon) Ilyushchenko
Another weirdness-inducing thing is that with your auto- or
lazy-fetching, we effectively have a limited cache, as the fetched
objects are stored in the hidden fields, but with manual fetching we
do not have a hidden cache. Thus, simply changing SPOPS config will
change the code behavior, and it may not even be noticed at once
during compilation (unlike the id-to-object change), because the lack
of caching only rears its head when you fetch the same object more
than once. No matter how much you document this, it's not obvious for
a beginner, and thus the warnings will be ignored.
I don't see why this is so difficult, even for beginners. I think you
are confusing things by thinking of it as a cache. A cache is something
that sits between you and the database and intercepts fetch requests to
avoid going to the database. That is not what we have here.

We have two types of methods, accessor methods and fetch methods. In
both the manual and the auto/lazy case, the object's accessor methods
simply return what is stored in the Perl object. In the auto/lazy case,
you expect these to be secondary objects that were fetched previously
and stashed. In the manual case, you are simply saying that you don't
want to store any secondary objects inside your primary object. So you
don't expect to have any accessors that return objects. You still have
convenience methods to fetch these objects, but these are "fetch"
methods, not accessors.

(I just looked back at my original proposal and I see that to be
consistent with what I just said, I would need to change the
'manual_by' option so that the fetch method does NOT store the
retrieved objects in the primary object. We have 'lazy_by' to handle
the "fetch into the object on-demand" case.)
Post by Vsevolod (Simon) Ilyushchenko
Your example with DateConvert which is also a configuration change is
not quite in the same league, because then you are clearly changing a
datum type. But a switch from auto- to manual fetching would look very
innocent to a beginner.
I think that putting the "manual implies id, auto/lazy implies object"
very prominently in the documentation should be sufficient, but you
seem to think not. Can we come up with something other than
'manual/auto/lazy' for the fetch type that captures the differences in
data type as well as differences in fetch behavior?

We already talked about changing *_by to reverse_*, so how about
renaming my proposed fetch types to something like the following?

manual => manual
auto => auto_object
lazy => lazy_object
manual_by => reverse_manual
auto_by => reverse_auto_object
lazy_by => reverse_lazy_object

Does that help?
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
No storing objects in fields? You mean you want to eliminate
auto/lazy-fetch as an option?
Ah ... wait a minute ... are you assuming a cache?
No, that would be even more weird. However, note my comment in the
previous email that the foreign key ids may be made read-only. This
will eliminate a way to create inconsistencies.
OK, now I am confused ... I must have missed what you mean by not
storing objects in fields. If you're auto-fetching something that isn't
going into cache and you're not storing it in the object, where is it
going? Or did you just mean that you store it in the object under a key
that isn't a persistent field?

If so, I think that having the id in the persistent field and the
corresponding object stored under some other key, with the id being
read-only leads to more potential confusion than just storing the
object in the persistent field like I propose. But I suppose this is a
matter of opinion. In this case, if you want to set a new value for the
secondary object you have to change the object if it's auto/lazy-fetch
(since the id is read only), but you change the id if it's manual
(since you don't even have the object).

How about this ... would the following change to my proposal allow you
to do exactly what you want?

Suppose, we change the 'list_field' key to something like
'object_field' and in addition to it's current meaning for reverse
direction fetches, we use it for forward direction fetches to
optionally specify the key under which to store the auto/lazy-fetched
object (i.e. the name of the object accessor). If not specified, the
object would be fetched into the same field (just like the original
proposal). For saves and removes in the auto/lazy-fetch case, I think
the auto-fetched object should still override the id (and we could make
the id field read-only in this case as you mentioned).

I don't personally see the need for the additional complexity
introduced with this option, but I guess it's not so bad ... just more
to document and more for new users to digest.

I *think* this would let you do what you want without preventing me
from doing what I want, right?
Post by Vsevolod (Simon) Ilyushchenko
Post by Ray Zimmerman
Post by Vsevolod (Simon) Ilyushchenko
How do we deal, though, with manual fetch (when a dependent object
is not stored) plus auto-save (when a dependent object/id should be
stored to be saved)?
This issue is also addressed in my proposal in the e-mail referenced
above. Since you can't auto-save an object you don't have,
auto-saving with manual fetch makes no sense, and attempting to
configure a field that way should throw an error.
Actually, I can think of a situation when it does make sense. Suppose
you edit a web page listing info for an author and all the books he
wrote. First you retrieve the old author and book objects from the
database, then you replace some values, then you save the author
objects and would like the book objects become updated to. Even if
they have not been fetched yet, they would get fetched when get an
array of children to iterate over.
You'll say that this is more suited for auto-fetching, but I may still
want to use the manual fetching for the "neatness" reasons. :)
I'm not sure I understand your example (are you modifying any
relationships or just the data?), so it's difficult to say which is
more suited, but it seems to me if you are going to do manual fetching
and updating, then manual saving should not be such a burden. If the
book objects haven't been fetched yet, then they haven't been modified
and don't need to be saved either, right?

If you think the changes I proposed here will satisfy your
requirements, I'll be happy to update my proposed spec to incorporate
them. For me, having a design written up in nitty gritty detail helps
in keeping everyone on the same page and might make it easier for Chris
to keep up with all of our discussions.

Ray Zimmerman
Director, Laboratory for Experimental Economics and Decision Research
428-B Phillips Hall, Cornell University, Ithaca, NY 14853
phone: (607) 255-9645
Vsevolod (Simon) Ilyushchenko
2004-12-11 13:15:03 UTC
Permalink
Ray,
Post by Ray Zimmerman
(I just looked back at my original proposal and I see that to be
consistent with what I just said, I would need to change the 'manual_by'
option so that the fetch method does NOT store the retrieved objects in
the primary object. We have 'lazy_by' to handle the "fetch into the
object on-demand" case.)
Right, then it will precisely mirror the way SPOPS currently works.
Post by Ray Zimmerman
We already talked about changing *_by to reverse_*, so how about
renaming my proposed fetch types to something like the following?
manual => manual
auto => auto_object
lazy => lazy_object
manual_by => reverse_manual
auto_by => reverse_auto_object
lazy_by => reverse_lazy_object
Does that help?
This is clearer.
Post by Ray Zimmerman
If so, I think that having the id in the persistent field and the
corresponding object stored under some other key, with the id being
read-only leads to more potential confusion than just storing the object
in the persistent field like I propose. But I suppose this is a matter
of opinion. In this case, if you want to set a new value for the
secondary object you have to change the object if it's auto/lazy-fetch
(since the id is read only), but you change the id if it's manual (since
you don't even have the object).
That's another example of config-change-influencing-code that bothers me.

After looking at Hibernate, I think I appreciate the consistency of the
auto/lazy approach. The problem is that I like the manual approach too,
but supporting them both makes the framework confusing. I think I'll go
ahead and start implementing the dual approach, as things can be thrown
out if needed, but this definitely has to run by Chris.

I probably did not explain well the read-only id approach in Hibernate.
There are no ids that are foreign keys at all. The only ids visible are
the proper object ids, and even they are read-only.

Also, Hibernate does let you to add/remove children without saving to
the database. Once you save the primary object, all the changes to
dependent objects and collections are saved, and if collection members
have not changed, they are not re-saved. This actually makes the
auto/lazy case more consistent, I think.
Post by Ray Zimmerman
Suppose, we change the 'list_field' key to something like 'object_field'
and in addition to it's current meaning for reverse direction fetches,
we use it for forward direction fetches to optionally specify the key
under which to store the auto/lazy-fetched object (i.e. the name of the
object accessor). If not specified, the object would be fetched into the
same field (just like the original proposal). For saves and removes in
the auto/lazy-fetch case, I think the auto-fetched object should still
override the id (and we could make the id field read-only in this case
as you mentioned).
Sounds good, and see above on ids.
Post by Ray Zimmerman
I'm not sure I understand your example (are you modifying any
relationships or just the data?),
Just the data.
Post by Ray Zimmerman
so it's difficult to say which is more
suited, but it seems to me if you are going to do manual fetching and
updating, then manual saving should not be such a burden. If the book
objects haven't been fetched yet, then they haven't been modified and
don't need to be saved either, right?
I'm saying that presumably the framework will generate for me an
auto-save method that looks like:

sub Author::save()
{
SUPER::save(); #Calling SPOPS methods for a single object
foreach my $book ($self->books)
{
$book-save();
}
}

This will only work correctly with cache, though. If the books have been
fetched before, I will get the same instances. But if you say that the
code should function equally well with and without cache, than this will
indeed make no sense in the manual-fetch case.
Post by Ray Zimmerman
If you think the changes I proposed here will satisfy your requirements,
I'll be happy to update my proposed spec to incorporate them. For me,
having a design written up in nitty gritty detail helps in keeping
everyone on the same page and might make it easier for Chris to keep up
with all of our discussions.
This may not be a bad thing.

Simon
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Chris Winters
2004-12-01 16:49:04 UTC
Permalink
Post by Ray Zimmerman
...
I think that both links_to and has_many share a fundamental flaw. They
both allow relationships to be defined from the "other" end, making it
possible to define conflicting behavior for a single relationship. This
unnecessarily complicates the error checking or precedence rules and
the documentation. I think always forcing the relationship to be
defined by the class with the linking field, and having a single syntax
(has_a) for defining it, makes for a much simpler, cleaner, more
consistent, easier to understand/document/implement, less error prone
design. Including both approaches (e.g. has_many and reverse_auto or
auto_by), in my opinion, only complicates and confuses things further.
Good point.
Post by Ray Zimmerman
For this reason, I think we should forget trying to make the new syntax
backward compatible. I think we should leave the old syntax in place
(but deprecated) as long as necessary for backward compatibility and
add a completely redesigned new syntax that folks can migrate to
gradually or as they need the new features. If it were up to me I
wouldn't touch links_to at all, and I wouldn't touch the old has_a
either. I'd just add a new configuration handler for the case where the
key of a 'has_a' spec matches one of the field names, in which case you
assume it is the new syntax I proposed, otherwise you assume it's a
class and use the old has_a semantics.
My main concern is that we don't break existing code. Everyone seems to
also be avoiding this so: great.

Next is ensuring that new features are easily implemented, testable and
maintainable. I suspect Ray's right in that overloading existing syntax
will make all of these more difficult.

I'm not against adding new syntax/configuration keys as necessary and if
it takes care of both of these items, it's probably the best option.
Post by Ray Zimmerman
[1] I confess I never really did understand the purpose of the alias.
What is the difference between the alias and the class? Isn't one of
them redundant?
Yes, but IME most humans prefer the alias (e.g., 'news') vs. the class
('MyApp::Persist::News'). Plus it's less typing :-)

Chris
--
Chris Winters (***@cwinters.com)
Building enterprise-capable snack solutions since 1988.
Chris Winters
2004-12-01 16:40:02 UTC
Permalink
Terribly sorry it took so long to respond.
Post by Vsevolod (Simon) Ilyushchenko
Now that Chris has some time (hopefully) to consider the changes, I'd
like to submit a patch that would add has_many, auto-save/auto-delete
(optional) and full cache support (which is necessary for the
consistency circular references). Before I do that, I'd like to get
1. There will be a basic cache which will always return the cached
object, not its copy.
Ok.
Post by Vsevolod (Simon) Ilyushchenko
2. The fetch_many() now will also use cache. It will still make a call
to the database, because it won't know which ids to request from the
cache, though.
Ok -- although I'd like to have the current behavior available if the
cache is not enabled. (Where it fetches the actual data from the database,
not just the IDs.)
Post by Vsevolod (Simon) Ilyushchenko
3. Auto-save and auto-delete are off by default. This is not what Ray
requested, but it's the only way to stay backwards compatible. With
auto-save on, the graph of object in memory will be traversed to
determine what has been changed. (Thus, I have deep_changed() instead of
changed().)
I'm for backward compatibility.
Post by Vsevolod (Simon) Ilyushchenko
4. It will be possible to use a separate class that corresponds to the
linking table in links_to (think ClubMembership). The syntax I came up
Club => {
class => 'Club',
...
links_to =>
{ 'Person' =>
{ table => 'ClubMembership',
link_class => 'ClubMembership',
link_class_alias => 'memberships',
alias => 'members',
to_id_field => 'member_id',
from_id_field => 'club_id' },
This looks good.
Post by Vsevolod (Simon) Ilyushchenko
The ClubMembership class can have extra attributes (like date), but they
will be specified in a regular way in the config file. The new attribute
'link_class_alias' will return an arrayref of the ClubMembership
instances. Again, this is not as elegant as Ray proposed, but backwards
compatible.
Ok.
Post by Vsevolod (Simon) Ilyushchenko
An open question is what to do with has_a. Currently it's implemented in
ClassFactory, not ClassFactory/DBI, but adding auto-save and auto-delete
functionality requires the has_a code to be DBI-aware. I can add extra
functionality into ClassFactory/DBI, but I am not sure which effect it
would have on LDAP storage, for example.
We can tell ClassFactory::DBI to override the has_a from ClassFactory. (I
don't remember how to do it off the top of my head, but I'm 98% sure it's
already built-in.)
Post by Vsevolod (Simon) Ilyushchenko
#Simon
#/Simon
and I commented out, not deleted, all replaced code, for ease of
reference. Is this necessary?
No -- that's why we have CVS :-) Just use good comments when you commit.

Chris
--
Chris Winters (***@cwinters.com)
Building enterprise-capable snack solutions since 1988.
Vsevolod (Simon) Ilyushchenko
2004-12-01 16:50:01 UTC
Permalink
Chris,

Please review our subsequent discussion with Ray titled "has_many
progress". He has pretty much converted me to his point of view. He
proposes 1) keeping the cache optional (so I'll have to test two
configuration), 2) storing dependent objects in hashes in objects
instead of retrieving them every time (thus, with cache enabled,
fetch_many can skip trips to the database), and most importantly, 3)
using his syntax for defining relationships. I can still support the
current syntax, of course, but forcing the new configs follow the old
syntax results in awkward rules. I looked at some other frameworks, and
what Ray proposes turns out to be fairly standard.

I am sorry it's not as straightforward.

So even though I already have the code for the currently-out-of-favor
syntax, I'll have to change it to use Ray's syntax. Please let us know
what you think.

Simon
Post by Chris Winters
Terribly sorry it took so long to respond.
Post by Vsevolod (Simon) Ilyushchenko
Now that Chris has some time (hopefully) to consider the changes, I'd
like to submit a patch that would add has_many, auto-save/auto-delete
(optional) and full cache support (which is necessary for the
consistency circular references). Before I do that, I'd like to get
1. There will be a basic cache which will always return the cached
object, not its copy.
Ok.
Post by Vsevolod (Simon) Ilyushchenko
2. The fetch_many() now will also use cache. It will still make a call
to the database, because it won't know which ids to request from the
cache, though.
Ok -- although I'd like to have the current behavior available if the
cache is not enabled. (Where it fetches the actual data from the database,
not just the IDs.)
Post by Vsevolod (Simon) Ilyushchenko
3. Auto-save and auto-delete are off by default. This is not what Ray
requested, but it's the only way to stay backwards compatible. With
auto-save on, the graph of object in memory will be traversed to
determine what has been changed. (Thus, I have deep_changed() instead of
changed().)
I'm for backward compatibility.
Post by Vsevolod (Simon) Ilyushchenko
4. It will be possible to use a separate class that corresponds to the
linking table in links_to (think ClubMembership). The syntax I came up
Club => {
class => 'Club',
...
links_to =>
{ 'Person' =>
{ table => 'ClubMembership',
link_class => 'ClubMembership',
link_class_alias => 'memberships',
alias => 'members',
to_id_field => 'member_id',
from_id_field => 'club_id' },
This looks good.
Post by Vsevolod (Simon) Ilyushchenko
The ClubMembership class can have extra attributes (like date), but they
will be specified in a regular way in the config file. The new attribute
'link_class_alias' will return an arrayref of the ClubMembership
instances. Again, this is not as elegant as Ray proposed, but backwards
compatible.
Ok.
Post by Vsevolod (Simon) Ilyushchenko
An open question is what to do with has_a. Currently it's implemented in
ClassFactory, not ClassFactory/DBI, but adding auto-save and auto-delete
functionality requires the has_a code to be DBI-aware. I can add extra
functionality into ClassFactory/DBI, but I am not sure which effect it
would have on LDAP storage, for example.
We can tell ClassFactory::DBI to override the has_a from ClassFactory. (I
don't remember how to do it off the top of my head, but I'm 98% sure it's
already built-in.)
Post by Vsevolod (Simon) Ilyushchenko
#Simon
#/Simon
and I commented out, not deleted, all replaced code, for ease of
reference. Is this necessary?
No -- that's why we have CVS :-) Just use good comments when you commit.
Chris
--
Simon (Vsevolod ILyushchenko) ***@cshl.edu
http://www.simonf.com

Terrorism is a tactic and so to declare war on terrorism
is equivalent to Roosevelt's declaring war on blitzkrieg.

Zbigniew Brzezinski, U.S. national security advisor, 1977-81
Continue reading on narkive:
Loading...