[Oledb-dev] FillDataOffsets, ulColumnSize and bytea columns
Victor Snezhko
snezhko at indorsoft.ru
Sat May 20 11:10:16 UTC 2006
Shachar Shemesh <psql at shemesh.biz> writes:
>>Hi Shachar,
>>
>>I'm eager to test the patch, but I will not have access to my Visual C
>>build environment till Monday, if you can recompile it, I'll test
>>immediately.
>>
> You can download it from http://www.lingnu.com/oledb, but I'm curious to
> know how you'll test it without a build environment :-)
I have BLOB-aware application that is already compiled and, even if it
weren't, it doesn't need Visual C to compile :)
Your suspicion about the necessity of ulColumnSize assignment in
FillDataOffsets is right. Without it returned byteas have length of
the maximum bytea (i.e. if I have one bytea "test" and another "qw",
the second bytea will be returned as "qw" + 2 bytes of random junk).
>>The following possible fault scenarios come to mind:
>>1) if ADO uses column info to store maximum column size somewhere in
>>it's internals, then we might not be able to store bytea value that is
>>longer than any of the existing ones in the column.
>>
> If that happens, it's an ADO bug. We are OLE DB spec complient.
Hmm, what should ADO think if we (initially) return ulColumnSize equal
not to ~0, but to some value, let's suppose 1000? OLE DB clearly
descripts ulColumnInfo [1] as "The maximum possible length of a value
in the column", and I don't see any spec violation in assuming that
provider can't accept values longer than 1000 bytes in this case.
BTW, I cannot test this until I have compile env. because bytea write
support is needed for this, and I haven't contributed it yet (it
works, but I'd like to improve it first: it accepts ISequentialStream
pointers and doesn't take advantage of the length of the column
(which is supplied via binding)).
> Not to mention that there is nothing we can do against it, or that your
> version was quite worst (plus not being OLE DB spec complient).
If we use ATL Provider templates and support bytea fetching, we can't
be OLE DB spec compliant. OLE DB spec requires to fill ulColumnSize
with ~0 for bytea columns (as this type doesn't have length
restriction), at the same time, ATL template which implements GetData
works incorrectly with ulColumnSize equal to ~0. (A thought sparked
now: we could try to find a way for GetData implementation to use some
other DataConvert, that doesn't allocate all ~0 bytes of memory).
>>2) I also afraid of how long it will take the loop to iterate
>>through all the rows.
>>
> Not much. We do that anyways, after all, when we retrieve the
> values.
I meant that it will increase response time for query to a long table
to begin actually return data to the consumer. But this is definitely
not the showstopper that would stop us from committing the solution
once it works. Performance tuning can always be done later.
>> I would like to compare loop run time with the one of
>>something like "select max(columnsize(column)) from table"
>>
> I wasn't aware of "columnsize" (in fact, I have not found it),
I haven't found the function with such functionality as well (yet), I
just assumed it should exist :)
> but it requires a trifle too much parsing for my liking, anyways.
No parsing is required. Column name can be obtained from PQfname() and
table oid - from PQftable().
>>Aside from all this, I like this patch more than mine.
>>
>>If someone can recompile the provider, please shout here, I'll
>>prepare test scripts.
>>
> I have a slight suspicion that we also need to return the assignment of
> ulColumnSize that I originally added and then we removed :-)
At least a guard should be added agains reading uninitialized memory
with this assignment. It's perfectly legal to call GetColumnInfo
before fetching any data via GetData, and ADO does this.
And, hell, this is exactly the thing I did in my patch :)
> I would also like to verify that GetStatus is only called once (and if
> not, to cache the results).
Yesterday I thought about it too and grepped the word "Status" in the
sources. Seems it's really called once, in PostConstruct.
>>>+ // Find out the maximal returned length
>>>+ // XXX We WILL have to rethink this strategy once we introduce
>>>asynchronious operation
>>
>>BTW, what will cause GetStatus to change with the presense of
>>asynchronous operations?
>>
>>
> At the moment, we need the entire result set the first time we call
> GetStatus. It's not a problem, because the entire result set is already
> in the client's buffers by the time "PQexec" returns. This is also the
> reason that I doubt we will add much time to the query execution time.
>
> If we introduce asynchronous operation, that changes. The first call to
> GetStatus does not automatically have all the results in. This means
> having a bytea column in the result set will cause us to force wait for
> the entire set, thus losing the advantage we gained from going async.
Then we will have to remove the loop, and the code will not differ
from the one with my patch :)
--
WBR, Victor V. Snezhko
E-mail: snezhko at indorsoft.ru
More information about the Oledb-devel
mailing list