TST: Add failing test for HAVING feature request #1458

Open
sandhujasmine wants to merge 4 commits into blaze:master from sandhujasmine:having-clause-xfail

Conversation

@sandhujasmine

Add failing test and mark as xfail for requested feature in GH 1457.
#1457

Test can be run using odo docker provisioning

Add failing test and mark as xfail for requested feature in GH 1457.
ds = discover(big_sql)
nn = symbol('nn', ds)
g1 = by(nn['A'], quant=nn.B.sum())
g1_res = odo(compute(g1, big_sql), list)
Member

Can you remove the odo call and convert this to compute(g1, big_sql, return_type=list) instead, for consistency?

Author

Fixed in latest commit.

pytest.skip(str(e))
else:
t = odo(zip(list('a'*100), list(range(100))), t)
t = odo(zip(list("".join(['a'*25, 'b'*25, 'c'*25, 'd'*25])),
Member

Style nit: can we put spaces around binary operators? Without them the expressions are very crowded.

Author

Fixed in latest push.

Jasmine Sandhu added 2 commits March 24, 2016 14:02
GH 1458: add more tests for testing an expression involving WHERE,
HAVING and also multiple conditions on predicate. Move data to
pandas dataframe and assert blaze results against it.

Fix usage of compute() so it is consistent with API update and other
tests.

Add spaces around operator for readability.
g1_res = compute(g1, big_sql, return_type=pd.DataFrame)
assert len(g1_res) == 4

expr = g1[(g1.quant < 100) & (g1.quant > 50)]
Author

It looks like this part of the expression expr is not being computed correctly on Python 2.7. I will set up a 2.7 environment and try to reproduce the error.

According to the error log, the resulting bz_res did not filter out the value greater than 100.

Author

It's not a Python version issue. The compiled SQL is incorrect; still working on this in #1457.

Failing tests for GH 1457
Test for HAVING and WHERE clause failing at present. Data is not randomly
generated so we get failure when compiled expression contains both clauses.
@sandhujasmine
Author

The HAVING tests previously used randomly generated data. I have replaced it with fixed data and updated the tests accordingly. test_having_where() should now fail consistently, since the SQL compiled for it is incorrect. Still working on this in #1457.
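
For reference, the values a passing test would need to see can be worked out without Blaze or a database. The sketch below assumes the same fixed data as the reproduction script posted in this thread (25 rows each of 'a' to 'd', with B running from 1 to 100) and the same thresholds; the helper names `grouped_sum` and `having` are hypothetical and are not part of Blaze or of this PR's test suite.

```python
from collections import defaultdict

# Assumed data: 25 rows each of 'a', 'b', 'c', 'd', paired with B = 1..100.
rows = list(zip("a" * 25 + "b" * 25 + "c" * 25 + "d" * 25, range(1, 101)))


def grouped_sum(rows, where=lambda b: True):
    """Sum B per group A, keeping only rows that pass the `where` predicate."""
    totals = defaultdict(int)
    for a, b in rows:
        if where(b):
            totals[a] += b
    return dict(totals)


def having(totals, low, high):
    """Keep groups whose aggregate lies strictly between low and high."""
    return {a: q for a, q in totals.items() if low < q < high}


# Filter on the aggregate only (the HAVING-style condition).
no_where = having(grouped_sum(rows), 325, 2000)

# Filter rows first (the WHERE-style condition), then filter on the aggregate.
with_where = having(grouped_sum(rows, where=lambda b: b > 5), 325, 2000)

print(no_where)
print(with_where)
```

With these thresholds, group 'a' sums to exactly 325 before the row filter and to less after it, and group 'd' sums to more than 2000, so only 'b' and 'c' should survive in both cases.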

@sandhujasmine
Author

Here's how to reproduce the incorrect SQL for two conditions on a group-by expression.

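from blaze import by, compute, discover, symbol, data, odo, drop
import pandas as pd

# Point at the test table and drop any existing copy so the run starts clean.
big_sql = data('postgresql://[email protected]/test::tbl0', dshape='var * {A: string, B: int64}')
drop(big_sql)

# Fixed data: 25 rows each of 'a', 'b', 'c', 'd', with B running from 1 to 100.
big_sql = odo(zip("".join(['a' * 25, 'b' * 25, 'c' * 25, 'd' * 25]), range(1, 101)), big_sql)
ds = discover(big_sql)

nn = symbol('nn', ds)

# Row-level filter, applied before grouping (expected to map to WHERE).
s1 = nn[nn.B > 5]

# Run the same group-by on the full table and on the filtered selection.
for data_ in (nn, s1):
    g1 = by(data_['A'], quant=data_.B.sum())

    # Two conditions on the aggregate (expected to map to HAVING).
    expr = g1[(g1.quant > 325) & (g1.quant < 2000)]
    bz_res = compute(expr, big_sql)
    print('SQL for ', data_, ': \n')
    print(bz_res)
    print('\nResult for g1[(g1.quant > 325) & (g1.quant < 2000)]:')
    print(compute(expr, big_sql, return_type=pd.DataFrame))
    print('==========\n')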
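
For comparison, here is a rough sketch of one SQL shape that gives the intended semantics for the filtered case: the row condition in WHERE and both aggregate conditions in HAVING. It is hand-written, uses an in-memory SQLite database from the standard library instead of PostgreSQL, and is not the SQL that Blaze emits or is necessarily expected to emit. The table name `tbl0` and the column names are taken from the script above.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL test table (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl0 (A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO tbl0 VALUES (?, ?)",
    zip("a" * 25 + "b" * 25 + "c" * 25 + "d" * 25, range(1, 101)),
)

# WHERE filters rows before grouping; HAVING filters groups after aggregation.
query = """
    SELECT A, SUM(B) AS quant
    FROM tbl0
    WHERE B > 5
    GROUP BY A
    HAVING SUM(B) > 325 AND SUM(B) < 2000
    ORDER BY A
"""
result = conn.execute(query).fetchall()
print(result)
```

Dropping the WHERE line gives the corresponding query for the unfiltered case. A nested form that wraps the group-by in a subquery and filters on `quant` in an outer WHERE would be equivalent.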

@kwmsmith kwmsmith modified the milestones: 0.11, 0.10 Apr 15, 2016
@kwmsmith kwmsmith modified the milestones: 0.11, 0.11.1 Jul 19, 2016
@postelrich postelrich closed this Feb 13, 2018
@postelrich postelrich reopened this Feb 13, 2018

4 participants