Mirah Office Hours: Digging In

January 18th, 2012

Mirah -v Frankly embarrassing

This week I wanted to deal with one embarrassing bug, and some that looked fun. After looking through everything last week, I felt like I got a better sense about the current issues. So, I picked a few issues to work on that I thought I could dig into and get something working in an afternoon. I think I was a little ambitious.

First I wanted to fix #161 mirah -v causing an error because, frankly it’s embarrassing. I also thought it’d be pretty easy. It was, but the fix wasn’t as clean as I’d like.

Next time I’m in the option handling code, I’m going to do some house cleaning–it’s getting a bit ugly. For instance, parser should not take a list of commandline args and have to worry about ‘-e’ etc. How to reorganize it I haven’t decided. Maybe using optparse, though that has its own idiosyncrasies.

#13 booleans don’t have an == method.

I fixed this by adding an intrinsic to the jvm backend code. It’s not especially pretty. The intrinsics code is tied up in a few files and some of them feel like balls of mud. It’s not readily apparent where to find things and what they do.

I’ve been thinking about how to make intrinsics nicer and more consistent. It might be nice to have some common types with a common set of expected method definitions that the various (hypothetical) backends should implement. That way you’d have some consistency across different Mirah backends.

I’d also like to reorganize the intrinsics and make their internal APIs easier to grok. Things I’d like to do with them, like allowing easy method aliasing are hard right now, because the API has a lot of sharp edge cases.

Test cleanup

After fixing #13, I renamed all the test files. I was getting frustrated w/ having them all be prefixed with test_. That made tabbing into the right test file take a couple more keystrokes. It added just enough friction that I wanted to change it. So I did.

#30 Const Assign

This one is definitely not a single Sunday afternoon project. And, honestly I didn’t expect it to be. I spent about an hour trying to figure out what would need to change to support creating constants. It looks kind of annoying.

First, you need to transform the mmeta AST nodes into a Mirah AST node for the Const Assign, which could be as simple as creating a static FieldAssign. Then you need to change the code generation to deal with that. Unfortunately, FieldAssign’s don’t currently know about access levels (public, private, protected) which means you’d either have to add access levels to field assign, or create a new AST class that would have to be dealt with in the code generation phase.

Thinking about it a little, it might be nicer to do the second thing. Then, different (hypothetical) backends could handle access levels for constants their own way. That might be handy, particularly if constants are special in different ways in different languages.

#41 i++

The other thing I looked at briefly was adding ++ to the grammar. This turned out to be rather hard looking because the parser’s master branch is tied to mirah’s newast branch, which has a lot of new things in it and doesn’t work with the current release yet.

I’m debating creating a new branch on the parser at a point before it started using the new AST. On the other hand, it might make sense to spend more time trying to update the newast branch so that it is up to date with master. Then I could get it closer to merging back in.

That’s all for this week.

Mirah Office Hours: Back After the Holidays

January 10th, 2012

This week I decided to go and read all of Mirah’s open issues starting with the earliest submitted ones. I’d been spending so much time just looking at the top of the list. I didn’t have a good sense for which ones were duplicates, which were pretty undefined and which were easy. Thankfully, Mirah doesn’t have that many issues, so attempting to read through them all wasn’t a ridiculous undertaking. There were only 64, and I had filed about a half dozen of them myself.

In the end, I was able to close 7, and get a good sense of where a number of the other ones stood in terms of how much work it’ll take to fix them. Of course there were a few that made me confused, where I don’t have any idea where to start digging to fix them.

After going through the stack of issues, I have a better sense of what I want to try to tackle next week. Here’s a few that I think look fun.

#41 – adding ++ to the language.

++ isn’t in the grammar yet, so this would require learning more about the parser. And, I’d get to play around with the grammar, which is always fun, and something I haven’t done much of since graduating.

#127 & #42 – working out the semantics of equality.

Currently Mirah uses Java’s ==, which checks identity not equality. We’d like to use Java’s equals as our ==, but I’ve had a little trouble getting it working properly. Still, this is a fun one to hack on.

#45 – an issue with how field assignments check typing

Namely, they don’t really. It isn’t completely straight forward but I think I’ve nailed down where to make the changes.

#57 – who is self in a block

This looks fun. I’m not quite sure how to make it work. The problem is that you want blocks to work like they do in Ruby, where self is the owner of the outer scope. Currently, Mirah’s bindings don’t do that, so self is the instance of the closure. Looking at this got me thinking about non local return(NLR). There was an interesting discussion about it on Twitter today with @evanphx saying

NLR in closure comment by @evanphx

I think it’d be really interesting to try to add NLR to Mirah’s blocks. It might be crazy, but it could also be awesome.

#69 – Mirah doesn’t check blocks signature against the signature of the method they’re supposed to be implementing.

Madness, but madness I think I can fix.

#74 Constants!

Mirah doesn’t let you create constants, other than classes and interfaces. I’d like to change that. And, the error tells you what’s missing. Now, all I’ve got to do is figure out how to implement it.

Tune in next week…

And that’s what I’m looking at doing next Sunday. Or some of it anyway. There’s a bunch of trickiness out there.

Upgrading from Rails 2.3.8 to 2.3.14: find_or_create_by on associations changed

January 9th, 2012

Update: I’m not the first one to run into this: lighthouse ticket find_or_create-via-has_many-fails-for-hash-parameters and rails/rails#207.
Between 2.3.8 and 2.3.9 a change was added to ActiveRecord::Associations::AssociationCollection#method_missing that caused find_or_create_by_* to no longer accept hashes as arguments. An app I was upgrading from 2.3.8 ran into this as it passed hashes to a find_or_create_by_* call.

The symptom was that foobar in find_or_create_by_foobar became a serialized hash when saved to the database, instead of the string I was expecting.

The reason it was changed was to ensure that the caches on the collection were updated properly (lighthouse 1108) & (commit fad166c1), which is pretty important, but the fix didn’t take into account dealing w/ two of the cases that find_or_create_by_* accepts when called on an ActiveRecord class:

  • post.comments.find_or_create_by_body :body => 'bar', :type => 'baz'
  • post.comments.find_or_create_by_body 'bar', :type => 'baz'

The new behavior looked like this:

I wrote a patch for it (github.com/rails/rails/pull/4331). Then I found out that 2.3 is only accepting security patches, so I closed it.

If you run into this and you want to use my patch, refer to gist 1586708 which has a monkey patchified version of it. If you want to work around the bug, you can use the optional block to set the attributes you were passing in the hash.

Test ActiveRecord with Reset Transactions Without Rails

January 9th, 2012

I found myself wanting to use Rails’ test transaction functionality in a Rails-less environment, and couldn’t find a tutorial about it. So I dug into the Rails source to figure out how it worked–just enough to pull it out into a gist. So, if you add this to your test_helper.rb or equivalent, you too can have your ActiveRecord tests wrapped in transactions that rollback after each case. It also lets you use fixtures, but who does that?


American Censorship Day

November 15th, 2011

SOPA is getting hearings today (Nov 16th). So I added the censorship banner for the day.

http://americancensorship.org/

Mirah Office Hours

November 8th, 2011

This week I looked at doing views in Shatner again. I had thought of a new approach to get around def_edb‘s issues with unquotes. That quickly devolved into an exploration of a part of the inference engine I hadn’t played with yet.

I’d been looking for a good way to share state between views and the ShatnerApp. Initially, my plan was to just have the view be a method on the Shatner app, just like how Dubious currently does it. But, that’s pretty pedestrian and totally wouldn’t cause things to blow up. Blowing things up in Mirah is one of the reasons I work on Shatner.

I thought maybe I could build view objects as closures. That way they’d have references to all the variables defined in the outer scope, but they could still be separate classes and have separate methods ala Rails’ view helpers.

And maybe it’d be easier to build a macro that just modifies the code in place to implement something like (gist):

so that it becomes (gist):

Of course, that didn’t work.

What I discovered was that Mirah currently doesn’t support closures being defined in closures (issue #155). They don’t work because they can’t figure out what the outer scope is.

When a closure is created in Mirah, it begins life as a Mirah::AST::Block. However, it doesn’t stay just a block for long. After the parse phase is over, it is transformed.

The typer figures out what method is being called with the block, and figures out what the block’s interface is. It generates AST nodes for a class that implements that interface and dumps the class into the tree of the outer class. It then adds AST nodes at the call site for creating a new instance of that freshly minted closure class and carries on typing.

The problem is that the AST nodes inside the block still think their parent is the block. This is only a problem when you have another block inside a block because a Mirah::AST::Block doesn’t have a class associated with it, so the closure generated inside the closure doesn’t know what contains it.

I still haven’t quite figured out how to resolve this. I feel like there’s probably some AST munging needed to get it to work properly, but I’m not sure what operations need to be taken.

It’s a very interesting problem though.

The other thing I looked at was getting == to use equals. Earlier, I’d tried copying String#+ doing something like (gist):

Which worked for ==, but not for !=, because I couldn’t figure out how to negate the call.

So, I tried the method used by kind_of?, which creates some AST nodes and shoves them back into the tree. That worked for the Java source generator, but not the Bytecode generator. Which was kind of odd.

Clearly I don’t know enough about how the bytecode generator works. Maybe I’ll look into that some next week.

‘Til next week everybody.

Mirah Office Hours

October 25th, 2011

This week I decided to focus just on bugs. I picked a few that looked interesting and set about investigating.

I decided to look at #108, #150, #151, #68 and #52.

108 I realized, I hadn’t thought enough about. It’s really a design thing. Do you want to require that your closures match signatures with the methods the implement? What if you don’t care about the arguments because everything you want to do has to do with the closed scope and not with what you’re passed? I could see it both ways. On the one hand, requiring a matching signature ensures all the types and whatnot are correct. On the other hand, if you aren’t using the passed arguments, they are just boilerplate code.

I think we should allow closures to have no arguments where the method they are implementing has them for niceness. But it should be an all or nothing thing. Either no parameters, or parameters matching the signature of the method you expect to be implementing.

#52

I spent most of my time on this one. It seemed straight forward at first, but had a number of tricky bits that took some working out. The problem was that if you had a begin rescue end block, and treated it as an expression if the result types of the block itself and the rescue clauses differed, Mirah wouldn’t raise an exception, but the JVM would. For example (gist):

foo either is an int or a String, which would be fine in Ruby–but not in Mirah because Mirah has static typing.

So, I set about fixing it. The first thing was to define the valid behavior. What I came up with was this:

  • if a begin rescue block is treated as an expression, check the compatibility of its result types
  • if it’s not an expression, don’t check the result types

The second one is nicer because it lets you not care when you don’t want to care. For example (gist):

If we checked result types in both expressions and statements, the above would fail to compile, which would suck.

I started by writing a couple tests of the expected behavior. I wanted to get an inference error when I used the rescue as an expression, but not as a statement.

Then I went looking for other, similar pieces of functionality that I could use for a guide. I found that if elses also follow this pattern, so I wanted to look at how If AST nodes were inferred, trying to get a handle on how I would change rescues. I grepped through lib/ for If’s infer method and found it in lib/mirah/ast/flow.rb. What it does is get the if body’s result type, and the else’s result type and check to see if they are compatible (gist).

Rescue inference was a little more complicated than If because rescue’s have a lot more pieces. First, you have the body between the begin and the first rescue. Then, there can be multiple rescue clauses, each handling a different set of exceptions. Finally, if there’s an else, it runs after the body if there is no exception. There’s also ensure which is run after, but doesn’t return anything, so it shouldn’t affect result type comparisons (need to write a test for that …).

So, the types that need to be compatible are the that of the body if there isn’t an else or the else’s and all the rescue clauses. Kind of complicated. But pretty doable. I modified the Rescue node’s infer method to test compatibility of the body or the else and all the clauses, which worked for the tests I wrote but broke other tests.

It broke other tests because of how deferred inference works. When inference is reattempted, the typer assumes that the deferred node was evaluated as an expression, which breaks when it is a Rescue because its inference depends on how it was used.

So as a work around, I added an ivar to the Rescue node to cache how it was initially inferred (lib/mirah/flow.rb).

It’s a hack, but it resolved the issue. I’d like to look into how to change things to make expression vs statementness be something defined in the AST instead of decided at inference time, or make it more explicit that the expression-ness is determined by the parent node during inference.

Another BugMash This Saturday

October 24th, 2011

It’s halloween weekend and I’m going to mashing some bugs. Rails has got 557 issues open and we’re going to take a crack at ‘em.

If you’ve never contributed to Rails before, or you have a couple commits that have been accepted, we’d love to have you.

If you want to join me, please RSVP at http://plancast.com/p/86c6/rails-bugmash-halloween-special-edition

Mirah Office Hours: Shatner Revisited, Fix a Few Bugs

October 18th, 2011

This week I wanted to get back to my pet Sinatra clone–Shatner. Last time I tried working on it, I was trying to figure out how to get views working like they do in Ruby. I want to end up with a dsl like this(gist):

Where you can call edb :view_name just like you call erb :view_name in Sinatra, and it figures out what you want.

I tried to do this by injecting a call to the def_edb macro into the class’s AST node. That didn’t work. Because def_edb was getting expanded before the outer macro, instead of passing it a proper AST node representing a method name, I was passing it an Unquote, which it didn’t know what to do with.

This looked like a rather difficult thing to untangle, so I gave up on it for now and file a bug (#152).

I moved on to fixing some bugs.

I made a quick list of four boogs that looked promising

  • #110 ICE with an empty block
  • #108 closure generation
  • #109 incorrectly attaching macros to the wrong node type
  • #123 ICE from methods that have same name as previously defined macro

#110 ICE with empty block

Somewhere the AST node for the body of the block was being set to nil. This caused some later code to blow up when it attempted to iterate over the block’s children. I fixed it by ensuring that the body of a block is always set to some value.

I looked at some pretty interesting bits of the compiler while fixing this. Did you know that Mirah’s block support works in two ways? You can use it like you would in JRuby, and pass a block in place of an interface implementation, which is cool. And, you can also pass it a block that contains the definitions of the methods you want to implement. For example (gist)

#108 closure generation

I was able to reproduce this, and figure out roughly what’s going on, but haven’t fixed it yet. The problem is that we’re not generating the signature of the method a block passed as an interface correctly all the time. When the block has no arguments, but the method it’s supposed to implement does, Mirah generates a method with the same name but no arguments. Problem.

#109 incorrectly attaching macros to the wrong node type

I reproduced this but didn’t look into it. It’s interesting it only happens when the macro is used somewhere else in the code.

#123 ICE from methods that have same name as previously defined macro

The problem here was that a macro’s return type is InlineCode, and the MethodDefinition node didn’t know how to validate it’s own signature against a macro with the same name. I put in a condition that will cause inference to fail with an error that says the method has the same name as a macro. It’s not particularly elegant, but I don’t think it’s a bad thing for the compiler to do.

Mirah Office Hours

October 11th, 2011

This week ended up mostly being researchy. I had a bunch of plans. I wanted to look at some of the work I’d done with implicit main methods–with an eye for changing from creating them at code generation time to modifying the AST and putting them in the right class. I also wanted to dig into making == use Java’s equals() instead of Java’s ==–which compares identity for objects.

Plans

  • use AST modification for main method generation
  • == –> .equals()
  • eql? –> ==
  • refactor intrinsic files
  • research ivar inference(related #151)
  • boogs

I didn’t get to everything, but I touched a lot of it.

Main method

I had some old code that did this from the last time played around with it, so I tried starting there. It modified the JVM compiler’s define_main method to pull elements out of the AST and inject them into the main method of the class with the same name as the file. It worked as long as a class with a matching name was in the file and failed miserably when there wasn’t.

I tried different things to generate AST nodes for the ‘Script’ class that Mirah generates when there is no class of the same name as the file, but couldn’t get it to work. After taking a break and looking again I also realized that I was doing my editing in the wrong place.

The compiler is actually where the code generation happens and not where the AST is generated or transformed. It gets the AST after it has been transformed, as necessary, and after all the types have been inferred. Which makes it a bad place to try to extend Mirah’s behavior.

I think next time I’ll try to inject the main method AST transform as a plugin that adds functionality to the transformer.

equals

I poked around making == use Java’s equals method instead of using ==, which for objects compares their references to see if they point at the same thing. Getting == to work was actually pretty straight forward, I just replaced the code generation that does the java style == with a method call to equals. I didn’t push it yet because I haven’t added !=, which requires a little more knowledge about bytecode.

ivars

I wanted to see if I could figure out how to get public/protected instance variables from superclasses to be accessible from subclasses. I thought it would have something to do with look up mechanism, but I didn’t know where to start.

After looking around the various Java type lookup mechanisms I’m still a bit confused about how everything works. I think I’ll do a deeper look into it next week, focusing on this file (/lib/mirah/jvm/method_lookup.rb), which looks to be where most of the interesting type stuff happens.