sudoscientist.com

is this tube on?

  • Installing Ruby Enterprise Edition on OSX Lion (or: I die a little each time I use Ruby)

    • 10 Mar 2012
    • 0 Responses

    And among the dreams of the days that were
        I find my lost youth again.
            And the strange and beautiful song,
            The groves are repeating it still:
            'A boy's will is the wind's will,
    And the thoughts of youth are long, long thoughts.'

    -- Henry Wadsworth Longfellow, My Lost Youth

    Tonight I surrendered another substantial chunk of my life to the hell that is installing Ruby on Rails on Mac OSX Lion.

    This isn't the first time I've had to waste hours of my life on an epic, yak-shaving expedition in the name of working with Rails.  In fact, it seems that every time I want to do a project with Rails, I'm forced by some gem, package or dependency to re-plumb substantial portions of my system's core infrastructure.

    Even if I didn't care about the time lost to this kind of garbage, I very much do care about having to override compilers, replace core system libraries, or otherwise monkey around with parts of my system that a web framework shouldn't have to touch.  It's a total mess.  I don't know what the solution is, but I know that I never have to do this kind of crap with Python or C++.

    ...but I'll leave the ranting for another day.  In the interest of saving someone else the time I had to spend tonight, here's a process that will work for getting Ruby Enterprise Edition (aka "REE") working on a fairly squeaky-clean Mac OSX Lion installation, using RVM and Homebrew:

    Problem 1:  REE doesn't build with Lion's installed version of GCC

    The first problem you run into, if you just try the naive RVM installation approach on Lion (rvm install ree), is that Apple's choice of installing only an LLVM-based version of gcc in Xcode 4.3 breaks the build of many software packages (including Ruby Enterprise Edition).  There has been much gnashing of teeth about this choice.  Personally, I say "a plague on both your houses" -- there are plenty of packages that compile cleanly on Lion (including, as of a few days ago, the reference Ruby interpreter itself), so it clearly isn't impossible to make REE work, but at the same time, Apple could also be a bit less difficult about using bleeding-edge compiler technology.  That said, I'm absolutely not going to replace my system's default compiler just so that I can get a non-standard version of Ruby installed on my dev machine. (That's non-negotiable, in my opinion, but it was the Ruby community's standard response to this problem for months.  Grumble.)

    Fortunately, there's a solution:  it's possible to install gcc-4.2 without having to resort to replacing Xcode.  If you don't have Homebrew installed or if you don't trust someone else to build your compilers, you can compile gcc by hand, using Apple's own source tarballs.  But if you have Homebrew, you can also use homebrew-alt to get the necessary non-LLVM gcc binaries.  (Why can't you get a version of gcc directly from Homebrew?  Good question.  As far as I can tell, the answer involves even more "opinionated" development.  But I digress.)  The relevant command is:

    brew install autoconf automake https://raw.github.com/adamv/homebrew-alt/master/duplicates/apple-gcc42.rb

    This will install the autotools (which are necessary for building later packages), as well as a non-LLVM version of GCC 4.2.  (If you're using a standard Homebrew install, this compiler will be located in /usr/local/bin/gcc-4.2 after installation.)  Once we have this, we're theoretically capable of building REE...just not with RVM.  We have another problem:

    Problem 2:  RVM gets confuzzled about the right GCC to use

    Fortunately, this problem is relatively easy to solve.  First, be sure to clean up any previous failed attempts at installing REE:

    rvm remove ree

    Next, update your RVM, to be sure that it has awareness of the peculiarities of installations on Xcode 4.2 and gcc-4.2 systems:

    rvm update

    Next, force RVM to use your shiny new copy of gcc-4.2 by overriding the CC environment variable:

    export CC=/usr/local/bin/gcc-4.2
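
    If you'd rather not leave CC exported for the rest of your shell session, one option is to scope the override to a single command.  Here's a minimal sketch: with_cc is a hypothetical helper (not part of RVM or Homebrew) that runs one command in a subshell with CC set, and it assumes rvm is available as a command or shell function in your session.

```shell
# Hypothetical helper for illustration; not part of RVM or Homebrew.
# Runs one command with CC overridden.  The function body is a subshell,
# so the exported CC never reaches the calling shell.
with_cc() (
    CC="$1"
    export CC
    shift
    "$@"
)

# Usage, with the compiler path from this post:
#   with_cc /usr/local/bin/gcc-4.2 rvm install ree
```

    The only design choice here is the subshell: the exported CC dies with it, so later builds in the same terminal still see your default compiler.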

    Finally, build REE using RVM:

    rvm install ree

    This should grind away for a little while, but ultimately succeed. If you get patch failures or other bizarre build errors, you've almost certainly set your $CC to the wrong value. Ensure that it's pointing at the version of gcc that you just installed.  One easy way to do this is to run $CC with no arguments, and check that the output is 'i686-apple-darwin11-gcc-4.2.1: no input files'.  If, instead, the output has a bunch of garbage about Xcode-this and Apple-that, and references LLVM, you're still using the Xcode compiler.
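
    If you want something a little less manual than eyeballing that output, the same check can be sketched as a couple of shell functions.  Both names (looks_like_llvm and check_cc) are made up for illustration, and the sketch assumes that an LLVM-based compiler mentions "LLVM", "llvm" or "clang" somewhere in its --version output; treat it as a rough heuristic, not a guarantee.

```shell
# Hypothetical helpers for illustration; not part of RVM or Homebrew.

# Succeeds (returns 0) if the given compiler output mentions LLVM or clang.
looks_like_llvm() {
    case "$1" in
        *LLVM*|*llvm*|*clang*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reports on whatever $CC points at, defaulting to the path used in this post.
check_cc() {
    cc="${CC:-/usr/local/bin/gcc-4.2}"
    if ! command -v "$cc" >/dev/null 2>&1; then
        echo "missing: $cc"
        return 2
    fi
    if looks_like_llvm "$("$cc" --version 2>&1)"; then
        echo "llvm: $cc"
        return 1
    fi
    echo "ok: $cc"
}

# Usage: run check_cc before "rvm install ree" and only proceed on "ok".
```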

    Hopefully this post saves you some precious hours, and allows you to just Get Shit Done, without having to become a pawn in the ongoing battle between Apple and the Opinionated Developers, neither of whom is really optimizing for your time.

  • The interwebs are made of science!

    • 17 Jan 2012
    • 0 Responses

    Right after people find out that I spent a lot of time and energy getting a PhD (...in science!) before becoming yet another bit-scraper in the tubes, one of the first reactions that I get is a question of (f)utility:  folks want to know why anyone in their right mind would spend so long learning about the mysteries of the universe, only to abandon the endeavor for a world where 12-year-olds with twitter accounts can claim to be "social media experts".  And let's be honest:  most web development is inane.  CRUD apps are a solved problem, Rails is a ghetto, and Javascript is about as interesting a technological development as a new method of doodling.  Compared to even the dullest scientific research program, the internet is the slow kid in class.

    So yes, it's a reasonable question to ask.  And aside from the stock responses, my real, honest-to-goodness answer can come as a bit of a shock to people who don't know anything about science:  web development has a lot in common with scientific research.  At least, good web development does, because if you're doing it right, you're constantly forming and validating hypotheses.

    So, if we know that most web development is bad, how do you know if you're a good web developer?  Good question.  Here's a simple, four-question test to help you determine your current status as a Competent Web Developer:

    1. Why did you make your last change?
    2. Did it improve something?
    3. How do you know?
    4. Are you sure?

    Notice that there's nothing in this test about Ruby, or Python, or the latest-and-greatest not-javascript, but-oh-wait-it-actually-is-javascript language flavor of the moment.  I assume that you're already a programmer if you're reading this blog.  And if you're not, well...you're not a competent web developer.  Ok?

    No, this test is about being a web developer, not just a programmer.  If you're doing good web work, you can almost always answer these questions -- sometimes you can get away with missing one or two, but mostly you'll be answering all four.  And if you're regularly making changes and checking in code without having answers to at least three of these questions, you're flailing.  You might as well just hire one of those social media experts and call it a day.  But if you really are answering these four questions?  Well, guess what, genius:  you're doing science.

    This post started out as a much-longer diatribe about the abysmal state of web development, but I've decided to break it into a series of posts, diving into each of these questions in greater detail.  And hopefully by the end you'll have a better idea of why science -- lab coats and all -- is a better long-term role-model for web development than those opinionated, ironic-t-shirted hipsters whom you see gathered at your local UX designer meetup.

  • MapReduce is not the answer.

    • 28 Apr 2011
    • 0 Responses

    Dear Eager Software Engineer Applicant Person,

    Let's talk "big data".  You've made it clear that you're enthusiastic about machine learning and data mining, and that's great.  Excitement is good.  You're smart and ambitious, and you're eager to get into the hardcore, algorithmic stuff.  You've read and re-read the famous Google papers, and you've spent weeks studying your copy of Knuth.  You're totally ready for this interview.  Awesome.

    Let me give you a helpful piece of interview advice:  MapReduce is not the answer.  Really.

    You'd be surprised how easy it is to screw this up.  I walk into the interview room, we shoot the shit for a moment or two, and then I toss you a softball, just to get you warmed up.  "Write a function to calculate the length of a string,"  I'll say, or perhaps "write some code to find all of the duplicates in a list of numbers."  And then you'll look at me, all bright-eyed and bushy-tailed and eager to please, and you'll say "well...let's see...there's CLEARLY an easy way to do it, but it won't work AT ALL if the data is too big to fit on a single computer...."

    And then I will erupt into a fit of spastic eye-rolling, and start doodling pictures of you as a clown on my Top Secret Interview Evaluation Sheet.

    See, dear candidate, when I walk into the interview room and ask you to do an Absurdly Trivial Problem, I'm actually doing a few basic things:  first, I'm trying to put you at ease, and let you start off your interview on a high note.  Second, I'm testing your knowledge of extremely basic computer science concepts.  Finally, I'm making sure that you aren't a Big Data Dork.  Because, sad as it is, you wouldn't be the first interviewee to try to tell me that the best way to calculate the length of a string is to use Hadoop.  I have a lot of clown doodles on my interview sheets.

    So here's the deal, yo:  if you remember nothing else from this little rant, remember that when you're going into an interview, your one and only job is to convince the interviewer that you're an agreeable person who can get stuff done.  That's it. You'll have plenty of time to demonstrate your amazing "big data" chops when you have the job. Until then, the rule is always to keep it simple, stupid.  When you hear a problem that might have a simple answer, give the simple answer first!  Believe me, there's no harm in it, and you'll go a long way toward convincing your interviewer that you're not that guy -- the one who's going to spend weeks replacing the relational database with a javascript-backed, erlang-powered nosql monstrosity, when you should be fixing the damned website.
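
    For the record, here's roughly what a simple answer to the duplicates question can look like.  This is only a sketch, and it assumes the numbers arrive one per line on standard input; find_duplicates is a name made up for illustration.

```shell
# Hypothetical example for illustration.
# Reads numbers from stdin (one per line) and prints each value that
# appears more than once, exactly once, in ascending numeric order.
find_duplicates() {
    sort -n | uniq -d
}

# Usage:
#   printf '%s\n' 3 1 3 2 1 3 | find_duplicates
# prints 1 and 3, one per line.
```

    No cluster required.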

    Got it?  Great.  Go get 'em, tiger.

    Sincerely,
    Your Interviewer

  • It *never* made sense to learn Java.

    • 14 Jan 2011
    • 23 Responses

    Another HN thread that got me all worked up:  "should I still learn Java?"

    Frankly, the question falls apart at the premise:  should anyone have ever learned Java?  Because as far as I'm concerned, any program that could be well-written in Java would be better written in C++.  And if you've got a project that demands more efficiency than a high-level language like Python, Ruby, Perl, etc. can deliver, then moving to Java is like buying a tractor to get to work in the morning because it's faster than walking.  You're Doing It Wrong.

    Don't misunderstand:  if you've got a specific passion for writing Java, or you're working on a project that requires Java for some legacy reason, then fine, go learn Java.  The problem comes when people start to confuse "learning Java" with "learning any C-like language".  Because they're not the same.  And it's a common confusion.  If you go to the HN thread, you'll see tons of variations on this sort of nonsense:

    "ho ho ho...I suppose it's worthwhile to learn Java, just so that you have some experience with object-oriented programming, but it's not LISP or anything.  And hey...at least it's not C++."  *armfold*

    These people are language bigots.  Do not listen to them.  I'm not saying that you should avoid Java because I think C++ is the best language in the world, and should be used for everything.  Far from it.  But I sincerely believe that for all reasons that you'd want to use Java, C++ is a better long-term choice.  It's like comparing Duplo blocks to Lego...one is dramatically better than the other for building anything of complexity (and the other one is great for people who are prone to poking themselves in the eye with sharp corners!)

    There's not a task that can be done in Java that can't be done faster, smaller and generally better in C++.  It supports higher-level memory management and garbage collection (just like Java!), but also gives you fine-grained control over memory management when you need it.  It lets you choose freely between functional, object-oriented and procedural programming tools.  It has (admittedly ugly) type-safe generics that compile to blazingly fast runtime code.  Java has...the same syntax (sigh).  Feature for feature, there's essentially nothing in Java that isn't done better in C++...except for (perhaps) the garbage collector.  To get that in C++, you've got to download a library.  Sad.

    Java is just not a reasonable choice for systems programming (which makes it all the more insane that there are certain "industrial-strength" libraries written in the language).  It consumes more memory than C++ for equivalent tasks, is slower to run (well-written Java is just not as fast as well-written C++ code), and because Java has become a bit of a sink for bad programmers, Java projects tend to be encrusted with all sorts of unnecessary XML thingamabobs and Factory-Factory-Factory-Singleton hoohahs.  Like barnacles on some kind of ancient fish, these ugly accretions inevitably come along with the beast.

    Finally, learning C++ gets you everything that you would have gained from learning Java, plus you'll be able to write efficient, compiled code that can be called from just about any language.  For better or worse, every major programming environment can efficiently utilize C (and C++) code.  That's just not true of Java.

    If you have a choice between spending N hours of your time learning C++ and the same amount of time learning Java, there is absolutely no doubt:  you should learn C++ first. Java is an evolutionary dead-end in language design that just happens to be swimming around today.  Let it swim past.

  • ...you can often save hours in the library.

    • 13 Dec 2010
    • 0 Responses

    "With a few weeks of hard work, creativity and dedication in the lab, you can often save hours in the library."

    This research chestnut popped up in a non-research context this weekend, and it was one of those moments that made me realize that grad school wasn't completely worthless:  if nothing else, grad school in the sciences tends to teach you the value of slowing down as a productivity tool.  Ten minutes of library time can have far more power than ten hours of "doing" stuff quickly.

    For those who don't want to read the original blog post, the core observation is pretty simple:  smart people are prone to doing curiously stupid things, as long as they're grooving on the little adrenaline rush that cleverness can bring.   It's basically the nerd Achilles' heel: on average we're a smart and productive bunch, but without guidance and management most of that creative energy gets shunted into tremendously useless acts. 

    In the case of grad students, this weakness leads to the tendency to re-discover ancient results through huge feats of technical prowess, when a few hours of library time would save the effort.  But in software, this phenomenon usually manifests itself in the pattern wherein coders re-invent the wheel in herculean, caffeine-powered hack-a-thons, when ten minutes of reading would reveal better solutions.  Most grad students learn this lesson quickly, or they don't make it out of grad school.  But programmers?  They can seem remarkably resistant to good advice.

    Personally, I think it's a cop-out to blame this behavior on laziness -- it's pretty clear that anyone willing to do hours of work to avoid reading documentation isn't exactly lazy.  It's a lot closer to the truth to call this a simple lack of discipline.  Developers don't like to read or write comments, so they don't. And the rest of us suffer.

    Now, I'm not trying to sound like your dad or your high-school principal, but one of the huge problems in our industry is that technologically, the n00bs run the show.  They're all bright-eyed and eager, and the energy of a thousand caffeinated twenty-somethings is akin to a flamethrower when pointed at technical challenges.  It's great.  But try to make a nerdling write good documentation, and you'll probably find out what it's like to be on the other end of that flame.  The people most responsible for documenting a system are the ones least likely to do it.  And they tend to justify their own resistance to writing documents (or reading them!) by complaining about what a bad job existing documentation does.

    It's clearly not impossible to do a good job of documenting code (the Apple API docs are incredibly well-written and comprehensive); it's just that most coders don't put in the effort.  And unfortunately, because the people most qualified to pass judgment on code documentation are the people who write code, there tends not to be a systemic incentive for new developers to learn the value of the kind of discipline that grad students acquire as a matter of course.

    A developer can crank out undocumented piles of crap code for year after year, and still get promoted to tech lead.  Do that kind of thing in a laboratory at a pharmaceutical company, and you'll be lucky if you aren't fired on your first day on the job.

    Ultimately, the solution has to be cultural:  documentation standards have to be enforced from the top, in a meaningful way.  It isn't enough to say that documentation is important -- if you're the boss, you've got to promote the people who produce the most literate code.  And likewise, stop rewarding the people who do a thousand "clever" things, and document none of them.  Code isn't just a machine, it's institutional intelligence.  Intelligence has no value unless it's communicated properly.

  • About

    A scientific refugee, now living in the world of startups, software and sundry silly social-media show-offs. I used to work at Justin.tv, and now I work on search and data-mining at Yelp. But remember: these are my opinions, not theirs.

  • Archive

    • 2012 (5)
      • March (3)
      • January (2)
    • 2011 (4)
      • April (2)
      • January (2)
    • 2010 (1)
      • December (1)
