Flask WSGI

No, it ain't WYSIWYG...

App Lifecycles

I've heard the phrase 'a Flask app instance's lifecycle is typically only one request' many times and accepted it as a simple truth. For serving web pages that's great, but for some other uses (and designs), it can be a problem.

In particular, I was worried about the implications of this for an app with a model that's performing online learning: if every request is simply an update, and the model is supposed to live across these updates, how will that work? Where should it be stored?

So what's the deal with the 'one request' lifecycle? Is that always true? What makes that true?

Long-running Flask App

In working on my voweler project, I expected this to be a problem. The app runs a perceptron that gets an update with each request.
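
As a rough sketch of that design (hypothetical names, not voweler's actual endpoints or classes), think of one model object living inside the app, with every request feeding it another labelled example:

from flask import Flask, jsonify, request


def create_app(model):
    # 'model' is a hypothetical perceptron-like object with an update() method
    app = Flask(__name__)

    @app.route('/train', methods=['POST'])
    def train():
        # each request is one labelled example; the model object is
        # expected to survive across all of these updates
        changed = model.update(request.json['letter'], request.json['is_vowel'])
        return jsonify(updated=changed)

    return app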

Ripping off mattupstate's overholt project, I was running the app in a single Python process via a module like this:

from werkzeug.serving import run_simple
from werkzeug.wsgi import DispatcherMiddleware

import voweler

# create_app() is called exactly once here, so a single app instance handles
# every request this process ever sees; with no extra mounts,
# DispatcherMiddleware just wraps that one app
application = DispatcherMiddleware(voweler.create_app())

if __name__ == "__main__":
    run_simple(
        '0.0.0.0', 5000, application,
        use_reloader=True, use_debugger=True)

Lo and behold, there was no issue with cross-request persistence! Re-reading the code here, that actually makes perfect sense: I'm invoking this program once, it calls create_app exactly once, and thus only one app instance is ever handling requests.
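
A stripped-down way to see that single-process persistence (a hypothetical counter, nothing to do with voweler itself): any module-level state survives from one request to the next for as long as the single serving process stays alive.

from flask import Flask

app = Flask(__name__)

# module-level state: lives exactly as long as this Python process does
hits = 0


@app.route('/')
def index():
    global hits
    hits += 1  # keeps counting across requests, since one process handles them all
    return str(hits)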

So what was I missing?

Serving at the will of the (uWSGI) Emperor

The fundamental problem: the statement about Flask app lifecycles isn't actually about Flask, it's about whatever is dispatching requests to Flask.

My first experience dealing with "dispatching" was making a little PHP API at my first job after college. We had an Ubuntu server in the office and set up nginx and PHP to provide access to our "street address disambiguation service". I recall the phrase "PHP-FPM", which at the time I assumed was about "forwarding" but which, upon googling, turns out to mean "FastCGI Process Manager"! Wow! That Gateway Interface thing (the GI in CGI, FastCGI, and WSGI) is back, and so are "fast" processes, i.e., processes with a lifecycle of one request!

So there's a pattern where a web server hands interesting requests off to something that spawns and manages short-lived worker processes written in a language like PHP or Python. In Flask land, a common answer to this appears to be uWSGI.

With uWSGI, we could again (as in the PHP-FPM example) use one nginx web server, pass requests for non-static resources along to uWSGI, and have uWSGI handle them with Flask app processes that it spawns, recycles, and multiplies as it sees fit.

Okay, now it's broken

Now that I know what's responsible for the short lifecycles, I'll run that and see if I can break my app.

Following the uWSGI quickstart guide, I was able to run a simple:

uwsgi --http :9090  --wsgi-file wsgi.py  --processes 4 --threads 2 --stats 127.0.0.1:9191

Using uwsgitop, I was quickly able to see that requests were being spread across the workers (though there was a clear winner handling the majority of requests, which is a mystery for another day).

My favorite application-specific way of detecting the problem was to show the perceptron a data point it already had right. Working properly, the app should do nothing: no error, no update. But because the app instance that had correctly classified 'p' as a consonant wasn't the app instance that received the request to "consider the consonant 'p'", something happened anyway. In fact, even if both instances had classified 'p' correctly, we'd likely still see a visual change, because the two perceptrons' weight vectors would differ.
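
That detection trick works because the standard perceptron rule is mistake-driven: a correctly classified example causes no weight change at all. A minimal sketch (hypothetical function, not voweler's actual code):

import numpy as np


def perceptron_update(weights, features, label):
    """label is +1 (vowel) or -1 (consonant); returns True if the weights changed."""
    prediction = 1 if np.dot(weights, features) >= 0 else -1
    if prediction == label:
        return False  # already correct: no error, no update, nothing to see
    weights += label * features  # mistake-driven update, in place
    return True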

Can we fix it?

I discussed my problem with Michael Cordover and we talked about a few solutions:

Abusing uWSGI SharedArea

Despite the abundance of proper datastores and shared caches (which is really what this problem calls for), I wanted the "fun" of getting closer to managing the memory myself.

uWSGI has a shared-memory feature called SharedArea, which exposes a Python interface for reading and writing values at the same memory addresses from all of the app instances it creates to handle requests.
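
At its simplest, the interface is raw byte-level access that every worker sees identically (a sketch using only the calls that appear later in this post; the offsets and payload are made up):

import uwsgi  # provided by the uWSGI server; only importable inside a worker

# write a few bytes at offset 0 of sharedarea 0...
uwsgi.sharedarea_write(0, 0, b'hello')

# ...and any worker can read them back through the same memoryview
print(bytes(uwsgi.sharedarea_memoryview(0))[:5])  # b'hello'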

I definitely appreciated the safeguards uWSGI provides for doing something like this: with proper locking and unlocking inside the read/write methods, I wouldn't have to build my own system of semaphores or something.

But while the uWSGI sharedarea itself is thread safe, that doesn't mean your (my) code is.

Initially I had something like this:

import json

import uwsgi  # provided by the uWSGI server itself


def set_state(value):
    """
    Serializes an object containing numpy arrays;
    writes null bytes across the sharedarea before writing the real value.
    """
    clear_sharedarea()  # helper that writes null bytes across the whole area
    json_bytes = json.dumps(_prepare_numpy_object(value)).encode('utf-8')
    uwsgi.sharedarea_write(0, 0, json_bytes)

While this worked fine if I kept the requests to a calm manual clicking pace, when I upped the concurrency and used my 'send a bunch of requests' buttons, I got into bad states. Sometimes an app would read the memory and find nothing, sometimes it would find invalid JSON, typically with an error at the end. Seeing a string terminating with 29095]}5839]} seemed like good evidence that the memory clearing was not quite on target.

The epiphany came when I compared it to database client code: I'm wiping and writing in two different transactions, and there's no reason someone couldn't be reading (or writing) between the two! Thread safety was guaranteed within one uwsgi.sharedarea_* function call, but of course not between them!

So I needed my clearing and writing to be the same function. Here's what I ended up with:

def set_state(value):
    """
    Serializes an object containing numpy arrays;
    pads the rest of the sharedarea with null bytes in the same write.
    """
    json_bytes = json.dumps(_prepare_numpy_object(value)).encode('utf-8')
    memory_length = len(uwsgi.sharedarea_memoryview(0))
    # a single sharedarea_write call is locked by uWSGI, so the payload and
    # its null padding land together in one "transaction"
    uwsgi.sharedarea_write(
        0, 0, json_bytes + (memory_length - len(json_bytes)) * b'\x00')

This ends up working fine, though there's really no good reason to do this instead of using a "real" cache. I'll also note that memory_length should probably be measured once, or read out of the application config, rather than on every write, since the area is only allocated once at startup. Oh well, this was a mere exercise :)
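
For completeness, the read side and the "measure it once" idea might look something like this (a hedged sketch; get_state, init_state_config, and the config key are my names, not necessarily what the project uses):

import json

import uwsgi


def init_state_config(app):
    # the area is allocated once at startup, so its length can be
    # measured once and stashed in the Flask app config
    app.config['SHAREDAREA_LENGTH'] = len(uwsgi.sharedarea_memoryview(0))


def get_state():
    """Read the whole sharedarea, strip the null padding, and deserialize."""
    raw = bytes(uwsgi.sharedarea_memoryview(0)).rstrip(b'\x00')
    if not raw:
        return None  # nothing has been written yet
    return json.loads(raw.decode('utf-8'))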

The result is on GitHub, on a multiapp branch!