
How to isolate user sessions in Java EE?

Original post 2011-03-09 20:14:07 6 3 java/ session/ java-ee/ isolation

We are considering developing a mission-critical application in Java EE, and one thing that really struck me is the lack of session isolation in the platform. Let me explain the scenario.

We have a native Windows application (a complete ERP solution) that receives about 2k LoC and 50 bug fixes per month from sparse contributors. It also supports scripting, so customers can add their own logic, and we have no clue what such logic does. Instead of using a thread pool, each server node has a broker and a process pool. The broker receives a client request, enqueues it until a pooled instance is free, sends the request to that instance, delivers the response to the client, and releases the instance back to the process pool.

This architecture is robust because, with so many sparse contributions and custom scripting, it's not uncommon for a deployed version to have a serious bug such as an infinite loop, a long-waiting pessimistic lock, memory corruption, or a memory leak. We implemented a memory limit, a request timeout, and a simple watchdog. Whenever a process fails to answer correctly and on time, the broker simply kills it; the watchdog detects this and starts another instance. If a process crashes before it has started to answer a request, the broker sends the same request to another pooled instance, and the user never learns of any failure on the server side (it shows up only in the admin logs). This is nice because some instances are slowly trashed by buggy code as they work on requests. Because most session data is held at the client or (in rare cases) in shared storage, it seems to work perfectly.
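For readers more at home in Java, the broker loop described above could be sketched roughly as follows. The original system is a native Windows application, so this is only an illustration: `Worker`, `Broker`, and `spawner` are hypothetical names, and the sketch retries on any failure, whereas the real broker retries only when the process crashed before it started to answer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical handle to one pooled worker process; not a Java EE or vendor API. */
interface Worker {
    String handle(String request) throws Exception; // throws if the worker crashes or times out
    void kill();                                     // forcibly terminates the underlying process
}

/** Hypothetical broker: borrow a worker, dispatch, replace the worker on failure, retry. */
final class Broker {
    private final BlockingQueue<Worker> idle;
    private final Supplier<Worker> spawner;
    private final int maxAttempts;

    Broker(int poolSize, Supplier<Worker> spawner, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.idle = new ArrayBlockingQueue<>(poolSize);
        this.spawner = spawner;
        this.maxAttempts = maxAttempts;
        for (int i = 0; i < poolSize; i++) {
            idle.add(spawner.get());
        }
    }

    String dispatch(String request) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Worker worker = idle.take();      // blocks until a pooled instance is free
            try {
                String response = worker.handle(request);
                idle.add(worker);             // healthy: release it back to the pool
                return response;
            } catch (Exception e) {
                last = e;
                worker.kill();                // failed: discard this instance
                idle.add(spawner.get());      // stand-in for the watchdog starting a new one
            }
        }
        throw last;                           // every attempt failed
    }
}
```

The point of the design is that a failed instance is never reused: it is killed and replaced, so whatever state it corrupted disappears with the process.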

Now that we are considering a move to Java EE, I couldn't find anything similar in the spec or in popular application servers such as GlassFish and JBoss. Yes, I know that most cluster implementations do transparent fail-over with session replication, but we have small companies that use our system on a simple 2-node cluster (and we also have adventurers who run the system on a 1-node server). With a thread pool, I understand that a buggy thread can bring an entire node down, because the server cannot detect and safely kill it. Bringing an entire node down is much worse than killing a single process: we have deployments where each node has about 100 pooled process instances.

I know that IBM and SAP are aware of this problem, based on

, respectively. But based on recent JSRs, forums and open-source tools, there isn't much activity in the community.

Now come the questions!

If you have a similar scenario and use Java EE, how did you solve it?
Do you know of an upcoming open-source product or a change in the Java EE spec that can address this issue?
Does .NET have the same problem? Can you explain or cite references?
Do you know of a modern, open platform that can address this issue and is up to the task of running ERP business logic?

Please, I have to ask you not to suggest more testing or any kind of QA investment, because we cannot force our customers to do this for their own scripts. We also have cases where urgent bug fixes must bypass QA, and while we force the customer to accept this, we cannot make them accept that a buggy software part can affect a range of unrelated features. This issue is about robust architectures, not development process.

Thanks for your attention!

3 answers

What you have stumbled upon is a fundamental issue regarding the use of Java and "hostile" applications.

It's a fundamental issue not just at the Java EE level, but at the core JVM level. The typical JVMs available have all sorts of issues with loading "unsafe code". Between memory leaks, class loader leaks, resource exhaustion, and unclean thread kills, the typical JVM is simply not robust enough to handle badly behaving code well in a shared environment.

A simple example is memory exhaustion of the Java heap. As a basic rule, NOBODY (and by nobody, I specifically mean the core Java library and just about every other 3rd party library out there) catches OutOfMemoryError. There are the rare few who do, but even they can do little about it. Typical code handles the exceptions it "expects" to handle, but lets others fall through. Unchecked throwables (of which OutOfMemoryError is one) will happily bubble up through the call stack all the way to the top, leaving behind a wreckage of unchecked critical path code and leaving all sorts of things in an unknown state.
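A minimal sketch of the fall-through: OutOfMemoryError extends Error rather than Exception, so a handler written as `catch (Exception e)` never sees it. The class and method names here are made up for illustration, and the error is thrown by hand so the demo does not really exhaust the heap.

```java
/** Minimal sketch: an Error passes straight through a handler written for Exception. */
final class ErrorPropagationDemo {
    static boolean cleanupRan = false;

    /** Hypothetical library call that only handles the exceptions it "expects". */
    static void libraryCall() {
        try {
            // Simulated: thrown by hand instead of actually filling the heap.
            throw new OutOfMemoryError("simulated");
        } catch (Exception e) {
            cleanupRan = true; // never reached: OutOfMemoryError is not an Exception
        }
    }

    /** Returns the simple class name of whatever escaped libraryCall, or "none". */
    static String whatEscaped() {
        try {
            libraryCall();
            return "none";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(whatEscaped() + ", cleanupRan=" + cleanupRan);
    }
}
```

Running `main` reports that an OutOfMemoryError escaped and that the cleanup branch never ran, which is the "wreckage" in miniature: any state the library was halfway through updating stays that way.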

Things such as constructors or static initializers which "can't fail" do fail, leaving behind uninitialized class members which are "never null". These damaged classes simply don't know they're damaged. Nobody knows they're damaged, and there's no way to clean them up. A heap that hits OOM is an unsafe image and pretty much needs to be restarted (unless, of course, you wrote or audited ALL of the code yourself, which, naturally, you won't -- who would?).
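One concrete form of this damage can be sketched with a static initializer. Under the class initialization rules of the Java Language Specification, a class whose initialization failed is marked erroneous, and later uses from the same class loader get NoClassDefFoundError. `FragileConfig` and `BrokenClassDemo` are hypothetical names, and the failure is simulated.

```java
/** Hypothetical class whose static initialization fails once, e.g. under memory pressure. */
final class FragileConfig {
    static final Object RESOURCE = load();

    private static Object load() {
        // Simulated failure; a real one might be a genuine OutOfMemoryError during class init.
        throw new OutOfMemoryError("simulated during class initialization");
    }

    static Object resource() {
        return RESOURCE;
    }
}

final class BrokenClassDemo {
    /** Touches FragileConfig and reports which Throwable type comes back, or "ok". */
    static String touch() {
        try {
            FragileConfig.resource();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("first use:  " + touch()); // the original error
        System.out.println("second use: " + touch()); // the class is now permanently unusable
    }
}
```

Even if memory is freed afterwards, the class stays broken for the lifetime of its class loader, so in practice the fix is to discard the loader or restart the process.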

Now, there may well be vendor-specific JVMs which are better behaved and give you better control. The ones based on the Sun/Oracle JVM (i.e. most of them) do not.

So, it's not necessarily a Java EE issue, it's a JVM issue.

Hosting hostile code in the JVM is a bad idea. The only way it's practical is if you host a scripting language, and that scripting language implements some kind of resource control. That could be done, and you can tweak the existing ones as a start (JavaScript, Groovy, Jython, JRuby). The fact that these languages give users direct access to Java libraries makes them potentially dangerous, so you may have to restrict that as well, to only the aspects wrapped by script handlers. At this point, though, the "why use Java at all" question floats up.

You'll note Google App Engine does none of these. It spins up a separate JVM for each application that's being run, but even then it greatly restricts what can be done within those JVMs, notably through the existing Java security model. The distinction here is that these instances tend to be "long lived" so as not to incur the processing costs of startup and shutdown. I should say, they SHOULD be long lived, and those that are not do incur those costs.

You can run several instances of the JVM yourself, give them a bit of infrastructure to handle requests for logic, give them custom class loader logic to try to protect against class loader leaks, and, at a minimum, kill the instances off (they're simply processes) if you want. That can work, and will probably work "ok" depending on the granularity of the calls and the "start-up" time for your logic. The start-up time will at a minimum be the loading of the classes for the logic from run to run; that alone may make this a bad idea. And it certainly WON'T be "Java EE". Java EE is not set up to do this kind of thing. But you're not clear about which Java EE features you're looking at, either.
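A minimal sketch of the "separate JVM you can kill" idea, using only the standard `ProcessBuilder` and `Process` APIs and assuming Java 9 or later. `IsolatedRunner` is a hypothetical name, and `-version` stands in for a real worker entry point, which this sketch does not define.

```java
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

/** Minimal sketch: run one unit of work in a separate OS process and kill it on timeout. */
final class IsolatedRunner {

    /** Path to the java launcher of the current runtime (assumes a standard JDK/JRE layout). */
    static String javaBinary() {
        return System.getProperty("java.home") + "/bin/java";
    }

    /**
     * Runs the command and waits up to timeoutSeconds.
     * Returns the exit code, or an empty OptionalInt if the process had to be killed.
     */
    static OptionalInt run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // keeps a full pipe from blocking the child
                .start();
        if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return OptionalInt.of(process.exitValue());
        }
        process.destroyForcibly(); // the OS reclaims the child's heap, threads and locks
        process.waitFor();         // reap the killed process
        return OptionalInt.empty();
    }

    public static void main(String[] args) throws Exception {
        // "-version" is a placeholder for a real worker command line.
        System.out.println(run(List.of(javaBinary(), "-version"), 30));
    }
}
```

Launching a fresh JVM per call pays the start-up and class loading cost every time, which is the drawback noted above; a pool of long-lived worker JVMs that are killed only when they misbehave would spread that cost over many requests.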

Effectively, this is what Apache and "mod_php" do. Several instances, as processes, individually handle requests, with badly behaving ones being killed off as necessary. This is why PHP is common in the shared hosting business. In this structure, it's basically "safe".

I believe your scenario is highly atypical, so it is improbable that there is a ready-made framework or platform addressing this need. Java EE more or less assumes that the request processing code is written by the same team as the rest of the app, so it need not be isolated, watched, and reset that often, and bug fixes are handled the same way in all parts of the system. This assumption greatly simplifies development, deployment, testing, etc. for most projects, without forcing them to pay for something they don't need. And yes, it isn't suitable for everyone. If you want something fundamentally different, you probably need to implement a fair amount of failover logic yourself. Java EE does provide the fundamental building blocks for this, though.

I believe (although I have no concrete experience to prove it) that .NET and other platforms are basically built on similar assumptions.

We had a similar, though not so severe, port of a really enormous Perl site to Java. On receiving an HTTP request we instantiate a class and call its processRequest method, surrounded by try-catch and time measurement. Adding a timer and a thread would suffice to be able to kill the thread. This is probably sufficient in real life.
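A minimal sketch of the timer-plus-thread approach, using `java.util.concurrent` rather than a hand-rolled timer. `TimedRequest` and its string results are hypothetical; the handler stands in for the processRequest call. One caveat worth stating: `Future.cancel(true)` only interrupts the worker thread, so code that ignores interruption keeps running, and `Thread.stop` is deprecated as unsafe.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Minimal sketch: run a request handler on a worker thread with a time limit. */
final class TimedRequest {
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "request-worker");
        t.setDaemon(true); // a stuck worker must not keep the JVM alive
        return t;
    });

    /** Returns the handler's result, or a marker string on timeout or failure. */
    static String process(Callable<String> handler, long timeoutMillis) {
        Future<String> future = POOL.submit(handler);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only interrupts; it cannot force the thread to stop
            return "TIMEOUT";
        } catch (ExecutionException e) {
            return "ERROR: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        System.out.println(process(() -> "done", 1000));
        System.out.println(process(() -> { Thread.sleep(5000); return "late"; }, 100));
    }
}
```

This frees the caller after the deadline, but an uncooperative handler (a tight loop that never checks for interruption, say) still occupies its thread and its memory, which is why the process-based options further down isolate more strongly.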

A Java EE server like GlassFish is an OSGi container, so there you might have more means of isolation.

You could also run an array of (web or local) applications to which you dispatch your requests via a central web application. Those applications are then isolated.

Even more isolated are serialized sessions combined with operating system processes that each start a new JVM.