People say many things about entropy: entropy increases with time, entropy is disorder, entropy increases with energy, entropy determines the arrow of time, etc… But I have no idea what entropy is, and from what I find, neither do most other people. This is the introduction I wish I had when first told about entropy, so hopefully you find it helpful. My goal is that by the end of this long post we will have a rigorous and intuitive understanding of those statements, and in particular, why the universe looks different when moving forward through time versus when traveling backward through time.

This journey begins with defining and understanding entropy. There are multiple formal definitions of entropy across disciplines—thermodynamics, statistical mechanics, information theory—but they all share a central idea: entropy quantifies uncertainty. The easiest introduction to entropy is through Information Theory, which will lead to entropy in physical systems, and then finally to the relationship between entropy and time.

Information Theory

Imagine you want to communicate to your friend the outcome of some random event, like a dice roll or the winner of a lottery, but you want to do it using as few bits (only 1s and 0s) as possible. How few bits could you use?

The creator of Information Theory, Claude Shannon, was trying to answer questions such as these during his time at Bell Labs. He was developing the mathematical foundations of communication and compression, and eventually he discovered that the minimum number of bits required for a message was directly related to the uncertainty of the message. He was then able to formulate an equation to quantify the uncertainty of a message. When he shared it with the mathematician and physicist John von Neumann, von Neumann suggested calling it entropy for two reasons:

Von Neumann, Shannon reports, suggested that there were two good reasons for calling the function “entropy”. “It is already in use under that name,” he is reported to have said, “and besides, it will give you a great edge in debates because nobody really knows what entropy is anyway.” Shannon called the function “entropy” and used it as a measure of “uncertainty,” interchanging the two words in his writings without discrimination.
— Harold A. Johnson (ed.), *Heat Transfer, Thermodynamics and Education: Boelter Anniversary Volume* (New York: McGraw-Hill, 1964), p. 354.

Later we will see that the relationship between Shannon’s entropy and the pre-existing definition of entropy was more than coincidental: the two are deeply intertwined.

But now let us see how Shannon found definitions for the usually vague terms “information” and “uncertainty”.

In Information Theory, the information of an observed state is formally defined as the number of bits needed to communicate that state (at least for a system whose number of equally likely outcomes is a power of two; we’ll see shortly how to generalize this). Here are some examples of information:

  • If I flip a fair coin, it will take one bit of information to tell you the outcome: I use a 0 for heads and a 1 for tails.
  • If I roll a fair 8-sided dice, I can represent the outcome with 3 bits: I use 000 for a 1, 001 for 2, 010 for 3, etc.

The more outcomes a system can have, the more bits (information) it will require to represent its outcome. If a system has $N$ equally likely outcomes, then it will take $\log_2(N)$ bits of information to represent an outcome of that system.

Entropy is defined as the expected number of bits of information needed to represent the state of a system (this is a lie, but it’s the most useful definition for the moment; we’ll fix it later). So the entropy of a fair coin is 1 bit, since on average we expect it to take 1 bit of information to represent the outcome of the coin. An 8-sided dice will have an entropy of 3 bits, since we expect it to take an average of 3 bits to represent the outcome.

It initially seems that entropy is an unnecessary definition, since we can just look at how many bits it takes to represent the outcome of our system and use that value, but this is only true when the outcomes are all equally likely.

Imagine now that I have a weighted 8-sided dice, so the number 7 comes up 50% of the time while each of the remaining faces comes up ≈7.14% of the time. Now, if we are clever, we can reduce the expected number of bits needed to communicate the outcome of the dice. We can decide to represent a 7 with a 0, and all the other numbers will be represented with 1XXX where the Xs are some unique bits. This means that 50% of the time we only have to use 1 bit of information to represent the outcome, and the other 50% of the time we use 4 bits, so the expected number of bits (the entropy of the dice) is 2.5. This is lower than the 3 bits of entropy for the fair 8-sided dice.
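
To make the arithmetic concrete, here is a minimal Python sketch (my own illustration, not code from the original post) that computes the expected number of bits under this encoding scheme:

```python
# Weighted 8-sided dice: the face 7 comes up 50% of the time,
# the other 7 faces share the remaining 50% equally.
p_seven = 0.5
p_other = 0.5 / 7   # ≈ 7.14% per remaining face

# Encoding described above: 7 -> "0" (1 bit), every other face -> "1XXX" (4 bits).
expected_bits = p_seven * 1 + 7 * p_other * 4
print(expected_bits)   # 2.5
```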

Fortunately, we don’t need to come up with a clever encoding scheme for every possible system; there is a pattern to how many bits of information it takes to represent a state with probability $p$. We know that if $p=0.5$, as in the case of a coin landing on heads, then it takes 1 bit of information to represent that outcome. If $p=0.125$, as in the case of a fair 8-sided dice landing on the number 5, it takes 3 bits of information to represent that outcome. If $p=0.5$, as in the case of our unfair 8-sided dice landing on the number 7, then it takes 1 bit of information, just like the coin, which shows us that all that matters is the probability of the outcome. With this, we can discover an equation for the number of bits of information needed for a state with probability $p$.

$$
I(p) = -\log_2(p)
$$

This value $I$ is usually called information content or surprise, since the lower the probability of a state occurring, the higher the surprise when it does occur.

When the probability is low, the surprise is high, and when the probability is high, the surprise is low. This is a more general formula than “the number of bits needed”, since it allows states that are exceptionally likely (such as 99% likely) to have a surprise of less than 1, which would make less sense if we tried to interpret the value as “the number of bits needed to represent the outcome”.
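
As a quick sanity check, here is a small Python sketch (my own, with an assumed helper name `surprise`) evaluating this formula at the probabilities discussed above:

```python
import math

def surprise(p: float) -> float:
    """Information content (surprise) of an outcome with probability p, in bits."""
    return -math.log2(p)

print(surprise(0.5))    # 1.0   (a fair coin landing heads)
print(surprise(0.125))  # 3.0   (a fair 8-sided dice landing on a 5)
print(surprise(0.99))   # ≈0.0145 (a near-certain outcome: less than 1 bit of surprise)
```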

And now we can fix our definition of entropy (the lie I told earlier). Entropy is not necessarily the expected number of bits used to represent a system (although it is when you use an optimal encoding scheme), but more generally the entropy is the expected surprise of the system.

And now we can calculate the entropy of systems like a dice or a coin or any system with known probabilities for its outcomes. The expected surprise (entropy) of a system with $N$ possible outcomes each with probability $p_i$ (all adding up to 1) can be calculated as

$$
\text{(Shannon entropy)}\qquad \sum_{i=1}^{N} p_i \cdot I(p_i) = -\sum_{i=1}^{N} p_i \cdot \log_2(p_i)
$$

And notice that if all the $N$ probabilities are the same (so $p_i = \frac{1}{N}$), then the entropy equation can simplify to

$$
-\sum_{i=1}^{N} p_i \cdot \log_2(p_i) = -\sum_{i=1}^{N} \frac{1}{N} \cdot \log_2\!\left(\frac{1}{N}\right) = \log_2(N)
$$

Here are some basic examples using $(Shannon entropy)$ .

  • The entropy of a fair coin is
    $$
    -(0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = \log_2(2) = 1
    $$
  • The entropy of a fair 8-sided dice is
    $$
    -\sum_{i=1}^{8} 0.125 \cdot \log_2(0.125) = \log_2(8) = 3
    $$
  • The entropy of an unfair 8-sided dice, where the dice lands on one face 99% of the time and lands on the other faces the remaining 1% of the time with equal probability (about 0.14% each), is
    $$
    -\left(0.99 \cdot \log_2(0.99) + \sum_{i=1}^{7} \frac{0.01}{7} \cdot \log_2\!\left(\frac{0.01}{7}\right)\right) \approx 0.1089
    $$

Hopefully it is a bit more intuitive now that entropy represents uncertainty. An 8-sided dice would have higher entropy than a coin since we are more uncertain about the outcome of the 8-sided dice than we are about the coin (8 equally likely outcomes are more uncertain than only 2 equally likely outcomes). But a highly unfair 8-sided dice has less entropy than even a coin since we have very high certainty about the outcome of the unfair dice. Now we have an actual equation to quantify that uncertainty (entropy) about a system.
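
If you want to play with these numbers yourself, here is a minimal Python sketch (my own illustration; the function name `shannon_entropy` is just a placeholder) that reproduces the three examples above:

```python
import math

def shannon_entropy(probs):
    """Expected surprise in bits: -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
fair_d8   = [1 / 8] * 8
unfair_d8 = [0.99] + [0.01 / 7] * 7   # one face 99%, the rest share the remaining 1%

print(shannon_entropy(fair_coin))   # 1.0
print(shannon_entropy(fair_d8))     # 3.0
print(shannon_entropy(unfair_d8))   # ≈0.109
```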

It is not clear right now how this definition of entropy has anything to do with disorder, heat, or time, but this idea of entropy as uncertainty is fundamental to understanding the entropy of the universe which we will explore shortly. For reference, this definition of entropy is called Shannon entropy.

We will move on now, but I recommend looking further into Information Theory. It has many important direct implications for data compression, error correction, cryptography, and even linguistics, and touches nearly any field that deals with uncertainty, signals, or knowledge.

Physical Entropy

Now we will see entropy from a very different lens, that of Statistical Mechanics. We begin with the tried-and-true introduction to entropy which every student is given.

Balls in a box

I shall give you a box with 10 balls in it, $p_0$ through $p_9$, and we will count how many balls are on the left side of the box and how many are on the right side. Assume every ball is equally likely to be on either side. Immediately we can see it is highly unlikely that we count all the balls on the left side of the box, and more likely that we count an equal number of balls on each side. Why is that?

Well, there is only one state in which we count all the balls on the left, and that is if every ball is on the left (truly astounding, but stay with me). But there are many ways in which the box is balanced: we could have $p_0$ through $p_4$ on one side and the rest on the other, or the same groups but flipped from left to right, or we could have all the even balls on one side and the odd on the other, or again flipped, or any of the other many possible combinations.

This box is a system that we can measure the entropy of, at least once I tell you how many balls are counted on each side. It can take a moment to see, but imagine the box with our left and right counts as a system where the outcome will be finding out where all the individual balls are in the box, similar to rolling a dice and seeing which face it lands on.

This would mean that the box where we count all the balls on the left side only has one possible outcome: all the balls are on the left side. We would take this to mean that this system has $0$ entropy (no expected surprise) since we already know where we will find each individual ball.

The box with balanced sides (5 on each) has many possible equally likely outcomes, and in fact, we can count them. A famous equation in combinatorics is the N-choose-k equation, which calculates exactly this scenario. It tells us that there are 252 possible ways in which we can place 5 balls on each side. The entropy for this system would then be $-\sum_{i=1}^{252} \frac{1}{252} \cdot \log_2\!\left(\frac{1}{252}\right) = \log_2(252) \approx 7.977$. This is the same as calculating the entropy of a 252-sided dice.
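
Here is a tiny Python check of that number (my own sketch, using the standard library’s `math.comb` for N-choose-k):

```python
import math

# Number of microstates for the "5 balls on each side" macrostate of 10
# distinguishable balls, and the corresponding entropy in bits.
microstates = math.comb(10, 5)
entropy_bits = math.log2(microstates)
print(microstates, entropy_bits)   # 252 ≈7.977
```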

And if we were to increase the number of balls, the entropy of the balanced box would increase since there would then be even more possible combinations that could make up a balanced box.

We should interpret these results as: The larger the number of ways there are to satisfy the large-scale measurement (counting the number of balls on each side), the higher the entropy of the system. When all the balls are on the left, there is only one way to satisfy that measurement and so it has a low entropy. When there are many ways to balance it on both sides, it has high entropy.

Here we see 1000 balls bouncing around in a box. They will all start on the left, so the box would have 0 entropy, but once the balls start crossing to the right and changing the count on each side, the entropy will increase.

In Statistical Mechanics, the formal term for the large-scale measurement is the macrostate, and the specific states that can satisfy that measurement are microstates. We would call the measurement of the number of balls on each side of the box the macrostate, and the different combinations of positions of individual balls the microstates. So rephrasing the above: There is only one microstate representing the macrostate of all balls being counted on one side, and there are many microstates representing the macrostate of a balanced box.

But why did we decide to measure the number of balls on the left and right? We could have measured a different macrostate, and the entropy would be different.

Macrostates

Imagine instead of selecting the left and right halves of the box to count the number of balls, we instead count how many balls are in each pixel of the box. In this scenario, the entropy would almost always be maximized, as the balls rarely share a pixel. Even if all the balls were on the left side of the box, they would likely still each occupy a different pixel, and the measured entropy would be the same as if the balls were evenly distributed in the box.

If we use an expensive instrument to measure the box and track the balls with high precision, then the entropy would rarely change and would be very high. If we instead use an inexpensive instrument that can only tell if a ball is on the left or right of the box, then the entropy will be low and could very easily fluctuate if some of the balls temporarily end up on the same side of the box.

Let’s run exactly the same simulation of 1000 balls in the box again, still starting with the balls on the left. But, this time we count how many balls are in each cell in a 50x50 grid, as opposed to the previous two cells (the left and right cells). The entropy will be high since there are many microstates that represent a bunch of cells with only 1 ball in it, and the entropy won’t change much since two balls rarely share the same cell. Recall that if two balls share the same cell, the count would go up, and there are fewer microstates that satisfy a cell with a count of 2 compared to two cells with a count of 1 in each.

Entropy is not intrinsic to the physical system alone, but rather to our description of it as well — i.e., the macrostate we’re measuring, and the resolution at which we observe it.

This process of measuring a lower-resolution version of our system (like counting how many balls are on the left or right side of a box) is called coarse-graining.
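
To see how the choice of coarse-graining changes the number, here is a small Python sketch (my own illustration; the ball positions are invented, and I assume distinguishable balls with microstates defined at cell resolution, consistent with the 252 count above). It takes one arrangement of 10 balls and computes the macrostate entropy, log2 of the number of ball-to-cell assignments consistent with the observed counts, under a 2-cell and a 50-cell coarse-graining:

```python
import math
from collections import Counter

def macrostate_entropy(cell_counts):
    """log2 of the number of ways distinguishable balls can realize the given per-cell counts."""
    n = sum(cell_counts)
    microstates = math.factorial(n)
    for count in cell_counts:
        microstates //= math.factorial(count)
    return math.log2(microstates)

# One concrete arrangement of 10 balls: x positions in [0, 1) (invented for illustration).
xs = [0.03, 0.11, 0.18, 0.22, 0.31, 0.38, 0.55, 0.67, 0.81, 0.94]

# Coarse-graining 1: two cells (left half / right half of the box).
coarse = Counter(int(x >= 0.5) for x in xs)
print(macrostate_entropy(coarse.values()))   # 6 left, 4 right -> log2(C(10,6)) ≈ 7.71

# Coarse-graining 2: 50 cells along x (a much finer grid).
fine = Counter(int(x * 50) for x in xs)
print(macrostate_entropy(fine.values()))     # one ball per cell -> log2(10!) ≈ 21.79
```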

How we choose/measure the macrostate, that is, how we coarse-grain the system, is dependent on the problem we are solving.

  • Imagine you have a box of gas (like our balls in a box, but at the scale of $10^{25}$ balls in the box), and we place a temperature-reader on the left and right side of the box. This gives us a macrostate of two counts of the average ball speed on the left and right sides of the box. We can then calculate the entropy by comparing when the temperature-readers are equal to when they differ by $T$ degrees. Once we learn how time and entropy interact, we will use this model to show that the two temperature-readers are expected to converge to the same value over time.
  • Imagine you sequence the genome of many different people in a population, you could choose many different macrostates based on what you care about. You could count how many of each nucleotide there are in all the sequences, allowing you to quantify how variable the four nucleotides are in DNA. You could calculate the entropy of every individual position in the DNA sequence by counting how many nucleotide types are used in that position across the population, allowing you to identify portions of DNA that are constant across individuals or vary across individuals.

How you choose to measure the macrostate can come in many forms for the same system, depending on what you are capable of measuring and/or what you care about measuring.
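
As a concrete illustration of the genome example above, here is a short Python sketch (with a toy, invented alignment; a real analysis would use far more sequences) that computes the entropy of each position across a set of sequences:

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the nucleotide distribution at one position."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy alignment: the same stretch of DNA from four individuals (invented data).
sequences = ["ACGTAC",
             "ACGTTC",
             "ACGAAC",
             "ACGTGC"]

for i, column in enumerate(zip(*sequences)):
    print(i, "".join(column), round(column_entropy(column), 3))
# Positions 0-2 and 5 are constant (entropy 0); positions 3 and 4 vary (entropy > 0).
```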

But once we have a macrostate, we need a way to identify all the microstates and assign probabilities to them.

Microstates

When we were looking at the positions of balls in a box in equally sized cells, it was easy to see that every ball was equally likely to be in any of the cells, so each microstate was equally likely. This made calculating the entropy very simple: we just used the simplified version of $(Shannon entropy)$ to find that for $W$ microstates that satisfy a given macrostate, the entropy of the system is $\log_2(W)$. It isn’t too hard to extend this idea to microstates that are not equally likely.

For example, let’s calculate the entropy of a box with 5 balls on the left and 5 balls on the right, but we replace one of the balls in the box with a metal ball that is pulled by a magnet to the left. In this case, the microstates are no longer equally likely. If we assume there is an 80% chance that the metal ball is on the left side instead of the right side, then the entropy of the box can be calculated as follows: of the 252 microstates, 126 of them have the metal ball on the left, which has a $0.8$ chance of being true, and the other 126 have the metal ball on the right, with a $0.2$ chance. This means using $(Shannon entropy)$ we get an entropy of

$$
-\sum_{i=1}^{126} \frac{0.2}{126} \cdot \log_2\!\left(\frac{0.2}{126}\right) - \sum_{i=1}^{126} \frac{0.8}{126} \cdot \log_2\!\left(\frac{0.8}{126}\right) \approx 7.699
$$

This is a little less than the box with normal balls, which had an entropy of $\log_2(252) \approx 7.977$. This is exactly what we should expect: we are a bit more certain about the outcome of this system, since we knew where one of the balls was more likely to be.
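
Here is the same calculation in a few lines of Python (my own sketch, listing each of the 252 microstates with its probability):

```python
import math

# 126 microstates have the metal ball on the left (sharing 0.8 probability in total),
# and 126 have it on the right (sharing the remaining 0.2).
probs = [0.8 / 126] * 126 + [0.2 / 126] * 126

entropy = -sum(p * math.log2(p) for p in probs)
print(entropy)   # ≈7.699, versus log2(252) ≈ 7.977 when all microstates are equally likely
```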

But this raises a subtle question: why did we choose this particular set of microstates? For example, if we have the macrostate of 5 balls on the left and 5 balls on the right, but we decide to use the 50x50 grid of cells to describe the microstates, then there are far more microstates that satisfy the macrostate compared to when we were using the 2x1 grid of left and right.

Let’s calculate the entropy for those two examples. Keep in mind they both have the same macrostate: 5 balls on the left and 5 balls on the right.

  • If we choose to use the microstates of looking at the position of individual balls between two cells splitting the box in half, then we can use n-choose-k to calculate that there are 252 possible combinations of balls across the two cells. This gives us an entropy of $\log_2(252) \approx 7.977$.
  • If we choose to use the microstates of looking at the position of individual balls between 50x50 (2500) cells splitting the box into a grid, then we can use n-choose-k to calculate that there are 252 possible combinations of balls across the two halves of the box, for each of which every ball could be in any of 50x25 (1250) cells. This gives us an entropy of $\log_2(252 \cdot 1250^{10}) \approx 110.854$.

This result lines up very well with our Information-theoretic understanding of entropy: when we allow more microstates to represent the same macrostate, we are more uncertain about the microstate our system is in. But this result does raise some concerns.

If different microstates give different entropy, how do we choose the right microstates for our problem? Unlike the macrostate, this decision of which microstates to use is not determined by our instruments or the scope of the problem, it has to be determined by the person making the calculation. Often for physical systems people will use the set of microstates that capture all the relevant information related to the macrostate. For example, if our macrostate is about balls on the left or right side of a box, then we probably don’t care about the ball’s velocity or mass or anything else but the ball position.

Another concern is that it feels wrong that the same physical system with the same macrostate can have different entropies depending on the microstate representation we use. Usually, we expect physical systems to have invariant measurements regardless of the internal representation we decide to use for our measurement. But this is incorrect for entropy. We need to recall that entropy is the uncertainty of a system and that the definition of entropy is completely dependent on what we are uncertain about, which for physical systems are the microstates. This would be similar to someone asking “How many parts make up that machine?”, to which we should respond “How do you define a ‘part’?”. When we ask “What is the entropy of this macrostate?”, we need to respond with “What microstates are we using?”.

With all that said, there is some small truth to what our intuition is telling us, although it doesn’t apply to the general case. While the entropy of the system changes when we change the microstates, the relative differences in entropy across macrostates will be equal if the new microstates uniformly multiply the old microstates. That is, if each original microstate is split into the same number of refined microstates, then the entropy of every macrostate increases by a constant. We’re getting lost in the terminology; an example will demonstrate.

Let us again take the 10 balls in a box, and we will calculate the entropy of the system for a few different macrostates and microstate representations. We indicate the number of balls on each side of the box with (L, R), where L is the number of balls on the left and R is the number of balls on the right. Then we calculate the entropy using the microstate of a 2x1 grid of cells (just the left and right halves of the box) and for the 50x50 grid of cells.

| Microstate grid | (10,0) | (9,1) | (8,2) | (7,3) | (6,4) | (5,5) | (4,6) | (3,7) | (2,8) | (1,9) | (0,10) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2x1 | 0.00000 | 3.32193 | 5.49185 | 6.90689 | 7.71425 | 7.97728 | 7.71425 | 6.90689 | 5.49185 | 3.32193 | 0.00000 |
| 50x50 | 102.87712 | 106.19905 | 108.36898 | 109.78401 | 110.59137 | 110.85440 | 110.59137 | 109.78401 | 108.36898 | 106.19905 | 102.87712 |

Looking at the table, the entropy values for the 50x50 grid microstates are just the 2x1 grid values plus a constant. The relative entropy in both cases would be identical. This is even more clear if we mathematically show how the entropy is calculated. For the 2x1 grid we use the equation $\log_2\binom{10}{L}$, and for the 50x50 grid we use $\log_2\!\left(1250^{10}\binom{10}{L}\right) = \log_2(1250^{10}) + \log_2\binom{10}{L}$. Mathematically we can see that it is the same as the entropy of the 2x1 grid offset by $\log_2(1250^{10}) \approx 102.877$.
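
A short Python sketch (my own) can regenerate both rows of the table and confirm that the difference between them is the same constant for every macrostate:

```python
import math

# Entropy of each macrostate (L, R) of 10 balls under two microstate choices:
# the 2x1 grid (left/right only) and the 50x50 grid (each half holds 50x25 = 1250 cells).
for L in range(10, -1, -1):
    two_cell = math.log2(math.comb(10, L))
    fine     = math.log2(math.comb(10, L) * 1250**10)
    print((L, 10 - L), round(two_cell, 5), round(fine, 5), round(fine - two_cell, 5))
# The last column is log2(1250^10) ≈ 102.87712 for every macrostate.
```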

You can imagine if we added another dimension along the microstates that we would increase the entropy again by a constant. For example, if each of the 10 balls could be one of 3 colors, then the number of microstates would grow by a factor of $3^{10}$, and so the entropy of the whole system would increase by $\log_2(3^{10})$.

Our intuition was correct when we used different microstates that are multiples of each other, but that intuition fails if the microstates are not so neatly multiples of each other. An easy example of this is if we represent the left side of the box as one cell and the right as a 50x25 grid of cells; then the entropy looks very different. Below is the table again, but with the added row of our non-homogeneous microstates. An example of how we calculate the entropy of macrostate $(3,7)$: there are 120 equally likely ways to place 3 balls on the left and 7 balls on the right, but the balls on the right can also be in $1250^7$ different states, so the entropy is $\log_2(120 \cdot 1250^7) \approx 78.921$.

| Microstate grid | (10,0) | (9,1) | (8,2) | (7,3) | (6,4) | (5,5) | (4,6) | (3,7) | (2,8) | (1,9) | (0,10) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2x1 | 0.00000 | 3.32193 | 5.49185 | 6.90689 | 7.71425 | 7.97728 | 7.71425 | 6.90689 | 5.49185 | 3.32193 | 0.00000 |
| 50x50 | 102.87712 | 106.19905 | 108.36898 | 109.78401 | 110.59137 | 110.85440 | 110.59137 | 109.78401 | 108.36898 | 106.19905 | 102.87712 |
| mixed | 0.00000 | 13.60964 | 26.06728 | 37.77003 | 48.86510 | 59.41584 | 69.44052 | 78.92088 | 87.79355 | 95.91134 | 102.87712 |

A funny thing to note is that when all the balls are on the left, the entropy is zero, but when all the balls are on the right, the entropy is maximized. And again, hopefully, this makes sense from our understanding of entropy, that it measures uncertainty relative to our microstates. If we know all the balls are on the left, then we know they must be in the single left cell, so no uncertainty. If we know the balls are all on the right, then they could be in any of $1250^{10}$ microstates, so high uncertainty.

Clearly, we need to be careful and aware of what microstates we are choosing when measuring the entropy of a system. Fortunately, for most physical systems we use the standard microstates of a uniform grid of positions and momenta of the balls (particles) in the system. Another standard choice of microstates is the continuous space of position and momentum.

Continuous Microstates

So far, we’ve looked at discrete sets of microstates — such as balls in cells. But in physical systems, microstates are often continuous: positions and momenta can vary over a continuum. How do we compute entropy in this setting? This is not related to the rest of the explanation, but it is an interesting tangent to explore.

Let’s return to our 10 balls in a 2D box. If each ball can occupy any position in the square, then the microstate of the system is a point in a 20-dimensional space (2 dimensions per ball). The number of possible microstates is infinite — and each individual one has infinitesimal probability.

In this setting, we use a probability density function $ρ(x)$ , and entropy becomes a continuous integral:

$$
S = -\int_X \rho(x) \log_2 \rho(x)\, dx
$$

This is called differential entropy. It generalizes Shannon entropy to continuous systems, though it has some subtleties — it can be negative, and it’s not invariant under coordinate transformations.

If the density is uniform, say $\rho(x) = \frac{1}{V}$ over a region of volume $V$, then the entropy becomes:

$$
S = -\int_X \frac{1}{V} \log_2\!\left(\frac{1}{V}\right) dx = \log_2(V)
$$

So entropy still grows with the logarithm of the accessible state volume, just as in the discrete case.

This formalism is particularly natural in quantum mechanics, where the wavefunction $\psi(x)$ defines a probability density $\rho(x) = |\psi(x)|^2$. Consider a 1D Gaussian wavefunction:

$$
\psi(x) = \left(\frac{1}{\pi \sigma^2}\right)^{1/4} e^{-x^2/(2\sigma^2)}
$$

Its entropy (in bits) is:

$$
S = -\int_{-\infty}^{\infty} \rho(x) \log_2 \rho(x)\, dx = \frac{1}{2} \log_2(2\pi e \sigma^2)
$$

This shows that wider distributions have higher entropy, as expected: a more spread-out wavefunction indicates more uncertainty in the particle’s location.

For instance:

  • If $\sigma = 1$, then $S \approx 2.047$
  • If $\sigma = 3$, then $S \approx 3.632$

Which again should make sense: the less certain we are about a system, such as where a particle will be when measured, the more entropy it has.
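
Those two numbers are easy to check numerically. Here is a small sketch (mine, using NumPy and a simple Riemann sum over a wide but finite range) that integrates the differential entropy of a Gaussian and compares it with the closed-form $\frac{1}{2}\log_2(2\pi e \sigma^2)$:

```python
import numpy as np

def gaussian_differential_entropy(sigma, width=20.0, n=200_001):
    """Riemann-sum approximation of the integral of -rho(x)*log2(rho(x)) for a zero-mean Gaussian."""
    x = np.linspace(-width * sigma, width * sigma, n)
    dx = x[1] - x[0]
    rho = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return float(np.sum(-rho * np.log2(rho)) * dx)

for sigma in (1.0, 3.0):
    closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
    print(sigma, gaussian_differential_entropy(sigma), closed_form)
# sigma = 1 -> ≈2.047 bits, sigma = 3 -> ≈3.632 bits
```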

And a quick issue to address: If the state space is unbounded, like momentum in classical mechanics, then the entropy can diverge. This isn’t a problem in practice because physical systems typically have probability distributions (like Gaussians) that decay quickly enough at infinity to keep the entropy finite. When that’s not the case, we either limit the system to a finite region or focus on entropy differences, which remain well-defined even when absolute entropy diverges.

But let’s get back to our main topic, and we’ll get back into it with a historical overview.

Standard Usage of Entropy

Eighty years before Claude Shannon developed Information Theory, Ludwig Boltzmann formulated a statistical definition of entropy for an ideal gas. He proposed that the entropy $S$ of a system is proportional to the logarithm of the number of microstates $W$ consistent with a given macrostate:

$$
\text{(Boltzmann entropy)}\qquad S = k_B \ln(W)
$$

This equation should look familiar: it’s the equal-probability special case of the Shannon entropy we’ve been using, just with a change of base (from $\log_2$ to $\ln$) and a scaling factor $k_B$ (Boltzmann’s constant). The connection between Boltzmann’s statistical mechanics and Shannon’s information theory is more than historical coincidence—both quantify uncertainty, whether in physical states or messages.

A few years later, Josiah Willard Gibbs generalized Boltzmann’s definition to cases where microstates are not equally likely. His formulation remains the standard definition of entropy in modern physics:

$$
\text{(Gibbs entropy)}\qquad S = -k_B \sum_i p_i \ln(p_i)
$$

This is formally identical to Shannon entropy, again differing only in logarithm base and physical units. But Gibbs’s generalization was a profound leap: it enabled thermodynamics to describe systems in contact with heat baths, particle reservoirs, and other environments where probability distributions over microstates are non-uniform. This made entropy applicable far beyond ideal gases—covering chemical reactions, phase transitions, and statistical ensembles of all kinds.
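
To make the unit change concrete, here is a small Python sketch (my own) computing the Gibbs entropy of the balanced 10-ball box and converting it back to bits; the only physical input is Boltzmann’s constant:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, in J/K

def gibbs_entropy(probs):
    """Gibbs entropy in J/K: -k_B * sum(p * ln(p))."""
    return -k_B * sum(p * math.log(p) for p in probs if p > 0)

# The balanced box of 10 balls: 252 equally likely microstates.
probs = [1 / 252] * 252

S = gibbs_entropy(probs)
print(S)                          # ≈7.63e-23 J/K, i.e. k_B * ln(252)
print(S / (k_B * math.log(2)))    # ≈7.977 — the same number expressed in bits (Shannon entropy)
```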

Now that we have a formal understanding of entropy with some historical background, let’s try to understand how entropy relates to our universe and in particular to time.

Time

How does time play a role in all of this?

When you drop a spot of milk into tea, it always spreads and mixes, and yet you never see the reverse where the milk molecules spontaneously separate and return to a neat droplet. When ocean waves crash into the shore, the spray and foam disperse, but we never see that chaos reassemble into a coherent wave that launches back into the sea. These examples are drawn from this lecture on entropy by Richard Feynman. If you were shown a reversed video of these events, you’d immediately recognize something was off. This sounds obvious at first, but it actually isn’t clear this should be true if we just look at the laws of physics. All the known laws of physics are time-reversible (the wave function collapse seems to be debatable), which just means that they do look the same playing forward and backward. The individual molecules all obey these time-reversible laws, and yet the cup of tea gets murky from the milk always mixing in.

This highlights a fundamental paradox: the microscopic laws of physics are time-reversible, but the macroscopic world is not. If you took a video of two atoms bouncing off each other and played it backward, it would still look physically valid, but play a video of milk mixing into tea backward, and it looks obviously wrong.

We want to build a simplified model of time in a way that reflects both the time-reversibility of microscopic laws and the time-asymmetry of macroscopic behavior. Let’s imagine the complete state of a physical system, like a box of particles, as a single point in a high-dimensional space called phase space, with each dimension corresponding to a particle’s position and momentum. As time evolves, the system traces out a continuous trajectory through this space.

The laws of physics, such as Newton’s equations, Hamiltonian mechanics, or Schrödinger’s equation, all govern this trajectory. They are deterministic and time-reversible. That means if you reverse the momenta of all particles at any moment, the system will retrace its path backward through state space.

So far everything is time-reversible, including this view of how the universe moves through time. But we will see that even in this toy model, time appears to have a preferred direction, an arrow of time.

The key lies in coarse-graining. When we observe the world, we don’t see every microscopic detail. Instead, we measure macrostates: aggregate properties like temperature, pressure, position of an object, or color distribution in a cup of tea. Each macrostate corresponds to many underlying microstates — and not all macrostates are created equal.

For example, consider a box sliding across the floor and coming to rest due to friction. At the microscopic level, the system is just particles exchanging momentum, all of it time-reversible. But we certainly would not call this action time-reversible; we never see a box spontaneously start speeding up from a standstill. Yet if you took the moment after the box comes to rest due to friction and reversed the velocities of all the particles (including those in the floor that absorbed the box's kinetic energy as heat), the box would spontaneously start moving and slide back to its original position. This would obey Newton's laws, but it's astronomically unlikely. Why?

The microstates where the energy is spread out as heat (the box is at rest, and the molecules in the floor are jiggling) vastly outnumber the microstates where all that energy is coordinated to move the box. The stand-still macrostate has high entropy while the spontaneous-movement macrostate has low entropy. When the system evolves, randomly or deterministically, from low entropy, it is overwhelmingly likely to move toward higher entropy simply because there are more such microstates.

If you had perfect knowledge of all particles in the universe (i.e., you lived at the level of microstates), time wouldn’t seem to have a direction. But from the perspective of a coarse-grained observer, like us, entropy tends to increase. And that’s why a movie of tea mixing looks natural, but the reverse looks fake. At the level of physical laws, both are valid. But one is typical, and one is astronomically rare, all because we coarse-grained.

To drive the point home, let’s again look at the balls in a box. We’ll define macrostates by dividing the box into a grid of cells and counting how many balls are in each bin.

Now suppose the balls move via random small jitters (our toy model of microscopic dynamics). Over time, the system will naturally tend to explore the most probable macrostates, as the most probable macrostates have far more microstates for you to wander into. That is, entropy increases over time, not because of any fundamental irreversibility in the laws, but because high-entropy macrostates are far more typical.

If we started the simulation with all the balls packed on the left, that’s a very specific (low entropy) macrostate. As they spread out, the number of compatible microstates grows, and so does the entropy.
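
Here is a toy version of that simulation in Python (my own sketch; the jitter size, step count, and left/right coarse-graining are arbitrary choices). All 1000 balls start on the left half of a unit box, and the entropy of the left/right macrostate is printed as they diffuse:

```python
import math
import random

random.seed(0)
N, STEPS = 1000, 2000
xs = [random.uniform(0.0, 0.5) for _ in range(N)]   # all balls start on the left half

for step in range(STEPS + 1):
    if step % 400 == 0:
        n_left = sum(1 for x in xs if x < 0.5)
        # Macrostate entropy: log2 of the number of ways to choose which balls are on the left.
        entropy = math.log2(math.comb(N, n_left))
        print(step, n_left, round(entropy, 1))
    # Microscopic dynamics: random small jitters, clamped to stay inside the unit box.
    xs = [min(1.0, max(0.0, x + random.gauss(0, 0.01))) for x in xs]
# The entropy climbs from 0 toward log2(C(1000, 500)) ≈ 995 bits as the balls spread out.
```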

This leads to a crucial realization: Entropy increases because we started in a low-entropy state. This is often called the Past Hypothesis, the postulate that the universe began in an extremely low-entropy state. Given that, the Second Law of Thermodynamics follows naturally. The arrow of time emerges not from the dynamics themselves, but from the statistical unlikelihood of reversing them after coarse-graining, and the fact that we began in a low-entropy state.

You could imagine that once a system reaches near-maximum entropy, it no longer looks time-irreversible. The entropy of such a system would still fluctuate a tiny bit, since entropy is an inherently statistical measure, but the fluctuations would be small enough not to notice. For example, while it is clear when a video of milk being poured into tea (a low-entropy macrostate) is playing forward as opposed to backward, you couldn’t tell whether a video of already-combined milk and tea (a high-entropy macrostate) being swirled around is playing forward or backward.

While there are tiny fluctuations in entropy, they are not enough to explain the large-scale phenomena that sometimes seem to violate the principle we just established, that entropy always increases with time.

Violations of the Second Law?

Some real-world examples seem to contradict the claim that entropy always increases. For instance, oil and water separate after mixing, dust clumps into stars and planets, and we build machines like filters and refrigerators that separate mixed substances. Aren’t these violations?

The issue is that we have only been considering the positions of molecules, while physical systems have many other properties, which allow for more microstates. For example, if we start considering both the position and velocity of balls in a box, then the entropy can be high even while all the balls are on the left side of the box, since every ball could have a different velocity. If the balls were all on the left and the velocities were all the same, then the entropy would be low. Once we consider velocity as well, entropy can increase both from more spread-out positions and from more spread-out velocities.

When water and oil separate, the positions of the molecules separate into top and bottom, which appears to decrease positional entropy. However, this separation actually increases the total entropy of the system. Why? Water molecules strongly prefer to form hydrogen bonds with other water molecules rather than interact with oil molecules. When water molecules are forced to be near oil molecules in a mixed state, they must adopt more constrained arrangements to minimize unfavorable interactions, reducing the number of available microstates. When water and oil separate, water molecules can interact freely with other water molecules in more configurations, and oil molecules can interact with other oil molecules more freely. This increase in available microstates for molecular arrangements and interactions more than compensates for the decrease in positional mixing entropy. So, while the entropy decreases if we only consider the general positions of molecules (mixed versus separated), the total entropy increases when we account for all the molecular interactions, orientations, and local arrangements. This demonstrates why we need to consider all properties of a system when calculating its entropy.

When stars or planets form from dust particles floating around in space and clumping together under gravity, it would seem that the entropy might be decreasing even when we consider both the position and velocity of the particles. Even though the particles speed up as they clump together, they slow down after they collide, seemingly decreasing entropy. This is because we are again failing to consider the entire system. When particles collide with each other, their speed decreases a bit by turning that kinetic energy into radiation, sending photons out into space. If we considered a system where radiation isn’t allowed, then the kinetic energy would just get transferred from one particle to another through changes in velocity, and the entropy of the system would still be increasing because of the faster velocities. Once we account for the positions, velocities, and every particle in a system, we can consider all the microstates that are equally likely and calculate the correct entropy.

Similarly, once we consider the entire system around a refrigerator, the decrease in entropy disappears. The entropy from the power generated to run the refrigerator and the heat moved from the inside to the outside of the refrigerator will offset the decrease in entropy caused by cooling the inside of the refrigerator. Local decreases in entropy can be generated, as long as the entropy of the entire system is still increasing.

When analyzing the entropy of a system, ensure that the entire system is being considered: the positions, velocities, and other interactions of the particles, and that every particle is included.

Disorder

Entropy is sometimes described as “disorder,” but this analogy is imprecise and often misleading. In statistical mechanics, entropy has a rigorous definition: it quantifies the number of microstates compatible with a given macrostate. That is, entropy measures our uncertainty about the exact microscopic configuration of a system given some coarse-grained, macroscopic description.

So where does the idea of “disorder” come from?

Empirically, macrostates we label as “disordered” often correspond to a vastly larger number of microstates than those we consider “ordered”. For example, in a child’s room, there are many more configurations where toys are scattered randomly than ones where everything is neatly shelved. Since the scattered room corresponds to more microstates, it has higher entropy.

But this connection between entropy and disorder is not fundamental. The problem is that “disorder” is subjective—it depends on human perception, context, and labeling. For instance, in our earlier example of 1000 balls bouncing around a box, a perfectly uniform grid of balls would have high entropy due to the huge number of possible microstates realizing it. And yet to a human observer, such a grid might appear highly “ordered.”

The key point is: entropy is objective and well-defined given a macrostate and a set of microstates, while “disorder” is a human-centric heuristic concept that sometimes, but not always, tracks entropy. Relying on “disorder” to explain entropy risks confusion, especially in systems where visual symmetry or regularity masks the underlying statistical structure.

Conclusion

So here are some thoughts in regard to some common statements made about entropy:

  • Entropy is a measure of disorder.
    • “Disorder” is a subjective term for states of a system that humans don’t find useful or nice, and the “disordered” macrostates usually have much higher entropy than the “ordered” macrostates that humans create. Because of this, when entropy increases, it is more likely that we end up in a disordered state, although it is not guaranteed.
  • Entropy always increases in a closed system.
    • This is a statistical statement that for all practical purposes is true, but is not guaranteed and can fail when you look at very small isolated systems or measure down to the smallest details of a system. It also assumes you started in a low-entropy state, giving your system space to increase in entropy. This has the neat implication that since our universe has been observed to be increasing in entropy, it must have begun in a low-entropy state.
  • Heat flows from hot to cold because of entropy.
    • Heat flows from hot to cold because the number of ways in which the system can be non-uniform in temperature is much lower than the number of ways it can be uniform in temperature, and so as the system “randomly” moves to new states, it will statistically end up in states that are more uniform.
  • Entropy is the only time-irreversible law of physics.
    • All the fundamental laws of physics are time-reversible, but by coarse-graining and starting from a lower-entropy state, a system will statistically move to a higher-entropy state. This means if a system is already in a near-maximum entropy state (either because of its configuration or because of the choice for coarse-graining) or we don’t coarse-grain, then entropy will not look time-irreversible.

And here is some further reading, all of which I found supremely helpful in learning about entropy.