PySpark, reading a multiline file (.sdf)

Question

What is the most efficient way to read a collection of SDF files? SDF is a chemical table file format that contains both the 3D structure of a molecule and properties of that molecule. All of this information is stored in a multiline (gzipped) ASCII file. What I am struggling with is defining a custom file reader function that can interpret the custom subsection of each molecular entry. At this point I doubt whether this is even the right approach.

<Molecular-ID>
  -OEChem-10272110393D
 Schrodinger Suite 2021-1.
 32 34  0     0  0  0  0  0  0999 V2000
   31.1383   33.3647   21.1400 C   0  0  0  0  0  0  0  0  0  0  0  0
   30.7977   33.9390   19.9173 C   0  0  0  0  0  0  0  0  0  0  0  0
....
M  END
> <ShapeTanimoto>
0.6969

> <ColorTanimoto>
0.7854

> <TanimotoCombo>
1.7854

$$$$

Answer 1

In my opinion, the most 'efficient' way is to use someone else's code: an existing library.

The CDK can read SDF files, and collections thereof: https://cdk.github.io/

The Rosetta Wiki gives examples of calling the CDK from Python: https://ctr.fandom.com/wiki/Chemistry_Toolkit_Rosetta_Wiki
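If a toolkit is not an option, the record layout shown in the question can be handled with plain Python. The following is only a rough sketch, not a replacement for a real SDF parser: it assumes every entry ends with a `$$$$` line, that the structure block ends at `M  END`, and that data items use the `> <Tag>` header followed by value lines and a blank line. The function names (`split_sdf_records`, `parse_sdf_record`, `read_sdf_gz`) are made up for illustration.

```python
import gzip


def split_sdf_records(text):
    """Yield one molecule entry at a time; each entry ends at a '$$$$' line.

    Text after the last '$$$$' line is ignored.
    """
    lines = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # Skip entries that contain nothing but blank lines.
            if any(l.strip() for l in lines):
                yield "\n".join(lines)
            lines = []
        else:
            lines.append(line)


def parse_sdf_record(record):
    """Return (molblock, properties) for a single entry.

    molblock is the raw text up to and including 'M  END';
    properties maps each '> <Tag>' name to its value as a string.
    """
    molblock, props = [], {}
    tag = None
    in_data = False
    for line in record.splitlines():
        if not in_data:
            molblock.append(line)
            if line.startswith("M  END"):
                in_data = True
        elif line.startswith(">"):
            # Header such as '> <ShapeTanimoto>': the name sits between < and >.
            start, stop = line.find("<"), line.rfind(">")
            tag = line[start + 1:stop] if 0 <= start < stop else None
            if tag:
                props[tag] = []
        elif not line.strip():
            tag = None  # a blank line closes the current data item
        elif tag is not None:
            props[tag].append(line)
    return "\n".join(molblock), {k: "\n".join(v) for k, v in props.items()}


def read_sdf_gz(path):
    """Read a gzipped SDF file fully into memory and parse every entry."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as fh:
        return [parse_sdf_record(r) for r in split_sdf_records(fh.read())]
```

In Spark, the per-entry function could be mapped over records once the input is split on `$$$$`. Setting Hadoop's `textinputformat.record.delimiter` is a commonly used way to get that split, but I have not verified it against gzipped SDF collections, so treat it as something to try rather than a tested recipe.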

Answer 1 posted 2022-02-05 12:47:22