Split a string into columns in Pandas, using text in the string as column headers and column values

Question

I have a df with 1 column, where each row contains a string. It looks like this:

 df
 data
 in 9.14 out 9.66 type 0.0
 in 9.67 out 9.69 type 0.0
 in 9.70 out 10.66 type 0.0
 in 10.67 out 11.34 type 2.0
 in 11.35 out 12.11 type 2.0
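
For anyone who wants to reproduce the problem, the frame above can be rebuilt roughly as follows. This is a sketch: the column name `data` and the row strings are taken from the listing, and a default integer index is assumed.

```python
import pandas as pd

# One string column; each row holds "word value" pairs separated by spaces
df = pd.DataFrame({"data": [
    "in 9.14 out 9.66 type 0.0",
    "in 9.67 out 9.69 type 0.0",
    "in 9.70 out 10.66 type 0.0",
    "in 10.67 out 11.34 type 2.0",
    "in 11.35 out 12.11 type 2.0",
]})
print(df)
```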

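I want to split the text in this column into multiple columns. I want to use the words [in, out, type] as column headers, and the value following each word as the row value. The result would have 3 columns labelled in, out and type, like this: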

 df
 in     out    type
 9.14   9.66   0.0
 9.67   9.69   0.0
 9.70   10.66  0.0
 10.67  11.34  2.0
 11.35  12.11  2.0

Thanks!

Answer 1

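If you know the words in advance and can also guarantee that there will be no bad data, this is a simple str.extract problem: you can build a robust regular expression that captures each group, using named groups to create the DataFrame in one pass. A regex for the sample data is included in Option 2.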

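However, for the sake of demonstration, it is better to assume that you may have bad data and that you may not know your column names in advance. In that case, you can use str.extractall followed by some unstacking.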

Option 1
extractall + set_index + unstack

generic_regex = r'([a-zA-Z]+)[^0-9]+([0-9\.]+)'

df['data'].str.extractall(generic_regex).set_index(0, append=True)[1].unstack([0, 1])

0         in    out type
match      0      1    2
0       9.14   9.66  0.0
1       9.67   9.69  0.0
2       9.70  10.66  0.0
3      10.67  11.34  2.0
4      11.35  12.11  2.0
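
The result above carries a two-level column header (the captured word plus the match number). If a flat header is preferred, one possible variation, not part of the original answer, is to name the capture groups and unstack only on the word. The group names `key` and `value`, the variable names, and the rebuilt sample frame are assumptions made for illustration. The sketch also assumes no word repeats within a row, since unstack raises on duplicate entries.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed layout)
df = pd.DataFrame({"data": [
    "in 9.14 out 9.66 type 0.0",
    "in 9.67 out 9.69 type 0.0",
    "in 9.70 out 10.66 type 0.0",
    "in 10.67 out 11.34 type 2.0",
    "in 11.35 out 12.11 type 2.0",
]})

# Same pattern as generic_regex, with named groups (names are illustrative)
named_regex = r'(?P<key>[a-zA-Z]+)[^0-9]+(?P<value>[0-9.]+)'

# extractall yields one row per (original row, match number)
pairs = df['data'].str.extractall(named_regex)

flat = (pairs.droplevel('match')                  # keep only the original row index
             .set_index('key', append=True)['value']
             .unstack('key'))                     # captured words become column labels
print(flat)
```

The values are still strings at this point; they can be converted afterwards if numbers are needed.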

Option 2
Define an explicit regex and use extract

rgx = r'in\s+(?P<in>[^\s]+)\s+out\s+(?P<out>[^\s]+)\s+type\s+(?P<type>[^\s]+)'

df['data'].str.extract(rgx)

      in    out type
0   9.14   9.66  0.0
1   9.67   9.69  0.0
2   9.70  10.66  0.0
3  10.67  11.34  2.0
4  11.35  12.11  2.0
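
Note that str.extract returns the captured text as strings, and rows that do not match the pattern come back as NaN. A minimal sketch of converting the result to numbers follows, assuming every captured value is a valid number. The shortened sample frame and the non-matching row are made up for illustration and are not in the question.

```python
import pandas as pd

df = pd.DataFrame({"data": [
    "in 9.14 out 9.66 type 0.0",
    "in 10.67 out 11.34 type 2.0",
    "not a record",              # hypothetical bad row, not in the question
]})

rgx = r'in\s+(?P<in>[^\s]+)\s+out\s+(?P<out>[^\s]+)\s+type\s+(?P<type>[^\s]+)'

parsed = df['data'].str.extract(rgx)   # string columns; NaN where the pattern fails
numeric = parsed.astype(float)         # raises if a captured value is not a number
print(numeric.dtypes)
```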

Answer 2

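If your data is uniformly separated by whitespace between name and value, as in your example, you can use split with the str accessor and slicing with a stride to construct the desired output: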

df1 = df['data'].str.split()
df_out = pd.DataFrame(df1.str[1::2].tolist(), columns=df1[0][0::2])

Out[1097]:
      in    out type
0   9.14   9.66  0.0
1   9.67   9.69  0.0
2   9.70  10.66  0.0
3  10.67  11.34  2.0
4  11.35  12.11  2.0
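
The snippet above takes the column names from the first row only, so it relies on every row listing the same words in the same order. If that cannot be guaranteed, one possible variation, not part of the original answer, pairs names with values row by row. The second sample row and the variable names `tokens` and `records` are hypothetical, added only to show the behaviour.

```python
import pandas as pd

df = pd.DataFrame({"data": [
    "in 9.14 out 9.66 type 0.0",
    "type 2.0 in 10.67",         # hypothetical row: different order, 'out' missing
]})

tokens = df['data'].str.split()  # each row becomes a list of tokens

# Even positions are names, odd positions are values; pair them per row
records = [dict(zip(t[0::2], t[1::2])) for t in tokens]

df_out = pd.DataFrame(records, index=df.index)   # missing names become NaN
print(df_out)
```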
