
How can I do this split process in Python?

I'm trying to label data in a table. In each row the position index should repeat, but each column should use a different Enum class.

So far I have only managed to produce this representation with a single Enum class shared by all columns.

A solution that treats each column separately as a list would also work. What would be the best way to solve this?

import pandas as pd
from enum import Enum


df = pd.DataFrame({'first': ['product and other', 'product2 and other', 'price'], 'second': ['product and prices', 'price2', 'product3 and price']})
print(df)

class Tipos(Enum):
    B = 1
    I = 2
    L = 3

for index, row in df.iterrows():
    sentencas = row.values
    for sentenca in sentencas:
        for pos, palavra in enumerate(sentenca.split()):
            print(f"{palavra} {Tipos(pos+1).name}")

Results:

                first              second
0   product and other  product and prices
1  product2 and other              price2
2               price  product3 and price

product B
and I
other L
product B
and I
prices L
product2 B
and I
other L
price2 B
price B
product3 B
and I
price L

Desired results:

        Word       Ent
0    product   B_first
1        and   I_first
2      other   L_first
3    product  B_second
4        and  I_second
5     prices  L_second
6   product2   B_first
7        and   I_first
8      other   L_first
9     price2  B_second
10     price   B_first
11  product3  B_second
12       and  I_second
13     price  L_second

# Words taken from the first column are labeled B_first, I_first, L_first, and words taken from the second column are labeled B_second, I_second, L_second.

Instead of using Enum, you can use a dict mapping. You can also avoid loops if you flatten your dataframe:

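# Dict mapping from 0-based word position to label (assumed definition; it replaces the Enum)
Tipos = {0: 'B', 1: 'I', 2: 'L'}

# Flatten to one word per row, ordered by original row and then by column
out = df.unstack().str.split().explode().sort_index(level=1).to_frame('Word')
# Number the words within each (column, row) cell, map to a label, append the column name
out['Ent'] = out.groupby(level=[0, 1]).cumcount().map(Tipos) \
                 + '_' + out.index.get_level_values(0)
out = out.reset_index(drop=True)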

Output:

>>> out
        Word       Ent
0    product   B_first
1        and   I_first
2      other   L_first
3    product  B_second
4        and  I_second
5     prices  L_second
6   product2   B_first
7        and   I_first
8      other   L_first
9     price2  B_second
10     price   B_first
11  product3  B_second
12       and  I_second
13     price  L_second
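
If you would rather keep the Enum from the question, a plain loop can build the same table. The following is only a sketch: the helper name `label_words` is made up for illustration, and it assumes every cell is a string of at most three words, since `Tipos` only defines three members.

```python
from enum import Enum

import pandas as pd


class Tipos(Enum):
    B = 1
    I = 2
    L = 3


def label_words(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per word, tagged with its position label and column name.

    Hypothetical helper, not part of pandas.
    """
    records = []
    # Walk the cells row by row, then column by column, as in the desired output.
    for _, row in df.iterrows():
        for column, sentence in row.items():
            for pos, word in enumerate(sentence.split()):
                # Tipos values start at 1, so shift the 0-based position.
                label = f"{Tipos(pos + 1).name}_{column}"
                records.append({"Word": word, "Ent": label})
    return pd.DataFrame(records, columns=["Word", "Ent"])


df = pd.DataFrame({
    "first": ["product and other", "product2 and other", "price"],
    "second": ["product and prices", "price2", "product3 and price"],
})
out = label_words(df)
print(out)
```

A cell with a fourth word would raise a ValueError from `Tipos(4)`, so the Enum (or the lookup) would need extending for longer sentences.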
