How can I do this split process in Python?
I'm trying to label data in a table so that the position index restarts in each row, while each column gets its own Enum class.
What I've done so far is build this representation with a single Enum class.
A solution that handles each column separately as a list would also work. But what would be the best way to solve this?
import pandas as pd
from enum import Enum
df = pd.DataFrame({'first': ['product and other', 'product2 and other', 'price'], 'second':['product and prices', 'price2', 'product3 and price']})
df

class Tipos(Enum):
    B = 1
    I = 2
    L = 3

for index, row in df.iterrows():
    sentencas = row.values
    for sentenca in sentencas:
        for pos, palavra in enumerate(sentenca.split()):
            print(f"{palavra} {Tipos(pos+1).name}")

Results:
                first              second
0   product and other  product and prices
1  product2 and other              price2
2               price  product3 and price
product B
and I
other L
product B
and I
prices L
product2 B
and I
other L
price2 B
price B
product3 B
and I
price L
Desired Results:
Word Ent
0 product B_first
1 and I_first
2 other L_first
3 product B_second
4 and I_second
5 prices L_second
6 product2 B_first
7 and I_first
8 other L_first
9 price2 B_second
10 price B_first
11 product3 B_second
12 and I_second
13 price L_second
# In this case the sequence is (B_first, I_first, L_first, L_first...), and when the column changes it becomes B_second, I_second, L_second...
Instead of using Enum, you can use a dict mapping. You can avoid loops if you flatten your dataframe:
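# Assumed dict mapping (its definition is not shown above); keys are 0-based word positions
Tipos = {0: 'B', 1: 'I', 2: 'L'}

# Flatten to one word per row, ordered by original row, then by column
out = df.unstack().str.split().explode().sort_index(level=1).to_frame('Word')
# Number the words within each (column, row) cell, map to a label, append the column name
out['Ent'] = out.groupby(level=[0, 1]).cumcount().map(Tipos) \
             + '_' + out.index.get_level_values(0)
out = out.reset_index(drop=True)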
Output:
>>> out
Word Ent
0 product B_first
1 and I_first
2 other L_first
3 product B_second
4 and I_second
5 prices L_second
6 product2 B_first
7 and I_first
8 other L_first
9 price2 B_second
10 price B_first
11 product3 B_second
12 and I_second
13 price L_second
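
If you would rather keep the Enum from the question, a loop-based variant is also possible. The following is only a sketch, not a replacement for the approach above: it reuses the question's `df` and `Tipos`, and the names `records` and `out_enum` are made up for this example. It walks the table row by row and column by column, tagging each word with the Enum name plus the column name.

```python
from enum import Enum

import pandas as pd


class Tipos(Enum):
    B = 1
    I = 2
    L = 3


df = pd.DataFrame({
    'first': ['product and other', 'product2 and other', 'price'],
    'second': ['product and prices', 'price2', 'product3 and price'],
})

# One (word, label) pair per word: rows first, then columns within a row,
# then words within a cell. Positions start at 1 to match the Enum values.
records = [
    (word, f"{Tipos(pos).name}_{col}")
    for _, row in df.iterrows()
    for col, sentence in row.items()
    for pos, word in enumerate(sentence.split(), start=1)
]

out_enum = pd.DataFrame(records, columns=['Word', 'Ent'])
print(out_enum)
```

Note that `Tipos(pos)` raises a `ValueError` for any cell with more than three words, since the Enum only has three members, so longer sentences would need extra handling.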