Split string values in Pandas and add the split values one after another to a new column

Question

I have a Pandas dataframe like this:

id  A   B
0   1   toto+tata
1   1   toto+tata
2   2   titi+tutu
3   2   titi+tutu
4   3   toto+tata+titi
5   3   toto+tata+titi
6   3   toto+tata+titi

Thanks to the split function, I can split these string values:

for i in range(len(df)):
    var = df.iloc[i, 1].split("+")
    print(var)

['toto', 'tata']
['toto', 'tata']
['titi', 'tutu']
['titi', 'tutu']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']

Now, I want to put these values in a new column, one value after the other:

id  A   B              C
0   1   toto+tata      toto
1   1   toto+tata      tata
2   2   titi+tutu      titi
3   2   titi+tutu      tutu
4   3   toto+tata+titi toto
5   3   toto+tata+titi tata
6   3   toto+tata+titi titi

I tried to do this:

for i in range(len(df)):
    var = df.iloc[i, 1].split("+")
    print(var)
    for y in range(len(var)):
        df.at[i, 'C'] = var[y]

But it always returns the last value of the split:

id  A   B              C
0   1   toto+tata      tata
1   1   toto+tata      tata
2   2   titi+tutu      tutu
3   2   titi+tutu      tutu
4   3   toto+tata+titi titi
5   3   toto+tata+titi titi
6   3   toto+tata+titi titi

I'm missing one small detail to make my algorithm work, but I can't find it.

Answer 1

This assumes that each group always has exactly as many rows as there are '+'-separated items, that the groups are consecutive, and that the string is identical within each group.
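These three assumptions can be verified before relying on them. The following is only a sketch: the sample frame is rebuilt from the question, `id` is assumed to be a regular column, and `check_assumptions` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data reconstructed from the question ("id" assumed to be a column).
df = pd.DataFrame({
    "id": range(7),
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

def check_assumptions(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True when the three assumptions hold for A and B."""
    groups = frame.groupby("A")["B"]
    # 1. The string is identical within each group.
    same_string = (groups.nunique() == 1).all()
    # 2. Each group has as many rows as '+'-separated items.
    n_items = groups.first().str.count(r"\+") + 1
    sizes_match = (groups.size() == n_items).all()
    # 3. Groups are consecutive: A changes value exactly (n_groups - 1) times.
    n_changes = (frame["A"] != frame["A"].shift()).sum() - 1
    consecutive = n_changes == frame["A"].nunique() - 1
    return bool(same_string and sizes_match and consecutive)

print(check_assumptions(df))
```

Note that `groupby` sorts the group keys by default, so a solution built on it also relies on the groups appearing in ascending order of A unless `sort=False` is passed.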

A simple way is to remove the duplicates, then use str.split and explode:

df['C'] = df.groupby('A')['B'].first().str.split('+').explode().values

Output:

   id  A               B     C
0   0  1       toto+tata  toto
1   1  1       toto+tata  tata
2   2  2       titi+tutu  titi
3   3  2       titi+tutu  tutu
4   4  3  toto+tata+titi  toto
5   5  3  toto+tata+titi  tata
6   6  3  toto+tata+titi  titi
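To see what each link of that chain contributes, the one-liner can be unrolled into intermediate steps. This is a sketch on a frame rebuilt from the question (with `id` assumed to be a regular column); the variable names `firsts`, `pieces` and `exploded` are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question ("id" assumed to be a column).
df = pd.DataFrame({
    "id": range(7),
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

# Step 1: keep one string per group; the result is indexed by the values of A.
firsts = df.groupby("A")["B"].first()

# Step 2: split each string on '+' into a list of pieces.
pieces = firsts.str.split("+")

# Step 3: explode gives every list element its own row (index label repeated).
exploded = pieces.explode()
print(exploded)

# Step 4: .values drops the A-labelled index, so the assignment is made
# by position rather than by index label.
df["C"] = exploded.values
print(df)
```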

If the rows of each group are not consecutive, apply the same logic per group with groupby + transform:

Example:

# shuffle the rows to generate an example
df2 = df.sample(frac=1)

# extract the chunks
df2['C'] = df2.groupby('A')['B'].transform(lambda x: x.head(1).str.split('+').explode().values)

Output:

   id  A               B     C
4   4  3  toto+tata+titi  toto
1   1  1       toto+tata  toto
0   0  1       toto+tata  tata
3   3  2       titi+tutu  titi
6   6  3  toto+tata+titi  tata
5   5  3  toto+tata+titi  titi
2   2  2       titi+tutu  tutu
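As for why the loop in the question keeps only the last piece: the inner loop writes every piece to the same cell, df.at[i, 'C'], so each write overwrites the previous one. What the row actually needs is its own position within its group. A possible alternative (a sketch, not the approach shown above) gets that position from groupby().cumcount() and picks the matching piece; `nth_piece` is a hypothetical helper name, and the sketch assumes no group has more rows than its string has items.

```python
import pandas as pd

# Sample data reconstructed from the question ("id" assumed to be a column).
df = pd.DataFrame({
    "id": range(7),
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

def nth_piece(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: for each row, pick the piece of B that matches
    the row's position within its A group."""
    # Position of each row inside its group, counted in order of appearance.
    pos = frame.groupby("A").cumcount()
    # Split every row's own string into a list of pieces.
    parts = frame["B"].str.split("+")
    # Pair each list with its row position and pick that element.
    picked = [items[i] for items, i in zip(parts, pos)]
    return pd.Series(picked, index=frame.index)

df["C"] = nth_piece(df)
print(df)
```

Because the position is computed per row, this does not depend on the rows of a group being consecutive, so it can also be applied to a shuffled frame.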
