Split string value in Pandas & add to a new column the split values one after the other

I have a Pandas dataframe like this:

id  A   B
0   1   toto+tata
1   1   toto+tata
2   2   titi+tutu
3   2   titi+tutu
4   3   toto+tata+titi
5   3   toto+tata+titi
6   3   toto+tata+titi

Thanks to the split function, I can split these string values:

for i in range(len(df)):
    var = df.iloc[i, 1].split("+")
    print(var)
['toto', 'tata']
['toto', 'tata']
['titi', 'tutu']
['titi', 'tutu']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']

Now, I want to put these values in a new column, one value after the other:

id  A   B              C
0   1   toto+tata      toto
1   1   toto+tata      tata
2   2   titi+tutu      titi
3   2   titi+tutu      tutu
4   3   toto+tata+titi toto
5   3   toto+tata+titi tata
6   3   toto+tata+titi titi

I tried to do this:

for i in range(len(df)):
    var = df.iloc[i, 1].split("+")
    print(var)
    for y in range(len(var)):
        df.at[i, 'C'] = var[y]

But it always returns the last value of the split:

id  A   B              C
0   1   toto+tata      tata
1   1   toto+tata      tata
2   2   titi+tutu      tutu
3   2   titi+tutu      tutu
4   3   toto+tata+titi titi
5   3   toto+tata+titi titi
6   3   toto+tata+titi titi

I'm missing some small detail that would make my algorithm work, but I can't find it.
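The attempt above fails because the inner loop writes every item of the split into the same cell, df.at[i, 'C'], so each pass overwrites the previous one and only the last item survives. A minimal sketch of one possible fix follows: instead of looping over all items, pick the single item whose position matches the row's position inside its group. The sample frame and the variable name pos are illustrative, and the sketch assumes no group has more rows than its string has items (otherwise the lookup raises an IndexError).

```python
import pandas as pd

# Sample data rebuilt from the question (column values are copied from the tables above)
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

# Position of each row inside its group of equal A values: 0, 1, 0, 1, 0, 1, 2
pos = df.groupby("A").cumcount()

# For each row, keep only the item at that row's position within its group
df["C"] = [b.split("+")[p] for b, p in zip(df["B"], pos)]

print(df)
```

Because cumcount numbers rows within each group regardless of where they sit in the frame, this sketch does not need the groups to be consecutive.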

The approach below assumes that each group has exactly as many rows as there are '+'-separated items, that the groups are consecutive, and that the strings are identical within each group.

A simple way is to remove the duplicates, then use str.split and explode:

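# sort=False keeps the groups in order of appearance, so the values line up with the rows
df['C'] = df.groupby('A', sort=False)['B'].first().str.split('+').explode().values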

Output:

   id  A               B     C
0   0  1       toto+tata  toto
1   1  1       toto+tata  tata
2   2  2       titi+tutu  titi
3   3  2       titi+tutu  tutu
4   4  3  toto+tata+titi  toto
5   5  3  toto+tata+titi  tata
6   6  3  toto+tata+titi  titi
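To make the chain easier to follow, here is a self-contained sketch that runs it step by step on a frame rebuilt from the question. The intermediate names (firsts, lists, flat) are illustrative only. sort=False is passed to groupby so that groups keep their order of appearance, which matters when the group keys are consecutive but not sorted.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": range(7),
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

# Step 1: one string per group (this is the "remove the duplicates" step)
firsts = df.groupby("A", sort=False)["B"].first()

# Step 2: one list of items per group
lists = firsts.str.split("+")

# Step 3: one item per row; the index repeats the group key for each item
flat = lists.explode()

# The assignment is positional, so the lengths must match exactly
assert len(flat) == len(df)
df["C"] = flat.to_numpy()

print(df)
```

The values are assigned as a plain array on purpose: flat is indexed by the group key, not by the row labels of df, so assigning the Series directly would try to align on the wrong index.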

If the rows of each group are not consecutive, apply the same logic per group with groupby + transform:

Example:

# shuffle the rows to generate an example
df2 = df.sample(frac=1)

# extract the chunks
df2['C'] = df2.groupby('A')['B'].transform(lambda x: x.head(1).str.split('+').explode().values)

Output:

   id  A               B     C
4   4  3  toto+tata+titi  toto
1   1  1       toto+tata  toto
0   0  1       toto+tata  tata
3   3  2       titi+tutu  titi
6   6  3  toto+tata+titi  tata
5   5  3  toto+tata+titi  titi
2   2  2       titi+tutu  tutu
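Both variants depend on each group having exactly as many rows as its string has items. The following sketch shows one way to check that before assigning. find_mismatched_rows is a hypothetical helper written for this illustration, not a pandas function, and it does not verify that the strings are identical within a group.

```python
import pandas as pd


def find_mismatched_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose group size differs from the number of '+'-separated items."""
    # Number of rows in the group each row belongs to
    n_rows = df.groupby("A")["B"].transform("size")
    # Number of items in each string: count of literal '+' characters, plus one
    n_items = df["B"].str.count(r"\+") + 1
    return df[n_rows != n_items]


# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": range(7),
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

# A deliberately broken copy: group 3 keeps 2 rows but its string has 3 items
bad = df.drop(index=6)

print(find_mismatched_rows(df).empty)   # True
print(len(find_mismatched_rows(bad)))   # 2
```

An empty result means the row counts and item counts agree for every group, so the positional assignment has the right length.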
