Split a string value in Pandas and add the split values to a new column, one after the other
I have a Pandas dataframe like this:
id A B
0 1 toto+tata
1 1 toto+tata
2 2 titi+tutu
3 2 titi+tutu
4 3 toto+tata+titi
5 3 toto+tata+titi
6 3 toto+tata+titi
Thanks to the split function, I can split these string values:
for i in range(len(df)):
    var = df.iloc[i, 1].split("+")
    print(var)
['toto', 'tata']
['toto', 'tata']
['titi', 'tutu']
['titi', 'tutu']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
Now, I want to put these values in a new column, one value after the other:
id A B C
0 1 toto+tata toto
1 1 toto+tata tata
2 2 titi+tutu titi
3 2 titi+tutu tutu
4 3 toto+tata+titi toto
5 3 toto+tata+titi tata
6 3 toto+tata+titi titi
I tried to do this:
for i in range(len(df)):
    var = df.iloc[i, 1].split("+")
    print(var)
    for y in range(len(var)):
        df.at[i, 'C'] = var[y]
But it always returns the last value of the split:
id A B C
0 1 toto+tata tata
1 1 toto+tata tata
2 2 titi+tutu tutu
3 2 titi+tutu tutu
4 3 toto+tata+titi titi
5 3 toto+tata+titi titi
6 3 toto+tata+titi titi
I'm missing some small detail that would make my algorithm work, but I can't find it.
This assumes that each group always has exactly as many rows as there are '+'-separated items, that the groups are consecutive, and that the strings are identical within each group.
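If you are not sure your data meets these assumptions, they can be checked up front. The following is only a sketch: the frame is rebuilt from the sample in the question, and the three boolean names are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout: columns id, A, B)
df = pd.DataFrame({
    "id": range(7),
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

groups = df.groupby("A")["B"]

# 1. each group has as many rows as '+'-separated items
rows_match_items = (groups.transform("size") == df["B"].str.count(r"\+") + 1).all()

# 2. the strings are identical within each group
strings_identical = groups.transform("nunique").eq(1).all()

# 3. the groups are consecutive: number of runs equals number of distinct keys
groups_consecutive = (df["A"] != df["A"].shift()).sum() == df["A"].nunique()

print(rows_match_items, strings_identical, groups_consecutive)
```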
A simple way is to keep one string per group (removing the duplicates), then apply str.split and explode:
df['C'] = df.groupby('A')['B'].first().str.split('+').explode().values
Output:
id A B C
0 0 1 toto+tata toto
1 1 1 toto+tata tata
2 2 2 titi+tutu titi
3 3 2 titi+tutu tutu
4 4 3 toto+tata+titi toto
5 5 3 toto+tata+titi tata
6 6 3 toto+tata+titi titi
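For anyone who wants to run this end to end, here is a self-contained sketch of the same one-liner, split into steps. The dataframe construction is an assumption based on the sample shown in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout: columns id, A, B)
df = pd.DataFrame({
    "id": range(7),
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

# One string per group, split on '+', then one list item per row
items = df.groupby("A")["B"].first().str.split("+").explode()

# .values drops the group-key index, so the assignment is purely positional
df["C"] = items.values
print(df)
```

Because the assignment is positional, it relies on the exploded items coming out in the same order as the rows. groupby sorts the group keys by default, so this also assumes the groups already appear in sorted key order, as they do in the sample.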
If the rows of each group are not consecutive, apply the same logic per group with groupby + transform:
Example:
# shuffle the rows to generate an example
df2 = df.sample(frac=1)
# extract the chunks
df2['C'] = df2.groupby('A')['B'].transform(lambda x: x.head(1).str.split('+').explode().values)
Output:
id A B C
4 4 3 toto+tata+titi toto
1 1 1 toto+tata toto
0 0 1 toto+tata tata
3 3 2 titi+tutu titi
6 6 3 toto+tata+titi tata
5 5 3 toto+tata+titi titi
2 2 2 titi+tutu tutu
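As an alternative sketch (not the method above), the position of each row inside its group can be computed with groupby + cumcount and used to pick the matching item from that row's own split list. This also shows what the loop in the question was missing: its inner loop wrote every item into the same cell, so only the last one survived. The sample frame and the random_state value are assumptions made for reproducibility.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout: columns id, A, B)
df = pd.DataFrame({
    "id": range(7),
    "A": [1, 1, 2, 2, 3, 3, 3],
    "B": ["toto+tata"] * 2 + ["titi+tutu"] * 2 + ["toto+tata+titi"] * 3,
})

# Shuffle the rows so the groups are no longer consecutive
df2 = df.sample(frac=1, random_state=0)

# 0-based position of each row within its group, in the current row order
pos = df2.groupby("A").cumcount()

# For each row, take the item at that position from its own split list
df2["C"] = [parts[i] for parts, i in zip(df2["B"].str.split("+"), pos)]
print(df2)
```

This version raises an IndexError if a group has more rows than its string has items, which makes a violated assumption visible instead of silently misaligning values.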