I am working with a column containing lists of strings, and would like to compare the last element of each row against the last elements of the other rows. If a row's final element does not match any other row's, I want to create a new variable with the first and last elements concatenated like this: element[0].element[-1]
If it does match another row's, I'd like to differentiate between them by also including the second-to-last element: element[0].element[-2].element[-1]
I converted this column to lists from its original format. Here is a snippet of the original variable from the pandas dataframe:
apple.banana.pear
apple.starfruit.grape
apple.kiwi.orange.pear
apple.durian.coconut
Name: original, Length: 4, dtype: string
mylist = df['original'].apply(lambda x: x.split('.'))
My current list:
[apple, banana, pear]
[apple, starfruit, grape]
[apple, kiwi, orange, pear]
[apple, durian, coconut]
Desired output:
apple.banana.pear
apple.grape
apple.orange.pear
apple.coconut
I'm not sure if making it into a list is optimal, but figured it would be easier to access each portion as an element. That may not be the case. Here is what I've tried:
l = 0
j = l + 1
for l in mylist:
    for j in mylist:
        if mylist[l][-1] == mylist[j][-1]:
            newvar = mylist[l][0] + '.' + mylist[l][-2] + '.' + mylist[l][-1]
        else:
            newvar = mylist[l][0] + '.' + mylist[l][-1]
KeyError: "None of [Index(['apple', 'banana', 'pear'], dtype='object')] are in the [index]"
Any suggestions are greatly appreciated.
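We can flag the repeated last elements with duplicated(keep=False) and pick the format with np.where:
import numpy as np
s = df.original.str.split('.')
# keep=False marks every row whose last element occurs more than once
df['new'] = np.where(s.str[-1].duplicated(keep=False),
                     s.str[0] + '.' + s.str[-2] + '.' + s.str[-1],
                     s.str[0] + '.' + s.str[-1])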
df
Out[47]:
                 original                new
0       apple.banana.pear  apple.banana.pear
1   apple.starfruit.grape        apple.grape
2  apple.kiwi.orange.pear  apple.orange.pear
3    apple.durian.coconut      apple.coconut
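For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. The DataFrame is rebuilt from the four rows shown in the question, and the intermediate names (parts, last, is_dup, short_name, long_name) are only illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"original": [
    "apple.banana.pear",
    "apple.starfruit.grape",
    "apple.kiwi.orange.pear",
    "apple.durian.coconut",
]})

parts = df["original"].str.split(".")  # one list of strings per row
last = parts.str[-1]                   # last element of each list

# keep=False flags every occurrence of a repeated value,
# not only the second and later ones.
is_dup = last.duplicated(keep=False)

short_name = parts.str[0] + "." + last
long_name = parts.str[0] + "." + parts.str[-2] + "." + last

# Use the longer form only where the last element is shared with another row.
df["new"] = np.where(is_dup, long_name, short_name)
print(df)
```

As for the KeyError in the question: `for l in mylist` iterates over the values of the Series (the lists themselves), not over integer positions, so `mylist[l]` tries to look up the labels 'apple', 'banana', 'pear' in the index and fails.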