How to modify duplicated rows in Python pandas

Question

Let's say I have a DataFrame (sorted by some priority criterion) with a "name" column. A few names are duplicated, and I want to append a simple indicator to the duplicates.

E.g.,

'jones a'
... 
'jones a'    # this should become 'jones a2'

To get the subset of duplicates, I could do:

df.loc[df.duplicated(subset=['name'], keep='last'), 'name']

However, I think the apply function does not allow for in-place modification, right? So what I basically ended up doing is:

df.loc[df.duplicated(subset=['name'], keep='last'), 'name'] = \
df.loc[df.duplicated(subset=['name'], keep='last'), 'name'].apply(lambda x: x+'2')

But my feeling is that there might be a better way. Any ideas or tips? I would really appreciate your feedback!

Answer 1

Here is one way:

import numpy as np
import pandas

# sample data
d = pandas.DataFrame(
    {'Name': ['bob', 'bob', 'bob', 'bill', 'fred', 'fred', 'joe', 'larry'],
     'ShoeSize': [8, 9, 10, 12, 14, 11, 10, 12]
    }
)

>>> d.groupby('Name', group_keys=False).Name.apply(lambda n: n + (np.arange(len(n))+1).astype(str))
0      bob1
1      bob2
2      bob3
3     bill1
4     fred1
5     fred2
6      joe1
7    larry1

This appends an indicator to every name, including the first occurrence. If you want to append the indicator only to those after the first, you can do it with a little special-casing:

>>> d.groupby('Name', group_keys=False).Name.apply(lambda n: n + np.concatenate(([''], (np.arange(len(n))+1).astype(str)[1:])))
0      bob
1     bob2
2     bob3
3     bill
4     fred
5    fred2
6      joe
7    larry
dtype: object
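
As an alternative sketch (not part of the answer above), the same numbering can probably be done without a lambda by using groupby(...).cumcount(), which counts rows within each group. The variable names position, suffix, and renamed are made up for illustration, and the data just repeats the sample names:

```python
import pandas as pd

d = pd.DataFrame(
    {'Name': ['bob', 'bob', 'bob', 'bill', 'fred', 'fred', 'joe', 'larry']}
)

# 1-based position of each row within its group of identical names
position = d.groupby('Name').cumcount() + 1

# Empty suffix for the first occurrence, the position number for later ones
suffix = position.astype(str).where(position > 1, '')

renamed = d['Name'] + suffix
print(renamed.tolist())
```

Because cumcount follows the existing row order, the DataFrame should already be sorted by the priority criterion before this runs.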

If you want to use this to replace the original names, just do d.Name = ..., where ... is the expression shown above.
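
A minimal sketch of that assignment, assuming a frame shaped like the one in the question (a "name" column with a few repeats). The helper number_repeats is hypothetical, and this version uses transform instead of apply, since transform always returns a result aligned with the original index:

```python
import pandas as pd

# Hypothetical data modeled on the question's example
df = pd.DataFrame({'name': ['jones a', 'smith b', 'jones a', 'jones a']})

def number_repeats(names):
    """Leave the first name in a group unchanged; suffix the rest with 2, 3, ..."""
    suffixes = [''] + [str(i) for i in range(2, len(names) + 1)]
    return names + pd.Series(suffixes, index=names.index)

# Overwrite the column; values are matched back to rows by index
df['name'] = df.groupby('name')['name'].transform(number_repeats)
print(df['name'].tolist())
```

After the assignment every value in the column is unique, which can be checked with df['name'].is_unique.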

You should think about why you're doing this. It is usually better to have this sort of information in a separate column than smashed into a string.
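
One way the separate-column idea might look, as a sketch under assumptions: the column name Occurrence is invented here, and cumcount is used to number repeats while leaving the names themselves untouched:

```python
import pandas as pd

d = pd.DataFrame(
    {'Name': ['bob', 'bob', 'bob', 'bill', 'fred', 'fred', 'joe', 'larry']}
)

# Store the within-name position in its own column (1 = first occurrence)
d['Occurrence'] = d.groupby('Name').cumcount() + 1

# The original names stay intact, so filtering stays simple,
# e.g. keep only the highest-priority row for each name
first_only = d[d['Occurrence'] == 1]
print(first_only)
```

With this layout the pair (Name, Occurrence) identifies each row, and the display string can still be built later if it is needed.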

Answer 1
1 accepted, 2015-01-06 20:52:58

如何在Python熊猫中修改重复的行

问题描述

1 个解决方案

解决方案1 1 已采纳 2015-01-06 20:52:58

解决方案1
1 已采纳 2015-01-06 20:52:58