Transform a pandas DataFrame column into a sequence

Question

I have some data that I convert into a DataFrame:

import pandas as pd

d = [
  (1,70399,0.988375133622),
  (1,33919,0.981573492596),
  (1,62461,0.981426807114),
  (579,1,0.983018778374),
  (745,1,0.995580488899),
  (834,1,0.980942505189)
]

df = pd.DataFrame(d, columns=['source', 'target', 'weight'])

>>> df
   source  target    weight
0       1   70399  0.988375
1       1   33919  0.981573
2       1   62461  0.981427
3     579       1  0.983019
4     745       1  0.995580
5     834       1  0.980943

I need to transform the source column into a sequence. I have tried:

df.source = (df.source.diff() != 0).cumsum() - 1

but I just get:

>>> df
   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3       1       1  0.983019
4       2       1  0.995580
5       3       1  0.980943

I also need to transform the values in the target column based on the values in source. The ideal result is:

>>> df
   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943

The values in target should change to match the new values in source. In source, the value 1 became 0, so I need to change the value 1 in target to 0 as well.

How can I do that? Maybe someone can help me :)

Thanks :)

Answer 1

Something like this?

df['source_code'] = df.source.astype('category').cat.codes

>>> df
   source  target    weight  source_code
0       1   70399  0.988375            0
1       1   33919  0.981573            0
2       1   62461  0.981427            0
3     579       1  0.983019            1
4     745       1  0.995580            2
5     834       1  0.980943            3
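The codes above only renumber source. A minimal sketch of one way to carry the same numbering over to target follows; the `lookup` Series and the `map`/`fillna` step are my own assumptions, not part of the answer above. Target values that never occur in source are left unchanged.

```python
import pandas as pd

d = [
    (1, 70399, 0.988375133622),
    (1, 33919, 0.981573492596),
    (1, 62461, 0.981426807114),
    (579, 1, 0.983018778374),
    (745, 1, 0.995580488899),
    (834, 1, 0.980942505189),
]
df = pd.DataFrame(d, columns=['source', 'target', 'weight'])

# Category codes number the distinct source values in sorted order.
codes = df['source'].astype('category').cat.codes.astype('int64')

# Hypothetical lookup (not in the answer): original source value -> code.
lookup = pd.Series(codes.values, index=df['source'].values)
lookup = lookup[~lookup.index.duplicated()]

# Replace target values that are also source values; keep all others.
df['target'] = df['target'].map(lookup).fillna(df['target']).astype(int)
df['source'] = codes

print(df)
```

Because category codes follow the sorted order of the distinct values, this matches the asker's expected output here only because source already appears in ascending order.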

Answer 2

You can use:

# remember the original values
source_old = df.source.copy()

df.source = (df.source.diff() != 0).cumsum() - 1

# Series for mapping: index = original source value, value = new code
ser = pd.Series(df.source.values, index=source_old).drop_duplicates()
print (ser)
source
1      0
579    1
745    2
834    3
dtype: int32

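# map only where the target value is one of the original source values
# (check against ser.index; isin(ser) would test the new codes instead)
df.target = df.target.mask(df.target.isin(ser.index), df.target.map(ser)).astype(int)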

print (df)
   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943


Answer 1
0 2016-08-18 23:30:21

Answer 2
0 2016-08-19 07:02:02
