How do I replace string values in a pandas DataFrame with integers?

Question

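I have a pandas DataFrame that contains several string values. I want to replace them with integer values so that I can compute similarities. For example: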

stores[['CNPJ_Store_Code','region','total_facings']].head()
Out[24]: 
    CNPJ_Store_Code      region  total_facings
1    93209765046613   Geo RS/SC       1.471690
16   93209765046290   Geo RS/SC       1.385636
19   93209765044084  Geo PR/SPI       0.217054
21   93209765044831   Geo RS/SC       0.804633
23   93209765045218  Geo PR/SPI       0.708165

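I want to replace region == 'Geo RS/SC' ==> 1, region == 'Geo PR/SPI' ==> 2, and so on.

Clarification: I would like the replacement to happen automatically, without creating a dictionary first, because I do not know in advance what my regions will be. Any ideas? I tried using DictVectorizer, without success.

I am sure there is a smart way to do this, but I cannot find it.

Is anyone familiar with a solution?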

Answer 1

You can use the .apply() function with a dictionary that maps every known string value to its corresponding integer value:

region_dictionary = {'Geo RS/SC': 1, 'Geo PR/SPI' : 2, .... }
stores['region'] = stores['region'].apply(lambda x: region_dictionary[x])
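Since the question asks to avoid writing the dictionary by hand, the same idea can be sketched with a mapping built from the data itself. This is only an illustration: the sample frame reuses a few rows from the question, the `region_num` column name is an assumption, and codes are assigned in order of first appearance, starting at 1.

```python
import pandas as pd

# Sample data modelled on the question's output (a few rows copied from it).
stores = pd.DataFrame(
    {
        "CNPJ_Store_Code": [93209765046613, 93209765046290, 93209765044084],
        "region": ["Geo RS/SC", "Geo RS/SC", "Geo PR/SPI"],
    }
)

# Build the mapping from the data: one code per distinct region,
# numbered from 1 in order of first appearance.
region_dictionary = {
    name: code for code, name in enumerate(stores["region"].unique(), start=1)
}

# Series.map looks each value up in the dictionary.
# "region_num" is a hypothetical column name chosen for this sketch.
stores["region_num"] = stores["region"].map(region_dictionary)

print(region_dictionary)
print(stores)
```

Writing to a new column keeps the original labels available, which makes it easier to check the mapping afterwards.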

Answer 2

It looks like what you really want is pandas categoricals:

http://pandas-docs.github.io/pandas-docs-travis/categorical.html

I think you just need to change the dtype of your text column to "category".

stores['region'] = stores["region"].astype('category')
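Converting to "category" keeps the region labels visible; the integers the question asks for are available through the `.cat.codes` accessor. A minimal sketch on made-up sample rows follows (the `region_code` column name is an assumption). Note that the codes start at 0 and follow the sorted order of the categories, not the order of appearance, and that missing values get the code -1.

```python
import pandas as pd

# Small sample frame using region names from the question.
stores = pd.DataFrame(
    {"region": ["Geo RS/SC", "Geo RS/SC", "Geo PR/SPI", "Geo RS/SC"]}
)

# Convert the text column to a categorical dtype.
stores["region"] = stores["region"].astype("category")

# Each category has an integer code; categories are sorted,
# so 'Geo PR/SPI' comes before 'Geo RS/SC'.
# "region_code" is a hypothetical column name chosen for this sketch.
stores["region_code"] = stores["region"].cat.codes

print(list(stores["region"].cat.categories))
print(stores)
```

If the numbering must start at 1, as in the question, adding 1 to the codes would do it.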

Answer 3

You can do:

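import pandas as pd

filename = 'stores.csv'  # hypothetical path; the original does not name the file
df = pd.read_csv(filename, index_col=0)  # Assuming it's a csv file.

def region_to_numeric(a):
    # Map each known region name to its integer code.
    if a == 'Geo RS/SC':
        return 1
    if a == 'Geo PR/SPI':
        return 2
    # Any other region falls through and returns None.


df['region_num'] = df['region'].apply(region_to_numeric)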
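The function above only covers the two regions it names. When the regions are not known in advance, as the question states, `pd.factorize` is one way to number them automatically. The sketch below uses made-up sample rows rather than the CSV file; codes are shifted by 1 so they start at 1 as in the question.

```python
import pandas as pd

# Sample frame standing in for the CSV data.
df = pd.DataFrame(
    {"region": ["Geo RS/SC", "Geo RS/SC", "Geo PR/SPI", "Geo RS/SC", "Geo PR/SPI"]}
)

# factorize returns integer codes (starting at 0, in order of first
# appearance) and the distinct values they refer to.
codes, uniques = pd.factorize(df["region"])

# Shift by one so the first region seen gets 1, the next 2, and so on.
df["region_num"] = codes + 1

print(list(uniques))
print(df)
```

`uniques` can be kept to translate the integers back into region names later. Missing values are coded as -1 by `factorize`, so they would become 0 after the shift.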

3 answers

Answer 1
4 2015-08-06 07:07:05

Answer 2
4 2015-08-06 07:45:52

Answer 3
0 2015-08-06 19:51:59
