Add to a DataFrame column

Question

I would like to append a number to the values in the CellID column in order to classify them. The dataframe is this:

umap
                           CellID  wnnUMAP_1  wnnUMAP_2
0      KO_d0_r1:AAACAGCCACCTGCTCx  -8.127543   1.593849
1      KO_d0_r2:AAACAGCCACGTAATTx  -7.246094  -4.566527
2      HT_d0_r1:AAACAGCCATAATGAGx   7.617473   2.449949
3      HT_d0_r2:AAACATGCACCTAATGx  -7.944949   6.633856

And the result I want is this:

 umap
                               CellID    wnnUMAP_1   wnnUMAP_2
    0      KO_d0_r1:AAACAGCCACCTGCTCx-0  -8.127543   1.593849
    1      KO_d0_r2:AAACAGCCACGTAATTx-1  -7.246094  -4.566527
    2      HT_d0_r1:AAACAGCCATAATGAGx-2   7.617473   2.449949
    3      HT_d0_r2:AAACATGCACCTAATGx-3  -7.944949   6.633856

I would like to add -0 to KO_d0_r1, -1 to KO_d0_r2, -2 to HT_d0_r1 and -3 to HT_d0_r2. This is just an example; I have a lot of strings with prefixes such as KO_d0_r1, so I would like to distinguish them by the suffix. My attempt was:

umap = umap.rename(columns = {'Unnamed: 0':'CellID'})

But it doesn't work.

Answer 1

You can use .str.cat() to concatenate strings.

df["CellID"] = df["CellID"].str.cat([df.index.map(str)], sep="-")

https://pandas.pydata.org/docs/reference/api/pandas.Series.str.cat.html

import pandas as pd

data = [["KO_d0_r1:AAACAGCCACCTGCTCx", -8.127543, 1.593849],
        ["KO_d0_r2:AAACAGCCACGTAATTx", -7.246094, -4.566527],
        ["HT_d0_r1:AAACAGCCATAATGAGx", 7.617473, 2.449949]]

df = pd.DataFrame(data, columns=["CellID", "wnnUMAP_1", "wnnUMAP_2"])
df["CellID"] = df["CellID"].str.cat([df.index.map(str)], sep="-")

df is now:

                         CellID  wnnUMAP_1  wnnUMAP_2
0  KO_d0_r1:AAACAGCCACCTGCTCx-0  -8.127543   1.593849
1  KO_d0_r2:AAACAGCCACGTAATTx-1  -7.246094  -4.566527
2  HT_d0_r1:AAACAGCCATAATGAGx-2   7.617473   2.449949

Answer 2

Another, simpler approach that doesn't require a mapping, which is especially useful if you have a large number of unique CellID values.

If there are no duplicates in df['CellID']:

df['CellID'] = df['CellID'] + '-' + (df.index + 1).astype(str)
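A minimal runnable sketch of the no-duplicates case, assuming the default 0, 1, 2, ... index and using two sample rows from the question. Note that `df.index + 1` starts the numbering at 1, not at 0 as in the question's desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "CellID": ["KO_d0_r1:AAACAGCCACCTGCTCx", "KO_d0_r2:AAACAGCCACGTAATTx"],
    "wnnUMAP_1": [-8.127543, -7.246094],
    "wnnUMAP_2": [1.593849, -4.566527],
})

# Append the 1-based row position; drop the "+ 1" for 0-based suffixes.
df["CellID"] = df["CellID"] + "-" + (df.index + 1).astype(str)
print(df["CellID"].tolist())
```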

If df['CellID'] contains duplicates:

df
    CellID                      wnnUMAP_1   wnnUMAP_2
0   KO_d0_r1:AAACAGCCACCTGCTCx  -8.127543   1.593849
1   KO_d0_r2:AAACAGCCACGTAATTx  -7.246094   -4.566527
2   HT_d0_r1:AAACAGCCATAATGAGx  7.617473    2.449949
3   HT_d0_r2:AAACATGCACCTAATGx  -7.944949   6.633856
4   HT_d0_r2:AAACATGCACCTAATGx  -6.944949   2.633856
5   HT_d0_r2:AAACATGCACCTAATGx  -5.944949   3.633856

df = df.merge((df['CellID'].drop_duplicates() + '-' + (df['CellID'].drop_duplicates().index + 1).astype(str)).reset_index(name='CellID_classified').eval('CellID= CellID_classified.str.split("-").str[0]').drop('index', axis=1), on='CellID', how='left').drop('CellID', axis=1)

df
    wnnUMAP_1   wnnUMAP_2   CellID_classified
0   -8.127543   1.593849    KO_d0_r1:AAACAGCCACCTGCTCx-1
1   -7.246094   -4.566527   KO_d0_r2:AAACAGCCACGTAATTx-2
2   7.617473    2.449949    HT_d0_r1:AAACAGCCATAATGAGx-3
3   -7.944949   6.633856    HT_d0_r2:AAACATGCACCTAATGx-4
4   -6.944949   2.633856    HT_d0_r2:AAACATGCACCTAATGx-4
5   -5.944949   3.633856    HT_d0_r2:AAACATGCACCTAATGx-4
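As a possible alternative to the merge one-liner (not part of the original answer), `pd.factorize` can give identical CellID values the same number. This sketch keeps the CellID column and numbers the unique values consecutively in order of first appearance. That can differ from the one-liner above, which uses the row index of the first occurrence, whenever a duplicate appears before a new value.

```python
import pandas as pd

df = pd.DataFrame({
    "CellID": [
        "KO_d0_r1:AAACAGCCACCTGCTCx",
        "HT_d0_r2:AAACATGCACCTAATGx",
        "HT_d0_r2:AAACATGCACCTAATGx",
    ],
    "wnnUMAP_1": [-8.127543, -7.944949, -6.944949],
})

# factorize() returns integer codes in order of first appearance,
# so identical CellID values share one code.
codes, uniques = pd.factorize(df["CellID"])
suffix = pd.Series(codes + 1, index=df.index).astype(str)
df["CellID_classified"] = df["CellID"] + "-" + suffix
print(df["CellID_classified"].tolist())
```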

Answer 3

Create a dictionary that maps each prefix to the suffix value of interest. Then split CellID on : with n=1, which splits at most once, take the first part, and call Series.map with the dictionary. Finally, join the result with the CellID column.

mapping = {'KO_d0_r1':'0', 'KO_d0_r2':'1', 'HT_d0_r1': '2', 'HT_d0_r2':'3'}

umap['CellID']=umap['CellID']\
               +'-'\
               +umap['CellID'].str.split(':', n=1).str[0].map(mapping)

OUTPUT

                         CellID  wnnUMAP_1  wnnUMAP_2
0  KO_d0_r1:AAACAGCCACCTGCTCx-0  -8.127543   1.593849
1  KO_d0_r2:AAACAGCCACGTAATTx-1  -7.246094  -4.566527
2  HT_d0_r1:AAACAGCCATAATGAGx-2   7.617473   2.449949
3  HT_d0_r2:AAACATGCACCTAATGx-3  -7.944949   6.633856

PS: map returns NaN for values that could not be mapped, which may throw a TypeError. For this data, I assumed that every prefix exists in the mapping; otherwise, you may want to handle it.
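One way to handle unmapped prefixes, sketched under assumptions: fill the NaN values produced by map with a placeholder before concatenating. The `WT_d0_r1` row and the `"NA"` placeholder are made up for illustration and are not in the original data.

```python
import pandas as pd

umap = pd.DataFrame({
    "CellID": [
        "KO_d0_r1:AAACAGCCACCTGCTCx",
        "WT_d0_r1:AAACAGCCATAATGAGx",  # hypothetical prefix absent from the mapping
    ],
})
mapping = {"KO_d0_r1": "0", "KO_d0_r2": "1", "HT_d0_r1": "2", "HT_d0_r2": "3"}

prefix = umap["CellID"].str.split(":", n=1).str[0]
# "NA" is a hypothetical placeholder for prefixes missing from the mapping.
suffix = prefix.map(mapping).fillna("NA")
umap["CellID"] = umap["CellID"] + "-" + suffix
print(umap["CellID"].tolist())
```

Checking `prefix.map(mapping).isna().any()` first and raising an error is another option if an unknown prefix should never occur.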

If you are not so concerned about the specific suffixes and just need a unique number assigned to each prefix, you can also use groupby and then call ngroup():

umap['CellID'] = umap['CellID'] \
                 + '-' \
                 + (umap
                    .groupby(umap['CellID'].str.split(':', n=1).str[0], sort=False)
                    .ngroup()
                    .astype('str')
                    )

3 solutions

Solution 1
1 2022-09-24 12:57:01

Solution 2
1 2022-09-24 13:32:05

Solution 3
0 2022-09-24 12:55:41
