Creating labels from one DataFrame column based on groups in another column (pandas, Python)

Question

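Below is a simplified version of my DataFrame: I have thousands of gene pairs that are repeated across different cell type combinations. There are 3 cell types, so 9 combinations are possible.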

Gene pairs	cell_types	other data
gene4_gene5	cell1_cell2
gene1_gene2	cell1_cell1
gene1_gene2	cell1_cell3
gene2_gene3	cell3_cell2
gene4_gene5	cell2_cell2
gene4_gene5	cell1_cell2

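Within each set of identical gene pairs (I used groupby here), I want to check whether certain cell_types combinations are present. If they are, for example if "cell1_cell2", "cell1_cell3" and "cell1_cell1" all exist for one gene pair, then I want to give that gene pair a label in a new column saying "cell1 is a universal sender". A gene pair can have more than one label. I would like the column to be added to my original df so that it can serve as metadata. I have looked at multiple questions and videos and cannot get the code right. Could anyone give me a hand? Thanks a lot.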

Answer 1

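Given that you want to keep the original data structure, one solution is to use df.loc to find all values in the cell_types column that belong to a given value in the "Gene pairs" column, convert them to a list, and check whether every value in a predefined list of cell types that defines a "universal sender" appears in that list: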

import pandas as pd

data = [ { "Gene pairs": "gene4_gene5", "cell_types": "cell1_cell2" }, { "Gene pairs": "gene1_gene2", "cell_types": "cell1_cell1" }, { "Gene pairs": "gene1_gene2", "cell_types": "cell1_cell3" }, { "Gene pairs": "gene2_gene3", "cell_types": "cell3_cell2" }, { "Gene pairs": "gene4_gene5", "cell_types": "cell1_cell1" }, { "Gene pairs": "gene4_gene5", "cell_types": "cell1_cell3" } ]
df=pd.DataFrame(data)
df['new column'] = df['Gene pairs'].apply(lambda x: "universal sender" if all(item in df.loc[df['Gene pairs'] == x]['cell_types'].tolist() for item in ["cell1_cell2", "cell1_cell3", "cell1_cell1"]) else None)

Output:

|    | Gene pairs   | cell_types   | new column       |
|---:|:-------------|:-------------|:-----------------|
|  0 | gene4_gene5  | cell1_cell2  | universal sender |
|  1 | gene1_gene2  | cell1_cell1  |                  |
|  2 | gene1_gene2  | cell1_cell3  |                  |
|  3 | gene2_gene3  | cell3_cell2  |                  |
|  4 | gene4_gene5  | cell1_cell1  | universal sender |
|  5 | gene4_gene5  | cell1_cell3  | universal sender |
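
As a side note, the one-liner above filters the whole frame again for every row, which may get slow with thousands of gene pairs. Below is a minimal sketch of the same check done once per group with groupby. The names `required`, `observed` and `is_sender` are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Gene pairs": ["gene4_gene5", "gene1_gene2", "gene1_gene2",
                   "gene2_gene3", "gene4_gene5", "gene4_gene5"],
    "cell_types": ["cell1_cell2", "cell1_cell1", "cell1_cell3",
                   "cell3_cell2", "cell1_cell1", "cell1_cell3"],
})

# Combinations that must all be present for the label (taken from the answer).
required = {"cell1_cell2", "cell1_cell3", "cell1_cell1"}

# One set of observed cell_types per gene pair, built once per group.
observed = df.groupby("Gene pairs")["cell_types"].apply(set)

# True for gene pairs whose observed set contains every required combination.
is_sender = observed.apply(required.issubset)

# Broadcast the per-group result back onto the rows of the original frame.
df["new column"] = None
df.loc[df["Gene pairs"].map(is_sender), "new column"] = "universal sender"
```

On this sample data it should label the same three gene4_gene5 rows as the table above.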

Alternatively, you can wrap the df.loc lookup in a function for better readability, or if you want to add further filters:

def lookup(row):
  # Collect every cell_types value recorded for this row's gene pair.
  cells = df.loc[df['Gene pairs'] == row['Gene pairs'], 'cell_types'].tolist()
  # Label the row only if all three cell1 combinations are present.
  if all(item in cells for item in ["cell1_cell2", "cell1_cell3", "cell1_cell1"]):
    return "universal sender"
  return None

df['new column'] = df.apply(lookup, axis=1)
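
Since the question mentions that a gene pair can carry several labels, here is one way the idea might be extended. It is only a sketch: the `rules` table, the `labels_for` helper, the `labels` column name and the "; " separator are all assumptions. Only the cell1 rule comes from the question; the cell2 and cell3 rules simply follow the same pattern.

```python
import pandas as pd

df = pd.DataFrame({
    "Gene pairs": ["gene4_gene5", "gene1_gene2", "gene1_gene2",
                   "gene2_gene3", "gene4_gene5", "gene4_gene5"],
    "cell_types": ["cell1_cell2", "cell1_cell1", "cell1_cell3",
                   "cell3_cell2", "cell1_cell1", "cell1_cell3"],
})

cell_types = ["cell1", "cell2", "cell3"]

# Hypothetical rule table: label -> combinations that must all be present.
# A sender is treated as "universal" if it is paired with every cell type.
rules = {
    f"{sender} is a universal sender": {f"{sender}_{receiver}" for receiver in cell_types}
    for sender in cell_types
}

def labels_for(observed):
    """Join every label whose required combinations all occur in `observed`."""
    observed = set(observed)
    matched = [label for label, required in rules.items() if required <= observed]
    return "; ".join(matched)  # empty string when no rule matches

# Evaluate the rules once per gene pair, then map the result back to each row.
per_pair = df.groupby("Gene pairs")["cell_types"].apply(set).apply(labels_for)
df["labels"] = df["Gene pairs"].map(per_pair)
```

Keeping the rules in a dictionary means further labels can be added without touching the lookup logic. If separate columns per label are preferred, the same `rules` table could drive one boolean column per entry instead of a joined string.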

Solution 1
0 2021-01-28 19:07:48
