Create multiple DataFrames from a single DataFrame based on column conditions

Question

I'm new to pandas and Python, so thanks in advance. I have a table:

import pandas as pd

# Create DataFrame
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'shoe', 'CAS1': 4},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53},
        {'analyte': 'sample2'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2, 'CAS2': 1, 'Value2': 7.88},
        {'analyte': 'money', 'CAS1': 3},
        {'analyte': 'shoe', 'CAS1': 4, 'CAS2': 3, 'Value2': 15.5},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7}]
df = pd.DataFrame(data)

Before writing the pandas DataFrame to a MySQL database table, I need to split df into separate tables and then write each of them to the database.

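How can I split df by column? I was thinking of something like: if a column name contains the string "cas1", split that column off into its own DataFrame: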

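# Sort the column names into groups; the match ignores case because the
# columns are named "CAS1"/"CAS2" rather than "cas1"/"cas2"
cas1_cols, cas2_cols, main_cols = [], [], []
for col in df.columns:
    if "cas1" in col.lower():
        cas1_cols.append(col)
    elif "cas2" in col.lower() or col == "Value2":  # assumed: Value2 goes with CAS2
        cas2_cols.append(col)
    else:  # main table: "analyte", "id", etc.
        main_cols.append(col)

# Selecting columns keeps df's index, which identifies the row each
# sub-table entry belongs to
dfMain = df[main_cols]
dfCas1 = df[cas1_cols]
dfCas2 = df[cas2_cols]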

dfMain.to_sql("Main", dbConnection, if_exists='fail')
dfCas1.to_sql("cas1", dbConnection, if_exists='fail')
dfCas2.to_sql("cas2", dbConnection, if_exists='fail')
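The to_sql calls above need a live MySQL connection (dbConnection). A self-contained way to try the same three calls is sketched below, under a few assumptions: an in-memory SQLite database stands in for MySQL, only a subset of the rows is used, and the key column name row_id (passed via index_label) is hypothetical. Because every sub-table keeps df's index, writing that index gives the tables a shared key they can be joined on.

```python
import sqlite3

import pandas as pd

# A subset of the question's data, for brevity
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11}]
df = pd.DataFrame(data)

# Split by column name; each sub-table keeps df's index.
# dropna() removes rows that have no CAS1 / CAS2 entry.
dfMain = df[['analyte']]
dfCas1 = df[['CAS1']].dropna()
dfCas2 = df[['CAS2', 'Value2']].dropna()

# Assumption: in-memory SQLite instead of the MySQL connection.
# index_label='row_id' (hypothetical name) stores the shared key column.
conn = sqlite3.connect(':memory:')
dfMain.to_sql('Main', conn, if_exists='fail', index_label='row_id')
dfCas1.to_sql('cas1', conn, if_exists='fail', index_label='row_id')
dfCas2.to_sql('cas2', conn, if_exists='fail', index_label='row_id')

# Join a sub-table back to the main table on the shared key
rows = conn.execute(
    'SELECT m.analyte, c.CAS2, c.Value2 FROM Main m '
    'JOIN cas2 c ON m.row_id = c.row_id'
).fetchall()
print(rows)
```

With a real MySQL server, the SQLite connection would be replaced by whatever connection or SQLAlchemy engine dbConnection refers to.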

Expected result:

Answer 1

I'm not 100% sure this is what you mean, but:

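# 'col' stands for the column whose values are searched;
# na=False treats missing values as "no match"
dfCas1 = df[df['col'].str.contains('cas1', na=False)]
dfCas2 = df[df['col'].str.contains('cas2', na=False)]
dfMain = df[~(df['col'].str.contains('cas1', na=False) | df['col'].str.contains('cas2', na=False))]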

The ~ operator negates the selection, so dfMain gets every row whose column value contains neither cas1 nor cas2. I hope that makes sense.
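Answer 1 builds boolean masks from the values in a column. In the question's data, "CAS1"/"CAS2" appear in the column names, so the same str.contains plus ~ idea can be applied to df.columns instead. A minimal sketch, assuming Value2 belongs in the CAS2 group (the 'cas2|value2' pattern is an assumption, not part of the answer):

```python
import pandas as pd

data = [{'analyte': 'sample1'},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11}]
df = pd.DataFrame(data)

# Boolean masks over the column *names*; case=False because the
# columns are spelled "CAS1"/"CAS2"
is_cas1 = df.columns.str.contains('cas1', case=False)
is_cas2 = df.columns.str.contains('cas2|value2', case=False)  # assumed grouping

dfCas1 = df.loc[:, is_cas1]
dfCas2 = df.loc[:, is_cas2]
# ~ negates the combined mask: columns matching neither pattern
dfMain = df.loc[:, ~(is_cas1 | is_cas2)]

print(list(dfMain.columns), list(dfCas1.columns), list(dfCas2.columns))
```

Here the masks pick columns rather than rows, so all rows are kept in every sub-table.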

Answer 2

I'm not entirely sure what you are trying to achieve, but I think you want to split something like this:

+---------+----+------+--------+------+--------+
| Analyte | id | CAS1 | value1 | Cas2 | Value2 |
+---------+----+------+--------+------+--------+
|         |    |      |        |      |        |
+---------+----+------+--------+------+--------+

into this:

+---------+----+  +------+--------+  +------+--------+
| Analyte | id |  | CAS1 | value1 |  | Cas2 | Value2 |
+---------+----+  +------+--------+  +------+--------+
|         |    |  |      |        |  |      |        |
+---------+----+  +------+--------+  +------+--------+

The first one is obtained by calling, for example, df.loc[:, ['Analyte', 'id']]. For the others, adjust the column names accordingly.
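Applied to the question's own columns, the df.loc approach can be driven by a small mapping from table name to column list. A sketch, assuming the groups below (the question's data uses lowercase 'analyte' and has no 'id' or 'value1' column; pairing Value2 with CAS2 is an assumption):

```python
import pandas as pd

data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11}]
df = pd.DataFrame(data)

# Table name -> columns that go into that table (assumed grouping)
groups = {
    'Main': ['analyte'],
    'cas1': ['CAS1'],
    'cas2': ['CAS2', 'Value2'],
}

# df.loc[:, cols] selects the columns and keeps the original row labels
tables = {name: df.loc[:, cols] for name, cols in groups.items()}

for name, table in tables.items():
    print(name, list(table.columns), list(table.index))
```

Keeping the table names as dictionary keys makes it easy to loop over the tables later, for example to call to_sql once per entry.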

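As for the unique index mentioned in your code comments: df.loc[:] keeps the index of the original table. You can reset it to a unique integer index with df.reset_index(). If you also want to drop empty rows from one of the sub-tables before parsing it, have a look at df.dropna().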
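A short sketch of combining dropna() and reset_index() on one sub-table, using a subset of the question's rows. Note that reset_index() does not discard the old labels by default: it moves them into a column (named 'index' when the original index is unnamed), so each row can still be traced back to the original DataFrame. Passing drop=True would discard them instead.

```python
import pandas as pd

data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11}]
df = pd.DataFrame(data)

# Keep only rows where at least one CAS2-group value is present
dfCas2 = df.loc[:, ['CAS2', 'Value2']].dropna(how='all')

# New 0..n-1 index; the old row labels move into an 'index' column
dfCas2 = dfCas2.reset_index()
print(dfCas2)
```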

2 answers

Answer 1
1 2020-11-25 18:23:59

Answer 2
1 Accepted 2020-11-25 18:44:52