
Pandas Python: copy a row based on a column value

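I am working with data from a CSV file, which I read into a DataFrame with pd.read_csv. If an entry has a value in the "Mobile_phone" column, I want to duplicate that row and put the "Mobile_phone" value in the "Work_phone" column of the copy.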

This is the data I start with:

    Full name         Work_phone   Mobile_phone  Company
1   Amanda Brown      1234567896   77895641225   A company
2   Bert Sutherland   1234567897                 B company
3   Charlie Chaplin   1234567898                 C company
4   Derek Simpson     1234567899   77895641228   D company

This is the data I want to end up with. It removes the need for the "Mobile_phone" data, so that I can match against another dataset:

    Full name         Work_phone   Mobile_phone  Company
1   Amanda Brown      1234567896                 A company
2   Amanda Brown      77895641225                A company
3   Bert Sutherland   1234567897                 B company
4   Charlie Chaplin   1234567898                 C company
5   Derek Simpson     1234567899                 D company
6   Derek Simpson     77895641228                D company

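We can use set_index + stack to reshape from wide to long format. Then droplevel cleans up the old column headers, reset_index restores the RangeIndex and turns the result back into a DataFrame, and finally we reorder the columns: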

new_df = (
    df.set_index(['Full name', 'Company'])  # Columns to save
        .stack()  # go to long format
        .droplevel(-1)  # remove old column headers
        .reset_index(name='Work_phone')  # Restore Index and name new column
    [['Full name', 'Work_phone', 'Company']]  # re-order columns
)

new_df

         Full name   Work_phone    Company
0     Amanda Brown   1234567896  A company
1     Amanda Brown  77895641225  A company
2  Bert Sutherland   1234567897  B company
3  Charlie Chaplin   1234567898  C company
4    Derek Simpson   1234567899  D company
5    Derek Simpson  77895641228  D company

Alternatively, if the Mobile_phone column is still needed, we can use reindex instead of column selection to add it back:

new_df = (
    df.set_index(['Full name', 'Company'])  # Columns to save
        .stack()  # go to long format
        .droplevel(-1)  # remove old column headers
        .reset_index(name='Work_phone')  # Restore Index and name new column
        .reindex(
            # re-order columns and add missing columns
            columns=['Full name', 'Work_phone', 'Mobile_phone', 'Company']
        )
)

new_df

         Full name   Work_phone  Mobile_phone    Company
0     Amanda Brown   1234567896           NaN  A company
1     Amanda Brown  77895641225           NaN  A company
2  Bert Sutherland   1234567897           NaN  B company
3  Charlie Chaplin   1234567898           NaN  C company
4    Derek Simpson   1234567899           NaN  D company
5    Derek Simpson  77895641228           NaN  D company
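
The stack approach relies on stack dropping the NaN entries, and as far as I know that behaviour depends on the pandas version and on the dropna / future_stack options, so it is worth checking against your installed version. As a hedged alternative, here is a minimal sketch of the same wide-to-long reshape using melt with an explicit dropna. The names 'source' and 'phone' are hypothetical helper column names, not part of the original data:

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company']
})

new_df = (
    df.melt(id_vars=['Full name', 'Company'],             # columns to keep
            value_vars=['Work_phone', 'Mobile_phone'],    # columns to unpivot
            var_name='source', value_name='phone',        # hypothetical helper names
            ignore_index=False)                           # keep the original row labels
      .dropna(subset=['phone'])                           # drop rows without a number
      .sort_index(kind='mergesort')                       # stable: work phone before mobile
      .drop(columns='source')                             # old column header no longer needed
      .rename(columns={'phone': 'Work_phone'})
      .reset_index(drop=True)                             # fresh RangeIndex
      [['Full name', 'Work_phone', 'Company']]            # re-order columns
)
```

The temporary name 'phone' is used because melt does not accept a value_name that matches an existing column, and the stable sort on the preserved index keeps each person's work number ahead of their mobile number.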

Setup used:

import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company']
})
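
If the data comes from a CSV file, as in the question, a column with blank cells such as Mobile_phone will typically be parsed as float, which would turn the numbers into values like 77895641225.0. A minimal sketch of one way to avoid this is to read both phone columns as strings. The inline CSV text below is an assumption standing in for the real file:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file
csv_text = """Full name,Work_phone,Mobile_phone,Company
Amanda Brown,1234567896,77895641225,A company
Bert Sutherland,1234567897,,B company
Charlie Chaplin,1234567898,,C company
Derek Simpson,1234567899,77895641228,D company
"""

# Read both phone columns as text; blank cells still become missing values
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'Work_phone': str, 'Mobile_phone': str},
)
```

With both columns read as text, the combined Work_phone column ends up with a single consistent type, which should make matching against another dataset simpler.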

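Note: if Mobile_phone contains empty strings ('') rather than NaN, you may need to remove those with mask first; otherwise stack will not automatically drop the unwanted rows: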

df['Mobile_phone'] = df['Mobile_phone'].mask(df['Mobile_phone'].eq(''))
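
If the column may also contain whitespace-only entries, a small variation on the same mask idea could cover both cases. This is only a sketch on a made-up Series, not data from the question:

```python
import pandas as pd
from numpy import nan

# Made-up sample: one real number, an empty string, a blank string, and NaN
s = pd.Series(['77895641225', '', '   ', nan])

# Strip whitespace before comparing, so '' and '   ' are both masked to NaN
cleaned = s.mask(s.str.strip().eq(''))
```

Existing NaN values are left as they are, since the comparison with '' evaluates to False for them.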

TLDR

work_phone_df = df.drop("Mobile_phone", axis=1)
mobile_phone_df = df.drop("Work_phone", axis=1).dropna(subset=["Mobile_phone"]).rename(columns={"Mobile_phone": "Work_phone"})
new_df = pd.concat([work_phone_df, mobile_phone_df])
# if you need to sort your data and fix the index
new_df = new_df.sort_values(["Full name"]).reset_index(drop=True)

Explanation of each step:

First, take a copy of the DataFrame that contains each person's name, company, and work phone.

work_phone_df = df.drop("Mobile_phone", axis=1)

         Full name  Work_phone    Company
0     Amanda Brown  1234567896  A company
1  Bert Sutherland  1234567897  B company
2  Charlie Chaplin  1234567898  C company
3    Derek Simpson  1234567899  D company

Then take a copy of the DataFrame containing everyone who has a mobile phone, but rename the "Mobile_phone" column to "Work_phone".

mobile_phone_df = df.drop("Work_phone", axis=1).dropna(subset=["Mobile_phone"]).rename(columns={"Mobile_phone": "Work_phone"})

       Full name   Work_phone    Company
0   Amanda Brown  77895641225  A company
3  Derek Simpson  77895641228  D company

Now you can concatenate them.

new_df = pd.concat([work_phone_df, mobile_phone_df])

         Full name   Work_phone    Company
0     Amanda Brown   1234567896  A company
1  Bert Sutherland   1234567897  B company
2  Charlie Chaplin   1234567898  C company
3    Derek Simpson   1234567899  D company
0     Amanda Brown  77895641225  A company
3    Derek Simpson  77895641228  D company

I'm not sure whether you need this result sorted, but you can sort the DataFrame with sort_values:

new_df = new_df.sort_values(["Full name"])

         Full name   Work_phone    Company
0     Amanda Brown   1234567896  A company
0     Amanda Brown  77895641225  A company
1  Bert Sutherland   1234567897  B company
2  Charlie Chaplin   1234567898  C company
3    Derek Simpson   1234567899  D company
3    Derek Simpson  77895641228  D company

If you need to renumber the index, you can do something like this:

new_df = new_df.reset_index(drop=True)

         Full name   Work_phone    Company
0     Amanda Brown   1234567896  A company
1     Amanda Brown  77895641225  A company
2  Bert Sutherland   1234567897  B company
3  Charlie Chaplin   1234567898  C company
4    Derek Simpson   1234567899  D company
5    Derek Simpson  77895641228  D company
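
Sorting by "Full name" reorders the rows alphabetically, and the default sort algorithm is not guaranteed to keep each person's work number ahead of their mobile number. If the original row order should be preserved instead, one possible variation (a sketch, not part of the steps above) is a stable sort on the index that concat carries over:

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company']
})

# Same two intermediate frames as in the steps above
work_phone_df = df.drop(columns="Mobile_phone")
mobile_phone_df = (
    df.drop(columns="Work_phone")
      .dropna(subset=["Mobile_phone"])
      .rename(columns={"Mobile_phone": "Work_phone"})
)

new_df = (
    pd.concat([work_phone_df, mobile_phone_df])
      .sort_index(kind="mergesort")   # stable sort on the original row labels
      .reset_index(drop=True)         # renumber 0..n-1
)
```

Because the sort is stable and the work-phone frame comes first in the concat, each mobile row lands directly after the matching work row.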

