Copy a row value to another column based on condition using Python pandas
Pandas Python Copy row based column value
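I am working with data from a csv file, which I read into a dataframe with pd.read_csv. If an entry has a value in the "Mobile_phone" column, I want to duplicate that row and put the "Mobile_phone" value in the "Work_phone" column.
This is the data I start with: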
   Full name        Work_phone  Mobile_phone  Company
1  Amanda Brown     1234567896  77895641225   A company
2  Bert Sutherland  1234567897                B company
3  Charlie Chaplin  1234567898                C company
4  Derek Simpson    1234567899  77895641228   D company
This is the data I want to get back. It removes the need for the "Mobile_phone" data, so that I can match against another dataset:
   Full name        Work_phone   Mobile_phone  Company
1  Amanda Brown     1234567896                 A company
2  Amanda Brown     77895641225                A company
3  Bert Sutherland  1234567897                 B company
4  Charlie Chaplin  1234567898                 C company
5  Derek Simpson    1234567899                 D company
6  Derek Simpson    77895641228                D company
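We can use set_index + stack to reshape from wide to long format. Then droplevel cleans up the old column headers, reset_index restores the RangeIndex and turns the result back into a DataFrame, and finally the columns are re-ordered: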
new_df = (
    df.set_index(['Full name', 'Company'])  # columns to keep
      .stack()  # go to long format
      .droplevel(-1)  # remove old column headers
      .reset_index(name='Work_phone')  # restore index and name new column
      [['Full name', 'Work_phone', 'Company']]  # re-order columns
)
new_df:
   Full name        Work_phone   Company
0  Amanda Brown     1234567896   A company
1  Amanda Brown     77895641225  A company
2  Bert Sutherland  1234567897   B company
3  Charlie Chaplin  1234567898   C company
4  Derek Simpson    1234567899   D company
5  Derek Simpson    77895641228  D company
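To see what each step of the chain contributes, the intermediate results can be inspected one at a time. This is only a minimal sketch on the same sample data: the names `indexed`, `stacked` and `flat` are illustrative, and the explicit `dropna()` is an addition that should keep the result the same on pandas versions where `stack` does not discard missing values by itself.

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company'],
})

# Step 1: the columns to keep become the index, so only the two
# phone columns remain as data.
indexed = df.set_index(['Full name', 'Company'])

# Step 2: stack moves the phone column labels into a third index level,
# giving one value per (name, company, phone column). dropna() is added
# here (an assumption, not part of the original chain) so missing mobile
# numbers are removed regardless of how stack treats NaN.
stacked = indexed.stack().dropna()
print(stacked.index.nlevels)  # three index levels

# Step 3: drop the innermost level ('Work_phone' / 'Mobile_phone').
flat = stacked.droplevel(-1)

# Step 4: turn the index back into columns and name the value column.
new_df = flat.reset_index(name='Work_phone')
print(new_df)
```

At this point the columns come out as 'Full name', 'Company', 'Work_phone', which is why the original chain finishes with a column selection to put 'Work_phone' in the middle.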
Additionally, if needed, we can reindex instead of selecting columns, which also adds the Mobile_phone column back:
new_df = (
    df.set_index(['Full name', 'Company'])  # columns to keep
      .stack()  # go to long format
      .droplevel(-1)  # remove old column headers
      .reset_index(name='Work_phone')  # restore index and name new column
      .reindex(
          # re-order columns and add missing columns
          columns=['Full name', 'Work_phone', 'Mobile_phone', 'Company']
      )
)
new_df:
   Full name        Work_phone   Mobile_phone  Company
0  Amanda Brown     1234567896   NaN           A company
1  Amanda Brown     77895641225  NaN           A company
2  Bert Sutherland  1234567897   NaN           B company
3  Charlie Chaplin  1234567898   NaN           C company
4  Derek Simpson    1234567899   NaN           D company
5  Derek Simpson    77895641228  NaN           D company
Setup used:
import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company']
})
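Since the question reads the data from a csv file, it may be worth noting that pd.read_csv normally parses a numeric column containing blanks as floats, so a mobile number could come back as 77895641225.0. A minimal sketch, assuming the file has the four columns shown in the question, is to read the phone columns as strings. The inline `csv_text` is a stand-in for the real file, which the question does not show.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for the real csv.
csv_text = """Full name,Work_phone,Mobile_phone,Company
Amanda Brown,1234567896,77895641225,A company
Bert Sutherland,1234567897,,B company
"""

# Reading the phone columns as strings keeps the digits exactly as
# written; blank cells are still loaded as missing values.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Work_phone": str, "Mobile_phone": str},
)
print(df)
```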
Note: if Mobile_phone contains empty strings ('') instead of NaN, it may be necessary to mask those out first, otherwise stack will not automatically drop the unwanted rows:
df['Mobile_phone'] = df['Mobile_phone'].mask(df['Mobile_phone'].eq(''))
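The effect of that mask can be checked on a small frame. The sketch below is illustrative only: `to_long` is a hypothetical helper wrapping the reshape from this answer, the phone numbers are stored as strings to keep a single dtype, and `dropna()` is written out explicitly so the comparison does not rely on stack dropping missing values by default.

```python
import pandas as pd

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland'],
    'Work_phone': ['1234567896', '1234567897'],
    'Mobile_phone': ['77895641225', ''],  # empty string instead of NaN
    'Company': ['A company', 'B company'],
})

def to_long(frame):
    """Reshape one row per person into one row per phone number."""
    return (
        frame.set_index(['Full name', 'Company'])
        .stack()
        .dropna()
        .droplevel(-1)
        .reset_index(name='Work_phone')
    )

# Without masking, the '' entry counts as a real value and survives.
without_mask = to_long(df)

# With masking, '' becomes NaN first and is then dropped.
masked = df.copy()
masked['Mobile_phone'] = masked['Mobile_phone'].mask(masked['Mobile_phone'].eq(''))
with_mask = to_long(masked)

print(len(without_mask), len(with_mask))  # prints: 4 3
```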
TLDR
work_phone_df = df.drop("Mobile_phone", axis=1)
mobile_phone_df = df.drop("Work_phone", axis=1).dropna(subset=["Mobile_phone"]).rename(columns={"Mobile_phone": "Work_phone"})
new_df = pd.concat([work_phone_df, mobile_phone_df])
# if you need to sort your data and fix the index
new_df = new_df.sort_values(["Full name"]).reset_index(drop=True)
Step-by-step explanation:
First, you can get a copy of the dataframe containing each person's name, company and work phone.
work_phone_df = df.drop("Mobile_phone", axis=1)
   Full name        Work_phone  Company
0  Amanda Brown     1234567896  A company
1  Bert Sutherland  1234567897  B company
2  Charlie Chaplin  1234567898  C company
3  Derek Simpson    1234567899  D company
Then, get a copy of the dataframe with everyone who has a mobile phone, but with the "Mobile_phone" column renamed to "Work_phone".
mobile_phone_df = df.drop("Work_phone", axis=1).dropna(subset=["Mobile_phone"]).rename(columns={"Mobile_phone": "Work_phone"})
   Full name      Work_phone   Company
0  Amanda Brown   77895641225  A company
3  Derek Simpson  77895641228  D company
Now you can concatenate them together.
new_df = pd.concat([work_phone_df, mobile_phone_df])
   Full name        Work_phone   Company
0  Amanda Brown     1234567896   A company
1  Bert Sutherland  1234567897   B company
2  Charlie Chaplin  1234567898   C company
3  Derek Simpson    1234567899   D company
0  Amanda Brown     77895641225  A company
3  Derek Simpson    77895641228  D company
I'm not sure whether you need this result sorted, but you can sort the dataframe with
new_df = new_df.sort_values(["Full name"])
   Full name        Work_phone   Company
0  Amanda Brown     1234567896   A company
0  Amanda Brown     77895641225  A company
1  Bert Sutherland  1234567897   B company
2  Charlie Chaplin  1234567898   C company
3  Derek Simpson    1234567899   D company
3  Derek Simpson    77895641228  D company
If you need to renumber the index, you can do something like
new_df = new_df.reset_index(drop=True)
   Full name        Work_phone   Company
0  Amanda Brown     1234567896   A company
1  Amanda Brown     77895641225  A company
2  Bert Sutherland  1234567897   B company
3  Charlie Chaplin  1234567898   C company
4  Derek Simpson    1234567899   D company
5  Derek Simpson    77895641228  D company
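The steps above can also be collected into one helper. The following is only a sketch: the function name `expand_mobile_rows` is made up for illustration, and it sorts on the original row index with kind="stable" rather than on "Full name". That substitution is an assumption about what is wanted: it keeps the rows in their original order, with each person's work number ahead of the mobile number, even when names repeat or are not in alphabetical order.

```python
import pandas as pd
from numpy import nan

def expand_mobile_rows(df):
    """Return one row per phone number, with mobile numbers moved into Work_phone."""
    work = df.drop(columns="Mobile_phone")
    mobile = (
        df.drop(columns="Work_phone")
        .dropna(subset=["Mobile_phone"])
        .rename(columns={"Mobile_phone": "Work_phone"})
    )
    return (
        pd.concat([work, mobile])
        # A stable sort keeps tied index labels in concat order,
        # so the work row comes before the mobile row.
        .sort_index(kind="stable")
        .reset_index(drop=True)
    )

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company'],
})

new_df = expand_mobile_rows(df)
print(new_df)
```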