Create new rows in a Pandas DataFrame based on a column from another Pandas DataFrame
I have a DataFrame DF1 that looks like this:
| Account Name | Task Type | Flag | Cost |
|---|---|---|---|
| Account 1 | Repair | True | $100 |
| Account 2 | Repair | True | $200 |
| Account 3 | Repair | False | $300 |
DF2 looks like this:
| Country | Percentage |
|---|---|
| US | 30% |
| Canada | 20% |
| India | 50% |
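I want to create DF3 from DF1 and DF2 as follows: every row of DF1 whose Flag is True is repeated once per country in DF2, with Calculated_Cost = Cost * Percentage; every row whose Flag is False appears once, with Country and Calculated_Cost left empty (NaN).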
The final output would look like this:
| Account Name | Task Type | Flag | Cost | Country | Calculated_Cost |
|---|---|---|---|---|---|
| Account 1 | Repair | True | $100 | US | $30 |
| Account 1 | Repair | True | $100 | Canada | $20 |
| Account 1 | Repair | True | $100 | India | $50 |
| Account 2 | Repair | True | $200 | US | $60 |
| Account 2 | Repair | True | $200 | Canada | $40 |
| Account 2 | Repair | True | $200 | India | $100 |
| Account 3 | Repair | False | $300 | NaN | NaN |
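The expected output follows one rule: a flagged account is split across every country by that country's percentage, and an unflagged account appears once with no country. The plain-Python sketch below (no pandas) spells that rule out on the sample values. It assumes the `$` and `%` strings have already been parsed into integers; the names `accounts`, `countries` and `rows` are illustrative only.

```python
# Sample rows from DF1 and DF2, with "$100" -> 100 and "30%" -> 30 already parsed (assumed)
accounts = [("Account 1", "Repair", True, 100),
            ("Account 2", "Repair", True, 200),
            ("Account 3", "Repair", False, 300)]
countries = [("US", 30), ("Canada", 20), ("India", 50)]

rows = []
for name, task, flag, cost in accounts:
    if flag:
        # Flagged account: one output row per country, cost scaled by that country's percentage
        for country, pct in countries:
            rows.append((name, task, flag, cost, country, cost * pct / 100))
    else:
        # Unflagged account: kept once, with no country and no calculated cost
        rows.append((name, task, flag, cost, None, None))

for row in rows:
    print(row)
```

The seven printed tuples correspond one-to-one with the rows of the table above, with `None` standing in for NaN.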
Use:
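import pandas as pd
# Strip '$' and '%' so both columns become numeric (Percentage as a fraction)
df1['Cost'] = df1['Cost'].str.lstrip('$').astype(int)
df2['Percentage'] = df2['Percentage'].str.rstrip('%').astype(int).div(100)
# Flag must be boolean; cross-join flagged rows with every country (how='cross' needs pandas >= 1.2), then append unflagged rows
df = pd.concat([df1[df1['Flag']].merge(df2, how='cross'), df1[~df1['Flag']]])
# Unflagged rows have no Percentage (NaN), so their Calculated_Cost is NaN as well
df['Calculated_Cost'] = df['Cost'].mul(df.pop('Percentage'))
print(df)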
Account Name Task Type Flag Cost Country Calculated_Cost
0 Account 1 Repair True 100 US 30.0
1 Account 1 Repair True 100 Canada 20.0
2 Account 1 Repair True 100 India 50.0
3 Account 2 Repair True 200 US 60.0
4 Account 2 Repair True 200 Canada 40.0
5 Account 2 Repair True 200 India 100.0
2 Account 3 Repair False 300 NaN NaN
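The answer above relies on Flag being a boolean column. With the string values `'True'`/`'False'` used in the question's own code, `df1[df1['Flag']]` and `~df1['Flag']` fail. A minimal end-to-end sketch, assuming the question's sample data: the `.eq('True')` conversion and `ignore_index=True` are additions not in the original answer.

```python
import pandas as pd

# Sample data as in the question; Flag holds strings here, not booleans
df1 = pd.DataFrame({
    'Account Name': ['Account 1', 'Account 2', 'Account 3'],
    'Task Type': ['Repair', 'Repair', 'Repair'],
    'Flag': ['True', 'True', 'False'],
    'Cost': ['$100', '$200', '$300'],
})
df2 = pd.DataFrame({'Country': ['US', 'Canada', 'India'],
                    'Percentage': ['30%', '20%', '50%']})

# Boolean indexing needs a bool column, so map the strings to True/False first
df1['Flag'] = df1['Flag'].eq('True')
df1['Cost'] = df1['Cost'].str.lstrip('$').astype(int)
df2['Percentage'] = df2['Percentage'].str.rstrip('%').astype(int).div(100)

# how='cross' needs pandas >= 1.2; ignore_index=True renumbers the result 0..n-1
df = pd.concat([df1[df1['Flag']].merge(df2, how='cross'), df1[~df1['Flag']]],
               ignore_index=True)
df['Calculated_Cost'] = df['Cost'].mul(df.pop('Percentage'))
print(df)
```

Apart from the index (0 to 6 instead of 0 to 5 followed by 2), the result matches the output shown above.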
I'm sure there is a more efficient way to do this, but I got it done with the following code:
import pandas as pd
df1 = pd.DataFrame(
{
'Account Name': ['Account 1', 'Account 2', 'Account 3'],
'Task Type': ['Repair', 'Repair', 'Repair'],
'Flag': ['True', 'True', 'False'],
'Cost': ['$100', '$200', '$300']
}
)
df2 = pd.DataFrame(
{
'Country': ['US', 'Canada', 'India'],
'Percentage': ['30%', '20%', '50%']
}
)
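# Strip the symbols and convert to numbers (Percentage as a fraction)
df1['Cost'] = df1['Cost'].str.lstrip('$').astype(int)
df2['Percentage'] = df2['Percentage'].str.rstrip('%').astype(int).div(100)
# Flag holds the strings 'True'/'False' here, so filter by string comparison
filtered_df_true = df1.loc[df1['Flag'] == 'True']
filtered_df_false = df1.loc[df1['Flag'] == 'False']
# Cross join via a constant dummy key: every flagged row pairs with every country
df3 = filtered_df_true.assign(key=1).merge(df2.assign(key=1), how='outer', on='key')
df3['Calculated_Cost'] = df3['Cost'] * df3['Percentage']
# Append the unflagged rows, then drop the helper columns
frames = [df3, filtered_df_false]
result = pd.concat(frames)
result.pop('key')
result.pop('Percentage')
print(result)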
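One possibly simpler alternative avoids splitting and re-concatenating DF1 by using a single left merge on a helper key. This is an editor's sketch, not a method from the thread. `_key` is a hypothetical helper column name, and the sketch assumes Cost and Percentage are already numeric and Flag is boolean.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Account Name': ['Account 1', 'Account 2', 'Account 3'],
    'Task Type': ['Repair', 'Repair', 'Repair'],
    'Flag': [True, True, False],
    'Cost': [100, 200, 300],
})
df2 = pd.DataFrame({'Country': ['US', 'Canada', 'India'],
                    'Percentage': [0.30, 0.20, 0.50]})

# Helper join key: each DF1 row carries its Flag, and every DF2 row carries True.
# A left merge therefore fans flagged rows out to all countries, while unflagged
# rows find no match and keep NaN in the DF2 columns.
df3 = (df1.assign(_key=df1['Flag'])
          .merge(df2.assign(_key=True), on='_key', how='left')
          .drop(columns='_key'))
df3['Calculated_Cost'] = df3['Cost'] * df3.pop('Percentage')
print(df3)
```

A left merge keeps the left frame's row order and returns a fresh 0..n-1 index, so no separate `concat` or helper-column cleanup beyond the one `drop` is needed.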
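Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.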