So I'm new to Python and I'm trying to use Pandas to make a new dataframe using values from two existing ones. Basically using these dataframes:
df1=
    A   B
a  '1' '3'
b  '4' '3'
c  '3' '2'
d  '9' '1'
df2=
    C   D
a  '5' '1'
b  '2' '0'
c  '4' '2'
d  '1' '9'
I need to create a loop that will compare the value of each line in df1[A] to the value of each line in df2[C]. If the values are equal, I need to join df1[A, B] and df2[D] and push that line to a third dataframe. So the result should look like this for the examples above:
dfnew=
    A   B   D
a  '1' '3' '9'
b  '4' '3' '2'
Since not all the values I'm working with will be integers I also need to treat the values as strings.
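For anyone who wants to reproduce this, here is a minimal sketch of the two frames. It assumes the quotes in the tables above only indicate that the values are strings, not that the quote characters are part of the data. The astype(str) calls do nothing for this sample; they only show one way to force string comparison if the real data has mixed types.

```python
import pandas as pd

# Sample frames from the question; every value is stored as a string.
df1 = pd.DataFrame(
    {"A": ["1", "4", "3", "9"], "B": ["3", "3", "2", "1"]},
    index=list("abcd"),
)
df2 = pd.DataFrame(
    {"C": ["5", "2", "4", "1"], "D": ["1", "0", "2", "9"]},
    index=list("abcd"),
)

# Assumption: if real data mixes ints and strings, convert everything
# to str first so that 1 and "1" compare as equal keys.
df1 = df1.astype(str)
df2 = df2.astype(str)

print(df1)
print(df2)
```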
I've been checking out other similar questions but none of the answers seem to get me exactly what I need done.
I think you need merge with the default inner join, followed by drop:
df = pd.merge(df1, df2, left_on='A', right_on='C').drop('C', axis=1)
Another solution is to rename the column before the join:
df = pd.merge(df1, df2.rename(columns={'C':'A'}), on='A')
print(df)
     A    B    D
0  '1'  '3'  '9'
1  '4'  '3'  '2'
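Note that merge returns a fresh 0..n-1 index, while the expected result in the question keeps the labels a and b. If those labels matter, one possible sketch is to carry df1's index through the merge as a column. The "index" column name below is what reset_index produces for an unnamed index, and the sample frames are rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": ["1", "4", "3", "9"], "B": ["3", "3", "2", "1"]},
    index=list("abcd"),
)
df2 = pd.DataFrame(
    {"C": ["5", "2", "4", "1"], "D": ["1", "0", "2", "9"]},
    index=list("abcd"),
)

out = (
    df1.reset_index()                        # move labels a..d into a column named "index"
       .merge(df2, left_on="A", right_on="C")  # default inner join on A == C
       .drop(columns="C")                    # C duplicates A after the join
       .set_index("index")                   # restore df1's labels
       .rename_axis(None)                    # drop the leftover index name
)
print(out)
```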
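Notice:
Values in the column you join on from df2 (here C) have to be unique. Otherwise each matching row of df1 is repeated once for every duplicate.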
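To see what the uniqueness note means in practice, here is a small sketch with a made-up df2_dup in which the key '1' appears twice in C (hypothetical data, not from the question). The merge does not fail; it silently returns an extra row. The validate argument of merge is one way to turn that into an error.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["1", "4"], "B": ["3", "3"]})
# Hypothetical right-hand frame: the key '1' occurs twice in C.
df2_dup = pd.DataFrame({"C": ["1", "1", "4"], "D": ["9", "7", "2"]})

merged = pd.merge(df1, df2_dup, left_on="A", right_on="C").drop("C", axis=1)
print(len(merged))  # 3: the df1 row with A == '1' is matched twice

# validate="many_to_one" asks pandas to check that the right-hand keys are unique.
try:
    pd.merge(df1, df2_dup, left_on="A", right_on="C", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```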
You can also use pd.Series.map:
df1.assign(D=df1.A.map(dict(zip(df2.C, df2.D)))).dropna()
A B D
a '1' '3' '9'
b '4' '3' '2'
Details
With just the map and assign, we are left with rows that we need to drop.
df1.assign(D=df1.A.map(df2.set_index('C').D))
A B D
a '1' '3' '9'
b '4' '3' '2'
c '3' '2' NaN
d '9' '1' NaN
I decided to drop them with a simple dropna. To be more precise, we probably should restrict the dropna to the D column.
df1.assign(D=df1.A.map(df2.set_index('C').D)).dropna(subset=['D'])
A B D
a '1' '3' '9'
b '4' '3' '2'
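Putting the map steps together, a self-contained sketch might look like this. The frames are rebuilt from the question's sample data, and the names lookup and result are only illustrative. Unlike merge, this keeps df1's original index labels.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": ["1", "4", "3", "9"], "B": ["3", "3", "2", "1"]},
    index=list("abcd"),
)
df2 = pd.DataFrame(
    {"C": ["5", "2", "4", "1"], "D": ["1", "0", "2", "9"]},
    index=list("abcd"),
)

# Series indexed by C whose values are D; acts as a C -> D lookup table.
lookup = df2.set_index("C")["D"]

# Values of A with no match in C become NaN in the new D column.
with_nan = df1.assign(D=df1["A"].map(lookup))

# Drop only the rows where the lookup found nothing.
result = with_nan.dropna(subset=["D"])
print(result)
```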
We could use other ways as well. But then that wasn't really what this question was about.