
How to Replace Dataframe Column Values Based on Condition of Second Dataframe Values

When replacements.csv > Link Changed is 'Yes' for a fruit, I want to do the following:

  • match replacements.csv > Fruit against the items in main.csv > External Links
  • replace the matching fruits in main.csv > External Links with replacements.csv > Fruit Link

To demonstrate, here are the input files and the required output:

replacements.csv

Fruit,Fruit Link,Link Changed
banana,https://en.wikipedia.org/wiki/Banana,
blueberry,https://en.wikipedia.org/wiki/Blueberry,
strawberry,https://en.wikipedia.org/wiki/Strawberry,Yes
raspberry,https://en.wikipedia.org/wiki/Raspberry,Yes
cherry,https://en.wikipedia.org/wiki/Cherry,
apple,https://en.wikipedia.org/wiki/Apple,Yes  
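The rule in the bullets above only applies to rows whose Link Changed is 'Yes'. A minimal sketch, assuming the file is read with `pd.read_csv` as in the code below (so empty cells become NaN), of building a fruit-to-link lookup for just those rows. `io.StringIO` stands in for the real file, and `csv_text`, `changed` and `lookup` are illustrative names:

```python
import io

import pandas as pd

# Inline copy of replacements.csv so the sketch runs without the file
csv_text = """Fruit,Fruit Link,Link Changed
banana,https://en.wikipedia.org/wiki/Banana,
blueberry,https://en.wikipedia.org/wiki/Blueberry,
strawberry,https://en.wikipedia.org/wiki/Strawberry,Yes
raspberry,https://en.wikipedia.org/wiki/Raspberry,Yes
cherry,https://en.wikipedia.org/wiki/Cherry,
apple,https://en.wikipedia.org/wiki/Apple,Yes
"""

replacements = pd.read_csv(io.StringIO(csv_text))

# Keep only rows flagged 'Yes'; empty cells were read as NaN, which never equals 'Yes'
changed = replacements[replacements["Link Changed"] == "Yes"]
lookup = dict(zip(changed["Fruit"], changed["Fruit Link"]))
print(lookup)
```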

main.csv

Title,External Links
Smoothie Recipes,"['banana', 'blueberry', 'strawberry', 'raspberry', 'apple']"
Fruit Pies,"['cherry', 'apple']"  
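Note that each External Links cell is read back as a plain string that only looks like a Python list. A small sketch, assuming the same file layout, showing that `ast.literal_eval` turns that text into a real list (again with `io.StringIO` standing in for the file):

```python
import ast
import io

import pandas as pd

# Inline copy of main.csv; the quoted cells hold text that looks like a Python list
csv_text = """Title,External Links
Smoothie Recipes,"['banana', 'blueberry', 'strawberry', 'raspberry', 'apple']"
Fruit Pies,"['cherry', 'apple']"
"""

main = pd.read_csv(io.StringIO(csv_text))
print(type(main.loc[0, "External Links"]))  # a str, not a list

# ast.literal_eval safely evaluates the literal text into a Python list
main["External Links"] = main["External Links"].apply(ast.literal_eval)
print(main.loc[1, "External Links"])
```

Working on real lists means a fruit is compared as a whole item rather than as part of a longer string.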

Required output

Title,External Links
Smoothie Recipes,"['banana', 'blueberry', 'https://en.wikipedia.org/wiki/Strawberry', 'https://en.wikipedia.org/wiki/Raspberry', 'https://en.wikipedia.org/wiki/Apple']"
Fruit Pies,"['cherry', 'https://en.wikipedia.org/wiki/Apple']"  
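Once the lists are parsed, the required output can be produced without exploding the column at all. A sketch of one alternative (not taken from the answers below): apply a list comprehension to each row that looks every fruit up in a dict. `lookup` is a hypothetical name for the 'Yes' mapping, and the data is hard-coded from the CSVs above:

```python
import pandas as pd

# Lookup of fruits whose Link Changed is 'Yes' (values taken from replacements.csv)
lookup = {
    "strawberry": "https://en.wikipedia.org/wiki/Strawberry",
    "raspberry": "https://en.wikipedia.org/wiki/Raspberry",
    "apple": "https://en.wikipedia.org/wiki/Apple",
}

# main.csv after its External Links strings have been parsed into lists
main = pd.DataFrame({
    "Title": ["Smoothie Recipes", "Fruit Pies"],
    "External Links": [
        ["banana", "blueberry", "strawberry", "raspberry", "apple"],
        ["cherry", "apple"],
    ],
})

# Swap each whole list element for its link if it is in the lookup, else keep it
main["External Links"] = main["External Links"].apply(
    lambda links: [lookup.get(fruit, fruit) for fruit in links]
)

# to_csv writes each list with str(), which gives the "['a', 'b']" format above
print(main.to_csv(index=False))
```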

Code

import pandas as pd

replacements = pd.read_csv('replacements.csv')
main = pd.read_csv('main.csv')

all_scrapes = []
fruits_found = []

## Replace main.csv > External Links when replacements.csv > Link Changed = Yes
def swap_urls(fruit, fruit_link):

    counter = 0

    while counter < len(main):
        title = main['Title'][counter]
        external_links = main['External Links'][counter]

        fruit_count = len(external_links.split(","))
        # strip quotes, brackets and spaces so the row can be split on ","
        fruit_item_row = main['External Links'][counter].replace("'","").replace("[","").replace("]","").replace(" ","")

        items = 0
        while items < fruit_count:
            single_fruit_list = fruit_item_row.split(',')[items]

            if fruit in single_fruit_list:
                print('Current Fruit Item:', single_fruit_list)
                external_links = external_links.replace(fruit, fruit_link)
                #fruits_found.append(fruit)

                product = {
                    'Title': title,
                    'External Link': external_links,
                    #'Fruits Found': fruits_found,
                    }

                print('  Product:', product)
                all_scrapes.append(product)
            else:
                pass

            items +=1

        counter +=1
    return all_scrapes


## Pass Fruit & Fruit Link values to function swap_urls when replacements.csv > Link Changed = Yes
y = 0
while y < len(replacements):
    fruit = replacements['Fruit'][y]
    fruit_link = replacements['Fruit Link'][y]
    link_changed = replacements['Link Changed'][y]

    if link_changed == 'Yes':
        print(f'replacement.csv row [{y}]: {fruit}, Fruit Link: {fruit_link}, Link Changed: \x1b[92m{link_changed}\x1b[0m')
        swap_urls(fruit, fruit_link)
    else:
        print(f'replacement.csv row [{y}]: {fruit}, Fruit Link: {fruit_link}, Link Changed: No')
    y +=1


## Save results to File
df = pd.DataFrame(all_scrapes)
print('DF:\n', df)
df.to_excel('Result.xlsx', index=False)

Issue

I'm able to match the fruits in replacements.csv with their counterparts in main.csv, but when several fruits are found in the same row I can't update main.csv > External Links as a single entry. See the generated output file Result.xlsx.

Any help would be much appreciated.
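In the question's code, `all_scrapes.append(product)` runs inside the per-fruit loop, and `swap_urls` is called once per 'Yes' fruit, so every match adds another row. A minimal sketch of the same loop-based idea, restructured so each main.csv row is appended exactly once after all of its fruits are handled. It builds a dict of the 'Yes' fruits first and compares whole items, whereas `in` and `str.replace` also match substrings. The two-argument `swap_urls(main, replacements)` is an illustrative rewrite, not a patch to the original function, and the data is hard-coded from the CSVs above:

```python
import pandas as pd

# Hard-coded copies of the two CSVs (External Links kept as the raw strings)
replacements = pd.DataFrame({
    "Fruit": ["banana", "blueberry", "strawberry", "raspberry", "cherry", "apple"],
    "Fruit Link": [f"https://en.wikipedia.org/wiki/{name}" for name in
                   ["Banana", "Blueberry", "Strawberry", "Raspberry", "Cherry", "Apple"]],
    "Link Changed": [None, None, "Yes", "Yes", None, "Yes"],
})
main = pd.DataFrame({
    "Title": ["Smoothie Recipes", "Fruit Pies"],
    "External Links": ["['banana', 'blueberry', 'strawberry', 'raspberry', 'apple']",
                       "['cherry', 'apple']"],
})


def swap_urls(main, replacements):
    """Return one product per main.csv row, with every 'Yes' fruit replaced."""
    changed = replacements[replacements["Link Changed"] == "Yes"]
    lookup = dict(zip(changed["Fruit"], changed["Fruit Link"]))

    all_scrapes = []
    for title, links in zip(main["Title"], main["External Links"]):
        # Turn the text "['a', 'b']" into ['a', 'b'] by dropping quotes and brackets
        fruits = links.replace("'", "").strip("[]").split(", ")
        # Compare whole items, so e.g. 'apple' would not match inside 'pineapple'
        new_links = [lookup.get(fruit, fruit) for fruit in fruits]
        # Append once per row, after every fruit in that row has been handled
        all_scrapes.append({"Title": title, "External Link": str(new_links)})
    return all_scrapes


df = pd.DataFrame(swap_urls(main, replacements))
print(df)
```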

import pandas as pd

replacements = pd.read_csv("replacements.csv")
main = pd.read_csv("main.csv")


# returns the replacement link if the fruit is flagged 'Yes', otherwise the fruit itself
def fruit_link(x):
    fruit = x[1:-1]  # strip the surrounding quotes: "'apple'" -> "apple"
    match = replacements.loc[replacements['Fruit'] == fruit]
    if match.empty or match['Link Changed'].iloc[0] != 'Yes':
        return fruit
    return match['Fruit Link'].iloc[0]


# split string of list to list
main["External Links"] = main["External Links"].apply(lambda x: x[1:-1].split(', '))

# explode main to fruits
main = main.explode("External Links")

# applying fruit_link to retrieve link or fruit
main["External Links"] = main["External Links"].apply(fruit_link)

# implode back
main = main.groupby('Title').agg({'External Links': lambda x: x.tolist()}).reset_index()

OUTPUT:

              Title                                                                                                                                   External Links
0        Fruit Pies                                                                                                  ['cherry', https://en.wikipedia.org/wiki/Apple]
1  Smoothie Recipes  ['banana', 'blueberry', https://en.wikipedia.org/wiki/Strawberry, https://en.wikipedia.org/wiki/Raspberry, https://en.wikipedia.org/wiki/Apple]
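In the output above the rows come back in alphabetical order (Fruit Pies first), because `groupby` sorts its keys by default. If the original row order matters, passing `sort=False` keeps titles in order of first appearance. A small sketch on a hand-built, shortened version of the exploded intermediate frame (`exploded` and `imploded` are illustrative names):

```python
import pandas as pd

# Shortened exploded frame: one fruit or link per row, as after main.explode(...)
exploded = pd.DataFrame({
    "Title": ["Smoothie Recipes"] * 3 + ["Fruit Pies"] * 2,
    "External Links": ["banana", "blueberry", "https://en.wikipedia.org/wiki/Apple",
                       "cherry", "https://en.wikipedia.org/wiki/Apple"],
})

# groupby sorts group keys by default, so 'Fruit Pies' would come first;
# sort=False keeps the order in which titles first appear
imploded = (
    exploded.groupby("Title", sort=False)["External Links"]
    .agg(list)
    .reset_index()
)
print(imploded)
```

Grouping by the index instead, as the next answer does with `groupby(level=0)`, also avoids merging two rows that happen to share a title.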

Here is a relatively simple way to do this:

import ast

import pandas as pd

r = pd.read_csv('replacements.csv')
df = pd.read_csv('main.csv')

# make a proper list from the strings in 'External Links':
df['External Links'] = df['External Links'].apply(ast.literal_eval)

# make a dict for mapping, keeping only the fruits whose Link Changed is 'Yes'
dct = r.loc[r['Link Changed'] == 'Yes'].set_index('Fruit')['Fruit Link'].to_dict()
>>> dct
{'strawberry': 'https://en.wikipedia.org/wiki/Strawberry',
 'raspberry': 'https://en.wikipedia.org/wiki/Raspberry',
 'apple': 'https://en.wikipedia.org/wiki/Apple'}

# map, leaving the key by default
df['External Links'] = (
    df['External Links'].explode().map(lambda k: dct.get(k, k))
    .groupby(level=0).apply(pd.Series.tolist)
)

# result
>>> df
              Title                                     External Links
0  Smoothie Recipes  [banana, blueberry, https://en.wikipedia.org/w...
1        Fruit Pies      [cherry, https://en.wikipedia.org/wiki/Apple]

# result, as csv (to show quotation marks etc.)
>>> print(df.to_csv(index=False))
Title,External Links
Smoothie Recipes,"['banana', 'blueberry', 'https://en.wikipedia.org/wiki/Strawberry', 'https://en.wikipedia.org/wiki/Raspberry', 'https://en.wikipedia.org/wiki/Apple']"
Fruit Pies,"['cherry', 'https://en.wikipedia.org/wiki/Apple']"
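A variation on the mapping step: `Series.replace` with a dict performs the same "replace if present, otherwise keep" lookup as `dct.get(k, k)`, and `.agg(list)` is an equivalent way to rebuild the lists. A sketch, with `dct` and the parsed lists hard-coded from the steps above:

```python
import pandas as pd

dct = {
    "strawberry": "https://en.wikipedia.org/wiki/Strawberry",
    "raspberry": "https://en.wikipedia.org/wiki/Raspberry",
    "apple": "https://en.wikipedia.org/wiki/Apple",
}
df = pd.DataFrame({
    "Title": ["Smoothie Recipes", "Fruit Pies"],
    "External Links": [
        ["banana", "blueberry", "strawberry", "raspberry", "apple"],
        ["cherry", "apple"],
    ],
})

# Series.replace with a dict swaps whole values only (regex=False by default)
# and leaves values that are not keys untouched, like dct.get(k, k)
df["External Links"] = (
    df["External Links"].explode().replace(dct)
    .groupby(level=0).agg(list)
)
print(df)
```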
