
How to replace elements in Pandas Dataframes with list values based on specific conditions?

I have a CSV file with two columns, Query and Description. Here is a sample of its contents:

| Query                                        | Description |
| --------                                     | -------------- |
| What is the type of \<mach-name> machine>    |  \<mach-name> is ...       |
| What is the use of \<mach-name> machine>     |  The use of \<mach-name> is ...         |
| How long it takes to rain in \<state-name>   | It rains for ... hours in \<state-name>          |
| What is the best restaurant in \<state-name> | \<state-name>'s best food is in ...         |

... etc.

Each row of the Query and Description columns holds a unique string like these. Suppose the CSV file is read with Pandas into a dataframe df. The goal is to replace the \<> placeholders, such as \<mach-name>, based on specific conditions.

Each \<> tag should be replaced with the elements of the corresponding list:

mach_name = ["Drilling", "ABC", "XYZ", ... etc.]
state_name = ["New York", "London", "Delhi", ... etc.]

Example: if \<mach-name> appears in both the Query and Description columns of a row, replace the tag with each element of the mach_name list. So if, e.g., the mach_name list has 10 elements, each such row expands into 10 rows, and the new sentences need to be appended to the dataframe df. The expected output would look like this:

| Query                                   | Description |
| --------                                | -------------- |
| What is the type of Drilling machine.   |  Drilling is ...        |
| What is the type of ABC machine.        |  ABC is ...        |
| What is the type of XYZ machine.        |  XYZ is ...      |
| What is the use of Drilling machine     |  The use of Drilling is ...        |
| What is the use of ABC machine          |  The use of ABC is ...       |
| What is the use of XYZ machine.         |  The use of XYZ is ...       |
| How long it takes to rain in New York   | It rains for ... hours in New York          |
| How long it takes to rain in London     | It rains for ... hours in London          |
| How long it takes to rain in Delhi      | It rains for ... hours in Delhi          |
| What is the best restaurant in New York | New York's best food is in ...         |
| What is the best restaurant in London   | London's best food is in ...         |
| What is the best restaurant in Delhi    | Delhi's best food is in ...         |

.... etc.

I was hoping to do a simple Python replacement, using str.replace() for instance, but that would likely involve a for loop over the Pandas dataframe, and SO answers recommend against iterating over a dataframe. I couldn't find a clear way to replace values based on such conditions while also appending new rows for each list element. Any help/guidance is appreciated. Thanks.

This would be easier if you read the raw CSV, process it, and then convert the result to a pandas dataframe. If you need to load it into a dataframe first, this could be an option:

import pandas as pd

# Sample data. Tags are written as plain "<mach-name>" / "<state-name>";
# if your CSV holds a literal backslash before "<", include it in the tag strings below.
data = [
    {"query": "What is the type of <mach-name> machine>", "description": "<mach-name> is ..."},
    {"query": "What is the use of <mach-name> machine>", "description": "The use of <mach-name> is ..."},
    {"query": "How long it takes to rain in <state-name>", "description": "It rains for ... hours in <state-name>"},
]

df = pd.DataFrame(data)

# Mark rows that satisfy the conditions (tag present in both columns).
# regex=False treats the tag as a literal string, not a regular expression.
df["replace_mach"] = (df["query"].str.contains("<mach-name>", regex=False)
                      & df["description"].str.contains("<mach-name>", regex=False))
df["replace_state"] = (df["query"].str.contains("<state-name>", regex=False)
                       & df["description"].str.contains("<state-name>", regex=False))

dfs_list = []
mach_name = ["Drilling", "ABC", "XYZ"]
state_name = ["New York", "London", "Delhi"]

# One copy of the matching rows per list element, with the tag replaced
for n in mach_name:
    aux = df[df["replace_mach"]].copy()
    aux["query"] = aux["query"].str.replace("<mach-name>", n, regex=False)
    aux["description"] = aux["description"].str.replace("<mach-name>", n, regex=False)
    dfs_list.append(aux)

for n in state_name:
    aux = df[df["replace_state"]].copy()
    aux["query"] = aux["query"].str.replace("<state-name>", n, regex=False)
    aux["description"] = aux["description"].str.replace("<state-name>", n, regex=False)
    dfs_list.append(aux)

# Add records without tags to the dataframe
dfs_list.append(df[~(df["replace_mach"] | df["replace_state"])])

replaced_df = pd.concat(dfs_list)
replaced_df
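
The first suggestion above (process the raw CSV rows, then build the dataframe) might look roughly like the sketch below. It is only an illustration under some assumptions: the `REPLACEMENTS` mapping and the `expand_rows` helper are hypothetical names, the tags are assumed to appear as plain `<mach-name>` / `<state-name>` text, and the in-memory sample (including the made-up third row without a tag) stands in for the real file.

```python
import csv
import io

# Hypothetical mapping from tag to replacement values (lists taken from the question).
REPLACEMENTS = {
    "<mach-name>": ["Drilling", "ABC", "XYZ"],
    "<state-name>": ["New York", "London", "Delhi"],
}


def expand_rows(rows, replacements):
    """Return one copy of each row per replacement value of the tag it contains.

    A tag applies to a row only if it appears in every column of that row;
    rows with no applicable tag are kept unchanged.
    """
    expanded = []
    for row in rows:
        matched = False
        for tag, values in replacements.items():
            if all(tag in text for text in row.values()):
                matched = True
                for value in values:
                    expanded.append(
                        {col: text.replace(tag, value) for col, text in row.items()}
                    )
        if not matched:
            expanded.append(dict(row))
    return expanded


# Stand-in for the real file; with a file on disk, use open(path, newline="") instead.
raw_csv = io.StringIO(
    "Query,Description\n"
    "What is the type of <mach-name> machine,<mach-name> is ...\n"
    "How long it takes to rain in <state-name>,It rains for ... hours in <state-name>\n"
    "What is pandas,A Python library\n"
)
rows = list(csv.DictReader(raw_csv))
expanded = expand_rows(rows, REPLACEMENTS)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(expanded)`. Because each row is expanded in place, the output is grouped by original sentence, as in the expected output of the question, rather than by list element.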
