I appreciate the help in advance!
The question may seem weird at first so let me illustrate what I am trying to accomplish:
I have this df of cities and abbreviations:
I need to add another column called 'Queries', and those queries are in a list as follows:
queries = ['Document Management','Document Imaging','Imaging Services']
The trick, though, is that I need to duplicate my df rows for each query in the list. For instance, row 0 is PHOENIX, AZ, so I now need 3 rows saying PHOENIX, AZ, 'query[n]'.
Something that would look like this:
Of course I created that manually but I need to scale it for a large number of cities and a large list of queries.
This sounds simple, but I've been trying for some hours now and I don't see how to write the code for it. Again, thanks for the help!
Here is one way, using .explode():
import pandas as pd
df = pd.DataFrame({'City_Name': ['Phoenix', 'Tucson', 'Mesa', 'Los Angeles'],
'State': ['AZ', 'AZ', 'AZ', 'CA']})
# 'Query' is a column of tuples
df['Query'] = [('Doc Mgmt', 'Imaging', 'Services')] * len(df.index)
# ... and explode 'unpacks' the tuples, putting one item on each line
df = df.explode('Query')
print(df)
City_Name State Query
0 Phoenix AZ Doc Mgmt
0 Phoenix AZ Imaging
0 Phoenix AZ Services
1 Tucson AZ Doc Mgmt
1 Tucson AZ Imaging
1 Tucson AZ Services
2 Mesa AZ Doc Mgmt
2 Mesa AZ Imaging
2 Mesa AZ Services
3 Los Angeles CA Doc Mgmt
3 Los Angeles CA Imaging
3 Los Angeles CA Services
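The same idea can be applied directly to the `queries` list from the question. This is a minimal sketch, assuming a small two-city frame as stand-in data; the name `result` and the `reset_index(drop=True)` step are additions for illustration, giving each output row its own index label instead of the repeated 0, 0, 0, 1, 1, 1 shown above.

```python
import pandas as pd

# Stand-in data (assumption); the real frame would hold many more cities.
df = pd.DataFrame({'City_Name': ['PHOENIX', 'TUCSON'],
                   'State': ['AZ', 'AZ']})
queries = ['Document Management', 'Document Imaging', 'Imaging Services']

# Store the whole list in every row, aligned on df's own index ...
df['Queries'] = pd.Series([queries] * len(df), index=df.index)

# ... then explode it into one row per query and renumber the rows.
result = df.explode('Queries').reset_index(drop=True)
print(result)
```

Rows stay grouped by city, in the original city order.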
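I'm new to Python myself, but I would get around it by creating n identical data frames without "Query" (n = number of unique query values). Then, for each data frame, create a new column holding one of the "Query" values. Finally, stack all the data frames together using pd.concat (DataFrame.append did the same job in older versions but was removed in pandas 2.0). A short example:
import pandas as pd
adf1 = pd.DataFrame([['city1','state1'],['city2','state2']])
# .copy() is needed: plain "adf2 = adf1" would make both names point at the same frame
adf2 = adf1.copy()
adf1['query'] = 'doc management'
adf2['query'] = 'doc imaging'
df = pd.concat([adf1, adf2], ignore_index=True)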
Another method, useful if there are many queries: create a dummy column, say 'key', in both the original data frame and the query data frame, and merge the two on 'key'.
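adf = pd.DataFrame([['city1','state1'],['city2','state2']], columns=['city','state'])
q = pd.DataFrame([['doc management'],['doc imaging']], columns=['query'])
# the same constant in both frames makes the merge pair every city with every query
adf['key'] = 'key'
q['key'] = 'key'
df = pd.merge(adf, q, on='key', how='outer').drop(columns='key')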
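On pandas 1.2 or later, the dummy key can be skipped because `merge` accepts `how='cross'`, which builds the Cartesian product directly. A minimal sketch, assuming illustrative column names (`City_Name`, `State`, `Queries`) and frame names (`cities`, `result`) that are not from the answer above:

```python
import pandas as pd

# Illustrative frames; column names are assumptions for this sketch.
cities = pd.DataFrame({'City_Name': ['city1', 'city2'],
                       'State': ['state1', 'state2']})
q = pd.DataFrame({'Queries': ['doc management', 'doc imaging']})

# how='cross' pairs every row of `cities` with every row of `q`
# (requires pandas >= 1.2); no helper 'key' column is needed.
result = cities.merge(q, how='cross')
print(result)
```

The output has len(cities) * len(q) rows, so memory use grows quickly when both frames are large.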
More advanced users should have better ways. This is a temporary solution if you are in a hurry.
You should definitely go with jsmart's answer, but I'm posting this as an exercise.
This can also be achieved by exporting the original cities/towns dataframe (df) to a list of records, manually duplicating each one for each query, then reconstructing the final dataframe.
The entire thing can fit in a single expression, and is even relatively readable if you can follow what's going on ;)
pd.DataFrame([{**record, 'query': query}
              for query in queries
              for record in df.to_dict(orient='records')])
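Note that with `for query in queries` as the outer loop, the result is grouped by query rather than by city. If the layout from the question is wanted (all queries for one city, then the next city), swapping the two loops should do it. A small self-contained sketch, using stand-in data and the hypothetical names `records` and `by_city`:

```python
import pandas as pd

# Stand-in data (assumption) so the snippet runs on its own.
df = pd.DataFrame({'City_Name': ['PHOENIX', 'TUCSON'],
                   'State': ['AZ', 'AZ']})
queries = ['Document Management', 'Document Imaging']

# Convert once, outside the comprehension, to avoid repeating the export.
records = df.to_dict(orient='records')

# Outer loop over cities, inner loop over queries: rows come out grouped by city.
by_city = pd.DataFrame([{**record, 'query': query}
                        for record in records
                        for query in queries])
print(by_city)
```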