
Append columns from excel file to csv file based on if statement

I have two files:

  • One with 'filename' and 'value_count' columns ( ValueCounts.csv )
  • Another with 'filename', 'latitude' and 'longitude' columns ( GeoData.xlsx )

I have started by creating a dataframe for each file, keeping only the columns I intend to use. My code for this is as follows:

Xeno_values = pd.read_csv(r'C:\file_path\ValueCounts.csv')
img_coords = pd.read_excel(r'C:\file_path\GeoData.xlsx')

df_values = pd.DataFrame(Xeno_values, columns = ['A','B'])
df_coords = pd.DataFrame(img_coords, columns = ['L','M','W'])

However, when I print() each dataframe, all the column values are returned as 'NaN'.

How do I correct this? And how do I then write an if statement that iterates over the data and says:

if 'filename' ( col 'A' ) in df_values == 'filename' ( col 'W' ) in df_coords , append 'latitude' ( col 'L' ) and 'longitude' ( col 'M' ) to df_values

If any clarification is needed please do ask.

Thanks, R

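Check out the documentation for pandas read_csv and read_excel ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html ). These functions already return the data as a dataframe. Your code wraps that dataframe in a second pd.DataFrame call, which is fine if you don't specify columns. If you do, pandas selects columns by those labels, and any label that doesn't exist in the data comes back as all NaN. That is what happens if 'A', 'B', 'L', 'M' and 'W' are spreadsheet column letters rather than the actual header names in your files.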
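
To make the NaN behaviour concrete, here is a minimal sketch. The small dataframe is a hypothetical stand-in for ValueCounts.csv, since the real header names aren't shown in the question; I'm assuming they are 'filename' and 'value_count'.

```python
import pandas as pd

# Hypothetical stand-in for what pd.read_csv would return for ValueCounts.csv.
loaded = pd.DataFrame({"filename": ["a.jpg", "b.jpg"], "value_count": [3, 5]})

# 'A' and 'B' are not labels in `loaded`, so both columns are created empty (all NaN).
wrong = pd.DataFrame(loaded, columns=["A", "B"])

# Labels that really exist keep their data.
right = pd.DataFrame(loaded, columns=["filename", "value_count"])

print(wrong)
print(right)
```

Printing df.columns on the dataframe you loaded is a quick way to see which labels are actually available.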

So if you want to load the dataframes:

df_values = pd.read_csv(r'C:\file_path\ValueCounts.csv')
df_coords = pd.read_excel(r'C:\file_path\GeoData.xlsx')

Will do the trick. And if you just want specific columns:

df_values = pd.read_csv(r'C:\file_path\ValueCounts.csv', usecols=['A','B'])
df_coords = pd.read_excel(r'C:\file_path\GeoData.xlsx', usecols=['L','M','W'])

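Make sure that those column names actually exist in your files. With a list of strings, usecols is matched against the header names in both read_csv and read_excel. If you want to pick Excel columns by letter instead, read_excel accepts a single string such as usecols='L,M,W'.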
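
One way to check the names before selecting is to read only the header row first. This is just a sketch: the CSV text below is made up and read from memory so the example runs without your files, and the 'notes' column is purely illustrative.

```python
import io
import pandas as pd

# Made-up CSV content standing in for ValueCounts.csv.
csv_text = "filename,value_count,notes\na.jpg,3,x\nb.jpg,5,y\n"

# nrows=0 reads just the header, which is enough to list the column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)
print(list(header.columns))

# Once the names are known, usecols can select by header name.
df_values = pd.read_csv(io.StringIO(csv_text), usecols=["filename", "value_count"])
print(df_values)
```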

If you want to rename columns (assigning to .columns needs one name for every column, in order):

df_values.columns = ['filename', 'value_count']
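
If you only need to rename some of the columns, DataFrame.rename with a mapping is an alternative that leaves the others untouched. A small sketch, assuming for illustration that the loaded headers really are 'A' and 'B':

```python
import pandas as pd

# Hypothetical dataframe whose headers are literally 'A' and 'B'.
df_values = pd.DataFrame({"A": ["a.jpg", "b.jpg"], "B": [3, 5]})

# Only 'A' is renamed; 'B' keeps its name. rename returns a new dataframe.
renamed = df_values.rename(columns={"A": "filename"})
print(list(renamed.columns))
```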

For adding lat/long to df_values you could try:

df = pd.merge(df_values, df_coords[['filename', 'LAT', 'LONG']], on='filename', how='inner')

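This assumes that both the values and coords dataframes have a 'filename' column, and that the coords dataframe has columns 'LAT' and 'LONG'. Note that how='inner' keeps only the filenames found in both dataframes. If you want to keep every row of df_values and leave the coordinates empty where there is no match, use how='left' instead.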
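
To see the merge in action without your files, here is a sketch on made-up data. The filenames and coordinates are invented, and the column names 'LAT' and 'LONG' follow the assumption above. It also shows the result being turned back into CSV text, which is what you would write to disk.

```python
import pandas as pd

# Invented sample data; 'c.jpg' has no coordinates and 'z.jpg' has no value count.
df_values = pd.DataFrame(
    {"filename": ["a.jpg", "b.jpg", "c.jpg"], "value_count": [3, 5, 1]}
)
df_coords = pd.DataFrame(
    {
        "filename": ["a.jpg", "b.jpg", "z.jpg"],
        "LAT": [51.5, 48.9, 40.7],
        "LONG": [-0.1, 2.4, -74.0],
    }
)

coords = df_coords[["filename", "LAT", "LONG"]]

# Inner join: only filenames present in both dataframes survive.
inner = pd.merge(df_values, coords, on="filename", how="inner")

# Left join: every row of df_values is kept; unmatched rows get NaN coordinates.
left = pd.merge(df_values, coords, on="filename", how="left")

print(inner)
print(left)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_out = left.to_csv(index=False)
print(csv_out)
```

No explicit if statement or loop is needed: the merge does the filename comparison for every row at once.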

Lastly, do check out a tutorial on pandas ( https://www.tutorialspoint.com/python_pandas/index.htm ). Becoming more familiar with it will help you wrangle data better.
