Pandas provides the read_fwf() function to efficiently read data from fixed-width formatted files. These files, unlike comma-separated value (CSV) files, organize data by assigning a specific number of characters to each column. This consistent width allows for structured data storage without delimiters.
The core function for reading these files is pandas.read_fwf(). Its first parameter, filepath_or_buffer, specifies the path to (or an open buffer for) your fixed-width file. Equally important is colspecs, which defines where each column starts and ends. You can pass a list of (start, end) tuples, where each pair is a half-open interval of character positions: start is included and end is excluded. Alternatively, colspecs='infer' (the default) lets pandas try to detect the column boundaries from the first infer_nrows rows of the file (100 by default).
If the columns are contiguous, the widths parameter, a list of field widths, is often more convenient than colspecs. The delimiter parameter sets the filler characters used to pad fields, which is useful when a file pads with something other than spaces (for example '~'). The dtype parameter works as it does in the other pandas read functions and lets you specify the data types of the columns.
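As a quick check that widths is simply shorthand for contiguous colspecs, the sketch below parses the same three 5-character fields both ways. It reads from an in-memory io.StringIO buffer instead of a file, and the sample text and column names are made up for illustration.

```python
import io
import pandas as pd

# Sample data: three contiguous 5-character fields per line (illustrative only)
text = "12345ABCDE67890\n98765ZYXWV54321\n"
names = ['Col1', 'Col2', 'Col3']

# Explicit half-open [start, end) ranges for each field
by_colspecs = pd.read_fwf(io.StringIO(text), colspecs=[(0, 5), (5, 10), (10, 15)], names=names)

# The same layout expressed as a list of field widths
by_widths = pd.read_fwf(io.StringIO(text), widths=[5, 5, 5], names=names)

print(by_colspecs.equals(by_widths))  # True
```

Passing both widths and an explicit colspecs list is an error, so pick whichever form matches how the file's layout is documented.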
Here’s a basic example demonstrating how to use read_fwf():
```python
import pandas as pd

# Create a small fixed-width file so the example is self-contained
with open('data.txt', 'w') as f:
    f.write('12345ABCDE67890\n98765ZYXWV54321\n')

# Define column specifications as half-open [start, end) character ranges
colspecs = [(0, 5), (5, 10), (10, 15)]

# Read the file, naming the three columns
df = pd.read_fwf('data.txt', colspecs=colspecs, names=['Col1', 'Col2', 'Col3'])
print(df)
```
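In this example, colspecs defines three five-character columns (positions 0 to 4, 5 to 9, and 10 to 14, because each end index is excluded), and names assigns column names to the resulting DataFrame. Because names is given, the first line of the file is read as data rather than as a header row.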
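The colspecs boundaries follow the same convention as ordinary Python slices. This plain-Python sketch (an illustration, not pandas internals) cuts the first line of the sample file with the same tuples:

```python
# First line of the sample file from the example above
line = '12345ABCDE67890'
colspecs = [(0, 5), (5, 10), (10, 15)]

# Each (start, end) pair acts like line[start:end]: start included, end excluded
fields = [line[start:end] for start, end in colspecs]
print(fields)  # ['12345', 'ABCDE', '67890']
```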
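Accurate colspecs are essential: an off-by-one boundary moves characters into the neighboring column and misaligns the data without raising an error. Real-world fixed-width files often vary, for example with title lines, padding characters other than spaces, or codes with leading zeros, so you may need to adjust colspecs or widths, or use parameters such as skiprows, delimiter, and dtype. read_fwf() strips leading and trailing filler characters from each field, but after reading you may still need to remove other unwanted characters or convert data types.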
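A minimal sketch of handling a few of these variations at once. The in-memory file below is hypothetical: it has a title line to skip, uses '~' as its padding character, and holds IDs whose leading zeros should survive.

```python
import io
import pandas as pd

# Hypothetical file: a title line, '~' padding, and zero-padded 5-digit IDs
text = (
    "REPORT 2024\n"
    "00042~~Alice~~~\n"
    "00107~~Bob~~~~~\n"
)

df = pd.read_fwf(
    io.StringIO(text),
    widths=[7, 8],          # id field is 7 characters, name field is 8
    names=['id', 'name'],
    skiprows=1,             # skip the title line
    delimiter='~',          # treat '~' as filler and strip it from each field
    dtype={'id': str},      # keep '00042' as text instead of converting it to 42
)
print(df)
```

Without dtype={'id': str}, the id column would be parsed as integers and the leading zeros would be lost.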
Pandas itself has no direct function for writing fixed-width files. However, you can produce one with DataFrame.to_string(), which pads every column to a common width, or by formatting each row yourself with Python's string formatting and writing the lines to a file.
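A minimal sketch of the to_string() route, assuming a small illustrative DataFrame and a hypothetical output filename. With index=False and header=False, only the values are rendered, and each column is padded to the width of its widest entry (to_string() also accepts col_space to set a minimum column width).

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 100, 10000], 'col2': ['A', 'BB', 'CCC']})

# Render only the values: no row labels and no header line
text = df.to_string(index=False, header=False)

# 'output_to_string.txt' is a hypothetical filename
with open('output_to_string.txt', 'w') as f:
    f.write(text + '\n')

print(text)
```

Because the column widths here depend on the data, you would need to record them (or set col_space) to read the file back reliably with read_fwf().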
Here is a basic example that formats each row with Python string formatting.
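```python
import pandas as pd

data = {'col1': [1, 100, 10000], 'col2': ['A', 'BB', 'CCC']}
df = pd.DataFrame(data)

with open('output.txt', 'w') as f:
    # itertuples() is generally faster than iterrows() and preserves column dtypes
    for row in df.itertuples(index=False):
        # Left-align each value in a 10-character field
        line = '{:<10}{:<10}\n'.format(row.col1, row.col2)
        f.write(line)
```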
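In this example, the '{:<10}' format specification left-aligns each value in a 10-character field, padding it with spaces. Note that the width is only a minimum: a value longer than 10 characters is written in full and pushes the following columns out of alignment.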
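One way to guard against that overflow is to check each padded field before writing it, then read the file back with the same widths to confirm the layout. This is a sketch under assumptions: pad_field is a hypothetical helper, and raising an error (rather than silently truncating) is a design choice made here so that no data is lost.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 100, 10000], 'col2': ['A', 'BB', 'CCC']})
widths = [10, 10]

def pad_field(value, width):
    """Left-align value in exactly `width` characters (hypothetical helper)."""
    text = f'{value:<{width}}'
    # The format spec only pads, so reject values that would overflow the field
    if len(text) > width:
        raise ValueError(f'{value!r} does not fit in {width} characters')
    return text

with open('output.txt', 'w') as f:
    for row in df.itertuples(index=False):
        f.write(''.join(pad_field(v, w) for v, w in zip(row, widths)) + '\n')

# Read the file back with the same widths to confirm the layout round-trips
round_trip = pd.read_fwf('output.txt', widths=widths, names=['col1', 'col2'])
print(round_trip)
```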